UNDERSTANDING TRAIT EVOLUTION AT THE LEVELS OF A CIS-REGULATORY ELEMENT AND A GENE REGULATORY NETWORK

Dissertation

Submitted to

The College of Arts and Sciences of the

UNIVERSITY OF DAYTON

In Partial Fulfillment of the Requirements for

The Degree of

Doctor of Philosophy in Biology

By

William A. Rogers

Dayton, Ohio

December, 2014

UNDERSTANDING TRAIT EVOLUTION AT THE LEVELS OF A CIS-REGULATORY ELEMENT AND A GENE REGULATORY NETWORK

Name: Rogers, William A.

APPROVED BY:

Thomas M. Williams, Ph.D. Faculty Advisor

Mark Nielsen, Ph.D. Biology Department Chairperson Committee Member

Carissa Krane, Ph.D. Committee Member

Madhuri Kango-Singh, Ph.D. Committee Member

Mark Rebeiz, Ph.D. Committee Member


© Copyright by

William A. Rogers

All rights reserved

2014

ABSTRACT

UNDERSTANDING TRAIT EVOLUTION AT THE LEVELS OF A CIS-REGULATORY ELEMENT AND A GENE REGULATORY NETWORK

Name: Rogers, William A.
University of Dayton

Advisor: Dr. Thomas Williams

In closely related species, development begins from a very similar state, yet the adult organisms display an array of distinguishing morphological traits. A major focus in evolutionary developmental biology is to understand which genetic steps were taken along the evolutionary paths toward this array of traits. Historically, studies of morphological diversity emphasized the origin of new genes and changes to protein-coding sequences. In the genomic era of genetics research, it has become clear that the protein-coding sequences of many species are very similar and that changes in gene number have been modest. Thus, another type of genetic change must have contributed substantially to diversity.

In recent years, a plethora of case studies have identified genetic alterations responsible for morphological evolution that modify how genes are expressed. These alterations occur in non-protein-coding sequences called cis-regulatory elements (CREs), which control gene expression through their interactions with transcription factor proteins. Moreover, interactions between transcription factors and CREs connect genes into a regulatory network.

At the onset of my dissertation research, little was known about the paths of CRE evolution, and how gene regulatory networks evolve to alter morphology.

Moreover, the available tools were inadequate for studying both CREs and gene regulatory networks. My dissertation research focused on the mechanistic underpinnings of the evolution of a CRE known as the dimorphic element (Chapter 3), which functions in an evolved gene regulatory network that patterns abdominal pigmentation in the fruit fly Drosophila (D.) melanogaster (Chapter 4). These studies required the establishment of a quantitative method for comparing the gene regulatory capabilities of CREs (Chapter 2).

In Chapter 2, I present a protocol, used throughout my dissertation, that enables the quantification of CRE activity by measuring the level of Green Fluorescent Protein (GFP) production within D. melanogaster. In Chapter 3, this method showed that evolved differences in abdominal pigmentation among D. melanogaster populations and closely related species recurrently involved function-altering mutations in the dimorphic element. Many of these key mutations did not overlap known transcription factor binding sites. This outcome may reflect pleiotropic constraints on these conserved binding sites, while binding sites for other transcription factors were perhaps gained or lost.

To find candidate transcription factors for these evolved binding sites, I led an RNA interference screen for transcription factors in the pigmentation network. We found twenty-four novel transcription factors controlling abdominal pigmentation (Chapter 4). These results show that the abdominal pigmentation network is quite complex and that future studies are needed to connect these factors to the CREs they regulate.

A remaining obstacle to understanding CRE function and evolution is determining the in vivo effects of mutations. In Chapter 5, I present a protocol, “CRISPR CREam”, which I have been developing to remove a CRE from its endogenous gene context and replace it with a variant CRE. Collectively, my dissertation has furthered the understanding of CREs and of a model gene regulatory network. With the development of new genetic tools, CRE and network biology should be poised for rapid progress.

TABLE OF CONTENTS

ABSTRACT

LIST OF FIGURES

LIST OF TABLES

CHAPTER I: INTRODUCTION

CHAPTER II: QUANTITATIVE COMPARISON OF CIS-REGULATORY ELEMENT (CRE) ACTIVITIES IN TRANSGENIC DROSOPHILA MELANOGASTER

Abstract

Protocol

Results

Materials

Discussion

CHAPTER III: RECURRENT MODIFICATION OF A CONSERVED CIS-REGULATORY ELEMENT UNDERLIES FRUIT FLY PIGMENTATION DIVERSITY

Abstract

Introduction

Results

Discussion

Materials and Methods

Supplementary Information

CHAPTER IV: A SURVEY OF TRANS-REGULATORY LANDSCAPE FOR DROSOPHILA MELANOGASTER ABDOMINAL PIGMENTATION

Abstract

Introduction

Materials and Methods

Results

Discussion

Supplementary Information

CHAPTER V: CONCLUSIONS AND FUTURE DIRECTIONS

BIBLIOGRAPHY

APPENDIX

LIST OF FIGURES

Figure 1.1: Transcription factor binding dictates cell specific gene expression

Figure 1.2: Early mesoderm gene regulatory network

Figure 1.3: The theoretical relationship between pleiotropy and evolvability for coding sequences and CREs

Figure 1.4: Variable pigmentation patterning in different populations and species of fruit flies

Figure 1.5: GFP reporter transgene elucidates regulatory patterning

Figure 2.1: Using the intensity of the white-rescue eye to make transgenic lines homozygous

Figure 2.2: Using morphological markers to determine the metamorphic stage for Drosophila

Figure 2.3: Quantitative comparison of cis-regulatory element activities

Figure 3.1: Abdomen pigmentation correlates with the regulatory activity of the dimorphic element

Figure 3.2: bab locus allelic variation underlies phenotypic variation

Figure 3.3: Population level differences in Bab paralog expression

Figure 3.4: Dimorphic element alleles diverged from an ancestral state

Figure 3.5: Functionally-relevant mutations in dimorphic element alleles

Figure 3.6: Interspecific evolution of pigmentation and dimorphic element activity

Figure 3.7: Pigmentation gene network model and the evolution of an ancestral CRE regulatory logic

Figure S3.1: Abdomen pigmentation for Drosophila melanogaster population stocks

Figure S3.2: Mapping of the bab -phenotype association

Figure S3.3: Chimeric dimorphic elements map functionally-relevant derived mutations to the core region

Figure S3.4: Regulatory activity effects of derived dimorphic element mutations

Figure 4.1: Abdominal pigmentation pattern and gene network

Figure 4.2: An RNAi-based screen to identify transcription factor genes regulating abdominal pigmentation

Figure 4.3: Abd-A expression is suppressed by an inhibitory RNA transgene

Figure 4.4: Transcription factor genes whose RNAi-mediated suppression results in reduced abdominal tergite pigmentation

Figure 4.5: Transcription factor genes whose RNAi-mediated suppression results in ectopic abdominal tergite pigmentation

Figure 4.6: Transcription factor genes whose RNAi-mediated suppression results in conspicuous mutant phenotypes

Figure 4.7: Genetic interaction between Class I transcription factors and pigmentation network cis-regulatory elements

Figure 4.8: Genetic interaction between Class II transcription factors and pigmentation network cis-regulatory elements

Figure 4.9: Abd-A expression is conserved between species with diverse abdominal pigmentation phenotypes

Figure 4.10: Summarizing representation of expression patterns for key patterning transcription factors

Figure S4.1: Spatial and sex-specific patterns of pigmentation gene expression

Figure S4.2: Phenotypic comparisons of independent RNAi lines that target the same transcription factor genes

Figure S4.3: Bab expression remains off in males following RNAi suppression of Abd-A

Figure 5.1: Modifying CRE sequences in their natural environment

Figure 5.2: Removing and replacing the dimorphic element

Figure 5.3: CRE replacement by recombination-mediated cassette exchange

Figure A1: Sequence alignments for dimorphic elements

Figure A2: Protein coding sequence variation for the bab alleles

LIST OF TABLES

Table 1.1: Estimated numbers of CREs and genes within model organism genomes

Table 2.1: Materials for visualizing transgenic flies

Table S3.1: Association between pigmentation phenotype and bab dimorphic element genotype

Table S3.2: Association between pigmentation phenotype and bab dimorphic element genotype

Table S3.3: Primers used to PCR amplify D. melanogaster bab protein coding exons and their splice junctions

Table S3.4: Primers used for PCR-based genotyping of the D. melanogaster bab locus

Table S3.5: Primer combinations used to amplify and clone dimorphic element alleles and orthologous sequences

Table S3.6: Oligonucleotides used to make gel shift assay binding sites

Table 4.1: Class I transcription factor DNA binding domains and Gene ontology terms

Table 4.2: Class II transcription factor DNA binding domains and Gene ontology terms

CHAPTER I

INTRODUCTION

The Importance of Gene Regulation in Development, Disease, and Evolution

Cellular production of functional RNAs or proteins is often referred to as “gene expression”. For organisms that develop a myriad of cell, tissue, and organ types, the expression of each gene in the genome must be highly regulated. The sequences encoding functional RNAs or proteins reside in exons, whereas the developmental timing and cell-type-specific expression patterns of genes are generally controlled by non-coding sequences (Davidson, 2006) that are often referred to as cis-regulatory elements, or CREs (Carroll, 2008). While exon sequences are essential for development, and mutations in these sequences can cause the evolution of a trait or a disease, my dissertation focused on the function and evolution of CREs, whose functional sequences and mechanisms of evolution have remained poorly understood.

In the 1960s, the “genetic code” by which the amino acid sequences of polypeptides and proteins are encoded in exon sequences was elucidated (Matthaei et al., 1962). Importantly, the same code is used by all animals and is nearly universal (Giacomoni and Spiegelman, 1962). Once the code was cracked, genetic investigations could pursue the mutations in protein-coding sequences that altered the encoded protein and resulted in phenotypic variation (Gibert et al., 2005; Innis et al., 2002). During this period of progress in understanding the protein-coding part of the genome, comparatively little progress was made in understanding the non-coding genome. Open questions included which sequences function as CREs, how regulatory information is encoded in CREs, and how that regulatory information evolves.

In recent years, several large-scale genomic studies have led to the inference that much of the genetic difference responsible for human phenotypic variation resides not in protein-coding sequences but in non-coding sequences (Sethupathy and Collins, 2008; Visel et al., 2009). Regarding human evolution, a comparison of orthologous human and chimpanzee proteins about 40 years ago revealed few amino acid differences. This led to the hypothesis that human traits evolved largely through changes in non-coding sequences that regulate gene expression (King and Wilson, 1975). Thus, phenotypic differences within populations and over evolutionary time seem to have been shaped largely by cis-regulatory evolution rather than by protein-coding sequence evolution. One reason for this disparity might be the sheer number of CREs relative to protein-coding genes, which is apparent in genome analyses of several animal species (Table 1.1). For the human genome, the ENCODE (Encyclopedia of DNA Elements) project produced data suggesting that cis-regulatory sequences comprise between 20% and 80% of the genome (The ENCODE Project Consortium, 2012), compared to <5% for protein-coding sequence (Pennisi, 2012).

Species | Estimated # of CREs | Citation | Estimated # of genes | Citation
H. sapiens | Several million | (Kvon et al., 2014) | ~25,000 | (Pertea and Salzberg, 2010)
M. musculus | >300,000 | (Shen et al., 2012) | ~25,000 | (Pertea and Salzberg, 2010)
D. melanogaster | >100,000 | (Kvon et al., 2014) | ~15,000 | (Pertea and Salzberg, 2010)

Table 1.1: Estimated numbers of CREs and genes within model organism genomes.

Recently, some seminal case studies have identified mutations in CREs that underlie various developmental disorders. One such study found that preaxial polydactyly was due to a nucleotide substitution in a CRE controlling the expression of the Shh gene in the developing limb (Lettice et al., 2012). In another study, nucleotide substitutions in a CRE controlling Wnt9b-Wnt3 expression caused cleft lip/palate (Ferretti et al., 2011). Elsewhere, CRE mutations were found that increase the risk of heart attack (Musunuru et al., 2010) and, separately, of type II diabetes (Kulzer et al., 2014). While the aforementioned CRE mutations have unwanted effects, some mutations can be beneficial. One study found that malaria resistance resulted from a single “T” to “C” nucleotide substitution within a CRE of the DARC gene. This substitution disrupts expression of the Duffy antigen in red blood cells, a protein whose presence on the red blood cell surface is needed by a malaria parasite for infection (Tournamille et al., 1995). For most humans, the ability to digest lactose declines soon after weaning. A minority of humans, including many northern Europeans and some African populations, can digest lactose throughout adulthood. This persistence of lactose tolerance was found to result from the inheritance of mutant CRE alleles that regulate the expression of the LCT gene, which encodes lactase, the enzyme that breaks down dietary lactose (Tishkoff et al., 2007).

Besides variation between individuals of the same species, CRE mutations have been identified that lead to differences between species. For example, the opposable thumb of primates correlates with a CRE whose activity evolved through multiple mutations (Prabhakar et al., 2008). Evolutionary changes in a CRE controlling Prx1 expression correlate with the evolution of bat wings (Cretekos et al., 2008). Moreover, female-specific pheromone production in D. melanogaster fruit flies was shaped by the evolved activity of a CRE controlling desatF expression (Shirangi et al., 2009). Since I began my dissertation in 2009, many other instances of CRE evolution have been characterized (Martin and Orgogozo, 2013). My dissertation research has aimed to understand how CREs encode the specific regulatory activity that controls a pattern of gene expression, and how they connect genes into coherent networks that operate during development. Additionally, my research has formalized methods to study CREs in vivo.

CREs and Gene Regulatory Networks Govern Trait Development

In multicellular organisms, generally the same genome is present in every cell descended from the single-celled zygote. Yet cells have different functions at different times in development and ultimately differentiate into specific cell types (Davidson, 2006). A fundamental question is how cells achieve such diversity in identity and function. The answer lies in the interactions of numerous genes within gene regulatory networks, each of which produces coherent patterns of gene expression. Temporal differences during development, and ultimately differences between cell types, result from differences in gene expression (Carroll, 2008; Davidson, 2006). While all transcription is initiated at the promoter regions of genes, the temporal and cell-specific patterns of expression are controlled by CREs (Figure 1.1). CREs are typically non-coding DNA sequences that range from around a hundred to a thousand base pairs in length and are often found upstream, downstream, or in an intron of the gene whose expression they regulate (Levine and Davidson, 2005). However, some CREs are located at a great distance from their target genes (Ghavi-Helm et al., 2014); for example, Shh has a limb bud-specific CRE that lies ~1 million base pairs from its promoter (Lettice, 2003).

The CREs in an animal genome collectively drive a diverse repertoire of expression patterns, although these regulatory activities are encoded with just the four DNA nucleotides: A, T, C, and G. For any one CRE, its expression pattern arises from short DNA motifs, each of which functions as a binding site for a transcription factor. Transcription factor binding sites average 10 nucleotides in length (ranging from 5 to 31 nucleotides) (Stewart et al., 2012), and a comparison of several well-characterized CREs found that each possessed binding sites for four or more transcription factors (Arnone and Davidson, 1997). In cells lacking the transcription factors necessary for a CRE, gene expression remains “off” (Figure 1.1A). In cells with the correct combination of transcription factors, however, the binding sites are occupied and expression can be turned “on” (Figure 1.1B).

Figure 1.1: Transcription factor binding dictates cell specific gene expression. (A) The lack of transcription factor interactions with CRE binding sites (colored rectangles) can leave transcription “off” in Cell Type 1. (B) Combinatorial binding of transcription factors to their CRE binding sites results in transcription in Cell Type 2. Transcription factors and their interactions with CREs establish regulatory connections among the ensembles of genes whose coordinated expression results in a specific developmental outcome. The arrow represents the promoter region.
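The “off”/“on” logic of Figure 1.1 can be illustrated with a toy model. The Python sketch below is not drawn from the cited studies; it assumes that a CRE is active only when every one of its required activators is present (strict AND logic) and ignores repressors, binding affinity, site spacing, and factor concentration. The class name CRE and the factor names TF_A through TF_E are hypothetical placeholders, and four required factors were chosen only to echo the observation that well-characterized CREs bind four or more transcription factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CRE:
    """Toy CRE: expression is "on" only if all required activators are present.

    Strict AND logic is a simplifying assumption; real CREs also integrate
    repressors, site affinity, spacing, and factor concentration.
    """
    name: str
    required_activators: frozenset

    def is_active(self, cell_factors: set) -> bool:
        # Every binding site must be occupied by its factor; missing any one
        # leaves transcription "off".
        return self.required_activators <= cell_factors


# Hypothetical factor names used only for illustration.
example_cre = CRE("example_CRE", frozenset({"TF_A", "TF_B", "TF_C", "TF_D"}))

cell_type_1 = {"TF_A", "TF_B"}                          # lacks TF_C and TF_D
cell_type_2 = {"TF_A", "TF_B", "TF_C", "TF_D", "TF_E"}  # has the full combination

print(example_cre.is_active(cell_type_1))  # False
print(example_cre.is_active(cell_type_2))  # True
```

An extra factor such as TF_E in Cell Type 2 does not matter in this model; only the absence of a required factor switches the CRE off.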

The interactions of transcription factors with the CREs of downstream genes form a gene regulatory network. One well-studied gene regulatory network is the set of genes responsible for mesoderm development in the fruit fly Drosophila (D.) melanogaster (Figure 1.2). An early event in this network’s activity is turning “on” the expression of the transcription factor Twist. This activation occurs, in part, through the transcription factor Dorsal interacting with binding sites in a CRE controlling Twist expression (Sandmann et al., 2007). Twist then activates the expression of many target genes, including Mef2. Mef2 expression requires the combinatorial CRE binding of several other network transcription factors.

Similar regulatory events occur throughout the network, turning transcription “on” and “off” through the interaction of upstream transcription factors with downstream CREs. These interactions drive different mesoderm cells to differentiate into fat body, somatic, cardiac, and visceral cell types (Bonn and Furlong, 2008). Collectively, this well-studied gene regulatory network demonstrates how many transcription factors function within a single network through interactions with target gene CREs, and how these connections result in the specialization of cell types. Similarly, other developmental processes are each controlled by a gene regulatory network with its own set of transcription factors interacting with target gene CREs (Davidson, 2006). Thus, networks have complex architectures of transcription factor-CRE interactions, and it is possible that these architectures influence where in a network the changes responsible for trait evolution occur.

Figure 1.2: Early mesoderm gene regulatory network. The activation of twist by binding of the Dorsal transcription factor causes a cascade of transcriptional activation and repression events that eventually lead to the specialization of various cell types through the expression of the requisite differentiation genes. Coloration denotes the genes governing early mesoderm development. Arrows represent transcription factor binding that contributes to the activation of gene expression. slp1 (purple) represses the visceral cell-specific pathway; this repression is represented by a connection ending in a nail-head shape. Squares indicate cell type specification pathways (Adapted from Bonn and Furlong, cis-Regulatory networks during development: a view of Drosophila, 2008).
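The cascade in Figure 1.2 can likewise be sketched as a small Boolean network in which a gene turns “on” once all of the inputs its CRE requires are active. Only two edges are taken from the text (Dorsal activating twist, and Twist contributing to Mef2 activation). The node Partner_X is a hypothetical stand-in for the “several other network transcription factors” that Mef2 requires, and repression (such as by slp1) is deliberately left out. This is an illustrative sketch under those assumptions, not a model of the published network.

```python
# Target gene -> upstream factors its CRE requires (activation only).
# Nodes use gene names; "twist" stands for the gene and its Twist product.
# "Partner_X" is hypothetical, standing in for Mef2's other required inputs.
NETWORK = {
    "twist": frozenset({"Dorsal"}),
    "Mef2": frozenset({"twist", "Partner_X"}),
}


def propagate(initial_factors, network):
    """Repeatedly switch on genes whose required inputs are all active,
    stopping once a full pass adds nothing new (a fixed point)."""
    active = set(initial_factors)
    changed = True
    while changed:
        changed = False
        for gene, required in network.items():
            if gene not in active and required <= active:
                active.add(gene)
                changed = True
    return active


print(sorted(propagate({"Dorsal"}, NETWORK)))               # ['Dorsal', 'twist']
print(sorted(propagate({"Dorsal", "Partner_X"}, NETWORK)))  # ['Dorsal', 'Mef2', 'Partner_X', 'twist']
```

In the first call Mef2 stays off because Partner_X is absent, mirroring the point that Mef2 needs combinatorial input rather than Twist alone.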

Pleiotropy and its Impact on the Genetic Paths of Evolution

When morphological diversity evolves, do the genetic, molecular, and developmental paths favor the fixation of certain types of genetic changes?

Several leading evolutionary biologists have argued that evolution will more frequently utilize CRE alterations to affect gene expression (Carroll, 2008; Stern and Orgogozo, 2008; Wray, 2007). One reason for this proposed evolutionary preference is that diverse animal phyla share many of the same developmental genes and signaling pathways; thus, evolution has seemingly repurposed this “genetic toolkit” in innumerable ways (Carroll, 2008). A second reason rests on the observation that genes are generally pleiotropic, and pleiotropy constrains the ability of a sequence to evolve (Stern and Orgogozo, 2009). This raises a key question: why would evolution favor modifying CREs over protein-coding sequences?

The majority of protein-coding genes are expressed in multiple cell types and/or at multiple times during an animal’s life. Thus, protein-coding sequences are quite pleiotropic. Regulatory genes, such as those encoding transcription factors and signaling pathway components, are even more pleiotropic because they regulate numerous downstream genes. Their pleiotropy seemingly exceeds that of terminal differentiation genes, which encode proteins that produce the cell-specific phenotype and do not regulate other genes (Stern and Orgogozo, 2008). A pleiotropic gene can be regulated by multiple CREs that subdivide the regulatory control of the gene’s composite expression pattern (Carroll, 2005; Wittkopp and Kalay, 2012). For a hypothetical gene with a protein-coding sequence and a single CRE, there would be no difference in pleiotropy between these gene components, and the evolvability of both would be similarly constrained (Figure 1.3A). In contrast, for a hypothetical gene with a protein-coding sequence and multiple CREs, the protein-coding sequence would be more pleiotropic than any single CRE. In this situation, the evolvability of a CRE would be greater than that of the protein-coding sequence (Figure 1.3B), since evolution could “tinker” with a single domain of expression without altering the gene’s function across the whole organism, as a change to the protein-coding sequence would (Carroll, 2005; Jacob, 1977).

Figure 1.3: The theoretical relationship between pleiotropy and evolvability for coding sequences and CREs. (A) For a gene with a single protein-coding region and a single CRE, evolvability will be similarly limited for both gene parts. (B) For a gene with a protein-coding sequence and multiple CREs each regulating a unique expression pattern, the evolvability of a CRE will be greater than the protein-coding sequence due to its reduced pleiotropy.
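The contrast in Figure 1.3 can be made concrete by counting how many expression domains a mutation touches. The sketch below assumes, for simplicity, that each CRE controls exactly one non-overlapping domain and that a coding change is felt in every domain where the protein is made. The class name Gene and the CRE and domain labels are hypothetical, and counting domains is only a rough proxy for pleiotropy.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Toy gene whose composite expression pattern is divided among CREs.

    Assumes each CRE drives one non-overlapping expression domain.
    """
    cre_domains: dict  # CRE name -> expression domain it controls

    def domains_hit_by_coding_mutation(self) -> set:
        # The altered protein is present wherever the gene is expressed.
        return set(self.cre_domains.values())

    def domains_hit_by_cre_mutation(self, cre: str) -> set:
        # A mutation in one modular CRE changes only that CRE's domain.
        return {self.cre_domains[cre]}


# Figure 1.3A: one CRE, so coding sequence and CRE are equally pleiotropic.
gene_a = Gene({"CRE1": "domain_1"})
# Figure 1.3B: three CREs, so each CRE is less pleiotropic than the coding sequence.
gene_b = Gene({"CRE1": "domain_1", "CRE2": "domain_2", "CRE3": "domain_3"})

for label, gene in (("A", gene_a), ("B", gene_b)):
    coding = len(gene.domains_hit_by_coding_mutation())
    cre = len(gene.domains_hit_by_cre_mutation("CRE1"))
    print(f"{label}: coding mutation -> {coding} domain(s); CRE1 mutation -> {cre}")
```

For gene A both counts are 1, matching panel A; for gene B a coding change reaches all three domains while a CRE1 change reaches one, matching panel B. The model ignores how severely each domain is affected, which real assessments of pleiotropy would need to consider.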

Several case studies have shown instances of protein-coding sequence evolution underlying an evolved morphological trait (Stern and Orgogozo, 2008). However, these cases can be considered exceptions that help explain why CRE evolution may be the more common genetic path of evolution (Carroll, 2005).

One well-known case is the rock pocket mouse, populations of which have evolved a melanic coat color as an adaptation to their environment of dark lava rocks. This phenotype differs from that of neighboring populations that live on light-colored sand and generally exhibit a brown pelage. The pelage color differences between these populations are due to mutations in the melanocortin-1-receptor (Mc1r) protein-coding sequence (Nachman et al., 2003). A second example is an exon deletion in the ocular and cutaneous albinism type II (Oca2) gene that played a prominent role in the evolution of albinism in several cavefish populations (Protas et al., 2006). Mc1r and Oca2 are thought to function only in pigmentation. Thus, the coding sequences of these genes have low pleiotropy, perhaps equal to that of the CRE(s) controlling their expression.

However, gene pleiotropy seems to be the norm (Stern and Orgogozo, 2008), and as a result, coding sequences may generally be less evolvable than CREs. For example, the Sonic hedgehog (Shh) gene has multiple functions within the same species and diverse functions between related species. These include the development of the central nervous system, fins, limbs, feathers, hair, teeth, and foregut, and morphogenesis of the oral cavity and hindgut (Lettice, 2003; Sagai et al., 2009). Because Shh is used in so many different contexts, the gene is considered highly pleiotropic, and null mutants are inviable (Heussler and Suri, 2003). Moreover, mutations that broadly alter protein function can be expected to have highly deleterious effects on the organism due to pleiotropy, thus restricting the evolvability of the protein-coding sequence. Not surprisingly, Shh has multiple CREs that regulate these different facets of its expression throughout development (Rebeiz and Williams, 2011). This regulatory modularity allows individual uses of Shh to be modulated by CRE evolution, which contributed to the use of Shh in novel traits such as feathers (Yu et al., 2002). Similar cases of morphological evolution include changes in the limb CRE of the Prx1 gene between bats and mice (Cretekos et al., 2008), a deletion in a Pitx1 CRE that caused the loss of pelvic fins in freshwater threespine sticklebacks while allowing Pitx1 to remain functional elsewhere (Chan et al., 2010), and CRE alterations within D. melanogaster that underlie sex-specific abdominal pigmentation patterning (Williams et al., 2008). Thus, pleiotropy seems to act as a guiding force on the “paths of genetic evolution”, one that favors CRE alterations when multiple CREs control the composite expression of a pleiotropic gene (Rebeiz et al., 2009a). At the outset of my graduate studies, few cases of CRE evolution had been shown for a pleiotropic gene, and it was unknown to what extent pleiotropy may shape the repeated evolution of the same gene or CRE.

Fruit Fly Pigmentation Patterning as a Model Trait for the Evolution of Gene Regulation

Fruit flies, in particular D. melanogaster, have been a prolific model for genetics, a model championed by Thomas Hunt Morgan, who received the Nobel Prize in Physiology or Medicine in 1933 for his work on the role of chromosomes in heredity (Morgan, 1910). In the late 1970s and 1980s, multiple genes were discovered that control embryonic development in fruit flies, work that led to the Nobel Prize in Physiology or Medicine in 1995 (Lewis, 1978; Nüsslein-Volhard and Wieschaus, 1980). This strong genetic foundation was followed by the sequencing of the D. melanogaster genome (Adams et al., 2000). One study found that this species has an ortholog for 75% of known human disease genes (Reiter et al., 2001), and many of these homologous genes are used in the development of the body plans of diverse animals (Carroll et al., 2004). My dissertation research used fruit flies to understand how the gene regulatory activity of a CRE is encoded in DNA sequence, and how CREs evolve when development and traits evolve. One challenge in studying CREs is their rapid rate of evolution (Swanson et al., 2011). For this reason, when traits are compared between long-diverged organisms (at the level of genera or higher taxonomic ranks), function-altering mutations are hidden in a sea of additional changes that are not relevant to the trait. Therefore, model traits are needed for which diversity has evolved between closely related species or even within the same species.

The pigmentation patterns on the cuticular tergites covering the dorsal surface of the A1-A6 abdominal segments vary substantially among fruit fly species (Figure 1.4), which span up to 50-70 million years of evolutionary divergence. Some species have sexually monomorphic patterns, in which both sexes display the same abdominal pigmentation pattern (Figure 1.4A-B). Other species have evolved sexually dimorphic pigmentation patterns (Figure 1.4C-E). In D. melanogaster, different populations display sexually dimorphic patterning that varies in the amount of pigmentation on the female A5 and A6 tergites, while males generally have pigmentation in the A5-A6 segments at the tip of the abdomen (Figure 1.4E and F). In addition, one CRE involved in D. melanogaster pigmentation patterning (known as the dimorphic element) shares ~98% sequence identity among populations (Rogers et al., 2013). Because pigmentation patterning varies among these closely related flies while sequences such as the dimorphic element remain highly similar, this trait is an ideal model: the changes relevant to pigmentation development and evolution are preserved, while collateral, irrelevant changes seem to be modest.

Figure 1.4: Variable pigmentation patterning in different populations and species of fruit flies. Adult abdomens photographed with bright-field stereomicroscopy. The color patterning of the dorsal tergites (A1-A6) is shown. (A) Zaprionus ghesquierei; monomorphic. (B) Drosophila tripunctata; monomorphic. (C) Drosophila guttifera; sexually dimorphic. (D) Drosophila funebris; sexually dimorphic. (E) Drosophila melanogaster population from Oahu, Hawaii; sexually dimorphic. (F) Drosophila melanogaster population from Mumbai, India; sexually dimorphic.

In the past decade, the genetic basis of tergite pigmentation has received considerable attention in D. melanogaster. Through a multitude of studies, genes encoding enzymes involved in pigment metabolism have been identified (Wittkopp et al., 2003). These include tan (True et al., 2005) and yellow (Walter et al., 1991), whose expression and function are needed to make black melanins (Jeong et al., 2008), and ebony, whose expression and function are needed to make yellow-colored pigments (Hovemann et al., 1998; Rebeiz et al., 2009a). In addition, a few key transcription factors that regulate pigmentation enzyme expression had been identified. These include Abd-B, which specifies the abdominal segments from A5 and A6 through to the posterior tip of the abdomen (Jeong et al., 2006; Kopp and Duncan, 2002; Kopp et al., 2000; Williams et al., 2008), and DSXM and DSXF, sex-specific isoforms that allow for female repression and male activation of pigmentation through regulation of the bric-a-brac 1 (bab1) and 2 (bab2) genes of the bab locus. Both Bab1 and Bab2 proteins are thought to function as transcription factors that repress pigmentation (Couderc et al., 2002; Kopp et al., 2000; Williams et al., 2008). Together, these factors control the sexually dimorphic pigmentation pattern through their effects on the pigmentation genes. The sexually dimorphic expression of bab plays a central role in the D. melanogaster pigmentation network, and this sexually dimorphic pattern of expression evolved from a monomorphic ancestral state (Williams et al., 2008).

In 2008, a study revealed the CRE basis for sexually dimorphic bab expression in D. melanogaster and showed how this expression evolved (Williams et al., 2008). The study identified two CREs that govern bab expression, called the anterior element and the dimorphic element. The anterior element regulates bab expression in abdominal segments A2-A4 in both sexes, while the dimorphic element regulates female-specific bab expression in segments A5-A7 and the genitalia. Additionally, the study identified fourteen Abd-B and two DSX binding sites that are required for the regulatory capability of the dimorphic element. Surprisingly at the time, the monomorphic species D. willistoni was found to possess a dimorphic element, albeit with activity limited to the female A7 segment and genitalia. Moreover, this species' CRE possessed many of the same Abd-B and DSX binding sites, but changes in the number, spacing, and polarity of these binding sites altered the dimorphic element's regulation of bab (Williams et al., 2008).

This work raised many new questions about bab function and its evolution. Has the dimorphic element been recurrently modified in more recent evolutionary history to alter pigmentation? My dissertation addressed this question, and the results are presented in Chapter 3. Besides Bab, Abd-B, and DSX, what other transcription factors shaped D. melanogaster tergite pigmentation? My dissertation addressed this question as well, and the results are presented in Chapter 4. In 2009, when my dissertation research began to address these questions, I needed the ability to qualitatively visualize and quantitatively compare the regulatory capabilities of CREs. I formalized methods to accomplish this, which are presented in Chapter 2.

Methodological Approaches to Study the Evolution of Gene Regulation

Figure 1.5: GFP reporter transgene elucidates regulatory patterning. (A) The three components that comprise a reporter gene: the cis-regulatory element directs the expression pattern of the gene (brown rectangle), the promoter is the region where the transcriptional machinery assembles and transcription begins (arrow), and the protein-coding sequence (here, Green Fluorescent Protein or GFP; green rectangle) is used to visualize the CRE regulatory pattern. (B) Fly pupa cartoon illustrating a hypothetical CRE driving reporter gene expression in the developing eyes (green ovals).

CRE mutations are suspected to be a major source of variation in phenotypic traits (Visel et al., 2009) and of morphological diversity (Carroll, 2008), and such mutations certainly modified the regulatory capability of the dimorphic element (Williams et al., 2008). Thus, methods are needed to quantitatively compare the effects of such mutations. In D. melanogaster, reporter transgenes provide a way to visualize the activity of a CRE. Specifically, a reporter transgene consists of a CRE situated adjacent to a minimal promoter (where the transcriptional machinery gathers and begins transcription, but which cannot initiate expression on its own). Together, the CRE and promoter initiate transcription of a downstream (3′) coding sequence for a readily visualized protein such as the Green Fluorescent Protein, or GFP (Figure 1.5A). These reporter transgenes can be inserted into the D. melanogaster genome by various methods to visualize a CRE's regulatory pattern, for example that of a CRE regulating gene expression in the eye of a fruit fly pupa (Figure 1.5B). By comparing CRE alleles from different populations, or orthologous CREs from different species, differences in their regulatory activity can be observed and studied further.

For D. melanogaster, a method was developed about 30 years ago to use P-elements, a type of transposon, to introduce reporter transgenes into the genome (Spradling and Rubin, 1982). A major limitation of this technique is that the genomic placement of a reporter transgene is random, and its expression is influenced by the local chromatin. Thus, comparisons of CREs inserted at separate genomic sites are at best qualitative, not quantitative. Several years ago a new methodology was developed to circumvent this limitation: through site-specific integration with ϕC31 integrase and attP and attB attachment sites (Groth et al., 2004), reporter transgenes can be inserted into the same genomic site. In Chapter 2, I present a publication in which I formalized a quantitative approach to study the regulatory activities of site-specifically inserted CREs.

While reporter transgenes make it possible to study the regulatory activities of CRE sequences, this approach cannot fully reveal how a CRE functions in its endogenous context, including how it communicates with its endogenous target gene promoter and whether it interacts with additional CREs. Until recently, homologous recombination, a very labor-intensive and non-scalable approach, was used to substitute gene sequences in D. melanogaster (Rong and Golic, 2001). Following the discovery of CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) in bacteria (Ishino et al., 1987; Jansen et al., 2002; Mojica et al., 2000), scientists have begun to use these components in animals and plants to edit genomes precisely and with relative ease (Gratz et al., 2013b; Guo et al., 2014; Hai et al., 2014; Hruscha et al., 2013; Jiang et al., 2013; Niu et al., 2014; Wang et al., 2014). In Chapter 5, I report my efforts to devise a technique, called “CRISPR CREam”, that combines CRISPR with ϕC31 integrase in order to substitute various dimorphic element alleles into the D. melanogaster bab locus. Success here will help reveal how the mutations I identified through reporter transgenes (Chapter 3) translate functionally to the endogenous bab locus.

Collectively, this dissertation has contributed to the understanding of CREs, gene regulatory networks, and their evolution. En route, it has also advanced methods to study gene regulation and its evolution. These outcomes present opportunities for future studies to further our understanding of the non-coding genome.

CHAPTER II

QUANTITATIVE COMPARISON OF CIS-REGULATORY ELEMENT (CRE) ACTIVITIES IN TRANSGENIC DROSOPHILA MELANOGASTER

This work was originally published in the peer-reviewed scientific journal Journal of Visualized Experiments in 2011 under the following citation: Rogers, W.A. and Williams, T.M. (2011). Quantitative comparison of cis-Regulatory Element (CRE) activities in transgenic Drosophila melanogaster. Journal of Visualized Experiments. Dec; 58 (e3395): 1-6.

Abstract

Gene expression patterns are specified by cis-regulatory element (CRE) sequences, which are also called enhancers or cis-regulatory modules. A typical CRE possesses an arrangement of binding sites for several transcription factor proteins that confer a regulatory logic specifying when, where, and at what level the regulated gene or genes are expressed. The full set of CREs within an animal genome encodes the organism's program for development (Davidson, 2006), and empirical as well as theoretical studies indicate that mutations in CREs played a prominent role in morphological evolution (Carroll, 2008; Stern, 2000; Wray, 2007). Moreover, human genome-wide association studies indicate that variants in CREs contribute substantially to phenotypic variation (Sethupathy and Collins, 2008; Visel et al., 2009). Thus, understanding regulatory logic and how mutations affect such logic is a central goal of genetics.

Reporter transgenes provide a powerful method to study the in vivo function of CREs. Here a known or suspected CRE sequence is coupled to a heterologous promoter and the coding sequence of a reporter gene encoding an easily observable protein product. When a reporter transgene is inserted into a host organism, the CRE's activity becomes visible in the form of the encoded reporter protein. P-element mediated transgenesis in the fruit fly species Drosophila (D.) melanogaster (Spradling and Rubin, 1982) has been used for decades to introduce reporter transgenes into this model organism, though the genomic placement of transgenes is random. Hence, reporter gene activity is strongly influenced by the local chromatin and gene environment, limiting CRE comparisons to qualitative ones. In recent years, the phiC31-based integration system was adapted for use in D. melanogaster to insert transgenes into specific genomic landing sites (Bischof et al., 2007; Groth et al., 2004; Venken et al., 2006). This capability has made the quantitative measurement of gene activity, and, relevant here, CRE activity feasible (Markstein et al., 2008; Rebeiz et al., 2009a; Williams et al., 2008). The production of transgenic fruit flies, including phiC31-based integration, can be outsourced, eliminating the need to purchase expensive equipment or to become proficient in specialized transgene injection protocols.

Here, we present a general protocol to quantitatively evaluate a CRE's activity, and show how this approach can be used to measure the effects of an introduced mutation on a CRE's activity and to compare the activities of orthologous CREs. Although the examples given are for a CRE active during fruit fly metamorphosis, the approach can be applied to other developmental stages, fruit fly species, or model organisms. Ultimately, more widespread use of this approach to study CREs should advance our understanding of regulatory logic and how it can vary and evolve.

Protocol

Overview: This protocol demonstrates a method capable of quantitatively measuring the gene regulatory activities for cis-regulatory element (CRE) sequences in Drosophila (D.) melanogaster. This protocol can be used to compare the regulatory activities possessed by: wild type and mutant CRE forms, naturally-occurring CRE alleles found within a species, or orthologous CREs between diverged species.

I. Site-specific integration of reporter transgenes into the Drosophila melanogaster genome.

A. Reporter transgene construction

1. As a first step, each CRE to be evaluated must be separately cloned into a reporter vector that contains (1) an attB bacterial attachment site sequence used for site-specific transgene integration (Bischof et al., 2007; Groth et al., 2004; Markstein et al., 2008; Venken et al., 2006), and (2) a multiple cloning site upstream of (3) a heterologous promoter that is followed by (4) the coding sequence for a fluorescent protein (such as EGFP or DsRed).

2. While any vector can be made compatible with phiC31-mediated site-specific integration by introducing an attB sequence into the vector backbone, the vector pS3aG (Rebeiz et al., 2009a; Shirangi et al., 2009; Williams et al., 2008) can be obtained from the Addgene plasmid repository (http://www.addgene.org/pgvec1). pS3aG is a customized version of the P-based transformation vector (Barolo et al., 2000) in which one of the two gypsy insulators was replaced by an SfIb insulator; it also contains an enhanced multiple cloning site and a 250 base pair sequence containing an attB attachment site. CREs of interest are inserted into the multiple cloning site upstream of an hsp70 minimal promoter and the EGFP gene, which encodes an enhanced version of Green Fluorescent Protein that localizes to the nucleus.

B. Site-specific integration of reporter transgenes

1. For a set of CREs whose activities are to be compared, each reporter transgene vector is separately integrated into the same genomic landing site. Here the phiC31 phage integrase protein catalyzes unidirectional recombination between the bacterial attachment site (attB) in the transgene vector and a genomically located phage attachment site (attP) (Groth et al., 2004). Several groups (Bischof et al., 2007; Groth et al., 2004; Markstein et al., 2008; Venken et al., 2006) have made a variety of transgenic lines that include an attP site (so-called landing sites) and contain a genomic source of phiC31 (Bischof et al., 2007) that expresses the integrase in the germ cells. These fly stocks can be readily obtained from the Bloomington Drosophila Stock Center (http://flystocks.bio.indiana.edu/).

2. A published protocol details how to create transgenic Drosophila site-specifically using phiC31 integrase (Fish et al., 2007). The equipment and technical expertise to make transgenic Drosophila lines are no longer required, as this service can be outsourced to several vendors (Table 2.1).

3. Transgene behavior can vary greatly from one tissue to the next (Markstein et al., 2008). Thus, a single attP landing site is not optimal for the analysis of all CREs, developmental stages, and tissues of study. It is advisable to assess the activity of a CRE of interest in several different landing sites in order to find a site where the CRE's activity is representative of the expression pattern of its endogenous target gene(s).

4. It is advisable to make each transgenic line homozygous for the transgene. CRE activity is generally more robust in individuals with two copies of the transgene, and for quantitative measurements it is imperative that comparisons are made between individuals with the same number of transgene copies. Homozygous individuals can be obtained through the use of balancer chromosomes. For well-characterized landing sites, however, it is often simpler to collect virgin males and females with the more intense eye color phenotype conferred by the mini-white gene and cross them (Figure 2.1), as the rescue of the white mutant phenotype by the integrated transgene vector is more dramatic in individuals with two vector copies.

II. Obtaining specimens or tissues of the appropriate developmental stage.

Obtain specimens for analysis at the time in development when the gene expression pattern of interest occurs endogenously, and hence when the CRE under investigation should activate reporter gene expression. For Drosophila, this can be the embryonic, larval, pupal, or adult stage. Below we describe methods to stage adult and metamorphosing flies; metamorphosis takes place inside the puparium, a hard larval skin that encloses the immobile specimen until it ecloses as an adult.

A. Expand transgenic lines to ensure specimens of the proper stage are available when you are ready to analyze reporter gene expression by confocal microscopy. Set up two vials, each with several (~5-10) male and female flies. On alternating days, bump the flies from one of the vials to an unused vial. Repeat this for a week. By the end of two weeks, transgenic fruit flies ranging from larval to adult developmental stages will be available.

B. Staging adult flies: The time spent inside the puparium is 98 hours for females and 102 hours for males when reared at 25°C. Adult flies can be easily collected when they emerge from the puparium (called eclosion); this timepoint can be considered 0 hours post eclosion. Adult flies can then be reared until the timepoint required for analysis. For example, a CRE called mel-oe1, which controls expression of the desatF gene in the adult oenocytes of D. melanogaster females, is not active immediately after eclosion but switches on after one day (Shirangi et al., 2009). When the correct stage is reached, anesthetize the flies, place them in a drop of Halocarbon oil on a slide, and proceed to confocal evaluation.
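The staging arithmetic in this section is simple but easy to get wrong across day boundaries. The sketch below assumes rearing at a constant 25°C and uses the puparial durations quoted above (98 hours for females, 102 hours for males) to convert the clock time of puparium formation into the time at which a specimen reaches a target number of hours after puparium formation, and into an expected eclosion time. The function names are hypothetical helpers for illustration, not part of any published tool.

```python
from datetime import datetime, timedelta

# Hours spent inside the puparium at 25 °C, as quoted in the protocol text.
PUPARIAL_HOURS_AT_25C = {"female": 98, "male": 102}


def time_at_hapf(puparium_formation: datetime, hapf: float) -> datetime:
    """Clock time at which a specimen collected at 0 hAPF reaches `hapf` hours."""
    return puparium_formation + timedelta(hours=hapf)


def expected_eclosion(puparium_formation: datetime, sex: str) -> datetime:
    """Approximate eclosion time, assuming constant rearing at 25 °C."""
    return time_at_hapf(puparium_formation, PUPARIAL_HOURS_AT_25C[sex])


def hours_post_eclosion(eclosion: datetime, now: datetime) -> float:
    """Adult age in hours, counting eclosion as 0 hours post eclosion."""
    return (now - eclosion).total_seconds() / 3600


# Hypothetical example: a white prepupa collected on 1 January 2024 at 09:00.
collected = datetime(2024, 1, 1, 9, 0)
print(time_at_hapf(collected, 80))             # 2024-01-04 17:00:00
print(expected_eclosion(collected, "female"))  # 2024-01-05 11:00:00
```

For a CRE such as mel-oe1 that switches on about one day after eclosion, an adult would be ready for imaging once `hours_post_eclosion` reaches roughly 24. Real development times drift with temperature, so these values are planning estimates only.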

C. Staging specimens during metamorphosis: The puparium forms at the end of the third-instar larval stage. Although the animal is technically still a third-instar larva at this point, this developmental stage is easily identifiable because the specimen is immobile, the puparium is soft and white, and the anterior spiracles have everted (Figure 2.2A). This timepoint can be considered 0 hours after puparium formation (hAPF). At this stage, males can be distinguished from females by the presence of bilaterally situated gonads near the midpoint of the body that appear as translucent circles (boxed in Figure 2.2A and indicated by the black arrow in Figure 2.2A″).

1. Staging based upon length of metamorphosis: At 0 hAPF, transfer specimens to a fresh vial or Petri dish using a moistened paint brush. To keep specimens from drying out, add a piece of Kimwipe moistened with water. Transfer the vial or dish to a 25°C incubator until the desired hAPF.

2. Staging by visible morphological features: It is often inconvenient to analyze specimens for CRE activity at a specific timepoint following the collection of 0 hAPF specimens. Alternatively, one can identify correctly staged specimens at a time convenient for analysis based upon the presence and position of several morphological markers (detailed below) that are visible through the puparium or after puparium removal (Figure 2.2).

2a. Morphological criteria for approximating metamorphic stage:

• 0 hAPF: White prepupa stage, in which the larva stops moving; anterior spiracles everted (arrow in Figure 2.2A); lateral trachea visible; soft white puparium (Figure 2.2A).

• ~14 hAPF: Puparium tanned and hardened; head sac everted; lateral trachea and gonads less distinct; legs and wings fully extended along the abdomen; Malpighian tubules not yet prominent and green (dashed boxed region in Figure 2.2B); eyes unpigmented (Figure 2.2B′).

• ~25 hAPF: Two parallel Malpighian tubules prominent and green in color (dashed boxed region in Figure 2.2C).

• ~40 hAPF: Dark green yellow body positioned between the anterior ends of the Malpighian tubules (red arrowhead in Figure 2.2D).

• ~50 hAPF: Yellow body positioned at the midpoint of the Malpighian tubules (boxed region in Figure 2.2E and enlarged in 2.2E″); eyes amber in color if wild type for the white gene (needed for red eye color). Eye color is lacking at this stage when the white mutant phenotype is rescued by the mini-white gene that is part of the integrated reporter transgene vector (Figure 2.2E′).

• ~70 hAPF: Dorsal thoracic microchaetae and macrochaetae visible (arrow in Figure 2.2F); yellow body positioned at the posterior of the Malpighian tubules (boxed region in Figure 2.2F and enlarged in 2.2F″); eye color pale pink when rescued by the mini-white gene (Figure 2.2F′).

• ~80 hAPF: Wing tips gray; Malpighian tubules located anterior to the midpoint of the abdomen (boxed region in Figure 2.2G and enlarged in 2.2G″); folds between abdominal segments and bristles on the abdominal tergites visible (Figure 2.2G and 2.2G″); rescued eye color bright red (Figure 2.2G′).

• ~90 hAPF: Wings darkened to black; Malpighian tubules and yellow body obscured by tanning of the tergites; thoracic (arrow) and abdominal bristles mature and darkened (Figure 2.2H); green meconium present at the dorsal posterior tip of the abdomen (Figure 2.2H″).

Notes:

a. At late stages of metamorphosis, the sex of the specimens can be determined by genital morphology (arrowheads in Figure 2.2I and 2.2J) or the presence of darkened sex combs on the first set of male legs.

b. Further precision in staging can be achieved by considering additional morphological markers, as described previously (Ashburner et al., 2011).
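The criteria in section 2a amount to a lookup from visible markers to an approximate stage. A minimal sketch follows, assuming that markers accumulate as development proceeds, so that the latest marker observed gives the best estimate. The marker names are hypothetical labels for the listed criteria, and in practice a specimen should still be checked against Figure 2.2.

```python
from typing import Optional, Set

# Approximate hAPF for each marker in the criteria list (section 2a).
# Marker names are hypothetical labels chosen for this sketch.
STAGE_MARKERS = [
    (0, "soft_white_puparium"),
    (14, "head_sac_everted"),
    (25, "green_malpighian_tubules"),
    (40, "yellow_body_at_anterior_tubule_ends"),
    (50, "yellow_body_at_tubule_midpoint"),
    (70, "thoracic_bristles_visible"),
    (80, "wing_tips_gray"),
    (90, "wings_black"),
]


def approximate_hapf(observed: Set[str]) -> Optional[int]:
    """Estimate hAPF as the latest stage whose marker was observed.

    Returns None when no recognized marker is present.
    """
    stages = [hapf for hapf, marker in STAGE_MARKERS if marker in observed]
    return max(stages) if stages else None


print(approximate_hapf({"head_sac_everted", "green_malpighian_tubules"}))  # 25
```

Taking the maximum rather than, say, the first match keeps the estimate stable when earlier markers remain visible at later stages.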

D. Steps to remove specimen from puparium:

1. Using lab tape, adhere a piece of packing tape to a dissection board with the sticky surface facing upward.

2. Wet a paint brush and apply moisture to pupariated specimens. Using the paint brush, transfer specimens to a Kimwipe.

3. Using forceps, transfer specimens to the packing tape and adhere them with the dorsal surface (the curved side of the puparium) facing up.

4. Allow specimens to dry for ~15 minutes; dry puparia are much easier to open than moist ones. At this point specimens can be coarsely staged by morphological markers visible through the puparium.

5. Using forceps, progressively open the puparium, beginning at the anterior operculum and proceeding toward the posterior end. If a finer dissection is needed to isolate a specific tissue, perform it in a watch glass containing PBS. Otherwise, transfer pupae to a drop of viscous HC700 Halocarbon oil (Sigma-Aldrich) on a microscope slide.

6. Position specimens on the slide under a stereomicroscope and determine each specimen's stage using the criteria described above (section 2a).


III. Confocal microscopic evaluation of reporter gene expression in a whole mount sample

1. For whole-mount specimens, use either the 4X or the 10X objective. Using fluorescence excited by the mercury lamp, or alternatively bright-field illumination, adjust the microscope stage to position the specimen in the optical field, then bring the specimen into focus.

2. Using the confocal microscope software, set the excitation wavelength for EGFP (or a suitable wavelength when using another fluorescent protein).

3. To minimize photobleaching, begin with a laser intensity of 5-10%. If the signal is weak, increase the intensity as needed.

4. To improve the signal-to-noise ratio of confocal images, enable software settings that average pixel measurements from replicate scans, such as Kalman averaging or line and stack averaging. In our experience, Kalman averaging of three scans improves the signal-to-noise ratio without making image collection too long.

5. For quantitative comparisons of CRE activity, it is essential that the fluorescence of a specimen does not saturate many pixels. Using the z-section where fluorescence appears most intense, switch to a saturation-warning lookup table and adjust the channel voltage, gain, and offset until few pixels are saturated. Lastly, it is often necessary to adjust the confocal aperture in order to collect sufficient signal from a specimen.

6. Once the optimal settings are determined, run the z-scan.

7. When the scan is complete, convert the image stack into a projection and save this image in Tagged Image File Format (TIFF).

Note:

a. It is essential to use the same confocal settings for all replicate specimens and reporter transgene lines that will be included in a quantitative comparison of CRE activities.
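Steps 4, 5, and 7 above correspond to simple pixel operations that the acquisition software performs internally. The sketch below illustrates them on toy images (lists of pixel rows) under stated assumptions: replicate scans are combined by a plain pixel-wise mean (vendor "Kalman" averaging may weight scans differently), saturation is judged against the maximum value of an assumed 12-bit detector, and the projection is a maximum-intensity projection, a common choice that the protocol itself does not specify.

```python
# Toy images are lists of pixel rows; a z-stack is a list of such images.

def average_scans(scans):
    """Pixel-wise mean of replicate scans of the same optical section.

    Plain averaging is assumed; vendor 'Kalman' modes may weight scans differently.
    """
    n = len(scans)
    rows, cols = len(scans[0]), len(scans[0][0])
    return [[sum(scan[r][c] for scan in scans) / n for c in range(cols)]
            for r in range(rows)]


def saturated_fraction(image, bit_depth=12):
    """Fraction of pixels at the detector maximum (4095 for an assumed 12-bit detector)."""
    max_value = 2 ** bit_depth - 1
    pixels = [p for row in image for p in row]
    return sum(p >= max_value for p in pixels) / len(pixels)


def max_intensity_projection(stack):
    """Collapse a z-stack by keeping each pixel's brightest value across sections."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(section[r][c] for section in stack) for c in range(cols)]
            for r in range(rows)]


scans = [[[10, 20], [30, 40]], [[14, 20], [26, 44]], [[12, 20], [28, 42]]]
print(average_scans(scans))                             # [[12.0, 20.0], [28.0, 42.0]]
print(saturated_fraction([[4095, 100], [4095, 2000]]))  # 0.5
```

A saturated pixel records only a lower bound on the true signal, which is why step 5 keeps the saturated fraction small before any quantitative comparison.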

IV. Quantifying cis-regulatory element activity

While some confocal acquisition software allows further processing of projection images, we prefer to use the freely available Image J software program to evaluate confocal images (Abramoff and Magalhaes, 2004). Using Image J, recorded GFP expression patterns can be quantified as pixel value statistics within a specified area. Although images do not need to be converted to grayscale, we prefer to do so.

1. With the Image J program started, open a TIFF image to be evaluated.

2. Click on the freehand selection button and outline the area of the specimen where quantitation is desired, for example the region within the dashed yellow border in Figure 2.3. (See Note a below regarding the selection of an area to quantitate.)

3. To obtain a pixel value statistic, click on the Analyze menu and then select Measure. A results window will appear showing the area and the mean, minimum, and maximum pixel values. Record the mean pixel value, and repeat to generate pixel value statistics for replicate specimens (see Note b regarding replicate number).

4. We recommend collecting a second mean pixel value from a control region where the CRE is not active, for example the region within the dashed red border in Figure 2.3. Subtracting this control value from the value for the region of interest removes background effects and yields an adjusted mean pixel value. In our experience this further reduces variation between replicate specimens (see Note c regarding size selection for the control region).

5. Use the adjusted mean pixel values from replicate specimens to calculate the average regulatory activity for a CRE and the standard error of the mean. This regulatory activity value and its error can be compared to those of other reporter transgenes in which the CRE's sequence differs (for example, Figure 2.3).

Notes:

a. Image J allows for the user to select a defined shape and area from which a mean pixel value score will be determined. However, as specimen size often varies we prefer to use the Freehand selection tool to specify the area to be evaluated for each specimen.

b. A larger number of replicates (N) results in a better estimation of the mean regulatory activity for a given CRE. We find that an N between 4 and 8 is generally sufficient.

c. In our experience the size of the control area selected has little effect on the measured mean pixel value for the control region. In general, we select an area proportional in size to the area of interest.
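The arithmetic in steps 3 to 5 can be sketched as follows, assuming each region's pixel values have already been exported as lists of numbers (for example, values underlying the Image J Measure results). The helper names are hypothetical. The SEM is computed as the sample standard deviation divided by the square root of N, which requires at least two replicates.

```python
import math
from statistics import mean, stdev


def roi_mean(pixels):
    """Mean pixel value within a region of interest (a list of pixel values)."""
    return sum(pixels) / len(pixels)


def adjusted_mean(roi_pixels, control_pixels):
    """Background-corrected value: region-of-interest mean minus control-region mean."""
    return roi_mean(roi_pixels) - roi_mean(control_pixels)


def summarize(adjusted_values):
    """Mean regulatory activity and standard error of the mean across replicates.

    SEM = sample standard deviation / sqrt(N); needs at least two replicates.
    """
    if len(adjusted_values) < 2:
        raise ValueError("at least two replicates are needed to estimate the SEM")
    return mean(adjusted_values), stdev(adjusted_values) / math.sqrt(len(adjusted_values))


# Hypothetical pixel values for one specimen (e.g. segment A6 vs. control segment A4).
print(adjusted_mean([100, 120, 110], [10, 12, 8]))  # 100.0
```

Each specimen contributes one adjusted value, and `summarize` is then applied to the list of adjusted values for all replicates of a reporter transgene.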

Results

A wild-type version of a D. melanogaster CRE called the Dimorphic Element (Williams et al., 2008) drives a high level of EGFP reporter expression in the A6 abdominal segment of female pupae (Figure 2.3A). A 13 base pair sequence within this element was found to be a binding site for the DSX transcription factor, and the in vivo importance of this sequence can be demonstrated by quantifying the regulatory activity of this CRE when the binding site is mutated. Considering the regulatory activity of the wild-type Dimorphic Element to be 100±5%, mutation of this DSX site reduced the regulatory activity to 28±3%. Similar comparisons can be made between intraspecific CRE alleles or between orthologous CREs from separate species. For example, considering the level of EGFP in female segment A6 driven by the Dimorphic Element of the D. melanogaster Canton S strain as a regulatory activity of 100±4%, the orthologous CREs from D. simulans and D. willistoni were found to possess regulatory activities of 75±4% and 0±0%, respectively (Figure 2.3B).
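The percentages above express each CRE's mean adjusted pixel value and SEM relative to a reference CRE (the wild type or the Canton S allele), as described in the Figure 2.3 legend. A minimal sketch of that normalization follows, assuming (mean, SEM) pairs have already been computed for each reporter transgene. The function name is hypothetical, and the numbers in the example are invented for illustration; they are not the measured data.

```python
def percent_of_reference(summaries, reference):
    """Express each (mean, SEM) pair as a percentage of the reference CRE's mean.

    `summaries` maps a reporter-transgene name to (mean, SEM) of adjusted
    mean pixel values; `reference` names the CRE set to 100%.
    """
    ref_mean = summaries[reference][0]
    if ref_mean == 0:
        raise ValueError("reference mean must be non-zero")
    return {name: (100 * m / ref_mean, 100 * s / ref_mean)
            for name, (m, s) in summaries.items()}


# Hypothetical values for illustration only (not the measured data).
example = {"wild_type": (80.0, 4.0), "dsx_site_mutant": (20.0, 2.0)}
print(percent_of_reference(example, "wild_type"))
# {'wild_type': (100.0, 5.0), 'dsx_site_mutant': (25.0, 2.5)}
```

Because the reference's own SEM is scaled by the same factor, the reference is reported as 100% plus or minus its relative error rather than as an exact 100%.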


Figure 2.1: Using the intensity of the white-rescue eye phenotype to make transgenic lines homozygous. In order to quantitatively evaluate reporter transgenes, it is important that each specimen have the same reporter transgene genotype. (A) A white mutant genetic background with an attP landing site sequence is utilized for site-specific transgene integration. Transgenic individuals (B) hemizygous and (C) homozygous for the integrated vector can typically be distinguished by the intensity of the rescued eye color phenotype, which reflects the number of copies of the integrated mini-white gene. Homozygous lines can be established by crossing male and female flies with the darkest eye color phenotype (C).

Figure 2.2: Using morphological markers to determine the metamorphic stage for Drosophila. During metamorphosis from a larva to an adult fruit fly, the specimen transitions through a series of stereotypic morphologies that can be used to determine the sex and approximate the developmental stage. Specimens in B-H have been removed from their puparium. Boxes in panels A, E, F, G, and H indicate the regions enlarged in panels A″, E″, F″, G″, and H″, respectively.

Figure 2.3: Quantitative comparison of cis-regulatory element activities. For all samples, EGFP reporter gene expression mediated by a particular Dimorphic Element was assessed in females at 80 hours after puparium formation. The number at the top of each image is the EGFP expression level for the specimen, calculated as the mean pixel value for the A6 abdominal segment (indicated in the upper-left image as the region within the dashed yellow border) minus the mean pixel value for the A4 segment (background correction; indicated in the upper-left image as the region within the dashed red border). For each reporter transgene, four replicates were assessed to calculate a mean segment A6 pixel value. Regulatory activity is reported as the % of the (A) wild type or (B) Canton S strain female mean pixel value ±SEM. (A) GFP expression for females possessing a wild-type version of the Dimorphic Element and a mutant version in which a single binding site for the DSX transcription factor was ablated. This mutant binding site reduced Dimorphic Element activity to 28±3% of the wild-type sequence. (B) Comparison of the regulatory activities of sequences orthologous to the D. melanogaster (Canton S strain) Dimorphic Element. Considering the GFP expression mediated by the D. melanogaster Dimorphic Element allele as 100%, the orthologous sequences from the related species D. simulans and D. willistoni possess activities of 75±4% and 0±0%, respectively.

Materials

| Material Name | Type | Company | Catalogue Number | Comment |
| --- | --- | --- | --- | --- |
| Site-specific transgene integration | Service | Best Gene | Plan H | Choose a landing site, receive transformant lines |
| Halocarbon Oil | Reagent | Sigma-Aldrich | H8898 | Used to mount specimens for confocal imaging |
| Dumont #5 Forceps | Tool | Fine Science Tools | 11252-40 | Fine pointed and durable for dissection |
| SZ61 Zoom Stereo microscope | Tool | Olympus | Customizable | Excellent optics for working with fruit flies |
| FluoView Confocal Microscope | Tool | Olympus | Customizable | Provides high-resolution confocal images |

Table 2.1: Materials for visualizing transgenic flies.

Discussion

Cis-regulatory elements encode the genomic program that specifies gene expression patterns and thereby the process of development (Davidson, 2006), and they are prominent locations for mutations underlying both morphological evolution (Carroll, 2008; Stern, 2000; Wray, 2007) and phenotypic variation in human traits (Musunuru et al., 2010; Sethupathy and Collins, 2008; Visel et al., 2009). In spite of this importance, the regulatory logic of CREs remains poorly understood. A prominent reason for this deficit has been the lack of suitable methods to quantitatively compare functional CRE sequences.

Here we present a protocol that capitalizes on improved transgenesis methods for D. melanogaster to add a quantitative aspect to the study of CREs.

In our experience, when quantifying a CRE's regulatory activity, variation between replicate specimens is 10% or less of the adjusted mean pixel value when replicates (1) are of the same developmental stage and (2) carry the reporter transgene in the same genomic landing site. This amount of variation is consistent with that determined for the CREs presented in Figure 2.3, and it limits the use of this quantitative method to situations where genetically distinct versions of a CRE differ in regulatory activity by more than 10%. Importantly, even with this limitation, the method is a significant improvement over the qualitative descriptions of “reduced” or “increased” that those studying CREs previously had to use when describing varying activity.
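The 10% criterion above can be checked numerically. The sketch assumes that "variation between replicate specimens" is measured as the coefficient of variation (sample standard deviation divided by the mean) and that a difference in activity is expressed relative to the reference CRE's mean; the protocol does not specify either measure, and the function names are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a fraction."""
    return stdev(values) / mean(values)


def difference_exceeds(reference_mean, other_mean, threshold=0.10):
    """True if two CRE versions differ by more than `threshold` of the reference mean."""
    return abs(other_mean - reference_mean) / reference_mean > threshold


replicates = [100, 110, 90, 100]  # hypothetical adjusted mean pixel values
print(coefficient_of_variation(replicates) <= 0.10)  # True: within the 10% noise level
print(difference_exceeds(100, 75))  # True: a 25% difference exceeds the threshold
```

A difference smaller than the replicate variation, such as 95 versus 100 here, would fall below the threshold and should not be interpreted as a change in regulatory activity.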

When versions of a CRE are known or found to differ quantitatively in their regulatory activities, this protocol makes it feasible to determine which of the mutational differences are responsible for, or contribute to, the activity difference (Rebeiz et al., 2009a; Williams et al., 2008). The ability to identify these functionally relevant mutations is a necessary first step toward determining the molecular mechanisms, such as the gain or loss of transcription factor binding sites, that cause CRE activities to differ. Looking forward, the use of this protocol for other Drosophila CREs, and the application of similar methods to other model organisms, should allow a better understanding of CRE logic to emerge. This includes an ability to discriminate functionally relevant CRE mutations from the morass of functionally neutral mutations.

Acknowledgments: We thank Nicolas Gompel and Benjamin Prud’homme for their contributions to the development of this protocol; Melissa Williams and four anonymous reviewers for comments on the manuscript; the University of Dayton Graduate School for research fellowships to WAR; and the University of Dayton Biology Department and Research Institute (UDRI) for research support for TMW.

Disclosures: We have nothing to disclose.

CHAPTER III

RECURRENT MODIFICATION OF A CONSERVED CIS-REGULATORY ELEMENT UNDERLIES FRUIT FLY PIGMENTATION DIVERSITY

This work was originally published in the peer-reviewed scientific journal PLoS Genetics in 2013 under the following citation: Rogers WA, Salomone JR, Tacy DJ, Camino EM, Davis KA, et al. (2013) Recurrent Modification of a Conserved Cis-Regulatory Element Underlies Fruit Fly Pigmentation Diversity. PLoS Genet 9(8): e1003740.

Abstract

The development of morphological traits occurs through the collective action of networks of genes connected at the level of gene expression. As any node in a network may be a target of evolutionary change, the recurrent targeting of the same node would indicate that the path of evolution is biased for the relevant trait and network. Although examples of parallel evolution have implicated recurrent modification of the same gene and cis-regulatory element (CRE), little is known about the mutational and molecular paths of parallel CRE evolution. In Drosophila melanogaster fruit flies, the Bric-à-brac (Bab) transcription factors control the development of a suite of sexually dimorphic traits on the posterior abdomen. Female-specific Bab expression is regulated by the dimorphic element, a CRE that possesses direct inputs from body plan (ABD-B) and sex-determination (DSX) transcription factors. Here, we find that the recurrent evolutionary modification of this CRE underlies both intraspecific and interspecific variation in female pigmentation in the melanogaster species group.

By reconstructing the sequence and regulatory activity of the ancestral Drosophila melanogaster dimorphic element, we demonstrate that a handful of mutations were sufficient to create independent CRE alleles with differing activities. Moreover, intraspecific and interspecific dimorphic element evolution proceeded with little to no alteration of the known body plan and sex-determination regulatory linkages. Collectively, our findings represent an example in which the paths of evolution appear biased toward a specific CRE, and drastic changes in function were accompanied by deep conservation of key regulatory linkages.

Introduction

Recurrence is a widespread phenomenon in evolution (Conway Morris, 2005), where similar derived traits have often been found to evolve in parallel. This theme of recurrence extends to the molecular level, as the same genes are often targeted by evolutionary change to generate convergent phenotypes (Gompel and Prud’homme, 2009). Illustrative examples include Pitx1 for pelvic reduction in stickleback fish (Shapiro et al., 2006), Oca2 for cavefish albinism (Protas et al., 2006), svb for fruit fly larval trichome loss (Sucena et al., 2003), yellow for fruit fly wing pigmentation spots (Prud’homme et al., 2006), Mc1r for vertebrate melanism (Mundy, 2005; Nachman et al., 2003), ATPα for insect cardenolide resistance (Zhen et al., 2012), and RNASE1 for monkey dietary specializations (Zhang, 2006). These examples of mechanistically biased evolution include gene duplications (Zhang, 2006; Zhen et al., 2012), amino acid-altering mutations (Mundy, 2005; Nachman et al., 2003; Protas et al., 2006; Zhang, 2006; Zhen et al., 2012), and mutations that modify gene regulatory sequences (Frankel et al., 2012; Jones et al., 2012; Prud’homme et al., 2006). While the phenomenon of recurrent evolution of regulatory sequences is now well established, a mechanistic understanding of how transcriptional regulatory sequences change function is still in its infancy. Specifically, does bias in the path of evolutionary change extend to the level of individual protein-DNA interactions in the regulatory sequences that influence transcription?

Traits are generated during development through the combined activities of cooperating genes (Bonn and Furlong, 2008; Davidson, 2006; Levine and Davidson, 2005). Most genes are composed of a coding sequence and non-coding sequences that include one or more cis-regulatory elements (CREs) that control the gene’s overall expression pattern (Carroll, 2008). CREs possess binding sites for numerous transcription factor proteins (Arnone and Davidson, 1997), and each interaction between a particular transcription factor and its binding site(s) can be considered a “regulatory linkage”. The types of linkages and their organization form a “regulatory logic” that integrates the regulatory state of a cell and thereby directs a spatial and temporal pattern of gene expression (Davidson, 2006). For a given trait, the multitude of genes, their coding and non-coding regions, and CRE regulatory linkages present an abundance of mutational targets that could alter the phenotype. Hence, one might expect that evolution could proceed along many genetic routes and consequently would appear unpredictable in retrospect. However, pleiotropic mutations often reduce fitness (Cooper et al., 2007) and can have considerable deleterious effects (Carroll, 2008). As a result, evolution may more readily proceed by paths that minimize pleiotropy (Stern and Orgogozo, 2009).

It is unclear whether and how pleiotropy constrains the evolution of regulatory logic, that is, the gain and loss of transcription factor binding sites.

Relatively few cases of CRE evolution have been characterized in sufficient detail (Arnoult et al., 2013; Chan et al., 2010; Crocker et al., 2008; Jeong et al., 2006; Shim et al., 2012; Shirangi et al., 2009; Tournamille et al., 1995; Wang and Chamberlin, 2002; Williams et al., 2008), and often a connection remains to be made between the causative mutations and the molecular mechanisms of evolved activity (Cretekos et al., 2008; Emera and Wagner, 2012; Frankel et al., 2011; Jeong et al., 2008; Marcellini and Simpson, 2006; McGregor et al., 2007; Prud’homme et al., 2006; Rebeiz et al., 2009a, 2011; Tishkoff et al., 2007; Werner et al., 2010). Furthermore, a vanishingly small number of studies have investigated the pleiotropic consequences of a CRE’s evolution. Thus, an important research goal is to advance a general understanding of the paths by which CRE function evolves. Extant CREs appear to be elegantly built with an intricate regulatory logic of transcription factor binding sites, yet when a CRE’s function changes, how many steps does it take? Do the relevant mutations create or destroy binding sites for transcription factors that already interact with the CRE, or do they represent new factor inputs? If a model exists in which independent paths of evolution can be traced in parallel, one could assess the general attributes of successful paths of CRE divergence. One suitable model is the sexually dimorphic abdominal pigmentation exhibited among species within the Sophophora subgroup of Drosophila, which includes the model organism species Drosophila (D.) melanogaster.

The fruit fly abdomen consists of ten abdominal segments (annotated A1-A10), the first seven of which are covered by dorsal cuticle plates (tergites). For D. melanogaster, tergite pigmentation is sexually dimorphic: the male A5 and A6 tergites are completely pigmented (Figure 3.1A), whereas female pigmentation is typically restricted to a posterior stripe similar to that observed on the more anterior A2-A4 tergites of both sexes (Figure 3.1B). These sex-specific phenotypes are the outcomes of a regulatory network that includes prominent genes from the body plan and sex determination pathways. The HOX protein ABD-B is expressed in segments A5 and A6 of both sexes (Kopp, 2009; Wang and Yoder, 2012) and activates the expression of melanin synthesis enzymes that generate dark color (Jeong et al., 2006, 2008). While ABD-B provides body-plan positional information to activate pigmentation enzymes, the male-limited expression of these enzymes results from the sexually dimorphic expression of the tandem duplicate bab1 and bab2 genes (collectively bab, Figure S3.2A). These paralogous genes encode the transcription factors Bab1 and Bab2 (collectively Bab), which function as repressors of pigmentation development (Couderc et al., 2002; Kopp et al., 2000). In the pupal abdomen, both Bab1 (Williams et al., 2008) and Bab2 (Kopp et al., 2000) are expressed in the A2-A7 segments of females, whereas male expression is limited to segments A2-A4.

Bab expression in female posterior abdominal segments is controlled by a CRE, named the dimorphic element, located in the first intron of bab1 (Figure S3.2A). This CRE contains regulatory linkages with the Hox protein ABD-B and sex-specific DSX protein isoforms through its possession of multiple binding sites for these two transcription factors (Figure 3.4A). Thus, the dimorphic element functions as a sexually dimorphic genetic switch controlling Bab expression. In males, binding of ABD-B and DSXM (the male DSX isoform) to this CRE represses Bab expression in segments A5 and A6, whereas in females, binding of ABD-B and DSXF (the female DSX isoform) activates Bab expression at progressively higher levels from segment A5 posteriorly through segment A7 (Williams et al., 2008).

The bab genes have been implicated in both intraspecific and interspecific pigmentation evolution. Variation in female abdomen pigmentation exists among D. melanogaster populations (Parkash et al., 2008, 2009; Robertson and Louw, 1966; Robertson et al., 1977), and in some cases this variation has been linked to genetic differences at the bab locus (Bickel et al., 2011; Kopp et al., 2003).

Within the Sophophora subgroup of Drosophila, large-scale differences in pigmentation have been attributed to altered dimorphic element activity and consequent Bab expression (Williams et al., 2008). Furthermore, male-specific pigmentation and the underlying dimorphic Bab expression are inferred to be the derived state, having evolved from an ancestor with sexually monomorphic Bab expression and pigmentation (Jeong et al., 2006). This ancestor possessed a CRE orthologous to the dimorphic element that drove Bab expression in the A7 and A8 segments (presumptive genitalia) of females (Williams et al., 2008), where it presumably regulated the development of other dimorphic traits (Couderc et al., 2002; Kopp et al., 2000). In the lineage of D. melanogaster, the dimorphic element was modified to drive female-specific expression in the more anterior A6 and A5 segments. This expanded Bab expression pattern was essential to limit full tergite pigmentation to the male A5 and A6 segments.

Surprisingly, the ancestral dimorphic element was inferred to have possessed both the orthologous Dsx binding sites and 13 of the 14 Abd-B sites found in the D. melanogaster CRE. An amalgam of changes was introduced along an evolutionary path of more than 30 million years to arrive at the derived activity, including changes in Abd-B binding site number, Dsx site polarity, and the spacing between conserved binding sites (Williams et al., 2008). Whether gains and losses of other regulatory linkages were a part of this transition remains unknown.

Moreover, the simplicity and multiplicity of the mutations that occurred over this mesoevolutionary timescale (Abouheif, 2008; Wray, 2000) inspired several questions: Do evolutionarily relevant mutations in the dimorphic element occur over microevolutionary time scales? Have orthologous dimorphic elements been repeatedly functionally modified? Do commonalities exist between independent cases of dimorphic element evolution?

Here, we implicate alterations in the bab dimorphic element as an underlying cause of the recurrently evolving diversity of female abdomen pigmentation at both the intraspecific and interspecific scales of comparison.

Using this system to examine the evolution of regulatory logic along parallel paths, we characterized the mutational paths of dimorphic element divergence responsible for the diversification of intraspecific phenotypes using an ancestral sequence reconstruction approach (Thornton, 2004). By inferring the ancestral dimorphic element sequence of extant D. melanogaster populations, we found that a small number of functionally-relevant mutations altered the ancestral CRE’s regulatory activity to generate derived capabilities. Intriguingly, these mutations largely avoided the ancestral ABD-B and DSX regulatory linkages, presumably preserving the ancestral function of this CRE in the A7 segment and genitalia, where it presides over other dimorphic aspects of abdominal development. While not definitive, these results can be viewed as supporting the notion that evolution can be biased to follow certain paths, and that such biases pertain not only to particular genes in a network or particular CREs, but also to how a CRE’s encoded regulatory logic evolves.

Results

Allelic Variation in a Sexually Dimorphic cis-Regulatory Element

Figure 3.1: Abdomen pigmentation correlates with the regulatory activity of dimorphic element alleles. (A) The A5 and A6 segment dorsal tergites of D. melanogaster males are fully pigmented, (B-H) whereas female A5 and A6 tergite pigmentation varies from a “Light” to a male-like “Dark” phenotype. (A’-H’) GFP-reporter transgene activity was measured in transgenic pupae at 85 hours after puparium formation (hAPF), and activity measurements are represented as the % of the D. melanogaster CantonS allele female A6 mean ± SEM. (A’) Regulatory activity of the CantonS allele in a male pupa. The regulatory activities of alleles from the following locations were measured: (B’) Oaxaca, Mexico (called Light 2), (C’) Crete, Greece, (D’) Kuala Lumpur, Malaysia (called Light 1), (E’) Mumbai, India, (F’) Kisangani, Democratic Republic of the Congo, (G’) Uganda, Africa (called Dark 1), and (H’) Bogota, Colombia (called Dark 2).

Bab expression in the female A5 through A8 abdominal segments of D. melanogaster is driven by the dimorphic element. This regulatory activity evolved from an ancestral state limited to the female A7 and A8 segments since the most recent common ancestor of D. melanogaster and D. willistoni, species that diverged over 30 million years ago (Russo et al., 1995; Williams et al., 2008). It remained unknown whether the functional evolution of this CRE was limited to mesoevolutionary timescales, or whether recent transitions in activity occurred over microevolutionary timescales to diversify pigmentation patterns. Thus, we surveyed individuals from geographically diverse populations of D. melanogaster to identify those that differ in the extent of dimorphic abdominal pigmentation (Figure S3.1).

In contrast to the invariant male pigmentation phenotype (Figure S3.1 and Figure 3.1A), the extent of pigmentation varied greatly among the female A5 and A6 tergites (Figure S3.1 and Figure 3.1B-H). Phenotypes ranged from unpigmented tergites that bear only a posterior stripe of pigment (e.g. Figure 3.1B) to complete A6 pigmentation (Figure 3.1H), extending in one instance to the A5 tergite (Figure 3.1G). We suspected that these “Light” and “Dark” pigmentation phenotypes stem from differences in Bab expression, due to dimorphic element alleles with different regulatory activities. Indeed, sequencing of dimorphic element alleles isolated from twenty-seven separate populations revealed many genetic differences (Figure A1). To test whether the observed genetic variation could cause divergent dimorphic element activities, we assayed a subset of these alleles for the ability to drive GFP reporter gene expression (referred to as regulatory activity) in transgenic pupae. Relative to a previously characterized dimorphic element allele (Williams et al., 2008), we observed female regulatory activities ranging from 182 ± 10% down to 9 ± 2% (Figure 3.1B’-H’), a 20-fold difference between the extreme alleles. Additionally, the level of dimorphic element activity generally correlated with the extent of female pigmentation (Figure 3.1), suggesting that this allelic variation is not coincidental but contributes to the variable phenotype.
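The “20-fold” figure is the ratio of the two extreme allele activities (182/9 ≈ 20). Because both values carry a standard error, the sketch below also propagates their uncertainty to the ratio using the standard first-order (delta-method) approximation. This assumes the two measurements are independent and is offered as an illustration, not as an analysis reported in the original study; the function name is hypothetical.

```python
import math


def ratio_with_sem(a, sem_a, b, sem_b):
    """Ratio a/b and its approximate SEM, assuming independent errors
    (first-order propagation: relative errors add in quadrature)."""
    ratio = a / b
    rel = math.sqrt((sem_a / a) ** 2 + (sem_b / b) ** 2)
    return ratio, ratio * rel


# Extreme female A6 activities from Figure 3.1 (% of the CantonS allele).
fold, fold_sem = ratio_with_sem(182.0, 10.0, 9.0, 2.0)
print(f"{fold:.1f} +/- {fold_sem:.1f}-fold")  # prints "20.2 +/- 4.6-fold"
```

Most of the uncertainty comes from the low-activity allele, whose SEM is about 22% of its mean, so the fold difference is best read as approximate.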

bab Genotypic Variation Underlies Pigmentation Variation

The correspondence between dimorphic element allele activity and pigmentation was suggestive of causation. Hence, we performed a series of genetic tests to further implicate the bab locus and, more importantly, the dimorphic element. First, we sought a genetic association between dimorphic element allele genotype and pigmentation phenotype. Males from a stock derived from a Uganda population (Pool and Aquadro, 2007) that produces a “Light” female pigmentation phenotype (called Light 1, Figure 3.1D and S3.1A) were separately crossed to females from two different population stocks that exhibit a “Dark” female pigmentation phenotype (called Dark 1, Figure 3.1G and S3.1AM; and called Dark 2, Figure 3.1H and S3.1AJ). F1 siblings were crossed to derive F2 progeny. The phenotypes of 102 F2 female progeny from the Light 1 x Dark 1 cross were evaluated, and 25, 54, and 23, respectively, had Light, Intermediate, and Dark female pigmentation (Figure 3.2B-D). This near 1:2:1 ratio (chi-square p = 0.787) indicates that this variable phenotype is largely due to a single semi-dominant gene. A subset of the F2 progeny were genotyped for a BstXI restriction fragment length polymorphism (RFLP) present in the Light 1 dimorphic element allele but not the Dark 1 allele. Female progeny with the Light (Figure 3.2B) and Dark (Figure 3.2D) phenotypes were invariably homozygous for the Light 1 and Dark 1 dimorphic element alleles, respectively (Table S3.1). Moreover, females with an intermediate phenotype were heterozygous for this RFLP. We also found a similar genetic association for the F2 progeny from the cross of Light 1 and Dark 2 (Table S3.2). After backcrossing the Dark 1 phenotype into the Light 1 genetic background for ten generations, we found that two independent backcross lines retained a Dark 1 bab locus haplotype (Figure S3.2F). Thus, the bab locus, or a variant in close linkage with it, causes this strain’s Dark phenotype.

We performed genetic complementation tests to rule out the possibility that the genotype-phenotype associations were due to a variant linked to the bab locus. Light 1 and Dark 1 individuals were separately crossed to individuals with a bab locus null allele, and pigmentation phenotypes were assessed for F1 progeny. Homozygous bab null mutants exhibit phenotypes present in both sexes, including fusion of the TS5, TS4, and TS3 leg tarsal segments and ectopic pigmentation on the A2-A4 segment tergites (Figure 3.2P and 3.2H), as well as several phenotypes limited to females. These female phenotypes include male-like pigmentation on the A5 and A6 tergites and posterior-to-anterior transformations of the A6, A7, and A8 (genitalia) segment morphologies (Couderc et al., 2002; Kopp et al., 2000) (Figure 3.2H and 3.2L). While the Light 1, Dark 1, and Dark 2 bab loci complemented the bab null allele (bab-) with respect to the leg, A2-A4 tergite pigmentation, and female A7-A8 segment phenotypes, only the Light 1 locus fully complemented the bab null allele with respect to female A5 and A6 tergite pigmentation (compare Figure 3.2E to 3.2F and 3.2G). These same patterns of complementation and non-complementation were reproduced when Light and Dark lines were crossed to a deficiency line that included the entire bab locus (not shown), suggesting that the abdominal pigmentation phenotype is due not to mutations in the genetic background of the bab null allele, but rather to allelic variation at bab between Light and Dark strains. Collectively, the most parsimonious conclusion from the genotype-phenotype association, genetic mapping, and complementation results is that the genetic basis for these Light and Dark female pigmentation phenotypes resides largely within the bab locus.

The failure of Dark lines to complement the female A5/A6 phenotypes, while otherwise complementing the body-wide phenotypes of the bab null allele, suggested that regulatory mutations underlie this phenotypic variation. Although a small number (six) of non-synonymous mutations were found that could potentially contribute to variation in abdominal pigmentation by altering Bab protein function (Figure A2), we pursued the hypothesis that the relevant mutations would be located in the dimorphic element, since this CRE controls Bab expression in the segments where bab-regulated phenotypes vary among the studied populations.


Figure 3.2: bab locus allelic variation underlies phenotypic variation. (A) The A5 and A6 tergite phenotypes of F1 females were intermediate to those of the parental Light 1 and Dark 1 stocks. F2 females had pigmentation phenotypes that were (B) “Light”, (C) “Intermediate”, or (D) “Dark”. (E-P) Complementation tests of population stock bab loci with a bab locus null allele. (E) The Light 1 stock complemented the bab locus null allele with regard to abdomen tergite pigmentation, whereas the (F) Dark 1 and (G) Dark 2 stocks failed to complement the null allele in segments A5 and A6 but complemented the null allele for the A3 and A4 segments. Light 1, Dark 1, and Dark 2 stocks complemented the bab locus null allele for (I-K) posterior abdomen phenotypes and (M-O) the development of the leg tarsal segments. Females with a homozygous bab locus null genotype displayed (H) ectopic pigmentation on segments A3 through A6, and (L) lacked bristles on the A6 and A7 ventral sternites, and their genitalia (g) had altered bristles and morphology. (P) Individuals with a homozygous bab locus null genotype had tarsal segments 5, 4, and 3 fused, and altered bristle morphology on tarsal segments 2 and 3. Red arrowheads and black arrows respectively indicate the locations of abnormal posterior abdomen and tarsus features.

Variation in Bab1 and Bab2 Expression

Considering that the differences in dimorphic element allele activity and in pigmentation phenotype were restricted to the A6 and, to a lesser extent, the A5 abdominal segment (Figure 3.1), we suspected that mutations in the dimorphic element could cause the observed differences in pigmentation. This hypothesis would be supported by differing levels and/or patterns of Bab expression in the pupal abdominal epidermis of females that develop different pigmentation phenotypes. Thus, we characterized the pattern of Bab expression in the abdominal epidermis at the end of pupal development, when tergite pigmentation is being specified. If the regulatory activities of the dimorphic element alleles measured in reporter transgene assays (Figure 3.1) reflect endogenous Bab expression, then Bab1 and Bab2 expression should be higher in females with Light tergite pigmentation than in those with Dark pigmentation. Consistent with this expectation, Bab1 and Bab2 were expressed robustly throughout the A2-A7 abdominal segments of Light 1 females (Figure 3.3A and 3.3F), whereas Bab1 and Bab2 expression was reduced in the A5 and A6 abdominal segments of Dark 1 female pupae (Figure 3.3B and 3.3G, red arrowheads). This reduction corresponds with the reduced regulatory activity of this strain's dimorphic element allele (Figure 3.1G’) and with the segments where pigmentation develops on adult females (Figure 3.1G).

Compared to Dark 1 females, which possess expanded pigmentation on the A5 and A6 tergites, expanded pigmentation is limited to the A6 tergite of Dark 2 females (Figure 3.1H). Consistent with the Dark 2 phenotype, the expression of Bab1, but not Bab2, was reduced in the A6 segment and, to a lesser extent, the A5 segment (Figure 3.3C and 3.3H). These patterns of expression are consistent with the finding that the bab1 null pigmentation phenotype is limited to the female A6 tergite, whereas a bab2 null phenotype affects both the A6 and A5 tergites (Couderc et al., 2002). We also characterized Bab expression in the developing female genitalia and analia, which develop from the A8 and A9/A10 segments, respectively. In contrast to the reduced expression seen in the A5 and A6 segment epidermis of Dark 1 females, expression in these more posterior structures was comparable to that observed for Light 1 females (compare Figure 3.3D and 3.3I to 3.3E and 3.3J).

Collectively, the genetic and expression data strongly support the conclusion that the conspicuous Light and Dark female pigmentation phenotypes are due, at least in part, to allelic differences in dimorphic element regulatory activity. We were interested in revealing how these modified regulatory activities evolved. To accomplish this, it was essential to know the ancestral sequence and regulatory state.


Figure 3.3: Population-level differences in Bab paralog expression. (A-C) Expression of Bab1 in the dorsal abdomens of female pupae at 85 hAPF. (A) Light 1 females display uniform Bab1 expression throughout segments A2-A6, whereas expression is reduced in the A5 and A6 segments of (B) Dark 1 and (C) Dark 2 females. (D and E) Expression of Bab1 in the female genitalia (g) and analia (a) at 29 hAPF. (F-H) Bab2 expression in the dorsal abdomen of female pupae at 85 hAPF. Bab2 expression is (F) uniform throughout the A2-A6 segments of Light 1 females, (G) reduced in the A5 and A6 segments of Dark 1 females, and (H) uniform throughout the A2-A6 segments of Dark 2 females. (I and J) Expression of Bab2 in the female genitalia (g) and analia (a) at 29 hAPF. Red arrowheads indicate segments where expression is reduced compared to more anterior segments, whereas yellow arrowheads indicate the segments where Bab2 is expressed at a higher level than Bab1 in Dark 2 females.

Resurrection of an Ancestral Dimorphic Element

Ancestral Sequence Reconstruction (ASR) has been an effective approach to study the path of protein functional evolution (Harms and Thornton, 2010; Thornton, 2004). To our knowledge, this approach had been used only sparingly to study CRE evolution, in Drosophila (Rebeiz et al., 2011) and primates (Prabhakar et al., 2008; Tishkoff et al., 2007), presumably because CRE sequences evolve at an accelerated rate compared to protein-coding sequences (Andolfatto, 2005; Richards et al., 2005; Shen et al., 2012), making reconstruction untenable when comparing distantly related taxa. In the case here, the dimorphic element alleles share ~98% sequence identity (Figure A1), and the most recent common ancestor of extant Drosophila melanogaster populations existed ~60,000 years ago (Stephan and Li, 2007). Hence, we suspected that the ancestral sequence for these populations could be reasonably inferred.

The dimorphic elements from 27 populations of D. melanogaster were sequenced and aligned to those from several outgroup species. From this alignment (Figure A1), we used the principle of parsimony to infer the nucleotide state at each position, including 52 polymorphic sites, for the most recent common ancestor of the D. melanogaster populations; this sequence was named the “Concestor element” (Dawkins, 2004). For this sequence, the ancestral nucleotide states were unambiguous at 44 of the 52 polymorphic sites. To test the robustness of this sequence’s regulatory activity to the eight ambiguous sites, we tested alternate reconstructions that differed in the nucleotide states at these sites. The regulatory activities of these reconstructions were comparable to that of the Concestor element (Figure A1 and S3.4). Therefore, we sought to identify which of the 44 unambiguous derived mutations were responsible for the diverse regulatory activities possessed by the Light and Dark alleles. From this point forward, the Concestor element sequence was used as the ancestral sequence and regulatory activity state.
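The parsimony inference described above can be illustrated for a single alignment column. The sketch below uses Sankoff’s dynamic-programming form of small parsimony with unit substitution costs, which gives the same minimum number of changes as Fitch parsimony; the exact procedure used in the original study is not specified here, so this choice is an assumption. The tree is rooted at the node of interest (the most recent common ancestor of the populations), so the outgroup lineage appears as one of its children. The taxon names, topology, and nucleotide states are hypothetical, and gaps are not handled.

```python
import math

NUCLEOTIDES = "ACGT"


def subtree_costs(tree, states):
    """Return {nucleotide: fewest changes in `tree` if its root has that nucleotide}.
    A tree is either a leaf name (str) or a tuple of child subtrees."""
    if isinstance(tree, str):
        observed = states[tree]
        return {n: 0 if n == observed else math.inf for n in NUCLEOTIDES}
    costs = {n: 0 for n in NUCLEOTIDES}
    for child in tree:
        child_costs = subtree_costs(child, states)
        for n in NUCLEOTIDES:
            # Keep the child's state (cost 0) or change it (cost 1), whichever is cheaper.
            costs[n] += min(child_costs[m] + (m != n) for m in NUCLEOTIDES)
    return costs


def ancestral_states(tree, states):
    """Most parsimonious state(s) at the root and the minimum number of changes.
    More than one state means the site is ambiguous."""
    costs = subtree_costs(tree, states)
    best = min(costs.values())
    return {n for n, c in costs.items() if c == best}, best


# Hypothetical topology rooted at the population ancestor; "D_sim" is the outgroup.
tree = (("Light1", "Light2"), ("Dark1", "Dark2"), "D_sim")
site1 = {"Light1": "A", "Light2": "A", "Dark1": "G", "Dark2": "G", "D_sim": "A"}
site2 = {"Light1": "A", "Light2": "A", "Dark1": "G", "Dark2": "G", "D_sim": "C"}
print(ancestral_states(tree, site1))  # ({'A'}, 1): unambiguous
print(ancestral_states(tree, site2))  # A, C and G tie: ambiguous
```

Repeating this for every column and keeping the unambiguous calls yields a reconstructed sequence; ambiguous columns can then be tested experimentally with alternate states, in the spirit of the alternate reconstructions described above.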

Several observations were made from a comparison of the Concestor element sequence to the dimorphic element alleles (Figure 3.4A-E). First, the Concestor element possessed all of the ABD-B (14) and DSX (two) binding sites that were characterized for the D. melanogaster CantonS strain sequence (Williams et al., 2008). Second, the Light 1, Light 2, Dark 1, and Dark 2 alleles differ from the Concestor element by 20, 20, 22, and 20 derived mutations, respectively (Figure 3.4A-E, vertical red lines), many of which are common to multiple alleles (Figure A1). Third, we observed an excess of nucleotide substitutions relative to indel mutations (Figure 3.4B-E, thin versus thick red lines). Fourth, of the known binding sites, the only site gain or loss caused by a derived mutation was ABD-B binding site 10, which was lost in the Dark 1 and Dark 2 alleles (caused by mutation “G”, Figure A1).
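The distinction between point substitutions and indels drawn above can be made explicit by walking along a pairwise alignment of an allele against the Concestor element. The sketch below assumes a gapped alignment of equal length with “-” marking gaps, and it counts each run of adjacent gapped differences as a single indel event. The function name and sequences are hypothetical illustrations, not the procedure used to build Figure 3.4.

```python
def count_derived_changes(ancestor: str, allele: str) -> tuple[int, int]:
    """Return (substitutions, indel events) of `allele` relative to `ancestor`.

    Both strings must come from the same alignment, with '-' marking gaps.
    A run of adjacent gapped columns is counted as one indel event."""
    if len(ancestor) != len(allele):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = indels = 0
    in_indel = False
    for a, b in zip(ancestor, allele):
        if a == "-" and b == "-":
            continue  # column gapped in both; does not interrupt an indel run
        if a == b:
            in_indel = False
        elif a == "-" or b == "-":
            if not in_indel:
                indels += 1
            in_indel = True
        else:
            substitutions += 1
            in_indel = False
    return substitutions, indels


# Hypothetical aligned fragments: one substitution (G->T) and one 2-bp deletion.
print(count_derived_changes("ACGTACGTAC", "ACTTAC--AC"))  # (1, 1)
```

Treating a multi-base gap as one event matches the convention of drawing each indel as a single mark, rather than counting every gapped column separately.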

With the dimorphic element alleles differing in regulatory activity by up to 20-fold (Figure 3.1), we wanted to evaluate how these activities compare to that of the Concestor element. The regulatory activities of the Light 1, Light 2, Dark 1, Dark 2, and Concestor elements were evaluated in a quantitative reporter transgene assay (Rogers and Williams, 2011). The Concestor element drove GFP expression in females throughout the epidermis of the A6 and A7 abdominal segments and the genitalia, and at a comparatively lower level in segment A5 (Figure 3.4A’ and 3.4A’’). Compared to the Concestor element’s regulatory activity, the Light 1 and Light 2 alleles' activities were increased in the A6 segment to 184 ± 8% and 220 ± 8% of the Concestor element activity, respectively (Figure 3.4B’ and 3.4C’). Moreover, the Light 2 activity was increased in the A5 segment and expanded into the posterior region of segment A4. Conversely, compared to the Concestor element, the A6 segment regulatory activities of the Dark 1 and Dark 2 alleles were reduced to 58 ± 4% and 27 ± 3%, respectively (Figure 3.4D’ and 3.4E’).

Additionally, the range of regulatory activities for the A6 segment was much greater than that for the A7 segment and genitalia (Figure 3.4A’’-E’’). These results demonstrate that the ancestral dimorphic element for extant D. melanogaster populations drove low, modest, and high levels of bab expression in the female A5, A6, and A7-A8 segments, respectively (Figure 3.4). This ancestral regulatory element was modified by mutations, resulting in derived alleles with increased, expanded, or reduced activities in the more anterior abdominal segments. We next sought to determine which of the derived mutations were functionally-relevant to the evolved regulatory activities.
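Activities throughout this section are reported as a percentage of the Concestor element’s female A6 mean ± SEM. A minimal sketch of that normalization follows, assuming each test replicate is divided by the reference mean and the SEM is the sample standard deviation of the normalized values divided by √n. The function name and replicate values are hypothetical, and the published protocol (Rogers and Williams, 2011) may differ in detail.

```python
import math
from statistics import mean, stdev


def percent_of_reference(reference, test):
    """Express each `test` replicate as % of the `reference` mean and
    return (mean %, SEM %), with SEM = sample SD / sqrt(n)."""
    ref_mean = mean(reference)
    pct = [100.0 * v / ref_mean for v in test]
    return mean(pct), stdev(pct) / math.sqrt(len(pct))


# Hypothetical A6 mean pixel values: reference (Concestor) and a test allele.
concestor_a6 = [100.0, 110.0, 90.0]
allele_a6 = [180.0, 190.0, 200.0]
pct_mean, pct_sem = percent_of_reference(concestor_a6, allele_a6)
print(f"{pct_mean:.0f} ± {pct_sem:.0f}%")  # prints "190 ± 6%"
```

This treats the reference mean as fixed; its own uncertainty is not propagated into the reported SEM.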


Figure 3.4: Dimorphic element alleles diverged from an ancestral state. (A-E) To-scale representations of various dimorphic elements, including (A, Concestor) the inferred allele for the most recent common ancestor of extant D. melanogaster populations, and alleles from populations with Light (B, Light 1; C, Light 2) and Dark (D, Dark 1; E, Dark 2) female pigmentation phenotypes. Dark blue and yellow rectangles respectively represent the fourteen Abd-B and two Dsx binding sites. Thin and thick red lines respectively represent derived point and indel mutations. (A’-E’ and A’’-E’’) Comparison of GFP-reporter gene activities in female transgenic pupae at 85 hAPF. Activity measurements are represented as the % of the D. melanogaster Concestor element female (A’-E’) A6 mean ± SEM and (A’’-E’’) A7 mean ± SEM. Red upward and downward arrows respectively indicate segments with increased and decreased regulatory activity. The yellow arrowhead indicates a region of expanded regulatory activity. Lowercase letter “g” indicates expression in the genitalia.

Derived Regulatory Activities Stem From Few Functionally-Relevant Mutations

To identify allele sub-regions that possess functionally-relevant mutations, we created a series of chimeric dimorphic elements and quantitatively compared their regulatory activities to that of the Concestor element. Each chimeric element was composed in part of Light 2 or Dark 1 allele sequence, with the remaining sequence from the Concestor element (Figure S3.3). For the chimeric elements containing some Light 2 dimorphic element sequence, most of this allele’s derived activity was conveyed by the central “core” region that is occupied by the previously characterized binding sites for the ABD-B and DSX transcription factors. The Light 2 core flanked by Concestor element sequences had a regulatory activity of 239 ± 5%, compared to 153 ± 10% when the Concestor element core was within Light 2 flanks (Figure S3.3E and S3.3F). A similar outcome was found for the Dark 1 dimorphic element. When this allele’s core sequence was flanked by Concestor element sequences, the chimeric element had an activity of 58 ± 5%, whereas the reciprocal swap had no effect on regulatory activity (106 ± 2%; compare Figure S3.3J to S3.3K). Thus, for these two derived dimorphic element alleles, their unique regulatory activities principally stem from mutations in the core region.

The Light 2 core region has seven derived mutations (referred to as the “C”, “F”, “H”, “J”, “K”, “L”, and “N” mutations, Figure A1), four of which also reside in the Light 1 core (C, F, K, and N). We individually substituted each of these mutations into the Concestor element in place of the ancestral nucleotide, and then tested whether these substitutions caused measurable effects on regulatory activity (Figure S3.4). Large mutational effects were only measured for the C, F, and L mutations, which increased Concestor element activity to 140±6%, 160±6%, and 215±4%, respectively (Figure S3.4G, 3.5I and 3.5J). The C mutation is present in both the Light and Dark alleles being studied (Figure A1) and hence cannot account for their differences in regulatory activity. When the F and L mutations were substituted together, regulatory activity was measured at a nearly additive 241±9% (Figure S3.4S). The Light 1 core differs from that of Light 2 in possessing a derived mutation, called “I”, and in lacking the L mutation. However, the I mutation had no effect on regulatory activity when it was substituted into the Concestor element (Figure S3.4M). Collectively, the derived regulatory activities of the Light 1 and 2 dimorphic element alleles both require the F mutation (Figure 3.5D and 3.5I), and the further increased and spatially expanded activity of the Light 2 allele requires the L mutation (Figure 3.5E and 3.5J).

Figure 3.5: Functionally-relevant mutations in dimorphic element alleles. (A) Dimorphic element allele phylogeny, including the outgroup species D. simulans (D. sim.). (B-E) Alignments of sequences encompassing the (B) “D” mutation, (C) “E” mutation, (D) “F” mutation, and (E) “L” mutation. In (C), the black background marks the single base pair shared by the derived deletion and the adjacent Dsx binding site. (F-J) Comparison of GFP-reporter activity in female transgenic pupae at 85 hAPF, represented as the percentage of the D. melanogaster Concestor element female A6 mean ± SEM. Red upward and downward arrows indicate segments with increased and decreased regulatory activity, respectively. The yellow arrowhead indicates expanded regulatory activity. Regulatory activities differ from that of the Concestor element due to the following derived mutations: (G) D mutation; (H) E mutation; (I) F mutation; and (J) L mutation. (K) Summary of the female A6 regulatory activities for modifications to the E mutation region. The Concestor element sequence is provided, with introduced modifications indicated by red bases. (L) Gel shift assays for annealed oligonucleotide probes containing the wild type (Concestor element, lanes 1-7), E mutation (lanes 8-14), and mutant (Dsx1 KO, lanes 15-19) Dsx1 binding site. The binding site sequences are shown with mutant bases in red. For the Concestor element and E mutation probes, binding reactions used increasing amounts of the DSX DBD protein (from left to right: 0 ng, 8 ng, 16 ng, 31 ng, 63 ng, 125 ng, 250 ng, and 500 ng). For the Dsx1 KO probe, binding reactions used the following amounts of protein (from left to right: 0 ng, 8 ng, 31 ng, 125 ng, 500 ng). Blue and red arrowheads mark probes bound by one or two DSX DBD monomers, respectively.

The Dark 1 core sequence possesses six derived mutations. These include the “C”, “D”, and “G” mutations, each of which also resides in the Dark 2 allele, and the “M” mutation, which is unique to Dark 1. This core also has the “H” and “K” mutations that alter the C and T nucleotide expansions, though these also occur in the Light alleles and were found not to cause significant regulatory effects (Figure S3.4L and S3.4O). Interestingly, the G mutation had no measurable effect on activity (Figure S3.4K), although it was the only mutation among the surveyed dimorphic element alleles found to alter a known ABD-B site. We conclude that the diversity of regulatory activities observed did not involve changes to the regulatory linkage between ABD-B and the dimorphic element. Testing the D and M mutations highlighted the functional relevance of the D mutation. When individually substituted into the Concestor element, the D and M mutations altered regulatory activity to 68±4% and 118±3% of the Concestor element, respectively (Figure S3.4H and S3.4Q). However, when both the D and M mutations were substituted together, the net result was an activity of 68±3% (Figure S3.4T). Thus, the strong effect of the D mutation is epistatic to the moderate effect of M.
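To make the combined-substitution comparisons above explicit, the short Python sketch below computes what the double-mutant activity would be if two substitutions acted independently. It assumes two simple null models that are not described in the text: an additive model, in which each mutation shifts activity by a fixed number of percentage points relative to the Concestor element (100%), and a multiplicative model, in which each mutation scales activity by a fixed factor. The function names are hypothetical, the inputs are only the mean activities reported above, and measurement error (SEM) is ignored.

```python
from math import prod

REFERENCE = 100.0  # Concestor element activity, 100% by definition


def additive_expectation(*single_effects: float, reference: float = REFERENCE) -> float:
    """Expected combined activity if each mutation adds a fixed shift
    (in percentage points) relative to the reference, independently."""
    return reference + sum(effect - reference for effect in single_effects)


def multiplicative_expectation(*single_effects: float, reference: float = REFERENCE) -> float:
    """Expected combined activity if each mutation multiplies activity
    by a fixed factor relative to the reference, independently."""
    return reference * prod(effect / reference for effect in single_effects)


# Mean activities (% of Concestor) for single and combined substitutions, as reported in the text.
observations = {
    "F+L": {"singles": (160.0, 215.0), "combined": 241.0},
    "D+M": {"singles": (68.0, 118.0), "combined": 68.0},
}

for name, obs in observations.items():
    add = additive_expectation(*obs["singles"])
    mult = multiplicative_expectation(*obs["singles"])
    print(f"{name}: observed {obs['combined']:.0f}%, "
          f"additive {add:.0f}%, multiplicative {mult:.0f}%")
```

Under these assumptions, the D plus M combination (observed 68%) falls below both expectations (86% additive, about 80% multiplicative) and matches D alone, consistent with the epistasis described above, whereas the F plus L combination (observed 241%) lies closer to the additive expectation (275%) than to the multiplicative one (344%). A formal test for interaction would also need to take the SEMs into account.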

Because the complete Dark 1 core inserted between Concestor element flanking sequences had a regulatory activity of 58±5%, lower than the 68±4% conferred by the D mutation alone, one or more additional core mutations must further reduce the Dark 1 allele’s activity, either by increments below our detection limit or through epistatic interactions. Nonetheless, the D mutation accounts for most of this allele’s reduced regulatory activity (Figure 3.5G).

We next sought to find mutations underlying the further reduced regulatory activity of the Dark 2 allele. Because this allele shares the D mutation with Dark 1 yet has lower activity, at least one additional functionally-relevant mutation must reside in its core region. The only mutation unique to the Dark 2 core region was a 9 base pair deletion referred to as the “E” mutation (Figure 3.5). When the E mutation was substituted into the Concestor element, regulatory activity was reduced to 78±2% (Figure 3.5C and 3.5H). Similarly, adding the E mutation to the Dark 1 allele, which had an activity of 58±4%, lowered activity to 34±2%, near the 27±3% activity of the Dark 2 allele (Figure S3.4U). Collectively, the evolutionary paths of the Dark 1 and Dark 2 alleles include one shared functionally-relevant mutation and one that is unique to the Dark 2 allele.

A Derived Mutation Disrupts a Conserved Transcription Factor Binding Site

The derived E mutation deletes nine base pairs, the last of which is the first base pair of a DSX binding site (called Dsx1, Figure 3.5C); nonetheless, the resulting sequence still matches the consensus motif for DSX binding (Erdman et al., 1996). Mutational ablation of the Dsx1 site reduced the Concestor element’s regulatory activity in the female A6 segment to 67±6% and raised activity in males from 6±2% to 73±5% (Figure S3.4Y-AA), demonstrating that the Dsx1 site is necessary for robust female-specific regulatory activity. A priori, the E mutation could reduce activity by degrading this Dsx1 site or through other mechanisms, such as removing a binding site for a neighboring transcriptional activator, forming a novel binding site for a repressor, or moving the Dsx1 site closer to an adjacent transcription factor site.

To distinguish among these mechanisms, we created a set of modified Concestor elements with alterations to the ancestral sequence of the E mutation region and measured their regulatory activities (Figure 3.5K). First, we introduced non-complementary transversions in the Concestor element at the 2nd, 4th, 6th, and 8th base pairs of the E mutation region (E Scramble). These changes left the 9th base pair, and hence the consensus DSX binding site, intact but would be expected to degrade any adjacent transcription factor binding site. This set of mutations did not alter Concestor element activity, indicating that the E mutation did not delete a binding site adjacent to the Dsx1 site. To disentangle regulatory effects due to the loss of sequence next to the Dsx1 site from those due to the loss of the first base pair of the Dsx1 site, we created two further modifications to the Concestor element. The first deleted the first eight base pairs of the E mutation region (called E8Del), and the second removed only the ninth base pair, which is the first base pair of the Dsx1 site (called E Dsx1). Surprisingly, the 8 base pair deletion modestly increased Concestor activity to 118±3%, indicating that the E mutation’s impact was not due to reduced spacing between the Dsx1 site and a more remote transcription factor binding site. In contrast, deleting only the 9th base pair reduced Concestor element activity to 80±3%, nearly equal to the reduction induced by the complete E mutation (Figure 3.5K). Collectively, these results indicate that the E mutation rendered the Dsx1 site less functional.

One possible mechanism is that the E mutation created a derivative Dsx1 site with reduced affinity for the DSX protein. To test this possibility, we compared the binding of the DSX DNA-binding domain (DBD) to the Concestor element, E mutant, and knockout (KO) Dsx1 site sequences in gel shift assays (Figure 3.5L). The DSX protein bound the Concestor element sequence with high affinity, and this binding was specific, as the KO site sequence was not readily bound (compare Figure 3.5L lanes 1-7 to lanes 15-19). In comparison, DSX bound the E mutant site with reduced affinity (Figure 3.5L, lanes 8-14). A shift of the Concestor Dsx1 site was evident with as little as 16 ng of DSX protein, whereas binding of the E mutant site was not detected at this amount but was detected at 31 ng (compare Figure 3.5L lane 3 to lanes 10 and 11). From these data, we estimate that the E mutation resulted in a Dsx1 site with ~50% of the Concestor element site’s affinity for the DSX protein.
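The ~50% figure follows from comparing the lowest protein amounts at which each probe was shifted. The Python sketch below reproduces that arithmetic under an explicit, simplifying assumption that is not stated in the text: that the protein amount needed for a detectable shift is inversely proportional to binding affinity at equal probe concentration. The detection calls encode the description above together with the 16 ng and 31 ng titration points listed in the Figure 3.5 legend (the wild type probe is assumed unshifted at 8 ng because a shift was first evident at 16 ng); the function names are hypothetical.

```python
from typing import Optional, Sequence


def first_detected(amounts_ng: Sequence[float], shifted: Sequence[bool]) -> Optional[float]:
    """Return the lowest protein amount at which a shifted band was scored, or None."""
    for amount, is_shifted in zip(amounts_ng, shifted):
        if is_shifted:
            return amount
    return None


def relative_affinity(wt_threshold_ng: float, mut_threshold_ng: float) -> float:
    """Crude affinity ratio (mutant / wild type), assuming the protein amount
    needed for a detectable shift is inversely proportional to affinity."""
    return wt_threshold_ng / mut_threshold_ng


# Lowest four titration points shared by both probes (ng of DSX DBD).
amounts = [0, 8, 16, 31]
# Detection calls as described in the text: the Concestor probe first shifts
# at 16 ng; the E mutant probe shifts at 31 ng but not at 16 ng.
concestor_shift = [False, False, True, True]
e_mutant_shift = [False, False, False, True]

wt = first_detected(amounts, concestor_shift)
mut = first_detected(amounts, e_mutant_shift)
print(f"relative affinity of E mutant site ~ {relative_affinity(wt, mut):.0%}")
```

The result, about 52%, is in line with the ~50% estimate. Because the titration steps are twofold, this approach can only resolve affinity differences to within roughly a factor of two; a more rigorous estimate would fit dissociation constants to quantified band intensities across the full titration.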

Of the four prominent functionally-relevant mutations identified for the Light and Dark dimorphic element alleles (Figure 3.5), only one affects a known regulatory linkage. Specifically, the E mutation weakens the regulatory linkage between DSX and the dimorphic element by creating a lower affinity binding site. By sequence alone, the D, F, and L mutations appear unremarkable compared to the other mutations that had no measurable regulatory effects (Figure S3.4). Moreover, the D, F, and L mutations caused regulatory effects comparable in magnitude to mutations implicated in the mesoevolutionary expansion of dimorphic element activity into the A6 and A5 segments (Williams et al., 2008). Hence, short mutational paths are sufficient to evolve pronounced alterations in this CRE’s activity. This conclusion inspired the hypothesis that changes in female abdominal pigmentation may frequently occur through alteration of the dimorphic element via similarly short paths.

Correspondence between Dimorphic Element and Interspecific Pigmentation Evolution

In the oriental lineage of the Sophophora subgenus, males of extant species are generally fully pigmented on the A5 and A6 tergites (Jeong et al., 2006). Female pigmentation is more variable, ranging from the complete absence of pigmentation, as seen in D. fuyamai, to a more male-like pattern, as seen in D. yakuba (Figure 3.6B). Bab2 expression was found to be robustly sexually dimorphic in D. fuyamai (Kopp et al., 2000), and Bab1 expression is reduced in the A5 and A6 segments of females (Salomone and Williams, manuscript in preparation). These observations suggest that differences in Bab expression contribute to these different female pigmentation patterns. Multiple mechanisms could underlie these differences in Bab expression. One is a change in the activity or expression pattern of a trans-acting regulator of the dimorphic element (trans-regulatory evolution). An alternative is a change in the orthologous dimorphic elements themselves that results in differing responses to a conserved set of trans-regulators (cis-regulatory evolution).

An effective test to distinguish between cis- and trans-regulatory evolution is to compare the activities of CREs in a common genetic background and observe whether reporter expression patterns resemble those of the host species (trans-regulatory evolution) or of the species from which the CRE was derived (cis-regulatory evolution) (Wittkopp et al., 2003). We isolated orthologous dimorphic elements from D. yakuba, D. fuyamai, and the outgroup species D. auraria (from the Sophophora montium group), which is also sexually dimorphic for pigmentation and Bab expression, though only in the A6 segment (Kopp et al., 2000). The regulatory activities of these orthologous CREs were evaluated in transgenic D. melanogaster pupae and normalized to the Concestor element (Figure 3.6). The D. auraria dimorphic element exhibited an A6 segment regulatory activity of 51±3% of the Concestor element’s activity (Figure 3.6Q). The regulatory activity of the D. fuyamai element was 209±10% (Figure 3.6O) and extended into segments A2-A5. The A6 regulatory activity of the D. yakuba element was 62±7% (Figure 3.6M). These results support a scenario in which evolutionary changes in the extent of female posterior abdomen pigmentation within this clade (Figure 3.6) occurred, at least in part, via cis-regulatory evolution that altered the activity of orthologous dimorphic elements. Interestingly, of the 14 ABD-B and two DSX sites typical of the D. melanogaster dimorphic element, the orthologous D. yakuba and D. fuyamai sequences had the same 13 of the 14 ABD-B sites and both DSX sites (Figure A1B). Even the D. auraria dimorphic element, the most distantly related in this comparison, possessed 12 ABD-B sites and both DSX sites. Thus, as for the D. melanogaster dimorphic element alleles, the functional diversification of these orthologous CREs occurred largely, if not entirely, by modifying CRE properties other than the ABD-B and DSX regulatory linkages.


Figure 3.6: Interspecific evolution of pigmentation and dimorphic element activity. (A) Phylogeny of species that differ in the extent of sexually dimorphic pigmentation. (B-I) Dorsal views of adult abdomens. Pigmentation of the A5 and A6 segments of (E) D. yakuba females is more male-like (compare to D), whereas pigmentation is altogether absent from the A5 and A6 segments of (G) D. fuyamai females. (J-Q) Comparison of GFP-reporter gene activity in female transgenic pupae at 85 hAPF. Activities of the (M) D. yakuba, (O) D. fuyamai, and (Q) D. auraria elements are expressed as the percentage of the (K) Concestor element female A6 mean ± SEM.

Discussion

Here, we have shown that the D. melanogaster dimorphic element, a CRE that regulates a suite of sexually dimorphic traits, has alleles with strikingly different regulatory activities that impact just one of these traits, female abdomen pigmentation. By reconstructing the ancestral dimorphic element sequence for these alleles and determining its regulatory activity, we were able to identify the derived mutations responsible for the divergent activities of various alleles. These functionally-relevant mutations were few in number, each had a measurable effect on regulatory activity, and all but one modify a property other than the ABD-B and DSX regulatory linkages identified previously (Williams et al., 2008). Furthermore, we found that species related to D. melanogaster harbor functionally divergent versions of this same CRE, whose regulatory activities differ in magnitude and pattern to a degree comparable to the D. melanogaster alleles. These CRE modifications likely contribute to the divergent patterns of abdomen pigmentation among females of these species. These interspecific differences in dimorphic element activity occurred without noteworthy alterations to the known, ancestrally encoded body plan and sex-determination pathway regulatory linkages. As a result, this CRE’s regulatory activity in the terminal body segments (A7 and genitalia) has been conserved, while activity in more anterior segments has diverged. Collectively, these results can be interpreted to support a model in which recurrent evolution is biased toward certain genes and CREs (Figure 3.7A-C) while preserving certain ancestral linkages (Figure 3.7D).

Genetic Networks, CREs, and the Predictability of Evolution

During development, genes interact within hierarchically structured gene regulatory networks (Davidson, 2006). At the top of these networks are patterning genes, prominently transcription factors, that can connect directly to CREs of differentiation genes or to CREs of intermediate-level transcription factors that act as “Input-Output switches” (Davidson, 2006; Stern and Orgogozo, 2009). An Input-Output switch converts its inputs into a regulatory output that is directed to multiple target genes. On one hand, mutations altering a patterning gene may be sufficient to alter a network’s phenotype, but these highly pleiotropic mutations tend to alter other phenotypes too, typically in a deleterious manner (Stern, 2011). On the other hand, mutations altering the function of a single differentiation gene, while generally less pleiotropic, are often insufficient to alter a phenotype. For these reasons, evolution may be biased to target Input-Output genes, an expectation that has been borne out for several traits (Stern and Orgogozo, 2009).


Figure 3.7: Pigmentation gene network model and the evolution of an ancestral CRE regulatory logic. (A-C) Schematic of the hierarchical structure of the D. melanogaster pigmentation gene network. Solid connections represent direct regulation, and dashed connections represent regulation that has not been shown to be direct. Activation and repression are indicated by arrowhead and nail-head shapes, respectively. This network includes (A) an upper level of patterning genes, including Abd-B and dsx of the body plan and sex-determination pathways, respectively, (B) a mid-level tier that integrates patterning inputs, and (C) a lower level that includes pigmentation genes whose encoded products function in pigment metabolism. Although Abd-B directly regulates the pigmentation gene yellow, sexually dimorphic expression of the yellow and tan genes results from the sexually dimorphic output of the bab locus, which represses tan and yellow expression in females. (D) A model for the evolution of diverse dimorphic element regulatory activities. The common ancestor of D. melanogaster populations and related species possessed a dimorphic element that had both DSX and ABD-B regulatory linkages and drove expression in the female A6-A8 segments. This ancestral regulatory logic was recurrently modified to increase the level and expand the segmental domain of activity, or to decrease and contract activity. These changes occurred while the core ABD-B and DSX regulatory linkages were preserved, perhaps through the loss (TF 3) and/or gain (TF 4) of other transcription factor linkages.

In the D. melanogaster pigmentation network, the bab genes function as an Input-Output node through the dimorphic element’s integration of body plan (ABD-B) and sex-determination (DSX) pathway inputs (Figure 3.7A). These inputs are converted into a female-specific pattern of expression that culminates in the repression of the differentiation genes yellow and tan in females (Jeong et al., 2006, 2008) (Figure 3.7C). In principle, changes in the expression or activity of a patterning gene, a differentiation gene, or the Input-Output gene (bab) could alter pigmentation phenotypes. In practice, however, it is logical that bab expression and the regulatory logic encoded in the dimorphic element were the targets of modification, as such alterations minimize negative pleiotropic effects while being sufficient to alter the female pigmentation phenotype. For example, ectopic yellow expression failed to create additional melanic pigmentation (Gompel et al., 2005; Wittkopp et al., 2002), whereas changes in either DSX or ABD-B expression result in ectopic abdominal pigmentation in addition to phenotypes in several other traits (Baker and Ridge, 1980; Jeong et al., 2006; Williams et al., 2008). Thus, these genes are either insufficient to alter pigmentation on their own (yellow) or sufficient only at the cost of negative pleiotropic effects (dsx and Abd-B). In contrast, increased Bab expression in the A5 and A6 segments was sufficient to suppress pigmentation, and ectopic abdomen pigmentation develops in bab heterozygous and homozygous null mutant females (Figure 3.2E and 3.2H).

Bab, however, is not dedicated to pigmentation (Couderc et al., 2002; Kopp et al., 2000). In the pupa, Bab expression includes the leg tarsal segments, abdomen epidermis, sensory organ precursor cells, oenocytes, and dorsal abdominal muscles, and each of these expression patterns is governed by one or more modular CREs (Williams et al., 2008). Thus, although Bab itself is highly pleiotropic, its individual CREs are far less so. For this reason, mutations altering female pigmentation would maximize sufficiency and minimize pleiotropy if they occurred in the dimorphic element, an expectation borne out in this study.

Pigmentation of the A5 and A6 segments, though, is only one of many traits influenced by the regulatory activity of the dimorphic element. This CRE drives Bab expression in the female A7 and A8 segments, regulating numerous female-specific traits, including the size, shape, trichome density, and bristle morphologies of the resident dorsal tergites and ventral sternites (Couderc et al., 2002). Because expression in these more posterior segments requires the ABD-B and DSX regulatory linkages, these linkages remain highly pleiotropic. For this reason, evolution would be expected to disfavor mutations that disrupt these linkages and to favor mutations that alter other CRE properties. This scenario matches how dimorphic element function was modified in both the intraspecific and interspecific comparisons presented here, as well as the long-term conservation of the ABD-B and DSX linkages previously described (Williams et al., 2008).

The Relationship between CRE Sequence and Functional Conservation

Our findings provide a unique contrast with previous investigations of the relationship between CRE sequence conservation and CRE functional evolution. Although Drosophila non-coding DNA, including CRE sequences, evolves more slowly than synonymous sites (Andolfatto, 2005), several well-studied CREs were found to undergo substantial sequence evolution without corresponding changes in regulatory activity.

During Drosophila embryonic development, the pair-rule gene even-skipped (eve) is expressed in seven stripes along the anteroposterior axis, with the second stripe of eve expression specified by the stripe 2 element (S2E) CRE. In D. melanogaster, the S2E possesses binding sites for four transcription factors that collectively specify the eve expression output (Small et al., 1992; Stanojevic et al., 1991). The orthologous S2E from D. pseudoobscura differs in the sequence of numerous binding sites, the overall content of binding sites, and the spacing between conserved binding sites (Ludwig et al., 1998, 2000), yet the orthologous S2Es function equivalently in vivo (Ludwig et al., 2005). Hence, the S2E exemplifies how selection acting at the level of the character (eve stripe expression) can accommodate a surprising amount of CRE sequence evolution. Similarly, sequence evolution without corresponding functional evolution was found between Drosophila species for the sparkling (spa) CRE, which directs cone cell expression of the dPax2 gene (Swanson et al., 2011). Likewise, the content and spatial arrangement of binding sites in neurogenic ectoderm enhancers (NEEs) evolved in a way that conserved their expression pattern outputs in response to changing regulatory inputs (Crocker et al., 2008). These case studies demonstrate that CRE sequence conservation is not a prerequisite for CRE functional conservation.

In contrast, we found little divergence in the content and sequence of known binding sites among the D. melanogaster dimorphic element alleles and orthologous sequences. At the sequence level, these CRE alleles and orthologs possess identities of ~98% and ~80%, respectively. Indeed, the vast majority of binding sites in the dimorphic element have been conserved for over 30 million years, as evidenced by their presence in D. willistoni (Williams et al., 2008). At the functional level, however, these CREs exhibited striking differences in their regulatory activities (Figure 3.4 and Figure 3.6). Thus, in contrast to S2E, spa, and the NEEs, the dimorphic element demonstrates how CREs can undergo dramatic changes in function that drive phenotypic divergence with little-to-no alteration to the characterized pre-existing regulatory linkages.

Integrating CRE Evolution into the Context of the Gene Locus

While the regulatory activities of the Light and Dark dimorphic element alleles correlated with female A5 and A6 pigmentation (Figure 3.1), some outcomes suggest that the effects of these variant sequences are influenced by other features within, or perhaps outside of, the bab locus. For instance, the Light 2 and Dark 2 alleles exhibit the highest and lowest regulatory activities, respectively. Surprisingly, however, it is the Light 1 and Dark 1 alleles, with their intermediate regulatory activities, that are associated with the more extreme Light and Dark female pigmentation phenotypes. At the expression level, Bab1 and Bab2 showed similar patterns in females from the Light 1 (prominent expression in segments A5 and A6) and Dark 1 (reduced expression in A5 and A6) strains (Figure 3.3). In the Dark 2 strain, however, Bab1 but not Bab2 expression was reduced in females.

Several mechanisms might explain the uncoupled expression of the Bab paralogs in Dark 2. For example, a separate, as yet unidentified CRE might control Bab2 expression. However, a screen of the entire ~160 kb locus failed to identify such a CRE (Williams et al., 2008). A second possibility is that one or more mutations in the Dark 2 allele have paralog-specific regulatory effects, perhaps by modifying an interaction with the bab1 promoter but not the bab2 promoter.

Another possible explanation involves CREs that coordinate communication between bab1 and bab2. In such a scenario, the Dark 2 allele could contain mutations that alter its interaction with coordinating elements, resulting in paralog-specific expression patterns in the female A5 and A6 segments. This possibility is consistent with observations of bab locus evolution in another population in which females differ in A6 segment pigmentation (Bickel et al., 2009; Kopp et al., 2003). For this population, fine-scale genetic mapping found that three disparate non-coding regions of the bab locus collectively compose a major-effect QTL (Bickel et al., 2011). One of these regions spans the dimorphic element, though no mutations reside within this CRE’s core region. The other two regions include an intergenic sequence between bab1 and bab2 and a large sequence that includes the bab2 promoter. In the future, it will be important to understand what roles these other regions serve, and how they may interact with polymorphisms in the dimorphic element to produce paralog-specific effects on gene expression.

Resurrecting Ancestral cis-Regulatory Elements

Given the central role of CREs and their evolution in the diversification of phenotypic traits (Carroll, 2008; Wray, 2007), a major goal is to understand the processes by which CRE regulatory logics were modified into their contemporary forms (Rebeiz and Williams, 2011). Studies of CRE evolution often compare two divergent derived regulatory states, with one sequence serving as a surrogate for the ancestral function (Clark et al., 2007; Frankel et al., 2011; Gompel et al., 2005; Williams et al., 2008). This approach has been successful in making inferences about the ancestral states of regulatory linkages and in identifying gains and losses of other key derived transcription factor binding sites. However, it is important to acknowledge a key limitation of this comparative approach: a CRE from an outgroup species that serves as a surrogate for the ancestor has also evolved along its own lineage since divergence.

Studies into the evolution of divergent protein activities encountered a similar problem when comparing extant protein forms (Harms and Thornton, 2010). In several cases, key amino acid residues necessary for a derived function were identified, but when substituted into the surrogate ancestral protein, these changes were insufficient to impart the derived function, indicating that the paths of evolution were more intricate. As a solution, the reconstruction of ancestral protein sequences, combined with functional testing of the inferred ancestral proteins, has allowed a more realistic simulation of evolutionary events. As a result, inferences about the paths of protein evolution were made that likely would not have been reached from comparisons of extant proteins alone (Harms and Thornton, 2010; Thornton, 2004).

An ideal research program to study CRE evolution would use reconstructed ancestral CREs as a starting point to trace the paths of evolutionarily relevant mutations. To our knowledge, few studies have used CRE reconstruction (Prabhakar et al., 2008; Rebeiz et al., 2011; Tishkoff et al., 2007). In one study, a novel optic lobe expression pattern of the D. santomea Nep1 gene was found to have arisen via the modification of a CRE that drove an eye field pattern of expression in an ancestor that existed ~0.5 million years ago (Rebeiz et al., 2011). Importantly, reconstructing and evaluating the ancestral CRE avoided the incorrect conclusion that this optic lobe activity evolved de novo, and instead revealed that a latent optic lobe activity was augmented into a robust derived state. In our study, had the Concestor element not been reconstructed, the Dark 1 and Dark 2 dimorphic element sequences would have been considered hypomorphic CRE alleles compared to the robust, wild type-like activity of the Light 1 and Light 2 alleles. The Light alleles possessed activities more similar to a previously characterized dimorphic element allele (Williams et al., 2008), consistent with the narrative of D. melanogaster as a sexually dimorphic species whose females lack posterior abdominal pigmentation. Reconstruction of the dimorphic element revealed a more complex reality, in which neither the Light nor the Dark alleles were good surrogates for the ancestral state. Using the ancestral sequence as a starting point, we found the evolutionary paths of these alleles to be short both in number of steps (one to two mutations) and in time frame (within the last ~60,000 years) (Stephan and Li, 2007). This demonstrates how simply and rapidly an existing CRE regulatory logic can evolve.

The cases of Nep1 optic lobe CRE evolution and bab dimorphic element evolution demonstrate the utility of reconstructing ancestral CRE states, though it must be pointed out that both cases involved comparisons of very closely related species or populations. Because of these short divergence times, the extant CRE forms differ at fewer than two percent of nucleotide sites, which made ancestral sequence reconstruction by the principle of parsimony possible. However, not all compelling instances of functional CRE evolution occur over similarly short time frames. Future studies will therefore need to reconstruct CREs that existed further in the past, for which parsimony will need to be replaced by maximum likelihood-based inference coupled with the testing of multiple alternative reconstructions (Thornton, 2004).
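To illustrate what parsimony-based reconstruction involves, the Python sketch below implements Fitch's small-parsimony algorithm, which, for a fixed, rooted binary tree, finds candidate ancestral states at the root and the minimum number of substitutions needed to explain each alignment column. This is an illustrative sketch, not necessarily the procedure used to infer the Concestor element: it handles substitutions only (the real alleles also differ by indels), and the tree shape and 5 bp sequences below are hypothetical, with the last leaf acting as an outgroup that roots the tree.

```python
from typing import List, Set, Tuple, Union

# A rooted binary tree: a leaf is an aligned sequence (str); an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree) -> Tuple[Set[str], int]:
    """Fitch bottom-up pass for a tree whose leaves are single characters.
    Returns the candidate state set at this node and the minimum number of changes below it."""
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_set, left_cost = fitch(left)
    right_set, right_cost = fitch(right)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # Disjoint child sets require one extra substitution on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1


def column(tree: Tree, i: int) -> Tree:
    """Project a tree of aligned sequences onto alignment column i."""
    if isinstance(tree, str):
        return tree[i]
    return (column(tree[0], i), column(tree[1], i))


def reconstruct_root(tree: Tree, length: int) -> Tuple[List[Set[str]], int]:
    """Apply Fitch site by site; return per-site root state sets and the total tree length."""
    states: List[Set[str]] = []
    total = 0
    for i in range(length):
        site_states, site_cost = fitch(column(tree, i))
        states.append(site_states)
        total += site_cost
    return states, total


# Hypothetical alignment: four ingroup alleles plus an outgroup (last leaf).
tree = ((("ACGTA", "ACGTA"), ("ACTTA", "ACTTC")), "ACGTA")
root_states, changes = reconstruct_root(tree, 5)
print(root_states, changes)
```

When sequences differ at few sites, as here, each root set typically contains a single state and the reconstruction is unambiguous. Root sets with more than one state mark the ambiguous cases that arise over longer divergence times, where likelihood-based methods and tests of alternative reconstructions become necessary.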

Materials and Methods

Fly Stocks and Genetic Manipulations

D. melanogaster populations from disparate geographical regions were obtained from the San Diego Drosophila Stock Center and are identified in Figure S3.1. The Dark 1 stock was obtained from M. Rebeiz (Rebeiz et al., 2009a), and stocks for other species were obtained from S.B. Carroll. Reporter transgenes in Figure 3.1 were introduced into the attP site VK00006 on the X chromosome (Venken et al., 2006); all other reporter transgenes were introduced into the attP2 site on chromosome 3L (Groth et al., 2004). Complementation test progeny were obtained by crossing individuals from a D. melanogaster population stock to a line carrying the bab locus null allele babAR07 (Couderc et al., 2002). The homozygous bab null genotype was a heteroallelic combination of babAR07 and the deficiency chromosome Df(3L)BSC799, in which the entire bab locus is deleted.

Immunohistochemistry

Pupal abdomens were dissected for immunohistochemistry at ~29 and ~85 hours after puparium formation (hAPF). The former is a time point when Bab1 and Bab2 are expressed in the developing genitalia and analia. The latter is a time point when the dimorphic element drives high levels of reporter gene expression in the A5-A7 segments and downstream targets of bab repression have begun to be expressed in males (Jeong et al., 2006, 2008). The primary antibodies used were rabbit anti-Bab1 (Williams et al., 2008) and rat anti-Bab2 (Godt et al., 1993), each at a dilution of 1:250. The secondary antibodies used were goat anti-rat Alexa Fluor 488 (Invitrogen) and goat anti-rabbit Alexa Fluor 647 (Invitrogen), each at a dilution of 1:500. The expression patterns presented are consistent with patterns seen in replicate specimens.
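The dilutions above translate into pipetting volumes with simple arithmetic. This is a minimal sketch that assumes "1:N" means one part antibody stock in N parts of total volume, a common convention (some labs instead mean one part stock plus N parts diluent). The 500 µL working volume and the function name are hypothetical.

```python
from typing import Tuple


def working_solution(final_volume_ul: float, dilution_factor: float) -> Tuple[float, float]:
    """Return (stock_ul, diluent_ul) for a 1:dilution_factor working solution.

    Assumes '1:N' means 1 part antibody stock in N parts total volume.
    """
    if dilution_factor < 1:
        raise ValueError("dilution factor must be >= 1")
    stock_ul = final_volume_ul / dilution_factor
    return stock_ul, final_volume_ul - stock_ul
```

Under these assumptions, a 1:250 primary antibody solution of 500 µL would use 2 µL of stock, and a 1:500 secondary antibody solution would use 1 µL.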

GFP Reporter Transgenes and Gel Shift Assays

GFP reporter transgenes were used as a proxy to measure the in vivo gene-regulatory activity of CREs. In brief, CREs were cloned into a vector upstream of the green fluorescent protein (GFP) coding sequence, forming a “reporter transgene”. To avoid confounding position effects, transgenes were individually inserted into the D. melanogaster germline at the same genomic location via site-specific integration. This permits a quantitative comparison of CRE regulatory capabilities (Groth et al., 2004; Rogers and Williams, 2011; Williams et al., 2008). Transgenic lines were generated by BestGene Inc. Gel shift assays used the DSX DNA-binding domain proteins and wild type and mutant Dsx1 sites as previously described (Williams et al., 2008). Details about the cloning and evaluation of reporter transgenes, and the protocol for the gel shift assay, are provided in the Supporting Protocols.

Imaging of Fly Abdomens

Whole-mount images were taken using an Olympus SZX16 Zoom Stereoscope outfitted with an Olympus DP72 digital camera. Projection images for immunohistochemistry and reporter transgenes were obtained using an Olympus Fluoview FV 1000 confocal microscope and software. All TIFF images used in a specific comparison were processed identically using Photoshop CS3 (Adobe).

Acknowledgments

Thanks to: S.B. Carroll for inspiring this project, J. Selegue for technical assistance, and F. Laski for the anti-Bab2 antibody.

Supplementary Information

Figure S3.1: Abdomen pigmentation phenotypes for Drosophila melanogaster population stocks. (A-AN) Whole-mount images of adult male and female dorsal abdomens. The geographic locations of the populations from which lab stocks were started are listed, along with, where applicable, the Drosophila Species Stock Center stock numbers in parentheses. Representative images are shown for the stocks referred to as (A) Light 1, (D) Light 2, (AM) Dark 1, and (AJ) Dark 2.


Figure S3.2: Mapping of the bab genotype-phenotype association. (A) To-scale representation of the ~155 kb bab locus, where the bab1 and bab2 genes are situated between the CG13912 and trio genes. Exons are indicated as tall rectangles, and the sites and directions of each gene’s transcription are indicated by black arrows. The locations of polymorphic markers used to establish bab locus haplotypes are indicated by “1”, “2”, “3”, and “4” and by downward-projecting red lines. Polymorphism 3 is the BstXI site polymorphism that resides within the Light 1 and Light 2 dimorphic element alleles. The blue dot with arrow indicates the location of the dimorphic element. (B-F) Representative female phenotypes for the A5 and A6 segment tergites (left) and the inferred bab locus haplotypes associated with each pigmentation phenotype (right). (B) Dark 1 and (C) Light 1 specimens were homozygous for alternate nucleotide states at the four bab locus markers, establishing the Dark 1 and Light 1 haplotypes (black and yellow bars, respectively). (D) Female F1 progeny from the Dark 1 and Light 1 cross were heterozygous for the bab locus markers. (E) Phenotypically dark F2 progeny from the parental Dark 1 and Light 1 cross were homozygous for the Dark 1 nucleotide state at each of the four evaluated bab locus markers. (F) After 10 generations of backcrossing the Dark 1 phenotype into the Light 1 genetic background, a pure line was established in which females exhibit the Dark 1 phenotype. This line was homozygous for the Dark 1 nucleotide state at each of the four evaluated bab locus markers.


Figure S3.3: Chimeric dimorphic elements map functionally relevant derived mutations to the core region. (A) To-scale representation of the dimorphic element, with Abd-B and Dsx binding sites shown as blue and yellow rectangles, respectively. Green dashed lines indicate the positions where central core dimorphic element sequences were joined with flank sequences. The blue dashed line indicates the position where the left and right halves of various dimorphic elements were joined. (B-L) GFP reporter gene activity in female transgenic pupae at 85 hAPF. Activity measurements are represented as the % of the D. melanogaster Concestor element female A6 mean ± SEM. The illustration above each image indicates the sequence composition of the evaluated dimorphic elements. Gray, yellow, and brown respectively indicate sequence from the Concestor element, the Light 2 dimorphic element, and the Dark 1 dimorphic element.


Figure S3.4: Regulatory activity effects of derived dimorphic element mutations. (A-AA) GFP reporter gene activities in female transgenic pupae at 85 hours after puparium formation (hAPF). (A) The Concestor element’s mean activity measurement in the dorsal A6 segment was set as 100%; all other regulatory activities (B-AA) are reported as a percentage of the Concestor element’s activity ± the standard error of the mean (SEM). For each reporter transgene, a representative image is presented. (C-F) Activities for population stock dimorphic element alleles. (G-R) Activities for Concestor elements with a single derived mutation substituted. (S and T) Activities for Concestor elements substituted with two Light 2 (S) and two Dark 1 (T) derived mutations. (U) The regulatory activity of the Dark 1 allele that included the E mutation. (V) The Concestor element’s regulatory activity when the native sequence at the site of the E mutation was altered by a non-complementary transversion at every second base pair. (W) The Concestor element’s regulatory activity when the first 8 of the 9 base pairs of the E mutation were deleted. (X) The Concestor element’s regulatory activity when only base pair 9 of the E mutation was deleted. (Y) The Concestor element’s regulatory activity when the Dsx1 site was mutated. (Z) The Concestor element’s regulatory activity in males when the Dsx1 site was mutated. (AA) The Concestor element’s regulatory activity in males relative to its activity in females.
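The normalization described in Figures S3.3 and S3.4 expresses each reporter's activity as a percentage of the Concestor element's mean A6 activity, reported ± SEM. The sketch below shows one way to do that arithmetic. It assumes one activity measurement per replicate specimen, that both the variant and reference replicates are scaled by the reference mean, and that SEM is the sample standard deviation divided by the square root of n. The replicate values used in the checks are hypothetical, and the image quantification pipeline itself (Rogers and Williams, 2011) is not reproduced here.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def percent_of_reference(values: Sequence[float],
                         reference_values: Sequence[float]) -> Tuple[float, float]:
    """Scale replicate measurements to % of the reference mean and return
    (mean, SEM) of the scaled values."""
    ref_mean = mean(reference_values)
    scaled = [100.0 * v / ref_mean for v in values]
    # SEM = sample standard deviation / sqrt(number of replicates).
    sem = stdev(scaled) / math.sqrt(len(scaled)) if len(scaled) > 1 else 0.0
    return mean(scaled), sem
```

Scaling the reference replicates by their own mean yields 100% by construction, which matches panel (A) being set to 100%.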

Phenotype      Observed genotype counts, n      Expected genotype counts        P
               L1L1   L1D1   D1D1   Total       L1L1   L1D1   D1D1   Total
Light          16     0      0      16          4      8      4      16       <0.00001
Intermediate   0      11     0      11          2.75   5.5    2.75   11       0.0041
Dark           0      0      16     16          4      8      4      16       <0.00001
Total          16     11     16     43          10.75  21.5   10.75  43

Table S3.1: Association between pigmentation phenotype and bab dimorphic element genotype. L1L1 and D1D1 indicate individuals homozygous for the Light 1 and Dark 1 population dimorphic element alleles, respectively, and L1D1 indicates heterozygotes. In each row of the contingency table, the p-value was derived using the Chi-Square test.

Phenotype      Observed genotype counts, n      Expected genotype counts        P
               L1L1   L1D2   D2D2   Total       L1L1   L1D2   D2D2   Total
Light          5      0      0      5           1.25   2.5    1.25   5        <0.001
Dark           0      3      14     17          4.25   8.5    4.25   17       <0.00001
Total          5      3      14     22          5.5    11     5.5    22

Table S3.2: Association between pigmentation phenotype and bab dimorphic element genotype. L1L1 and D2D2 indicate individuals homozygous for the Light 1 and Dark 2 population dimorphic element alleles, respectively, and L1D2 indicates heterozygotes. In each row of the contingency table, the p-value was derived using the Chi-Square test.
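The expected counts in Tables S3.1 and S3.2 follow a 1:2:1 ratio, the Mendelian expectation for genotypes among F2 progeny. The sketch below assumes that each phenotype row was tested as a goodness-of-fit against that ratio with 2 degrees of freedom. Under that assumption it reproduces the reported P values, for example 0.0041 for the Intermediate row of Table S3.1. For 2 degrees of freedom, the chi-square upper-tail probability has the closed form exp(-x/2), so no statistics library is required. The function name is illustrative.

```python
import math
from typing import Sequence, Tuple


def chi_square_1_2_1(observed: Sequence[int]) -> Tuple[Tuple[float, float, float], float, float]:
    """Goodness of fit of (homozygote, heterozygote, homozygote) counts to 1:2:1.

    Returns (expected counts, chi-square statistic, P value) for 2 degrees of freedom.
    """
    n = sum(observed)
    expected = (n / 4, n / 2, n / 4)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 2 degrees of freedom the chi-square upper tail is exactly exp(-x / 2).
    p_value = math.exp(-chi2 / 2)
    return expected, chi2, p_value
```

Several expected counts in these tables fall below 5. The chi-square approximation is less reliable in that range, and an exact multinomial test would be a reasonable cross-check.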

Gene   Exon    PCR size   Primer sequence (5' to 3')     Primer name
bab1   1       1192 bp    CTTCTCTTAACTGGCATATGTACTAAC    bab1 exon 1 Fwd
bab1   1                  AGCATCCCATTGTGATCATG           bab1 exon 1 seq 2
bab1   1       983 bp     CGCAATCAGGCTGCTCAGTG           bab1 exon 1 seq 1
bab1   1                  GGTGATCCAACGATCAACGATCAACG     bab1 exon 1 Rvs
bab1   2       211 bp     CAGCTGGTACTGGTAGCTGTCTG        bab1 genotype 1A
bab1   2                  AGCTTTCTTCTCTTGCCTCACTTTGC     bab1 exon 2,3 Rvs
bab1   3       527 bp     GAATTTGGCAAAGTCCGCTAAGCG       bab1 exon 3 Fwd 2
bab1   3                  GGCAGCTGGAAACCAACTGATCG        bab1 exon 3 Rvs 2
bab1   4       746 bp     AGGCGTAGGCTTACACAGGCCAAG       bab1 exon 4 Fwd
bab1   4                  TGGAGGCAGGGCCAGCATATC          bab1 exon 4 seq 2
bab1   4       676 bp     TGCGCAACGGACTGCTCTTGG          bab1 exon 4 seq 1
bab1   4                  GGGATTTCTTGAACCGAACTAACTGC     bab1 exon 4 Rvs
bab2   2       1291 bp    GCTGGAAAAATCGCTATGCCTCAG       bab2 exon 2 Fwd
bab2   2                  CAAGGCACACAAGATGGTGC           bab2 exon 2 seq 2
bab2   2       1138 bp    GCAGCAGTCTGGGCCTCTTG           bab2 exon 2 seq 1
bab2   2                  CGCAGAAAACCTGCAAGACAACCG       bab2 exon 2 Rvs
bab2   3 & 4   646 bp     AGTCTTCCAAGGAAATTTGCAAC        bab2 exon 3,4 Fwd2
bab2   3 & 4              TGCAAGTTCCTGCACAGTGAAC         bab2 exon 3,4 Rvs2
bab2   5       1036 bp    CAGCGTGTGTCCCAATGCCAC          bab2 exon 5 Fwd
bab2   5                  TTCTAACCCACTATTTCGCCCTC        bab2 exon 5 Rvs

Table S3.3: Primers used to PCR amplify D. melanogaster bab protein-coding exons and their splice junctions. Each PCR product size is listed on the row of the first primer of the pair that generates it.

#   Description                               Size (bp)          Primer sequence (5' to 3')    Primer name
1   Upstream of bab2 indel (Trh intron 3)     Light P1 = 119     GAGCTCCAAGAAAACGGTGCC         Trh intron 3 Fwd
                                              Dark P1 = ~140     ACCTGAGGAGGTGAAAACCTG         Trh intron 3 Rvs
2   bab1 & bab2 intergenic region indel       Light P1 = 128     CTCGGGTTCCTCGCTTGTC           TREindelFwd
                                              Dark P1 = 107      GCCCCAACACATCCCAGACTG         TREindelRvs
3   bab1 intron (in dimorphic element) RFLP   381*               ATGCTTAGATTTGCTCCAGCAGTGG     BstXI Fwd 1
                                                                 GAGTGGCTGTATAACTATTGCAC       BstXI Rvs 1
4   bab1 intron 3 indel                       Light P1 = ~600    CTAACCCGAGAGCAGTTAGTTCG       bab1 intron 3 Fwd 2
                                              Dark P1 = ~200     GACATCAAGGACGAGAGCCTGG        bab1 intron 3 Rvs 3

Table S3.4: Primers used for PCR-based genotyping of the D. melanogaster bab locus. *Light P1 has a BstXI restriction enzyme site at which BstXI cleaves the PCR product into 232 and 149 base pair fragments. Both the Dark P1 and Dark P2 alleles lack this site.
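Marker 3 in Table S3.4 is scored as a restriction fragment length polymorphism (RFLP): a 381 bp product that BstXI cuts into 232 and 149 bp fragments only in the Light P1 allele. The sketch below models that logic under stated assumptions. It assumes the standard BstXI recognition site CCANNNNNNTGG, with top-strand cleavage after the eighth base (CCANNNNN^NTGG). Because the site reads the same on both strands, scanning the top strand is sufficient. Fragment lengths are counted on the top strand, ignoring the short overhangs. The test amplicon is synthetic, not the real bab sequence, and the genotype caller is a hypothetical, idealized helper that assumes exact band sizes.

```python
import re
from typing import Iterable, List

# BstXI recognizes CCANNNNNNTGG and cuts the top strand after the eighth base
# (CCANNNNN^NTGG). The zero-width lookahead also finds overlapping sites.
BSTXI_SITE = re.compile(r"(?=CCA[ACGT]{6}TGG)")
TOP_STRAND_CUT_OFFSET = 8


def bstxi_fragments(amplicon: str) -> List[int]:
    """Top-strand fragment lengths after complete BstXI digestion."""
    seq = amplicon.upper()
    cuts = [m.start() + TOP_STRAND_CUT_OFFSET for m in BSTXI_SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def call_marker3_genotype(band_sizes: Iterable[int]) -> str:
    """Idealized genotype call from band sizes (bp) on a gel (hypothetical helper)."""
    bands = set(band_sizes)
    has_cut = {232, 149} <= bands  # Light P1 allele carries the BstXI site
    has_uncut = 381 in bands       # Dark alleles lack the site
    if has_cut and has_uncut:
        return "Light P1 / Dark"
    if has_cut:
        return "Light P1 / Light P1"
    if has_uncut:
        return "Dark / Dark"
    return "undetermined"
```

Consider a synthetic 381 bp amplicon with a single site starting at position 224. Digestion returns fragments of 232 and 149 bp, matching the sizes in the table footnote, while an amplicon without the site stays at 381 bp.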

Range           Sequence (5' to 3')                    Name             Site
Sophophora(a)   ggcgcgccCACATAAAAATCAGCAACAAASTTGC     sub1orthoF1      AscI
                cctgcaggCAAAACKGCRCATAAAAMSAAATTACA    dimorphic Rvs1   SbfI

Table S3.5: Primer combinations used to amplify and clone dimorphic element alleles and orthologous sequences. (a) This primer combination has proven capable of amplifying dimorphic element sequences from species representing the diverse lineages of the Sophophora subgenus. Note: restriction enzyme sequences for cloning CRE sequences are indicated by lower case letters at the 5’ end of the primers.
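The Table S3.5 primers combine a lower-case 5' cloning tail with a degenerate, upper-case annealing region written in IUPAC ambiguity codes. The sketch below separates the tail from the annealing region, checks the tail against the AscI (GGCGCGCC) and SbfI (CCTGCAGG) recognition sequences, and enumerates the concrete oligonucleotides that each degenerate primer represents. The helper names are illustrative. The script assumes the tail is exactly the run of leading lower-case bases.

```python
from itertools import product
from typing import List, Tuple

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
CLONING_SITES = {"AscI": "GGCGCGCC", "SbfI": "CCTGCAGG"}


def split_tail(primer: str) -> Tuple[str, str]:
    """Split the lower-case 5' cloning tail from the upper-case annealing region."""
    tail_length = len(primer) - len(primer.lstrip("acgt"))
    return primer[:tail_length], primer[tail_length:]


def expand(primer: str) -> List[str]:
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(bases) for bases in product(*choices)]
```

With this reading, the forward primer (one S position) represents 2 distinct oligonucleotides. The reverse primer (K, R, M, and S positions) represents 2^4 = 16.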

Binding site     Sequence (5' to 3')              Name
Dsx1 Concestor   TTTGGCCGCAACAATGTTGCTGCATTTA     Dsx1 con Top
                 TAAATGCAGCAACATTGTTGCGGCCAAA     Dsx1 con Bottom
Dsx1 E mutant    CGGTCTGACAACAATGTTGCTGCATTTA     Dsx1 delta E Top
                 TAAATGCAGCAACATTGTTGTCAGACCG     Dsx1 delta E Bottom
Dsx1 KO          TTTGGCCGCAGGGGGCGTGCTGCATTTA     Dsx1 KO Top
                 TAAATGCAGCACGCCCCCTGCGGCCAAA     Dsx1 KO Bottom

Table S3.6: Oligonucleotides used to make gel shift assay binding sites.
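Gel shift probes such as those in Table S3.6 are typically made by annealing a top and a bottom oligonucleotide. That only works if each bottom strand is the exact reverse complement of its top strand. The sketch below checks this for the three pairs listed in the table. It also reports the 0-based positions at which each variant top strand differs from the Concestor top strand. The helper names are illustrative, and the sketch makes no claim about how the probes were labeled or annealed in practice.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# (top, bottom) oligonucleotides copied from Table S3.6, written 5' to 3'.
DSX1_OLIGOS = {
    "Dsx1 Concestor": ("TTTGGCCGCAACAATGTTGCTGCATTTA", "TAAATGCAGCAACATTGTTGCGGCCAAA"),
    "Dsx1 E mutant": ("CGGTCTGACAACAATGTTGCTGCATTTA", "TAAATGCAGCAACATTGTTGTCAGACCG"),
    "Dsx1 KO": ("TTTGGCCGCAGGGGGCGTGCTGCATTTA", "TAAATGCAGCACGCCCCCTGCGGCCAAA"),
}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def anneals_fully(top: str, bottom: str) -> bool:
    """True if the bottom oligo is the exact reverse complement of the top oligo."""
    return reverse_complement(top) == bottom


def differing_positions(a: str, b: str) -> List[int]:
    """0-based positions at which two equal-length oligos differ."""
    if len(a) != len(b):
        raise ValueError("oligos must be the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

All three pairs in the table pass the reverse-complement check. The variant top strands differ from the Concestor top strand only in a contiguous block near the 5' end (E mutant) or in the middle (KO).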

CHAPTER IV

A SURVEY OF THE TRANS-REGULATORY LANDSCAPE FOR DROSOPHILA MELANOGASTER ABDOMINAL PIGMENTATION

This work was originally published in the peer-reviewed scientific journal Developmental Biology in 2014 under the following citation: Rogers WA, Grover S, Stringer SJ, Parks J, Rebeiz M, and Williams TM (2014). A survey of the trans-regulatory landscape for Drosophila melanogaster abdominal pigmentation. Developmental Biology. 385 (2): 417-432.

Abstract

Trait development results from the collaboration of genes interconnected in hierarchical networks that control which genes are activated as development progresses. While networks are understood to change over developmental time, the alterations that occur over evolutionary time are much less clear. A multitude of transcription factors and a far greater number of linkages between transcription factors and cis-regulatory elements (CREs) have been found to structure well-characterized networks, but the best-understood networks control traits that are deeply conserved. Fruit fly abdominal pigmentation may represent an optimal setting to study network evolution, as this trait diversified over short evolutionary time spans. However, the current understanding of the underlying network includes only a small set of transcription factor genes. Here, we greatly expand this network through an RNAi screen of 558 transcription factors. We identified 28 genes, including the previously implicated abd-A, Abd-B, bab1, bab2, dsx, exd, hth, and jing, as well as 20 novel factors with uncharacterized roles in pigmentation development. These include genes that promote pigmentation, genes that suppress pigmentation, and some that have either male- or female-limited effects. We show that many of these transcription factors control the reciprocal expression of two key pigmentation enzymes, whereas a subset controls the expression of key factors in a female-specific circuit. We found that the pupal Abd-A expression pattern was conserved between species with divergent pigmentation, indicating that this diversity resulted from changes at other loci.

Collectively, these results reveal a greater complexity of the pigmentation network. They present numerous opportunities to map the transcription factor-CRE interactions that structure trait development, along with numerous candidate loci to investigate as potential targets of evolution.

Introduction

A major undertaking in evolutionary developmental biology is to understand how the gene regulatory networks that control trait development change during evolution. Most physical traits develop through coordinated programs of gene expression, orchestrated by a network of linkages between transcription factors and downstream target genes. Linkages between transcription factors and target genes are encoded within cis-regulatory element (CRE) sequences that determine when, where, and at what level a gene is expressed during development (Arnone and Davidson, 1997). The structure of regulatory networks, including the genes, CREs, and their transcription factor linkages, is of great interest to the field of developmental biology because it embodies the programs for cell, tissue, and organ development (Davidson, 2006). Gene networks in extant organisms represent the product of a complex, genome-wide evolutionary process. Several individual networks in a single species have been well characterized (Bonn and Furlong, 2008; Imai et al., 2009; Levine and Davidson, 2005; Ochoa-Espinosa et al., 2005; Oliveri et al., 2008; Peter and Davidson, 2011; Sandmann et al., 2007; Zeitlinger et al., 2007), and several examples of macro-evolutionary changes to networks have been explored (Hinman et al., 2007; Weatherbee et al., 1999; Zinzen et al., 2006). Even so, a mechanistic understanding of the incipient events of network evolution is still in its infancy.

To better understand these early events in network evolution, well-characterized gene networks are needed for traits that evolved between closely related species.

One well-suited model to study the incipient stages of regulatory network evolution is the cascade of transcription factors and enzymes that generate abdominal pigmentation patterns among fruit fly species of the genus Drosophila (D.). These species have evolved extensive diversity in pigmentation since diverging from a common ancestor ~50 million years ago (Wittkopp et al., 2003).

This diversity includes sexual dimorphism such as that exhibited by D. melanogaster, where the dorsal cuticle plates (tergites) of the posterior two abdominal segments (A5 and A6) are fully pigmented in males. Female tergite pigmentation, though, is typically limited to a posterior stripe (Figure 4.1A). Among related species, male pigmentation can be limited to the A6 tergite (e.g. D. baimaii) or span the A4-A6 tergites (e.g. D. prostipennis). This range of dimorphic patterns is thought to have evolved from a monomorphic ancestor that gave rise to extant species such as D. willistoni (Figure 4.1A) (Jeong et al., 2006). In distantly related lineages, male-specific pigmentation has seemingly evolved convergently (e.g. D. funebris). The genes encoding pigmentation enzymes and most transcription factors are conserved between fruit fly species with sequenced genomes (Clark et al., 2007; Richards et al., 2005). It therefore seems that abdominal pigmentation diversity evolved largely through changes in the structure of the pigmentation network that cause differences in pigmentation enzyme expression.

In D. melanogaster, many of the genes encoding pigmentation enzymes have been extensively characterized (True et al., 2005; Wittkopp et al., 2003). In particular, yellow and tan are required for the production of black pigments and are expressed specifically in the abdominal epidermal cells that underlie black cuticle, such as those of the male A5 and A6 segments (Jeong et al., 2006, 2008). In a pattern reciprocal to yellow and tan, ebony is expressed in the more anterior A2-A4 segments in males and throughout the abdomen of females, where it promotes a yellow cuticle color (Figure S4.1) (Rebeiz et al., 2009a; Richardt et al., 2003). Some of the patterning mechanisms that sculpt these reciprocal patterns of pigmentation are known.

A network of four transcription factors has been shown to regulate the pigmentation gene battery directly or indirectly (Figure 4.1B). Activation of yellow, and presumably tan, in the A5 and A6 segments requires direct regulation by the Hox protein Abdominal-B (Abd-B), which interacts with CRE binding sites (Jeong et al., 2006, 2008). The absence of comparable yellow and tan expression in females is due to the Bric-à-brac 1 and 2 transcription factors, collectively referred to here as Bab (Couderc et al., 2002; Kopp et al., 2000). Female Bab expression in the posterior abdomen is directed by a CRE that is directly activated by Abd-B and the female isoform of the Doublesex (Dsx) transcription factor (DsxF). Bab expression is lacking in males due to the male isoform of Dsx (DsxM), which acts as a direct repressor (Williams et al., 2008). The mechanisms that restrict ebony from pigmented portions of the male cuticle are currently unknown.


Figure 4.1: Abdominal pigmentation pattern and gene network. (A) The dorsal abdomens of fruit flies are covered by cuticle plates, called tergites, that exhibit diverse patterns of pigmentation. These include the sexually monomorphic pattern of D. willistoni and dimorphic patterns that evolved convergently in D. funebris and D. melanogaster. The number of pigmented tergites differs for D. baimaii and D. prostipennis males, two species more closely related to D. melanogaster. (B) Contemporary understanding of the D. melanogaster abdominal pigmentation network. Direct regulatory interactions between transcription factors and target gene cis-regulatory elements are represented as solid lines. Dashed lines indicate a regulatory relationship that has not been demonstrated to be direct. Regulatory interactions terminating with arrowheads and nail heads indicate activating and repressing regulatory inputs, respectively.

Previous genetic studies have identified a handful of additional D. melanogaster transcription factor genes that govern abdomen pigmentation in some manner, including abd-A, exd, hth, and jing (Culi et al., 2006; González-Crespo and Morata, 1995; Rauskolb et al., 1995; Ryoo et al., 1999; Sanchez-Herrero et al., 1985). However, it is not yet known how these genes interact with downstream pigmentation network target genes. Hence, the contemporary model of pigmentation network structure remains a work in progress (Figure 4.1B). In this model, the Bab proteins serve as a key node for a female-specific circuit that suppresses pigmentation. Male-specific gene expression and pigmentation appear to be the ground state promoted by Abd-B. While Bab and Abd-B are necessary for male-specific pigmentation, these factors are certainly not sufficient on their own and depend on contributions from other transcription factors. Moreover, the full pigmentation pattern of the abdomen includes posterior tergite stripes, tergite midline spots, and a degree of pigmentation on the female A6 tergite.

Hence, to understand the production of this composite trait and to address how this trait has evolved, it is essential to know the other relevant transcription factor genes and to elucidate how they interact with target gene CREs.

The expression of genes can be substantially reduced through the expression of inhibitory RNA (RNAi) molecules in trans (Fire et al., 1998). In D. melanogaster, genome-wide RNAi screens have emerged as an effective approach to identify novel genes involved in developmental processes, as inducible RNAi transgenes exist for a high percentage of this species’ genes (Dietzl et al., 2007; Mummery-Widmer et al., 2009). Using an RNAi-based screen, we demonstrate that the gene network underlying dimorphic abdominal pigmentation includes a cohort of at least 28 transcription factor genes. These include transcription factors with known roles in body plan development and sex determination, as well as many novel factors, among them a cohort that participates in chromatin-remodeling complexes. We show that many of these factors regulate the reciprocal expression patterns of ebony and tan. Some factors operate upstream of Bab, and some operate in a male-specific circuit.

Collectively, these transcription factor genes provide numerous opportunities to further resolve the regulatory architecture of the pigmentation network, as well as candidate loci for nodes of incipient evolutionary change.

Materials and Methods

RNAi Screen

RNAi screen fly stocks were maintained at 21 °C. Of the 749 predicted transcription factors in the D. melanogaster genome (Pfreundt et al., 2010), we obtained primary RNAi lines that collectively target 558 unique transcription factor genes (Figure 4.2 and Table S4.1). We also obtained an additional 16 secondary lines for genes whose primary lines resulted in a conspicuous phenotype (Figure S4.2 and Table S4.2). The bab1 RNAi and bab2 RNAi lines presented in this study were obtained from the Vienna Drosophila RNAi Center (transformant ID #6960 and #49042, respectively) (Dietzl et al., 2007). All other RNAi stocks were generated by the Transgenic RNAi Project at Harvard Medical School, where each evaluated transgene was inserted into the attP2 landing site on chromosome 3 (Groth et al., 2004). Gene targeting occurs through the GAL4/UAS-mediated expression of an RNAi hairpin designed to suppress the expression of a single gene. The absence of off-target effects and the effectiveness of RNAi transgenes in inducing gene-specific loss-of-function phenotypes were shown for a cohort of developmental genes (Ni et al., 2008). Stocks derived from the VALIUM1 and VALIUM10 vectors include long double-stranded hairpins (Ni et al., 2008, 2009). Those from the VALIUM20, VALIUM21, and VALIUM22 vectors instead use short hairpin microRNA technology designed to have no off-target effects (Haley et al., 2008, 2010; Ni et al., 2011). For each UAS-RNAi line stock, males were crossed to virgin females of genotype pnr-GAL4/TM3, Ser1 (Bloomington Drosophila Stock Center #3039) (Figure 4.2B). Cross progeny receiving the pnr-GAL4 chromosome and the UAS-RNAi transgene-bearing chromosome express the RNAi hairpin in the dorsal-medial thorax and abdomen throughout development (Calleja et al., 2000), including the pupal dorsal abdominal epidermis. RNAi hairpin expression is not induced in cross progeny receiving the TM3, Ser1 and UAS-RNAi chromosomes. These control individuals were identified by a serrated phenotype in the anterior distal wing. For each cross, four or more males and four or more females (both RNAi hairpin-expressing and non-expressing) were phenotypically analyzed. Classifications for encoded DNA-binding domains and molecular functions were obtained from the InterPro (Hunter et al., 2012) and FlyBase (Marygold et al., 2013) resources.
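The screening cross described above can be written out as a small Punnett-square calculation on the third chromosome. The model makes three assumptions. First, the UAS-RNAi males are homozygous for the attP2 insertion. Second, each parent transmits either third chromosome with equal probability. Third, the TM3 balancer prevents recombination. Under these assumptions, half of the progeny express the hairpin and half carry TM3, Ser1 and serve as serrated-wing controls. The class labels and the "+" notation for a chromosome lacking the transgene are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from typing import Dict, Tuple


def progeny_classes(mother: Tuple[str, str], father: Tuple[str, str]) -> Dict[str, Fraction]:
    """Expected fractions of third-chromosome progeny classes from one cross.

    Each parent passes on either of its two third chromosomes with equal
    probability; the TM3 balancer is assumed to suppress recombination.
    """
    counts: Counter = Counter()
    for maternal, paternal in product(mother, father):
        genotype = {maternal, paternal}
        if "pnr-GAL4" in genotype and "UAS-RNAi" in genotype:
            counts["RNAi expressed (pnr domain)"] += 1
        elif "TM3, Ser1" in genotype:
            counts["balancer control (serrate wings)"] += 1
        else:
            counts["other"] += 1
    total = sum(counts.values())
    return {label: Fraction(n, total) for label, n in counts.items()}
```

If the father were instead heterozygous for the insertion, only one quarter of the progeny would express the hairpin. A quarter would carry pnr-GAL4 without the transgene and could be mistaken for a wild-type (Class III) outcome. The homozygous-stock assumption therefore matters for interpreting the screen.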

Figure 4.2: An RNAi-based screen to identify transcription factor genes regulating abdominal pigmentation. (A) Green fluorescent protein expression driven in the dorsal abdomen midline pattern of the pnr gene for a pupa of genotype UAS-GFP;pnr-GAL4. (B) Crossing of male flies with an inhibitory RNA transgene for a transcription factor gene under regulation of UAS binding sites (UAS-TF-RNAi) to female flies possessing the pnr-GAL4 chromosome results in F1 progeny with the inhibitory RNA transgene expressed in the pnr domain. Phenotypic outcomes for F1 progeny include those with reduced pigmentation (Class I), ectopic pigmentation (Class II), and a wild type dimorphic pattern (Class III). (C) Representative Class III adult abdomens. (D) Summary for the phenotypic screening of inhibitory RNA transgenes targeting 558 unique transcription factor genes.

RNAi Effects on Reporter Transgene Expression

Flies of genotype CRE-EGFP; pnr-GAL4/TM6b, where EGFP is Enhanced Green Fluorescent Protein, were crossed to selected UAS-RNAi lines, resulting in progeny of genotype CRE-EGFP/+; pnr-GAL4/UAS-RNAi. The CREs evaluated were the bab locus dimorphic element (Williams et al., 2008), the tan gene t_MSE (Jeong et al., 2008), and the full ebony gene regulatory region. This ebony region contains an abdominal activating element, a male-specific repression element, and a silencer element that represses transcription at the posterior edge of tergites (“stripe repression element”) (Rebeiz et al., 2009a). The tan and ebony reporter transgenes were inserted at the 51D attP docking site (Bischof et al., 2007), and the dimorphic element was inserted into the attP40 site (Markstein et al., 2008). For progeny possessing the dimorphic element and t_MSE transgenes, EGFP expression was evaluated by confocal microscopy at 85 and 95 hours after puparium formation (hAPF), respectively (Rogers and Williams, 2011). In progeny bearing the ebony reporter transgene, EGFP expression was assessed within the first 6 hours following eclosion.

Immunohistochemistry

Pupal abdomens were dissected at ~85-90 hAPF to isolate the dorsal epidermis. Wild type species stocks for Drosophila melanogaster (14021-0231.04), Drosophila baimaii (14028-0481.01), and Drosophila willistoni (14030-0811.24) were obtained from the San Diego Drosophila Stock Center, and the Drosophila funebris stock was obtained from the lab of Sean B. Carroll. These species stocks were grown at 25 °C. Drosophila melanogaster parental stocks of genotypes UAS-abd-A RNAi (FlyBase ID #28739) and pnr-GAL4/TM3, Ser1 (Bloomington Drosophila Stock Center #3039) were crossed to obtain pupae of genotype UAS-abd-A RNAi/pnr-GAL4. These pupae were used to test for RNAi-mediated suppression of Abd-A along the dorsal-medial abdominal epidermis and for the effects of Abd-A suppression on Bab1 and Bab2 expression. Following dissection, samples were fixed for 35 minutes in PBST (phosphate buffered saline with 0.1% Triton X-100) supplemented with 4% paraformaldehyde. They were then blocked for 1 hour at room temperature in blocking buffer (PBST supplemented with 1% bovine serum albumin). Samples were then incubated overnight with guinea pig anti-Abd-A (Li-Kroeger et al., 2008) at a 1:500 dilution in PBST. Following four washes in PBST and a one-hour incubation in blocking buffer, specimens were incubated with goat anti-guinea pig Alexa Fluor 647 secondary antibody (Invitrogen) at a 1:500 dilution in PBST.

Immunohistochemistry for Bab1 and Bab2 was performed as described previously (Rogers et al., 2013). Samples were then incubated for 10 minutes in a 1:1 solution of Glycerol Mount (80% glycerol, 0.1M Tris pH 8.0) and PBST, and then transferred into Glycerol Mount. Samples were then mounted between a glass cover slip and slide for imaging.

Microscopy

Bright field images of the abdomen pigmentation phenotypes were taken using a zoom stereomicroscope (Olympus SZX-16) outfitted with a digital camera (Olympus DP72). EGFP reporter expression and immunohistochemical staining of Bab were recorded using a confocal microscope (Olympus FV1000) by previously described methods (Rogers and Williams, 2011). Confocal projection images were similarly processed using Adobe Photoshop for inclusion in figures. Figures include a representative image from replicate specimens (n≥3).

Results

An RNAi screen to identify fruit fly abdomen pigmentation network transcription factors

To identify transcription factors involved in the abdomen pigmentation network, we performed a genetic screen using the GAL4/UAS system to individually express RNAi hairpins that initiate the post-transcriptional silencing of target transcription factor genes (Figure 4.2). We obtained RNAi lines for 558 of the 749 genes in the D. melanogaster genome that are known or suspected to encode site-specific transcription factors (Adryan and Teichmann, 2006; Pfreundt et al., 2010), or 74.5% of the transcription factor genes. These RNAi transgenes were integrated into a single genomic landing site, which minimizes position effects on transgene performance, and their expression can be induced using the GAL4/UAS system (Brand and Perrimon, 1993; Markstein et al., 2008). We used a GAL4 gene insertion into the pannier (pnr) locus as a tissue-specific driver of RNAi expression (Heitzler et al., 1996). This chromosome drives GAL4 expression in dorsal-medial tissues, including the abdominal epidermis (Figure 4.2A; activation of a UAS-EGFP transgene in the pnr domain) and the thorax epidermis. We anticipated three general classes of phenotypic outcomes following induction of an RNAi hairpin in the pnr expression domain (Figure 4.2B). These were: Class I, where pigmentation is reduced; Class II, where pigmentation occurs ectopically; and Class III, where pigmentation appears wild type (Figure 4.2C).

Summarizing our results (Figure 4.2D), we found that RNAi targeting of ~2.0% (n=15) and ~1.7% (n=13) of the 749 predicted transcription factor genes resulted in reduced or ectopic tergite pigmentation, respectively. We also identified 15 transcription factor genes whose suppression had other noteworthy phenotypic effects on dorsal-medial structures, which we refer to as Class IV. A total of 524 RNAi transgenes had no observable effect in this assay (so-called Class III outcomes), indicating that their target genes likely do not function in the development of abdominal pigmentation. For the remaining 6 genes tested, the RNAi line caused lethality following induction. The details for this screen can be found in Table S4.1.
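The screen percentages can be checked with a few lines of arithmetic. Judging from the numbers, the reported ~2.0% and ~1.7% use the 749 predicted transcription factor genes as the denominator. The same counts would correspond to ~2.7% and ~2.3% of the 558 genes actually screened. The sketch makes that distinction explicit. The constant and function names are illustrative.

```python
from typing import Dict, Tuple

PREDICTED_TF_GENES = 749   # predicted D. melanogaster transcription factor genes
SCREENED_TF_GENES = 558    # genes targeted by primary RNAi lines
CLASS_COUNTS = {
    "Class I (reduced pigmentation)": 15,
    "Class II (ectopic pigmentation)": 13,
}


def percent(part: int, whole: int) -> float:
    return 100.0 * part / whole


def summarize(counts: Dict[str, int], screened: int,
              predicted: int) -> Dict[str, Tuple[float, float]]:
    """Return, per class, (% of predicted TF genes, % of screened TF genes)."""
    return {
        label: (round(percent(n, predicted), 1), round(percent(n, screened), 1))
        for label, n in counts.items()
    }
```

The Class I and Class II counts together give the 28 transcription factor genes implicated by the screen. Screening coverage, 558 of 749 genes, rounds to the 74.5% reported above.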

RNAi hairpins were designed to have few or no off-target effects and were previously shown to be efficacious in suppressing target gene function in vivo (Ni et al., 2008). However, for any phenotype observed in an RNAi screen, it is possible that off-target interactions are fully or partially responsible. We reasoned that a specific interaction would be supported if a similar pigmentation phenotype occurred with a second, independent RNAi line designed to target a distinct region of the same gene. Of the genes whose primary RNAi line resulted in a conspicuous pigmentation phenotype, we obtained secondary lines for 16 (Figure S4.2). For 11 of these secondary lines, the phenotypes were strikingly similar to those generated by the primary line (Figure S4.2: abd-A, Abd-B, da, dalao, dsx, exd, grh, hth, jing, Mi-2, and Su(var)2-10). Four resulted in a similar but more modest phenotype (Figure S4.2: CG10348, Mad, MBD-like, and osa). One secondary line, which targeted pdm3, produced a reciprocal effect when expressed in the dorsal-medial domain (Figure S4.2, compare O and O’). With this line, pigmentation was reduced in males, whereas the primary line caused a gain in female midline pigmentation.

To confirm the efficacy of RNAi knockdown, we monitored Abd-A protein expression in animals driving abd-A RNAi in the pannier domain. This resulted in reduced but detectable levels of protein in the dorsal-medial domain where the RNAi transgene was expressed (Figure 4.3B; compare the intensity of medial and lateral expression). Collectively, this RNAi screen implicated 28 transcription factors as part of the D. melanogaster abdomen pigmentation network. These included the genes abd-A, Abd-B, bab1, bab2, dsx, exd, hth, and jing, for which previous studies revealed a role in abdomen pigmentation. They also included many novel factors with little to no previously characterized role in the development of this trait.


Figure 4.3: Abd-A expression is suppressed by an inhibitory RNA transgene. (A and B) Immunohistochemical analysis of Abd-A expression in the D. melanogaster dorsal pupal abdominal epidermis. Specimens were prepared at 85 hours after puparium formation. Note the absence of expression in the A1 segment. Arrowheads indicate the reduced dorsal-medial expression of Abd-A in a specimen expressing an RNAi transgene that targets abd-A.

Transcription factor genes whose suppression results in reduced tergite pigmentation (Class I)

What we refer to here as a Class I phenotype was one anticipated outcome of RNAi suppression of a transcription factor involved in the abdomen pigmentation network. Reduced pigmentation compared to wild-type specimens would imply that the target locus normally functions to promote tergite pigmentation. As a positive control, we evaluated the Hox gene Abd-B, which regulates the expression of three pigmentation network genes: yellow, bab1, and bab2 (Jeong et al., 2006; Williams et al., 2008). RNAi suppression of Abd-B reduced pigmentation of the anterior male A5 tergite (Figure 4.4A). The otherwise wild-type pigmentation of the male and female A5 and A6 tergites suggests two things. First, RNAi targeting does not completely eliminate Abd-B expression, since this phenotype is more subtle than that reported for Abd-B null males (Kopp et al., 2000). Second, the anterior A5 segment is more sensitive to reduced Abd-B, which is consistent with the lower level of expression in this region compared to more posterior regions (Kopp and Duncan, 2002).

Figure 4.4: Transcription factor genes whose RNAi-mediated suppression results in reduced abdominal tergite pigmentation. (A-O) Pigmentation is reduced in progeny expressing an inhibitory RNA transgene that targets a specific transcription factor gene. Each specimen is of genotype pnr-GAL4/UAS-TF-RNAi. Arrowheads point to tergite regions where pigmentation is notably reduced.

Reduced pigmentation was observed following RNAi-mediated targeting of fourteen additional transcription factors (Figure 4.4). Phenotypes ranged from a subtle Abd-B-like reduction in the anterior male A5 tergite (Figure 4.4B; jing) and a loss of female A6 tergite pigmentation (Figure 4.4C; vvl) to very severe reductions in male tergite pigmentation (Figure 4.4D-G; abd-A, Gug, osa, and sbb, respectively). Though less conspicuous, female tergite pigmentation was also reduced for abd-A, Gug, osa, and sbb. Suppression of several transcription factor genes resulted in tergite pigmentation reductions that appeared female-limited. These included reductions on the A6 tergite for vvl, unpg, Sox102F, scrt, and Su(var)2-10 (Figure 4.4C and 4.4H-K). Pigmentation dots along the female tergite midline were reduced following the suppression of Mad (Figure 4.4L). For da, CG10348, and Hr4, suppression reduced tergite pigmentation irrespective of sex (Figure 4.4M-O). Collectively, these results reveal that the development of the various abdomen pigmentation pattern elements requires the activity of numerous transcription factor genes that promote melanic pigment formation. Some of these transcription factors act in only one sex, whereas others act in both.

Transcription factor genes whose suppression results in ectopic tergite pigmentation (Class II)

Ectopic tergite pigmentation, referred to here as a Class II phenotype, was a second anticipated outcome of RNAi knockdown of transcription factors that are part of the abdomen pigmentation network. This outcome would imply that the target locus normally functions to suppress tergite pigmentation. Such a role has already been characterized for bab1, bab2, dsx, exd, and hth (Baker and Ridge, 1980; Couderc et al., 2002; Culi et al., 2006; Kopp et al., 2000; Ryoo et al., 1999). We used these genes as controls to test the efficacy of this RNAi screen for identifying other Class II transcription factors. While targeting dsx resulted in ectopic pigmentation on the female A6 tergite (Figure 4.5C), to our surprise neither bab1 nor bab2 targeting resulted in ectopic pigmentation (Supplementary Table 4.1; these RNAi lines were from the TRiP collection), in spite of these genes’ established role in suppressing female tergite pigmentation. We obtained independently derived transgenic RNAi lines for bab1 and bab2 (Dietzl et al., 2007) and found that their dorsal-medial expression resulted in ectopic pigmentation on the female A6 tergite for bab1 and on the A6 and A5 tergites for bab2 (Figure 4.5A and 4.5B), closely matching the described null phenotypes of bab1 and bab2 (Couderc et al., 2002; Kopp et al., 2000). Suppression of exd and of hth each resulted in ectopic pigmentation on the anterior A4 and A3 tergites of males but not females (Figure 4.5J and 4.5K), though a mild increase in pigmentation was seen on the female A6 tergite.


Figure 4.5: Transcription factor genes whose RNAi-mediated suppression results in ectopic abdominal tergite pigmentation. (A-M) Pigmentation occurs ectopically in progeny expressing an inhibitory RNA transgene that targets a specific transcription factor gene. (A and B) The production and genomic placement of these inhibitory RNA transgenes differ from all others used in this study. (C-M) Specimens are of genotype pnr-GAL4/UAS-TF-RNAi. Arrowheads point to tergite regions with noteworthy ectopic pigmentation.

We found eight apparently novel transcription factor genes whose suppression resulted in ectopic abdomen pigmentation. These included crol, lmd, MBD-like, and pdm3, for which subtle expansions of dorsal-medial tergite pigmentation were observed (Figure 4.5D-G), and grh, whose suppression resulted in more diffuse pigmentation accompanied by tergite defects (Figure 4.5I). Suppression of Mi-2 resulted in ectopic pigmentation on the A4 tergite of males and the A5 and A6 tergites of females (Figure 4.5H), indicating that this transcription factor regulates pigmentation similarly in both sexes, although in different spatial domains. Lastly, targeting vfl and Eip74EF resulted in ectopic pigmentation limited to the female A6 tergite (Figure 4.5L and 4.5M). Collectively, these results reveal that the development of the various abdomen pigmentation pattern elements requires the activity of multiple transcription factor genes that act in a repressive manner. Among these, the regulatory activity of some is limited to one sex, whereas others act in both sexes.

Transcription factor genes whose suppression causes other conspicuous phenotypes (Class IV)

Besides reductions and gains in abdomen pigmentation, RNAi-mediated suppression of some transcription factor genes yielded other conspicuous phenotypes, which we categorized as Class IV. RNAi targeting of Ssrp, dalao, pnr, MBD-R2, pita, and tai each altered pigmentation along the tergite midline (Figure 4.6A-F). However, these outcomes were accompanied by split-tergite phenotypes, making it likely that the pigmentation defects were secondary to a failure in tergite development. Tergite microchaetae failed to develop following suppression of the genes tgo, tx, and CG9797 (Figure 4.6G-I). Following RNAi suppression of nej (Figure 4.6J), ectopic pigmentation occurred on the thoracic scutum and scutellum, and following suppression of lmd (Figure 4.6L), on the scutellum alone. Suppression of tap also resulted in a mild darkening of the scutum (Figure 4.6M). Suppression of mxc and svp led to abnormal development of macrochaetae on the scutellum (Figure 4.6K and 4.6O), while lmd and Su(var)2-10 suppression resulted in clefts forming along the dorsal-medial thorax (Figure 4.6L and 4.6N).

Interestingly, a previous RNAi screen for thorax phenotypes evaluated nej, mxc, lmd, tap, Su(var)2-10, and svp, but an abnormal phenotype was reported only for mxc and lmd (Mummery-Widmer et al., 2009). Thus, this screen reveals additional genes with a role in adult thorax development and illustrates the value of evaluating multiple RNAi lines for the same gene where possible.

Figure 4.6: Transcription factor genes whose RNAi-mediated suppression results in conspicuous mutant phenotypes. (A-F) Split tergite phenotypes with associated pigmentation defects occur in progeny expressing an inhibitory RNA transgene that targets a specific transcription factor gene. (G-I) Tergite bristles fail to grow following suppression of tgo, tx, and CG9797 expression. (J-O) Mutant notum phenotypes. (A-J) Specimens are of genotype pnr-GAL4/UAS-TF-IR. Arrowheads point to regions with noteworthy phenotypes. Dashed boxes enclose bristle phenotypes along the abdomen midline.

Genetic Interactions between transcription factors and pigmentation network CREs

With 28 pigmentation network transcription factor genes identified, the next step is to determine which target gene CREs the encoded proteins directly interact with. As a first step, we sought to determine whether some of these transcription factors genetically interact with CREs of the tan and ebony genes, which play opposing roles in pigmentation metabolism and have reciprocal expression patterns (Figure S4.1). We selected eight transcription factors for which RNAi targeting resulted in relatively dramatic alterations in pigmentation. In a wild-type genetic background, the ebony CRE region drives EGFP reporter gene expression in the epidermal cells of pharate adults that underlie tergite regions lacking black pigmentation. This pattern of EGFP expression includes the anterior regions of segments, except in the male A5 and A6 segments, where the regulatory region remains inactive (Figure 4.7A and 4.7A’). Reciprocally, a CRE for the tan gene drives reporter gene expression in the A5 and A6 segments of male pupae, where the overlying tergites will become completely black in adults (Figure 4.7B and 4.7B’). The activity of these CREs was evaluated for qualitative differences in genetic backgrounds (CRE-EGFP; pnr-GAL4/UAS-RNAi transgene) in which the expression of a Class I (Figure 4.7) or Class II (Figure 4.8) transcription factor was suppressed along the dorsal midline. We found that ebony CRE activity was upregulated in the male A5 and A6 segments in backgrounds with reduced expression of abd-A, sbb, Gug, osa, or CG10348 (Figure 4.7D, 4.7G, 4.7J, 4.7M, and 4.7P). CRE activity was also upregulated in the posterior regions of the female A6 segment (Figure 4.7D’, 4.7G’, 4.7J’, 4.7M’, and 4.7P’). These gains in ebony CRE activity correspond to the reductions in pigmentation observed in the abdomens following RNAi-mediated suppression (Figure 4.4). Next, we tested the effects of RNAi against the Class II genes hth, exd, and Mi-2 on ebony CRE activity. Consistent with the ectopic pigmentation of these Class II knockdowns, ebony CRE activity was reduced in the anterior regions of the male A3 and A4 segments (Figure 4.8D, 4.8G, and 4.8J) and of the female A5 and A6 segments (Figure 4.8D’, 4.8G’, and 4.8J’).

We next tested whether the same Class I and Class II transcription factors regulate the male-specific expression of tan. For the Class I genes abd-A, sbb, Gug, osa, and CG10348, tan CRE activity in the dorsal midline of the A5 and A6 segments was reduced following RNAi (Figure 4.7E, 4.7H, 4.7K, 4.7N, and 4.7Q). Conversely, for the Class II genes hth, exd, and Mi-2, ectopic expression occurred in the dorsal midline following RNAi (Figure 4.8E, 4.8H, and 4.8K). A more moderate level of ectopic expression was also observed in the A5 and A6 segments of females (Figure 4.8E’, 4.8H’, and 4.8K’).

Our results demonstrate that each of the pigmentation network transcription factor genes evaluated here acts upstream of both tan and ebony, with reciprocal effects on the two genes’ CREs. We next sought to determine whether these transcription factor genes act upstream of bab, whose expression and function are needed for this sexually dimorphic trait.
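The reciprocal readout described above can be stated as a simple rule: for each knockdown, the ebony and tan reporters should change in opposite directions, and the direction of the ebony change should follow the knockdown's class (ebony CRE activity marks regions that lack black pigment). The minimal Python sketch below encodes the qualitative observations from the text and flags any gene that would break this rule. The encoding (+1 for gained, -1 for reduced activity), the `OBSERVED` table, and the `violations` helper are hypothetical illustrations, not part of the study's analysis.

```python
# Qualitative reporter changes after dorsal-medial RNAi, transcribed from the text.
# Encoding (an assumption for illustration): +1 = gained/upregulated, -1 = reduced.
UP, DOWN = 1, -1

OBSERVED = {
    # gene: (class, ebony CRE change, tan CRE change)
    "abd-A": ("I", UP, DOWN),
    "sbb": ("I", UP, DOWN),
    "Gug": ("I", UP, DOWN),
    "osa": ("I", UP, DOWN),
    "CG10348": ("I", UP, DOWN),
    "hth": ("II", DOWN, UP),
    "exd": ("II", DOWN, UP),
    "Mi-2": ("II", DOWN, UP),
}

# Class I knockdowns lose black pigment, so the ebony CRE (active where black
# pigment is absent) is expected to gain activity; Class II is the reverse.
EXPECTED_EBONY = {"I": UP, "II": DOWN}


def violations(observed):
    """Return genes whose ebony/tan changes are not reciprocal or whose
    ebony change does not match the direction expected for their class."""
    bad = []
    for gene, (cls, ebony, tan) in observed.items():
        reciprocal = ebony == -tan
        if not (reciprocal and ebony == EXPECTED_EBONY[cls]):
            bad.append(gene)
    return bad


if __name__ == "__main__":
    print("genes breaking the reciprocal rule:", violations(OBSERVED))
```

Because these are qualitative calls from reporter images, the sketch only checks the direction of change; it says nothing about whether the regulation is direct.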

Figure 4.7: Genetic interaction between Class I transcription factors and pigmentation network cis-regulatory elements. CRE-driven expression patterns of a GFP reporter transgene in genetic backgrounds where the expression of Class I transcription factor genes was suppressed by dorsal-medial expression of an RNAi transgene. Expression in (A-R) male and (A’-R’) female specimens. The ebony X,Y,Z, tan t_MSE, and bab dimorphic element CREs were assessed in pharate adults, pupae at 95 hours after puparium formation, and pupae at 85 hours after puparium formation, respectively. Arrowheads indicate regions where transgene expression was reduced or gained.

Genetic Interactions between pigmentation network transcription factors and bab

The dimorphic element directs the expression of the pigmentation-suppressing Bab transcription factors in the female A5 and A6 segments. While this CRE is directly regulated by Abd-B and Dsx, we sought to determine whether any of the other Class I and Class II transcription factors influence its activity.

RNAi-mediated suppression of abd-A, sbb, Gug, osa, and CG10348 resulted in reduced pigmentation on the male A5 and A6 tergites. One possible explanation for these phenotypes is a secondary upregulation of Bab expression. However, we observed no noteworthy increase in dimorphic element regulatory activity in males with genetic backgrounds in which the expression of these Class I factors was suppressed (Figure 4.7F, 4.7I, 4.7L, 4.7O, and 4.7R). In females, reporter gene expression was reduced when abd-A, osa, and CG10348 were suppressed (Figure 4.7F’, 4.7O’, and 4.7R’). For the Class II transcription factor genes exd, hth, and Mi-2, the only observed alteration to dimorphic element regulatory activity was a reduction in females following the suppression of hth (Figure 4.8F’). The reduced activity in the medial region of the female A6 segment corresponds to the location where pigmentation develops in adult females following hth suppression (Figure 4.5K).

Though dimorphic element activity in males was not altered, it remained possible that Bab expression is upregulated in males through another mechanism, such as the activity of another CRE. We tested this possibility for abd-A by driving the expression of an inhibitory RNAi transgene in the dorsal-medial abdomen. Bab1 and Bab2 expression remained off in the male A5 and A6 segment epidermis where abd-A suppression occurred (Figure S4.3).

Collectively, these results demonstrate that abd-A regulates male pigmentation in a Bab-independent manner, and suggest that the same might be true for the other Class I and Class II genes evaluated here. Moreover, the results indicate that abd-A, osa, CG10348, and hth are positive regulators of the dimorphic element.

Figure 4.8: Genetic interaction between Class II transcription factors and pigmentation network cis-regulatory elements. CRE-driven expression patterns of a GFP reporter transgene in genetic backgrounds where the expression of Class II transcription factor genes was suppressed by dorsal-medial expression of an RNAi transgene. Expression in (A-L) male and (A’-L’) female specimens. The ebony X,Y,Z, tan t_MSE, and bab dimorphic element CREs were assessed in pharate adults, pupae at 95 hours after puparium formation, and pupae at 85 hours after puparium formation, respectively. Arrowheads indicate dorsal-medial regions where expression was reduced or gained.

Abdominal pigmentation diversified while Abd-A expression has remained conserved

Alterations to the expression patterns of regulatory genes have been linked to meso-evolutionary changes in fruit fly (Arnoult et al., 2013; Werner et al., 2010) and butterfly pigmentation patterns (Keys et al., 1999; Martin et al., 2012; Oliver et al., 2012; Reed et al., 2011). At the macroevolutionary scale of comparison, axial shifts in Abd-A and Ubx expression correlate with the diversification of appendage morphologies in crustaceans (Averof and Patel, 1997).

We suspected that shifts in Abd-A expression during late pupal development might underlie the differences in the number of pigmented male tergites among species closely related to D. melanogaster (e.g., D. baimaii), species lacking male-specific pigmentation (e.g., D. willistoni), and species in which male-specific pigmentation evolved convergently (e.g., D. funebris) (Figure 4.1A). Thus, we evaluated the expression of Abd-A at equivalent late pupal stages, when pigmentation patterns are specified. For each species, Abd-A expression was observed in segments A2-A7 (Figure 4.3A and 4.9A-C). This pattern of expression recapitulates that observed at an earlier stage of pupal development in D. melanogaster (Kopp and Duncan, 2002) and in distantly related insect species (Angelini et al., 2005; Peterson et al., 1999; Shippy et al., 1998). It remains to be determined whether a similar degree of expression conservation exists for the other newly identified pigmentation network transcription factors.


Figure 4.9: Abd-A expression is conserved between species with diverse abdominal pigmentation phenotypes. (A-C) Immunohistochemical analysis of Abd-A expression in the dorsal pupal abdominal epidermis. Specimens were prepared at a developmental stage equivalent to 85 hours after puparium formation for D. melanogaster. Note the absence of Abd-A expression in the A1 segment for all species and the conservation of expression in segments A2-A6.

Discussion

Here, we show that the network of genes regulating D. melanogaster tergite pigmentation is larger and more complex than previously appreciated.

This cohort includes 15 and 13 genes whose functions are required to promote and suppress pigmentation, respectively (Table 4.1 and Table 4.2). These include major regulators of body plan development (abd-A, Abd-B, exd, and hth), regulators of sexual dimorphism (bab1, bab2, and dsx), and several genes not previously implicated in pigmentation. hth and exd largely had effects limited to the more anterior tergites of males, providing evidence that the pigmentation network has connections that control the extent of male-specific pigmentation, complementing the female-specific circuit governed by the Bab transcription factors. Many of these transcription factor genes function as upstream regulators of two reciprocally expressed pigmentation genes, whereas a smaller subset functions upstream of the CRE controlling female-specific Bab expression.

Collectively, these results provide numerous opportunities to identify regulatory linkages between network transcription factors and target gene CREs, as well as candidate genes to explore for roles in phenotypic diversity.

Name (Synonym) | DNA-binding domain (InterPro #) | Gene ontology terms
abd-A | Homeodomain (IPR001356) | Sequence-specific DNA binding transcription factor activity
Abd-B | Homeodomain (IPR001356) | Sequence-specific DNA binding transcription factor activity
CG10348 | Zinc finger, C2H2 (IPR007087) | Nucleic acid binding
da | Myc-type, basic helix-loop-helix domain (IPR011598) | Sequence-specific DNA binding transcription factor activity
Gug (Atro) | SANT domain (IPR017884) | DNA binding/transcription corepressor activity
Hr4 (DHR4) | Zinc finger, nuclear hormone receptor-type (IPR001628) | Ligand-activated sequence-specific DNA binding RNA polymerase II transcription factor activity
jing | Zinc finger, C2H2 (IPR007087) | Sequence-specific DNA binding transcription factor activity/ESC/E(Z) complex
Mad | MAD homology, MH1 (IPR013019) | Transcription factor complex/transforming growth factor beta receptor signaling pathway
osa (eld) | ARID/BRIGHT DNA-binding domain (IPR001606) | DNA binding/brahma complex
sbb (mtv) | Zinc finger, C2H2 (IPR007087) | Sequence-specific DNA binding transcription factor activity/transcription corepressor activity
scrt | Zinc finger, C2H2 (IPR007087) | Sequence-specific DNA binding transcription factor activity
Sox102F | High mobility group box domain (IPR009071) | Sequence-specific DNA binding transcription factor activity
Su(var)2-10 (dPIAS) | Zinc finger, MIZ-type (IPR004181) | DNA binding/chromosome condensation
unpg (unp, upg) | Homeodomain (IPR001356) | Sequence-specific DNA binding transcription factor activity
vvl (zld) | Zinc finger, C2H2 (IPR007087) | Transcription regulatory region sequence-specific DNA binding

Table 4.1: Class I transcription factor DNA-binding domains and Gene Ontology terms. DNA-binding domains were obtained from the InterPro protein sequence analysis and classification resource. Gene Ontology terms were selected from FlyBase gene reports.

Name (Synonym) | DNA-binding domain (InterPro #) | Gene ontology terms
bab1 | DNA binding HTH domain, Psq-type (IPR007889) | Sequence-specific DNA binding transcription factor activity/AT DNA binding
bab2 | DNA binding HTH domain, Psq-type (IPR007889) | Sequence-specific DNA binding transcription factor activity/AT DNA binding
crol | Zinc finger, C2H2 (IPR007087) | Nucleic acid binding/regulation of chromatin silencing
dsx | DM DNA-binding (IPR001275) | Sequence-specific DNA binding transcription factor activity/sex differentiation
Eip74EF (E74) | Ets domain (IPR000418) / Winged helix-turn-helix DNA-binding domain (IPR011991) | Sequence-specific DNA binding transcription factor activity
exd (Dpbx) | Homeodomain (IPR001356) | Sequence-specific DNA binding transcription factor activity/transcription factor complex
grh (NTF-1, Elf-1) | CP2 transcription factor (IPR007604) | Sequence-specific DNA binding transcription factor activity
hth (dtl) | Homeobox KN domain (IPR008422) | Sequence-specific DNA binding transcription factor activity/transcription factor complex
lmd (minc, gfl) | Zinc finger, C2H2 (IPR007087) | Sequence-specific DNA binding transcription factor activity
MBD-like (dMbD2/3) | Methyl-CpG DNA binding (IPR001739) | Negative regulation of transcription, DNA-dependent/NuRD complex
Mi-2 (dMi-2) | Chromo domain/shadow (IPR000953) | Chromatin binding/chromatin assembly or disassembly/NuRD complex
pdm3 | POU domain (IPR013847) | Sequence-specific DNA binding transcription factor activity
vfl (zld) | Zinc finger, C2H2 (IPR007087) | Transcription regulatory region sequence-specific DNA binding

Table 4.2: Class II transcription factor DNA-binding domains and Gene Ontology terms. DNA-binding domains were obtained from the InterPro protein sequence analysis and classification resource. Gene Ontology terms were selected from FlyBase gene reports.

The regulatory complexity for the tergite pigmentation network

A common theme in trait development is the use of a large number of genes that are interconnected at the level of gene expression regulation (Davidson, 2006; Levine and Davidson, 2005). For example, the D. melanogaster segmentation, dorsal-ventral patterning, and mesoderm development regulatory networks each include from some 30 to over 50 genes (Bonn and Furlong, 2008). Most of these network genes encode transcription factors that interact with an even greater number of CREs. This complexity in transcription factor content and regulatory wiring seems logical, as embryonic development involves the coordination of cell proliferation, death, determination, and differentiation events. We were curious whether abdominal tergite pigmentation, a late-developing secondary sexual trait, is encoded by a network of comparable regulatory complexity.

abd-A, Abd-B, bab1, bab2, dsx, exd, hth, and jing function broadly in abdomen development, including tergite pigmentation (Culi et al., 2006; González-Crespo and Morata, 1995; Rauskolb et al., 1995; Ryoo et al., 1999; Sanchez-Herrero et al., 1985). These eight genes represent an important subset of the network transcription factor genes. In this study, a survey of ~75% of the transcription factor genes encoded in the genome revealed that this network likely includes at least 20 additional genes. Specifically, tergite pigmentation was lost when expression was reduced for 15 genes (Table 4.1), and reduced expression of 13 genes resulted in ectopic patterns of tergite pigmentation (Table 4.2). These 28 transcription factor genes collectively encode proteins with diverse DNA-binding domains. Many of these genes, such as abd-A, Gug, osa, sbb, and Mi-2, had phenotypes of equal or greater magnitude than those of the well-recognized network genes Abd-B, bab1, and bab2 (Figure 4.4 and Figure 4.5) (Couderc et al., 2002; Kopp et al., 2000).
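The arithmetic behind "at least 20 additional genes" can be made explicit by treating each class as a set of gene names. This is a minimal Python sketch, assuming the gene lists in Tables 4.1 and 4.2 are the complete Class I and Class II sets for this screen; the set names and the `summarize` helper are hypothetical illustrations, not part of the study's analysis.

```python
# Class assignments transcribed from Tables 4.1 and 4.2.
CLASS_I = {  # suppression reduces tergite pigmentation
    "abd-A", "Abd-B", "CG10348", "da", "Gug", "Hr4", "jing", "Mad",
    "osa", "sbb", "scrt", "Sox102F", "Su(var)2-10", "unpg", "vvl",
}
CLASS_II = {  # suppression causes ectopic tergite pigmentation
    "bab1", "bab2", "crol", "dsx", "Eip74EF", "exd", "grh", "hth",
    "lmd", "MBD-like", "Mi-2", "pdm3", "vfl",
}
# Genes with a previously described role in abdomen pigmentation.
PREVIOUSLY_KNOWN = {"abd-A", "Abd-B", "bab1", "bab2", "dsx", "exd", "hth", "jing"}


def summarize(class_i, class_ii, known):
    """Count network members; a gene cannot be both Class I and Class II."""
    overlap = class_i & class_ii
    if overlap:
        raise ValueError(f"genes assigned to both classes: {sorted(overlap)}")
    network = class_i | class_ii
    return {
        "class_I": len(class_i),
        "class_II": len(class_ii),
        "network": len(network),
        "previously_known": len(network & known),
        "newly_implicated": len(network - known),
    }


if __name__ == "__main__":
    for key, value in summarize(CLASS_I, CLASS_II, PREVIOUSLY_KNOWN).items():
        print(f"{key}: {value}")
```

With these sets, the summary gives 15 Class I genes, 13 Class II genes, 28 network genes in total, 8 previously characterized, and 20 newly implicated, matching the counts in the text. The overlap check encodes the assumption that no gene was assigned to both classes.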

The total number of tergite pigmentation transcription factor genes can be expected to grow beyond 28 for three reasons. First, ~25% of D. melanogaster transcription factor genes were not evaluated in this study due to the absence of readily available RNAi lines. Second, the absence of pigmentation phenotypes for bab1 and bab2 with the TRiP collection RNAi lines shows that false-negative outcomes likely occurred for some of the 524 genes for which pigmentation developed normally. Third, the result with Abd-A shows that RNAi may only partially reduce expression, and for many genes such a partial knockdown may be phenotypically silent. Irrespective of how many more genes are added, the current network hierarchy of 28 transcription factor genes suggests that the patterning and formation of tergite pigmentation relies on a complexity comparable to that of networks controlling earlier developmental events.

A role for chromatin remodeling in pigmentation development

In addition to spatial- and sex-specific transcription factor inputs, gene expression depends on the chromatin state at the promoters and CREs of network genes. Consistent with this, many of the transcription factor genes identified in this screen encode components of chromatin-modifying complexes or transcription factors known to interact with these complexes. Ssrp (Structure specific recognition protein) encodes a protein with an HMG box DNA-binding domain that functions in the FACT complex, which interacts with nucleosomes and can remove H2A-H2B dimers (Winkler and Luger, 2011). The FACT complex was shown to regulate the expression of Hox genes, including Abd-B (Shimojima et al., 2003). Strong genetic differentiation between tropical and temperate populations of D. melanogaster was found at Ssrp, suggesting that this locus might have been a target of environmental adaptation (Levine and Begun, 2008).

SWI/SNF complexes remodel nucleosomes, which can favor DNA binding by transcription factors. Two such complexes exist in Drosophila, BAP and PBAP, which have both shared (such as Brahma) and unique protein components (Mohrmann et al., 2004). We found that RNAi suppression of the osa and dalao genes resulted in tergite defects that included altered pigmentation. These genes encode the Osa and Dalao (BAP111) proteins, the former specific to the BAP complex and the latter present in both complexes. Though these two proteins bind DNA, binding was non-specific in vitro (Collins et al., 1999; Papoulas et al., 2001). In contrast, in vivo studies showed that osa (Brumby et al., 2002; Terriente-Félix and de Celis, 2009; Treisman et al., 1997; Vázquez et al., 1999), and more generally the SWI/SNF complex (Holstege et al., 1998), have specific regulatory targets. It remains uncertain to what extent factors like Osa and Dalao contribute to target gene discrimination, a question that might be addressed within the increasingly well-understood pigmentation network.

RNAi suppression of Gug resulted in a dramatic loss of tergite pigmentation. Gug has a SANT domain that resembles a DNA-binding domain, although direct DNA binding has not been formally demonstrated (Wang and Tsai, 2008). Gug is required for the repression of many developmental genes, acting through physical interactions with transcription factors that include Eve, Hkb, and Tll (Wang et al., 2006; Wehn and Campbell, 2006; Zhang et al., 2002); in these interactions, Gug recruits HDAC1 and HDAC2 through its ELM2 and SANT domains (Wang et al., 2006, 2008). Although a prominent role of Gug is to function as a transcriptional co-repressor, it was classified as a trithorax group gene because it is required for some Hox gene functions (Kankel et al., 2004).

The connections between chromatin remodeling complexes and tergite pigmentation also include the Drosophila nucleosome remodeling and deacetylase (dNuRD) complex. In our study, suppression of dMi-2 led to ectopic pigmentation on the male A4 tergite and the female A5 and A6 tergites. dMi-2 is the ATPase subunit of this complex, which regulates gene expression and includes both histone deacetylases and histone-binding proteins (Bouazoune and Brehm, 2006). In addition to its ATPase domain, the dMi-2 protein has several noteworthy motifs, including a putative DNA-binding domain (Kehle et al., 1998). However, the significance of this domain, and specifically whether it binds DNA in a sequence-specific manner, has not been shown. dMi-2 can repress gene expression through several mechanisms, including interactions with various transcription factor proteins (Kehle et al., 1998; Murawsky et al., 2001) and the disruption of higher-order chromatin structure by destabilizing interactions between Cohesin and chromosomes (Fasulo et al., 2012). dMi-2 is also part of an abundant binary complex (called dMec) with dMEP-1, and this complex contributes to the repression of proneural genes through a SUMOylation-mediated mechanism (Kunert et al., 2009). dMi-2 function is not exclusively repressive, as dMi-2 has also been found associated with transcriptionally active genes (Murawska et al., 2008, 2011).

Abdominal pigmentation: a composite of pattern elements and regulatory genes

The tergite pigmentation pattern consists not only of the dimorphic A5 and A6 tergites; rather, it is a composite of pattern elements that include posterior stripes, dorsal pigmentation spots, and the yellow coloration of the non-melanic tergite regions. Thus, the regulatory structure of this network must include transcription factors that pattern these sub-elements. Consistent with this expectation, posterior tergite pigmentation stripes were reduced by RNAi targeting sbb, Gug, and osa, and widened by RNAi targeting pdm3 and vfl. Several genes were implicated in the patterning of midline spots: Gug, Mad, and da were required for spot development, whereas crol, lmd, MBD-like, pdm3, and hth were needed to limit spot size and number. RNAi targeting of transcription factor genes including Ssrp, dalao, MBD-R2, pita, and pnr disrupted pigmentation along the dorsal midline; however, these effects appear to be a consequence of disrupted tergite formation. Lastly, da, CG10348, grh, and Hr4 were implicated in the overall coloration of tergites, as their suppression resulted in alterations along the entire dorsal-medial domain of RNAi transgene expression.

A network with male and female specific circuits

The patterns of expression (Figure 4.10) and molecular activities of Bab1, Bab2, and Abd-B appear sufficient to explain the male-specific pigmentation of the D. melanogaster A5 and A6 tergites. Abd-B pupal expression is limited to the posterior-most abdominal segments (Kopp and Duncan, 2002; Wang and Yoder, 2012), including A6 and A5, where it regulates the expression of the pigmentation gene yellow (Jeong et al., 2006). Pupal Bab1 and Bab2 expression is broader in females than in males (Salomone et al., in press), and these factors are essential to a female-specific circuit that suppresses tergite pigmentation (Kopp et al., 2000). Although abd-A, exd, hth, and jing are required for abdomen development, little progress has been made in recent years in understanding how these genes contribute to tergite pigmentation. We show that reduced expression of exd or hth resulted in ectopic pigmentation on the male A3 and A4 tergites. Although HTH, like Abd-A, is expressed broadly in the pupal abdomen (Figure 4.10), this ectopic pigmentation indicates that the anterior limit of tergite pigmentation in males is under the control of a male-specific circuit that prominently includes hth and exd.


Figure 4.10: Summary of expression patterns for key patterning transcription factors. Parasegments/segments are listed from the anterior PS5/T3 at the top to the posterior PS14/A9 at the bottom. High levels of expression are represented by black rectangles, and relatively lower levels by gray shading. Abd-B expression declines steadily from segment A7 through segment A5, and Bab expression in the male A2-A4 segments is reduced compared to the levels observed in females. Ubx, Abd-A, and Abd-B patterns were presented in Kopp and Duncan (2002); Bab1 and Bab2 patterns in Salomone et al. (in press); and the HTH pattern is unpublished data (Grover and Williams).

The loss of male tergite pigmentation following RNAi targeting of abd-A showed that abd-A is necessary for A5 and A6 pigmentation. However, Abd-A is also expressed in the A2-A4 segments (Figure 4.3), whose tergites lack pigmentation comparable to that of the male A5 and A6 tergites. Thus, Abd-A and Abd-B present a Hox code (Lewis, 1978) that overcomes the repressive effects of HTH and EXD through a molecular mechanism that remains to be elucidated. One possibility is that Abd-B represses EXD expression in the A5 and A6 segments. In the absence of EXD, the transcriptional regulatory function of Abd-A may switch from repressive to activating, or vice versa. A second possibility is that Abd-A and Abd-B bind to the same CRE(s) and regulate gene expression collaboratively. In this case, regulation would be opposite to that exerted by Abd-A, EXD, and HTH in the absence of Abd-B. Of the well-studied regulators of abdominal pigmentation, Abd-B has the most spatially limited pattern of expression (Figure 4.10). Thus, it might be expected that many of the pigmentation phenotypes reported in this study were caused by reducing the expression of genes that modulate the domain and/or levels of Abd-B expression. These possible mechanisms warrant future investigation.

Tracing the network structure through target gene CRE interactions

Within the D. melanogaster pigmentation network, few direct regulatory linkages are known. These include direct CRE interactions between Abd-B and the yellow body element CRE (Jeong et al., 2006), and between both Abd-B and Dsx and the dimorphic element CRE of the bab locus (Williams et al., 2008) (Figure 4.1B).

Repression of yellow (Jeong et al., 2006), and presumably of tan, is mediated by the Bab proteins, but whether this regulation is direct or indirect remains unknown. Moreover, no direct regulators of ebony are known.

To elucidate the relative network position and regulatory associations of 8 of these network transcription factor genes, we evaluated the effects of their reduced expression on the regulatory activity of the tan, ebony, and bab CREs (Figure 4.7 and Figure 4.8). For all 8 genes, the regulatory activities of the tan and ebony CREs were inversely altered, indicating that these transcription factors function as upstream regulators of these pigmentation genes. However, it remains unknown how these same factors direct the inverse patterns of tan and ebony expression. Thus, an important future direction is to identify the direct CRE targets of these transcription factors and to reveal how the inverse regulatory outcomes are encoded. For the bab CRE, only 4 of the 8 genes altered its regulatory activity in a manner supporting an upstream hierarchical role within this network. Interestingly, abd-A has regulatory connections that both promote and suppress pigmentation, through reciprocal activation and repression of tan and ebony and through the regulation of bab.

When hth was targeted by RNAi, dimorphic element activity was reduced, but this outcome was not observed for exd, suggesting that hth may function independently of exd to regulate bab. This exclusive use of hth differed from the regulation of tan and ebony, where hth and exd play similar repressing (tan) or activating (ebony) roles. Although reduced sbb and Mi-2 expression resulted in a loss and a gain of female A6 pigmentation, respectively, no corresponding alteration in the female-specific activity of the bab CRE was observed in either condition. These outcomes suggest that sbb and Mi-2 either act downstream of bab or function in an independent regulatory circuit.

Collectively, these complex regulatory outcomes underscore the need to map the actual direct binding events between these transcription factors and their network CRE targets.

Understanding morphological diversity through gene network evolution

A priori, several types of genetic changes could underlie pigmentation evolution. These include gene duplication and divergence events, the evolution of novel protein activities, and the evolution of novel gene expression patterns. Based on the frequency and pleiotropic effects of these types of mutations, it has been reasoned that changes in gene expression caused by CRE mutations will be the predominant driver of morphological evolution (Carroll, 2008; Stern, 2000), a prediction that has been well supported for fruit fly abdominal pigmentation traits (Jeong et al., 2006, 2008; Rebeiz et al., 2009a, 2009b; Rogers et al., 2013; Williams et al., 2008). For cases of morphological evolution, it remains unclear whether CRE modifications preferentially target certain nodes within a network and whether the nodes of change differ across taxonomic scales of comparison (Gompel and Prud’homme, 2009; Kopp, 2009; Martin and Orgogozo, 2013; Stern, 2011; Stern and Orgogozo, 2009).

It is possible that mutations occur primarily in the CREs of pigmentation genes (so-called cis-regulatory evolution), establishing or breaking connections with transcription factors that are expressed in the relevant cells and make up a conserved trans-regulatory landscape (Gompel et al., 2005). Alternatively, mutations might frequently occur in CREs regulating the expression of transcription factor genes, thereby altering the landscape of transcription factors. Alterations to the trans-regulatory landscape are predicted to generate many more downstream changes in expression, or ripple effects through the network.

A robust understanding of the D. melanogaster pigmentation network makes it more manageable to test such hypotheses about the nature of network evolution. For example, the number of pigmented tergites differs between D. melanogaster, D. prostipennis, and D. baimaii (Figure 4.1A). Hence, these species provide a model comparison to determine whether this morphological shift in pigmentation was due to changes in the binding site content of pigmentation gene CREs or to modifications in the expression patterns of certain network transcription factor genes. Among fruit fly species, the male-specific phenotype of D. funebris is thought to be convergent (Gompel and Carroll, 2003). Thus, comparing and contrasting this species’ network structure with that of D. melanogaster offers an opportunity to see the extent to which convergent networks are similarly wired.

Many other cases of fruit fly tergite pigmentation evolution are known, and many more are likely to be identified. These include differences between populations, closely related species, and distantly related species. Resolving the gene network bases for these differences will provide insights into whether certain nodes are recurrently targeted. The expanded knowledge of the D. melanogaster network presented here offers many candidate loci that might have contributed to the divergence of this ever-changing morphological characteristic.

Acknowledgments

The authors thank J. Salomone for help with Bab1 and Bab2 immunohistochemistry. S. Stringer was supported in part by the Honors Program at the University of Dayton and a Research Experiences for Undergraduates award from the National Science Foundation. W. Rogers and S. Grover were supported by fellowships from the University of Dayton Graduate School. J. Parks was supported by a Research Experiences for Teachers award from the National Science Foundation. T. Williams was supported by startup funding from the University of Dayton and the University of Dayton Research Institute, and by grants from the American Heart Association (11BGIA7280000) and the National Science Foundation (IOS-1146373). M. Rebeiz was supported by a grant from the National Science Foundation (IOS-1145947). We thank B. Gebelein for providing the Abd-A antibody, F. Laski for the Bab2 antibody, the Vienna Drosophila Stock Center for the bab1 and bab2 RNAi lines, the TRiP at Harvard Medical School (NIH/NIGMS R01-GM084947) for providing transgenic RNAi fly stocks used in this study, and the Bloomington Drosophila Stock Center for distributing these RNAi fly stocks.

Supplementary Information

Figure S4.1: Spatial and sex-specific patterns of pigmentation gene expression. (A) Wild-type pigmentation pattern for the tergites of the D. melanogaster abdomen. Schematics of the sex- and segment-specific expression patterns of (B) yellow, (C) tan, and (D) ebony. Regions in yellow, tan, and black correspond to the abdomen regions where these pigmentation genes are expressed during development.


Figure S4.2: Phenotypic comparisons of independent RNAi lines that target the same transcription factor genes. (A-P) Whole-mount abdomen images of a representative male and female for which the listed transcription factor gene’s expression was targeted by a primary-line RNAi transgene. (A’-P’) Whole-mount abdomen images of a representative male and female for which the listed transcription factor gene’s expression was targeted by a secondary-line RNAi transgene. The vector in which the hairpin RNA resides and the Bloomington Drosophila Stock Center identification number are listed in parentheses. Arrowheads point to regions where noteworthy phenotypic alterations were observed.


Figure S4.3: Bab expression remains off in males following RNAi suppression of Abd-A. (A-D) Immunohistochemical analysis of Bab1 expression in the dorsal abdominal epidermis. (E-H) Immunohistochemical analysis of Bab2 expression in the dorsal abdominal epidermis. Specimens of genotype pnr-GAL4 express the GAL4 transcription factor in the dorsal abdominal epidermis. Specimens of genotype pnr-GAL4/UAS-abd-A-RNAi express an inhibitory RNA, under the control of the GAL4 protein, that suppresses abd-A activity along the dorsal midline. All specimens were collected from D. melanogaster pupae at 85 hours after puparium formation. (B and D) Bab1 and (F and H) Bab2 expression was similar between females with normal and reduced abd-A activity. (A and C) Bab1 and (E and G) Bab2 expression was not detected in the A5 and A6 segments of males with normal or reduced abd-A activity.

CHAPTER V

CONCLUSIONS AND FUTURE DIRECTIONS

Most of the research presented in the previous chapters used quasi-in vivo methods in Drosophila (D.) melanogaster to understand how cis-regulatory elements (CREs), notably the dimorphic element, operate within a gene regulatory network and how they evolve. The work in Chapter 2 was published as a general protocol to quantify CRE activities using a reporter transgene that drives Green Fluorescent Protein production. This approach can be used to analyze the effects of CRE alleles and orthologous CREs in the same genetic background and at the same genomic integration site using the ϕC31 integrase system (Rogers and Williams, 2011). This method appears well suited to measuring the inherent activity of a CRE in isolation within a defined gene regulatory network.

In Chapter 3, the major question addressed was which genetic changes underlie the independent evolution of similar phenotypic outcomes. Previously, the dimorphic element, a CRE that regulates the female-specific expression of the bab gene, was found to have been modified to generate a novel sexually dimorphic pigmentation pattern (Williams et al., 2008). Changes to this CRE expanded its domain of sexually dimorphic activity into the A5 and A6 abdominal segments, allowing it to additionally regulate pigmentation. This expansion occurred through changes to the dimorphic element that largely reorganized existing binding sites for the transcription factors Abd-B and DSX (Williams et al., 2008). My research showed that the dimorphic element has been independently modified between populations of D. melanogaster and between species closely related to D. melanogaster. In all cases I studied, the Abd-B and DSX binding sites have been conserved. This recurrent involvement of the same genes (bab1 and bab2), the same gene component (the dimorphic element), and similar CRE modifications (conservation of the Abd-B and DSX sites) suggests that pleiotropy favors specific changes for this evolving trait (Rogers et al., 2013). Whether such similar evolutionary paths hold more broadly will require comparable investigations of other traits in which similar outcomes have evolved independently.

The dimorphic element functions within a pigmentation gene regulatory network. It is regulated by Abd-B and DSX, but binding sites for other, as yet unknown, factors have presumably been responsible for its recurrent evolution. In Chapter 4, I presented a study that sought to identify the other transcription factors regulating D. melanogaster pigmentation using a transcriptome-wide RNA interference screen, in which we systematically reduced the expression of genes in the abdominal midline. These experiments revealed that this network involves at least 28 transcription factor genes (Rogers et al., 2014). Some of these factors may interact with the dimorphic element and should be a focus of future research. Such experiments could link the population-specific mutations found in Chapter 3 to the gain or loss of previously unknown transcription factor binding sites in the dimorphic element.

These transgene studies of the dimorphic element revealed a major insight into an evolutionary bias within the female pigmentation patterning network (Rogers et al., 2013, 2014). Elsewhere, CREs from tan (Jeong et al., 2008), yellow (Jeong et al., 2006), and ebony (Rebeiz et al., 2009a) have been shown to affect abdominal pigmentation and its evolution. However, each of these studies isolated CREs from their endogenous gene context and paired them with the inert Green Fluorescent Protein gene, which neither makes pigment nor regulates gene expression. Thus, it remains unknown whether these CRE mutations are solely responsible for these evolved traits. Moreover, reporter transgene assays will miss any interactions that occur when CREs reside within their natural gene locus. In my last year as a Ph.D. student, I have been developing an endogenous approach to study CREs. The goal is to alter a CRE within its endogenous region of the genome (in vivo) using the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) system.

CRISPR CREam for fruit flies: customizing new genetic tools to study the in vivo function of Drosophila melanogaster cis-regulatory elements

The overarching goal for this new gene regulatory tool is to build a state-of-the-art genetic system that removes cis-regulatory elements from their endogenous gene location and replaces them with mutation-modified forms (so-called CRE variants). In addition, several gene loci have been found to have secondary CREs, so-called “shadow enhancers”, that can compensate for the function of the primary CRE (Frankel et al., 2010; Hong et al., 2008). Determining whether a shadow enhancer exists requires removing the primary CRE and determining whether gene expression remains unperturbed. Success here will allow scientists to investigate the in vivo function of a CRE and the developmental consequences of the introduced CRE variants.

Although CREs are of great importance, the available methods for studying their function cannot keep pace with their numbers. Two major goals for CRE research are to understand their in vivo functional roles and to understand how mutations alter these roles. Achieving these goals remains impeded by the absence of efficient genetic techniques to remove CREs and replace them with modified forms (Figure 5.1). The tool I propose to develop provides such an approach for the fruit fly species D. melanogaster and can be used to study some of the evolved CRE variants that my previous research identified in Chapter 3, but this tool may also be of value to the broader genetics community. Although a method to perform this experiment has existed for over a decade (Rong and Golic, 2001), its excessive time and reagent costs make it impractical to scale up to multiple CREs and CRE variants.

Figure 5.1: Modifying CRE sequences in their natural environment. (A) Transcription of a gene from its promoter (arrow) can occur in multiple settings due to regulation by separate CREs. (B) Exploring the in vivo functions of CREs requires techniques that can easily remove a CRE sequence. (C) Understanding how mutations affect the in vivo functions of a CRE allele (CRE 1a) or an orthologous CRE from another species (CRE 1o) requires a technique that can efficiently replace the endogenous CRE with an altered form. Each asterisk represents a mutation event.

This tool and genetic approach is named CRISPR CREam, and I have been developing it to remove the dimorphic element from D. melanogaster and replace it with several evolved forms. The approach will use the bacterial CRISPR/Cas9 system to delete the dimorphic element, followed by ϕC31 att site manipulation to replace the deleted CRE with variant CRE forms. At a minimum, this method would reveal the in vivo effects that a CRE’s absence has on development and the effects of mutations to the CRE that have accrued over evolutionary time. Beyond this, the approach is likely to be of broad utility to others wanting to test the effects of multiple mutations to a discrete genome region.

Engineering genomes with CRISPR

Invading viruses and plasmids can be neutralized in many bacteria and most archaea by the adaptive CRISPR immune system (Horvath and Barrangou, 2010). CRISPR loci contain direct repeat sequences separated by spacer sequences, and these are often located adjacent to CRISPR-associated (cas) genes that encode proteins with nuclease and helicase activities (Haft et al., 2005), which induce double-stranded breaks at specific target DNA sequences. A chimeric RNA (chiRNA) specifies the DNA target of the Cas9 nuclease through an RNA sequence complementary to that target. Scientists can engineer this guide sequence to direct the CRISPR/Cas9 system to DNA sequences of interest in organisms that do not naturally possess this immune system.

To date, the CRISPR/Cas9 system has been used in a frog (Guo et al., 2014), pig (Hai et al., 2014), monkey (Niu et al., 2014), plant (Jiang et al., 2013), zebrafish (Hruscha et al., 2013), axolotl (Flowers et al., 2014), human cells (Wang et al., 2014), and the fruit fly species D. melanogaster (Gratz et al., 2013a). In D. melanogaster, the Cas9 nuclease was directed by a chiRNA engineered to recognize a unique 20-base-pair sequence in the yellow gene that is adjacent to a three-nucleotide protospacer adjacent motif (PAM; sequence “NGG”, where N = A, T, C, or G). The Cas9 enzyme triggered a double-stranded break at this endogenous site, which was then repaired by the error-prone process of non-homologous end-joining. This produced novel yellow mutant alleles (Gratz et al., 2013c).
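To make the targeting rule concrete, the Python sketch below scans a DNA sequence on both strands for 20-nt protospacers immediately followed by an NGG PAM. It is an illustrative sketch under simplifying assumptions, not the guide-design procedure used by Gratz et al. or in this work: the function names are hypothetical, and the code does not check whether a site is unique in the genome or screen for off-target matches, both of which a real chiRNA design would need.

```python
import re

# Complement table for uppercase DNA; N (any base) maps to N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_cas9_sites(seq: str, protospacer_len: int = 20) -> list:
    """Hypothetical helper: list (strand, start, protospacer, pam) for every
    protospacer directly followed by an NGG PAM.

    'start' is the 0-based protospacer position on that strand's own 5'->3'
    sequence, so '-' sites are indexed on the reverse complement.
    The zero-width lookahead lets overlapping sites all be reported.
    """
    seq = seq.upper()
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites
```

For a toy input made of 20 A's followed by TGG, the function reports one plus-strand site at position 0 with PAM TGG. Searching both strands matters because a PAM on the opposite strand appears as CCN on the strand that is read.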

In the same publication, the yellow gene was deleted using two guide sequences that targeted double-stranded breaks to opposite ends of the gene, and the gene was replaced through homologous recombination with a novel donor template sequence. In the same fashion, our proposed CRISPR CREam method uses CRISPR/Cas9 to remove the dimorphic element.

Using two unique chiRNAs that each target the Cas9 enzyme to a selected protospacer sequence (Figure 5.2A), double-stranded breaks should be induced on the left and right sides of the dimorphic element near PAM sequences (Figure 5.2B). The broken locus will be repaired by homologous recombination with a novel donor vector whose left and right homology arms are identical to the bab locus sequences flanking the dimorphic element (Figure 5.2C). This will incorporate the mini-white gene flanked by inverted attP sites; when inserted into a genetic background mutant for the white gene, mini-white will restore the normal red eye color of fruit flies (Figure 5.2D).
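The layout of such a donor can be summarized with a toy Python sketch that assembles a donor sequence from a locus: a left homology arm taken from just upstream of the CRE, an attP site, a marker cassette, a second attP site in inverted (reverse-complement) orientation, and a right homology arm from just downstream of the CRE. The function name, arm length, and placeholder sequences are hypothetical; a real donor would use the actual bab flanking sequence, full-length attP sites, and a vector backbone, none of which are modeled here.

```python
# Complement table for DNA written in upper or lower case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_donor(locus: str, cre_start: int, cre_end: int,
                 attp: str, marker: str, arm_len: int) -> str:
    """Hypothetical helper: left arm + attP + marker + inverted attP + right arm.

    locus[cre_start:cre_end] (0-based, half-open) is the CRE to be deleted;
    each homology arm is the arm_len bases immediately outside it, so the
    CRE itself is absent from the donor.
    """
    if not 0 <= cre_start <= cre_end <= len(locus):
        raise ValueError("CRE coordinates fall outside the locus")
    if cre_start - arm_len < 0 or cre_end + arm_len > len(locus):
        raise ValueError("locus is too short for the requested homology arms")
    left_arm = locus[cre_start - arm_len:cre_start]
    right_arm = locus[cre_end:cre_end + arm_len]
    # The second attP is written as a reverse complement, i.e. in the
    # opposite (inverted) orientation relative to the first.
    return left_arm + attp + marker + reverse_complement(attp) + right_arm
```

Representing the inverted attP as a reverse complement is only a string-level stand-in for the opposite orientation described in the text; it is the same arrangement that later allows the attP-flanked cassette to be exchanged.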


Figure 5.2: Removing and replacing the dimorphic element. (A) The dimorphic element is situated within the bab locus. Target sequences with adjacent PAMs occur on the left and right sides of this CRE (PAM L and PAM R). (B) chiRNAs will direct the Cas9 enzyme to these target sequences, where double-strand breaks will be induced. (C) The bab locus will be repaired by homologous recombination with a donor vector in which sequences identical to the left and right flanks of the dimorphic element in the bab locus (homology arms) surround a cassette consisting of the mini-white gene flanked by inverted attP sites. (D) The end result will be the inclusion of the mini-white cassette in place of the dimorphic element.

Site-specific integration with ϕC31 integrase

The flanking attP site sequences from the phage ϕC31 lay the foundation for site-specific modification. An attP sequence can undergo intermolecular recombination with an attB site via the activity of the ϕC31 integrase enzyme (Thorpe and Smith, 1998). These sites and this enzyme have become useful in fruit fly genetics to repeatedly insert DNA sequences into a genomic position where an attP site resides (Bischof et al., 2007; Groth et al., 2004; Markstein et al., 2008). Moreover, DNA cassettes can be readily swapped by ϕC31-mediated recombination when one cassette is flanked by inverted attP sites and the other by inverted attB sites. This is referred to as recombination-mediated cassette exchange (RMCE), and it allows numerous DNA sequences to be individually introduced into the same genome location.

The mini-white gene inserted in place of the dimorphic element will then be replaced by variant forms of the dimorphic element. This will be achieved by ϕC31 integrase catalyzing recombination between inverted attB sites flanking a variant dimorphic element and the attP sites that flank the mini-white gene (Figure 5.3). The protein products of the bab1 and bab2 genes regulate a wide range of morphological traits (Couderc et al., 2002), including abdomen pigmentation, leg and genitalia development, ovary follicle formation, and abdominal bristle formation and number. My goal is to analyze these phenotypes in flies lacking the dimorphic element (Figure 5.2B) and in flies possessing variant dimorphic element alleles and orthologous dimorphic elements (Figure 5.3B).

Figure 5.3: CRE replacement by recombination-mediated cassette exchange. (A) ϕC31 integrase will catalyze crossover events between the inverted attP sites of the modified bab locus and the inverted attB sites of a cassette containing an evolutionarily modified “variant” dimorphic element. (B) The end result will be a bab locus in which the variant dimorphic element resides in the position of the original CRE.

Currently, the CRISPR CREam components (the chiRNA vectors and the donor vector) have been produced to replace the dimorphic element with the mini-white gene. Multiple attempts to introduce these vectors into D. melanogaster have so far been unsuccessful. Sequencing of the D. melanogaster injection strain revealed a SNP in one of the guide RNA target sequences. In addition, the mini-white gene failed to produce red eyes in a white mutant background when incorporated into the genome using the traditional ϕC31 integrase system. The problematic guide RNA target sequence in one of the chiRNA vectors has since been corrected, and the mini-white gene in the donor vector has been replaced by the dsRed gene, which we found capable of producing transgenic flies with red fluorescent eyes. These newly engineered chiRNA vectors and donor vector have been sent for injection into D. melanogaster embryos. The outcome is pending, but I expect that these adjustments will facilitate the removal and substitution of the dimorphic element and enable developmental comparisons.
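The SNP problem described above can be illustrated with a minimal Python sketch that compares a designed 20-nt protospacer against the corresponding 23-nt site (protospacer plus PAM) sequenced from the injection strain, reporting mismatched positions and whether an NGG PAM is still present. The function names and toy sequences are hypothetical, and treating any mismatch as disqualifying is a conservative assumption made for illustration, not a rule stated in this work.

```python
from typing import List


def mismatch_positions(designed: str, observed: str) -> List[int]:
    """Return 0-based positions at which two aligned, equal-length DNA strings differ."""
    if len(designed) != len(observed):
        raise ValueError("sequences must be aligned and of equal length")
    return [i for i, (a, b) in enumerate(zip(designed.upper(), observed.upper()))
            if a != b]


def check_target_site(designed_protospacer: str, strain_site: str) -> dict:
    """Hypothetical check of one guide target against the injection strain.

    strain_site is the strain's sequence at the target: the protospacer
    followed by the 3-nt PAM, read 5'->3' on the same strand.
    """
    n = len(designed_protospacer)
    if len(strain_site) != n + 3:
        raise ValueError("strain_site must be the protospacer plus a 3-nt PAM")
    mismatches = mismatch_positions(designed_protospacer, strain_site[:n])
    pam = strain_site[n:].upper()
    pam_intact = pam[1:] == "GG"  # NGG: any base followed by two Gs
    return {
        "mismatches": mismatches,
        "pam_intact": pam_intact,
        # Conservative assumption: require a perfect match and an intact PAM.
        "usable": not mismatches and pam_intact,
    }
```

Running such a check against strain sequence before building chiRNA vectors would flag a target like the one described above, where the strain carries a SNP that the reference sequence lacks.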

Completion of the CRISPR CREam protocol could have a large impact on the CRE research community, since anyone could directly test the developmental consequences of CRE mutations, provided that suitable PAM sites flank their CRE of interest. In 2009, my focus was simply on correlating changes in dimorphic element alleles with regulatory variation in pigmentation patterning. Since then, the accumulation of case studies (Martin and Orgogozo, 2013) and the building of tools for studying CRE evolution have raised the prospect of unlocking the “regulatory language” of the non-coding genome. This accomplishment would allow functionally relevant mutations to be predicted for disease, disease risk, and morphological evolution. Moreover, a better understanding of the regulatory language may make it possible to engineer custom CREs that turn “on” or “off” therapeutic genes, or genes for use in bioengineering applications.

BIBLIOGRAPHY

Abouheif, E. (2008). Parallelism as the pattern and process of mesoevolution. Evol. Dev. 10, 3–5.

Abramoff, M.D., and Magalhaes, P.J. (2004). Image Processing with ImageJ. Biophotonics Int. 11, 36–42.

Adams, M.D., Celniker, S.E., Holt, R.A., Evans, C.A., Gocayne, J.D., Amanatides, P.G., Scherer, S.E., Li, P.W., Hoskins, R.A., Galle, R.F., et al. (2000). The genome sequence of Drosophila melanogaster. Science 287, 2185–2195.

Adryan, B., and Teichmann, S.A. (2006). FlyTF: a systematic review of site-specific transcription factors in the fruit fly Drosophila melanogaster. Bioinformatics 22, 1532–1533.

Andolfatto, P. (2005). Adaptive evolution of non-coding DNA in Drosophila. Nature 437, 1149–1152.

Angelini, D.R., Liu, P.Z., Hughes, C.L., and Kaufman, T.C. (2005). Hox gene function and interaction in the milkweed bug Oncopeltus fasciatus (Hemiptera). Dev. Biol. 287, 440–455.

Arnone, M.I., and Davidson, E.H. (1997). The hardwiring of development: organization and function of genomic regulatory systems. Development 124, 1851–1864.

Arnoult, L., Su, K.F.Y., Manoel, D., Minervino, C., Magrina, J., Gompel, N., and Prud’homme, B. (2013). Emergence and diversification of fly pigmentation through evolution of a gene regulatory module. Science 339, 1423–1426.

Ashburner, M., Golic, K.G., and Hawley, R.S. (2011). Drosophila: A Laboratory Handbook, 2nd edition (Cold Spring Harbor Laboratory Press).

Averof, M., and Patel, N.H. (1997). Crustacean appendage evolution associated with changes in Hox gene expression. Nature 388, 682–686.

Baker, B.S., and Ridge, K.A. (1980). Sex and the single cell. I. On the action of major loci affecting sex determination in Drosophila melanogaster. Genetics 94, 383–423.

Barolo, S., Carver, L.A., and Posakony, J.W. (2000). GFP and beta-galactosidase transformation vectors for promoter/enhancer analysis in Drosophila. Biotechniques 29, 726, 728, 730, 732.

Bickel, R.D., Schackwitz, W.S., Pennacchio, L.A., Nuzhdin, S.V., and Kopp, A. (2009). Contrasting patterns of sequence evolution at the functionally redundant bric à brac paralogs in Drosophila melanogaster. J. Mol. Evol. 69, 194–202.

Bickel, R.D., Kopp, A., and Nuzhdin, S.V. (2011). Composite effects of polymorphisms near multiple regulatory elements create a major-effect QTL. PLoS Genet. 7, e1001275.

Bischof, J., Maeda, R.K., Hediger, M., Karch, F., and Basler, K. (2007). An optimized transgenesis system for Drosophila using germ-line-specific phiC31 integrases. Proc. Natl. Acad. Sci. U. S. A. 104, 3312–3317.

Bonn, S., and Furlong, E.E.M. (2008). cis-Regulatory networks during development: a view of Drosophila. Curr. Opin. Genet. Dev. 18, 513–520.

Bouazoune, K., and Brehm, A. (2006). ATP-dependent chromatin remodeling complexes in Drosophila. Chromosome Res. 14, 433–449.

Brand, A.H., and Perrimon, N. (1993). Targeted gene expression as a means of altering cell fates and generating dominant phenotypes. Development 118, 401–415.

Brumby, A.M., Zraly, C.B., Horsfield, J.A., Secombe, J., Saint, R., Dingwall, A.K., and Richardson, H. (2002). Drosophila cyclin E interacts with components of the Brahma complex. EMBO J. 21, 3377–3389.

Calleja, M., Herranz, H., Estella, C., Casal, J., Lawrence, P., Simpson, P., and Morata, G. (2000). Generation of medial and lateral dorsal body domains by the pannier gene of Drosophila. Development 127, 3971–3980.

Carroll, S.B. (2005). Evolution at two levels: on genes and form. PLoS Biol. 3, e245.

Carroll, S.B. (2008). Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell 134, 25–36.

Carroll, S.B., Grenier, J.K., and Weatherbee, S.D. (2004). From DNA to Diversity: Molecular Genetics and the Evolution of Animal Design (Wiley).

Chan, Y.F., Marks, M.E., Jones, F.C., Villarreal, G., Shapiro, M.D., Brady, S.D., Southwick, A.M., Absher, D.M., Grimwood, J., Schmutz, J., et al. (2010). Adaptive evolution of pelvic reduction in sticklebacks by recurrent deletion of a Pitx1 enhancer. Science 327, 302–305.

Clark, A.G., Eisen, M.B., Smith, D.R., Bergman, C.M., Oliver, B., Markow, T.A., Kaufman, T.C., Kellis, M., Gelbart, W., Iyer, V.N., et al. (2007). Evolution of genes and genomes on the Drosophila phylogeny. Nature 450, 203–218.

Collins, R.T., Furukawa, T., Tanese, N., and Treisman, J.E. (1999). Osa associates with the Brahma chromatin remodeling complex and promotes the activation of some target genes. EMBO J. 18, 7029–7040.

Conway Morris, S. (2005). Life’s Solution: Inevitable Humans in a Lonely Universe (Cambridge).

Cooper, T.F., Ostrowski, E.A., and Travisano, M. (2007). A negative relationship between mutation pleiotropy and fitness effect in yeast. Evolution 61, 1495–1499.

Couderc, J.-L., Godt, D., Zollman, S., Chen, J., Li, M., Tiong, S., Cramton, S.E., Sahut-Barnola, I., and Laski, F.A. (2002). The bric à brac locus consists of two paralogous genes encoding BTB/POZ domain proteins and acts as a homeotic and morphogenetic regulator of imaginal development in Drosophila. Development 129, 2419–2433.

Cretekos, C.J., Wang, Y., Green, E.D., Martin, J.F., Rasweiler, J.J., and Behringer, R.R. (2008). Regulatory divergence modifies limb length between mammals. Genes Dev. 22, 141–151.

Crocker, J., Tamori, Y., and Erives, A. (2008). Evolution acts on enhancer organization to fine-tune gradient threshold readouts. PLoS Biol. 6, e263.

Culi, J., Aroca, P., Modolell, J., and Mann, R.S. (2006). jing is required for wing development and to establish the proximo-distal axis of the leg in Drosophila melanogaster. Genetics 173, 255–266.

Davidson, E.H. (2006). The Regulatory Genome: Gene Regulatory Networks In Development And Evolution (Burlington, MA: Elsevier Inc.).

Dawkins, R. (2004). The Ancestor’s Tale (Boston: Houghton Mifflin).

Dietzl, G., Chen, D., Schnorrer, F., Su, K.-C., Barinova, Y., Fellner, M., Gasser, B., Kinsey, K., Oppel, S., Scheiblauer, S., et al. (2007). A genome-wide transgenic RNAi library for conditional gene inactivation in Drosophila. Nature 448, 151–156.

Emera, D., and Wagner, G.P. (2012). Transformation of a transposon into a derived prolactin promoter with function during human pregnancy. Proc. Natl. Acad. Sci. U. S. A. 1–6.

Erdman, S.E., Chen, H.J., and Burtis, K.C. (1996). Functional and genetic characterization of the oligomerization and DNA binding properties of the Drosophila doublesex proteins. Genetics 144, 1639–1652.

Fasulo, B., Deuring, R., Murawska, M., Gause, M., Dorighi, K.M., Schaaf, C.A., Dorsett, D., Brehm, A., and Tamkun, J.W. (2012). The Drosophila MI-2 chromatin-remodeling factor regulates higher-order chromatin structure and cohesin dynamics in vivo. PLoS Genet. 8, e1002878.

Ferretti, E., Li, B., Zewdu, R., Wells, V., Hebert, J.M., Karner, C., Anderson, M.J., Williams, T., Dixon, J., Dixon, M.J., et al. (2011). A conserved Pbx-Wnt-p63-Irf6 regulatory module controls face morphogenesis by promoting epithelial apoptosis. Dev. Cell 21, 627–641.

Fire, A., Xu, S., Montgomery, M.K., Kostas, S.A., Driver, S.E., and Mello, C.C. (1998). Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391, 806–811.

Fish, M.P., Groth, A.C., Calos, M.P., and Nusse, R. (2007). Creating transgenic Drosophila by microinjecting the site-specific phiC31 integrase mRNA and a transgene-containing donor plasmid. Nat. Protoc. 2, 2325–2331.

Flowers, G.P., Timberlake, A.T., McLean, K.C., Monaghan, J.R., and Crews, C.M. (2014). Highly efficient targeted mutagenesis in axolotl using Cas9 RNA-guided nuclease. Development 141, 2165–2171.

Frankel, N., Davis, G.K., Vargas, D., Wang, S., Payre, F., and Stern, D.L. (2010). Phenotypic robustness conferred by apparently redundant transcriptional enhancers. Nature 466, 490–493.

Frankel, N., Erezyilmaz, D.F., McGregor, A.P., Wang, S., Payre, F., and Stern, D.L. (2011). Morphological evolution caused by many subtle-effect substitutions in regulatory DNA. Nature 474, 598–603.

Frankel, N., Wang, S., and Stern, D.L. (2012). Conserved regulatory architecture underlies parallel genetic changes and convergent phenotypic evolution. Proc. Natl. Acad. Sci. 109, 1–5.

Ghavi-Helm, Y., Klein, F.A., Pakozdi, T., Ciglar, L., Noordermeer, D., Huber, W., and Furlong, E.E.M. (2014). Enhancer loops appear stable during development and are associated with paused polymerase. Nature 512, 98–100.

Giacomoni, D., and Spiegelman, S. (1962). Origin and biologic individuality of the genetic dictionary. Science 138, 1328–1331.

Gibert, J.-M., Marcellini, S., David, J.R., Schlötterer, C., and Simpson, P. (2005). A major bristle QTL from a selected population of Drosophila uncovers the zinc-finger transcription factor poils-au-dos, a repressor of achaete-scute. Dev. Biol. 288, 194–205.

Godt, D., Couderc, J.L., Cramton, S.E., and Laski, F.A. (1993). Pattern formation in the limbs of Drosophila: bric à brac is expressed in both a gradient and a wave-like pattern and is required for specification and proper segmentation of the tarsus. Development 119, 799–812.

Gompel, N., and Carroll, S.B. (2003). Genetic mechanisms and constraints governing the evolution of correlated traits in drosophilid flies. Nature 424, 931–935.

Gompel, N., and Prud’homme, B. (2009). The causes of repeated genetic evolution. Dev. Biol. 332, 36–47.

Gompel, N., Prud’homme, B., Wittkopp, P.J., Kassner, V.A., and Carroll, S.B. (2005). Chance caught on the wing: cis-regulatory evolution and the origin of pigment patterns in Drosophila. Nature 433, 481–487.

González-Crespo, S., and Morata, G. (1995). Control of Drosophila adult pattern by extradenticle. Development 121, 2117–2125.

Gratz, S.J., Wildonger, J., Harrison, M.M., and O’Connor-Giles, K.M. (2013a). CRISPR/Cas9-mediated genome engineering and the promise of designer flies on demand. Fly (Austin). 7, 249–255.

Gratz, S.J., Wildonger, J., Harrison, M.M., and O’Connor-Giles, K.M. (2013b). CRISPR/Cas9-mediated genome engineering and the promise of designer flies on demand. Fly (Austin). 7, 249–255.

Gratz, S.J., Cummings, A.M., Nguyen, J.N., Hamm, D.C., Donohue, L.K., Harrison, M.M., Wildonger, J., and O’Connor-Giles, K.M. (2013c). Genome engineering of Drosophila with the CRISPR RNA-guided Cas9 nuclease. Genetics 194, 1029–1035.

Groth, A.C., Fish, M., Nusse, R., and Calos, M.P. (2004). Construction of Transgenic Drosophila by Using the Site-Specific Integrase from Phage phiC31. Genetics 166, 1775–1782.

Guo, X., Zhang, T., Hu, Z., Zhang, Y., Shi, Z., Wang, Q., Cui, Y., Wang, F., Zhao, H., and Chen, Y. (2014). Efficient RNA/Cas9-mediated genome editing in Xenopus tropicalis. Development 141, 707–714.

Haft, D.H., Selengut, J., Mongodin, E.F., and Nelson, K.E. (2005). A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes. PLoS Comput. Biol. 1, e60.

Hai, T., Teng, F., Guo, R., Li, W., and Zhou, Q. (2014). One-step generation of knockout pigs by zygote injection of CRISPR/Cas system. Cell Res. 24, 372–375.

Haley, B., Hendrix, D., Trang, V., and Levine, M. (2008). A simplified miRNA-based gene silencing method for Drosophila melanogaster. Dev. Biol. 321, 482–490.

Haley, B., Foys, B., and Levine, M. (2010). Vectors and parameters that enhance the efficacy of RNAi-mediated gene disruption in transgenic Drosophila. Proc. Natl. Acad. Sci. U. S. A. 107, 11435–11440.

Harms, M.J., and Thornton, J.W. (2010). Analyzing protein structure and function using ancestral gene reconstruction. Curr. Opin. Struct. Biol. 20, 360–366.

Heitzler, P., Haenlin, M., Ramain, P., Calleja, M., and Simpson, P. (1996). A Genetic Analysis of pannier, a Gene Necessary for Viability of Dorsal Tissues and Bristle Positioning in Drosophila. Genetics 143, 1271–1286.

Heussler, H.S., and Suri, M. (2003). Sonic hedgehog. Mol. Pathol. 56, 129–131.

Hinman, V.F., Nguyen, A., and Davidson, E.H. (2007). Caught in the evolutionary act: precise cis-regulatory basis of difference in the organization of gene networks of sea stars and sea urchins. Dev. Biol. 312, 584–595.

Holstege, F.C., Jennings, E.G., Wyrick, J.J., Lee, T.I., Hengartner, C.J., Green, M.R., Golub, T.R., Lander, E.S., and Young, R.A. (1998). Dissecting the regulatory circuitry of a eukaryotic genome. Cell 95, 717–728.

Hong, J.-W., Hendrix, D.A., and Levine, M.S. (2008). Shadow enhancers as a source of evolutionary novelty. Science 321, 1314.

Horvath, P., and Barrangou, R. (2010). CRISPR/Cas, the immune system of bacteria and archaea. Science 327, 167–170.

Hovemann, B.T., Ryseck, R.P., Walldorf, U., Störtkuhl, K.F., Dietzel, I.D., and Dessen, E. (1998). The Drosophila ebony gene is closely related to microbial peptide synthetases and shows specific cuticle and nervous system expression. Gene 221, 1–9.

Hruscha, A., Krawitz, P., Rechenberg, A., Heinrich, V., Hecht, J., Haass, C., and Schmid, B. (2013). Efficient CRISPR/Cas9 genome editing with low off-target effects in zebrafish. Development 140, 4982–4987.

Hunter, S., Jones, P., Mitchell, A., Apweiler, R., Attwood, T.K., Bateman, A., Bernard, T., Binns, D., Bork, P., Burge, S., et al. (2012). InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res. 40, D306–D312.

Imai, K.S., Stolfi, A., Levine, M., and Satou, Y. (2009). Gene regulatory networks underlying the compartmentalization of the Ciona central nervous system. Development 136, 285–293.

Innis, J.W., Goodman, F.R., Bacchelli, C., Williams, T.M., Mortlock, D.P., Sateesh, P., Scambler, P.J., McKinnon, W., and Guttmacher, A.E. (2002). A HOXA13 allele with a missense mutation in the homeobox and a dinucleotide deletion in the promoter underlies Guttmacher syndrome. Hum. Mutat. 19, 573–574.

Ishino, Y., Shinagawa, H., Makino, K., Amemura, M., and Nakata, A. (1987). Nucleotide sequence of the iap gene, responsible for alkaline phosphatase isozyme conversion in Escherichia coli, and identification of the gene product. J. Bacteriol. 169, 5429–5433.

Jacob, F. (1977). Evolution and tinkering. Science 196, 1161–1166.

Jansen, R., Embden, J.D.A. van, Gaastra, W., and Schouls, L.M. (2002). Identification of genes that are associated with DNA repeats in prokaryotes. Mol. Microbiol. 43, 1565–1575.

Jeong, S., Rokas, A., and Carroll, S.B. (2006). Regulation of body pigmentation by the Abdominal-B Hox protein and its gain and loss in Drosophila evolution. Cell 125, 1387–1399.

Jeong, S., Rebeiz, M., Andolfatto, P., Werner, T., True, J., and Carroll, S.B. (2008). The evolution of gene regulation underlies a morphological difference between two Drosophila sister species. Cell 132, 783–793.

Jiang, W., Zhou, H., Bi, H., Fromm, M., Yang, B., and Weeks, D.P. (2013). Demonstration of CRISPR/Cas9/sgRNA-mediated targeted gene modification in Arabidopsis, tobacco, sorghum and rice. Nucleic Acids Res. 41, e188.

Jones, F.C., Grabherr, M.G., Chan, Y.F., Russell, P., Mauceli, E., Johnson, J., Swofford, R., Pirun, M., Zody, M.C., White, S., et al. (2012). The genomic basis of adaptive evolution in threespine sticklebacks. Nature 484, 55–61.

Kankel, M.W., Duncan, D.M., and Duncan, I. (2004). A screen for genes that interact with the Drosophila pair-rule segmentation gene fushi tarazu. Genetics 168, 161–180.

Kehle, J., Beuchle, D., Treuheit, S., Christen, B., Kennison, J.A., Bienz, M., and Muller, J. (1998). dMi-2, a Hunchback-Interacting Protein That Functions in Polycomb Repression. Science 282, 1897–1900.

Keys, D.N., Lewis, D.L., Selegue, J.E., Pearson, B.J., Goodrich, L.V., Johnson, R.L., Gates, J., Scott, M.P., and Carroll, S.B. (1999). Recruitment of a hedgehog Regulatory Circuit in Butterfly Eyespot Evolution. Science 283, 532–534.

King, M., and Wilson, A.C. (1975). Evolution at Two Levels in Humans and Chimpanzees. Science 188, 107–116.

Kopp, A. (2009). Metamodels and phylogenetic replication: a systematic approach to the evolution of developmental pathways. Evolution 63, 2771–2789.

Kopp, A., and Duncan, I. (2002). Anteroposterior patterning in adult abdominal segments of Drosophila. Dev. Biol. 242, 15–30.

Kopp, A., Duncan, I., Godt, D., and Carroll, S.B. (2000). Genetic control and evolution of sexually dimorphic characters in Drosophila. Nature 408, 553–559.

Kopp, A., Graze, R.M., Xu, S., Carroll, S.B., and Nuzhdin, S.V. (2003). Quantitative Trait Loci Responsible for Variation in Sexually Dimorphic Traits in Drosophila melanogaster. Genetics 163, 771–787.

Kulzer, J.R., Stitzel, M.L., Morken, M.A., Huyghe, J.R., Fuchsberger, C., Kuusisto, J., Laakso, M., Boehnke, M., Collins, F.S., and Mohlke, K.L. (2014). A Common Functional Regulatory Variant at a Type 2 Diabetes Locus Upregulates ARAP1 Expression in the Pancreatic Beta Cell. Am. J. Hum. Genet. 94, 186–197.

Kunert, N., Wagner, E., Murawska, M., Klinker, H., Kremmer, E., and Brehm, A. (2009). dMec: a novel Mi-2 chromatin remodelling complex involved in transcriptional repression. EMBO J. 28, 533–544.

Kvon, E.Z., Kazmar, T., Stampfel, G., Yáñez-Cuna, J.O., Pagani, M., Schernhuber, K., Dickson, B.J., and Stark, A. (2014). Genome-scale functional characterization of Drosophila developmental enhancers in vivo. Nature 512, 91–95.

Lettice, L.A. (2003). A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet. 12, 1725–1735.

Lettice, L.A., Williamson, I., Wiltshire, J.H., Peluso, S., Devenney, P.S., Hill, A.E., Essafi, A., Hagman, J., Mort, R., Grimes, G., et al. (2012). Opposing Functions of the ETS Factor Family Define Shh Spatial Expression in Limb Buds and Underlie Polydactyly. Dev. Cell 22, 459–467.

Levine, M., and Davidson, E.H. (2005). Gene regulatory networks for development. Proc. Natl. Acad. Sci. U. S. A. 102, 4936–4942.

Levine, M.T., and Begun, D.J. (2008). Evidence of spatially varying selection acting on four chromatin-remodeling loci in Drosophila melanogaster. Genetics 179, 475–485.

Lewis, E.B. (1978). A gene complex controlling segmentation in Drosophila. Nature 276, 565–570.

Li-Kroeger, D., Witt, L., Grimes, H.L., Cook, T.A., and Gebelein, B. (2008). Hox and senseless antagonism functions as a molecular switch to regulate EGF secretion in the Drosophila PNS. Dev. Cell 15, 298–308.

Ludwig, M.Z., Patel, N.H., and Kreitman, M. (1998). Functional analysis of eve stripe 2 enhancer evolution in Drosophila: rules governing conservation and change. Development 125, 949–958.

Ludwig, M.Z., Bergman, C., Patel, N.H., and Kreitman, M. (2000). Evidence for stabilizing selection in a eukaryotic enhancer element. Nature 403, 564–567.

Ludwig, M.Z., Palsson, A., Alekseeva, E., Bergman, C.M., Nathan, J., and Kreitman, M. (2005). Functional evolution of a cis-regulatory module. PLoS Biol. 3, e93.

Marcellini, S., and Simpson, P. (2006). Two or four bristles: functional evolution of an enhancer of scute in Drosophilidae. PLoS Biol. 4, e386.

Markstein, M., Pitsouli, C., Villalta, C., Celniker, S.E., and Perrimon, N. (2008). Exploiting position effects and the gypsy retrovirus insulator to engineer precisely expressed transgenes. Nat. Genet. 40, 476–483.

Martin, A., and Orgogozo, V. (2013). The Loci of repeated evolution: a catalog of genetic hotspots of phenotypic variation. Evolution 67, 1235–1250.

Martin, A., Papa, R., Nadeau, N.J., Hill, R.I., Counterman, B.A., Halder, G., Jiggins, C.D., Kronforst, M.R., Long, A.D., McMillan, W.O., et al. (2012). Diversification of complex butterfly wing patterns by repeated regulatory evolution of a Wnt ligand. Proc. Natl. Acad. Sci. U. S. A. 109, 12632–12637.

Marygold, S.J., Leyland, P.C., Seal, R.L., Goodman, J.L., Thurmond, J., Strelets, V.B., and Wilson, R.J. (2013). FlyBase: improvements to the bibliography. Nucleic Acids Res. 41, D751–D757.

Matthaei, J.H., Jones, O.W., Martin, R.G., and Nirenberg, M.W. (1962). Characteristics and composition of RNA coding units. Proc. Natl. Acad. Sci. U. S. A. 48, 666–677.

McGregor, A.P., Orgogozo, V., Delon, I., Zanet, J., Srinivasan, D.G., Payre, F., and Stern, D.L. (2007). Morphological evolution through multiple cis-regulatory mutations at a single gene. Nature 448, 587–590.

Mohrmann, L., Langenberg, K., Krijgsveld, J., Kal, A.J., Heck, A.J.R., and Verrijzer, C.P. (2004). Differential Targeting of Two Distinct SWI/SNF-Related Drosophila Chromatin-Remodeling Complexes. Mol. Cell. Biol. 24, 3077–3088.

Mojica, F.J.M., Diez-Villasenor, C., Soria, E., and Juez, G. (2000). Biological significance of a family of regularly spaced repeats in the genomes of Archaea, Bacteria and mitochondria. Mol. Microbiol. 36, 244–246.

Morgan, T.H. (1910). Sex limited inheritance in Drosophila. Science 32, 120–122.

Mummery-Widmer, J.L., Yamazaki, M., Stoeger, T., Novatchkova, M., Bhalerao, S., Chen, D., Dietzl, G., Dickson, B.J., and Knoblich, J.A. (2009). Genome-wide analysis of Notch signalling in Drosophila by transgenic RNAi. Nature 458, 987–992.

Mundy, N.I. (2005). A window on the genetics of evolution: MC1R and plumage colouration in birds. Proc. Biol. Sci. 272, 1633–1640.

Murawska, M., Kunert, N., van Vugt, J., Längst, G., Kremmer, E., Logie, C., and Brehm, A. (2008). dCHD3, a novel ATP-dependent chromatin remodeler associated with sites of active transcription. Mol. Cell. Biol. 28, 2745–2757.

Murawska, M., Hassler, M., Renkawitz-Pohl, R., Ladurner, A., and Brehm, A. (2011). Stress-induced PARP activation mediates recruitment of Drosophila Mi-2 to promote heat shock gene expression. PLoS Genet. 7, e1002206.

Murawsky, C.M., Brehm, A., Badenhorst, P., Lowe, N., Becker, P.B., and Travers, A.A. (2001). Tramtrack69 interacts with the dMi-2 subunit of the Drosophila NuRD chromatin remodelling complex. EMBO Rep. 2, 1089–1094.

Musunuru, K., Strong, A., Frank-Kamenetsky, M., Lee, N.E., Ahfeldt, T., Sachs, K.V., Li, X., Li, H., Kuperwasser, N., Ruda, V.M., et al. (2010). From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature 466, 714–719.

Nachman, M.W., Hoekstra, H.E., and D’Agostino, S.L. (2003). The genetic basis of adaptive melanism in pocket mice. Proc. Natl. Acad. Sci. U. S. A. 100, 5268–5273.

Ni, J., Markstein, M., Binari, R., Pfeiffer, B., Liu, L., Villalta, C., Booker, M., Perkins, L., and Perrimon, N. (2008). Vector and parameters for targeted transgenic RNA interference in Drosophila melanogaster. Nat. Methods 5, 49–51.

Ni, J.-Q., Liu, L.-P., Binari, R., Hardy, R., Shim, H.-S., Cavallaro, A., Booker, M., Pfeiffer, B.D., Markstein, M., Wang, H., et al. (2009). A Drosophila resource of transgenic RNAi lines for neurogenetics. Genetics 182, 1089–1100.

Ni, J.-Q., Zhou, R., Czech, B., Liu, L.-P., Holderbaum, L., Yang-Zhou, D., Shim, H.-S., Tao, R., Handler, D., Karpowicz, P., et al. (2011). A genome-scale shRNA resource for transgenic RNAi in Drosophila. Nat. Methods 8, 405–407.

Niu, Y., Shen, B., Cui, Y., Chen, Y., Wang, J., Wang, L., Kang, Y., Zhao, X., Si, W., Li, W., et al. (2014). Generation of Gene-Modified Cynomolgus Monkey via Cas9/RNA-Mediated Gene Targeting in One-Cell Embryos. Cell 156, 836–843.

Nüsslein-Volhard, C., and Wieschaus, E. (1980). Mutations affecting segment number and polarity in Drosophila. Nature 287, 795–801.

Ochoa-Espinosa, A., Yucel, G., Kaplan, L., Pare, A., Pura, N., Oberstein, A., Papatsenko, D., and Small, S. (2005). The role of binding site cluster strength in Bicoid-dependent patterning in Drosophila. Proc. Natl. Acad. Sci. U. S. A. 102, 4960–4965.

Oliver, J.C., Tong, X.-L., Gall, L.F., Piel, W.H., and Monteiro, A. (2012). A single origin for nymphalid butterfly eyespots followed by widespread loss of associated gene expression. PLoS Genet. 8, e1002893.

Oliveri, P., Tu, Q., and Davidson, E.H. (2008). Global regulatory logic for specification of an embryonic cell lineage. Proc. Natl. Acad. Sci. U. S. A. 105, 5955–5962.

Papoulas, O., Daubresse, G., Armstrong, J.A., Jin, J., Scott, M.P., and Tamkun, J.W. (2001). The HMG-domain protein BAP111 is important for the function of the BRM chromatin-remodeling complex in vivo. Proc. Natl. Acad. Sci. U. S. A. 98, 5728–5733.

Parkash, R., Sharma, V., and Kalra, B. (2008). Climatic adaptations of body melanisation in Drosophila melanogaster from Western Himalayas. Fly (Austin). 2, 111–117.

Parkash, R., Sharma, V., and Kalra, B. (2009). Impact of body melanisation on desiccation resistance in montane populations of D. melanogaster: Analysis of seasonal variation. J. Insect Physiol. 55, 898–908.

Pennisi, E. (2012). ENCODE Project Writes Eulogy For Junk DNA. Science. 337, 1159–1160.

Pertea, M., and Salzberg, S.L. (2010). Between a chicken and a grape: estimating the number of human genes. Genome Biol. 11, 206.

Peter, I.S., and Davidson, E.H. (2011). A gene regulatory network controlling the embryonic specification of endoderm. Nature 474, 635–639.

Peterson, M.D., Rogers, B.T., Popadić, A., and Kaufman, T.C. (1999). The embryonic expression pattern of labial, posterior homeotic complex genes and the teashirt homologue in an apterygote insect. Dev. Genes Evol. 209, 77–90.

Pfreundt, U., James, D.P., Tweedie, S., Wilson, D., Teichmann, S.A., and Adryan, B. (2010). FlyTF: improved annotation and enhanced functionality of the Drosophila transcription factor database. Nucleic Acids Res. 38, D443–D447.

Pool, J.E., and Aquadro, C.F. (2007). The genetic basis of adaptive pigmentation variation in Drosophila melanogaster. Mol. Ecol. 16, 2844–2851.

Prabhakar, S., Visel, A., Akiyama, J.A., Shoukry, M., Lewis, K.D., Holt, A., Plajzer-Frick, I., Morrison, H., Fitzpatrick, D.R., Afzal, V., et al. (2008). Human-specific gain of function in a developmental enhancer. Science 321, 1346–1350.

Protas, M.E., Hersey, C., Kochanek, D., Zhou, Y., Wilkens, H., Jeffery, W.R., Zon, L.I., Borowsky, R., and Tabin, C.J. (2006). Genetic analysis of cavefish reveals molecular convergence in the evolution of albinism. Nat. Genet. 38, 107–111.

Prud’homme, B., Gompel, N., Rokas, A., Kassner, V.A., Williams, T.M., Yeh, S.-D., True, J.R., and Carroll, S.B. (2006). Repeated morphological evolution through cis-regulatory changes in a pleiotropic gene. Nature 440, 1050–1053.

Rauskolb, C., Smith, K.M., Peifer, M., and Wieschaus, E. (1995). extradenticle determines segmental identities throughout Drosophila development. Development 121, 3663–3673.

Rebeiz, M., and Williams, T.M. (2011). Experimental Approaches to Evaluate the Contributions of Candidate Cis-regulatory Mutations to Phenotypic Evolution. In Methods in Molecular Biology, V. Orgogozo, and M. V. Rockman, eds. (Totowa, NJ: Humana Press), pp. 351–375.

Rebeiz, M., Pool, J.E., Kassner, V.A., Aquadro, C.F., and Carroll, S.B. (2009a). Stepwise modification of a modular enhancer underlies adaptation in a Drosophila population. Science 326, 1663–1667.

Rebeiz, M., Ramos-Womack, M., Jeong, S., Andolfatto, P., Werner, T., True, J., Stern, D.L., and Carroll, S.B. (2009b). Evolution of the tan locus contributed to pigment loss in Drosophila santomea: a response to Matute et al. Cell 139, 1189–1196.

Rebeiz, M., Jikomes, N., Kassner, V.A., and Carroll, S.B. (2011). Evolutionary origin of a novel gene expression pattern through co-option of the latent activities of existing regulatory sequences. Proc. Natl. Acad. Sci. U. S. A. 108, 10036–10043.

Reed, R.D., Papa, R., Martin, A., Hines, H.M., Counterman, B.A., Pardo-Diaz, C., Jiggins, C.D., Chamberlain, N.L., Kronforst, M.R., Chen, R., et al. (2011). optix Drives the Repeated Convergent Evolution of Butterfly Wing Pattern Mimicry. Science 333, 1137–1141.

Reiter, L.T., Potocki, L., Chien, S., Gribskov, M., and Bier, E. (2001). A systematic analysis of human disease-associated gene sequences in Drosophila melanogaster. Genome Res. 11, 1114–1125.

Richards, S., Liu, Y., Bettencourt, B.R., Hradecky, P., Letovsky, S., Nielsen, R., Thornton, K., Hubisz, M.J., Chen, R., Meisel, R.P., et al. (2005). Comparative genome sequencing of Drosophila pseudoobscura: chromosomal, gene, and cis-element evolution. Genome Res. 15, 1–18.

Richardt, A., Kemme, T., Wagner, S., Schwarzer, D., Marahiel, M.A., and Hovemann, B.T. (2003). Ebony, a novel nonribosomal peptide synthetase for beta-alanine conjugation with biogenic amines in Drosophila. J. Biol. Chem. 278, 41160–41166.

Robertson, A., and Louw, J.H. (1966). Polymorphism of genes affecting amount and distribution of black pigment in the abdominal cuticle of D. melanogaster. Drosoph. Inf. Serv. 41, 154–155.

Robertson, A., Briscoe, D.A., and Louw, J.H. (1977). Variation in abdomen pigmentation in Drosophila melanogaster females. Genetica 47, 73–76.

Rogers, W.A., and Williams, T.M. (2011). Quantitative Comparison of cis-Regulatory Element (CRE) Activities in Transgenic Drosophila melanogaster. J. Vis. Exp. 2–7.

Rogers, W.A., Williams, T.M., Salomone, J.R., Tacy, D.J., Camino, E.M., Davis, K.A., and Rebeiz, M. (2013). Recurrent Modification of a Conserved Cis-Regulatory Element Underlies Fruit Fly Pigmentation Diversity. PLoS Genet. 9, e1003740.

Rogers, W.A., Grover, S., Stringer, S.J., Parks, J., Rebeiz, M., and Williams, T.M. (2014). A survey of the trans-regulatory landscape for Drosophila melanogaster abdominal pigmentation. Dev. Biol. 385, 417–432.

Rong, Y.S., and Golic, K.G. (2001). A targeted gene knockout in Drosophila. Genetics 157, 1307–1312.

Russo, C.A., Takezaki, N., and Nei, M. (1995). Molecular phylogeny and divergence times of drosophilid species. Mol. Biol. Evol. 12, 391–404.

Ryoo, H.D., Marty, T., Casares, F., Affolter, M., and Mann, R.S. (1999). Regulation of Hox target genes by a DNA bound Homothorax/Hox/Extradenticle complex. Development 126, 5137–5148.

Sagai, T., Amano, T., Tamura, M., Mizushina, Y., Sumiyama, K., and Shiroishi, T. (2009). A cluster of three long-range enhancers directs regional Shh expression in the epithelial linings. Development 136, 1665–1674.

Sanchez-Herrero, E., Vernos, I., Marco, R., and Morata, G. (1985). Genetic organization of Drosophila bithorax complex. Nature 313, 108–113.

Sandmann, T., Girardot, C., Brehme, M., Tongprasit, W., Stolc, V., and Furlong, E.E.M. (2007). A core transcriptional network for early mesoderm development in Drosophila melanogaster. Genes Dev. 21, 436–449.

Sethupathy, P., and Collins, F.S. (2008). MicroRNA target site polymorphisms and human disease. Trends Genet. 24, 489–497.

Shapiro, M.D., Bell, M.A., and Kingsley, D.M. (2006). Parallel genetic origins of pelvic reduction in vertebrates. Proc. Natl. Acad. Sci. U. S. A. 103, 13753–13758.

Shen, Y., Yue, F., McCleary, D.F., Ye, Z., Edsall, L., Kuan, S., Wagner, U., Dixon, J., Lee, L., Lobanenkov, V.V., et al. (2012). A map of the cis-regulatory sequences in the mouse genome. Nature 488, 116–120.

Shim, S., Kwan, K.Y., Li, M., Lefebvre, V., and Šestan, N. (2012). Cis-regulatory control of corticospinal system development and evolution. Nature 486, 74–79.

Shimojima, T., Okada, M., Nakayama, T., Ueda, H., Okawa, K., Iwamatsu, A., Handa, H., and Hirose, S. (2003). Drosophila FACT contributes to Hox gene expression through physical and functional interactions with GAGA factor. Genes Dev. 17, 1605–1616.

Shippy, T.D., Brown, S.J., and Denell, R.E. (1998). Molecular characterization of the Tribolium abdominal-A ortholog and implications for the products of the Drosophila gene. Dev. Genes Evol. 207, 446–452.

Shirangi, T.R., Dufour, H.D., Williams, T.M., and Carroll, S.B. (2009). Rapid evolution of sex pheromone-producing enzyme expression in Drosophila. PLoS Biol. 7, e1000168.

Small, S., Blair, A., and Levine, M. (1992). Regulation of even-skipped stripe 2 in the Drosophila embryo. EMBO J. 11, 4047–4057.

Spradling, A.C., and Rubin, G.M. (1982). Transposition of cloned P elements into Drosophila germ line chromosomes. Science 218, 341–347.

Stanojevic, D., Small, S., and Levine, M. (1991). Regulation of a segmentation stripe by overlapping activators and repressors in the Drosophila embryo. Science 254, 1385–1387.

Stephan, W., and Li, H. (2007). The recent demographic and adaptive history of Drosophila melanogaster. Heredity (Edinb). 98, 65–68.

Stern, D.L. (2000). Evolutionary developmental biology and the problem of variation. Evolution 54, 1079–1091.

Stern, D.L. (2011). Evolution, Development, & the Predictable Genome (Greenwood Village, Colorado: Roberts & Company Publishers).

Stern, D.L., and Orgogozo, V. (2008). The loci of evolution: how predictable is genetic evolution? Evolution 62, 2155–2177.

Stern, D.L., and Orgogozo, V. (2009). Is genetic evolution predictable? Science 323, 746–751.

Stewart, A.J., Hannenhalli, S., and Plotkin, J.B. (2012). Why transcription factor binding sites are ten nucleotides long. Genetics 192, 973–985.

Sucena, E., Delon, I., Jones, I., Payre, F., and Stern, D.L. (2003). Regulatory evolution of shavenbaby/ovo underlies multiple cases of morphological parallelism. Nature 424, 935–938.

Swanson, C.I., Schwimmer, D.B., and Barolo, S. (2011). Rapid evolutionary rewiring of a structurally constrained eye enhancer. Curr. Biol. 21, 1186–1196.

Terriente-Félix, A., and de Celis, J.F. (2009). Osa, a subunit of the BAP chromatin-remodelling complex, participates in the regulation of gene expression in response to EGFR signalling in the Drosophila wing. Dev. Biol. 329, 350–361.

The ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74.

Thornton, J.W. (2004). Resurrecting ancient genes: experimental analysis of extinct molecules. Nat. Rev. Genet. 5, 366–375.

Thorpe, H.M., and Smith, M.C. (1998). In vitro site-specific integration of bacteriophage DNA catalyzed by a recombinase of the resolvase/invertase family. Proc. Natl. Acad. Sci. U. S. A. 95, 5505–5510.

Tishkoff, S.A., Reed, F.A., Ranciaro, A., Voight, B.F., Courtney, C., Silverman, J.S., Powell, K., Mortensen, H.M., Hirbo, J.B., Ibrahim, M., et al. (2007). Convergent adaptation of human lactase persistence in Africa and Europe. Nat. Genet. 39, 31–40.

Tournamille, C., Colin, Y., Cartron, J.P., and Le Van Kim, C. (1995). Disruption of a GATA motif in the Duffy gene promoter abolishes erythroid gene expression in Duffy–negative individuals. Nat. Genet. 10, 224–228.

Treisman, J.E., Luk, A., Rubin, G.M., and Heberlein, U. (1997). eyelid antagonizes wingless signaling during Drosophila development and has homology to the Bright family of DNA-binding proteins. Genes Dev. 11, 1949–1962.

True, J.R., Yeh, S.-D., Hovemann, B.T., Kemme, T., Meinertzhagen, I.A., Edwards, T.N., Liou, S.-R., Han, Q., and Li, J. (2005). Drosophila tan encodes a novel hydrolase required in pigmentation and vision. PLoS Genet. 1, e63.

Vázquez, M., Moore, L., and Kennison, J.A. (1999). The trithorax group gene osa encodes an ARID-domain protein that genetically interacts with the brahma chromatin-remodeling factor to regulate transcription. Development 126, 733–742.

Venken, K.J.T., He, Y., Hoskins, R.A., and Bellen, H.J. (2006). P[acman]: a BAC transgenic platform for targeted insertion of large DNA fragments in D. melanogaster. Science 314, 1747–1751.

Visel, A., Rubin, E.M., and Pennacchio, L.A. (2009). Genomic views of distant-acting enhancers. Nature 461, 199–205.

Walter, M.F., Black, B.C., Afshar, G., Kermabon, A.Y., Wright, T.R., and Biessmann, H. (1991). Temporal and spatial expression of the yellow gene in correlation with cuticle formation and dopa decarboxylase activity in Drosophila development. Dev. Biol. 147, 32–45.

Wang, L., and Tsai, C.-C. (2008). Atrophin proteins: an overview of a new class of nuclear receptor corepressors. Nucl. Recept. Signal. 6, e009.

Wang, W., and Yoder, J.H. (2012). Hox-mediated regulation of doublesex sculpts sex-specific abdomen morphology in Drosophila. Dev. Dyn. 241, 1076–1090.

Wang, X., and Chamberlin, H.M. (2002). Multiple regulatory changes contribute to the evolution of the Caenorhabditis lin-48 ovo gene. Genes Dev. 16, 2345–2349.

Wang, L., Rajan, H., Pitman, J.L., McKeown, M., and Tsai, C.-C. (2006). Histone deacetylase-associating Atrophin proteins are nuclear receptor corepressors. Genes Dev. 20, 525–530.

Wang, L., Charroux, B., Kerridge, S., and Tsai, C.-C. (2008). Atrophin recruits HDAC1/2 and G9a to modify histone H3K9 and to determine cell fates. EMBO Rep. 9, 555–562.

Wang, T., Wei, J.J., Sabatini, D.M., and Lander, E.S. (2014). Genetic screens in human cells using the CRISPR-Cas9 system. Science 343, 80–84.

Weatherbee, S.D., Nijhout, H.F., Grunert, L.W., Halder, G., Galant, R., Selegue, J., and Carroll, S.B. (1999). Ultrabithorax function in butterfly wings and the evolution of insect wing patterns. Curr. Biol. 9, 109–115.

Wehn, A., and Campbell, G. (2006). Genetic interactions among scribbler, Atrophin and groucho in Drosophila uncover links in transcriptional repression. Genetics 173, 849–861.

Werner, T., Koshikawa, S., Williams, T.M., and Carroll, S.B. (2010). Generation of a novel wing colour pattern by the Wingless morphogen. Nature 464, 1143–1148.

Williams, T.M., Selegue, J.E., Werner, T., Gompel, N., Kopp, A., and Carroll, S.B. (2008). The regulation and evolution of a genetic switch controlling sexually dimorphic traits in Drosophila. Cell 134, 610–623.

Winkler, D.D., and Luger, K. (2011). The histone chaperone FACT: structural insights and mechanisms for nucleosome reorganization. J. Biol. Chem. 286, 18369–18374.

Wittkopp, P.J., and Kalay, G. (2012). Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat. Rev. Genet. 13, 59–69.

Wittkopp, P.J., True, J.R., and Carroll, S.B. (2002). Reciprocal functions of the Drosophila yellow and ebony proteins in the development and evolution of pigment patterns. Development 129, 1849–1858.

Wittkopp, P.J., Carroll, S.B., and Kopp, A. (2003). Evolution in black and white: genetic control of pigment patterns in Drosophila. Trends Genet. 19, 495–504.

Wray, G.A. (2007). The evolutionary significance of cis-regulatory mutations. Nat. Rev. Genet. 8, 206–216.

Wray, G.A. (2000). Editorial : Peering ahead (cautiously). Evol. Dev. 2, 125–126.

Yu, M., Wu, P., Widelitz, R.B., and Chuong, C.-M. (2002). The morphogenesis of feathers. Nature 420, 308–312.

Zeitlinger, J., Zinzen, R.P., Stark, A., Kellis, M., Zhang, H., Young, R.A., and Levine, M. (2007). Whole-genome ChIP-chip analysis of Dorsal, Twist, and Snail suggests integration of diverse patterning processes in the Drosophila embryo. Genes Dev. 21, 385–390.

Zhang, J. (2006). Parallel adaptive origins of digestive RNases in Asian and African leaf monkeys. Nat. Genet. 38, 819–823.

Zhang, S., Xu, L., and Lee, J. (2002). Drosophila Atrophin homolog functions as a transcriptional corepressor in multiple developmental processes. Cell 108, 45–56.

Zhen, Y., Aardema, M.L., Medina, E.M., Schumer, M., and Andolfatto, P. (2012). Parallel molecular evolution in an herbivore community. Science 337, 1634–1637.

Zinzen, R.P., Cande, J., Ronshaugen, M., Papatsenko, D., and Levine, M. (2006). Evolution of the ventral midline in insect embryos. Dev. Cell 11, 895–902.

APPENDIX

A. AscI 1
"Concestor" ggcgcgccCA CATAAAAATC AGCAACAAAG TTGCTCTGGG CCCATAAAAG
"Concestor 2" ggcgcgccCA CATAAAAATC AGCAACAAAG TTGCCCTGGG CCCATAAAAG
mel.00.1 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCTCTGGC CCCATAAAAG
mel.01.58 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCTCTGGC CCCATAAAAG
mel.02.7 GGCGCGCCCA CATAAAAATC AGCAACAAAC TTGCTCTGGC CCCATAAAAG
mel.04.1 (Light P1) GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCTCTGGC CCCATAAAAG
mel.05.13 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCTCTGGC CCCATAAAAG
mel.05.16 GGCGCGCCCA CATAAAAATC AGCAACAAAC TTGCTCTGGC CCCATAAAAG
mel.07.20 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCTCTGGC CCCATAAAAG
mel.14.26 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCCCTGGC CCCATAAAAG
mel.17.66 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCTCTGGC CCCATAAAAG
mel.18Ug.7 GGCGCGCCCA CATAAAAATC AGCAACAAAC TTGCCCTGGC CCCATAAAAG
mel.19.113 (Light P2) GGCGCGCCCA CATAAAAATC AGCAACAAAC TTGCTCTGGC CCCATAAAAG
mel.23.74 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCTCTGGC CCCATAAAAG
mel.24.81 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCTCTGGC CCCATAAAAG
mel.26.32 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCCCTGGC CCCATAAAAG
mel.34.9 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCTCTGGC CCCATAAAAG
mel.39.37 GGCGCGCCCA CATAAAAATC AGCAACAAAC TTGCTCTGGC CCCATAAAAG
mel.40Ug.12 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCCCTGGC CCCATAAAAG
mel.45.89 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCTCTGGC CCCATAAAAG
mel.CanS.41 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCTCTGGC CCCATAAAAG
mel.53.105 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCTCTGGC CCCATAAAAG
mel.54Ug.1 (Dark P1) GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCCCTGGC CCCATAAAAG
mel.59.3 (Dark P2) GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCTCTGGC CCCATAAAAG
mel.yw.50 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCTCTGGC CCCATAAAAG
mel.WI83.25 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCTCTGGC CCCATAAAAG
mel.51.100 GGCGCGCCCA CATAAAAATC AGCAACAAAC TTGCTCTGGC CCCATAAAAG
mel.21.2 GGCGCGCCCA CATAAAAATC AGCAACAAAC TTGCTCTGGC CCCATAAAAG
mel.29.3 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCTCTGGC CCCATAAAAG
mel.64.3 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCTCTGGC CCCATAAAAG
mel.67.1 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCTCTGGC CCCATAAAAG
mel.55.2 GGCGCGCCCA CATAAAAATC AGCAACAAAC TTGCTCTGGC CCCATAAAAG

Closely-Related Outgroup Species
mau.5 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCCCTGGC CCCATAAAAG
sec.38 GGCGCGCCCA CATAAAAATC AGCAACAAAC TTGCCCTGGC CCCATAAAAG
sim.33 GGCGCGCCCA CATAAAAATC AGCAACAAA- TTGCCCTGGC CCCATAAAAG

Outgroup Species
yak.25 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCCCTGGC CCCATAAAAA
luc.41 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCCCTGGC CCCATAAAAA
eug.20 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCCCT-GC CCCATAAAAA
fuy.9 GGCGCGCCCA CATAAAAATC AGCAACAAAG TTGCCCTGGC CCCATAAAAA

2
"Concestor" 50 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
"Concestor 2" ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.00.1 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.01.58 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.02.7 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.04.1 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.05.13 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.05.16 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.07.20 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.14.26 ATTGCAAACA AAAAC--AGA AC------AAAA------
mel.17.66 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.18Ug.7 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.19.113 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.23.74 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.24.81 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGGATGGAAT AAAA------
mel.26.32 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.34.9 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.39.37 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.40Ug.12 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.45.89 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.CanS.41 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.53.105 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.54Ug.1 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.59.3 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.yw.50 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.WI83.25 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.51.100 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.21.2 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.29.3 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.64.3 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.67.1 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
mel.55.2 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------

Closely-Related Outgroup Species
mau.5 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
sec.38 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------
sim.33 ATTGCAAACA AAAAC--AGA ACAACAGAAT GGCATGGAAT AAAA------

Outgroup Species
yak.25 ATTGCAAACA AAAAC--AGA ACAAGCGGAA TGGCATGGAA TAAAA-----
luc.41 ATTGCAAACA AAAAACGAGA ACAACAGAAT GGCATGGAAT AAAA------
eug.20 ATGGCAAACA AAA---GAGA ACAACAGAAT GGCATGGAAT AAAA------
fuy.9 ATTGCAAACA AAAAG--AGA ACAAC--AAT GGCATGGAAT AAAA------

"Concestor" 92 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
"Concestor 2" ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.00.1 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.01.58 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.02.7 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.04.1 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.05.13 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.05.16 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.07.20 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.14.26 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.17.66 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.18Ug.7 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.19.113 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.23.74 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.24.81 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.26.32 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.34.9 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.39.37 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.40Ug.12 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.45.89 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.CanS.41 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.53.105 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.54Ug.1 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.59.3 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.yw.50 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.WI83.25 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.51.100 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.21.2 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.29.3 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.64.3 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.67.1 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
mel.55.2 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A

Closely-Related Outgroup Species
mau.5 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
sec.38 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A
sim.33 ------TTTATATG AATAACAAAA AGCAGCTAAA GCA------A

Outgroup Species
yak.25 ------TTTATATG AATAACAAAA AGCAaaagca gctacagcaA

Distantly-related Outgroup Species
luc.41 ------TTTATATG AATAACAAAA AGCAGCTAAA AGAAGC---A
eug.20 ------TTTATATG AATAACAAAA AGCAGCTAAA AGAAGC---A
fuy.9 ------TTTATATG AATAACAAAA AGCAGCTAAA AGAAAC---A

3 4 5
"Concestor" 124 GCAGCAACAA CAACAG------TTTACTGC CCCGGCTCAG CGGTACACTG
"Concestor2" GCAGCAACAA CAACAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.00.1 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.01.58 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.02.7 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.04.1 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.05.13 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.05.16 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.07.20 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.14.26 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.17.66 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.18Ug.7 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.19.113 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.23.74 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.24.81 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.26.32 GCAGCAACAA CAACAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.34.9 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.39.37 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.40Ug.12 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.45.89 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.CanS.41 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.53.105 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.54Ug.1 GCAGCAACAA CAACAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.59.3 GCAGCAACAA CATCAG------TTTACTGC CCCGTCTCAG CGGTACACTG
mel.yw.50 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.WI83.25 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.51.100 GCAGCAACAA CAACAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.21.2 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.29.3 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.64.3 GCAGCAACAA CAACAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.67.1 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG
mel.55.2 GCAGCAACAA CAATAG------TTTACTGC CCCGGCTCAG CGGTACACTG

Closely-Related Outgroup Species
mau.5 GCAGCAACAA CAACAG------TTTACTGC CCCGGCTCAG AGGTACACTG
sec.38 GCAACAACAA CAACAACAAC AGTTTACTGC CCCGGCTCAG AGATACACTG
sim.33 GCAGCAACAA CAACAG------TTTACTGC CCCGGCTCAG AGGTACACTG

Outgroup Species
yak.25 GCGGCAACAA CAACAG------TTTACTGC CCCGGCTTAG TGGTACACTG

Distantly-related Outgroup Species
luc.41 ACAACAACA------G------TTTATTGC CCTGGCTCAG CTGAACACTG
eug.20 GCAACAACAA CAACAG------TTTACTGC CCTGCCTTGG CCGTGCACTG
fuy.9 GCAGCAACAA CAACAG------TTTACTGC TCTGGCTCAG CAGTACACTG

6 7 8
"Concestor" 168 TGCAAAACGA TTGTACTCCT CCTCATAATA ATAAGAGTA ------
"Concestor2" TGCAAAACGA TTGTACTCCT CCTCATAATA ATAAGAGTA ------
mel.00.1 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.01.58 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.02.7 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.04.1 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.04.7 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.05.13 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.05.16 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.07.20 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.14.26 TGCAAAACGA TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.17.66 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.18Ug.7 TGCAAAACGA TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.19.113 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.23.74 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.24.81 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.26.32 TGCAAAACGA TTGTACTCCT CCTCATAATA ATAAGTATA ------
mel.34.9 TGCAAAACG- TTGTACTCCT CCTCATAATA ATAAGTATA------
mel.39.37 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.40Ug.12 TGCAAAACGA TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.45.89 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.CanS.41 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.53.105 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.54Ug.1 TGCAAAACGA TTGTACTCCT CCTCATAATA ATAAGTATA------
mel.59.3 TGCAAAATGA TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.yw.50 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.WI83.25 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.51.100 TGCAAAACGA TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.21.2 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.29.3 TGCAAAACG- TTGTACTCCT CCTCATAATA ATAAGTATA------
mel.64.3 TGCAAAACGA TTGTACTCCT CCTCATAATA ATAAGAGTA------
mel.67.1 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------
mel.55.2 TGCAAAACG- TTGTACTCCT CCTCATAATA ATATGAGTA------

Closely-Related Outgroup Species
mau.5 AGCAAAATGA ATGTACTCTT TTTCATACCA ATAACAGGAA G------
sec.38 TGCAAAATGA GTGTGTTCCT CCTCATACCA ATAACAGAAA GTTCT-----
sim.33 AGCAAAATGA TTGTGCTCCT CCTCATACCA ATAA------CT-----

Outgroup Species
yak.25 TACGAAATA- Aaataactcc ctctcattaa ATAAAAGTAA actaaatcac

Distantly-related Outgroup Species
luc.41 AGGAAAATAg ttatgggatt tttga------
eug.20 TGGGAAAata tcacttttat ggtctcctta atatttgcca tctctttagc
fuy.9 TGGAAAATA- TTGataccat tcttttttat atccataata aaggccaata

9 10 11 12
"Concestor" 207 ------TATAA AGTATATAAT ATACTATATA TCACCATTGA
"Concestor2" ------TATAA AGTATATAAT ATACTATATA TCTCCATTGA
mel.00.1 ------TATAG AGTATATAAT ATACTATATA TCTCCATTGA
mel.01.58 ------TATAG AGTATATAAT ATACTATATA TCTCCATTGA
mel.02.7 ------TATAG AGTATATAAT ATACTATATA TCTCCATTGA
mel.04.1 ------TATAG AGTATATAAT ATACTATATA TCTCCATTGA
mel.04.7 ------TATAG AGTATATAAT ATACTATATA TCTCCATTGA
mel.05.13 ------TATAG AGTATATAAT ATACTATATA TCTCCATTGA
mel.05.16 ------TATAG AGTATATAAT ATACTATATA TCTCCATTGA
mel.07.20 ------TATAG AGTATATAAT ATACTATATA TCTCCATTGA
mel.14.26 ------TATAG AGTATATAAT ATACTATATA TCACCATTGA
mel.17.66 ------TATAG AGTATATAAT ATACTATATA TCTCCATTGA
mel.18Ug.7 ------TATAG AGTATATAAT ATACTATATA TCACCATTGA
mel.19.113 ------TATAG AGTATATAAT ATACTATATA TCCCCATTGA
mel.23.74 ------TATAG AGTATATAAT ATACTATATA TCTCCATTGA
mel.24.81 ------TATAG AGTATATAAT ATACTATATA TCTCCATTTA
mel.26.32 ------TATAC A--TATATAT ATAATATATA TCACCATTGA
mel.34.9 ------TATAT ---ATAT-AT ATACTATATA TCACCATTGA
mel.39.37 ------TATAG AGTATATAAT ATACTATATA TCTCCATTGA
mel.40Ug.12 ------TATAG AGTATATAAT ATACTATATA TCACCATTGA
mel.45.89 ------TATAG AGTATATAAT ATACTATATA TCTCCATTGA
mel.CanS.41 ------TATAG AGTATATAAT ATACTATATA TCTCCATTGA
mel.53.105 ------TATAG AGTATATAAT ATACTATATA TCTCCATTGA
mel.54Ug.1 ------TATAT ---ATAT-AT ATACTATATA TCACCATTGA
mel.59.3 ------TATAG AGTATATAAT ATACTATATA TCACCATTGA
mel.yw.50 ------TATAG AGTATATAAT ATACTATATA TCTCCATTGA
mel.WI83.25 ------TATAG AGTATATAAT ATACTATATA TCTCCATTGA
mel.51.100 ------TATAG AGTATATAAT ATACTATATA TCACCATTGA
mel.21.2 ------TATA- --TATA---T ATACTATATA TCTCCATTGA
mel.29.3 ------TATAG AGTATATAAT ATACTATATA TCACCATTGA
mel.64.3 ------TATAA AGTATATAAT ATACTATATA TCTCCATTGA
mel.67.1 ------TATAG AGTATATAAT ATACTATATA TCTCCATTGA
mel.55.2 ------TATAG AGTATATAAT ATACTATATA TCTCCATTGA

Closely-Related Outgroup Species
mau.5 TAGTTAATAT GAAAGTATAA AGTAAATATC ATACTATATA TCTCTATAGA
sec.38 ----TAATAT AAAAGTATAA AGTAAATAAC ATACTATATA TCTCTATAGA
sim.33 ----TAATAT AAAAGTATAA AGTAAATAAC CTACCATATA TCTCTATAGA

Outgroup Species
yak.25 acgcaagctt tgtaaataat cggtactaca tcctagta------

Distantly-related Outgroup Species
luc.41 ------AAATGTAACA ATTAAATAAT ATAATATTTA gggtcatcat
eug.20 cgaggct------TG
fuy.9 gagtattttt actgcatgat agtatttggg agctcataat ttgtaaacTG

13
"Concestor" 242 TAATTTCGAT CATTTTCACC T------
"Concestor2" TAATTTCGAT CATTTTCACC T------
mel.00.1 TAATTGCGAT CATTTTCACC T------
mel.01.58 TAATTTCGAT CATTTTCACC T------
mel.02.7 TAATTTCGAT CATTTTCACC T------
mel.04.1 TAATTTCGAT CATTTTCACC T------
mel.04.7 TAATTTCGAT CATTTTCACC T------
mel.05.13 TAATTTCGAT CATTTTCACC T------
mel.05.16 TAATTTCGAT CATTTTCACC T------
mel.07.20 TAATTTCGAT CATTTTCACC T------
mel.14.26 TAATTTCGAT CATTTTCACC T------
mel.17.66 TAATTTCGAT CATTTTCACC T------
mel.18Ug.7 TAATTTCGAT CATTTTCACC T------
mel.19.113 TAATTTCGAT CATTTTCACC T------
mel.23.74 TAATTTCGAT CATTTTCACC T------
mel.24.81 TAATGTCGAT CATTTTCACC T------
mel.26.32 TA------
mel.34.9 TAATTTCGAT CATTTTCACC T------
mel.39.37 TAATTTCGAT CATTTTCACC T------
mel.40Ug.12 TAATTTCGAT CATTTTCACC T------
mel.45.89 TAATTTCGAT CATTTTCACC T------
mel.CanS.41 TAATTTCGAT CATTTTCACC T------
mel.53.105 TAATTTCGAT CATTTTCACC T------
mel.54Ug.1 TAATTTCGAT CATTTTCACC T------
mel.59.3 TAATTTCGAT CATTTTCACC T------
mel.yw.50 TAATTTCGAT CATTTTCACC T------
mel.WI83.25 TAATTTCGAT CATTTTCACC T------
mel.51.100 TAATTTCGAT CATTTTCACC T------
mel.21.2 TAATTTCGAT CATTTTCACC T------
mel.29.3 TAATTTCGAT CATTTTCACC T------
mel.64.3 TAATTTCGAT CATTTTCACC T------
mel.67.1 TAATTTCGAT CATTTTCACC T------
mel.55.2 TAATTTCGAT CATTTTCACC T------

Closely-Related Outgroup Species
mau.5 TAGTTTCATC ACCTTTTTTT CACCT------
sec.38 TAGTTTCATC GTCTTTTTTT CACCT------
sim.33 TAGTTTCATC -----TTTTT CAACCT------

Outgroup Species
yak.25 ------TA------

Distantly-related Outgroup Species
luc.41 gattgaaggt ttgaaggttt caattgaaag ataaaatctt taattttgta
eug.20 GAAACAAGTT TGTATCTGTT TTTCAtaaga atcgtatcag atttgcctga
fuy.9 AAAACAAGTT TGCTTTGGTT CTTTAgggaa gaaaaaaagg agcttttaaa

"Concestor" 263 ------
"Concestor2" ------
mel.00.1 ------
mel.01.58 ------
mel.02.7 ------
mel.04.1 ------
mel.04.7 ------
mel.05.13 ------
mel.05.16 ------
mel.07.20 ------
mel.14.26 ------
mel.17.66 ------
mel.18Ug.7 ------
mel.19.113 ------
mel.23.74 ------
mel.24.81 ------
mel.26.32 ------
mel.34.9 ------
mel.39.37 ------
mel.40Ug.12 ------
mel.45.89 ------
mel.CanS.41 ------
mel.53.105 ------
mel.54Ug.1 ------
mel.59.3 ------
mel.yw.50 ------
mel.WI83.25 ------
mel.51.100 ------
mel.21.2 ------
mel.29.3 ------
mel.64.3 ------
mel.67.1 ------
mel.55.2 ------

Closely-Related Outgroup Species
mau.5 ------
sec.38 ------
sim.33 ------

Outgroup Species
yak.25 264 ------

Distantly-related Outgroup Species
luc.41 ataggaaaaa atataatttt ctacaaatat ttgaacaatt atttgttaaa
eug.20 tctggcactt aagatacttt tctgaacaaa ctttaagttt t------
fuy.9 tttaaaatat cattgccatt agaacaggaa aaactactta atatttgtta

14
"Concestor" 263 ------TTTAACT
"Concestor2" ------TTTAACT
mel.00.1 ------TTTAACT
mel.01.58 ------TTTAACT
mel.02.7 ------TTTAACT
mel.04.1 ------TTTAACT
mel.04.7 ------TTTAACT
mel.05.13 ------TTTAACT
mel.05.16 ------TTTAACT
mel.07.20 ------TTTAACT
mel.14.26 ------TTTAACT
mel.17.66 ------TTTAACT
mel.18Ug.7 ------TTTAACA
mel.19.113 ------TTTAACT
mel.23.74 ------TTTAACT
mel.24.81 ------TTTAACT
mel.26.32 ------TTTAACT
mel.34.9 ------TTTAACT
mel.39.37 ------TTTAACT
mel.40Ug.12 ------TTTAACA
mel.45.89 ------TTTAACT
mel.CanS.41 ------TTTAACT
mel.53.105 ------TTTAACT
mel.54Ug.1 ------TTTAACT
mel.59.3 ------TTTAACT
mel.yw.50 ------TTTAACT
mel.WI83.25 ------TTTAACT
mel.51.100 ------TTTAACT
mel.21.2 ------TTTAACT
mel.29.3 ------TTTAACT
mel.64.3 ------TTTAACT
mel.67.1 ------TTTAACT
mel.55.2 ------TTTAACT

Closely-Related Outgroup Species
mau.5 ------
sec.38 ------
sim.33 ------

Outgroup Species *
yak.25 ------GTTTATTT

Distantly-related Outgroup Species *
luc.41 attatatttc aaaggatatt attctTAGAA ATCCCCTTTG ATATTTATTT
eug.20 ------
fuy.9 agcct------

15 16 17

"Concestor" 270 AATTTATGCC CAATATAGTT G------
"Concestor2" AATTTATGCC CAATGTAGTT G------
mel.00.1 AATTTATGCC CAATATAGTT G------
mel.01.58 AATTTATGCC CAATATAGTT G------
mel.02.7 AATTTATGCC CAATATAGTT G------
mel.04.1 AATTTATGCC CAATGTAGTT G------
mel.04.7 AATTTATGCC CAATGTAGTT G------
mel.05.13 AATTTATGCC CAATATAGTT G------
mel.05.16 AATTTATGCC CAATATAGTT G------
mel.07.20 AATTTATGCC CAATATAGTT G------
mel.14.26 AATTTATGCC CAATATAGTT G------
mel.17.66 AATTTATGCC CAATATAGTT G------
mel.18Ug.7 GATTTATGCC CAATATAGTT G------
mel.19.113 AATTTATGCC CAATATAGTT G------
mel.23.74 AATTTATGCC CAATATAGTT G------
mel.24.81 AATTTATGCC CAATGTAGTT G------
mel.26.32 AATTTATGCC CAATGTAGTT G------
mel.34.9 AATTTATG------TAGTT G------
mel.39.37 AATTTATGCC CAATATAGTT G------
mel.40Ug.12 GATTTATGCC CAATATAGTT G------
mel.45.89 AATTTATGCC CAATATAGTT G------
mel.CanS.41 AATTTATGCC CAATATAGTT G------
mel.53.105 AATTTATGCC CAATATAGTT G------
mel.54Ug.1 AATTTATGCC CAATGTAGTT G------
mel.59.3 AATTTATGCC CAATATAGTT G------
mel.yw.50 AATTTATGCC CAATATAGTT G------
mel.WI83.25 AATTTATGCC CAATATAGTT G------
mel.51.100 AATTTATGCC CAATATAGTT G------
mel.21.2 AATTTATGCC CAATATAGTT G------
mel.29.3 AATTTATG------TAGTT G------
mel.64.3 AATTTATGCC CAATATAGTT G------
mel.67.1 AATTTATGCC CAATATAGTT G------
mel.55.2 AATTTATGCC CAATATAGTT G------

Closely-Related Outgroup Species
mau.5 AATTTATATC CATTAAA------
sec.38 AGTTTATGAC CATTAAAATT G------
sim.33 TATTTATGTC CATTAAAATT G------

Outgroup Species
yak.25 ------C TATTAATATT ttttttacaa tttatgtgcc caacaaagat

Distantly-related Outgroup Species
luc.41 AATTattttg taaaTATATA AACTAATTAA AAAGTTATta ataaatatcc
eug.20 ------
fuy.9 ------TAAATA AAATAAATAC AAATTTATtc caatgcaaaa

"Concestor" 291 ------
"Concestor2" ------
mel.00.1 ------
mel.01.58 ------
mel.02.7 ------
mel.04.1 ------
mel.04.7 ------
mel.05.13 ------
mel.05.16 ------
mel.07.20 ------
mel.14.26 ------
mel.17.66 ------
mel.18Ug.7 ------
mel.19.113 ------
mel.23.74 ------
mel.24.81 ------
mel.26.32 ------
mel.34.9 ------
mel.39.37 ------
mel.40Ug.12 ------
mel.45.89 ------
mel.CanS.41 ------
mel.53.105 ------
mel.54Ug.1 ------
mel.59.3 ------
mel.yw.50 ------
mel.WI83.25 ------
mel.51.100 ------
mel.21.2 ------
mel.29.3 ------
mel.64.3 ------
mel.67.1 ------
mel.55.2 ------

Closely-Related Outgroup Species
mau.5 ------
sec.38 ------
sim.33 ------

Outgroup Species
yak.25 ------

Distantly-related Outgroup Species
luc.41 cctaactttt aaattctgac caaataaaat ctttcttaaa tcatcaccC-
eug.20 ------
fuy.9 atacatgttt ttttattcaa aaaaaggctt aactaaactt tctgaacgtg

"Concestor" 292 ------
"Concestor2" ------
mel.00.1 ------
mel.01.58 ------
mel.02.7 ------
mel.04.1 ------
mel.04.7 ------
mel.05.13 ------
mel.05.16 ------
mel.07.20 ------
mel.14.26 ------
mel.17.66 ------
mel.18Ug.7 ------
mel.19.113 ------
mel.23.74 ------
mel.24.81 ------
mel.26.32 ------
mel.34.9 ------
mel.39.37 ------
mel.40Ug.12 ------
mel.45.89 ------
mel.CanS.41 ------
mel.53.105 ------
mel.54Ug.1 ------
mel.59.3 ------
mel.yw.50 ------
mel.WI83.25 ------
mel.51.100 ------
mel.21.2 ------
mel.29.3 ------
mel.64.3 ------
mel.67.1 ------
mel.55.2 ------

Closely-Related Outgroup Species
mau.5 ------
sec.38 ------
sim.33 ------

Outgroup Species
yak.25 ------

Distantly-related Outgroup Species
luc.41 ------
eug.20 ------CAAATC CAGCAAAATC
fuy.9 aaacaatatt actaactagg gtatgtacta aataTAATTT GTATAAAATC

"Concestor" 292 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
"Concestor2" ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.00.1 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.01.58 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.02.7 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.04.1 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.04.7 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.05.13 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.05.16 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.07.20 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.14.26 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.17.66 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.18Ug.7 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.19.113 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.23.74 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.24.81 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.26.32 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.34.9 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.39.37 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.40Ug.12 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.45.89 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.CanS.41 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.53.105 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.54Ug.1 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.59.3 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.yw.50 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.WI83.25 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.51.100 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.21.2 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.29.3 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.64.3 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.67.1 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
mel.55.2 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A

Closely-Related Outgroup Species
mau.5 ------G TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A
sec.38 ------CAT TTCTCTGAGT GTGCAGTAAG TGTCCCAG-A
sim.33 ------CAT TTCTCTGAGT GTGCAGTAAG TGCCCCAG-A

Outgroup Species
yak.25 ------AAAT TTCTCTAAGT GTGCAGTAAG ------

Distantly-related Outgroup Species
luc.41 ------AT TTCTCTCTGT GTACAGTAAG TGCTGGAGAA
eug.20 TGAGCCAACT CTATCAAAAC TTCTCTCAGT GTGCAGTAAG ------
fuy.9 TGGCCAAAAG CAATGCAAAT TTTTTGTAGT GTACAGTAAG TGCCCAAG-A

"Concestor" 323 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
"Concestor2" ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.00.1 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.01.58 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.02.7 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.04.1 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.04.7 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.05.13 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.05.16 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.07.20 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.14.26 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.17.66 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.18Ug.7 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.19.113 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.23.74 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.24.81 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.26.32 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.34.9 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.39.37 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.40Ug.12 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.45.89 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.CanS.41 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.53.105 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.54Ug.1 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.59.3 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.yw.50 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.WI83.25 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.51.100 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.21.2 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.29.3 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.64.3 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.67.1 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC
mel.55.2 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACAACC

Closely-Related Outgroup Species
mau.5 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACACCC
sec.38 ATGCGAATGC ATCTCGGGTT CATCGGTGGG TCGAGTTGGT TGCAACACCC
sim.33 ATGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACACCC

Outgroup Species
yak.25 ------TGC ATCTCGGGTT CATCGGGTTC ---AGTTTGT TGCAACACC-

Distantly-related Outgroup Species
luc.41 ATGCGAATGC ATCTCGGGTT CATTGGCGGG TCGAGTTTGT TGCAACACC-
eug.20 -TGCGAATGC ATCTCGGGTT CATCGGCGGG TCGAGTTTGT TGCAACACC-
fuy.9 ATGCGAATGC ATCTCGGGTT CAACGGCGGG TCGAGTTTGT TGCATCACC-

"Concestor" 373 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
"Concestor2" GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.00.1 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.01.58 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.02.7 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.04.1 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.04.7 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.05.13 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.05.16 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.07.20 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.14.26 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.17.66 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.18Ug.7 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.19.113 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.23.74 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.24.81 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.26.32 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.34.9 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.39.37 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.40Ug.12 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAGA
mel.45.89 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.CanS.41 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.53.105 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.54Ug.1 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.59.3 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.yw.50 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.WI83.25 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.51.100 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.21.2 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.29.3 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.64.3 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.67.1 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
mel.55.2 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA

Closely-Related Outgroup Species
mau.5 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
sec.38 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA
sim.33 GAAGAAC------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA

Outgroup Species
yak.25 ------C------GA AGAAGTTGCA GCGTGCGTTC GGCATTAAAA

Distantly-related Outgroup Species
luc.41 ------CGAA gaagagaaGA AGAACTTGCA GCGTGCGTCC GGCATTAAAA
eug.20 ------C------Ga AGAACTTGCA GCGTGCGTCC GGCATTAAAA
fuy.9 ------C------GA AGAACTTGCA GCGTGCGTCC GGCATTAAAA

"Concestor" 412 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
"Concestor2" TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.00.1 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.01.58 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.02.7 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.04.1 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.04.7 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.05.13 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.05.16 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.07.20 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.14.26 TTGTGTTTAT GCGTTTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.17.66 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.18Ug.7 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.19.113 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.23.74 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.24.81 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.26.32 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.34.9 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.39.37 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.40Ug.12 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.45.89 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.CanS.41 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.53.105 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.54Ug.1 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.59.3 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.yw.50 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.WI83.25 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.51.100 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.21.2 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.29.3 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.64.3 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.67.1 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
mel.55.2 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG

Closely-Related Outgroup Species mau.5 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG sec.38 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG sim.33 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG

Outgroup Species yak.25 TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG

Distantly-related Outgroup Species luc.41 TTGTGTTTAT GCGTGTTTGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG eug.20 TTGTGTTTAT GCGTGTTTGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG fuy.9 TTGTGTTTAT GCGTGTTTGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG
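Each block above is read by comparing every row column by column against the reconstructed "Concestor" row. A minimal Python sketch of that comparison is given below as an illustration only; the function name `substitutions` is hypothetical, and it assumes the two rows were copied from the same block with the ten-column spacing left intact. Gap characters are compared like bases, so an indel appears as a run of mismatched columns rather than a single event, and positions are counted within the block, not in alignment coordinates.

```python
from typing import List, Tuple


def substitutions(reference: str, query: str) -> List[Tuple[int, str, str]]:
    """List the aligned columns where `query` differs from `reference`.

    Both rows come from the same alignment block. The spaces that split the
    printed sequence into groups of ten are dropped before comparing, and
    positions are reported 1-based within the block. Case is ignored, and
    gap characters ('-') are compared like bases, so an indel shows up as a
    run of mismatched columns.
    """
    ref = reference.replace(" ", "")
    qry = query.replace(" ", "")
    if len(ref) != len(qry):
        # Rows of unequal length are rejected rather than silently truncated.
        raise ValueError("aligned rows must have the same length")
    return [
        (i + 1, r, q)
        for i, (r, q) in enumerate(zip(ref, qry))
        if r.upper() != q.upper()
    ]


if __name__ == "__main__":
    # Rows copied from the block that begins at "Concestor" position 412.
    concestor = "TTGTGTTTAT GCGTGTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG"
    mel_14_26 = "TTGTGTTTAT GCGTTTTCGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG"
    luc_41 = "TTGTGTTTAT GCGTGTTTGG TAATTTTATA AAAGTTAAAT TAGTTTTAAG"
    print(substitutions(concestor, mel_14_26))  # [(15, 'G', 'T')]
    print(substitutions(concestor, luc_41))  # [(18, 'C', 'T')]
```

Applied to that block, the sketch recovers the single G-to-T change carried by mel.14.26 and the C-to-T change shared by the distantly-related outgroups.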

18 19 "Concestor" 462 ACCATAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT "Concestor2" ACCATAAATT CAGCTCACTC TCTCTCTC------GCTC TTTCT--CTT mel.00.1 ACCCTAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT mel.01.58 ACCATAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT mel.02.7 ACCATAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT mel.04.1 ACCCTAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT mel.04.7 ACCCTAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT mel.05.13 ACCATAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT mel.05.16 ACCATAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT mel.07.20 ACCATAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT mel.14.26 ACCATAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT mel.17.66 ACCATAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT mel.18Ug.7 ACCATAAATT CAGCTCACTC TCTCTCTCTC --TTTCGCTC TTTCT--CTT mel.19.113 ACCATAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT mel.23.74 ACCATAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT mel.24.81 ACCATAAATT CAGCTCACTC TCTCTCTCTC TCTTTCGCTC TTTCT--CTT mel.26.32 ACCATAAATT CAGCTCACTC TCTCTCTCTC TCTTTCGCTC TTTCT--CTT mel.34.9 ACCATAAATT CAGCTCACTC TCTCTCTC------GCTC TTTCT--CTT mel.39.37 ACCCTAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT mel.40Ug.12 ACCATAAATT CAGCTCACTC TCTCTCTC------GCTC TTTCT--CTT mel.45.89 ACCATAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT mel.CanS.41 ACCATAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT mel.53.105 ACCATAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT mel.54Ug.1 ACCATAAATT CAGCTCACTC TCTCTCTCTC --TTTCGCTC TTTCT--CTT mel.59.3 ACCATAAATT CAGCTCACTC TCTCTCTC------GCTC TTTCT--CTT mel.yw.50 ACCATAAATT CAGCTCACTC TCTCTCTCTC --TTTCGCTC TTTCT--CTT mel.WI83.25 ACCCTAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT mel.51.100 ACCATAAATT CAGCTCACTC TCTCTCTCTC --TTTCGCTC TTTCT--CTT mel.21.2 ACCATAAATT CAGCTCACTC TCTCTCTCTC --TCGCGCTC TTTCT--CTT mel.29.3 ACCATAAATT CAGCTCACTC TCTCTCTC------GCTC TTTCT--CTT mel.64.3 ACCATAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT mel.67.1 ACCATAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT mel.55.2 ACCCTAAATT CAGCTCACTC TCTCTCTCTC ------GCTC TTTCT--CTT

Closely-Related Outgroup Species mau.5 ACCATAAATT CAGCTCACTC CCTCTCTC------GCTC TTTCT--CTT sec.38 ACCATAAATT CAGCTCACTT CCTCTCTC------GCTC TTTCT--CTT sim.33 ACCATAAATT CAGCTCACTC CCTCTCTC------GCTC TTTCT--CTT

Outgroup Species yak.25 ACCATAAATT CAGCTCACTC TCTCCCTCGG ------CCTC TCTGTCTCTT

Distantly-related Outgroup Species luc.41 ACCATAAATT CAGCTCTCGC A------C TCTCTGCCAT ------ eug.20 ACCATAAATT CAGCTCACTC TCT------C TCTCTCtcgc ccgGTCT--C fuy.9 ACCATAAATT CAGCGCACTC TCTggcata------GTCT--C

Abd-B 1 Abd-B 2 "Concestor" 504 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA "Concestor2" TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.00.1 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.01.58 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.02.7 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.04.1 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.04.7 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.05.13 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.05.16 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.07.20 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.14.26 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.17.66 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.18Ug.7 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.19.113 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.23.74 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.24.81 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.26.32 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.34.9 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.39.37 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.40Ug.12 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.45.89 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.CanS.41 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.53.105 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.54Ug.1 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.59.3 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.yw.50 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.WI83.25 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.51.100 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.21.2 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.29.3 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.64.3 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.67.1 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA mel.55.2 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA

Closely-Related Outgroup Species mau.5 TGCCAGTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA sec.38 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA sim.33 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCT----AGA

Outgroup Species yak.25 TGCCATTTTA ACTTTTATTA CTCTTAATAT AAAAAAGCTG GCTGGCTAGA

Distantly-related Outgroup Species luc.41 ------TTTA ACTTTTATTA CTTTTAATAT AAAAAAACTG GCT----AGA eug.20 TGCCATTTTA ACTTTTATTA CTTTTAATAT AAAAAAGCTG GCT----AGA fuy.9 TGCCATTTTA ACTTTTATTA CTTTTAATAT AAAAAAGCTG Ga----TAGA

C Abd-B 3 Abd-B 4 "Concestor" 550 AGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG "Concestor2" AGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.00.1 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.01.58 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.02.7 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.04.1 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.04.7 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.05.13 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.05.16 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.07.20 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.14.26 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.17.66 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.18Ug.7 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.19.113 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.23.74 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.24.81 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.26.32 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.34.9 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.39.37 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.40Ug.12 AGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.45.89 AGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.CanS.41 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.53.105 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.54Ug.1 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.59.3 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.yw.50 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.WI83.25 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.51.100 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.21.2 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.29.3 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.64.3 AGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.67.1 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG mel.55.2 TGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG

Closely-Related Outgroup Species C mau.5 AGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG sec.38 AGC------GGGCCAGC TGTAAAAATG CTCGCGGTCA TAAAAAGTTG sim.33 AGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG

Outgroup Species yak.25 AGC------GGGCCAGC TGTAAAAATG CATGCGCTCA TAAAAAGTTG

Distantly-related Outgroup Species luc.41 AGCGGCGGCG GTGGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAATTG eug.20 AGCGGG------CCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG fuy.9 AGC------GGGCCAGC TGTAAAAATG CACGCGGTCA TAAAAAGTTG

"Concestor" 591 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA "Concestor2" CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.00.1 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.01.58 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.02.7 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.04.1 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.04.7 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.05.13 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.05.16 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.07.20 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.14.26 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.17.66 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.18Ug.7 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.19.113 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.23.74 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.24.81 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.26.32 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.34.9 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.39.37 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.40Ug.12 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.45.89 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.CanS.41 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.53.105 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.54Ug.1 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.59.3 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.yw.50 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.WI83.25 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.51.100 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.21.2 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.29.3 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.64.3 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.67.1 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA mel.55.2 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA

Closely-Related Outgroup Species mau.5 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA sec.38 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA sim.33 CAGGAGGCAT GTTGCC------AGTTGCCTG CAACCGGCAA

Outgroup Species yak.25 CAGGAGGCAT GTTGCC------AGTTGCCAG TTGCC-----

Distantly-related Outgroup Species luc.41 CAGGAGGCAT GTTGCCggt------TGCCAAGTTC eug.20 CAGGAGGCAT GTTGCCGGTT GCcagttgcc ggttgctggc TGCCAAGTTC fuy.9 CAGGAGGCAT GTTGCTGGTA GCcaagttgc cAGTTGCCGG TTGCC-----

Abd-B 5 "Concestor" 626 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA "Concestor2" CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.00.1 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.01.58 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.02.7 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.04.1 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.04.7 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.05.13 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.05.16 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.07.20 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.14.26 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.17.66 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.18Ug.7 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.19.113 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.23.74 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.24.81 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.26.32 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.34.9 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.39.37 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.40Ug.12 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.45.89 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.CanS.41 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.53.105 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.54Ug.1 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.59.3 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.yw.50 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.WI83.25 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.51.100 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.21.2 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.29.3 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.64.3 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.67.1 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA mel.55.2 CATTCG------C-- --AG------AACAGCAGC AACATCGTAA

Closely-Related Outgroup Species mau.5 CATCCG------C-- --AC------AACAGCAGC AACATCGTAA sec.38 CATCCG------C-- --AC------AACAGCAGC AACATCGTAA sim.33 CATCCG------C-- --AC------AACAGCAGC AACATCGTAA

Outgroup Species yak.25 ------TGCAAC-- --AG------AACAGCAGC AACATCGTAA

Distantly-related Outgroup Species luc.41 CCAAGTTGCC CCTGCAACAT CCAC------AGCAGCAGC AACATCGTAA eug.20 CCAAGTTGCC TGTGCAAC-- --ATCCACAG AAACAGCAGC AACATCGTAA fuy.9 ------TGCAAC-- --ATCCACTG AAACGGCAGC AACATCGTAA

D E Dsx1 Site "Concestor" 654 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA "Concestor2" AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.00.1 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.01.58 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.02.7 AATAACTTCT TCCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.04.1 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.04.7 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.05.13 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.05.16 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.07.20 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.14.26 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.17.66 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.18Ug.7 AATAACTTCT TCCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.19.113 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.23.74 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.24.81 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.26.32 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.34.9 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.39.37 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.40Ug.12 AATAACTTCT TCCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.45.89 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.CanS.41 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.53.105 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.54Ug.1 AATAACTTCT TCCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.59.3 AATAACTTCT TCCTCTGCGG TCTGA------CAACAA TGTTGCTGCA mel.yw.50 AATAACTTCT TCCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.WI83.25 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.51.100 AATAACTTCT TCCTCTGCGG TCTGA------CAACAA TGTTGCTGCA mel.21.2 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.29.3 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.64.3 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.67.1 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA mel.55.2 AATAACTTCT TGCTCTGCGG TCTGAGTTTG GCCGCAACAA TGTTGCTGCA

Closely-Related Outgroup Species D E mau.5 AATAACTTCT TGCTCTGCGG TCTGCATTTG GCCGCAACAA TGTTGCTGCA sec.38 AATAACTTCT TGCTCTGGGG TCTGCATTTG GCCGCAACAA TGTTGCTGCA sim.33 AATAACTTCT TGCTCTGCGG TCTGCATTTG GCCGCAACAA TGTTGCTGCA

Outgroup Species yak.25 AATAACTTCT TGCTCTGCGG TCTCCGTTTG GCCGCAACAA TGTTGCCGCA

Distantly-related Outgroup Species luc.41 AATAATTTCT TTCTCTGCGG TCTCCGTTTG GCCGCAACAA TGTTGCTGCA eug.20 AATAATTTCT TGCTCTGCGG TCTCCGCTCG GTTGCAACAA TGTTGCGGCA fuy.9 AATAATTTCT TGCTCTGCGG TCTCCATTTG GCCGCAACAA TGTTGCTGCA

Abd-B 6 "Concestor" 704 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA "Concestor2" TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.00.1 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.01.58 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.02.7 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.04.1 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.04.7 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.05.13 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.05.16 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.07.20 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.14.26 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.17.66 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.18Ug.7 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.19.113 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.23.74 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.24.81 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.26.32 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.34.9 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.39.37 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.40Ug.12 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.45.89 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.CanS.41 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.53.105 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.54Ug.1 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.59.3 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.yw.50 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.WI83.25 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.51.100 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.21.2 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.29.3 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.64.3 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.67.1 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA mel.55.2 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA

Closely-Related Outgroup Species mau.5 TTTATTCGTA TTATTATTAC AATTTAATGA ATAATTCTAA TTATATGCGA sec.38 TTTATTCGTA TTATTATTAC AATTTAATGA ATAATTCTAA TTATATGCGA sim.33 TTTATTCGTA TTATTATTAC AATTTAATGA ATAATTCTAA TTATATGCGA

Outgroup Species yak.25 TTTATTCGTA TTATTATTAC ATTTTAATGA ATAATTCTAA TTATATGCAA

Distantly-related Outgroup Species luc.41 TTTATTCGTA TTATTATTAC ATTTTAATGA TTAATTCTAA TTATATGCGA eug.20 TTTATTCGTA TTATTATTAC ATTTTAATGA TTAATTCTAA TTATATGCGA fuy.9 TTTATTCGTA TTATTATTAC ATTTTAATGA TTAATTCTAA TTATATGCGA

"Concestor" 754 CTTGAATAAG CCCGC------CGA------ "Concestor2" CTTGAATAAG CCCGC------CGA------ mel.00.1 CTTGAATAAG CCCGC------CGA------ mel.01.58 CTTGAATAAG CCCGC------CGA------ mel.02.7 CTTGAATAAG CCCGC------CGA------ mel.04.1 CTTGAATAAG CCCGC------CGA------ mel.04.7 CTTGAATAAG CCCGC------CGA------ mel.05.13 CTTGAATAAG CCCGC------CGA------ mel.05.16 CTTGAATAAG CCCGC------CGA------ mel.07.20 CTTGAATAAG CCCGC------CGA------ mel.14.26 CTTGAATAAG CCCGC------CGA------ mel.17.66 CTTGAATAAG CCCGC------CGA------ mel.18Ug.7 CTTGAATAAG CCCGC------CGA------ mel.19.113 CTTGAATAAG CCCGC------CGA------ mel.23.74 CTTGAATAAG CCCGC------CGA------ mel.24.81 CTTGAATAAG CCCGC------CGA------ mel.26.32 CTTGAATAAG CCCGC------CGA------ mel.34.9 CTTGAATAAG CCCGC------CGA------ mel.39.37 CTTGAATAAG CCCGC------CGA------ mel.40Ug.12 CTTGAATAAG CCCGC------CGA------ mel.45.89 CTTGAATAAG CCCGC------CGA------ mel.CanS.41 CTTGAATAAG CCCGC------CGA------ mel.53.105 CTTGAATAAG CCCGC------CGA------ mel.54Ug.1 CTTGAATAAG CCCGC------CGA------ mel.59.3 CTTGAATAAG CCCGC------CGA------ mel.yw.50 CTTGAATAAG CCCGC------CGA------ mel.WI83.25 CTTGAATAAG CCCGC------CGA------ mel.51.100 CTTGAATAAG CCCGC------CGA------ mel.21.2 CTTGAATAAG CCCGC------CGA------ mel.29.3 CTTGAATAAG CCCGC------CGA------ mel.64.3 CTTGAATAAG CCCGC------CGA------ mel.67.1 CTTGAATAAG CCCGC------CGA------ mel.55.2 CTTGAATAAG CCCGC------CGA------

Closely-Related Outgroup Species mau.5 CTTGAATAAG GCCGC------AGA------ sec.38 CTTGAATAAG GCCGC------AGA------ sim.33 CTTGAATAAG GCCGC------AGA------

Outgroup Species yak.25 CTTGAATAGG GCCGCTGCCG CTGGCTGAGC GTAGA ------

Distantly-related Outgroup Species luc.41 CTTGAATAAG GCCGCTGAC------TGAGC GCAGA------ eug.20 CTTGAATAAG GCCGCTGCC------TGgct gagaccgacc agtcggttgg fuy.9 CTTGAATAAG GCCGCTGAC------TGAGC GaaaA------

Abd-B 7 "Concestor" 772 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG "Concestor2" ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.00.1 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.01.58 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.02.7 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.04.1 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.04.7 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.05.13 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.05.16 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.07.20 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.14.26 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.17.66 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.18Ug.7 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.19.113 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.23.74 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.24.81 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.26.32 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.34.9 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.39.37 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.40Ug.12 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.45.89 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.CanS.41 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.53.105 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.54Ug.1 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.59.3 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.yw.50 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.WI83.25 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.51.100 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.21.2 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.29.3 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.64.3 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.67.1 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG mel.55.2 ----TGCCAA TAAAAAG-CG GCGTGGCAAA GTGGAGTGGA C------TGG

Closely-Related Outgroup Species mau.5 ----AGCCAA TAAAAAGCCG GCGTGGCAAA GTGGAGTGGA T------TGG sec.38 ----AGCCAA TAAAAAGCCG GCGTGGCAAA GTGGAGTGGA T------TGG sim.33 ----AGCCAA TAAAAAGCCG GCGTGGCAAA GTGGAGTGGA T------TGG

Outgroup Species yak.25 ----AACCAA TAAAAAA-TG CCGGGGCAAA GTGGAG------TGGATTTC

Distantly-related Outgroup Species luc.41 ----GGCCCA TAAAAA---- -CGAGGCAAA GTGGAGTGGC TGTGGATTTT eug.20 CAGGAGCCAA TAAAAGTTGG CCGGGGCAAA GTGGA ------GTGGATTTT fuy.9 ----AGCCAA TAAAAAG-TG CCGAGGCAAA GTGGAGTGGA T------TTT

F Abd-B 8 "Concestor" 811 GGATGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT "Concestor2" GGATGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.00.1 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.01.58 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.02.7 GGATGTGTGG CGCCC----C TGCTTGTGGC ACATAAAAAT TGGCGCAAGT mel.04.1 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.04.7 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.05.13 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.05.16 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.07.20 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.14.26 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.17.66 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.18Ug.7 GGATGTGTGG CGCCC----C TGCTTGTGGC ACATAAAAAT TGGCGCAAGT mel.19.113 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.23.74 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.24.81 GGTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.26.32 GGTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.34.9 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.39.37 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.40Ug.12 GGTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.45.89 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.CanS.41 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.53.105 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.54Ug.1 GGATGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.59.3 GGATGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.yw.50 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.WI83.25 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.51.100 GGTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.21.2 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.29.3 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.64.3 GGATGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.67.1 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT mel.55.2 GTTTGTGTGG CGCCC----C TGCTAGTGGC ACATAAAAAT TGGCGCAAGT

Closely-Related Outgroup Species F mau.5 CGATGTGTGG CGCCC----C AGCTAGTGGC ACATAAAAAT TGGCGCAAGT sec.38 CGATGTGTGG CGCCC----C AGCTAGTGGC ACATAAAAAT TGGCGCAAGT sim.33 CGATGTGTGG CGCCC----C GGCTAGTGGC ACATAAAAAT TGGCGCAAGT

Outgroup Species yak.25 GGATGTGTGG CGCCT----C TGCTAGTGGC ACATAAAAAT TGGTGCAAGT

Distantly-related Outgroup Species luc.41 GGCCGTGTGG CGCCC----C TTGTGGTGCA ACATAAAAAT TGGCGCAAGT eug.20 GGCCGTGTGG CGCCCC-TGC CACTAGTGCC ACATAAAAGT TGGCGCAAGT fuy.9 GGCCGTGTGG CGCCCCGTGC TGCTAGTGGC ACATAAAAAT TGGCGCAAGT

Abd-B 9 G "Concestor" 857 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT "Concestor2" TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.00.1 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.01.58 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.02.7 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.04.1 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.04.7 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.05.13 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.05.16 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.07.20 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.14.26 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.17.66 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.18Ug.7 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.19.113 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.23.74 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.24.81 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTG mel.26.32 TAATTGTGGC AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTG mel.34.9 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.39.37 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.40Ug.12 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTG mel.45.89 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.CanS.41 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.53.105 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.54Ug.1 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTA mel.59.3 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTG mel.yw.50 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.WI83.25 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.51.100 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.21.2 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.29.3 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTG mel.64.3 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.67.1 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT mel.55.2 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT

Closely-Related Outgroup Species G mau.5 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT sec.38 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT sim.33 TAATTGTGGT AGTTATTTGC TG-TTTTGCC ATTTGGTCAT TTTACAATTT

Outgroup Species yak.25 TAATTGTGGT AGTTATTTGC TGTT-TTGCC ATTTGGTCAT TTTACAATTT

Distantly-related Outgroup Species luc.41 TAATTGTGGT AGTTATTTGC TGTTTTTGCC ATTTGGTCAT TTTACAATTT eug.20 TAATTGTGGT AGTTATTTGC TGTT-TTGCC ATTTGGTCAT TTTACAATTT fuy.9 TAATTGTGGT AGTTATTTGC TGTT-TTGCC ATTTGGCCGT TTTACAATTT

Abd-B 10 "Concestor" 906 TACCATTTCA GC------CACA ACTT------TTCGC "Concestor2" TACCATTTCA GC------CACA ACTT------TTCGC mel.00.1 TACCATTTCA GC------CACA ACTT------TTCGC mel.01.58 TACCATTTCA GC------CACA ACTT------TTCGC mel.02.7 TACCATTTCA GC------CACA ACTT------TTCGC mel.04.1 TACCATTTCA GC------CACA ACTT------TTCGC mel.04.7 TACCATTTCA GC------CACA ACTT------TTCGC mel.05.13 TACCATTTCA GC------CACA ACTT------TTCGC mel.05.16 TACCATTTCA GC------CACA ACTT------TTCGC mel.07.20 TACCATTTCA GC------CACA ACTT------TTCGC mel.14.26 TACCATTTCA GC------CACA ACTT------TTCGC mel.17.66 TACCATTTCA GC------CACA ACTT------TTCGC mel.18Ug.7 TACCATTTCA GC------CACA ACTT------TTCGC mel.19.113 TACCATTTCA GC------CACA ACTT------TTCGC mel.23.74 TACCATTTCA GC------CACA ACTT------TTCGC mel.24.81 TACCATTTCA GC------CACA ACTT------TTCGC mel.26.32 TACCATTTCA GC------CACA ACTT------TTCGC mel.34.9 TACCATTTCA GC------CACA ACTT------TTCGC mel.39.37 TACCATTTCA GC------CACA ACTT------TTCGC mel.40Ug.12 TACCATTTCA GC------CACA ACTT------TTCGC mel.45.89 TACCATTTCA GC------CACA ACTT------TTCGC mel.CanS.41 TACCATTTCA GC------CACA ACTT------TTCGC mel.53.105 TACCATTTCA GC------CACA ACTT------TTCGC mel.54Ug.1 TACCATTTCA GC------CACA ACTT------TTCGC mel.59.3 TACCATTTCA GC------CACA ACTT------TTCGC mel.yw.50 TACCATTTCA GC------CACA ACTT------TTCGC mel.WI83.25 TACCATTTCA GC------CACA ACTT------TTCGC mel.51.100 TACCATTTCA GC------CACA ACTT------TTCGC mel.21.2 TACCATTTCA GC------CACA ACTT------TTCGC mel.29.3 TACCATTTCA GC------CACA ACTT------TTCGC mel.64.3 TACCATTTCA GC------CACA ACTT------TTCGC mel.67.1 TACCATTTCA GC------CACA ACTT------TTCGC mel.55.2 TACCATTTCA GC------CACA ACTT------TTCGC

Closely-Related Outgroup Species mau.5 TACCccctct ccacc------ sec.38 TACCATTTCA GC------CACA ACTT------TTCGC sim.33 TACCATTTCA GC------CACA ACTT------TTCTC

Outgroup Species yak.25 TACCATTTCA CCAT------TTCAGCCACA ACTT------TTAGC

Distantly-related Outgroup Species luc.41 TACCATTTTA CCAT------TTGAGCCACA gcacaacttt tcgcaTTCGC eug.20 TACCATTTTC CATTCTGCCA TTCTGCCACA ACTT------TTTCC fuy.9 TACCATTct- GC------CACA ACTT------TTCGC

H I J Dsx2 Site "Concestor" 931 ACTGCTCCCC CCC------TC TCCCAGCACA ACAATGTTGC "Concestor2" ACTGCTCCCC CCC------TC TCCCAGCACA ACAATGTTGC mel.00.1 ACTGCTCCCC CCC------TT TCCCAGCACA ACAATGTTGC mel.01.58 ACTGCTCCCC CCCCC------TT TCCCAGCACA ACAATGTTGC mel.02.7 ACTGCTCCCC CCCCC------TC TCCCAGTACA ACAATGTTGC mel.04.1 ACTGCTCCCC CCC------TT TCCCAGCACA ACAATGTTGC mel.04.7 ACTGCTCCCC CCC------TT TCCCAGCACA ACAATGTTGC mel.05.13 ACTGCTCCCC CCC------TT TCCCAGCACA ACAATGTTGC mel.05.16 ACTGCTCCCC CCC------TT TCCCAGCACA ACAATGTTGC mel.07.20 ACTGCTCCCC CCC------TT TCCCAGCACA ACAATGTTGC mel.14.26 ACTGCTCCCC CCCCCCCCCC ------TC TCCCAGCACA ACAATGTTGC mel.17.66 ACTGCTCCCC CCCCCC------TC TCCCAGTACA ACAATGTTGC mel.18Ug.7 ACTGCTCCCC CCCCCCCC------TC TCCCAGCACA ACAATGTTGC mel.19.113 ACTGCTCCCC CCCC------TC TCCCAGTACA ACAATGTTGC mel.23.74 ACTGCTCCCC CCC------TT TCCCAGCACA ACAATGTTGC mel.24.81 ACTGCTCCCC CCCCC------TC TCCCAGCACA ACAATGTTGC mel.26.32 ACTGCTCCCC CCCC------TC TCCCAGTACA ACAATGTTGC mel.34.9 ACTGCTCCCC CCC------TT TCCCAGCACA ACAATGTTGC mel.39.37 ACTGCTCCCC CCC------TT TCCCAGCACA ACAATGTTGC mel.40Ug.12 ACTGCTCCCC CCC------TT TCCCAGCACA ACAATGTTGC mel.45.89 ACTGCTCCCC CCC------TT TCCCAGCACA ACAATGTTGC mel.CanS.41 ACTGCTCCCC CCC------TT TCCCAGCACA ACAATGTTGC mel.53.105 ACTGCTCCCC CCC------TT TCCCAGCACA ACAATGTTGC mel.54Ug.1 ACTGCTCCCC CCCCCCC------TC TCCCAGCACA ACAATGTTGC mel.59.3 ACTGCTCCCC CCCCCC------TC TCCCAGCACA ACAATGTTGC mel.yw.50 ACTGCTCCCC CCCCCC------TC TCCCAGCACA ACAATGTTGC mel.WI83.25 ACTGCTCCCC CCC------TT TCCCAGCACA ACAATGTTGC mel.51.100 ACTGCTCCCC CCC------TT TCCCAGCACA ACAATGTTGC mel.21.2 ACTGCTCCCC CCC------TT TCCCAGCACA ACAATGTTGC mel.29.3 ACTGCTCCCC CCC------TT TCCCAGCACA ACAATGTTGC mel.64.3 ACTGCTCCCC CCC------TC TCCCAGCACA ACAATGTTGC mel.67.1 ACTGCTCCCC CCC------TT TCCCAGCACA ACAATGTTGC mel.55.2 ACTGCTCCCC CCC------TT TCCCAGCACA ACAATGTTGC

Closely-Related Outgroup Species H I J mau.5 -----CCGCC CCTC------A ACCCAACGCA ACAATGTTGC sec.38 ACTGCCACCC CCTCTCCACC CCGCCCAACA ACCCAACGCA ACAATGTTGC sim.33 ACTGCCACCC CCTCTCCACC CCGCCCCTCA ACCCAACGCA ACAATGTTGC

Outgroup Species yak.25 ACTGCTCCC------TTCG---- -CCCGACGCA ACAATGTTGC luc.41 ATTGCTCCGT TT------GCAACATTG TTGCGTTTGC eug.20 AACTG------GTGCA ACATTGTTGC fuy.9 ATTGCTCCGC TTGCCTGGTG CAACAATGT- GC------CGCAGTCGC

Abd-B 11 K Abd-B 12 "Concestor" 960 GGCATTCTCG CAC-TTT-A- CGAGGCG--- TTTTTTT--A TATCACTTAC "Concestor2" GGCATTCTCG CAC-TTT-A- CGAGGCG--- TTTTTTT--A TATCACTTAC mel.00.1 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.01.58 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.02.7 GGCATTCTCG CAC-TTT-AG CGAGGCG-TT TTTTTTT--A TATCACTTAC mel.04.1 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.04.7 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.05.13 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.05.16 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.07.20 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.14.26 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.17.66 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.18Ug.7 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.19.113 GGCATTCTCG CAC-TTT-A- CGAGGCG-TT TTTTTTT--A TATCACTTAC mel.23.74 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.24.81 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.26.32 GGCATTCTCG CAC-TTT-A- CGAGGCG-TT TTTTTTT--A TATCACTTAC mel.34.9 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.39.37 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.40Ug.12 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.45.89 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.CanS.41 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.53.105 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.54Ug.1 GGCATTCTCG CAC-TTT-A- CGAGGCG--T TTTTTTT--A TATCACTTAC mel.59.3 GGCATTCTCG CAC-TTT-A- CGAGGCG-TT TTTTTTT--A TATCACTTAC mel.yw.50 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.WI83.25 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.51.100 GGCATTCTCG CAC-TTT-A- CGAGGCG-TT TTTTTTT--A TATCACTTAC mel.21.2 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.29.3 GGCATTCTCG CAC-TTT-A- CGAGGCG--T TTTTTTT--A TATCACTTAC mel.64.3 GGCATTCTCG CAC-TTT-A- CGAGGCG--T TTTTTTT--A TATCACTTAC mel.67.1 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC mel.55.2 GGCATTCTCG CAC-TTT-A- CGAGGCGTTT TTTTTTT--A TATCACTTAC

Closely-Related Outgroup Species K mau.5 GGCATTCTCG CAC-TTT-A- CGAGGCG--- -TTTTTT--A TATCACTTAC sec.38 GGCATTCTCG CAC-TTT-A- CGAGGCG--- -TTTTTT--A TATCACTTAC sim.33 GGCATTCTCG CAC-TTT-A- CGAGGCG--- -TTTTTT--A TATCACTTAC

Outgroup Species yak.25 GGCATTCTCG CAC-TTT-A- CGAGGCG--- -TTTTTT--A TATCAC---- luc.41 TGCATTCTCG CAC-TTTTA- CGAGGCG--- -TTTTTT--A TATCAC---- eug.20 CGCATTCTCG CAC-TTTTA- CGAGGCG--- --TTTTT--A TATCAC---- fuy.9 TGCATTCTCG CACtTTT-A- CGAGGCG--- -TTTTTT--A TATCAC----

Abd-B 13 L "Concestor" 1002 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA "Concestor2" TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.00.1 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.01.58 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.02.7 TTTACTTAGT TGATTGAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.04.1 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.04.7 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.05.13 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.05.16 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.07.20 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.14.26 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.17.66 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.18Ug.7 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.19.113 TTTACTTAGT TGATTGAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.23.74 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.24.81 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.26.32 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.34.9 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.39.37 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.40Ug.12 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.45.89 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.CanS.41 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.53.105 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.54Ug.1 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.59.3 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.yw.50 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.WI83.25 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.51.100 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.21.2 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.29.3 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.64.3 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.67.1 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA mel.55.2 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA CATGCTTAGA

Closely-Related Outgroup Species L mau.5 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA --TGCTTAGA sec.38 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA --TGCTTAGA sim.33 TTTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA --TGCTTAGA

Outgroup Species yak.25 -TTACTTAGT TGATTAAGGG CGTTGCCGAT GGGCCAGATA --TGCTTAGA luc.41 -TTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCCAGATA --TGTTTAGA eug.20 -TTACTTAGT TGATTAAGGG CGTGGCCGAT GGGCGAGATA --TGTTTAGA fuy.9 -TTACTTAGT TGATTAAGGG CGTGGCCGAT GGGAAAGATA --TGTTTAGA

M "Concestor" 1052 TTTGCTCC------AG----C AGTGGGCTGC "Concestor2" TTTGCTCC------AG----C AGTGGGCTGC mel.00.1 TTTGCTCC------AG----C AGTGGGCTGC mel.01.58 TTTGCTCC------AG----C AGTGGGCTGC mel.02.7 TTTGCTCC------AG----C AGTGGGCTGC mel.04.1 TTTGCTCC------AG----C AGTGGGCTGC mel.04.7 TTTGCTCC------AG----C AGTGGGCTGC mel.05.13 TTTGCTCC------AG----C AGTGGGCTGC mel.05.16 TTTGCTCC------AG----C AGTGGGCTGC mel.07.20 TTTGCTCC------AG----C AGTGGGCTGC mel.14.26 TTTGCTCC------AG----C AGTGGGCTGC mel.17.66 TTTGCTCC------AG----C AGTGGGCTGC mel.18Ug.7 TTTGCTCC------AG----C AGTGGGCTGC mel.19.113 TTTGCTCC------AG----C AGTGGGCTGC mel.23.74 TTTGCTCC------AG----C AGTGGGCTGC mel.24.81 TTTGCTCC------AG----C AGTGGGCTGC mel.26.32 TTTGCTCC------AG----C AGTGGGCTGC mel.34.9 TTTGCTCC------AG----C AGTGGGCTGC mel.39.37 TTTGCTCC------AG----C AGTGGGCTGC mel.40Ug.12 TTTGCTCC------AG----C AGTGGGCTGC mel.45.89 TTTGCTCC------AG----C AGTGGGCTGC mel.CanS.41 TTTGCTCC------AG----C AGTGGGCTGC mel.53.105 TTTGCTCC------AG----C AGTGGGCTGC mel.54Ug.1 TTTGCTCC------AG----C AGTGGTCTGC mel.59.3 TTTGCTCC------AG----C AGTGGGCTGC mel.yw.50 TTTGCTCC------AG----C AGTGGGCTGC mel.WI83.25 TTTGCTCC------AG----C AGTGGGCTGC mel.51.100 TTTGCTCC------AG----C AGTGGGCTGC mel.21.2 TTTGCTCC------AG----C AGTGGGCTGC mel.29.3 TTTGCTCC------AG----C AGTGGGCTGC mel.64.3 TTTGCTCC------AG----C AGTGGGCTGC mel.67.1 TTTGCTCC------AG----C AGTGGGCTGC mel.55.2 TTTGCTCC------AG----C AGTGGGCTGC

Closely-Related Outgroup Species mau.5 TTTGCTCTAG C------sec.38 TTTGCTGTA------GC sim.33 TTTGCTCTA------GC

Outgroup Species M yak.25 TTTGGTCTAT GTATCCccgt ------G----G AGTGGGCTGT luc.41 TTTGCTCTTT GTATtaggtt tcaggcttca gTCAG ----G AGTGGGCTGC eug.20 TTTGCTCTTT GTATCCcggc attccttcag gttttccccG AGTGGGCTGC fuy.9 TTTGCTCTTT GTATCCggac attcta---- -TCAG----G AGTGGGTTGC

Abd-B 14 "Concestor" 1073 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------"Concestor2" ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.00.1 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.01.58 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.02.7 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.04.1 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.04.7 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.05.13 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.05.16 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.07.20 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.14.26 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.17.66 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.18Ug.7 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.19.113 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.23.74 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.24.81 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.26.32 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.34.9 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.39.37 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.40Ug.12 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.45.89 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.CanS.41 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.53.105 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.54Ug.1 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.59.3 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.yw.50 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.WI83.25 ATTTTACGAC CCTCA------AAACCCGA TCCAAAT------mel.51.100 ATTTTACGAC CCTCA------AAACCCGA-TCCAAAT------mel.21.2 ATTTTACGAC CCTCA------AAACCCGA-TCCAAAT------mel.29.3 ATTTTACGAC CCTCA------AAACCCGA-TCCAAAT------mel.64.3 ATTTTACGAC CCTCA------AAACCCGA-TCCAAAT------mel.67.1 ATTTTACGAC CCTCA------AAACCCGA-TCCAAAT------mel.55.2 ATTTTACGAC CCTCA------AAACCCGA-TCCAAAT------

Closely-Related Outgroup Species mau.5 GTTTTACGAC CCTCA------AAACCCGA ACCAAAT------sec.38 GTTTTACGAC CCTCA------AAACCCGA TCCAAAT------sim.33 GTTTTACGAC CCTCA------AAACCCGA TCCAAAT------

Outgroup Species yak.25 GTTTTACGAC CCTCA------AAACCCGA TCGAAACGGA AAGAGAc--- luc.41 ATTTTACGAC CCTCCGAAAC CCAAACCCGA TCGAAAC------eug.20 ATTTTACGAC CCTCC------AAGCCGA TCGAAACGGA AACtctgacg fuy.9 ATTTTACGAC CCGCC------AAAGCCGA TCAAAACGGA AACAGAag--

N "Concestor" 1103 -GGAAAAT------ATGA AAATATGGC------TAAT CCGCTTATGA "Concestor2" -GGAAAAT------ATGA AAATATGGC------TAAT CCGCTTATGA mel.00.1 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.01.58 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.02.7 -GGATAAT------ATGA AAATATCGC------TAAT CCGCTTATGA mel.04.1 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.04.7 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.05.13 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.05.16 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.07.20 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.14.26 -GGAAAAT------ATGA AAATACGGG------TAAT CCGCTTATGA mel.17.66 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.18Ug.7 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.19.113 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.23.74 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.24.81 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.26.32 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.34.9 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.39.37 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.40Ug.12 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.45.89 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.CanS.41 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.53.105 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.54Ug.1 -GGAAAAT------ATGA AAATATGGC------TAAT CCGCTTATGA mel.59.3 -GGAAAAT------ATGA AAATATGGC------TAAT CCGCTTATGA mel.yw.50 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.WI83.25 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.51.100 -GGAAAAT------ATGA AAATATGGC------TAAT CCGCTTATGA mel.21.2 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.29.3 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.64.3 -GGAAAAT------ATGA AAATATGGC------TAAT CCGCTTATGA mel.67.1 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA mel.55.2 -GGAAAAT------ATGA AAATACGGC------TAAT CCGCTTATGA

Closely-Related Outgroup Species N mau.5 -AGAAAAT------ATGA A---ATGGC------TAAT CCGCTTATGA sec.38 -GGAAAAT------ATGA A---ATGGC------TAAT CCGCTTATGA sim.33 -GGAAAAT------ATGA A---ATGGC------TAAT CCGCTTATGA

Outgroup Species yak.25 ----AAAT------ATGA AA---TGGT------TAAT CCGCTTATGG luc.41 -GGAAACGAA gaaaacATGA AA---TGGG------TAAT CCGCTTATGG eug.20 gGGAAAAT------ATGA AA---TGGTa atcggcttat gggtgctaag fuy.9 ----AAAT------ATGA AA---TGGCc tctgggTAAT CCGCTTATGG

20 21 "Concestor" 1137 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCT "Concestor2" GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCT mel.00.1 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA mel.01.58 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA mel.02.7 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCT mel.04.1 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA mel.04.7 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA mel.05.13 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA mel.05.16 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA mel.07.20 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA mel.14.26 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA mel.17.66 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA mel.18Ug.7 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT TCGATCGCCT mel.19.113 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA mel.23.74 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA mel.24.81 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCT mel.26.32 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCT mel.34.9 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA mel.39.37 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA mel.40Ug.12 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA mel.45.89 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCT mel.CanS.41 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA mel.53.105 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCAATCGCCA mel.54Ug.1 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT TCGATCGCCT mel.59.3 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCT mel.yw.50 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA mel.WI83.25 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA mel.51.100 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCT mel.21.2 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA mel.29.3 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT TCGATCGCCT mel.64.3 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCT mel.67.1 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA mel.55.2 GCACAACAAA TTGGTTCACA CACTTCGATC GAAATTACTT GCGATCGCCA

Closely-Related Outgroup Species mau.5 GCACAACAAA ATGTTACACA CACTCCGATC GAAATTACTT GCAATCGCAT sec.38 GCACAACAAA ctcaaacac------sim.33 GCACAACAAA ATGTTACACA CACTCCGATC GAAATTACTT GTGATCGCAT

Outgroup Species yak.25 TAACAACAAA ATGTT----A CACTTCGATC GAAATCACTT GCGATCGCAT luc.41 GTAacgaaca gtatacagat ctgaattact aagatcgtaa tttggt---T eug.20 agttctaaga cctttcgatt tggaacttga gtttactatc aaatttgggg fuy.9 TTATAACAAT ATTatagaaa tttcgataag atttgaaata agatcttttt

22 23 24 25 "Concestor" 1187 T------TTG ATTGGTTTCA GTGTATTGCT TTAACTAGCA GGTGAACACT "Concestor2" T------TTG ATTGGTTTCA GTGTATTGCT TTAACTAGCA GGTGAACACT mel.00.1 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT mel.01.58 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT mel.02.7 T------TTG ATTGCTTTCA GTGTATTGCT TTAACTGGCA GGTGAACACT mel.04.1 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT mel.04.7 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT mel.05.13 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT mel.05.16 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT mel.07.20 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT mel.14.26 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT mel.17.66 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT mel.18Ug.7 T------TTG ATTGCTTTCA GTGTATTGCT TTAACTGGCA GGTGAACACT mel.19.113 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT mel.23.74 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT mel.24.81 T------TTG ATTGCTTTCA GTGCATTGCT TTAACTGGCA GGTGAACACT mel.26.32 T------TTG ATTGCTTTCA GTGTATTGCT TTAACTAGCA GGTGAACACT mel.34.9 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT mel.39.37 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT mel.40Ug.12 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT mel.45.89 T------TTG ATTGCTTTCA GTGTATTGCT TTAACTAGCA GGTGAACACT mel.CanS.41 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT mel.53.105 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT mel.54Ug.1 T------TTG ATTGCTTTCA GTGTATTGCT TTAACTGGCA GGTGAACACT mel.59.3 T------TTG ATTGCTTTCA GTGTATTGCT TTAACTGGCA GGTGAACACT mel.yw.50 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT mel.WI83.25 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT mel.51.100 T------TTG ATTGCTTTCA GTGTATTGCT TTAACTGGCA GGTGAACACT mel.21.2 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT mel.29.3 T------TTG ATTGCTTTCA GTGTATTGCT TTAACTGGCA GGTGAACACT mel.64.3 T------TTG ATTGGTTTCA GTGTATTGCT TTAACTAGCA GGTGAACACT mel.67.1 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GATGAACACT mel.55.2 T------TTG ATTGGTTTCA ATGTATTGCT TTAACTGGCA GGTGAACACT BstXI site

Closely-Related Outgroup Species * * * mau.5 T------TTA ATGTT------TTACT TTAACTAGCA GGTGAACACT sec.38 ------AGCA GGTAAACTCT sim.33 T------TTA AGTGGTTTCA GTGTATT-CT TTAACTAGCA GATGAACACT

Outgroup Species yak.25 T------TTA AatggcTTTA GTTTATCGCT TCAACTAGCA GGTGAAAATC luc.41 T------TTG AAATGTTTTA ATTTATTCCT TTCttaataa aggtaatttt eug.20 ctgaatgggt ttgtaattta gtgtttatta cttgactgaa tcggatctat fuy.9 ttgactaTTA AATGGTTTCA ATCCACTGCC TTttgataat ggttaatttg

26 27 "Concestor" 1231 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT "Concestor2" TTGTTT------TT TATCTAACGA TTCTTACTAT TTATTATCCT mel.00.1 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.01.58 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.02.7 TTGTTT------TT TATCTAACGA TTCTTACTAT TTATTATCCT mel.04.1 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.04.7 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.05.13 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.05.16 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.07.20 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.14.26 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.17.66 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.18Ug.7 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.19.113 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.23.74 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.24.81 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.26.32 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.34.9 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.39.37 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.40Ug.12 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.45.89 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.CanS.41 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.53.105 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.54Ug.1 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.59.3 TTGTTT------TT TATCTAACGA TTCTTACTAT TTATTATCCT mel.yw.50 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.WI83.25 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.51.100 TTGTTT------TT TATCTAACGA TTCTTACTAG TTATTATCCT mel.21.2 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.29.3 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.64.3 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.67.1 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT mel.55.2 TTGTTT------TT TATCTAACGA TTCTTACTAT TTAATATCCT

Closely-Related Outgroup Species mau.5 TTGTTGttac cttactaTTT ACC-TTACTA TTCTTACTAT TTGTTGTCCT sec.38 TTGTt------TTT ACC-TTAATA TTCTTACTAT TTGTTGTCCT sim.33 TGGTTG------TT ACC-TTACTA TTCTTACTAT TTGTTGTCCT

Outgroup Species yak.25 GGGTTT------TG AGT-TAACGA TTTGTACCAG TTATTGGCCA luc.41 tagttgacat ttttagacca atacttcatt ataatattgg tccagaattt eug.20 agggaaatat taagagttct gccttgaagg ttcctgaaaa tgaatctaaa fuy.9 gtttaaaatt gtaaatgttc agtttccgat taatttctat cagcccattt

28 29 30 31 "Concestor" 1269 AGTCAATTAA TGTATTTTCC AGTACTTCCA TCGATATCAC AGAGTTCCCA "Concestor2" AGTCAATTAA TGTATTTTCC AGTACTTCTA TCGATATCTC AGAGTTCCCA mel.00.1 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGAGTTCCCA mel.01.58 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGAGTTCCCA mel.02.7 AGTCAATTAA TGTATTTTCC AGTACTTCCA TCGATATCTC AGAGTTCCCA mel.04.1 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGAGTTCCCA mel.04.7 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGAGTTCCCA mel.05.13 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCTC AGAGTTCCCA mel.05.16 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCTC AGAGTTCCCA mel.07.20 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGAGTTCCCA mel.14.26 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGACTTCCCA mel.17.66 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGAGTTCCCA mel.18Ug.7 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCTC AGAGTTCCCA mel.19.113 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGAGTTCCCA mel.23.74 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGAGTTCCCA mel.24.81 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGAGTTTCCA mel.26.32 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCTC AGAGTTCCCA mel.34.9 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGAGTTCCCA mel.39.37 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGAGTTCCCA mel.40Ug.12 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCTC AGAGTTCCCA mel.45.89 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCTC AGAGTTCCCA mel.CanS.41 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGAGTTCCCA mel.53.105 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGAGTTCCCA mel.54Ug.1 AGTCAATTAA TGTATTTTCC AGTACTTCTA TCGATATCTC AGAGTTCCCA mel.59.3 AGTCAATTAA TATATTTTCC AGTACTTCTA TCGATATCTC AGAGTTCCCA mel.yw.50 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGAGTTCCCA mel.WI83.25 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGAGTTCCCA mel.51.100 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCTC AGAGTTCCCA mel.21.2 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGAGTTCCCA mel.29.3 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGAGTTCCCA mel.64.3 AGTCAATTAA TGTATTTTCC AGTACTTCCA TCGATATCAC AGAGTTCCCA mel.67.1 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGAGTTCCCA mel.55.2 AGTCAATTAA TGTATTTTCC ACTACTTCCA TCGATATCAC AGAGTTCCCA

Closely-Related Outgroup Species mau.5 AGTCAATTAA TGTATTTTCC AGTACTTCTA TCGATATTCC CAACCCCTAG sec.38 AGTCAATTAA TGTATTTTCC AGTGCTTTCA TCGATATTCC CAACCCATAC sim.33 AGTCAATTAA TGTATTTTCC GGTACTTCCA TCGATATTCC CAACCCATAC

Outgroup Species yak.25 AGTCCCCTGA AGTATTTTCC AGTACTTCTC CGttttccga ccatcgacca luc.41 gtaaagtttc ccaaataaat gaaatatgaa ttatttaaac atttttctgc eug.20 cttgctgctg taagtttatg caaaatatga agtcaaggga gttctcccta fuy.9 tataagaaag tttttttttg ctatttgcct ttttgttttt caaattaatt

"Concestor" 1319 TT------"Concestor2" TT------mel.00.1 TT------mel.01.58 TT------mel.02.7 TT------mel.04.1 TT------mel.04.7 TT------mel.05.13 TT------mel.05.16 TT------mel.07.20 TT------mel.14.26 TT------mel.17.66 TT------mel.18Ug.7 TT------mel.19.113 TT------mel.23.74 TT------mel.24.81 TT------mel.26.32 TT------mel.34.9 TT------mel.39.37 TT------mel.40Ug.12 TT------mel.45.89 TT------mel.CanS.41 TT------mel.53.105 TT------mel.54Ug.1 TT------mel.59.3 TT------mel.yw.50 TT------mel.51.100 TT------mel.21.2 TT------mel.29.3 TT------mel.64.3 TT------mel.67.1 TT------mel.55.2 TT------

Closely-Related Outgroup Species mau.5 ------sec.38 ------sim.33 ------

Outgroup Species yak.25 TTtcagag------luc.41 catccggtat tctataaaat atttaattct tatctgaaat attttagaaa eug.20 Taagttcttc tcaattagtt catgtctctg taagccttaa agtgaa---- fuy.9 Tctaggatat attaaacatt ctgtattttt aaccattctt tcaataaata

"Concestor" 1321 ------"Concestor2" ------mel.00.1 ------mel.01.58 ------mel.02.7 ------mel.04.1 ------mel.04.7 ------mel.05.13 ------mel.05.16 ------mel.07.20 ------mel.14.26 ------mel.17.66 ------mel.18Ug.7 ------mel.19.113 ------mel.23.74 ------mel.24.81 ------mel.26.32 ------mel.34.9 ------mel.39.37 ------mel.40Ug.12 ------mel.45.89 ------mel.CanS.41 ------mel.53.105 ------mel.54Ug.1 ------mel.59.3 ------mel.yw.50 ------mel.WI83.25 ------mel.51.100 ------mel.21.2 ------mel.29.3 ------mel.64.3 ------mel.67.1 ------mel.55.2 ------

Closely-Related Outgroup Species mau.5 ------sec.38 ------sim.33 ------

Outgroup Species yak.25 ------luc.41 ttcaattata aaggtactat tttataaagt taatattcaa cataagataa eug.20 ------fuy.9 cttttcttta tagactaatt tatttaaaaa acaaaacatg ttgtatataa

"Concestor" 1321 ------"Concestor2" ------mel.00.1 ------mel.01.58 ------mel.02.7 ------mel.04.1 ------mel.04.7 ------mel.05.13 ------mel.05.16 ------mel.07.20 ------mel.14.26 ------mel.17.66 ------mel.18Ug.7 ------mel.19.113 ------mel.23.74 ------mel.24.81 ------mel.26.32 ------mel.34.9 ------mel.39.37 ------mel.40Ug.12 ------mel.45.89 ------mel.CanS.41 ------mel.53.105 ------mel.54Ug.1 ------mel.59.3 ------mel.yw.50 ------mel.WI83.25 ------mel.51.100 ------mel.21.2 ------mel.29.3 ------mel.64.3 ------mel.67.1 ------mel.55.2 ------

Closely-Related Outgroup Species mau.5 ------sec.38 ------sim.33 ------

Outgroup Species yak.25 ------luc.41 agtaataaat tgaatttcgt agattcatac tttactataa aatctattta eug.20 ------fuy.9 aaaacttcaa tttaattttg atattttaat caatacttaa aaaccctgaa

"Concestor" 1321 ------"Concestor2" ------mel.00.1 ------mel.01.58 ------mel.02.7 ------mel.04.1 ------mel.04.7 ------mel.05.13 ------mel.05.16 ------mel.07.20 ------mel.14.26 ------mel.17.66 ------mel.18Ug.7 ------mel.19.113 ------mel.23.74 ------mel.24.81 ------mel.26.32 ------mel.34.9 ------mel.39.37 ------mel.40Ug.12 ------mel.45.89 ------mel.CanS.41 ------mel.53.105 ------mel.54Ug.1 ------mel.59.3 ------mel.yw.50 ------mel.WI83.25 ------mel.51.100 ------mel.21.2 ------mel.29.3 ------mel.64.3 ------mel.67.1 ------mel.55.2 ------

Closely-Related Outgroup Species mau.5 ------sec.38 ------sim.33 ------

Outgroup Species yak.25 ------luc.41 tacgacctga aattctttta attttccgaa atacattaaa tattaattaa eug.20 ------ATATATAT fuy.9 agcatgtttc aattaatttg taacttagta ctacatgaag tgATGTAAAG

"Concestor" 1321 ------"Concestor2" ------mel.00.1 ------mel.01.58 ------mel.02.7 ------mel.04.1 ------mel.04.7 ------mel.05.13 ------mel.05.16 ------mel.07.20 ------mel.14.26 ------mel.17.66 ------mel.18Ug.7 ------mel.19.113 ------mel.23.74 ------mel.24.81 ------mel.26.32 ------mel.34.9 ------mel.39.37 ------mel.40Ug.12 ------mel.45.89 ------mel.CanS.41 ------mel.53.105 ------mel.54Ug.1 ------mel.59.3 ------mel.yw.50 ------mel.WI83.25 ------mel.51.100 ------mel.21.2 ------mel.29.3 ------mel.64.3 ------mel.67.1 ------mel.55.2 ------

Closely-Related Outgroup Species mau.5 ------sec.38 ------sim.33 ------

Outgroup Species yak.25 ------luc.41 tatacctatt atttttcttc acttctgtgt tctattaaat attaaattgt eug.20 ATCTATCCC------fuy.9 ATATGTCCG------

"Concestor" 1321 ------"Concestor2" ------mel.00.1 ------mel.01.58 ------mel.02.7 ------mel.04.1 ------mel.04.7 ------mel.05.13 ------mel.05.16 ------mel.07.20 ------mel.14.26 ------mel.17.66 ------mel.18Ug.7 ------mel.19.113 ------mel.23.74 ------mel.24.81 ------mel.26.32 ------mel.34.9 ------mel.39.37 ------mel.40Ug.12 ------mel.45.89 ------mel.CanS.41 ------mel.53.105 ------mel.54Ug.1 ------mel.59.3 ------mel.yw.50 ------mel.WI83.25 ------mel.51.100 ------mel.21.2 ------mel.29.3 ------mel.64.3 ------mel.67.1 ------mel.55.2 ------

Closely-Related Outgroup Species mau.5 ------sec.38 ------sim.33 ------

Outgroup Species yak.25 ------luc.41 tgtctaaaac attgaaaaaa ttcaatacgt taggggtaat ttttaataaa eug.20 ------fuy.9 ------

"Concestor" 1321 ------"Concestor2" ------mel.00.1 ------mel.01.58 ------mel.02.7 ------mel.04.1 ------mel.04.7 ------mel.05.13 ------mel.05.16 ------mel.07.20 ------mel.14.26 ------mel.17.66 ------mel.18Ug.7 ------mel.19.113 ------mel.23.74 ------mel.24.81 ------mel.26.32 ------mel.34.9 ------mel.39.37 ------mel.40Ug.12 ------mel.45.89 ------mel.CanS.41 ------mel.53.105 ------mel.54Ug.1 ------mel.59.3 ------mel.yw.50 ------mel.WI83.25 ------mel.51.100 ------mel.21.2 ------mel.29.3 ------mel.64.3 ------mel.67.1 ------mel.55.2 ------

Closely-Related Outgroup Species mau.5 ------sec.38 ------sim.33 ------

Outgroup Species yak.25 ------luc.41 ggtaatttaa aataagagtt taagtacatt ttttttaaag cactaaatat eug.20 ------fuy.9 ------

"Concestor" 1321 ------"Concestor2" 1321 ------mel.00.1 ------mel.01.58 ------mel.02.7 ------mel.04.1 ------mel.04.7 ------mel.05.13 ------mel.05.16 ------mel.07.20 ------mel.14.26 ------mel.17.66 ------mel.18Ug.7 ------mel.19.113 ------mel.23.74 ------mel.24.81 ------mel.26.32 ------mel.34.9 ------mel.39.37 ------mel.40Ug.12 ------mel.45.89 ------mel.CanS.41 ------mel.53.105 ------mel.54Ug.1 ------mel.59.3 ------mel.yw.50 ------mel.WI83.25 ------mel.51.100 ------mel.21.2 ------mel.29.3 ------mel.64.3 ------mel.67.1 ------mel.55.2 ------

Closely-Related Outgroup Species mau.5 ------sec.38 ------sim.33 ------

Outgroup Species yak.25 ------luc.41 atagttgctt acaaaagttt gtaaaaaatc ctttaatttt cccattaatt eug.20 ------fuy.9 ------

"Concestor" 1321 ------"Concestor2" ------mel.00.1 ------mel.01.58 ------mel.02.7 ------mel.04.1 ------mel.04.7 ------mel.05.13 ------mel.05.16 ------mel.07.20 ------mel.14.26 ------mel.17.66 ------mel.18Ug.7 ------mel.19.113 ------mel.23.74 ------mel.24.81 ------mel.26.32 ------mel.34.9 ------mel.39.37 ------mel.40Ug.12 ------mel.45.89 ------mel.CanS.41 ------mel.53.105 ------mel.54Ug.1 ------mel.59.3 ------mel.yw.50 ------mel.WI83.25 ------mel.51.100 ------mel.21.2 ------mel.29.3 ------mel.64.3 ------mel.67.1 ------mel.55.2 ------

Closely-Related Outgroup Species mau.5 ------sec.38 ------sim.33 ------

Outgroup Species yak.25 ------luc.41 cattactttg cagaccttga cctaaaataa atgtatcatc cCATCGATTT eug.20 ------CATTGATTT fuy.9 ------CATCGATTT

"Concestor" 1321 ------TCGCAA AGTCACATAT "Concestor2" ------TCGCAA AGTCACATAT mel.00.1 ------TCGCAA AGTCACATAT mel.01.58 ------TCGCAA AGTCACATAT mel.02.7 ------TCGCAA AGTCACACAT mel.04.1 ------TCGCAA AGTCACATAT mel.04.7 ------TCGCAA AGTCACATAT mel.05.13 ------TCGCAA AGTCACATAT mel.05.16 ------TCGCAA AGTCACATAT mel.07.20 ------TCGCAA AGTCACATAT mel.14.26 ------TCGCAA AGTCACATAT mel.17.66 ------TCGCAA AGTCACATAT mel.18Ug.7 ------TCGCAA AGTCACATAT mel.19.113 ------TCGCAA AGTCACATAT mel.23.74 ------TCGCAA AGTCACATAT mel.24.81 ------TCGCAA AGTCACATAT mel.26.32 ------TCGCAA AGTCACATAT mel.34.9 ------TCGCAA AGTCACATAT mel.39.37 ------TCGCAA AGTCACATAT mel.40Ug.12 ------TCGCAA AGTCACATAT mel.45.89 ------TCGCAA AGTCACATAT mel.CanS.41 ------TCGCAA AGTCACATAT mel.53.105 ------TCGCAA AGTCACATAT mel.54Ug.1 ------TCGCAA AGTCACATAT mel.59.3 ------TCGCAA AGTCACATAT mel.yw.50 ------TCGCAA AGTCACATAT mel.WI83.25 ------TCGCAA AGTCACATAT mel.51.100 ------TCGCAA AGTCACATAT mel.21.2 ------TCGCAA AGTCACATAT mel.29.3 ------TCGCAA AGTCACATAT mel.64.3 ------TCGCAA AGTCACATAT mel.67.1 ------TCGCAA AGTCACATAT mel.55.2 ------TCGCAA AGTCACATAT

Closely-Related Outgroup Species mau.5 ------TCGCAA AGTCACATAT sec.38 ------TCCCAA AGTCACATAT sim.33 ------TCCCAA AGTCACATAT

Outgroup Species yak.25 ------TC CCCCTTTCCC AATGTCCCAA TGTCGCATAT luc.41 TCCAGAGCCC CGTTTttcAG CCCGTTCCCC TCGAGTCCCT TAACCCATGT eug.20 TCCtctcttc agcccctttg ccacccctga actccctt-- AGCCTCGTGT fuy.9 TCCAGAAACC CCTTTcatcc CCTTTCCCTT GAC-ACCCCA AGCCTCATGT

32 "Concestor" 1337 TTGTTCTTTT ATAACGTGAA ----CGCGT- ----ACC--- GCGAAGGCCC "Concestor2" TTGTTCTTTT ATAACGTGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.00.1 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.01.58 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.02.7 TTGTTCTTTT ATAACGTGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.04.1 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.04.7 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.05.13 TTGTTCTTTT ATAACGGGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.05.16 TTGTTCTTTT ATAACGGGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.07.20 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.14.26 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.17.66 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.18Ug.7 TTGTTCTTTT ATAACGTGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.19.113 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.23.74 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.24.81 TTGTTCTTTT ATAACGTGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.26.32 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.34.9 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.39.37 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.40Ug.12 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.45.89 TTGTTCTTTT ATAACGTGAA ----CGCGT- ----ACC--- GCAAAGGCCC mel.CanS.41 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.53.105 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.54Ug.1 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.59.3 TTGTTCTTTT ATAACGTGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.yw.50 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.WI83.25 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.51.100 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.21.2 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.29.3 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.64.3 TTGTTCTTTT ATAACGTGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.67.1 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC mel.55.2 TTGTTCTTTT ATAACATGAA ----CGCGT- ----ACC--- GCGAAGGCCC

Closely-Related Outgroup Species mau.5 TTGTTCTTTT ATAACGCGAA ----CGCGT- ----ACC--- GCGAAGGTCC sec.38 TTGTTCTTTT ATAACGTGAA ----CGAGT- ----ACC--- GCGAAGGCCC sim.33 TTGTTCTTTT ATAACGTGAA ----CGAGT- ----ACCgcg GCGAAGGTCC

Outgroup Species yak.25 TTGTTCTTTT ATAACGCGAA ----CGCGT- ----ACC--- GCGAAGGCCC luc.41 TTGTTCTTTT ATAACGTCAA ----CGCGTC GCGAACC--- GAGAAGGCCT eug.20 TTGTTCTTTT ACAACGTCAA ----CGCGcC GCGCACC--- GAGGAGGCCC fuy.9 TTGTTATTTT ACAACGTCAc acgtCGCGA- ----ACC--- GAGAAGGTCT

33 "Concestor" 1375 CATAAAGTGT TCGTAATAAA --ATATATTG TGCAATAGTT ATAC------ "Concestor2" CATAAAGTGT TCGTAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.00.1 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.01.58 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.02.7 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.04.1 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.04.7 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.05.13 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.05.16 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.07.20 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.14.26 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.17.66 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.18Ug.7 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.19.113 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.23.74 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.24.81 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.26.32 CATAGAGTGT TCGTAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.34.9 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.39.37 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.40Ug.12 CATAAAGTGT TCGTAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.45.89 CATAAAGTGT TCGTAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.CanS.41 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.53.105 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.54Ug.1 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.59.3 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.yw.50 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.WI83.25 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.51.100 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.21.2 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.29.3 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.64.3 CATAAAGTGT TCGTAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.67.1 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------ mel.55.2 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATAGTT ATAC------

Closely-Related Outgroup Species mau.5 CATAAAGTGT TCGTAATAAA --ATATATTG TGCAATATTT GTGCTATAGT sec.38 CATAAAGTGT TCGTAATAAA --ATATATTG TGCAATATTT GTGCTATAGT sim.33 CATAAAGTGT TCGTAATAAA --ATATATTG TGCAATATTT GTGCTATAGT

Outgroup Species yak.25 CATAAAGTGT TCGTAATAAA --ATATATTG TGCAATATTT GTGCTATAGT luc.41 CATAAAGTGT TCCTAATAAA atATATATTG TGCAATATTT T-GCTATAGT eug.20 CATAAAGTGT TCGCAATAAA --ATATATTG TGCAATATTT A-GCTATAGT fuy.9 CATAAAGTGT TCGTAATAAA --ATATATTG TACAATATTT G-GCTATAGT

34 35 36 37 "Concestor" 1417 -----AGCCA CTCATATACA TTATATACAA TATATATAT------GTG "Concestor2" -----AGCCA CTCATATACA TTATATACAA TATATATAT------GTG mel.00.1 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.01.58 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.02.7 -----AGCCA CTCATATACA TTATATACAA TATATATAT------GTG mel.04.1 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.04.7 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.05.13 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.05.16 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.07.20 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.14.26 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.17.66 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.18Ug.7 -----AGCCA CTCATATACA TTATATACAA TATATATAT------GTG mel.19.113 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.23.74 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.24.81 -----AGCCA CTCATATACA TTATATACAA TATATATAT------GTG mel.26.32 -----AGCCA CTCATATACA TTATATACAA TATATATgta tatggatGTA mel.34.9 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.39.37 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.40Ug.12 -----AGCCA CTCATATACA TTATATACAA TATATATAT------GTG mel.45.89 -----AGCCA CTCATATACA TTATATACAA TATATATAT------GTG mel.CanS.41 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.53.105 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.54Ug.1 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.59.3 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.yw.50 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.WI83.25 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.51.100 -----AGCCA CTCATATACA ATATATACAA TATATATAT------ATG mel.21.2 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.29.3 -----AGCCA CTCATATACA TTATATACAA TATATATAT------GTG mel.64.3 -----AGCCA CTCATATACA TTATATACAA TATATATAT------GTG mel.67.1 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG mel.55.2 -----AGCCA CTCATATACA TTATATACAA TATATATAT------ATG

Closely-Related Outgroup Species mau.5 TATACAGCCA CTCATATACA TTATATATAC ATAT------GTGT sec.38 TATACAGCCA CTC------ATATACGT TATATATATA TGCAtgtGTG sim.33 TATACAGCCA CTC------ATATACAT TATATATATA TGCA----TG

Outgroup Species yak.25 TATACAGCCG CTC------GTATACAT TATATATA------ luc.41 TATATAGCCA CTCATATACA TTATATATAT Gtgcccaact atatatacac eug.20 TATATAGCCA CTCATATACA TTATATACAT Gcg------ fuy.9 TATATAGCCA CTCATATACA TTATATATAT ------

38 39 "Concestor" 1454 TGGATGTGTA TGTGCACAAC TATATAGATG TGTTG-TATA TAAATTG--- "Concestor2" TGGATGTGTA TGTGCACAAC TATATAGATG TGTTG-TATA TAAATTG--- mel.00.1 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.01.58 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.02.7 TGGATGTGTA TGTGCACAAC TATATAGATG TGTTG-TATA TAAATTG--- mel.04.1 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.04.7 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.05.13 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.05.16 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.07.20 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.14.26 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.17.66 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.18Ug.7 TGGATGTGTA TGTGCACAAC TATATAGATG TGTTG-TATA TAAATTG--- mel.19.113 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.23.74 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.24.81 TGGATGTGTA TGTGCACAAC TATATAGATG TGTTG-TATA TAAATTG--- mel.26.32 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.34.9 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.39.37 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.40Ug.12 TGGATGTGTA TGTGCACAAC TATATAGATG TGTTG-TATA TAAATTG--- mel.45.89 TGGATGTGTA TGTGCACAAC TATATAAC-- TGTTG-TATA TAAATTG--- mel.CanS.41 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.53.105 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.54Ug.1 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.59.3 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.yw.50 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.WI83.25 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.51.100 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.21.2 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.29.3 TGGATGTGTA TGTGCACAAC TATATAGATG TGTTG-TATA TAAATTG--- mel.64.3 TGGATGTGTA TGTGCACAAC TATATAGATG TGTTG-TATA TAAATTG--- mel.67.1 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG--- mel.55.2 TGGATGTGTA TGTGCACAAC CATATAGATG TGTTG-TATA TAAATTG---

Closely-Related Outgroup Species mau.5 ------TATATAGATG TGTTG-TATA TAAATTG--- sec.38 TGTGTGTGT------AAC TATATAGATG TGTTG-TATA TAAATTG--- sim.33 TGTGTGTGT------AAC TATATAGATG TGTTG-TATA TAAATTG---

Outgroup Species yak.25 ------TA TGTGCGCAAC TATATAGATG TGTATATAGA TATAaattgc luc.41 ------A TATATTG--- eug.20 ------GCAAC TATATAGATG TATATATATA TATATTG--- fuy.9 ------A TGAGCGCAAC TATATAGATG TGTATATATA gaaggctgtc

40 "Concestor" 1500 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTGT "Concestor2" ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTGT mel.00.1 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTGT mel.01.58 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCT mel.02.7 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTGT mel.04.1 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCT mel.04.7 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAACTTCT mel.05.13 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTGT mel.05.16 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCT mel.07.20 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAGTTTGG mel.14.26 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCG mel.17.66 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCG mel.18Ug.7 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTGG mel.19.113 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCG mel.23.74 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCT mel.24.81 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCT mel.26.32 ---TCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTGG mel.34.9 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCG mel.39.37 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCT mel.40Ug.12 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCT mel.45.89 ---CCATCCC AT----TATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCT mel.CanS.41 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTGT mel.53.105 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCT mel.54Ug.1 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCT mel.59.3 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCG mel.yw.50 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTGG mel.WI83.25 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCT mel.51.100 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCG mel.21.2 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCG mel.29.3 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCG mel.64.3 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCG mel.67.1 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCG mel.55.2 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCG

Closely-Related Outgroup Species mau.5 ---CCATCCC ATTGCTTATC -ACCGCCTTT ATAGGTAGAA TGTAATTTCG sec.38 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTCG sim.33 ---CCATCCC ATTGCTTATC -ATCGCCTTT ATAGGTAGAA TGTAATTTGG

Outgroup Species yak.25 catCCATCCC AT----TGCT TATCGCCTTT ATAGGTAGAA TGTAATTTGT luc.41 ---CCATCCC AT----TGCT TATCGCCTTT ATAGGTAGAA TGTAATTTGT eug.20 ---CCAGCCC AT----TGCT TATCGCCTTT ATAGGTAGAA TGTAATTTGT fuy.9 ccac------TGCT TATCGCCTTT ATAGGTAGAA TGTAATTTCT

SbfI "Concestor" 1546 TTTTATGTGC CGTTTTGcct gcagg "Concestor2" TTTTATGTGC CGTTTTGcct gcagg mel.00.1 TTTTATGTGC CGTTTTGCCT GCAGG mel.01.58 TTTTATGCGC CGTTTTGCCT GCAGG mel.02.7 TTTTATGTGC CGTTTTGCCT GCAGG mel.04.1 TTTTATGTGC AGTTTTGCCT GCAGG mel.04.7 TTTTATGCGC CGTTTTGCCT GCAGG mel.05.13 TTTTATGTGC CGTTTTGCCT GCAGG mel.05.16 TTTTATGCGC AGTTTTGCCT GCAGG mel.07.20 TTTTATGCGC CGTTTTGCCT GCAGG mel.14.26 TTTTATGCGC AGTTTTGCCT GCAGG mel.17.66 TTATATGTGC AGTTTTGCCT GCAGG mel.18Ug.7 TTTTATGTGC CGTTTTGCCT GCAGG mel.19.113 TTTTATGCGC CGTTTTGCCT GCAGG mel.23.74 TTTTATGCGC CGTTTTGCCT GCAGG mel.24.81 TCTTATGCGC CGTTTTGCCT GCAGG mel.26.32 TTTTATGTGC AGTTTTGCCT GCAGG mel.34.9 TTTTATGTGC CGTTTTGCCT GCAGG mel.39.37 TTTTATGCGC AGTTTTGCCT GCAGG mel.40Ug.12 TTTTATGCGC AGTTTTGCCT GCAGG mel.45.89 TTTTATGCGC AGTTTTGCCT GCAGG mel.CanS.41 TTTTATGCGC CGTTTTGCCT GCAGG mel.53.105 TTTTATGTGC CGTTTTGCCT GCAGG mel.54Ug.1 TTTTATGCGC CGTTTTGCCT GCAGG mel.59.3 TTTTATGCGC CGTTTTGCCT GCAGG mel.yw.50 TTTTATGTGC CGTTTTGCCT GCAGG mel.WI83.25 TTTTATGTGC CGTTTTGCCT GCAGG mel.51.100 TTTTATGCGC AGTTTTGCCT GCAGG mel.21.2 TTTTATGCGC AGTTTTGCCT GCAGG mel.29.3 TTTTATGCGC AGTTTTGCCT GCAGG mel.64.3 TTTTATGCGC AGTTTTGCCT GCAGG mel.67.1 TTTTATGCGC AGTTTTGCCT GCAGG mel.55.2 TTTTATGCGC AGTTTTGCCT GCAGG

Closely-Related Outgroup Species mau.5 TTTTATGCGC CGTTTTGCCT GCAGG sec.38 TTTTATGCGC CGTTTTGCCT GCAGG sim.33 TTTTATGCGC CGTTTTGCCT GCAGG

Outgroup Species yak.25 TTTTATGTGC CGTTTTGCCT GCAGG luc.41 TTTTATGCGC CGTTTTGCCT GCAGG eug.20 TTTTATGCGC AGTTTTGCCT GCAGG fuy.9 TTTTATGTGC AGTTTTGCCT GCAGG

B.

AscI D. mel Light 1 1 --GGCGCGCC CACATAAAAA TCAGCAACAA AGTTGCTCTG GCCCCATAAA D. yak 1 --GGCGCGCC CACATAAAAA TCAGCAACAA AGTTGCCCTG GCCCCATAAA D. fuy 1 --GGCGCGCC CACATAAAAA TCAGCAACAA AGTTGCCCTG GCCCCATAAA D. aur 1 --GGCGCGCC CACATAAAAA TCAGCAACAA AGTTGCCCTG GCCCCATAAA

D. mel Light 1 49 AGATTGCAAA CAAAAAC--A GAACAACAGA ATGGCATGGA ATAAAATTTA D. yak 51 AAATTGCAAA CAAAAAC--A GAACAACGGA ATGGCATGGA ATAAAATTTA D. fuy 49 AAATTGCAAA CAAAAAgaga acaac----A ATGGCATGGA ATAAAATTTA D. aur 49 AAATTGCTAA CAAAAAaggA GAACAACAGA ATGGCATGGA ATAAAATTTA

D. mel Light 1 97 TATGAATAAC AAAAAGCAGC TAAAgca------AGCAGCA ACAACAATAG D. yak 99 TATGAATAAC AAAAAGCAaa agcagctaca gcaAGCGGCA ACAACAACAG D. fuy 95 TATGAATAAC AAAAAGCAGC TAAAagaaac ---AGCAGCA ACAACAACAG D. aur 99 TATGAATAAC AAAAgcagca gtagt------AGCA ACAACAACAG

D. mel Light 1 141 TTTACTGCCC CGGCTCAGCG GTACACTGTG CAAAACGTTG tactcctcct D. yak 149 TTTACTGCCC CGGCTTAGTG GTACACTGTA CGAAATAAaa taactccctc D. fuy 142 TTTACTGCTC TGGCTCAGCA GTACACTGTG GAAAATATTG ataccattct D. aur 138 TTTACGGCCC TGGCTCAACA GTACACAGAG AGAAAaaata ttcacgactt

D. mel Light 1 191 cat------D. yak 199 tcattaaata aaagtaaact aaatcacacg caagctttgt aaataatcgg D. fuy 192 tttttatatc cataataaag gccaatagag tatttttact gcatgatagt D. aur 188 ttcttagaca aaattatatt agtttgatgt agaaaaattt ttggtgttta

D. mel Light 1 194 ------D. yak 249 tactacatcc tagtatagtt tatttctatt aatatttttt ttacaattta D. fuy 242 atttgggagc tcataatttg taaactgaaa acaagtttgc tttggttctt D. aur 238 atatctttgt atatatatgg tttttatttt gtaagaaagt ggtttttaaa

D. mel Light 1 194 ------D. yak 299 tgtgcccaac aaagatgaa------D. fuy 292 tagggaagaa aaaaaggagc ttttaaattt aaaatatcat tgccattaga D. aur 288 gacgcaataa atcttaagtc ccttaaatat aaataaaata ttattcaata

D. mel Light 1 194 ------D. yak 318 ------D. fuy 342 acaggaaaaa ctacttaata tttgttaagc cttaaataaa ataaatacaa D. aur 338 tagtatttaa atgttaatac atatttttta tattatttta attactaatt


D. mel Light 1 194 ------D. yak 318 ------D. fuy 392 atttattcca atgcaaaaat acatgttttt ttattcaaaa aaaggcttaa D. aur 388 tgtgttttac ttttcacaga ggcagtcaga aaagggctgc cttttaggca

D. mel Light 1 194 ------A ATAATATGAG TATATAGAGT ATATAATAta D. yak 318 ------D. fuy 442 ctaaactttc tgaacgtgaA ACAATATTAC TAACTAGGGT ATGTACTAaa D. aur 438 aactttgatt aacattttaa ggtatttcag gaatcttttt aacaagataa

D. mel Light 1 225 ctatatatct ccattgataa tttcgatcat tttcaccttt taactaattt D. yak 318 ------D. fuy 492 tataatttgt ataaaatctg gccaaaagca atgcaaattt tttgtagtgt D. aur 488 aataatatgc taagaa------

D. mel Light 1 275 atgcccaatg tagttgcATT TCTCTGAGTG TGCAGTAAGT GCCCCAGAAT D. yak 318 ------ATT TCTCTAAGTG TGCAGTAag------D. fuy 542 a------CAGTAAGT GCCCAAGAAT D. aur 504 ------T TCTTTCAGTG CCAAGTAAGT GCCcgg----

D. mel Light 1 325 GCGAATGCAT CTCGGGTTCA TCG--GCGGG TCGAGTTTGT TGCAACAacc D. yak 340 -----TGCAT CTCGGGTTCA TCG--Ggttc ---AGTTTGT TGCAA--CAC D. fuy 561 GCGAATGCAT CTCGGGTTCA ACG--GCGGG TCGAGTTTGT TGCAT--CAC D. aur 531 --GAATGCAT CTCGGGTTCA TCGagGCAGG TCGAGTTTGT TGCAACACAC

D. mel Light 1 373 gaagaaCGAA GAAGTTGCAG CGTGCGTTCG GCATTAAAAT TGTGTTTATG D. yak 378 C-----CGAA GAAGTTGCAG CGTGCGTTCG GCATTAAAAT TGTGTTTATG D. fuy 607 C-----CGAA GAACTTGCAG CGTGCGTCCG GCATTAAAAT TGTGTTTATG D. aur 579 C-----CGAA GAAGTTGCAG CATGCGTCCG GCATTAAAAT TGTGTTTATG

D. mel Light 1 423 CGTGTTCGGT AATTTTATAA AAGTTAAATT AGTTTTAAGA CCCTAAATTC D. yak 423 CGTGTTCGGT AATTTTATAA AAGTTAAATT AGTTTTAAGA CCATAAATTC D. fuy 652 CGTGTTTGGT AATTTTATAA AAGTTAAATT AGTTTTAAGA CCATAAATTC D. aur 624 CGTGTTTGGT AATTTTATAA AAGTTAAATT AGTTTTAAGA CCATAAATTC

D. mel Light 1 473 AGCTCACTCT CTCTCTCtcg --CTCTTTC------TC TTTGCCATTT D. yak 473 AGCTCACTCT CTCCCTCggc ctCTCTGTC------TC TTTGCCATTT D. fuy 702 AGCGCACTCT CTggcat------AGTC TCTGCCATTT D. aur 674 AGCGCACTCT CGCTggcgca gttccccatg gcccgaAGTC TCTGCCATTT

D. mel Light 1 512 TAACTTTTAT TACTCTTAAT ATAAA----- AAAGCTGGCT ----AGATGC D. yak 514 TAACTTTTAT TACTCTTAAT ATAAA----- AAAGCTGGCT ggCTAGAAGC D. fuy 733 TAACTTTTAT TACTTTTAAT ATAAA----- AAAGCTGG-- --ATAGAAGC D. aur 724 TAACTTTTAT TACTTTTAAT ATAAAGAAAA AAAGGTGG-- --CTAGGAGC Abd-B 1 Abd-B 2

D. mel Light 1 553 GGGCCAGCTG TAAAAAT--- GCACGCGGTC ATAAAAAGTT GCAGGAGG-- D. yak 559 GGGCCAGCTG TAAAAAT--- GCATGCGCTC ATAAAAAGTT GCAGGAGGCA D. fuy 774 GGGCCAGCTG TAAAAAT--- GCACGCGGTC ATAAAAAGTT GCAGGAGGCA D. aur 770 AGGCCAGCTG TAGTAAAAAT GCACGCGGTC ATAAAAAGTT GCAGGaggca

Abd-B 3 Abd-B 4

D. mel Light 1 598 ------CATGTTG CCAGTTGCCT GCAACCGGCA acattcgCAG D. yak 606 TGTTGCc------AGTTGCCA GTTGCCTGCA A------CAG D. fuy 821 TGTTGCtggt agcCAAGTTG CCAGTTGCCG GTTGCCTGCA A------CAt D. aur 820 tctacatcga cgtccacatc cacatcgcca tcgggctgga gtccccggga

D. mel Light 1 635 ------AAC A---GCAGCA ACATCGTAAA ATAACTTCTT GCTCTGCGGT D. yak 635 ------AAC A---GCAGCA ACATCGTAAA ATAACTTCTT GCTCTGCGGT D. fuy 865 ccactgaAAC G---GCAGCA ACATCGTAAA ATAATTTCTT GCTCTGCGGT D. aur 870 tcggttggta tgttGCAGCA ACATCGTAAA ATAATTTCTT GCTCTGCGGT Abd-B 5

D. mel Light 1 675 CTGAGTTTGG CCGCAACAAT GTTGCTGCAT TTATTCGTAT TATTATTACA D. yak 675 CTCCGTTTGG CCGCAACAAT GTTGCCGCAT TTATTCGTAT TATTATTACA D. fuy 912 CTCCATTTGG CCGCAACAAT GTTGCTGCAT TTATTCGTAT TATTATTACA D. aur 920 CTCCGTTTGG CCGCAACAAT GTTGCCGCAT TTATTCGTAT TATTATTACA Dsx1 Site Abd-B 6

D. mel Light 1 725 TTTTAATGAA TAATTCTAAT TATATGCAAC TTGAATAAGC CCGC------D. yak 725 TTTTAATGAA TAATTCTAAT TATATGCAAC TTGAATaggg ccgctgccgc D. fuy 962 TTTTAATGAT TAATTCTAAT TATATGCGAC TTGAATAAGG CCGCTGAC -- D. aur 970 TTTTAATGAT TAATTCTAAT TATATGCGAC TTGAATAAGG CCGCCGAA --

D. mel Light 1 769 ------CGATG------CCAATAA A--AGCGGCG D. yak 775 tggcTGAGCG TAGAAA------CCAATAA A--AATGCCG D. fuy 1010 ----TGAGCG AAAAAG------CCAATAA A--AGTGCCG D. aur 1018 ----TGGCCG AATGAGaaat gctctggcag cggCCAATAA AAAATGGCCG Abd-B 7

D. mel Light 1 790 TGGCAAAGTG GAGTGGACTG GG------T TTGTGTGGCG ccc------D. yak 807 GGGCAAAGTG GAGTGGATTT CG------G ATGTGTGGCG cct------D. fuy 1038 AGGCAAAGTG GAGTGGATTT TG------G CCGTGTGGCG CCCCgtg--- D. aur 1064 GGGCAAAGTG GAGTGttttt tttttggtcG CCGTGTGGCG CCCCagggag

D. mel Light 1 826 ----CTGCTA GTGGCACATA AAAATTGGCG CAAGTTAATT GTGGTAGTTA D. yak 843 ----CTGCTA GTGGCACATA AAAATTGGTG CAAGTTAATT GTGGTAGTTA D. fuy 1078 ----CTGCTA GTGGCACATA AAAATTGGCG CAAGTTAATT GTGGTAGTTA D. aur 1114 gcgaCTCGTA GTGGGGCATA AAAATTGGTG Ct-GTTAATT GTGGTAGTTA Abd-B 8

D. mel Light 1 872 TTTGCTGTTT TGCCATTTGG TCATTTTACA ATTTTACCAT TTCAg--C-- D. yak 889 TTTGCTGTTT TGCCATTTGG TCATTTTACA ATTTTACCAT TTCAccattt D. fuy 1124 TTTGCTGTTT TGCCATTTGG CCGTTTTACA ATTTTACCAT TCT------D. aur 1163 TTTGCTGTTT TGCCATTTGG CCATTTCACA ATTTTACCAT TCTgccaC-- Abd-B 9 Abd-B 10

D. mel Light 1 918 ----CACAAC TTTTCGCACT GCTCCCcccc tttccCAGC------D. yak 939 caGCCACAAC TTTTAGCACT GCTCCCTTCG CCCGACG------D. fuy 1167 --GCCACAAC TTTTCGCATT GCTCCGCTTG CCTGGTGCAA CAATGTTGCC D. aur 1211 ----CACAAC TTTTCACATT GCTCTGGTTG CTCCGCGCAA CAAAGTTGCA DSX 2

D. mel Light 1 961 ----ACAACA ATGTTGCGGC ATTCTCGCAC -TTTACGAGG CG-TTTTTTT D. yak 983 ----ACAACA ATGTTGCGGC ATTCTCGCAC -TTTACGAGG CG------D. fuy 1215 G------CAGTCGCTGC ATTCTCGCAC TTTTACGAGG CG------D. aur 1257 tccaagagtt gcccTGCCGC ATTCTCGCAC TTTTACGAGG CGTTTTTTTT

Dsx2 Site Abd-B 11

D. mel Light 1 998 -----TTTAT ATCACTTACT TTACTTAGTT GATTAAGGGC GTGGCCGATG D. yak 1012 ------TTT TTTATATCAC TTACTTAGTT GATTAAGGGC GTTGCCGATG D. fuy 1248 ------TTT TTTATATCAC TTACTTAGTT GATTAAGGGC GTGGCCGATG D. aur 1307 ccccttaTTT TTTATATCAC TTACTTAGTT GATTAAGGGC GTGGCCGATG Abd-B 12 Abd-B 13

D. mel Light 1 1042 GGCCAGatac ATGCTTAGAT TTGCTCc------AGC D. yak 1055 GGCCAGAT-- ATGCTTAGAT TTGGTCTATG TATCCccgt------GG D. fuy 1291 GGaaAGAT-- ATGTTTAGAT TTGCTCTTTG TATCCGgaca ttctatcAGG D. aur 1357 GGCCAGAT-- ATGTTTAGAT TTGGTCTTTG TATCCGtcct aa------G

D. mel Light 1 1072 AGTGGGCTGC ATTTTACGAC CCTCAAAACC CGATCCAAAt ggaaaatatg D. yak 1094 AGTGGGCTGT GTTTTACGAC CCTCAAAACC CGATCGAAAC GGAAAGAGAc D. fuy 1339 AGTGGGTTGC ATTTTACGAC CCGCCAAAGC CGATCAAAAC GGAAACAGAa D. aur 1398 AGTGGGCTGC ATTTTACGAG CCTCGAAAGg tgatcgaaat ggctacggaa Abd-B 14

D. mel Light 1 1122 aaaatac------GGC------TAATC CGCTTATGAG CACAACAAAT D. yak 1144 -A---AATAT GAAATGGT------TAATC CGCTTATGGT AACAACAAAA D. fuy 1389 gA---AATAT GAAATGGCCT CTGGGTAATC CGCTTATGGT TATAACAATA D. aur 1448 agaggAATAT CAAATGGGTT CGAGGTAATC CGCTTacgaa atgagctcct

D. mel Light 1 1157 TGgttcacAC ACTTCGATCG AAATTACTTG CGATCGCcat ttgattggtT D. yak 1183 TGtt----AC ACTTCGATCG AAATCACTTG CGATCGCatt ttaaatggcT D. fuy 1436 Ttatagaaat ttcgataaga tttgaaataa gatctttttt tgactattaa D. aur 1498 tagaatcctc acactgtgat cacagctgaa ttattgccat acttatatct

D. mel Light 1 1207 TCAATGTATT GCTTTAACTG GCAGGTGAAc actttgtttt ttatcTAACG D. yak 1229 TTAGTTTATC GCTTCAACTA GCAGGTGAAa atcgggtttt gagt-TAACG D. fuy 1486 atggtttcaa tccactgcct tttgataatg gttaat------D. aur 1548 gacattaaaa attctcttgt ttaccttatt tttaatgacc ttaattactg

D. mel Light 1 1257 ATTCTTACTA TTTAATATCC TAGTCaatta atGTATTTTC CACTACTTCc D. yak 1278 ATTTGTACCA GTTATTGGCC AAGTCccctg aaGTATTTTC CAGTACTTCt D. fuy 1522 ------D. aur 1598 ccttagttat aacttacata aatgggactt tatttaacag gtttcggtgg

D. mel Light 1 1307 atcgatatca cagagttccc atttcgca------D. yak 1328 ccgttttccg accatcgacc atttcagagt ccccctttcc caatgt---- D. fuy 1522 ------D. aur 1648 aaatttgatt agaatggaag ttttaaggga ttttttaata agccactaag

D. mel Light 1 1335 ------D. yak 1374 ------D. fuy 1522 ------D. aur 1698 aaattttaca ggttacttgg ttatttgcag tattaacgtt ggaaacctaa

D. mel Light 1 1335 ------D. yak 1374 ------D. fuy 1522 ------D. aur 1748 acaatttttc tgttaaaaat atattttaaa cataatttaa taaatttatt


D. mel Light 1 1335 ------D. yak 1374 ------D. fuy 1522 ------TTGGTTTA AAATTGTAAA D. aur 1798 aaatagcaag attgagagct catgaatttt gcTTGTTTAA ATATTTTAAA

D. mel Light 1 1335 ------D. yak 1374 ------D. fuy 1540 TGTTcagttt ccgattaatt tctatcagcc cattttataa gaaagttttt D. aur 1848 TGTTttaaaa ttaaattaga agcatgataa atttttaaat aataccacta

D. mel Light 1 1335 ------D. yak 1374 ------D. fuy 1590 ttttgctatt tgcctttttg tttttcaaat taatttctag gatatattaa D. aur 1898 cgttttaaag ccaatttaag tgccgatttt attttgtaga ttttattaca

D. mel Light 1 1335 ------D. yak 1374 ------D. fuy 1640 acattctgta tttttaacca ttctttcaat aaatactttt ctttatagac D. aur 1948 aagtcaggtt ctaaagtcta caattttagt tcggtttaat caccttaact

D. mel Light 1 1335 ------D. yak 1374 ------D. fuy 1690 taatttattt aaaaaacaaa acatgttgta tataaaaaac ttcaatttaa D. aur 1998 ccataaccat gccaagtgaa atctttccgc tagtatctta taaaaatgtt

D. mel Light 1 1335 ------D. yak 1374 ------D. fuy 1740 ttttgatatt ttaatcaata cttaaaaacc ctgaaagcat gtttcaatta D. aur 2048 gttctacaaa tggtgtttta tttcccaggc cttgaggtga ta------

D. mel Light 1 1335 ------D. yak 1374 ------D. fuy 1790 atttgtaact tagtactaca tgaagtgatg TAAAGATATG TCCGCATCGA D. aur 2090 ------TAAAGATACC ACCGCATCGA

D. mel Light 1 1335 ------A------AG D. yak 1374 ------CCC AA------TG D. fuy 1840 TTTTCCAGAa acccctttca tcccctttcc cttgacaCCC CA------AG D. aur 2110 TTTTCCAGAg agaccccctt gtcagcccac attcccctta attcccttAG

D. mel Light 1 1338 TCACATATTT GTTCTTTTAT AACATGAA-- --CGCGTACC GCGAAGG--- D. yak 1381 TCGCATATTT GTTCTTTTAT AACGCGAA-- --CGCGTACC GCGAAGG--- D. fuy 1884 CCTCATGTTT GTTATTTTAC AACGTCAcac gtCGCGAACC GAGAAGG--- D. aur 2160 CCAAATGTTT GTTATTTTAT AACGTCAA-- --CGCGTcgc gaaccgagaa

D. mel Light 1 1381 ---CCCCATA AAGTGTTCGC AATAAAATAT ATTGTGCAAT AGTT------D. yak 1424 ---CCCCATA AAGTGTTCGT AATAAAATAT ATTGTGCAAT ATTTGTGCTA D. fuy 1931 ---TCTCATA AAGTGTTCGT AATAAAATAT ATTGTACAAT ATTTG-GCTA D. aur 2206 ggcCCCCATA AAGTGTTTGT AATAAAATAT ATTGTGCAAT ATTT-TGCTA

D. mel Light 1 1422 -----ATACA GCCACTCATA TACATTATAT Acaatatata tatatgtggA D. yak 1471 TAGTTATACA GCCGCTCGTA TACATTATAT Ata------ D. fuy 1977 TAGTTATATA GCCACTCATA TACATTATAT ATAT------A D. aur 2255 TAGTTATAGA GCCACTCATA TACATTATAT ATAT------A

D. mel Light 1 1467 TGTGTATGTG CACAACCATA TAGATGTGTt gTATATAAAT TG------D. yak 1504 ----TATGTG CGCAACTATA TAGATGTGTA -TATAgatat aaattgccat D. fuy 2012 Tga------G CGCAACTATA TAGATGTGTA -TATATAgaa ggctgtccca D. aur 2290 TATAAATGTG GGCAACTATA TAAATGTGTA -TATATATAT TG------

D. mel Light 1 1509 CCATCCCATT GCTTatcATC GCCTTTATAG GTAGAATGTA ATTTCTTTTT D. yak 1549 CCATCCCATT GCTT---ATC GCCTTTATAG GTAGAATGTA ATTTCTTTTT D. fuy 2055 c------T GCTT---ATC GCCTTTATAG GTAGAATGTA ATTTCTTTTT D. aur 2331 CCATGCCATT GCTT---ATC GCCTTTATAG GTAGAATGTA ATTTCGTTTT

SbfI D. mel Light 1 1559 ATGTGCAGTT TTGCCTGCAG G D. yak 1596 ATGCGCAGTT TTGCCTGCAG G D. fuy 2094 ATGTGCAGTT TTGCCTGCAG G D. aur 2378 ATGCGCAGTT TTGCCTGCAG G

Figure A1: Sequence alignments for dimorphic elements. (A) Annotated alignment of dimorphic elements used to reconstruct ancestral sequences (Concestor and Concestor2) from extant D. melanogaster populations. Dimorphic elements from D. mauritiana (mau.5), D. sechellia (sec.38), D. simulans (sim.33), D. yakuba (yak.25), D. lucipennis (luc.41), D. eugracilis (eug.20), and D. fuyamai (fuy.9) were used as outgroups. (B) Annotated alignment of orthologous dimorphic elements from the D. melanogaster Light 1 allele, D. yakuba (D. yak), D. fuyamai (D. fuy), and D. auraria (D. aur). White font on a purple background indicates the AscI and SbfI restriction enzyme sites that were introduced for cloning purposes. Red font on a black background indicates polymorphisms among the population-stock alleles, and the number or letter designation assigned to each polymorphism is given at the top of the alignment. Ambiguous sites in the reconstructed concestor sequences are indicated by a gray background. Characterized Abd-B and Dsx binding sites are indicated by white font on a blue background and black font on a yellow background, respectively. The BstXI restriction enzyme site used for genotyping is indicated by white font on a maroon background.
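The polymorphism annotations and cloning sites described in this legend can also be located computationally. The Python sketch below is illustrative only and is not the procedure used to prepare Figure A1. It assumes that each aligned block is supplied as equal-length strings keyed by stock name, treats a gap ("-") as a distinct character state, and ignores letter case. The helper names `polymorphic_columns` and `find_sites` are hypothetical. The recognition sequences used for AscI (GGCGCGCC), SbfI (CCTGCAGG), and BstXI (CCANNNNNNTGG) are each their own reverse complement, so scanning a single strand is sufficient for these three enzymes.

```python
"""Sketch: flag variable alignment columns and locate restriction sites.

Assumptions (not taken from the dissertation): aligned rows are equal-length
strings, '-' marks a gap, and letter case carries no meaning here.
"""
import re

# Recognition sequences; N (any base) is written as the regex class [ACGT].
RESTRICTION_SITES = {
    "AscI": "GGCGCGCC",
    "SbfI": "CCTGCAGG",
    "BstXI": "CCA[ACGT]{6}TGG",
}


def polymorphic_columns(alignment):
    """Return 1-based columns where the aligned rows do not all agree.

    A gap is treated as its own character state, so indels are reported too.
    """
    rows = [row.upper() for row in alignment.values()]
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("aligned rows must all have the same length")
    return [col + 1 for col in range(width) if len({row[col] for row in rows}) > 1]


def find_sites(row, pattern):
    """Return 1-based start positions of a site in the ungapped sequence."""
    seq = row.replace("-", "").upper()
    # A zero-width lookahead lets overlapping matches be reported.
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]


if __name__ == "__main__":
    # Four rows copied from the SbfI block of panel A (spaces removed).
    block = {
        "Concestor": "TTTTATGTGCCGTTTTGcctgcagg",
        "mel.01.58": "TTTTATGCGCCGTTTTGCCTGCAGG",
        "mel.04.1": "TTTTATGTGCAGTTTTGCCTGCAGG",
        "mel.17.66": "TTATATGTGCAGTTTTGCCTGCAGG",
    }
    print(polymorphic_columns(block))  # [3, 8, 11]
    for name, pattern in RESTRICTION_SITES.items():
        print(name, find_sites(block["Concestor"], pattern))
```

Applied to these four rows, the sketch reports columns 3, 8, and 11 as variable and places the SbfI site at position 18 of the Concestor row. Because `find_sites` reports positions in the ungapped sequence, its coordinates will not match alignment columns wherever a row contains gaps.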


C. bab1 1st exon

bab1 (genome) 1 ATGGCGTCGG CGCAGGCGGA GACGAATGTC GGCTTGGCGT CCGAACAGGG bab1 (Light P1) 1 ATGGCGTCGG CGCAGGCGGA GACGAATGTC GGCTTGGCGT CCGAACAGGG bab1 (Dark P1) 1 ATGGCGTCGG CGCAGGCGGA GACGAATGTC GGCTTGGCGC CCGAACAGGG bab1 (D.sec.) 1 ATGGCGTCGG CGCAGGCGGA GACGAATGTC GGTGTGGCGC CCGAACAGGG bab1 (genome) 51 ACCAGTGGCT CAGAGGCAGC GCAAAGGGAC GGGATCGGGC GCCGATTCGC bab1 (Light P1) 51 ACCAGTGGCT CAGAGGCAGC GCAAAGGGAC GGGATCGGGC GCCGATTCGC bab1 (Dark P1) 51 ACCAGTGGCT CAGAGGCAGC GCAAAGGGAC GGGATCGGGC GCCGATTCGC bab1 (D.sec.) 51 ACCAGTGGCC CAGAGGCAGC GCAAGGGGAC GGGATCGGGC GCCGATTCGC bab1 (genome) 101 CCAAGAGTAA CAGAAGCTCG CCCACTCAGC AGGAGGAGAA GCGTATCAAA bab1 (Light P1) 101 CCAAGAGTAA CAGAAGCTCG CCCACTCAGC AGGAGGAGAA GCGTATCAAA bab1 (Dark P1) 101 CCAAGAGTAA CAGAAGCTCG CCCACTCAGC AGGAGGAGAA GCGTATCAAA bab1 (D.sec.) 101 CCAAGAGTAA CAGGAGCTCG CCCACTGAGC AGGAGGAGAA GCGTATCAAA bab1 (genome) 151 AGCGAGGATC GCACTTCACC AACTGGCGGG GCCAAGGACG AGGACAAGGA bab1 (Light P1) 151 AGCGAGGATC GCACTTCACC AACTGGCGGG GCCAAGGACG AGGACAAGGA bab1 (Dark P1) 151 AGCGAGGAAC GCACTTCACC CACTGGCGGG GCCAAGGACG AGGACAAGGA bab1 (D.sec.) 151 AGCGAGGATC GCACTTCACC CACTGGCGGA GCCAAGGACG AGGAAAAGGA bab1 (genome) 201 GAGTCAAGGT CATGCTGTAG CCGGAGGGGG AGGATCTTCG CCCGTCAGTT bab1 (Light P1) 201 GAGTCAAGGT CATGCTGTAG CCGGAGGGGG AGGATCTTCG CCCGTCAGTT bab1 (Dark P1) 201 GAGTCAAGGT CATGCTGTAG CCGGAGGGGG AGGATCTTCG CCCGTCAGTT bab1 (D.sec.) 201 GAGTCAAGGT CATGCTGGAG CCGGAGGGGG AGGATCTTCG CCAGTGAGTT bab1 (genome) 251 CGCCACAGGG CAGGAGTTCT TCGGTAGCCT CGCCCAGTTC CAGCTCCCAG bab1 (Light P1) 251 CGCCACAGGG CAGGAGTTCT TCGGTAGCCT CGCCCAGTTC CAGCTCCCAG bab1 (Dark P1) 251 CGCCACAGGG CAGGAGTTCT TCGGTAGCCT CGCCCAGTTC CAGCTCCCAG bab1 (D.sec.) 251 CGCCACAGGG CAGGAGTTCT TCGGTGGCCT CGCCCAGTTC CAGCTCCCAG bab1 (genome) 301 CAATTCTGCC TGCGCTGGAA CAACTATCAG ACGAACCTGA CCACCATCTT bab1 (Light P1) 301 CAATTCTGCC TGCGCTGGAA CAACTATCAG ACGAACCTGA CCACCATCTT bab1 (Dark P1) 301 CAATTCTGCC TGCGCTGGAA CAACTATCAG ACGAACCTGA CCACCATCTT bab1 (D.sec.) 301 CAATTCTGCC TGCGCTGGAA CAACTACCAG ACGAACCTGA CCACCATCTT bab1 (genome) 351 TGACCAGCTG CTCCAGAACG AGTGCTTCGT GGACGTGACC TTGGCATGCG bab1 (Light P1) 351 TGACCAGCTG CTCCAGAACG AGTGCTTCGT GGACGTGACC TTGGCATGCG

bab1 (Dark P1) 351 TGACCAGCTG CTCCAGAACG AGTGTTTCGT GGACGTGACC TTGGCATGCG bab1 (D.sec.) 351 CGACCAGCTG CTCCAGAACG AGTGCTTCGT GGACGTGACC TTGGCCTGCG bab1 (genome) 401 ATGGTCGGTC CATGAAGGCC CACAAGATGG TCCTGTCCGC CTGCTCGCCC bab1 (Light P1) 401 ATGGTCGGTC CATGAAGGCC CACAAGATGG TCCTGTCCGC CTGCTCGCCC bab1 (Dark P1) 401 ATGGTCGGTC CATGAAGGCC CACAAGATGG TTCTGTCCGC CTGCTCGCCC bab1 (D.sec.) 401 ATGGTCGCTC TATGAAGGCC CACAAGATGG TCCTGTCCGC CTGCTCGCCC bab1 (genome) 451 TACTTCCAAA CACTTCTGGC CGAAACGCCC TGCCAGCATC CCATTGTGAT bab1 (Light P1) 451 TACTTCCAAA CACTTCTGGC CGAAACGCCC TGCCAGCATC CCATTGTGAT bab1 (Dark P1) 451 TACTTCCAAA CACTTCTGGC CGAAACGCCC TGCCAGCATC CCATTGTGAT bab1 (D.sec.) 451 TACTTCCAAA CGCTTCTGGC CGAGACGCCC TGCCAGCATC CCATTGTGAT bab1 (genome) 501 CATGCGGGAC GTAAATTGGT CGGATCTCAA GGCCATTGTG GAGTTCATGT bab1 (Light P1) 501 CATGCGGGAC GTAAATTGGT CGGATCTCAA GGCCATTGTG GAGTTCATGT bab1 (Dark P1) 501 CATGCGGGAC GTAAACTGGT CGGATCTCAA GGCCATTGTG GAGTTCATGT bab1 (D.sec.) 501 CATGCGGGAC GTAAACTGGT CGGATCTCAA GGCCATTGTG GAGTTCATGT

bab1 (genome) 551 ATCGCGGCGA GATCAACGTG AGCCAGGACC AGATAGGTCC TCTGCTCAGG bab1 (Light P1) 551 ATCGCGGCGA GATCAACGTG AGCCAGGACC AGATAGGTCC TCTGCTCAGG bab1 (Dark P1) 551 ATCGCGGCGA GATCAACGTG AGCCAGGACC AGATAGGTCC TCTGCTCAGG bab1 (D.sec.) 551 ACCGCGGCGA GATCAACGTG AGCCAGGACC AGATAGGTCC TCTGCTCAGA bab1 (genome) 601 ATAGCTGAGA TGTTGAAAGT GCGTGGTCTG GCGGATGTGA CCCATATGGA bab1 (Light P1) 601 ATAGCTGAGA TGTTGAAAGT GCGTGGTCTG GCGGATGTGA CCCATATGGA bab1 (Dark P1) 601 ATAGCTGAGA TGTTGAAAGT GCGTGGTCTG GCGGATGTGA CCCATATGGA bab1 (D.sec.) 601 ATAGCTGAGA TGTTGAAAGT GCGCGGTCTG GCGGATGTGA CCCACATGGA bab1 (genome) 651 GGCGGCCACG GCAGCAGCGG CTGCCGCTTC GTCGGAGAGA ATGCCCTCCT bab1 (Light P1) 651 GGCGGCCACG GCAGCAGCGG CTGCCGCTTC GTCGGAGAGA ATGCCCTCCT bab1 (Dark P1) 651 GGCGGCCACG GCAGCAGCGG CTGCCGCTTC GTCGGAGAGA ATGCCCTCCT bab1 (D.sec.) 651 GGCGGCCACG GCAGCAGCGG CTGCCGCTTC GTCGGAGAGG ATGCCCTCCT bab1 (genome) 701 CGCCCAAGGA GAGCACTTCA ACTTCCAGAA CTGAACACGA CAGGGAACGG bab1 (Light P1) 701 CGCCCAAGGA GAGCACTTCA ACTTCCAGAA CTGAACACGA CAGGGAACGG bab1 (Dark P1) 701 CGCCCAAGGA GAGCACTTCA ACTTCTAGAA CTGAACACGA CAGGGAACGG bab1 (D.sec.) 701 CGCCCAAGGA GAGTACTTCA ACTTCCAGAA CTGAACATGA CAGGGAACGG bab1 (genome) 751 GAGGCCGAGG AGCTGCTGGC CTTCATGCAG CCCGAGAAGA AGCTACGCAC bab1 (Light P1) 751 GAGGCCGAGG AGCTGCTGGC CTTCATGCAG CCCGAGAAGA AGCTACGCAC bab1 (Dark P1) 751 GAGGCCGAGG AGCTGCTGGG CTTCATGCAG CCCGAGAAGA AGCTACGCAC bab1 (D.sec.) 751 GAGGCCGAGG AGCTACTGGC CTTCATGCAG CCCGAGAAGA AGCTACGCAC bab1 (genome) 801 TTCGGACTGG GATCCCGCTG AGCTGAGGCT CTCCCCACTG GAGCGGCAGC bab1 (Light P1) 801 TTCGGACTGG GATCCCGCTG AGCTGAGGCT CTCCCCACTG GAGCGGCAGC bab1 (Dark P1) 801 TTCGGACTGG GATCCCGCTG AGCTGAGGCT TTCCCCACTG GAGCGGCAGC bab1 (D.sec.) 801 TTCGGACTGG GATCCCGCTG AGCTGAGGCT TTCCCCACTG GAGCGGCAGC bab1 (genome) 851 AGGGCAGGAA TGTAAGAAAG CGCCGGTGGC CATCGGCGGA CACAATATTC bab1 (Light P1) 851 AGGGCAGGAA TGTAAGAAAG CGCCGGTGGC CATCGGCGGA CACAATATTC bab1 (Dark P1) 851 AGGGCAGGAA TGTAAGAAAG CGCCGGTGGC CATCGGCGGA CACAATATTC bab1 (D.sec.) 851 AGGGCAGGAA TGTGAGAAAG CGCCGGTGGC CCTCGGCGGA CACAATATTC bab1 (genome) 901 AATCCACCCG CACCACCCAG TCCACTGAGC AGCCTGATTG CGGCCGAAAG bab1 (Light P1) 901 AATCCACCCG CACCACCCAG TCCACTGAGC AGCCTGATTG CGGCCGAAAG bab1 (Dark P1) 901 AATCCACCCG CACCACCCAG TCCACTGAGC AGCCTGATTG CGGCCGAAAG bab1 (D.sec.) 901 AATCCACCCG CACCACCCAG TCCACTGAGC AGCCTGATAG CCGCCGAAAG bab1 (genome) 951 GATGGAGCTG GAGCAAAAGG AAAGAGAGAG ACAGAGGGAC TGTTCGCTGA bab1 (Light P1) 951 GATGGAGCTG GAGCAAAAGG AAAGAGAGAG ACAGAGGGAC TGTTCGCTGA bab1 (Dark P1) 951 GATGGAGCTG GAGCAAAAGG AAAGAGAGAG ACAGAGGGAC TGTTCGCTGA bab1 (D.sec.) 951 GATGGAGCTG GAGCAAAAGG AAAGAGAGAG ACAGAGGGAC TGTTCGCTGA

220 bab1 (genome) 1001 TGACACCCCC ACCCAAACCA CCAATGAGCA GTGGCTCCAC AGTGGGAGCC bab1 (Light P1) 1001 TGACACCCCC ACCCAAACCA CCAATGAGCA GTGGCTCCAC AGTGGGAGCC bab1 (Dark P1) 1001 TGACACCCCC ACCCAAACCA CCAATGAGCA GTGGCTCCAC AGTGGGAGCC bab1 (D.sec.) 1001 TGACACCTCC ACCCAAACCA CCACTGAGCA GTGGCTCCGC AGCGGGAGCC bab1 (genome) 1051 ACGAGGCGCC TGGAGACCGC CATCCACGCC CTGGACATGC CATCGCCGGC bab1 (Light P1) 1051 ACGAGGCGCC TGGAGACCGC CATCCACGCC CTGGACATGC CATCGCCGGC bab1 (Dark P1) 1051 ACGAGGCGCC TGGAGACCGC CATCCACGCC CTGGACATGC CATCGCCGGC bab1 (D.sec.) 1051 ACGAGGCGCT TGGAGACCGC TATCCATGCT CTGGACATGC CATCGCCGGC

bab1 (genome) 1101 TGCCACGCCA GGACC-TCTG TCCCGATCGT CGA-GACCTC ACTCGCAGAG bab1 (Light P1) 1101 TGCCACGCCA GGACC-TCTG TCCCGATCGT CGA-GACCTC ACTCGCAGAG bab1 (Dark P1) 1101 TGCCACGCCA GGACC-TCTG TCCCGATCGT CGA-GACCTC ACTCGCAGAG bab1 (D.sec.) 1101 TGCCACGCCA GGACC-TCTC TCCCGATCCT CGA-GACCAC ACTCGCAGAG bab1 (genome) 1149 CCCCCAGCAG CAGCAGGCAC AGCAGCAGGG TCAGCTTCCT TTGCCCCTGC bab1 (Light P1) 1149 CCCCCAGCAG CAGCAGGCAC AGCAGCAGGG TCAGCTTCCT TTGCCCCTGC bab1 (Dark P1) 1149 CCCCCAGCAG CAGCAGGCAC AGCAGCAGGG TCAGCTTCCT TTGCCCCTGC bab1 (D.sec.) 1149 CCCCCAGCAG CAGCAGGCAC AGCAGCAGGG TCAGCTTCCT TTGCCCCTGC bab1 (genome) 1199 CCCTGCATCC GCACCATCAC GCATCACCCG CCCCACATCC CTCCCAGACC bab1 (Light P1) 1199 CCCTGCATCC GCACCATCAC GCATCACCCG CCCCACATCC CTCCCAGACC bab1 (Dark P1) 1199 CCCTGCATCC GCACCATCAC GCATCACCCG CCCCACATCC CTCCCAGACC bab1 (D.sec.) 1199 CCCTGCATCC GCACCACCAC GCATCACCCG CCCCACATCC CTCCCAGACC bab1 (genome) 1249 GCCGGATCAG CCCACCACCC -GGCATCGCC TGCTGGAGAT TCCCGTTTTC bab1 (Light P1) 1249 GCCGGATCAG CCCACCACCC -GGCATCGCC TGCTGGAGAT TCCCGTTTTC bab1 (Dark P1) 1249 GCCGGATCAG CCCACCACCC -GGCATCGCC TGCTGGAGAT TCCCGTTTTC bab1 (D.sec.) 1249 GCCGGATCAG CCCACCACCC -GCCATCGCC CGCTGGAGAC TCCCGTTTTT bab1 (genome) 1298 CCCTCGGCCC AGCAGCCGCC ATGGCCGCTG CCAGGGAACT GAGTGGCCTG bab1 (Light P1) 1298 CCCTCGGCCC AGCAGCCGCC ATGGCCGCTG CCATGGAACT GAGTGGCCTG bab1 (Dark P1) 1298 CCCTCGGCCC AGCAGCCGCC ATGGCCGCTG CCATGGAACT GAGTGGCCTG bab1 (D.sec.) 1298 CCCTCGGACC CGCAGCCGCC ATGGCCGCTG CCATGGAACT GAGTGGCCTG bab1 (genome) 1348 GGACCAGGTC CGTCCGCCGA GCCACGCCTT CCGCCTCCAC CGCCGCACCA bab1 (Light P1) 1348 GGACCAGGTC CGTCCGCCGA GCCACGCCTT CCGCCTCCAC CGCCGCACCA bab1 (Dark P1) 1348 GGACCAGGTC CGTCCGCCGA GCCACGCCTT CCGCCTCCAC CGCCGCACCA bab1 (D.sec.) 
1348 GGACCTGGTC CCTCCGCCGA GCCACGCCTA CCGCCTCCAC CGCCGCACCA bab1 (genome) 1398 CCATGGCGGT GGTGGAGTGG GCGGCGGGGG AGTTGGAGGA GGAGGTGCAG bab1 (Light P1) 1398 CCATGGCGGT GGTGGAGTGG GCGGCGGGGG AGTTGGAGGA GGAGGTGCAG bab1 (Dark P1) 1398 CCATGGCGGT GGTGGAGTGG GCGGCGGGGG AGTTGGAGGA GGAGGTGCAG bab1 (D.sec.) 1398 CCATGGCGGT GGTGGAGTGG GCGGTGGGGG AGTT---GGA GGAGGTGCAG bab1 (genome) 1448 GCGGAGTGGG TTCAGGCGGG GGATCCTCGC TCGCCGATGA CTTGGAGATC bab1 (Light P1) 1448 GCGGAGTGGG TTCAGGCGGG GGATCCTCGC TCGCCGATGA CTTGGAGATC bab1 (Dark P1) 1448 GCGGAGTGGG TTCAGGCGGG GGATCCTCGC TCGCCGATGA CTTGGAGATC bab1 (D.sec.) 1445 GCGGAGTGGG TTCAGGCGGG GGATCCTCGC TCGCCGATGA CTTGGAGATC bab1 (genome) 1498 AAGCCAGGGA TCGCCGAGAT GATCCGAGAG GAAGAAAGG G TGAGT----- bab1 (Light P1) 1498 AAGCCAGGGA TCGCCGAGAT GATCCGAGAG GAAGAAAGG G TGAGT----- bab1 (Dark P1) 1498 AAGCCAGGGA TCGCCGAGAT GATCCGAGAG GAAGAAAGG G TGAGT----- bab1 (D.sec.) 1495 AAGCCAGGGA TCGCCGAGAT GATCCGAGAG GAAGAAAGG G TGAGT-----

bab1 2nd exon bab1 (genome) 1 CAGGCCAAAA TGATGGAGAA CTCGCACGCC TGGATGGGCG CCACCGGATC bab1 (Light P1) 1 CAGGCCAAAA TGATGGAGAA CTCGCACGCC TGGATGGGCG CCACCGGATC bab1 (Dark P1) 1 CAGGCCAAAA TGATGGAGAA CTCGCACGCC TGGATGGGCG CCACCGGATC

221 bab1 (D.sec.) 1 CAGGCCAAAA TGATGGAGAA CTCGCACGCC TGGATGGGCG CCACCGGATC bab1 (genome) 51 AACGCTGGCA GGTTCGT bab1 (Light P1) 51 AACGCTGGCA GGTTCGT bab1 (Dark P1) 51 AACGCTGGCA GGTTCGT bab1 (D.sec.) 51 CACGCTGGCA GGTTCGT

bab1 3rd exon bab1 (genome) 1 CAGCAGACAG CTACCAGTAC CAGCTGCAGT CCATGTGGCA AAAGTGCTGG bab1 (Light P1) 1 CAGCAGACAG CTACCAGTAC CAGCTGCAGT CCATGTGGCA AAAGTGCTGG bab1 (Dark P1) 1 CAGCAGACAG CTACCAGTAC CAGCTGCAGT CAATGTGGCA AAAGTGCTGG bab1 (D.sec.) 1 CAGCAGACAG CTACCAGTAC CAGCTGCAGT CCATGTGGCA GAAGTGCTGG bab1 (genome) 51 AACACCAACC AGAATCTGAT GCATCACATG CGCTTCCGCG AGCGAGGTCC bab1 (Light P1) 51 AACACCAACC AGAATCTGAT GCATCACATG CGCTTCCGCG AGCGAGGTCC bab1 (Dark P1) 51 AACACCAACC AGAATCTGAT GCATCACATG CGCTTCCGCG AGCGAGGTCC bab1 (D.sec.) 51 AACACCAACC AGAACCTGAT GCACCACATG CGCTTCCGCG AGCGAGGTCC bab1 (genome) 101 TCTGAAGTCG TGGCGACCCG AGACCATGGC GGAGGCCATT TTCAGTGTGC bab1 (Light P1) 101 TCTGAAGTCG TGGCGACCCG AGACCATGGC GGAGGCCATT TTCAGTGTGC bab1 (Dark P1) 101 TCTGAAGTCC TGGCGACCCG AGACCATGGC GGAGGCCATT TTCAGTGTGC bab1 (D.sec.) 101 TCTCAAGTCC TGGCGTCCGG AGACCATGGC GGAGGCCATT TTCAGTGTGC bab1 (genome) 151 TAAAGGAGGG TCTATCGCTA TCTCAGGCCG CCCGCAAGTA CGACATCCCG bab1 (Light P1) 151 TAAAGGAGGG CCTATCGCTA TCCCAGGCCG CCCGCAAGTA CGA TATCCCG bab1 (Dark P1) 151 TAAAGGAGGG TCTATCGCTA TCTCAGGCCG CCCGCAAGTA CGACATCCCG bab1 (D.sec.) 151 TGAAGGAGGG CCTTTCGCTC TCCCAGGCCG CCCGCAAGTA CGACATCCCG bab1 (genome) 201 TATCCAACAT TCGTGCTCTA TGCGAACAGG GTGCACAATA TGCTGGGACC bab1 (Light P1) 201 TATCCAACAT TCGTGCTCTA TGCGAACAGG GTGCACAATA TGCTGGGACC bab1 (Dark P1) 201 TATCCAACAT TCGTGCTCTA TGCGAACAGG GTGCACAATA TGCTGGGACC bab1 (D.sec.) 201 TATCCCACGT TCGTGCTCTA TGCCAACAGG GTGCACAATA TGCTGGGACC bab1 (genome) 251 ATCCATTGAC GGCGGGCCCG ATTTGCGGCC CAAGGGGCGT GGCAGGCCGC bab1 (Light P1) 251 TTCCATTGAC GGCGGGCCCG ATTTGCGGCC CAAGGGGCGT GGCAGGCCGC bab1 (Dark P1) 251 ATCCATTGAC GGCGGGCCCG ATTTGCGGCC CAAGGGGCGT GGCAGGCCGC bab1 (D.sec.) 
251 TTCCATTGAC GGCGGGCCCG ATCTGCGGCC CAAGGGGCGT GGCAGGCCGC bab1 (genome) 301 AGCGAATCCT TCTGGGCATC TGGCCCGACG AGCACATTAA GGGCGTCATC bab1 (Light P1) 301 AGCGAATCCT TCTGGGCATC TGGCCCGACG AGCACATTAA GGGCGTCATC bab1 (Dark P1) 301 AGCGAATCCT TTTGGGCATC TGGCCCGACG AGCACATTAA GGGCGTCATC bab1 (D.sec.) 301 AGCGAATCCT TCTGGGCATC TGGCCCGACG AGCACATCAA GGGCGTCATC bab1 (genome) 351 AAGACGGTGG TCTTTCGCGA CACCAAGGAC ATCAAGGACG AGAGCCTGGC bab1 (Light P1) 351 AAGACGGTGG TCTTTCGCGA CACCAAGGAC ATCAAGGACG AGAGCCTGGC bab1 (Dark P1) 351 AAGACGGTGG TCTTTCGCGA CACCAAGGAC ATCAAGGACG AGAGCCTGGC bab1 (D.sec.) 351 AAGACGGTGG TCTTTCGCGA CACCAAGGAC ATCAAGGACG AGAGCCTAGC bab1 (genome) 401 CGCTCACATG CCACCCTACG GTCGACATTC GGTAGGT bab1 (Light P1) 401 CGCCCACATG CCACCTTACG GTCGACATTC GGTAGGT bab1 (Dark P1) 401 CGCCCACATG CCACCCTACG GTCGACATTC GGTAGGT bab1 (D.sec.) 401 CGCCCACATG CCACCCTACG GTCGACATTC GGTAGGT

bab1 4th exon bab1 (genome) 1 AGCCCGCGTT TCCCTTGCAG GACCTCCCTC TCAGCTATCC CGGAGCCAGT

222 bab1 (Light P1) 1 AGCCCGCGTT TCCATTGCAG GACCTCCCTC TCAGCTATCC CGGAGCCAGT bab1 (Dark P1) 1 AGCCCGCGTT TCCCTTGCAG GACCTCCCTC TCAGCTATCC CGGAGCCAGT bab1 (D.sec.) 1 AGCCCGCGTT TCCCTTGCAG GACCTCCCTC TCAGCTATCC CGGAGCCAGT bab1 (genome) 51 GGCGCCCTGG CAGGCGCGCC CAGCTCCATG GCCTGTCCGA ATGGCAGTGG bab1 (Light P1) 51 GGCGCCCTGG CAGGCGCGCC CAGCTCCATG GCCTGTCCGA ATGGCAGT GG bab1 (Dark P1) 51 GGCGCCCTGG CAGGCGCGCC CAGCTCCATG GCCTGTCCGA ATGGCAGTGG bab1 (D.sec.) 51 GGCGCCTTGG CAGGCGCGCC CAGCTCCTTG GCCTGTCCGA ATGGCAGTGG bab1 (genome) 101 ACCGCAGACC GGAGTGGGCG TGGCCGGAGA GCAGCATATG TCACAGGAAA bab1 (Light P1) 101 ACCGCAGACC GGAGTGGGCG TGGCCGGAGA GCAGCATATG TCACAGGAAA bab1 (Dark P1) 101 ACCGCAGACC GGAGTGGGCG TGGCCGGAGA GCAGCATATG TCACAGGAAA bab1 (D.sec.) 101 ACCGCAGACC GGAGTGGGCG TGGCCGGAGA GCAGCACATG TCGCAGGAAA bab1 (genome) 151 CGGCCGCCGC GGTGGCCGCC GTGGCGCACA ACATCCGCCA GCAGATGCAA bab1 (Light P1) 151 CGGCCGCCGC GGTGGCCGCC GTGGCGCACA ACATCCGCCA GCAGATGCAA bab1 (Dark P1) 151 CGGCCGCCGC GGTGGCCGCC GTGGCGCACA ACATCCGCCA GCAGATGCAA bab1 (D.sec.) 151 CGGCCGCCGC GGTGGCCGCC GTGGCGCACA ACATCCGCCA GCAGATGCAA bab1 (genome) 201 ATGGCAGCGG TTCCGCCCGG CTTATTCAAT CTGCCGCCTC ATCCGGGAGT bab1 (Light P1) 201 ATGGCAGCGG TTCCGCCCGG CTTATTCAAT CTGCCGCCTC ATCCGGGAGT bab1 (Dark P1) 201 ATGGCAGCGG TTCCGCCCGG CTTATTCAAT CTGCCGCCTC ATCCGGGAGT bab1 (D.sec.) 201 ATGGCAGCGG TTCCGCCCGG CTTATTCAAT CTGCCGCCTC ATCCGGGAGT bab1 (genome) 251 GGGCGGTGGA GTGGGCAACG TTCCCGGCGC AGCTGGAGGC AGGGCCAGCA bab1 (Light P1) 251 TGGCGGTGGA GTGGGCAGCG TTCCCGGCGC AGCTGGAGGC AGGGCCAGCA bab1 (Dark P1) 251 TGGCGGTGGA GTGGGCAACG TTCCCGGCGC AGCTGGAGGC AGGGCCAGCA bab1 (D.sec.) 251 GGGCGGTGGA GTGGGCAGCG TTCCCGGCGC AGCTGGAGGC AGGGCCAGCA bab1 (genome) 301 TATCGCCGGC CCTGAGCAGT GGCTCCGGAC CAAGGCACGC TCCCTCGCCC bab1 (Light P1) 301 TATCGCCGGC CCTGAGCAGT GGCTCCGGAC CAAGGCACGC TCCCTCGCCC bab1 (Dark P1) 301 TATCGCCGGC CCTGAGCAGT GGCTCCGGAC CAAGGCACGC TCCCTCGCCC bab1 (D.sec.) 
301 TATCGCCGGC CCTGAGCAGT GGCTCCGGGC CCAGGCACGC TCCCTCGCCC bab1 (genome) 351 TGCGGTCCCG CCGGCCTCCT GCCGAACCTG CCGCCCAGCA TGGCCGTCGC bab1 (Light P1) 351 TGCGGTCCCG CCGGCCTCCT GCCGAACCTG CCGCCCAGCA TGGCCGTCGC bab1 (Dark P1) 351 TGCGGTCCCG CCGGCCTCCT GCCGAACCTG CCGCCCAGCA TGGCCGTCGC bab1 (D.sec.) 351 TGCGGGCCCG CCGGCCTCCT GCCG------CCCAGCA TGGCCGTCGC bab1 (genome) 401 TCTGCACCAC CAGCAGCAAC AGCAGGCGGC GCACCACCAC ATGCAGCAGC bab1 (Light P1) 401 TCTGCACCAC CAGCAGCAAC AGCAGGCGGC GCACCACCAC ATGCAGCAGC bab1 (Dark P1) 401 TCTGCACCAC CAGCAGCAAC AGCAGGCGGC GCACCACCAC ATGCAGCAGC bab1 (D.sec.) 392 TCTGCACCAC CAGCAGCAAC AGCAGGCGGC GCACCACCAC ATGCAGCAGC bab1 (genome) 451 TGCACCTGCA GCAGCAACAG GCCCACTTGC ACCACCATCA G------bab1 (Light P1) 451 TGCACCTGCA GCAGCAACAG GCCCACTTGC ACCACCATCA GCAGCAACAG bab1 (Dark P1) 451 TGCACCTGCA GCAGCAACAG GCCCACTTGC ACCACCATCA GCAGCAACAG bab1 (D.sec.) 442 TCCACCTGCA GCAGCAACAG GCCCACTTGC ACCACCATCA ------bab1 (genome) 491 ---CAGCAAC AGCAACAGCA GCAGCAGCAG CACCATCAGG GCGGCCATCA bab1 (Light P1) 501 CAACAGCAGC AGCAGCAGCA GCAGCAGCAG CACCATCAGG GCGGCCATCA bab1 (Dark P1) 501 CAACAGCAGC AGCAGCAGCA GCAGCAGCAG CACCATCAGG GCGGCCATCA bab1 (D.sec.) 482 ------GCA GCAGCAGCAG CACCATCAGG GCGGCCATCA bab1 (genome) 539 GGTGGCCCAC AAGTCCGGTT TCGGTGCCAG CTCCAGTTCC TCAGCCTCCT bab1 (Light P1) 551 GGTGGCCCAC AAGTCCGGTT TCGGTGCCAG CTCCAGTTCC TCAGCCTCCT bab1 (Dark P1) 551 GGTGGCCCAC AAGTCCGGTT TCGGTGCCAG CTCCAGTTCC TCAGCCTCCT bab1 (D.sec.) 515 GGTGGCCCAC AAGTCCGGTT TCGGTGCCAG CTCCAGTTCC TCAGCCTCCT bab1 (genome) 589 CGTCGTCAAT GGGCCAGCAC CATGCGCCCA AGGCCAAGAG CAGTCCGTTG bab1 (Light P1) 601 CGTCGTCAAT GGGCCAGCAC CATGCGCCCA AGGCCAAGAG CAGTCCGTTG bab1 (Dark P1) 601 CGTCGTCAAT GGGCCAGCAC CATGCGCCCA AGGCCAAGAG CAGTCCGTTG bab1 (D.sec.) 565 CGTCGTCAAT GGGCCAGCAC CATGCGCCCA AGGCCAAGAG CAGTCCGTTG

223 bab1 (genome) 639 CGCAGCGAAA CGCCTCGGCT GCACTCCCCG CTCGGCGATC TTGGCCTGGA bab1 (Light P1) 651 CGCAGCGAAA CGCCTCGCCT GCACTCCCCG CTCGGCGATC TTGGCCTGGA bab1 (Dark P1) 651 CGCAGCGAAA CGCCTCGCCT GCACTCCCCG CTCGGCGATC TTGGCCTGGA bab1 (D.sec.) 615 CGCAGCGAAA CGCCTCGGCT GCACTCCCCG CTCGGCGATC TTGGCCTGGA bab1 (genome) 689 CATGGCCAGC TACAAGCGCG AATTCTCGCC CAGCCGCCTC TTCGCCGAGG bab1 (Light P1) 701 CATGGCCAGC TACAAGCGCG AATTCTCGCC CAGCCGCCTC TTCGCCGAGG bab1 (Dark P1) 701 CATGGCCAGC TACAAGCGCG AATTCTCGCC CAGCCGCCTC TTCGCCGAGG bab1 (D.sec.) 665 CATGGCCAGC TACAAGCGCG AGTTCTCGCC CAGCCGCCTC TTCGCCGAGG bab1 (genome) 739 ATCTGGCCGA GCTGGTGGGC GCCAGTGTCT CATCTTCCTC ATCATCGGCG bab1 (Light P1) 751 ATCTGGCCGA GCTGGTGGGC GCCAGTGTCT CATCTTCCTC ATCATCGGCG bab1 (Dark P1) 751 ATCTGGCCGA GCTGGTGGGC GCCAGTGTCT CATCTTCCTC ATCATCGGCG bab1 (D.sec.) 715 ATCTGGCCGA GCTGGTGGGC GCCAGTGTCT CCTCTTCCTC ATCTTCGGCG bab1 (genome) 789 GCGGCAGCGA CGGCTCCTCC GGAAAGATCG GCAGGAGCAG CTTCCGCAGC bab1 (Light P1) 801 GCGGCAGCGA CGGCTCCTCC GGAAAGATCG GCGGGAGCAG CTTCCGCAGC bab1 (Dark P1) 801 GCGGCAGCGA CGGCTCCTCC GGAAAGATCG GCGGGAGCAG CTTCCGCAGC bab1 (D.sec.) 765 GCGGCAGCGA CGGCTCCTCC AGAAAGAGCG GCGGGAGCAG CTTCCGCAGC bab1 (genome) 839 CACAGGCGCG GATGCACCCA GTTCCTCGAG CAGTGGAGGC ATCAAGGTGG bab1 (Light P1) 851 CGCAGGCGCG GATGCACCCA GTTCCTCGAG CAGTGGAGGC ATCAAGGTGG bab1 (Dark P1) 851 CGCAGGCGCG GATGCACCCA GTTCCTCGAG CAGTGGAGGC ATCAAGGTGG bab1 (D.sec.) 815 CGCAGGCGCA GATGCATCCA GTTCCTCGAG CAGTGGAGGC ATCAAGGTGG bab1 (genome) 889 AACCCATTAC CACCACTAGC GAGTAAAGGG AGTAAAGGGA GGGTGAAACG bab1 (Light P1) 901 AACCCATTAC CACGACTAGC GAGTAAAGGG AGTAAAGGGA GGGTGAAACG bab1 (Dark P1) 901 AACCCATTAC CACCACTAGC GAGTAAAGGG AGTAAAGGGA GGGTGAAACG bab1 (D.sec.) 865 AACCCATCAC CACCACTAGC GAGTAAAGGG AG------GGTGAAACG bab1 (genome) 939 AAGGAAATGA TAAAGTTGA------GAAATGATA ATGGGTGAAT bab1 (Light P1) 951 AAGGAAATGA TAAAGTTGAG AT------GAAATGATA ATGGGTGAAT bab1 (Dark P1) 951 AAGGAAATGA TAAAGTTGAG ATGAAAGTTG AGAAATGATA ATGGGTGAAT bab1 (D.sec.) 
906 AACGAAATGA TAAG------TGAAAGTTG AGAAATATTA ATGGGTGAAT

bab1 (genome) 977 GAACGCAAAT CAGAAGCTTC GGCAGCTTTA CTTGGCCTG bab1 (Light P1) 992 GAACGCAAAT CAGAAGCTTC GGCAGCTTTA CTTGGCCTG bab1 (Dark P1) 1001 GAACGCAAAT CAGAAGCTTC GGCAGCTTTA CTTGGCCTG bab1 (D.sec.) 949 GAACGCGAAT CAGAAGCTTC GGCAGCTTTA CTTGGCCTG
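The bab1 rows above share one flattened layout: an allele label such as `bab1 (genome)`, the coordinate of the first base, then 10-base blocks. The sketch below regroups such rows into alignment blocks and lists the columns where the alleles disagree. It assumes exactly that layout (uppercase bases, `-` for gaps, a repeated label marking the start of a new block). `parse_blocks`, `variable_sites`, and `EXCERPT` are hypothetical names for illustration, not the author's analysis pipeline.

```python
import re

# Hypothetical parser for the flattened bab1 rows: "<label> <start> <10-base blocks>".
ROW = re.compile(r"(bab\d \([^)]*\))\s+(\d+)")

# Excerpt copied from the bab1 3rd exon alignment (block starting at position 1).
EXCERPT = (
    "bab1 (genome) 1 CAGCAGACAG CTACCAGTAC CAGCTGCAGT CCATGTGGCA AAAGTGCTGG "
    "bab1 (Light P1) 1 CAGCAGACAG CTACCAGTAC CAGCTGCAGT CCATGTGGCA AAAGTGCTGG "
    "bab1 (Dark P1) 1 CAGCAGACAG CTACCAGTAC CAGCTGCAGT CAATGTGGCA AAAGTGCTGG "
    "bab1 (D.sec.) 1 CAGCAGACAG CTACCAGTAC CAGCTGCAGT CCATGTGGCA GAAGTGCTGG"
)


def parse_blocks(text):
    """Split flattened rows into alignment blocks of {label: (start, sequence)}."""
    matches = list(ROW.finditer(text))
    blocks, current = [], {}
    for m, nxt in zip(matches, matches[1:] + [None]):
        end = nxt.start() if nxt else len(text)
        # Keep only bases and gaps; drops spaces, digits and lowercase header text.
        seq = re.sub(r"[^ACGT-]", "", text[m.end():end])
        label = m.group(1)
        if label in current:  # a repeated label means a new block has started
            blocks.append(current)
            current = {}
        current[label] = (int(m.group(2)), seq)
    if current:
        blocks.append(current)
    return blocks


def variable_sites(block, reference):
    """List (reference position, {label: base}) for columns where alleles differ.

    Gap columns in the reference report the preceding reference position.
    """
    ref_start, ref_seq = block[reference]
    pos = ref_start - 1
    sites = []
    for i, base in enumerate(ref_seq):
        if base != "-":
            pos += 1
        column = {label: seq[i] for label, (_, seq) in block.items() if i < len(seq)}
        if len(set(column.values())) > 1:
            sites.append((pos, column))
    return sites


if __name__ == "__main__":
    for block in parse_blocks(EXCERPT):
        for pos, column in variable_sites(block, "bab1 (genome)"):
            print(pos, column)
```

For this excerpt the sketch flags position 32 (A in Dark P1) and position 41 (G in D. sechellia), matching the differences visible in the 3rd exon block above.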

(D) bab2 2nd exon

Genome 1 ATGGACATGA CCAAACAGAT TGTGGACTTT GAAATAAAGT CGGAACTGAT Light P1 1 ATGGACATGA CCAAACAGAT TGTGGACTTT GAAATAAAGT CGGAACTGAT Dark P1 1 ATGGACATGA CCAAACAGAT TGTGGACTTT GAAATAAAGT CGGAACTGAT D. sechellia 1 ATGGACATGA CCAAACAGAT TGTGGACTTT GAAATAAAGT CGGAACTGCT

Genome 51 CGGCGAAATC GATCAGTTCG AGGCGAGTGA CTACACAATG GCTCCACCGG Light P1 51 CGGCGAAATC GATCAGTTCG AGGCGAGTGA CTACACAATG GCTCCACCGG Dark P1 51 CGGCGAAATC GATCAGTTCG AGGCGAGTGA CTACACAATG GCTCCACCGG D. sechellia 51 CGGCGAAATC GATCAGTTCG AGGCGAGTGA CTACACAATG GCTCCACAGG

Genome 101 AAGAGCCTAA GATGGTGGAA GAGTCCCCCC AGTTGGGTCA TCTAGAGGAC Light P1 101 AAGAGCCTAA GATGGTGGAA GAGTCCACCC AGTTGGGTCA CCTAGAGGAC Dark P1 101 AAGAGCCTAA GATGGTGGAA GAGTCCCCCC AGTTGGGTCA TCTAGAGGAC D. sechellia 101 AGGAGCCTAA GATGGTGGCA GAGTCCCCCC AGTTGGAGCA TCTCGAGGAC

Genome 151 CAGAACAGAA AGTACTCACC CGAAAGGGAG GTTGAACCCA CTCTGCAGGA Light P1 151 CAGAACAGAA AGTACACACC CGAAAGGGAG GTTGAACCCA CTTTGCAGGA Dark P1 151 CAGAACAGAA AGTACTCACC CGAAAGGGAG GTTGAACCCA CTCTGCAGGA D. sechellia 151 CAGAACAGAA AGTACTCCCC CGAAAGAGAG GTGGAACCCA CTCTGGAGGA

Genome 201 TCCAAGTGAG GTGGTTGATC AAATGCAAAA AGATACGGAG AGCGTTGGAG Light P1 201 TCCAAGTGAG GTGGTTGATC AAATGCAAAA AGATGCTGAG AACGTTGGAG Dark P1 201 TCCAAGTGAG GTGGTTGATC AAATGCAAAA AGATACGGAG AGCGTTGGAG D. sechellia 201 TCAGGGTGAG ATGGTTGATC AAATGCAAAA AGATGCTGAG AGCGTTGGCG

Genome 251 AAGTCAAGTC ACCCGAGAAG GATGTGGAAA CGGAGCTGGT GAAGTCCAAG Light P1 251 AAGTCAAGTC ACCCGAGAAG GATGTGGAAA CGGAGCTGGT GAAGTCCAAG Dark P1 251 AAGTCAAGTC ACCCGAGAAG GATGTGGAAA CGGAGCTGGT GAAGTCCAAG D. sechellia 251 AAGTGAAGTC ACCCGAGAAA GATGTGGAAA CGGAGCTGGT GAAGCCCAAG

Genome 301 GCGAGTCCGA TGAACGACCA AGCTTTGACT CCCCCACCAC GACCTCTGAC Light P1 301 GCGAGTCCGA TGAACGACCA AGCTTTGACT CCCCCACCAC GACCTCTGAC Dark P1 301 GCGAGTCCGA TGAACGACCA AGCTTTGACT CCCCCACCAC GACCTCTGAC D. sechellia 301 GAGAGTCCGA TGAACGACCA AGCTCTGACT CCCCCACCAC GACCTCTGAC

Genome 351 CTCCAGTGAA GTGGTGGGTC TCCGGGATCC CGAACATACC GAGCTGCGCA Light P1 351 CTCCAGTGAA GTGGTGGGTC TCCGGGATCC CGAACACACC GAGCTGCGCA Dark P1 351 CTCCAGTGAA GTGGTGGGTC TCCGGGATCC CGAACACACC GAGCTGCGCA D. sechellia 351 CTCCAGTGAA GTGGTGGGTC TCCGGGATCC CGAGCACACC GAGCTGCGCA

Genome 401 TGTGCCTGGA GGCCAAGAAG TCGCGCTCCC TACCAGTTTC CCCACAGCCT Light P1 401 TGTGCCTGGA GGCCAAGAAG TCGCGCTCCC TACCAGTTTC CCCACAGCCT Dark P1 401 TGTGCCTGGA GGCCAAGAAG TCGCGCTCCC TACCAGTTTC CCCACAGCCT D. sechellia 401 TGTGCCTGGA GTCCAAGAAG TCGCGCTCCC TACCGGTTTC CCCACAGCCC

Genome 451 CAACCAAATC TTAAGCTAGC CGGATCGGCG CTCTTTGAAT TCGGTCAGAG Light P1 451 CAACCAAATC TTAAGCTAGC CGGATCGGCG CTCTTTGAAT TCGGTCAGAG Dark P1 451 CAACCAAATC TTAAGCTAGC CGGATCGGCG CTCTTTGAAT TCGGTCAGAG D. sechellia 451 CAACAAAGTC TTAAGCTAGC CGGATCGGCG CTCTTTGAAT TCGGCCAAAG

Genome 501 ATCCTCTCCC GTGGAGACCA AGATCAAAAC CAATCCAGAG ACAAAACCGC Light P1 501 ATCCTCTCCC GTGGAGACCA AGATCAAAAC CAATCCGGAG ACAAAACCGC Dark P1 501 ATCCTCTCCC GTGGAGACCA AGATCAAAAC CAATCCAGAG ACAAAACCGC D. sechellia 501 ATCCTCTCCA GTGGAGACCA AGATCAAAAC CAATCCCGAG ACGAAACCGC

Genome 551 CGAGGCGCAA AATAGTTCCT CCCAGCGGCG AGGGGCAGCA GTTCTGCCTG Light P1 551 CGAGGCGCAA AATAGTTCCT CCCAGCGGCG AGGGTCAGCA GTTCTGCCTG Dark P1 551 CGAGGCGCAA AATAGTTCCT CCCAGCGGCG AGGGGCAGCA GTTCTGCCTG D. sechellia 551 CGAGACGCAA AATAGTTCCT CCCAGCGGCG AGGGGCAGCA GTTCTGCCTG

Genome 601 AGGTGGAACA ACTATCAGTC TAACCTGACC AATGTCTTTG ACGAACTCCT Light P1 601 AGGTGGAACA ACTATCAGTC TAACCTGACC AATGTCTTTG ACGAACTACT Dark P1 601 AGGTGGAACA ACTATCAGTC TAACCTGACC AATGTCTTTG ACGAACTCCT D. sechellia 601 AGGTGGAACA ACTATCAGTC CAACCTGACC AATGTCTTTG ACGAACTCCT

Genome 651 TCAGAGCGAG TCCTTCGTGG ACGTGACCTT GTCCTGCGAA GGCCACTCGA Light P1 651 TCAGAGCGAG TCCTTCGTGG ACGTGACCTT GTCCTGCGAA GGCCACTCGA Dark P1 651 TCAGAGCGAG TCCTTCGTGG ACGTGACCTT GTCCTGCGAA GGCCACTCGA D. sechellia 651 GCAGAGCGAG TCCTTCGTGG ACGTTACCTT GTCCTGCGAA GGCCACTCGA

Genome 701 TCAAGGCACA CAAGATGGTG CTATCCGCCT GCTCACCCTA CTTCCAGGCC Light P1 701 TCAAGGCCCA CAAGATGGTG CTATCCGCCT GCTCACCCTA CTTCCAGGCC Dark P1 701 TCAAGGCACA CAAGATGGTG CTACCCGCCT GCTCACCCTA CTTCCAGGCC D. sechellia 701 TCAAGGCCCA CAAGATGGTG CTATCCGCCT GCTCGCCCTA CTTCCAGGCC

Genome 751 CTGTTCTACG ACAATCCCTG CCAGCACCCC ATCATCATCA TGCGGGACGT Light P1 751 CTGTTCTACG ACAATCCCTG CCAGCATCCC ATCATCATCA TGCGGGACGT Dark P1 751 CTGTTCTACG ACAATCCCTG CCAGCACCCC ATCATCATCA TGCGGGACGT D. sechellia 751 CTGTTCTACG ACAATCCCTG CCAGCACCCC ATCATCATCA TGCGGGACGT

Genome 801 CAGCTGGTCC GACCTGAAGG CCCTGGTGGA GTTCATGTAC AAGGGGGAGA Light P1 801 CAGCTGGTCC GATCTGAAGG CCCTGGTGGA GTTCATGTAC AAGGGGGAGA Dark P1 801 CAGCTGGGCC GACCTGAAGG CCCTGGTGGA GTTCATGTAC AAGGGGGAGA D. sechellia 801 CAGCTGGTCC GACCTGAAGG CCCTGGTGGA GTTCATGTAC AAGGGGGAGA

Genome 851 TCAACGTCTG CCAGGATCAG ATAAACCCCC TGCTCAAAGT GGCCGAAACC Light P1 851 TCAACGTCTG TCAGGATCAG ATAAATCCCC TGCTCAAAGT GGCCGAAACC Dark P1 851 TCAACGTCTG CCAGGATCAG ATAAACCCCC TGCTCAAAGT GGCCGAAACC D. sechellia 851 TCAACGTCTG CCAGGATCAG ATAAACCCCC TGCTCAAAGT GGCCGAAACC

Genome 901 CTGAAGATCA GGGGTCTGGC GGAGGTCAGT GCGGGCAGGG GCGAGGGAGG Light P1 901 CTGAAGATCA GGGGTCTGGC GGAGGTCAGT GCGGGCAGGG GCGAGGGAGG Dark P1 901 CTGAAGATCA GGGGTCTGGC GGAGGTCAGT GCGGGCAGGG GCGAGGGAGG D. sechellia 901 CTGAAGATCA GGGGTCTGGC GGAGGTCAGT GCGGGCAGGG GCGAGGGAGG

Genome 951 CGCCTCCGCA CTTCCCATGT CCGCCTTCGA CGATGAGGAC GAGGAGGAGG Light P1 951 CGCCTCCGCA CTTCCCATGT CCGCCTTCGA CGATGAGGAC GAGGAGGAGG Dark P1 951 CGCCTCCGCA CTTCCCATGT CCGCCTTCGA CGATGAGGAC GAGGAAGAGG D. sechellia 951 CGCCTCCGCA CTTCCCATGT CCGCCTTCGA CGATGAGGAC GAGGAGGAGG

Genome 1001 AACTGGCCTC GGCCACTGCA ATTCTGCAGC AGGACGGTGA TGCCGATCCC Light P1 1001 AACTGGCCTC GGCCACTGCA ATTCTGCGGC AGGACGGTGA CGCCGATCCC Dark P1 1001 AACTGGCCTC GGCCACTGCA ATTCTGCAGC AGGACGGTGA TGCCGATCCC D. sechellia 1001 AACTGGCCTC GGCCACTGCT ATTCTGCGGC AGGAGGGTGA CGCCGATCCC

Genome 1051 GATGAGGAGA TGAAGGCCAA GAGGCCCAGA CTGCTGCCCG AGGGAGTCTT Light P1 1051 GATGAGGAGA TGAAGGCCAA GAGGCCCAGA CTGCTGCCCG AGGGAGTTTT Dark P1 1051 GATGAGGAGA TGAAGGCCAA GAGGCCCAGA CTGCTGCCCG AGGGAGTCTT D. sechellia 1051 GACGAGGAGA TGAAGGCCAA GAGACCCAGA CTGCTGCCCG ATGGAGTCTT

Genome 1101 GGACTTGAAT CAGCGACAAA GGAAGCGGTC CAGGGATGGC AGCTACGCCA Light P1 1101 GGACTTGAAT CAGCGACAAA GAAAGCGGTC CAGGGATGGC AGCTACGCCA Dark P1 1101 GGACTTGAAT CAGCGACAAA GGAAGCGGTC CAGGGATGGC AGCTACGCCA D. sechellia 1101 GGACTTGAAT CAGCGACAAA GGAAGCGGTC CAGGGATGGC AGCTACGCCA

Genome 1151 CTCCAAGTCC ATCCCTTCAG GGCGGAGAGT CCGAGATCTC GGAGAGGGGC Light P1 1151 CTCCGAGTCC ATCCCTCCAC GGCGGATAGT CCGAGATCTC GGACAGGGGC Dark P1 1151 CTCCAAGTCC ATCCCTTCAG GGCGGAGAGT CCGAGATCTC GGAGAGGGGC D. sechellia 1151 CTCCGAGTCC CTCCCTCCAG GGCGGTGAGT CCGAGATCTC GGAGAGGGGC

Genome 1201 TCATCC-GGC ACTCCGGGAC AGAGCCAG-- AGCCAACCCC TGGCCATGAC Light P1 1201 TCATCC-GGC ACTCCGGGAC AGAGCCAG-- AGCCAACCTC TGGCCATGAC Dark P1 1201 TCATCC-GGC ACTCCGGGAC AGAGCCAG-- AGCCAACCCC TGGCCATGAC D. sechellia 1201 TCATCC-GGC ACTCCTGGAC AGAACCAG-- AGCCAGCCCC TGGCCATGAC

Genome 1248 CA--CCTCCA CCATTGTGCG CAATCCCTTC GCCTCCCCCA ATCCTCAGAC Light P1 1248 CA--CCTCCA CCATAGTGCG TAATCCATTC GCCTCCCCCA ATCCTCAGAC Dark P1 1248 CA--CCTCCA CCATTGTGCG CAATCCCTTC GCCTCCCCCA ATCCTCAGAC D. sechellia 1248 CA--CCTCCA CCATAGTGCG CAATCCCTTC GCCTCGCCCA ATCCCCAGAC

Genome 1296 CTTGGAGGGC AGGAACAGCG CCATGAATGC AGTAGCAAAC -CAGAGGAAA Light P1 1296 CTTGGAGGGC AGGAACAGCG CAATGAATGC AGTAGCAAAC -CAGAGGAAA Dark P1 1296 CTTGGAGGGC AGGAACAGCG CCATGAATGC AGTAGCAAAC -CAGAGGAAA D. sechellia 1296 CTTGGATGGC AGGAACAGCG CCTTGAATGC AGCAGCAAGC -CAGAGGAAA

Genome 1345 TCACCAGCAC CAACAGCGAC AGGTCACAGC AATGGGAACA GCGGCGCCGC Light P1 1345 TCACCAGCAC CAACAGCGAC AGGTCACAGC AATGGGAACA GCGGCGCCGC Dark P1 1345 TCACCAGCAC CAACAGCGAC AGGTCACAGC AATGGGAACA GCGGCGCCGC D. sechellia 1345 TCACCAGCAC CAACAGCG-- -GGTCACAGC AATGGAAACA GCGGCGCCGC

Genome 1395 CATGCACTCC CCACCCGGGG GCGTGGCCGT CCAGTCCGCC CTTCCGCCCC Light P1 1395 CATGCACTCC CCACCCGGGG GCGTGGCCGT CCAGTCCGCC CTTCCGCCCC Dark P1 1395 CATGCACTCC CCACCCGGGG GCGTGGCCGT CCAGTCCGCC CTTCCGCCCC D. sechellia 1392 CATGCACTCC CCACCCGGGG GCGTGGCCGT GCAGTCCGCC CTGCCGCCCC

Genome 1445 ACATGGCCGC CATCGTGCCG CCACCCCCCT CCGCCATGCA CCATCATGCC Light P1 1445 ACATGGCCGC AATCGTGCCG CCACCCCCTT CCGCCATGCA CCATCATGCC Dark P1 1445 ACATGGCCGC CATCGTGCCG CCACCCCCTT CCGCCATGCA CCATCATGCC D. sechellia 1442 ACATGGCCGC CATCGTGCCC CCACCCCAAT CCGCCATGCA CCACCATGCC

Genome 1495 CAGCAACTGG CCGCCCAGCA CCAGCTGGCC CACTCGCACG CCATGGCCAG Light P1 1495 CAGCAACTGG CCGCCCAGCA CCAGCTGGCC CACTCGCACG CCATGGCCAG Dark P1 1495 CAGCAACTGG CCGCCCAGCA CCAGCTGGCC CACTCGCACG CCATGGCCAG D. sechellia 1492 CAGCAACTGG CCGCCCAGCA CCAGCTGGCC CACTCGCACG CCATGGCCAG

Genome 1545 CGCCTTGGCA GCCGCAGCCG CCGGAGCTGG CGCAGCGGGA GCGGGCGGAG Light P1 1545 CGCCTTGGCA GCCGCAGCCG CCGGAGCTGG CGCAGCGGGA GCGGGCGGAG Dark P1 1545 CGCCTTGGCA GCCGCAGCCG CCGGAGCTGG CGCAGCGGGA GCGGGCGGAG D. sechellia 1542 CGCCTTGGCA GCCGCAGCCG CCGGAGCGGG CGCAGCAGGA GCGGGCGGAG

Genome 1595 CAGGATCTGG CAGTGGATCG GGCGCCAGTG CTCCGACTGG AGGAACAGGA Light P1 1595 CAGGATCTGG CAGTGGATCG GGCGCCAGTG CTCCGACTGG AGGAACAGGA Dark P1 1595 CAGGATCTGG CAGTGGATCG GGCGCCAGTG CTCCGACTGG AGGAACAGGA D. sechellia 1592 CAGGATCCGG CAGTGGATCG GGCGCCAGTG CTCCGACTGG AGGAACGGGA

Genome 1645 GTGGCGGGAA GTGGAGCCGG CGCGGCGGTG GGATCCCATC ACGATGACAT Light P1 1645 GTGGCGGGAA GTGGAGCCGG CGCGGCGGTG GGATCTCATC ACGATGACAT Dark P1 1645 GTGGCGGGAA GTGGAGCCGG CGCGGCGGTG GGATCCCATC ACGATGACAT D. sechellia 1642 GTGGCGGGTA GTGGAGCCGG CGCGGCGGTG GGATCCCATC ACGATGACAT

Genome 1695 GGAGATCAAG CCAGAAATCG CCGAGATGAT ACGCGAAGAA GAGAGGGTGA Light P1 1695 GGAGATCAAG CCAGAAATCG CCGAGATGAT ACGCGAAGAA GAGAGGGTGA Dark P1 1695 GGAGATCAAG CCAGAAATCG CCGAGATGAT ACGCGAAGAA GAGAGGGTGA D. sechellia 1692 GGAGATCAAG CCAGAAATCG CAGAGATGAT TCGCGAAGAG GAGAGGGTGA

Genome 1745 GT------ Light P1 1745 GT------ Dark P1 1745 GT------ D. sechellia 1742 GT------

bab2 3rd exon

Genome 1 CAGGCCAAGA TGATCGAGAG TGGAGGCCAC GGTGGATGGA TGGGAGCGGC Light P1 1 CAGGCCAAGA TGATCGAGAG TGGAGGCCAC GGTGGATGGA TGGGAGCGGC Dark P1 1 CAGGCCAAGA TGATCGAGAG TGGAGGCCAC GGTGGATGGA TGGGAGCGGC D. sechellia 1 CAGGCCAAGA TGATCGAGAG TGGAGGCCAC GGTGGCTGGA TGGGAGCTGC

Genome 51 AGCTGCGGCA ACTGGAGCAG CTTCTGTGGC GGGTAAGT Light P1 51 AGCTGCGGCA ACTGGAGCAG CTTCTGTGGC GGGTAAGT Dark P1 51 AGCTGCGGCA ACTGGAGCAG CTTCTGTGGC GGGTAAGT D. sechellia 51 AGCTGCGGCA ACTGGAGCAG CTTCTGTGGC GGGTAAGT

bab2 4th exon

Genome 1 CAGCAGATAG CTACCAGTAC CAGCTACAGT CCATGTGGCA GAAGTGCTGG Light P1 1 CAGCAGATAG CTACCAGTAC CAGCTACAGT CCATGTGGCA GAAGTGCTGG Dark P1 1 CAGCAGATAG CTACCAGTAC CAGCTACAGT CCATGTGGCA GAAGTGCTGG D. sechellia 1 CAGCAGATAG CTACCAGTAC CAGCTGCAGT CCATGTGGCA GAAGTGCTGG

Genome 51 AACACCAATC AGCAGAACCT GGTGCAGCAG CTCAGATTCC GCGAGCGCGG Light P1 51 AACACCAATC AGCAGAACCT GGTGCAGCAG CTCAGATTCC GCGAGCGCGG Dark P1 51 AACACCAATC AGCAGAACCT GGTGCAGCAG CTCAGATTCC GCGAGCGCGG D. sechellia 51 AACACCAACC AGCAGAACCT GGTGCAGCAG CTCAGGTTCC GCGAGCGCGG

Genome 101 CCCATTGAAG TCCTGGCGAC CCGAGGCCAT GGCCGAGGCC ATTTTCAGTG Light P1 101 CCCATTGAAG TCCTGGCGAC CCGAGGCCAT GGCCGAGGCC ATCTTCAGTG Dark P1 101 CCCATTGAAG TCCTGGCGAC CCGAGGCCAT GGCCGAGGCC ATTTTCAGTG D. sechellia 101 CCCACTGAAG TCCTGGCGAC CCGAGGCCAT GGCCGAGGCC ATCTTCAGTG

Genome 151 TCCTGAAGGA GGGGCTCTCC CTGTCACAGG CTGCCCGCAA GTTCGACATA Light P1 151 TCCTGAAGGA GGGGCTCTCC CTGTCGCAGG CTGCCCGAAA GTTTGACATA Dark P1 151 TCCTGAAGGA GGGGCTCTCC CTGTCACAGG CTGCCCGCAA GTTCGACATA D. sechellia 151 TCCTGAAGGA GGGGCTCTCC CTGTCGCAGG CTGCCCGCAA GTTCGACATC

Genome 201 CCCTATCCCA CCTTCGTCCT GTACGCCAAT CGGGTGCACA ACATGCTGGG Light P1 201 CCCTATCCCA CCTTCGTCCT GTACGCCAAT CGGGTGCACA ACATGCTGGG Dark P1 201 CCCTATCCCA CCTTCGTCCT GTACGCCAAT CGGGTGCACA ACATGCTGGG D. sechellia 201 CCCTATCCCA CCTTCGTCCT GTACGCCAAT CGGGTGCACA ACATGCTGGG

Genome 251 ACCCTCGCTG GATGGCGGAG CTGATCCGCG GCCAAAGGCA CGCGGTCGTC Light P1 251 ACCCTCGCTG GATGGCGGAG CTGATCCGCG GCCAAAGGCA CGCGGTCGTC Dark P1 251 ACCCTCGCTG GATGGCGGAG CTGATCCGCG GCCAAAGGCA CGCGGTCGTC D. sechellia 251 ACCCTCGCTG GATGGCGGAG CTGATCCGCG GCCAAAGGCA CGCGGTCGTC

Genome 301 CCCAGAGGAT CCTGCTGGGC ATGTGGCCGG AGGAGCTCAT CCGTAGCGTC Light P1 301 CCCAGAGGAT CCTGCTGGGC ATGTGGCCGG AGGAGCTCAT CCGTAGCGTC Dark P1 301 CCCAGAGGAT CCTGCTGGGC ATGTGGCCGG AGGAGCTCAT CCGTAGCGTC D. sechellia 301 CCCAGAGGAT CCTGTTGGGC ATGTGGCCGG AGGAGCTCAT CCGCAGCGTC

Genome 351 ATTAAGGCCG TGGTGTTCCG GGACTATCGC GAGATTAAGG AGGACATGAG Light P1 351 ATTAAGGCCG TGGTGTTCCG GGACTATCGC GAGATTAAGG AGGACATGAG Dark P1 351 ATTAAGGCCG TGGTGTTCCG GGACTATCGC GAGATTAAGG AGGACATGAG D. sechellia 351 ATCAAGGCCG TGGTGTTCCG GGATTATCGC GAGATCAAGG AGGACATGAG

Genome 401 CGCCCATCAG TACGCCAATG GACAGGGTCA TGGTGTAAGT Light P1 401 CGCCCATCAG TACGCCAATG GACAGGGTCA TGGTGTAAGT Dark P1 401 CGCCCATCAG TACGCCAATG GACAGGGTCA TGGTGTAAGT D. sechellia 401 TGCCCACCAG TACGCCAATG GACAGGGCCA TGGTGTAAGT

bab2 5th exon

Genome 1 CAGACCTATA TCGGAGGAGG AACCACCACG AATGGCTACC ACAGTGCTGC Light P1 1 CAGACCTATA TCGGAGGAGG AACCACCACG AATGGCTACC ACAGTGCTGC Dark P1 1 CAGACCTATA TCGGAGGAGG AACCACCACG AATGGCTACC ACAGTGCTGC D. sechellia 1 CAGACATATA TCGGAGGAGG CACCACCACG AATGGCTACC ACAGTGCTGC

Genome 51 CGCAGCCAAG CTGGCGGCTC AGAACGCTGC ACTGGCTCCG CCGGACGCAG Light P1 51 CGCAGCCAAG CTGGCGGCTC AGAACGCTGC ACTGGCTCCG CCGGACGCAG Dark P1 51 CGCAGCCAAG CTGGCGGCTC AGAACGCTGC ACTGGCTCCG CCGGACGCAG D. sechellia 51 GGCAGCCAAG CTGGCGGCCC AGAATGCTGC ACTGGCTCCG CCGGACGCAG

Genome 101 GAAGTCCGCT GAGCTCCATG ACGGAAACCC TTCGCCGCCA GATCCTCTCG Light P1 101 GAAGTCCGCT GAGCTCCATG ACGGAAACCC TTCGCCGCCA GATCCTCTCG Dark P1 101 GAAGTCCGCT GAGCTCCATG ACGGAAACCC TTCGCCGCCA GATCCTCTCG D. sechellia 101 GAAGTCCGCT GAGCTCAATG ACGGAGACCC TGCGCCGCCA GATCCTCTCG

Genome 151 CAGCAGCAGC AACATCAGCA GCACCACCAG CAGCAGGCAC ACCATCAGCA Light P1 151 CAGCAGCAGC AACATCAGCA GCACCACCAG CAGCAGGCAC ACCATCAGCA Dark P1 151 CAGCAGCAGC AACATCAGCA GCACCACCAG CAGCAGGCAC ACCATCAGCA D. sechellia 151 CAGCAGCAGC AACATCAGCA GCACCATCAG CAGCAGGCGC ACCACCAGCA

Genome 201 ACAGCCCTCG CACCACCAGC AACAGTCGCC CCACGCCCAG TCCATGAACA Light P1 201 ACAGCCCTCG CACCACCAGC AACAGTCGCC CCACGCCCAG TCCATGAACA Dark P1 201 ACAGCCCTCG CACCACCAGC AACAGTCGCC CCACGCCCAG TCCATGAACA D. sechellia 201 GCAGCCCTCG CACCACCAGC AGCAGTCGCC CCACGCCCAG TCCATGAACA

Genome 251 TGTACAAGTC CCCGGCCTAT CTGCAGCGAT CCGAGATCGA AGATCAAGTA Light P1 251 TGTACAAGTC CCCGGCCTAT CTGCAGCGAT CCGAGATCGA AGATCAAGTA Dark P1 251 TGTACAAGTC CCCGGCCTAT CTGCAGCGAT CCGAGATCGA AGATCAAGTA D. sechellia 251 TGTACAAGTC CCCGGCCTAT CTGCAGCGAT CCGAGATCGA AGATCAAGTA

Genome 301 TCCGCAGCGG CGGCCGTGGC AGCGGCGGCG GCCAAGCACC AGCAGCAGCA Light P1 301 TCCGCAGCGG CGGCCGTGGC AGCGGCGGCG GCCAAGCACC AGCAGCAGCA Dark P1 301 TCCGCAGCGG CGGCCGTGGC AGCGGCGGCG GCCAAGCACC AGCAGCAGCA D. sechellia 301 TCCGCAGCGG CGGCCGTGGC AGCAGCGGCG GCCAAGCACC AGCAGCAGCA

Genome 351 GGGTGAGCGA AGGGGTTCGG AGAACCTGCC CGACCTCAGT GCCCTGGGCC Light P1 351 GGGTGAGCGA AGGGGTTCGG AGAACCTGCC CGACCTCAGT GCCCTGGGCC Dark P1 351 GGGTGAGCGA AGGGGTTCGG AGAACCTGCC CGACCTCAGT GCCCTGGGCC D. sechellia 351 GGGTGAGCGA AGGGGTTCGG AGAACCTGCC CGACCTCAGT GCCCTGGGCC

Genome 401 TGATGGGTCT GCCCGGCCTG AATGTGATGC CCTCACGGGG ATCGGGTGGA Light P1 401 TGATGGGTCT GCCCGGCCTG AATGTGATGC CCTCACGGGG ATCGGGTGGA Dark P1 401 TGATGGGTCT GCCCGGCCTG AATGTGATGC CCTCACGGGG ATCGGGTGGA D. sechellia 401 TGATGGGTCT GCCGGGGCTG AACGTGATGC CCTCGCGGGG ATCGGGTGGA

Genome 451 GGAAGTGGTG GCGCAGCGCC GAATAGTGCC GCCTCCTATG CCCGCGAGTT Light P1 451 GGAAGTGGTG GCGCAGCGCC GAATAGTGCC GCCTCCTATG CCCGCGAGTT Dark P1 451 GGAAGTGGTG GCGCAGCGCC GAATAGTGCC GCCTCCTATG CCCGCGAGTT D. sechellia 451 GGAAGTGGTG GCGCAGCTCC GAACAGTGCC GCCTCCTATG CCCGCGAGTT

Genome 501 ATCCCGCGAA AGGGAACGCG ATCGGGAGCG CGAAAGGGAG CGGGAGCTGT Light P1 501 ATCCCGCGAA AGGGAACGCG ATCGGGAGCG CGAAAGGGAG CGGGAGCTGT Dark P1 501 ATCCCGCGAA AGGGAACGCG ATCGGGAGCG CGAAAGGGAG CGGGAGCTGT D. sechellia 501 GTCCCGCGAA AGGGAGCGCG ATCGGGAGCG CGAGAGGGAG AGGGAGCTGT

Genome 551 CCCGCCAGTA TGGCAGCCAG TCGCGGGGAT CGAGCTCCGG TTCCGGAAGC Light P1 551 CCCGCCAGTA TGGCAGCCAG TCGCGGGGAT CGAGCTCCGG TTCCGGAAGC Dark P1 551 CCCGCCAGTA TGGCAGCCAG TCGCGGGGAT CGAGCTCCGG TTCCGGAAGC D. sechellia 551 CGCGCCAGTA TGGTAGCCAG TCGCGGGGAT CGAGCTCCGG TTCCGGAAGC

Genome 601 GCCAAGTCCC TGACCGCCAG CCAAAGACCA GGAGCCGCCT CGCCGTACTC Light P1 601 GCCAAGTCCC TGACCGCCAG CCAAAGACCA GGAGCCGCCT CGCCGTACTC Dark P1 601 GCCAAGTCCC TGACCGCCAG CCAAAGACCA GGAGCCGCCT CGCCGTACTC D. sechellia 601 GCCAAGTCCC TGACCGCCAG CCAAAGACCA GGAGCCGCCT CGCCGTACTC

Genome 651 CGCCGCCCAC TATGCCAAAC ATCAGGCGAG TGCCTACAAC AAGAGGTTTC Light P1 651 CGCCGCCCAC TATGCCAAAC ATCAGGCGAG TGCCTACAAC AAGAGGTTTC Dark P1 651 CGCCGCCCAC TATGCCAAAC ATCAGGCGAG TGCCTACAAC AAGAGGTTTC D. sechellia 651 CGCCGCCCAC TATGCCAAAC ATCAGGCGAG CGCCTACAAC AAGAGGTTTC

Genome 701 TCGAGAGCCT GCCCGCCGGC ATTGACTTGG AGGCCTTCGC CAACGGACTG Light P1 701 TCGAGAGCCT GCCCGCCGGC ATTGACTTGG AGGCCTTCGC CAACGGACTG Dark P1 701 TCGAGAGCCT GCCCGCCGGC ATTGACTTGG AGGCCTTCGC CAACGGACTG D. sechellia 701 TCGAGAGCCT GCCCGCCGGC ATCGACTTGG AGGCCTTCGC CAACGGGCTG

Genome 751 CTCCAGAAGT CGGTGAACAA GAGTCCGCGC TTCGAGGACT TCTTCCCGGG Light P1 751 CTCCAGAAGT CGGTGAACAA GAGTCCGCGC TTCGAGGACT TCTTCCCGGG Dark P1 751 CTCCAGAAGT CGGTGAACAA GAGTCCGCGC TTCGAGGACT TCTTCCCGGG D. sechellia 751 CTCCAGAAGT CGGTGAACAA GAGTCCGCGC TTCGAGGACT TCTTCCCGGG

Genome 801 ACCCGGCCAG GACATGAGTG AACTGTTTGC CAATCCGGAC GCGAGTGCAG Light P1 801 ACCCGGCCAG GACATGAGTG AACTGTTTGC CAATCCGGAC GCGAGTGCAG Dark P1 801 ACCCGGCCAG GACATGAGTG AACTGTTTGC CAATCCGGAC GCGAGTGCAG D. sechellia 801 ACCCGGCCAG GACATGAGTG AACTGTTTGC CAATCCGGAC GCGAGTGCAG

Genome 851 CTGCCGCGGC GGCCGCCTAC GCGCCTCCTG GCGCCATCCG CGAATCGCCT Light P1 851 CTGCCGCGGC GGCCGCCTAC GCGCCTCCTG GCGCCATCCG CGAATCGCCT Dark P1 851 CTGCCGCGGC GGCCGCCTAC GCGCCTCCTG GCGCCATCCG CGAATCGCCT D. sechellia 851 CTGCCGCGGC GGCCGCCTAT GCGCCTCCAG GCGCCATCCG GGAATCGCCT

Genome 901 CTGATGAAGA TCAAGCTGGA GCAGCAGCAT GCCACCGAAC TGCCGCACGA Light P1 901 CTGATGAAGA TCAAGCTGGA GCAGCAGCAT GCCACCGAAC TGCCGCACGA Dark P1 901 CTGATGAAGA TCAAGCTGGA GCAGCAGCAT GCCACCGAAC TGCCGCACGA D. sechellia 901 CTGATGAAGA TCAAGCTGGA GCAGCAGCAT GCCACCGAAC TGCCGCACGA

Genome 951 GGATTGA Light P1 951 GGATTGA Dark P1 951 GGATTGA D. sechellia 951 GGATTGA

Figure A2: Protein-coding sequence variation for the bab alleles. To-scale representations of the (A) Bab1 and (B) Bab2 proteins, including the BTB domain (red) and the Bab conserved domain (CD, blue). The positions of nonsynonymous differences between the Light P1 and Dark P1 sequences are annotated and compared to the amino acid states of the D. melanogaster genome strain and the outgroup species D. sechellia. Aligned DNA sequences of the (C) bab1 and (D) bab2 protein-coding exons, with the adjacent splice donor and acceptor sequences (shown in black text on a gray background).
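The caption separates nonsynonymous differences from the rest of the sequence variation. The sketch below makes that distinction concrete by comparing two aligned coding segments codon by codon under the standard genetic code. It assumes gap-free aligned segments and that position 1 of the bab2 2nd exon alignment is the first base of the start codon (that exon's sequence begins with ATG). `compare_codons` and the two sequence constants are hypothetical illustrations, not the author's method. Applied to the Light P1 and Dark P1 rows for positions 101 to 150 of the bab2 2nd exon, and under that reading-frame assumption, it reports one nonsynonymous change (codon 43, ACC to CCC, Thr to Pro) and one synonymous change (codon 47, CAC to CAT, both His).

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order, "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# bab2 2nd exon, positions 101-150 (Light P1 and Dark P1 rows from the alignment).
LIGHT_P1_101 = "AAGAGCCTAA GATGGTGGAA GAGTCCACCC AGTTGGGTCA CCTAGAGGAC"
DARK_P1_101 = "AAGAGCCTAA GATGGTGGAA GAGTCCCCCC AGTTGGGTCA TCTAGAGGAC"


def compare_codons(seq_a, seq_b, start_pos):
    """Classify codon differences between two aligned coding segments.

    start_pos is the 1-based coding position of the first base, assuming
    position 1 is the first base of the start codon. Partial codons at either
    end are ignored, as are codons that contain an alignment gap.
    """
    a = seq_a.replace(" ", "").upper()
    b = seq_b.replace(" ", "").upper()
    if len(a) != len(b):
        raise ValueError("aligned segments must have the same length")
    skip = (3 - (start_pos - 1) % 3) % 3  # bases before the first complete codon
    results = []
    for i in range(skip, len(a) - 2, 3):
        ca, cb = a[i:i + 3], b[i:i + 3]
        if ca == cb or "-" in ca + cb:
            continue
        codon_number = (start_pos + i - 1) // 3 + 1
        aa_a, aa_b = CODON_TABLE[ca], CODON_TABLE[cb]
        kind = "synonymous" if aa_a == aa_b else "nonsynonymous"
        results.append((codon_number, ca, cb, aa_a, aa_b, kind))
    return results


if __name__ == "__main__":
    for row in compare_codons(LIGHT_P1_101, DARK_P1_101, 101):
        print(row)
```

The frame offset is computed from `start_pos`, so any 50-bp block from the alignment can be passed in directly, provided the start-codon assumption holds and the block contains no gaps in the codons being compared.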
