A MECHANISTIC ANALYSIS OF GENE REGULATION AND ITS EVOLUTION IN

A DROSOPHILA MODEL

Dissertation

Submitted to

The College of Arts and Sciences of the

UNIVERSITY OF DAYTON

In Partial Fulfillment of the Requirements for

The Degree of

Doctor of Philosophy in Biology

By

Eric M. Camino

Dayton, Ohio

May, 2016

A MECHANISTIC ANALYSIS OF GENE REGULATION AND ITS EVOLUTION IN

A DROSOPHILA MODEL

Name: Camino, Eric M.

APPROVED BY:

Thomas M. Williams, Ph.D. Faculty Advisor

Mark Nielsen, Ph.D. Committee Member

Mark Rebeiz, Ph.D. Committee Member

Amit Singh, Ph.D. Committee Member

Panagiotis Tsonis, Ph.D. Committee Member

© Copyright by

Eric M. Camino

All rights reserved

2016

ABSTRACT

A MECHANISTIC ANALYSIS OF GENE REGULATION AND ITS EVOLUTION IN

A DROSOPHILA MODEL

Name: Camino, Eric M.

University of Dayton

Advisor: Dr. Thomas Williams

The body plans and adorning characteristics of animals have evolved impressively diverse morphologies. These body plans and characteristics are the products of networks of genes whose expression is controlled by cis-regulatory elements (CREs) that can be positioned at great distances from the gene(s) whose expression is being regulated. The pattern of expression a CRE imparts on a gene stems from its collection of binding sites for transcription factors, what is referred to as a regulatory logic. The overarching goal for my thesis was to understand how animal traits develop and evolve at the levels of gene regulatory networks, CREs, and CRE regulatory logics. Chapter 2 presents research into the male-specific pattern of abdominal pigmentation that seemingly originated, diversified, and was lost in fruit fly species of the Sophophora subgenus. This research focused on CREs for the terminal differentiation genes yellow and tan that independently evolved a male-specific pattern of gene regulation. Through the use of reporter transgenes we found evidence that the activities of these CREs evolved during the trait's origin, but that diversification and trait loss were more impacted by changes elsewhere in the genome (in trans) than by changes to these CREs. Moreover, we revealed that the similar activities of the two CREs regulating co-expressed genes stem from unique regulatory logics, of which key activators and repressors remain unknown. CREs, such as the tan gene CRE, are often located at a distance from the promoter of the gene they regulate. Thus, CREs and promoters must find each other in the nucleus and interact to facilitate transcription. Key studies suggest that CREs and promoters encode information that makes these interactions take place through the binding of regulatory proteins. However, conventional reporter transgenes study CREs when they are directly adjacent to a promoter. In Chapter 3, research is presented for a novel reporter transgene system we developed that can test the activity of a CRE when it is both proximal and distal to reporter genes. We optimized CRE spacing and fluorescent reporter parameters for this system, tested it with various CREs and promoters, and describe an experimental scheme that can be used to dissect the sequences involved in long distance regulation. This new reporter system offers the promise to find sequences involved in CRE-promoter interactions, but the critical proteins binding these sequences will then need to be found by another method. In a similar vein, the outcomes for Chapter 2 ran into an all too frequent difficulty with the molecular dissection of CREs: finding functional CRE sequences but being unable to identify the interacting transcription factors. The work presented in Chapter 4 was my attempt to utilize a high-throughput yeast one-hybrid assay to test a library of fruit fly transcription factors for interactions with various functional CRE sequences. This approach identified many interactions available for further investigation. In the future this yeast approach seems well suited to be paired with more traditional methods of CRE study to expedite an understanding of gene regulation and its evolution.

TABLE OF CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF TABLES
CHAPTER I: INTRODUCTION
CHAPTER II: THE EVOLUTIONARY ORIGINATION AND DIVERSIFICATION OF A DIMORPHIC GENE REGULATORY NETWORK THROUGH PARALLEL INNOVATIONS IN CIS AND TRANS
    Abstract
    Introduction
    Results
    Discussion
    Materials and Methods
    Acknowledgments
    Supplementary Information
CHAPTER III: RED LIGHT, GREEN LIGHT: A NOVEL APPROACH TO STUDYING INTERACTIONS BETWEEN CIS-REGULATORY ELEMENTS AND GENES
    Abstract
    Introduction
    Results
    Discussion
    Materials and Methods
CHAPTER IV: FUTURE DIRECTIONS: THE USE OF A HIGH THROUGHPUT APPROACH TO REVEAL THE HIDDEN REGULATORY LOGIC OF GENE EXPRESSION REGULATION AND ITS EVOLUTION
    Abstract
    Introduction
    Results
    Discussion
    Materials and Methods
    Acknowledgements
BIBLIOGRAPHY
APPENDICES
    APPENDIX A. Sequence alignment of yBE0.6 scanning mutagenesis
    APPENDIX B. Sequence alignment of t_MSE scanning mutagenesis
    APPENDIX C. Sequence alignment of fine scale t_MSE2 scanning mutagenesis
    APPENDIX D. Sequence alignment of t_MSE2 binding site mutants
    APPENDIX E. Sequence alignment of yellow Bait sequences
    APPENDIX F. Sequence alignment of t_MSE Bait sequences

LIST OF FIGURES

Figure 1.1 Representation of a gene locus.

Figure 1.2 Modular cis-regulatory elements drive distinct patterns of gene expression.

Figure 1.3 Regulation at a distance.

Figure 1.4 Abdominal pigmentation for representative species of the Sophophora subgenus.

Figure 1.5 The fruit fly pigment metabolic pathway.

Figure 1.6 The understanding of the gene regulatory network for abdominal pigmentation at the outset of my PhD studies.

Figure 2.1 Correlation between pigmentation and the gene expression of tan and yellow in the Sophophora subgenus.

Figure 2.2 Tracing the ancestry and evolution of CREs that drive male-specific tan and yellow expression.

Figure 2.3 Tracing the CRE bases for losses in male tergite pigmentation.

Figure 2.4 Genetic interactions between pigmentation network transcription factors and CREs regulating abdominal tan and yellow expression.

Figure 2.5 Scanning mutagenesis identifies CRE sequences required for yellow and tan expression.

Figure 2.6 Characterization of the direct Hox inputs shaping tan expression.

Figure 2.7 Gene network models for unpigmented and pigment abdominal segments.

Figure S 2.1 Mapping the CRE sequences sufficient to drive male-specific tan and yellow expression.

Figure S 2.2 Conserved synteny between tan and the upstream genes between which the t_MSE is located in D. melanogaster.

Figure S 2.3 The regulatory activity of the D. pseudoobscura sequence 5′ of the yellow gene.

Figure S 2.4 Mapping the CRE architecture of the 5′ region of the D. willistoni yellow gene.

Figure S 2.5 Mapping functional yellow regulatory sequences through CRE scanning mutagenesis.

Figure S 2.6 Mapping functional tan regulatory sequences through CRE scanning mutagenesis.

Figure S 2.7 Fine-scale mapping of functional tan regulatory sequences through CRE scanning mutagenesis.

Figure S 2.8 In vitro interactions between the DNA binding domains of Hox proteins and a known Hox site.

Figure 3.1 Gene regulation via long distance CRE-promoter interactions.

Figure 3.2 Design of the Red Light Green Light dual reporter transgene system.

Figure 3.3 The effects of CRE-promoter spacing on the expression of proximal and distal reporter genes.

Figure 3.4 Test of long distance regulatory activity for several D. melanogaster CREs.

Figure 3.5 Attempts to rescue distal reporter gene expression through the addition of flanking CRE sequence and the utilization of an optimized Drosophila promoter.

Figure 3.6 Comparison of fluorescence properties of various fluorescent reporters when regulated by a CRE.

Figure 3.7 E2-Crimson-NLS and EGFP-NLS reporters provide optimal readouts of distal and proximal regulatory activities of a CRE.

Figure 4.1 The sexually dimorphic pigmentation of Drosophila melanogaster is driven by sex-specific CRE activities.

Figure 4.2 Three derived mutations alter the regulatory activity of the dimorphic element.

Figure 4.3 Conceptual overview of the yeast one-hybrid assay.

Figure 4.4 Annotation of the yeast one-hybrid bait sequences upon the full yBE0.6 CRE.

Figure 4.5 Annotation of the yeast one-hybrid bait sequences upon the full t_MSE CRE.

Figure 4.6 Binding site motifs for Vvl in t_MSE2.

LIST OF TABLES

Table 2.1 A pigmentation enzyme gene perspective of network evolution

Table S 2.1 Primers used to create in situ hybridization probes

Table S 2.2 Primers to clone D. melanogaster yellow and tan CREs

Table S 2.3 Primers used to create reporter transgenes with orthologous yellow 5′

Table S 2.4 Oligonucleotides used to make t_MSE gel shift assay binding sites

Table 3.1 The confocal microscope settings utilized for imaging transgenic D. melanogaster pupae with S3a-series fluorescent reporter transgenes that possess the yBE0.6 CRE

Table 3.2 The confocal microscope settings utilized for imaging transgenic D. melanogaster pupae with Red Light Green Light-series dual reporter transgenes that possess the dimorphic element CRE

Table 4.1 yellow Bait Sequences

Table 4.2 Transcription factor preys that interacted with yBE0.6 baits in at least one condition with a p-value ≤0.01

Table 4.3 tan Bait Sequences

Table 4.4 Transcription factor preys that interacted with t_MSE baits in at least one condition with a p-value ≤0.01

Table 4.5 DRE bait sequence oligos

Table 4.6 Transcription factor preys that interacted with dimorphic element baits in at least one condition with a p-value ≤0.01

Table 4.7 Transcription factor preys that interacted with control baits in at least one condition with a p-value ≤0.01

Table 4.8 Forward and reverse primer pairs used to PCR amplify yBE0.6 sequences for inclusion as bait sequence

CHAPTER I

INTRODUCTION

Animal diversity and its genetic basis

The world is populated by animals that possess an innumerable assortment of physiological, behavioral, and morphological traits. These include traits that diversified from the modification of homologous traits shared by many species and that were present in their common ancestor, and those that evolved to perform a novel function and that bear no apparent resemblance to an ancestral trait (Moczek, 2008). An open question in biology is how this wealth of diversity and novelty evolved. Traits are the end products of instructions encoded in the DNA sequence of genes that are followed during the timeframe of development. Thus, it is essential to understand how the instructions and thereby functions of these developmental genes evolve in order to understand the origin of trait diversity. For my thesis I have focused on three topics pertinent to diversity and novelty. One, how are genes functionally connected into a network to make a trait, and how do these gene regulatory networks (GRNs) evolve? This is the main topic of Chapter 2. Two, how can we study the linear array of DNA instructions that control the use of genes which operate within a 3-dimensional nucleus? This is the topic of Chapter 3. Three, how can we decode the instructions embedded within the arrangements of A, G, C, and T letters of DNA that control the use of genes during development? This topic is touched upon in the concluding Chapter 4.

Diversity of metazoan animals

The Earth is currently inhabited by an estimated number of 8.7 million multicellular animal (metazoan) species (Mora et al., 2011). These extant metazoan species are categorized into ~36 phyla based upon the possession of groups of morphological traits that represent fundamental body plans. These disparate body plans range from simple organisms such as Placozoans, which have only 4 distinguishable types of cells (Srivastava et al., 2008), the Cnidarians that possess two germ layers, radial symmetry, and cnidocyte cells used for prey capture, to the more complex body plans of the Arthropods with 3 germ layers, bilateral symmetry, exoskeletons, and jointed appendages, and the Chordates with 3 germ layers, bilateral symmetry, and a notochord among other features. Within each phylum are numerous species that are distinguished by the possession of diverse and/or novel morphological traits that adorn these body plans.

Metazoan body plans were already present in the Cambrian period some 520 million years ago, and since this time diversification has been the persistent theme. This diversity raises the question, what is the genetic basis for metazoan diversity?

The genetic basis of diversity evolved through the differential use of a similar genetic toolkit

It was once thought that gene duplication was a prerequisite for evolution and that the subsequent modification of these duplicate genes was the genetic basis for new or modified traits (Kimura and Ohta, 1974). With the wealth of animal species, it seemed reasonable to anticipate that the genomes of diverse animals would possess many novel genes, especially genes that control developmental processes. This notion and its support began to erode after Edward Lewis, colleagues, and peers characterized the Hox gene cluster for the fruit fly species, Drosophila (D.) melanogaster (Duncan and Montgomery, 2002a, 2002b). These eight Hox genes were found to pattern the identities of the fruit fly body segments and appendages. Surprisingly, it was found that it was not just the fruit fly that had Hox genes, but so did other invertebrate and vertebrate animals that possess very different body plans (McGinnis et al., 1984). For the next couple of decades it was repeatedly found that animals from different phyla possess largely the same set of genes that operate during development, what became known as the “genetic toolkit” (Carroll et al., 2004). This toolkit is comprised of genes that regulate the process of gene transcription, notably genes encoding proteins involved in signaling pathways (e.g. transmembrane receptors and extracellular ligands), and genes encoding transcription factors that bind to certain short DNA sequence motifs (called binding sites) and thereby facilitate transcriptional activation or repression.

The finding that vastly different animals possessed many of the same developmental genes raised a new question. How can diversity evolve from genomes that had such similar inventories of toolkit genes? The obvious but mechanistically vague answer was that these evolutionarily conserved genes evolved to function differently.

Two lines of evidence suggested that gene function was not likely to often evolve from changes in the protein coding sequence of genes. One, mutations to these sequences are often injurious as these genes play many functions during development, a property known as gene pleiotropy. This means that any benefit that a mutation in the protein coding sequence of a gene causes for one function will likely be offset by negative effects of the mutation in the other pleiotropic roles that the protein performs (Stern and Orgogozo, 2008). Two, when related genes (homologs from another species) were expressed in a second vastly different animal, the homolog functioned like the endogenous gene of the host animal (Carroll, 2008). For example, the mouse Pax6 gene when expressed in fruit flies caused the formation of additional fruit fly eyes, a phenotypic equivalent to what the homologous fruit fly eyeless gene caused when ectopically expressed during development (Callaerts et al., 1997; Halder et al., 1995). This “functional equivalence” of homologous genes/proteins indicates that protein biochemical function has largely remained static. Thus evolution must have occurred by these conserved genes becoming expressed at different times (temporally) or in different cells (spatially) during development, and/or by their evolving to control the transcription of different “target genes.” Each of these outcomes is due to changes in the way that gene transcription (gene expression) is regulated. To understand how this gene regulation evolves it is essential to understand the structure of genes.

Genomes, genes, and gene regulation

While not all genes encode a protein product, many do. This includes most toolkit genes which function to control gene expression. Many genes exist that do not contribute to the regulation of gene expression, but encode proteins that give cells their characteristic “differentiated” phenotypes. These include proteins that perform diverse molecular and cellular functions, including cellular adhesion, motility, proliferation, shape, programmed cell death, and those that function as metabolic enzymes. The specific DNA sequences that encode the amino acid content of proteins reside in “exons” and are often referred to as the protein coding sequences (Figure 1.1). The genetic code of proteins follows very simple rules that are followed by all life on this planet due to their descent from a common ancestor. Protein products are made during translation when the ribosome reads non-overlapping adjacent 3 nucleotide codons in the transcribed messenger RNAs (mRNA) for a gene.
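The reading of non-overlapping three-nucleotide codons can be illustrated with a minimal Python sketch. It assumes the standard genetic code and an mRNA string that is already in frame; the function name `translate` and the example sequences are hypothetical and serve only as an illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order at each position.
# "*" marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Read an in-frame mRNA as adjacent, non-overlapping codons.

    Translation stops at the first stop codon; any trailing one or two
    nucleotides that do not complete a codon are ignored.
    """
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example: AUG (Met), GCC (Ala), UAA (stop)
print(translate("AUGGCCUAA"))
```

Because the table is fixed and each codon is read independently of its neighbors, the protein coding portion of a gene is straightforward to decode, which contrasts with the regulatory sequences discussed in the following sections.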

Figure 1.1 Representation of a gene locus. Animal genes, such as GeneX, can have one or more exons in which the protein coding sequence resides. cis-regulatory elements (CREs) can be located near the promoter (arrow) they regulate or in a more distant location with respect to the promoter, such as in an intron between the exons (X CRE_2) or even in the locus of an adjacent gene (X CRE_3). Each CRE has a unique set of binding sites for transcription factors which encodes its pattern of transcriptional regulation.

In order for a gene's protein to be translated, the protein coding sequence has to be transcribed into mRNA. This decision to transcribe or not transcribe is highly regulated during development by processes categorically referred to as “gene regulation.”

Metazoan gene expression regulation requires various functional non-coding sequences. These include a promoter located near the transcriptional start site of a gene (Figure 1.1, arrow). The promoter is where the RNA polymerase (generally RNA polymerase II) pre-initiation complex is recruited to initiate transcription at the adjacent +1 site of a gene (Snustad and Simmons, 2015). Non-coding sequences also include one or more cis-regulatory elements (CREs), to which various transcription factor proteins bind (Figure 1.1 and Figure 1.2). Unlike promoters and their stereotypical positioning within a gene, a gene's CRE or CREs can be close to a promoter or at a considerable distance from it, whether in an intron or even located in the neighborhood of a more distant gene (Figure 1.1). When gene expression is active, it is thought that proteins bound to CREs interact with proteins bound to the promoter to recruit the RNA polymerase pre-initiation complex or to instruct an already promoter-bound RNA polymerase complex to initiate transcription (Lagha et al., 2013). The encoding of gene regulatory information in promoters and CREs is thus a key property of genes and their roles in the development of the diverse morphological traits adorning body plans. An important goal for the field of developmental genetics is therefore to understand how this information is encoded within these regulatory sequences.

Figure 1.2 Modular cis-regulatory elements drive distinct patterns of gene expression. A representation of modular CREs, each for a gene that is expressed during mouse embryonic development. The CREs contain binding sites for transcription factors (colored DNA sequence indicates a binding site and different colors represent sites for different transcription factors), and the collection of sites makes a regulatory logic that encodes a pattern of transcriptional regulation, such as forebrain (1), spinal cord (2), or limb bud (3).

The encodings of diverse patterns of gene expression in CREs

Genes are typically pleiotropic, meaning that they are expressed in multiple cell types and at multiple times during an animal's life. With pleiotropy being a common property of metazoan genes, the question is how these complex temporal and spatial patterns of expression are controlled by CREs. Generally, one CRE is not sufficient to direct a gene's expression at all the correct times and cellular places. Rather, multiple modular CREs are the norm (Figure 1.1 and Figure 1.2). Moreover, many animals produce hundreds of different cell types during the course of development, and the growth and physiological state of these cells often change in response to the environment. Thus CREs need to exist to drive these different patterns of gene expression. The current paradigm for CRE function is that patterned expression occurs through the binding of combinations of transcription factors to binding sites within the CRE, which then allows gene activation to occur in the cells in which the proper transcription factors are expressed (Figure 1.2).

Several well characterized CREs have been found to contain binding sites for four or more unique transcription factors (Arnone and Davidson, 1997; Davidson, 2006a; Small et al., 1992). More specifically, each CRE contains an individualized arrangement of multiple binding sites for combinations of transcription factors, yielding a so-called “regulatory logic” that encodes its pattern of activation. Take the fruit fly D. melanogaster for example. Its genome possesses over 750 transcription factor genes (Pfreundt et al., 2010), thus the number of potential regulatory logics is astounding. However, which combinations are functional and what pattern of expression each CRE drives remain difficult to predict.
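To give a rough sense of scale for the number of potential regulatory logics, the short Python sketch below counts the distinct sets of transcription factors that could be drawn from 750 genes. The figure of 750 and the set size of four are taken from the text above as lower bounds; treating a regulatory logic as an unordered set of factors is a simplifying assumption made here, since it ignores the number, order, and spacing of binding sites and says nothing about which sets are biologically functional.

```python
from math import comb

# Lower bound on D. melanogaster transcription factor genes cited in the text.
N_TRANSCRIPTION_FACTORS = 750


def count_factor_sets(n_factors, set_size):
    """Number of unordered sets of `set_size` distinct factors chosen from `n_factors`."""
    return comb(n_factors, set_size)


# Print the count for sets of one to four unique factors.
for set_size in range(1, 5):
    n_sets = count_factor_sets(N_TRANSCRIPTION_FACTORS, set_size)
    print(f"{set_size} factor(s): {n_sets:,} possible sets")
```

Even under this simplified counting, sets of four factors alone number more than ten billion, which illustrates why functional combinations cannot be found by enumeration.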

Figure 1.3 Regulation at a distance. (A) Promoter proximal elements known as Tethering Elements (TE) allow a CRE (T1) to bypass neighboring genes (e.g. ftz) to regulate its target gene promoter (Scr). (B) Sequences encoded in a CRE's regulatory logic can allow for an intronic CRE (sparkling) to locate and interact with its somewhat distal target gene promoter (dPax2). (C) An example of chromosomal looping where a distant CRE is brought into close proximity to the promoter of the gene it regulates.

A CRE's combinatorial regulatory logic can act to turn on or increase a gene's transcriptional output (in which case the CRE is called an enhancer) or it can function to repress transcription (in which case it is called a silencer). The order and spacing of transcription factor binding sites can also affect CRE function as a “regulatory grammar”. The distance between transcription factor binding sites can play a large role in how transcription factors physically interact on the CRE and in the extent to which the bound factors recruit and interact with proteins that function as co-activators or co-repressors (Li-Kroeger et al., 2009; Mann et al., 2010).
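The spacing component of a regulatory grammar can be made concrete with a small Python sketch that locates exact matches to two motifs and reports the gap between them. The motif strings and the example sequence are made up for illustration and are not taken from any CRE discussed here; real binding sites are degenerate and are usually described with position weight matrices rather than exact strings.

```python
import re


def find_sites(sequence, motif):
    """Return 0-based start positions of every exact (possibly overlapping) motif match."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", sequence)]


def site_spacings(sequence, motif_a, motif_b):
    """For each motif_a site, find the nearest downstream motif_b site.

    Returns (start_a, start_b, gap) tuples, where gap is the number of base
    pairs between the end of the motif_a site and the start of the motif_b site.
    """
    b_sites = find_sites(sequence, motif_b)
    spacings = []
    for start_a in find_sites(sequence, motif_a):
        end_a = start_a + len(motif_a)
        downstream = [start_b for start_b in b_sites if start_b >= end_a]
        if downstream:
            start_b = min(downstream)
            spacings.append((start_a, start_b, start_b - end_a))
    return spacings


# Hypothetical sequence containing one copy of each made-up motif, 5 bp apart.
example = "GGTTTATCCCCCAATCGG"
print(site_spacings(example, "TTTAT", "AATC"))
```

A comparison of such gaps between orthologous CREs would be one simple way to ask whether spacing, and not only site presence, has been conserved.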

Another facet of gene regulation is encoded in the ability of a CRE to find its target gene promoter in the 3-dimensional volume of the nucleus (Dekker et al., 2013). Although many CREs are found in close proximity to the gene promoter they regulate (less than 10,000 base pairs), some can be found at great distances (Kvon et al., 2014). For example, within a developing mouse embryo the Sonic hedgehog (Shh) gene (which encodes a signaling pathway ligand that acts as a morphogen to pattern the developing limb bud) is expressed in the posterior limb bud mesenchyme (Cooper et al., 2014; Riddle et al., 1993). This expression pattern is controlled by a CRE known as the ZRS, which is located over one million base pairs away from the Shh promoter and is situated in the gene locus of the neighboring lmbr1 gene (Lettice et al., 2003). How a particular CRE specifically finds its target gene promoter, given that not all CREs are located proximal to the promoter they regulate, remains perplexing. Seminal studies in D. melanogaster found that long distance regulation can occur through promoter-CRE interaction (Figure 1.3). Such interactions can require specific promoter-proximal sequences (Figure 1.3A) (Calhoun et al., 2002) and sequences within the CRE, so-called remote-control elements (Figure 1.3B) (Swanson et al., 2011). While chromosomal looping, which brings the CRE into physical contact with the promoter (Figure 1.3C), has long been known as a mechanism for CRE-promoter interactions and regulation (Maniatis et al., 1980, 1987; Myers et al., 1986), the promoter-proximal and remote-control sequences remain difficult to find, and their mechanisms of function remain difficult to understand.

Three major challenges for contemporary genetics research are: to resolve how gene expression patterns during development are encoded in the regulatory logics and regulatory grammars of CREs, to understand which sequences act as remote-control elements to facilitate promoter interaction, and to reveal how these CRE logics, grammars, and remote-control elements evolve.

CREs link genes together into collaborative networks that perform developmental processes

The proper formation of a morphological trait requires the coordinated expression of a myriad of genes that are interconnected into a GRN (Bonn and Furlong, 2008). This coordination is orchestrated by CREs and their regulatory linkages with the transcription factors that bind to the resident binding sites. Through these regulatory linkages, GRNs take on a hierarchical structure. At the top of these hierarchies are the toolkit genes that regulate the expression of other toolkit genes and some of the differentiation genes that encode the proteins that make the cellular/tissue/organ phenotypes. The toolkit genes whose expression is controlled by top tier regulators can act as “middle managers” that give feedback to the top tier regulators and that control the expression of the lowest tier of differentiation genes.

To understand how a trait is formed it is fundamental to identify what genes exist within a GRN and how these genes are connected by regulatory linkages. This has been accomplished for several traits in model organism animals (Bonn and Furlong, 2008; Davidson, 2006a). As genes of the highest, middle, and lowest tiers of GRNs are all regulated by CREs, an interesting question is whether evolution favors modifying the CREs of genes from a certain GRN tier (Stern and Orgogozo, 2008). In order to fully grasp the mechanisms behind GRN and CRE evolution one must study the evolutionary histories of GRNs for morphological traits that have evolved between closely related species, so that the events of CRE and GRN structural evolution can be inferred by comparing the GRNs of species with the presumed derived and ancestral trait forms. Unfortunately, the animal traits with the best characterized GRNs do not represent such recently evolved traits. Thus, the mechanisms of GRN evolution remain ambiguous. Moreover, it is helpful for the trait to exist within a species amenable to experimental manipulation of genes and CREs. The pigmentation pattern residing on the abdomen of the fruit fly D. melanogaster (Figure 1.4) is one such trait, for which the color pattern has widely diversified among closely related species within the Sophophora subgenus.

Figure 1.4 Abdominal pigmentation for representative species of the Sophophora subgenus. (A, B) D. melanogaster has a derived pattern of sex-specific male abdominal pigmentation. (A) Fully pigmented tergites are limited to D. melanogaster A5 and A6 segments. (E, F) Monomorphic patterns of melanic and sclerotized abdomens within the Sophophora subgenus. (F) The monomorphic pigmentation state of D. willistoni is presumed to represent the ancestral state of pigmentation for the Sophophora subgenus. (A, C, D) Derived patterns of male pigmentation within the melanogaster group, with both retractions (C) and expansions (D) of pigmentation.

Fruit fly pigmentation as model for gene expression and morphological evolution

When it comes to the mechanistic study of traits, GRNs, and CREs, not all traits and model organisms are equally well suited for characterization. Several characteristics are prerequisites. The trait needs to have evolved within a well-supported phylogeny so that the direction of change between ancestral and derived character states can be inferred. Organisms need to be closely related (taxonomic rank at the level of the genus or lower) so that orthologous genes and CREs can be readily identified. Shorter divergence times aid in the comparison of derived regulatory logics as CREs tend to evolve rapidly at the nucleotide sequence level (Stone and Wray, 2001). The trait must exist in, or at least be amenable to study in, a model organism species for which diverse genetic tools can be utilized to study the genes and mechanisms involved in the trait's development. One such suitable trait is the diverse pattern of pigmentation that is present on the abdomen of fruit flies from the Sophophora subgenus (Figure 1.4).

The D. melanogaster abdomen and its color pattern

The D. melanogaster abdomen consists of seven abdominal segments referred to as A1 through A7 (Figure 1.4A and 1.4B). The size of the male A7 segment is greatly reduced (Yoder, 2012). Each abdominal segment is partially covered on its dorsal surface by a hardened cuticle plate known as a tergite. These tergites have a color pattern along the anterior-posterior axis. In D. melanogaster females, the posterior region of each of the A2-A6 tergites is black (melanic) in color (Figure 1.4B), resembling a stripe of color. The extent of the color on the female A6, and to a lesser extent A5, is variable amongst females within certain populations and between populations (Rogers et al., 2013). Tergite coloration is sexually dimorphic in this species, as males have posterior stripes on the A2-A4 tergites similar to females, but the A5 and A6 tergites are completely melanic (Figure 1.4A). The conspicuous nature of this sexually dimorphic trait in a genetic model organism has attracted considerable attention, which has revealed many details about its genetic, molecular, and enzymatic basis (Figure 1.5) (Wittkopp et al., 2003).

Figure 1.5 The fruit fly pigment metabolic pathway. The making of colored cuticle begins with Tyrosine from the hemolymph that enters epithelial cells where it is converted into various pigments. Bold names represent metabolites in the pathway, bold italicized names are for genes that encode key enzymes, red names are for the encoded enzymes, and colored boxes for the final pigments secreted outside the cell and deposited in cuticle of the tergite.

The generation of the melanic patterns seen in the posterior stripe of both sexes and the fully pigmented A5 and A6 tergites of males requires genes that encode enzymes that can catalyze the production of said melanins. For D. melanogaster there are two important terminal differentiation genes (so-called pigmentation genes) that encode enzymes necessary for producing the black melanic pigments. These genes are known as yellow and tan (True et al., 2005; Wright, 1987). These two genes are expressed in the dorsal abdominal epidermis during the latter stages of pupal development (~80 and ~95 hours after puparium formation, or hAPF, for yellow and tan respectively). Each of these terminal differentiation genes is under the regulation of a single CRE responsible for generating the abdominal pattern of expression: the yellow Body Element (yBE) and the tan Male Specific Element (t_MSE). A previous study found that the regulatory logic for male A5 and A6 expression included two binding sites for the Hox protein Abd-B in the yBE (Jeong et al., 2006). It remained unknown whether the t_MSE evolved a similar Hox-regulatory logic to drive tan’s pattern of expression, which largely resembles that of yellow (Figure 1.6). The absence of full tergite pigmentation and the general absence of yellow and tan expression in the female A5 and A6 segments are due to the toolkit gene locus known as bric-à-brac (bab). This locus contains tandem duplicate genes that each encode a presumed transcription factor protein (Bab1 and Bab2) that acts to repress melanic pigmentation through the repression of yellow and tan gene expression. bab expression is under the regulation of other trans-regulatory factors such as the doublesex (dsx) gene, which encodes two sex-specific protein isoforms, DSXM and DSXF, through which it acts as a key regulator of somatic sex determination (Baker et al., 1991). bab is also regulated by one of the Hox genes, Abdominal-B (Abd-B) (Williams et al., 2008), and its expression is directed by two modular CREs named the dimorphic element and the anterior element (Figure 1.6). The dimorphic element has been well characterized to drive bab gene expression in the female A5 and A6 tergites, and its regulatory logic has been scrutinized (Rogers et al., 2013; Williams et al., 2008) and shown to include direct binding sites for Abd-B and the DSX isoforms. The anterior element is known to regulate bab expression in the more anterior A2-A4 tergites of both males and females, though the regulatory logic driving this segment-specific pattern has not been resolved, and thus it remains to be seen whether it includes Hox proteins.

Figure 1.6 The understanding of the gene regulatory network for abdominal pigmentation at the outset of my PhD studies. Abd-B directly interacts with and activates the CREs of both the bab (dimorphic element, or DRE for short) and yellow (yBE) genes. bab (DRE) is also regulated by doublesex, where DSXM represses its activity while DSXF activates it. In some manner bab represses the expression of the terminal differentiation genes yellow (yBE) and tan (t_MSE), which encode the enzymes needed to generate the derived melanic/black pigmentation patterns.

Rapid pigmentation evolution on an otherwise conserved abdomen

There are currently well over 100 extant species of fruit flies in the Sophophora sub-genus to which D. melanogaster belongs. Moreover, the phylogeny for Sophophora has been well resolved (Markow and O'Grady, 2006). While these species share a conserved number of abdomen segments, the tergite color patterns have considerably diversified over the past ~35 million years, notably on the male abdomen (Figure 1.4). This diversity includes the monomorphically un-pigmented abdomens of D. willistoni (Figure 1.4F), a member of the willistoni species group. This monomorphic phenotype is thought to resemble the ancestral state of the Sophophora sub-genus (Kopp et al., 2000). There are also monomorphic pigmentation patterns found on species of the obscura group (Figure 1.4E), though for this clade black/brown color predominates. The melanogaster species group differs from the previous two, as each of its three great clades includes many species with male-specific pigmentation typically found on the posterior-most segments (Figure 1.4A, 1.4C, 1.4D). The ananassae subgroup harbors species where the pigmentation spans the male A4-A6 segments, such as D. malerkotliana (Figure 1.4D). The montium subgroup contains species for which male pigmentation is limited to the A6 tergite, such as D. auraria (Figure 1.4C). The oriental clade is largely composed of species that have the typical derived male-specific A5 and A6 pigmented tergites, such as D. melanogaster. Previously it was inferred that the common ancestor of these three clades most likely possessed a male-specific pattern of posterior segment tergite pigmentation (Jeong et al., 2006). The melanogaster species group also has several cases of atavism, where the pigmentation phenotype has reverted to a monomorphic state, such as D. kikkawai and D. ananassae. This fruit fly pigmentation model trait can also be informed by the phenotypes and developmental mechanisms of hundreds of more distantly related out-group species, such as D. busckii and D. virilis, that also have monomorphic patterns of tergite pigmentation. This diverse set of phenotypic patterns over a relatively short evolutionary time frame provides a superb model to study how alterations to CREs, genes, and a GRN have shaped this morphological novelty and its diversity.


Thesis overview

My thesis utilized the pigmentation genes, CREs, and GRN as a model to better understand gene regulation, its role in development, and how it evolves. In Chapter 2, my research showed how two unique Hox regulatory logics evolved to drive similar patterns of tan and yellow expression, and how these logics may explain why diversity and trait loss were largely shaped by evolution of the trans-regulatory tier of the pigmentation GRN. In Chapter 3, I share research that I did in order to create genetic tools to study how long distance regulation is encoded within the sequences of CREs and promoters. This set of tools should help the fruit fly community investigate long distance CRE-promoter interactions in a detailed mechanistic manner that reveals key remote control elements and promoter-proximal elements. My research repeatedly ran into the same obstacle: ascertaining the most pertinent transcription factors to investigate for potential roles within the regulatory logic of CREs with similar and different patterns of gene regulatory activity. The future of genetics will be challenged to crack the regulatory logic code of CREs. I discuss this challenge in Chapter 4, and show how a yeast one-hybrid assay may help achieve this goal when many hundreds of transcription factors have to be assessed for binding sites in CRE and promoter sequences.


CHAPTER II

THE EVOLUTIONARY ORIGINATION AND DIVERSIFICATION OF A

DIMORPHIC GENE REGULATORY NETWORK THROUGH PARALLEL

INNOVATIONS IN CIS AND TRANS

This work was originally published in the peer-reviewed scientific journal PLoS Genetics in 2015 with the following citation: EM Camino, JC Butts, A Ordway, JE Vellky, M Rebeiz, TM Williams. The evolutionary origination and diversification of a dimorphic gene regulatory network through parallel innovations in cis and trans. PLoS Genet. 11(4), e1005136 (2015).

Abstract

The origination and diversification of morphological characteristics represents a key problem in understanding the evolution of development. Morphological traits result from gene regulatory networks (GRNs) that form a web of transcription factors, which regulate multiple cis-regulatory element (CRE) sequences to control the coordinated expression of differentiation genes. The formation and modification of GRNs must ultimately be understood at the level of individual regulatory linkages (i.e. transcription factor binding sites within CREs) that constitute the network. Here, we investigate how elements within a network originated and diversified to generate a broad range of abdominal pigmentation phenotypes among Sophophora fruit flies. Our data indicate that the coordinated expression of two melanin synthesis enzymes, Yellow and Tan, recently evolved through novel CRE activities that respond to the spatial patterning inputs of Hox proteins and the sex-specific input of Bric-à-brac transcription factors. Once established, it seems that these newly evolved activities were repeatedly modified by evolutionary changes in the network's trans-regulators to generate large-scale changes in pigment pattern. By elucidating how yellow and tan are connected to the web of abdominal trans-regulators, we discovered that the yellow and tan abdominal CREs are composed of distinct regulatory inputs that exhibit contrasting responses to the same Hox proteins and Hox cofactors. These results provide an example in which CRE origination underlies a recently evolved novel trait, and highlight how coordinated expression patterns can evolve in parallel through the generation of unique regulatory linkages.

Introduction

The complexity of developmental processes often hinders our ability to trace their evolutionary history. Genetic programs of development are structured into convoluted networks of genes, interconnected at the level of transcriptional regulation (Davidson and Erwin, 2006). Each network connection, or regulatory linkage, is formed through interactions between a transcription factor protein and binding site sequences within a cis-regulatory element (CRE). The collection of regulatory linkages possessed by a CRE encodes the pattern of gene expression driven by the CRE. Networks culminate in the regulation of differentiation genes whose encoded products generate cell type-specific phenotypes. Hence, to understand how a developmental program originated or was diversified, one must trace how individual connections were formed between transcription factors and the CREs of the network.

CRE evolution is suspected to be a prime mode of trait evolution (Carroll, 2008; Stern and Orgogozo, 2008; Wray, 2007), and in recent years several case studies have described CREs that have been modified to generate phenotypic consequences (Arnoult et al., 2013; Chan et al., 2010; Cretekos et al., 2008; Frankel et al., 2011; Gompel et al., 2005; Guerreiro et al., 2013; Martin and Orgogozo, 2013; Prabhakar et al., 2008; Prud'homme et al., 2006; Rebeiz et al., 2011, 2009; Rogers et al., 2013; Shirangi et al., 2009; Spitz et al., 2003; Williams et al., 2008). However, our current understanding of network evolution is hampered by the general difficulty of resolving the direct regulatory linkages within CREs and how CRE mutations alter specific linkages (Rebeiz and Williams, 2011). For example, when genes are coordinately expressed, how similar are the encoded regulatory linkages within their CREs? What factors preside over the tendency of a network to evolve at upper level regulators or the terminal differentiation genes? The answers to these questions require studies of well-defined networks that govern morphologies that have diversified during recent evolutionary history.

The diverse abdominal pigmentation patterns of fruit fly species represent an optimal model to study the evolution of morphological characteristics (Wittkopp et al., 2003). The model organism Drosophila (D.) melanogaster belongs to the fruit fly subgenus Sophophora (Markow and O'Grady, 2006), which contains species with a wide diversity of abdominal pigmentation patterns (Figure 2.1). A central theme that typifies D. melanogaster and its close relatives (the melanogaster species group) is dimorphism, in which darkly pigmented males differ substantially from the generally unpigmented females. An ancestral character reconstruction analysis supports (84% posterior probability) an evolutionary scenario in which the most recent common ancestor of the melanogaster species group possessed a male-specific pattern of abdomen pigmentation (Figure 2.1, node 3) (Jeong et al., 2006). Unlike the melanogaster species group, monomorphic pigmentation is predominant among extant species in the more distantly related obscura (e.g. D. pseudoobscura) and willistoni (e.g. D. willistoni) species groups, supporting the scenario that the most recent common ancestor of the melanogaster, obscura, and willistoni species groups had a monomorphic pattern of pigmentation (Figure 2.1, node 1) (Jeong et al., 2006). However, it remains uncertain whether the most recent common ancestor of the melanogaster and obscura group species (Figure 2.1, node 2) had dimorphic or monomorphic abdomen pigmentation. Once dimorphic pigmentation arose, the patterns diversified, expanding and contracting along the body axis in the Oriental (e.g. D. melanogaster), montium (e.g. D. auraria), and ananassae (e.g. D. malerkotliana) clades (Figure 2.1) (Jeong et al., 2006; Kopp, 2006; Kopp and True, 2002). Within the melanogaster species group, several instances of reversion to the monomorphic state occurred, as exemplified by D. kikkawai (montium subgroup) and D. ananassae (ananassae subgroup) (Figure 2.1, nodes 4 and 5). Collectively, Sophophora tergite pigmentation provides an optimal model to investigate trait evolution, especially considering the extensively characterized network and CREs governing the development of pigmentation in D. melanogaster.

Figure 2.1 Correlation between pigmentation and the gene expression of tan and yellow in the Sophophora subgenus. Each image shown is from a male abdomen. Whole-mount images of dorsal fruit fly abdomens from species representing diverse Sophophora lineages. D. melanogaster bears a derived pigmentation pattern on the male A5 and A6 tergites, which arose after it diverged from its most recent common ancestor (MRCA) shared with the monomorphically pigmented D. willistoni (node in phylogeny marked “1”) and perhaps after diverging from the MRCA shared with D. pseudoobscura (node “2”). Node 3 represents the MRCA of the melanogaster species group that includes the Oriental lineage (D. melanogaster), montium subgroup (includes D. auraria and D. kikkawai), and ananassae subgroup (includes D. malerkotliana and D. ananassae). This MRCA is suspected to have possessed male-specific tergite pigmentation, which is indicated by the half-filled circle. Since its origin, the number of pigmented male tergites has expanded (D. malerkotliana), retracted (D. auraria), and was independently lost (D. kikkawai and D. ananassae; nodes 4 and 5, which are indicated by the circles with a superimposed X). (B'-H') Abdominal tan mRNA expression shown by in situ hybridization at a developmental stage equivalent to 85-95 hours after puparium formation (APF) for D. melanogaster pupae. (B''-H'') Abdominal yellow mRNA expression shown by in situ hybridization at a developmental stage equivalent to 75-85 hours APF for D. melanogaster pupae. Species are identified by labels at the top of each column.


Within the abdomen of D. melanogaster, coloring of the dorsal cuticle plates (tergites) requires the co-expression of the genes tan and yellow in the underlying epidermal cells (Figure 2.1B' and 2.1B'') (Jeong et al., 2008). Robust expression of tan and yellow in the male A5 and A6 segments is regulated, respectively, by CREs known as the tan male specific element (t_MSE) (Jeong et al., 2008) and the yellow body element (yBE) (Jeong et al., 2006; Wittkopp et al., 2002a). The Hox protein Abd-B, expressed in the pigmented A5 and A6 segments, is a direct activator of the yBE (Jeong et al., 2006), and represents a likely regulator of tan. However, little else is known about the regulatory linkages encoding the spatial and temporal activities for these CREs.

In this study we investigated the evolutionary histories and regulatory encodings of the yBE and t_MSE that coordinate expression of the Tan and Yellow pigmentation enzymes. Our results indicate that these CREs originated at different time points in the lineage leading to the common ancestor of the melanogaster species group. Our data supports a scenario where expansions, contractions, and losses of male-specific pigmentation evolved through a preponderance of trans-regulatory changes to the abdominal pigmentation gene network. In dissecting trans-regulatory inputs to the yBE and t_MSE, we discovered that these two CREs respond differently to alterations of the trans-landscape, notwithstanding their superficially similar patterns of expression. Lastly, our results indicate that these differences in responsiveness may be due to unique binding site architectures at these CREs, including a novel mechanism for the regulation of the t_MSE by the Hox protein Abd-A.


Results

The evolution of tan and yellow expression patterns

In D. melanogaster, tan and yellow are required for the pigmentation that develops on the male A5 and A6 segment tergites (Jeong et al., 2008; True et al., 2005; Wittkopp et al., 2002b), and these genes' expression patterns in the dorsal epidermis underlying the tergites closely match the pattern of pigmentation (compare Figure 2.1B' and 2.1B'' to Figure 2.1B). For D. melanogaster, this is both the first report of tan expression and the first report of the RNA expression pattern for yellow. It has been shown that a similar expansion in tan and yellow expression occurs in D. prostipennis, a species that displays pigmentation extending into the male A4 tergite (Ordway et al., 2014). Additionally, the loss of pigmentation in D. santomea was accompanied by the joint loss of yellow and tan expression (Jeong et al., 2006, 2008). Thus, we anticipated that the origin, diversification, and loss of male pigmentation within the Sophophora subgenus would have been driven in part by physically corresponding changes in tan and yellow expression.

For D. auraria, we found that tan and yellow expression is limited to the dorsal epidermis of the male A6 segment (Figure 2.1C' and 2.1C''), the only segment in this species that manifests male-specific pigmentation. While yellow expression occurs throughout the A6 segment, the hemispherical pattern of tan expression more closely matches that of the pigmentation pattern, suggesting that tan plays an important role in the spatial limitation of this male pattern element. For D. malerkotliana, we found that the expression of yellow (Figure 2.1E'') extends into the A4 segment that exhibits expanded pigmentation (Figure 2.1E). However, tan expression remained restricted to the A5 and A6 segments (Figure 2.1E'), suggesting that the evolved pattern of yellow expression plays a role in this derived phenotype, and the absence of tan expression may explain why the A4 tergite has less intense pigmentation than that present on the A5 and A6 tergites.

Pigmentation has been secondarily lost from D. kikkawai and D. ananassae males, which may have resulted from the loss of expression of tan and yellow. While yellow expression was found to be absent from the abdomen of D. kikkawai (Figure 2.1D''), tan expression was observed in the A6 segment in a pattern similar to that for D. auraria (compare Figure 2.1C' and 2.1D'). This implies that the loss of male-specific pigmentation for this species included the loss of yellow expression, but did not require the inactivation of tan expression. For D. ananassae, neither tan nor yellow was found to be expressed in the male abdominal epidermis (Figure 2.1F' and 2.1F''). It remains possible that these cases of pigmentation loss were initially due to a mechanism that had no bearing on yellow and tan expression. At a minimum, our results suggest that pigmentation loss at some point was accompanied by changes that eliminated these genes' expression from the abdominal epidermis. This outcome was previously shown to have occurred in D. santomea (Jeong et al., 2006, 2008), a third Sophophora species for which male tergite pigmentation was independently lost (Lachaise et al., 2000).


The aforementioned expression patterns support a role for changes in tan and yellow expression in the diversification and secondary losses of male pigmentation within the melanogaster species group. To date though, the expression of these pigmentation genes had not been investigated in species from the obscura and willistoni species groups, in which most known extant species possess sexually monomorphic patterns of tergite pigmentation. For D. pseudoobscura of the obscura species group, we found that tan and yellow are both expressed throughout the male abdominal epidermis (Figure 2.1G' and 2.1G''), a pattern that corresponds with this species' dark coloration (Figure 2.1G). For D. willistoni, of the willistoni species group, we found that neither yellow nor tan is expressed at appreciable levels in the abdominal epidermis during the stages when tergite pigmentation is being specified (Figure 2.1H' and 2.1H''). This suggests that D. willistoni lacks pigmentation in part due to the absence of these genes' expression, and this monomorphic absence may reflect the ancestral state from which male-specific pigmentation evolved, as suggested elsewhere (Jeong et al., 2006; Kopp et al., 2000; Rogers et al., 2013; Salomone et al., 2013; Williams et al., 2008).

Collectively, our expression analyses support the hypothesis that the origin, diversification, and loss of male-specific pigmentation involved numerous alterations to the expression of tan and yellow. The mutational and mechanistic basis for CRE evolution remains poorly understood, especially in cases for genes whose expression patterns physically correspond. Thus, we next sought to determine the CRE basis for these evolved gene expression patterns and phenotypic differences.


The evolutionary origin of CREs controlling tan and yellow expression

The recent evolution of sexually dimorphic pigmentation within the subgenus Sophophora provides a model trait whose origination can be resolved to the level of CREs activating pertinent genes within a network. We sought to elucidate the evolutionary histories of the yBE and t_MSE that respectively control the coordinated expression of the yellow and tan genes. The sequences orthologous to these tan and yellow abdominal CREs were isolated from species with the derived male-specific tergite pigmentation, as well as from species of more distantly related lineages that possess monomorphic patterns of tergite pigmentation. For these sequences, we directly compared their regulatory activities using in vivo reporter transgene assays in D. melanogaster (Figure 2.2).


Figure 2.2 Tracing the ancestry and evolution of CREs that drive male-specific tan and yellow expression. (A-E) A schematic representation of the male abdomens for several Sophophora species with diverse patterns of pigmentation. (A'-E') EGFP-reporter transgene expression driven by sequences orthologous to the D. melanogaster t_MSE in transgenic male pupae at ~95 hours after puparium formation. (A'-C') Sequences from species with male-specific tergite pigmentation drive robust male-specific reporter expression in the A5 and A6 segments, (D' and E') whereas the sequences from more distantly related species with monomorphic tergite pigmentation have little-to-no abdominal regulatory activity. (A''-E'') EGFP-reporter transgene expression driven by sequences orthologous to the D. melanogaster yellow wing/body element, here referred to as yellow 5', in transgenic male pupae at ~85 hours after puparium formation. (A''-C'') Sequences from species with male-specific tergite pigmentation drive robust male-specific reporter expression in the A5 and A6 segments. (D'') The D. pseudoobscura sequence drives pan-abdomen reporter expression. (E'') The D. willistoni sequence lacks abdominal regulatory activity. Blue arrowheads indicate D. auraria reporter expression in transgenic D. melanogaster that extends one segment anterior to the endogenous domain of pigmentation. Red arrowheads indicate the absence of D. malerkotliana reporter expression in the A4 segment of transgenic D. melanogaster, a segment that is endogenously pigmented.


The region orthologous to the t_MSE (Figure S2.1A) was isolated from the dimorphically pigmented species D. auraria (montium subgroup) and D. malerkotliana (ananassae subgroup), and tested for abdominal activity in transgenic D. melanogaster pupae. In each case, robust reporter expression was observed in the male A5 and A6 abdomen segments (Figure 2.2B' and 2.2C'). Likewise, orthologous sequences containing the yBE CREs derived from these same species each drove a male-specific pattern of reporter gene expression (Figure 2.2B'' and 2.2C''). These similarities in regulatory activities to the D. melanogaster CREs support a scenario in which the common ancestor of the melanogaster species group (Figure 2.1, node 3) possessed male-specific tan and yellow gene expression driven respectively by an ancestral t_MSE and yBE.

To determine the timing and mechanism by which the novel trait of sexually dimorphic pigmentation arose, we isolated and tested orthologous sequences from the genomes of D. pseudoobscura and D. willistoni, two species from inferred ancestrally monomorphic lineages. As the t_MSE lies in between two upstream genes (CG1537 and Gr8a, Figure S2.1A), we confirmed their syntenic organization in these species (Figure S2.2), suggesting a conserved gene order in the common ancestor of these disparate Sophophora lineages. However, the D. pseudoobscura and D. willistoni intergenic sequences between CG1537 and Gr8a had little-to-no t_MSE-like CRE activity (Figure 2.2D' and 2.2E'). The absence of CRE activity parallels our observations of tan expression for D. willistoni (Figure 2.1H'), but contrasts with the monomorphic pattern of expression observed for the monomorphically pigmented D. pseudoobscura (Figure 2.1G'). These results are consistent with a scenario where the t_MSE originated to generate dimorphic expression in the lineage leading to the melanogaster species group after it diverged from the obscura group lineage (Figure 2.1, node 3). Moreover, the monomorphic expression of tan for D. pseudoobscura seemingly would be driven by another regulatory sequence or sequences.

The sequence 5' of the D. pseudoobscura yellow gene possessed abdominal CRE activity that was enhanced in males compared to females (Figure S2.3), though the domain of activity spanned all abdominal segments (Figure 2.2D''). Previously, a pan-abdomen CRE activity was observed for D. pseudoobscura (Kalay and Wittkopp, 2010) and for the orthologous gene region from D. subobscura, another obscura group species (Jeong et al., 2006). The results here show that the D. pseudoobscura sequence has abdominal regulatory activity that is enhanced in males when assayed in the D. melanogaster trans-regulatory environment. This suggests that a CRE with spatial and sex-specific inputs was present in the yellow gene of the most recent common ancestor of the obscura and melanogaster species groups (Figure 2.1, node 2).

To determine whether the yBE has an even deeper Sophophora ancestry, we inspected the regulatory capability of sequences 5' of the D. willistoni yellow gene (Figure S2.4). Since little-to-no sequence conservation is detectable in comparisons of D. willistoni yellow 5' sequence to that of D. pseudoobscura and D. melanogaster, we evaluated the regulatory activity of two partially overlapping (1.2 kilobase, or kb, overlap) sequences that collectively span the first 5.1 kb of sequence 5' of yellow exon 1 (Figure S2.4A). The proximal 3 kb to yellow exon 1 (called y wil 5' 2) lacked abdominal regulatory activity (Figure S2.4C), whereas the more distal (y wil 5' 1) sequence had CRE activity limited to monomorphic stripes at the posterior edges of each abdominal segment (Figure S2.4B) and throughout the pupal wing (Figure S2.4B'). These stripe and wing activities are characteristic of the D. melanogaster wing element CRE, which is similarly positioned more distal to the 1st exon of yellow than the yBE (Wittkopp et al., 2002a), and the stripe activity corresponds with the location of this species' tergite pigmentation (Figure 2.1H). Thus, the D. willistoni yellow locus possesses an orthologous wing element, but lacks a CRE with activity characteristic of the yBE. This D. willistoni CRE architecture can also be inferred from a previous study that looked at 5.9 kb of 5' sequence in a single reporter transgene (Kalay and Wittkopp, 2010). These results support an evolutionary scenario where the most recent common ancestor of monomorphic and dimorphic Sophophora lineages (Figure 2.1, node 1) lacked an orthologous body element. Moreover, the evolution of sexually dimorphic pigmentation was accompanied by the origination of novel CRE activities of yellow and tan that integrate spatial and sex-specific regulatory inputs, and the t_MSE appears to be of more recent origin than the yBE.
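The reasoning used above to date the two CRE activities can be summarized in a short Python sketch. It is an illustration only, not the analysis performed in this study: it assumes a single origin with no secondary loss among the five sampled species, reads clade membership from Figure 2.1, scores each species as simply having or lacking the relevant reporter activity in transgenic D. melanogaster (Figure 2.2), and uses a hypothetical function name.

```python
# Illustrative sketch only. Clade membership follows Figure 2.1; the activity
# scores summarize the reporter results of Figure 2.2. All names are hypothetical.

MELANOGASTER_GROUP = {"D. melanogaster", "D. auraria", "D. malerkotliana"}

# Nested ancestral nodes, ordered from most inclusive to least inclusive.
NESTED_NODES = [
    ("node 1", MELANOGASTER_GROUP | {"D. pseudoobscura", "D. willistoni"}),
    ("node 2", MELANOGASTER_GROUP | {"D. pseudoobscura"}),
    ("node 3", MELANOGASTER_GROUP),
]

# t_MSE-like abdominal activity of the tan-region sequences.
T_MSE_ACTIVITY = {
    "D. melanogaster": True,
    "D. auraria": True,
    "D. malerkotliana": True,
    "D. pseudoobscura": False,
    "D. willistoni": False,
}

# Body-element-like (male-enhanced abdominal) activity of the yellow 5' sequences.
YELLOW_BODY_ACTIVITY = {
    "D. melanogaster": True,
    "D. auraria": True,
    "D. malerkotliana": True,
    "D. pseudoobscura": True,
    "D. willistoni": False,
}


def most_inclusive_node_with_activity(activity):
    """Return the most inclusive node whose sampled descendants all show activity.

    Assumes one origin and no secondary loss among the sampled species.
    Returns None when no node satisfies the condition.
    """
    for node, species in NESTED_NODES:
        if all(activity[name] for name in species):
            return node
    return None


print("t_MSE-like activity present by:", most_inclusive_node_with_activity(T_MSE_ACTIVITY))
print("yBE-like activity present by:", most_inclusive_node_with_activity(YELLOW_BODY_ACTIVITY))
```

Under these simplifying assumptions the t_MSE-like activity maps to node 3 and the yBE-like activity to node 2, which matches the conclusion that the t_MSE is the more recent of the two.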

Expansion and contraction of pigment patterns through trans evolution

Following the origin of male-specific tergite pigmentation, the number of pigmented tergites expanded and contracted to range from the single A6 segment of D. auraria to the A5 and A6 segments of D. melanogaster, and the A4-A6 segments seen for D. malerkotliana (Figure 2.1). These phenotypic changes correspond with expansions and retractions in the expression of tan (Figure 2.1B', 2.1C', and 2.1E') and yellow (Figure 2.1B'', 2.1C'', and 2.1E'') along the anterior-posterior axis. A priori, such changes in spatial expression could originate from sequence changes in the t_MSE and yBE (hereafter referred to as “cis-evolution”) or through changes in an upstream regulatory gene or genes (referred to hereafter as “trans-evolution”). We found that the t_MSE and yellow 5' regulatory sequences from D. auraria each drove reporter gene expression in the male A5 and A6 segments of transgenic D. melanogaster (Figure 2.2B' and 2.2B''). This domain of activity matches the output driven by the D. melanogaster CREs, yet extends one segment anterior relative to their endogenous expression in D. auraria (compare Figure 2.2B' and 2.2B'' to Figure 2.1C' and 2.1C''). Similarly, the t_MSE and yellow 5' sequences for D. malerkotliana each drove reporter expression in the A5 and A6 segments of transgenic D. melanogaster (Figure 2.2C' and 2.2C''), a domain whose anterior boundary is shifted posteriorly by one segment compared to the pigmentation phenotype and the endogenous pattern of yellow expression in D. malerkotliana (Figure 2.1E''). The up-regulation of yellow in the D. malerkotliana A4 segment is modest relative to the A5 and A6 segments, matching the lighter phenotype of this segment (Figure 2.1E). However, the D. malerkotliana yellow 5' sequence is functionally indistinguishable from the D. melanogaster CRE in the A4 segment. This similarity could be explained by “cis” evolution elsewhere in the yellow locus, such as the wing element, promoter, or intron. However, the yellow 5' sequences evaluated here included the wing element and the putative D. malerkotliana promoter region. Thus, such a “cis-elsewhere” scenario would have to be due to an evolved intronic or more distally located CRE. We favor the interpretation that the D. malerkotliana expression phenotype arose by evolution in trans, and more broadly that diversification in male tergite pigmentation evolved in large part from changes in trans to tan and yellow. In the future, the reciprocal test of the orthologous sequences as reporter transgenes in D. malerkotliana could provide more definitive evidence for either the cis-elsewhere in yellow or trans-evolution scenarios.
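The inference applied in this section can be written as a small decision rule, sketched below in Python. This is an illustration of the logic rather than a method used in the study: the segment sets are a coarse summary of Figures 2.1 and 2.2, the function and label names are hypothetical, and differences in expression intensity (such as the modest A4 expression of yellow in D. malerkotliana) are ignored.

```python
# Illustrative sketch only. Segment sets summarize Figures 2.1 and 2.2;
# the function name and the returned labels are hypothetical.

# Segments in which the D. melanogaster t_MSE and yBE drive male expression.
D_MEL_DOMAIN = frozenset({"A5", "A6"})


def classify_divergence(endogenous, reporter_in_dmel, dmel_domain=D_MEL_DOMAIN):
    """Classify the basis of an expression difference from D. melanogaster.

    endogenous: segments expressing the gene in the source species.
    reporter_in_dmel: segments in which the orthologous sequence drives
        reporter expression in transgenic D. melanogaster.
    """
    endogenous = frozenset(endogenous)
    reporter = frozenset(reporter_in_dmel)
    if endogenous == dmel_domain:
        return "no divergence"
    if reporter == endogenous:
        # The tested sequence reproduces the derived pattern by itself.
        return "cis-evolution"
    if reporter == dmel_domain:
        # The tested sequence behaves like the D. melanogaster CRE, so the
        # difference lies outside it: in trans, or in cis elsewhere in the locus.
        return "trans-evolution or cis-elsewhere"
    return "unresolved"


# (species, gene) -> (endogenous domain, reporter domain in D. melanogaster)
observations = {
    ("D. auraria", "tan"): ({"A6"}, {"A5", "A6"}),
    ("D. auraria", "yellow"): ({"A6"}, {"A5", "A6"}),
    ("D. malerkotliana", "tan"): ({"A5", "A6"}, {"A5", "A6"}),
    ("D. malerkotliana", "yellow"): ({"A4", "A5", "A6"}, {"A5", "A6"}),
}

results = {key: classify_divergence(*domains) for key, domains in observations.items()}
for (species, gene), call in results.items():
    print(f"{species} {gene}: {call}")
```

The rule cannot separate trans-evolution from cis-evolution outside the tested sequence, which is why the reciprocal transgenic test in the source species remains the more decisive experiment.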

Pigmentation loss through a mosaic of cis- and trans-evolution

Male-specific pigmentation has been lost several times within the melanogaster species group (Jeong et al., 2006). We sought to trace the paths by which this has occurred independently at the network level in two monomorphic species, D. kikkawai and D. ananassae (Figure 2.1). Previously, the D. kikkawai yBE was found to lack regulatory activity due to cis-evolution, in which a key binding site for Abd-B was lost (Jeong et al., 2006). Our in situ hybridization results confirm that yellow expression is indeed absent in the abdomen of D. kikkawai (Figure 2.1D''). We were curious whether the expression of tan had similarly been lost through cis-regulatory changes to its CRE. Surprisingly, tan expression was detected in the A6 body segment of D. kikkawai males, in a pattern reminiscent of D. auraria, a second montium species (compare Figure 2.1C' and 2.1D'), and male-limited like that seen for the expanded expression in males of species from the outgroup Oriental clade (D. melanogaster) and ananassae subgroup (D. malerkotliana) (Figure 2.1B' and 2.1E'). Consistent with the endogenous expression pattern, the D. kikkawai t_MSE drove robust expression in the A5 and A6 segments of transgenic D. melanogaster pupae (Figure 2.3A'). These results indicate that the loss of pigmentation in D. kikkawai has proceeded without altering the ancestral expression of tan, suggesting that evolution of this trait occurred through other genes.

Figure 2.3 Tracing the CRE bases for losses in male tergite pigmentation. (A and B) A schematic representation of the male abdomens for two species with derived losses in male pigmentation. (A′ and B′) EGFP-reporter transgene expression driven by sequences orthologous to the D. melanogaster t_MSE in transgenic male pupae at ~95 hours after puparium formation. (A′) The D. kikkawai sequence possesses robust male-specific regulatory activity in the A5 and A6 segments, (B′) whereas the D. ananassae sequence has little-to-no abdominal regulatory activity. (A″ and B″) EGFP-reporter transgene expression driven by sequences orthologous to the D. melanogaster yellow wing/body element in transgenic male pupae at ~85 hours after puparium formation. (A″) The D. kikkawai sequence retains the posterior stripe regulatory activities characteristic of the wing element but lacks the body element's male-specific activity. (B″) The D. ananassae sequence possesses the regulatory activities characteristic of the wing element and body element. Red arrowheads indicate segments in which the regulatory activity is lacking.


For D. ananassae, we found that the loss of male pigmentation was accompanied by the loss of yellow and tan expression in males (Figure 2.1F′ and 2.1F″). Interestingly, the CRE targets for cis- and trans-evolution were distinct from the case of D. kikkawai. Specifically, the orthologous yellow 5′ regulatory region retained regulatory activity in the male A5 and A6 segments in transgenic D. melanogaster (Figure 2.3B″), whereas the t_MSE lacked activity (Figure 2.3B′). As the most recent common ancestor of the melanogaster species group likely possessed a male pattern of abdomen pigmentation (Figure 2.1, node 3) (Jeong et al., 2006), our results indicate that dissimilar modifications to this ancestrally dimorphic abdominal pigmentation network were responsible for these similar morphological outcomes. These divergent evolutionary paths may reflect the use of trans-regulatory inputs that differ between the yBE and t_MSE CREs. In order to understand how the evolution of tan and yellow expression has been individualized, we sought to characterize the regulatory linkages that control these CREs in D. melanogaster.
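The reasoning used above to assign cis- or trans-evolution can be summarized as a small decision rule. The sketch below is illustrative only: the function name, the Boolean encoding of the observations, and the labels it returns are assumptions introduced here, and the rule ignores quantitative differences in expression.

```python
def infer_change(native_expression: bool, reporter_active_in_dmel: bool) -> str:
    """Classify one pigmentation gene in a species that lost male pigmentation.

    native_expression: is the gene expressed in the male posterior abdomen
        of the species itself (in situ hybridization)?
    reporter_active_in_dmel: does the orthologous CRE drive male A5/A6
        reporter expression in transgenic D. melanogaster?
    """
    if native_expression and reporter_active_in_dmel:
        return "conserved"
    if not native_expression and not reporter_active_in_dmel:
        return "cis"
    if not native_expression and reporter_active_in_dmel:
        # The CRE still works in a D. melanogaster trans-landscape, so the
        # change lies in trans or in cis outside the tested sequence.
        return "trans (or cis elsewhere in the locus)"
    return "unresolved"


# Observations as described in the text (hypothetical encoding).
observations = {
    ("D. kikkawai", "yellow"): (False, False),
    ("D. kikkawai", "tan"): (True, True),
    ("D. ananassae", "yellow"): (False, True),
    ("D. ananassae", "tan"): (False, False),
}
calls = {key: infer_change(*obs) for key, obs in observations.items()}
```

Under this encoding the two species reach a similar phenotype with opposite assignments for yellow and tan, which is the contrast drawn in the preceding paragraph.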

Distinct combinations of Hox factors and co-factors control the coordinated activities of the t_MSE and the yBE

Previously, Abd-B was shown to be a direct activator of yellow expression in the A5 and A6 segments through its interaction with two binding sites in the ~1.6 kb yBE reporter (Figure S2.1G, vertical blue lines) (Jeong et al., 2006). With our ultimate goal being to functionally characterize the regulatory inputs responsible for yellow expression in the abdominal epidermis, we sought to define a more minimal CRE sequence capable of directing robust expression in the male A5 and A6 abdominal segments (Figure S2.1H). Thus, we created three progressively truncated forms of the yBE centered on the two Abd-B binding sites (Figure S2.1G). The 1.1 and 0.9 kb sequences each drove robust but ectopic EGFP reporter gene expression (Figure S2.1I and S2.1J). The third truncated version of 0.6 kb drove reporter expression in a pattern limited to the A5 and A6 segments (Figure S2.1K). We refer to this sequence as yBE0.6, and this was the sequence that we chose to further characterize.

The coordinated patterns of tan and yellow expression in the A5 and A6 segments may be explained through the yBE0.6 and t_MSE possessing the same regulatory inputs and equivalent regulatory activities, or alternatively by these two CREs possessing unique regulatory inputs. The only known direct regulator of a pigmentation gene CRE is Abd-B's interaction with the yBE. Genetic evidence for this regulatory input can be seen in the Transabdominal (Tab) genetic background, where ectopic Abd-B expression occurs in the A4 and A3 segments (Celniker and Lewis, 1993). Consistent with Abd-B functioning as an upstream trans-activator of the yBE0.6, EGFP expression occurred ectopically in the male A4 and A3 segments of Tab mutants (Figure 2.4B′, blue arrowheads). When the t_MSE reporter gene was evaluated in the Tab background, a similar expansion of regulatory activity was seen (Figure 2.4B, blue arrowheads), suggesting that Abd-B is an upstream activator of tan expression, as it is for yellow.


Figure 2.4 Genetic interactions between pigmentation network transcription factors and CREs regulating abdominal tan and yellow expression. (A-F) EGFP reporter expression driven by the t_MSE was imaged at ~95 hours after puparium formation in male pupae. (A′-F′) EGFP reporter expression driven by the yBE0.6 was imaged at ~85 hours after puparium formation in male pupae. (A″-F″) EGFP reporter expression driven by the yellow 5′ sequence was imaged at ~85 hours after puparium formation in male pupae. Genotypes altering the genetic background are listed at the top of each column. Specimens are (A, A′, and A″) homozygous and (B-F, B′-F′, and B″-F″) hemizygous for the EGFP reporter transgene. (B) t_MSE, (B′) yBE0.6, and (B″) yellow 5′ regulatory activity expands into the A3 and A4 segments where Abd-B is ectopically expressed. Compared to a (C) control genetic background, the (D) t_MSE regulatory activity is dramatically reduced in the midline region where abd-A expression is suppressed. Suppression of (E) hth and (F) exd expression results in ectopic t_MSE regulatory activity in the A4 and A3 segments. Suppression of (D′) abd-A, (E′) hth, and (F′) exd expression has little-to-no effect on yBE0.6 regulatory activity compared to the (C′) control genetic background. Suppression of (D″) abd-A and (F″) exd has little-to-no effect on the yellow 5′ regulatory activity compared to the (C″) control genetic background. Suppression of (E″) hth results in a mild expansion of regulatory activity into the A3 and A4 segments. Blue arrowheads indicate segments where the genetic background modification resulted in ectopic reporter transgene activity. Red arrowheads indicate segments where the genetic background modification resulted in a loss of reporter transgene activity.


During development the Hox genes abd-A and Abd-B are both required to specify the identities of the A5 and A6 segments (Sánchez-Herrero, 1991; Sánchez-Herrero et al., 1985). While Abd-B expression occurs in both the A5 and A6 segments (Kopp and Duncan, 2002), the range of Abd-A expression includes the A2-A6 segments (Rogers et al., 2014). Previously we found that Abd-A is expressed in and required for the male-specific pattern of pigmentation and t_MSE activity in the A5 and A6 segments (Rogers et al., 2014). It seemed plausible that Abd-A and Abd-B are part of a shared Hox-regulatory circuit that directs the coordinated expression of tan and yellow. To test this possibility, we used pnr-GAL4 to drive dorsal midline expression of an RNA interference (RNAi) transgene that specifically silences abd-A expression. In this genetic background, EGFP expression driven by the t_MSE was markedly reduced compared to a control genetic background in which a luciferase transgene was ectopically expressed (compare Figure 2.4C and 2.4D). To our surprise, EGFP expression driven by the yBE0.6 was not noticeably altered in the abd-A silenced genetic background compared to the control genetic background (compare Figure 2.4C′ and 2.4D′). These outcomes indicate that abd-A functions in the abdomen as an upstream regulator of tan, but has little-to-no effect on yellow expression as directed by the yBE0.6 CRE. Hence, the regulatory wiring responsible for coordinated expression of tan and yellow seems to differ substantially.

The in vivo selectivity of Hox proteins for their target gene CREs has been found in several cases to be enhanced through cooperative binding with Hox cofactor proteins (Mann et al., 2009). For Drosophila, the best studied Hox cofactors are the transcription factors Hth and Exd. RNAi-mediated suppression of hth and exd results in ectopic pigmentation of the male A3 and A4 abdominal segments, suggesting that these Hox cofactors operate as upstream repressors of male tergite pigmentation in the A3 and A4 abdomen segments (Rogers et al., 2014). Moreover, RNAi-mediated suppression of hth and exd in the dorsal abdomen midline results in ectopic regulatory activity for the t_MSE in the male A3 and A4 segments (Figure 2.4E and 2.4F, blue arrowheads). In contrast, we found that the yBE0.6 failed to drive a comparable expansion of reporter gene expression upon RNAi knockdown of hth or exd expression (Figure 2.4E′ and 2.4F′), highlighting that the coordinated activation of the yBE0.6 and t_MSE occurs through distinct mechanisms that differ in Hox cofactor dependence.

We were concerned that the different utilizations of Hox and Hox cofactor inputs for the t_MSE and yBE0.6 might be an artifact of truncating the full yellow regulatory sequence down to a minimal element lacking key regulatory inputs. Thus, we evaluated a larger yellow gene 5′ sequence, which contains both the wing element and body element, in the same genetic backgrounds where Hox and Hox cofactor expression were modified. This larger sequence had ectopic expression in the Abd-B misexpression background as seen for the yBE0.6 element (compare Figure 2.4B″ to 2.4B′). Like the yBE0.6 element, little-to-no change in reporter expression was observed in the abd-A and exd RNAi backgrounds (compare Figure 2.4D″ and 2.4F″ to 2.4D′ and 2.4F′), results that contrast with the prominent alteration that occurs for t_MSE-directed reporter expression (Figure 2.4D and 2.4F). The only difference we observed was a modest up-regulation of yellow 5′-driven reporter expression in the A3 and A4 segments of the hth RNAi background. This suggests that yellow expression is responsive to hth through some regulatory sequence outside of the yBE0.6 element, perhaps through the wing element, which drives a low level of expression in the epidermis (Wittkopp et al., 2002a).

Spatial-mapping of yBE0.6 and t_MSE regulatory inputs

In order to more comprehensively characterize the regulatory linkages within the yBE0.6, we created 10 scanning mutant (SM1-SM10) versions of the yBE0.6 (Appendix A and Figure S2.5). For each scanning mutant, a single contiguous block of ~70 bp was altered at every other nucleotide to its non-complementary transversion (Appendix A). The regulatory activity characteristic of the yBE0.6 was not notably altered by the SM8 and SM9 mutations. For several scanning mutants, activity was either reduced (Figure 2.5C and 2.5D; SM2 and SM3) or lost (Figure 2.5F and 2.5G; SM5 and SM6), indicating that the mutated blocks likely encode binding sites for activating transcription factor inputs. The SM5 and SM6 mutations spanned CRE sequences that include the bona fide binding sites for Abd-B (Jeong et al., 2006), though we left these binding sites unaltered in these two scanning mutants (Appendix A). The diminished regulatory activity caused by both SM5 and SM6 indicates that Abd-B collaborates with other transcription factors that bind adjacent sequences to activate gene expression in the pupal abdomen. The reduced regulatory activity of the SM2 and SM3 mutants demonstrates that additional activating inputs reside outside of the known Abd-B sites.
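The scanning mutation design described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used to design the actual constructs (which are given in Appendix A): the function names, the choice to start the alternation at the first base of each block, the equal-sized blocks, and the `protected` argument (for positions, such as known Abd-B sites, that are left unaltered) are all hypothetical. The substitution table itself follows from the definition: a non-complementary transversion exchanges A with C and G with T.

```python
# Non-complementary transversions: purine <-> pyrimidine swaps that are
# not Watson-Crick partners (A<->C and G<->T).
TRANSVERSION = {"A": "C", "C": "A", "G": "T", "T": "G"}


def scanning_mutant(seq, start, length=70, protected=frozenset()):
    """Return seq with every other base in [start, start + length) mutated.

    protected: 0-based positions to leave unaltered (assumption: used here
    to mimic sparing known binding sites within a mutated block).
    """
    bases = list(seq.upper())
    end = min(start + length, len(bases))
    for i in range(start, end, 2):
        if i not in protected:
            # Characters other than A/C/G/T (e.g. N) are left as they are.
            bases[i] = TRANSVERSION.get(bases[i], bases[i])
    return "".join(bases)


def scanning_series(seq, n_blocks=10):
    """Tile seq with n_blocks equal blocks and mutate one block per mutant."""
    block = -(-len(seq) // n_blocks)  # ceiling division
    return [scanning_mutant(seq, k * block, block) for k in range(n_blocks)]
```

For example, `scanning_mutant("ACGTACGT", 2, 4)` alters positions 2 and 4 only, and passing `protected={2}` leaves position 2 as in the wild type sequence.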


Figure 2.5 Scanning mutagenesis identifies CRE sequences required for yellow and tan expression. (A) Ten scanning mutant versions, SM1-SM10, of the D. melanogaster yBE0.6 sequence and t_MSE sequence were created. In each mutant, a block of ~70 base pairs was altered such that every other nucleotide was altered by a non-complementary nucleotide transversion. (B-G) EGFP reporter transgene expression in D. melanogaster male pupae at ~85 hours after puparium formation. (B) The yBE0.6 sequence drives reporter expression in the male A5 and A6 segments. (C and D) The SM2 and SM3 mutations resulted in a modest reduction in regulatory activity, whereas the (F and G) SM5 and SM6 mutations resulted in a near-total loss of reporter activity. (E) The SM4 mutation led to a pan-abdomen increase in regulatory activity. (H-J) EGFP reporter transgene expression in D. melanogaster male pupae at ~95 hours after puparium formation. (H) The t_MSE sequence drives expression in the male A5 and A6 segments. (I and J) The SM5 and SM6 mutations resulted in a loss of reporter expression in the male abdomen. Red arrowheads indicate regions where the regulatory activity was greatly reduced due to a scanning mutation and orange arrowheads indicate regions where regulatory activity was modestly reduced. Blue arrowheads indicate regions where regulatory activity was gained due to a scanning mutation.

In addition to scanning mutations resulting in reductions in yBE0.6 regulatory activity, three resulted in gains in regulatory activity, suggesting that the mutations disrupted binding sites for repressive transcription factor inputs. yBE0.6 regulatory activity in the male A2-A6 abdomen segments was notably increased by the SM4 alteration (Figure 2.5E). The yBE0.6 SM8 and SM10 CREs each exhibited a modest ectopic regulatory activity in the male A4 and A3 segments (Figure S2.5I and S2.5K). Collectively, these results suggest that repressing and activating inputs are distributed throughout the 660 base pairs of yBE0.6.

A similar scanning mutagenesis strategy was carried out for the 860 bp t_MSE (Appendix B and Figure S2.6), in which each scanning mutation spanned ~80 bp. While the t_MSE regulatory activity was not noticeably altered in 8 of 10 scanning mutants (Figure S2.6), the SM5 and SM6 alterations each resulted in a dramatic reduction of EGFP reporter expression in the male A5 and A6 segments (Figure 2.5I and 2.5J). In order to more precisely localize activation inputs, we generated a series of fine-scale scanning mutations (Appendix C and Figure S2.7) within a minimal 351 bp subfragment of the t_MSE that reproduces its activity (“t_MSE2”, Figure S2.1). For the 351 bp t_MSE2, we generated 10 scanning mutants in which each mutation spanned ~20 bp and that collectively covered the entire SM5 and SM6 region (Appendix C). While the 5i1, 5i3, 5i4, and 6i1-6i3 scanning mutants had no noticeable effect on t_MSE2 regulatory activity, scanning mutants 5i2, 6i4, 6i5, and 6i6 consistently drove reduced reporter expression in the male A5 and A6 segments (Figure S2.7). Thus, this CRE has activating inputs located within the SM5 and SM6 regions. We sought to determine if these activating inputs include binding sites for Abd-A and Abd-B.

The t_MSE is an indirect and direct target for posterior Hox proteins

Abd-B is a key direct regulatory input for the yBE that is necessary to drive yellow expression throughout the male A5 and A6 segments (Jeong et al., 2006), and yet, the yBE has little-to-no response to alterations of Abd-A. To disentangle how tan generates a correlated pattern with yellow, but is genetically downstream of both Abd-B and Abd-A, we sought to determine whether these Hox factors directly bind the t_MSE.

An in vitro study has demonstrated a preference of Abd-B for TTAT sites and of Abd-A for TAAT sites (Noyes et al., 2008). Within the SM5i2 region required for A5 and A6 regulatory activity resides only a single TTAT site (Appendix C). However, this site also occurs in the overlapping SM5i1 region, which is dispensable for abdominal activity, as this mutant CRE has 114±3% of the wild type CRE's activity in the male A5 segment (Figure S2.7D). Thus, we did not further consider the SM5i2 region as a location for direct Hox-regulation.
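A search for candidate Hox sites of this kind amounts to scanning a sequence for the two core motifs. The sketch below is an assumption-laden illustration rather than the method used here: the function names are hypothetical, the example sequence is invented and is not taken from the t_MSE, and searching both strands is a design choice made for the example.

```python
import re


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq, motifs=("TTAT", "TAAT")):
    """List (position, strand, motif) for each motif match on either strand.

    Positions are 0-based on the given (top) strand. A lookahead pattern is
    used so that overlapping matches are all reported.
    """
    seq = seq.upper()
    hits = set()
    for motif in motifs:
        for strand, pattern in (("+", motif), ("-", revcomp(motif))):
            for match in re.finditer(f"(?={pattern})", seq):
                hits.add((match.start(), strand, motif))
    return sorted(hits)


# Invented example sequence, for illustration only.
example_hits = find_sites("GGTTATCCATTAGG")
```

In the invented example, one TTAT match is found on the top strand and one TAAT match on the bottom strand (read as ATTA on the top strand).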

We next considered the SM6 region, for which the scanning mutation (SM6) resulted in a drastic reduction of male activity to 31±1% of the wild type CRE (compare Figure 2.6B and 2.6F). Within this 85 bp SM6 region reside 5 sequences matching either TTAT or TAAT sites, or both (Figure 2.6A, sites 1-5). We generated and tested two mutant t_MSE2 reporter constructs, one in which the three TTAT sequences were mutated and the other in which the TTAT and TAAT sites were mutated (Appendix D). To our surprise, the activity of the TTAT site mutant was only modestly reduced to 89±6% of the wild type CRE in the A5 segment (compare Figure 2.6B and 2.6C). Moreover, when the two TAAT sites were additionally mutated, regulatory activity was measured at 88±3% of wild type (Figure 2.6D). This lack of a pronounced regulatory effect contrasts with that caused by the 85 bp scanning mutation 6 (Figure 2.6F). Thus, with respect to the posterior male segment activity of the t_MSE2, it appears that Abd-B and Abd-A play little-to-no role as direct activators.

While the TTAT and TAAT sites had little-to-no importance regarding the activity of the t_MSE2 in the A5 and A6 segments, these sites were important for limiting reporter expression to these more posterior abdomen segments. When the TTAT sites were mutated, the regulatory activity in the A4 and A3 segments respectively increased to 169±4% and 261±6% of the wild type sequence (Figure 2.6C, blue arrowheads). When all TTAT and TAAT sites were mutated, more pronounced increases in regulatory activity were observed in the A4 (207±2%) and A3 (281±2%) segments (Figure 2.6D, blue arrowheads). These effects contrast with the modest reductions in activity that occurred for the scanning mutation 6 (Figure 2.6F; 64±1% for A4 and 73±1% for A3). Our results suggest that these sequences function as Hox binding sites to repress tan expression in abdominal segments anterior to that of A5. Of note, these anterior segments express Abd-A, but not Abd-B. Hence, unlike the yBE, the t_MSE appears to have co-opted direct Abd-A regulation in a unique way.
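Activities such as 169±4% are expressed relative to the wild type CRE in the same segment. The sketch below shows one way such a value could be computed from replicate reporter intensities. It is an assumption: this section does not state how the ± term was derived, so the standard error of the mean (SEM) of the normalized replicates is used here for illustration, and the function name and input values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def percent_of_wild_type(mutant, wild_type):
    """Return (mean, SEM) of mutant intensities as a percent of wild type.

    mutant, wild_type: replicate intensity measurements for one segment.
    Each mutant replicate is normalized to the wild type mean; the SEM is
    taken over the normalized mutant replicates (an assumed convention).
    """
    wt_mean = mean(wild_type)
    normalized = [100.0 * value / wt_mean for value in mutant]
    if len(normalized) > 1:
        sem = stdev(normalized) / sqrt(len(normalized))
    else:
        sem = 0.0
    return mean(normalized), sem


# Hypothetical intensities, for illustration only.
pct, sem = percent_of_wild_type([180, 200, 220], [100, 100, 100])
```

A value above 100% in the A4 or A3 segment then corresponds to the ectopic gain in activity described above, and a value below 100% in the A5 segment to a loss of activation.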


Figure 2.6 Characterization of the direct Hox inputs shaping tan expression. (A) The SM6 region of the t_MSE possesses five sites, S1-S5, with sequences characteristic of Abd-B (TTAT) and Abd-A (TAAT) binding sites. This CRE region also possesses sites resembling sequences bound by Exd and Hth, though the functionality of the Exd site was not studied here. (B-F) EGFP reporter transgene expression in male pupae at ~95 hours after puparium formation. (B) The t_MSE2 sequence drives robust expression in the male A5 and A6 segments. When all of the (C) TTAT sites and (D) TTAT and TAAT sites depicted in (A) were mutated, regulatory activity in the male A5 segment was reduced to 89±6% and 88±3% respectively. (F) When the entire SM6 region was mutated, regulatory activity decreased to 31±1%. (C) The TTAT site mutations resulted in activity increasing in the A4 and A3 segments respectively to 169±4% and 261±6%. (D) The TTAT and TAAT site mutations resulted in activity increasing in the A4 and A3 segments respectively to 207±2% and 281±2%. (E) When the Hth site was mutated, regulatory activity in the male A5, A4, and A3 segments respectively increased to 122±7%, 236±6%, and 276±4%. (F) The entire SM6 region mutation resulted in activity decreasing in the A4 and A3 segments respectively to 64±1% and 73±1%. Blue arrowheads indicate segments where activity was notably increased and red arrowheads indicate segments with notably decreased activity compared to the wild type sequence. Gel shift assays for sequences possessing wild type and mutant site (G) 3, (H) 4, and (I) 5 and the DNA-binding domains for Abd-A and Abd-B. Binding reactions used increasing amounts of the GST-DNA binding domain fusion protein (from left to right: 0 ng, 111 ng, 333 ng, 1000 ng, and 3000 ng). Binding correlated with the amount of input protein for the probes with the non-mutant sequence, whereas binding was dramatically reduced for the probes with a mutant Hox site.

To further address whether Abd-A directly binds to and regulates the t_MSE, we focused our attention on sites S3, S4, and S5, as these sites were closely associated with a sequence resembling a site for Exd and one resembling a site for Hth (Figure 2.6A). We found that Abd-A bound to probes with the wild type sites but little-to-no binding occurred with the mutant versions (Figure 2.6G-2.6I), indicating that Abd-A can specifically bind to these sequences. A similar specific binding was shown for Abd-B, suggesting that these sequences can also be bound by this Hox protein.

It seemed plausible that the adjacent cofactor sites might be necessary to repress t_MSE activity in the anterior abdominal segments in conjunction with Abd-A. As a cursory test of this possibility, we mutated the putative Hth site within the context of the t_MSE2 sequence (Appendix D) and tested this sequence's capability to regulate EGFP expression in transgenic pupae (Figure 2.6E). Consistent with this site being necessary for repression, the regulatory activity in the A4 and A3 segments respectively increased to 236±6% and 276±4% of the wild type sequence (compare Figure 2.6B to 2.6E). Collectively, these results indicate that the t_MSE's abdominal regulatory activity occurs through an encoding distinct from that responsible for the similar pattern of yellow expression directed by the yBE0.6 CRE.

Discussion

Here, we have traced the evolutionary history of two CREs required for a novel trait, and show that they have recently evolved similar expression patterns through remarkably different architectures in a common trans-regulatory landscape. Our data indicate that the tergite-wide activities of the yBE and t_MSE did not exist in the monomorphic ancestor for Sophophora, but evolved in the lineage leading to the common ancestor of the melanogaster species group. Our results support a scenario where the subsequent expansion and contraction of the male pattern was driven primarily by alteration of the trans-regulators, whereas repeated losses involved both cis- and trans-evolution with respect to these CREs. Though the t_MSE and yBE drive coordinated patterns of gene expression, we found striking differences in their upstream regulators and direct regulatory linkages (Figure 2.7). These results bear on our understanding of how new gene regulatory networks form and diversify, and of how coordinated regulatory activities can arise through the independent evolution of unique regulatory codes.


Figure 2.7 Gene network models for unpigmented and pigmented abdominal segments. Wiring diagram of pigmentation gene networks experienced by the (A) non-melanic male A2-A4 segments and (B) the melanic A5 and A6 segments. (A) Abd-B expression is lacking in the anterior A2-A4 segments and as a result yellow and tan lack the direct and indirect activating input from this transcription factor. In these segments, Abd-A forms direct repressive inputs with tan which are supported (directly or indirectly) by the repressive effects of exd and hth. (B) Abd-B is expressed in the posterior A5 and A6 segments, where it acts as a direct activator of yellow and an indirect activator of tan. In these segments, Abd-A acts as an indirect activator of tan expression as well. In these schematics, inactive genes are indicated in gray coloring, solid connections between genes indicate validated direct interactions between a transcription factor and a pigmentation gene CRE, and dashed connections indicate indirect interactions or those not yet shown to be direct. Connections terminating with an arrowhead indicate connections in which the transcription factor functions as an activator and connections terminating in a nail head shape indicate a repressive relationship.
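The wiring described in this legend can also be written down as a small edge list, which makes the segment-specific differences explicit. The representation below is a sketch only: the class, field, and function names are hypothetical, and the `direct` flag follows the legend's convention that dashed connections are indirect or not yet shown to be direct.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    regulator: str
    target: str
    sign: str     # "activate" or "repress"
    direct: bool  # True: validated direct; False: indirect or not yet shown direct


# Edges as stated in the Figure 2.7 legend, keyed by male abdominal segments.
NETWORK = {
    "A2-A4": [
        Edge("Abd-A", "tan", "repress", True),
        Edge("exd", "tan", "repress", False),
        Edge("hth", "tan", "repress", False),
    ],
    "A5-A6": [
        Edge("Abd-B", "yellow", "activate", True),
        Edge("Abd-B", "tan", "activate", False),
        Edge("Abd-A", "tan", "activate", False),
    ],
}


def inputs(segments, target):
    """Return the regulatory edges acting on one target gene in one context."""
    return [edge for edge in NETWORK[segments] if edge.target == target]
```

For instance, `inputs("A5-A6", "tan")` returns two indirect activating edges, whereas `inputs("A2-A4", "tan")` returns the direct Abd-A repression together with the exd and hth inputs.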


Inferring a Mechanism for a Nascent Hox-Regulated Genetic Switch

Hox transcription factors play a prominent role in generating the differences in serially homologous animal body parts, and in the origin of novelties (Carroll, 1995). The diversification of homologous parts can be driven by changes in the spatial domains of Hox protein expression, as has been shown for crustacean appendage morphology (Averof and Patel, 1997), snake limblessness (Cohn and Tickle, 1999), and for the water strider appendage ground plan (Khila et al., 2009). Changes in the downstream Hox targets are evident in cases such as the hindwings of insects (Weatherbee et al., 1999), and for fruit fly tergite pigmentation (Jeong et al., 2006). The origin of novel structures can also be traced to the co-option of Hox proteins, as exemplified by cases such as the Photuris firefly lantern (Stansbury and Moczek, 2014) and the sex combs residing on the forelegs of certain Drosophila species (Barmina and Kopp, 2007; Tanaka et al., 2011). For many of these evolved traits, the molecular mechanisms by which Hox expression patterns and target genes evolve remain unknown.

While mechanistic studies on the evolution of Hox-regulated CREs remain limited, several target gene CREs have been thoroughly characterized and serve as exemplars of Hox-regulation during development (Mann et al., 2009). Hox proteins can interact with CRE binding sites as monomers (Walsh and Carroll, 2007) or through cooperative interactions with Hox-cofactors (Gebelein et al., 2004; Li-Kroeger et al., 2008; Ryoo et al., 1999). The activity of these bound complexes can be further modulated through interactions with collaborating transcription factors. However, to date, few direct Hox target linkages have been traced to their evolutionary beginnings. Expression of yellow in the male A5 and A6 segments required the gain of two binding sites for Abd-B (Jeong et al., 2006), but it remains uncertain whether these binding events require cooperative interactions with Hox cofactors and which transcription factors are acting as collaborators.

The t_MSE presented an opportunity to study how a second Hox-responsive CRE evolved in parallel to the activity at yellow. In this study, we show that Abd-A and Abd-B respectively are necessary and sufficient for t_MSE regulatory activity. However, we show that the ablation of the resident Hox sites had little effect on this CRE's activity in the A5 and A6 segments, though mutations to nearby CRE sequences resulted in dramatically reduced activity. This result strongly implies that both Abd-A and Abd-B indirectly activate the t_MSE through a downstream factor or factors. While it cannot be entirely ruled out that these factors are operating directly through other non-canonical Hox sites, our gel shift assays did not provide convincing evidence that such sites exist.

While the Hox sites were not necessary for activation in the A5 and A6 segments, their ablation resulted in a drastic gain of regulatory activity in the A4 and A3 segments, a setting in which Abd-A is the only Hox protein present. This indicates that Abd-A is a direct repressor of t_MSE function in these anterior abdomen segments. The observed dichotomy in Abd-A function can be explained by at least two scenarios, which are not necessarily mutually exclusive. First, in the A5 and A6 segments Abd-B may not act as a direct activator of the t_MSE, but its occupancy of Hox sites might preclude the direct repressive effects of Abd-A. Second, Abd-A may interact cooperatively or collaboratively with other transcription factors in the more anterior segments to impart repression. Our results with Hth support this second scenario.

The Hox co-factors Hth and Exd were prime candidates to mediate the context-dependent modulation of Abd-A activity. First, RNAi suppression of hth and exd expression each resulted in ectopic pigmentation (Rogers et al., 2014) and t_MSE activity in the male A4 and A3 segments (Figure 2.4). Furthermore, inspection of the t_MSE sequence revealed sites characteristic of Hth (AGACAG) and Exd (GATCAT) binding that reside in close proximity to Hox sites (Figure 2.6A). This site content and arrangement is strikingly similar to that found in an abdominal-repressive module for the CRE controlling thoracic Distalless expression (Gebelein et al., 2002, 2004). In a similar vein, we show that the ablation of the Hth-like site led to an anterior expansion in t_MSE activity similar to that induced by the Hox site mutations (Figure 2.6). This outcome supports the interpretation that the more recent origin of the t_MSE involved the formation of novel regulatory linkages with Hox proteins and Hox cofactors.

The origins of a network controlling a sexually dimorphic trait

Morphological traits result from the activities of gene regulatory networks, in which each network is governed by a trans-regulatory tier of transcription factors and cell signaling components that ultimately regulate the expression of a set of differentiation genes (Bonn and Furlong, 2008; Davidson and Erwin, 2006; Stathopoulos and Levine, 2005). For animals, the trans-regulatory genes are remarkably conserved (Carroll, 2008; Carroll et al., 2005). It is plausible that the origin of new morphologies occurs through the formulation of new gene regulatory networks, while diversification and losses in traits would likely occur through the modification and dismantling of extant networks. The empirical evaluation of such trends of network evolution necessitates the study of trait evolution at the level of networks, CREs, and their encoded binding sites for multiple animal lineages, traits, and evolutionary time frames. The Drosophila pigmentation system is particularly well poised to make pioneering contributions to this growing body of knowledge.

The most recent common ancestor of monomorphic and dimorphic Sophophora lineages was inferred to have possessed monomorphic tergite pigmentation (Figure 2.1, node 1) (Jeong et al., 2006), in the context of an otherwise invariant morphological landscape, in which segment number and form have remained conserved at the genus level. Hence, the origin of this novel pigmentation trait may be expected to have co-opted spatial and sex-specific patterning mechanisms that shape the conserved abdomen features. Our comparative analysis of orthologous yellow and tan non-coding sequences indicates that these co-option events involved the origination of novel CRE activities that connected a trans-regulatory tier of Hox, Hox-cofactors, and the Bab proteins to these key differentiation genes that encode pigmentation enzymes (Figure 2.7).

The patterns of regulatory activity for the orthologous tan and yellow sequences support some additional inferences about the early events in this dimorphic trait's origin. While the t_MSE abdominal activity was strikingly lower in D. pseudoobscura and D. willistoni, the D. pseudoobscura yellow body element was active (albeit with expanded activity). These outcomes support at least two evolutionary scenarios. One scenario is a sequence of events where the origination of the t_MSE and yBE in the lineage of D. pseudoobscura (Figure 2.1, node 2) was followed by a secondary loss of the t_MSE. This scenario is supported by our previous observation of dimorphic Bab expression in the D. pseudoobscura abdomen (Salomone et al., 2013), backing the notion that this species' broad pattern of monomorphic abdominal pigmentation evolved from a dimorphic ancestral state. For the other scenario, the body element-like regulatory activity of D. pseudoobscura could be due to this CRE's origin preceding (Figure 2.1, node 2) that of the t_MSE (Figure 2.1, node 3). Distinguishing between these two scenarios will require a more rigorous comparison of the pigmentation phenotypes and networks within the melanogaster and obscura species groups. The outcomes would provide a more nuanced understanding of the early evolutionary history for the derived sexually dimorphic pigmentation network.

Diversification and deconstruction of sexually dimorphic pigmentation

Tergite pigmentation evolution in the Sophophora subgenus has been relatively well-studied, and the accumulated results frame an extended perspective of trait evolution within a common network (Table 2.1). Trans-evolution at the bric-à-brac (bab) locus has been found to be a major driver for the diversification of female tergite pigmentation (Bickel et al., 2011; Kopp et al., 2003; Rogers et al., 2013). This study, in addition to previous studies, indicates that trans-evolution at as-yet-unidentified loci may have played prominent roles in the diversification of male-limited tergite pigmentation (Jeong et al., 2008; Ordway et al., 2014). Regarding the repeated losses in male pigmentation, our results are consistent with a scenario where both trans- and cis-evolution occurred, though the targets of cis-evolution have alternated between tan and yellow (Jeong et al., 2006, 2008). While cis-evolution has been identified for a case of monomorphic gain (ebony) in tergite pigmentation (Rebeiz et al., 2009), and for a case of monomorphic loss (ebony and tan) (Wittkopp et al., 2009), the full wealth of case studies points to a more prominent role for evolutionary changes in the trans-regulatory tier of the pigmentation gene network. However, it is important to note that many of these case studies only assessed the activities of transgenes in D. melanogaster. While similarities in CRE activity might indicate that expression divergence occurred through trans-evolution, they do not rule out the possibility that cis-changes occurred at other regions in the pigmentation enzyme gene loci, or that expression divergence results from combined cis- and trans-changes. In the future, it will be important to validate or reject the prominent role for trans-regulatory evolution by reciprocal tests of CREs in species with contrasting patterns of pigmentation. One study in which CREs were tested in species with contrasting pigmentation phenotypes showed that trans-regulatory evolution was a major driver for diversification of fruit fly wing spot patterns by modifying Distalless and wingless expression (Arnoult et al., 2013; Werner et al., 2010). Thus it appears the notion of a “conserved trans-landscape” requires more scrutiny.


Table 2.1 A pigmentation enzyme gene perspective of network evolution

Trait (sex affected) | Species | Divergence | Change(s) | Reference
Increased Tergite Pigmentation | D. melanogaster | Intraspecific | cis (ebony) | Rebeiz et al., 2009
Increased Tergite Pigmentation (F) | D. melanogaster | Intraspecific | trans (bab); trans (bab); trans (bab) and cis (tan) | Kopp et al., 2003; Rogers et al., 2013; Bastide et al., 2013
Dark/Light Tergite Pigmentation (F) | D. yakuba, D. fuyamai | Interspecific | trans (bab) | Rogers et al., 2013
Loss of Tergite Pigmentation (M) | D. santomea | Interspecific | trans (?); cis (tan) | Jeong et al., 2006; Jeong et al., 2008
Loss of Tergite Pigmentation (M) | D. kikkawai | Interspecific | cis (yellow) | Jeong et al., 2006
Loss of Tergite Pigmentation (M) | D. ananassae | Interspecific | trans (?); cis (tan) | This study
Expansion of Tergite Pigmentation (M) | D. prostipennis | Interspecific | trans (?); cis (yellow) | Ordway et al., 2014
Expansion of Tergite Pigmentation (M) | D. malerkotliana | Interspecific | trans (?) | This study
Retraction of Tergite Pigmentation (M) | D. auraria | Interspecific | trans (?) | This study
Gain of Sexual Dimorphism | melanogaster group | Interspecific | trans (bab); cis (yellow) and (tan) | Williams et al., 2008; This study
Light Body Coloration | D. novamexicana | Interspecific | cis (tan and ebony) | Wittkopp et al., 2009
Gain of wing spot | oriental lineage | Interspecific | cis (yellow) | Gompel et al., 2005; Arnoult et al., 2013
Loss of wing spot | oriental lineage | Interspecific | cis (yellow) | Prud’homme et al., 2006
Diversification of wing spot | oriental lineage | Interspecific | trans (Dll) | Arnoult et al., 2013
Novel wing spots | D. guttifera | Interspecific | cis (yellow); trans (Wg) | Werner et al., 2010


In this study, and elsewhere, experiments indicate that pigmentation losses are associated with and perhaps result from both changes in the trans-regulatory tier and in the cis-regulatory regions of the yellow and tan genes (Table 2.1). Interestingly, some instances of trans-regulatory modifications that cause loss of gene expression appear to leave perfectly good CREs intact. Our data provide a second instance in which loss of expression occurred without the loss of the encoded CRE. The yBE was found to be conserved in D. santomea, which diverged from D. yakuba ~400,000 years ago (Jeong et al., 2006). The activity for this CRE has also remained for D. ananassae since its divergence from a pigmented ancestor. In contrast, D. kikkawai has lost pigmentation while still expressing tan in the abdomen through a perfectly active t_MSE. These results suggest that these CREs were maintained within the population for long periods of time, perhaps indicating additional functions that promote the preservation of these CREs’ ancestral potential (Abouheif, 2008; Abouheif et al., 2014; Rajakumar et al., 2012).

Furthermore, the observed heterogeneity of changes in cis and trans to yellow and tan was at first surprising. However, our study of the binding site architecture at the yBE and t_MSE provided key clues as to why their evolution may often be uncoupled.

Coordinate Expression Patterns through Discordant CREs

The coordinated expression of genes is a ubiquitous theme in developmental biology. Gene expression is finely regulated during development through the activities of CREs that are individually encoded as evolved combinations of transcription factor binding sites (regulatory logic). A compelling question is whether such synchronized expression results from the independent evolution of CREs with similar logics. This question was previously pursued for CREs of regulatory genes coordinately expressed in the developing fruit fly neurogenic ectoderm (Erives and Levine, 2004). In this case, the coordinately activated CREs are encoded by a common regulatory logic, or a so-called “cis-regulatory module equivalence class” (Crocker et al., 2008). However, the neurogenic ectoderm CREs are deeply conserved, and arose in the distant past (over 230 million years ago).

The recently evolved male-specific expression patterns for tan and yellow present a case in which the evolutionary formation of coordinated regulation can be observed over shorter time-scales. Though both the t_MSE and yBE0.6 drive reporter expression in the dorsal A5 and A6 segment epidermis of males during late pupal development, we found their regulatory logic to be surprisingly dissimilar. Whereas the yBE0.6 is directly activated by Abd-B, our results indicate that the t_MSE is indirectly activated by Abd-B and Abd-A, and is directly repressed in more anterior body segments by Abd-A and seemingly Hth. Thus, this study provides an example that illustrates how coordinated expression evolved through the evolution of very different binding site architectures and logic.

The disparity of regulatory logic governing the yBE0.6 and t_MSE sheds light on the evolutionary tendencies of gene regulatory networks. The incipient stages of the dimorphic pigmentation network’s origin involved the derivation of CREs that generate similar patterns through distinct combinations of binding sites. This evolutionary history establishes a “branched” network in which several of the possible trans-regulatory alterations are incapable of generating coordinated shifts in the patterning of co-expressed genes. Hence, an emerging theme from the work in this system is that the differences in regulatory logic of the yBE and t_MSE may necessitate changes in one CRE or the other, because their shared expression pattern cannot be altered through a common trans regulator that influences both CREs’ patterning. Future studies are needed to substantiate the occurrence and identity of the trans changes altering this network’s structure. As other recently derived morphological traits are resolved to the level of binding sites within their networks, it will be instructive to see whether similar branched networks and paths of cis and trans evolution permeate their origin and diversification. The net results may reveal general principles of gene regulatory network evolution.

Materials and Methods

Fly stocks and genetic crosses

Fly stocks were maintained at 25 °C on a sugar food medium that was previously described (Salomone et al., 2013). CRE sequences were obtained from the D. melanogaster (14021-0231.04), D. kikkawai (14028-0561.14), D. malerkotliana (14024-0391.00), D. ananassae (14024-0371.33), and D. willistoni (14030-0811.24) species stocks from the San Diego Drosophila Stock Center, and from D. biarmipes, D. auraria, and D. pseudoobscura species stocks that were obtained from Dr. Sean B. Carroll. Tests for genetic interactions with Abd-B, abd-A, exd, and hth were done using the yBE0.6-EGFP transgene that was inserted into the attP40 site on chromosome 2 (Markstein et al., 2008) and the yellow 5’-EGFP and t_MSE-EGFP transgenes that were inserted into the 51D attP site on chromosome 2 (Bischof et al., 2007). All other reporter transgenes used in this study were inserted into the attP2 site on chromosome 3 (Groth et al., 2004).

D. melanogaster stocks possessing the Abd-Biab9Tab (BDSC ID#8620) allele, and UAS-abd-A RNAi (BDSC ID#28739), UAS-exd RNAi (BDSC ID#34897), UAS-hth RNAi (BDSC ID#27655), UAS-luciferase control (BDSC ID#35788), and pnr-GAL4 (BDSC ID#3039) transgenes were obtained from the Bloomington Drosophila Stock Center. The effects of ectopic Abd-B on tan and yellow loci CRE activities were observed for flies of genotype CRE-EGFP/+; Abd-Biab9Tab/+. The effects of reduced abd-A/exd/hth expression on CRE activity were observed for flies of genotype CRE-EGFP/+; UAS-gene specific RNAi/pnr-GAL4. The pnr-GAL4 stock has a chromosome where the GAL4 gene is inserted in the pannier (pnr) locus, resulting in GAL4 expression in the dorsal-medial abdomen (Calleja et al., 2000).

in situ hybridizations

In situ hybridizations were carried out as previously described (Jeong et al., 2008). Briefly, digoxigenin-labeled riboprobes for yellow and tan were prepared through in vitro transcription of PCR templates amplified from each species (Table S1 for primers). Pupal abdomens were dissected at optimal time points for the visualization of yellow (70-80 h APF) and tan (85-95 h APF). Probe hybridization was visualized through an anti-digoxigenin antibody (Roche Diagnostics), detected by an alkaline phosphatase reaction using BCIP/NBT (Promega).

DNA sequence alignments

The contiguous genomic DNA sequence spanning from the first exon of Gr8a through the last exon (exon 8) of tan was obtained from the D. melanogaster genome version FB2013_05. Orthologous contigs were retrieved by BLAST searches of fruit fly genomes using the tan exon 8 as the query sequence (Clark et al., 2007; Richards et al., 2005). The following GenBank accessions were identified as possessing the genome sequences orthologous to tan: D. biarmipes (AFPP01032826), D. yakuba (CM000162), D. kikkawai (AFFH02000000), D. ananassae (AAPP01016557), D. pseudoobscura (AADE01002480), and D. willistoni (AAQB01008786). Sequences were aligned to the annotated tan locus of D. melanogaster using the mVISTA comparative genomics tool (Figure S2) (Frazer et al., 2004). Sequence visualizations for the tan and yellow loci of D. melanogaster (Figure S1) were made using the GenePalette tool (Rebeiz and Posakony, 2004).

EGFP reporter transgenes

EGFP reporter transgenes were used as a surrogate for the endogenous expression of the yellow and tan genes. Reporter transgenes were assembled by cloning CREs into the AscI and SbfI restriction enzyme sites of the S3aG vector (Rogers and Williams, 2011). Each reporter transgene includes a CRE sequence cloned 5’ of a minimal hsp70 promoter and the coding sequence of the EGFP-NLS reporter protein (Barolo et al., 2004). All reporter transgenes analyzed were integrated into an attP landing site using ϕC31 integrase methods (Best Gene Inc.) (Groth et al., 2004). The primer pairs used to clone D. melanogaster yellow and tan CREs are presented in Table S2. The primer pairs used to clone orthologous CREs are presented in Table S3. Scanning mutant sequences for the yBE0.6 (Figure S5), t_MSE (Figure S7), and t_MSE2 (Figure S9), the t_MSE2 with putative Hox sites mutated (Figure S11), and the D. ananassae t_MSE were synthesized by GenScript USA Inc. These synthesized sequences were flanked by AscI and SbfI restriction enzyme sites for cloning CREs into the S3aG vector.

Quantitative comparisons of the levels of EGFP reporter gene expression driven by the t_MSE2, t_MSE2 5i1, t_MSE2 SM6, t_MSE2 TTAT sites knockout (KO), t_MSE2 TTAT+TAAT sites KO, and t_MSE2 Hth site KO CRE sequences were performed similarly to that previously described for another CRE (Rogers and Williams, 2011; Rogers et al., 2013). For each transgene, EGFP expression was imaged from five independent replicate specimens using a confocal microscope with settings for which few pixels were saturated when EGFP expression was driven by the wild type t_MSE2. For each confocal image, a separate pixel value statistic was determined for the dorsal epidermis of the A3, A4, and A5 segments using the ImageJ program (Abràmoff et al., 2004). For each reporter transgene, the regulatory activity was calculated as the mean pixel value and standard error of the mean (SEM). Activities reported in Figure 6 were normalized to the activity for the wild type t_MSE2.
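The normalization described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used in the study: the function names and the pixel values are hypothetical, and it assumes that the SEM is the sample standard deviation divided by the square root of the number of specimens and that the SEM is scaled by the same wild type mean as the activity.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, SEM) for a list of per-specimen pixel values."""
    mean = statistics.mean(values)
    # Assumption: SEM = sample standard deviation / sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def normalize_to_wild_type(mutant_values, wild_type_values):
    """Express the mutant mean and SEM as a percentage of the wild type mean."""
    wt_mean, _ = mean_and_sem(wild_type_values)
    mean, sem = mean_and_sem(mutant_values)
    return 100.0 * mean / wt_mean, 100.0 * sem / wt_mean


# Hypothetical mean pixel values for one segment, five specimens each.
wild_type = [100.0, 110.0, 90.0, 105.0, 95.0]
mutant = [50.0, 55.0, 45.0, 52.0, 48.0]

pct_mean, pct_sem = normalize_to_wild_type(mutant, wild_type)
print(f"{pct_mean:.0f}±{pct_sem:.0f}% of wild type activity")
```

The same function would be applied separately to the A3, A4, and A5 measurements, since a separate pixel value statistic was collected for each segment.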


Imaging of fly abdomens

Images of fruit fly abdomen pigmentation patterns were taken using an Olympus SZX16 Zoom Stereoscope and Olympus DP72 digital camera. Specimens were prepared from 5-10 day old adults. Projection images for EGFP-NLS reporter transgene expression were generated with an Olympus Fluoview FV 1000 confocal microscope and software. The regulatory activities of the t_MSE and t_MSE2 sequences were evaluated at ~90 hours after puparium formation (hAPF), a time point during pupal development when dimorphic expression of tan is first observed in the abdomen (Jeong et al., 2008). The regulatory activities for the yellow gene CREs were evaluated at ~85 hAPF, a time point when endogenous sexually dimorphic yellow expression is observed (Jeong et al., 2006). In each figure comparing CRE activities, a representative image was selected from replicate specimens (n≥6), and all images were processed through the same modifications using Photoshop CS3 (Adobe). In situ hybridization images were taken on the same day using the same microscope and camera, and representative images were selected for Figure 1 and processed through the same modifications.

Gel shift assays

Reverse complementary oligonucleotides were synthesized (Integrated DNA Technologies) that contain t_MSE2 sequence with wild type or mutant Hox sites (Table S4). Each oligonucleotide was biotin-labeled on its 3’ end using the DNA 3’ End Biotinylation Kit (Thermo Scientific), and complementary oligonucleotides were annealed by standard protocol. Labeling efficiency for each binding site was determined using a quantitative Dot Blot assay (DNA 3’ End Biotinylation Kit, Thermo Scientific). Each probe was separately tested for binding with GST-Abd-B DNA Binding Domain (DBD) (Jeong et al., 2006; Williams et al., 2008) and GST-Abd-A DBD fusion proteins. The coding sequence for amino acids 136-209 of D. melanogaster abd-A was cloned 3’ to that for GST in the EcoRI and NotI sites of the pGEX4T1 vector (Amersham). This abd-A sequence was amplified using the primers: ACCGgaattcTGTCCACGAAGGCGCGGTCGC and AGCCgcggccgcTCATTAGCGTCGCGCCTGTTCATTTATTTCC. All gel shift reactions included 20 fmol of one labeled binding site and GST-fusion protein in General Footprint Buffer (50 mM HEPES pH 7.9, 100 mM KCl, 1 mM DTT, 12.5 mM MgCl2, 0.05 mM EDTA, 17% glycerol) with 400 ng/μl of poly (dI-dC) (Thermo Scientific). For each binding site, reactions were done that included amounts of GST-fusion protein ranging from 3,000 ng down to 111 ng. For each binding site, a control reaction was done that lacked GST-fusion protein. Binding reactions were carried out for 30 minutes on ice and then separated by a 5% non-denaturing polyacrylamide gel for 2 hours at 200 V. Reactions were then transferred and cross linked to a Hybond-N+ membrane (GE Healthcare Amersham) for chemiluminescent detection using the Chemiluminescent Nucleic Acid Detection Module and manufacturer’s protocol (Thermo Scientific). Chemiluminescent images were taken using a BioChemi gel documentation system (UVP).
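The protein amounts used across the binding reactions (3,000 ng down to 111 ng, plus a control without protein) are consistent with a threefold serial dilution, as listed in the legend of Figure S 2.8. A minimal Python sketch, assuming that dilution scheme; the function name is hypothetical, and the threefold factor is inferred from the listed amounts rather than stated in the methods:

```python
def dilution_series(top_ng, fold, steps):
    """Return protein amounts (ng), highest first, rounded to whole ng."""
    return [round(top_ng / fold ** i) for i in range(steps)]


# Assumed: four protein-containing reactions per probe, threefold apart.
amounts = dilution_series(3000, 3, 4)

# Add the control reaction that lacks GST-fusion protein, then order the
# lanes from lowest to highest amount, as listed for Figure S 2.8.
lanes = sorted(amounts + [0])
print(lanes)
```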


Acknowledgments

Samples of non-D. melanogaster species stocks were kindly provided by the San Diego Drosophila Stock Center and S.B. Carroll. Drosophila melanogaster RNAi lines and GAL4 drivers were provided by the TRiP at Harvard Medical School (NIH/NIGMS R01-GM084947). Stocks obtained from the Bloomington Drosophila Stock Center (NIH P40OD018537) were used in this study. We thank B. Gebelein for kindly sharing technical advice, providing a cDNA for abd-A, and sharing oligonucleotide sequences for making the Dll DMX-R positive control binding site for Abd-A and a mutant version.


Supplementary Information

Figure S 2.1 Mapping the CRE sequences sufficient to drive male-specific tan and yellow expression. (A) To scale representation of the tan locus. The t_MSE is composed of the sequence between Gr8a and CG1537, and t_MSE1-3 are three truncated forms of this larger sequence. (B-F) EGFP-reporter transgene activity driven by tan locus sequences in transgenic D. melanogaster male pupae at ~95 hours after puparium formation (hAPF). (B) At ~90 hAPF, the D. melanogaster t_MSE sequence drives EGFP reporter transgene expression throughout the A5 and A6 segments of males. In order to determine whether the t_MSE could be reduced to a smaller sequence, we created three truncated versions. Of the three truncations, the centrally positioned (D) t_MSE2 drove a pattern of expression comparable to the larger t_MSE. (C) For the t_MSE1 and (E) t_MSE3 truncations, EGFP expression was lacking at ~90 hAPF. (F) Interestingly, at ~100 hAPF the t_MSE3 fragment drove reporter expression throughout the abdomen. Overall, these results demonstrate that the key regulatory inputs for the t_MSE are localized to a 351 base pair sequence referred to as t_MSE2. (G) To scale representation of the yellow locus. The yBE is composed of sequence 5’ of yellow exon 1, which contains two binding sites for Abd-B. yBE 1.1, yBE 0.9, and yBE 0.6 are three nested versions of this larger sequence. (H-K) EGFP-reporter transgene activity driven by yellow locus sequences in transgenic male pupae at 85 hAPF.


Figure S 2.2 Conserved synteny between tan and the upstream genes between which the t_MSE is located in D. melanogaster. Contiguous sequences containing the orthologous tan locus were identified from Sophophora species with sequenced genomes. A histogram plot shows sequence conservation greater than 50% between the tan exon 8 region and the first exon of Gr8a. Conserved putatively non-exon sequences are shaded in salmon color, whereas conserved exon non-coding and coding sequences are shaded in teal and lavender color, respectively. The location of the scanning mutant 5 region of the t_MSE is annotated between CG1537 and Gr8a. Though the t_MSE sequence is not deeply conserved in this comparison, synteny between tan, CG1537, and Gr8a has been conserved since the most recent common ancestor of D. melanogaster and D. willistoni.


Figure S 2.3 The regulatory activity of the D. pseudoobscura sequence 5’ of the yellow gene. The abdomens of (A) male and (B) female D. pseudoobscura have a dark brown coloration. EGFP-reporter transgene activity driven by the D. pseudoobscura yellow 5’ sequence in transgenic (C) male and (D) female pupae at ~85 hours after puparium formation. The level of EGFP expression is notably higher in male abdomens compared to those for females.


Figure S 2.4 Mapping the CRE architecture of the 5’ region of the D. willistoni yellow gene. (A) Two ~3 kb partially overlapping sequences 5’ of the D. willistoni yellow first exon were incorporated into EGFP-reporter transgenes. These sequences collectively cover the ~5.5 kb of genomic sequence immediately 5’ of yellow exon 1. Pattern of EGFP expression in the (B and C) abdomens and (B’ and C’) wings of transgenic D. melanogaster pupae at ~85 hours after puparium formation. The y wil 5’1 sequence has a regulatory activity that drives a stripe pattern on the posterior regions of each abdomen segment, a pattern which is coincident with the pigmentation pattern on D. willistoni tergites. (B’) This sequence also possesses strong CRE activity in the wing. (C and C’) The y wil 5’2 sequence lacked any noteworthy regulatory activities in the abdomen and wing.


Figure S 2.5 Mapping functional yellow regulatory sequences through CRE scanning mutagenesis. (A) Name and location of yBE0.6 scanning mutations and the wild type pattern of EGFP expression in transgenic D. melanogaster pupae. Scan mutations are indicated as red blocks and vertical blue lines indicate the position of two Abd-B binding sites that were not mutated in this analysis. (B-K) The EGFP expression pattern in the male abdomen at ~85 hours after puparium formation driven by yBE0.6 scan mutant sequences.


Figure S 2.6 Mapping functional tan regulatory sequences through CRE scanning mutagenesis. (A) Name and location of t_MSE scanning mutations and the wild type pattern of EGFP expression in transgenic D. melanogaster pupae. Scan mutations are indicated as red blocks and the region between the dashed vertical blue lines indicates the sequence considered to be the t_MSE2. (B-K) EGFP expression pattern at ~95 hours after puparium formation driven by t_MSE scanning mutant sequences in male abdomens.


Figure S 2.7 Fine-scale mapping of functional tan regulatory sequences through CRE scanning mutagenesis. (A) Name and location of t_MSE2 scanning mutations and the wild type pattern of EGFP expression in transgenic D. melanogaster pupae. Numbered blue blocks indicate the position of putative Hox-sites in the scanning mutant 6 region. The region between the vertical dashed red lines comprises the SM5 and SM6 regions that were identified as being necessary for t_MSE activity. The locations of scanning mutations in SM 5i1-SM 6i6 are indicated as red regions. The red boxes indicate the putative Hox-sites that were mutated in the t_MSE2 TTAT KO and TTAT+TAAT KO sequences. (B-O) EGFP expression pattern at ~95 hours after puparium formation driven by t_MSE2 mutant sequences in male abdomens. Compared to the regulatory activity of the wild type t_MSE2, the activity of the t_MSE2 SM5i1 sequence is 114±3% in the A5 segment.


Figure S 2.8 In vitro interactions between the DNA binding domains of Hox proteins and a known Hox site. (A-C) Gel shift assays between annealed oligonucleotide probes for a Dll CRE sequence and the GST-Abd-A DNA binding domain fusion protein and the GST-Abd-B DNA binding domain fusion protein. A mutant version of the probe was tested that possessed a mutation in the known Abd-A binding site. Binding reactions used increasing amounts of the GST-DNA binding domain fusion protein (from left to right: 0 ng, 111 ng, 333 ng, 1000 ng, and 3000 ng).


Table S 2.1 Primers used to create in situ hybridization probes

Species gene Primer F Primer R

D. melanogaster tan GTYAAGGAGGAGCACTTYATGTCCCT taatacgactcactataggGCACTGATSGTRTTGATGCTGAAGACC

D. melanogaster yellow GGATTCCGGCCACTCTGACCTATA TCCGCTCAAGAAAATTGCGTAAAC

D. auraria tan CTGYTGGCCACCAGCAAYGTGGAYG CATGTGSCGSACATCYTGYTCGCTC

D. auraria yellow GGGGAYTGCGCSAACAGYATYACCAC TGGGRAABAGRTGGGGVCCRCTBG

D. kikkawai tan TCTGATGATCGACTCTAGCGTC taatacgactcactataggATTGTCCGAGTACAGGGACATG

D. kikkawai yellow TTACTCCTGGGAGCTGAACAAG taatacgactcactataggCCACAAAGTCATGGTAGCTGTC

D. malerkotliana tan YTCSAGCATCTCCAGGATGCARA taatacgactcactataggGCTCMGTCTCGCTGGGATTGTC

D. malerkotliana yellow CCTACATCAACATGGACCACAG taatacgactcactatagggagaGAARGTGTTCGGATTGGTGTCC

D. ananassae tan CAGGCCAATGAGCTGATGATTG taatacgactcactataggTGTCCGAGTACAAGGACATCGT

D. ananassae yellow CGATCTGAGGAACAATGCCTAC taatacgactcactataggATATATACGCCTTGGGCACCTC

D. pseudoobscura tan TTCGATCGCATTCACCAGGATC taatacgactcactataggTGTCCGAATAGAGCGACATGGT

D. pseudoobscura yellow AGGACAGCTACCACGACTTTGT taatacgactcactataggGTTCTCATCGATCTTCACGTCG

D. willistoni tan GGGTGATAAGCAAGAGTTGTTC taatacgactcactatagggagaTTACCATATCGAAGCCGACCTC

D. willistoni yellow CCGGAATTGATACCATATCCGG taatacgactcactatagggagaATCGGGGAAGAAGTACGAATGG

Notes: 1) The lowercase letters in the table represent the T7 promoter that was added to the reverse primer for in vitro transcription. 2) Probes that lack these sequences were cloned into a pGEM vector, and the insert was amplified from the vector, upon which the probe synthesis reaction was performed to generate an antisense probe.
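Because the T7 promoter tail is written in lowercase, the two probe preparation routes described in the notes to this table can be told apart directly from the primer strings. The following Python sketch illustrates this under that assumption; the function name is hypothetical, and the two primers are copied from the table.

```python
def split_lowercase_tail(primer):
    """Split a primer into its leading lowercase tail and the remainder."""
    i = 0
    while i < len(primer) and primer[i].islower():
        i += 1
    return primer[:i], primer[i:]


# Two reverse primers copied from the table above.
kik_tan_rev = "taatacgactcactataggATTGTCCGAGTACAGGGACATG"
mel_yellow_rev = "TCCGCTCAAGAAAATTGCGTAAAC"

for name, primer in [("D. kikkawai tan", kik_tan_rev),
                     ("D. melanogaster yellow", mel_yellow_rev)]:
    tail, gene_specific = split_lowercase_tail(primer)
    # Per the table notes: a T7 tail allows direct in vitro transcription,
    # whereas primers without it were used for cloning into a pGEM vector.
    route = "T7 tail present" if tail else "no T7 tail (pGEM route)"
    print(f"{name}: {gene_specific} [{route}]")
```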


Table S 2.2 Primers to clone D. melanogaster yellow and tan CREs

Transgene ~Size Primer Sequence

y 5’ 1 (wing/body) 2600 bp y -2869 Fwd ggcgcgccCGACTATTAAATGATTATCGCCCG

y -269 Rvs cctgcaggGTTTGGTATGATTTTTGGCCTTCATC

yBE 1.1 1100 bp BE2 Fwd ggcgcgccGTAAATACACCATTTCATTACACAAC

BE5 Rvs cctgcaggTAATACATGACAGTTGTGTTCTGAG

yBE 0.9 900 bp BE2 Fwd ggcgcgccGTAAATACACCATTTCATTACACAAC

BE4 Rvs cctgcaggTACTATTAAATTGGAACTCGTGCTC

yBE 0.6 600 bp BE2.5 Fwd ggcgcgccCTGTGGGTGCAATGATTTAGAATG

BE3.5 Rvs cctgcaggGTTATTGGCAGGTGATTTTGAGC

t_MSE mel 868 bp tan MSE deep F ggcgcgccCCATGGAAGCCGAGCACCTGGTAGA

tan MSE deep R cctgcaggCTACAACGTRGGTCATGTNCAGGG

t_MSE 1 374 bp tan MSE F ggcgcgccGCAGGACCCGACCCAGATGGCCGCTCAT

tan_MSE-right-R cctgcaggAATGGTGCAAGAGTAAAATGCACTCA

t_MSE 2 350 bp tan_MSE-mid-F ggcgcgccTGAAATAATAATAAATAATCAGAAT

tan_MSE-mid-R cctgcaggTGTTTCAACTCAATCCTAGCAGTTGG

t_MSE 3 373 bp tan_MSE-left-F ggcgcgccTTGAGAATTCAAGATCATAATATGCA

tan MSE R cctgcaggCCAGTACAGTGGTGGGCCCTATCTGTAG

Notes: 1) 'ggcgcgcc' and 'cctgcagg' are sequences recognized respectively by the AscI and SbfI restriction endonucleases. These restriction enzyme sites were used to clone PCR amplified sequences into the S3aG EGFP reporter vector. 2) Degenerate positions included in primer sequences utilize the IUPAC nucleic acid code: R (A or G), N (A, C, G, or T), S (C or G), M (C or A), Y (C or T), K (T or G), and W (A or T).
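The degenerate positions described in these notes mean that a single primer name stands for a small pool of sequences. As a hedged illustration (the function name is hypothetical; the lookup table follows the standard IUPAC nucleotide code, including symbols not listed in the notes), the following Python sketch enumerates the pool for the tan MSE deep R primer, which carries one R and one N position.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """Return every non-degenerate sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# tan MSE deep R, including its lowercase SbfI tail (cctgcagg).
tan_mse_deep_r = "cctgcaggCTACAACGTRGGTCATGTNCAGGG"
variants = expand_degenerate(tan_mse_deep_r)

# R has 2 possibilities and N has 4, so the pool holds 2 x 4 sequences.
print(len(variants))
for seq in variants:
    print(seq)
```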


Table S 2.3 Primers used to create reporter transgenes with orthologous yellow 5’ and t_MSE sequences

Species Transgene ~Size Primer Sequence

D. willistoni y wil Left 3274 bp ywilLrgF ggcgcgccGGAAGGGGCCATCAAGGGTGAATAG

ywilLftR cctgcaggTTGGCCACATCACATCTTCGTCTCC

D. willistoni y wil Middle 3293 bp ywilMidF ggcgcgccGGGTTTCATTTCCTTCACGCCATTT

ywilMidR cctgcaggGACCCTGTTACAATTTCGGTTCTC

D. willistoni y wil Right 3024 bp ywilRtF2.0 ggcgcgccCCCGGCTGAGTGCATAAATTAGCC

ywilLrgR cctgcaggGTAGTATCCTCTTCTGTGAACCGTG

D. pseudoobscura y wb pse 2468 bp ywingF (pse) ggcgcgccCGATTATTAATCGATTACCAGTCGA

ybodyR(msg.pse) cctgcaggGTCTTCCATGATTGATTTTCACGCAT

D. auraria y wb aur 4273 bp y 5' new F2 ggcgcgccAGGATTAYCTNAATGTGGGAGACTATG

y 5' new R4 cctgcaggATCCYCTTCTGTGGACCGTGGC

D. malerkotliana y wb mal 3577 bp y wing ana group Fwd ggcgcgccGAGCGGAACTGGAGCTGTCAAGCGGT

y 5' new R4 cctgcaggATCCYCTTCTGTGGACCGTGGC

D. kikkawai y wb kik 4332 bp y 5' new F2 ggcgcgccAGGATTAYCTNAATGTGGGAGACTATG

y 5' new R4 cctgcaggATCCYCTTCTGTGGACCGTGGC

D. ananassae y wb ana 3493 bp y wing ana group Fwd ggcgcgccGAGCGGAACTGGAGCTGTCAAGCGGT

ybodyR(msg.pse) cctgcaggGTCTTCCATGATTGATTTTCACGCAT

D. melanogaster t_MSE mel 868 bp tan MSE deep F ggcgcgccCCATGGAAGCCGAGCACCTGGTAGA

tan MSE deep R cctgcaggCTACAACGTRGGTCATGTNCAGGG

D. pseudoobscura t_MSE pse 885 bp pse tan MSE F1 ggcgcgccACGCAGATGAAAGTGCAGGACG

pse tan MSE R1 cctgcaggTCAGTACAGTGGGCCCTATCTG

D. willistoni t_MSE wil 668 bp wil tan MSE F2 ggcgcgccCATGAAAGCCAAGCAACTGATAG

wil tan MSE R1 cctgcaggTGGCGGTTACCAATACAATGGAC

D. auraria t_MSE aur 1207 bp tMSE mon ori F2 ggcgcgccAGMCGCAGRTGRAACTGCAGGACC

tMSE mon ori R2 cctgcaggTGTGGGCCATGTCCAGGGCTACGG

D. kikkawai t_MSE kik 1084 bp D. kik t_MSE Fwd 2 ggcgcgccGCCTATGGGGAGGAGGATCCGGC

D. kik t_MSE Rvs 2 cctgcaggGCAGGAGAAACAGGCCCCAAGCCCGC

D. malerkotliana t_MSE mal 1009 bp D. mal t_MSE F1 GCTGCTGACGGAGTAGCTCC

D. mal t_MSE R1 GACCTACAGGGTGATCGAGTC

Notes: 1) 'ggcgcgcc' and 'cctgcagg' are sequences recognized respectively by the AscI and SbfI restriction endonucleases. These restriction enzyme sites were used to clone PCR amplified sequences into the S3aG reporter vector. 2) Degenerate positions included in primer sequences utilize the IUPAC nucleic acid code: K (T or G), R (A or G), Y (C or T), N (A, C, G, or T), and M (C or A).

Table S 2.4 Oligonucleotides used to make t_MSE gel shift assay binding sites.

Binding Site Sequence (5' to 3') Name

Motif 3 GAGAATTCAAGATCATAATATGTATACTAA Probe 4 Top

TTAGTATACATATTATGATCTTGAATTCTC Probe 4 Bottom

Motif 3 Mutant GAGAATTCAAGATCGCCGTATGTATACTAA Probe 4 Top B.3 KO

TTAGTATACATACGGCGATCTTGAATTCTC Probe 4 Bottom B.3 KO

Motif 4 ATGTATACTAATTAGACAGTCTCTTTTTTT Probe 5 Top

AAAAAAAGAGACTGTCTAATTAGTATACAT Probe 5 Bottom

Motif 4 Mutant ATGTATACGCCGCGGACAGTCTCTTTTTTT Probe 5 Top A.8 KO

AAAAAAAGAGACTGTCCGCGGCGTATACAT Probe 5 Bottom A.8 KO

Motif 5 TTTTTATTACTTCAACTATTCAAATTT Probe 6 Top

AAATTTGAATAGTTGAAGTAATAAAAA Probe 6 Bottom

Motif 5 Mutant TTTTTGCCGCTTCAACTATTCAAATTT Probe 6 Top B.4 v2 KO

AAATTTGAATAGTTGAAGCGGCAAAAA Probe 6 Bottom B.4 V2KO

Dll-con AACTGTCCGCGGGAATGATTTATGGTCCCAAAT DllR-Con (Top)

ATTTGGGACCATAAATCATTCCCGCGGACAGTT DllR-Con (Bottom)

Dll-con-mut AACTGTCCGCGGGACGGCGTTCGGGTCCCAAAT DllR-Con-mut (Top)

ATTTGGGACCCGAACGCCGTCCCGCGGACAGTT DllR-Con-mut (Bottom)
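Each binding site in this table is listed as a top and bottom strand that must anneal, so the bottom strand should be the exact reverse complement of the top strand. The Python sketch below checks this for three probe pairs and reports which positions differ between the wild type and mutant Motif 3 probes. The function names are hypothetical, and the pairing of sequences with probe names reflects a reading of the table rather than an independent record.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mutated_positions(wild_type, mutant):
    """Return 0-based positions at which two equal-length sequences differ."""
    return [i for i, (a, b) in enumerate(zip(wild_type, mutant)) if a != b]


# (top strand, bottom strand) pairs copied from the table above.
probes = {
    "Motif 3": ("GAGAATTCAAGATCATAATATGTATACTAA",
                "TTAGTATACATATTATGATCTTGAATTCTC"),
    "Motif 3 Mutant": ("GAGAATTCAAGATCGCCGTATGTATACTAA",
                       "TTAGTATACATACGGCGATCTTGAATTCTC"),
    "Motif 4": ("ATGTATACTAATTAGACAGTCTCTTTTTTT",
                "AAAAAAAGAGACTGTCTAATTAGTATACAT"),
}

# True for a pair means the two strands can anneal end to end.
pair_ok = {name: reverse_complement(top) == bottom
           for name, (top, bottom) in probes.items()}
print(pair_ok)

# Positions changed by the Motif 3 knockout (top strand coordinates).
changed = mutated_positions(probes["Motif 3"][0], probes["Motif 3 Mutant"][0])
print(changed)
```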


CHAPTER III

RED LIGHT, GREEN LIGHT: A NOVEL APPROACH TO STUDYING

INTERACTIONS BETWEEN CIS-REGULATORY ELEMENTS AND GENES

Abstract

cis-regulatory elements (CREs) control proper spatial and temporal expression patterns of genes through their interactions with specific gene promoter(s). Many CREs are situated at a distance from the target promoter; thus, long distance CRE-promoter interactions must occur and be relatively common. Case studies have shown that such interactions can involve sequences within CREs (remote control elements) and near promoters (tethering elements), though few of these sequences have been localized. As a result, the mechanisms by which they interact remain poorly understood. One impediment to understanding CRE-promoter interactions is the method commonly used to study CREs, namely reporter transgenes. Transgenes place CREs directly adjacent to a heterologous promoter, thus eliminating the need for remote control and tethering elements. Here we report a novel dual reporter transgene system that allows for the study of CREs situated proximal and distal to fluorescent reporter genes. We found that moving a fluorescent reporter distal led to the loss of one CRE’s ability to activate a heterologous promoter. This lost activity could not be recovered when a different synthetic promoter was utilized. We also tested three additional CREs, of which one failed to activate a distal promoter, whereas the other two could. These results suggest that CREs differ in their requirements for promoter interaction and reveal how this system can be used to map tethering and remote control elements.

Introduction

The precise spatial and temporal patterning of gene expression is a fundamental feature of embryonic development (Lagha et al., 2013; Small et al., 1992). These patterns of expression are directed by cis-regulatory elements (CREs) that function as either transcriptional enhancer or silencer elements directing the initiation of transcription at the promoter of a regulated gene or genes (Bonn and Furlong, 2008; Davidson, 2010). The genes of metazoans are often expressed in many different cell types and at various developmental time points, and these different spatial and temporal domains of expression are often regulated by distinct CREs. Thus, the cis-regulatory regions of genes can be vast, where some CREs are located near (proximal to) the promoter region of a gene and others at a distance (distal) (Kvon et al., 2014). For example, the CRE that drives Sonic hedgehog (Shh) expression in the zone of polarizing activity of the mouse embryonic limb bud resides over 1 Megabase (Mb) away from the Shh promoter within the lmbr gene locus (Lettice et al., 2003). Therefore, the mechanism of initiating limb bud expression involves this CRE identifying and interacting with the Shh promoter through the formation of a chromosomal loop (Amano et al., 2009). The formation of a chromosomal loop also occurs for the regulation of gene expression by interactions between the β-globin Locus Control Region (LCR) and all the β-globin genes, such as β-major and β-minor (Tolhuis et al., 2002). This loop involves the transcription factor GATA1, a , binding the LCR then recruiting the Ldb1 protein to aid in formation of the chromosomal loop (Deng et al., 2012). Although the β-globin locus has provided a detailed example of long distance regulation, it remains challenging to find the DNA sequences involved in other interactions. High-throughput studies characterizing looping conformations within (reviewed in Dekker et al., 2013) suggest that long distance gene regulation is common. This form of regulation is not only relevant to development, but the consequences of mutations in these interacting sequences can have effects on health.

For example, the human FTO locus harbors a nucleotide variant that prevents an enhancer from activating the genes Irx3 and Irx5, which are located at distances of ~0.5 Mb and 1 Mb respectively. The loss of expression of these genes results in increased white adipocytes, which is associated with increased obesity (Claussnitzer et al., 2015). Long distance regulation also has evolutionary implications, as several evolved patterns of gene expression have been traced to CREs that are located at a distance from their target promoters. These include the dimorphic element and t_MSE, which control sex-specific patterns of gene expression involved in fruit fly pigmentation (Camino et al., 2015; Jeong et al., 2006; Williams et al., 2008), and HARE5, which controls an evolved pattern of human neocortical expression (Boyd et al., 2015).

With the broad importance of long distance gene regulation to development, health, and evolution, it is important to understand the mechanisms involved in establishing interactions between CREs and promoter regions. Seminal studies identified two types of sequences that facilitate these interactions. One is a tethering element (TE), which can reside proximally to a transcription start site and that is required for interaction with a distal CRE (Figure 3.1A) (Calhoun and Levine, 2003; Calhoun et al., 2002). A second type of sequence is referred to as remote control elements (RCEs), sequences embedded within a CRE and which are necessary for the CRE to interact with a distal promoter (Figure 3.1B) (Swanson et al., 2010).

Figure 3.1 Gene regulation via long distance CRE-promoter interactions. (A) A short repeat motif sequence known as a “Tethering Element” located in a promoter-proximal region can facilitate interaction with a distal CRE. The T1 CRE bypasses the proximal ftz gene promoter to interact with the Scr gene promoter that is >15 kilobase pairs away (Calhoun et al., 2002). (B) A “remote control element” sequence within the Sparkling CRE is required to activate the cone cell pattern of expression seen for the dPax2 gene. Sparkling resides in the 4th intron of D. melanogaster dPax2 gene (Swanson et al., 2010). With many CREs located at a distance from their target promoters it is possible that remote control elements (RCE) and tethering elements (TE) are a common feature of gene regulation to bring (C) distantly-located regulatory sequences into (D) close proximity to a target promoter (black arrow) to activate gene expression.

Whether tethering elements and CRE-embedded remote control elements are common remains unknown as often these elements are not sought out or identifiable by conventional methods. Specifically, the typical use of reporter transgenes places a CRE sequence immediately 5’ of a minimal heterologous promoter. This architecture eliminates any requirement for a looping interaction and mutations in remote control elements would likely have no impact on the reporter gene’s expression. Thus, we sought to develop a reporter transgene vector system that would allow us to simultaneously study the capability of a CRE to regulate a proximal and distal reporter gene. This allows for tests of CRE and promoter sufficiency to communicate over a distance and presents a platform where mutations can be introduced to find sequences necessary for such communication. Here we report the optimization of such a system for use in transgenic Drosophila (D.) melanogaster and test this system with four different CREs and two heterologous promoters.

Results

A dual reporter transgene system to study long distance gene regulation

To facilitate the identification of tethering elements and remote control elements, we conceptualized and constructed a dual fluorescent reporter system called Red Light Green Light that can test the regulatory capability of CREs when they are simultaneously proximal to one fluorescent reporter gene and distal to a second (Figure 3.2A).

In the Red Light Green Light system a CRE can be cloned into the AscI and SbfI restriction enzyme sites that are situated between the coding sequences for two fluorescent reporter genes. In the default state, both the proximal and distal reporter genes possess a D. melanogaster Hsp70 minimal promoter. The coding sequence of the proximal reporter is the enhanced green fluorescent protein gene in-frame with the coding sequence for the nuclear localization sequence (NLS) of the tra gene on the 3’ end (Hedley et al., 1995), which we refer to as EGFP-NLS. For the distal reporter gene we initially used the DsRed.T4-NLS coding sequence that similarly includes a 3’ sequence for the tra NLS (Barolo et al., 2004). In this vector (called pRLGL0) the CRE is located at an equal distance to the promoter of each reporter gene. We then created modified versions that possessed an added spacer sequence (with no apparent regulatory function) of 1, 2, 4, and 8 kilobase pairs (kb) between the DsRed.T4-NLS gene and the CRE. The vectors are called pRLGL1, pRLGL2, pRLGL4, and pRLGL8. These vectors with the added spacer sequence allow for the simultaneous testing of a CRE’s regulatory capability on proximally and distally located promoters. Moreover, the distal Hsp70 promoter is flanked by unique StuI and BamHI restriction sites, so this promoter can either be replaced by another promoter or supplemented with additional promoter proximal sequences. It should also be noted that the distal 3’ UTR can be removed by the flanking EcoRI and FseI restriction endonuclease sites in the event that another 3’ UTR or mutated version needs to be tested.

Figure 3.2 Design of the Red Light Green Light dual reporter transgene system. (A) A cis-regulatory element (CRE) can be situated between green and red fluorescent reporter genes each with a minimal promoter. The red fluorescent reporter can be shifted distal to the CRE by the inclusion of spacer sequences of 1, 2, 4, and 8 kilobase pairs (kb). The CRE, promoter sequences, reporters, and 3’ UTRs are all flanked by unique restriction endonuclease sites allowing any of these sequences to be replaced with another sequence; grey dashed lines represent the locations of unique restriction enzyme sites. (B) Representation of a scanning mutagenesis approach that can be used to identify remote control element-like sequences in a CRE. The red blocks represent sequences that have been modified by introduced mutations. (C) An example of potential outcomes for reporter gene expression in the dual-reporter system for a CRE that is active in the D. melanogaster posterior abdomen. Type 1 – Equivalent fluorescent protein expression from both the proximal and distal reporters, indicating that the CRE can regulate the reporter gene at a distance. Type 2 – Expression is seen solely for the proximal reporter gene, indicating that the CRE or its mutant form is not capable of activating the distal reporter gene. Type 3 – No expression is observed for either reporter gene, which would be anticipated when a mutation destroyed a sequence necessary for the CRE’s spatial or temporal domains of regulatory activity.

Our planned use for these dual reporter transgene vectors was to first see whether a particular CRE can mediate its characteristic regulatory activity upon both proximal and distal reporter genes, what we refer to as a Type 1 outcome (Figure 3.2C). It is also conceivable that a CRE would only be capable of activating expression of a proximal reporter gene, what we call a Type 2 outcome (Figure 3.2C). If a Type 2 outcome occurs, then modified vectors could be made that include additional sequences to find those that rescue long distance regulation of gene expression. This could include replacing the heterologous Hsp70 promoter with the endogenous promoter of the gene that a CRE regulates, adding endogenous promoter proximal sequence next to the distal promoter, or testing an expanded CRE region.
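The outcome types described above can be expressed as a simple decision rule. The following is a minimal sketch only: it assumes each reporter has been reduced to a single normalized intensity value, and the function name, the 0.2 threshold, and the on/off simplification are hypothetical choices made for illustration rather than part of the study. Intermediate results, such as a weakly expressed distal reporter, are not captured by this binary scheme.

```python
from enum import Enum


class Outcome(Enum):
    TYPE_1 = "proximal and distal reporters both expressed"
    TYPE_2 = "proximal reporter expressed, distal reporter not expressed"
    TYPE_3 = "neither reporter expressed"
    UNCLASSIFIED = "distal reporter only; not a case described in the text"


def classify_outcome(proximal, distal, threshold=0.2):
    """Classify a dual reporter result from two normalized intensities.

    `threshold` is a hypothetical cutoff separating "expressed" from
    "not expressed"; it is an illustrative assumption, not a measured value.
    """
    proximal_on = proximal >= threshold
    distal_on = distal >= threshold
    if proximal_on and distal_on:
        return Outcome.TYPE_1
    if proximal_on:
        return Outcome.TYPE_2
    if not distal_on:
        return Outcome.TYPE_3
    return Outcome.UNCLASSIFIED
```

In practice the cutoff would need to be set from negative control specimens imaged under the same settings.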

Once a CRE-promoter pair has been identified for which the regulatory activity is imparted on both proximal and distal reporters, it then would become feasible to use CRE mutations to map the sequences required for the remote control element activity. For instance, a series of scanning mutant CREs could be made where each mutant includes a unique block of base pairs that were altered by non-complementary transversions (Figure 3.2B). These mutant versions could then be evaluated for the capability to regulate the expression of the proximal and distal reporter genes. Mutations altering non-functional sequences should lead to a Type 1 outcome. Mutations altering sequences that encode aspects of the CRE’s spatial and temporal regulatory activities should result in a Type 3 outcome where both proximal and distal reporter expression is altered. The ideal outcome would be where only the distal reporter’s expression is compromised, a Type 2 outcome, indicating that the mutations specifically altered a remote control element. A similar mutagenesis approach could be applied to promoters and promoter proximal sequences in order to identify sequences functioning as tethering elements.
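The design of a scanning mutant series can be sketched in a few lines of code. This is an illustrative sketch under stated assumptions: the function name, the 10 bp default block size, and the example sequence are hypothetical and are not taken from the study. The only rule drawn from the text is that each mutant carries one block altered by non-complementary transversions, which here means swapping A with C and G with T.

```python
# Non-complementary transversions: A<->C and G<->T (upper and lower case).
TRANSVERSION = str.maketrans("ACGTacgt", "CATGcatg")


def scanning_mutants(cre, block_size=10):
    """Return (start, end, sequence) for each scanning mutant of `cre`.

    Each mutant has exactly one block of up to `block_size` bases altered;
    the rest of the sequence is left unchanged. `block_size` is a
    hypothetical parameter chosen for illustration.
    """
    mutants = []
    for start in range(0, len(cre), block_size):
        end = min(start + block_size, len(cre))
        mutated_block = cre[start:end].translate(TRANSVERSION)
        mutants.append((start, end, cre[:start] + mutated_block + cre[end:]))
    return mutants


# Hypothetical 8 bp sequence scanned in 4 bp blocks.
for start, end, seq in scanning_mutants("ACGTACGT", block_size=4):
    print(start, end, seq)
# prints:
# 0 4 CATGACGT
# 4 8 ACGTCATG
```

Because every block is mutated in exactly one construct, a loss of distal reporter expression can be attributed to a single interval of the CRE.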

Testing the effects of the spacing between a CRE and a distal reporter gene

In a typical reporter transgene, a CRE is placed immediately adjacent to a heterologous promoter, such as the Hsp70 promoter of D. melanogaster (Rebeiz and Williams, 2011). However, few studies have systematically evaluated the effect that distance between a CRE and a promoter has on the ability to activate reporter gene expression. We decided to evaluate the regulatory activity of the CRE known as the dimorphic element (Rogers and Williams, 2011; Rogers et al., 2013; Williams et al., 2008) at various distances from a fluorescent reporter gene (Figure 3.3). This CRE has been shown to drive the expression of a proximally located fluorescent reporter in the abdominal dorsal epidermis of female transgenic D. melanogaster pupae. In this experiment we manipulated the distance of this CRE from the Hsp70 promoter of the distal DsRed.T4-NLS reporter. When there was no spacer sequence between the CRE and the distal reporter, we observed robust green and red fluorescence (Figure 3.3A and 3.3A’). This outcome indicates that the dimorphic element can activate both reporter genes simultaneously. However, when spacers of 1, 2, 4, and 8 kb were included between the CRE and DsRed.T4-NLS reporter gene, we saw a progressive reduction in red fluorescent protein expression (Figure 3.3A’-3.3E’). Notably, there was little-to-no expression observed for the version with the 8 kb spacer (Figure 3.3E’). This suggests that 8 kb of spacer sequence was a sufficient impediment to a functional interaction between the dimorphic element and the distal Hsp70 promoter. Interestingly, we observed a progressive, albeit less severe, reduction in green fluorescence (Figure 3.3A-3.3E). This decline in green fluorescence occurred even though the distance between the dimorphic element and the proximal Hsp70 promoter remained constant. One possible explanation for this outcome is that some of the expressed DsRed.T4-NLS protein emits green fluorescent light rather than red. This explanation is supported by a previous publication revealing that some DsRed protein is trapped in a green fluorescent light emitting form (Baird et al., 2000). Importantly though, these data show that an 8 kb spacer sequence was sufficient to interrupt the relaying of regulatory activity of the dimorphic element to a heterologous promoter in a D. melanogaster transgene system. However, DsRed.T4-NLS seems less than ideal as a reporter to use in conjunction with EGFP-NLS.

Figure 3.3 The effects of CRE-promoter spacing on the expression of proximal and distal reporter genes. (A-E) Green fluorescent and (A’-E’) red fluorescent light detected from reporter transgenes with a proximal EGFP-NLS reporter at a consistent distance from the dimorphic element and a DsRed.T4-NLS reporter positioned at varying distances of 0, 1, 2, 4, and 8 kilobase pairs from the dimorphic element.

The abilities of differing CREs to regulate a distal reporter gene

While the dimorphic element lacked the ability to impart its regulatory activity on an Hsp70 promoter at an 8 kb distance, it remained a possibility that other CREs possessed differing abilities to act over a distance. Thus, we tested three additional D. melanogaster CREs that are active during pupal development. We first tested the tan_Male Specific Element 2 (t_MSE2), which drives reporter gene expression in posterior dorsal abdominal segments of D. melanogaster male pupae (Camino et al., 2015). The t_MSE2 resides ~3 kb from the promoter of the tan gene (Figure 3.4A) and is situated between the Gr8a and CG15370 genes that it is not known to regulate. This genomic arrangement suggests that a mechanism exists by which the t_MSE2 specifically interacts with the tan gene. When the t_MSE2 was included in the pRLGL8 transgene, we found that it drives proximal reporter expression in the male A5 and A6 segments (Figure 3.4C). Similar to the dimorphic element, the t_MSE2 had little-to-no ability to activate expression of the distal reporter (Figure 3.4B). At least two explanations exist for this outcome. One is that a remote control element exists in a sequence flanking t_MSE2. The second is that a tethering element located proximal to the tan gene promoter exists and is needed for the t_MSE2 to activate expression over a distance.

Figure 3.4 Test of long distance regulatory activity for several D. melanogaster CREs. (A) In the tan gene locus, the t_MSE2 is situated between the CG15370 and Gr8a genes and 3239 base pairs (bp) from the tan gene promoter. (B and C) The regulatory function of the t_MSE2 drives little-to-no expression from the distal reporter and robust expression from the proximal reporter. (D) In the yellow gene locus, the yBE0.6 is located 811 bp upstream of the yellow promoter. (E and F) The regulatory function of yBE0.6 drives low levels of distal reporter expression and robust expression of the proximal reporter. (G) Within the bab gene locus is the leg and antennal enhancer (LAE). The LAE is located 27,907 bp and 46,850 bp from the paralogous bab1 and bab2 gene promoters respectively. (H and I) The LAE drives similar levels of expression for the proximal and distal reporters.

The yellow Body Element0.6 (yBE0.6) CRE has been shown to drive expression of an adjacent reporter transgene in the posterior dorsal abdominal segments of the male abdomen during pupal development. This pattern mimics the endogenous expression of the yellow gene at this time point (Camino et al., 2015). The yBE0.6 sequence resides 811 base pairs (bp) upstream of the yellow gene’s promoter (Figure 3.4D). In the pRLGL8 construct the yBE0.6 drove the proximal EGFP-NLS reporter in the male abdomen (Figure 3.4F). This CRE also activated the distal reporter gene, albeit with expression levels noticeably weaker than those occurring from the proximal reporter gene (compare Figure 3.4E to 3.4F). This outcome indicates that within this CRE sequence of 632 bp resides a motif or motifs that can impose regulatory activity upon a promoter that is displaced by 8 kb. Such a motif might be identifiable by subjecting the yBE0.6 to scanning mutations and looking for sequences that result in a Type 2 outcome when mutated (Figure 3.2C).

The Leg and Antennal Enhancer (LAE) is a CRE that resides in the intergenic region between the paralogous bab1 and bab2 genes (Baanannou et al., 2013) and drives expression of these paralogous genes in the leg and antenna of D. melanogaster. This CRE resides ~30 kb and ~50 kb from the bab1 and bab2 gene promoters respectively (Figure 3.4G). The LAE can drive the expression of the proximal EGFP-NLS reporter in the developing legs of transgenic pupae (Figure 3.4I). Interestingly, this CRE also drove expression of the distal DsRed.T4-NLS reporter in a similar pattern and at similar levels (Figure 3.4H). This indicates that this CRE encodes a regulatory activity that can be conveyed to a heterologous promoter over an 8 kb distance. Of the CREs we tested, the LAE provides the best candidate for the identification of an RCE motif or motifs.

Test of flanking CRE sequence and heterologous promoter type on long distance gene regulation

One possible reason why the dimorphic element failed to activate the expression of the distal reporter gene was that the Hsp70 promoter lacked an element or elements necessary for interacting with this CRE. Thus, we replaced the distal Hsp70 promoter with the Drosophila synthetic core promoter (DSCP). The DSCP was created as a minimal promoter that would be capable of interacting with CREs from diverse D. melanogaster genes and drive reporter transgene expression. The DSCP contains a TATA box, initiator element, downstream promoter element, and a motif ten element (Pfeiffer et al., 2008). This promoter’s initial use was as part of traditional transgenes where the CREs are situated adjacent to this promoter, thus precluding any need for long distance communication. In our dual reporter system the dimorphic element failed to activate expression of the DSCP DsRed.T4-NLS transgene when the distance between the CRE and promoter was 8 kb (compare Figure 3.5A to 3.5B). The failure of the dimorphic element to activate expression of a distal reporter transgene with the Hsp70 or DSCP promoter might be explained by the dimorphic element having been truncated to the point where it lacks sequence encoding a remote control element. To test this hypothesis, we added ~500 bp of the endogenous bab locus sequence that flanks each side of the minimal dimorphic element (called expanded DE). However, this expanded CRE version failed to convey the regulatory activity of the dimorphic element to the distal Hsp70 promoter (compare Figure 3.5C to 3.5D). This indicates that other cis-acting sequences are needed for this CRE to activate the expression of a gene positioned at a distance. Moreover, the expression output from the proximal promoter also appeared to be reduced. This seemingly is due to the inclusion of ~400 bp of flanking sequence between the core dimorphic element and the Hsp70 promoter of the EGFP-NLS reporter.

Figure 3.5 Attempts to rescue distal reporter gene expression through the addition of flanking CRE sequence and the utilization of an optimized Drosophila promoter. (A and B) The dimorphic element was unable to activate the expression of a distal reporter that possessed the Drosophila synthetic core promoter, which contains numerous Drosophila promoter sequence features (Highlighted nucleotides). (C and D) The additional 487 bp and 402 bp endogenous sequence added to the DE does not improve its ability to drive expression of the reporter gene, whereas proximal reporter expression appeared to be slightly reduced. The lost proximal expression likely stems from the introduction of 402 bp between the dimorphic element and the Hsp70 promoter.

Identifying a fluorescent reporter to be used in conjunction with EGFP-NLS

While the red fluorescence of DsRed.T4-NLS worked well as a read out of long distance transcriptional activation, it had a less than ideal effect on the green fluorescence read out for the proximal EGFP-NLS reporter. Thus we sought to identify a better fluorescent protein to pair with EGFP-NLS. We had synthesized the coding sequences for several fluorescent proteins in-frame with the coding sequences for a C-terminal tra nuclear localization sequence (Hedley et al., 1995). Our goal was to identify a nuclear-localized fluorescent protein whose signal is easy to detect and that does not noticeably overlap with that for EGFP-NLS. Among the fluorescent proteins we selected and tested were mCherry-NLS, mCerulean-NLS, and E2-Crimson-NLS (Figure 3.6). We suspected that fluorescence of mCherry-NLS would be best detected using modestly red-shifted settings and to a lesser extent far-red settings, whereas the mCerulean-NLS and E2-Crimson-NLS would only be detected using blue-shifted and far-red shifted settings respectively. E2-Crimson-NLS was of high interest as the published emission spectrum for E2-Crimson is the most distinct from that for EGFP (Strack et al., 2009). However, we did not know whether this protein results in an immature green light emitting form as seen for DsRed (Baird et al., 2000) and DsRed.T4-NLS (Barolo et al., 2004) (Figure 3.3).

For testing fluorescent properties of the newly synthesized reporters we coupled them to an Hsp70 minimal promoter and the yBE0.6 CRE that drives a male-limited pattern of expression in the pupal dorsal epidermis of the abdominal segments of transgenic D. melanogaster (Camino et al., 2015). Optimal excitation and emission settings were identified for each of the four fluorescent reporters (Table 3.1), and transgenic pupae with the different fluorescent reporters were imaged at each of the optimal settings (Figure 3.6). EGFP-NLS was detected using the 488 nm laser with excitation filters 480-595 nanometer light and emission filters limiting the collected signal to 500-530 nanometer light. While little-to-no fluorescence was detected from EGFP-NLS when using the red and far-red settings, the male A5 and A6 expression was seen with the blue-shifted settings (Figure 3.6A-3.6A’’’). This outcome indicates that EGFP-NLS and mCerulean-NLS are not an ideal pair of fluorescent proteins to utilize in our dual reporter transgene experiments, even though the mCerulean-NLS signal was only observed with the blue-shifted settings (Figure 3.6B-3.6B’’’).

Figure 3.6 Comparison of fluorescence properties of various fluorescent reporters when regulated by a CRE. Transgenic D. melanogaster were made that possessed the yBE0.6 CRE driving the expression of fluorescent reporters with an Hsp70 minimal promoter. The reporters included (A-A’’’) EGFP-NLS, (B-B’’’) mCerulean-NLS, (C-C’’’) mCherry-NLS, and (D-D’’’) E2-Crimson-NLS. For all transgenic fluorescent reporters, male pupae were imaged at settings optimized for blue, green, red, and far-red light.

mCherry is a commonly utilized fluorescent protein in biological experimentation, and it possesses a red-shifted emission spectrum compared to EGFP. E2-Crimson has a reported far-red emission spectrum, though it has only recently been developed and characterized (Strack et al., 2009) and to our knowledge it has not been used previously in fruit flies. We found that our mCherry-NLS and E2-Crimson-NLS reporters had noteworthy expression when using the red-shifted and far-red shifted settings respectively (Figure 3.6C-3.6C’’’ and 3.6D-3.6D’’’). While both reporter proteins seemed compatible for use with EGFP-NLS in dual reporter experiments, we opted to further utilize E2-Crimson-NLS as its signal seemed easier to detect among replicate specimens.

EGFP-NLS and E2-Crimson-NLS provide specific read outs on proximal and distal reporter gene expression

With E2-Crimson-NLS having fluorescent excitation and emission spectra distinct from EGFP-NLS, we sought to see whether it performs equally well in a dual reporter transgene context. Thus, we replaced the DsRed.T4-NLS coding sequence in the pRLGL0, pRLGL2, pRLGL4, and pRLGL8 vectors with the E2-Crimson-NLS coding sequence. When these dual reporters were site-specifically integrated in D. melanogaster, we observed a progressive decrease in far-red fluorescence as the E2-Crimson-NLS reporter was moved further distal to the CRE (Figure 3.7A’-3.7D’). However, the green fluorescence remained relatively more consistent (Figure 3.7A-3.7D), suggesting that in this dual reporter system green light is predominately due to the EGFP-NLS reporter and far-red light is due to the E2-Crimson-NLS reporter. Future pursuits with the Red Light Green Light system would benefit from utilizing the dual reporter vectors with E2-Crimson-NLS as the distal reporter.

Figure 3.7 E2-Crimson-NLS and EGFP-NLS reporters provide optimal readouts of distal and proximal regulatory activities of a CRE. The regulatory activities of the dimorphic element (CRE) on proximal and distal promoters were evident when using EGFP-NLS and E2-Crimson-NLS reporters. (A and A’) At equal reporter spacing the dimorphic element drives identical patterns and comparable levels of EGFP-NLS and E2-Crimson-NLS reporter expression. (A-D) When spacer sequences of 2, 4, and 8 kb were situated between the CRE and the E2-Crimson-NLS reporter gene, the expression observed from the more proximal EGFP-NLS was consistent. (A’-D’) Conversely, expression seen from the E2-Crimson-NLS reporter declined in proportion to the length of spacer sequence.

Discussion

Here we have presented a customized Drosophila reporter transgene system called Red Light Green Light that makes possible the simultaneous testing of a CRE’s capability to activate the expression of two reporter genes, one whose promoter is proximal to the CRE and the other distal. Using the CRE known as the dimorphic element, we showed that this CRE can similarly activate two fluorescent reporter transgenes when they are at equal proximal positions. However, when one of the reporters is moved more distal to the CRE (starting at 1 kb) the level of expression declines until it can no longer be observed (at a distance of 8 kb). We tested three additional D. melanogaster CREs, one that could not activate the expression of an 8 kb distal reporter, one that could activate the distal reporter at a low level compared to the proximal reporter, and one that could activate the distal reporter at a level comparable to the proximal reporter. These results reveal how different CREs have differing capabilities to activate gene expression from a distally located heterologous promoter. For the dimorphic element, we showed that its inability to activate the 8 kb distal reporter gene could not be overcome by either adding endogenous CRE flanking sequence or replacing the distal promoter with a synthetic promoter that was designed for use in identifying diverse Drosophila CREs. These outcomes suggest that the dimorphic element may require its endogenous promoter to activate expression at a distance. We found that our initial distal fluorescent reporter (DsRed.T4-NLS) protein emits considerable green light that cannot be distinguished from the green light emitted by the proximal EGFP-NLS fluorescent reporter protein. We tested three other fluorescent reporter proteins and found that the far-red shifted E2-Crimson-NLS reporter protein makes an excellent distal reporter to pair with the proximal EGFP-NLS. In contrast to DsRed.T4-NLS, we saw no evidence of E2-Crimson-NLS producing green light. Collectively, this study presents a Drosophila system that is ideal for advancing an understanding of how spacing between a promoter and a CRE affects gene expression activation. Moreover, this system can be utilized to map CRE and promoter sequences that are necessary for long distance gene expression regulation.

When does gene regulation become long distance?

An initial question we sought to answer was how much distance between a CRE and a distal reporter transgene promoter is sufficient to cause a loss of transcriptional regulation. To answer this question we chose the dimorphic element of D. melanogaster as our first test CRE. The endogenous function of this CRE is to control the female-specific expression of the bab1 and bab2 genes in the A5-A7 segments of the pupal abdomen (Williams et al., 2008). This CRE is situated in the large first intron of the bab1 gene, at a distance of ~16 and ~92 kb from the promoters for the bab1 and bab2 genes respectively. We suspected that since this CRE is naturally positioned at a distance from its target promoters, it may possess a “remote control element” (Swanson et al., 2010) that enables it to impart its regulatory activity over a great distance. To our surprise, we found that this CRE’s ability to activate the expression of a heterologous promoter began to decline when the distance of separation was 1 kb (Figure 3.3). At a distance of 4 kb, this CRE’s activity was further reduced, and at 8 kb we saw little-to-no expression from the distal reporter gene. Thus for the dimorphic element, and in this transgenic context, 8 kb was enough distance to impede reporter gene expression activation. This 8 kb distance was also sufficient to impede the D. melanogaster t_MSE2 CRE from imparting its male-specific regulatory activity (Camino et al., 2015) on a heterologous promoter (Figure 3.4). The endogenous position of this CRE is between two genes that it is not known to regulate, and at a distance of ~3 kb from the tan gene’s promoter. We also tested the activity of the yBE0.6 and LAE CREs for the ability to activate the distal reporter at an 8 kb distance (Figure 3.4). The endogenous position of the yBE0.6 is ~1 kb upstream of the yellow gene promoter from which it drives a male-specific pattern of pupal abdomen expression (Camino et al., 2015). The LAE is located ~30 kb from the bab1 promoter and ~50 kb from the bab2 promoter, from which the CRE drives leg and antennal expression of the two paralogous bab genes (Baanannou et al., 2013). Interestingly, the yBE0.6 was able to drive a low level of expression from the distal reporter even though this CRE is naturally located close to its promoter. In contrast to the dimorphic element, the LAE was able to robustly activate the expression of a distal reporter.

Our results have several noteworthy implications. First, it is clear that CREs can possess differing abilities to activate gene expression from a minimal promoter when at a distance of 8 kb. While many CREs are at an even greater distance in vivo (Kvon et al., 2014), this transgene context with 8 kb of CRE to promoter separation seemingly provides a useful compromise for mechanistic studies. Second, in our study two of the four CREs tested indicated that 8 kb is effectively long distance for a reporter transgene. In a seminal study, it was shown that the sparkling CRE possessed a “remote control element” sequence that was necessary to impart the cone-cell pattern of gene expression regulation on a reporter transgene at a distance of ~0.8 kb (Swanson et al., 2010). For the dimorphic element, we observed only a subtle decrease in expression at a distance of 1 kb. Had this been the only distance we tested, we would have been left to conclude that this CRE possesses a remote control element that bridges the distance gap between the CRE and promoter. However, at a distance of 8 kb the absence of distal reporter expression indicates that whatever remote control element-like activity the dimorphic element possesses is overwhelmed at a greater distance. We suggest that future studies interested in D. melanogaster long distance gene regulation utilize the Red Light Green Light vector with an 8 kb spacer between the CRE and distal promoter. Moreover, it might be prudent to make this reporter vector the new standard for testing the activity of a D. melanogaster CRE, as it provides not only a read out of long distance regulation, but a read out from a more traditional proximal reporter as well (Figure 3.2).

Differing abilities of CREs to interact with a distal heterologous promoter

The Hsp70 promoter is commonly utilized in reporter transgene experiments where a CRE is situated immediately adjacent to it (Barolo et al., 2004; Rebeiz and Williams, 2011; Rogers and Williams, 2011). We show here that this minimal promoter can be efficiently regulated over a distance by some but not all CREs. One possible explanation for these outcomes is that some CREs, like the LAE (Figure 3.4), possess a remote control element, whereas other CREs, like the dimorphic element, do not. For the dimorphic element, we suspected that long distance regulation was not required when this element was first characterized in a traditional reporter transgene, and that the remote control element was perhaps removed in the process of identifying the minimal sequence sufficient to activate a proximal reporter transgene (Williams et al., 2008). However, when we added back 487 and 402 base pairs


of endogenous flanking sequence to the sides of the minimal dimorphic element, we saw no noteworthy improvement in the ability of this larger CRE to activate the 8 kb distal reporter (Figure 3.5). This suggests that either a remote control element exists but in more distant bab locus sequence, or that the dimorphic element possesses a remote control element which cannot interact with the minimal Hsp70 promoter. To test this latter possibility, we replaced the distal Hsp70 promoter with the Drosophila synthetic core promoter, called the DSCP (Pfeiffer et al., 2008), and found that the dimorphic element also could not activate this promoter at an 8 kb distance. This result suggests that long distance regulation by the dimorphic element requires the endogenous bab1 or bab2 promoters, and/or that there exists another cis-acting sequence that is required for long distance regulation.

In cases where a CRE fails to activate a heterologous distal promoter, the next logical experiment would be to include the endogenous promoter and perhaps even promoter-proximal sequence. If expression occurs for the distal reporter, then it can be concluded that a “tethering element” sequence (Calhoun and Levine, 2003) exists that is required for long distance gene regulation. If expression does not occur, then it would suggest that another sequence or sequences exist that are required for long distance regulation. In such cases, additional sequences can be added between the distal promoter and the CRE to find the one that results in the distal reporter being activated in the pattern characteristic of the CRE. Searches for CREs often begin by testing large pieces of genomic DNA (≥3 kb) for the ability to activate expression of a heterologous promoter in


a reporter transgene assay. Our results suggest that this methodology is at risk for failing to identify CREs when they are at a distance to an ill-suited promoter.

Mapping cis-acting sequences required for long distance gene regulation

Our motivation for developing Red Light Green Light was to provide a means to identify the DNA sequences involved in mediating gene regulation between a distantly located CRE and its target promoter or promoters, sequences that have previously been referred to as remote control (Swanson et al., 2010) and tethering (Calhoun et al., 2002) elements.

For the t_MSE2 and dimorphic element CREs, we must first identify the promoter and cis-acting sequences necessary for long distance regulation. However, the LAE provides an opportunity to seek and characterize a remote control element. Towards this end, one could next test a series of truncated CRE versions to narrow down the minimal CRE capable of driving expression of the distal reporter in the pupal leg and antenna (Figure 3.4). For this minimal LAE sequence, a set of scanning mutant CRE versions could be created that each possess a sub-sequence (~50-100 base pairs) where every other base pair is altered with a non-complementary transversion (Figure 3.2B). The scanning mutant CREs could then be tested for their ability to activate both the proximal and distal reporters (Figure 3.2C). Scanning mutants affecting both distal and proximal reporter expression, a Type 3 outcome, can be interpreted to have destroyed a sequence element that is necessary for the CRE's spatial and/or temporal level of activity. What we seek by this approach, though, are scanning mutants that leave the proximal reporter's expression unperturbed but render distal reporter expression compromised,


referred to as a Type 2 outcome. This outcome would reveal mutations that altered a sequence that encodes a remote control element activity. These sequences could then be studied to identify the proteins that directly interact with the remote control element.

Success here should serve as a needed entry point to understand how CREs encode information that facilitates long distance gene expression activation. Moreover, this better understanding has implications for the evolution of CREs. Ultimately, this scanning mutant approach can be similarly used to dissect the sequences within a promoter or promoter-proximal sequences that participate in long distance regulation.
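The scanning mutant design described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to build any construct in this work: the function names, the 50 base pair default window, and the choice of non-overlapping windows are hypothetical. The only rule taken from the text is that every other base pair within a window is replaced by a non-complementary transversion (A to C, C to A, G to T, T to G). The small classifier simply restates the Type 2 and Type 3 outcomes named above.

```python
# Non-complementary transversions: purine <-> pyrimidine swaps that are not
# the Watson-Crick partner of the original base (A<->C and G<->T).
TRANSVERSION = {"A": "C", "C": "A", "G": "T", "T": "G"}


def mutate_every_other(seq, start, end):
    """Return seq with every other base in [start, end) transverted.

    Bases outside the window, and any non-ACGT symbols, are left unchanged.
    """
    bases = list(seq.upper())
    for i in range(start, min(end, len(bases)), 2):
        bases[i] = TRANSVERSION.get(bases[i], bases[i])
    return "".join(bases)


def scanning_mutants(seq, window=50):
    """Yield (start, end, mutant) for consecutive, non-overlapping windows.

    The window size is an assumed parameter; the text gives ~50-100 bp.
    """
    for start in range(0, len(seq), window):
        end = min(start + window, len(seq))
        yield start, end, mutate_every_other(seq, start, end)


def classify_outcome(proximal_intact, distal_intact):
    """Map reporter observations onto the outcome types named in the text."""
    if proximal_intact and not distal_intact:
        return "Type 2"  # candidate remote control element sequence
    if not proximal_intact and not distal_intact:
        return "Type 3"  # sequence needed for the CRE's activity itself
    return "not named in this passage"
```

For a 120 base pair input and the default window, `scanning_mutants` yields three mutant versions covering positions 0-50, 50-100, and 100-120, each differing from the parent only inside its own window.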

Materials and Methods

Generating pRLGL vector

The template used for construction of the dual reporter system was the mS3aG reporter vector (Camino et al., 2015). The core region of the dimorphic element (Rogers et al., 2013) was inserted into the AscI and SbfI restriction endonuclease sites between a BglII site and an XhoI site. The protein coding sequence of the DsRed.T4-NLS reporter gene (Barolo et al., 2004) was synthesized (GenScript Inc.) with some minor alterations. The SpeI site flanking the 3' untranslated region (UTR) was changed to an EcoRI site. Within the DsRed.T4-NLS sequence was an SbfI restriction site, which we abolished with a synonymous mutation. We also added an FseI site at the start of the 3' UTR. An Hsp70 promoter sequence flanked by a 5' BamHI and 3' StuI site was added just 5' of the DsRed.T4-NLS coding sequence.


Intronic sequence of the bab1 gene was altered at every other base pair by non-complementary nucleotide transversions to make the 1, 2, 4, and 8 kb spacer sequences. The 1 and 2 kb spacer vectors were generated by inserting a spacer element between the BamHI and BglII sites between the Hsp70 promoter of the DsRed.T4-NLS reporter and the dimorphic element. The 4 kb spacer vector was generated by In-Fusion cloning (Clontech Laboratories Inc.) of a second 2 kb spacer with flanking BglII sites into the BglII site. The 8 kb spacer was generated by In-Fusion cloning a 4 kb spacer sequence with flanking BamHI and BglII sites into the 4 kb vector at the BamHI site located at the 5' end of the Hsp70 promoter for the DsRed.T4-NLS reporter gene.

The design, synthesis, and creation of new fluorescent reporter transgene

The sequence between the AgeI and SpeI sites of the S3aG (Rogers and Williams, 2011) vector containing the yBE0.6 CRE was used as a starting point for the creation of new fluorescent protein sequences. This DNA sequence contains the start codon for the Enhanced Green Fluorescent Protein (EGFP) gene with a 3' nuclear localization signal (NLS) of the tra gene (Hedley et al., 1995), and an SV40 poly-adenylation (polyA) signal in a 3' UTR. The DNA sequence from the start codon “ATG” up until the NLS sequence was removed and replaced by the coding sequence from other fluorescent proteins. This included the protein coding sequence for mCherry, which was based upon the vector pmR-mCherry (Clontech Inc.). The FASTA format sequence from the first codon of mCherry to the last codon was combined 5' and in-frame of the


coding sequence for the tra gene nuclear localization signal and an SV40 polyA signal-containing 3' UTR. This coding sequence possessed a single site each for StuI and SbfI, two restriction enzyme sites that are commonly used in S3aG and pRLGL-type vectors. These sites were each removed by a single synonymous base substitution. The E2-Crimson fluorescent protein (Strack et al., 2009) coding sequence was obtained from the sequence file for the pCMV-E2-Crimson vector (Clontech Inc.) and the FASTA format sequence was grafted 5' and in-frame of the tra NLS sequence. The SbfI site that resided within the E2-Crimson sequence was destroyed by a synonymous mutation. The mCerulean protein coding sequence was obtained from the CMV-Brainbow-1.0L vector (Addgene plasmid #18721), synthesized (GenScript Inc.), and inserted in front of and in-frame with the tra NLS by AgeI and NotI restriction sites, replacing the EGFP sequence from S3aG. A reporter gene sequence was designed that possessed an Hsp70 promoter, E2-Crimson-NLS coding sequence, and an SV40 polyA signal-containing 3' UTR. This transgene was flanked by a 5' BamHI site and 3' EcoRI site. This sequence was synthesized (GenScript Inc.) and cloned in place of the distal reporter from the pRLGL0, 2, 4, and 8 kb vectors.
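Removing an internal restriction site with a synonymous mutation, as described above for the SbfI and StuI sites, amounts to searching for single-base changes that destroy the recognition sequence while leaving the encoded protein unchanged. The sketch below shows one way such a search could be done; it is an illustration rather than the procedure actually followed, and the function names and the toy coding sequences in the usage note are hypothetical. It assumes the standard genetic code, a coding sequence that starts in frame, and a single copy of the site. The recognition sequences used are CCTGCAGG for SbfI and AGGCCT for StuI.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Recognition sequences of the two enzymes mentioned in the text.
SITES = {"SbfI": "CCTGCAGG", "StuI": "AGGCCT"}


def translate(cds):
    """Translate an in-frame coding sequence; a trailing partial codon is ignored."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def synonymous_site_removals(cds, site):
    """List (position, old_base, new_base) single-base changes that remove
    `site` from `cds` without altering the translated protein.

    Positions are 0-based. Assumes the site occurs once in the sequence.
    """
    cds = cds.upper()
    start = cds.find(site)
    if start == -1:
        return []
    protein = translate(cds)
    hits = []
    for pos in range(start, start + len(site)):
        for new in "ACGT":
            if new == cds[pos]:
                continue
            mutant = cds[:pos] + new + cds[pos + 1:]
            if site not in mutant and translate(mutant) == protein:
                hits.append((pos, cds[pos], new))
    return hits
```

As a usage example with a made-up sequence, `synonymous_site_removals("ATGCCTGCAGGTTAA", SITES["SbfI"])` returns six candidate changes, all at the third positions of the proline and alanine codons that the site overlaps.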

Analysis of fluorescent reporter output by confocal microscopy

The yBE0.6 CRE was used to drive expression of the newly designed fluorescent reporters. This CRE has a peak regulatory activity at ~85 hours after puparium formation (hAPF) (Camino et al., 2015). The developmental time points used for the expression analysis of pRLGL transgenes were ~70 hAPF for the dimorphic element, expanded


dimorphic element, and leg and antennal enhancer (Baanannou et al., 2013; Rogers et al., 2013); ~85 hAPF for the yBE0.6; and ~90 hAPF for the t_MSE2 (Camino et al., 2015). An Olympus Fluoview 1000 confocal microscope was used to capture projection images of the dorsal abdominal epidermal fluorescence from whole mount pupae. The settings were optimized and are listed in Table 3.1 and Table 3.2.

Cloning of cis-regulatory elements

The sequences of the CREs were obtained from D. melanogaster genomic DNA (strain 14021-0231.04) that was acquired from the San Diego Drosophila Species Stock Center. CREs were PCR-amplified from the genomic DNA using the primer combinations shown in Table 3, which added AscI and SbfI sites. These PCR-amplified CREs were cloned into the AscI and SbfI sites of the fluorescent reporter vectors.

Transgenic creation and D. melanogaster integration

The pRLGL, pELGL, and S3aG fluorescent reporter vectors were site-specifically integrated into the D. melanogaster germline attP40 landing site on the 2nd chromosome (Best Gene Inc.) by a φC31 integrase approach. Transgenic D. melanogaster were maintained at 22°C on a Sugar Food recipe (Salomone et al., 2013).


Table 3.1 The confocal microscope settings utilized for imaging transgenic D. melanogaster pupae with S3a-series fluorescent protein reporter transgenes that possess the yBE0.6 CRE.

| Fluorescent Protein | EGFP | mCerulean-NLS | mCherry-NLS | E2-Crimson-NLS |
| --- | --- | --- | --- | --- |
| Laser (nm) | 488 | 458 | 543 | 633 |
| Laser % | 10 | 20 | 20 | 15 |
| HV | 700 | 850 | 850 | 750 |
| Gain | 1 | 2 | 1 | 1 |
| Offset | 1 | 20 | 10 | 5 |
| Aperture | 200 | 200 | 200 | 200 |
| Step Size (μm) | 10 | 10 | 10 | 10 |
| Excitation Filters | 485-595 (512) | 450-585 (476) | 550-648 (580) | 620-780 (665) |
| Emission Filters | 500-530 | 462-485 | 555-625 | Far Red |


Table 3.2 The confocal microscope settings utilized for imaging transgenic D. melanogaster pupae with Red Light Green Light-series dual reporter transgenes that possess the dimorphic element CRE.

| Fluorescent Protein | EGFP | DsRed.T4-NLS | E2-Crimson-NLS |
| --- | --- | --- | --- |
| Laser (nm) | 488 | 543 | 633 |
| Laser % | 10 | 15 | 15 |
| HV | 600 | 750 | 700 |
| Gain | 1 | 1 | 1 |
| Offset | 1 | 1 | 5 |
| Aperture | 200 | 200 | 200 |
| Step Size (μm) | 10 | 10 | 10 |
| Excitation Filters | 485-595 (512) | 550-648 (580) | 620-780 (665) |
| Emission Filters | 495-530 | 575-640 | Far Red |


CHAPTER IV

FUTURE DIRECTIONS: THE USE OF A HIGH THROUGHPUT APPROACH

TO REVEAL THE HIDDEN REGULATORY LOGIC OF GENE

EXPRESSION REGULATION AND ITS EVOLUTION

Abstract

Every feature of every living organism requires the regulated expression of the RNA and protein products of numerous genes. These patterns of regulated gene expression are controlled by DNA sequences known as cis-regulatory elements (CREs), which encode a “regulatory logic” of binding sites for various transcription factor proteins. Two major challenges of our time are to decipher how regulatory logic is encoded in DNA sequence and to understand how regulatory logic can evolve from antecedent DNA sequences. This is an immense challenge as metazoan species have genomes with over 10,000 genes, several hundred to over a thousand genes for transcription factors, and fifty thousand to perhaps over a million CREs. Presently, the time frame needed to identify a CRE is far shorter than that required to decode its


regulatory logic. As a result, our understanding of what constitutes a regulatory logic is derived from a small number of well characterized examples, and the few published case studies of CRE evolution have only revealed minute details about how those logics have evolved. In order to achieve a deeper mechanistic understanding of CRE evolution, research programs decoding regulatory logic need to include an approach that can expedite the mapping of binding interactions between transcription factor proteins and CRE sequences. Thus, we led an effort to complete a near comprehensive test for interactions between the Drosophila (D.) melanogaster transcription factors and CRE sequences for the D. melanogaster tan, yellow, and bab genes, whose expression shapes an evolved pattern of male abdominal pigmentation. We identified hundreds of transcription factor-CRE interactions by a high-throughput yeast one-hybrid assay. These interactions can be further characterized for a role in regulatory logic evolution through the subsequent use of more conventional methods.

Introduction

CREs, their logic, and their linkages

Spatial, temporal, and environmentally-induced patterns of gene expression result from interactions between transcription factor proteins and sequences that are commonly referred to as cis-regulatory elements or CREs (Davidson, 2006b; Davidson and Erwin, 2006). These control sequences typically range in size from several hundred to over a thousand base pairs (Rebeiz and Williams, 2011). While expression regulation in single


celled organisms can often be directed by one or two interacting transcription factors, in multicellular organisms, such as animals, CREs have been found to be regulated by combinations of four or more transcription factors (Arnone and Davidson, 1997; Swanson et al., 2010). For example, the stripe 2 CRE drives the second of seven stripes of eve expression in the D. melanogaster embryo. This pattern of expression results from a CRE sequence that possesses multiple binding sites for four different transcription factors (two repressors and two activators) that themselves have a spatially limited pattern of expression (Arnosti et al., 1996; Small et al., 1992; Stanojevic et al., 1991). Thus, this sculpted stripe of eve expression results from a regulatory logic defined by the CRE's configuration of transcription factor binding sites.

While aspects of regulatory logic have been identified for a handful of carefully studied CREs, our understanding of regulatory logic remains naive considering that these logics must be tuned to the different cell types, life times, and environmental conditions necessary to make all the attributes that are possessed by the diversity of life on Earth. A fundamental question to be answered for any CRE is this: how many unique transcription factor-binding site interactions (referred to as regulatory linkages) comprise a single regulatory logic? Ultimately, through characterizations of many CREs it may be possible to reveal general principles about the binding site contents and organizations of regulatory logic.


CREs and the importance of regulatory logic evolution

While the regulatory logic of a few CREs has been resolved, these characterized CREs represent single alleles that may be distinguished in sequence and function from other alleles for a given population or species, and from orthologous sequences possessed by related species. It often remains untested whether the studied CRE and its regulatory logic are equivalent to other alleles and orthologs. Moreover, well-reasoned arguments have been put forth that the major source of morphological diversity among species is changes to CREs (Carroll, 2008; Wray, 2007). Thus, the existence of related CREs with divergent regulatory logic should often be expected. This suspicion is supported by human population genetic studies which have indicated that the CRE regions of genomes seem to harbor most of the genetic differences responsible for human phenotypic variation (Wellcome et al., 2007), including genetic diseases (Ghiasvand et al., 2011), obesity (Claussnitzer et al., 2015), and disease risk (Kulzer et al., 2014; Musunuru et al., 2010). Thus, in order to fully understand the genetic underpinnings of human morbidity and mortality, and the origin of species differences, it is essential to understand how CRE regulatory logics come into existence and subsequently diversify (Rebeiz and Williams, 2011). To achieve this goal, appropriate model traits are needed that are amenable to experimentation that can resolve regulatory logic and its evolution.


Fruit fly pigmentation as a model trait to study CRE regulatory logic

A convenient model trait to study regulatory logic evolution is the pigmentation pattern that decorates the Drosophila (D.) melanogaster cuticle tergites covering the dorsal surface of the abdomen segments. For this species, the tergites covering the A5 and A6 segments are fully pigmented in males, but female pigmentation is limited to a narrow posterior stripe on each tergite (Rogers et al., 2013) (Figure 4.1A and 4.1B).

The expression of genes involved in coloring tergites needs to be limited to the A5 and A6 segments and to the later stages of pupal development, indicating the occurrence of spatial and temporal regulation respectively. Gene expression needs to occur in the epidermal cells where pigment metabolism occurs, thus gene expression is cell-type regulated too. Lastly, for certain genes their expression differs between males and females, indicating the occurrence of sex-specific regulation. The genes responsible for making this species' pattern of sexually dimorphic pigmentation are controlled by CREs that encode complex regulatory logics. The tan and yellow genes are expressed in the epidermal cells for which the overlying tergite will be colored black (Camino et al., 2015). These patterns of expression are each controlled by a single CRE known as the t_MSE and yBE0.6 respectively (Figure 4.1A'', 4.1A''', 4.1B'', and 4.1B'''). The absence of tan and yellow expression in the female abdomen requires the expression of the bric-à-brac (bab) transcription factor genes that function as dominant repressors of black pigmentation (Jeong et al., 2006; Kopp et al., 2000). bab expression in the A5 and A6 segment epidermis of females requires the regulatory activity of a single CRE known as the dimorphic element (Figure 4.1A' and 4.1B'). Dimorphic pigmentation evolved


from an ancestral monomorphic state which occurred in large part by the evolution of these three CREs (Camino et al., 2015) encoding male- and female-specific regulatory activities.

Figure 4.1 The sexually dimorphic pigmentation of Drosophila melanogaster is driven by sex-specific CRE activities. The tergites covering the A5 and A6 segments are fully melanic for D. melanogaster (A) males but not (B) females. (A' and B') The absence of pigmentation in females requires the female-limited expression of the bab genes under the control of the dimorphic element's regulatory activity. Male pigmentation requires tan and yellow gene expression in these segments driven by the male-specific activity of the (A'' and B'') t_MSE and (A''' and B''') yBE0.6.

Decoding an evolving female-specific regulatory logic

Females from different geographic regions differ in the extent of black pigmentation on the A5 and A6 tergites (Kopp et al., 2003; Parkash et al., 2008). A main part of William Rogers' PhD thesis (PhD from the University of Dayton in 2014) was showing that this pigmentation variation was mostly due to differences in the activity of dimorphic element alleles that drive disparate levels of bab gene expression (Rogers et al., 2013). Dr. Rogers inferred the ancestral DNA sequences and showed that one new (“derived”) mutation played a major role in reducing the activity of the dimorphic element. This was called the “D” mutation (Figure 4.2B), and swapping this mutation into the ancestral sequence was sufficient to reduce this CRE's ability to drive the expression of an EGFP reporter gene in transgenic D. melanogaster pupae (compare Figure 4.2F to 4.2E). Likewise, he found two derived mutations, called “F” and “L” (Figure 4.2C and 4.2D), which were sufficient to increase the regulatory activity of the dimorphic element (compare Figure 4.2G and 4.2H to 4.2E). These CRE activity effects presumably resulted from changes in the encoded regulatory logic. However, the identities of the lost or gained transcription factor binding sites remain unknown. Hence, the dimorphic element and these three mutations make an excellent model to study how new mutations occurring on top of an existing regulatory logic can alter gene expression and a phenotypic trait.


Figure 4.2 Three derived mutations alter the regulatory activity of the dimorphic element. (A) A phylogeny for four D. melanogaster populations, the extinct ancestor of these populations named the “Concestor”, and D. simulans as an outgroup species. (B-D) Sequence alignment showing the derived nucleotide states in white font color. (E) EGFP reporter gene expression driven by the Concestor's dimorphic element, and (F-H) Concestor elements where a single derived mutation replaced the ancestral nucleotide state. Numbers represent the measured level of EGFP expression and standard error of the mean for the Concestor elements with a derived mutation compared to that of the ancestral sequence. This figure was inspired by that published in Rogers et al. (2013).

Decoding CREs with male-specific regulatory logic

A significant portion of my PhD research has focused on the t_MSE and yBE0.6 sequences, as these CREs originated in the ancestry of D. melanogaster to direct a male-limited pattern of expression (Camino et al., 2015; Jeong et al., 2006, 2008) (Figure 4.1A'' and 4.1A'''). These CREs provide an excellent model to understand how a pattern of expression opposite of the dimorphic element's evolved to be encoded in a regulatory


logic of transcription factor binding sites. To date, we have shown that the t_MSE CRE seemingly evolved in part by the gain of binding sites for the transcription factors Abd-A and Hth (Camino et al., 2015). Moreover, an 85 base pair sequence was found that encodes most of this CRE's activity. However, the binding sites and the interacting transcription factors responsible for this element's activation in the pupal abdomen remain unknown, as do the sites and factors that suppress this element's activity in the female abdomen. The similar pattern of male-limited expression of yellow is known to involve the direct binding of the Hox transcription factor Abd-B to two binding sites within the yBE0.6 (Jeong et al., 2006). Previously, we found that this CRE possesses many additional DNA sequences that shape this element's regulatory activity (Camino et al., 2015). However, no obvious candidate transcription factors have been identified that may comprise these key regulatory logic inputs. Resolving the regulatory logic for these two CREs would facilitate comparisons of regulatory logics for independently evolved CREs with similar activity and an understanding of the origin of sex-specific regulatory logic.

A yeast one-hybrid approach to test hundreds of transcription factors for binding interactions with numerous CRE sequences

A few regulatory linkages have been characterized for CREs involved in the network of genes that produces the D. melanogaster pattern of abdominal tergite pigmentation. Abd-B is a direct activator of the yBE0.6 and the dimorphic element (Jeong et al., 2006; Williams et al., 2008). Abd-A and Hth are direct repressors of the


t_MSE (Camino et al., 2015), and the male and female isoforms of DSX act respectively as a repressor and activator of the dimorphic element (Williams et al., 2008). Prior to characterizing the CRE binding sites for these factors, these genes were candidates for regulation due to the effects that perturbations to these genes had on tergite coloration (Baker and Ridge, 1980; Kopp et al., 2000; Rogers et al., 2014). The problem facing the further characterization of these CREs is that we lack obvious candidate transcription factors that form regulatory linkages, and this species' genome contains around 750 transcription factor genes (Pfreundt et al., 2010). This large number of factors makes it untenable to evaluate each factor one-by-one with conventional methods such as an electrophoretic mobility shift assay.


Figure 4.3 Conceptual overview of the yeast one-hybrid assay. (A) Example of a 2x multimer bait design that was cloned upstream of the His3 growth reporter gene. Yeast will (B) not grow when the prey transcription factor fails to interact with the bait sequence and (C) grow when an interaction occurs because the His3 gene will be expressed. (D) This method was scaled up to test 722 preys for an interaction with numerous CRE baits. Large red spots represent yeast growth, and each test is done in quadruplicate. AD and TF respectively stand for Activation Domain and Transcription Factor.

What is needed is an approach that allows for the individual testing of each transcription factor for binding with a particular CRE sequence and that is scalable for performing similar tests with a myriad of CRE sequences. One suitable approach is the yeast one-hybrid assay (Figure 4.3). One component of this assay is a cis-regulatory sequence that functions as a bait for interactions by being coupled to a reporter gene such as lacZ or a gene needed for growth such as His3. The bait-reporter


genes can be integrated into the genome of the yeast S. cerevisiae, which adds a chromatin environment to the tests of protein-DNA binding. The second component of this assay is a protein coding sequence for a DNA-binding protein that is fused to the coding sequence for a strong transcriptional activation domain, such as that of the yeast GAL4 transcription factor (Figure 4.3B and 4.3C). The products of these coding sequences, referred to as prey molecules, can be expressed in yeast cells, and their ability to bind the bait sequence can be seen by the induced expression of the lacZ or His3 reporter genes (Figure 4.3D). An important advantage of the yeast one-hybrid assay is its ability to be scaled up to hundreds or even thousands of tests for prey-bait interactions. For example, genes expressed in the early Drosophila embryo have cis-regulatory sequences known as TAG-team sites (CAGGTAG), and the significance of these particular sites was unknown. Using a yeast one-hybrid screen of a library containing thousands of prey factors, it was found that preys that encoded the D. melanogaster Zelda protein were able to activate the bait-reporter gene in yeast (Liang et al., 2008). The in vivo binding of Zelda to these sites was later validated and serves as an excellent example of how the yeast one-hybrid assay can identify a key interacting transcription factor when an obvious candidate does not exist.

In recent years the technology available for utilizing yeast one-hybrid assays to study D. melanogaster CREs has dramatically improved. A library of preys was made that includes the open reading frames for 722 of the ~755 transcription factors in the D. melanogaster genome. Additionally, the steps of transforming prey and bait vectors into yeast and screening for interactions have been automated (Hens et al., 2011). Thus, it is


not only possible to test a specific CRE sequence for interactions with the vast collection of this species' transcription factors, but it is also realistic to perform similar tests for many CREs. We sought to use this high throughput system to inspect for transcription factors binding to the D. melanogaster yBE0.6 and t_MSE CREs, and to parse small motifs from the dimorphic element that contain either the derived “D”, “F”, or “L” mutation, or the ancestral base pair variants. The results of these tests have revealed a wealth of candidate interactors that can now be validated through further investigations.

Results

Yeast one-hybrid assay identifies 74 transcription factors that interact with yBE0.6 sequences

Previously, we determined that a 564 base pair sequence upstream of the yellow gene, called yBE0.6, possessed a regulatory logic capable of driving reporter gene expression in the epidermis of the male A5 and A6 abdominal segments during a late stage of pupal development (Camino et al., 2015). Moreover, we found that sequences broadly distributed throughout the yBE0.6 contributed to the element's regulatory activity. Thus, we decided to test the entire yBE0.6 as baits for interactions with prey transcription factors in a high-throughput yeast one-hybrid assay. This was accomplished by subdividing the yBE0.6 into 6 bait sequences, each of ~110 base pairs, that overlapped the adjacent bait or baits by ~30 base pairs (Figure 4.4, Table 4.1, and Appendix E).
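The subdivision of a CRE into overlapping baits can be illustrated with a short tiling calculation. The sketch below is hypothetical: the function names are invented, and the exact values of 110 base pairs per bait and 30 base pairs of overlap are idealized from the approximate figures in the text, so the coordinates it produces need not match the real baits listed in Table 4.1 and Appendix E.

```python
def tile_baits(cre_length, n_baits=6, bait_len=110, overlap=30):
    """Return 0-based, half-open (start, end) coordinates for overlapping baits.

    Default values are idealized from the approximate sizes given in the text.
    """
    step = bait_len - overlap
    tiles = []
    for i in range(n_baits):
        start = i * step
        if start >= cre_length:
            break  # no sequence left to tile
        tiles.append((start, min(start + bait_len, cre_length)))
    return tiles


def covered_fraction(tiles, cre_length):
    """Fraction of CRE positions that fall inside at least one bait."""
    covered = set()
    for start, end in tiles:
        covered.update(range(start, end))
    return len(covered) / cre_length
```

With these idealized parameters, six baits tiled across a 564 base pair CRE span positions 0 to 510, which is in line with the statement in the Figure 4.4 legend that most, but not all, of the CRE was included in a bait.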


Figure 4.4 Annotation of the yeast one-hybrid bait sequences upon the full yBE0.6 CRE. The yBE0.6 is 564 base pairs in length and has two known direct binding sites for the transcription factor Abd-B (vertical blue lines). Most of this CRE's sequence was included in one of six baits and the particular domain included in the baits is represented by the yellow rectangles.

For the yBE0.6 baits, we identified 74 transcription factors interacting with at least one bait with a significance level of p<0.01 (Table 4.2). This includes 11, 16, 14, 15, 21, and 21 factors respectively for yellow Baits 1-6. Notably, these interacting preys did not include Abd-B, which is a known direct interactor through two binding sites (Jeong et al., 2006), nor the Bab proteins, which are suspected to be direct repressors of yellow expression. There are a number of factors that immediately present interesting research avenues to pursue further. One such factor is Sex comb on midleg (Scm), which plays a role in abdominal segment identity; ScmD1 alleles mediate an A4 segment to A5 segment identity shift through altering the expression of Bithorax complex genes such as abd-A and Abd-B (Simon et al., 1992). We previously identified two regions of the yBE0.6 that when mutated led to the CRE driving ectopic GFP reporter gene expression in females (E.M. Camino, unpublished data) and thus seemingly harbor sex-specific regulatory inputs. Interestingly, the sex-determination gene runt (run) showed strong binding with the yellow Bait 2, which mostly overlaps one of the two regions encoding sex-specific inputs


(Duffy and Gergen, 1991; Younger-Shepherd et al., 1992). Finally, and perhaps holding the most promise, the products of daughterless (da), Suppressor of variegation (Su(var)2–10), and lame duck (lmd) all had strong interactions with the yellow Baits 5 and 6. A previous genetic screen revealed that these three genes function to shape abdominal pigmentation (Rogers et al., 2014).

Yeast one-hybrid assay identifies 141 transcription factors that interact with certain t_MSE sequences

The gene tan, like yellow, is expressed in the A5 and A6 abdominal segments of male D. melanogaster at a late stage of pupal development, and this expression is presumably controlled by the t_MSE. Previously, we identified a 124 base pair sequence within the t_MSE CRE that is necessary for its regulatory activity in a reporter transgene assay (Camino et al., 2015). Thus, we decided to finely characterize this 124 base pair sequence for novel transcription factor interactions by dividing it into 11 baits. Each bait includes 20-25 base pairs of t_MSE sequence, with 10 base pair overlaps between adjacent baits (Figure 4.5, Table 4.3, and Appendix F). Because these CRE sequences are small, we designed each bait to possess the same CRE sequence in duplicate, separated by four base pairs (hereafter called 2x multimers).


Figure 4.5 Annotation of the yeast one-hybrid bait sequences upon the full t_MSE CRE. The t_MSE is 897 base pairs in length and has a central region that was found to be necessary and sufficient for the CRE's regulatory activity. Eleven baits were designed to this central region, and the particular domain included in each bait is represented by a black rectangle that bears the bait's given name.

For the 11 t_MSE bait sequences tested, we identified 141 interactions with a significance level of p<0.01 (Table 4.4). This includes 17, 26, 23, 23, 24, 31, 15, 18, and 10 factors respectively for tan baits 1, 2, 4-9, and 11, while the tan baits 3 and 10 had no significant interactions. Notably, these interacting preys did not include Abd-A, which is a known direct interactor with several binding sites, or Hth, for which one binding site has been found (Camino et al., 2015). Given that the t_MSE encodes sex-specific patterns of gene regulation, it was interesting to identify a number of sex determination factors, such as Fruitless and Sisterless-A (Cline, 1988; Ryner et al., 1996), that interact with the t_MSE bait sequences. We also identified an interaction with CTCF; this protein helps mediate chromosomal looping (Sanyal et al., 2012) and may play a role in how the t_MSE locates its target promoter, which is positioned ~3,000 base pairs away. Finally, interactions were found for four factors (Abd-B, Vvl, Grh, MBD-like) that were identified in a genetic screen to play a role in the D. melanogaster abdominal pigmentation phenotype (Rogers et al., 2014).


Yeast one-hybrid assay identifies transcription factors that interact with ancestral and derived dimorphic element sequences

Pigmentation and the expression of yellow and tan are repressed in D. melanogaster females by the activity of the Bab1 and Bab2 proteins. Female-limited expression of the bab genes is thought to be under the regulatory control of a CRE known as the dimorphic element (Williams et al., 2008). While this CRE encodes a regulatory logic that includes binding sites for the Abd-B and Dsx transcription factors, three derived mutations altering the CRE's activity were found that occur outside of these known transcription factor binding sites. Suspecting that these mutations either created or destroyed binding sites for novel transcription factors, we designed 2x multimer baits that included versions with the ancestral and derived base-pair states situated in the middle of 21 base pairs of dimorphic element sequence (Figure 4.2 and Table 4.5).

For the dimorphic element baits, we identified 148 interactions with a significance level of p<0.01 (Table 4.6). This includes 18, 38, and 35 interactions for the baits containing the derived D, F, and L mutations respectively, and 17, 15, and 40 respectively for the baits containing the ancestral base pair states at the locations of the D, F, and L mutations. Interestingly, few of the interacting preys (7 factors in total) were found to interact with both the ancestral and derived forms of a bait. This outcome suggests that these mutations drastically altered the range of factors capable of binding to the highly similar sequences. An interesting factor for the D mutation, a mutation that shapes a darker female coloration in the A5-A7 segments, is Su(var)2–10. This factor was implicated as a promoter of pigmentation in a genetic screen (Rogers et al., 2014), and it also interacted with a yBE0.6 sequence (Table 4.2, Bait 6). For the two derived mutations that shape lighter abdominal phenotypes and increased dimorphic element activity (Figure 4.2), the F and L mutations, there were 4 interacting factors (Exd, Bab2, Dsx, Grh) that are known to lead to ectopic pigmentation when gene function is reduced (Rogers et al., 2014), signifying their role as repressors of pigmentation. Of particular interest was the interaction found between Bab2 and the bait containing the L mutation. This outcome is consistent with an interpretation in which the derived L mutation created a positive feedback loop with the dimorphic element to boost Bab expression and further suppress pigmentation, generating the lighter phenotype.
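The comparison of ancestral and derived interactor lists amounts to a set intersection. As a minimal sketch, assuming each bait's significant preys are held as a Python set, the following applies that comparison to the two D mutation baits. The gene names are transcribed from Table 4.6 (with Sry-β written as Sry-beta); the function name shared_preys is illustrative and was not part of the original analysis.

```python
# Preys with p <= 0.01 in at least one 3AT condition, transcribed from Table 4.6.
D_ANCESTRAL = {
    "Cenp-C", "CG11071", "CG17257", "CG18619", "CG7045", "CG7928", "hbn",
    "HLH106", "Hmr", "Hr78", "Myb", "odd", "retn", "Rfx", "rn", "Side", "tup",
}
D_DERIVED = {
    "Camta", "caup", "CG10959", "CG12299", "CG12659", "CG14117", "CG17802",
    "CG17803", "CG31835", "CG4496", "CG9139", "ey", "grau", "Myb", "Pep",
    "sqz", "Sry-beta", "Su(var)2-10",
}


def shared_preys(ancestral, derived):
    """Return, in sorted order, the preys that interacted with both bait forms."""
    return sorted(ancestral & derived)


print(shared_preys(D_ANCESTRAL, D_DERIVED))  # ['Myb']
```

For this pair of baits only one prey is shared, which illustrates how a single base-pair change can be associated with a largely different set of interactors in yeast.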

Control baits identify transcription factors that might be promiscuous binders in yeast one-hybrid assays

To ensure that the transcription factors identified as interacting with pigmentation network CRE sequences were specific and not a by-product of promiscuous transcription factor binding, we generated and tested two control bait sequences (Table 4.7). For these baits, we altered every other base pair of the tan Bait 1 and the D Derived Bait with non-complementary nucleotide transversions. The mutant baits are referred to as the tan Control Bait and the D Derived Mutant Bait. In each case, the interactions occurring with the control bait can be directly compared to those of the sequence it was derived from. For the tan Control Bait, only two of the nine interactions (Table 4.4, names in red) were for factors identified to interact with the non-mutated sequence (tan Bait 1). Similarly, of the 18 transcription factor preys interacting with the D Derived Mutant Bait, only 1 interacted with the non-mutant bait (D Derived Bait).
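The control bait design can be expressed as a short routine. The sketch below assumes the substitution scheme that can be read off the control oligos in Tables 4.3 and 4.5 (A to C, C to A, G to T, T to G, with altered bases shown in lower case); the function name and the `first` parameter are hypothetical and are introduced only for illustration.

```python
# Non-complementary transversions inferred from the control oligos in
# Tables 4.3 and 4.5 (an assumption based on those sequences): A<->C, G<->T.
TRANSVERSION = {"A": "c", "C": "a", "G": "t", "T": "g"}


def scramble_every_other(seq, first=0):
    """Alter every other base of seq, beginning at index `first` (0 or 1).

    Altered bases are returned in lower case; untouched bases stay upper case.
    """
    if first not in (0, 1):
        raise ValueError("first must be 0 or 1")
    return "".join(
        TRANSVERSION[base] if i % 2 == first else base
        for i, base in enumerate(seq.upper())
    )


# tan Bait 1 -> tan Control Bait (alteration starts at the first base)
print(scramble_every_other("TCAATAACAAAAATGAGTGCATTTT", first=0))
# D Derived Bait -> D Derived Mutant Bait (alteration starts at the second base)
print(scramble_every_other("ATAACTTCTTCCTCTGCGGTC", first=1))
```

Under these assumptions the two calls reproduce the single-copy control sequences embedded in the tbCTRL and "D"SM top-strand oligos.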

Discussion

Here, we have performed a high-throughput test of 722 D. melanogaster transcription factors for the ability to interact with 25 sequences from three different D. melanogaster cis-regulatory elements in a yeast one-hybrid assay. From these tests, we identified 448 interactions, the vast majority of which involved transcription factors with no known role in the network of genes that makes the D. melanogaster pattern of abdominal pigmentation. Thus, a wealth of candidate transcription factors was derived from the large number of factors encoded in the D. melanogaster genome. Future studies can seek to validate these factors as members of the D. melanogaster pigmentation gene regulatory network, which would advance an understanding of the regulatory logic for three CREs whose sequences and activities have evolved during the diversification of fruit fly pigmentation patterns.

Limitations to the yeast one-hybrid approach

An ideal outcome from the yeast one-hybrid assays would have been one in which just a single prey, or perhaps a few, was found to interact with each CRE sequence bait, a result similar to when Zelda was found to interact with the TAGteam site (Liang et al., 2008). However, we found a large number of statistically significant binding events for 23 of the 25 CRE baits we tested, ranging from 74 for the yellow baits to 148 for the dimorphic element baits. Thus, it is reasonable to conclude that our lists of interactors in yeast include many that are not realized in vivo and can be considered false positives. We anticipated that false positives would occur and thus created control baits derived from two of the baits we were testing, tan Bait 1 and the D Derived Bait, in which every other base pair was altered with a non-complementary nucleotide transversion. The factors found to interact with these control baits included several transcription factors about which little is known (those possessing only a CG gene name), as well as Bgb and Blimp-1. Since these prey factors also bound to a few of our CRE baits, it seems reasonable to conclude that these transcription factors exhibit promiscuous binding and are therefore unlikely to explain the in vivo activity of the tan, yellow, and bab CREs. The factors that bound our control baits generally did not overlap with those that interacted with the bait CRE sequences, indicating that most of the CRE interactions were due to sequence-specific binding.

In addition to false positives, it is reasonable to expect that this data set includes false negatives. These can arise when a genuine interacting factor is not among the transcription factors included in the prey library. The unrepresented factors amount to 8% of the D. melanogaster transcription factor genes. It is also possible that some genuine in vivo interactors fail to bind to a bait sequence in yeast for various reasons, such as the presence of the GAL4 activation domain on the transcription factor protein, the absence of a DNA-binding cofactor, or the absence of required post-translational modifications. By comparing our data set to known direct interacting transcription factors, we can see that false negatives do occur. Specifically, Abd-B was not found to interact with the yellow Baits 3 and 4, each of which possesses a site known to be bound by this factor in vivo (Jeong et al., 2006).

Progressing from yeast one-hybrid identified candidate regulatory linkages to bona fide in vivo linkages

While the use of the yeast one-hybrid assay achieved its goal of creating lists of candidate interactors for our CRE sequences of interest, the remaining challenge is to whittle these lists down to those interactions that represent functional in vivo regulatory linkages. First, the promiscuous binders can be eliminated from further consideration. Second, these lists could be further narrowed down by identifying the interacting transcription factors that are actually expressed in the abdominal epidermal cells in which the CREs are active in vivo. It would be useful to pair this yeast data set with a list of expressed transcription factors obtained through a method such as RNA-seq.

Another way to narrow our list of candidates is by searching for connections between the interacting transcription factors in yeast and publications that implicate the factors in the patterning of the D. melanogaster abdomen. In a previous study, an RNA-interference screen was done that sought tergite pigmentation defects following the suppression of expression for 558 of the D. melanogaster transcription factors (Rogers et al., 2014). Defects were found for 28 of these genes, which included Abd-B, abd-A, dsx, bab1, and bab2. One of the other 23 genes was ventral veins lacking (vvl), which was elsewhere shown to regulate Dopa decarboxylase (Ddc) in dopaminergic neurons (Certel et al., 2000; Johnson and Hirsh, 1990; Junell et al., 2010). Outside of the central nervous system, Ddc is responsible for the conversion of dopa to dopamine in the pigmentation metabolic pathway (Wittkopp et al., 2003). Vvl therefore seems to be an interesting transcription factor to investigate further for a role in the regulatory logic of the t_MSE as well as the dimorphic element allele possessing the derived L mutation. For many transcription factors, preferred binding motifs have been identified, and DNA sequences can be inspected for the presence of these binding motifs. Vvl is one such factor, with a known preferred binding motif of YAHKMA. Using the JASPAR CORE database (Stormo, 2013; Wasserman and Sandelin, 2004), we found motifs resembling Vvl binding sites within the t_MSE baits that a Vvl prey interacted with in the yeast one-hybrid assay (Figure 4.6). Ultimately, the narrowed list of interacting transcription factors will need to be tested for the ability to bind specific CRE sequences through an in vitro method, such as an electrophoretic mobility shift assay, and for a meaningful in vivo interaction, perhaps by mutating the bound site and looking for an altered ability of the CRE to regulate a reporter gene's expression in transgenic D. melanogaster.

Figure 4.6 Binding site motifs for Vvl in t_MSE2. The Vvl motif of YAHKMA appears 6 times within the region of t_MSE that has been identified as being responsible for the majority of the CRE's regulatory capability.
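Inspecting a sequence for a degenerate motif such as YAHKMA can be sketched with a regular expression built from the standard IUPAC nucleotide codes. This is only an exact-consensus search on both strands and is an assumption made for illustration; it is not the matrix-based scoring offered through the JASPAR database, and the function names and the short test sequence are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, motif="YAHKMA"):
    """List (start, strand, matched_sequence) for every motif occurrence.

    start is the 0-based position on the given (forward) strand. A lookahead
    is used so that overlapping occurrences are all reported.
    """
    seq = seq.upper()
    pattern = re.compile("(?=(" + "".join(IUPAC[c] for c in motif) + "))")
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        # Convert the position on the reverse strand to forward coordinates.
        hits.append((len(seq) - m.start() - len(motif), "-", m.group(1)))
    return sorted(hits)


print(find_motif("GGTATGCAGG"))  # [(2, '+', 'TATGCA')]
```

A search of this kind reports only perfect matches to the consensus, so sequences that merely resemble the motif would require a mismatch-tolerant or matrix-based method.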

At this moment, the yeast one-hybrid data set provides a wide breadth of exciting new research avenues for the Williams lab to pursue. To garner a full understanding of CRE regulatory logics and their evolution, it is a prerequisite to know the full complement of factors influencing CRE operation. The work I have presented in this dissertation, along with the yeast one-hybrid generated resources, positions the Williams lab for an unparalleled characterization of an evolving gene regulatory network. Moreover, this method can be used to understand the protein factors that bind to any tethering element or remote control element sequences identified by the Red Light Green Light approach that was described in Chapter 3.

Materials and Methods

Design of sequences to use as yeast one-hybrid baits

Bait sequences for the t_MSE and the dimorphic element were generated as 2x multimer repeats with a 4 nucleotide guanine bridge (Tables 4.3 and 4.5). Each bait consists of 20-25 base pairs of CRE sequence, and for the t_MSE baits a ~12 base pair overlap was included with the adjacent bait sequence to ensure that no potential binding sites went unscreened due to the nature of how the sequences were distributed into baits.
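The reason for overlapping adjacent baits, so that a binding site spanning a bait boundary is still contained whole in at least one bait, can be illustrated by a simple tiling routine. The sketch below assumes a fixed window and a fixed overlap, which is a simplification: the actual baits in Table 4.3 vary between 20 and 25 base pairs and were not generated by this code. The function name and parameters are hypothetical.

```python
def tile(seq, window, overlap):
    """Split seq into windows of length `window` that share `overlap` bases.

    Returns a list of (start, subsequence) pairs. The final window may be
    shorter than `window` if the sequence length does not divide evenly.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be >= 0 and smaller than window")
    step = window - overlap
    tiles = []
    start = 0
    while True:
        tiles.append((start, seq[start:start + window]))
        if start + window >= len(seq):
            break  # the last window reaches the end of the sequence
        start += step
    return tiles


# Toy example with letters instead of bases: each window shares 2 characters
# with its neighbour, so any 3-character "site" lies wholly inside one window.
print(tile("ABCDEFGHIJ", window=4, overlap=2))
# [(0, 'ABCD'), (2, 'CDEF'), (4, 'EFGH'), (6, 'GHIJ')]
```

With an overlap of k bases, any site of up to k + 1 bases is guaranteed to fall entirely within at least one window, which is the property the overlapping bait design relies on.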

Bait sequences were created by generating oligos (Integrated DNA Technologies) that include overhangs for BamHI and HindIII restriction endonuclease sites and annealing them prior to pEntry5' vector cloning. Bait sequences for the yBE0.6 (covering 564 base pairs) were cloned as ~110 base pair sequences with ~30 base pair overlaps with adjacent bait sequences. These baits were PCR amplified using the primers listed in Table 4.8.
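The layout of the annealed 2x multimer oligos can be sketched as follows. The flanking bases used here are read directly from the oligos in Table 4.3 (an "agctt" prefix and "g" suffix on the top strand, a "gatcc" prefix and "a" suffix on the bottom strand, and a four-guanine bridge); the function names are hypothetical, and the code is an illustration of the layout rather than the procedure actually used to design the oligos.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper case)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def multimer_oligos(cre):
    """Build top and bottom oligos for a 2x multimer bait.

    The CRE sequence (upper case) is duplicated around a 4-nucleotide guanine
    bridge. Lower-case flanks follow the pattern seen in Table 4.3, leaving
    HindIII-compatible (agct) and BamHI-compatible (gatc) overhangs once the
    two strands are annealed.
    """
    cre = cre.upper()
    rc = reverse_complement(cre)
    top = "agctt" + cre + "gggg" + cre + "g"
    bottom = "gatcc" + rc + "cccc" + rc + "a"
    return top, bottom


top, bottom = multimer_oligos("TCAATAACAAAAATGAGTGCATTTT")  # tan Bait 1
print(top)
print(bottom)
```

For the tan Bait 1 sequence this reproduces the "tb1 top y1h" and "tb1 btm y1h" oligos listed in Table 4.3.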

Bait vector construction

Bait sequences were ligated into the BamHI and HindIII restriction endonuclease sites of the pEntry5' donor plasmid vector (Deplancke et al., 2004) using BP clonase II (Invitrogen Catalog #11789-100). These ligation reactions were transformed into chemically competent E. coli DH5α cells. For each pEntry5' vector with a presumed bait sequence, plasmid DNA was purified and subsequently Sanger sequenced using the M13F and M13R sequencing primers separately (sequenced by DNA Analysis LLC). The bait sequences within the pEntry5' vectors were then subcloned into pBD-His3 (Deplancke et al., 2004) vectors using LR clonase II and the manufacturer's protocol (Invitrogen Catalog #11791-100). The proper inclusion of baits within the pBD-His3 vector was verified by Sanger DNA sequencing using the HIS293Rv primer (5'-GGGACCACCCTTTAAAGAGA-3'). Successful bait vectors possess an inserted "bait" sequence (D. melanogaster regulatory sequence) upstream of the His3 gene.

Integration of baits into the yeast genome

All pBD-His3 baits were integrated into the YM4271 (MATa, ura3–52, his3-Δ200, ade2–101, ade5, lys2–801, leu2–3,112, trp1–901, tyr1–501, gal4Δ, gal80Δ, ade5::hisG) strain of S. cerevisiae, which is unable to synthesize its own histidine. To accomplish the integration, pBD-His3 bait vectors were first linearized at a unique XhoI restriction endonuclease site. Linearized baits were shipped to the lab of Dr. Bart Deplancke at École Polytechnique Fédérale de Lausanne. Each linearized bait DNA was then individually transformed into competent yeast by a TE/lithium acetate/PEG protocol (Hens et al., 2012), and yeast with integrated baits were selected by growth on SC-His-Ura media (Deplancke et al., 2006).

Yeast one-hybrid assay

Ten µg of an AD-TF prey from the library of prey vectors (Hens et al., 2011) was transformed into a 500 µl suspension of each strain of YM4271 yeast with an integrated bait. Each prey vector contains the coding sequence for a D. melanogaster transcription factor in frame with the coding sequence for the transcriptional activation domain of GAL4. The library includes preys that contain the DNA-binding domains for 722 unique transcription factor genes. The transformations were accomplished using a TE/lithium acetate/PEG protocol, and successfully transformed yeast were selected by growth on SC-His, -Ura, -Trp agar media plates for 10 days at 30 °C. The prey library-transformed yeast were then screened for growth on media that is deficient for histidine and supplemented with 3-Amino-1,2,4-triazole (3AT) at concentrations of 10 mM, 20 mM, and 40 mM. 3AT is a natural competitive inhibitor of the His3 gene encoded protein imidazoleglycerol-phosphate dehydratase. The baits were also tested for self-activation by measuring growth on SC-His, -Ura, -Trp agar media plates containing 3AT in the absence of the prey library. After 1-2 days of growth at 30 °C, the size of the yeast colonies was scored by an automated system (Hens et al., 2012). Replicated growth of any yeast containing an integrated bait and a transformed prey vector indicates that the DNA-binding domain of the prey molecule interacts with the bait sequence, thereby allowing the GAL4-AD to drive robust expression of the His3 gene (Figure 4.3) (Hens et al., 2012).

Significant interactions between a bait sequence and a prey molecule reported here were those with a p-value equal to or less than 0.01. The p-value is generated by calculating an overall background profile of growth and then comparing "positives" as a multiple of growth over the background. Three out of the four technical replicates need to show the same positive growth profile for a p-value to be assigned.
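The reporting criterion used in Tables 4.2, 4.4, 4.6, and 4.7 (a p-value of 0.01 or less in at least one of the three 3AT conditions) can be written as a small filter. This sketch covers only the thresholding step, not the calculation of the p-values themselves; it assumes that a "-" entry means no p-value was assigned for that condition, and the function names are hypothetical.

```python
P_CUTOFF = 0.01  # significance threshold stated in the Methods


def parse_p(cell):
    """Convert a table cell to a float; '-' is treated as 'no p-value assigned'."""
    cell = cell.strip()
    return None if cell == "-" else float(cell)


def is_significant(cells, cutoff=P_CUTOFF):
    """True if any 3AT condition (10, 20, or 40 mM) has a p-value <= cutoff."""
    return any(p is not None and p <= cutoff for p in map(parse_p, cells))


# Rows from Table 4.2 (yellow Bait 1): Scm and Bgb
print(is_significant(["0.01", "0.004", "0.019"]))  # True
print(is_significant(["-", "0.023", "0.003"]))     # True
# Hypothetical row that would not be reported
print(is_significant(["0.05", "-", "0.2"]))        # False
```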

Acknowledgements

The integration of bait DNAs into the yeast genome and the subsequent yeast one-hybrid screens were done by Julie Russeil of Bart Deplancke's lab. The bait sequence designs were shaped by helpful discussions with Dr. Bart Deplancke.


Table 4.1 yellow Bait Sequences

Bait  Sequence
yB1   CTAAGAAAAAATAGCATTGCATAAATGATATAGAGTCCAAAAACTACACAAA
      TTCAATAGCAGTAATGGTTACATTAGCTTTGAAATTGTTTTTAGACATCCG
yB2   GGTTACATTAGCTTTGAAATTGTTTTTAGACATCCGAAGAAATAAGATTAAAT
      TTAAACGGCATTCTTTAATTTGTATTTTAATATTTTGAGAGGTTTTCCTTATTTA
      AAGTGTAGATTATTGAGGATTAATGC
yB3   GAGGTTTTCCTTATTTAAAGTGTAGATTATTGAGGATTAATGCAAACCACTTTA
      TCTGCGGAGGTCGTAAAACGTATTTTTACCCATTTGCATGTTTATTATGCGTGT
      GGCTGGTTGTATTACTTTAC
yB4   CATGTTTATTATGCGTGTGGCTGGTTGTATTACTTTACTTAAGTTTTGCAATTTTT
      TCTTTAGCAAGCAGGTGCATTTGGGCCAAGAGATATATGCGATCGCTTTCGGTT
      CGAATTTTTAACATTTACTTGCG
yB5   GATATATGCGATCGCTTTCGGTTCGAATTTTTAACATTTACTTGCGGCGATGGTC
      ATTAGAGCATTACCCACTTAGGGCACCCCCAACATCCAGTTGATTTTCAGGGACC
      ACAATATTTTAAATAACAGCTAGTGGAATTACCTAAAAGCG
yB6   CCAGTTGATTTTCAGGGACCACAATATTTTAAATAACAGCTAGTGGAATTACCTA
      AAAGCGCTTTCGTCCCTTTTGAAATTTTATGTAACACTCAATTATATTTATGTATAT
      GTATGCTCAAAATCACCTGCCAATAAC


Table 4.2 Transcription factor preys that interacted with yBE0.6 baits in at least one condition with a p-value ≤0.01
* Tests for interaction were done for three different concentrations of 3-Amino-1,2,4-triazole (3AT): 10, 20, and 40 mM.

yellow Bait 1
Gene Name         10 mM   20 mM   40 mM
Bgb               -       0.023   0.003
cad               0       0.067   -
CG10979           0       0.011   -
CG11695           0       0       0
CG12299           0.045   0.014   0
CG14655           0       0       0
CG15336           -       0       0
CG17612           0.014   0       0
CG3136            -       0.01    0
chm               -       0.0002  -
Scm               0.01    0.004   0.019

yellow Bait 2
Gene Name         10 mM   20 mM   40 mM
Bgb               -       0.016   0.008
CG12054           0.001   0.028   -
CG12299           -       0       0.001
CG15336           -       0       0
CG17359           0.006   0.001   0.011
CG17806           0.025   0       0.08
CG3136            -       0.001   0.003
CG4282            0       0.073   0.401
CG9797            0.021   0.008   0.318
CrebA             0.075   0.004   0.032
Dif ¦ dl          0       0.0002  -
Gsc               0.066   0.009   0.429
HLHm5 ¦ HLHmγ     0.057   0.002   -
Max               0.012   0.001   0.041
pfk               -       0.013   0
run               0.028   0.009   -

yellow Bait 3
Gene Name         10 mM   20 mM   40 mM
btd               0       0       0
btn               0.009   -       -
CG12236           0.007   0.136   -
CG14655           0       -       -
CG15336           -       0       0
CG3136            -       0.006   0.004
CG4496            0       0       0.101
CG6930            -       0       0.243
Kr                0.001   0.048   0.031
Oaz               0.035   0.158   0.006
opa               0.003   0.244   0.492
sd                0.002   0.009   -
Side              0       0       -
woc               0.003   0       0.028

yellow Bait 4
Gene Name         10 mM   20 mM   40 mM
apt               0.037   0.008   -
Bgb               -       0.002   0.009
CG12299           0.003   0       0.003
CG12659           0.004   0       0.075
CG15286           0.002   0.041   0.035
CG15336           0       0       0
CG3136            -       0.002   0.002
CG32830           -       0.001   -
CG4404            0       0       -
CG7963            0.097   0.075   0.001
                  0.001   0.001   0.301
net               0.001   0       0.014
rn                0       0.001   -
sd                0.009   0.362   0.331
tsh               0.005   0.049   -

yellow Bait 5
Gene Name         10 mM   20 mM   40 mM
bap               0.008   0.069   -
CG10949           0.009   0.026   -
CG11906           0.335   0.001   0
CG15336           -       0.019   0
CG15436           -       0.009   0
CG1602            0.166   0       0
CG17257           -       0       0
CG30020           0.067   0       0
CG31875           0.023   0.008   0.227
CG8119            0.009   0.128   -
CG8145            0.264   0.001   0
CG8159            -       0.008   0.071
D ¦ sim           0.007   0.006   0.013
da                0.001   0       0.353
Doc1              0.014   0       0.502
ERR               0.106   0.002   0
exex              0       -       -
l(2)k10201        0.009   0.008   0.266
lmd               0.181   0       0
Sox21a            0.004   0.064   -
tgo ¦ trh         -       0.008   0.298

yellow Bait 6
Gene Name         10 mM   20 mM   40 mM
ac                -       0
Bgb               -       0.007   0
CG12299           -       0.014   0
CG15336           -       -       0
CG3136            -       0.031   0
CG6254            0       0.458   -
CrebB-17A         0       0.066   -
Dif ¦ Rel         0.003   -       -
Hr96              0       0.05    0.24
pad               0       0.262   -
repo              0.006   -       -
rgr               0.002   0.169   -
run               0.008   0.002   -
sna               0.003   -       -
sqz               0       0       0
Sry-β             0       0.546   -
Stat92E           0       0.07    -
Su(var)2-10       0.001   -       -
sv                0       0.003   0.34
vnd               0.001   0       -
wor               0.014   0.007   0.294


Table 4.3 tan Bait Sequences

Oligo Name        Sequence
tb1 top y1h       agcttTCAATAACAAAAATGAGTGCATTTTggggTCAATAACAAAAATGAGTGCATTTTg
tb1 btm y1h       gatccAAAATGCACTCATTTTTGTTATTGAccccAAAATGCACTCATTTTTGTTATTGAa
tbCTRL top y1h    agcttgCcAgAcCcAcAcTtAtTtCcTgTggggggCcAgAcCcAcAcTtAtTtCcTgTgg
tbCTRL btm y1h    gatccCACAGGAAATAAGTGTGGGTCTGGCccccCACAGGAAATAAGTGTGGGTCTGGCa
tb2 top y1h       agcttCTTGCACCATTAGAATATTAGATTTggggCTTGCACCATTAGAATATTAGATTTg
tb2 btm y1h       gatccAAATCTAATATTCTAATGGTGCAAGccccAAATCTAATATTCTAATGGTGCAAGa
tb3 top y1h       agcttTAGAATATTAGATTTTAGTGTTTAAggggTAGAATATTAGATTTTAGTGTTTAAg
tb3 btm y1h       gatccTTAAACACTAAAATCTAATATTCTAccccTTAAACACTAAAATCTAATATTCTAa
tb4 top y1h       agcttATTTTAGTGTTTAAATAAACTAATTggggATTTTAGTGTTTAAATAAACTAATTg
tb4 btm y1h       gatccAATTAGTTTATTTAAACACTAAAATccccAATTAGTTTATTTAAACACTAAAATa
tb5 top y1h       agcttAAATAAACTAATTTGAGAATggggAAATAAACTAATTTGAGAATg
tb5 btm y1h       gatccATTCTCAAATTAGTTTATTTccccATTCTCAAATTAGTTTATTTa
tb6 top y1h       agcttAGAATTCAAGATCATAATATggggAGAATTCAAGATCATAATATg
tb6 btm y1h       gatccATATTATGATCTTGAATTCTccccATATTATGATCTTGAATTCTa
tb7 top y1h       agcttAGATCATAATATGCATACTAATTAGggggAGATCATAATATGCATACTAATTAGg
tb7 btm y1h       gatccCTAATTAGTATGCATATTATGATCTccccCTAATTAGTATGCATATTATGATCTa
tb8 top y1h       agcttTGCATACTAATTAGACAGTCTCTTTggggTGCATACTAATTAGACAGTCTCTTTg
tb8 btm y1h       gatccAAAGAGACTGTCTAATTAGTATGCAccccAAAGAGACTGTCTAATTAGTATGCAa
tb9 top y1h       agcttTAGACAGTCTCTTTTTTTTATTACTggggTAGACAGTCTCTTTTTTTTATTACTg
tb9 btm y1h       gatccAGTAATAAAAAAAAGAGACTGTCTAccccAGTAATAAAAAAAAGAGACTGTCTAa
tb10 top y1h      agcttTTTTTTTTATTACTTCAACTATTCAggggTTTTTTTTATTACTTCAACTATTCAg
tb10 btm y1h      gatccTGAATAGTTGAAGTAATAAAAAAAAccccTGAATAGTTGAAGTAATAAAAAAAAa
tb11 top y1h      agcttACTTCAACTATTCAAATTTGCGTTTggggACTTCAACTATTCAAATTTGCGTTTg
tb11 btm y1h      gatccAAACGCAAATTTGAATAGTTGAAGTccccAAACGCAAATTTGAATAGTTGAAGTa

top = top strand of bait
btm = bottom strand of bait (reverse complement)
* All sequences are oriented in the 5'-3' direction


Table 4.4 Transcription factor preys that interacted with t_MSE baits in at least one condition with a p-value ≤0.01
* Tests for interaction were done for three different concentrations of 3-Amino-1,2,4-triazole (3AT): 10, 20, and 40 mM.
** Transcription factors in red font color indicate those that also interacted with a mutant control bait and that can be considered to be non-specific binding.

tan Bait 1
Gene Name         10 mM   20 mM   40 mM
ato               0.001   0.003   0
bigmax ¦ Mio      0.01    0.032   -
CG12299           0.001   -       -
CG15336           0.01    -       -
CG3136            0.004   -       0.017
CG5669            -       0       0
CG6254            -       0       0.014
chn               0.001   0       0
her               -       0       0.529
kay               0.004   -       -
Mnt               0.004   0.259   -
Side              0.002   0.319   0.487
trr               0.503   0.076   0.001
wor               0.011   0       0
Xbp1              0.002   0.214   -
yem               0.009   0.084   -
zen               0.002   0.219   -

tan Bait 2
Gene Name         10 mM   20 mM   40 mM
Bgb               0.008   -       -
CG15336           0.001   -       -
CG17803           0       0.212   -
CG3136            0.001   -       -
CG31510           0       -       -
CG32105           0.001   0       0
CG33213           0       0.456   -
CG6272            0       0.051   -
CG6686            0       -       -
CG7271            0.001   -       -
CG8159            0       -       -
CG8388            0       0.319   -
CG8765            0.016   0.001   0.001
CG9876            0       -       -
ftz               0.001   0.489   -
HLH4C             0       0.03    0.307
Hr96              0       0.011   0.226
kay               0.003   0.182   -
ken               0       0.568   -
                  0.005   -       -
Pcl               0       -       -
salr              0       -       -
Side              0       0       0.004
Su(z)2            0.001   -       -
toe               0.04    0.001   0
vis               0.005   0.521   0.533

tan Bait 4
Gene Name         10 mM   20 mM   40 mM
Abd-B             0.172   -       0.007
Bgb               -       -       0
CG10979           0.003   -       -
CG11641           0.137   0       0.001
CG12299           -       0.022   0
CG1407            -       0.03    0.003
CG15336           -       -       0
CG17197           0.136   -       0
CG18446           0       0       0.501
CG3136            -       0.051   0
CG9571            -       0.029   0.003
drm               0       0       0.242
emc               0.194   -       0
Ets98B            -       0.055   0.008
fu2               -       -       0.009
HLHm5 ¦ HLH54F    -       0.008   0.003
knrl              0.399   0.004   0.017
MBD-like          0       0       0.041
Mes4              -       -       0.004
p53               -       0.004   -
sd                0.107   0.002   -
Snoo              0.001   0.012   0.042
Sry-β             0       0       0.294

tan Bait 5
Gene Name         10 mM   20 mM   40 mM
bcd               0.001   0       -
Bgb               -       0       -
brk               0       0.001   0.014
CG11085           0.037   0.009   -
CG12299           -       0.001   0
CG15336           -       -       0
CG17197           0       0       0.092
CG17612           0       0       0
CG2116            -       0.007   -
CG30443           0.01    0
CG3136            -       0       0
CG5204            0       0       -
CG8145            0.006   0.209   -
CG8314            0.286   -       0
d4                -       0.084   0
ey                0.514   0.044   0
grh               0       0       0.403
HLHm5 ¦ HLH54F    0.003   0.004   0.141
Hr38              0       0       0
Kr                0       0       0
pnr               0       0       0.275
sisA              0       0.544   -
vvl               0.001   0       0

tan Bait 6
Gene Name         10 mM   20 mM   40 mM
Abd-B             0.004   0.015   0.044
Alh               0.003   0.424   -
Blimp-1           0.003   -       -
C15               0.007   0.031   -
CG12054           0       0       0.241
CG12299           -       0.003   -
CG15336           -       0.004   -
CG18764           0       0.384   -
CG7372            0.003   0.156   -
CG8159            0       0.001   0.567
CG9139            0       0       0
chn               -       0.037   0.009
Dsp1              0       0       0
dve               0       0       0
E(spl)            0       0       0.498
gfzf              0       0.001   0.046
l(2)k10201        0       0.004   -
pnr               0       0.222   -
Side              0       0       0.059
sima              0       0       0.427
Snoo              -       0.028   0.009
Su(z)12           0.005   -       -
tio               0.011   0.009   0.059
Tkr               -       -       0

tan Bait 7
Gene Name         10 mM   20 mM   40 mM
ATbp              -       0.375   0.001
Bgb               -       0.004   0
brk               0.047   -       0
cg                0.052   0.002   0.001
CG12299           -       0.001   0
CG14117           0       0.001   0
CG15336           -       -       0.002
CG17186           0.003   0.027   0.114
CG18011           0.058   0.003   0.222
CG31169           -       -       0.001
CG3136            -       0.011   0
CG3815            -       0.005   -
CG4956            0.022   0       0.001
CG8089            0       0       0
CG9305            -       -       0
comr              0.203   -       0.005
crm               -       -       0.005
D ¦ vvl           0.012   0.006   0
EV                0       0.109   -
fru               -       0.007   0.002
Hmr               -       -       0
Hr96              0       -       -
klu               0.484   0.002
MBD-like          0       0       0
mod               0       0       0.001
Pep               0.006   0       -
Ravus             0.008   -       -
ro                0.001   0       0.001
Sox15             0.512   0.012   0.001
Tusp              0.219   -       0.001
unpg              0       0.308   0.676

tan Bait 8
Gene Name         10 mM   20 mM   40 mM
Abd-B             0.021   0       0.052
BEAF-32           0.005   0.227   0.143
Bgb               -       0.006   0
CG12299           -       0       0
CG3136            -       0.002   0
CG3281            0.082   0.002   0.002
CG9797            -       0       0
CHES-1-like       0.001   0       0.002
D ¦ sim           0.177   0.009   -
dm                0       0       0
EV                0       0.046   -
fkh               -       0       0
OdsH              0       0.266   -
phol              0.071   0.001   0
sima              -       0.087   0

tan Bait 9
Gene Name         10 mM   20 mM   40 mM
brk               0.001   0.104   0.046
cad               0.001   0.073   -
cbt               -       0.038   0.008
CG10543           0.001   0.004   0.164
CG17197           -       0.045   0.002
CG3136            -       0.005   0.015
CG32767           0       0       0
CG34376           -       0.034   0.006
CHES-1-like       0.009   0.009   -
E(spl) ¦ Eip78C   0.009   0.003   0.032
Eip75B            0.008   0.117   0.071
H                 0       0.006   0.097
Jra ¦ kay         0.009   0.056   0.049
MBD-like          0.004   0.21    0.392
opa               0.004   0.166   0.075
Ravus             0.011   0.01    0.008
run               0.004   0.004   0.006
Sox100B           0.01    0.009   -

tan Bait 11
Gene Name         10 mM   20 mM   40 mM
cad               0.004   0.047   0.17
CG10431           0.057   0       0
CG14117           0.005   0.013   0.029
CG9305            0.01    0.017   0.104
CHES-1-like       0.01    0.035   0.161
CTCF              0.009   0.012   0.125
ey                -       0.017   0
lab               0.002   0.013   0.156
ovo               -       -       0
sd                -       0.042   0


Table 4.5 DRE bait sequence oligos

Oligo Name        Sequence
“D”anc Top        agcttATAACTTCTTGCTCTGCGGTCggggATAACTTCTTGCTCTGCGGTCg
“D”anc Bottom     gatccGACCGCAGAGCAAGAAGTTATccccGACCGCAGAGCAAGAAGTTATa
“D”der Top        agcttATAACTTCTTCCTCTGCGGTCggggATAACTTCTTCCTCTGCGGTCg
“D”der Bottom     gatccGACCGCAGAGGAAGAAGTTATccccGACCGCAGAGGAAGAAGTTATa
“D”SM Top         agcttAgAcCgTaTgCaTaTtCtGgCggggAgAcCgTaTgCaTaTtCtGgCg
“D”SM Bottom      gatccGcCaGaAtAtGcAtAcGgTcTccccGcCaGaAtAtGcAtAcGgTcTa
“F”anc Top        agcttGTGGACTGGGGATGTGTGGCGCggggGTGGACTGGGGATGTGTGGCGCg
“F”anc Bottom     gatccGCGCCACACATCCCCAGTCCACccccGCGCCACACATCCCCAGTCCACa
“F”der Top        agcttGTGGACTGGGTTTGTGTGGCGCggggGTGGACTGGGTTTGTGTGGCGCg
“F”der Bottom     gatccGCGCCACACAAACCCAGTCCACccccGCGCCACACAAACCCAGTCCACa
“L”anc Top        agcttTTAGTTGATTAAGGGCGTGGCggggTTAGTTGATTAAGGGCGTGGCg
“L”anc Bottom     gatccGCCACGCCCTTAATCAACTAAccccGCCACGCCCTTAATCAACTAAa
“L”der Top        agcttTTAGTTGATTGAGGGCGTGGCggggTTAGTTGATTGAGGGCGTGGCg
“L”der Bottom     gatccGCCACGCCCTCAATCAACTAAccccGCCACGCCCTCAATCAACTAAa

top = top strand of bait
btm = bottom strand of bait (reverse complement)
* All sequences are oriented in the 5'-3' direction


Table 4.6 Transcription factor preys that interacted with dimorphic element baits in at least one condition with a p-value ≤0.01
* Tests for interaction were done for three different concentrations of 3-Amino-1,2,4-triazole (3AT): 10, 20, and 40 mM.
** Transcription factors in red font color indicate those that also interacted with a mutant control bait and that can be considered to be non-specific binding.
*** Transcription factors in blue font color indicate those that bound both the ancestral and derived baits for a given mutation.

'D' Ancestral Bait
Gene Name         10 mM   20 mM   40 mM
Cenp-C            0.007   0       0
CG11071           0.001   0       0
CG17257           0.044   0       0.25
CG18619           0.015   0       0
CG7045            0.001   0       0.286
CG7928            0.006   0.19    -
hbn               0       0       0.53
HLH106            0.024   0       0
Hmr               -       0.006   -
Hr78              0.035   0.004   0.53
Myb               0.001   0       0.502
odd               -       0.002   0.001
retn              0.004   -       -
Rfx               0.003   0.002   0.525
rn                0       0       0
Side              0       0       0.029
tup               -       0       0

'D' Derived Bait
Gene Name         10 mM   20 mM   40 mM
Camta             0.019   0       0
caup              0.097   0       0.011
CG10959           0.002   0.19    -
CG12299           -       -       0.005
CG12659           -       0.007   -
CG14117           0.007   0.004   -
CG17802           0.039   0       -
CG17803           0       0.001   0.13
CG31835           0.007   -       -
CG4496            0       0       0
CG9139            0.267   0.074   0.001
ey                0.008   0.005   0.026
grau              0.018   0.001   0.035
Myb               0       0       0.003
Pep               0.004   0       0.025
sqz               0       0.001   0.1
Sry-β             0.005   0.33    0.654
Su(var)2-10       0       0.001   0.025

'F' Ancestral Bait
Gene Name         10 mM   20 mM   40 mM
CG10321           0.003   0.001   0.063
CG12299           -       -       0.005
CG31510           0.002   0.022   0.035
CG7928            0       0       0.009
chn               0.006   0.031   -
cyc               0.004   0.024   -
egg               -       0       0.03
Mes4              0.005   -       -
PHDP              0.025   0.033   0.003
pnr               -       0.022   0.009
Rpd3              0.002   0       0.014
Side              0.006   0.112   -
Snoo              0.005   -       -
Xbp1              0.005   0.015   0.075
Z4                0.006   0.008

'F' Derived Bait
Gene Name         10 mM   20 mM   40 mM
ac                0.022   0.022   0.007
bap               0.006   0.107   -
bi                0.029   0.005   0.167
Blimp-1           0       0       0.006
CG12075           0.048   0.004   0.078
CG12942           0       -       0
CG13897           -       0.022   0
CG14860           -       0.021   0.001
CG15269           -       0.09    0.006
CG15715           -       -       0.007
CG17612           0       -
CG18446           0       0       0.309
CG18764           0.002   0.008
CG31666           0.15    0.003   0.141
CG33213           0       -       -
CG33557           -       0.009   -
CG7271            0.006   0.006   0.022
CG7963            0.006   0.028   0.634
CG9895            0.006   0.051   -
croc              -       -       0.004
dmrt11E           -       0.043   0.001
drm               0       0.007   0
ewg               -       -       0.001
exd               0       0.001   0
gem               0       0       0
HLHmγ             0       0       0.001
ind               0.002   0       0
jing              0.032   0.015   0
Kr-h1             -       0.002   0.004
lz                0       0.011   0.188
mod               0.015   0.009   0.075
pdm2              0.004   0.508   0.465
SoxN              -       -       0.001
sr                0       0       0
Su(var)2-10       0.001   0       0.006
TFAM              -       0.022   0.001
wek               0.089   0.004   0.052
wor               0.004   0.002   0.007

'L' Ancestral Bait
Gene Name         10 mM   20 mM   40 mM
Aef1              0       0       0.006
                  0       -       -
Bgb               -       0.022   0.01
Blimp-1           0       0.018   -
Camta             0.137   -       0.002
Cenp-C            0.032   0.007   0.007
CG10147           -       -       0.003
CG10209           0       0       0.294
CG10654           0.001   0.276   -
CG11695           0       0.008   -
CG12299           -       0.001   0.005
CG13287           -       -       0.004
CG15336           0.136   0.015   0.001
CG15455           0.027   0       -
CG15696           -       0.054   0.008
CG16801           -       -       0.005
CG17612           0.152   0.007   0.133
CG2790            0.008   0.06    -
CG3136            -       0.007   0
CG3891            0.008   -       -
CG5204            0.002   -       -
CG9895            0.036   0.004   0.01
D                 -       0.021   0.004
Dll               -       -       0.01
Doc3              0       0       0.214
drm               0       -       -
exd               0.003   -       -
Fer2              0.107   0.162   0.001
gl                0.009   -       -
Hey               -       -       0.003
lbl               0.003   -       0.003
mor               0       -       0
Oaz               0       0       0.001
salm              0.003   -       0.003
seq               0       0.007   -
Side              0.002   0.155   -
sqz               0       0       0.006
sr                0       0       0.037
Su(var)3-7        0.052   0.002   0.004
Z4                0       0.099   -

'L' Derived Bait
Gene Name         10 mM   20 mM   40 mM
ac                0       0       0.049
Aef1              0       0       0.194
bab2              -       0.008   -
Bgb               -       0.001   0
Blimp-1           0       0.002   -
btd               0.003   0.624   -
CG10348           0       0.071   -
CG12236           -       0.007   -
CG12299           -       0.004   0.003
CG15336           -       -       0
CG15715           0.008   0.021   0
CG16779           0.002   0       0.001
CG17361           0.003   0.026   0.578
CG17612           0.02    0.001   -
CG18599           0       0.005   -
CG3136            0.024   0.004   0
CG9876            -       0.001   -
Dif               0.039   0.003   -
dom               0.002   0.014   -
drm               0       0       0.112
dsx               0.008   0.051   0.132
E5                0.003   0.053   -
Eip78C            0.005   0.143   0.326
ey                0.013   0.031   0.009
grh               0       0       0
Hey ¦ dpn         0.001   0.008   -
Hr83              -       0.021   0.008
nerfin-2          0       0.023   0.224
Neu2              0       0       0
Pcl               0.022   0.007   0.001
Poxn              0       0.02    0.184
run               0       0.368   -
Side              0.001   0.31    -
Sox14             0.005   -       -
vvl               0       0.008   -


Table 4.7 Transcription factor preys that interacted with control baits in at least one condition with a p-value ≤0.01.
* Tests for interaction were done at three different concentrations of 3-Amino-1,2,4-triazole (3AT): 10, 20, and 40 mM.
** Transcription factors in bold font are those that were found to interact with one non-mutant bait, and those in bold italic font are those that were found to interact with more than one non-mutant bait.

Bait: Control tan

Gene Name     10mM    20mM    40mM
Bgb           -       -       0.003
bigmax        0.001   0.007   0.084
Blimp-1       0.03    0.013   0.009
CG12299       -       0.005   0.01
CG17186       0.006   -       -
CG3136        -       0.018   0.001
CG31955       0.007   0       0.024
CG9305        0.002   0.143   0.038
CG9571        0.002   0.016   0.025
cic           0.009   -       0.039
Dif ¦ Rel     0.001   0.111   -
dmrt93B       0.195   0.001   0.074
H             0.014   0.008   0.273
Jra ¦ kay     0.004   0.006   0.119
pfk           0.002   0.001   -
Pph13 ¦ ey    0.013   0       0.02
tgo ¦ dys     0.076   0.009   -
trh           0.003   0.002   -

Bait: 'D' Derived Mutant

Gene Name     10mM    20mM    40mM
Bgb           0.013   0.001   0.006
Blimp-1       0.052   0.004   0.002
CG1024        0       -       -
CG10267       0.001   -       -
CG10565       0       -       -
CG11641       0       -       -
CG12299       0.002   0.006   0.036
CG15216       0       0       0
CG30077       0       -       -
CG32105       0.074   -       -
CG3281        0.001   -       -
CG8478        0.01    0.204   0.359
CHES-1-like   -       0.007   0.004
d4            0.006   -       -
gem           0.005   0.076   -
Hr38          0.003   -       -
Mnf           0       0.074   -
Rbsn          0       -       -
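
The inclusion criterion stated in the caption of Table 4.7 (a p-value ≤0.01 in at least one of the three 3AT conditions) can be written as a simple filter. The Python sketch below is illustrative only and is not the analysis pipeline used in this work. It assumes that a dash means no p-value was reported for that condition, which the table itself does not state. The function names and the row named `hypothetical_TF` are invented for the example; the other two rows are copied from the control tan table.

```python
from typing import Optional, Sequence

THRESHOLD = 0.01  # cutoff stated in the Table 4.7 caption


def parse_p(cell: str) -> Optional[float]:
    """Convert a table cell to a float; '-' is treated as missing (assumption)."""
    cell = cell.strip()
    return None if cell == "-" else float(cell)


def passes(p_values: Sequence[Optional[float]], threshold: float = THRESHOLD) -> bool:
    """Return True if any reported p-value is at or below the threshold."""
    return any(p is not None and p <= threshold for p in p_values)


# Cells are ordered as in the table: 10 mM, 20 mM, 40 mM 3AT.
rows = {
    "Bgb": ["-", "-", "0.003"],
    "bigmax": ["0.001", "0.007", "0.084"],
    "hypothetical_TF": ["0.2", "-", "0.05"],  # invented row that should be excluded
}

hits = sorted(
    name for name, cells in rows.items()
    if passes([parse_p(c) for c in cells])
)
print(hits)
```

Treating a dash as missing rather than as a non-significant value only matters if a different summary, such as a minimum across conditions, is computed later; for the "at least one condition" rule the two readings give the same result.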


Table 4.8 Forward and reverse primer pairs used to PCR amplify yBE0.6 sequences for inclusion as bait sequence

Primer Name       Sequence
yBE Bait 1 Fwd    TTCCGggatccCTAAGAAAAAATAGCATTGCATAAATG
yBE Bait 1 Rvs    TTGCCaagcttCGGATGTCTAAAAACAATTTCAAAGC
yBE Bait 2 Fwd    TTCCGggatccGGTTACATTAGCTTTGAAATTG
yBE Bait 2 Rvs    TTGCCaagcttGCATTAATCCTCAATAATCTACAC
yBE Bait 3 Fwd    TTCCGggatccGAGGTTTTCCTTATTTAAAGTGTAG
yBE Bait 3 Rvs    TTGCCaagcttGTAAAGTAATACAACCAGCCACACGC
yBE Bait 4 Fwd    TTCCGggatccCATGTTTATTATGCGTGTGGC
yBE Bait 4 Rvs    TTGCCaagcttCGCAAGTAAATGTTAAAAATTCG
yBE Bait 5 Fwd    TTCCGggatccGATATATGCGATCGCTTTCGGTTCG
yBE Bait 5 Rvs    TTGCCaagcttCGCTTTTAGGTAATTCCACTAGC
yBE Bait 6 Fwd    TTCCGggatccCCAGTTGATTTTCAGGGACCAC
yBE Bait 6 Rvs    TTGCCaagcttGTTATTGGCAGGTGATTTTGAGCATAC
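
Each primer in Table 4.8 appears to have three parts: a short uppercase 5' extension, a lowercase hexamer, and an uppercase region that anneals to yBE0.6. The lowercase hexamers match the standard recognition sequences of BamHI (GGATCC) and HindIII (AAGCTT). This three-part reading is an interpretation of the table and is not stated explicitly in this section. The Python sketch below, with hypothetical function names, splits a primer into these parts and checks that the annealing regions of the Bait 1 pair occur in the yBE0.6 sequence given in Appendix A (positions 51 to 200 are copied here).

```python
import re

# Standard recognition sequences; their use for cloning is assumed, not stated.
SITES = {"GGATCC": "BamHI", "AAGCTT": "HindIII"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def split_primer(primer: str):
    """Split UPPER-lower-UPPER primer text into (extension, site, annealing)."""
    match = re.fullmatch(r"([ACGT]+)([acgt]+)([ACGT]+)", primer)
    if match is None:
        raise ValueError("primer does not follow the UPPER-lower-UPPER layout")
    extension, site, annealing = match.groups()
    return extension, site.upper(), annealing


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# yBE0.6 positions 51-200 as numbered in Appendix A.
YBE_51_200 = (
    "GAACCACTTCTAAGAAAAAATAGCATTGCATAAATGATATAGAGTCCAAA"
    "AACTACACAAATTCAATAGCAGTAATGGTTACATTAGCTTTGAAATTGTT"
    "TTTAGACATCCGAAGAAATAAGATTAAATTTAAACGGCATTCTTTAATTT"
)

fwd = split_primer("TTCCGggatccCTAAGAAAAAATAGCATTGCATAAATG")
rvs = split_primer("TTGCCaagcttCGGATGTCTAAAAACAATTTCAAAGC")

# Convert 0-based offsets in the excerpt to Appendix A positions.
start = YBE_51_200.find(fwd[2]) + 51
rvs_site = reverse_complement(rvs[2])
end = YBE_51_200.find(rvs_site) + len(rvs_site) - 1 + 51

print(SITES[fwd[1]], SITES[rvs[1]], start, end)
```

The same check can be repeated for the other five primer pairs by extending the excerpt to cover the remainder of the Appendix A sequence.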


BIBLIOGRAPHY

Abouheif, E. (2008). Parallelism as the pattern and process of mesoevolution. Evol. Dev. 10, 3–5.

Abouheif, E., Fave, M.-J., Ibarraran-Viniegra, A.S., Lesoway, M.P., Rafiqi, A.M., and Rajakumar, R. (2014). Eco-Evo-Devo: The Time Has Come. In Advances in Experimental Medicine and Biology, C.R. Landry, and N. Aubin-Horth, eds. (Springer), pp. 107–126.

Abràmoff, M.D., Hospitals, I., Magalhães, P.J., and Abràmoff, M. (2004). Image Processing with ImageJ. Biophotonics Int. 11, 36–42.

Amano, T., Sagai, T., Tanabe, H., Mizushina, Y., Nakazawa, H., and Shiroishi, T. (2009). Chromosomal dynamics at the Shh locus: limb bud-specific differential regulation of competence and active transcription. Dev. Cell 16, 47–57.

Arnone, M.I., and Davidson, E.H. (1997). The hardwiring of development: organization and function of genomic regulatory systems. Development 124, 1851–1864.

Arnosti, D.N., Barolo, S., Levine, M., and Small, S. (1996). The eve stripe 2 enhancer employs multiple modes of transcriptional synergy. Development 122, 205–214.

Arnoult, L., Su, K.F.Y., Manoel, D., Minervino, C., Magriña, J., Gompel, N., and Prud‟homme, B. (2013). Emergence and diversification of fly pigmentation through evolution of a gene regulatory module. (Supplement). Science 339, 1423–1426.

Averof, M., and Patel, N.H. (1997). Crustacean appendage evolution associated with changes in Hox gene expression. Nature 388, 682–686.

Baanannou, A., Mojica-Vazquez, L.H., Darras, G., Couderc, J.-L., Cribbs, D.L., Boube, M., and Bourbon, H.-M. (2013). Drosophila distal-less and Rotund bind a single enhancer ensuring reliable and robust bric-a-brac2 expression in distinct limb morphogenetic fields. PLoS Genet. 9, e1003581.

Baird, G.S., Zacharias, D. a, and Tsien, R.Y. (2000). Biochemistry, mutagenesis, and oligomerization of DsRed, a red fluorescent protein from coral. Proc. Natl. Acad. Sci. U. S. A. 97, 11984–11989.

157

Baker, B.S., and Ridge, K.A. (1980). Sex and the single cell. i. on the action of major loci affecting sex determination in. Genetics 94, 383–423.

Baker, B.S., Hoff, G., Kaufman, T.C., Wolfner, M.F., and Hazelrigg, T. (1991). The doublesex locus of Drosophila melanogaster and its flanking regions: a cytogenetic analysis. Genetics 127, 125–138.

Barmina, O., and Kopp, A. (2007). Sex-specific expression of a HOX gene associated with rapid morphological evolution. Dev. Biol. 311, 277–286.

Barolo, S., Castro, B., and Posakony, J.W. (2004). New Drosophila transgenic reporters: insulated P-element vectors expressing fast-maturing RFP. Biotechniques 36, 436–440, 442.

Bickel, R.D., Kopp, A., and Nuzhdin, S. V (2011). Composite effects of polymorphisms near multiple regulatory elements create a major-effect QTL. PLoS Genet. 7, e1001275.

Bischof, J., Maeda, R.K., Hediger, M., Karch, F., and Basler, K. (2007). An optimized transgenesis system for Drosophila using germ-line-specific phiC31 integrases. Proc. Natl. Acad. Sci. U. S. A. 104, 3312–3317.

Bonn, S., and Furlong, E.E.M. (2008). cis-Regulatory networks during development: a view of Drosophila. Curr. Opin. Genet. Dev. 18, 513–520.

Boyd, J.L., Skove, S.L., Rouanet, J.P., Pilaz, L.-J., Bepler, T., Gordân, R., Wray, G.A., and Silver, D.L. (2015). Human-chimpanzee differences in a FZD8 enhancer alter cell- cycle dynamics in the developing neocortex. Curr. Biol. 25, 772–779.

Calhoun, V.C., and Levine, M. (2003). Long-range enhancer-promoter interactions in the Scr-Antp interval of the Drosophila Antennapedia complex. Proc. Natl. Acad. Sci. U. S. A. 100, 9878–9883.

Calhoun, V.C., Stathopoulos, A., and Levine, M. (2002). Promoter-proximal tethering elements regulate enhancer-promoter specificity in the Drosophila Antennapedia complex. Proc. Natl. Acad. Sci. U. S. A. 99, 9243–9247.

Callaerts, P., Halder, G., and Gehring, W.J. (1997). PAX-6 in development and evolution. Annu. Rev. Neurosci. 20, 483–532.

Calleja, M., Herranz, H., Estella, C., Casal, J., Lawrence, P., Simpson, P., and Morata, G. (2000). Generation of medial and lateral dorsal body domains by the pannier gene of Drosophila. Development 127, 3971–3980.

Camino, E.M., Butts, J.C., Ordway, A., Vellky, J.E., Rebeiz, M., and Williams, T.M. (2015). The Evolutionary Origination and Diversification of a Dimorphic Gene Regulatory Network through Parallel Innovations in cis and trans. PLOS Genet. 11, e1005136. 158

Carroll, S.B. (1995). Homeotic genes and the evolution of arthropods and cordates. Nature 376, 479–485.

Carroll, S.B. (2008). Perspective Evo-Devo and an Expanding Evolutionary Synthesis : A Genetic Theory of Morphological Evolution. Cell 25–36.

Carroll, S.B., Grenier, J.K., and Weatherbee, S.D. (2004). From DNA to diversity.

Carroll, S.B., Grenier, J.K., and Weatherbee, S.D. (2005). From DNA to Diversity: Molecular Genetics and the Evolution of Animal Design (Malden, MA: Blackwell Publishing).

Celniker, S.E., and Lewis, E.B. (1993). Molecular basis of transabdominal--a sexually dimorphic mutant of the bithorax complex of Drosophila. Proc. Natl. Acad. Sci. U. S. A. 90, 1566–1570.

Certel, K., Hudson, A., Carroll, S., and Johnson, W. (2000). Restricted patterning of vestigial expression in Drosophila wing imaginal discs requires synergistic activation by both Mad and the drifter POU domain transcription factor. Development 127, 3173– 3183.

Chan, Y.F., Marks, M.E., Jones, F.C., Villarreal, G., Shapiro, M.D., Brady, S.D., Southwick, A.M., Absher, D.M., Grimwood, J., Schmutz, J., et al. (2010). Adaptive evolution of pelvic reduction in sticklebacks by recurrent deletion of a Pitx1 enhancer. Science 327, 302–305.

Clark, A.G., Eisen, M.B., Smith, D.R., Bergman, C.M., Oliver, B., Markow, T.A., Kaufman, T.C., Kellis, M., Gelbart, W., Iyer, V.N., et al. (2007). Evolution of genes and genomes on the Drosophila phylogeny. Nature 450, 203–218.

Claussnitzer, M., Dankel, S.N., Kim, K.-H., Quon, G., Meuleman, W., Haugen, C., Glunk, V., Sousa, I.S., Beaudry, J.L., Puviindran, V., et al. (2015). FTO Obesity Variant Circuitry and Adipocyte Browning in Humans. N. Engl. J. Med. 373, 150819140043007.

Cline, T.W. (1988). Evidence that sisterless-a and sisterless-b are two of several discrete “numerator elements” of the X/A sex determination signal in Drosophila that switch Sxl between two alternative stable expression states. Genetics 119, 829–862.

Cohn, M.J., and Tickle, C. (1999). Developmental basis of limblessness and axial patterning in snakes. Nature 399, 474–479.

Cooper, K.L., Sears, K.E., Uygur, A., Maier, J., Baczkowski, K.-S., Brosnahan, M., Antczak, D., Skidmore, J. a., and Tabin, C.J. (2014). Patterning and post-patterning modes of evolutionary digit loss in mammals. Nature.

Cretekos, C.J., Wang, Y., Green, E.D., Martin, J.F., Rasweiler, J.J., and Behringer, R.R. (2008). Regulatory divergence modifies limb length between mammals. Genes Dev. 22, 159

141–151.

Crocker, J., Tamori, Y., and Erives, A. (2008). Evolution acts on enhancer organization to fine-tune gradient threshold readouts. PLoS Biol. 6, e263.

Davidson, E.H. (2006a). The Regulatory Genome: Gene Regulatory Networks In Development And Evolution (Burlington, MA: Elsevier Inc.).

Davidson, E.H. (2006b). The Regulatory Genome: Gene Regulatory Networks In Development And Evolution (Burlington, MA: Elsevier Inc.).

Davidson, E.H. (2010). The Regulatory Genome: Gene Regulatory Networks In Development And Evolution (Academic Press).

Davidson, E.H., and Erwin, D.H. (2006). Gene regulatory networks and the evolution of animal body plans. Science 311, 796–800.

Dekker, J., Marti-Renom, M.A., and Mirny, L.A. (2013). Exploring the three- dimensional organization of genomes: interpreting chromatin interaction data. Nat. Rev. Genet. 14, 390–403.

Deng, W., Lee, J., Wang, H., Miller, J., Reik, A., Gregory, P.D., Dean, A., and Blobel, G.A. (2012). Controlling Long-Range Genomic Interactions at a Native Locus by Targeted Tethering of a Looping Factor. Cell 149, 1233–1244.

Deplancke, B., Dupuy, D., Vidal, M., and Walhout, A.J.M. (2004). A gateway- compatible yeast one-hybrid system. Genome Res. 14, 2093–2101.

Deplancke, B., Vermeirssen, V., Arda, H.E., Martinez, N.J., and Walhout, A.J.M. (2006). Gateway-compatible yeast one-hybrid screens. CSH Protoc. 2006, pdb.prot4590 – .

Duffy, J.B., and Gergen, J.P. (1991). The Drosophila segmentation gene runt acts as a position-specific numerator element necessary for the uniform expression of the sex- determining gene Sex-lethal. Genes Dev. 5, 2176–2187.

Duncan, I., and Montgomery, G. (2002a). E. B. Lewis and the bithorax complex: part I. Genetics 160, 1265–1272.

Duncan, I., and Montgomery, G. (2002b). E. B. Lewis and the bithorax complex: part II. From cis-trans test to the genetic control of development. Genetics 161, 1–10.

Erives, A., and Levine, M. (2004). Coordinate enhancers share common organizational features in the Drosophila genome. Proc. Natl. Acad. Sci. U. S. A. 101, 3851–3856.

Frankel, N., Erezyilmaz, D.F., McGregor, A.P., Wang, S., Payre, F., and Stern, D.L. (2011). Morphological evolution caused by many subtle-effect substitutions in regulatory DNA. Nature 474, 598–603.

160

Frazer, K.A., Pachter, L., Poliakov, A., Rubin, E.M., and Dubchak, I. (2004). VISTA: computational tools for comparative genomics. Nucleic Acids Res. 32, W273–W279.

Gebelein, B., Culi, J., Ryoo, H.D., Zhang, W., and Mann, R.S. (2002). Specificity of Distalless repression and limb primordia development by abdominal Hox proteins. Dev. Cell 3, 487–498.

Gebelein, B., McKay, D.J., and Mann, R.S. (2004). Direct integration of Hox and segmentation gene inputs during Drosophila development. Nature 431, 653–659.

Ghiasvand, N.M., Rudolph, D.D., Mashayekhi, M., Brzezinski, J. a, Goldman, D., and Glaser, T. (2011). Deletion of a remote enhancer near ATOH7 disrupts retinal neurogenesis, causing NCRNA disease. Nat. Neurosci. 14, 578–586.

Gompel, N., Prud‟homme, B., Wittkopp, P.J., Kassner, V.A., and Carroll, S.B. (2005). Chance caught on the wing: cis-regulatory evolution and the origin of pigment patterns in Drosophila. Nature 433, 481–487.

Groth, A.C., Fish, M., Nusse, R., and Calos, M.P. (2004). Construction of Transgenic Drosophila by Using the Site-Specific Integrase From Phage phiC31. Genetics 166, 1775–1782.

Guerreiro, I., Nunes, A., Woltering, J.M., Casaca, A., Nóvoa, A., Vinagre, T., Hunter, M.E., Duboule, D., and Mallo, M. (2013). Role of a polymorphism in a Hox/Pax- responsive enhancer in the evolution of the vertebrate spine. Proc. Natl. Acad. Sci. U. S. A. 110, 10682–10686.

Halder, G., Callaerts, P., and Gehring, W. (1995). Induction of ectopic eyes by targeted expression of the eyeless gene in Drosophila. Science (80-. ). 267, 1788–1792.

Hedley, M.L., Amrein, H., and Maniatis, T. (1995). An amino acid sequence motif sufficient for subnuclear localization of an arginine/serine-rich splicing factor. Proc. Natl. Acad. Sci. U. S. A. 92, 11524–11528.

Hens, K., Feuz, J.-D., Isakova, A., Iagovitina, A., Massouras, A., Bryois, J., Callaerts, P., Celniker, S.E., and Deplancke, B. (2011). Automated protein DNA interactions screening of Drosophila regulatory elements. Nat. Methods 8, 1065–1070.

Jeong, S., Rokas, A., and Carroll, S.B. (2006). Regulation of body pigmentation by the Abdominal-B Hox protein and its gain and loss in Drosophila evolution. Cell 125, 1387– 1399.

Jeong, S., Rebeiz, M., Andolfatto, P., Werner, T., True, J., and Carroll, S.B. (2008). The evolution of gene regulation underlies a morphological difference between two Drosophila sister species. Cell 132, 783–793.

Johnson, W.A., and Hirsh, J. (1990). Binding of a Drosophila POU-domain protein to a 161

sequence element regulating gene expression in specific dopaminergic neurons. Nature 343, 467–470.

Junell, A., Uvell, H., Davis, M.M., Edlundh-Rose, E., Antonsson, A., Pick, L., and Engström, Y. (2010). The POU transcription factor Drifter/Ventral veinless regulates expression of Drosophila immune defense genes. Mol. Cell. Biol. 30, 3672–3684.

Kalay, G., and Wittkopp, P.J. (2010). Nomadic enhancers: tissue-specific cis-regulatory elements of yellow have divergent genomic positions among Drosophila species. PLoS Genet. 6, e1001222.

Khila, A., Abouheif, E., and Rowe, L. (2009). Evolution of a novel appendage ground plan in water striders is driven by changes in the Hox gene Ultrabithorax. PLoS Genet. 5, e1000583.

Kimura, M., and Ohta, T. (1974). On some principles governing molecular evolution. Proc. Natl. Acad. Sci. U. S. A. 71, 2848–2852.

Kopp, A. (2006). Basal relationships in the Drosophila melanogaster species group. Mol. Phylogenet. Evol. 39, 787–798.

Kopp, A., and Duncan, I. (2002). Anteroposterior patterning in adult abdominal segments of Drosophila. Dev. Biol. 242, 15–30.

Kopp, A., and True, J.R. (2002). Phylogeny of the Oriental Drosophila melanogaster species group: a multilocus reconstruction. Syst. Biol. 51, 786–805.

Kopp, A., Duncan, I., and Carroll, S.B. (2000). Genetic control and evolution of sexually dimorphic characters in Drosophila. Nature 408, 553–559.

Kopp, A., Graze, R.M., Xu, S., Carroll, S.B., and Nuzhdin, S. V (2003). Quantitative Trait Loci Responsible for Variation in Sexually Dimorphic Traits in Drosophila melanogaster. Genetics 787, 771–787.

Kulzer, J.R., Stitzel, M.L., Morken, M.A., Huyghe, J.R., Fuchsberger, C., Kuusisto, J., Laakso, M., Boehnke, M., Collins, F.S., and Mohlke, K.L. (2014). A Common Functional Regulatory Variant at a Type 2 Diabetes Locus Upregulates ARAP1 Expression in the Pancreatic Beta Cell. Am. J. Hum. Genet. 1–12.

Kvon, E.Z., Kazmar, T., Stampfel, G., Yáñez-Cuna, J.O., Pagani, M., Schernhuber, K., Dickson, B.J., and Stark, A. (2014). Genome-scale functional characterization of Drosophila developmental enhancers in vivo. Nature advance on.

Lachaise, D., Harry, M., Solignac, M., Lemeunier, F., Bénassi, V., and Cariou, M.L. (2000). Evolutionary novelties in islands: Drosophila santomea, a new melanogaster sister species from São Tomé. Proc. Biol. Sci. 267, 1487–1495.

162

Lagha, M., Bothma, J.P., Esposito, E., Ng, S., Stefanik, L., Tsui, C., Johnston, J., Chen, K., Gilmour, D.S., Zeitlinger, J., et al. (2013). Paused Pol II coordinates tissue morphogenesis in the Drosophila embryo. Cell 153, 976–987.

Lettice, L.A., Heaney, S.J.H., Purdie, L.A., Li, L., de Beer, P., Oostra, B.A., Goode, D., Elgar, G., Hill, R.E., and de Graaff, E. (2003). A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet. 12, 1725–1735.

Liang, H.-L., Nien, C.-Y., Liu, H.-Y., Metzstein, M.M., Kirov, N., and Rushlow, C. (2008). The zinc-finger protein Zelda is a key activator of the early zygotic genome in Drosophila. Nature 456, 400–403.

Li-Kroeger, D., Witt, L., Grimes, H.L., Cook, T.A., and Gebelein, B. (2008). Hox and senseless antagonism functions as a molecular switch to regulate EGF secretion in the Drosophila PNS. Dev. Cell 15, 298–308.

Li-kroeger, D., Witt, L.M., Grimes, H.L., and Cook, T.A. (2009). NIH Public Access. 15, 298–308.

Maniatis, T., Fritsch, E.F., Lauer, J., and Lawn, R.M. (1980). The molecular genetics of human hemoglobins. Annu. Rev. Genet. 14, 145–178.

Maniatis, T., Goodbourn, S., and Fischer, J. (1987). Regulation of inducible and tissue- specific gene expression. Science (80-. ). 236 , 1237–1245.

Mann, R., Joshi, R., and Lelli, K. (2010). Hox Specificity: Unique Roles for Cofactors and Collaborators. 2153.

Mann, R.S., Lelli, K.M., and Joshi, R. (2009). Hox Specificity: Unique Roles for Cofactors and Collaborators. Curr. Top. Dev. Biol. 88, 63–101.

Markow, T.A., and O‟Grady, P.M. (2006). Drosophila: A guide to species identification and use (Academic Press).

Markstein, M., Pitsouli, C., Villalta, C., Celniker, S.E., and Perrimon, N. (2008). Exploiting position effects and the gypsy retrovirus insulator to engineer precisely expressed transgenes. Nat. Genet. 40, 476–483.

Martin, A., and Orgogozo, V. (2013). The Loci of repeated evolution: a catalog of genetic hotspots of phenotypic variation. Evolution 67, 1235–1250.

McGinnis, W., Levine, M.S., Hafen, E., Kuroiwa, A., and Gehring, W.J. (1984). A conserved DNA sequence in homoeotic genes of the Drosophila Antennapedia and bithorax complexes. Nature 308, 428–433.

Moczek, A.P. (2008). On the origins of novelty in development and evolution. Bioessays 163

30, 432–447.

Mora, C., Tittensor, D.P., Adl, S., Simpson, A.G.B., and Worm, B. (2011). How many species are there on Earth and in the ocean? PLoS Biol. 9, e1001127.

Musunuru, K., Strong, A., Frank-Kamenetsky, M., Lee, N.E., Ahfeldt, T., Sachs, K. V, Li, X., Li, H., Kuperwasser, N., Ruda, V.M., et al. (2010). From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature 466, 714–719.

Myers, R., Tilly, K., and Maniatis, T. (1986). Fine structure genetic analysis of a beta- globin promoter. Science (80-. ). 232, 613–618.

Noyes, M.B., Christensen, R.G., Wakabayashi, A., Stormo, G.D., Brodsky, M.H., and Wolfe, S. a (2008). Analysis of homeodomain specificities allows the family-wide prediction of preferred recognition sites. Cell 133, 1277–1289.

Ordway, A., Hancuch, K.N., Johnson, W., Wiliams, T.M., and Rebeiz, M. (2014). The expansion of body coloration involves coordinated evolution in cis and trans within the pigmentation regulatory network of drosophila prostipennis. Dev. Biol. 1–10.

Parkash, R., Sharma, V., and Kalra, B. (2008). Climatic adaptations of body melanisation in Drosophila melanogaster from Western Himalayas. Fly (Austin). 2, 111–117.

Pfeiffer, B.D., Jenett, A., Hammonds, A.S., Ngo, T.-T.B., Misra, S., Murphy, C., Scully, A., Carlson, J.W., Wan, K.H., Laverty, T.R., et al. (2008). Tools for neuroanatomy and neurogenetics in Drosophila. Proc. Natl. Acad. Sci. U. S. A. 105, 9715–9720.

Pfreundt, U., James, D.P., Tweedie, S., Wilson, D., Teichmann, S.A., and Adryan, B. (2010). FlyTF: improved annotation and enhanced functionality of the Drosophila transcription factor database. Nucleic Acids Res. 38, D443–D447.

Prabhakar, S., Visel, A., Akiyama, J. a, Shoukry, M., Lewis, K.D., Holt, A., Plajzer- Frick, I., Morrison, H., Fitzpatrick, D.R., Afzal, V., et al. (2008). Human-specific gain of function in a developmental enhancer. Science 321, 1346–1350.

Prud‟homme, B., Gompel, N., Rokas, A., Kassner, V.A., Williams, T.M., Yeh, S.-D., True, J.R., and Carroll, S.B. (2006). Repeated morphological evolution through cis- regulatory changes in a pleiotropic gene. Nature 440, 1050–1053.

Rajakumar, R., San Mauro, D., Dijkstra, M.B., Huang, M.H., Wheeler, D.E., Hiou-Tim, F., Khila, A., Cournoyea, M., and Abouheif, E. (2012). Ancestral developmental potential facilitates parallel evolution in ants. Science 335, 79–82.

Rebeiz, M., and Posakony, J.W. (2004). GenePalette: a universal software tool for genome sequence visualization and analysis. Dev. Biol. 271, 431–438.

Rebeiz, M., and Williams, T.M. (2011). Experimental Approaches to Evaluate the 164

Contributions of Candidate Cis- regulatory Mutations to Phenotypic Evolution. In Methods in Molecular Biology, V. Orgogozo, and M. V. Rockman, eds. (Totowa, NJ: Humana Press), pp. 351–375.

Rebeiz, M., Pool, J.E., Kassner, V.A., Aquadro, C.F., and Carroll, S.B. (2009). Stepwise modification of a modular enhancer underlies adaptation in a Drosophila population. Science 326, 1663–1667.

Rebeiz, M., Jikomes, N., Kassner, V.A., and Carroll, S.B. (2011). Evolutionary origin of a novel gene expression pattern through co-option of the latent activities of existing regulatory sequences. Proc. Natl. Acad. Sci. U. S. A. 108, 10036–10043.

Richards, S., Liu, Y., Bettencourt, B.R., Hradecky, P., Letovsky, S., Nielsen, R., Thornton, K., Hubisz, M.J., Chen, R., Meisel, R.P., et al. (2005). Comparative genome sequencing of Drosophila pseudoobscura: chromosomal, gene, and cis-element evolution. Genome Res. 15, 1–18.

Riddle, R.D., Johnson, R.L., Laufer, E., and Tabin, C. (1993). Sonic hedgehog mediates the polarizing activity of the ZPA. Cell 75, 1401–1416.

Rogers, W.A., and Williams, T.M. (2011). Quantitative Comparison of cis-Regulatory Element (CRE) Activities in Transgenic Drosophila melanogaster. J. Vis. Exp. 2–7.

Rogers, W. a, Salomone, J.R., Tacy, D.J., Camino, E.M., Davis, K. a, Rebeiz, M., and Williams, T.M. (2013). Recurrent modification of a conserved cis-regulatory element underlies fruit fly pigmentation diversity. PLoS Genet. 9, e1003740.

Rogers, W. a, Grover, S., Stringer, S.J., Parks, J., Rebeiz, M., and Williams, T.M. (2014). A survey of the trans-regulatory landscape for Drosophila melanogaster abdominal pigmentation. Dev. Biol. 385, 417–432.

Ryner, L.C., Goodwin, S.F., Castrillon, D.H., Anand, A., Villella, A., Baker, B.S., Hall, J.C., Taylor, B.J., and Wasserman, S.A. (1996). Control of Male Sexual Behavior and Sexual Orientation in Drosophila by the fruitless Gene. Cell 87, 1079–1089.

Ryoo, H.D., Marty, T., Casares, F., Affolter, M., and Mann, R.S. (1999). Regulation of Hox target genes by a DNA bound Homothorax/Hox/Extradenticle complex. Development 126, 5137–5148.

Salomone, J.R., Rogers, W. a, Rebeiz, M., and Williams, T.M. (2013). The evolution of Bab paralog expression and abdominal pigmentation among Sophophora fruit fly species. Evol. Dev. 15, 442–457.

Sanchez-Herrero, E., Vernos, I., Marco, R., and Morata, G. (1985). Genetic organization of Drosophila bithorax complex. Nature 313, 108–113.

Sánchez-Herrero, E. (1991). Control of the expression of the bithorax complex genes 165

abdominal-A and abdominal-B by cis-regulatory regions in Drosophila embryos. Development 111, 437–449.

Sanyal, A., Lajoie, B.R., Jain, G., and Dekker, J. (2012). The long-range interaction landscape of gene promoters. Nature 489 , 109–113.

Shirangi, T.R., Dufour, H.D., Williams, T.M., and Carroll, S.B. (2009). Rapid evolution of sex pheromone-producing enzyme expression in Drosophila. PLoS Biol. 7, e1000168.

Simon, J., Chiang, A., and Bender, W. (1992). Ten different Polycomb group genes are required for spatial control of the abdA and AbdB homeotic products. Development 114, 493–505.

Small, S., Blair, A., and Levine, M. (1992). Regulation of even-skipped stripe 2 in the Drosophila embryo. EMBO J. 11, 4047–4057.

Snustad, D.P., and Simmons, M. (2015). Principles of Genetics (Wiley).

Spitz, F., Gonzalez, F., and Duboule, D. (2003). A Global Control Region Defines a Chromosomal Regulatory Landscape Containing the HoxD Cluster. Cell 113, 405–417.

Srivastava, M., Begovic, E., Chapman, J., Putnam, N.H., Hellsten, U., Kawashima, T., Kuo, A., Mitros, T., Salamov, A., Carpenter, M.L., et al. (2008). The Trichoplax genome and the nature of placozoans. Nature 454, 955–960.

Stanojevic, D., Small, S., and Levine, M. (1991). Regulation of a segmentation stripe by overlapping activators and repressors in the Drosophila embryo. Science (80-. ). 254, 1385–1387.

Stansbury, M.S., and Moczek, A.P. (2014). The function of Hox and appendage- patterning genes in the development of an evolutionary novelty, the Photuris firefly lantern. Proc Biol Sci. 281.

Stathopoulos, A., and Levine, M. (2005). Genomic regulatory networks and animal development. Dev. Cell 9, 449–462.

Stern, D.L., and Orgogozo, V. (2008). The loci of evolution: how predictable is genetic evolution? Evolution 62, 2155–2177.

Stone, J.R., and Wray, G.A. (2001). Rapid Evolution of cis-Regulatory Sequences via Local Point Mutations. Mol. Biol. Evol. 18, 1764–1770.

Stormo, G.D. (2013). Modeling the specificity of protein-DNA interactions. Quant. Biol. 1, 115–130.

Strack, R.L., Hein, B., Bhattacharyya, D., Hell, S.W., Keenan, R.J., and Glick, B.S. (2009). A rapidly maturing far-red derivative of DsRed-Express2 for whole-cell labeling.

166

Biochemistry 48, 8279–8281.

Swanson, C.I., Evans, N.C., and Barolo, S. (2010). Structural rules and complex regulatory circuitry constrain expression of a Notch- and EGFR-regulated eye enhancer. Dev. Cell 18, 359–370.

Swanson, C.I., Schwimmer, D.B., and Barolo, S. (2011). Rapid Evolutionary Rewiring of a Structurally Constrained Eye Enhancer. Curr. Biol. 1–11.

Tanaka, K., Barmina, O., Sanders, L.E., Arbeitman, M.N., and Kopp, A. (2011). Evolution of sex-specific traits through changes in HOX-dependent doublesex expression. PLoS Biol. 9, e1001131.

Tolhuis, B., Palstra, R.-J., Splinter, E., Grosveld, F., and de Laat, W. (2002). Looping and Interaction between Hypersensitive Sites in the Active β-globin Locus. Mol. Cell 10, 1453–1465.

True, J.R., Yeh, S.-D., Hovemann, B.T., Kemme, T., Meinertzhagen, I.A., Edwards, T.N., Liou, S.-R., Han, Q., and Li, J. (2005). Drosophila tan encodes a novel hydrolase required in pigmentation and vision. PLoS Genet. 1, e63.

Walsh, C.M., and Carroll, S.B. (2007). Collaboration between Smads and a Hox protein in target gene repression. Development 134, 3585–3592.

Wasserman, W.W., and Sandelin, A. (2004). Applied bioinformatics for the identification of regulatory elements. Nat. Rev. Genet. 5, 276–287.

Weatherbee, S.D., Nijhout, H.F., Grunert, L.W., Halder, G., Galant, R., Selegue, J., and Carroll, S. (1999). Ultrabithorax function in butterfly wings and the evolution of insect wing patterns. Curr. Biol. 9, 109–115.

Wellcome, T., Case, T., and Consortium, C. (2007). Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678.

Werner, T., Koshikawa, S., Williams, T.M., and Carroll, S.B. (2010). Generation of a novel wing colour pattern by the Wingless morphogen. Nature 464, 1143–1148.

Williams, T.M., Selegue, J.E., Werner, T., Gompel, N., Kopp, A., and Carroll, S.B. (2008). The regulation and evolution of a genetic switch controlling sexually dimorphic traits in Drosophila. Cell 134, 610–623.

Wittkopp, P.J., Vaccaro, K., and Carroll, S.B. (2002a). Evolution of yellow gene regulation and pigmentation in Drosophila. Curr. Biol. 12, 1547–1556.

Wittkopp, P.J., True, J.R., and Carroll, S.B. (2002b). Reciprocal functions of the Drosophila yellow and ebony proteins in the development and evolution of pigment patterns. Development 129, 1849–1858. 167

Wittkopp, P.J., Carroll, S.B., and Kopp, A. (2003). Evolution in black and white: genetic control of pigment patterns in Drosophila. Trends Genet. 19, 495–504.

Wittkopp, P.J., Stewart, E.E., Arnold, L.L., Neidert, A.H., Haerum, B.K., Thompson, E.M., Akhras, S., Smith-Winberry, G., and Shefner, L. (2009). Intraspecific polymorphism to interspecific divergence: genetics of pigmentation in Drosophila. Science 326, 540–544.

Wray, G.A. (2007). The evolutionary significance of cis-regulatory mutations. Nat. Rev. Genet. 8, 206–216.

Wright, T.R. (1987). The genetics of biogenic amine metabolism, sclerotization, and melanization in Drosophila melanogaster. Adv. Genet. 24, 127–222.

Yoder, J.H. (2012). Abdominal segment reduction: development and evolution of a deeply fixed trait. Fly (Austin). 6, 240–245.

Younger-Shepherd, S., Vaessin, H., Bier, E., Jan, L.Y., and Jan, Y.N. (1992). deadpan, an essential pan-neural gene encoding an HLH protein, acts as a denominator in Drosophila sex determination. Cell 70, 911–922.

168

APPENDIX A.

Sequence alignment of yBE0.6 scanning mutagenesis
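
In the alignment, mutated bases in each scanning mutant (SM) are shown in lowercase. Across most mutated stretches, every other base appears to be replaced by a non-complementary transversion (A to C, C to A, G to T, T to G). This rule is inferred from the aligned sequences and a few stretches deviate from it. The Python sketch below applies the inferred rule to a window of sequence and reports which positions are mutated. The function names are hypothetical, and the window coordinates used for SM1 were read off the alignment.

```python
# Non-complementary transversions inferred from the alignment; mutated bases
# are written in lowercase to match the convention used in this appendix.
SWAP = {"A": "c", "C": "a", "G": "t", "T": "g"}


def scan_mutate(seq: str, start: int, end: int) -> str:
    """Mutate every other base of seq[start:end] (0-based, end exclusive)."""
    bases = list(seq.upper())
    for i in range(start, end, 2):
        bases[i] = SWAP[bases[i]]
    return "".join(bases)


def mutated_span(mutant: str):
    """Return 1-based (first, last) positions of lowercase bases, or None."""
    idx = [i for i, base in enumerate(mutant) if base.islower()]
    return (idx[0] + 1, idx[-1] + 1) if idx else None


# yBE0.6 positions 1-30, including the 5' AscI site (GGCGCGCC).
wild_type = "GGCGCGCCCTGTGGGTGCAATGATTTAGAA"

# SM1 mutations begin at position 9, immediately after the AscI site.
sm1_start = scan_mutate(wild_type, 8, 30)
print(sm1_start)
print(mutated_span(sm1_start))
```

Leaving the alternate bases untouched changes the sequence of a whole window while keeping its length and spacing constant, so mutant and wild-type sequences stay aligned position for position.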

            AscI
yBE0.6    1 GGCGCGCCCT GTGGGTGCAA TGATTTAGAA TGCGGGCAAG GGATCAAGTT
SM1       1 GGCGCGCCaT tTtGtTtCcA gGcTgTcGcA gGaGtGaAcG tGcTaAcGgT
SM2       1 GGCGCGCCCT GTGGGTGCAA TGATTTAGAA TGCGGGCAAG GGATCAAGTT
SM3       1 GGCGCGCCCT GTGGGTGCAA TGATTTAGAA TGCGGGCAAG GGATCAAGTT
SM4       1 GGCGCGCCCT GTGGGTGCAA TGATTTAGAA TGCGGGCAAG GGATCAAGTT
SM5       1 GGCGCGCCCT GTGGGTGCAA TGATTTAGAA TGCGGGCAAG GGATCAAGTT
SM6       1 GGCGCGCCCT GTGGGTGCAA TGATTTAGAA TGCGGGCAAG GGATCAAGTT
SM7       1 GGCGCGCCCT GTGGGTGCAA TGATTTAGAA TGCGGGCAAG GGATCAAGTT
SM8       1 GGCGCGCCCT GTGGGTGCAA TGATTTAGAA TGCGGGCAAG GGATCAAGTT
SM9       1 GGCGCGCCCT GTGGGTGCAA TGATTTAGAA TGCGGGCAAG GGATCAAGTT
SM10      1 GGCGCGCCCT GTGGGTGCAA TGATTTAGAA TGCGGGCAAG GGATCAAGTT

yBE0.6   51 GAACCACTTC TAAGAAAAAA TAGCATTGCA TAAATGATAT AGAGTCCAAA
SM1      51 tAcCaAaTgC gAcGcAcAcA gAtCcTgGCA TAAATGATAT AGAGTCCAAA
SM2      51 GAACCACTTC TAAGAAAAcA gAtCcTgGaA gAcAgGcTcT cGcGgCaAcA
SM3      51 GAACCACTTC TAAGAAAAAA TAGCATTGCA TAAATGATAT AGAGTCCAAA
SM4      51 GAACCACTTC TAAGAAAAAA TAGCATTGCA TAAATGATAT AGAGTCCAAA
SM5      51 GAACCACTTC TAAGAAAAAA TAGCATTGCA TAAATGATAT AGAGTCCAAA
SM6      51 GAACCACTTC TAAGAAAAAA TAGCATTGCA TAAATGATAT AGAGTCCAAA
SM7      51 GAACCACTTC TAAGAAAAAA TAGCATTGCA TAAATGATAT AGAGTCCAAA
SM8      51 GAACCACTTC TAAGAAAAAA TAGCATTGCA TAAATGATAT AGAGTCCAAA
SM9      51 GAACCACTTC TAAGAAAAAA TAGCATTGCA TAAATGATAT AGAGTCCAAA
SM10     51 GAACCACTTC TAAGAAAAAA TAGCATTGCA TAAATGATAT AGAGTCCAAA

yBE0.6  101 AACTACACAA ATTCAATAGC AGTAATGGTT ACATTAGCTT TGAAATTGTT
SM1     101 AACTACACAA ATTCAATAGC AGTAATGGTT ACATTAGCTT TGAAATTGTT
SM2     101 cAaTcCcCcA cTgCcAgAtC cGgAcTtGgT cCcTgAtCTT TGAAATTGTT
SM3     101 AACTACACAA ATTCAATAGC AGTAATGGgT cCcTgAtCgT gGcAcTgGgT
SM4     101 AACTACACAA ATTCAATAGC AGTAATGGTT ACATTAGCTT TGAAATTGTT
SM5     101 AACTACACAA ATTCAATAGC AGTAATGGTT ACATTAGCTT TGAAATTGTT
SM6     101 AACTACACAA ATTCAATAGC AGTAATGGTT ACATTAGCTT TGAAATTGTT
SM7     101 AACTACACAA ATTCAATAGC AGTAATGGTT ACATTAGCTT TGAAATTGTT
SM8     101 AACTACACAA ATTCAATAGC AGTAATGGTT ACATTAGCTT TGAAATTGTT
SM9     101 AACTACACAA ATTCAATAGC AGTAATGGTT ACATTAGCTT TGAAATTGTT
SM10    101 AACTACACAA ATTCAATAGC AGTAATGGTT ACATTAGCTT TGAAATTGTT

yBE0.6  151 TTTAGACATC CGAAGAAATA AGATTAAATT TAAACGGCAT TCTTTAATTT
SM1     151 TTTAGACATC CGAAGAAATA AGATTAAATT TAAACGGCAT TCTTTAATTT
SM2     151 TTTAGACATC CGAAGAAATA AGATTAAATT TAAACGGCAT TCTTTAATTT
SM3     151 gTgAtAaAgC aGcAtAcAgA cGcTgAcAgT gAcAaGtCcT gCgTgAcTTT
SM4     151 TTTAGACATC CGAAGAAATA AGATTAAATT TAAACGGCcT gCgTgAcTgT
SM5     151 TTTAGACATC CGAAGAAATA AGATTAAATT TAAACGGCAT TCTTTAATTT
SM6     151 TTTAGACATC CGAAGAAATA AGATTAAATT TAAACGGCAT TCTTTAATTT
SM7     151 TTTAGACATC CGAAGAAATA AGATTAAATT TAAACGGCAT TCTTTAATTT
SM8     151 TTTAGACATC CGAAGAAATA AGATTAAATT TAAACGGCAT TCTTTAATTT
SM9     151 TTTAGACATC CGAAGAAATA AGATTAAATT TAAACGGCAT TCTTTAATTT
SM10    151 TTTAGACATC CGAAGAAATA AGATTAAATT TAAACGGCAT TCTTTAATTT

yBE0.6  201 GTATTTTAAT ATTTTGAGAG GTTTTCCTTA TTTAAAGTGT AGATTATTGA
SM1     201 GTATTTTAAT ATTTTGAGAG GTTTTCCTTA TTTAAAGTGT AGATTATTGA
SM2     201 GTATTTTAAT ATTTTGAGAG GTTTTCCTTA TTTAAAGTGT AGATTATTGA
SM3     201 GTATTTTAAT ATTTTGAGAG GTTTTCCTTA TTTAAAGTGT AGATTATTGA
SM4     201 tTcTgTgAcT cTgTgGcGcG tTgTgCaTgA gTgAcAtTtT cGcTgAgTtA
SM5     201 GTATTTTAAT ATTTTGAGAG GTTTTCCTTA TTTAAAGTGT AGATTATTtA
SM6     201 GTATTTTAAT ATTTTGAGAG GTTTTCCTTA TTTAAAGTGT AGATTATTGA
SM7     201 GTATTTTAAT ATTTTGAGAG GTTTTCCTTA TTTAAAGTGT AGATTATTGA
SM8     201 GTATTTTAAT ATTTTGAGAG GTTTTCCTTA TTTAAAGTGT AGATTATTGA
SM9     201 GTATTTTAAT ATTTTGAGAG GTTTTCCTTA TTTAAAGTGT AGATTATTGA
SM10    201 GTATTTTAAT ATTTTGAGAG GTTTTCCTTA TTTAAAGTGT AGATTATTGA

yBE0.6  251 GGATTAATGC AAACCACTTT ATCTGCGGAG GTCGTAAAAC GTATTTTTAC
SM1     251 GGATTAATGC AAACCACTTT ATCTGCGGAG GTCGTAAAAC GTATTTTTAC
SM2     251 GGATTAATGC AAACCACTTT ATCTGCGGAG GTCGTAAAAC GTATTTTTAC
SM3     251 GGATTAATGC AAACCACTTT ATCTGCGGAG GTCGTAAAAC GTATTTTTAC
SM4     251 tGcTgAcTGC AAACCACTTT ATCTGCGGAG GTCGTAAAAC GTATTTTTAC
SM5     251 tGcTgAcTtC cAcCaAaTgT cTaTtCtGcG tTCGTAAAAa GgAgTgTgAa
SM6     251 GGATTAATGC AAACCACTTT ATCTGCGGAG GTCGTAAAAC GTATTTTTAC
SM7     251 GGATTAATGC AAACCACTTT ATCTGCGGAG GTCGTAAAAC GTATTTTTAC
SM8     251 GGATTAATGC AAACCACTTT ATCTGCGGAG GTCGTAAAAC GTATTTTTAC
SM9     251 GGATTAATGC AAACCACTTT ATCTGCGGAG GTCGTAAAAC GTATTTTTAC
SM10    251 GGATTAATGC AAACCACTTT ATCTGCGGAG GTCGTAAAAC GTATTTTTAC

yBE0.6  301 CCATTTGCAT GTTTATTATG CGTGTGGCTG GTTGTATTAC TTTACTTAAG
SM1     301 CCATTTGCAT GTTTATTATG CGTGTGGCTG GTTGTATTAC TTTACTTAAG
SM2     301 CCATTTGCAT GTTTATTATG CGTGTGGCTG GTTGTATTAC TTTACTTAAG
SM3     301 CCATTTGCAT GTTTATTATG CGTGTGGCTG GTTGTATTAC TTTACTTAAG
SM4     301 CCATTTGCAT GTTTATTATG CGTGTGGCTG GTTGTATTAC TTTACTTAAG
SM5     301 CaAgTgGaAg GgTgAgTcTG CGTGTGGCTG GTTGTATTAC TTTACTTAAG
SM6     301 CCATTTGCcT gTTTATTATG aGgGgGtCgG tTgGgAgTcC gTgAaTgAcG
SM7     301 CCATTTGCAT GTTTATTATG CGTGTGGCTG GTTGTATTAC TTTACTTAAG
SM8     301 CCATTTGCAT GTTTATTATG CGTGTGGCTG GTTGTATTAC TTTACTTAAG
SM9     301 CCATTTGCAT GTTTATTATG CGTGTGGCTG GTTGTATTAC TTTACTTAAG
SM10    301 CCATTTGCAT GTTTATTATG CGTGTGGCTG GTTGTATTAC TTTACTTAAG

yBE0.6 351 TTTTGCAATT TTTTCTTTAG CAAGCAGGTG CATTTGGGCC AAGAGATATA
SM1    351 TTTTGCAATT TTTTCTTTAG CAAGCAGGTG CATTTGGGCC AAGAGATATA
SM2    351 TTTTGCAATT TTTTCTTTAG CAAGCAGGTG CATTTGGGCC AAGAGATATA
SM3    351 TTTTGCAATT TTTTCTTTAG CAAGCAGGTG CATTTGGGCC AAGAGATATA
SM4    351 TTTTGCAATT TTTTCTTTAG CAAGCAGGTG CATTTGGGCC AAGAGATATA
SM5    351 TTTTGCAATT TTTTCTTTAG CAAGCAGGTG CATTTGGGCC AAGAGATATA
SM6    351 gTgTtCcAgT gTgTaTgTcG aAcGaAtGTG CATTTGGGCC AAGAGATATA
SM7    351 TTTTGCAATT TTTTCTTTaG aAcGaAtGgG aAgTgGtGaC cAtAtAgAgA
SM8    351 TTTTGCAATT TTTTCTTTAG CAAGCAGGTG CATTTGGGCC AAGAGATATA
SM9    351 TTTTGCAATT TTTTCTTTAG CAAGCAGGTG CATTTGGGCC AAGAGATATA
SM10   351 TTTTGCAATT TTTTCTTTAG CAAGCAGGTG CATTTGGGCC AAGAGATATA

yBE0.6 401 TGCGATCGCT TTCGGTTCGA ATTTTTAACA TTTACTTGCG GCGATGGTCA
SM1    401 TGCGATCGCT TTCGGTTCGA ATTTTTAACA TTTACTTGCG GCGATGGTCA
SM2    401 TGCGATCGCT TTCGGTTCGA ATTTTTAACA TTTACTTGCG GCGATGGTCA
SM3    401 TGCGATCGCT TTCGGTTCGA ATTTTTAACA TTTACTTGCG GCGATGGTCA
SM4    401 TGCGATCGCT TTCGGTTCGA ATTTTTAACA TTTACTTGCG GCGATGGTCA
SM5    401 TGCGATCGCT TTCGGTTCGA ATTTTTAACA TTTACTTGCG GCGATGGTCA
SM6    401 TGCGATCGCT TTCGGTTCGA ATTTTTAACA TTTACTTGCG GCGATGGTCA
SM7    401 gGaGcTaGaT gTaGtTgCtA cTgTgtcAaA gTgAaTgGCG GCGATGGTCA
SM8    401 TGCGATCGCT TTCGGTTCGA ATTTTTAAaA gTgAaTgGaG tCtAgGtTaA
SM9    401 TGCGATCGCT TTCGGTTCGA ATTTTTAACA TTTACTTGCG GCGATGGTCA
SM10   401 TGCGATCGCT TTCGGTTCGA ATTTTTAACA TTTACTTGCG GCGATGGTCA

yBE0.6 451 TTAGAGCATT ACCCACTTAG GGCACCCCCA ACATCCAGTT GATTTTCAGG SM1 451 TTAGAGCATT ACCCACTTAG GGCACCCCCA ACATCCAGTT GATTTTCAGG SM2 451 TTAGAGCATT ACCCACTTAG GGCACCCCCA ACATCCAGTT GATTTTCAGG SM3 451 TTAGAGCATT ACCCACTTAG GGCACCCCCA ACATCCAGTT GATTTTCAGG SM4 451 TTAGAGCATT ACCCACTTAG GGCACCCCCA ACATCCAGTT GATTTTCAGG SM5 451 TTAGAGCATT ACCCACTTAG GGCACCCCCA ACATCCAGTT GATTTTCAGG SM6 451 TTAGAGCATT ACCCACTTAG GGCACCCCCA ACATCCAGTT GATTTTCAGG SM7 451 TTAGAGCATT ACCCACTTAG GGCACCCCCA ACATCCAGTT GATTTTCAGG SM8 451 gTcGcGaAgT cCaCcCgTcG tGaAaCaCaA cCcTaCcGgT tAgTgTaAGG SM9 451 TTAGAGCATT ACCCACTTAG GGCACCCCCA ACATCCAGgT tAgTgTaAtG SM10 451 TTAGAGCATT ACCCACTTAG GGCACCCCCA ACATCCAGTT GATTTTCAGG

yBE0.6 501 GACCACAATA TTTTAAATAA CAGCTAGTGG AATTACCTAA AAGCGCTTTC SM1 501 GACCACAATA TTTTAAATAA CAGCTAGTGG AATTACCTAA AAGCGCTTTC SM2 501 GACCACAATA TTTTAAATAA CAGCTAGTGG AATTACCTAA AAGCGCTTTC SM3 501 GACCACAATA TTTTAAATAA CAGCTAGTGG AATTACCTAA AAGCGCTTTC SM4 501 GACCACAATA TTTTAAATAA CAGCTAGTGG AATTACCTAA AAGCGCTTTC SM5 501 GACCACAATA TTTTAAATAA CAGCTAGTGG AATTACCTAA AAGCGCTTTC SM6 501 GACCACAATA TTTTAAATAA CAGCTAGTGG AATTACCTAA AAGCGCTTTC SM7 501 GACCACAATA TTTTAAATAA CAGCTAGTGG AATTACCTAA AAGCGCTTTC SM8 501 GACCACAATA TTTTAAATAA CAGCTAGTGG AATTACCTAA AAGCGCTTTC SM9 501 tAaCcCcAgA gTgTcAcTcA aAtCgAtTtG cAgTcCaTcA cAtCtCgTgC SM10 501 GACCACAATA TTTTAAATAA CAGCTAGTGG AATTACCTAA AAGCGCTTgC

yBE0.6 551 GTCCCTTTTG AAATTTTATG TAACACTCAA TTATATTTAT GTATATGTAT SM1 551 GTCCCTTTTG AAATTTTATG TAACACTCAA TTATATTTAT GTATATGTAT SM2 551 GTCCCTTTTG AAATTTTATG TAACACTCAA TTATATTTAT GTATATGTAT SM3 551 GTCCCTTTTG AAATTTTATG TAACACTCAA TTATATTTAT GTATATGTAT SM4 551 GTCCCTTTTG AAATTTTATG TAACACTCAA TTATATTTAT GTATATGTAT SM5 551 GTCCCTTTTG AAATTTTATG TAACACTCAA TTATATTTAT GTATATGTAT SM6 551 GTCCCTTTTG AAATTTTATG TAACACTCAA TTATATTTAT GTATATGTAT SM7 551 GTCCCTTTTG AAATTTTATG TAACACTCAA TTATATTTAT GTATATGTAT SM8 551 GTCCCTTTTG AAATTTTATG TAACACTCAA TTATATTTAT GTATATGTAT SM9 551 tTaCaTgTTG AAATTTTATG TAACACTCAA TTATATTTAT GTATATGTAT SM10 551 tTaCaTgTgG cAcTgTgAgG gAcCcCgCcA gTcTcTgTcT tTcTcTtTcT

SbfI
yBE0.6 601 GCTCAAAATC ACCTGCCAAT AACCCTGCAG G
SM1    601 GCTCAAAATC ACCTGCCAAT AACCCTGCAG G
SM2    601 GCTCAAAATC ACCTGCCAAT AACCCTGCAG G
SM3    601 GCTCAAAATC ACCTGCCAAT AACCCTGCAG G
SM4    601 GCTCAAAATC ACCTGCCAAT AACCCTGCAG G
SM5    601 GCTCAAAATC ACCTGCCAAT AACCCTGCAG G
SM6    601 GCTCAAAATC ACCTGCCAAT AACCCTGCAG G
SM7    601 GCTCAAAATC ACCTGCCAAT AACCCTGCAG G
SM8    601 GCTCAAAATC ACCTGCCAAT AACCCTGCAG G
SM9    601 GCTCAAAATC ACCTGCCAAT AACCCTGCAG G
SM10   601 tCgCcAcAgC cCaTtCaAcT cAaCCTGCAG G
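In the alignment above, lowercase letters appear to mark the bases that differ from the yBE0.6 wild-type row. The following is a minimal Python sketch, assuming rows are written as space-separated 10-base blocks as printed in this appendix, of how the mutated positions and substitutions in one scanning-mutant row could be listed. The function names are hypothetical and are not part of the original analysis; the two example rows are copied from the alignment block starting at position 51.

```python
# Illustrative helpers (hypothetical names) for reading one alignment row.

def mutated_positions(row, start=1):
    """Return alignment coordinates of lowercase (mutated) bases in a row."""
    seq = row.replace(" ", "")
    return [start + i for i, base in enumerate(seq) if base.islower()]


def substitutions(wild_type, mutant, start=1):
    """Return (position, wild-type base, mutant base) for every mismatch."""
    wt = wild_type.replace(" ", "").upper()
    mut = mutant.replace(" ", "").upper()
    if len(wt) != len(mut):
        raise ValueError("rows must cover the same number of bases")
    return [(start + i, w, m)
            for i, (w, m) in enumerate(zip(wt, mut)) if w != m]


# Rows copied from the block of the alignment that starts at position 51.
WT_ROW = "GAACCACTTC TAAGAAAAAA TAGCATTGCA TAAATGATAT AGAGTCCAAA"
SM1_ROW = "tAcCaAaTgC gAcGcAcAcA gAtCcTgGCA TAAATGATAT AGAGTCCAAA"

if __name__ == "__main__":
    for pos, wt_base, mut_base in substitutions(WT_ROW, SM1_ROW, start=51):
        print(pos, wt_base, "->", mut_base)
```

Comparing the case-insensitive mismatches against the lowercase positions is one way to check that a row was transcribed consistently.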

APPENDIX B.

Sequence alignment of t_MSE scanning mutagenesis

AscI t_MSE 1 GGCGCGCCCC ATGGAAGCCG AGCACCTGGT AGAGCCGCAG GTGGAACTGC t_MSE SM1 1 GGCGCGCCCC ATGGAAGCCG AGCACCTGGT AGAGCCGCAG GTGGAACTGC t_MSE SM2 1 GGCGCGCCCC ATGGAAGCCG AGCACCTGGT AGAGCCGCAG GTGGAACTGC t_MSE SM3 1 GGCGCGCCCC ATGGAAGCCG AGCACCTGGT AGAGCCGCAG GTGGAACTGC t_MSE SM4 1 GGCGCGCCCC ATGGAAGCCG AGCACCTGGT AGAGCCGCAG GTGGAACTGC t_MSE SM5 1 GGCGCGCCCC ATGGAAGCCG AGCACCTGGT AGAGCCGCAG GTGGAACTGC t_MSE SM6 1 GGCGCGCCCC ATGGAAGCCG AGCACCTGGT AGAGCCGCAG GTGGAACTGC t_MSE SM7 1 GGCGCGCCCC ATGGAAGCCG AGCACCTGGT AGAGCCGCAG GTGGAACTGC t_MSE SM8 1 GGCGCGCCCC ATGGAAGCCG AGCACCTGGT AGAGCCGCAG GTGGAACTGC t_MSE SM9 1 GGCGCGCCCC ATGGAAGCCG AGCACCTGGT AGAGCCGCAG GTGGAACTGC t_MSE SM10 1 GGCGCGCCCC ATGGAAGCCG AGCACCTGGT AGAGCCGCAG GTGGAACTGC

t_MSE 43 AGGACCCGAC CCAGATGGCC GCTCATCGTT GACTGCTCGG AAGTGAAACC t_MSE SM1 51 AGGACCCGAC CCAGATGGCC GCTCATaGgT tAaTtCgCtG cAtTtAcAaC t_MSE SM2 51 AGGACCCGAC CCAGATGGCC GCTCATCGTT GACTGCTCGG AAGTGAAACC t_MSE SM3 51 AGGACCCGAC CCAGATGGCC GCTCATCGTT GACTGCTCGG AAGTGAAACC t_MSE SM4 51 AGGACCCGAC CCAGATGGCC GCTCATCGTT GACTGCTCGG AAGTGAAACC t_MSE SM5 51 AGGACCCGAC CCAGATGGCC GCTCATCGTT GACTGCTCGG AAGTGAAACC t_MSE SM6 51 AGGACCCGAC CCAGATGGCC GCTCATCGTT GACTGCTCGG AAGTGAAACC t_MSE SM7 51 AGGACCCGAC CCAGATGGCC GCTCATCGTT GACTGCTCGG AAGTGAAACC t_MSE SM8 51 AGGACCCGAC CCAGATGGCC GCTCATCGTT GACTGCTCGG AAGTGAAACC t_MSE SM9 51 AGGACCCGAC CCAGATGGCC GCTCATCGTT GACTGCTCGG AAGTGAAACC t_MSE SM10 51 AGGACCCGAC CCAGATGGCC GCTCATCGTT GACTGCTCGG AAGTGAAACC

t_MSE 93 CTTATGGATG GACAGGCCTT ATCCTTGCGG CGGATCTCCC TTTAAATGGG t_MSE SM1 101 aTgAgGtAgG tAaAtGaCgT cTaCgTtCtG aGtAgCgCaC gTgAcAgGtG t_MSE SM2 101 CTTATGGATG GACAGGCCTT ATCCTTGCGG CGGATCTCCC TTTAAATGGG t_MSE SM3 101 CTTATGGATG GACAGGCCTT ATCCTTGCGG CGGATCTCCC TTTAAATGGG t_MSE SM4 101 CTTATGGATG GACAGGCCTT ATCCTTGCGG CGGATCTCCC TTTAAATGGG t_MSE SM5 101 CTTATGGATG GACAGGCCTT ATCCTTGCGG CGGATCTCCC TTTAAATGGG t_MSE SM6 101 CTTATGGATG GACAGGCCTT ATCCTTGCGG CGGATCTCCC TTTAAATGGG t_MSE SM7 101 CTTATGGATG GACAGGCCTT ATCCTTGCGG CGGATCTCCC TTTAAATGGG t_MSE SM8 101 CTTATGGATG GACAGGCCTT ATCCTTGCGG CGGATCTCCC TTTAAATGGG t_MSE SM9 101 CTTATGGATG GACAGGCCTT ATCCTTGCGG CGGATCTCCC TTTAAATGGG t_MSE SM10 101 CTTATGGATG GACAGGCCTT ATCCTTGCGG CGGATCTCCC TTTAAATGGG

t_MSE      143 CCAACAAACA ACTGAGAAAT CCATTAGCCT AACTGACTTC CTCAAAAACA
t_MSE SM1  151 aCcAaAcAaA cCTGAGAAAT CCATTAGCCT AACTGACTTC CTCAAAAACA
t_MSE SM2  151 CaccaAAcaA cCgGcGcAcT aCcTgAtCaT cAaTtAaTgC aTaAcAcAaA
t_MSE SM3  151 CCAACAAACA ACTGAGAAAT CCATTAGCCT AACTGACTTC CTCAAAAACA
t_MSE SM4  151 CCAACAAACA ACTGAGAAAT CCATTAGCCT AACTGACTTC CTCAAAAACA
t_MSE SM5  151 CCAACAAACA ACTGAGAAAT CCATTAGCCT AACTGACTTC CTCAAAAACA
t_MSE SM6  151 CCAACAAACA ACTGAGAAAT CCATTAGCCT AACTGACTTC CTCAAAAACA
t_MSE SM7  151 CCAACAAACA ACTGAGAAAT CCATTAGCCT AACTGACTTC CTCAAAAACA
t_MSE SM8  151 CCAACAAACA ACTGAGAAAT CCATTAGCCT AACTGACTTC CTCAAAAACA
t_MSE SM9  151 CCAACAAACA ACTGAGAAAT CCATTAGCCT AACTGACTTC CTCAAAAACA
t_MSE SM10 151 CCAACAAACA ACTGAGAAAT CCATTAGCCT AACTGACTTC CTCAAAAACA

t_MSE 193 CTGAGAACAG TAGTCCATGT ATGAATGTAC CAAACACGAT TTCCGTATTT t_MSE SM1 201 CTGAGAACAG TAGTCCATGT ATGAATGTAC CAAACACGAT TTCCGTATTT t_MSE SM2 201 aTtAtAcCcG gAtTaCcTtT cTtAcTtTcC aAcAaACGAT TTCCGTATTT t_MSE SM3 201 CTGAGAACAG TAGTCCATGT ATGAcTtTcC aAcAaAaGcT gTaCtTcTgT t_MSE SM4 201 CTGAGAACAG TAGTCCATGT ATGAATGTAC CAAACACGAT TTCCGTATTT t_MSE SM5 201 CTGAGAACAG TAGTCCATGT ATGAATGTAC CAAACACGAT TTCCGTATTT t_MSE SM6 201 CTGAGAACAG TAGTCCATGT ATGAATGTAC CAAACACGAT TTCCGTATTT t_MSE SM7 201 CTGAGAACAG TAGTCCATGT ATGAATGTAC CAAACACGAT TTCCGTATTT t_MSE SM8 201 CTGAGAACAG TAGTCCATGT ATGAATGTAC CAAACACGAT TTCCGTATTT t_MSE SM9 201 CTGAGAACAG TAGTCCATGT ATGAATGTAC CAAACACGAT TTCCGTATTT t_MSE SM10 201 CTGAGAACAG TAGTCCATGT ATGAATGTAC CAAACACGAT TTCCGTATTT

t_MSE 243 GAAATAATAA TAAATAATCA GAATGTAAAT ATATTATACG TTTTATAGAT t_MSE SM1 251 GAAATAATAA TAAATAATCA GAATGTAAAT ATATTATACG TTTTATAGAT t_MSE SM2 251 GAAATAATAA TAAATAATCA GAATGTAAAT ATATTATACG TTTTATAGAT t_MSE SM3 251 tAcAgAcTcA gAcAgAcTaA tAcTtTcAcT cTcTgAgAaG gTgTcTcGcT t_MSE SM4 251 GAAATAATAA TAAATAATCA GAATGTAAAT ATATTATACG TTTTATAGCT t_MSE SM5 251 GAAATAATAA TAAATAATCA GAATGTAAAT ATATTATACG TTTTATAGAT t_MSE SM6 251 GAAATAATAA TAAATAATCA GAATGTAAAT ATATTATACG TTTTATAGAT t_MSE SM7 251 GAAATAATAA TAAATAATCA GAATGTAAAT ATATTATACG TTTTATAGAT t_MSE SM8 251 GAAATAATAA TAAATAATCA GAATGTAAAT ATATTATACG TTTTATAGAT t_MSE SM9 251 GAAATAATAA TAAATAATCA GAATGTAAAT ATATTATACG TTTTATAGAT t_MSE SM10 251 GAAATAATAA TAAATAATCA GAATGTAAAT ATATTATACG TTTTATAGAT

t_MSE 293 AGAATCAAGA CTTAGGATAA TTGCACTAAG TAGTATACTT AAATTCCCAT t_MSE SM1 301 AGAATCAAGA CTTAGGATAA TTGCACTAAG TAGTATACTT AAATTCCCAT t_MSE SM2 301 AGAATCAAGA CTTAGGATAA TTGCACTAAG TAGTATACTT AAATTCCCAT t_MSE SM3 301 cGcAgCaAtA CTTAGGATAA TTGCACTAAG TAGTATACTT AAATTCCCAT t_MSE SM4 301 cGcAgCcAtA aTgAtGcTcA gTtCcCgAcG gAtTcTcCgT cAcTgCaCcT t_MSE SM5 301 AGAATCAAGA CTTAGGATAA TTGCACTAAG TAGTATACTT AAATTCCCAT t_MSE SM6 301 AGAATCAAGA CTTAGGATAA TTGCACTAAG TAGTATACTT AAATTCCCAT t_MSE SM7 301 AGAATCAAGA CTTAGGATAA TTGCACTAAG TAGTATACTT AAATTCCCAT t_MSE SM8 301 AGAATCAAGA CTTAGGATAA TTGCACTAAG TAGTATACTT AAATTCCCAT t_MSE SM9 301 AGAATCAAGA CTTAGGATAA TTGCACTAAG TAGTATACTT AAATTCCCAT t_MSE SM10 301 AGAATCAAGA CTTAGGATAA TTGCACTAAG TAGTATACTT AAATTCCCAT

t_MSE 343 TGCCAAGTGA ACCGGTTGGT ATCCAAAGTT GAAGTCAATA ACAAAAATGA t_MSE SM1 351 TGCCAAGTGA ACCGGTTGGT ATCCAAAGTT GAAGTCAATA ACAAAAATGA t_MSE SM2 351 TGCCAAGTGA ACCGGTTGGT ATCCAAAGTT GAAGTCAATA ACAAAAATGA t_MSE SM3 351 TGCCAAGTGA ACCGGTTGGT ATCCAAAGTT GAAGTCAATA ACAAAAATGA t_MSE SM4 351 gGaCcAtTtA cCaGtTgGtT cTaCaAcGtT tAcGTCAATA ACAAAAATGA t_MSE SM5 351 TGCCAAGTGA ACCGGTTGGT ATAcCaCgGt TacGgCcAgA cCcAcAcTtA t_MSE SM6 351 TGCCAAGTGA ACCGGTTGGT ATCCAAAGTT GAAGTCAATA ACAAAAATGA t_MSE SM7 351 TGCCAAGTGA ACCGGTTGGT ATCCAAAGTT GAAGTCAATA ACAAAAATGA t_MSE SM8 351 TGCCAAGTGA ACCGGTTGGT ATCCAAAGTT GAAGTCAATA ACAAAAATGA t_MSE SM9 351 TGCCAAGTGA ACCGGTTGGT ATCCAAAGTT GAAGTCAATA ACAAAAATGA t_MSE SM10 351 TGCCAAGTGA ACCGGTTGGT ATCCAAAGTT GAAGTCAATA ACAAAAATGA

t_MSE      393 GTGCATTTTA CTCTTGCACC ATTAGAATAT TAGATTTTAG TGTTTAAATA
t_MSE SM1  401 GTGCATTTTA CTCTTGCACC ATTAGAATAT TAGATTTTAG TGTTTAAATA
t_MSE SM2  401 GTGCATTTTA CTCTTGCACC ATTAGAATAT TAGATTTTAG TGTTTAAATA
t_MSE SM3  401 GTGCATTTTA CTCTTGCACC ATTAGAATAT TAGATTTTAG TGTTTAAATA
t_MSE SM4  401 GTGCATTTTA CTCTTGCACC ATTAGAATAT TAGATTTTAG TGTTTAAATA
t_MSE SM5  401 tTtCcTgTgA aTaTgGaAaC cTgAtAcTcT gAtAgTgTcG gGgTgAcAgA
t_MSE SM6  401 GTGCATTTTA CTCTTGCACC ATTAGAATAT TAGATTTTAG TGTTTAcAgA
t_MSE SM7  401 GTGCATTTTA CTCTTGCACC ATTAGAATAT TAGATTTTAG TGTTTAAATA
t_MSE SM8  401 GTGCATTTTA CTCTTGCACC ATTAGAATAT TAGATTTTAG TGTTTAAATA
t_MSE SM9  401 GTGCATTTTA CTCTTGCACC ATTAGAATAT TAGATTTTAG TGTTTAAATA
t_MSE SM10 401 GTGCATTTTA CTCTTGCACC ATTAGAATAT TAGATTTTAG TGTTTAAATA

t_MSE 443 AACTAATTTG AGAATTCAAG ATCATAATAT GCATACTAAT TAGACAGTCT t_MSE SM1 451 AACTAATTTG AGAATTCAAG ATCATAATAT GCATACTAAT TAGACAGTCT t_MSE SM2 451 AACTAATTTG AGAATTCAAG ATCATAATAT GCATACTAAT TAGACAGTCT t_MSE SM3 451 AACTAATTTG AGAATTCAAG ATCATAATAT GCATACTAAT TAGACAGTCT t_MSE SM4 451 AACTAATTTG AGAATTCAAG ATCATAATAT GCATACTAAT TAGACAGTCT t_MSE SM5 451 cAaTcAgTTG AGAATTCAAG ATCATAATAT GCATACTAAT TAGACAGTCT t_MSE SM6 451 cAaTcAgTgG cGcAgTaAcG cTaAgAcTcT tCcTcCgAcT gAtAaAtTaT t_MSE SM7 451 AACTAATTTG AGAATTCAAG ATCATAATAT GCATACTAAT TAGACAGTCT t_MSE SM8 451 AACTAATTTG AGAATTCAAG ATCATAATAT GCATACTAAT TAGACAGTCT t_MSE SM9 451 AACTAATTTG AGAATTCAAG ATCATAATAT GCATACTAAT TAGACAGTCT t_MSE SM10 451 AACTAATTTG AGAATTCAAG ATCATAATAT GCATACTAAT TAGACAGTCT

t_MSE 493 CTTTTTTTTA TTACTTCAAC TATTCAAATT TGCGTTTTTA TTACATTATA t_MSE SM1 501 CTTTTTTTTA TTACTTCAAC TATTCAAATT TGCGTTTTTA TTACATTATA t_MSE SM2 501 CTTTTTTTTA TTACTTCAAC TATTCAAATT TGCGTTTTTA TTACATTATA t_MSE SM3 501 CTTTTTTTTA TTACTTCAAC TATTCAAATT TGCGTTTTTA TTACATTATA t_MSE SM4 501 CTTTTTTTTA TTACTTCAAC TATTCAAATT TGCGTTTTTA TTACATTATA t_MSE SM5 501 CTTTTTTTTA TTACTTCAAC TATTCAAATT TGCGTTTTTA TTACATTATA t_MSE SM6 501 aTgTgTgTgA gTcCgTaAcC gAgTaAcAgT gGCGTTTTTA TTACATTATA t_MSE SM7 501 CTTTTTTTTA TTACTTCAAC gAgTaAcAgT gGaGgTgTgA gTcCcTgAgA t_MSE SM8 501 CTTTTTTTTA TTACTTCAAC TATTCAAATT TGCGTTTTTA TTACATTATA t_MSE SM9 501 CTTTTTTTTA TTACTTCAAC TATTCAAATT TGCGTTTTTA TTACATTATA t_MSE SM10 501 CTTTTTTTTA TTACTTCAAC TATTCAAATT TGCGTTTTTA TTACATTATA

t_MSE 543 ATTTTCAAGT GGTCTTGGTG CTTTCCAACT GCTAGGATTG AGTTGAAACA t_MSE SM1 551 ATTTTCAAGT GGTCTTGGTG CTTTCCAACT GCTAGGATTG AGTTGAAACA t_MSE SM2 551 ATTTTCAAGT GGTCTTGGTG CTTTCCAACT GCTAGGATTG AGTTGAAACA t_MSE SM3 551 ATTTTCAAGT GGTCTTGGTG CTTTCCAACT GCTAGGATTG AGTTGAAACA t_MSE SM4 551 ATTTTCAAGT GGTCTTGGTG CTTTCCAACT GCTAGGATTG AGTTGAAACA t_MSE SM5 551 ATTTTCAAGT GGTCTTGGTG CTTTCCAACT GCTAGGATTG AGTTGAAACA t_MSE SM6 551 ATTTTCAAGT GGTCTTGGTG CTTTCCAACT GCTAGGATTG AGTTGAAACA t_MSE SM7 551 cTgTgCcAtT tGgCgTtGgG aTgTaCcAaT tCgAtGcTgG cGgTtAcAcA t_MSE SM8 551 ATTTTCAAGT GGTCTTGGTG CTTTCCAACT GCTAGGATTG AGTTtAcAcA t_MSE SM9 551 ATTTTCAAGT GGTCTTGGTG CTTTCCAACT GCTAGGATTG AGTTGAAACA t_MSE SM10 551 ATTTTCAAGT GGTCTTGGTG CTTTCCAACT GCTAGGATTG AGTTGAAACA

t_MSE      593 AATAAATAAA TAAACAAATA AGTTAACCTT TTGTTTATTA CTTCTAACAA
t_MSE SM1  601 AATAAATAAA TAAACAAATA AGTTAACCTT TTGTTTATTA CTTCTAACAA
t_MSE SM2  601 AATAAATAAA TAAACAAATA AGTTAACCTT TTGTTTATTA CTTCTAACAA
t_MSE SM3  601 AATAAATAAA TAAACAAATA AGTTAACCTT TTGTTTATTA CTTCTAACAA
t_MSE SM4  601 AATAAATAAA TAAACAAATA AGTTAACCTT TTGTTTATTA CTTCTAACAA
t_MSE SM5  601 AATAAATAAA TAAACAAATA AGTTAACCTT TTGTTTATTA CTTCTAACAA
t_MSE SM6  601 AATAAATAAA TAAACAAATA AGTTAACCTT TTGTTTATTA CTTCTAACAA
t_MSE SM7  601 cAgAcATAAA TAAACAAATA AGTTAACCTT TTGTTTATTA CTTCTAACAA
t_MSE SM8  601 cAgAcAgAcA gAcAaAcAgA cGgTcAaCgT gTtTgTcTgA aTgCgAcCcA
t_MSE SM9  601 AATAAATAAA TAAACAAATA AGTTAACCTT TTGTTTATTA CTTCTAACAA
t_MSE SM10 601 AATAAATAAA TAAACAAATA AGTTAACCTT TTGTTTATTA CTTCTAACAA

t_MSE 643 CTTGATTCCT AGAAATTGAA TAACATTTCT TTAAGTGTTT ACAAACATTT t_MSE SM1 651 CTTGATTCCT AGAAATTGAA TAACATTTCT TTAAGTGTTT ACAAACATTT t_MSE SM2 651 CTTGATTCCT AGAAATTGAA TAACATTTCT TTAAGTGTTT ACAAACATTT t_MSE SM3 651 CTTGATTCCT AGAAATTGAA TAACATTTCT TTAAGTGTTT ACAAACATTT t_MSE SM4 651 CTTGATTCCT AGAAATTGAA TAACATTTCT TTAAGTGTTT ACAAACATTT t_MSE SM5 651 CTTGATTCCT AGAAATTGAA TAACATTTCT TTAAGTGTTT ACAAACATTT t_MSE SM6 651 CTTGATTCCT AGAAATTGAA TAACATTTCT TTAAGTGTTT ACAAACATTT t_MSE SM7 651 CTTGATTCCT AGAAATTGAA TAACATTTCT TTAAGTGTTT ACAAACATTT t_MSE SM8 651 aTgGcTgCaT cGcAcTgGcA gAcCcTgTaT TTAAGTGTTT ACAAACATTT t_MSE SM9 651 CTTGATTCCT AGAAATTGcA gAcCcTgTaT gTcAtTtTgT cCcAcCcTgT t_MSE SM10 651 CTTGATTCCT AGAAATTGAA TAACATTTCT TTAAGTGTTT ACAAACATTT

t_MSE 693 ATTTATTTAT CGCTTAAATC TGAAAACTGA CTATCGACTT ACGAAGCTAG t_MSE SM1 701 ATTTATTTAT CGCTTAAATC TGAAAACTGA CTATCGACTT ACGAAGCTAG t_MSE SM2 701 ATTTATTTAT CGCTTAAATC TGAAAACTGA CTATCGACTT ACGAAGCTAG t_MSE SM3 701 ATTTATTTAT CGCTTAAATC TGAAAACTGA CTATCGACTT ACGAAGCTAG t_MSE SM4 701 ATTTATTTAT CGCTTAAATC TGAAAACTGA CTATCGACTT ACGAAGCTAG t_MSE SM5 701 ATTTATTTAT CGCTTAAATC TGAAAACTGA CTATCGACTT ACGAAGCTAG t_MSE SM6 701 ATTTATTTAT CGCTTAAATC TGAAAACTGA CTATCGACTT ACGAAGCTAG t_MSE SM7 701 ATTTATTTAT CGCTTAAATC TGAAAACTGA CTATCGACTT ACGAAGCTAG t_MSE SM8 701 ATTTATTTAT CGCTTAAATC TGAAAACTGA CTATCGACTT ACGAAGCTAG t_MSE SM9 701 cTgTcTgTcT aGaTgAcAgC gGcAcAaTtA aTcTaGcCgT cCtAcGaTcG t_MSE SM10 701 ATTTATTTAT CGCTTAAATC TGAAAACTGA CTATCGACTT ACtAcGaTcG

t_MSE 743 GAAAAAAAAA GTGCAATAAA TAGATCTTAG ATTAGACGGG GTGTTGGAAC t_MSE SM1 751 GAAAAAAAAA GTGCAATAAA TAGATCTTAG ATTAGACGGG GTGTTGGAAC t_MSE SM2 751 GAAAAAAAAA GTGCAATAAA TAGATCTTAG ATTAGACGGG GTGTTGGAAC t_MSE SM3 751 GAAAAAAAAA GTGCAATAAA TAGATCTTAG ATTAGACGGG GTGTTGGAAC t_MSE SM4 751 GAAAAAAAAA GTGCAATAAA TAGATCTTAG ATTAGACGGG GTGTTGGAAC t_MSE SM5 751 GAAAAAAAAA GTGCAATAAA TAGATCTTAG ATTAGACGGG GTGTTGGAAC t_MSE SM6 751 GAAAAAAAAA GTGCAATAAA TAGATCTTAG ATTAGACGGG GTGTTGGAAC t_MSE SM7 751 GAAAAAAAAA GTGCAATAAA TAGATCTTAG ATTAGACGGG GTGTTGGAAC t_MSE SM8 751 GAAAAAAAAA GTGCAATAAA TAGATCTTAG ATTAGACGGG GTGTTGGAAC t_MSE SM9 751 tAcAAAAAAA GTGCAATAAA TAGATCTTAG ATTAGACGGG GTGTTGGAAC t_MSE SM10 751 tAcAcAcAcA tTtCcAgAcA gAtAgCgTcG cTgAtAaGtG tTtTgGtAcC

t_MSE 793 GCCTACAGAT AGGGCCCACC ACTGTACTGG TGATAGCTAC TGGATCCATA t_MSE SM1 801 GCCTACAGAT AGGGCCCACC ACTGTACTGG TGATAGCTAC TGGATCCATA t_MSE SM2 801 GCCTACAGAT AGGGCCCACC ACTGTACTGG TGATAGCTAC TGGATCCATA t_MSE SM3 801 GCCTACAGAT AGGGCCCACC ACTGTACTGG TGATAGCTAC TGGATCCATA t_MSE SM4 801 GCCTACAGAT AGGGCCCACC ACTGTACTGG TGATAGCTAC TGGATCCATA t_MSE SM5 801 GCCTACAGAT AGGGCCCACC ACTGTACTGG TGATAGCTAC TGGATCCATA t_MSE SM6 801 GCCTACAGAT AGGGCCCACC ACTGTACTGG TGATAGCTAC TGGATCCATA t_MSE SM7 801 GCCTACAGAT AGGGCCCACC ACTGTACTGG TGATAGCTAC TGGATCCATA t_MSE SM8 801 GCCTACAGAT AGGGCCCACC ACTGTACTGG TGATAGCTAC TGGATCCATA t_MSE SM9 801 GCCTACAGAT AGGGCCCACC ACTGTACTGG TGATAGCTAC TGGATCCATA t_MSE SM10 801 tCCTACAGAT AGGGCCCACC ACTGTACTGG TGATAGCTAC TGGATCCATA

SbfI
t_MSE      843 GCCCTGAACA TGACCCACGT TGTAGCCTGC AGG
t_MSE SM1  851 GCCCTGAACA TGACCCACGT TGTAGCCTGC AGG
t_MSE SM2  851 GCCCTGAACA TGACCCACGT TGTAGCCTGC AGG
t_MSE SM3  851 GCCCTGAACA TGACCCACGT TGTAGCCTGC AGG
t_MSE SM4  851 GCCCTGAACA TGACCCACGT TGTAGCCTGC AGG
t_MSE SM5  851 GCCCTGAACA TGACCCACGT TGTAGCCTGC AGG
t_MSE SM6  851 GCCCTGAACA TGACCCACGT TGTAGCCTGC AGG
t_MSE SM7  851 GCCCTGAACA TGACCCACGT TGTAGCCTGC AGG
t_MSE SM8  851 GCCCTGAACA TGACCCACGT TGTAGCCTGC AGG
t_MSE SM9  851 GCCCTGAACA TGACCCACGT TGTAGCCTGC AGG
t_MSE SM10 851 GCCCTGAACA TGACCCACGT TGTAGCCTGC AGG

APPENDIX C.

Sequence alignment of fine-scale t_MSE2 scanning mutagenesis

AscI t_MSE2 1 GGCGCGCCTG AAATAATAAT AAATAATCAG AATGTAAATA TATTATACGT t_MSE2SM5.1i 1 GGCGCGCCTG AAATAATAAT AAATAATCAG AATGTAAATA TATTATACGT t_MSE2SM5.2i 1 GGCGCGCCTG AAATAATAAT AAATAATCAG AATGTAAATA TATTATACGT t_MSE2SM5.3i 1 GGCGCGCCTG AAATAATAAT AAATAATCAG AATGTAAATA TATTATACGT t_MSE2SM5.4i 1 GGCGCGCCTG AAATAATAAT AAATAATCAG AATGTAAATA TATTATACGT t_MSE2SM6.1 1 GGCGCGCCTG AAATAATAAT AAATAATCAG AATGTAAATA TATTATACGT t_MSE2SM6.2 1 GGCGCGCCTG AAATAATAAT AAATAATCAG AATGTAAATA TATTATACGT t_MSE2SM6.3 1 GGCGCGCCTG AAATAATAAT AAATAATCAG AATGTAAATA TATTATACGT t_MSE2SM6.4 1 GGCGCGCCTG AAATAATAAT AAATAATCAG AATGTAAATA TATTATACGT t_MSE2SM6.5 1 GGCGCGCCTG AAATAATAAT AAATAATCAG AATGTAAATA TATTATACGT t_MSE2SM6.6 1 GGCGCGCCTG AAATAATAAT AAATAATCAG AATGTAAATA TATTATACGT t_MSE2SM6 1 GGCGCGCCTG AAATAATAAT AAATAATCAG AATGTAAATA TATTATACGT

t_MSE2 51 TTTATAGATA GAATCAAGAC TTAGGATAAT TGCACTAAGT AGTATACTTA t_MSE2SM5.1i 51 TTTATAGATA GAATCAAGAC TTAGGATAAT TGCACTAAGT AGTATACTTA t_MSE2SM5.2i 51 TTTATAGATA GAATCAAGAC TTAGGATAAT TGCACTAAGT AGTATACTTA t_MSE2SM5.3i 51 TTTATAGATA GAATCAAGAC TTAGGATAAT TGCACTAAGT AGTATACTTA t_MSE2SM5.4i 51 TTTATAGATA GAATCAAGAC TTAGGATAAT TGCACTAAGT AGTATACTTA t_MSE2SM6.1 51 TTTATAGATA GAATCAAGAC TTAGGATAAT TGCACTAAGT AGTATACTTA t_MSE2SM6.2 51 TTTATAGATA GAATCAAGAC TTAGGATAAT TGCACTAAGT AGTATACTTA t_MSE2SM6.3 51 TTTATAGATA GAATCAAGAC TTAGGATAAT TGCACTAAGT AGTATACTTA t_MSE2SM6.4 51 TTTATAGATA GAATCAAGAC TTAGGATAAT TGCACTAAGT AGTATACTTA t_MSE2SM6.5 51 TTTATAGATA GAATCAAGAC TTAGGATAAT TGCACTAAGT AGTATACTTA t_MSE2SM6.6 51 TTTATAGATA GAATCAAGAC TTAGGATAAT TGCACTAAGT AGTATACTTA t_MSE2SM6 51 TTTATAGATA GAATCAAGAC TTAGGATAAT TGCACTAAGT AGTATACTTA

t_MSE2 101 AATTCCCATT GCCAAGTGAA CCGGTTGGTA TCCAAAGTTG AAGTCAATAA t_MSE2SM5.1i 101 AATTCCCATT GCCAAGTGAA CCGGTTGGTA TaCcAcGgTt AcGgCcAgAc t_MSE2SM5.2i 101 AATTCCCATT GCCAAGTGAA CCGGTTGGTA TCCAAAGTTG AAGTCAcTcA t_MSE2SM5.3i 101 AATTCCCATT GCCAAGTGAA CCGGTTGGTA TCCAAAGTTG AAGTCAATAA t_MSE2SM5.4i 101 AATTCCCATT GCCAAGTGAA CCGGTTGGTA TCCAAAGTTG AAGTCAATAA t_MSE2SM6.1 101 AATTCCCATT GCCAAGTGAA CCGGTTGGTA TCCAAAGTTG AAGTCAATAA t_MSE2SM6.2 101 AATTCCCATT GCCAAGTGAA CCGGTTGGTA TCCAAAGTTG AAGTCAATAA t_MSE2SM6.3 101 AATTCCCATT GCCAAGTGAA CCGGTTGGTA TCCAAAGTTG AAGTCAATAA t_MSE2SM6.4 101 AATTCCCATT GCCAAGTGAA CCGGTTGGTA TCCAAAGTTG AAGTCAATAA t_MSE2SM6.5 101 AATTCCCATT GCCAAGTGAA CCGGTTGGTA TCCAAAGTTG AAGTCAATAA t_MSE2SM6.6 101 AATTCCCATT GCCAAGTGAA CCGGTTGGTA TCCAAAGTTG AAGTCAATAA t_MSE2SM6 101 AATTCCCATT GCCAAGTGAA CCGGTTGGTA TCCAAAGTTG AAGTCAATAA

SM5 t_MSE2 151 CAAAAATGAG TGCATTTTAC TCTTGCACCA TTAGAATATT AGATTTTAGT t_MSE2SM5.1i 151 CAAAAATGAG TGCATTTTAC TCTTGCACCA TTAGAATATT AGATTTTAGT t_MSE2SM5.2i 151 aAcAcAgGcG gGaAgTTTAC TCTTGCACCA TTAGAATATT AGATTTTAGT t_MSE2SM5.3i 151 CAAAAATGAG TtCcTgTgAa TaTgGaAaCa TTAGAATATT AGATTTTAGT t_MSE2SM5.4i 151 CAAAAATGAG TGCATTTTAC TCTTGCcCaA gTcGcAgAgT cGcTgTTAGT t_MSE2SM6.1 151 CAAAAATGAG TGCATTTTAC TCTTGCACCA TTAGAATATT cGcTgTgAtT t_MSE2SM6.2 151 CAAAAATGAG TGCATTTTAC TCTTGCACCA TTAGAATATT AGATTTTAGT t_MSE2SM6.3 151 CAAAAATGAG TGCATTTTAC TCTTGCACCA TTAGAATATT AGATTTTAGT t_MSE2SM6.4 151 CAAAAATGAG TGCATTTTAC TCTTGCACCA TTAGAATATT AGATTTTAGT t_MSE2SM6.5 151 CAAAAATGAG TGCATTTTAC TCTTGCACCA TTAGAATATT AGATTTTAGT t_MSE2SM6.6 151 CAAAAATGAG TGCATTTTAC TCTTGCACCA TTAGAATATT AGATTTTAGT t_MSE2SM6 151 CAAAAATGAG TGCATTTTAC TCTTGCACCA TTAGAATATT AGATTTTAGT

M1 M2 M3 M4 t_MSE2 201 GTTTAAATAA ACTAATTTGA GAATTCAAGA TCATAATATG CATACTAATT t_MSE2SM5.1i 201 GTTTAAATAA ACTAATTTGA GAATTCAAGA TCATAATATG CATACTAATT t_MSE2SM5.2i 201 GTTTAAATAA ACTAATTTGA GAATTCAAGA TCATAATATG CATACTAATT t_MSE2SM5.3i 201 GTTTAAATAA ACTAATTTGA GAATTCAAGA TCATAATATG CATACTAATT t_MSE2SM5.4i 201 GTTTAAATAA ACTAATTTGA GAATTCAAGA TCATAATATG CATACTAATT t_MSE2SM6.1 201 tTgTcAcTcA ACTAATTTGA GAATTCAAGA TCATAATATG CATACTAATT t_MSE2SM6.2 201 GTTTAcAgAc AaTcAgTgGc GcAgTCAAGA TCATAATATG CATACTAATT t_MSE2SM6.3 201 GTTTAAATAA ACTAATTTGA tAcTgCcAtA gCcTcAgAgG CATACTAATT t_MSE2SM6.4 201 GTTTAAATAA ACTAATTTGA GAATTCAAGA TCATcAgAgG aAgAaTcAgT t_MSE2SM6.5 201 GTTTAAATAA ACTAATTTGA GAATTCAAGA TCATAATATG CATACTAATg t_MSE2SM6.6 201 GTTTAAATAA ACTAATTTGA GAATTCAAGA TCATAATATG CATACTAATT t_MSE2SM6 201 GTTTAcAgAc AaTcAgTgGc GcAgTaAcGc TaAgAcTcTt CcTcCgAcTg

M5 t_MSE2 251 AGACAGTCTC TTTTTTTTAT TACTTCAACT ATTCAAATTT GCGTTTTTAT t_MSE2SM5.1i 251 AGACAGTCTC TTTTTTTTAT TACTTCAACT ATTCAAATTT GCGTTTTTAT t_MSE2SM5.2i 251 AGACAGTCTC TTTTTTTTAT TACTTCAACT ATTCAAATTT GCGTTTTTAT t_MSE2SM5.3i 251 AGACAGTCTC TTTTTTTTAT TACTTCAACT ATTCAAATTT GCGTTTTTAT t_MSE2SM5.4i 251 AGACAGTCTC TTTTTTTTAT TACTTCAACT ATTCAAATTT GCGTTTTTAT t_MSE2SM6.1 251 AGACAGTCTC TTTTTTTTAT TACTTCAACT ATTCAAATTT GCGTTTTTAT t_MSE2SM6.2 251 AGACAGTCTC TTTTTTTTAT TACTTCAACT ATTCAAATTT GCGTTTTTAT t_MSE2SM6.3 251 AGACAGTCTC TTTTTTTTAT TACTTCAACT ATTCAAATTT GCGTTTTTAT t_MSE2SM6.4 251 cGcCAGTCTC TTTTTTTTAT TACTTCAACT ATTCAAATTT GCGTTTTTAT t_MSE2SM6.5 251 AtAaAtTaTa TgTgTgTgAT TACTTCAACT ATTCAAATTT GCGTTTTTAT t_MSE2SM6.6 251 AGACAGTCTC TTTTTgTgAg TcCgTaAcCg AgTaAcAgTg GCGTTTTTAT t_MSE2SM6 251 AtAaAtTaTa TgTgTgTgAg TcCgTaAcCg AgTaAcAgTg GCGTTTTTAT

t_MSE2 301 TACATTATAA TTTTCAAGTG GTCTTGGTGC TTTCCAACTG CTAGGATTGA t_MSE2SM5.1i 301 TACATTATAA TTTTCAAGTG GTCTTGGTGC TTTCCAACTG CTAGGATTGA t_MSE2SM5.2i 301 TACATTATAA TTTTCAAGTG GTCTTGGTGC TTTCCAACTG CTAGGATTGA t_MSE2SM5.3i 301 TACATTATAA TTTTCAAGTG GTCTTGGTGC TTTCCAACTG CTAGGATTGA t_MSE2SM5.4i 301 TACATTATAA TTTTCAAGTG GTCTTGGTGC TTTCCAACTG CTAGGATTGA t_MSE2SM6.1 301 TACATTATAA TTTTCAAGTG GTCTTGGTGC TTTCCAACTG CTAGGATTGA t_MSE2SM6.2 301 TACATTATAA TTTTCAAGTG GTCTTGGTGC TTTCCAACTG CTAGGATTGA t_MSE2SM6.3 301 TACATTATAA TTTTCAAGTG GTCTTGGTGC TTTCCAACTG CTAGGATTGA t_MSE2SM6.4 301 TACATTATAA TTTTCAAGTG GTCTTGGTGC TTTCCAACTG CTAGGATTGA t_MSE2SM6.5 301 TACATTATAA TTTTCAAGTG GTCTTGGTGC TTTCCAACTG CTAGGATTGA t_MSE2SM6.6 301 TACATTATAA TTTTCAAGTG GTCTTGGTGC TTTCCAACTG CTAGGATTGA t_MSE2SM6 301 TACATTATAA TTTTCAAGTG GTCTTGGTGC TTTCCAACTG CTAGGATTGA

SbfI
t_MSE2       351 GTTGAAACAC CTGCAGG
t_MSE2SM5.1i 351 GTTGAAACAC CTGCAGG
t_MSE2SM5.2i 351 GTTGAAACAC CTGCAGG
t_MSE2SM5.3i 351 GTTGAAACAC CTGCAGG
t_MSE2SM5.4i 351 GTTGAAACAC CTGCAGG
t_MSE2SM6.1  351 GTTGAAACAC CTGCAGG
t_MSE2SM6.2  351 GTTGAAACAC CTGCAGG
t_MSE2SM6.3  351 GTTGAAACAC CTGCAGG
t_MSE2SM6.4  351 GTTGAAACAC CTGCAGG
t_MSE2SM6.5  351 GTTGAAACAC CTGCAGG
t_MSE2SM6.6  351 GTTGAAACAC CTGCAGG
t_MSE2SM6    351 GTTGAAACAC CTGCAGG

APPENDIX D.

Sequence alignment of t_MSE2 binding site mutants

t_MSE2                1 GGCGCGCCTG AAATAATAAT AAATAATCAG AATGTAAATA TATTATACGT
t_MSE2 TTAT KO        1 GGCGCGCCTG AAATAATAAT AAATAATCAG AATGTAAATA TATTATACGT
t_MSE2 TTAT+TAAT KO   1 GGCGCGCCTG AAATAATAAT AAATAATCAG AATGTAAATA TATTATACGT
t_MSE2 Hth KO         1 GGCGCGCCTG AAATAATAAT AAATAATCAG AATGTAAATA TATTATACGT

t_MSE2               51 TTTATAGATA GAATCAAGAC TTAGGATAAT TGCACTAAGT AGTATACTTA
t_MSE2 TTAT KO       51 TTTATAGATA GAATCAAGAC TTAGGATAAT TGCACTAAGT AGTATACTTA
t_MSE2 TTAT+TAAT KO  51 TTTATAGATA GAATCAAGAC TTAGGATAAT TGCACTAAGT AGTATACTTA
t_MSE2 Hth KO        51 TTTATAGATA GAATCAAGAC TTAGGATAAT TGCACTAAGT AGTATACTTA

t_MSE2              101 AATTCCCATT GCCAAGTGAA CCGGTTGGTA TCCAAAGTTG AAGTCAATAA
t_MSE2 TTAT KO      101 AATTCCCATT GCCAAGTGAA CCGGTTGGTA TCCAAAGTTG AAGTCAATAA
t_MSE2 TTAT+TAAT KO 101 AATTCCCATT GCCAAGTGAA CCGGTTGGTA TCCAAAGTTG AAGTCAATAA
t_MSE2 Hth KO       101 AATTCCCATT GCCAAGTGAA CCGGTTGGTA TCCAAAGTTG AAGTCAATAA

t_MSE2              151 CAAAAATGAG TGCATTTTAC TCTTGCACCA TTAGAATATT AGATTTTAGT
t_MSE2 TTAT KO      151 CAAAAATGAG TGCATTTTAC TCTTGCACCA TTAGAATATT AGATTTTAGT
t_MSE2 TTAT+TAAT KO 151 CAAAAATGAG TGCATTTTAC TCTTGCACCA TTAGAATATT AGATTTTAGT
t_MSE2 Hth KO       151 CAAAAATGAG TGCATTTTAC TCTTGCACCA TTAGAATATT AGATTTTAGT

t_MSE2              201 GTTTAAATAA ACTAATTTGA GAATTCAAGA TCATAATATG CATACTAATT
t_MSE2 TTAT KO      201 GTTTAAcggc ACTAATTTGA GAATTCAAGA TCcggcTATG CATACTAATT
t_MSE2 TTAT+TAAT KO 201 GTTTAAcggc ACcggcTTGA GAATTCAAGA TCcggcTATG CATACcggcg
t_MSE2 Hth KO       201 GTTTAAATAA ACTAATTTGA GAATTCAAGA TCATAATATG CATACTAATT

t_MSE2              251 AGACAGTCTC TTTTTTTTAT TACTTCAACT ATTCAAATTT GCGTTTTTAT
t_MSE2 TTAT KO      251 AGACAGTCTC TTTTTTcggc TACTTCAACT ATTCAAATTT GCGTTTTTAT
t_MSE2 TTAT+TAAT KO 251 gGACAGTCTC TTTTTTcggc cgCTTCAACT ATTCAAATTT GCGTTTTTAT
t_MSE2 Hth KO       251 CCCCCCTCTC TTTTTTTTAT TACTTCAACT ATTCAAATTT GCGTTTTTAT

t_MSE2              301 TACATTATAA TTTTCAAGTG GTCTTGGTGC TTTCCAACTG CTAGGATTGA
t_MSE2 TTAT KO      301 TACATTATAA TTTTCAAGTG GTCTTGGTGC TTTCCAACTG CTAGGATTGA
t_MSE2 TTAT+TAAT KO 301 TACATTATAA TTTTCAAGTG GTCTTGGTGC TTTCCAACTG CTAGGATTGA
t_MSE2 Hth KO       301 TACATTATAA TTTTCAAGTG GTCTTGGTGC TTTCCAACTG CTAGGATTGA

t_MSE2              351 GTTGAAACAC CTGCAGG
t_MSE2 TTAT KO      351 GTTGAAACAC CTGCAGG
t_MSE2 TTAT+TAAT KO 351 GTTGAAACAC CTGCAGG
t_MSE2 Hth KO       351 GTTGAAACAC CTGCAGG

APPENDIX E.

Sequence alignment of yellow Bait sequences

yBE0.6 1 CTAAGAAAAA ATAGCATTGC ATAAATGATA TAGAGTCCAA AAACTACACA yB1 1 CTAAGAAAAA ATAGCATTGC ATAAATGATA TAGAGTCCAA AAACTACACA yB2 1 ------yB3 1 ------yB4 1 ------yB5 1 ------yB6 1 ------

2222222222 2222222222 2222222222 0000000000 0000022222 yBE0.6 51 AATTCAATAG CAGTAATGGT TACATTAGCT TTGAAATTGT TTTTAGACAT yB1 51 AATTCAATAG CAGTAATGGT TACATTAGCT TTGAAATTGT TTTTAGACAT yB2 1 ------GGT TACATTAGCT TTGAAATTGT TTTTAGACAT yB3 1 ------yB4 1 ------yB5 1 ------yB6 1 ------

2222222222 2222222888 8888888888 8888888888 8888888555 yBE0.6 101 CCGAAGAAAT AAGATTAAAT TTAAACGGCA TTCTTTAATT TGTATTTTAA yB1 101 CCG------yB2 34 CCGAAGAAAT AAGATTAAAT TTAAACGGCA TTCTTTAATT TGTATTTTAA yB3 1 ------yB4 1 ------yB5 1 ------yB6 1 ------

5552222222 2222222222 2222222222 2222222222 2222222222 yBE0.6 151 TATTTTGAGA GGTTTTCCTT ATTTAAAGTG TAGATTATTG AGGATTAATG yB1 104 ------yB2 84 TATTTTGAGA GGTTTTCCTT ATTTAAAGTG TAGATTATTG AGGATTAATG yB3 1 ------GA GGTTTTCCTT ATTTAAAGTG TAGATTATTG AGGATTAATG yB4 1 ------yB5 1 ------yB6 1 ------

2222222288 8888888888 8888888888 8888884444 4444444444

yBE0.6 201 CAAACCACTT TATCTGCGGA GGTCGTAAAA CGTATTTTTA CCCATTTGCA yB1 104 ------yB2 134 C------yB3 43 CAAACCACTT TATCTGCGGA GGTCGTAAAA CGTATTTTTA CCCATTTGCA yB4 1 ------CA yB5 1 ------yB6 1 ------

4222222222 2222222222 2222222222 2222222222 2222222288 yBE0.6 251 TGTTTATTAT GCGTGTGGCT GGTTGTATTA CTTTACTTAA GTTTTGCAAT yB1 104 ------yB2 135 ------yB3 93 TGTTTATTAT GCGTGTGGCT GGTTGTATTA CTTTAC------yB4 3 TGTTTATTAT GCGTGTGGCT GGTTGTATTA CTTTACTTAA GTTTTGCAAT yB5 1 ------yB6 1 ------

8888888888 8888888888 8888833333 3333332222 2222222222 yBE0.6 301 TTTTTCTTTA GCAAGCAGGT GCATTTGGGC CAAGAGATAT ATGCGATCGC yB1 104 ------yB2 135 ------yB3 129 ------yB4 53 TTTTTCTTTA GCAAGCAGGT GCATTTGGGC CAAGAGATAT ATGCGATCGC yB5 1 ------GATAT ATGCGATCGC yB6 1 ------

2222222222 2222222222 2222222222 2222299988 8888888888 yBE0.6 351 TTTCGGTTCG AATTTTTAAC ATTTACTTGC GGCGATGGTC ATTAGAGCAT yB1 104 ------yB2 135 ------yB3 129 ------yB4 103 TTTCGGTTCG AATTTTTAAC ATTTACTTGC G------yB5 16 TTTCGGTTCG AATTTTTAAC ATTTACTTGC GGCGATGGTC ATTAGAGCAT yB6 1 ------

8888888888 8888865555 5555555555 5222222222 2222222222 yBE0.6 401 TACCCACTTA GGGCACCCCC AACATCCAGT TGATTTTCAG GGACCACAAT yB1 104 ------yB2 135 ------yB3 129 ------yB4 134 ------yB5 66 TACCCACTTA GGGCACCCCC AACATCCAGT TGATTTTCAG GGACCACAAT yB6 1 ------CCAGT TGATTTTCAG GGACCACAAT

2222222222 2222222222 2222288888 8888888888 8888888888 yBE0.6 451 ATTTTAAATA ACAGCTAGTG GAATTACCTA AAAGCGCTTT CGTCCCTTTT yB1 104 ------yB2 135 ------yB3 129 ------yB4 134 ------yB5 116 ATTTTAAATA ACAGCTAGTG GAATTACCTA AAAGCg------yB6 26 ATTTTAAATA ACAGCTAGTG GAATTACCTA AAAGCGCTTT CGTCCCTTTT

8888888888 8888888888 8888888888 8888822222 2222222222

yBE0.6 501 GAAATTTTAT GTAACACTCA ATTATATTTA TGTATATGTA TGCTCAAAAT
yB1    104 ------
yB2    135 ------
yB3    129 ------
yB4    134 ------
yB5    152 ------
yB6     76 GAAATTTTAT GTAACACTCA ATTATATTTA TGTATATGTA TGCTCAAAAT

2222222222 2222211111 1111111111 1111222222 2222222222

yBE0.6 551 CACCTGCCAA TAAC
yB1    104 ------
yB2    135 ------
yB3    129 ------
yB4    134 ------
yB5    152 ------
yB6    126 CACCTGCCAA TAAC

2222222222 2222
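The bait rows above read as overlapping sub-fragments of yBE0.6. The following is a minimal Python sketch, assuming each bait is an exact substring of the reference, of how a bait's 1-based start and end coordinates could be recovered from the printed rows. The function name is hypothetical; the reference is the first 100 bases of the yBE0.6 row, and the fragment is only the portion of yB2 visible in the block starting at position 51.

```python
# Illustrative helper (hypothetical name) for locating a bait in a reference.

def bait_span(reference, bait):
    """Return 1-based (start, end) of bait within reference, or None."""
    ref = reference.replace(" ", "").upper()
    # Drop block spacing and gap characters before searching.
    frag = bait.replace(" ", "").replace("-", "").upper()
    idx = ref.find(frag)
    if not frag or idx < 0:
        return None
    return (idx + 1, idx + len(frag))


# First 100 bases of the yBE0.6 row, copied from the alignment.
YBE06_1_100 = (
    "CTAAGAAAAA ATAGCATTGC ATAAATGATA TAGAGTCCAA AAACTACACA "
    "AATTCAATAG CAGTAATGGT TACATTAGCT TTGAAATTGT TTTTAGACAT"
)
# Portion of yB2 printed in the block that starts at position 51.
YB2_FRAGMENT = "GGT TACATTAGCT TTGAAATTGT TTTTAGACAT"

if __name__ == "__main__":
    print(bait_span(YBE06_1_100, YB2_FRAGMENT))
```

Because `str.find` returns the first match, a repeated fragment would be reported at its first occurrence only; that limitation is acceptable for a quick consistency check but not for repetitive sequence.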

APPENDIX F.

Sequence alignment of t_MSE Bait sequences

t_MSE 1 TCAATAACAA AAATGAGTGC ATTTTactCT TGCACCATTA GAATATTAGA tB1 1 TCAATAACAA AAATGAGTGC ATTTT------tB2 1 ------CT TGCACCATTA GAATATTAGA tB3 1 ------TA GAATATTAGA tB4 1 ------A tB5 1 ------tB6 1 ------tB7 1 ------tB8 1 ------tB9 1 ------tB10 1 ------tB11 1 ------

2222222222 2222222222 2222200022 2222222255 5555555559 t_MSE 51 TTTTAGTGTT TAAATAAACT AATTTGAGAA TTCAAGATCA TAATATGTAT tB1 26 ------tB2 23 TTT------tB3 13 TTTTAGTGTT TAA------tB4 2 TTTTAGTGTT TAAATAAACT AATT------tB5 1 ------AAATAAACT AATTTGAGAA T------tB6 1 ------AGAA TTCAAGATCA TAATAT---- tB7 1 ------AGATCA TAATATGCAT tB8 1 ------TGCAT tB9 1 ------tB10 1 ------tB11 1 ------

9995555555 5884444444 4444113333 3111444444 4444452244 t_MSE 101 ACTAATTAGA CAGTCTCTTT TTTTTATTAC TTCAACTATT CAAATTTGCG tB1 26 ------tB2 26 ------tB3 26 ------tB4 26 ------tB5 21 ------tB6 21 ------tB7 17 ACTAATTAG------tB8 6 ACTAATTAGA CAGTCTCTTT ------tB9 1 ------TAGA CAGTCTCTTT TTTTTATTAC T------tB10 1 ------TTT TTTTTATTAC TTCAACTATT CA------tB11 1 ------AC TTCAACTATT CAAATTTGCG

4444448885 5555555888 5555555588 8555555555 5522222222

t_MSE 151 TTTt
tB1    26 ----
tB2    26 ----
tB3    26 ----
tB4    26 ----
tB5    21 ----
tB6    21 ----
tB7    26 ----
tB8    26 ----
tB9    26 ----
tB10   26 ----
tB11   23 TTT-

2220
