THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE

DEPARTMENT OF BIOLOGY

THE ORIGIN AND EVOLUTION OF INVERSION BREAKPOINT SEQUENCES ON THE THIRD CHROMOSOME OF DROSOPHILA PSEUDOOBSCURA

HALEY ELIZABETH RANDOLPH SPRING 2016

A thesis submitted in partial fulfillment of the requirements for a baccalaureate degree in Biology with honors in Biology

Reviewed and approved* by the following:

Stephen W. Schaeffer Professor of Biology Thesis Supervisor

James H. Marden Professor of Biology Honors Adviser

* Signatures are on file in the Schreyer Honors College.

ABSTRACT

The chromosomal arrangements of Drosophila pseudoobscura provide an excellent model system to understand the mechanisms that generate novel chromosomal inversions. D. pseudoobscura has over thirty different chromosomal arrangements generated by a series of overlapping paracentric inversions on the third chromosome. Two major models of rearrangement are postulated: repeat-mediated events and staggered-cut single-stranded break events. Analyzing the DNA sequences at the breakpoints of both the ancestral and derived arrangements of an inversion will likely reveal the mechanism that led to the formation of the chromosomal arrangements. To study the mechanisms of rearrangement, we looked at three different inversion mutations in D. pseudoobscura. The locations of the breakpoint sequences were determined from three-kilobase large-insert mate-pair libraries whose reads were generated on an Illumina HiSeq 2000 sequencing platform. Sequences were assembled from short-insert paired-end libraries, and a gene flanking the breakpoint served as a starting point for sequence assembly. We have determined the ancestral and derived breakpoint sequences of the Standard to Pikes Peak mutation, the derived breakpoint sequences of the Santa Cruz to Tree Line mutation, and the ancestral breakpoint sequences of the Standard to Arrowhead mutation. Bioinformatics investigation suggests that short, common repeat sequences of about 100 base pairs comprise a main portion of the breakpoint regions. These repeat sequences are enriched within all breakpoint sequences analyzed, suggesting that these shared repeats play a general role in the inversion mutational process. No evidence for boundary gene duplication or transposable element involvement exists in the breakpoint sequences, supporting the repeat-mediated model of chromosomal arrangement formation.

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

ACKNOWLEDGEMENTS

Chapter 1 Introduction

1.1 History of Chromosomal Rearrangements in Drosophila
1.2 Chromosomal Arrangements in Drosophila pseudoobscura
1.3 Phylogeny of the Arrangements
1.4 Rearrangement Models
1.5 Methodology of Mapping Breakpoint Sequences

Chapter 2 Methods

2.1 Inversion Breakpoint Discovery and Assembly
2.2 Bioinformatics Analysis

Chapter 3 Results

3.1 Standard to Arrowhead Inversion Mutation
3.2 Standard to Pikes Peak Inversion Mutation
3.3 Santa Cruz to Tree Line Inversion Mutation

Chapter 4 Discussion

Appendix A Inversion Breakpoint Sequences

A.1 Proximal Ancestral STAR
A.2 Distal Ancestral STAR
A.3 Proximal Derived STAR
A.4 Distal Derived STAR
A.5 Proximal Ancestral STPP
A.6 Distal Ancestral STPP
A.7 Proximal Derived STPP
A.8 Distal Derived STPP
A.9 Proximal Derived SCTL
A.10 Distal Derived SCTL

BIBLIOGRAPHY

LIST OF FIGURES

Figure 1. Model of inversion mutation that leads to reversal of the sequence contained within the breakpoint regions

Figure 2. Clinal distributions of five arrangements found in relatively high frequencies throughout the southwest United States (adapted from ZL Fuller)

Figure 3. Banding pattern comparison between Giemsa-stained Standard and Arrowhead arrangements (photomicrographs by SW Schaeffer)

Figure 4. Chiricahua/Cuernavaca inversion heterozygote; blue outlines the Cuernavaca homolog, and red outlines the Chiricahua homolog (photomicrograph by SW Schaeffer)

Figure 5. Phylogenetic relationships of the chromosomal arrangements: Hypothetical (HY), Standard (ST), Pikes Peak (PP), Arrowhead (AR), Santa Cruz (SC), Cuernavaca (CU), Chiricahua (CH), and Tree Line (TL)

Figure 6. Repeat-mediated model of chromosomal rearrangement in which blue arrows represent inverted repeat sequence motifs that can pair and exchange via a crossover event

Figure 7. Staggered-cut single-stranded break model of chromosomal rearrangement in which single-strand DNA breaks occur offset from one another in an inversion loop, and subsequent resolution of the inversion loop leads to an inversion mutation

Figure 8. Mapping of the STPP inversion breakpoints (a) Method of large-insert mate-pair and short-insert paired-end Pikes Peak library creation and breakpoint discovery (b) Pikes Peak large-insert mate-pair library mapped to Arrowhead reference

Figure 9. IGV plots of large-insert mate-pair Pikes Peak reads mapped to the Arrowhead reference genome; reads shown in red are those that straddle the breakpoint and thus map >3 Kb from the mate-pair partner (a) Determination of the proximal Pikes Peak breakpoint location (b) Determination of the distal Pikes Peak breakpoint location

Figure 10. Standard to Arrowhead inversion mutation represented on a cytological map

Figure 11. The black histograms represent the frequency that a BLAST High-scoring Segment Pair included a particular nucleotide in BLASTN comparison of each breakpoint to the entire genome; repeat blocks in red have >90% similarity, repeat blocks in orange have >80% similarity (a) Repeat analysis of the ancestral proximal and distal STAR breakpoints (b) Repeat analysis of the derived proximal and distal STAR breakpoints

Figure 12. Standard to Pikes Peak inversion mutation represented on a cytological map

Figure 13. The black histograms represent the frequency that a BLAST High-scoring Segment Pair included a particular nucleotide in BLASTN comparison of each breakpoint to the entire genome; repeat blocks in red have >90% similarity, repeat blocks in orange have >80% similarity (a) Repeat analysis of the ancestral proximal and distal STPP breakpoints (b) Repeat analysis of the derived proximal and distal STPP breakpoints

Figure 14. Santa Cruz to Tree Line inversion mutation represented on a cytological map

Figure 15. The black histograms represent the frequency that a BLAST High-scoring Segment Pair included a particular nucleotide in BLASTN comparison of each breakpoint to the entire genome; repeat blocks in red have >90% similarity, repeat blocks in orange have >80% similarity, repeat blocks in yellow have >70% similarity (a) Repeat analysis of the derived proximal SCTL breakpoint (b) Repeat analysis of the derived distal SCTL breakpoint

LIST OF TABLES

Table 1. Collection locations and strains used to generate large-insert mate-pair and short-insert paired-end libraries

Table 2. Lasergene SeqMan Pro assembly parameters

Table 3. Locations in the Arrowhead reference genome of the boundary genes and breakpoint for each breakpoint sequence

ACKNOWLEDGEMENTS

Above all, I am immensely grateful to SW Schaeffer for being a phenomenal mentor and role model throughout my time in his lab. I am indebted to him for nurturing my evolutionary insights, forcing me to challenge myself continuously, and encouraging me to broaden my horizons in graduate school. His unwavering support and invaluable advice have truly facilitated my development as a young scientist.

Other members of the Schaeffer lab, including ZL Fuller and ME Lee, were instrumental for their role in sharing data. Additionally, I would like to recognize S Richards at the Human Genome Sequencing Center at Baylor College of Medicine for generating the mate-pair and paired-end library data that made this project possible. Lastly, I would like to thank my mom, Elizabeth Randolph, for always supporting my pursuit of scientific knowledge and fostering my inquisitive nature.

This work was supported by grant R01 GM098478 awarded to SW Schaeffer by the National Institutes of Health and an Undergraduate Summer Discovery Grant (2014) funded through the Office of Undergraduate Education and Eberly College of Science. Financial support for travel to the 55th and 56th Annual Drosophila Research Conferences was provided by the Schreyer Honors College, Eberly College of Science, and Office of Undergraduate Education. Finally, I would like to acknowledge the donors of the scholarships I have received throughout my undergraduate career, for which I am extremely appreciative.

Chapter 1

Introduction

Inversion mutations occur when two independent regions of a single chromosome break at locations defined as breakpoints; the region of the chromosome between the breakpoints then rotates 180° and rejoins the original flanking chromosomal segments (Figure 1). Such an event leads to the reversal of the DNA sequence in the derived chromosome and contributes to variation within the genome by creating new gene arrangements within populations (Griffiths et al. 2000).

Figure 1. Model of inversion mutation that leads to reversal of the sequence contained within the breakpoint regions
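As a minimal sketch of the event described above, an inversion can be modeled on a DNA string by replacing the segment between two breakpoints with its reverse complement. The function names and the toy sequence are illustrative assumptions, not part of the analysis in this thesis.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def invert(chromosome: str, proximal: int, distal: int) -> str:
    """Invert chromosome[proximal:distal] (0-based, end-exclusive breakpoints)."""
    if not 0 <= proximal <= distal <= len(chromosome):
        raise ValueError("breakpoints must satisfy 0 <= proximal <= distal <= length")
    return (
        chromosome[:proximal]
        + reverse_complement(chromosome[proximal:distal])
        + chromosome[distal:]
    )


# Toy example: the flanking segments are unchanged, the middle is reversed.
ancestral = "AAAA" + "CCGT" + "TTTT"
derived = invert(ancestral, 4, 8)
print(derived)
```

Applying the same inversion to the derived sequence restores the ancestral order, which is why no genetic material is gained or lost in this simple model.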

Two types of inversion mutations exist: paracentric inversions and pericentric inversions. Pericentric inversions occur when the centromere of the chromosome is included in the inversion, with one breakpoint located on each chromosome arm (Kaiser 1984). Paracentric inversions happen when the centromere is not involved in the inversion mutation, and therefore both breakpoints occur on only one chromosomal arm (Madan 1995). Breakpoints of both types of inversion mutations can occur in coding or noncoding regions of the chromosome.
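The distinction can be stated compactly in code. The sketch below assumes hypothetical coordinates along a single chromosome, with the centromere given as one position; the function name is illustrative.

```python
def classify_inversion(breakpoint_a: float, breakpoint_b: float, centromere: float) -> str:
    """Classify an inversion from its two breakpoints and the centromere position."""
    low, high = sorted((breakpoint_a, breakpoint_b))
    # The centromere lies inside the inverted segment only for pericentric inversions.
    return "pericentric" if low < centromere < high else "paracentric"


print(classify_inversion(10.0, 40.0, centromere=25.0))
print(classify_inversion(30.0, 40.0, centromere=25.0))
```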

Generally, inversion mutations do not result in a loss of genetic material, and therefore individuals harboring inversions do not usually display an abnormal phenotype (Griffiths et al. 2000). It is estimated that approximately 2% of humans carry inversion mutations (Griffiths et al. 2000), and inversions on all 22 autosomes and both sex chromosomes have been identified (Pettenati et al. 2005). Inversions in humans are typically identified when patients present with symptoms of a gene knockout caused by an unknown mutation that is later found to be an inversion (Pettenati et al. 2005), but non-disease-causing inversion polymorphisms are being discovered with the advent of genomic methods (Stefansson et al. 2005). In humans, crossing over within the inverted region of inversion heterozygotes leads to a reduction in recombination frequency and increased infertility due to the production of atypical dicentric and acentric gametes (Brown et al. 1998; Griffiths et al. 2000). In contrast, the negative fitness consequences due to crossing over in paracentric inversion heterozygotes are not seen in Drosophila. Male flies do not undergo crossover during meiosis, so acentric and dicentric gametes are not formed. Crossing over does occur in female flies, but the chromosomal products of recombination end up in polar bodies and are not passed to offspring. Thus, the chromosome that ends up in the oocyte carries one of the parental arrangements (Sturtevant and Beadle 1936).

It is of interest to understand models for how inversion mutations occur, and we can utilize the structure of the breakpoint sequences to test hypotheses about the mechanisms of inversions. It is difficult to study the mechanism of inversion mutations in humans due to the rarity of inversions and to the difficulty in detecting submicroscopic inversion mutations that span less than five megabases (Mb) of the chromosome (Feuk 2010). In Drosophila, cytological analysis of polytene chromosomes derived from larval salivary glands allows detection of inversions as small as 0.6 Mb (Painter 1934; Crumpacker and Kastritsis 1967), making this an ideal system to study mechanisms of chromosomal inversion mutations.

1.1 History of Chromosomal Rearrangements in Drosophila

Inversion mutations are much more abundant, and thus more easily characterized, in the Drosophila genus than in humans. Specifically, paracentric inversions in Drosophila are often polymorphic when chromosomes are surveyed in natural populations; such high levels of polymorphic inversion mutations contribute to variation within Drosophila of the same species, and fixation of these inversions may lead to speciation (Sperlich and Pfriem 1986; Noor et al. 2001; Navarro and Barton 2003). Paracentric inversion mutations often segregate in natural populations of the genus because of the lack of negative fitness consequences. In addition, fixation of chromosomal arrangements can lead to hybrid infertility or species incompatibility if Muller-Dobzhansky incompatibility genes, so-called "speciation" genes, are fixed along with the inversion (Sperlich and Pfriem 1986; Noor et al. 2001; Navarro and Barton 2003).

1.2 Chromosomal Arrangements in Drosophila pseudoobscura

The chromosomal arrangements in Drosophila pseudoobscura provide an excellent model to study inversion breakpoint sequence evolution and mechanisms that form novel chromosomal arrangements. Dobzhansky and Sturtevant (1938) first discovered these abundant, naturally occurring inversion polymorphisms, predominantly on the third and X-chromosomes, after analysis of D. pseudoobscura salivary gland squashes (Painter 1934).

In the North American distribution of D. pseudoobscura, over 30 different chromosomal arrangements on the third chromosome have been observed, all of which exhibit stable clinal distributions (Dobzhansky 1944). These arrangements are the result of a series of overlapping paracentric inversions (Dobzhansky and Sturtevant 1938), and the names of the arrangements were derived from the location in which the gene arrangement was first discovered. Six arrangements, including Arrowhead, Pikes Peak, Standard, Tree Line, Chiricahua, and Cuernavaca, are widely distributed and found at relatively high frequencies throughout the geographical area (Figure 2). The formation of stable geographic clines by the polymorphic chromosomal arrangements suggests that they are under significant selective pressure despite extensive gene flow, which is further supported by the fact that certain populations exhibit seasonal cycling, altitudinal clines, and high levels of linkage disequilibrium (Dobzhansky 1948; Schaeffer and Miller 1993; Schaeffer et al. 2003).


Figure 2. Clinal distributions of five arrangements found in relatively high frequencies throughout the southwest United States (adapted from ZL Fuller)

These inversion mutations are relatively recent, arising approximately one to two million years ago (Wallace et al. 2011), in contrast to the much older divergence of D. pseudoobscura and the related species D. melanogaster, which occurred approximately 25 to 30 million years ago (Drosophila 12 Genomes Consortium 2007). Consequently, the character of the sequence in the breakpoint may reflect what the arrangement resembled at the time that the inversion appeared.

The first pair of inversion breakpoints mapped in D. pseudoobscura were those that generated the Arrowhead gene arrangement from the Standard arrangement. These breakpoints were unique because they were located in noncoding, intergenic regions of the genome (Richards et al. 2005). Genes flank the breakpoint sequences on either side, but coding regions are not involved in the lesion within the actual inversion breakpoints. In addition, each chromosomal arrangement has a single, independent origin, and thus each arrangement forms a monophyletic cluster when a phylogeny is constructed (Aquadro et al. 1991; ZL Fuller and SW Schaeffer, unpublished data).

It has been previously shown that transposable element (TE)-mediated recombination can facilitate chromosomal rearrangements in multiple Drosophila species (Krimbas 1992; Caceres et al. 1999; Mathiopoulos et al. 1999; Casals et al. 2003). TEs are regions of DNA that have the ability to move and shift their position in the genome. Two main types of TEs exist: DNA-based transposon elements and RNA-based retrotransposon elements (Pray 2008). DNA-based transposons move throughout the genome utilizing multiple mechanisms, but at all points during the transposition process they remain as DNA elements. On the other hand, retrotransposons move throughout the genome via RNA intermediates: the element is transcribed into RNA, which is then reverse transcribed back into DNA before insertion at a new location. A hallmark of both DNA and RNA-based transposons is the existence of short, inverted repeats at terminal regions of the TE and a gene encoding transposase. In addition, retrotransposons encode reverse transcriptase (Pray 2008).

Although TEs are involved in the mechanism for some chromosomal rearrangements, other TE-independent mechanisms of rearrangement have been proposed, including random breakage of DNA sequences (Ohno 1973; Nadeau and Taylor 1984), fragile breakpoints (Novitski 1946; Pevzner and Tesler 2003), repeat-mediated ectopic recombination of inverted repeat sequences (Richards et al. 2005), and staggered-cut single-stranded DNA breakage followed by exchange (Ranz et al. 2007).

Because the inversion mutations in D. pseudoobscura are paracentric, and thus chromosome arm sizes are conserved, the arrangements can only be visualized through cytological methods via Giemsa staining of polytene chromosomes derived from D. pseudoobscura salivary gland squashes (Painter 1934). Dark bands within a Giemsa-stained chromosome represent transcriptionally repressed regions that are compacted into heterochromatin (Barcia 2007). The order of distinguishing heterochromatin bands is utilized to define a certain chromosomal arrangement and may also be used to infer phylogenies among related species, estimate genetic diversity in a population, and map gene locations (Dobzhansky and Sturtevant 1938; Judd et al. 1972; Ashburner and Lemeunier 1976; Lemeunier and Ashburner 1976). Therefore, alterations to the order of bands suggest that an inversion mutation has occurred, and each arrangement in D. pseudoobscura has a unique banding pattern (Figure 3).

Figure 3. Banding pattern comparison between Giemsa-stained Standard and Arrowhead arrangements (photomicrographs by SW Schaeffer)

Both inversion heterozygotes and homozygotes exist in natural populations of D. pseudoobscura. Inversion heterozygotes, in which diploids carry a different chromosomal arrangement on each copy of the third chromosome, can be identified by a phenomenon known as inversion looping (Figure 4). Crossover products within the inversion loop generate inviable gametes in meiosis, leading to the observation of suppressed recombination (Dobzhansky and Epling 1948). Inversion loops are created by the pairing of homologous sequences between the inverted chromosomes as they would during meiosis (Griffiths et al. 2000). The inversion looping patterns are characteristic and dependent upon which arrangements the diploid organism harbors, and such patterns can be visualized by Giemsa staining (Dobzhansky and Sturtevant 1938) (Figure 4). In this analysis, only inversion homozygotes, in which both copies of the third chromosome carry the same arrangement, were utilized to generate sequence data.

Figure 4. Chiricahua/Cuernavaca inversion heterozygote; blue outlines the Cuernavaca homolog, and red outlines the Chiricahua homolog (photomicrograph by SW Schaeffer)

1.3 Phylogeny of the Arrangements

We are currently interested in six arrangements of the third chromosome found in a wide distribution and at relatively high frequencies throughout the southwest United States; these include the Cuernavaca (CU), Tree Line (TL), Chiricahua (CH), Standard (ST), Pikes Peak (PP), and Arrowhead (AR) inversion gene arrangements. The phylogenetic relationships between the arrangements are shown in Figure 5 (Dobzhansky and Sturtevant 1938). All seven of the inversion mutations are derived from a common ancestor referred to as Hypothetical. The Hypothetical arrangement has not been sampled in nature, but it was previously postulated to exist due to its similarity to the arrangement of the closely related species D. miranda (Dobzhansky and Sturtevant 1938). The Hypothetical arrangement was confirmed as the ancestral arrangement based on molecular evolutionary analysis (Wallace et al. 2011) and gene adjacency analysis of identified breakpoints (SW Schaeffer, unpublished data). Compared to Hypothetical, all other arrangements are more distantly related to D. miranda. The Santa Cruz and Standard arrangements are derived directly from Hypothetical by one independent inversion mutation each. Santa Cruz is the ancestral arrangement to Chiricahua, Tree Line, and Cuernavaca, although the Santa Cruz arrangement is not found at high frequencies in natural populations (Dobzhansky and Sturtevant 1938). Standard is the ancestral arrangement to both Pikes Peak and Arrowhead.


Figure 5. Phylogenetic relationships of the chromosomal arrangements: Hypothetical (HY), Standard (ST), Pikes Peak (PP), Arrowhead (AR), Santa Cruz (SC), Cuernavaca (CU), Chiricahua (CH), and Tree Line (TL)

1.4 Rearrangement Models

With the advent of genomic sequencing techniques, the determination of DNA sequences at inversion breakpoints has allowed for an in-depth analysis of chromosomal rearrangements and their breakpoints at the molecular level. Since the development of these molecular methods of investigation, two alternative models of chromosomal rearrangement have been proposed: the repeat-mediated model (Richards et al. 2005) and the staggered-cut single-stranded break model (Ranz et al. 2007). Both models involve repetitive sequence elements within the breakpoint regions; however, the outcome of the models on the derived chromosomal arrangement differs.

The repeat-mediated model was proposed following the identification of the two inversion breakpoints responsible for generating the Arrowhead arrangement from the Standard arrangement in D. pseudoobscura by conserved linkage data and subsequent polymerase chain reaction (PCR) amplification (Richards et al. 2005). That analysis showed that the proximal and distal Arrowhead breakpoints contain repeat motifs, including 13 copies of a 128 base pair (bp) segment and 3 copies of a 315 bp segment. In addition, the repeat sequences are in reverse orientation relative to each other at the proximal and distal breakpoints (Richards et al. 2005). The inverted nature of the breakpoint sequences suggests that the mechanism of chromosomal rearrangement involves the pairing of these inverted repeat sequences and subsequent exchange via recombination, leading to the reversal of the DNA sequence contained within the breakpoints (Figure 6).

Figure 6. Repeat-mediated model of chromosomal rearrangement in which blue arrows represent inverted repeat sequence motifs that can pair and exchange via a crossover event
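The key signature of the repeat-mediated model, a motif at one breakpoint that reappears in reverse orientation at the other, can be illustrated with a simple k-mer check. This is a toy sketch under stated assumptions: the function names, the toy sequences, and the exact-match k-mer approach are illustrative and are not the method used by Richards et al. (2005) or in this thesis.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def shared_inverted_kmers(proximal: str, distal: str, k: int) -> set:
    """Return k-mers of `proximal` whose reverse complement occurs in `distal`."""
    distal_kmers = {distal[i:i + k] for i in range(len(distal) - k + 1)}
    hits = set()
    for i in range(len(proximal) - k + 1):
        kmer = proximal[i:i + k]
        if reverse_complement(kmer) in distal_kmers:
            hits.add(kmer)
    return hits


# Toy breakpoints: the distal region carries the motif in reverse orientation.
motif = "GATTACAG"
proximal = "TTTT" + motif + "TTTT"
distal_inverted = "CCCC" + reverse_complement(motif) + "CCCC"
distal_direct = "CCCC" + motif + "CCCC"

print(shared_inverted_kmers(proximal, distal_inverted, k=8))
print(shared_inverted_kmers(proximal, distal_direct, k=8))
```

Only the inverted copy is reported; a direct repeat in the same orientation is not, which matches the requirement that the repeats be inverted relative to each other for exchange to produce an inversion.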

An alternative mechanism, known as the staggered-cut single-stranded break model, was proposed after sequence analysis of three different Drosophila species, D. melanogaster, D. simulans, and D. yakuba, suggested that crossover events between inverted repeat sequences were not the most likely mechanism of rearrangement formation (Ranz et al. 2007). Sequences of the breakpoints of 29 inversion mutations that separate the three species of Drosophila were compared by computational and experimental analysis. Of the 29 inversions studied, 17 corresponded with inverted duplications of nonrepetitive sequences, including genes, at locations flanking the breakpoint regions. Within these genomic regions, 22 incomplete or full duplications of genes coding for proteins were characterized (Ranz et al. 2007). Two inversion mutations were found to contain inverted repeat sequences, and thus only these events supported the repeat-mediated model of breakpoint formation. The lack of data supporting inverted repeat sequence involvement, together with the presence of inverted duplications of nonrepetitive sequences in derived arrangements, led the authors to conclude that staggered, single-stranded DNA breaks were involved in the formation of the inversion mutations (Figure 7).


Figure 7. Staggered-cut single-stranded break model of chromosomal rearrangement in which single-strand DNA breaks occur offset from one another in an inversion loop, and subsequent resolution of the inversion loop leads to an inversion mutation

Analyzing the inversion breakpoint sequences will allow us to test the model of chromosomal rearrangement responsible for generating the arrangements on the third chromosome of D. pseudoobscura to determine if there are general features to the process of inversion mutations in this species. If the staggered-cut single-stranded break model is correct, we expect to see an enrichment of inverted duplications of genes or other nonrepetitive elements that immediately flank the breakpoint region in the derived sequence (Ranz et al. 2007). In the ancestral arrangement, the sequences duplicated in the derived arrangement are expected to be unique. Alternatively, a lack of such inverted duplications of coding regions will support the repeat-mediated model of chromosomal rearrangement and suggest that recombination of inverted repetitive sequences plays a major role in inversion mutation formation (Richards et al. 2005).
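The prediction of the staggered-cut model can be illustrated with a deliberately simplified string model. It assumes that the sequence lying between the two offset single-strand cuts at each breakpoint (called `a` and `b` here) is filled in on both resulting ends, so that each appears once in the ancestral sequence and twice, in opposite orientations, in the derived sequence. All names and sequences are hypothetical, and the sketch is an interpretation of the model of Ranz et al. (2007), not a reproduction of their analysis.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def staggered_cut_inversion(left: str, a: str, middle: str, b: str, right: str) -> str:
    """Derived sequence when staggered cuts flank `a` and `b` (simplified model).

    Ancestral order: left + a + middle + b + right.
    `a` and `b` are retained in place and also carried, inverted, on the
    opposite end of the inverted segment.
    """
    return (
        left + a
        + reverse_complement(b) + reverse_complement(middle) + reverse_complement(a)
        + b + right
    )


def count_both_strands(seq: str, motif: str) -> int:
    """Count occurrences of a non-palindromic motif in either orientation."""
    return seq.count(motif) + seq.count(reverse_complement(motif))


left, a, middle, b, right = "TTTTT", "GATTACAG", "CCCCCCCC", "CATGGAGA", "AAAAA"
ancestral = left + a + middle + b + right
derived = staggered_cut_inversion(left, a, middle, b, right)

for name, seq in (("ancestral", ancestral), ("derived", derived)):
    print(name, count_both_strands(seq, a), count_both_strands(seq, b))
```

Under this model the flanking sequences are unique in the ancestral arrangement and duplicated in inverted orientation in the derived arrangement, which is the pattern the test described above looks for.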

1.5 Methodology of Mapping Breakpoint Sequences

To identify the location of the inversion breakpoints, large-insert mate-pair libraries were created by circularizing three Kb DNA fragments via ligation to an adaptor that served to join the ends of the DNA. After circularization, the circularized DNA was sheared into 400 bp fragments, and reads from these fragments were then mapped to the Arrowhead reference genome (SW Schaeffer, unpublished data). Locations of breakpoints were identified by assaying the reads that mapped at distances greater than three Kb from one another, with the assumption that reads could not map greater than three Kb apart unless they were at the boundary of an inversion mutation.
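The three Kb criterion amounts to a simple filter on mapped mate pairs. The sketch below assumes a hypothetical record type holding the two mapped positions of each pair on the same chromosome; the names and coordinates are illustrative, and real alignments would be read from the mapper's output rather than typed in.

```python
from typing import List, NamedTuple


class MatePair(NamedTuple):
    """Hypothetical record: mapped start positions of both reads of a pair."""
    name: str
    pos1: int
    pos2: int


MAX_INSERT = 3000  # expected maximum separation for a three Kb library


def discordant_pairs(pairs: List[MatePair], max_insert: int = MAX_INSERT) -> List[MatePair]:
    """Return pairs whose reads map farther apart than the library insert size."""
    return [p for p in pairs if abs(p.pos2 - p.pos1) > max_insert]


pairs = [
    MatePair("r1", 10_000, 12_800),     # consistent with a three Kb insert
    MatePair("r2", 10_500, 5_900_000),  # candidate breakpoint-spanning pair
    MatePair("r3", 4_000, 1_000),       # exactly three Kb apart, not flagged
]
print([p.name for p in discordant_pairs(pairs)])
```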

Due to the large genomic gaps and repetitive regions found within the breakpoint sequences, it is difficult to construct the sequences from traditional sequencing methods. Therefore, paired-end and mate-pair libraries that have the ability to span genomic gaps are ideal for sequencing breakpoints (Korbel 2007; Bashir 2008). Furthermore, while mate-pair libraries are ideal for determining the location of breakpoint sequences, issues can arise when attempting to utilize them for breakpoint sequence assembly. Because mate-pair libraries contain a section of DNA that is artificially ligated, this sequence may be incorporated into the assembly of the breakpoint region if it is read through, even though it is not a legitimate sequence found within the genome. To circumvent this issue, short-insert paired-end libraries were used to construct the breakpoint sequences. Paired-end libraries are created similarly to the mate-pair libraries; however, the end reads directly involved in the circularization step are not present in the paired-end library, eliminating the issue of the integration of false read-through reads in the assembly. Ultimately, pure paired-end libraries for each strain were utilized to assemble the breakpoint sequences because these libraries removed the issue of contamination with false read-through sequences, ensuring that all of the reads incorporated were valid.


Chapter 2

Methods

In order to elucidate the inversion breakpoint sequences, high-throughput Illumina next-generation DNA sequencing was utilized, and both large-insert mate-pair and short-insert paired-end libraries were constructed. Large-insert mate-pair libraries were constructed for four arrangements to determine the location of the breakpoints (Table 1). A mate-pair library for the Standard arrangement was not constructed because the mate-pair libraries for CH, PP, CU, and TL reflect all inversion events back to SC and HY, then through ST when reads are mapped to the AR reference genome. Short-insert paired-end libraries were created for three arrangements in order to determine the inversion breakpoint sequences (Table 1).

Table 1. Collection locations and strains used to generate large-insert mate-pair and short-insert paired-end libraries

Arrangement    Strain           Location

Large-Insert Mate-Pair
Chiricahua     CH_JR4           James Reserve, CA
Pikes Peak     PP_DM1020        Davis Mountains, TX
Cuernavaca     CU_SPE123_5-2    San Pablo Etla, Mexico
Tree Line      TL_MSH76         Mount Saint Helena, CA

Short-Insert Paired-End
Pikes Peak     PP_DM1081        Davis Mountains, TX
Standard       ST_JR72          James Reserve, CA
Tree Line      TL_SPE_7-1       San Pablo Etla, Mexico

Once the inversion breakpoint sequences were assembled, bioinformatics techniques allowed the sequence analysis of the proximal and distal regions of the ancestral and derived breakpoint sequences. Repeat analysis and BLAST analysis were used as the primary means to study the constructed sequences.
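One way to summarize such a BLAST-based repeat analysis is to count, for every nucleotide of a breakpoint sequence, how many high-scoring segment pairs (HSPs) include it. The sketch below assumes the HSPs are already available as 1-based, inclusive (start, end) coordinates on the query; the function name and the example intervals are hypothetical.

```python
from typing import List, Tuple


def hsp_coverage(query_length: int, hsps: List[Tuple[int, int]]) -> List[int]:
    """Count how many HSPs cover each query position.

    `hsps` holds 1-based, inclusive (start, end) coordinates on the query;
    start may exceed end for hits reported on the opposite strand.
    """
    coverage = [0] * query_length
    for start, end in hsps:
        low, high = sorted((start, end))
        for position in range(low - 1, high):
            coverage[position] += 1
    return coverage


# Toy example: three hits against a 10 bp query.
print(hsp_coverage(10, [(1, 4), (6, 3), (8, 8)]))
```

Positions with high counts are covered by many genomic hits and are therefore candidates for repetitive sequence.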

2.1 Inversion Breakpoint Discovery and Assembly

Initial attempts to amplify the breakpoint sequences using polymerase chain reaction (PCR) analysis failed. Breakpoint regions that were identified with large-insert mate-pair libraries often straddled gaps in the genome assembly where the gaps were of unknown length. We suspected that PCR amplification of the breakpoint regions in numerous strains of purified D. pseudoobscura DNA may have failed because the distance between amplification primers was too large. Ultimately, we determined that PCR would not be a suitable way to ascertain the breakpoint sequences because the DNA sequences composing the breakpoint regions are highly repetitive and contain large genomic gaps. Bioinformatics methods became the primary tool to assemble the breakpoints and analyze the inversion breakpoint sequences.

Initially, three Kb large-insert mate-pair libraries of the different strains of interest were generated utilizing next-generation sequencing tools on an Illumina HiSeq 2000 sequencing platform (Figure 8). First, three Kb DNA fragments were created via shearing of genomic DNA; biotin junction adaptors were then attached to the ends of these DNA fragments. The fragments were circularized through ligation of the biotin junction adaptors to one another, and the circularized DNA was fragmented into 400 base pair segments that were then sequenced. The large-insert reads generated were mapped to the Arrowhead reference genome sequence MV2-25, which originates from Mesa Verde, Colorado, using the Burrows-Wheeler Aligner (BWA) (Li and Durbin 2009).

Figure 8. Mapping of the STPP inversion breakpoints (a) Method of large-insert mate-pair and short-insert paired-end Pikes Peak library creation and breakpoint discovery (b) Pikes Peak large-insert mate-pair library mapped to Arrowhead reference

Once the reads were mapped to the reference genome, Integrative Genomics Viewer (IGV) plots were created, and the breakpoint regions were identified by assaying locations in the genome in which corresponding mate-pair reads mapped greater than three Kb from each other (Figure 9). In the absence of an inversion mutation, mate-pair reads are not expected to map greater than three Kb away from one another, so reads mapping greater than this distance suggest that these reads flank the breakpoint.
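Although the breakpoints here were located by visual inspection in IGV, the same idea can be sketched computationally: positions of reads from discordant pairs are grouped into clusters, and clusters supported by several reads mark candidate breakpoint intervals. The gap and support thresholds below are hypothetical values chosen for illustration only.

```python
from typing import List, Tuple


def cluster_positions(
    positions: List[int], max_gap: int = 500, min_reads: int = 2
) -> List[Tuple[int, int, int]]:
    """Group sorted read positions into clusters separated by more than `max_gap`.

    Returns (first_position, last_position, read_count) for each cluster
    supported by at least `min_reads` reads.
    """
    clusters: List[List[int]] = []
    for position in sorted(positions):
        if clusters and position - clusters[-1][-1] <= max_gap:
            clusters[-1].append(position)
        else:
            clusters.append([position])
    return [(c[0], c[-1], len(c)) for c in clusters if len(c) >= min_reads]


# Toy positions of reads whose mates map more than three Kb away.
discordant_read_positions = [1000, 1100, 1250, 90_000, 90_200, 5000]
print(cluster_positions(discordant_read_positions))
```

An isolated discordant read, such as the one at position 5000 in the toy data, is ignored because a single read is more likely to reflect a mapping artifact than a breakpoint.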

Figure 9. IGV plots of large-insert mate-pair Pikes Peak reads mapped to the Arrowhead reference genome; reads shown in red are those that straddle the breakpoint and thus map >3 Kb from the mate-pair partner (a) Determination of the proximal Pikes Peak breakpoint location (b) Determination of the distal Pikes Peak breakpoint location

Short-insert paired-end libraries were created from genome strains of each arrangement, and these were utilized to assemble the proximal and distal sequences of numerous inversion breakpoints. Paired-end libraries were generated with the goal of capturing the inversion breakpoints between the locations to which the paired-end reads mapped in the reference genome in order to circumvent the issue of genomic gaps within the breakpoint sequences. Paired-end reads were identified with a BLAST search of the sequence library starting with a query sequence from a gene flanking the breakpoint region. The reads returned from the query search were then assembled into a contig utilizing SeqMan Pro from the Lasergene suite of programs available from DNAStar. Throughout the breakpoint sequence assembly, reads were pair-specified and specific parameters for assembly were defined (Table 2).

Table 2. Lasergene SeqMan Pro assembly parameters

Parameter                      Value
Match Size                     25
Minimum Match Percentage       99
Match Spacing                  50
Minimum Sequence Length        75
Gap Penalty                    0.00
Gap Length Penalty             0.70
Maximum Mismatch End Bases     15
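As a rough reading of the Minimum Match Percentage parameter in Table 2, the number of mismatches tolerated in an aligned read can be worked out directly from the threshold. The sketch below assumes that identity is simply matches divided by aligned length; SeqMan Pro's internal scoring is not reproduced here, and the read lengths used are hypothetical.

```python
def max_mismatches(aligned_length: int, min_match_percent: int = 99) -> int:
    """Largest mismatch count that keeps identity at or above the threshold.

    Integer arithmetic is used so that the result is not affected by
    floating-point rounding exactly at the threshold.
    """
    return aligned_length * (100 - min_match_percent) // 100


def percent_identity(aligned_length: int, mismatches: int) -> float:
    """Percent of aligned positions that match."""
    return 100.0 * (aligned_length - mismatches) / aligned_length


# Hypothetical aligned lengths; 75 is the Minimum Sequence Length in Table 2.
for length in (75, 100, 101):
    allowed = max_mismatches(length)
    print(length, allowed, round(percent_identity(length, allowed), 2))
```

Under this simple reading, an alignment of roughly 100 bases tolerates a single mismatch at a 99% threshold, while a 75-base alignment tolerates none.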

Extremely stringent match parameters were applied (> 99.01% match) in order to ensure that only reads differing by, at most, a single mismatch from the contig would be incorporated into the assembly. The goal of this restriction was to increase the probability that we would exclude misleading reads from the contig; one mismatch was allowed to account for the possibility of polymorphisms and Illumina sequencing errors.

2.2 Bioinformatics Analysis

Once the breakpoint sequences were assembled, analysis of the sequences was performed using MegAlign, another program from the Lasergene suite of DNAStar packages. Proximal and distal sequences of ancestral and derived arrangements were aligned using the One Pair DotPlot method with a percent match parameter of greater than 60%. Similarity indices of greater than 70% between proximal ancestral, proximal derived, distal ancestral, and distal derived sequences were noted and compared to identify repetitive regions shared between the different breakpoint sequences. The size of the common repetitive region was also determined.
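The idea behind a dot-plot comparison with a percent match threshold can be illustrated with a minimal windowed comparison. This sketch is not MegAlign's algorithm: the window size, the function name, and the toy sequences are assumptions made for illustration, and only the 60% threshold is taken from the text above.

```python
from typing import List, Tuple


def dot_plot(seq_a: str, seq_b: str, window: int = 20,
             min_percent: float = 60.0) -> List[Tuple[int, int, float]]:
    """Return (i, j, percent) for every pair of windows matching above min_percent.

    i and j are 0-based window start positions in seq_a and seq_b.
    Each hit corresponds to one dot on a dot plot.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        win_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            win_b = seq_b[j:j + window]
            matches = sum(a == b for a, b in zip(win_a, win_b))
            percent = 100.0 * matches / window
            if percent > min_percent:
                hits.append((i, j, percent))
    return hits


# Toy sequences: the motif ACGTAC occurs in both.
hits = dot_plot("ACGTACGTAC", "TTACGTACGG", window=6)
print(hits)
```

Runs of hits along a diagonal indicate a shared stretch in the same orientation; comparing one sequence against the reverse complement of the other would reveal shared stretches in inverted orientation. The nested loops are quadratic in sequence length, which is acceptable for breakpoint regions of a few kilobases but not for whole chromosomes.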

In addition, repeat maps, in which a histogram represents how often bases in the region match other sequences in the genome, were generated. The repeat maps served as a second method to quantify the amount of common repetitive regions in the proximal and distal regions of ancestral and derived breakpoint sequences for each arrangement. Finally, the MegAlign analysis was overlaid on the repeat maps in order to confirm the location of repetitive regions in the inversion breakpoint sequences. BLAST analysis of the inversion breakpoint sequences was utilized to determine whether any markers of TEs, such as DNA elements coding for transposase or reverse transcriptase, were present in the breakpoint regions.
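A repeat map of this kind can be sketched as a per-base count of how many alignment intervals cover each position of the breakpoint sequence. The sketch below assumes that the intervals are 1-based, inclusive query coordinates of BLAST high-scoring segment pairs that have already been extracted from the search output; the function names, the depth threshold, and the example intervals are hypothetical.

```python
from typing import Iterable, List, Tuple


def repeat_map(query_length: int, hsp_intervals: Iterable[Tuple[int, int]]) -> List[int]:
    """Count, for each query base, how many intervals include it."""
    diff = [0] * (query_length + 1)
    for start, end in hsp_intervals:
        lo, hi = min(start, end), max(start, end)  # tolerate reversed coordinates
        diff[lo - 1] += 1   # interval opens at base lo
        diff[hi] -= 1       # interval closes after base hi
    coverage, running = [], 0
    for delta in diff[:query_length]:
        running += delta
        coverage.append(running)
    return coverage


def repeat_blocks(coverage: List[int], min_depth: int) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs with coverage >= min_depth."""
    blocks, start = [], None
    for pos, depth in enumerate(coverage, start=1):
        if depth >= min_depth and start is None:
            start = pos
        elif depth < min_depth and start is not None:
            blocks.append((start, pos - 1))
            start = None
    if start is not None:
        blocks.append((start, len(coverage)))
    return blocks


# Made-up intervals on a 10-base query.
cov = repeat_map(10, [(1, 4), (3, 6), (3, 10)])
print(cov)
print(repeat_blocks(cov, min_depth=2))
```

Plotting the coverage list as a histogram gives the repeat map, and the block function gives a simple way to report the length and position of the most heavily repeated stretch.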


Chapter 3

Results

The proximal and distal breakpoints of three inversion mutations were mapped to the Arrowhead reference genome. The derived proximal and distal Standard to Arrowhead (STAR) breakpoint sequences were previously determined by Richards et al. (2005) and are utilized in this analysis. Proximal and distal breakpoints were assembled for the ancestral Standard to Pikes Peak (STPP) mutation, derived STPP mutation, ancestral STAR mutation, and derived Santa Cruz to Tree Line (SCTL) mutation from appropriate arrangement databases (Table 3). In the case of the SCTL inversion mutation, the proximal and distal breakpoint sequences were determined with only the derived Tree Line database because the laboratory does not have a Santa Cruz database.

Table 3. Locations in the Arrowhead reference genome of the boundary genes and breakpoint for each breakpoint sequence

Breakpoint   3’ Coordinate      5’ Coordinate        Breakpoint
             (upstream gene)    (downstream gene)    Coordinate
pSTAR        8,902,898          8,905,949            8,904,211
dSTAR        14,812,842         14,833,781           14,832,381
pSTPP        2,500,358          2,505,754            2,504,966
dSTPP        9,162,237          9,164,938            9,163,680
pSCTL        10,622,881         10,628,202           10,626,174
dSCTL        17,149,432         17,153,461           17,151,447

Multiple regions within all of the inversion breakpoints were associated with a much higher read coverage due to repeat sequences present in the regions, and these repeat sequences greatly complicated breakpoint assembly. All constructed sequences can be found in Appendix A. Repeat analysis and BLAST searches were performed on all of the assembled breakpoint sequences.
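The coordinates in Table 3 can be checked for internal consistency with a short script. The sketch below copies the values from Table 3 and assumes that all three coordinates in a row are on the same reference coordinate scale, so that each breakpoint should fall between the 3’ end of the upstream gene and the 5’ end of the downstream gene; the function and key names are hypothetical.

```python
# (3' coordinate of upstream gene, 5' coordinate of downstream gene, breakpoint)
TABLE_3 = {
    "pSTAR": (8_902_898, 8_905_949, 8_904_211),
    "dSTAR": (14_812_842, 14_833_781, 14_832_381),
    "pSTPP": (2_500_358, 2_505_754, 2_504_966),
    "dSTPP": (9_162_237, 9_164_938, 9_163_680),
    "pSCTL": (10_622_881, 10_628_202, 10_626_174),
    "dSCTL": (17_149_432, 17_153_461, 17_151_447),
}


def intergenic_summary(table):
    """Check that each breakpoint lies between its boundary genes and report distances."""
    summary = {}
    for name, (upstream_3p, downstream_5p, breakpoint) in table.items():
        if not upstream_3p < breakpoint < downstream_5p:
            raise ValueError(f"{name}: breakpoint is not between the boundary genes")
        summary[name] = {
            "intergenic_bp": downstream_5p - upstream_3p,
            "from_upstream_bp": breakpoint - upstream_3p,
            "to_downstream_bp": downstream_5p - breakpoint,
        }
    return summary


summary = intergenic_summary(TABLE_3)
for name, row in summary.items():
    print(name, row)
```

All six breakpoints pass the check, and the spans between boundary genes range from about 2.7 Kb (dSTPP) to about 20.9 Kb (dSTAR).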

3.1 Standard to Arrowhead Inversion Mutation

In terms of chromosome cytology, the proximal breakpoint of the STAR inversion mutation occurs at the 70A/76B junction, while the distal breakpoint occurs at the 70B/76B junction (Figure 10). Originally described by Dobzhansky and Sturtevant (1938), these alphanumerical sites designating the breakpoints originate from positions on a cytological map, which were defined by a numbered section and a lettered subsection in the cytology. The upstream gene flanking the proximal breakpoint is GA24334, and the downstream gene flanking the proximal breakpoint is GA19155. The upstream gene flanking the distal breakpoint is GA17716, and the downstream gene flanking the distal breakpoint is GA17559. Derived proximal and distal breakpoint sequences of the STAR arrangement were previously determined by Richards et al. (2005) and were utilized in this analysis.

Figure 10. Standard to Arrowhead inversion mutation represented on a cytological map

Both repeat-map analysis and MegAlign analysis confirmed the existence and location of highly repetitive regions in the proximal and distal breakpoint sequences (Figure 11). Repeat-map analysis of the STAR breakpoints showed that the ancestral and derived proximal and distal breakpoints contain large stretches of DNA whose level of repetition varies with the region under consideration. These repeats were first defined in the Arrowhead reference sequence, and repeat analysis similarly defines them in the ancestral Standard arrangement. In all cases, the length of the repeat block is greater than 1.5 Kb. However, the magnitude of repetition within each breakpoint sequence, the location of repeats in the intergenic region containing the breakpoint, and the pattern of the repeat stretch are unique to the breakpoint sequences of each arrangement.

MegAlign repeat analysis identified multiple repetitive regions within the proximal breakpoint sequence with high sequence similarity (>60%) to repeats found in the distal sequence (Figure 11). Highly similar repeat sequences were found not only in both the proximal and distal STAR breakpoints, but also when comparing the STAR breakpoints to both the Standard to Pikes Peak (STPP) and Santa Cruz to Tree Line (SCTL) inversion breakpoints. BLAST analysis revealed no sequence similarity to any known characteristic elements of TEs, such as DNA regions coding for transposase or reverse transcriptase, or sequences known to be present in TE inverted terminal repeats.


Figure 11. The black histograms represent the frequency that a BLAST High-scoring Segment Pair included a particular nucleotide in BLASTN comparison of each breakpoint to the entire genome; repeat blocks in red have >90% similarity, repeat blocks in orange have >80% similarity (a) Repeat analysis of the ancestral proximal and distal STAR breakpoints (b) Repeat analysis of the derived proximal and distal STAR breakpoints

3.2 Standard to Pikes Peak Inversion Mutation

The proximal STPP inversion breakpoint occurs at the 65B/65C junction, while the distal STPP breakpoint occurs at the 76B/76A junction (Figure 12). The proximal STPP breakpoint is particularly interesting because mapping of the breakpoint sequences shows that this breakpoint is reused as the distal Hypothetical to Standard (HYST) breakpoint. The distal HYST breakpoint was mapped using the mate-pair libraries of the TL, CH, and CU strains, which all have the HYST arrangement in common, making it possible to utilize these three arrangements to locate the HYST breakpoints. The proximal STPP breakpoint is flanked by the upstream gene GA13487 and the downstream gene GA13457, while the distal STPP breakpoint is flanked by the upstream gene GA22082 and the downstream gene GA20777.

Figure 12. Standard to Pikes Peak inversion mutation represented on a cytological map

Again, repeat-map analysis and MegAlign analysis confirmed the presence and location of repeat regions in the breakpoint sequences (Figure 13). Repeat-map analysis of the STPP proximal and distal breakpoints showed a similar structure to the STAR proximal and distal breakpoint sequences, in that the ancestral and derived breakpoints exhibit sequences of approximately 1 Kb or greater with varying levels of repetition. In addition, the magnitude of sequence repetition within each breakpoint sequence and the location of repeats in the breakpoint sequence are unique to each breakpoint, which also corresponds with the pattern seen in the STAR inversion mutation.

MegAlign analysis again showed similar repetitive sequences shared between the proximal and distal breakpoint sequences in the ancestral and derived arrangements (Figure 13). Repeat elements with high sequence similarity, of up to 97.6%, were shared not only between proximal and distal breakpoints in the ancestral and derived arrangements, but also between independent inversion mutations, including the breakpoint sequences of the STAR and SCTL mutations. BLAST analysis showed that the derived proximal, derived distal, and ancestral proximal breakpoint sequences share no sequences with any known TEs. Within the ancestral distal breakpoint sequence, a 244 bp match with 91.4% sequence similarity to the D. pseudoobscura transposon Galileo was identified. The breakpoint region corresponding to the Galileo sequence (bp 1 to 244) is located within the inverted terminal repeat of the transposon (bp 1 to 728) and not within the transposase of Galileo (Marzo 2008).

Figure 13. The black histograms represent the frequency that a BLAST High-scoring Segment Pair included a particular nucleotide in BLASTN comparison of each breakpoint to the entire genome; repeat blocks in red have >90% similarity, repeat blocks in orange have >80% similarity (a) Repeat analysis of the ancestral proximal and distal STPP breakpoints (b) Repeat analysis of the derived proximal and distal STPP breakpoints

3.3 Santa Cruz to Tree Line Inversion Mutation

The proximal SCTL inversion breakpoint occurs at the 74C/74B junction, while the distal SCTL inversion breakpoint is located at the 79B/79A junction (Figure 14). The upstream gene flanking the proximal breakpoint is GA19582, while the downstream gene flanking the proximal breakpoint is GA14679. The upstream gene GA13539 and the downstream gene GA24132 flank the distal breakpoint.

Figure 14. Santa Cruz to Tree Line inversion mutation represented on a cytological map

The proximal and distal breakpoints of the derived SCTL arrangement were constructed and analyzed; ancestral proximal and distal SCTL breakpoints could not be determined due to the lack of a Santa Cruz database. Again, repeat-map analysis and MegAlign analysis were used to identify the repeat regions within the breakpoint sequences (Figure 15). Repeat-map analysis of the SCTL breakpoints showed that the derived proximal and distal sequences have repeat regions with varying levels of repetition throughout the breakpoints. In addition, both the proximal and distal breakpoints have a major repeat block of approximately 1 Kb. Like the STAR and STPP inversion mutations, the repeat sequences in the SCTL inversion fluctuate in magnitude and location within the breakpoint region. MegAlign analysis showed that shared repeat sequences exist within the proximal and distal breakpoints, and confirmed that common sequences are found in both the STAR and STPP inversion breakpoint sequences. Again, BLAST analysis revealed no known TE structures within the inversion breakpoints.


Figure 15. The black histograms represent the frequency that a BLAST High-scoring Segment Pair included a particular nucleotide in BLASTN comparison of each breakpoint to the entire genome; repeat blocks in red have >90% similarity, repeat blocks in orange have >80% similarity, repeat blocks in yellow have >70% similarity (a) Repeat analysis of the derived proximal SCTL breakpoint (b) Repeat analysis of the derived distal SCTL breakpoint

Chapter 4

Discussion

It is of interest to determine the mechanism of novel chromosomal arrangement formation in D. pseudoobscura because rearrangement mechanisms differ across Drosophila species (Casals et al. 2003; Richards et al. 2005; Ranz et al. 2007). Furthermore, the arrangement mechanism in D. pseudoobscura has not previously been characterized in multiple arrangements via inversion breakpoint sequence construction followed by sequence comparison (Richards et al. 2005). Ultimately, analyzing the inversion breakpoint sequences generated from libraries of various D. pseudoobscura arrangements at the DNA level will provide insight into the molecular mechanism leading to chromosomal rearrangement.

A 1 to 1.5 Kb stretch of repetitive DNA, composed of small repeats on the order of approximately 100 bp, was present in all of the breakpoint sequences constructed. Due to the heavily repetitive nature of the breakpoint sequences, breakpoint assembly presented a challenge even when considering the extremely stringent assembly parameters. In addition, single nucleotide polymorphisms (SNPs) complicated the assembly further. Although each inversion breakpoint contained a heavily repetitive region, the magnitude of the repeats in the region differed greatly, both within proximal and distal breakpoints of the ancestral and derived sequences considering the same inversion mutation and within independent inversion mutations.

Furthermore, the location of the repeat stretch varied in the inversion breakpoint regions of each arrangement, as some were located significantly closer to the proximal or distal breakpoint. Differences in the extent of repetition and location of repeat regions within the breakpoint sequences suggest that the repeats are not identical; however, similarity analysis of the sequences shows that the repetitive regions within the breakpoints are very similar to one another.

Proximal and distal inversion breakpoints within the ancestral and derived sequences of the same arrangements share similar sequences, and related sequences are seen between proximal and distal breakpoint sequences of independent inversion mutations. This suggests that these common repeats may play a significant role in the inversion mutational process.

Furthermore, we have evidence that these repeats exist not only at known breakpoint regions, but also at regions throughout the genome (Richards et al. 2005). It is possible that areas of the genome in which these sequences arise and fix create sites of instability and may be locations of future breakpoint development. It would be of interest to determine the sequence composition of the repeats and analyze the D. pseudoobscura genome, and the genomes of other Drosophila species, to identify whether there are other areas in the genome that harbor similar repeat sequences.

Defining the location of the inversion breakpoint sequences led to the discovery that a breakpoint sequence was reused in the case of the proximal HYST and distal STPP inversion breakpoints. These data support the idea that the repeat motifs associated with the D. pseudoobscura breakpoints are the fragile sites that allow double-strand breaks to form and rejoin to create inversions. Within the Drosophila genus, breakpoint reuse is relatively common (Bhutkar et al. 2008; Orengo 2015), and the present findings provide evidence for breakpoint reuse within the D. pseudoobscura species. In 2007, the first case of breakpoint reuse was identified in the order Diptera, which encompasses many families of insects (Ranz et al. 2007).

Through a breakpoint sequence comparison analysis between D. simulans, D. melanogaster, and D. yakuba, a novel, second inversion mutation leading to the D. yakuba lineage was found to involve the reuse of a breakpoint (Ranz et al. 2007).

Although the inversion breakpoint sequences appear to be highly repetitive, there is no evidence that DNA- or RNA-based TEs are involved in the mutational process, as known functional TE sequences within and around the inversion are absent, and no sequences coding for reverse transcriptase or transposase are present. The partial sequence match to the Galileo transposon in the ancestral distal STPP arrangement does not suggest that a TE is involved in the mechanism of chromosomal arrangement because the shared region is located within the nonfunctional TE inverted terminal repeat (Marzo 2008). No evidence of transposase or reverse transcriptase coding regions is present in the breakpoints, indicating that no TE sequences are actively involved in the inversion mutational process. The monophyletic clustering of arrangements in the D. pseudoobscura lineage (ZL Fuller and SW Schaeffer, unpublished data) also suggests that TEs are likely not involved, as a TE-dependent mechanism could lead to the generation of multiple origins for each chromosomal arrangement. However, it is possible that TE-mediated inversion mutations could occur in a monophyletic fashion, so the most definitive evidence supporting a TE-independent mechanism of inversion mutation in D. pseudoobscura is the lack of functional TE sequences in the breakpoints studied thus far.

Considering the mechanism of chromosomal rearrangement, all of the inversion breakpoint sequences constructed support the repeat-mediated model of rearrangement (Richards et al. 2005). Proximal and distal inversion breakpoint sequences of the STAR, STPP, and SCTL arrangements contain inverted repeat regions and lack duplications of the genes flanking the inversion mutation. It is possible, although unlikely, that the mechanism of chromosomal rearrangement could occur via the staggered-cut single-stranded break model (Ranz et al. 2007) if the staggered cuts within the breakpoint sequences leading to the rearrangement only duplicate small repeat sequences and fail to duplicate boundary genes. To completely rule this out, copy number variants of repetitive and nonrepetitive noncoding regions within the breakpoint sequences need to be determined.
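One simple way to look for repeats shared in inverted orientation between a proximal and a distal breakpoint sequence is to compare short exact words (k-mers) of one sequence against the reverse complement of the other. The sketch below is an illustration only and is not the MegAlign or BLAST analysis used in this study: exact k-mer matching is stricter than the similarity thresholds reported above, and the word length, function names, and toy sequences are assumptions.

```python
# Only A, C, G, and T are complemented; ambiguity codes are left unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    """Set of all words of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(proximal: str, distal: str, k: int = 12) -> dict:
    """Classify k-mers shared by two sequences as direct or inverted.

    A palindromic k-mer equals its own reverse complement and can therefore
    appear in both categories.
    """
    prox = kmers(proximal, k)
    return {
        "direct": prox & kmers(distal, k),
        "inverted": prox & kmers(reverse_complement(distal), k),
    }


# Toy sequences: the second is the reverse complement of the first.
result = shared_kmers("AAGATTACAGGCAA", "TTGCCTGTAATCTT", k=8)
print(sorted(result["inverted"]))
print(sorted(result["direct"]))
```

Words found only in the inverted category are the kind of signal expected from inverted repeats at the two breakpoints, although a similarity-based search is needed to detect diverged copies.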

Proximal and distal breakpoint sequences for the ancestral SCTL, Santa Cruz to Chiricahua (SCCH), and Santa Cruz to Cuernavaca (SCCU) will need to be constructed before the current analysis can be extended. The creation of a Santa Cruz database will allow for the determination of the ancestral SCTL proximal and distal breakpoint sequences, which can then be compared to the breakpoint sequences constructed from the derived Tree Line database. Once these sequences are assembled, the repeat analysis previously described can be applied to the newly constructed sequences. Constructing and analyzing the complete set of inversion breakpoint sequences of both phylogenetic branches leading away from the ancestral Hypothetical arrangement (Figure 5) in all of the derived lineages will allow for a more complete picture of inversion breakpoint sequence evolution on the third chromosome. Finally, breakpoint sequence composition analysis characterizing the types of repeats within the breakpoint regions is of great interest and will provide further insight into the molecular structure of the breakpoint sequences and the mechanism of inversion mutation.

Appendix A

Inversion Breakpoint Sequences

All sequences are represented in the 5’ to 3’ direction.
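The sequences in this appendix are printed with spaces and line breaks, and some (for example A.5) contain IUPAC ambiguity codes such as R, K, M, and W. A minimal sketch for turning a printed sequence into a single string before further analysis is given below; the function names are hypothetical, and the cleaning rule, which discards every character that is not a letter, is an assumption about how the printed text should be handled.

```python
import re

# Standard IUPAC nucleotide symbols, including ambiguity codes.
IUPAC_DNA = set("ACGTRYKMSWBDHVN")


def clean_sequence(raw: str) -> str:
    """Uppercase a printed sequence and drop everything that is not a letter."""
    seq = re.sub(r"[^A-Za-z]", "", raw).upper()
    unexpected = set(seq) - IUPAC_DNA
    if unexpected:
        raise ValueError(f"non-IUPAC symbols found: {sorted(unexpected)}")
    return seq


def ambiguous_positions(seq: str):
    """Return (1-based position, symbol) for every base that is not A, C, G, or T."""
    return [(i, base) for i, base in enumerate(seq, start=1) if base not in "ACGT"]


# Toy input with spaces, a line break, and one ambiguity code.
toy = clean_sequence("ACG TAC\nCRT CG")
print(toy)
print(ambiguous_positions(toy))
```

Listing the ambiguous positions first makes it easier to decide whether they should be kept as ambiguity codes or masked before running similarity searches.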

A.1 Proximal Ancestral STAR

GAGAGTGCAGGCAGCTTTGCTGCTCAGGCGTGACAGGCAGAGTGCTGCTGCAGCAG AGGTGCAACTGTAGGATGACAGACGACTCTCCTGAACCTCGTTTTGGGTTAATTCTC GTGTAATTTTGGTTCTCCTTATTAATTAGCAAATGTTTGCTTGCCATATGGAATGCAT TAGGCATTCACATTGAAATTGTTTAAAAAATTGCATTTAATTTTGTAAATTTTAAGTT GTGATGTACCTGAAATATGATTTCTCTTAAAACAAACAAAAAATTACTTAACAAAAC ACGATTCATAGAACGAAATGGGCGCAAAAATTCGATACGCAGCCGCATCTCAAATG AATTTCATTATAATTATTATCATTATCATTATCTTAGTTGTATAATTGAGCTCTCAAT CTATCATAGAAACACACACACACACACACACCAACAAACACTCAACACATACTCAC GAGAACACGGATATTTAAAGATATACATAGTTATATGTGAGCATTTATCAAATAATT CAACCCAATATATGATTTATTGAAATGTGAAGTAAAAGCTTTAATTTGCCATTTACA AAGCTTCAGTTTAGTAACGCGAAATGATTTGCCATTTATTTACAATAACTTAGATTGT ACTTTTCTAGTTGCCTTAGTTTATTGTATAGTTTATGAACCCCGCGCACACGCAAACC AAACCTCATTGTCGTACCCTTTGCCTTATTTCTTAAACGTAAGACTTGATCGTGAAAA TACATTTAGTTTTATTTACTTTAGCCATTTACTTTAGAGTAGCAGTCACGCCCACAAA TGCCCACGTCAAACATTCTCATAAGGAACATTTTAAGCATATTGAAAAGAAAACAA CAAAACAAAAAGTATATAAATATTAACGAAAGAAACAAAAGCTAACTATATTATTG TACGTGTAGTGTACTTATGATTTTATCAAAGGAAAAGCTAAAGGAAAGCACGAAAG GAAAAAGAAACACAAGAACAACATACAAAACAAATACTCAAAGAACATGAATATA AAACCATACATACAACTTTTTACCACACACTGAACAAACAAGAAATATTTGACTGAA AAGTGTCGGGGGAAAATATGATATATATAGTATTATTATATCATATGAATGGAAAAC ATAACGAATAAATGGTATAAACAATTTTAGTGTACAATGTGTGTGTTTTTGCTAAAT GTGGTAAACTTAAACGTAAATCTATTACACGAAACTAAAACTAAACTAAACTAAAA ATAAAAACGTAAAAACGAGGAGAAATTCTTAAGTAAAGTAAATTTTTGGATGACAA GCCATTTAGACAAAATTTTTGAGACGAAAAAAGACATACGGGGAATGAAACAAAAC AAAACAAAAAGCAAAACCCAACAAATTACTTTAAATTTTATTTAAGCAAAAATTTGC AAAAAACAAAACCAACAAAAAATGGAAAAATAAAAAATATATTACTGTAATGTGA GTGTTTTTAGTTTAAAAATTATATATTTCCAAAACATTCAAAATAAAAGATAATTAG TTTAAAATATAAAATATATAGACTTTTGAATCGTAGTATATGGCAGAGGCCCTTCTG 35 TTTCTGGTAATTTGAGATTGAAAGATCCATTGATCTTGAATAAAAAAGACGCATTTC TTCAGTTAGTTAGCCCTTTTTATTCAAGCACACATATTCCATTAAAAACTAATTCCTT ACACCTGTAGTAGTCGAATTTTCAGTTGGCATTAGCGTGCATCGAAGACAACCACTG AAAAGAAAAACGAAAGATTCCATTAGCCTCTTGCCTGCCTACAGTTTGACAGTAATT GGAGGCTCCCAGCCAGACGCCGGGAGGACTACTTACGACGCTTGGTGCGCTTCTTGT ACATGATGCGGTAGCCGCACTCCCGGCAACGTATGGGGTCACGTGGTCGCATCTCAT 
TCTCATGATGGCATTCTAGAAAGAGTAGTGCAATTGTTTATCAAATTAACAAATTGT ACATCAACTTCGAGGCCATGTGCGTGGACAAAAGGCCGTCTTGTTGACCTACACGTT CTAGCCGACTTACCGCCACAAATATAAGTCATTGCTGTCTTAACATTTTCCTTGGAGC TGGTCTCGGACATTTTTAGTTAATAAATTAACAAATTAACAAAATTCGCGTAATTTT ACAGAAAAAAAAATTTGTTTTGTTTTGTTTTGCTATCCTCAGAAATATACCGAAATA TGCGGTCTCATTTTAGAAATATACCGAAATATACGGTCGCATTGGAGAAATATACCG AAATATACGGTCGCATTGGAGAAATATACCGAAATATACGGTCTCATTTGAAAAAT ATACCGTAAATATACTGAAAAATTCAAGTTCTATTATTCATATTCCCCGTTTTTCATA TTCCGACGAATATTACTAGCTAAATAGAGCATTTAGCCGTGCCCACATAATTTTAGC CCATTGATGAATATATTTTCTACTTAACTGGCTTATTTTTATTACTTGCTTTTATTGTA TTTTGACTAAAACAACGTTTAAACAAAATAGTTCAACAAAAAAAACAGTCGCAATG TTGCTGGTATTTGATGACATGATGCGTGTATAGGCCTTTGTCCACCCTATTTCGCTAA TTTTATATGTAGATCAAACGCAATTTATTAAGACCTTTTCTCAAATAGATATGTTTTT GACTTATTCATGTGTATTATTAGTATCTTGTATAAATTTATGTCGATCCTGATACCTA CTGAGCGTTGGAAGACAAACGTATTCACAGTTCTATAACATCCATTCCCCCAGGGTA TGTGTGCGTGGAATGGGACTCATTGTTCTGACGACTAATTCTTGGTTAATCTTTCTTC CGTCGAATATTCTCAGCTATATAGATCTCTCAGTTTTGCCCAGACAATTTTATATGAT AGCTGAATCAATTTTGATCAAGATTAACTATTTTTAAGTACTTGCATGTTTTTTATTT TGACAGAAACACGATTTCATCAAAAATGTTCAATGTCGCAGGTTGTGACGTCAATGT TGCTGGTATTTGCAGACTCATTTGTTGTCAACCAGGGTATGGGCCTTTGTATACCCTA TTTCGTGAATATTTTATAAAGATACTGTTTTCCAAGCTGCTTTCAAATGTAATTAATA AAGGCCTTGTTTTAGATTGCATTTTTTTGGTCTATTCATGCATTTGATTAGTTATTTGT ATATAAAACCTGATCCCTGTAGCGATTCTTTGTCTCTGTAAGAGGACCACCAATTGC AAAATATCGTTTAAAATAGATACTGACTAGTTTCTGCATTAATTAGAGCATGGGATG AAACACATTTCCGCAATTCTATATCATCCATTTCCCCCAGGGTATGTGTGCATGGAG CTAGCAACATGTTTGTGTATGGAATGGGACTCATTGTTGCAATAGCGAAACTGCGGC GAATCGGGTATTGCCTATATTTCAAGCTATATAGATTTGCCCACTTAGCTTTATCCCA TAGATAAATGCATTTTCTACAAGATTGCTTCTTTTAATTACCTTTATGTATTTTATTTT GACCAAAAAACGGTTTTCAAGTAAATGTTGCCGCAACTTTTGTGTATGGAATGCGCA ACATTTGTGTATGGAATGAGACTCATTGTTGCTAAAGCGAATTGGGGTGAAATGCGG CGAACTGCGGCGAATTCAAATTTTCTCATATTCTTTGTTCCGTTGAATAATACTATTT ATATGGAAGATTTAGCCATGCCCACTCAATTTTGTACGATTGATGAATCAATTCTAT ACATGATTGGCCTATTCGAAATACTTGCATTTATTGGATTTTTATATAAAATTGAAAA TTAAATTAATAGATGACAAAAATTCCATTTAGCACTAAAAAACACCACATGGGAGC 
GCGGCGAAAACCAGTTTCTACAGTATATACGGTCTCACTTGAAAAATATACCGAGTG GTGCGCGCCAAATCGAAATTTTTTGAATGTGAGCGACAAGGACTTAAATGTTGTCAA CCGGGCCTTTATCTATGCCCACATAATTTTATACGATTTATGAATCAATTTTCTACTT AACTGGCTAGTTTTTAGTCCTTGCTTTTATTGTATTTTGACTAAAACAAGGTTTTCAA GTAAATGTTGCCGCAACATTGTGTTGGAATGGGACTCATTGTTGCTAAAGCGAAACT GCGGCGAATTCAAATTCGATTCAAAAAACGACTTTTACCAGCTAAATAGAACATTTA 36 GCCATGCCCAGATAATTTTATCCGGTTAATGAATCAATTTTCTAATTGAATGACTTCA TTTAAATACTTGCCTTGTTTTCCTTTGATTGTTTTTGCATAAATTTTAGTGCATCAGAA GAAATAATTCATGTTTTACATGGTAACTACACGTTAACTACATTGAGATGTAAATAT CAACAAAGCCACCTGACGACGTATTCAACTAGTTCGACTATCGACGGCATGAACCA GGTTCGTCGTAAATATCGAATTAGGCCATGCCTAGTGCTGGTAAGCGAAGCTATGCT GACAGTGCTAGAAAAATTAACAATGCCCGCCATATTTGATTTGAATTTAAAATAAAA TTGTTTATTTAAAAAACCATAAATTTGAATAGAAGAATAAAATCAGAAATAAAATGT ATGGAAAAATTCTGTTTACTGTACTGTGCTATACTGTTTAAAAAGTAGGCGATACTT AATAGACGTATTCCAGCTCTTAGAACTTATGCTGTATTAAATGATCAAAAATAAGTG AATGCATTGTTAATGAAAGTGATAATTGACAAGTTTGGCTGTCCTCGATGTCATTTT GGGCACAACAACACAATAACTCGTGCCCTAGGCCGCACTATCTCAATTTGATAAATG AATGTCGTCCCTCCCATTATCATCGCAGTCATCTAGCATAACATAACCCTTTCCTCCA AAACCAGCCACAACCCCTGCTCCTGCTCCTTTGCCACACACTCGCACACTCGCAGGA TAACACAATACAGAACAACCAAACCAAACCGAAAAGAAGAACAATTTAATTAGCGA TGACATGTTGCCATTTAATTTTCTTGTAAAAAGTGTAATTAAAAATGTTTTGTCTCGC AGCTGGATGGAGAAGGGGATGGGTATGGGTATGTGGAAGGGGAAGGGAATGTGTG GTTGCGGTAGGGTAGAGGCGGAGCCGAAGGTAAAGGCGAAGGCAAAACAAAGGGG GAACGATCATATATCGGTGGGCCGGTTATAAATCATACTGCGACTACCTCTGGGTGC ATATTAGGCTCTAAACGATGAGATAATCTATGGTTAGAGCGAATACAAAATGCTGCC ACCGGCAATAAATTTGGAAAATACAAAACATTTGACAATTTTCCAAACAAAGGTTG CTAGCTATGTGTGGTTGTGTGTGCCCCTTTTTGTGTGGTTATGTGGGCTATGTGTGTG TGTGTGAGTGTCTATCTGAAGGGTTTCTGGGGCCCGTCCAGCACACATCGAGGACAC AGCAGTCACGTCAACACTTCCAACTCTTTCGGTCGCCTTTTGTTTCGCCTTTCGTATC CTTTCGGCAATAAAAATTCCGTTTGCTATTTATCTTGATTGTCCCCAAGTCGCCATTG CCAGGATATATTCAGGACATGCTCCGGACACTTCAATGTTGGTCGCAGTTGGCGTAG CCGTCGCCATCGCCATCGCCAGCGCCATAGCCATAGCCATCGCGATGCCTGGACATT TTTTGCTTAATGAAATGTACATAAAACTAAATGAAATTAAGTTTGTGGCGTGCTCGT GGAGCAAGTCCAAGTCCGTGCCCCATCTCCTGTGCTGTAAGCGCTGCGCTCGCTAGC 
TGCGTGTTTATTACCGCGTCGGGTATAAAATGCAAAACGCGTTCCCGAGTCCCGCTC CCATCAAATGGCACAAAATGGCCGCCACTGCCACTGCCACTGGCACTGGCACTGTCC TCCCCCCAGCGAACCCTTTTCCAAGAACTCGTCCAAGTCCGATTCCCATTCAAATTT AAAGCCTCAAACAGCTGCTGTGGGGCGCACTGGGGTGAACTGGGCTGGCATGGAGA GGGGGAAAGCTCATAAAGTCGAAATTAAATAATTTGTCTTCTTAATTTTCAAATAAA TTCTTCATCATACAAATAAAAGGAAAACCAAACCCCAAGCAGCAAAAACAATTGTC AGGCAGCCTTCAAGCCCCGCTCCTGCTCCGAAAAGTGAGCTCACTCAGCCAACCAG AAGCGTGGCGAAATCCCAACCAGCATAGCATCCCTCAGTGCCATCGCACCCCATCCT ATCTCAAGCCATCTCATACCCTTCAAGAGAAGACGGAGAACACTCGAGATGTTGCC GAGCAGGACCGTTTTATTAAATTCGACCTACGCCATGCCATGAATATGCGAGAGCAG CACCCCTGGTCTCATTCTCATCCTTGTCCTCATCGTAGCCCATATCCTTATCCTGTTG CCGCTCTGGGTTCTGATACTGCCGCCCTCGCGACAGTTTTCGTATTGTATTTTTTAAT ATGTTTGTCTGTCTGTCTGGGTGTCCGTCCGTCCGTCCCTCTATCTGTCGCCCGGCAC AAATAATCGTGTACACCATTAATCCTTATTTCACGGCATTGTCCTCCCCGCCCTTATC TCGAGCATCCTTCCCCTCCTCGGCACCCCCCCTCAGCAGCCGGCCGCATGGCAAACA CCCTCACACAAGGACAAGTGGCATGTAAGCGATTATCGCAAGGACATCACAGCAAG GGAAAAGGCGAGGATCTGATATCAGAGGGGAAAAAACAATAAATGTAAAAATAAA TAAAAGGGTGTCAATGACTTTTGATAATAGAAATGATTGTCCAAAAAATTCTCAAAC 37 AATATTCAATGTGAGAATGCACTGTAAGAAAATCTCTGCAGTAACATTTATATATCA TTTTTAAGAATTTAAATGTACAAAAGTTGATTTTGATGAAAGTAAATAGTTGATGCG ATTAAGATTTCATACATATACATATCGGCACATATTTATTTCTAACTAATTGAAAATG ATGGATCATATTATATATATTTGTTGAATATAATTCATAAAGATGGGCTGATTCTACT AGGCCCATGATTTTTGTCAATATTACTATATATAGTACTTGATAGGCTCGAAAATTA CTTATTCCGGATGTAAAATAGATTTCAGAAATTCCATCTCCATTGTACTCTCTGCTTA AATACCTACATTTCTATCGACTACTCCAAAAAGTTGTTAAGTCGCTCTGGAAGTCGC CACTCAATCAGTTTCCCCCACTGAGCCATACACTTTCTGACAGTGCAGATGAAATCT CGTTTCGGAAAACCAATAAAGTCCAAAGACAGCTTTTAGCAATCACAAAACGATTT GGAATACACGCACAGCCTTTGCCATACCCACACACCCTGCCCTGCTCATCCATCCCC TGACTGCCTTTCACCACCCTCTCCACCCGCTGCCTGAGCGACTGACTTGTCCTTTTCT ATAGTTTCAGCACGCTTTCCGCTTGGCCTAATTGCCGGATCCGGATCTGGATTCGGA TCATCGGCGGCACTGAGGCTGCCTCTTCTTCCCCCCAGGAATGTTGGACTTTTCGAC ATATACAAATTTCGGGTGTGCGCAAAATGAAAAATGACAACACAGGCAGTAGGTAG GAGGCAGCAGGGCCGAGTGACGACCGCACGCACTTTTCCGAGCAGGGGAAGGAGC AGAAGCAGGAGCAGGACTTTGACTGCTTTGTCCTGTTGGTATGCTGTGCCATGTGCC 
ATGCCTGTGGATCCTTTCATTGCATCCTTTTGATGCTGGGGGTGCGAAACGTTTGAAT GCGAAATTAAATGATAAAAGATAGCAATGACACACGCACCGGCACACACATCCCCA CCCAGACCAACCCACACCACACCACACCGCACCACACACAAGTTCCCCTTCCACTTC AACCGTTCGCACGCTGAAGGTCGAGCACCTGTTTTATTAAAAAGCAAATAAACGAA TTTGTGGAACGACAGAGAAGGGTGAGTGCCCAGGCACCAAAGGACCACCGCAGAC AACAGGGCAGAGGACAGGCAACAGGACCCCGAGCTACTGTTAATGGCCGGATGATG ATGATGATGCCGGTGCTGACTGCAGTGGTGGTGGCTGCCCATTCCATCCTAGATGAT CCGTGATCCAGCTGGGCAGGAGTCCTCGTGCTCTGGCTCTGGTTCTGGCTCATGGAC TGTCAAGTGGTTGGCGTAGCATAATAATTGTGCATATTTGGCTTTTTAATAAAATCTA ATCGTCATGGCATCGCTGTTAGGGTGGTAGAATGGGCCAGAGCAGACCCAAACTGG ACCAGAGCTCATGGGCGGCATTGAAATTTAATTAAATTTACCCAAACAAACTCCAAC TGGCTCTCCGATTGTGTGCCCCAGCCGAGAATTGCGTGAGTTGGAGCAAATTGGGAT ACGGGGAGTGGTGTGTGTTCATTGTTCGA

A.2 Distal Ancestral STAR

TGCGTCTCCTCTTGCAGACGGCGCAAGGAGTTGCCCTTGGGGCCGAGCAGCTTGCCG ACGTAGTTGAACTTCTTATCCTTAATGGGCACGTGCACCTTCTGCGACAGTTTGATG GTGCGCTGCTGGTAGACATCCGCGTACTGGTCTCTCGTGGGTATGCGCCCATTTAGC TGCACCCGCTCCAAGGCTGAAAGATGGCCAGAAAAAGGGTAAATTGAGCACTTGGC AAGCAAAGGGCAAGCGACACACGCATCGAATCGATACCTTCGTCGATTAGCTTGAC AGCCAATGGAAAATGGTTCTCCATGCGCGACCGCTCTGTCATCAGTTCCTGCATGTA TTTCTGGATGGCCGGCGCCGGCTGAGCCTTTTCGATATCCCCATTCGCATTATGTTCG TTCTCGTGCTCGCCATTGCTGGCAGCACCTGTGCCCGTCGCAACCACAACGCCTACC GCTGCCGCCACTACAGATGACCCTGGGCCCACACGCATTCCGTTTTCCGCCATTTGT CCTGCTTCCTCCACGGAGCCCGTTCTCAATTAGTCGCCGCGTTTAATGCCAAATAGT GGGTGCTTATCACAGTGCGCGGCACGTTTCGCAGTCAATTCGATTCGGTTTCGGCAA 38 ATGTGACGGCAACCTTCCTATGGCCCTAATTTTTTTTTTCTCTCAAGTGTGTATTTTAT TCGTATATTTTTACTATGTTTACGCCGAAATATGGCAACACTCCGCCGACTTGAGAT GTGCTCGTGTGCTCGCCCGCGTTCTGTTAAGCGCGATTCCATGGCTGTTGAGAGCAC TTGACGTGCTGTTAGGCGCACACGATTGTCTATCGGCTGTTATCCAGCCGAAAGTCG AAAAGTGGATAACGGATAACGCACACATTTCCCCTTCAGGTGGCGTCTTTGGTAATT ACATGATCATGTCGTTACTATGTAGCAAAATTGTAGCTAGTGTGTAGTTACCATGTC AGTATATTTACGGTATATTTTTTAAATGAGATGCTGTTCTGAGGGTTATACTGTAGAA ACTGGTTTTCGCCGCGCGCCCATGTGGTGTTTTTTAGTGCTAAATGGTATTTTTGTCT TCTATTAATTTAATTTTCAATGTTATATAAAAATCCAATAAATGCAAGTATTTCGAAT ACGCCAATCATGTATAGAATTGATTCATCAATCGTACAAAATTGAGTGGGCATGGCT AAATCTTCCATATAGATAGTATTATTCAACGGAACAAAGAATATGAGAAATTTGAAT TCGCCGCAGTTCGCCGCAGTTCGCCCCAGTTCGCTTTAGCAACAATGAGTCTCATTC CATACACAAATGTTGCGCATTCCATACACAAAAGTTGCTGCAACATTTACTTGAAAC CCGTTTTTTGGTCAAAATAAAATACATTAAGGTAATTAAAAGAAACAATCTTGTAGA AAATGCATTTATCTATGGGATAAAGCTAAGTGAGCAAATCTATATAGCTTGAAATAT AGGCAATACCCGATTCGCCGCAGTTTCGCTATTGCAACAATGAGTCTCATTCCATAC ACAAACATGTTGCTAATTCCATGCACACATACCCTGGGGGAAATGGATGATATAGA ATTGAGAAAATGTTTGTCATCCCAGGCTCCGTAGGTATCAGGATCGAGATAAATATA TACAAGATACTAATAATACAGATGAATAAGTCAAAAACATATCAATTTGAGAAAAG GTCTTAATAAATTGCGTTTGATCTCCATATAAAATTAACAAAATAGGGTGCACAAAG GCCTATACACGCATCATGTCTTCAAATACCAGCAACATTGCGACTGTTTTTTTGTTGA ACTTTTTCGTTTAACCCTTTTTTTACAATAAATACAATAAAAGCAAGGACTAAAAAC TAACCAATTGTGTAGAAAATTTATTCATCAATGGGCTAAAACTATGTGGGCACGGCT 
AAATGTTCTATATAGCTGGGAATATTCGACGGAAGAAAGTTTAACCAAGAATTAGTC GTCAGGTTCACAACAATGAGTCCCATTCATGCACACATACCCTGGGAGAACGGATGT TATAGAATTATGAATGTGTTTGTCATCCCAGGCAAAAACAATTTATTTTCAATATATT TTAATATAAATCAATATTATTTTCAATATTTCAACGATATTTTAAGGCTTATTGTCTT CTTGCAGAGACCAAGAATAGGTATCAGGATTGAGATATATATACAAAACACTCATC ATATACATGAATATGTCTAAAAAATGTCAATTTGAGAAAAGGTCTTTATTAGTTAGG TTTTATCTTCGATCAATATATATACTTTTATATAAAATTAACAAAATAGGGTATAAA AAAGCATATACTGCGGCTTACATGCAATATAGCGACTCTTTTTTTGTTGAACTTTTTC GTTTAAGACTTTTTTTACAATAAATACAATAAAAGCAAGGATTAAAAACTATCCAAT CTTGTAGAAAATGTATTCATCAATGGGATAAAACTATGTGGGCATTGCTGAGAGATT TAGTTAGTCAGCATTATTCAATGGAATATCAAAATTGAGCAATTTCCTTTTTCGTTTG TTTTGATTGACCTATTTCGCT

A.3 Proximal Derived STAR

Previously determined by Richards et al. (2005); GenBank accession number AY693426.1.

A.4 Distal Derived STAR

Previously determined by Richards et al. (2005); GenBank accession number AY693425.1.

A.5 Proximal Ancestral STPP

TCTGTTTCACACACGGAGTGGAGTGACTGACAGACACCCAAACAAACTCGGAATAA CTGAAAGATTACCGGTGATCGGTTCGTATCTGTGTGCACTATACTGATATACTGGCA GTTGTCGGCTTTCCACTGGACTCAAATTTGCATACGAACTTTGATGATTTATATTTAT GGGCTGTGCGAGTGCAAACAAATGCCAGCTTCCTGTCTGAGGCGCCACTCCAGTTCC AGGTATTGATATATTCATATGTATATATCTACGTTTAAGTCCCTAAGATTGGGGAAT ACCATTTAAATATGTACCTTAAAGCAGATGCACATTTGGTTTCTTAGTTTGTGGGCTT GGAGCCGATAGCACTAGGCCATTAAAAAGAACAGGCGCAAGACCAGACTGTCACCG ACTGGCACTTCAGATTTAATCTCCCTACCACCTGAAAAGCGATTCTTTTATGTAGTCG AACCGCAGTTGAGGGCTACAGTTAATCAATAGTAACATTGTGTTTGCTATTGTGACA GAACCGCCATAATTTGTCCTCCCTCTAGCCAAATACCATTTTAAAATCTGATCTAACT TGCCTGATATTTTCCAATATTAATTTTGTATGGAATTGATTCCATATTCCATTGTGGT TCCACTTACCTTTGGCCTCAGCCTCATCATCCTTCTTGGGGTTCTTCAGGTCCATGGA GTCGTTGCTCCCACCACAGCACATCAGCTTCATGTTTAATGTCCTATTTAATTCCTAC TAGATCCTACTATAAGGGGTAAATTCGTGGAGCTTTGTTGAGAATTGTTGGGGGCTG TTGGAGGTTGTGGGTGTTATTTTAGCACTTCTATGGTTCGTCGCTCTTAATGTGGCGT AGCAATTCAGTCTCTCCGCCCGCTCACTGCTGGCCAGCAATTCGATCTGATTCGGTTT CGATCGAGGAAAATGTGAGCATATTAACTACTGTTCCAGGTGCCACAACAAAGCAA TTGCTGCTGTTTCTATTTCTGTTTCAGTTCGATTTCGGTTACGATTCTCTGCCGCTCGA CGGATGCTACCGTTTAGTACCGCGCTCTCGCCACAATTTCAGACCGACTGAACGCTG AATCGGCGCTGGGACTCCATTTATACGGTGTGGCTGGGGGTTGTGCTGTAGCGCCGG CGCGAGACACTCAAACACGCTCGTGGCGGCGGACGTCACTGGAGCAGAGCTCTCGA CTCTGCCGAGAAAACCCAGGGAGGCCACTTACGCGCCCCGTGGACAGCCCGTGAGG TGCCTAGGTGACAGTAGGTACATAGGTACTTACCTATGTATGTACTTATATCCRTCG CGAATCGAATCTTATCAGAATCATCTATAGATTACACTTAGTAAAAGAGACCAGGGC ATGCCCACGAGCTTGTCAGGGACAGAATACATGTATATAAGACCATCCGTTCAAGTG TTTGAACCTCGCACTCGCACCCAATCKAAGAAGTTCCATATGATGCTCTTGACAAAG TGATATATTTGTATGTATCAGCGCCTTAGGGGCCCAGAAGTGATATGAAATCGTGAT ATTCCCAAAAGACACCGAAGCCACCGAAATGGGACAATGGAAAATTTATGGCTACG AGGTTTTTTATTTTGATTTTTGAATGAATGAAACTTTGGGTAAACGAATTGAATTATT GCTCGGGAGTTACGCTCAGAGGGAACCCCGAAAACAATCCAACAAAGTGTGGCCGA AACAAAAAAAACAAAAACTGGTTAAACGATCACCTGCCCTACTCGTGGATGCCACC GAGTTCCGAGAGCTATGCTCGAATCATAGCTGGCATTCAGAGGAGTGAGTGGATTG CTCTTTGCATTACATTACCCCCGGCAGTGCAGGCAGAGTAAGAGTCGATCACTTTCA CTTTCGCATTGGGAGCTTTAATTAGTTCTGAATAGTTCTGAACACAATACTCTCTCCC 
CAAGGCCACTCACTCCCCTCTGGTTGGACAGTTACTATGTGTTATTTAAAAATTCTTA GCTATGCATAGTTCATGCCTGGCTGATCCAATGGAATTCGTTTTGGTTCTGTTTCCCT 40 CGATCTCTCTCGCTCTCTCTCTCCCTCTGATAGTGTTTTGCGACTACTAGCAACCTTTT TATTGTTGCTGCCAACGTGTGTTTTTTTTTTTAGAGAATAACTGGAATTTCTTGCAAT TACCAATGAATATACATACTACGTATGTACGTGGCAAATCCAGCTTCTAATCTTTGC TTTAGTCGTATTTACCAACTCGACTCTCACTTTGACACCATTTACTGCCCCAAATGAA AGTTCGATCTGATCTTAGGAGGTGCAAACGAAGTGTAAAACTGCTTTAGATAGCCCC CCCCAAAAGGTGACGCACTTTCAAACMCCCTTCAGCCACACGCTCGGCTCATGTCG AGCCATATGAAAAACAAATGTTAATCTCGGTTCTACTCTTAAGACGTTAAACAATGG GACACTAAAATCAAACATTGTGATAAGCGGTCTWAAATCCTCTCTACTTGATGCGG ACCGGACGAGTCCGAGGGGTAGGAGGGTTTGCTAATCCAATTGTCGTGATGCTTCTG GCTTGCGGTCTCACTTAGCAAACATGTTAAAGCTCATTGGGGCACACTGAGTCTCTC AGGATTTTATTGATGGATTTGTCACGATATCAAACCAGCTCTAACAGTCACTGTCGA TTTATTTATGTGTAAGTTCTACTCATTTAGTAGAACTGAGCGTAAGGGAGGTCCAAT GACCTTGTTATAGGGAATATTGAATTTTGCAGCATACTTCTGGGGATCATATTTACA GCAAAGGATTAAGGATAATCAGCTGATTAAAATCGTATTTTCAACTATAAATGTCCT TTAAGTGTTTCAAATATCCCTTAAATTCTTTAAAACAATTTGAATTCTTATAAGCCAA GTGGTATATTCCTATAAGCCCTTAGGGTATTGATACTGACCCACTACCCGAATGTTG TTTCTGATTTACAGATCTCGTGCTTGGCACGACTGACCGCAAACGACGTTTGTTTGTA GATCAATAAATGTGTTTGGATCAATAAAGTATAGTCTCATATCGATTGTAGATTTGT ATGTTAATGGGAAGCCTCCCATTTCGCGACTCTTCCGCTGCCAAATGATGCTGCAGA TTTTACTTATCGTTCTCCTAGACGTCACCCTTCCCTGCCACCCTCTACGTATGTATAT AAAACTTCCAGCAAAGCAAGCCTCAACACGCTATTCTCTGGTAATACATTTGTAGGT ATATCTAGCACAGATGGATGCTAGTGTTTTAGTTTTTCACGTGTCATGGGGGCTGGC ATCGGTGACGGGCGCAGAATTAGTGCACACAATAATTGCCAGTCAATGCAATTACA CATCCTCCAGTAGATATTAGTGGGAGCAGCTGTTTGACGGACTAGTAATTACATTCT GGACATATTTTTTACAACACAGCACTACAGAATGAACAGTTGTAAATGCAATTCAGA TTTATTTGTGTTTACCTCAATCCCTACCTCCCTCCCACTCTCACTCTTTCTTTTTGTAT CTGTTTTAGCTTTCGAATCGATTGCATAAGCTCCCCCACACTCGATTCATGATCGAGA GTTTTTCCCTAGTAGTTCTAATATGCGAAACGTGCCGTTTCAGCTTGAACTTGACCAT AGACAAGTCAGGCTCGACTTAAGTTCCTCAATTCCTGAGTACTCCACCCCCCACTCG GTCACATGTAAATTCAAGTTCAAGTGCCTGTCGCACTGTTCAATGGGAAGTCTGAAT GGAACAGACTCCCGCTAAATTGGCAAGCTCTGCTAGGCCATGCCGAAAACATGTCA AGCTGCGATGAGTAGTGTATTCGATGTTATGGTACCTGGCCATAAGCGAGCGGCGG 
GTCGTTGCGCATGCTAATTGCCGTCTGATATGGCCGATTTTGTGTGGCCATTAACGC GCAACGCGATCGACCAGCAAGATGACATTGATTGGGCCGTACAACTTCCATCGACTT CCCTTCCCCCTCTGGAACCCCCCTTAGAACTTAACTATTCTACTCTTGATTTTTGCGG TCTCTCAACCAATCTCATTATAAATTTCCAACATTCGTGGGGGATTCTCGCTGGGCTG GGATTTCTCCGAATGCAAAATTGATCAGCAATGAAATTAACATTTACCAGAGCTGAT TTCGTGCTACGTTGACATAGGCCCGTCGGCACCTTGTCGACGTCAGAGCGAAACTTC GCATCCATTAAACAACACTTGCTAACTGATTTGGATCATATTTCGGGAGGGCAATCC TAACTTGTGCTTTTAGCGAAACAATTTCGCAAATTCAAGGAATTTTCCGCGACACTT CAGATGCTAATAACACAACGGAAGACCTCTTAGAATTTGGTTAAATGATAAAGCCTT AACAACGTGCTATAATTAATAACCCTAATCTAATCGTTGATGCCAATATTAATTACA AATCCATCAGTGAACAATTTAGGGCGAGCGTCATAAAGCCATCGAACGTTTATCTGA CACTTCCGTATCAGGGCTAAGCAATAAACTCGTAAACAAAGAGAAAATCCTCAAAT CGCAGCCGAAACAAACCATTGTCATGCTTTGCAAGCAAGCCTTAGGACTGTCGTGAT AAGGTGGAGTGGCAACGAGACGTCAAAAAACCTCATAAATTCAGGGAAATCTATAA 41 TGGAAGTGCGTTTTCCGCTGAAGCAAACTAAGATTAATTATAGATAGAAAAAACCCT CAGTTAGCTTCAATTATACAGTTTGGTAAGCTTATTTTTAACATAACTCAGTGTAAGA AAACTAATTTATTTTTTCCTGTTTAGTTTACATACATAAAAATCTACATGCAAGTTCA AATTTATATTCAAAATTCAAATTGCGCGCTCAACAAATTGCCATTCGCTGCGCACGT GAACTCTAAATAATATTACCTAACGCGTTCACAAACATCGTATTTACGACCAATTTA TTGACCTCAACTCTCATATTCGTGGTGTTGTTTTCAGTCAAAATAGAATGCATGCAAT TCTTTAAAATTAATTCATTTTGTAGAAAATTGATTCATCAATTGTTTAAAATTATGTG GGCAATGCTGAGAGATCTATATAGCTGGGAATATTCGACGGCAGAAAGAATAACCA AGAATTAGTCGTCATAACCGGTTGACAACAATGAGTCCCATTCCATGCACACATACC CCGGGGGGAAATGAATGTTATAGAATTGTGATTATGTTTGTCATCCCAGGCAAAAAC AATATATTTTAATATTATTCAAAATTATTTAATATTATTTTCAATATTTCAACGATAT TTTAAAGCTTATTGTCTTCTTGTAGGGACCAAGAATAGGTATCAGGATCGAGATATA TACATATACAAAACACTCATCATATACATGTCTAAAAAATGTCAACTTGAGAAAATG GCTTTATTAGTTACGTTTTATCTTCATATCAATATTTATACTTTCATATAAATTTAAAA AAATAGGGTATACAAAGGCGTATACTACGGCTTACACGCAATACAGCAACTTTTTTT GTTTAACTTTTTCGTTGAAACCTTTTTTTACAATAAATAAAATAAAAGCAAGTGGTTA AAATAAGCCAGTTAGGTAGAAAATTCATTCATTAATCGGATAAAATATTCGACGGA ACATCAAAATTGAGCAATATGGTTTCCAATATTCGACGGAGAACGATTAACCCATTG CAAGCCAAGGGGGTATTTCAATTTTTCACCGACAATAATGAATATTTAATAATGTTA GCGTAGTTGAGGTGAAAGTTAGCGTGGTTATTTTTAAGTTCAGAAGAAACATGGCAT 
TAATCTACAAGAAGAATTTTCAAGCCTTAATATATTCCGCCACTTACCTCAAGTAGT TAAACTTTTTAAATAGAGGGGGCTTAGTGGACTAAGTTCGATTTCTTATCGATAAGA ATTCAGGTAAATAGAACGTGCGGCTTTTCTTTTTGTGCGATGTAATTATGAAATTTCC GTTTATTAAATCAAAGCAGGCTCTGAAATAGTTGAAGAGACAGATTCTCAGTATAAA CTTCTTCATTTAGATTACAAACAACATGGCGCAAAGCAAACTGAATGATCTTGCGGG CAAGCTGGGCAAAGGTGGCCCACCTGGACTGGGTATTGGACTGAAGCTCCTGGCGG CCGCAGGAGCTGTAGCCTATGGTGTTAGCCAGTCCTTTTACACAGGTAACTGGGAAA TGGGATTGAAAAAATGTCGAAAATTAATGGATAATAGCAGAAACACGAATGCCAAA GCTGCATGTGCACATAACCTCAACCCTACTGAGTGCCGATGTGTGAGTGTATGCGAG GGCCTGGGCTATCTCCATATTTCATAACTATTAAAAAGCCTTGCACAACCTTTGCTTG AATAGAATCTGGATTTATTTTGCAGACTGCGTAGACCCATCCATGATATATAGTTCT GTCAACTTCATATATTTCCACTTCCAATATTTCATCTGAACATGATTGCATAACCAAA CTCCATATTTTGTTGCTTGCAGTCGACGGCGGTCACCGCGCCATCATTTTTAACCGCG TTGGCGGCATACAGAATGATATATTCTCGGAGGGTCTGCATGTCCGCATTCCATGGT TCCAGTATCCCATCATCTATGATATTCGCTCCCGTCCCCGTAAAATTGCCTCGCCGAC GGGCTCGAAAGACTTGCAGATGATCAACATTTCACTGCGTGTGCTGTCGCGTCCCGA CTCTCTGAACCTTCCCTCCCTGCACAAGCAGCTGGGCGTGGACTACGACGAGAAGGT GCTGCCCTCCATCTGCAATGAGGTGTTGAAGAGCGTGATCGCCAAGTTCAATGCCTC GCAGTTGATTACACAGCGTCAGCAGGTGAGTGGCTTGAGCACAATCAGATCTGATTT GATTCTAAACGAGAATTCCTTGTGTGAAGGTGTCGCTTTTGATCCGCAAGGAGCTGG TCGAGCGTGCCCGCGACTTCAACATCATTCTGGACGATGTCTCCCTCACCGAGCTGA GCTTCGGCAAGGAATACACTGCCGCCATCGAGGCAAAACAGGTGGCCCAGCAGGAG GCGCAGCGCGCCGTCTTCTTTGTGGAGCGGGCCAAGCAAGAGAAGCAACAGAAGAT TGTCCAGGCTGAGGGTGAAGCCGAGGCAGCGAAAATGATATCCTTTTACAAACACA CAAACTTTATCTTTCGATTTAGCAATTGCCATTTGATTTGTCCCCTGGTTTTTCCCTTA ACCGATATGTTTACTTAGGTCTGGCCGTGAAGCAGAACCCAGCTTATCTGAAGCTGC 42 GCAAGCTACGTGCCGCACAAAGCATTGCCCGCACGGTATGATATTCACCTCAATATC GATCTCAATGTGAATCTCAACATTTTCCCTTTCTTCACAGATTGCCTCCTCACAGAAC AAGGTCTACCTGTCGGCGGACAGCTTGATGTTGAACATTCAGGACTCTGCCTTCGAT GACATGACCGAGAAGGTTTACAAAAGTAAATAGCTCTCGTTTTGATTAACAAAAAG AGTTTACTTTTAAGTTGAACGGTTTCTGATTTTGGTTCCAATCACTGGCCAATACATA GATAGCCTTAATATTTGAGTAATTTCTAACACGTTCTTCGTACTCTCTCTTTTTGCTAC ATCAATTTCGTTTTGTTTACGTTTCTCCAATCCTTCCGCGCAACACAGTAGCTGGCGA CTTGGACTAAGAATCGACCAATATTCCTTGTTGTTAATTAATTCTTATCCCATATTTT 
CGAGGTAGTGCCTAAGTACCGTTTAGTTTTAACCACTCTCAAGCGGCTTGTCGTCTTC ATTTTTTCACGAATTGCCTGCTCTGGTTGTGCGTTGGCCTTGCCTTTCGGAGAGAGCT TCATCCCACATCATCTTCGAATCGCAAGCGCCAAGGCACAGCAGAGCACAGCGGGC CAAGCGCCCTCAGTTTGTGAGGTAGGGAGTCGACGAAGGCAGATGGCCGCATCAGC ATTTAACGTTCATTGTCCCATCATCAAGAATCAAGTGAAACTGAGACTTATAATAAA TGCGTACTGAGAACTGTCGATCTGTACTTGTCGCTTGCTTGGTCGTCCATTGGTCATG GCCGATTGTTGATATTGGTTTTAGCTCCCACCGGTTGCTGCACACACTCGTGAAAAA ACGAACTGACGTGTCCACAAAGCACTTGTTCGACTCTTCAATAATCACCATCCCCAT CATTCTTACAGTTGGCACGGGCTTGCCCAAGGATTGGATTAATCCAGGAAAAATGAT CGCTGCATCGGGCAATCAGCTGGGGCAGGAGGACGGCAACAACGTGGGCAAGGTG GCACGCAAGACAGCGGAACGGATGATTTAATTCGGGCACTGATTGGGGCACGGCAC TCTTCCACAATCCTAGTACAAATATCAAGATATTCAAGAGGGGAACAAACGCCGCTT GACTCTGACCGACTTCAAATACGAAACATTTACTTATAATTATAATTGCGTGTATAA AATACATAAGAATAGTACATAAGAGTGTATATGGTATATAATTTATGTGTATAGAGA TTTATGTATGTAAATTGTAGGCAGATTAGGTTGCTTCTCCATTTTTGTTGTTTGATGT CCTTGCTAGCAGACTATCCAGGCCCCTGCGGCATTCCCATGTGATGTGGATGTGTGG TAGGGTGTGTGGGCGTGTCTAATGTAAGAGATGTTGACTAGAAATATGTGGTTGTAG CTATGTTTTCCTATGTGTGGCATGTGTAAGGGGAATCCCGTTTTGCATTCATACTGTA GTCCGTCCATGTTGAGCAATCGAACAACCGCCAGGGGCCGGTCCTCGGTGGTGGCT GCTCGCATCTCCGTCATGATGAATAGTCGCACTTATGCGAGTATCTATTCCTATATAT ATATCTATAATTTATAACGAACTGAACAATCAATCAAATTCCTGCATTAGGCGTGGG TTTAGCTTTGTATGTAAGTGTTTTGTAGTTGGCTTAAAATCGACGCGACATAGACCA GGATCAGGCTGTCGCCGGTGGCTTTACTCCTCCTCCAGCTCCTGTCCCGCTGGTGGC ACTGCTGCCACTCTCACTGCCG

A.6 Distal Ancestral STPP

TATTGCCCGAAACCCACATGGTGCCTCCGGTGGTGGCGGCYGTGTCACGGCGCGTGC GACGGTGCACCACAGGGGCGGTGGAAGGTGCGGCCAGGTACAGATAAGCTACCTTG CACTCAAACTGCTTGCTGATAGCGGCAATGGCAGAGTCTGAGAAGAGCACAACGAA ATTAGCACCCATTCTCACTCGCAAACTGATCTGTGTACAAACCATGTGACTCCAGGA CAGCCTCGCGGGTATCTTCGTCATCCTCGAAGCTGACGAACAGAGCGGTGCCCACG GAACACTTGGCCGGGCTCGCAAACCGACCACTGGCATCGATGCTGTTGTGCTCCCGC TTGGTAGCCACCGAGCGTAGGGCCTCCGACGGATTCTCCACGCTGGTGTAGTACGTC TTGGGACTCACGCCCTGCAGCTGGGCGTAGCAAGACTGCGACTGGGAATTTGAGCA 43 TAGGAAGTCCTTGGTGCTCAGCTGCAGTCAGACGAAAAAGCATTAAGAACTAAACT CTGCTATCAGCCATTTCACGTGCAGCACTTACGCCATTTTCCTCGAAGGCCACAACC ATGTGATCTTTCAGCAGCGCTGCCAACTGGTCAGCAAACTCCGCCTGGGGCACAGTC TTCAACGAAGGCTTCGCCTCACTGTACAAAGAGAAAGGACTTGGTATTAGCAATAC AACTTGTGGCCCGCAATTCTGACCCACCTGTTGGCACCCCATAGGAAGACGGGCGTT TGCTCGGCCAAGGCTGCTCCAATGACACACAATGCAATCAGCGATTTCCACAACATC TTGCTGGCAGCGCAGGAACGACGCACTTAATGGGGGCAAACAATAAGGAAACAAA ACCAGGCATAATGTAAGAACAAAAGAGAACACAAAAAGCAAGAAAACTAACCGAA AAGTCAACTCAAACTACAAACGAAAAACGGTGGACAGAATTTTCGCCTTCGCACGA AAAAATTGAGAAAACTTATGACCGACCAAATCATGTGATCGTGTGACCGCGGATTT ATCGTTAAAACATACCGTCCGACCCTCAGAAATATACCGTAAATATACTGATTTATA TTTTCCATATTCCTCAATTTTTATATTCCGTGGAATATTACTAGCTATATAGAACATT TAGCCATGTTATAATAATTTTATACGATTGACAAATTAATTTTTTACTCGACTGGTTT GTTTTAAATACTTGATTTGTTTTCTTGCTACGGCGATTGCAGTTTTTTCTTCTCAACAA CAATATAGCGTCTCGTTATGATGAAAAAACGATCTTAGCCCACTTCAGCCCCTTTAT TTAAATAGTTCAACGACTTCAGGTAAATGGCTGAGAATATTAACGATTGTAAATTCT TTTGGTAGATTCATGCCATCTGTCATTTGAACTTAAAAATAAGAATGCTAACTTTCAC CGGAACTGCGCTAATAGTATTAAATTTTCATTATTTCCGATGGAATATTAAAAAACC GCATTGGCTTATAATGGGTTAATCGTTCTCCGTCGCGGACTGGAAACCATATTCCTC GATTTTTATATTCCGTGGAGCATTACTAGCTATATAGAACCCTCAGCTCTGTCGACCT AATTTTATCCGATTATTGAATTAATTTTCTACATGATTGGCTTATAAATACTGGCTTT TATTGTATTTTCTCGAAAAATAAAGTTTTTAAGCGAAAAAGGTCAAACAAATGAAAC GTAATAAAACGTAAGCGAAAAATCGTTCCAATTGTGCGATTTCTTTATTCTGTTGAA TAATGCTGCCTAACTAAAAACATTAGCTCTGCCCAAATAATTTTCTACCTGATTGACT ACTTTTATGTACTCTCTTTTATTGTATAGGGTAGAAAACAAGGCTTAAGCAAAAAAG GTCAACAAAAAACAGTGGCAACTTGAGTGTGGAATGTCACGTCAACCAGATTTTTTG 
AACAGGTATACCCTATTTCGTTAATATTATATGAAGAACCTTTTTTCAAAGCTGTTTT AATTTAATTTTAATTTGCCGAAGCAGCTGGATCAGGCGGTGGAGACGGTTGAGTCGA ACACTTACCATATGAGGGATATGATACTCATCTATCATGATGAAAAAAGGTTCGTTC TTGAAATATGTCAAGTGAGACACGAAGTTAATTTTTCCCTCCAACATTTCCGGCCGC TTTAAATAGTTAATACATTTTCTTAACTGGGAATGTTTTAAAAAGATAGAATATTGG GCAGAAGTGGCGGAGCAGCAGCAACAGCAACGTTAGATGTTAGCTTCCTCACAACA CGCTTTGCCGGCACAGGAGCCTGCGCTAACTGGTTGTTGGTACTGCATTTCCTTGTTG GCGTTCCCGCAGCAGGAGGCCTGGACACTGCATCTGGTATACCCAGCGGTCGTTGCT GCTTGTGGTAGCAGTGCTCGAGAGTGACTACATAACCATTGTCTCCAAAAGCTATAA TAAGGAAAAAAAAAACCAAGATAAATATCCTTACTTCTTGAAAAGAACTGCAATAA ATACACACTTGTTGCTGATGATTGAGCTTGTCGGGGGTACCAAATGGCAATTGGCAG CGGCCACTTGGCGGCTGCTGTTCAATATTGCGCTTCACCGTGGCCGTCTGCTGCTGTG CTGTTCCACTGGCCTGTACCGTGGCCATATCCATATCACTGTCCATTTAATTGCAGTT CGAGCACTTGCACTTTAGATCGGATCGAAATAGTTTTGAAATTTTCGCGGGCGCCAC AGTTGAAAGTTGAAATTTAATTTGGTTTTGCAGCCCTAGTCAAGTCACTACACGACA CGCACCTATCGACTGTTTTAAAAACTCCATAAGAACGTACTATATTAAAAATTATAT TAATCAATTAATTGATTATTATTAATATAATATATAATATATATATACATATTTTATA AAAATATTGTTTCGATACTATCACTAACCATACAGTATATACAGTATATGCATGTGC ACCCATACACTAAAACATGCTGGTGGCAAAACAGATCAGACCTGAGAGAGTTCAAA AAATCTGAAAAAACATACAATCACATATTTTTGTCGAAACATATTGCGTGTGTACTA 44 ACACATGCAAAGATGTTCTCTCTGCGCTCTATCTCTCATGATGACGTCACTCTGGGGT ACGGATAGCGCAAGTCGGTGTGTGACAATTTCGTTGTATGAACTGGCACATCTATAT ACTGTATGGTTAGTGATACTATTGAATCAATTCTGCCAACGATTCTGCAGCCCTAGG TAAAAGATTCCGTCGAATATCTAGATAGAACCCTCAACGTACCTTGCTTTGCGTACT GCGCTAAAATTTAATCAAAAGTTAAATAATAGGCCATATAGCAAAGAACGTGTTCG AGTACCATAGTACTTTTCTGTCGTGCATAAAAAATACCATATAGGAACAGAGTTTCT TAGTTTCTTAGCCGCCAGCTTATTGTTATTATTTCCTCCCAGTCAGAATTAAAAGTTT CAAAGCGTTCTGCCTTTAAGCAAACAGCCACTAAATATCTGAGCAGGCAGCTTTAAG ATTCATATTTGCCAGCTTTTGTAGCTTAATACCAGAGATCCTTTAGGAGTTTCGTAGT ATCCAACCTCTTTGAGTTTCACAGCACCCAACTTTTAGTTTCAGACAAATTAGTTTTA TGGCAGCAAGAATTCTCAGCAGCCTGCTCGTAGCTTAATATCAGCCATACAAGTAAA ATGCCCTGCCATATCTCCATATGGAATGGTTGAAATAATCCGATTAAAGCCTTTAAT ATTAGATATAAAAAGAACTTAAAAATAGCTCAGCGAGAAGCTAATATGAGCTTTGC AGACATTTTTATAATCACGATATACTGCCCAGACTTGGAAAAATAATTAATCCTCAG 
AATATTATTACTTTATCAGATGAATCTATTTTTAATATTTTATGTTATGTTCTATACAA ATCTTCAGCTTATCTTGGCAGGAAATTTGAAATTTAGTATCGAGATTGTATTTTGACA CACTAACATTGTAGAGCCCTGACGTTAATTATCATTAATCAGACTTCGAATTTGATT AGCTTTTGACTAATTATGTGCGTCAGTTTTGATCGACTGCAAAATTAAATTAACTTCT GCAATGGACCGCAACCAGTTGGACGGTGATTTCCCTGCGCCAAATCTAAGGTCGTTG TGCCAAACTGAGACAAACGAGTTCTGGTATATTTTGAGCTATGATTCACATCGAGAA AGATTTGATTTTGTAAATTTTGTAATAATAAATACACAGCTAAAGCTTTAATAGGAC TCTCCGCTAACAAGCATACATCGTAATTATGGCATTAATACTAAGAATACAAACAGG CAACCGCTGAGGCACAAACTATAACTTTGTGTTAACGATATCCTCGCGCGACTCCGT CGTCTCCACCTCATTCCACTTTTGTATCTTTCCCGATCCGAAGGCCCAGAAGAACAG CGCGGGCAATATATACGCTGCTGCACCAATGATGAATATCCAGTGCCACTGATCGAT CGTGTTCTATGAGAGATTGAGAATGAGAAGTGTTATTGGGTTTTATAGAATGTACTA TGCATAGTATTATGTTATATGTACTCACCTCGTTTTTGGTAAAGGCAGCGACGATCA GCGGCGAAAAGATTCCTGGCGTGGTTCCAACGCAATTGATGATGCCGTACAGCGTG CCCGCATAATTGGGAGCCAAGTCCTGGGAGTTGGCCAAATTTGAGGCTGTCGCAGC GCCATTGAAGCCCAGCGATATCGTCATGATGGCCACACAAACATAGGGATCGCGGC CGAAGTAGGCCAGGATGATGAGCATAATGCCCGGCAAGATGTGAGCTGTCAGAAAA CGGACATAATATTTCACGATTTCTATTCCCATTCAGTCAATGGACTTACAAGGCAAG CAGAATGCTTTTCGCATTTTTGTAACCGTCCACCAGTCCCTGCGGCGTATCCAATCTG CCACGCCGCCGAAWCCAAAGGCACAGATCAGACGGGCCACATGAGGCAGGCTGGA CAGGACACCGGCGGAGGACAGATTGAAGCCCAGCACTTCGCTCAAGAACTTGGGTG TGA

A.7 Proximal Derived STPP

GTTCAWAGAACGTATCTCCTATCTAAATCTCTAGTCGACGCTGCTGTTGTTTTTGTTT TTGTTTTTGTTTTTGCTTAGTTTAGTCGCCGCTGCCACTGCCACTGCCGCTGCTGACG TTGCTCATTAGCCGCCGGTACGCGTACTAAACTGGATTCGCGTTAGACTGTCTGTCT GTGGCGTATTGGAGCATTCTTTGTGTGTTTTGTTGCTCTGTTTCACACACGGAGTGGA 45 GTGACTGACAGACACCCAAACAAACTCGGAATAACTGAAAGATTACCGGTGATCGG TTCGTATCTGTGTGCACTATACTGATATACTGGCAGTTGTCGGCTTTCCACTGGACTC AAATTTGCATACGAACTTTGATGATTTATATTTATGGGCTGTGCGAGTGCAAACAAA TGCCAGCTTCCTGTCTGAGGCGCCACTCCAGTTCCAGGTATTGATATATTCATACATA TGTATGTATATATCTACGTTTAAGTCCCTTAGATTAGGGAATACCATTTAAATATGTA CCTTAAAGCAGATGCACATTTGGTTTTTTAGTTTGCGGGCTTGGAGCCGATAGCACT AGGCCACTAAAAAGACCAGACGCAAGACCAGACACAAGACCAGACTGTCACCGAC TGGTGCTTCAGATTTAATCCCCCTACCACCTGAAAAGCGATTCTTTTATGTAATCGAA CAGCAATTGAGGGCTACTATTAATCAATAGTAATATTGTGTTTTCTATTGTGACAGA ACCGCTATAATTTGTCCTCTAGCCAAACACCATTTTAAAATCTGATCTAACTTTCCTG ATATTTTCCAATATTAATTTTGTATGGTATTGATTCCATATTCCATTGTGGTTCCACTT ACCTTTGGCCTCGGCCTCATCATCCTTCTTGGGGTTCTTCAGGTCCATGGAGTCGTTG CTCCCACCACAGCACATCAGCTTCATGTTTAATGTCCTATTTAATTCCTACTAGATCC TAATATAAGGAGTAAATTCGTGGAGCTTCGATGAGAATTGTTGGGGGTTGTTGGAGG TTGTGGGTGTTATTTTAGCACTTCTATGGTTCGTCGCTCCTAATGTGGCGTGGCAATT CAGTCTCTTCGCCCGCTCACTGCTGGCCAGCAATTCGATCTGATTCGGTTTCGATCGA GGAAAATGTGAGCATATTAACTACTGTTCCAGGTGCCACAACAAAGCAATTGCTGCT GTTTCTATTTCTGTTTCAGTTCGATTTCGGTTACGATTCTCTGCCGCTCGACGGATGC TACCGTTTAGTACCGCGCTCTCGCCACAATTTCAGACCGACTGAACGCTGAATCGGC GCTGGGACTCCATTTATACGGTGTGGCTGGGGGTTGTGCTGTAGCGCCGGCGCGAGC CACAAGCTGGCTGGCCAGCGATCGAGGCACTCAAACACGCTCGGGCGGCGGACGTC ACTGGAGCAGAGCTCTCGACTCTGCCGAGAAAACCCAGGGAGGCCACTTACGCGCC CCGTGGACAGCCCGTGAGGTGTCTAGGTGACAGTAGGTACCTACATATGTACATACC TATGTACATATGTACTTATATCCGTCGCGAATCGAATCTTATCAGAATCATCTATAG ATTACACTTAGTACAAGAGACCAGGGCATGCCCACGAGCTTGTCAGGGACAGAATA CATGTATATAAGACCATCCGTTCAAGTGTTTGAACCTCGCACTCGCACCCAATCGAA GGAGTCCTATATGATGCTCTTGACATAGTGATATATTTGTATGTATCAGCGCCTTAG GGGCCCAGAAGTGATATGAAATCGTGCTACCCGCAAAAGACACCGAAGCCACCGAA ATGGGACAATGGAAAATTTATGGCTACGAGGTTTTTTATTTTGATTTTTGAATGAAT GAAACTTTGGGTAAACGAATTGAATTATTGCTCGGGAGTTACGCTCGGAGGGAACC 
CCGAAAACAATCCAACAAAGTGTGGCCGAAAAAAAAAACAAAAACTGGTTAAACG ATCACCTGCCCTACTCGTGGATGCCACCGAGTTCCGAGAGCTATGCTCGAATCATAG CTGGCATTCAGAGGAGTGAGTGGATTGCTCTTTGCATTACATTACCCCCGGCAGGGC AGGCAAAGTAAGAGTCGATCACTTTCACTTTCGCATTGGGAGCTTTAATTAGTTCTG AATAGTTCTGAACACAATACTCTCTCCCCAAGGCCACTCACTCCCCTCTGGGGTGGT TGGACAGTTACTATGTGTTGTTTAAAAATTCTTAGCAATGCATAGTTCATGCCTGGCT GATCCAATGGAATTCGTTTTGGTTCTGATTCCCTCGATCTCTCTCTCTCTCTCTCTCTC tCCcTCTGATTGTGTTTTGCGACTACTAGCATTGTTTTTATTGTTGCTGCCAACGCGTG TGTTTTTTAGAGAATAACTGGAATATCTTGCAATTACCAATGGATATACATACTACG TATGTACGTGGCAAATCCAGCTTCTAATCTTTGCATTAGTCGTATTTACCAACTCGAC TCTCACTTTGACACCATTTACTGCCCCAAATGAAAGTTCGATCTGATCTTATGAGGTG CAAACGAAGTGTAAAACTGCTTTAGATAGCCCCCCCCCCAAAAGGTGACGCACTTTC AGACACCCTTCAGCCACACGCTCGGCTCATGTCGAGCCATATGCAAAACAAATGTTA ATCTCGGTTCTACTCTTAAGACGCTAAACAATGGGACACTAAAATCAAACATTGTGA TAAGCGGTCTTAAATCCTCTCTACTTGATGCGGACCGGACGAGTCCGAGGGGTAGGA GGGTTTGCCTAATCCAATTGTCGTGATGCTTCTGGCTTGCGGTCTCACTTAGCAAACA 46 TGTTCAAGCTCATTGGGGTACACTGAGTCTCTCAGGATTTTATTGATGGATTTGTCAC GATATCAAACCAGCTCTAACAGTCACAGTCGATTTATTTATGTGTAAGTTCTACTCAT TTAGTAGAACTAAGCGCAAGGGAGGTCCAATGCCCTTCTTATAGGGAATATTCAATT TTGCAGTATACTTCTGGGGATCATATTTACAGCAAAGGATTAAGGATAATCAGCTGA TTAAAATCGGATTGTTAACTATAAATGTCCTTTAAGTGTTTTAAATATCCCTTAAATT CTTTAAAACCGTCTGAATTCCTATAACTCAAGTTGTATATTCCTATAAGCCCTTAGGG TATTGATACTGACCCACTCGTCGAATGTTGTTTCTGATTTACAGATCTCGTGCTTGGC ACGACTGACCGCAAACGACGTTTGTTTGTAGATCAATAAATGTGTTTGGATCAATAA AGTATAGTCTCATATCGATTGTAGATTTGTATGTTAATAGGAAGCCTCCCATTTCGCG ACTCTTCCGCTGCCAAATGATGCTGCAGATTTTACTTATCGTTCTCCTAGACGTCACC CTTCCCTGCCACCCTCTACGTATGTATATAAAACTTCCAGCAAAGCAAGCCTCAACA CGCTATTCTCTGGTAATACATTTGTAGGTATATCTAGCACAGATGGATGCTAGTGTTT TAGTTTTTCACGTGTCATGGGGGCTGGCATCGGTGACGGGCGCAGAATTAGTGCACA CAATAATTGCCAGTCAATGCAATTACACATCCTCCAGTAGATATTAGTGGGAGCAGC TGTTTGACGGACTAGTAATTACATTCTGGACATATTTTTTACAGCACAGCACTACAG AATGAACAGTTGTAAATGCAATTCAGATTTATTTGTGTTTACCTCAATCTCTACCTCC CTCCCACTCTCACTCTTTCTTTTTGTATCTGTTTTAGCTTTCGAATCGATTGCATAAGC TCCTCCACATTCGATTCATGATCGAGAGTTTTTCCCTAGTAGTTCTAATATGCGAAAC 
GTGCCGTTTCAGCTTGAACTTGACCATAGACAAGTCAGGCTCGACTTGAGTTCCTCA ATTCCTGAGTACTCCACCCCCCACTCGGTCACATGTAAATTCAAGTTCAAGTGCCTG TCGCACTGTTCAATGGGAAGTCGAACAGACTCCCGCTAAATTGGCAAGCTCTGCTAG GCCATGCCGAAAACATGTCAAGCTGCGATGAGTAGTGTATTCGATGCTATGGTACCT GGCCATAAGCGAGCGGCGGGTCGTTGCGCATGCTAATTGCCGTCTGATATGGCCGAT TTTGTCTGGCGATTAACGCGCAAAGCGATCGACCAGCAAGATGACATTGATTGGGCC GTACAACTTCCATCGACTTCCCATCCCCCTCTGGAACCCCCCATAGAACTTAACTATT CTACTCTTGATTTTGCGGTCTCTCAACCAATCTCATTATAAATTTCCAACATTCGTTG GTGGATTCTGCTGGGCTGGGATTTCTCCGAATGCAAAATTGATCAGCAATGAAATTA ACATTTACTTTCGTACAGAAATCAGAGCTGATTTCGTGCTACGTTGACATAGGCCCG TCGGCACCTTGTCGACGTCAGAGCGAAACTTCGCATCCATTAAACAACACTTGCTAA CTGATTTGGATCATATTTCAGGAGGGCAATCCTAACTTGTGCTTTTAGCGAAACAAT TTCGCAAATTCAAGGAATTTTCCGCGACACTTCAGATGCTAATAACACAACGGAAGA CCTCTTAGAATTTGGTTAAATGATAAAGCCTTAACAACGTGCTATAATTAATAACCC TAATCTAATCGTTGATGCCAATATTAATTACAAATCCATCAGTGAACAATTTAGGGC GAGCGTCATAAAGCCATCGAACGTTTATCTGGCACTTCCGTATCAGGGCTAAGCAAT AAACTCGTAAACAAAGAGAAAATCCTCAAATCGCAGCCGAAACAAACCATTGTCAT GCTTTGCAAGCAAGCCTTAGAACTGTCGTGATAAGGTGGAGTGGCAACGAGACGTC AAAAAACCTCATAAATTCAGGGGAATCTTTAATGGAAGTGCGTTTTCCGCTGAAGCA AATTAAGATTAAATATAGATAGAAAAAACCCACAGGGAGCTTCAATTATACAGTTT GGTTAGCTTATTTTTAACATAACTCAGTTCAAGAAAACTAATTCATTTTTTCCTGTTT AATTTACATATACAAATCTACATGCAAGGGCAAAATTATATTCAAAATTCAAATTGC GCGCTCAACAAATTGCCATTCGCTGCGCACGTGGACTCTAAATAATATTACCTAACG CGTTCACAAACATCGTGTACTCGACCAATTTATTGACCTCAATTCTCATATTTGTGGT GTTATTATAAACACGTGTTTCAGTCAAAATACAATGCATGCGAATATTTAAAATTAG TTAATCTTGTGGAAAATTGATTCAACAATCATATAAAATTATCTGGGCTAAACTGAG AGATCTATAAAGCTGGGAATATTAGATGGAAGAAAGAATAACCAAGAATTAGTCGT CATAACCGGTTGACAACAATGAGTCCCATTCCATGCATGTTATAGATTTGTGAATAT 47 GTTTGCCATCCCAGGCAAAAACAATTTATTTTCAATATATTTTAATATTATTCAATAT TATTTAATATTATTTTCAATATTTCAACGATATTTTAAAGCTTATTGTCTTCTTGTAGG GACCAAGAATAGGTATCAGGGTCGAGATATACATACATACATATGTACATATATAC AAAACACTCATCATATATGTATGTACATACATATGTATGTACATGAATATGTACATA TGTCTAAAAAATGTCAATTTGAGAAAAGGTCTTTAATATTTACGTTTTATCTTCATAT CAATATTTATACTTTCATATAAAATTAACAAAATAGAGTATACAACGGCATATGTAT 
ACTACGGCTTACACGCAATATAGCGACTCTTTATTTGTTGAACTTTTTCGTGTAAACC TTTTTTTACAATAAATACAATAAAAGCAAGGATTAAAAACAATCCAATCTTGTAGAA AATGTTTTCATCAATGTGATAAAACTATGTGGGTATTGCTAAGAGATTTAGTTAGTC AGCATTATTCAAGGGAATATCAAAAGTGAGTAATTTCCTTTATCGTATGTTTTGATTG ACCTATTTCGCTACTTGTTTTTTGTTTGACTTATTTCGCTAAAACCTCTTTTTACGCAA AAAACAATAAAAGCAAGTATTAAGAATAAATCAATTAAATAGAAAATTAATTCATC AATCGTATAAAATTATGAAGGCATGGCTAAATGTTCTATATAGCTAGTAATATTCCA CGGAATATCAAAATCGAGGAATATGGAAAATAGAACTTGAAGTCGTCAGTATATTT ACGGTATATTTTGGTATATTTCTGAGGGTCGGTCGGTATGTTTTAACGATAAATCCGC GGTCACACGATCACATGATTTGGTCGGTCATAAGTTTTCTCAATTTTTTCGTTCGAAG GCGAAAATTCTGTCCACCGTTTTTCGTTTGTAGTTTGAGTTGACTTTTCGGTTGGTTTT CTTGGTTTTTGTGTTCTCTTTTGTTCTTACATTATGCCTGGTTTTGTTTCCTTATTGTTT TCCCCCATTAAGTGCGTCGTTCCTGCGCTGCCAGCAAGATGTTGTGGAAATCGCTGA TTGCATTGTGTGTCATTGGAGCAGCCTTGGCCGAGCAAACGCCCGTCTTCCTATGGG GTGCCAACAGGTGGGTCAGAATTGCGGGCCACAAGTTGTGTTGCTAATACCAAGGC CTTTCTCTTTGTACAGTGAGGCGAAGCCTTCGTTGAAGACTGTGCCCCAGGCGGAGT TTGCTGACCAGTTGGCAGCGCTGCTGAAAGATCACATGGTTGTGGCTTTCGAGGAAA ATGGCGTAAGTGCTGCACGTGAAATGGCTGATAGCAGAGTTTAGTTCTTAATGCTTT TTCGTCTGACTGCAGCTGAGCACCAAGGACTTCCTGTGCTCAAATTCCCAGTCGCAG TCTTGCTACGCCCAGCTGCAGGGCGTGAGTCCCAAGACGTA

A.8 Distal Derived STPP

CATCATGACGGAGATGCGAGCAGCCACCACCGAGGACCGGCCCCTGGCGGTTGTTC GATTGCTCAACATGGACGGACTACAGTATGAATGCAAAACGGGATTCCCCTTACAC ATGCCACACATAGGAAAACATAGCTACAACCACATATTTCTAGTCAACATCTCTTAC ATTAGACACGCCCACACACCCTACCACACATCCACATCACATGGGAATGCCGCAGG GGCCTGGATAGTCTGCTASCAAGGACATCAAACAACAAACATGGAGAAGCAACCTA ATCTGCCTACAATTTACATACATAAATCTCTATACACATAAATTATATACCATATAC ACTCTTATGTACTATTCTTATGTATTTTATACACGCAATTATAATTATAAGTAAATGT TTCGTATTTGAAGTCGGTCAGAGTCAAGCGGCGTTCGTTCCCCTCTTGAATATCTTGA TATTTGTACTAGGATTGTGGAGGAGTGCCGTGCCCCAATCAGTGCCCGAATCAAATC ATCCGTTCCGCTGTCTTGCGTGCCACCTTGCCCACGTTGTTGCCGTCCTCCTGCCCCA GCTGATTGCCCGATGCAGCGATCATTTTTCCTGGATTAATCCAATCCTTGGGCAAGC CCGTGCCAACTGTAAGAATGATGGGGATGGTGATTATTGAAGAGTCGAACAAGTGC TGGGTGGACACGTCAGTTCGTTTTTTCACGAGTGTGTGCAGCAACCGGTGGGAGCTA AAACCAATATCAACAATCGGCCATGACCAATGGACGACCAAGCAAGCGACAAGTAC 48 AGATCGACAGTTCTCAGTACGCATTTATTATAAGTCTCAGTTTCACTTGATTCTTGAT GATGGGACAATGAACGTTAAATGCTGATGCGGCCATCTGCCTTCGTCGACTCCCTAC CTCACAAACTGAGGGCGCTTGGCCCGCTGTGTTGCTGCTGTGCCTTGGCGCCTGCGA TTCGAAGATGATGTGGGATGAAGCTCTCTCCGAAAGGCAAGGCCAACGCACAACCA GCGCTAGCTGAGGCGCAGACAATTCGTGAAAAAATGAAGACGACAAGCCGCTTGAG AGTGGTTAAAACTAAACGGTACTTAGGCACTACCTCGAAAATATGGGATAAGAATT AATTAACAAGGAATATTGGTCGATTCTTAGTCCAAGTCGCCAGCTACTGTATTGCGC GGAAGGATTGAAAAAACGTAAACAGAACGAAATTGATGTGGCAAAAAGAGAGAGT ACGAAGAACGTGTTAGAAATTACTCAAATATTAAGGCTATCTATGTATTAGCCAGTG ATTGGAACCAAAATCAGAAACCGTTCAACTTAAAAGTAAACTCTTTTTGTTAATCAA AACGAGAGCTATTTACTTTTGTAAACCTTCTCGGTCATGTCGTCGAAGGCAGAGTCC TGAATGTTCAACATCAAGCTGTCCGCCGACAGGTAGACCTTGTTCTGTGAGGAGGCA ATCTGTGAAGAAAGGGAAAATGTTGAGATTCACATTGAGATTACTATTGAGGTGAA TATCATACCGTGCGGGCAATGCTTTGTGCGGCACGTAGCTTGCGCAGCTTCAGATAA GCTGGGTTCTGCTTCACGGCCAGACCTAAGTAAACATATCGGTTAAGGGAAAAACC AGGGGACAAATCAAATGGCAATTGCTAAATCGAAAGATAAAGTTTGTGTGTTTGTA AAAGGATATCATTTTCGCTGCCTCGGCTTCACCCTCAGCCTGGACAATCTTCTGTTGC TTCTCTTGCTTGGCCCGCTCCACAAAGAAGACGGCGCGCTGCGCCTCCTGCTGGGCC ACCTGTTTTGCCTCGATGGCGGCAGTGTATTCCTTGCCGAAGCTCAGCTCGGTGAGG GAGACATCGTCCAGAATGATGTTGAAGTCGCGGGCACGCTCGACCAGCTCCTTGCG 
GATCAAAAGCGACACCTTCACACAAGGAATTCTCGTTTAGAATCAAATCAGATCTGA TTGTGCTCAAGTCACTCACCTGCTGACGCTGTGTAATCAACTGCGAGGCATTGAACT TGGCGATCACGCTCTTCAACACCTCATTGCAGATGGAGGGCAGCACCTTCTCGTCGT AGTCCACGCCCAGCTGCTTGTGCAGGGAGGGAAGGTTCAGAGAGTCGGGACGCGAC AGCACACGCAGTGAAATGTTGATCATCTGCAAGTCTTTCGAGCCCGTCGGCGAGGCA ATTTTACGGGGACGGGAGCGAATATCATAGATGATGGGATACTGGAACCATGGAAT GCGGACATGCAGACCCTCTGAGAATATATCATTCTGTATGCCGCCAACGCGGTTAAA AATGATGGCGCGGTGACCGCCGTCGACTGCAAGCAACAAAATATGGAGTTTGGTTA TGCAATCGTGTTCAGATGAAATATTGGAAGTGGAAATATAAGAAGTTGCCAGAACT ATATATCATGGATGGGTCTACGCAGTATGCAAAATAAATCCAGATTCTATTCCAGCA AAGGTTGTGCAAGGCTTTTTAATAGTTATGAAATATGGAAATAGCCCAGGCCCTCGC ATACACTCACACATCGGCACTCAGTAGGGTTGAGGTTATGTACACATGCCGCTTCGG CATTCGTTTTTCTGCTATTATCCATTAATTTTCGACATTTTTTCAATCCCATTTCCCAG TTACCTGTGTAAAAGGACTGGCTAACACCATAGGCTACAGCTCCTGCGGCCGCCAG GAGCTTCAGTCCAATACCCAGTCCAGGTGGACCACCTTTGCCCAGCTTGCCCGCAAG ATCATTCAGTTTGCTTTGCGCCATGTTGTTTGTAATCTAAATGAAGAAGTTTATACTG AGAATCTGTCTCTTCAACTATTTCAGAGCTTGCTTTGATTTAATAAACGGAAATTTCA TAATTACATCGCACAAAAAGAAAAGCCGCACGTTCTATTTACCTGAATTCTTATCGA TAAGAAATCGAACTTAGTCCACTTAAGACCCCTCTATTTAAAAAGTTTAACTACTTG AGGTAAATGGCGGAAAATATCAACTATTGTAAATTGCTCTTTTAGATCTATGCCATA TTTCTTGAGAGCTTAACAAAAAAACACGATATCTTTCACCTGAACTGCCCTAAAATA ATTATTGTTTTATTATTATTTTCAGTGGAATATTAAAAGACCCCATTGGCTTGGAATG GGTTAATAGTTGTCCGTTAAGGATTCGAAACCATATTCCTCGATTTTGATATTCGATT TGATATTATTAGCTAGATAGAACTTTCAGCCCTGCCCACATTATTTTATACGATTAAT GAATGAATTTTCTACCTAACTGGCTTATTTTAACCACTTGCTTTTATTGCATTTTGCTT AAAAAATGGTTTAGCGAAAATGGTCAAACAAAAAAGGTTTTAGCGAAATATGTAGG 49 TCACACAAAAATCGAAAGGGGATAGTCGTTCCAATTGCTCAATTTCTTTATTCCATT AAATAATGCTGCCTAACTAAAACCATTAGCTCTGCCCACATAATTTTAGGCCATTAA TGAATACATTTCTACCTGATTGACTACTTTTAAGTACTCTTCTTTATAATGTACTATA ATGTAGAAACAAGTTTTAGGCAAAAAAGGTCAACAAAAAAACAGTGGCAACGTATG TAAGTGAGTGTGGAATGTCACGTCAACCTGAACAGGTATACCCTATTTCGATAATAT TATATGAAGAAACTTTTTTTAAAGCTGTTTTAATTTAATTTTAATTTGCCGAAGCTGC TGGATTGGGAGGTGGAGACGGTTGAATCGAACACTTACCATATGAGGGATATGATA CTCATCTATCATGATGAAAAAAGGTTCGTTCTTGAAATATGTCAAGTGAGACACGAA 
GTTAATTTTTCCCTCCAACCTCCAACAGCAGGAGGCCTGGACACTGCATCTGGTATA CCCCCGGTCGTTGCTGCTTGTGGTAGCAGTGCTCAAGAGTTACTATATAGCCATTGT CTCCGGATGCTAAAATAAGGAAAAGAAACCAAGATAAATATCCTTGCTCCTTGCAA AGAACTTAAATACACACTTGTTGCTGATGATGTTGAGCTTGTCGGGGGGTACAAAAT GACAATTGGCAGCGGCCGCTGTTCAATATTGCGCTTCACCGTGGCCGTCTACTGCTG TGCTGTGCTGGAGGCTATCCATCTTCACTTCACTTGCACTTGCGCTTCGATTTGATCG GATCGGATCGAAATTTTTTTGAAATTTTCGCGGGCGCCACAGTTGAAAGTTGAAATT TAATTTGGTTTTGCAGCCCTAGACAAGTCACTACACGACACCACTAACCATACAGTA TATACTGTATATATACTGTATATAGACAACACGCACCTATCGACTGTTTTAAAAACA GACGAGAACGTACTATGGAGAGTACTAACTACCATAACGCACTCCATAAGAACGTA CTATATTATAAATTATATTAATCAATTAATTGATTATTATTAATATAATATATAATAT ATATATATATTTGATAAAAATATTCTTTCGATACTATTGAATCGATTCTGCCAACGAT TCTGCAGCCCTAGGCAAAAGATCCGTTATGTTTTGGCGTCGCCGCCGACGATCCAGA CGATCCTGAATATGAAGATATATTAAACATTATAATAGACTAGGTATGTATTTATGG TATGTATGTATTTAAAACTCTCAGGTGCTGGGTGGAGCCGGACTCCACCAAACAAGT ATGAATTTTGCGAGAGCAGTCTCACCCCACCCCCATATTGACGAATTTAAGTTCCAT CATATATATTCCTTCCTTCCAATTAAAAGTTTCACGCAAACAGCCACTAAATATCTG AGCAGCCAGCTTTAAGATTCAAATTTGCCAGCTTTTGTAGCTTAATACCAGAGATAC TTTAGGAGTTTCGTAGTGTCCAACCTCTTTGAGTTTCACAGCACCCAACTTTTAGTTT CAGACAAATTAGTTTTATGGCAGCAAGAATTCTCAGCAGCCTGCTCGTAGCTTAATA TCAGCCATACAAGTAAAATGCCCTGCCATATCTCCATATAACCTATTAATGGAATGG TTGAAAGAATCCGATTAAAGCCGTTAATATTAGATATAAAAATAACTTAAAAATAG CTCAGCGAGAAGCTAATATGAGCTTTGCAGACATTTTTATAATCACGATATACTGCC CAGACTTGGAAAAATAATTAATCCTCAGAATATTATTACTTTATCAGATGAATCTAT TTTTAATATTTTATGTTATGTTCTATACAAATCTTCAGCTTATCTTGGCAGGAAATTT GAAATTTATTATCGAGATTGTATTTTGACACACTAACATTGTAGAGCCCTGACGTTA ATTATCATTAATCAGACTTCGAATTTGATTAGCTTTTGACTAATTATGTGCGTCAGTT TTGATCGACTGCAAAATTAAATTAACTTTTGCAATGGACCGCAACCAGCTGGACGGT GATTTCCCTTCGCCAAATCTAAGGTCGTTGCGCCAAACTGAGACAAACGAGTTCTGG TATATTTTGAGCTATGATTCACATCGAGAAAGATTTGACTTTTGTAAATTTTGTAATA ATAAATACACAGCTAAAGCTTTAATAGGACTCTCCGCTAACAAGCGTACATCGTAAT TATGGCATTAATACTAAGAATACAAACAGGCAACCGCTGAGGCACAAACTATAACT TTGTGTTAACGATATCCTCGCGCGACTCCGTCGTCTCCACCTCATTCCACTTTTGTAT CTTTCCCGATCCGAAGGCCCAGAAGAACAGCGCGGGCAAGATATACGCTGCTGCAC 
CAATGATGAATATCCAGTGCCACTGATCGATCGTGTTCTAAGATAATGAATGAGAAG TGTTATTGGGTTTTATAGAATGTACTATACATATTATTATGTTATATGTACTCACCTC GTT

A.9 Proximal Derived SCTL

ATTGCCTAAGGGGGTACCTACAGATATGTAAATATGTATACTGATGATTGCTTACCA TTTAAAAAGAAGTTGCTCTTCCCAGAATTTCCTCATGCTGAGCCACCTGTAAAGAGC CATACATTAGCATTAGCGCATTTTCAAACATTTTTGAGCACTCGAAGTACTTTCTAGA GTCAGTCGTTTGAGAATCAACATAATTATTCGGTTATATGCTTGTCTATTTGGTTTGT CATCACAGAAAGTACATCGGATTGTGCGCAGTGGATGTTTTTATTCACCTTTGATCA TGGCTTGCAGACAGTCTGGACATTGCTGGTGCGAGAGTACAGTAGGTGCCACTGAA AACGAACAGAATGAGGTTAAAACCAATCCATGTGCGAATTTTCACGATAGCTAAAT TACTGAAGCATCAGTACAAAAATCCAATCGACATTATTCTTCCGACTGGGGCGGTAT ATATTTAGCATCATAGAAACAATAACTTCACATCCATTTCCGAACGAAACTAACCTT AATATTGCTCCGGCAACACGTGGGACCACAGCAGCACCAGGCCGGTGGGTTCTCCA TAATAATCTTGGACTACATTGTAATCAGCATCAAACTGCAGACAATGCAAGCGACCG CTTTCGTTCTGCGTTCTGTAAATGCAATGAGAGATATAACATTTAATTATACTTTCAT CAAAAAATGCTCATCGGCATGTGGCATTTGTAATCAGAATTGAAAACCAATCAACA CGTCTGTTCAGACTGGGGCGGTATACGAAAAACATCATGGAAACACACATTTATTTA CACTCTCTTTTTTTCTACAACTTACCATTTGGTTGAAATTATTACAAATTTACGGAAA AAGTGAACTGCAATGGCCGGGGGCGGAAATAGAAAGAATTAGTGATGTAAAAGAA CATCGATGTGTCGAGCATCGATGTTTTTCTGCCGATGTTTACTGATGTTTTCCGATGT TTTTGCTGCTACCGATGTTGAGTGATGTATCGATGTTTTGAAAAGTACCTATCCTAAT TTATACAAAATTGCAATGCAATTTTTACGTTAATGGCATCTTCTGGGTGTATTTTGAA CGGGTGTACTCAGCGGCTGCTAACATATTAAGTCGAAAACGTAATAGACTTTCACCT GTGCGACTAAGCCGCATTTTTGTTTTTGCAAAGCATTGACAAATAGTACTTATTTGA AATATTAGTTATCTCTGATGATATCTACAGTGATATGTAAAAGACTGCTATGAAAGC ATAATCATTTCTTTTATGTTAAATATTGTTTCGTTTAGTCCTATGATATGTAAAACTCT GCTATGAAAGCTCAAGTTTAATTTTTCCTTTGTTGAATATTCGGTTTTATAATCAAAT GGGAACGTTTTGAAACATCGGTCATACATCGATGCATCGAAGAATAAATCCGATGT ATCAATCCAACATCGATGTTTGATTTTGACAACATCGATGGACATCGACATCGATGT TTTCATTCAAATTTACATCACTAATTTAGTGTAGAGGGACACTATCGATGGTAAGAC ACTATCGCGCTACTAAATATCGATGGTTCAATCTTTCGATAGTGGGCGATAGTGGGA AGCACAGAATAACCGTGCAAGCGTCTTAGCCCACTAAAGCCCCCTGTATTTAAATAG TTCAACTACTTGAGGTAAATGGCGGAAAATATTACTGATTGTAAATTCTTCTTGAGG ATTCATGCCATCTCTACTATAAACTTAAAAAAAAATCACAATATTTTTCAGCTGAAC TGCGCTAAAATGATTAAATTTTCATTATTTTCAGTGGAAAATTGAAATACCCCCTGG GCTTGTAATAGGTTAATCGTTCTCCGTCAGGATTGGAAACCATATTCCTCATATTCTT TGTTCCGTTGAATATTACCAGCTAAATGGAACATTTAGCCATGCCCACATATTTTTAT 
CCGATTAATAAATCAATTTTCTACTTGACTGGCTTATTTTAAACACATGCTTTTATTG GATTTTGCTTAAAAAAAGATTTTAGCGAAATAAGTCAAACAAAAAGCAAGTAGCGA AATAGGTCAATCAAAACAAACGAAAAAGGAAATTCGTTCATATTGCTTAATTTTGAT ATTCCATTGAATAATGCTGACTAACTAAATCTCTCAGCAATGCCCACATAATTTTAT ACGATTAATGAATCAATTTTCTACAAGATTGGATAGTTTTTAATCCTTGCTTTTATTG TATTTATTGTAAAAAAAGGTTTGAACGAAAAAGTTCAACAAAAAAAGAGTCCTTAT GTTGCGTGTAGGCCTTTGTATACCCTATTTTATTAATTTTATATGAAGATAAAACATG ATTTATAAAGACCTTTTCTCAAATTGATATTCCTTAGACATATTCAAGTACATTATTA GTATCTTGTGAATATTTATCTCGATCCTGATACCTACTAAGCCTTGGAAGACAAACG 51 TATTCACAATTCTATAACATCCATTACCCCCAGGGTATTTGTGCATGGAATTATCAAC ATGTTTGTGTATGGAATGAGACTCATTTGTTGTGAACCGGGTATTGCCAAATATCTTC CGTCGAATATTTCAAGCTACATATACATATAGATTTCTCAGCCTTGCCCACTTAATTT GATCCCATTTATGAATACATTTTCTGCAAGATTGCTACTTTTAATTACTTTTATCAGT ATACAGTATACTGTATACAGCAGAGCCACCTGGCGACAAAATAGACATACATTTTTG ACAACATTGCCCGTTTCCCGCCAAAGAGCGCTAGTAACTAGAAAGTTGTTATTAAAG CCATTAAAACCAGCCTAGCATCAATAATATCGAGTTTCCGTAAGACCGAAGAAATGT TTGGATAGATATGTATGAATATTAGTATATATTGTAAATAGTCTTGTGGCCCCATTCG CCTCGCCTTTTAGGTATTATATAGTATCCGCTAATTCAACAGAATATGTTAGTTTGTT AGGAATGGCTGTTCTAATGTGTCTCGGTTGATTCCATTTAAAATAAAACGCGCTTTCT GAATACTTTTATGTTATTGTCGTTGTAACTAAGTACTACACACAAGTCAGTAAATAG TAGTTATTATGCTAAACATTGTTTAACCAATAGGATGTCTAAATGTGCAAAAACGTG TACATTTTATAATCAAATTGGGAAAAGAGTATACATATTGTATATAAAGGCCGGGTT TAAATGTTTCGTAGTTGGGACAGCTGTTAAATGCTAATGGTGCAGACAACAATAATT CTCATGCCTAAGGTTAACATATTTCAGTGGATCATTGTATATCCGGTGCACATTCTAC ATATTACCCAAGTTACATTTTATTACAATTTGTTGTCACAGGGACAGGAAGCGTTCG ATCCAGCGGTGACTGCGGCATGTATGGCCTTGGCCACATATTCAATGTTGCCCGTGT TAAGTCCACACATGTTGATGCGGCCTGTCTTTAGGAGATAGATGTGGTGCTCCTTGA TGAGGTATCGAACTTGGTTCTCATTGAGGCCCGTGTACGAGAACATGCCGATTTGAT TGACAATATGGTCCCAGTTGCCAGGTGTTCCCAGTGCAACGAGCTTGTCCCGCAGCA GTGAACGCATCTCCCGGATGCGACTCGACATTTGCTTAATGCAGTCCATCCATTCTT GACGCAACGCAGGGTTGTTCAACACGGCCGACACGATACGAGCACCATAGGCTGGT GGGTTGGAGTACATGCCGCGGATAATCAAAGTCAATTGGGAGTGTACTGCAGCTTTA GTGGATGCGTGCTGCTGCACCACCGTCAAGTTGCCAGCTCTTTCGCAATACAGGCCA AAGTTCTTTGCAAACGATTGGCAAGTTAGGAGCTCAAAGCCTCGCGAAACGAAATA 
CCGCACTGCCCAAGCATCGCGATCGGGGTCACCGCTAGCAAAGCCCTGATATGCCG AATCAAACAGAGGGAATAGTTTCTTGTGCTCCATCAGATCGGCAATTTGCTTCCACT GTTCCTGGGTGGGATCAATTCCGGTGGGATTGTGTGCGCATGCATGCAGGATGATCA CTGCTCCGGGAGGAGCCTGGTCAAGATCAGCCAGCATGCCATTGAAGTCCAACTGA CGCTTGGATTGGTCCCAGTAGCGGTAGGAGTGCAACGAGGTGAAACCAGTATCGGA AAATATCTTGTGATGGTTCTCCCATGTGGGATTCGAGTAAAAGACTTCATTTCTGTTC AACTGAGTGTGGAGGAACTCGGCAGCCACACGCAGAGCACCTGTGCCAGATATAGT CTGCACCCCAAAAGCCTGCGAACAAGAAGAGTATTTCAATTATAATTTCAAACGAA AACTTAGTCTTTTTAAATTACGCGATTCTCCTTGATGGCTATGGAGTCAGCACCCAG AACCAATTCCGTTGCCGCACTTGTGAATGGGTCAAGACCTTTGAGAAAAATAGATAT TGTGGTTCAAATTGATTCTCAGGATGGGATTAAACCCTTTAATTACTCACCAGTCAC CGGCAAATACTCATGGTTAACAGACTCATCGCTGGCGATTGCTATTTCGGTTTTTCGC ACCACAGGCAGGACCCAAGGCACACCAGCATCTGTCCTGTAAGCTGTAGTCAAAAT AAAAGACCTTATTAGTAATAATAATAACAGTAAGACCAGTTCCTGGTCAAGTCTTTG AGGGAATCCATTGTTCAAAGTGCTGACTTTCGTACAATCTGAAATCTTATCTTAACG GACTTAGTATGGGTAATCAAGGGGTAATGCTGTACGAAATCCGATAATGTGAAAGC ACTTGTTTCTAGGCCATTTTCTATACCATATGCCACAACAAGCTCGTCGGTTATTCGA AACCAATTTAAGTAATTGGGCTGGAATTGTTTGTTTTAGGAGCGCGTGATAAAAAGC GAGAGAGAACAATAATAATTATTAAAAAACAGTAAAACCAGCAATACCAAACAATA AATGAAGAAATCTATTAATAGTTTATTGATTTCAGCACTGTAAAAGTTGATGAATGA TTTTATGTATATGTAGTTCCAAGCTAGCATATTTATGAATATAATGTGTACAACGATT 52 CGGGCAGGTCTCATTAACATGAGAATAAGCCATAGGTTATTGACTTGGAAGATTCAA CAAAACAAAATAAAAAAAGCAAACAATTTGCTGGGGGAATTGTCATTTAAATTATG CAACAACATCCATAAATATAAGAATTGGGAACTGCTGGATCAGGCAAAAACAGCCA TAAATATAAAAAAAAGGTGAAATGGCATCCATCCAAATCAGCAGCGGCCTTGCCCT TGATGATGAGATTACAGCTTGTGGTACGAAGCTGATGGAAAGAAAGTTCGAATCTA AGCGGCTTATCGTTATGGTTATGCACACTAGTACATAACTAAATGCTGTACTATCTG ACGTCAATGAGAAGAAAAACAAGGAAAGTGATGGATCCTAGAAAGCACAATTCTTA TCTCTCACAATCAGTTTCAATGAAGAAATGTGACACGAATGTATGAGGAATAGTGTG GCAGAAGAATTGCAGTGTAAGCGCGTCATGACAGCCCACCCCAAACCACCCCACCC TAAGGATAGAGAATCACAACAATTCACATGCCAAGTGACTTCCAAATGACCTCCAG CAGAGGGGGGGCCAACGTCATGCGAGGAGGAGGTGGTGGGCTGCAGGGCGGCCGC TAACAAATGATAGCAAATTGGACAAGCGCGACTACGCTACACAGACCCCCTACCCT ACACAATACTTTGATTGCACACATATGCACGAGTATGCACGTACATATGTAAGTGCC 
AAAAACACTGAGACCCAGAATACGCACGAGCAATTAGTATTTACTTACCGCCCACT GATAGATTGACTTTGTTTTGGTTGGAATCGTCTTTAAATGCTTGCGTCAAGGCAAAA ACCTCAATCGCGGGTCCTTTTTCAACTTTGGCGTAGATAGACATGGCGAAGAACTTG TAGAACAACGTAGCGAGCAGGCAATAACAAATAAATTAAATTTTGAGTGTGCTGTC CAGATAAGTGTAATCGGCAATAACAACAACAAAAACTACTGTCGACTACAACTATC GATGTTTTCACTAACGATTTGTTCGACTGGATTGATGATATCAATTGAGCACCCGCA CTTTGACTCTGACTTTCCCCTGCTTGACAATGAAGAATCCTATAATTTTTTGCTTTGC ACGTTGGACGAAAAATAAAAAAGAACCCGCCAACTACTTTACGAAACCATCGATAG TACAACGATGTTTTTTCGAGAGTTTAGTCGACTATCAATACAACAACAGCTGGAGAC CAGCTGATCCACGAAATGAACAACAAATCTGATTTCACGAAATGTTTTTAAGCTAAA TCCTGGTCAAAAATAGAATACATCTTTATGGCTGGTAAGTCATAAACTAAAACAGCA TTGGTTTTTCAATTGTAAACCATTGCCCTTGACAGCCGATCCCAACCACCCCTCAACC TCGGCGAAAGCGGATGAAACGCCTCTGTTTCCGCCCGCCAAGCCTGGCAAACGGGT CGACTATGGTACAAATGATACACTGGAGCTTCTGGAGCGCATCCCAAGCCAAGTGG TCGATCTGAAGCTGGACGATGTGCTAACCGCCGTTAAGGAGGTGGACAACCTCTGC GACTATGCGCGCTGCAAGACCAAAACGAGCCTAATGGGCCAGGACTGTCCGCACTG CAAGAAGCGGTTCTGTTTCAAGCACGGTCTGCCCGAGGTTCATGGCTGTGGCGAGGA AATCAAGCG

A.10 Distal Derived SCTL

AAAACACGACCAGCCAGTCAAACCAACAACAAATGAAATTAGCGATGTTGAATATA TACAGAAATATATACGATATATCGATATATATTTTTGATGATAATATATTTTTTAGTG CTATTTTTTGGACTGCAAAATATTAATATTTCTGTGGAATCCCACTGCTACACAATTA TTATCTTGTTTCAACTCGTTAGAACCAATATGTAGTTGATACTTGCAGTGTTGAAGAA GAACCGATATTTTTGTGCGATAAGCGCCTGTTGAATGTGGCAAAAATTCATAGCAGA CACAATCAGTTTTACAGTCAGTGGACAATTATTGTACATTTATTTAACGCTTGTGCTT ATTTCTACAATTTTATATATGCTGTTTTATACCATATAATAAAATTAATTTTGGGGAA ATTGTTATGGATCTTTTTTGTTTAAACCACATTTTATAAGCAATCCAATAAAAAAAA AGAACTTAAAATCAGACAGTCTTGTAAAAAATTCATTCACCTATCGAATACAATTAT 53 GTATGTGGGCATGGCTAAATATTCTATCTAGCTAATAATATTCGACGGAATATCAAA ATCGAGGAGTATGGAAAATATAACTTGAATTCGTCAGTATTTTTTGAAAATGAGACC GTATATTTTGGTATTTTTTTGCATCGTGTAGAATAGATAATAACCGGTTCAGAACAAT GAGTCCCATTCCATGCACACATACCCTGGGGGAAATGGATGTTATAGAATTGTGAAT ATGTTTGTCATCCCAGGCAAAAACAATTTATTTTCAATATATTTTAATATTATTCAAT ATTTTTTAATATTATTTTCAATATTTCAACGATATTTTAAAGCTTATTGTCTTCTTGCA GGGACCAAGAATAGGTATCAGGATCGAGATATATACACAAAACACTCATCATATAC ATGGATATTTCTAAAAAATGTCATTTTGAGAAAAGGTTTTTATTAGTTACGTGTTATC TTCATATCAATATTTATCTAACAATACTTTCATATAAAATTAACAAAATAGGGTATA CAAAGGCATATGAACGAATTTCCTTTTTCGTTTGTTTTGATTGACTAAAACCTCTTTT TACGCAAAATGCAATAAAAGCAAGTGTTTAAAATAAGCCAGTCATGTTGAGAATAG TTTAATTTATCGGATATAATTATGTGGGCTTGGCTAAATGTTCTATATAGCTAGTAAT ATTCGACGGAAGAAAGAAAACGAGAAAAGATCGTTCCAATCCTTGACGGAGAACGA TTAACCTATTATAAGCCAAGGTGGTATTTCAATTTTCCACTGAATATAATGAAAATTT AATCATTTTAGCGAAGTTCAGGTGAAAGATATCGTGGTTGTTTTTAAAGTTCAGAGG AAAGATGACATGAATCCTCAAGAAGAATTTACAATCAGTAATATTTTCCGCCATTTA CCTCAAGTAGTTGAACTATTTAAATAGAGGGAGATTTAGTGGGCGTCCCCTCTCCCC TCTGACAATCAAGATGTTAGCCGTTATGGACAGCCTCGTCAGCGTTGGCGCTCTATT TCCATAGAGAACCTTCCCATACCCAAGTCAGGGCGTTCACACGTCCCAAATAAGAA GTCATTTAAGTACATATGTATGTATATATTCTTTATTTATAATTCATTTTTCTTGAATG TGCACTGAATGCAGCATTAAGCGTCCTGCGGGGCCTCATCAGCGGCTGCCGGTGGG GCAGCACGGGTCGGTGGGGCAGCACGTGCTGATGGGGCAGCACAGACCGGTGGGG CAGCACGGGTTGATGGGGCAGCACGGACCGCGTCTCTTGCGATCGCAGCCCTATGC ATACGGCGGTTTAGGCCTCCTTGGGAGCCGTGGGGACGCCCTCCACCTTCACGCCCT CCACCGTTGTGGGCACCGCGTCCACGGGCACCTTGGACGTAGTGGGCACCGCGTCCT 
TGGCTACCATTTACGAAATATATCGCCTGAGGCATCTGAAAGTATAGAAAATATTTA AATAAAAACATTAATACAATAAATAAACTTAAACTTACCTCTTTAATTTTTTAATTTT TTTTTCTAGCGTCGCGCCTTGGTTGTCGGTTGGTTTTGAATACAAGACATCGGAACCC GATAGAACCTAATATCGACATACAAGTGCACGGTTCCATTTGCCAATATTTTGCAAT TAAGTAAAAATATAACTAAAGGGCTTGTAAACAAATCTGAAATGCAGGAAAATAAA TTAAACAAAATGTTTGATGTACTACAATAAATTAATTTGGTTGTTTTTTGGCAACATT TTTTTTATTTTCTAATTGGAAACACGATTATCATATGTGCGGGTGGATCCGATACCGG TATCGATAAACACCTTCACCACCAACAAAGCAACAATTGCGCAAAGCAATTGTTGTA TTTAAATGTTGTTAAAATAAGTGAAAGTGTTAAAAAAACAATGTTACAATGCTTAAC AGCTGGTAAGTTCAAGTTATTTTTTTAATTGGACGTAAGTTTTGCGAAAATTCCTATG TTTTTGTGTTAAATTTTCGGTTCTTCATTTTTGTGTGCATTTTGGGCGTGCATTTTGCG TAAAATTTGCAATTTCCTTTTGGGTGAAATTTGTGCGTGAATTTTTAGGGTGAAATTT ATGCATGGAATATGTGCGTGCCAGTGAAATTTGTGTTTGCAATGTGCAATTTCCTTTC GATTTATTTTTCCAATATTATTTGTATTGTATTTCCAATTGCATTTGTATTATTGTATT ATTTTAGCGGCCAACTTGGCAACGGATATGCAAAAAATTGCATGCATCCTGTAAGGG ATGAGACCAAGAATGATCCAAGAGTCCCTAGGTTCGAGACCTGGACTGACCGACCC CGGCTGGTTTTTAAATTTGTATCCCAAAAATTTATTTTTTCCTTCAAAATTCACTTTTT TTTTCCTTAAAACCACAGCTTTTTTTCATTAACAAATTTTTTTTTATTTTTTTTTTTAAT TTTTTTGCTCTGCCATTACAATTTTTTTTCTAATAATAATAATAATAATAATAATTTTT TCATATTTCAGCAAAACCGACAAGTGCTGAGTAACAGGCTGCAGTGGCGATGTAAA ATCGTTCGCTACTCCAAGGATGGGTGAAATGGGAGGCTTCGAACAGGAAGTCTCCC 54 AACCGAAAATTTGGGTAAGTTAGTTAAAGATAGTTTCTAGTTAGTTTTAATAGTCTG TCCGTCAGTCCGTCCCTATGAGCGCCCCATTTCTCAGAGACTACAAGAGCCAGGTGC CAATGCAATTGAAGTCCGAATCAAACAGGATGAAATCACCCTTGGCGTTTCCCTCTG CGTCGCCCATAACAAATATCATTATATTGCGTCGTGACCCCCCCACGTCACGTTGTG GCTCTTCAGCACGTCGCCATCGATTGTCTTTATGATTTCGGGCTTCCGCGGATCCTTG ACCACGTCGAGTATGTAGACAAAGACCGAGTTCGGCCGGCAACACCAATCTGTCCC TTTTCGGAATGGTTTTTGCGTTCTCGTCCACATAGTAGAAACTCGAGCAAGCATTCC ACCCAGAGTGGTGCAGCTCGTCCCCTTTCCGATTCTTAAACGTAAACTTGCATAGAT AATTAAATTTTATTCTATTCCATACACAAGAATAGGCTTACAATCCAAGGTGCAAGT CTTGGCAAATATAATGATAATAACCTATTTACAGAATTGAAAAAAATAACCTTGGTC CAATGATACGAAATAATAATTAAATTTTAAAATATCAGTGAAAATAAGCGGCACGT GTGGCAAGACTGAGATAATTTATTGGTAGGACAAGTCTAGACAAATACATTTTACAA AAGGAATATTTTCAAATTCTATAGTGATTACAGATCCCATTATGACTGGGAAGTGAG 
CCTGATATTGATAAGATATGGTATGTCTAGTAATTAGCTGTGACTGGCACAACATAT TGTTAAATTCGTTTGATAGCGAAGATTAGCTGTACTATAAATGGTTTCCATAGCTTTC AGACGGAGGCGTGGCAGTTCTTATCATGTACATATATAGTATATTGAGTGTCTGGCA TTAGCTGTATTATAAAAGGTTTCCATAGCTTTCAGACGGAGGCGTGGCAGTTCTTAT CAACATCTACATATGTACTACTATATAGTATAAAGAGTGTCTGGCACTTACCAACAA TCCCAATATGTTTGTCGATTCCTCTCACGTACAATCTACAATCTAATCTATGGGACTG TCAGCTGAAAGCAAGCCGATTTTTGTATTAGTATTACATAGAAAAATACTAAAGTAA AACATGGACTAAACAACTGCCCATCCAAACAAACATCTTATAGTACATAGTACATAT GTACAAAATATGTATTCCTCTTCAATACGCAACATGCTTTTTGTTGGGTCTCGTCTTA CATATAGATAACGCGCCCCCTGTTATTGCCACAGTCGCGCGTGCCGTGATATAAAAA TAAAAATCTAAAAGACTCTGATACACAAGCTCCAACACATGTATGTATGTATGTATG TATATACATGCATGTATACATATTTGTAAGTATATAGTTTTTGAGAGCGAGAGCCAA TCATAAAGTCCAGCTGATAGCGACGATAACAGAGAGCGCAATCGTCGGGTCGGCAA CGATGAGAGCTCTCAGCCCGAATCAGTGCCCGAATCTTCATTAATTTTGAGTCGTAC GATTCCCCCCGATATAGAAGTAGACTCACCGGCTGTTAATATAATAAGCTTCAGCTT CGTTAACATCGCCGGACTACACACGCGCCGAGTTCAAATGGGTGAAGCATTCTAGTC GCCTCCACAAAACAGAACAAAAAGAACAACAACAGCTCTATCAATCGTCTTGCGAT TTAAAAGTTTCCATAACATCGAACTGATATGTATGGGTGGGAGACGTCACTAATGAT GCGATGCTCAATGGTTTTTGTTTTTTGTTGGATTGCGACCCATGTATTTGGAAATAGT TCTAGAGAGGGTTGATCAGGAACATAATAATAGATCCCGGAGAGCAATTCATCTAC CCTGCAGGGCCCTGTGAGAATGGTTGAGAAAGTATACTGGTTATAAATTGCAAATA AACAGTTTTTTCTGGATTAGGCACATGTCTGCTCTACCAAGAGAGCTTTTGCTTGAGC TAAAGGCAGAATTGACAGATGTGTGTTAGTGGAGCATGGAGAAATGGAATAAACCA CTCACCACTTACGACTGATAAAGAACGGCAGACTGGAGTAAATACTGGAATGTCTG CTCCGTGAGCCTCTCTCTCACATCGGCCAGCTTCTTTGGATTCGGCCTAGCTTTGGGG GCTGGTGAGAACAGCTGTGACGCTCCCCCCACCGACAATGGTGGCCGGCATCCGGT GACAACAACACAAACCAAAAATTGGACGTTTGGACAAAGTTCAGAATTTGTTTTTGT GTCTATGGTGAAGAACAAATTTCATTACCGGTTAGGAGCACACGCGCAGAGCACAG AGGCAGAGCCGATGAGAGGCGCCGTGATTGGGAGAGGCAGAGAAGATAAGCAGGC ACCAACCCCACAACTTGGTATAATTATTTAAGGTAGAGCCGACTCACTTTGTGGTGG CCTATGCAGTCTGCTTAGGTCCCTCGCAGAGCGGAATGTCTATAGACTCTATACCCA TTATATTTATACCATGTCGGCAATGTCGAAGCTGCGAGACTGCTGCTTTCAAGTTTTG GCGAGAAGCTCAACCGGCCCGTTGAGTTGGTCGTGCAACTTTGACGTGAGCGGTTCG 55 CCGAGTTCCTGGGAAAGAAAGCAAACAGCACGCCGTTGTAAAGTGCAGTGAACAAA 
TTCGACTTCGCGGTGTTCTACCTACCCAAAAAAAAAATTAAGAGGAAGAGTGTACA AAACACATATAGGAACCGTGTGCATTTGTGTCCCCAGAGCTTCGATCGGAAGGCTGT CCCACAGATTTCAATTGTTTTGAGTGCTGAAACGGCGATTCCCCTTTGGAGTCCCTTT GGATAACAAGGATAAGCGCAGGGCGCTTTTTGTGCATTTTTTTTGGAAACGTAAGTC AAAGGTCACCGGCCCCAATCATATTGCCAGACAGACAGACACCTGCCCACGCAGGA GTGCGCCGCCACAAGGATGTGCTTGTCGATATGGATTTCACTTTTTGATTGTAATCG ATCTCATTTAATTGCATAATTTCTGCTTAATTTTCTTTTTCGCTCGGCCTCGTCCCTTC TCCCTTCCGGCTCACGCCACCATCTACTGCAATTTACATGAGAGCCCTCTGCTCTCCC TTCCTCTCTGGCAGACATGTGTTAATTACATTTTATTCTATTCCATACACACACCACA CGCACACACCATCACTGCGATAGAGACAGATATAGAGAAAGAGTGAGAGAGGGGC ACACGCAGCCGCTGGCTGTTTTATTTTTGTCCTCTTTTTACTTTCTGCCCAATATTTAT GACCTTCTTATGGTTTCGCTATTGAGTCGAAACGTGTTTGACAAACCAGACAGACAG ACATAGGCAAATGTATAATATAATACATTTACATATATACACAGACATGCTTCCGTT CGAATGTGCAATTAATAATGGGGATATACAGTGAGAGATCTGTGGAGTGGACACCC ACAAATCTAATTGGGTTTCATTTGCCTTCCTTGGCAGGGCATCAAGTGTAGACTGAA AGCTGGACCCGGGACCAGTCAATGTCAAAGCTTCTATCAGCTAGTCGACTTGATAAG GTCCAGACAGTTAGCGTAACCAAATATAAAAGCTTTCCAGTGTTACATTCTAAAAAC CTAAAAGTGCGGCATGATACTGAGACTAAATGTAATCGATGGGCGGAAGTGCTCAC TATTCCGGCTGGCTCATCCATTTATGCGCAATTATTGTCACGGCTCCGCCATATTTTG CGGAAAATCAATGCCCGGCCCACATTCAAAACGACTTTGACCTTAGCGGACTGTCGT TCAAGATCGTATGTTCAAACAGCAAAAAAAAAACAAAAATTAGCACGCACGTTTGA TGAGTAAAAGTGGGGGGAACAAAAGCAAATCGTTTGAGTGGCAGTGGCCCCTTCTA ATATTATTATTATTATTATACAATGCACATACATACATACATAGTTTTTTGTATTATC ATTATATTATTACATAATCGTGTGATGACCGTGCCTGGGCAGGCGAGTAAACACATT CGGGAGCTCGGGTATGGAGTGGTTTGCCATGTCAACAGCGTTTTCGTAAAATTGCTT GAAATTGGTCTCGCCCAGGGATTGCCGCACCACGCGAGCATTGAAAGGGCACGGCA CACACGGCAAACAGAAGCAACAGGATAATTGAAACTGTCTGTAAACATTGTCAAAA GTTGAACAGAAAATTTACACAAACAAGGGAAACAGGAAACGAAATCAATTACAGTG AAAGTGATCCGCCAAAGGAAGCTCTTACTGTACTCTGATTTGTATATTTTCACCGCA ATTTGTTCTTTTAAGCTCGAGTATTCAAAGGAATATTTAGGAATCCGGTTCTTTAGTT TCTTAGCTTCTTCAATTGCAGAACAAACAAATTTGAAGCGCAAAGGAATGCATATAT GTATATATATATACATATATTTTATAACGGATCGATATTTAGACTCCTTATCGCACTT CCTTTAAATTATTGAAAGCAAAATCGAAAGAGCTTCCGATTATACGAGAACCTTCCT CATAGTCTTAACGATAAAAAAATCCCTGAGATAAGACGAACGATTTACCTTTTGGCT 
TTATAAGTAAGACCATATAGATATTGAAAGGTTGAAGTGATTGAGCTCTATAAAACT ACTGCCTCAATAAGAGATAAGAGCCGATAAGAGAGGAAAGGTGCCTGGATACCTGC CCTAACTCGACTTATCGATCCGTCTTCCATATGTTCGTGTCTGTCGCTAAATATCGCA CTATATCACTGCCACGCAATCCTAGAGAATGTTACACATCGTTATATAAGAAGTATT GCATTGATAAGGAACCATTTCATACTTGAAAAACACATGAAGCAACATATTTTTTCA TTCAAAATTTATTATAGTACAGTAAAAATAAAAACGGCTGTCGAGTGGCTGTTCCGT TCTACCAAGCACAAGGTGACATTGAGGCTTATCTGGGGATTCGATTTTAGTTTTCCG AATAGTATTTGGAGTGAGTATTTGATTGATAGTTGGACCGAAATACTAGAACTGGGT GGAATCTTTTTGGTTGTTAATTATATAATCGTGCATTAGCTCGGTACGAAGGAACGA AATTAGCAAATTATATTGATGTTAACAGACGAAATTAAATTAGAGCTGTCGATCGAC ATCGGATCACGTCCATCATGACTATTTATAGGGTGCCCCCAGGAGTGCACTCCCACT 56 GGGAATGATAAAATGGATGGCCAGACCTCCGATGTGTTAATAAAGAAGTCTTATCA TCCGCTCATTATCTGCACCTGTGTCGCCTCATGTGTAGGCAAACAGTCAATACATGT GTCAACAGTGGTTCTAGATAATCTATGGACCCCTGGTGTTTCCACTCGCCATTCCCCC CATAGTGAGTCTCTCGGAGAGGCATGGGGTTCACGTGCGGGCACAGTCCGCCCCGT GGAGGAAAATATATTTATGTTTGGCCAAATATAAAGTGGATCTTTATCAGAGCACAT AGTGCCTAATCCAATTTCGCTAAAATCACCCGATCACTCGGTCCAGTTCCAAATGCA CTTCAAAGTACATATGTATATACATTTTGGCGACCCACACGAGGCCCACTTTCTGTTT GACGCAGACGAGCCCCCATGATATACTTGAAGAATCGGAGCACAAGAGGAAGAGAT AGTGAGAAAGAGAGAGAGGTAGACAGACAGAGTACAGACAGGGTCGGCTATTTGA ACTTGTTCGGGGCCTGAATCTGAAATGCAAAAAAAAAGCGTGTGTTTGCCGTCTGGC CAGCAGCAGCAAAGCCGGTCACTGGCGCACCCCTGTGCCCCCCTTCCCGTCGGTCTA AAAAAACCCATATAAATCGTTGATAATGAGCAAAACAAGCTTTTAGCTAAAAGTTA ATGGAAAAAGAAAAGCAAAAATACAAATGTTACGAGGCATGTGCGAACGCACTGG GGATACTCTTTAGAAGAACTACCAGACGTTCCCTTGATTCGTTTTTATTTGGGTTTAT TTTTATACTCTATGCTAATTTGTATTCGCCATAGGTGTAAATATATAATACAAATGTT AAACCATTAATGGATTTAAGTGTAAATATCTCGACTGTAATATACCCTGGGATCTTG CTACAGAAGTAGAGGAATTTGATTTCAACTAACAGTATGCATCGTGGGGAACAGCC ACTCCCATATACCCTTTAGGTATTGAGTGTATATTAAATATTGACACATGTATTAGCA CAATATGTCCACTATAGTACGCAATAGAAAATAAATAGGCATTTTCGATGTATCAAT TAGGGAAGTTCACGGCACATCATAGTTTGGGAGCTGTTCACCCGATTCTACCCAGAC AATCGGGTTAACATATTCAAGATTGTCATTCCAGTTGTTAGAACTCGCTTTGGATTTT GGTCACGTGAGTCAATCCACTTTTTGCTGGGGCTTAGTGCGCCGACAGCTCGTTATTT AAATGGAATAATACTTGGAAATCACACAATCGGCGAACTTTGAGCCGAAGATTAGG 
CTGGAGCTGGAGCTAGAGCAAACATACAAACATTCGTAGATCAGAAGGAATGAGAA CAGTTTGAATTGAAACAATTTTAGGCGAGACCTTCCTTCCCCTGTGGCATGTTGTTTG TTGGCGCATTCGAAAGTAAAGTCGATAACCTTAAGGCATTTGGAAATCGAGCGCAA CTCGAAATCAGCTTATTAGCAAATAATTGAAGCACCGAATGCAATCTCTCGATCGTA AAATACAACAGACTGCGAGTATTATATGTGAAAAACATTATTCTCCTCTCGCCCCTG CAGCAACGCTCGGATCATCATGTTCATACCAGTGGCCCAGTCCGCCTTCAAAGCGTT GCGCAGTGCGCAGCCGCGCGTGCGCCTCTACTTCGTCAGCAAGAATGCCTACTCGAG TCAGGTGAGTGGAGGGAGGTCCCCGAGGGTCGCAGGTCACCATGAATCACCGATAT TATGTGTCCCCACCAGGTGGAGTACAAGCCCATTCGATCGGTGCTGGTGGCCAATCG AGGGGAGATTGCCATTCGCGTGTTCCGCGCCTGCACAGAGCTGGGCATCAAGTCGGT GGCCGTGTACTCGGAGCAGGACAAGATGCACATGCATCGCCAGAAGGCGGACGAGT CCTACCTGGTCGGCAAGGGTCTGCCCCCGGTGGAGGCCTACCTCAACATTCCAGAGA TCATTCGGGTGTGCAAGGAGAACGACGTGGATGCCGTGCATCCCGGCTACGGCTTCC TCTCCGAGCGCAGCGACTTTGCCCAGGCCGTGATCGATGCTGGCCTTCGGTTCATCG GCCCCTCGCCCAAGGTTGTCCAGAACATGGGCGACAAGGTGGCGGCTCGTGTGGCG GCCATCGAGGCCGGTGTTCCCATTGTGCCCGGCACAGATGGTCCCGTAACCACCAAG GAGGAGGCCCTCGAGTTCTGTAAGATGCACGGACTACCAGTCATCTTCAAGGCGGC ATACGGCGGCGGCGGACGCGGCATGCGAGTGGTGCGCAAGATGGAGGAGGTCGAG GAGAGTTTCCAGCGCGCCAGCTCAGAGGCCAAGGCAGCCTTCGGAAACGGCGCCAT GTTCATTGAGAAGTTCATCGAGCGTCCGCGTCACATCGAGGTCCAGTTGCTGGGCGA CAAGGCAGGTAATGTGGTGCACCTGTACGAGCGAGACTGCTCGGTGCAGCGTCGCC ACCAGAAGGTTGTCGAAATCGCTCCGGCTCCCCGCCTGCCCATCGAGCTACGTGACA AGATGACAGAAGCTGCCGTGCGGCTGGCTTGTCACGTGGGCTACGAGAACGCAGGC 57 ACTGTGGAGTTCCTGTGCGACGAGTCAGGAAACTTCTACTTCATTGAGGTGAACGCC CGCCTTCAGGTGGAGCACACGGTCACAGAGGAGATCACCGGCATCGATCTGGTGCA GTCGCAGATCCGCATTGCCGAGGGCATGACTCTTCCAGAGCTCGGCTACACCCAGG AGAATATTCAGCCCCGCGGCTATGCCATCCAGTGCCGTGTCACAACAGAGGATCCC GCTAACGATTTCCAGCCCAACACGGGACGCCTGGAGGTGTTCCGATCGGGCGAGGG AATGGGCATTCGTCTGGACAGTGCCTCGGCCTATGCCGGCGCCATCATCTCCCCGTA CTACGACTCCCTGCTGGTCAAGGTGATCGCGCATGCAGGCGACCTGCAGAGTTCCGC CGCAAAAATGAACCGTGCCTTGCGCGAGTTCCGCATTCGTGGTGTCAAGACGAATAT ACCCTTCCTTTTGAACGTGCTGGAGAACCAGAAGTTCCTGAACGGCGTCTTGGACAC CTACTTCATTGACGAGCATCCGCAGCTTTTCAAGTTCCGTCCCACCCAGAATCGTGC CCAGAAATTACTCAACTACTTGGGCGAAGTGCTGGTCAACGGTCCACAGACCCCACT 
GGCCACCACCCTGAAGCCCGCCGAGGTCAGTCCTCATGTGCCCGCCATTCCACTGGG TAAGTTCTAGCCTGATCTGTCCTAGAAGACCGTTGGGTCCATTGCATGTGCTTTATAC GGGTTCACGCTTGCTTTTGTGGCTACCAAAGGGAATCAAAGTCTGAAGGAGGGACTT CTATGCGATGCATTTAATGCAAAATAAATGCAACATGTACAAGTCCCTCAATTGATC TTCAATTTCAATGAAAGGAAAGTTACTTCGAATTGATTCAATTAAAAGGCATGCACG GTGGATTTTGTTTCTTTTACGTAGCCTCTTGTGACAATTTTGTTCGCTTCTCCTACTAT ATGCTTTAATCAAACTTTATGCTTTTACTCTTTCAAAATACTGCACTGCAATTTTTGT GTTACATCTGTTATTAATCTACCAAATATTGAACCGTACGCAATGCAATCGTAATCG CAATCCAAACAATTGGGGGAAAAAAACAACCAACAAATATCACACACCTCCAATAT CTCAACAACAATGATTCAATACGCACAACAAAAAACACGATATGATATGCCCCACA CGTCCATAAACGAAACGTTCTCATGATCCTCCTCATACACCAAACCAACCGAAGACT TGTCGCCGGAGGCACTAGAACGTGAAGAGCGTGGCGAGGCCAAAGGTATGTCTATG CTGATGTTACCCTTAAATATACGGAACATCCAAAAACATTCATCATGGCAGTGGGTT TTCATCAGAAAGTATTCAAGCTGGTGGTATTGTTATTCTATCATAGTATCACATTAAA TGTTTGGGCTTCATAGCTTGATGTTTTCTTTTTTTTCGCATCCAGTTAATTGTTTTATT ATCGCTTCTGGGATAATATATGTATTCCATTCACAACACATTAAGTAGAAAAGTAAT TTCTGAAATATATTAATGAATACTAGTTGCTTTATTTTAGAAGCTTAAATACAAATTT TAATTCGATTTAAATACATTTTTTTGTGGTTCCGCTCAAGATGTAATTGTTATGGAAA CATTGTTTGTCTGAGCTTGTTCCAAAATTTAATTTAATCGCTCTTTTGTGTAGCCTTA ATCTAAAGACATCGAAATATTTAAACATATTAATAATACTCACATCTCTATGAAGGG GATATGCGACCATATCGTTTTGGAAGTGTGATCCTTTGACTAATGTTTTGAGTCTCTT GCACAGTTACGGAACCACCTTGCGGTCTACGCGATATTCTGGTACGCCAGGGTCCCG AGGCCTTTGCCAAGGAGGTTCGCTCCCGCAAGAACCTGATGCTGATGGACACCACCT TCCGCGATGCCCACCAGTCGCTGCTGGCCACACGTGTACGCTCGCACGATCTCCTGA AGATCTCGCCCTATGTGGCCCATAAGTTCAACAATCTATATGCGCTGGAGAACTGGG GTGGCGCCACCTTCGATGTGGCCCTGCGCTTCCTGCACGAGTGCCCCTGGGAGCGTT TGGAGGAGATGCGCAAGCGCATCCCGAACATTCCATTCCAGATGCTGTTGCGCGGA GCCAATGCCGTTGGCTACACCAGCTATCCGGACAACGTGGTTTACAAGTTCTGTGAA CTGGCGGTAAGGATCTGAATCCTCCCTCATTCATATGTGCCTAATTCCTCTGTGTCTG TCGGGCAGGTTCAAACCGGCATGGACATCTTCAGAGTGTTCGACTCCCTCAACTATC TGCCCAACTTGATCCTAGGCATGGAGGCGGCCGGCAAGGCTGGCGGCGTGGTCGAG GCAGCCATCTCCTACACCGGCGATGTGAGCGATCCCAAGCGCACCAAGTACGATCT CAAGTACTACACCAATCTGGCCGACGAGCTCGTCAAGGCTGGCACCCATGTGCTCTG CATCAAGGATATGGCGGGTCTTCTGAAGCCCGAGGCGGCTACGTAAGTCGAGGAAA 
CTGCCCGAGACTTGTGTGAACCCACAGCACTTCTGACGTATTTCCATTGCAGACTGC 58 TCATCACCGCCATTCGTGACAAGCATCCCGATATTCCAATCCACATCCACACCCACG ACACCTCCGGAGCTGGAGTGGCGTCTATGCTGGCCTGTGCGCAGGCTGGCGCTGATG TGGTTGATGTGGCCGTTGACTCGATGAGCGGCATGACCTCTCAGCCGAGCATGGGAG CAGTGGTTGCCTCGCTGCAGGGCACGCCTCTGGACACAGGGGTGGATCTGCGCGTTG TGTCGGAGTACTCGGCATACTGGGAGCAGACCCGCACCCTGTACGCTCCCTTTGAGT GCACCACGACCATGCGGTCGGGCAATGCCGATGTGTACCTGAACGAGATCCCCGGC GGTCAGTACACTAACCTGCAATTCCAGGCCTTCTCTCTGGGTCTGGGTGACTTCTTCG AGGATGTGAAGAAGGCGTACCGCGAGGCGAATTTGCTATTGGGTGACATTATCAAG GTCACTCCCTCGTCCAAGGTGGTGGGTGACTTGGCCCAGTTTATGGTTCAGAACAAC CTCACTGCCGATCAGGTGCTGGAGAAGGCTGAGGAGCTTTCGTTCCCCAAGTCCGTG GTGGAGTTCTTGCAGGGACACATTGGCATCCCCCATGGTGGATTCCCCGAACCGCTG CGCTCCCGCGTTCTCAAGGATATGCCACGCATCGAGGGACGTCCGGGCGCCGCGCTT GAACCTCTGGACTTTGACAAGCTGAAGCAGGATTTGAAGGAATCTCATCCCAATATC ACCGACCGCGACGTCATGTCGTCGGCCCTCTATCCCCAGGTGACCAACGAGTACCTG TTCTTCCGCGAGAAGTTCGGACCCGTCGACAAACTGGACACGCGCATCTTCCTCACC GGACCCAAGGTGGGCGAGGAGTTCGAGGTGACGCTGGAACGCGGCAAGACGTTGA GTCTGAAGGCCATGGCCATGGCCGCCGATCTCAAGCCCAATGGAGATCGCGAGGTC TTCTTTGAGATGAACGGCCAACTGCGCACAGTTCACATTCTCGACAAGGAGGCTGTC AAGGTGAGTGAAGACTCCAATTTACGATGGAAAATTACTAATACTCCCTCTAATTTC TGTGAATGCTTCCTTCAGGAAATCCACGTTCATCCCAAGGCCAACAAGGCGGTCAAG AGCGAGGTGGGAGCCCCCATGCCCGGCACTGTCATTGACATTCGCGTCGCTGTCGGC GATAAGGTGGAGAAGGGTCAACCCCTGGTTGTGCTCTCGGCCATGAAGATGGAGAT GGTAGTCCAGGCACCACAGGCGGGCGTTGTCAAGAAACTGGAGATCGCAAACGGCA TGAAACTCGAGGGCGATGATCTGCTCATGATCATCGAGTAAACACCTCATCCCGCCC AAGCCTTAGGGCTTACAATGTCCATTGTCCATGCTGCCAGCGCACACCTATTGTAAC ATTAAGTCGAACAGTGATAATCTTAGAGATTCTCGTCAGTTGCCACCAAACCACCTA TAGCCCTGACTGTGCATCTACTTTTTATATATGTATCTGTATCTGTAGCTGAGACTAG TTAGGGAGCACAGGCGCTCGCTGTGTGTGTTCTATGTATATTTATGATAGTTCTTGGA ACCGTCACGACCGTCATGTTTTGATACCCCGCCACACACACTTTTAAGTACCGTTAA AATTAAACTTGTACGATTTTTGAATTTAAGATTTGCGAAAACTGTCAAAATTCACCT CCAATACATACGTAAGTGGTTATAAATGTCAATATCTATATTGAAATAAATGTTAAA CAAATATGATTTGTTTGGGTCTCGTTCGTGTAAATATTGCTACTGCAACTGGTTGCTT ATAATTAATTAAACAGAACTGATTACCAACTAACTTATGGCTACGGTGTTT 59

BIBLIOGRAPHY

1. Aquadro CF, Weaver AL, Schaeffer SW, Anderson WW. Molecular evolution of

inversions in Drosophila pseudoobscura: the amylase gene region. Proc. Natl. Acad. Sci.

88, 305-309 (1991).

2. Ashburner M, Lemeunier F. Relationships within the melanogaster species subgroup of

the genus Drosophila. Proc. R. Soc. Lond B Biol. Sci. 193, 137-157 (1976).

3. Barcia JJ. The Giemsa stain: its history and applications. Int. J. Surg. Pathol. 15, 292-296

(2007).

4. Bashir A, Volik S, Collins C, Bafna V, Raphael BJ. Evaluation of paired-end sequencing

strategies for detection of genome rearrangements in cancer. PLoS Comput. Biol. 4, 1-14

(2008).

5. Bhutkar A, Schaeffer SW, Russo S, Xu M, Smith TF, et al. Chromosomal rearrangement

inferred from comparisons of twelve Drosophila genomes. Genetics. 179, 1657-1680

(2008).

6. Brown GM, Leversha M, Hulten M, Ferguson-Smith MA, et al. Genetic analysis of

meiotic recombination in humans by use of sperm typing: reduced recombination within a

heterozygous paracentric inversion of chromosome 9q32-q34.3. Am. J. Hum. Genet. 62,

1484-1492 (1998).

7. Caceres M, Ranz JM, Barbadilla A, Long M, Ruiz A. Generation of a widespread

Drosophila inversion by a transposable element. Science. 285, 415-418 (1999).

8. Casals F, Caceres M, Ruiz A. The foldback-like transposon Galileo is involved in the

generation of two different natural chromosomal inversions of Drosophila buzzatii. Mol.

Biol. Evol. 20, 674-685 (2003).

9. Crumpacker DW, Kastritsis CD. A new gene arrangement in the third chromosome of

Drosophila pseudoobscura. J. Hered. 58, 2-6 (1967).

10. Dobzhansky TH. Chromosomal races in Drosophila pseudoobscura and Drosophila

persimilis. Carnegie Inst. Wash. Pub. 534, 47-144 (1944).

11. Dobzhansky TH. Genetics of natural populations. XVI. Altitudinal and seasonal changes

in certain populations of Drosophila pseudoobscura and Drosophila persimilis. Genetics.

33, 158-176 (1948).

12. Dobzhansky TH, Epling C. The suppression of crossing over in inversion heterozygotes

of Drosophila pseudoobscura. Proc. Natl. Acad. Sci. 34, 137-141 (1948).

13. Dobzhansky TH, Sturtevant AH. Inversions in the chromosomes of Drosophila

pseudoobscura. Genetics. 23, 28-64 (1938).

14. Drosophila 12 Genomes Consortium. Evolution of genes and genomes on the Drosophila

phylogeny. Nature. 450, 203-218 (2007).

15. Feuk L. Inversion variants in the human genome: role in disease and genome

architecture. Genome Med. 2, 1-8 (2010).

16. Griffiths AJF, Miller JH, Suzuki DT, Lewontin RC, et al., 2000. An Introduction to

Genetic Analysis. W. H. Freeman and Company, New York.

17. Judd BH, Shen MW, Kaufman TC. The anatomy and function of a segment of the X

chromosome of Drosophila melanogaster. Genetics. 71, 139-156 (1972).

18. Kaiser P. Pericentric inversions: problems and significance for clinical genetics. Hum.

Genet. 68, 1-47 (1984).

19. Korbel JO, Urban AE, Affourtit JP, Godwin B, et al. Paired-end mapping reveals

extensive structural variation in the human genome. Science. 318, 420-426 (2007).

20. Krimbas CB, 1992. The inversion polymorphism of Drosophila subobscura. CRC Press,

Boca Raton.

21. Lemeunier F, Ashburner M. Relationships within the melanogaster species subgroup of

the genus Drosophila (Sophophora). II. Phylogenetic relationships between six species

based upon polytene chromosome banding sequences. Proc. R. Soc. Lond. B Biol. Sci.

193, 275-294 (1976).

22. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform.

Bioinformatics. 25, 1754-1760 (2009).

23. Madan K. Paracentric inversions: a review. Hum. Genet. 96, 503-515 (1995).

24. Marzo M, Puig M, Ruiz A. The Foldback-like element Galileo belongs to the P

superfamily of DNA transposons and is widespread within the Drosophila genus. Proc.

Natl. Acad. Sci. 105, 2957-2962 (2008).

25. Mathiopoulos KD, della Torre A, Santolamazza F, Predazzi V, et al. Are chromosomal

inversions induced by transposable elements? A paradigm from the malaria mosquito

Anopheles gambiae. Parassitologia. 41, 119-123 (1999).

26. Nadeau JH, Taylor BA. Lengths of chromosomal segments conserved since divergence

of man and mouse. Proc. Natl. Acad. Sci. 81, 814-818 (1984).

27. Navarro A, Barton NH. Chromosomal speciation and molecular divergence – accelerated

evolution in rearranged chromosomes. Science. 300, 321-324 (2003).

28. Noor MA, Grams KL, Bertucci LA, Reiland J, et al. Chromosomal inversions and the

reproductive isolation of species. Proc. Natl. Acad. Sci. 98, 12084-12088 (2001).

29. Novitski E. Chromosomal variation in Drosophila athabasca. Genetics. 31, 508-524

(1946).

30. Ohno S. Ancient linkage groups and frozen accidents. Nature. 244, 259-262 (1973).

31. Orengo DJ, Puerma E, Papaceit M, Segarra C, Aguadé M. A molecular perspective on a

complex polymorphic inversion system with cytological evidence of multiply reused

breakpoints. Heredity. 114, 610-618 (2015).

32. Painter TS. A new method for the study of chromosomal aberrations and the plotting of

chromosomal maps in Drosophila melanogaster. Genetics. 19, 175-188 (1934).

33. Pettenati MJ, Rao PN, Phelan MC, Grass F, et al. Paracentric inversions in humans: a

review of 446 paracentric inversions with presentation of 120 new cases. Am. J. Med.

Genet. 55, 171-187 (1995).

34. Pevzner P, Tesler G. Human and mouse genomic sequences reveal extensive breakpoint

reuse in mammalian evolution. Proc. Natl. Acad. Sci. 100, 7672-7677 (2003).

35. Pray L. Transposons: the jumping genes. Nature Education. 1, 204 (2008).

36. Ranz JM, Maurin D, Chan YS, Grotthuss MV, Hillier LW, et al. Principles of genome

evolution in the Drosophila melanogaster species group. PLoS Biology. 5, 1366-1381

(2007).

37. Richards S, Liu Y, Bettencourt BR, Hradecky P, et al. Comparative genome sequencing

of Drosophila pseudoobscura: chromosomal, gene, and cis-element evolution. Genome

Res. 15, 1-18 (2005).

38. Schaeffer SW, Goetting-Minesky MP, Kovacevic M, Peoples JR, et al. Evolutionary

genomics of inversions in Drosophila pseudoobscura: evidence for epistasis. Proc. Natl.

Acad. Sci. 100, 8319-8324 (2003).

39. Schaeffer SW, Miller EL. Estimates of linkage disequilibrium and the recombination

parameter determined from segregating nucleotide sites in the alcohol dehydrogenase

region of Drosophila pseudoobscura. Genetics. 135, 541-552 (1993).

40. Sperlich D, Pfriem P, 1986. Chromosomal polymorphism in natural and experimental

populations. Academic Press Incorporated, London.

41. Stefansson HA, Helgason A, Thorleifsson G, Steinthorsdottir V, Masson G, et al. A

common inversion under selection in Europeans. Nat. Genet. 37, 129-137 (2005).

42. Sturtevant AH, Beadle GW. The relations of inversions in the X chromosome of

Drosophila melanogaster to crossing over and disjunction. Genetics. 21, 554-604 (1936).

43. Wallace AG, Detweiler D, Schaeffer SW. Evolutionary history of the third chromosome

gene arrangements of Drosophila pseudoobscura inferred from inversion breakpoints.

Mol. Biol. Evol. 28, 2219-2229 (2011).

ACADEMIC VITA

Haley Randolph
[email protected]
507 Mueller Laboratory, University Park, PA 16802

EDUCATION

Bachelor of Science in Biology (Genetics and Developmental Biology option), May 2016
Minor in Biochemistry and Molecular Biology
Honors in Biology
Schreyer Honors College, Eberly College of Science
Pennsylvania State University, University Park, PA

Honors Thesis: The origin and evolution of inversion breakpoint sequences on the third chromosome of Drosophila pseudoobscura
Thesis Supervisor: Stephen W. Schaeffer

Relevant upper-division coursework: Honors Human Genetics, Population Genetics, Genetic Analysis, Molecular Biology of the Gene, Principles of Immunology, Medical Microbiology, General Virology, Honors General Biochemistry I & II, Laboratory in Proteins, Nucleic Acids, and Molecular Cloning

RESEARCH EXPERIENCE

Undergraduate Researcher, Jan 2013 – Present
Schaeffer Laboratory, Biology Department
Pennsylvania State University, University Park, PA
• Utilize Illumina HiSeq sequencing data and bioinformatics techniques to study the evolution of chromosomal arrangements and mechanisms that generate paracentric inversions in D. pseudoobscura
• Prepared polytene chromosome squashes to determine the of various D. pseudoobscura strains for the Drosophila Species Stock Center at the University of California, San Diego
• Received an Undergraduate Summer Discovery Grant (funded through the Office of Undergraduate Education and Eberly College of Science) for the research proposal “Molecular evolutionary genetics of chromosomal breakpoints”

Amgen Scholar, May 2015 – Aug 2015
Raulet Laboratory, Department of Molecular and Cell Biology
University of California, Berkeley, Berkeley, CA
• Employed mouse models to investigate the induction of the inhibitory receptor PD-1 on natural killer (NK) cells to provide insight into the NK cell-mediated killing of tumor cells
• Performed various functional assays on PD-1+ NK cells to determine the effect of PD-1 signaling on NK cell response

PUBLICATIONS AND PRESENTATIONS

The checkpoint regulator PD-1 inhibits NK cell responses to tumors. Ardolino M, Hsu J, Marathe M, Azimi CS, Trevino TN, Randolph HE, Thompson TW, Zhang L, Iannello A, Niccolai C, Raulet DH. Under review, 2016.

Induction and function of the inhibitory receptor PD-1 on natural killer cells. Randolph HE, Ardolino M, Raulet DH. August 2015. Oral presentation given at the UCB Amgen Scholars final presentation session, Berkeley, CA.

D. pseudoobscura inversion breakpoint sequence evolution. Randolph HE, Lee ME, Montaya GC, Jahanshahi A, Duggan S, Zhu D, Richards S, Schaeffer SW. March 2015. Poster presented at the 56th Annual Drosophila Research Conference, Chicago, IL.

The evolution of breakpoint sequences in Drosophila pseudoobscura. *Randolph HE, *Lee ME, Montaya GC, Jahanshahi A, Duggan S, Zhu D, Richards S, Schaeffer SW. March 2014. Poster presented at the 55th Annual Drosophila Research Conference, San Diego, CA. *contributed equally

TEACHING EXPERIENCE

Teaching Assistant, Sept 2015 – Present
Biology Department
Pennsylvania State University, University Park, PA
• Conduct problem solving sessions and give lectures concerning current genomics and bioinformatics research for Genetic Analysis (BIOL 322)
• Instructed the laboratory for Honors Biology: Basic Concepts and Biodiversity (BIOL 110H) in Fall 2015

Private Instructor, Oct 2013 – May 2015
PSUKnowHow
State College, PA
• Tutored numerous introductory and upper-division college science courses including organic chemistry, biochemistry, genetics, and statistics
• Experienced with both individual and group teaching settings

SCHOLARSHIPS AND ACADEMIC AWARDS

NSF GRFP Honorable Mention, 2016
Biology Department Student Marshal for Spring Commencement, 2016
The Evan Pugh Scholar Award – Senior, 2016
Doris N. McKinstry Scholarship, 2015 – 2016
Barrett E. and Michele A. Giffel Scholarship in the Eberly College of Science, 2015
The Evan Pugh Scholar Award – Junior, 2015
Undergraduate Summer Discovery Grant, 2014
Jane and Don Strickler Honors Scholarship, 2014
The President Sparks Award, 2014
Wheeler P. Davey Memorial Scholarship Fund, 2013 – 2016
The President’s Freshman Award, 2013
Society of Distinguished Alumni Trustee Scholarship, 2012 – 2016
Class of 1941 Memorial Scholarship, 2012 – 2016
Schreyer Academic Excellence Scholarship, 2012 – 2016

PROFESSIONAL SOCIETIES

The Phi Beta Kappa Society, Mar 2015 – Present
Genetics Society of America, Jan 2014 – Present