California State University, San Bernardino CSUSB ScholarWorks

Electronic Theses, Projects, and Dissertations Office of Graduate Studies

9-2019

GEOGRAPHIC POPULATION STRUCTURE AND TAXONOMIC IDENTITY OF RHINICHTHYS OSCULUS, THE SANTA ANA SPECKLED DACE, AS ELUCIDATED BY NUCLEAR DNA INTRON SEQUENCING

Liane Raynette Greaver California State University - San Bernardino

Follow this and additional works at: https://scholarworks.lib.csusb.edu/etd

Part of the Evolution Commons, Molecular Genetics Commons, and the Population Biology Commons

Recommended Citation Greaver, Liane Raynette, "GEOGRAPHIC POPULATION STRUCTURE AND TAXONOMIC IDENTITY OF RHINICHTHYS OSCULUS, THE SANTA ANA SPECKLED DACE, AS ELUCIDATED BY NUCLEAR DNA INTRON SEQUENCING" (2019). Electronic Theses, Projects, and Dissertations. 931. https://scholarworks.lib.csusb.edu/etd/931

This Thesis is brought to you for free and open access by the Office of Graduate Studies at CSUSB ScholarWorks. It has been accepted for inclusion in Electronic Theses, Projects, and Dissertations by an authorized administrator of CSUSB ScholarWorks.

GEOGRAPHIC POPULATION STRUCTURE AND TAXONOMIC IDENTITY OF

RHINICHTHYS OSCULUS, THE SANTA ANA SPECKLED DACE, AS

ELUCIDATED BY NUCLEAR DNA INTRON SEQUENCING

A Thesis

Presented to the

Faculty of

California State University,

San Bernardino

In Partial Fulfillment

of the Requirements for the Degree

Master of Science

in

Biology

by

Liane Raynette Greaver

September 2019

Approved by:

Dr. Anthony Metcalf, Committee Chair, Biology

Dr. James Ferrari, Committee Member

Dr. David Polcyn, Committee Member

© 2019 Liane Raynette Greaver

ABSTRACT

Rhinichthys osculus, the speckled dace, is the most widely distributed freshwater fish in the western United States. The southern California populations of R. osculus are identified as the Santa Ana speckled dace (SASD), though the SASD has not yet been formally recognized as a distinct taxon.

Current mtDNA analysis performed in the Metcalf Lab has shown a reciprocally monophyletic relationship among three California regions: southern, central coast, and Owens Valley. Similarly, microsatellite genotyping has shown significant levels of geographic population structure. The purpose of this study was to provide nuclear DNA sequence data to determine the taxonomic status of the SASD, to elucidate its evolutionary history and the relationships among the three regions, and to further define its evolutionary trajectory by comparing SASD sequence data to that of speckled dace from the Colorado River of Arizona. To examine this, three EPIC intron markers were sequenced in 54 samples representing all four regions. Based on the mtDNA and microsatellite data alone, there is strong support that the southern California populations of R. osculus are a reproductively isolated taxon at the species level. My study confirms this by showing the SASD to be reciprocally monophyletic for nuclear DNA markers, in conjunction with the mitochondrial DNA marker analyses.

Because they are evolutionarily independent and face increased incidence of drought, fire, and flood, status should be considered.

ACKNOWLEDGEMENTS

I would like to acknowledge and thank my thesis advisor, Dr. Tony Metcalf, who gave me the opportunity of an internship with the WRI as an undergrad, which led to this master’s thesis project. Most importantly, he allowed me to wander my way through at my pace, sometimes nudging me to finish sooner but hopefully realizing that my interest has always been in the exploration of all things as I stroll through. He gave me the inspiration and much of the funding for the project, plus the stories to entertain me when I was losing track of my path or the path became bumpy. Niiiice truuuuck…

I also thank Dr. Dave Polcyn and Dr. Jim Ferrari for giving their time to advise me all through my education at CSUSB. Each of you has influenced my path in various ways and I cannot thank you enough for your encouragement.

Every student that has had the benefit of learning from you has been inspired by you. I consider it an honor to have worked with all three of you over the years.

To my mentor, Pam MacKay, who is the reason I ended up in the field, literally, of ecology. At VVC, Pam didn’t just tell us about the field, she took us out into it every week, and showed us how to look at the world from a different perspective, one of not just curiosity but, now having the tools to find out the answers to those curiosities, one of understanding. She introduced us all to opportunities and people, some of whom have become those I call “my people.”

To my lab compadres over the years who have all assisted me in so many ways, Jay and Pia VanMeter for welcoming me into the lab and teaching me all the things I needed to know to survive the molecular pathway. To Joe Riley for suffering my first year as a grad student and teaching me all about British culture. To all my lab assistants without whom, I’d never have finished: Nguyen Tran, Diane Villalvazo, Caitlin Hazelquist (Lab Lackey), and Lauren Morrison (Lab Elf), you have earned your sock.

To Stacey Nerkowski, the Brain to my Pinky (Mills, 1993), with whom I began the quest to take over the world at Victor Valley College in 2007, starting with Biology Club, then ASB, Phi Theta Kappa, and Biology Club at CSUSB.

Stacey is also the person who got me addicted to Starbucks in physics. I became part of another family over the years as Stacey, and her parents, Kim and Jerry Nerkowski, “adopted” me and introduced me to many things I never would have experienced otherwise, including nearly being impaled by trees while whitewater rafting. I am thankful for all the additional advice and support I’ve received from them. Stacey and I have taken the divide and conquer path over the last couple of years, but the quest is not over.

To Suzy Neal, my sister Pinky, together we couldn’t take over an airplane restroom much less the world, “Ooh, what’s this button do?” Suzy is that person who gets my need to push the button. Often, she’s the one who shows me the button, especially if it’s shiny, but she also consoles me if something explodes when I push the button.

To Tricia Turturro Fredendall, the one who I believe has the other half of my brain. We are too much alike in so many ways. Thank you for being a friend (Gold, 1978). It has been a blessing to have you as a friend through school, “Student?” (Brooks, 1974) and through life. “Because I knew you…” (Chenowith & Menzel, 2003).

I could never have completed this without the ever-supportive assistance of Debbie Reynolds. Thank you so much for being my cheerleader, my therapist, my enabler, my cohort, my accomplice, my social planner, my hostess, and my competitor. I can’t remember life before the Snoopy incident.

To my brothers, James and Matthew Greaver, for always understanding when I couldn’t be there for family events, for picking up my slack helping Mom, and every other bit of encouragement and humor you threw my way.

This work also would not have been possible without funding from the following sources: CSUSB Associated Students Incorporated, CSUSB Office of Student Research, U.S. Forest Service, and California Department of Fish and Wildlife.

DEDICATION

I dedicate this work to my mom, Raynette Greaver, who has supported me in EVERY way: emotionally, mentally, and financially. She may not understand why I love what I do, but she has suffered my journey. She is the reason I love all life and its place in the world. She taught me and my younger brothers to stop during a hike and just listen to the wind in the trees, the calls of the birds, the sounds of the forest critters under the brush. She allowed me to run somewhat free, as a child, through the desert where I caught lizards and insects, and learned my place on the planet.

To my dad, Earl Greaver, who taught me by example, to help people whenever possible, and that we make a greater impact on the world by being that person who always tries to be there for others in whatever way we can. We are not always rewarded with monetary riches, but we gain much respect and trust from those we help and those are priceless. I miss you, Dad.

To my grandparents, Paul and Frances Freiling, there aren’t enough words to describe everything they gave to me. My grandfather inspired my curiosity and humor. He was an entertainer and inventor at heart. My grandmother inspired my drive to be helpful, to be accepting, and to be strong.

She battled so many things in life, all without loss of faith, humor, and love. She is and always will be the light that guides me when I’m lost in the dark.

TABLE OF CONTENTS

ABSTRACT

ACKNOWLEDGEMENTS

CHAPTER ONE: INTRODUCTION

Overview

Phylogeography

Model Habitat

Model Organism

Conservation Policies

Molecular Markers

Literature Reviews

Phylogenetics Studies Enhanced by the Addition of Nuclear DNA Data

Phylogeography as Influenced by California Floristic Province Topography

Phylogenetics and Population Structure of the Santa Ana Speckled Dace

CHAPTER TWO: MATERIALS AND METHODS

Research Objective

Sample Collection

Molecular Methods

Sequence Analysis

Population Genetics

Phylogeography

CHAPTER THREE: RESULTS

Molecular Methods

Preliminary Testing

Sequence Analysis

Population Genetics

Phylogeography

CHAPTER FOUR: DISCUSSION

Molecular Markers

Population Genetics

Phylogeography

Hydrographic History of Connectivity

Conservation Implications

APPENDIX A: FIGURES

APPENDIX B: TABLES

APPENDIX C: SEQUENCE DATA

APPENDIX D: INPUT FILES

REFERENCES

CHAPTER ONE

INTRODUCTION

Overview

The evolution of a species and its surrounding environment are inextricably bound. From Darwin (1859) to the present (e.g., Coyne & Orr, 2004), the history of environmental change has been used to support hypothesized evolutionary changes in the species inhabiting those environments. This area of study, linking biology with geography, is referred to as phylogeography. Studying those evolutionary changes in populations of organisms using molecular information is known as phylogenetics.

The study of molecular phylogenetics requires a study organism for which its current population structure and geographic range have been molded by the forces of evolution in conjunction with geologic forces. This organism must be one among many populations of a species that have evolved over time allowing for the accumulation of genetic differences among and within the populations.

This provides a tool to analyze the variability that exists as a result of the interrelationships between biological and geological forces. They must exist in natural populations in natural habitats in order to see the true structure resulting from years of modification in response to ever-changing environments.

Stream-dwelling vertebrates are an excellent resource to study phylogeography and molecular evolution. They exist in discrete populations of varying population density and are intimately linked to the streams embedded within watersheds that are shaped by both geographic and biogeographic factors. Thus, many different species of stream-dwelling vertebrates have served as model organisms for the study of phylogeography (Bruno, Casciotta, Almirón, Riccillo, & Lizarralde, 2015; Mayden & Allen, 2015; Phillipsen & Metcalf, 2009).

Phylogeography

Phylogeography involves the study of the distribution and extent of genetic variation in geographically distributed populations. This may be either intraspecific populations or populations of closely related species. This area of study elucidates the evolutionary processes that have provided us with the level of biodiversity that exists currently. It also affords a way to reveal links between extant species and those for which science has concrete evidence of past existence. The forces responsible for the genealogical path of a population are most often climatic and topographic in nature (Cody, 1986). Historically documented events can be correlated to the geographic ranges of many species.

To apply phylogeography to the conservation of biodiversity, we must be able to demonstrate the correlation between species lineages and their present distribution (Moyer, Remington, & Turner, 2009).

While research has been done in this field of study for many years, it initially had no formal name. The term “phylogeography” was not coined until 1987, by John C. Avise. While it is possible to draw inferences regarding the lineages of species based on differences in morphological or behavioral characteristics, which have a basis in genetics, such inferences are far less concrete than molecular evidence of genetic differentiation. By Avise’s definition, phylogeography in its purest form deals only with allelic distribution (Avise, 2000). With the advent of methods to visualize molecular markers, science now has a way to document, in fine-scale detail, the differences and similarities in populations of species. These methods allow individuals to be categorized into haplotypes, which often mirror the geographic distribution of those same individuals within their respective populations.

The analysis of evolutionary lineages can best be represented by phylogenetic trees. These trees are formed by grouping individuals based on comparisons of the molecular sequences of interest. Distinct variants of these sequences are referred to as haplotypes. A haplotype can be designated based on a single nucleotide polymorphism (SNP) or a series of differences that distinguish it from every other haplotype. A haplotype may be represented by a single individual or by many individuals. The relationships among haplotypes are estimated by the application of models of molecular evolution.
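The idea of assigning individuals to haplotypes can be illustrated with a minimal Python sketch. It is not the procedure used in this study; the sample names, the toy sequences, and the function names `collapse_haplotypes` and `variable_sites` are all hypothetical, and the sequences are assumed to be already aligned and of equal length.

```python
from collections import defaultdict


def collapse_haplotypes(aligned):
    """Group sample names that share an identical aligned sequence."""
    groups = defaultdict(list)
    for name, seq in aligned.items():
        groups[seq.upper()].append(name)
    # Label haplotypes H1, H2, ... in order of first appearance.
    return {f"H{i}": names for i, names in enumerate(groups.values(), start=1)}


def variable_sites(aligned):
    """Return 0-based alignment columns at which the samples differ (SNPs)."""
    seqs = [s.upper() for s in aligned.values()]
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]


# Hypothetical toy alignment; names and bases are invented for illustration.
samples = {
    "south_1": "ACGTACGT",
    "south_2": "ACGTACGT",
    "coast_1": "ACGTACCT",
    "owens_1": "ATGTACCT",
}

haplotypes = collapse_haplotypes(samples)
sites = variable_sites(samples)
print(haplotypes)
print(sites)
```

In this toy data set, the two southern samples collapse into a single haplotype, while the other two samples each define their own, which is the kind of pattern that can then be compared against geography.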

A correlation between the haplotype groups and the geographical groups is often seen when geography has played a role in the evolution of the species’ populations within the regions. For this reason, phylogenetic analyses of populations of a single species that inhabit areas isolated from other populations of the same species are highly useful. Watersheds and their tributaries, separated by geologic and topographic features, are prime examples of how landscape can influence the population structure of stream-dwelling vertebrates.

Model Habitat

California is rich in biodiversity, particularly the cismontane southern California region, which is known to have some of the highest levels of biodiversity in the world. This region is included in what is known as the California Floristic Province (Appendix A, Figure 1), one of the world’s 25 published biodiversity “hotspots” (Calsbeek, Thompson, & Richardson, 2003; N. Myers, Mittermeier, Mittermeier, Da Fonseca, & Kent, 2000).

Biodiversity can be measured by many variables, one of which is species richness, or the actual number of different species inhabiting an area. This can be increased in a variety of ways but ultimately results from the divergence of biological populations into distinct species. The pattern of dispersal of subpopulations and their subsequent speciation are the focal point of this thesis project.

Geographically distinct populations arise via two possible distribution patterns, vicariance or dispersal. Dispersal is defined by the distribution of a population across an existing geographical barrier such as over and across a mountain range via migration. Vicariance denotes the distribution of populations whereby a new geographical barrier arises thus fragmenting the ancestral population into smaller groups (Avise & Ferguson, 1995; Coyne & Orr, 2004).

There are four major proposed processes of speciation. Sympatric speciation allows populations to diversify while still within an overlapping range. Parapatric speciation is the differentiation of a population inhabiting an adjacent region to the parent population. Peripatric speciation is the branching off of a small portion of the original population that is then isolated nearby, though not directly adjacent to, the parent population. Allopatric speciation results from the partitioning of a population, by the development of a physical barrier, completely isolating the fragmented groups from one another. These are not mutually exclusive and extant populations of a species may have gone through multiples of these processes over time distinguishing them further from their ancestors (Coyne & Orr, 2004; Mooi, 2009).

Changes in topographic landscape that lead to isolation of vertebrate populations occur intermittently or extremely slowly. Conversely, the frequency of changes that occur in freshwater environments of rivers, lakes, and streams can have a much greater impact on the diversification of freshwater fish species in a shorter expanse of time as streams and tributaries are in constant flux within and across watersheds (G. D. Grossman, Hill, & Petty, 1995). Therefore, watershed environments play an important role in the creation of ichthyologic diversity and population structure.

John Wesley Powell, leader of the 1869 Powell Geographic Expedition, stated a watershed is "that area of land, a bounded hydrologic system, within which all living things are inextricably linked by their common water course and where, as humans settled, simple logic demanded that they become part of a community." The U.S. Environmental Protection Agency lists 153 watershed systems within California (US EPA, 2015). These meandering and typically isolated, though intermittently overlapping, stream systems provide a unique opportunity to study the effects of seasonal climate changes as well as large climatic events on the genetic structure of vertebrate populations. How are population lineages affected by natural seasonal fires and flooding, as well as by the human impact? In southern California, the major river systems may interweave in many areas while in others the streams are too distant for the stream-dwelling organisms to migrate naturally. Such is the case among the southern California river systems of interest in this study.

Several studies have presented evidence suggesting the influence of topography in southern California on the development and evolution of population structure (Benson, 2006; Calsbeek et al., 2003; Phillipsen & Metcalf, 2009; Vandergast, Bohonak, Weissman, & Fisher, 2006). Specifically, the prominent tectonic breaks within the region have created distinct mountain ranges and river systems. The San Andreas Fault, in existence for approximately 15-20 million years, is the division between the Pacific and the North American plates and is the major fault in a large complex of faults throughout southern California, including the San Jacinto, Banning, and Elsinore faults (Schultz & Wallace, 2013). The movement of the earth’s plates over vast periods of time eventually led to the development and translocation of what we currently deem the Peninsular and Transverse mountain ranges. The north-south oriented Peninsular range and, particularly, the uniquely east-west oriented Transverse range (Appendix A, Figure 2) owe their present orientation to the movements of several plates that alternately moved them north and broke up the Transverse range, rotating sections of it to give us the modern landscape arrangement (Schaffer, 1993). These mountain ranges have been shown to be an influential factor in the shaping of population structure, as well as the increase of species richness, in the region (Chatzimanolis & Caterino, 2007; Phillipsen & Metcalf, 2009; Spellman, Riddle, & Klicka, 2007).

The unique climate in this area, referred to as a Mediterranean climate, developed some 8 million years ago. Inland Southern California shares this designation with four other regions in the world: Central Chile, the Mediterranean Basin, the Cape of Africa, and Southwest and Southern Australia. This climate type is characterized by rainy winters and long, hot, dry summers. These regions can claim some of the highest levels of biodiversity in the world (Cody, 1986; Schoenherr, 2017).

The climatic characteristics of the Mediterranean ecosystem, in combination with the tectonic construction of the terrain, bred many unique features that led to the evolution of many endemic species. The subduction of the faults that caused the uprising of the mountain ranges also resulted in the formation of more varying soil types, often situated in pockets across the region. For example, many of the plants that currently grow in these areas have evolved to endure the low nutrient and often heavy-metal rich content of the serpentine soils in this region (Anderson, Fralish, & Baskin, 2007). These same uprisings created the valleys that direct the flow of the many waterways in the area. These valleys intertwine through the multi-directional mountain ranges. During the hot dry summers, the flow of water running through these valleys decreases, creating similar pockets of environment for the stream-dwelling organisms. In addition, the flora becomes dry brush, fueling late summer wildfires. Conversely, during the rainy winters not only do the waters increase but they do so in flash flood form. The soils become so dry during the summer that the sudden surge of rain does not soak in but rather floods along the surface of the ground. These flood waters from the various streams can collide at junctions and/or flood over areas that usually separate the streams, forcing the displacement of individuals from populations of one area into a new area. Once the floods subside, these individuals must adapt to the new environment or perish. The flooding also destroys the terrain, leading to landslides. After fires, much of the ground that is pushed down the valleys is ash and dead timber. This debris clogs the waterways, creating further isolation following the translocation of organisms (Moyle, Williams, & Wikramanayake, 1989). Ancestral relationships between the individual populations that range throughout the watershed environments can become convoluted due to these tempestuous circumstances (Swift, Haglund, Ruiz, & Fisher, 1993).


Model Organism

The utilization of model organisms allows us to broadly study various aspects of the biological world on a scale that is manageable in the laboratory (Campbell & Reece, 2005). Model organisms are non-human organisms that represent a larger group and are chosen specifically to assist in answering the specific biological questions of a researcher.

In the interest of studying phylogeography, as stated previously, freshwater streams and watersheds in southern California provide the optimal habitat conditions to study the effects of environmental change on biological organisms. Decidedly, stream-dwelling vertebrate species are perfect models for the documentation of the biological and environmental changes that are occurring within these habitats.

Over 30 years ago, John Avise noted the staggering lack of interest in studying freshwater organisms despite their usefulness (Avise, Giblin-Davidson, Laerm, Patton, & Lansman, 1979); since that time, a resurgence of interest has occurred. Many phylogeographic studies have been done utilizing freshwater vertebrate species (Bernatchez & Wilson, 1998; Bufalino & Mayden, 2010a; Chen, Miya, Saitoh, & Mayden, 2008; Hollingsworth & Hulsey, 2011; Houston, Shiozawa, & Riddle, 2010; Moyer et al., 2009; Phillipsen & Metcalf, 2009). Specifically, freshwater fish make an excellent study organism for this topic. Stream-dwelling fish typically have little ability to migrate, which means they are restricted to small areas of habitat. In addition, these habitats are extremely isolated (Avise, 2000). Because their habitats may be highly unstable, easily altered by the slightest environmental change, the evolution of the stream-dwelling freshwater fish populations can parallel the historical geologic transitions of the environment (Bernatchez & Wilson, 1998).

One group of fish that make excellent study populations are those of the Cyprinidae family, also known as the minnow family. This family of fish is one of the most successful in the world, with over 250 species in North America alone (Moyle, 2002). The cyprinids inhabit a wide variety of environments and have been found to inhabit a large percentage of the dominant freshwater drainage systems in the western side of North America (Lee et al., 1980).

Within the Cyprinidae family is found the species of concern for this study, Rhinichthys osculus, commonly referred to as the speckled dace. The speckled dace is the most widely distributed freshwater fish in the western United States, ranging from Canada south to Sonora, Mexico (Lee et al., 1980; Moyle, 2002), and is expansively distributed throughout much of western North America (Appendix A, Figure 3; Oakey, Douglas, & Douglas, 2004). California is broken into five ichthyologic provinces: Klamath, Great Basin, Sacramento, South Coastal, and Colorado. R. osculus is known to reside in four of the five provinces, predominantly in watershed habitats where the streambeds are shallow gravel and the waters constantly flowing (Moyle et al., 1989). R. osculus grows no more than 80 mm in length, and its Latin name derives from its distinctive snout with a small subterminal mouth. Its coloring usually consists of dark blotches along its body (Appendix A, Figure 4), hence the species’ common name, speckled dace. Speckled dace commonly inhabit shallow gravel and riffle streams, utilizing the overhanging flora as protection from predators (Moyle, 2002).

The southern California populations of R. osculus are called the Santa Ana speckled dace (SASD), though SASD has not yet been formally recognized as a distinct species. At one time, it inhabited much of the Santa Ana, San Gabriel, and Los Angeles river systems. Through the years, the group has experienced a decline in its populations and now can only be documented to inhabit smaller isolated creeks intermittently throughout the Santa Ana and San Gabriel riverways. The SASD has been reported to be completely extirpated from the Los Angeles river system (Santa Ana Watershed Project Authority, 2004). This decline mirrors that of the entire species within all of California. The current range of the SASD has been shaped by the frequent fires and floods in this area (Santa Ana Watershed Project Authority, 2004) as well as human interaction with the environment (Swift et al., 1993), creating a highly fragmented habitat and small populations that are isolated from one another.

According to the Department of Fish and Wildlife’s California Natural Diversity Database (CNDDB), the SASD is recognized as a species of special concern by the CA Department of Fish and Game, as a threatened species by the American Fisheries Society, and as a sensitive species by the US Forest Service (California Department of Fish and Wildlife, 2015). These organizations have identified it as a population worthy of conservation. The Santa Ana Speckled Dace was submitted for ESA consideration in a petition dated September 1994 that also addressed conservation needs of the Santa Ana Sucker and the Shay Creek Three-spine Stickleback. It was submitted by the Sierra Club Legal Defense Fund, Inc. on behalf of seven organizations. In 1996, despite these designations, the SASD was denied federal listing as an endangered species because it lacked genetic information and any peer-reviewed taxonomic description.

Conservation Policies

For this population to be considered for federal protection under the amended Endangered Species Act (ESA), it must be considered a “distinct population segment” (DPS) (U.S. Congress, 1978). Though the ESA does not officially define this term, the National Marine Fisheries Service (1991), in the interest of following the intent of the ESA with regard to listing Pacific salmon, determined that to be classified as a DPS, a population must be considered an evolutionarily significant unit (ESU) of a species. This is defined by two criteria: 1) the population must be reproductively isolated from other populations of the same species, and 2) it must be an important link in the evolutionary chain of the species (National Marine Fisheries Service, 1991; Waples, 1991). This policy was later applied to all other vertebrate populations (U.S. Fish and Wildlife Service, 1996). ESU, however, was not defined by any of the above referenced agencies, and as such, has been defined via several published “conversations” by authorities in phylogenetics and related fields. The definition has evolved to be widely accepted as requiring that a population be reciprocally monophyletic at the mitochondrial DNA (mtDNA) level, as well as show significant divergence at the nuclear DNA (nDNA) level (Moritz, 1994). A monophyletic group is one which includes an ancestral population and all its descendent populations. Two groups are reciprocally monophyletic when each is monophyletic with respect to the other, so that all lineages within one group share a more recent common ancestor with each other than with any lineage in the other group. Appendix A, Figure 5 illustrates this relationship using basic phylogeny diagrams. Diagram B demonstrates reciprocal monophyly. We see that all “a” populations are equally descended from the same common ancestor as all “b” populations. Diagram A demonstrates paraphyletic relationships (Kizirian & Donnelly, 2004). Another term arising from these conversations is management unit (MU), which is used to describe populations showing allelic variation at any DNA locus, mitochondrial or nuclear. This is a more general level of genetic diversity that conservation biologists agree is deserving of conservation management (Moritz, 1994).
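The distinction between reciprocal monophyly and paraphyly can be made concrete with a short Python sketch. This is only an illustration under stated assumptions: trees are rooted and written as nested tuples, the tip labels "a1", "a2", "b1", and "b2" are hypothetical, the two example trees are not the diagrams in Appendix A, Figure 5, and the function names are invented here.

```python
def tips(node):
    """Return the set of tip labels beneath a node (tips are plain strings)."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= tips(child)
    return found


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree contains exactly the tips in group."""
    group = set(group)
    if tips(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


def reciprocally_monophyletic(tree, group_a, group_b):
    """True if both groups form their own clade in the same rooted tree."""
    return is_monophyletic(tree, group_a) and is_monophyletic(tree, group_b)


# Hypothetical rooted trees written as nested tuples.
tree_reciprocal = (("a1", "a2"), ("b1", "b2"))    # each group forms a clade
tree_paraphyletic = ("a1", ("a2", ("b1", "b2")))  # "b" is nested inside "a"

group_a = {"a1", "a2"}
group_b = {"b1", "b2"}

print(reciprocally_monophyletic(tree_reciprocal, group_a, group_b))
print(reciprocally_monophyletic(tree_paraphyletic, group_a, group_b))
```

In the second tree the "b" tips still form a clade, but the "a" tips do not, because the smallest clade containing both "a" tips also contains the "b" tips; group "a" is therefore paraphyletic and the pair fails the reciprocal test.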

The ultimate goal of this study has been to determine the genetic composition of multiple informative nuclear DNA markers to provide a more thorough taxonomic description of the Santa Ana speckled dace. An additional goal has been to analyze the genetic differences between the local taxon and the populations that inhabit the most proximal stream environments in Southern California, specifically the Eastern Sierra and Central Coast watershed systems, as well as the Colorado River.


Molecular Markers

The fields of phylogenetics and phylogeography have exploded since the development of polymerase chain reaction and sequencing techniques. Previously, taxa were grouped based on phenotypic polymorphisms and behavioral traits, as this was the extent of information that could be obtained by researchers (Avise, 2009). Science has long known of the connection between those visible and behavioral traits and the heritable markers that exist within the organism, but did not have a way to visualize those “characters”, as Mendel referred to them (Mendel, 1865). Charles Darwin described evidence of the evolution of traits within and among populations (Darwin, 1859), relating them to inheritance, but again did not have the knowledge that has since become available. The visualization and confirmation of the structure of DNA (Franklin & Gosling, 1953) paved the way for its utilization in molecular research from that time forward.

The development of protein electrophoresis in the 1950s and 1960s (Raymond, 1962; Smithies, 1955, 1959a, 1959b), which is still used today, was the precursor to DNA electrophoresis. The ability to amplify or clone DNA in order to obtain the quantities necessary for visualization was brought about through the polymerase chain reaction, PCR (Mullis, 1987). Utilizing the continuing development of nucleic acid sequencing methods (Illumina, 2015; Sanger, Nicklen, & Coulson, 1977), we have the ability to analyze the fundamental heritable elements affected by the forces of evolution, from a single nucleotide to an entire genome.


Proteins, the product of DNA transcription and translation, were among the original molecular markers used to analyze species variation. While protein variation may reveal a certain level of differentiation, it cannot give as true a description as nucleotide sequence. Each amino acid subunit of a polypeptide is specified by a three-base sequence, or codon, and the order of codons dictates the arrangement of amino acids along the chain. A basic principle in genetics is that certain codon positions can experience mutation and yet still code for the same amino acid. This is referred to as redundancy. Analysis of protein sequences can only tell us whether an amino acid substitution has occurred, and not every nucleotide substitution produces one. If the sequence experiences a mutation at the third position in the codon, also known as the wobble position, it is possible the same amino acid will result, thus masking the nucleotide substitution that has occurred.
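The masking effect of codon redundancy can be illustrated with a short Python sketch. The nine-base sequences below are invented for illustration and are not data from this study, and the codon table is only a small excerpt of the standard genetic code.

```python
# Excerpt of the standard genetic code: only the codons used in this example.
CODON_TABLE = {
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GAT": "Asp", "GAC": "Asp", "GAA": "Glu", "GAG": "Glu",
    "ATG": "Met",
}


def translate(dna):
    """Translate a coding sequence codon by codon using CODON_TABLE."""
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3)]


def compare(seq_a, seq_b):
    """Return (nucleotide differences, amino acid differences)."""
    nt_diffs = sum(a != b for a, b in zip(seq_a, seq_b))
    aa_diffs = sum(a != b for a, b in zip(translate(seq_a), translate(seq_b)))
    return nt_diffs, aa_diffs


# Hypothetical sequences (illustrative only).
original = "ATGGGTGAT"     # Met-Gly-Asp
silent = "ATGGGCGAC"       # two third-position changes, same amino acids
replacement = "ATGGGTGAA"  # one third-position change, Asp becomes Glu

print(compare(original, silent))       # (2, 0)
print(compare(original, replacement))  # (1, 1)
```

In the first comparison two nucleotide substitutions leave the protein unchanged, so a protein-level marker would record no variation, whereas the nucleotide sequence records both changes.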

Along with advancement in molecular techniques came a transition to finer scale markers, the nucleic acid subunits that are the recipe for the protein products. Long sequences made up of just four nitrogenous bases, each linked to a sugar and phosphate molecule and twined together by mere hydrogen bonds, are the current genetic material of choice. In eukaryotic organisms, we can obtain DNA from a couple of different sources. In vertebrate organisms, among others, we can extract chromosomal DNA from the nucleus, or mitochondrial DNA from the mitochondria.


Mitochondrial DNA was revealed to be a highly effective tool in phylogenetic analysis (Brown, George, & Wilson, 1979). Mitochondria are the cellular organelles responsible for oxidative phosphorylation in the process of cellular respiration. These organelles reside in the cytoplasm of eukaryotic cells. Mitochondria are able to self-replicate due to the possession of their own DNA. This DNA is found in the mitochondrial matrix, the compartment enclosed by the inner mitochondrial membrane. The evolution of mitochondria within eukaryotic cells is believed to have been a result of an endosymbiotic event that occurred between α-proteobacteria and an early form of eukaryotic organism. Phylogenetic analysis of the hsp70 gene supported the link of a common ancestor between several species of α-proteobacteria and multiple eukaryotic species (Falah & Gupta, 1994). Mitochondrial DNA (mtDNA) is a small, circular, usually double-stranded piece of DNA that is completely separate from nuclear DNA. It is most often uniparentally transmitted; specifically, it is commonly inherited only from the maternal parent because the cytoplasm of the female gamete, which contains the mitochondria, is transmitted in its entirety to the zygote. Due to this fact, it does not go through genetic recombination (Dawid & Blackler, 1972).

The benefits of analysis based on mitochondrial DNA are plentiful. Maternal inheritance makes it effectively haploid, providing a single template and thereby avoiding confusion from allelic variation within an individual. There are multiple mitochondria in each cell; mtDNA therefore has a high copy number and is easy to isolate (Bogenhagen & Clayton, 1974). It is much smaller than nuclear DNA and its length is predominantly composed of coding sequence. For this reason, the structure of mtDNA is highly conserved among a wide variety of taxa.

Mitochondrial DNA has a single origin of replication (ORI), which is also known as the d-loop or control region, where the instructions for control of mtDNA transcription reside; the molecule contains the genes for the small and large ribosomal RNA subunits, and its replication is unidirectional in all species (Brown et al., 1979).

Additionally, mtDNA evolution has been shown to be extremely rapid compared to nuclear DNA, including single copy nuclear DNA, despite the expectation that genetic material with high functional constraint usually displays a slow evolutionary rate. It was hypothesized that mtDNA would evolve at a similar rate to single copy nuclear DNA (haploid DNA), but studies have shown that mtDNA evolves at a much higher rate, possibly due to an increased mutation rate (Brown et al., 1979). Mitochondria have few or no properly functioning repair mechanisms, which creates opportunity for the incorporation of more substitutions into the gene (L. I. Grossman, Watson, & Vinograd, 1973). In addition, more frequent replication combined with little repair of misincorporation errors increases the frequency of detectable mutations (Rabinowitz & Swift, 1970). Furthermore, due to its haploid, maternal inheritance, the mitochondrial genome has ¼ the effective population size of a nuclear genome (Birky, Maruyama, & Fuerst, 1983). Therefore, mtDNA can demonstrate the development of population structure more quickly than nDNA (Brown et al., 1979), and thus phylogenetic analysis can demonstrate reciprocal monophyly earlier. From a conservation standpoint, populations of taxa may be considered for management and protection if their genetic description reveals them to be reciprocally monophyletic at the mtDNA level (Moritz, 1994). This allows the potential for a taxon to be designated as an evolutionarily significant unit (ESU), which is a subcategory of population distinction under the Endangered Species Act (1973).
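The ¼ figure can be made concrete by counting the gene copies that are passed to the next generation. The sketch below is a simplified illustration, not the derivation of Birky et al.; it assumes an idealized population with an equal sex ratio and strictly maternal mtDNA inheritance, and the population numbers are hypothetical.

```python
def gene_copies(n_females, n_males):
    """Count heritable gene copies for a nuclear locus and for mtDNA.

    Assumes an idealized population: every individual breeds, nuclear loci
    are diploid and transmitted by both sexes, and mtDNA is haploid and
    transmitted only by females.
    """
    nuclear = 2 * (n_females + n_males)  # two alleles per individual
    mitochondrial = n_females            # one maternal lineage per female
    return nuclear, mitochondrial


# Hypothetical population with an equal sex ratio.
nuclear, mito = gene_copies(n_females=500, n_males=500)
ratio = mito / nuclear
print(nuclear, mito, ratio)  # 2000 500 0.25
```

Because genetic drift acts more strongly on the smaller pool of copies, mtDNA lineages are expected to sort to reciprocal monophyly sooner than nuclear lineages in the same populations.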

While mtDNA has been highly beneficial in the advent of phylogenetics, it has come to the attention of many scientists that there are limitations to the accuracy of the phylogenetic relationships hypothesized based on mtDNA. Some of the characteristics that make mtDNA useful are also its shortcomings. Because the genome is haploid and maternally inherited, analyses based on mtDNA give only a one-sided view of the historical relationship among taxa. The lineage will reflect only the matrilineal succession. Groupings of taxa may not mirror the actual gene sequence descent in populations where the males and females migrate or group differently (Palumbi & Baker, 1994). Since there is no paternal chromosome donation and recombination is rarely found (Lunt & Hyman, 1997), the mitochondrial genome is transmitted as a whole entity and for all intents and purposes can be considered a single locus (A. C. Wilson et al., 1985). In addition, as the genome is circular and there are no intervening non-coding intron segments in vertebrate mitochondrial genes, there may be a lack of independence in the evolution of the genes due to their close proximity (He et al., 2008). Therefore, the use of multiple genetic loci in phylogenetic reconstruction has become necessary. Combining mtDNA with nuclear DNA loci has been shown to provide a more accurate hypothesis of evolutionary relationships (Palumbi & Baker, 1994). It has been recommended that phylogenies based solely on mtDNA sequence be revisited to incorporate nuclear DNA information and determine whether the original suppositions were accurate (E. A. Myers et al., 2013). One of the forms of nuclear DNA revealed to be useful in examining both inter- and intra-species phylogenetic relationships is the predominantly non-coding intron segments within the coding genes of vertebrate organisms.

In 1977, while studying a mature viral structural protein and its corresponding mapped mRNA sequence, researchers at M.I.T. discovered that when they hybridized the mature mRNA to the single-stranded DNA, the RNA included “tails” on the 5’ end that did not complement the DNA at that point along the strand. Rather, they found the complementary DNA segments that matched the mRNA further upstream. After ruling out several possibilities, it was deduced that these tail segments may result from alternate splicing of the precursor mRNA (Berget, Moore, & Sharp, 1977). This research led to the discovery of introns, the intervening segments that reside within a gene (Appendix A, Figure 6). This term refers to the stretches of RNA transcribed from the DNA template that are subsequently excised during mRNA processing and excluded from the final mRNA transcript, as well as the corresponding sequence of the template DNA from which the excised segments are transcribed (Kinniburgh, Mertz, & Ross, 1978). As evolutionary divergence times are derived from analysis of nucleotide substitution within DNA, it is important to use chromosomal DNA intron segments, rather than mRNA, as a basis for study when determining phylogenetic relationships between subpopulations experiencing isolation.

Introns are labeled “noncoding” segments of DNA as they frequently do not directly encode amino acids. They reside within genes but, as they themselves do not code for proteins, they are not under the same selective pressures as exons. They may, however, contain regulatory information pertaining to the transcription and/or replication of the gene. Introns are found interspersed throughout genes and can vary anywhere from 200 to 4900 base pairs long (Naora & Deacon, 1982). Therefore, the sequences from several independent introns provide sufficient sequence data for phylogenetic analysis. In contrast, exons are those segments of the gene that are transcribed and translated into amino acid sequences forming protein products (Nei & Kumar, 2000); they are therefore under stronger selection and less informative in closely related taxa. This is one reason that introns are an excellent tool to study population structure.

This is not to say that all intron segments can be utilized for evolutionary study. Choosing introns within genes that are conserved across taxa, and even across species, makes it possible to predict the presence of the gene among multiple species. Studying the coding sequence would be uninformative, as the most conserved sequences are those for which little to no base substitution is present. The introns within those genes, though, will also be conserved to a point. Their presence is expected within the gene, while their actual sequence can often withstand nucleotide substitution without detriment to the development and/or function of the organism. This provides a level of variation that enables analysis of genetic differentiation among populations of the same species.

Literature Reviews

Phylogenetics Studies Enhanced by the Addition of Nuclear DNA Data

As previously noted, mitochondrial DNA was initially the foremost molecular marker utilized in genetic characterization of both marine and freshwater vertebrate organisms. However, a multi-marker approach, comparing mtDNA with multiple nuclear DNA markers, has become a more broadly effective method of population identification and differentiation of eukaryotic macroorganisms. Concordance among multiple genetic loci provides assurance of historically distant divergence, allowing for a more concrete image of current population structure (Avise, 2009).

Twenty-five years ago, Stephen Palumbi began using a combination of both mtDNA and nuclear intron sequences to perform population studies on marine mammals. Palumbi and Baker (1994) analyzed the population structure of humpback whales, Megaptera novaeangliae, using previously sequenced mtDNA D-loop information (Baker et al., 1993). In conjunction with the D-loop sequences, they also amplified and sequenced the first intron within the highly conserved musculoskeletal actin protein gene. DNA samples were taken from ten free-ranging humpback whales located around Hawaii and California, as well as blue whales and bowhead whales from all ocean populations, for comparison.

The 1,409 bp actin intron sequence revealed approximately 3% and 4% difference between the humpbacks and the blue whales and bowhead whales, respectively. The phylogenetic tree created in the initial study (Baker et al., 1993) utilizing mtDNA did distinguish the Hawaiian humpbacks from the California humpbacks but revealed no variation within the Hawaiian population and minimal variation within the California population. The most informative phylogenetic tree based on the intron marker, however, presented a more highly varied structure. The two comparison whale species clustered accurately. The humpback whales in the study group were clustered into two clades with no clear geographic structure to the cladogram. This study shows a strong difference in the evolutionary relationships inferred using mtDNA versus a single nuclear DNA marker. The intron sequence revealed no particular pattern, which could be heavily influenced by the fact that the males of these populations are migratory; the chromosomal DNA of a single male may thus be distributed throughout multiple populations, reducing the amount of variation between the populations. This also illustrates the necessity of utilizing multiple gene markers. Based on the mtDNA alone we would conclude distinct genetic separation between populations, but the nDNA reveals the level of introgression that actually occurs among the populations.
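Percent differences of this kind are commonly summarized as an uncorrected pairwise distance (p-distance), the proportion of aligned sites at which two sequences differ. The sketch below shows that calculation on short, invented sequences; whether Palumbi and Baker computed their figures in exactly this way is an assumption, and the function name is hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical aligned fragments (illustrative only).
print(p_distance("ACGTACGTAC", "ACGTTCGTAC"))  # 0.1
print(p_distance("ACG-ACGTAC", "ACGTACGAAC"))  # 1 difference over 9 sites
```

On this scale, a 3% difference across a 1,409 bp intron corresponds to roughly 42 differing sites.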


Further advancement of these techniques was demonstrated by Slade, Moritz, & Heideman (1994), who used three conserved nuclear intron sequences as well as an exon sequence for comparison with the mtDNA control region phylogenetic tree of six species of pinnipeds. Corroboration among multiple nuclear loci lends stronger support to the determination of phylogenetic relationships. In this case, phylogenetic analysis based on the combination of loci was run via multiple minimum-evolution models, each showing concurring relationship structure among the pinnipeds with canids as the outgroup, all members of order Carnivora. This study showed the usefulness of conserved intron sequences in producing multigene phylogenetic analyses of closely related species.

Studies utilizing intron sequence to evaluate phylogenies of freshwater fish have become more frequent. For example, Angelo Bufalino and Richard Mayden (2010b) recognized the need for nuclear DNA to lend greater support to previous mtDNA phylogenetic data on the relationships between many species of North American phoxinins, fish of the Cyprinidae family. In this case, previous mtDNA work had been done, but Bufalino & Mayden were looking to further support that work. The results confirmed the three major clades revealed in the original studies. Analysis utilizing three analytical models provided a consensus of results. This was a very extensive study attempting to resolve the relationships of several hundred species of fish. It resulted in further strong support for the previously identified relationships and additionally gave support to previously hypothesized relationships of many species. It resolved many loosely supported branches as well. Many researchers have elected to reevaluate their previous phylogenetic studies, adding nuclear DNA data in the form of microsatellites and introns, to provide a more accurate picture of the relationships between closely related species (Bufalino & Mayden, 2010a; Chen et al., 2008; He et al., 2008a), and often reaffirm the pre-existing phylogenies.

Much more recently, a study detailed the phylogenetic resolution of eight species of freshwater fish endemic to southern Mexico and Central America (Morcillo, Ornelas-García, Alcaraz, Matamoros, & Doadrio, 2016). These eight species are part of the Profundulidae family. Using three mtDNA markers and two introns of the S7 ribosomal protein gene, the authors tested two hypothesized models, 8-species and 12-species. At the time of the study, the family included a single genus, Profundulus, divided into two subgenera, Profundulus and Tlaloc. Each subgenus included four species. Morcillo et al. ran both Maximum Likelihood and Bayesian Inference phylogenetic analyses on two concatenated sets of sequence data. The first analyzed the concatenation of the three mtDNA gene segments only, while the second analyzed the concatenation of the three mtDNA segments (1897 bp) plus the two intron segments (998 bp), for a total of 2895 bp of sequence data.
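Concatenation in this sense means joining the aligned sequences of each locus end to end, sample by sample, so that every sample contributes one long sequence to the analysis. The following Python sketch illustrates the idea with invented locus names, sample names, and sequences; it is not the pipeline used by Morcillo et al., only a minimal illustration of the bookkeeping involved.

```python
def concatenate(loci, order):
    """Join per-locus alignments into one sequence per sample.

    `loci` maps locus name -> {sample name -> aligned sequence}.
    Every locus must contain the same samples.
    """
    samples = set(loci[order[0]])
    for name in order:
        if set(loci[name]) != samples:
            raise ValueError(f"locus {name!r} has a different sample set")
    return {s: "".join(loci[name][s] for name in order) for s in sorted(samples)}


# Hypothetical toy alignments (illustrative only).
loci = {
    "mt_gene": {"fish_A": "ACGT", "fish_B": "ACGA"},
    "intron_1": {"fish_A": "GG-", "fish_B": "GGC"},
}
supermatrix = concatenate(loci, order=["mt_gene", "intron_1"])
print(supermatrix)  # {'fish_A': 'ACGTGG-', 'fish_B': 'ACGAGGC'}

# Lengths reported in the text: mtDNA segments plus intron segments.
total_bp = 1897 + 998
print(total_bp)  # 2895
```

Checking that every locus contains the same samples before joining guards against silently misaligned rows in the combined data set.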

The phylogeny based solely on the mtDNA concatenation, in both the ML and BI analyses, resulted in a somewhat convoluted division of lineages. Many individuals sampled from the same region were grouped with others from another geographic location. One species of the Tlaloc subgenus, P. (T.) candalarius, was absorbed by another of the same subgenus; this occurred in every phylogenetic reconstruction until the Bayesian Species Delimitation was constructed based on the hypothesized 12-species breakdown, which then separated out T. candalarius.

The phylogeny based on the mtDNA plus nDNA also resulted in the same number of clades but the individuals representing the geographic locations separated out with their nearest proximal relatives. The finer distinction is often the result of the more detailed genetic pool. Mitochondrial DNA alone will only provide a single parentage which often creates relationships among individuals that are more geographically distant. Including the biparental aspect of the recombinant nDNA intron sequences provided a tighter clustering of the individuals of the same geographic habitats.

In all permutations of the phylogenetic data, though, both the mtDNA alone and the combination of the mtDNA and nDNA loci yielded extremely high probability of completely reciprocally monophyletic relationships all the way back to the outgroup(s). Given the strong support in all calculations of the data, the authors state that while it is conceivable that there are actually twelve distinct species among two definitely distinct genera, to remain on the conservative side, they proposed that the two subgenera be recognized as two distinct genera, Profundulus and Tlaloc.


Phylogeography as Influenced by California Floristic Province Topography

The extensive biological variation that exists in Southern California makes it a prime location for phylogeographic studies. The varying ecosystems that make up this region, from evergreen mountain top communities to coastal plains, vast deserts, and inland chaparral and scrublands, provide highly diverse conditions whose effects can be studied in the multitude of organisms that inhabit each of these environments. As evidence of this, many studies have documented the correlation between the distinct topographical formations present in this region and the population distribution of many plant and animal species.

The phylogeography of the California mountain kingsnake has been studied using multiple mtDNA sequences (Rodríguez-Robles, Denardo, & Staub, 1999), and later using multiple genetic loci (E. A. Myers et al., 2013). The 1999 study sequenced the Ndh4 gene as well as three tRNA genes of seven subspecies, including the Baja California, San Bernardino mountain, San Diego mountain, and Sierra mountain kingsnakes. A maximum parsimony tree concurred with two variations of maximum likelihood trees, resulting in two clades, a northern and a southern clade. The northern clade was further divided into two subclades, the coastal and the northeastern subclades. The two ML trees differed only in the determination of one relationship and otherwise agreed.


Edward Myers (2013) revisited this original study to answer two questions. First, would the geographic distinctions deduced from the mtDNA study hold when nuclear DNA was also included? Second, which area of the California Floristic Province was integral to the divergence events of this species?

Using the original 34 DNA extraction samples from the 1999 study, Myers sequenced two anonymous nuclear loci, CL4 and 2CL8 (Burbrink et al., 2011). He then constructed three separate phylogenetic trees, one for the mtDNA, one for the CL4 nuclear locus, and one for 2CL8. The mtDNA sequence contained many more variable nucleotide sites than either of the nuclear loci, but only resolved the two clades, northern and southern, with the barrier between them proposed to be inland seaways in the approximate region of the southern tip of the Sierra Nevada (Rodríguez-Robles et al., 1999). The nuclear loci also supported two major clades, northern and southern, but provide support for the hypothesis that the barrier may be climatic in nature and that it is located further north, putting the separation between the two clades in the region of present-day Monterey Bay. This groups the former coastal subclade with the current southern clade. Both of the nuclear loci trees independently revealed a third prospective clade to the east that was teased out of the southern clade. Myers concludes, conservatively, that at minimum this species should be recognized as two separate species, with the southern clade a candidate for future division.


Another study utilized mtDNA analysis in the phylogeography of a California endemic invertebrate species (Vandergast et al., 2006). The authors sought to distinguish genetic differentiation resulting from historical geological forces from that resulting from the relatively more recent effects of human presence. A single species of Jerusalem cricket, endemic to southern California, is geographically isolated from the many closely related species that range throughout the rest of western North America. This species, Stenopelmatus ‘mahogani’, inhabits the cismontane region of southern California bordered to the north and east by the Transverse and Peninsular mountain ranges, and to the south by the San Diego River. Analysis of sequence data for the cytochrome oxidase I gene resolved the individuals into 65 unique haplotypes. These haplotypes did, in fact, mirror their geographic sampling locations. In the few instances where this was not the case, the individuals were from a sampling location near the one with which they were phylogenetically grouped. One case of a single haplotype present in two sampling locations occurred along a river. It was proposed this could have been the result of a previous flooding event or possibly that the river acted as a dispersal corridor. The populations showed little to no genetic variation within them but very high variation between them, demonstrating the isolation of each population and the lack of migration between them.


Most locally, Ivan Phillipsen and Anthony Metcalf (2009) reported on the phylogeography of a local stream-dwelling frog, Pseudacris cadaverina. They hypothesized that genetic variation would be delineated by the watershed systems, by the mountain ranges, or by division into coastal and desert habitats. This organism inhabits the shallow streams and creeks within the watershed systems and has very limited dispersal, within 25 meters. In this respect, it is similar to R. osculus in that much of its genetic dispersal occurs due to flooding events or transplantation, and thus populations become isolated from one another. Using the cytb and tRNA-Glu mtDNA genes, the authors ran multiple statistical scenarios based on the various hypothesized landscape features of influence. The resulting phylogenetic tree showed three distinct groups, which directly correlated with the three groups illustrated by the haplotype network diagram. Haplotypes grouped into the Northern, Central, or Southern clades, with some overlap of the Central and Southern groups. Overall, the evidence supports the hypothesis that the mountain ranges act as isolating barriers between the populations, with the Transverse Range break as the primary barrier between the Northern groups and the Central/Southern groups.

Phylogenetics and Population Structure of the Santa Ana Speckled Dace

Initial study of the phylogenetic relationships of R. osculus in the southern California region to populations of R. osculus inhabiting the central coast and eastern Sierra regions began in the Metcalf lab with work on two mitochondrial DNA loci (J. J. VanMeter, 2017; P. M. VanMeter, 2017). Concurrently, a study of the relationships within and among the populations of the Santa Ana speckled dace utilizing seven microsatellite loci was undertaken (Nerkowski, 2015).

Jay VanMeter sequenced the mtDNA control region (d-loop) and its related tRNAs for a total of 1143 bp of sequence data in 74 dace samples representing three California regions. Pia VanMeter sequenced the mtDNA cytochrome b gene, which totaled 1155 bp, in 92 dace samples representing three California regions and the Colorado River, AZ region. Stacey Nerkowski genotyped seven microsatellite loci in 146 dace samples representing the three California regions.

Analysis of the d-loop sequence resulted in 14 unique haplotypes in the Southern California region alone (Santa Ana, San Gabriel, and San Jacinto watersheds). Of the 14 haplotypes, only one was a mix of individuals from both Santa Ana and San Gabriel watershed tributaries. The other 13 were each composed of individuals from a single watershed. Bayesian inference phylogeny illustrated the relationship among the SASD, central coast, and eastern Sierra regions, showing reciprocally monophyletic lineages for the Santa Ana speckled dace and the coast/Sierra dace (J. J. VanMeter, 2017).
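Identifying unique haplotypes, and noting which of them are shared between watersheds, amounts to grouping identical sequences and recording where each was sampled. The Python sketch below illustrates this step with invented sequences and is not the procedure or data of the cited study; real data sets would also need a rule for missing or ambiguous bases, which is omitted here.

```python
from collections import defaultdict


def collapse_haplotypes(records):
    """Group identical sequences and count samples per watershed.

    `records` is an iterable of (watershed, sequence) pairs.
    Returns {sequence: {watershed: number of samples}}.
    """
    haplotypes = defaultdict(lambda: defaultdict(int))
    for watershed, sequence in records:
        haplotypes[sequence.upper()][watershed] += 1
    return haplotypes


# Hypothetical records (illustrative only).
records = [
    ("Santa Ana", "ACGTAC"),
    ("Santa Ana", "ACGTAC"),
    ("San Gabriel", "ACGTAC"),
    ("San Gabriel", "ACGTTC"),
    ("San Jacinto", "ACCTAC"),
]
haplotypes = collapse_haplotypes(records)
shared = [seq for seq, by_ws in haplotypes.items() if len(by_ws) > 1]
print(len(haplotypes), "unique haplotypes;", len(shared), "shared between watersheds")
```

In this toy example three haplotypes are recovered and one is shared by two watersheds, mirroring on a small scale the pattern of largely watershed-specific haplotypes described above.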

Analysis of the cytb gene, which included dace from the Colorado River, resulted in both Bayesian and Maximum Likelihood phylogenies revealing reciprocally monophyletic relationships between the SASD/Colorado and coast/Sierra dace. In both analyses, Colorado River dace were more closely related to southern California dace. The minimum spanning haplotype network of the three watersheds of the Santa Ana speckled dace mirrored that of the d-loop network (P. M. VanMeter, 2017).

Microsatellite analysis was visualized in a Discriminant Analysis of Principal Components (DAPC) based on the characterization of alleles of the SASD, central coast, and eastern Sierra regions. When the number of populations (K) was set to 3, all individuals clustered according to the region from which they were sampled (Nerkowski, 2015).

Collectively, these studies strongly suggest that the southern California populations of R. osculus are genetically distinct from those of the Central California coast, the Eastern Sierra Nevada valley, and the Colorado River. Recommendation for formal species status must include not only evidence of genetic reciprocal monophyly based on mtDNA sequence data but also significant genetic differentiation based on nuclear DNA sequence data. The mtDNA information has been provided by Jay and Pia VanMeter. The nDNA microsatellite data provided by Stacey Nerkowski corroborate the VanMeters’ conclusions regarding the levels of genetic differentiation between the three California regions’ populations.

The goal of this study has been to identify multiple nuclear DNA loci that would be informative in phylogenetic analyses and, further, to determine whether these nuclear loci support the mtDNA and microsatellite DNA evidence. In light of the fact that only a single intron marker, the S7 ribosomal protein gene intron 1, had been sequenced in any R. osculus specimens, I questioned whether I could identify other novel introns for R. osculus. Secondly, would novel introns be useful in the characterization of molecular variation and divergence among populations of a species? If so, would they complete the taxonomic description of the Santa Ana speckled dace as warranted by the ESA rejection of 1996?


CHAPTER TWO

MATERIALS AND METHODS

Research Objective

The intent of this study was to characterize the evolutionary relationships of Rhinichthys osculus populations among four regions of California and Arizona, using multiple intron markers. As only one intron, the S7 ribosomal protein gene intron 1, has been sequenced in R. osculus and published (Chow & Hazama, 1998), additional candidate intron sequences were needed to provide a more complete study. Based on the successful amplification of several intron sequences in fellow Cyprinidae family genera, Hypophthalmichthys and Danio (Li, Riethoven, & Ma, 2010), research began by testing whether any of the primer sequences from that study would amplify the R. osculus tissue samples in the Metcalf lab.

Sample Collection

Rhinichthys osculus specimens obtained from the Santa Ana watershed were collected by the Metcalf Lab (CSUSB) under the auspices of the U.S. Forest Service (USFS) and California Department of Fish and Wildlife (CDFW). All other samples were collected under the auspices of the appropriate authorities and sent to the Metcalf Lab by the CDFW or USFS. Samples include those from the coastal region that lies north of the western Transverse Mountain range segment and west of the Sierra Nevada range, represented by tributaries of the San Luis Obispo and Santa Maria River watersheds; the eastern Sierra Nevada region, north of the central Transverse range segment, represented by two Owens River tributaries; and the southern inland region, south of the Transverse range and east of the Peninsular Mountain range, represented by the Santa Ana, San Gabriel, San Jacinto, and Los Angeles River watershed tributaries. Specimens from the Colorado River through the Grand Canyon were provided by the Arizona Department of Game and Fish, and one individual specimen was obtained from a southern tributary of the Colorado River via the Gila River. Sample sites are overlaid on the speckled dace range map in Appendix A, Figure 7.

R. osculus specimens were stored in 100% ethanol in 50 mL conical vials at -4ºC. A subset of the specimens, representing each tributary of each watershed, was identified for this study: eight each from the Owens Valley, Central Coast, San Gabriel, and Colorado Rivers; three each from six tributaries of the Santa Ana River watershed; three samples from the San Jacinto River; and two samples from the Los Angeles River. In total, 55 dace samples were identified to be used in this study (Appendix B, Table 1).
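The sample total can be checked directly from the per-region counts given above. The short Python tally below uses only those counts; the dictionary labels are shorthand chosen for this illustration.

```python
# Per-region sample counts as stated in the text.
sampling_design = {
    "Owens Valley": 8,
    "Central Coast": 8,
    "San Gabriel River": 8,
    "Colorado River": 8,
    "Santa Ana River (6 tributaries x 3)": 6 * 3,
    "San Jacinto River": 3,
    "Los Angeles River": 2,
}

total_samples = sum(sampling_design.values())
print(total_samples)  # 55
```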

Molecular Methods

Approximately 0.25 grams of muscle tissue along the left or right lateral side was aseptically dissected from each designated specimen, taking care not to include any eggs from female specimens. Each tissue sample was placed on a sterilized, pre-weighed watch glass and a precise wet weight was obtained. Each sample was then treated for the removal of ethanol before beginning the DNA extraction procedure. The removal of ethanol was accomplished by immersing the tissue in purified H2O on the watch glass, letting it sit for 5 minutes, then removing the water by wicking with a clean Kimwipe. This process was repeated 2-3 more times. The watch glasses holding tissue samples were then placed in a vacuum desiccator for one hour under a fume hood. At the end of one hour, samples were placed in sterile MC-15 tubes to begin the DNA extraction process. Initial DNA was extracted using a Qiagen® DNeasy Blood and Tissue Kit according to the DNeasy® Blood and Tissue Handbook protocol for Purification of Total DNA from Animal Tissues (Spin-Column Protocol). Final DNA extract was suspended in 200 µL of Qiagen AE buffer. Extracted DNA was visualized on a 0.8% agarose gel run at 100 volts for 55 mins. Concentration and purity were determined using a NanoDrop® ND1000 spectrophotometer (US patent 6,628,382 and 6,809,826) at absorbance of 340 nm. This DNA was used for all preliminary primer amplification testing. DNA was stored in MC-15 tubes at -20ºC. Aliquots were made of each DNA extraction stock for purposes of PCR. Aliquots were stored at -4ºC.

All ten primer sets that Li et al. (2010) successfully amplified in Cypriniformes species were obtained. Initial PCR followed the protocol described in the reference study (Li et al., 2010) but did not amplify any Rhinichthys osculus DNA, so an extended period of optimization was undertaken. On an Eppendorf Mastercycler Gradient, PCR was run on a small subset of R. osculus samples representing the Santa Ana watershed,

with variations in reagent concentrations and temperature gradients, testing for the optimal PCR parameters for each of the 10 primer pairs. Once successful amplification was obtained on this subset of samples, the primer list was narrowed to the six sets that provided the cleanest amplification. These initial PCR products were purified with ExoSAP-IT™ according to the ExoSAP-IT™ protocol. Purified PCR product was verified a second time by 1% agarose gel electrophoresis and then prepared for Sanger sequencing according to submission requirements for premixed template by Retrogen Inc.

Before continuing, it was necessary to verify that the PCR amplicons and resulting sequence were, in fact, the same markers as those amplified by Li et al. (2010). Several parameters were analyzed, including band length via gel electrophoresis and nucleotide BLAST™ results.

Once amplicons were verified, the subset of R. osculus samples was expanded to represent most of the streams of the Santa Ana, San Gabriel, Santa Maria, San Luis Obispo, and Owens Valley watersheds for which samples were available and viable. Upon further PCR optimization, three primer sets most frequently amplified the expanded subset of samples and were therefore selected for this study. Upon further consideration, it was decided to also include the one published sequenced intron of various Rhinichthys species, including several R. osculus subspecies: the first intron of the S7 ribosomal protein gene, as referenced earlier in this section of the thesis. The

purpose was to allow for continued analysis in later studies and to expand comparison with the published S7 intron sequences that many other studies have utilized to infer phylogenetic relationships of Rhinichthys species within and among other geographic areas (Bufalino & Mayden, 2010a; He et al., 2008b;

Hoekzema & Sidlauskas, 2014; Kim & Conway, 2014; Mussmann, 2018; Taylor,

McPhail, & Ruskey, 2015). One primer set of the three was substituted with the

S7 intron primers (Chow & Hazama, 1998). Primer sequences for all tested pairs are listed in Appendix B, Table 2 with final selected primers highlighted. PCR protocols for each of the final three are summarized in Appendix B, Table 3.

Once PCR protocols were optimized and primer selection was complete, the list of R. osculus specimens to be included in this study was expanded to include at least one representative of each tributary of all watersheds represented in the sample collection provided to the Metcalf lab. At this time, additional specimens representing the Colorado River through the Grand Canyon were obtained from Arizona Game and Fish, and an equivalent subset of these specimens was added to the list. In all, 55 specimens were identified to be included in this study. Outgroup choices were limited because novel introns were being used; therefore, the two species of Hypophthalmichthys from the reference study (Li et al., 2010), H. molitrix and H. nobilis, being the most closely related to R. osculus, were selected as outgroups. Both species had also been characterized for the S7 ribosomal protein intron, H. molitrix by He et al. (2008b) and, very recently, H. nobilis by Stepien, Snyder, and Elz (2019), making them the optimal choices. A summary of Rhinichthys osculus outgroup specimens is listed in Appendix B, Table 4.

After expansion of the sample list and designation of outgroups, DNA was extracted from the selected R. osculus samples in the Metcalf lab using two phenol-chloroform-isoamyl alcohol (25:24:1) extractions and one chloroform-isoamyl alcohol (24:1) extraction. The isolated DNA pellets were resuspended in 100 µL of TE (Tris-EDTA) buffer, with stocks stored at -20 ºC and aliquots stored at -4ºC. DNA presence was verified by 1% agarose gel electrophoresis, and quantity and purity were verified by

NanoDrop® One spectrophotometer.

PCR was performed on each DNA sample for all three introns under the optimized conditions. PCR results were verified by 1% agarose gel electrophoresis at 100 volts for 55 min. Earlier sets of successfully amplified DNA were purified using ExoSAP-IT™. Later PCR product was purified using the

Thermo Scientific™ GeneJet PCR Purification Kit.

All further Sanger sequencing was done by MacrogenUSA in the

Rockville, Maryland facility. Sequencing samples were prepared according to the

Macrogen sample preparation guidelines for premixed purified PCR product.

Sequence contigs were assembled and aligned, and base ambiguities were resolved, using Geneious Prime 2019.1.3 (https://www.geneious.com). After multiple PCR attempts on all samples, a subset of individuals amplified successfully but did not yield quality sequence data. To generate higher quality sequence, internal primers were designed for all three introns.


Internal primers were designed by aligning all successful sequence data per intron within Geneious Prime and primers were identified visually based on widely accepted optimal primer criteria. Annealing temperatures were calculated using the Thermo Fisher Scientific™ webtool, Tm Calculator. Internal primers are listed in Appendix B, Table 5. PCR on remaining problematic samples with internal primers was then completed.
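As a rough illustration of the kind of screening described above, the following Python sketch checks a candidate primer against commonly cited rules of thumb. The length and GC thresholds, the use of the simple Wallace rule for melting temperature, and the example primer are all assumptions made for illustration; they are not the criteria applied in this study, and the Thermo Fisher Tm Calculator uses a different method, so its values will differ from this approximation.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Approximate Tm in degrees C by the Wallace rule: 2(A+T) + 4(G+C).

    This is a coarse rule of thumb for short oligos, used here only as a
    stand-in; it is not the algorithm behind the vendor web tool.
    """
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def passes_basic_criteria(primer, min_len=18, max_len=25,
                          min_gc=0.40, max_gc=0.60):
    """Check length and GC content against assumed (illustrative) limits."""
    return (min_len <= len(primer) <= max_len
            and min_gc <= gc_fraction(primer) <= max_gc)


# Hypothetical primer, not one of the primers in Appendix B, Table 5.
candidate = "ACGTACGTACGTACGTACGT"
summary = (len(candidate), gc_fraction(candidate),
           wallace_tm(candidate), passes_basic_criteria(candidate))
```

A screen of this kind only narrows the candidate list; annealing temperatures would still need to be confirmed empirically, as was done here by gradient PCR.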

Sequence Analysis

In the phylogenetic analysis of a population, standard sequence statistics such as nucleotide frequency, GC content, transition/transversion ratios, polymorphic nucleotide site numbers, and haplotype identification are utilized to illustrate levels of genetic variation among and within the populations. These statistics were obtained using Geneious Prime 2019.1.3

(https://www.geneious.com), GenAlEx 6.5 (Peakall & Smouse, 2006, 2012), and

MEGA X (Kumar, Stecher, Li, Knyaz, & Tamura, 2018). In addition to running all analyses on each primer alignment individually, a concatenation of the three sequences was created for each individual, and these sequences were also aligned and analyzed under the same conditions.
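The statistics named above can be illustrated with a short Python sketch. It is not the procedure used by Geneious Prime, GenAlEx, or MEGA X; the function names, the locus keys, and the decision to skip gaps and ambiguous bases are assumptions made for the example.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def base_frequencies(seq):
    """Fraction of A, C, G, and T among unambiguous bases."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def gc_content(seq):
    """Combined frequency of G and C."""
    freqs = base_frequencies(seq)
    return freqs["G"] + freqs["C"]


def ts_tv_counts(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences.

    Columns containing a gap or an ambiguity code are skipped.
    """
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1   # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    return transitions, transversions


def concatenate(per_locus, order=("cnot1", "hpg", "s7rp")):
    """Join one individual's aligned sequences in a fixed locus order."""
    return "".join(per_locus[locus] for locus in order)
```

Keeping the locus order fixed for every individual is what allows the concatenated sequences to be aligned and analyzed column by column in the same way as the single-intron alignments.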

Population Genetics

Population genetics parameters describe the genetic variation within and among a defined population set. In this study, the central focus is the determination of the variation among the four regional populations of R. osculus.

Previous studies in the Metcalf lab have elucidated ancestral population genetics

within the regional populations and even within the tributary populations using microsatellite genotyping (Nerkowski, 2015). While regional analysis has also been completed utilizing the sequence analyses of two mitochondrial DNA markers, Control Region (D-loop) and Cytochrome B (J. J. VanMeter, 2017; P. M.

VanMeter, 2017), this study’s purpose is the further characterization of the genetic status of the Santa Ana Speckled Dace (SASD) in relation to the nearest regional neighbor populations to provide a complete description of the taxonomy which may aid in the designation of the SASD as a unique taxonomic entity. To this end, it is necessary to include nuclear DNA sequence analysis.

The frequency-based population genetic statistic, Wright’s F-statistic, also known as the fixation index, is used in describing population genetic structure. F-statistics provide tools for comparison of allele frequency ratios of individuals and/or subpopulations to the total population. Because of its slower mutation rate, DNA sequence does not provide the same scale of genetic variation as microsatellite loci; so rather than analyzing allele frequencies, Phi-statistics, an analog of the F-statistic, are used to analyze haplotype diversity (Excoffier,

Smouse, & Quattro, 1992). Haplotypes are determined by the patterns of sequence variations (SNPs) within the nuclear DNA. Using GenAlEx 6.5 (Peakall

& Smouse, 2006, 2012), polymorphic nucleotide positions were identified. Based on the patterns of these polymorphisms, haplotypes within each of the intron alignment sets were distinguished by both watershed and region. The pairwise comparisons of each individual within a region results in a triangular matrix of all

the calculated genetic distance values between each pair of individuals, and consequently each region. PhiPT (ΦPT) quantifies the proportion of the total genetic variance that is due to differences among the subpopulations (P) relative to the total population (T). For our purposes, the analyses were run using the regions; therefore, (P) designates the regions and (T) designates the entire dataset of all four regions.
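To make the definition of ΦPT concrete, the following Python sketch computes it from a matrix of squared pairwise distances and a list of group labels, following the one-factor AMOVA partition of Excoffier et al. (1992). The function name, the input layout, and the toy matrices in the usage note are assumptions for illustration; the sketch does not reproduce GenAlEx's permutation testing or its handling of missing data.

```python
from itertools import combinations


def phi_pt(sq_dist, groups):
    """PhiPT from squared pairwise distances and one group label per individual.

    sq_dist: square, symmetric list of lists (individual x individual).
    groups:  list of labels, e.g. the region of each individual.
    """
    n_total = len(groups)
    names = sorted(set(groups))
    n_groups = len(names)

    # Total sum of squares: all pairs, divided by the number of individuals.
    ss_total = sum(sq_dist[i][j]
                   for i, j in combinations(range(n_total), 2)) / n_total

    # Within-group sum of squares, accumulated group by group.
    ss_within = 0.0
    sizes = []
    for name in names:
        members = [i for i, label in enumerate(groups) if label == name]
        sizes.append(len(members))
        ss_within += sum(sq_dist[i][j]
                         for i, j in combinations(members, 2)) / len(members)

    ss_among = ss_total - ss_within
    ms_among = ss_among / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)

    # n0 corrects for unequal group sizes.
    n0 = (n_total - sum(s * s for s in sizes) / n_total) / (n_groups - 1)

    var_among = (ms_among - ms_within) / n0
    var_within = ms_within
    return var_among / (var_among + var_within)
```

With two groups of two identical individuals separated by a squared distance of 4, all variance lies among groups and the function returns 1; when every pair is equally distant, it returns 0.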

Distance-based methods of analysis generate data by comparing the genetic divergence among the groups of the population. Analysis of Molecular

Variance (AMOVA) (Excoffier et al., 1992) analyzes the divergence of populations, or genetic distance, through the pairwise comparisons of the haplotypes creating matrices of these comparisons and data on the level of molecular variance among and within the groups. AMOVA also provides a pairwise analysis of the levels of migration (Nm) among the identified groups.

Principal Coordinates Analysis (PCoA) (Torgerson, 1958) projects the data onto a small number of uncorrelated axes that capture most of the variation. PCoA, unlike PCA (principal components analysis), operates on the genetic dissimilarities among the sequences, and therefore focuses on the polymorphic nucleotide sites of each alignment. These statistical analyses were performed in GenAlEx 6.5 from the genetic distance matrices of each intron alignment.
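A minimal sketch of classical PCoA (Torgerson scaling) on a pairwise distance matrix is given below, assuming NumPy is available. It shows the general technique only; GenAlEx offers its own options for standardizing the matrix, which are not reproduced here, and the function name and return values are assumptions.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Classical PCoA: return coordinates and the share of variation per axis.

    dist: square, symmetric matrix of pairwise distances (not squared).
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]

    # Double-center the squared distances (Gower centering).
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering

    # Eigen-decompose and sort axes from largest to smallest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Scale eigenvectors by sqrt(eigenvalue); negative eigenvalues are set to 0.
    keep = eigvals[:n_axes].clip(min=0)
    coords = eigvecs[:, :n_axes] * np.sqrt(keep)
    explained = keep / eigvals[eigvals > 0].sum()
    return coords, explained
```

For three points lying on a line at positions 0, 3, and 5, a single axis recovers all pairwise distances, which is a convenient sanity check before applying the function to genetic distance matrices.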

Mantel Tests (Mantel, 1967) compare matrices to determine whether a correlation exists; if so, whether it is positive or negative, and the strength of that correlation. A pairwise genetic distance matrix can be analyzed against a geographic distance matrix to determine if and how much, the genetic distances

that exist between the regions can be explained by the geographic distances between them. Likewise, pairwise genetic variance (ΦPT) matrices can be analyzed against geographic distances to determine whether the level of variation that exists among the regions is in any way correlated to the geographic distances between them. These two analyses are not the same. The first quantifies a “corrected distance” based on the number of base substitutions in a DNA locus. The second quantifies the amount of overall variance. To best fit the purpose of this study, the ΦPT vs. geographic distance comparison was selected as the optimal analysis. Average geographic coordinates in decimal form, for each of the four regions, were entered into GenAlEx. Pairwise geographic distance matrices were calculated for each pair of regions. The genetic variance matrices and geographic distance matrices for each intron were analyzed using a Mantel Test in GenAlEx. The resulting R value defines the level of correlation on a scale of -1 to 1, where R=-1 indicates a completely negative correlation; R=0 indicates no correlation at all; and R=1 indicates a completely positive correlation.
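The logic of the Mantel test can be sketched in Python as follows. This is a generic permutation implementation written for illustration, not GenAlEx's code; the function names, the two-sided p-value, and the default of 999 permutations are assumptions.

```python
import math
import random
from itertools import combinations


def _pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mantel(mat_a, mat_b, permutations=999, seed=0):
    """Correlate the upper triangles of two distance matrices.

    Significance is estimated by permuting the rows and columns of mat_b
    together and counting how often |r| is at least as large as observed.
    """
    n = len(mat_a)
    pairs = list(combinations(range(n), 2))
    a = [mat_a[i][j] for i, j in pairs]
    b = [mat_b[i][j] for i, j in pairs]
    r_obs = _pearson(a, b)

    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        b_perm = [mat_b[order[i]][order[j]] for i, j in pairs]
        if abs(_pearson(a, b_perm)) >= abs(r_obs) - 1e-12:
            hits += 1
    return r_obs, hits / (permutations + 1)
```

One caveat worth keeping in mind when matrices are built from only four regions: there are just 4! = 24 distinct orderings to permute, so the attainable p-values are coarse regardless of how many permutations are requested.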

Phylogeography

Phylogenetic relationships, and the probability supporting them, can be inferred and displayed by way of phylogenetic trees, which are produced by statistical algorithms applied to genetic data. Before computing a phylogenetic tree, a best-fit model of evolution must be determined for each DNA sequence alignment.

The various models of evolution utilize differing criteria on nucleotide substitution rates to infer the most statistically probable evolutionary relationships between

the taxa being analyzed. The best-fit models of substitution, i.e. evolution, for each of the three intron alignments were determined using MEGA X (Kumar et al., 2018).

Methods of statistical inference are also varied. Bayesian inference

(Bayes, 1763) is the preferred method of statistical probability inference.

Bayesian analysis is based not on a single probability but on a combination of probabilities: the prior (initial assumptions about the set conditions), the likelihood (the evidence for each probable condition within a range), and the posterior (the probability after weighing the evidence of all conditions). Bayesian analysis was used to determine phylogenetic trees using the best-fit models of evolution determined by MEGA X.
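The relationship among these quantities can be written compactly with Bayes' theorem. The notation below is introduced here for illustration and is an assumption, not the thesis's own: τ denotes a tree (topology and branch lengths), θ the parameters of the substitution model, and D the sequence alignment.

```latex
P(\tau, \theta \mid D) \;=\; \frac{P(D \mid \tau, \theta)\, P(\tau, \theta)}{P(D)}
```

Here P(τ, θ) is the prior, P(D | τ, θ) is the likelihood of the alignment under a given tree and model, and P(τ, θ | D) is the posterior. Because the denominator P(D) cannot be computed directly for realistic data, programs such as MrBayes approximate the posterior by Markov chain Monte Carlo sampling, and the support value on a branch is the proportion of sampled trees that contain it.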

A MrBayes 3.2.6 (Huelsenbeck & Ronquist, 2001) plugin was installed into

Geneious Prime, allowing trees to be run through a graphical interface using MrBayes programming. All trees run included both Hypophthalmichthys outgroups. Two Rhinichthys species, R. cataractae and R. atratulus, were included in phylogenetic analysis of the S7 ribosomal protein gene intron for the purposes of analyzing the outcome when taking into account samples representing both a common family and a common genus.


CHAPTER THREE

RESULTS

Molecular Methods

Using novel molecular markers for a phylogenetic study results in increased requirements for verification of success. In light of this, presentation of results may be somewhat more in depth than would typically be documented in these types of studies.

Preliminary Testing

To verify the amplicons produced using the published primer sets (Li et al., 2010), which had successfully amplified fish from the same family as Rhinichthys osculus (Cyprinidae), many checks were performed at each step of early testing. The initial decision to use three intron primer pairs

(4174E20, 36298E1, and 19231E4) from the Li study for this research made it necessary to verify the amplicons produced in the Metcalf lab were in fact introns of the same identity. Later, the 19231E4 primer set was replaced with the

S7RPEX1 primer set for amplification of the S7 ribosomal protein gene intron 1.

Consequently, amplification data for the removed intron will not be reported in this thesis.

Once PCR was successful, band length of the R. osculus amplicons was determined to be similar to that of H. molitrix and H. nobilis. Band length for the ccr4-not transcription complex subunit 1, intron 20 (cnot1) was approximately 750 bp for R. osculus while H. molitrix and H. nobilis were 779 bp and 833 bp, respectively. Band length for the hypothetical protein gene intron 1

(hpg) for R. osculus was approximately 350 bp, while the H. molitrix band was

342 bp and H. nobilis was 345 bp (Appendix A, Figures 8 and 9). Initial sequence data were submitted to a GenBank BLAST™ search; for the cnot1 intron sequence, the highest identities with R. osculus were those of H. nobilis (90.28%) and H. molitrix (88.2%). For the hpg intron, the highest identity results were again H. nobilis (86.14%) and H. molitrix (85.95%)

(Appendix B, Table 6). Lastly, a single R. osculus sequence was aligned with the

Genbank sequences for H. molitrix and H. nobilis for both introns revealing high levels of consensus throughout the strand lengths, shown in Appendix C,

Datasets 1 and 2. Collectively this verified that amplification of R. osculus DNA using these primers had resulted in the correct intron sequences.

Sequence Analysis

Sequence data was received electronically from Retrogen, Inc. and

MacrogenUSA. AB1 files were input into Geneious Prime and the forward and reverse contigs were assembled. Assemblies were then aligned. Alignments allowed ambiguity resolutions where base-calling was confident. In those cases where strong support for a base-call was not present, the ambiguous base was left in place.

Basic sequence traits are summarized in Appendix B, Table 7. Of the 55 R. osculus specimens used to amplify the three introns, two were removed from the data set: Mill Creek 4 (ML4) and West Fork San Gabriel River 9 (G9). The ML4 DNA extract was of too poor quality to provide usable sequence. The G9 sequence differed markedly from all other sequences and was excluded because this individual may not be a dace specimen. The most successful sequencing results were for hpg intron 1: all 53 viable samples provided quality sequence that could be aligned. Final trimmed alignment length was 292 bp out of the original 350 bp reads. Pairwise identity among the sequences was 98.0%, with all four bases approximately equal in frequency (21.0%-27.7%). The cnot1 intron 20 final alignment length was 657 bp out of the original 750 bp. Only 45 sequences were of sufficient quality to include in the alignment. Pairwise identity was 97.4% and base frequencies ranged from 19.1% to 30.0%. The s7rp intron 1, although the longest amplified intron at nearly 1100 bp, yielded only 507 bp of aligned sequence from 40 specimens after trimming to the shortest quality read. Pairwise identity was 97.6% and base frequencies were 13.6% (C) and 20.3% (G), with A’s and T’s at an equal frequency of 33.0%. To retain as much sequence length as possible for the concatenated alignments, sample size was reduced to the 34 specimens with the longest sequences, with at least two individuals representing each region. Including more individuals would have required trimming the alignment to the shortest sequence, cutting out some informative sites. In the end, the concatenated alignments were 1493 bp in length with pairwise identity at

97.5%. Base frequencies were 17.8% C, 21.5% G, and 30.4% for both A’s and


T’s. GC content for all sequences was between 33.9% and 45.4%. Transition-transversion data (Appendix B, Table 8) were included with the spreadsheet results of the best-fit model of evolution analysis by MEGA. Only the data for the concatenated sequences are reported, as they include all sequence variation for the three introns.

Population Genetics

Alignments were exported from Geneious Prime in FASTA file format and input into GenAlEx. FASTA input files are listed in Appendix D. FASTA alignments were analyzed and data were generated identifying all polymorphic nucleotide sites among the sequences of an alignment. Polymorphic nucleotide site lists are shown in Appendix C, Data Sets 3, 4, and 5. The patterns of SNPs among the individuals determine the number and traits of each haplotype. Basic haplotype counts are summarized in Appendix B, Table 9. The CNOT1 Intron 20 alignment resulted in 12 haplotypes with 8 of them being unique haplotypes, meaning only one individual represented that haplotype. The HPG Intron 1 alignment produced 15 haplotypes with 12 unique haplotypes. The S7RP Intron 1 alignment produced 18 haplotypes with 13 unique haplotypes. Haplotype specimen assignments for each intron alignment are shown in Appendix B,

Tables 10, 11, and 12. In each instance, the Eastern Sierra dace separated out into individual haplotypes. Based on the haplotype analyses of HPG Intron 1

(Table 11) and S7RP Intron 1 (Table 12), the 7-8 Owens Valley dace were placed into 7-8 haplotypes. The Colorado River dace, in the analyses of both the


CNOT1 Intron 20 (Table 10) and HPG Intron 1, resulted in the eight specimens being separated into four haplotypes. The eight Central Coast dace separated into two (S7RP), three (CNOT1), or four (HPG) haplotypes depending on the intron analyzed. The southern California Santa Ana speckled dace specimens largely clustered into only a few haplotypes. CNOT1 haplotypes include two individuals each in their own unique haplotype with all other individuals (n=26) clustered into one haplotype. The HPG intron haplotype analysis clustered all 29

SASD individuals into one haplotype in addition to five of the eight Colorado

River dace samples. The largest variation occurred in the analysis of the S7RP

Intron, with 23 SASD samples being separated into seven haplotypes, though the majority (n=16) clustered into two haplotypes.
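The way polymorphic sites define haplotypes, and the meaning of a "unique" haplotype, can be sketched in a few lines of Python. The sample names and sequences in the usage note are hypothetical, and the sketch treats a gap as an ordinary character state and does not handle ambiguity codes; it illustrates the idea rather than the GenAlEx procedure.

```python
from collections import defaultdict


def polymorphic_sites(alignment):
    """0-based column indices at which the aligned sequences differ."""
    seqs = list(alignment.values())
    return [i for i in range(len(seqs[0]))
            if len({s[i] for s in seqs}) > 1]


def collapse_haplotypes(alignment):
    """Group individuals that share the same pattern at polymorphic sites.

    alignment: dict mapping sample name -> aligned sequence (equal lengths).
    Returns a dict mapping the SNP pattern -> list of sample names.
    """
    sites = polymorphic_sites(alignment)
    groups = defaultdict(list)
    for name, seq in alignment.items():
        pattern = "".join(seq[i] for i in sites)
        groups[pattern].append(name)
    return dict(groups)


def unique_haplotypes(groups):
    """Haplotypes represented by exactly one individual."""
    return [pattern for pattern, members in groups.items()
            if len(members) == 1]
```

For example, given four hypothetical sequences `{"SA1": "ACGTA", "SA2": "ACGTA", "CO1": "ACGTT", "OV1": "TCGTA"}`, columns 0 and 4 are polymorphic, SA1 and SA2 share one haplotype, and CO1 and OV1 each form a unique haplotype.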

Using the processed sequence data, an AMOVA was run on each set of sequences representing the three introns individually as well as on the concatenation of the three. AMOVA results are summarized in Appendix B, Table

13. All analyses of molecular variance were set at 999 permutations, the DNA was designated as haploid (or binary haploid) to account for it being single copy nuclear DNA (Blyton & Flanagan, 2012), and results were generated from a regional perspective rather than individual populations. Data table shows results based on AR=among regions or WR=within regions, though there are a few instances where inclusion of population level data may occur. Analyses of the individual pairwise genetic distances are consolidated into a regional ΦPT table.

PhiPT values typically indicate the level of separation of the individual populations

to the total (PT). In these analyses, (PT) evaluates the variation of the regions (n=4) relative to the total dataset. All intron AMOVAs estimated levels of molecular variation (ΦPT) at 0.906-0.966. These values are converted to percentages and are illustrated by pie charts of molecular variance for each of the three intron analyses, as well as analysis of the concatenated sequences. These charts are shown in Appendix A, Figures 10-13. The level of separation is, intuitively, inversely related to the level of migration (Nm) occurring between the populations. Consequently, the Nm (haploid) based on the S7RP intron, at 0.052, is higher than in the other three analyses, while the migration value for HPG is

0.031, and CNOT1 and the concatenated analyses both indicate an Nm=0.017.

The P-values for each comparison lend additional support, p ≤ 0.001.
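The inverse relationship between ΦPT and Nm can be made explicit with a short calculation. The sketch below assumes the standard island-model conversion for haploid data, Nm = [(1/ΦPT) - 1]/2; that this is the conversion applied by the software is an assumption here, although it reproduces the reported values closely.

```python
def nm_haploid(phi_pt):
    """Estimated migrants per generation from PhiPT, haploid form.

    Assumes the island-model relationship Nm = ((1 / PhiPT) - 1) / 2.
    """
    return (1.0 / phi_pt - 1.0) / 2.0


# Ends of the reported PhiPT range (0.906-0.966).
nm_low_phi = nm_haploid(0.906)   # about 0.052
nm_high_phi = nm_haploid(0.966)  # about 0.018
```

The lower end of the reported ΦPT range gives Nm of about 0.052, in agreement with the highest reported migration value, and the upper end gives about 0.018, close to the reported 0.017; the small difference is consistent with ΦPT having been rounded to three decimals before reporting.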

Breakdowns of regional pairwise ΦPT and migration data are included in

Appendix B, Tables 14-17. The upper portion of each table displays the pairwise

ΦPT and p-values. The lower portion shows the migration values. Further analyses of these datasets through Principal Coordinates Analysis (PCoA), for each set of intron data as well as the concatenated data, are shown in Appendix A, Figures

14-17. For the CNOT-1, S7RP, and the concatenated sequence representations, the individuals are clustered together based on their genetic relatedness which also coincides with their geographic identification. In each case, the Colorado dace did cluster closely, if not directly on top of, the SASD. In order to parse out a more accurate relationship between them, an additional PCoA was performed using the concatenated sequence data of only the HPG and CNOT-1 introns.


This enabled me to include all eight Colorado dace represented in the study as all eight sequenced successfully for these two introns whereas only two of the eight successfully sequenced on the S7RP intron. Appendix A, Figure 22 shows an analysis of the SASD, Colorado River, and Central Coast dace populations providing a more precise distinction between the regions.

In GenAlEx, pairwise geographic distance matrices were calculated for each data set upon input of geographic coordinates correlated to the individuals of each region. Geographic coordinates were triangulated to create a central point representing the region. Average geographic coordinates are listed in

Appendix B, Table 18.
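The steps of averaging coordinates into one point per region and then computing pairwise distances can be sketched as follows. The arithmetic mean of latitude and longitude, the haversine great-circle formula, and the mean Earth radius of 6371 km are assumptions made for this illustration; GenAlEx's internal distance calculation may differ, and no coordinates from Appendix B, Table 18 are reproduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius, an approximation
KM_PER_MILE = 1.609344


def centroid(points):
    """Arithmetic mean of (latitude, longitude) pairs in decimal degrees."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return sum(lats) / len(lats), sum(lons) / len(lons)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2)
         * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def distance_matrix_miles(centroids):
    """Symmetric matrix of pairwise distances in miles between region centroids."""
    return [[haversine_km(a, b) / KM_PER_MILE for b in centroids]
            for a in centroids]
```

A simple mean of coordinates is adequate at this scale because the sites within a region are close together relative to the distances between regions; for widely scattered sites a spherical mean would be more appropriate.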

Mantel tests analyzing the correlation of the level of genetic variation

(ΦPT), derived in the AMOVA, to average geographic distances of each region, are shown in Appendix A, Figures 18-21. In each analysis, for the three introns independently as well as the concatenation of the three, the data show little to no correlation between geographic distance and the level of genetic variation existing in each region. In the Mantel tests for the HPG, S7RP, and tri-concatenation, the slopes are nearly horizontal to slightly negative with R-values of 0.055, -0.183, and -0.382, respectively. In the case of the CNOT-1 analysis, the slope is evidently negative with an R-value of -0.4605.

Phylogeography

In every formulation, the Bayesian analyses inferred reciprocally monophyletic phylogenetic relationships between the SASD/Colorado clades and

the Central Coast/Eastern Sierra clades with 100% support for that branch node in the lineage. In every variation of the analyses, the haplotypes representing the individuals physically collected from a geographic location were analytically grouped into the genetic clade corresponding to their geographic location.

Analyses were run using individual sequence data as well as haplotype sequence data, and all verify that the SASD and Colorado River dace are more closely related to each other than to either the Central Coast or Eastern Sierra dace. The SASD/Colorado branch hierarchy consensus among all phylogenetic trees (Appendix A, Figures 23-26) illustrates the likelihood that the Speckled dace colonized the Colorado River region first and that a subsequent divergence event led to the Speckled dace that colonized Southern California, as proposed by Smith and Dowling (2008).


CHAPTER FOUR

DISCUSSION

Molecular Markers

My foray into identifying novel introns within the DNA of Rhinichthys osculus for the purposes of phylogenetic analyses was successful, as evidenced by the amplification of sequence using multiple primer sets designed for fish within the same family, Cyprinidae. This success is verified not only by the resulting PCR amplification of DNA segments but also by the comparable length of the PCR bands to those of Li et al. (2010) and by the GenBank BLAST of, and subsequent alignment with, the R. osculus sequences, which showed high identity values to the sequences uploaded to GenBank by Li et al.

Most important are the resulting data trends, which provided concrete support for previous mitochondrial and microsatellite analyses, both of which are highly regarded as trusted sources of molecular data in phylogenetic studies. This is not to state that any intron sequence would be useful in these types of studies, but the same can be said of some mtDNA sequences as well as microsatellite loci.

It is important for intron markers to be located within genes that are conserved among and/or within species but, conversely, not so conserved as to preclude mutation events. The intron markers utilized in this study had sufficient sequence consensus but also enough nucleotide site variability to allow for the identification of species as well as geographic population identification, to a degree. The polymorphic nucleotide site tables reveal several regional trends, such as the 8-base gap that exists in every included sample of the Colorado River and SASD sequences of the HPG intron (Appendix C,

Dataset 4). Regional patterns of base substitutions appear in multiple sites, for example sites 449-454 of the CNOT1 intron alignment (Appendix C, Dataset

3a/b). These represent the types of mutations that take place over millions of years between isolated populations of a species.

Population Genetics

The aforementioned population structure identified by the intron data is solidly supported by the various forms of population genetic analyses performed in this study. When trying to answer the question of whether there is evidence of geographic isolation between the regional populations, we look for either the presence or absence of genetic patterns unique to one or each of the regions, in the form of indels and/or substitutions, that tell us whether the individuals of each regional population are showing any indication of migration between populations.

High migration rates (Nm) would signify there is still connectivity between the populations, and we should expect less sequence variation. Populations with smaller ranges, less ability to move between habitat locations, and/or highly fragmented habitats, would be expected to have low levels of gene flow, i.e. migration (Hastings & Harrison, 1994).


The analyses of molecular variance (AMOVA) completed for each of the individual intron sequence alignments, as well as the concatenated sequence alignment, strongly indicate there is no current migration and there has not been for a substantial amount of time. The overall regional PhiPT values for every intron data analysis exceed 0.90, and the inversely related Nm values all fall between 0.01 and 0.06. As one would expect of populations experiencing long periods of separation and genetic isolation, genetic variance is extremely high while migration is correspondingly low. Pairwise comparisons of these values vary slightly from the overall values but maintain the same trend of high genetic variance and low migration. Equally supportive are the pie charts (Appendix A, Figures 10-13) illustrating that the vast majority of the genetic variance is accounted for by the differences among regions rather than the tributaries within the regions or the individuals within the tributaries. This also makes sense when considering the separation of the populations over time.

Principal Coordinates Analysis (PCoA) lends further support to the conclusion that these populations are distinct genetically, which correlates to their geographic distinction. In this case, geography is simply a mark of location rather than distance.

Isolation by Distance (IBD), whereby the level of genetic variance is explained by the measured distance between the populations, is often the explanation for genetic differentiation (Wright, 1938).

This is not the case here. The Mantel Tests (Appendix A, Figures 18-21) illustrate

that there is little to no correlation between the geographic distances separating the regions and the level of genetic variance. This makes sense in light of the evolutionary trajectory hypothesized by Smith and Dowling. If we consider that the Santa Ana Speckled Dace came to occupy the Los Angeles basin via the

Colorado River in a divergence event estimated at approximately 1.9 Ma, this would mean that dace of the Colorado River would be genetically more closely related to the SASD yet distance-wise, they are much farther apart than the

SASD are from the Central Coast dace. The geographic distance table (Appendix B,

Table 18) shows the geographic distance between the SASD and Colorado River regions to be approximately 520 miles while the SASD region is separated from the Central Coast region by approximately 296 miles. Despite this, genetic variance between SASD and Colorado dace is lower than that between the

SASD and Central Coast dace. The geographic distance appears to have no influence on the level of variance. Rather, historical dispersal and biogeographic variables played a much larger role as described below.

Phylogeography

The Bayesian analyses further support the proposition that genetic variance is explained not by proximity but by the hypothesized path of historical migration and occupation of the areas (Smith & Dowling, 2008): they clearly display evidence of the relationship between the Colorado River dace and the Santa Ana Speckled Dace. As previously stated, each form of the analyses resulted in a reciprocally monophyletic branching of the SASD from the Central


Coast/Eastern Sierra clade. Additionally, in each run the Colorado dace, 100% of the time, branched out with the SASD.

Hydrographic History of Connectivity

Many hypotheses have been put forth over the last century about the path taken by R. osculus to its current regional habitats. These hypotheses go hand in hand with hypotheses regarding historic hydrographic connectivity. It would seem that the most likely explanation for the dace having taken up residence in southern California is that Rhinichthys species, noted to have been present in the Snake River during the Pliocene (Smith & Dowling, 2008), were able to expand their range via the Snake River connection to the Lahontan and Columbia River basins (Hubbs & Miller, 1948), then to the Northern California Owens Valley, before diverging west to the Central Coast area and southward to the Los Angeles basin. However, my genetic data do not support this trajectory. If this were the case, genetic variance among these three regions would be much lower, indicating closer relatedness. Additionally, divergence estimation based on genetic data indicates that the divergence of the Northern California (i.e., Owens Valley) dace from those of the Bonneville and Columbia Basins occurred at about the same time as the Bonneville Basin drainage into the Snake River, leading to dace occupation of the Colorado River (Smith, Morgan, & Gustafson, 2000).

Alternative explanations include inferences from geologic evidence of connectivity between the Mojave River basin and the Colorado River. The Mojave River headwaters start in the San Bernardino Mountains, part of the Transverse Ranges, and the river flows east to the desert, terminating in Silver and Soda Lakes (Williamson, 1853). The Mojave River Basin included the Pleistocene lakes Harper Lake, Lake Manix, and Mojave Lake. Geologic evidence connects these lakes, indicating that they existed during the same time periods (Enzel, Wells, & Lancaster, 2003). Additionally, it has been proposed that the Mojave River and the lower Colorado River were connected by a series of overflow events from Troy Lake into Bristol Lake, Cadiz Lake, Danby Lake, and finally into the southern reaches of the Colorado River (Blackwelder, 1954). It is conceivable that this provided a pathway for Pleistocene Rhinichthys to migrate up through the series of lakes into the Mojave River mainstem and eventually to the headwaters in the San Bernardino Mountains. Present-day dace are quite adept at navigating shallow river systems and can move upstream over or around small barriers between pools (Moyle, 2002; personal observation, 2010).

Collectively, these pieces of information lend support to Smith and Dowling’s genetic evidence suggesting that the SASD and the Gila/Salt River dace experienced a divergence event approximately 1.9-1.7 Ma. This event followed an earlier divergence of the Upper and Lower Colorado River segments from the Pacific Northwest regions approximately 3.6 Ma, which in turn followed the original appearance of Rhinichthys species in the Snake River and the consequent divergence of R. osculus from its sister species approximately 6.3 Ma.

Conservation Implications

“Biodiversity is the totality of all inherited variation in the life forms of Earth, of which we are one species. We study and save it to our great benefit. We ignore and degrade it to our great peril.” (E. O. Wilson, n.d.).

In light of the evidence described in this study, obtained through novel intron sequence analysis, and the evidence provided by my graduate student predecessors, Stacey Nerkowski, Jay VanMeter, and Pia VanMeter, I propose that the Rhinichthys osculus populations of Southern California, the Santa Ana Speckled Dace, show more than sufficient genetic distinctness from the most proximal dace populations to be considered a unique taxon at the species level. Not only have the SASD been shown to be reciprocally monophyletic with respect to the Eastern Sierra and Central Coast populations based on mitochondrial DNA analyses, but I have also shown the same based on nuclear intron DNA analyses.

The Santa Ana Speckled Dace are a unique taxon that has experienced a very long period of reproductive isolation amid seasonally chaotic environmental conditions. They are an anomaly in the evolutionary trajectory of the Rhinichthys species. The Santa Ana Speckled Dace should be afforded federal protection under the Endangered Species Act because populations are declining rapidly due to anthropogenic forces and increased drought, fires, and floods. We must not ignore the need to protect this small but important source of biodiversity endemic to the inland waters of Southern California.

APPENDIX A

FIGURES

FIGURE 1: Geographic subdivisions of California outlining the CA Floristic Province from http://ucjeps.berkeley.edu/cguide.html#Map

FIGURE 2: California map of mountain ranges and valleys illustrating placement of the ranges included in this study, the Coast Ranges, Transverse Ranges, and Peninsular Ranges.

This image is in the public domain in the United States because it only contains materials that originally came from the United States Geological Survey, an agency of the United States Department of the Interior.

FIGURE 3: Rhinichthys osculus range throughout the western United States.

http://explorer.natureserve.org/servlet/NatureServe?searchName=Rhinichthys%20osculus

FIGURE 4: Rhinichthys osculus adult specimens. The laterally positioned individual best displays the signature speckled phenotype. Dorsal and lateral views illustrate its subterminal mouth.

FIGURE 5: Diagram B displays reciprocal monophyly for haplotype a and haplotype b while diagram A displays populations that are paraphyletic (Kizirian & Donnelly, 2004).

FIGURE 6: Basic illustration of a gene with exon and intron segments color-coded through each step of mRNA transcription and processing. The intervening intron segments are the sources of the DNA used in this study.

https://www.britannica.com/science/transcription-genetics/media/602486/114928

FIGURE 7: Rhinichthys osculus range map throughout California. Sampling locations within California and Arizona are designated by red circles. 1: Eastern Sierra (Owens Valley), 2 & 3: Central Coast (Santa Maria and San Luis Obispo Rivers), 4, 5, & 6: southern California (San Gabriel, Santa Ana, and San Jacinto Rivers), 7: Colorado River (Grand Canyon), and 8: Colorado River (Sonoita Creek).


Map created on https://databasin.org.


FIGURE 8: Gel electrophoresis verification of band length for the hpg intron.

[Gel image. Locus 36298E1, band at ~350 bp. Ladder bands labeled 1000, 500, and 250. Gel dated 01/21/2014.]

FIGURE 9: Gel electrophoresis verification of band length for the cnot1 intron.

[Gel image. Ladder bands labeled 1000, 750, 500, and 250.]

FIGURE 10: Percentages of molecular variance within and among the four regions as a result of AMOVA analysis of CNOT1 Intron 20 sequence data.

[Pie chart: % Molecular Variance, CNOT1 Intron 20. Slice labels: 3%, 97%. Legend: Among Regions, Within Regions.]

FIGURE 11: Percentages of molecular variance within and among the four regions as a result of AMOVA analysis of HPG Intron 1 sequence data.

[Pie chart: % Molecular Variance, HPG Intron 1. Slice labels: 6%, 94%. Legend: Among Regions, Within Regions.]

FIGURE 12: Percentages of molecular variance within and among the four regions as a result of AMOVA analysis of S7RP Intron 1 sequence data.

[Pie chart: % Molecular Variance, S7RP Intron 1. Slice labels: 9%, 91%. Legend: Among Regions, Within Regions.]

FIGURE 13: Percentages of molecular variance within and among the four regions as a result of AMOVA analysis of the concatenated sequences of the three introns.

[Pie chart: % Molecular Variance, Concatenation. Slice labels: 3%, 97%. Legend: Among Regions, Within Regions.]

FIGURE 14: Principal Coordinates Analysis (PCoA) of genetic dissimilarity of CNOT1 Intron 20 DNA sequence (4 regions, 45 samples, 14.2% polymorphic nucleotide sites [PNS]).

[Scatter plot: Principal Coordinates (PCoA), CNOT1 Intron 20. Axes: Coord. 1, Coord. 2. Legend: Central Coast, Owens Valley, CO River, SASD.]

FIGURE 15: Principal Coordinates Analysis (PCoA) of genetic dissimilarity of HPG Intron 1 DNA sequence (4 regions, 53 samples, 7.2% PNS).

[Scatter plot: Principal Coordinates (PCoA), HPG Intron 1. Axes: Coord. 1, Coord. 2. Legend: Central Coast, Owens Valley, CO River, SASD.]

FIGURE 16: Principal Coordinates Analysis (PCoA) of genetic dissimilarity of S7RP Intron 1 DNA sequence (4 regions, 40 samples, 9.1% PNS).

[Scatter plot: Principal Coordinates (PCoA), S7RP Intron 1. Axes: Coord. 1, Coord. 2. Legend: Central Coast, Owens Valley, CO River, SASD.]

FIGURE 17: Principal Coordinates Analysis (PCoA) of genetic dissimilarity of three concatenated intron sequences (4 regions, 34 samples, 11% PNS).

[Scatter plot: Principal Coordinates (PCoA), 3 Intron Concatenation. Axes: Coord. 1, Coord. 2. Legend: Central Coast, Owens Valley, CO River, SASD.]

FIGURE 18: Mantel Test of Correlation between Genetic (ΦPT) and Geographic Regional Distances for CNOT1 Intron 20 for four regions.

[Scatter plot: Regional GGD x ΦPT, CNOT1. x-axis: Geographic Distance; y-axis: Regional PhiPT. Trend line: y = -0.0001x + 1.009; R² = 0.2121; R = -0.4605.]

FIGURE 19: Mantel Test of Correlation between Genetic (ΦPT) and Geographic Regional Distances for HPG Intron 1 for four regions.

[Scatter plot: Regional GGD x ΦPT, HPG. x-axis: Geographic Distance; y-axis: Regional PhiPT. Trend line: y = 7E-05x + 0.75; R² = 0.003; R = 0.055.]

FIGURE 20: Mantel Test of Correlation between Genetic (ΦPT) and Geographic Regional Distances for S7RP Intron 1 for four regions.

[Scatter plot: Regional GGD x ΦPT, S7RP. x-axis: Geographic Distance; y-axis: Regional PhiPT. Trend line: y = -6E-05x + 0.8899; R² = 0.0335; R = -0.183.]

FIGURE 21: Mantel Test of Correlation between Genetic (ΦPT) and Geographic Regional Distances for three concatenated sequences for four regions.

[Scatter plot: Regional ΦPT x GGD, Concatenation. x-axis: Geographic Distance; y-axis: Regional PhiPT. Trend line: y = -0.0001x + 1.0001; R² = 0.146; R = -0.382.]

FIGURE 22: PCoA of concatenated sequence data of two introns (cnot1, hpg) and three regions: CC.CA (n=7), CO.AZ (n=8), SASD (n=29). This analysis included all eight Colorado River samples.

[Scatter plot: Principal Coordinates (PCoA), 3 Regions. Axes: Coord. 1, Coord. 2. Legend: CC.CA, SASD, CO.AZ.]

FIGURE 23: Phylogenetic tree of concatenated sequences (n=34) inferred using the MrBayes plugin within Geneious.

[Tree image. Clade labels: Eastern Sierra Region (n=2); Central Coast Region (n=7); SASD: Southern California Region (n=23); CO River (AZ) Region (n=2). Panel text: Concatenation of 3 intron sequences: cnot1-hpg-s7rp. Bayesian Tree Model: GTR+G. Outgroups: H. molitrix & H. nobilis.]

Geneious 2019.1 created by Biomatters. Available from https://www.geneious.com

FIGURE 24: Phylogenetic tree of CNOT1 Intron 20 (n=45) with two outgroups, inferred using the MrBayes plugin within Geneious 2019. The best-fit substitution model was taken from the models determined for each sequence segment, listed in Table 19, Appendix B.

[Tree image. Clade labels: Eastern Sierra Watershed Region (n=2); SASD: Southern California Region (n=28); CO River (AZ) Region (n=8); Central Coast Watershed Region (n=7). Panel text: CCR4-not 1 transcription complex subunit gene, Intron 20. Bayesian Tree Model: HKY. Outgroup: H. molitrix, H. nobilis.]

Geneious 2019.1 created by Biomatters. Available from https://www.geneious.com

FIGURE 25: Phylogenetic tree of HPG Intron 1 (n=53) with two outgroups, inferred using the MrBayes plugin within Geneious 2019. The best-fit substitution model was taken from the models determined for each sequence segment, listed in Table 19, Appendix B.

[Tree image. Clade labels: Central Coast Watershed Region (n=8); Eastern Sierra Watershed Region (n=8); SASD: Southern California Region (n=29); CO River (AZ) Region (n=8)*. Panel text: Hypothetical Protein Gene, Intron 1. Bayesian Tree Model: HKY. Outgroup: H. molitrix, H. nobilis. #s at nodes are bootstrap values.]

Geneious 2019.1 created by Biomatters. Available from https://www.geneious.com

FIGURE 26: Phylogenetic tree of S7RP Intron 1 (n=40) with two outgroups, inferred using the MrBayes plugin within Geneious 2019. The best-fit substitution model was taken from the models determined for each sequence segment, listed in Table 19, Appendix B.

[Tree image. Clade labels: Central Coast Watershed Region (n=8); Eastern Sierra Watershed Region (n=7); SASD: Southern California Region (n=23); CO River (AZ) Region (n=2). Panel text: S7 Ribosomal Protein Gene, Intron 1. Bayesian Tree Model: GTR+G. Outgroup: H. molitrix/H. nobilis.]

Geneious 2019.1 created by Biomatters. Available from https://www.geneious.com

APPENDIX B

TABLES


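TABLE 1: Rhinichthys osculus specimen information utilized in this study.

Watershed | Mountain Range Headwaters | Tributary | Specimen ID Code | Sample Size (n) | Successful Sequencing Reactions
San Jacinto River | San Jacinto | Indian Creek | IN | 3 | 8
Santa Ana River | San Bernardino | City Creek | CC, CF | 3 | 9
Santa Ana River | San Bernardino | Mill Creek | ML | 3 | 3
Santa Ana River | San Bernardino | Plunge Creek | PC | 3 | 7
Santa Ana River | San Bernardino | Twin Creek | T | 3 | 9
Santa Ana River | San Bernardino | Cajon Creek | CJ | 3 | 9
Santa Ana River | San Bernardino | Lytle Creek | LC | 3 | 9
San Gabriel River | San Gabriel | Cattle Canyon | CT | 2 | 5
San Gabriel River | San Gabriel | East Fork SGR | E | 1 | 3
San Gabriel River | San Gabriel | Fish Canyon | F | 1 | 3
San Gabriel River | San Gabriel | North Fork SGR | N | 2 | 6
San Gabriel River | San Gabriel | West Fork SGR | G | 2 | 3
Los Angeles River | | Haine River | H | 2 | 6
Owens River | Sierra Nevada | Marvin’s Marsh | M | 3 | 6
Owens River | Sierra Nevada | Pine Creek | P | 5 | 11
San Luis Obispo River | Santa Lucia | Brizziolari Creek | BZ | 1 | 3
San Luis Obispo River | Santa Lucia | San Luis Obispo | SLO | 2 | 6
San Luis Obispo River | Santa Lucia | Stenner Creek | ST | 1 | 3
Santa Maria River | Coast Range, Los Padres National Forest | Cuyama River | CY | 1 | 2
Santa Maria River | Coast Range, Los Padres National Forest | Davy Brown Creek | D | 1 | 3
Santa Maria River | San Rafael | Manzana Creek | MZ | 1 | 3
Santa Maria River | San Rafael | Sisquoc River | SS | 1 | 3
Colorado River via Gila River | | Sonoita Creek | S | 1 | 3
Colorado River | Rocky Mtns | Colorado River (Grand Canyon) | CR | 7 | 15
Total | | | | 55 | 138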

TABLE 2: Complete list of intron primer sequences tested on R. osculus samples in the Metcalf lab. Shaded rows signify the three intron primer sets chosen for this study.

Intron Primer ID | Gene Description | Primer Sequences (5’-3’)
59107E2 | UPF0027 protein homolog | F: GGAGATGGGYGTGGACTGGTCYCT; R: ATTGTAGATCTCVTCCACCACCTGRAT
55378E1 | Peroxisome proliferator | F: ATGARGAAAATGAGGCCAACTTGCT; R: GCCACCTGKGTATTGATTATAGCTGAG
55305E1 | Ret proto-oncogene | F: CCTAGTGGACTGTARTAACGCCCCYCT; R: AAGCCATCCAGTTTGCATAAACACTATC
36298E1 | Hypothetical protein gene LOC415169 (hpg), intron 1 | F: GATCCTGAGGGAYTCCCAYGGTGT; R: GGGCCAGGACTCTCYTGGTCTTGTAGT
25073E1 | 60S ribosomal protein L18a | F: GTACTCTCKGTACATGTTGTGRGTKCC; R: GAAGGTGAARAACTTTGGBATCTGG
19231E4 | Spectrin alpha 2 | F: CGGARGACTACGGACGTGATTTGAC; R: CTCCYTCCAGTGSTCCACAAACT
14867E1 | 60S ribosomal protein L8 | F: CCACAARTACAAGGCCAAGAGRAACTG; R: GTTCTCCTTSTCCTGSACGGTCTT
8680E3 | Karyopherin (importin) beta 1 | F: GGAGGAGARTTYAAGAAGTAYCTGGACAT; R: CSCCCTTCAGGCCCTGGATGAT
4174E20 | CCR4-NOT trxn complex subunit 1 (cnot1), intron 20 | F: CTYTCGCTGGCTTTGTCTCAAATCA; R: CTTTTACCATCKCCACTRAAATCCAC
1777E4 | Nucleoporin 155 | F: AGGAGYTGGTGAACCAGAGCAAAGC; R: AGATCRGCCTGAATSAGCCAGTT

Primers courtesy of (Li, Riethoven, & Ma, 2010)

S7RPEX1 | S7 Ribosomal Protein (s7rp), intron 1 | F: TGGCCTCTTCCTTGGCCGTC; R: AACTGTCTGGCTTTTCGCC

Primers courtesy of (Chow & Hazama, 1998)

TABLE 3: Optimized PCR Protocols for EPIC primer sets.

Intron Loci | 36298E1 | 4174E20 | S7
~bp length amplified | ~350 | ~750 | ~1100

PCR Recipe (uL per reaction)
Template DNA | 1 | 1 | 1
10 uM Forward Primer | 1 | 1 | 1
10 uM Reverse Primer | 1 | 1 | 1
DreamTaq™ | 0.25 | 0.3 | 0.3
10 mM premixed dNTPs | 1 | 1 | 1
10x DreamTaq™ Buffer* | 5 | 5 | 5
Sterile H2O | 40.75 | 40.7 | 40.7

*10x Buffer contains 20 mM MgCl2

Amplification Protocol (on Eppendorf Mastercycler Gradient 5331)
Initial Denature | 94°C, 2 mins | 95°C, 3 mins | 95°C, 3-5 mins
Denature | 94°C, 45 sec | 95°C, 30 sec | 95°C, 30 sec
Anneal | 55°C, 45 sec | 60°C, 30 sec | 57°C, 60 sec
Elongate | 72°C, 2.5 mins | 72°C, 60 sec | 72°C, 2 mins
# of Cycles | 30 | 35 | 35
Final Extension | 72°C, 5 mins | 72°C, 5 mins | 72°C, 10 mins
Cool Down | 4°C ∞ | 4°C ∞ | 4°C ∞

TABLE 4: Published Genbank Accession Numbers and base pair lengths of outgroup samples.

Intron Information [Gene Name, Accession #, Length (bp)]
Fish Species | CCR4-NOT trxn complex subunit | Hypothetical protein LOC415169 | S7 Ribosomal protein
H. molitrix | HM012488 [1], 779 bp | HM012562 [1], 342 bp | AY325778 [2], 589 bp
H. nobilis | HM012489 [1], 833 bp | HM012563 [1], 345 bp | MH938839.1 [3], 710 bp
R. atratulus | | | GU134262 [4], 815 bp
R. cataractae* | | | KF640208.1 [5], 791 bp

[1] (Li et al., 2010); [2] (He et al., 2008a); [3] (Stepien et al., 2019); [4] (Bufalino & Mayden, 2010); [5] (Kim & Conway, 2014)

*Attempts were made to acquire specimens of R. cataractae from sources that have utilized this species in an effort to include a more closely related outgroup for all three intron markers, but none were accessible.

TABLE 5: Internal primers designed from alignments of a subset of successful sequencing reactions. The primer number indicates the base location along the alignment where the primer begins.

Internal Primer ID | Segment length | Primer Sequences (5’-3’)

S7 ribosomal protein gene (s7rp), Intron 1
S7RP-31F / S7RP-575R | 544 bp | F: TAGAGGTGAGTCTAGTGAATGTGCC; R: ACAGGTAAGCTAGGTGACATGC
S7RP-97F / S7RP-724R | 627 bp | F: TATTTACCTCCACGCATGAGCTTC; R: CCGTCAGGTCATAACATTACGCAC
S7RP-554F / S7RP-800R | 246 bp | F: GCATGTCACCTAGCTTACCTGT; R: TAASCCTCACTTTGCTCCAAACC

CCR4-NOT transcription complex subunit 1 gene (cnot1), Intron 20
Cnot-209F / Cnot-651R | 442 bp | F: CTACAGAGCCAGCCAGCAAG; R: CACTCTTGACACGACACACAAC
Cnot-630F / Cnot-837R | 207 bp | F: GTTGTGTGTCGTGTCAAGAGTG; R: CAGCGTAATAAATGCCGGTCTG

(LOC415169) Hypothetical protein gene (hpg), Intron 1
Hpg-156F* | ~200 bp | F: AAGGCTGTTGCTGTGAGGAAG
Hpg-201R* | ~200 bp | R: TTACCTTTCTGTTCCTTTCCAAGTG

*Due to the short length of the hpg intron, internal primers were paired with the complement primer of the original pair, i.e. 36298E1F/hpg-201R and hpg-156F/36298E1R.

TABLE 6: NCBI Genbank BLAST® score data for one forward sequence representing each EPIC primer pair’s PCR amplification of R. osculus. The “% identity” column represents the percentage of matching sequence.

Request ID PR4391H7014. R. osculus ID: CJ5-1771124F-1777 (Cajon Creek Sample #5; Primer Locus 1777E4; PCR Date 11_24_2014)
Sequence ID|Accession # | Description | Ident (%) | Ident | E Value | Max Score
gi|295821984|gb|HM012513.1| | H. nobilis nucleoporin 155 (nup155) gene, exons 29, 30 | 76.09 | 184 | 3.00E-25 | 127
gi|295821982|gb|HM012512.1| | H. molitrix nucleoporin 155 (nup 155) gene, exons 29, 30 | 75.54 | 184 | 4.00E-24 | 123
gi|295821988|gb|HM012515.1| | D. rerio nucleoporin 155 (nup155) gene, exons 29, 30 | 81.43 | 70 | 9.00E-07 | 66.2

Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

Request ID PRRZ3VR6014. R. osculus ID: E1-4741211F-4174 (Fish Creek Sample #1; Primer Locus 4174E20; PCR Date 12_11_2014)
Sequence ID|Accession # | Description | Ident (%) | Ident | E Value | Max Score
gi|295821936|gb|HM012489.1| | H. nobilis CCR4-NOT transcription complex subunit 1 (cnot1) gene, exons 2, 3 | 90.28 | 216 | 2.00E-69 | 274
gi|295821934|gb|HM012488.1| | H. molitrix CCR4-NOT transcription complex subunit 1 (cnot1) gene, exons 2, 3 | 88.2 | 178 | 7.00E-49 | 206
gi|295821942|gb|HM012492.1| | D. rerio CCR4-NOT transcription complex subunit 1 (cnot1) gene, exons 2, 3 | 85.71 | 175 | 4.00E-41 | 180

Request ID PRT4KRF1015. R. osculus ID: F1-505426F-55305 (Fish Creek Sample #1; Primer Locus 55305E1; PCR Date 04_26_2014)
Sequence ID|Accession # | Description | Ident (%) | Ident | E Value | Max Score
gi|295822014|gb|HM012528.1| | H. nobilis ret proto-oncogene (ret1) gene, exons 18, 19 | 92.3 | 714 | 0 | 1014
gi|295822012|gb|HM012527.1| | H. molitrix ret proto-oncogene (ret1) gene, exons 18, 19 | 91.92 | 718 | 0 | 1005
gi|295822020|gb|HM012531.1| | D. rerio ret proto-oncogene (ret1) gene, exons 18, 19 | 89.71 | 243 | 1.00E-77 | 302

Request ID PRTGWG68014. R. osculus ID: F1-578515F-55378 (Fish Creek Sample #1; Primer Locus 55378E1; PCR Date 05_15_2014)
Sequence ID|Accession # | Description | Ident (%) | Ident | E Value | Max Score
gi|295822000|gb|HM012521.1| Range 1 | H. nobilis peroxisome proliferator activated receptor gamma gene, exons 3, 4 | 93 | 543 | 0 | 789
gi|295822000|gb|HM012521.1| Range 2 | H. nobilis peroxisome proliferator activated receptor gamma gene, exons 3, 4 | 79.16 | 451 | 2.00E-75 | 294
gi|295821998|gb|HM012520.1| Range 1 | H. molitrix peroxisome proliferator activated receptor gamma gene, exons 3, 4 | 92.82 | 543 | 0 | 784
gi|295821998|gb|HM012520.1| Range 2 | H. molitrix peroxisome proliferator activated receptor gamma gene, exons 3, 4 | 79.6 | 446 | 6.00E-80 | 309
gi|295822004|gb|HM012523.1| | D. rerio peroxisome proliferator activated receptor gamma gene, exons 3, 4 | 86.33 | 490 | 7.00E-144 | 521

Request ID PRUEJHGC014. R. osculus ID: T2-111216F-19231E (Twin Creek Sample #2; Primer Locus 19231E4; PCR Date 12_16_2013)
Sequence ID|Accession # | Description | Ident (%) | Ident | E Value | Max Score
gi|295822054|gb|HM012548.1| | H. nobilis spectrin alpha 2 (spna2) gene, exons 38, 39 | 87.74 | 261 | 1.00E-77 | 300
gi|295822052|gb|HM012547.1| | H. molitrix spectrin alpha 2 (spna2) gene, exons 38, 39 | 86.59 | 261 | 3.00E-73 | 285
gi|295822060|gb|HM012551.1| | D. rerio spectrin alpha 2 (spna2) gene, exons 38, 39 | 79.69 | 261 | 1.00E-36 | 163

Request ID PRUUBP13015. R. osculus ID: T2-381216F-36298E1F (Twin Creek Sample #2; Primer Locus 36298E1; PCR Date 12_16_2013)
Sequence ID|Accession # | Description | Ident (%) | Ident | E Value | Max Score
gi|295822084|gb|HM012563.1| | H. nobilis hypothetical protein gene, exons 4, 5 | 86.14 | 303 | 4.00E-82 | 315
gi|295822082|gb|HM012562.1| | H. molitrix hypothetical protein gene, exons 4, 5 | 85.95 | 299 | 1.00E-81 | 313
gi|295822088|gb|HM012565.1| | D. rerio hypothetical protein gene, exons 4, 5 | 82.24 | 304 | 4.00E-62 | 248

Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000), "A greedy algorithm for aligning DNA sequences", J Comput Biol 2000; 7(1-2):203-14.

Reference - database indexing: Aleksandr Morgulis, George Coulouris, Yan Raytselis, Thomas L. Madden, Richa Agarwala, Alejandro A. Schäffer (2008), "Database Indexing for Production MegaBLAST Searches", Bioinformatics 24:1757-1764.

TABLE 7: Summary of nucleotide statistics.

Nucleotide Statistics | ccr4-not1 transcription complex subunit 1 | Hypothetical protein gene | S7 ribosomal protein gene | Concatenated Sequences
Intron # | 20 | 1 | 1 | NA
Final Aligned Length | 657 bp | 292 | 507 | 1493
# Sequences | 45 | 53 | 40 | 34
Identical Sites | 564 (85.8%) | 271 (92.8%) | 461 (90.9%) | 1340 (89.8%)
Pairwise Identity | 97.4% | 98.0% | 97.6% | 97.5%
Base Frequency: A | 8326 (29.7%)* | 4197 (27.7%)* | 6550 (33.0%) | 14866 (30.4%)
Base Frequency: C | 5347 (19.1%) | 3178 (21.0%) | 2703 (13.6%) | 8688 (17.8%)
Base Frequency: G | 5959 (21.2%) | 3706 (24.4%) | 4020 (20.3%) | 10502 (21.5%)
Base Frequency: T | 8432 (30.0%) | 4079 (26.9%) | 6544 (33.0%) | 14844 (30.4%)
Base Frequency: Amb. | 1 (0.0%)* | 4 (0.0%) | 12 (0.0%) | 9 (0.0%)
GC Content | 11306 (40.3%) | 6884 (45.4%) | 6723 (33.9%) | 19190 (39.2%)
All Bases | 28065 | 15164 | 19829 | 48909
# of gaps | 1500 (5.1%) | 312 (2.0%) | 451 (2.2%) | 1853 (3.7%)

*% of non-gaps
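As a quick consistency check on Table 7, the GC content and gap percentage can be recomputed from the base counts. The sketch below uses the CNOT1 Intron 20 column; the function name is an illustrative assumption, and the percentages are taken relative to all non-gap bases and to all alignment positions respectively, which appears to be how the tabulated values were derived.

```python
# Base counts for CNOT1 Intron 20 as reported in Table 7.
counts = {"A": 8326, "C": 5347, "G": 5959, "T": 8432, "ambiguous": 1}
gaps = 1500


def gc_fraction(base_counts):
    """Fraction of G and C among all non-gap bases."""
    total = sum(base_counts.values())
    return (base_counts["G"] + base_counts["C"]) / total


all_bases = sum(counts.values())
gc = gc_fraction(counts)
gap_fraction = gaps / (all_bases + gaps)

print(f"All bases: {all_bases}")
print(f"GC content: {gc:.1%}")
print(f"Gaps: {gap_fraction:.1%}")
```

The recomputed totals match the table: 28065 bases, 40.3% GC, and 5.1% gaps. The 29565 alignment positions (bases plus gaps) also equal 45 sequences times the 657 bp aligned length.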

TABLE 8: Transition/Transversion data culled from analysis of the concatenated sequences of all three introns using the HKY+gamma model of substitution, which was determined to be the best-fit model for the concatenated sequences.

From\To | A | T | C | G
A | - | 0.092305898 | 0.054021801 | 0.084158004
T | 0.092436475 | - | 0.069621476 | 0.065301215
C | 0.092436475 | 0.118960728 | - | 0.065301215
G | 0.119129012 | 0.092305898 | 0.054021801 | -

Transitions (Ti) | 0.39186922
Transversions (Tv) | 0.60813078
Ti/Tv | 0.644383138
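The Ti and Tv totals in Table 8 follow directly from the substitution matrix: transitions are purine-to-purine (A, G) or pyrimidine-to-pyrimidine (C, T) changes, and all other changes are transversions. A minimal sketch of that bookkeeping, with illustrative names, is:

```python
# Substitution matrix from Table 8, indexed as rates[from_base][to_base].
rates = {
    "A": {"T": 0.092305898, "C": 0.054021801, "G": 0.084158004},
    "T": {"A": 0.092436475, "C": 0.069621476, "G": 0.065301215},
    "C": {"A": 0.092436475, "T": 0.118960728, "G": 0.065301215},
    "G": {"A": 0.119129012, "T": 0.092305898, "C": 0.054021801},
}

PURINES = {"A", "G"}


def is_transition(source, target):
    """A change is a transition when both bases are purines or both are pyrimidines."""
    return (source in PURINES) == (target in PURINES)


transitions = sum(
    value
    for source, row in rates.items()
    for target, value in row.items()
    if is_transition(source, target)
)
transversions = sum(
    value
    for source, row in rates.items()
    for target, value in row.items()
    if not is_transition(source, target)
)
ti_tv = transitions / transversions
print(f"Ti = {transitions:.8f}, Tv = {transversions:.8f}, Ti/Tv = {ti_tv:.6f}")
```

Summing the four transition cells (A to G, G to A, C to T, T to C) reproduces the tabulated Ti of 0.39186922, the remaining eight cells give Tv, and their quotient is the reported Ti/Tv of about 0.644.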

TABLE 9: Summary of haplotype statistics.

Number of Individuals per Haplotype
Haplotype # | CNOT1 Intron 20 | HPG Intron 1 | S7RP Intron 1
1 | 1 | 1 | 1
2 | 3 | 1 | 1
3 | 3 | 1 | 1
4 | 1 | 1 | 1
5 | 26 | 2 | 7
6 | 1 | 1 | 1
7 | 1 | 1 | 1
8 | 1 | 1 | 1
9 | 1 | 1 | 1
10 | 5 | 5 | 1
11 | 1 | 1 | 1
12 | 1 | 34 | 1
13 | - | 1 | 1
14 | - | 1 | 2
15 | - | 1 | 8
16 | - | - | 2
17 | - | - | 8
18 | - | - | 1
Total Individuals | 45 | 53 | 40
Unique Haplotypes* | 8 | 12 | 13

*Unique haplotypes are those displayed by only one individual.

TABLE 10: Haplotype table based on the alignment of 45 sequences of CNOT1 Intron 20. Specimen ID by haplotype code.

Haplotype | Region | Specimen IDs
1 | CO River | CR28
2 | CO River | CR25, CR29, CR36
3 | CO River | CR26, CR27, CR35
4 | SASD | CC2
5 | SASD | T6, N1, N2, H2, G1, IN4, CJ5, LC1, CF1, PC4, LC19, PC10, T2, F1, E1, H1, IN2, IN3, CJ1, CT1, CT2, ML1, PC7, CJ12, LC11, CC11
6 | SASD | T10
7 | Eastern Sierra | P4
8 | Eastern Sierra | P2
9 | Central Coast | SL2
10 | Central Coast | D1, SL3, BZ1, SS1, MZ1
11 | Central Coast | ST1
12 | CO River | S1

Populations by region: CO River (CO River, Sonoita Creek); SASD (Los Angeles, San Gabriel, Santa Ana, San Jacinto); Central Coast (San Luis Obispo, Santa Maria); Eastern Sierra (Owens Valley).

TABLE 11: Haplotype table based on the alignment of 53 sequences of HPG Intron 1. Specimen ID by haplotype code.

Haplotype | Region | Specimen IDs
1 | Eastern Sierra | P3
2 | Eastern Sierra | P1
3 | Eastern Sierra | M3
4 | Eastern Sierra | P4
5 | Eastern Sierra | P2, M1
6 | Eastern Sierra | M2
7 | Central Coast | ST1
8 | Central Coast | CY3
9 | Central Coast | MZ1
10 | Central Coast | SL2, SL3, D1, BZ1, SS1
11 | CO River | S1
12 | SASD | T2, T6, N1, N2, H2, G1, IN4, T10, PC7, ML1, ML3, PC4, PC10, F1, E1, H1, IN2, IN3, CJ1, CJ5, LC1, CF1, CT1, CT2, CC2, CJ12, LC11, LC19, CC11
12 | CO River | CR26, CR27, CR28, CR29, CR36
13 | CO River | CR35
14 | Eastern Sierra | P5
15 | CO River | CR25

Populations by region: CO River (CO River, Sonoita Creek); SASD (Los Angeles, San Gabriel, Santa Ana, San Jacinto); Central Coast (San Luis Obispo, Santa Maria); Eastern Sierra (Owens Valley).

TABLE 12: Haplotype table based on the alignment of 40 sequences of S7RP Intron 1. Specimen ID by haplotype code.

Haplotype | Region | Specimen IDs
1 | Eastern Sierra | P4
2 | Eastern Sierra | M2
3 | Eastern Sierra | M3
4 | Eastern Sierra | P1
5 | Central Coast | D1, SL2, SL3, BZ1, ST1, SS1, CY3
6 | Central Coast | MZ1
7 | Eastern Sierra | P2
8 | Eastern Sierra | P3
9 | Eastern Sierra | M1
10 | CO River | CR27
11 | CO River | S1
12 | SASD | CF1
13 | SASD | CC2
14 | SASD | IN2, IN4
15 | SASD | T6, CJ1, CJ5, LC1, PC7, LC11, LC19, CC11
16 | SASD | N1, CT2
17 | SASD | T2, F1, E1, H1, H2, G1, T10, CJ12
18 | SASD | N2

Populations by region: CO River (CO River, Sonoita Creek); SASD (Los Angeles, San Gabriel, Santa Ana, San Jacinto); Central Coast (San Luis Obispo, Santa Maria); Eastern Sierra (Owens Valley).

TABLE 13: Basic AMOVA statistics for all three intron haplotype analyses as well as the concatenation of the three introns. *AR=among regions; WR=within regions.

No. of Populations: 8
No. of Regions: 4
No. of permutations: 999
No. of PW permutations: 999

Statistic | cnot1 | hpg | s7rp | concat
No. of Samples | 45 | 53 | 40 | 34
PhiPT | 0.966 | 0.941 | 0.906 | 0.966
P-value | 0.001 | 0.001 | 0.001 | 0.001
Nm (haploid) | 0.017 | 0.031 | 0.052 | 0.017
Degrees of Freedom, AR* | 3 | 3 | 3 | 3
Degrees of Freedom, WR* | 41 | 49 | 36 | 30
Sum of Squares, AR | 345.064 | 138.13 | 39.891 | 571.344
Sum of Squares, WR | 19.625 | 12.625 | 19.036 | 35.45
Mean Squares, AR | 115.021 | 46.043 | 13.297 | 190.448
Mean Squares, WR | 0.479 | 0.258 | 0.529 | 1.182
Est Variance, AR | 13.757 | 4.099 | 1.606 | 33.869
Est Variance, WR | 0.479 | 0.258 | 0.529 | 1.182
%, AR | 97% | 94% | 75% | 97%
%, WR | 3% | 6% | 25% | 3%
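The summary statistics in Table 13 are linked by simple relations: ΦPT is the among-region share of the total estimated variance, and the haploid Nm values are consistent with Nm = ((1/ΦPT) - 1) / 2. The sketch below applies both to the cnot1 column. The Nm expression is an assumption inferred from the tabulated numbers rather than a formula quoted from the text, and the function names are illustrative.

```python
def phi_pt(var_among, var_within):
    """Among-region share of the total estimated variance."""
    return var_among / (var_among + var_within)


def nm_haploid(phi):
    """Haploid Nm implied by PhiPT (assumed relation: ((1 / PhiPT) - 1) / 2)."""
    return 0.5 * (1.0 / phi - 1.0)


# Estimated variance components for cnot1 from Table 13.
cnot1_phi = phi_pt(var_among=13.757, var_within=0.479)
cnot1_nm = nm_haploid(cnot1_phi)
print(f"PhiPT = {cnot1_phi:.3f}, Nm = {cnot1_nm:.3f}")

# Pairwise example: SASD vs. CO River for CNOT1 (PhiPT = 0.842 in Table 14).
pairwise_nm = nm_haploid(0.842)
print(f"Pairwise Nm = {pairwise_nm:.3f}")
```

For cnot1 this returns ΦPT of 0.966 and Nm of 0.017, matching the table, and the pairwise example returns 0.094, matching the Nm reported for the SASD and CO River pair in Table 14.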


TABLE 14: Regional pairwise ΦPT values, Nm, and corresponding P-values based on analysis of the haplotype differences of CNOT1 Intron 20.

Pairwise Population PhiPT Values

 | Central Coast | Eastern Sierra | CO River, AZ | SASD
Central Coast | 0.000 | 0.001 | 0.001 | 0.001
Eastern Sierra | 0.978 | 0.000 | 0.023 | 0.002
CO River, AZ | 0.929 | 0.943 | 0.000 | 0.001
SASD | 0.994 | 0.996 | 0.842 | 0.000

PhiPT Values below diagonal. Probability, P (rand >= data) based on 999 permutations is shown above diagonal.

Pairwise Population Nm (Haploid) Values Based on PhiPT Values

 | Central Coast | Eastern Sierra | CO River, AZ | SASD
Central Coast | 0.000 | | |
Eastern Sierra | 0.011 | 0.000 | |
CO River, AZ | 0.038 | 0.030 | 0.000 |
SASD | 0.003 | 0.002 | 0.094 | 0.000

Nm (Haploid) Values below diagonal.

TABLE 15: Regional pairwise ΦPT values, Nm, and corresponding P-values based on analysis of the haplotype differences of HPG Intron 1.

Pairwise Population PhiPT Values

 | Central Coast | Eastern Sierra | CO River, AZ | SASD
Central Coast | 0.000 | 0.001 | 0.001 | 0.001
Eastern Sierra | 0.689 | 0.000 | 0.001 | 0.001
CO River, AZ | 0.930 | 0.895 | 0.000 | 0.007
SASD | 0.986 | 0.971 | 0.245 | 0.000

PhiPT Values below diagonal. Probability, P (rand >= data) based on 999 permutations is shown above diagonal.

Pairwise Population Nm (Haploid) Values Based on PhiPT Values

 | Central Coast | Eastern Sierra | CO River, AZ | SASD
Central Coast | 0.000 | | |
Eastern Sierra | 0.226 | 0.000 | |
CO River, AZ | 0.038 | 0.059 | 0.000 |
SASD | 0.007 | 0.015 | 1.537 | 0.000

Nm (Haploid) Values below diagonal.

TABLE 16: Regional pairwise ΦPT values, Nm, and corresponding P-values based on analysis of the haplotype differences of S7RP Intron 1.

Pairwise Population PhiPT Values

Central Coast Eastern Sierra CO River, AZ SASD Central Coast 0.000 0.001 0.021 0.001 Central Coast Eastern Sierra 0.752 0.000 0.032 0.001 Eastern Sierra CO River, AZ 0.882 0.784 0.000 0.001 CO River, AZ SASD 0.946 0.927 0.853 0.000 SASD Central Coast Eastern Sierra CO River, AZ SASD

PhiPT Values below diagonal. Probability, P (rand >= data) based on 999 permutations is shown above diagonal.

Pairwise Population Nm (Haploid) Values Based on PhiPT Values

Population        Central Coast   Eastern Sierra   CO River, AZ   SASD
Central Coast     0.000
Eastern Sierra    0.165           0.000
CO River, AZ      0.067           0.138            0.000
SASD              0.029           0.039            0.086          0.000

Nm (Haploid) Values below diagonal.


TABLE 17: Regional pairwise ΦPT values, Nm, and corresponding P-values based on analysis of the haplotype differences of the concatenation of the three sequences.

Pairwise Population PhiPT Values

Population        Central Coast   Eastern Sierra   CO River, AZ   SASD
Central Coast     0.000           0.010            0.070          0.001
Eastern Sierra    0.944           0.000            0.337          0.005
CO River, AZ      0.928           0.871            0.000          0.001
SASD              0.979           0.986            0.878          0.000

PhiPT Values below diagonal. Probability, P (rand >= data) based on 999 permutations is shown above diagonal.

Pairwise Population Nm (Haploid) Values Based on PhiPT Values

Population        Central Coast   Eastern Sierra   CO River, AZ   SASD
Central Coast     0.000
Eastern Sierra    0.030           0.000
CO River, AZ      0.039           0.074            0.000
SASD              0.011           0.007            0.070          0.000

Nm (Haploid) Values below diagonal.

TABLE 18: Regional Pairwise Geographic Distances

Population        Central Coast   Eastern Sierra   CO River, AZ   SASD
Central Coast     0.000
Eastern Sierra    304.999         0.000
CO River, AZ      803.161         742.536          0.000
SASD              295.846         391.023          520.180        0.000


Table 19: Summary of maximum likelihood fits of nucleotide substitution models determined by MEGA X [2].

[Rotated table. The values in each column are listed in the order recovered from the page; the order of the markers and models relative to the values could not be recovered.]

Marker: hpg Intron 1; s7rp Intron 1; cnot1 Intron 20; Concatenation
Model: HKY; HKY; HKY+G; GTR+G
#Param: 70; 95; 92; 111
BIC: 4630.65; 5844.187; 4715.712; 2553.407
AICc: 3924.48; 3861.24; 5228.552; 1691.164
Gamma: n/a; n/a; 1.10855; 0.307297
R: 0.615463; 1.030044; 2.141557; 1.008816
Freq A: 0.303958; 0.298681; 0.279326; 0.314876
Freq T: 0.303529; 0.302254; 0.255017; 0.336975
Freq C: 0.17764; 0.190634; 0.224151; 0.151898
Freq G: 0.21473; 0.208366; 0.241336; 0.196126

Abbreviations: GTR: General Time Reversible; HKY: Hasegawa-Kishino-Yano; G: Gamma distribution; R: Transition/Transversion Bias; AICc: Akaike Information Criterion corrected; BIC: Bayesian Information Criterion [1]

1. Nei M. and Kumar S. (2000). Molecular Evolution and Phylogenetics. Oxford University Press, New York.
2. Kumar S., Stecher G., Li M., Knyaz C., and Tamura K. (2018). MEGA X: Molecular Evolutionary Genetics Analysis across computing platforms. Molecular Biology and Evolution 35:1547-1549.
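For readers unfamiliar with the two criteria named in the table, the following is a minimal sketch of their textbook definitions, computed from a model's maximized log-likelihood, its number of free parameters, and a sample size. The function names and the example numbers are hypothetical, and the sample-size convention that MEGA X applies is not stated in this appendix, so the sketch is not expected to reproduce the tabulated values.

```python
import math


def aicc(log_likelihood, n_params, sample_size):
    """Corrected Akaike Information Criterion (textbook form)."""
    if sample_size - n_params - 1 <= 0:
        raise ValueError("sample_size must exceed n_params + 1")
    aic = -2.0 * log_likelihood + 2.0 * n_params
    correction = 2.0 * n_params * (n_params + 1) / (sample_size - n_params - 1)
    return aic + correction


def bic(log_likelihood, n_params, sample_size):
    """Bayesian Information Criterion (textbook form)."""
    return -2.0 * log_likelihood + n_params * math.log(sample_size)


# Hypothetical inputs chosen only to show the call pattern.
print(round(aicc(-1000.0, 10, 500), 2))
print(round(bic(-1000.0, 10, 500), 2))
```

With both criteria, lower scores indicate a better trade-off between fit and parameter count, and BIC penalizes additional parameters more heavily once the sample size is large.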


APPENDIX C

SEQUENCE DATA


DATA SET 1: R. osculus sequence alignment consensus for intron 20 of the CCR4-NOT transcription complex subunit 1 gene (cnot1).

DATA SET 2: R. osculus sequence alignment consensus for intron 1 of the hypothetical protein gene (hpg).


DATA SET 3a: Polymorphic nucleotide sites among the individual sequences of the CNOT1 Intron 20, base positions 48-450.

[Rotated table of Specimen ID, Region, and nucleotide state at each polymorphic position; the per-specimen nucleotide states are not recoverable from this copy.]

Base positions recovered from the column headers: 48, 89, 119, 126, 128, 146, 156, 163, 165, 169, 170, 171, 172, 173, 184, 190, 199, 219, 223, 224, 225, 226, 227, 228, 229, 232, 241, 257, 260, 270, 271, 272, 303, 347, 348, 353, 354, 359, 362, 364, 378, 385, 398, 421, 422, 449, 450.

Specimen IDs: T2, T6, P4, P2, S1, E1, F1, H1, H2, N1, N2, D1, G1, IN4, IN3, IN2, T10, ST1, SL2, SL3, BZ1, CJ5, CJ1, LC1, CT1, CT2, SS1, MZ1, ML1, PC4, PC7, CF1, CC2, CJ12, LC19, LC11, CR35, CR26, CR27, CR25, CR36, CR29, CR28, PC10, CC11.

Regions (number of specimens): SASD (28), Owens (2), CO River (8), Central Coast (7).


DATA SET 3b: CNOT1 Intron 20, base positions 451-605.

[Rotated table of Specimen ID, Region, and nucleotide state at each polymorphic position; the per-specimen nucleotide states are not recoverable from this copy.]

Base positions recovered from the column headers: 451, 452, 454, 465, 484, 485, 486, 487, 488, 489, 490, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 547, 548, 549, 605.

Specimen IDs: T2, T6, P4, P2, S1, E1, F1, H1, H2, N1, N2, D1, G1, IN4, IN3, IN2, T10, ST1, SL2, SL3, BZ1, CJ5, CJ1, LC1, CT1, CT2, SS1, MZ1, ML1, PC4, PC7, CF1, CC2, CJ12, LC19, LC11, CR35, CR26, CR27, CR25, CR36, CR29, CR28, PC10, CC11.

Regions (number of specimens): SASD (28), Owens (2), CO River (8), Central Coast (7).


DATA SET 4: Polymorphic nucleotide sites among the individual sequences of hpg intron 1, base positions 36-281.

Specimen ID Region         36  76  105 158 161 164 175 183 184 185 186 187 188 189 190 201 204 217 244 245 281
CR25        CO River       G   T   G   T   A   G   Y   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CR35        CO River       G   C   G   T   A   G   T   -   -   -   -   -   -   -   -   G   C   C   T   T   T
S1          CO River       G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   A   C   T   T   T
CR26        CO River       G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CR28        CO River       G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CR29        CO River       G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CR36        CO River       G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CC2         SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CC11        SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CJ1         SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CJ5         SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CJ12        SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CT1         SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CT2         SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
F1          SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
G1          SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
H1          SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
H2          SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
IN2         SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
IN4         SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
LC1         SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
LC11        SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
LC19        SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
ML1         SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
ML3         SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
N1          SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
N2          SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
PC4         SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
PC7         SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
PC10        SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
T2          SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
T6          SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
T10         SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CR27        SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CF1         SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
E1          SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
IN3         SASD           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
ST1         Central Coast  G   C   G   K   A   A   C   A   G   A   T   T   T   C   T   A   C   T   -   T   T
BZ1         Central Coast  G   C   G   T   A   A   C   A   G   A   T   T   T   C   T   A   C   T   -   T   T
D1          Central Coast  G   C   G   T   A   A   C   A   G   A   T   T   T   C   T   A   C   T   -   T   T
SL2         Central Coast  G   C   G   T   A   A   C   A   G   A   T   T   T   C   T   A   C   T   -   T   T
SL3         Central Coast  G   C   G   T   A   A   C   A   G   A   T   T   T   C   T   A   C   T   -   T   T
SS1         Central Coast  G   C   G   T   A   A   C   A   G   A   T   T   T   C   T   A   C   T   -   T   T
CY3         Central Coast  G   C   G   G   A   A   C   A   G   A   T   T   T   C   T   A   C   T   -   T   T
MZ1         Central Coast  G   C   G   G   A   A   C   A   G   A   T   T   T   C   T   A   C   T   -   T   -
P3          Eastern Sierra A   C   A   T   A   A   C   A   G   A   T   T   T   C   T   G   C   G   -   Y   T
P4          Eastern Sierra G   C   A   T   A   A   C   A   G   A   T   T   T   C   T   G   C   G   -   C   T
P5          Eastern Sierra G   C   G   T   G   A   C   A   G   A   T   T   T   C   T   G   C   G   -   T   T
M2          Eastern Sierra G   C   A   T   G   A   C   A   G   A   T   T   T   C   T   G   C   G   -   T   T
M3          Eastern Sierra G   C   A   T   A   A   C   A   G   A   T   T   T   C   T   G   C   G   -   Y   T
P1          Eastern Sierra G   C   A   T   A   A   C   A   G   A   T   T   T   C   T   G   C   G   C   T   T
M1          Eastern Sierra G   C   A   T   A   A   C   A   G   A   T   T   T   C   T   G   C   G   -   T   T
P2          Eastern Sierra G   C   A   T   A   A   C   A   G   A   T   T   T   C   T   G   C   G   -   T   T
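A few cells in the table above hold letters other than A, C, G, and T (for example Y and K). The sketch below assumes these follow the standard IUPAC two-base ambiguity codes, which is the usual way to record a site where a directly sequenced nuclear locus shows two bases; this appendix does not state the convention explicitly, and the function and dictionary names are hypothetical.

```python
# Standard IUPAC two-base ambiguity codes (assumed convention for the table).
IUPAC_TWO_BASE = {
    "R": ("A", "G"),
    "Y": ("C", "T"),
    "K": ("G", "T"),
    "M": ("A", "C"),
    "S": ("C", "G"),
    "W": ("A", "T"),
}


def possible_bases(symbol):
    """Return the bases a table cell may represent; a gap ('-') yields none."""
    symbol = symbol.upper()
    if symbol in ("A", "C", "G", "T"):
        return (symbol,)
    if symbol == "-":
        return ()
    return IUPAC_TWO_BASE[symbol]


for cell in ("Y", "K", "C", "-"):
    print(cell, possible_bases(cell))
```

Read this way, a Y at position 175 for specimen CR25 would indicate that both C and T were observed at that site in that individual.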


DATA SET 5: Polymorphic nucleotide sites among the individual sequences of s7rp intron 1, base positions 8-493.

[Rotated table of Specimen ID, Region, and nucleotide state at each polymorphic position; the per-specimen nucleotide states are not recoverable from this copy.]

Base positions recovered from the column headers: 8, 38, 92, 128, 151, 164, 166, 176, 196, 201, 235, 250, 270, 271, 272, 273, 274, 275, 276, 277, 278, 287, 335, 355, 396, 403, 411, 422, 426, 428, 430, 432, 433, 434, 435, 441, 446, 451, 455, 457, 468, 476, 491, 492, 493.

Specimen IDs: T2, T6, P4, P3, P2, P1, H2, H1, N2, N1, F1, E1, S1, M3, M2, M1, D1, G1, IN4, IN2, T10, ST1, SL3, SL2, BZ1, CT2, LC1, CJ5, CJ1, SS1, MZ1, CY3, PC7, CF1, CC2, LC19, LC11, CJ12, CC11, CR27.

Regions (number of specimens): SASD (23), CO River (2), Central Coast (8), Eastern Sierra (7).


APPENDIX D

INPUT FILES


GenAlEx and MEGA FASTA Input Files:

Cnot1 Intron 20

>SASD_CC2_cnot1
TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA
CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA
GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC
TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC
AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG
CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA
TCGTATGAAAACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG
TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC
TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC
TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA
TTTATTACGCTGTCT

>SASD_IN2_cnot1
TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA
CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA
GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC
TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC
AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG
CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA
TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG
TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC
TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC
TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA
TTTATTACGCTGTCT

>SASD_CF1_cnot1
TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA
CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA
GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC
TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC
AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG
CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT
ATCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA
GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC
CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTA
CTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGC
ATTTATTACGCTGTCT

>SASD_CT2_cnot1
TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA
CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA
GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC
TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC
AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG
CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA
TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG
TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC
TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC
TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA
TTTATTACGCTGTCT

>SASD_LC11_cnot1
TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA
CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA
GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC
TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC
AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG
CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT
ATCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA
GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC
CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTA
CTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGC
ATTTATTACGCTGTCT

>SASD_IN3_cnot1
TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA
CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA
GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC
TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC
AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG
CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA
TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG
TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC
TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC
TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA
TTTATTACGCTGTCT

>SASD_CT1_cnot1
TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA
CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA
GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC
TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC
AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG
CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT
ATCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA
GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC
CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTA
CTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGC
ATTTATTACGCTGTCT

>SASD_T10_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTTCAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT ATCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGT

ACTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGG CATTTATTACGCTGTCT

>SASD_CC11_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT

>SASD_LC1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT ATCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTA CTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGC ATTTATTACGCTGTCT

>SASD_PC7_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT

>SASD_PC4_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT ATCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTA

CTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGC ATTTATTACGCTGTCT

>SASD_N2_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT

>SASD_CJ1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTA CTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGC ATTTATTACGCTGTCT

>SASD_LC19_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTA CTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGC ATTTATTACGCTGTCT

>SASD_T6_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC

TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT

>SASD_PC10_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT

>SASD_N1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT

>SASD_CJ12_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT

>SASD_CJ5_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC

TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT

>SASD_IN4_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT

>SASD_T2_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT

>SASD_ML1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT

>SASD_H2_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC

TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT

>SASD_H1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT

>SASD_G1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT

>SASD_F1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT ATCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGT ACTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGG CATTTATTACGCTGTCT

>SASD_E1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC

TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT

>COR_S1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA CACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGT-----CAAAACCAACCTATCTGC AAGGGTTGCTTTGGTAAAGATGTAGGCGTACCCACGTACGTGTATGTGATGAGGCTTTCAAA AAGGATGCAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTGCAG TGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATTCCATTTAAATATCG TATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGT--TGTGTCGTGTCAAGAGTGA TGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACCTGG AATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTACTCA CAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCATTT ATTACGCTGTCT

>COR_CR28_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATATGTCAAAACAAAACCAACCTATC TACAAGGGTTGCTTTGGTAAAGATGTAGGCGTACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGG-AAAATCTTTGTACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT

>COR_CR29_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATATGTCAAAACAAAACCAACCTATC TGCAAGGGTTACTTTGGTAAAGATGTAGGCGTACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT ATCGTATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGT ACTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGG CATTTATTACGCTGTCT

>COR_CR36_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATATGTCAAAACAAAACCAACCTATC TGCAAGGGTTACTTTGGTAAAGATGTAGGCGTACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC- AAAATCTTTGGACATACCATTTAAATATCGTATGAATACTGGCGTTGTCGTCACCAGTTTTGA TGTTATTGTTGTGTGTCGTGTCAAGAGTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAA CATGGCCGTAGTTATCAGTTTTTAACCTGGAATATTAAGT------GAACATTG

TTAGTTCCATCAGTGAACTAAAGGTACTCACAATTTGATGCATCACTTTCTCAGATTGTGAAT CGTCACGGCCCTGAGGCAGACCGGCATTTATTACGCTGTCT

>COR_CR25_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATATGTCAAAACAAAACCAACCTATC TGCAAGGGTTACTTTGGTAAAGATGTAGGCGTACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT ATCGTATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTA CTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGC ATTTATTACGCTGTCT

>COR_CR27_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATATGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT ATCGTATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGT ACTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGG CATTTATTACGCTGTCT

>COR_CR26_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATATGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT

>COR_CR35_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATATGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------

GAACATTGTTAGTTCCATCAGTGAACTAAAGGTACTCACAATTTGATGCATCACTTTCTCAGA TTGTGAATCGTCACGGCCCTGAGGCAGACCGGCATTTATTACGCTGTCT

>CC_BZ1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGCAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GATTATCCACACTATTAATTACAAGTGATGCAATAAGATTTGTCAAAACAAAACCAACCTATCT GCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACC------CGCGTATGTGAGGAGGCTTTCAAA AAGAATGCAGGAAAATATCTGGATCATCTCGCTCTCCAAATAAGCGCCCCATCGAGTTGCAG TGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACTTACCATTTAAATATCA TATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAGTGA TGCAAAATTGTTTATTTTTGCTTGCAGTCTGAACATGGCCGT------AGTTTTTAACCTGGAATA TTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTACTCACAAT TTGATGCATCACTTTCTTAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCATTTATTA CGCTGTCT

>CC_D1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGCAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GATTATCCACACTATTAATTACAAGTGATGCAATAAGATTTGTCAAAACAAAACCAACCTATCT GCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACC------CGCGTATGTGAGGAGGCTTTCAAA AAGAATGCAGGAAAATATCTGGATCATCTCGCTCTCCAAATAAGCGCCCCATCGAGTTGCAG TGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACTTACCATTTAAATATCA TATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAGTGA TGCAAAATTGTTTATTTTTGCTTGCAGTCTGAACATGGCCGT------AGTTTTTAACCTGGAATA TTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTACTCACAAT TTGATGCATCACTTTCTTAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCATTTATTA CGCTGTCT

>CC_MZ1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGCAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GATTATCCACACTATTAATTACAAGTGATGCAATAAGATTTGTCAAAACAAAACCAACCTATCT GCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACC------CGCGTATGTGAGGAGGCTTTCAAA AAGAATGCAGGAAAATATCTGGATCATCTCGCTCTCCAAATAAGCGCCCCATCGAGTTGCAG TGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACTTACCATTTAAATATCA TATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAGTGA TGCAAAATTGTTTATTTTTGCTTGCAGTCTGAACATGGCCGT------AGTTTTTAACCTGGAATA TTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTACTCACAAT TTGATGCATCACTTTCTTAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCATTTATTA CGCTGTCT

>CC_SL3_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGCAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GATTATCCACACTATTAATTACAAGTGATGCAATAAGATTTGTCAAAACAAAACCAACCTATCT GCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACC------CGCGTATGTGAGGAGGCTTTCAAA AAGAATGCAGGAAAATATCTGGATCATCTCGCTCTCCAAATAAGCGCCCCATCGAGTTGCAG TGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACTTACCATTTAAATATCA TATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAGTGA TGCAAAATTGTTTATTTTTGCTTGCAGTCTGAACATGGCCGT------AGTTTTTAACCTGGAATA TTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTACTCACAA

TTTGATGCATCACTTTCTTAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCATTTATTA CGCTGTCT

>CC_SS1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGCAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GATTATCCACACTATTAATTACAAGTGATGCAATAAGATTTGTCAAAACAAAACCAACCTATCT GCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACC------CGCGTATGTGAGGAGGCTTTCAAA AAGAATGCAGGAAAATATCTGGATCATCTCGCTCTCCAAATAAGCGCCCCATCGAGTTGCAG TGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACTTACCATTTAAATATCA TATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAGTGA TGCAAAATTGTTTATTTTTGCTTGCAGTCTGAACATGGCCGT------AGTTTTTAACCTGGAATA TTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTACTCACAAT TTGATGCATCACTTTCTTAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCATTTATTA CGCTGTCT

>CC_SL2_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGCAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GATTATCCACACTATTAATTACAAGTGATGCAATAAGRTTTGTCAAAACAAAACCAACCTATCT GCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACC------CGCGTATGTGAGGAGGCTTTCAAA AAGAATGCAGGAAAATATCTGGATCATCTCGCTCTCCAAATAAGCGCCCCATCGAGTTGCAG TGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACTTACCATTTAAATATCA TATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAGTGA TGCAAAATTGTTTATTTTTGCTTGCAGTCTGAACATGGCCGT------AGTTTTTAACCTGGAATA TTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTACTCACAAT TTGATGCATCACTTTCTTAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCATTTATTA CGCTGTCT

>CC_ST1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGCAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GATTATCCACACTATTAATTACAAGTGATGCAATAAGGTTTGTCAAAACAAAACCAACCTATCT GCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACC------CGCGTATGTGAGGAGGCTTTCAAA AAGAATGCAGGAAAATATCTGGATCATCTCGCTCTCCAAATAAGCGCCCCATCGAGTTGCAG TGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACTTACCATTTAAATATCA TATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAGTGA TGCAAAATTGTTTATTTTTGCTTGCAGTCTGAACATGGCCGT------AGTTTTTAACCTGGAATA TTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTACTCACAAT TTGATGCATCACTTTCTTAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCATTTATTA CGCTGTCT

>ORV_P2_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGCAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GATTATCCACACTATTAATTACAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACTTATCT GCAAGGGTTGCTTTGGTAAAGATGTAGGCGTAC------ACGCGTATGTGATGAGGCTTTCAAAA AGAATACAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCGATCGAGTTGCAGT GCAGAGCTCCTCCGTCGCACAGGAAAATGGCAAAAAATTTTGGACTTACCATTTAAATATCAT ATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAGTGAT GCAAAATTGTTTATTTTTGCTTCCAGTCTGAACATGGCCGT------AGTTTTTAACCTGGAATAT

TAAGTTCCAATTGTTATTAAGTTGTTATAATATTAAGTTAATTGTTAGTTCCATCAGTGAACTAA AGGTACTCACAATTTGATGCATCACTTTCTTAGATTGTGAATCGTCACGGCCCTGAGGCAGA CCGGCATTTATTACGCTGTCT

>ORV_P4_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGCAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GATTATCCACACTATTAATTACAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACTTATCT GCAAGGGTTACTTTGGTAAAGATGTAGGCGTACC------CGCGTATGTGATGAGGCTTTCAAAA AGAATACAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCGATCGAGTTGCAGT GCAGAGCTCCTCCGTCGCACAGGAAAATGGCAAAAAATTTTGGACTTACCATTTAAATATCAT ATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAGTGAT GCAAAATTATTTATTTTTGCTTCCAGTCTGAACATGGCCGT------AGTTTTTAACCTGGAATAT TAAGTTCCAATTGTTATTAAGTTGTTATAATATTAAGTTAATTGTTAGTTCCATCAGTGAACTAA AGGTACTCACAATTTGATGCATCACTTTCTTAGATTGTGAATCGTCACGGCCCTGAGGCAGA CCGGCATTTATTACGCTGTCT

Hpg Intron 1

>COR_CR25_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATTTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGGA AGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATYATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>COR_CR35_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATTATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>COR_S1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTAAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>COR_CR26_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>COR_CR28_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>COR_CR29_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>COR_CR36_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_CC2_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_CC11_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_CJ1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_CJ5_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_CJ12_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_CT1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG

AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_CT2_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_F1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_G1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_H1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_H2_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_IN2_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_IN4_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_LC1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_LC11_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_LC19_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_ML1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_ML3_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_N1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_N2_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_PC4_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG
AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_PC7_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_PC10_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_T2_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_T6_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_T10_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>COR_CR27_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_CF1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_E1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>SASD_IN3_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>CC_ST1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATKAAAACACATGCAAGATCATTTCGGAGA TTTCTAGAAAAATCTATTCAGTGCACTTCAATAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>CC_BZ1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGA TTTCTAGAAAAATCTATTCAGTGCACTTCAATAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>CC_D1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGA TTTCTAGAAAAATCTATTCAGTGCACTTCAATAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>CC_SL2_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGA TTTCTAGAAAAATCTATTCAGTGCACTTCAATAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>CC_SL3_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGA TTTCTAGAAAAATCTATTCAGTGCACTTCAATAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>CC_SS1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG
AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGA TTTCTAGAAAAATCTATTCAGTGCACTTCAATAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>CC_CY3_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATGAAAACACATGCAAGATCATTTCGGAGA TTTCTAGAAAAATCTATTCAGTGCACTTCAATAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>CC_MZ1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATGAAAACACATGCAAGATCATTTCGGAGA TTTCTAGAAAAATCTATTCAGTGCACTTCAATAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGG-TGAGAGCAGAA

>ORV_P3_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAAATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAAAAGGCTGTTGCTGTGAGGA AGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGAT TTCTAGAAAAATCTGTTCAGTGCACTTCAAGAGTTTTCTCACCATTGTCACTTGTTT-YGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>ORV_P4_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAAAAGGCTGTTGCTGTGAGGA AGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGAT TTCTAGAAAAATCTGTTCAGTGCACTTCAAGAGTTTTCTCACCATTGTCACTTGTTT-CGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>ORV_P5_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAGACACATGCAAGATCATTTCGGAGA TTTCTAGAAAAATCTGTTCAGTGCACTTCAAGAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>ORV_M2_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAAAAGGCTGTTGCTGTGAGGA AGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAGACACATGCAAGATCATTTCGGAGAT TTCTAGAAAAATCTGTTCAGTGCACTTCAAGAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>ORV_M3_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAAAAGGCTGTTGCTGTGAGGA AGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGAT TTCTAGAAAAATCTGTTCAGTGCACTTCAAGAGTTTTCTCACCATTGTCACTTGTTT-YGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>ORV_P1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAAAAGGCTGTTGCTGTGAGGA AGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGAT TTCTAGAAAAATCTGTTCAGTGCACTTCAAGAGTTTTCTCACCATTGTCACTTGTTTCTGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>ORV_M1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAAAAGGCTGTTGCTGTGAGGA AGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGAT TTCTAGAAAAATCTGTTCAGTGCACTTCAAGAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

>ORV_P2_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAAAAGGCTGTTGCTGTGAGGA AGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGAT TTCTAGAAAAATCTGTTCAGTGCACTTCAAGAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA

S7rp Intron 1

>COR_CR27_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACTTA TATTTGTGATAGAATTAACTTGTAATAACTTGTAATGTAAATGTAGTGATTCTGTGCTAGCTAA TCAATGCTAACGGCATGCTAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTA GCCGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAATTTTAAATTAGA-GATTAATA ATTCGATT-AAATGCCGAGACATTTTGGTTAATGTAKTAAATTATGGTTTCTTATGAATAGCAT G

>COR_S1_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTAATGTAAATGTAGTGATTCTGTGCTAGCTAATCAAT GCTAACGGCATGCTAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGCCG CCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAATTTTAAATTAGA-GATTAATAATTCG ATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_IN2_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTATGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAAT TCGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_IN4_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTATGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAAT TCGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_CF1_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AKGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAAT TCGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_CC2_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTMTGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_CC11_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_PC7_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_T6_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC
AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_T2_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_T10_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_CJ1_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_CJ5_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_CJ12_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA
ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_LC1_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_LC11_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_LC19_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_CT2_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTWATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_E1_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_F1_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_G1_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_N1_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTWATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_N2_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTTATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTAT ATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_H1_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>SASD_H2_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG

>CC_CY3_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTATTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTACTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATAATTAAGAATT CGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGTTTCTTATGAATAGCATG

>CC_D1_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTATTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTACTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATAATTAAGAATT CGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGTTTCTTATGAATAGCATG

>CC_MZ1_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTATTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTACTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATCTAATTAAKAATT CGATT-AAAWGACGAGACATTTAGGTTAATTTATTAAATTATGGTTTCTTATGAATAGCATG

>CC_SS1_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTATTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTACTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATAATTAAGAATT CGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGTTTCTTATGAATAGCATG

>CC_BZ1_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC
AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTATTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTACTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATAATTAAGAATT CGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGTTTCTTATGAATAGCATG

>CC_SL2_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTATTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTACTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATAATTAAGAATT CGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGTTTCTTATGAATAGCATG

>CC_SL3_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTATTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTACTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATAATTAAGAATT CGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGTTTCTTATGAATAGCATG

>CC_ST1_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTATTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTACTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATAATTAAGAATT CGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGTTTCTTATGAATAGCATG

>ORV_P1_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCATTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTGTTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTAATGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCTGTAGTTAAAA-TTTGATTAATATGATTAATAATT CGATT-AAAWGCCGAGACATTTTGGTTAATTTATTAAATTATGGT---TTATGAATAGCATG

>ORV_P2_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTGTTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTAATGTAAATGTAGTGATTCTGTGCTAGCTAATCA
ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATAATTAATAATT AGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGT---TTATGAATAGCATG

>ORV_P3_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTGTTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTAATGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATAATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGT---TTATGAATAGCATG

>ORV_P4_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCRTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTGTTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTAATGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATRATTAATAATT CGATTAAAAWGCCGAGACATTTTGGTTAATTTATTAAATTATGGT---TTATGAATAGCATG

>ORV_M1_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGGTAGCAGACGTGTCTTTACTACAGTGTTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTAATGTAAATGTAGTGATTCTGTGCTAGCTAATCAAT GCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCCG CCTAGCCGGTGACTTACTTAGAACAGCTGTAGTTAAAA-TTTGATTAATATAATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGT---TTATGAATAGCATG

>ORV_M2_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCRTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTGTTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTAATGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATGATTAATAATT CGATT-AAAAGCCGAGACATTTTGGTTAATTTATTAAATTATGGT---TTATGAATAGCATG

>ORV_M3_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCATTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTGTTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTAATGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATGATTAATAATT CGATT-AAAAGCCGAGACATTTTGGTTAATTTATTAAATTATGGT---TTATGAATAGCATG
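
The records in this appendix follow a FASTA-like layout: a `>` name followed by the aligned sequence, with alignment gaps written as `-` and occasional IUPAC ambiguity codes such as K, Y, R, W, and M. The following is a minimal Python sketch, not the procedure used in this study, for reading such records and tallying the ambiguity codes in each one. The function names are illustrative, the toy input is hypothetical, and the sketch assumes that section headings and page numbers have been removed from the text beforehand.

```python
from collections import Counter

# Standard IUPAC two-base ambiguity codes. In directly sequenced nuclear
# loci these may mark heterozygous sites (an interpretation, not a given).
IUPAC_AMBIGUITY = {
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
}


def parse_records(text):
    """Parse '>name SEQUENCE ...' records into {name: sequence}.

    Whitespace inside a sequence (line wraps carried over from the page
    layout) is dropped; alignment gaps ('-') are kept.
    """
    records = {}
    for chunk in text.split(">")[1:]:
        fields = chunk.split()
        if not fields:
            continue
        records[fields[0]] = "".join(fields[1:]).upper()
    return records


def ambiguity_counts(sequence):
    """Count each IUPAC ambiguity code present in one sequence."""
    return Counter(base for base in sequence if base in IUPAC_AMBIGUITY)


if __name__ == "__main__":
    # Hypothetical toy records, not taken from the appendix.
    demo = ">toy_1 ACGT ACKT-A\n\n>toy_2 ACGTACGTWA\n"
    for name, seq in parse_records(demo).items():
        print(name, len(seq), dict(ambiguity_counts(seq)))
```

Because any text between two `>` markers is folded into the preceding record, stray headings or page numbers left in the input would corrupt the sequence they follow; an established parser such as Biopython's `SeqIO` is a safer choice once the records are in standard FASTA form.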

REFERENCES

Anderson, R. C., Fralish, J. S., & Baskin, J. M. (2007). Savannas, Barrens, and

Rock Outcrop Plant Communities of North America. Cambridge University

Press.

Avise, J. C. (2009). Phylogeography: Retrospect and prospect. Journal of

Biogeography, 36(1), 3–15.

Avise, J. C., & Ferguson, M. M. (1995). Molecular markers, natural history and

evolution. Systematic Biology, 44(1), 117–119.

Avise, J. C., Giblin-Davidson, C., Laerm, J., Patton, J. C., & Lansman, R. A.

(1979). Mitochondrial DNA clones and matriarchal phylogeny within and

among geographic populations of the pocket gopher, Geomys pinetis.

Proceedings of the National Academy of Sciences, 76(12), 6694–6698.

Avise, J. C. (2000). Phylogeography: The

History and Formation of Species. Harvard University Press.

Baker, C. S., Perry, A., Bannister, J. L., Weinrich, M. T., Abernethy, R. B.,

Calambokidis, J., … Vasquez, O. (1993). Abundant mitochondrial DNA

variation and world-wide population structure in humpback whales.

Proceedings of the National Academy of Sciences, 90(17), 8239–8243.

Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances.

1763. M.D. Computing: Computers in Medical Practice, 8(3), 157–171.

Benson, T. A. (2006). Population Genetics and Phylogeography of the Pygmy

Nuthatch in Southern California (M.S. Thesis). California State University,

San Bernardino.

Berget, S. M., Moore, C., & Sharp, P. A. (1977). Spliced segments at the 5′

terminus of adenovirus 2 late mRNA. Proceedings of the National

Academy of Sciences, 74(8), 3171–3175.

Bernatchez, L., & Wilson, C. C. (1998). Comparative Phylogeography of Nearctic

and Palearctic Fishes. Molecular Ecology, 7(4), 431–452.

Birky, C. W., Maruyama, T., & Fuerst, P. (1983). An approach to population and

evolutionary genetic theory for genes in mitochondria and chloroplasts,

and some results. Genetics, 103(3), 513–527.

Blackwelder, E. (1954). Pleistocene lakes and drainage in the Mojave region,

southern California. Geology of Southern California: California Division of

Mines Bulletin, (170), 35–40.

Blyton, M. D. J., & Flanagan, N. S. (2012). A Comprehensive Guide to: GenAlEx

6.5. Retrieved from Australian National University website:

http://biology.anu.edu.au/GenAlEx

Bogenhagen, D., & Clayton, D. A. (1974). The number of mitochondrial

deoxyribonucleic acid genomes in mouse L and human HeLa cells.

Quantitative isolation of mitochondrial deoxyribonucleic acid. Journal of

Biological Chemistry, 249(24), 7991–7995.

Brooks, M. (1974). Blazing Saddles [Comedy Western]. Warner Bros.

Brown, W. M., George, M., & Wilson, A. C. (1979). Rapid evolution of animal

mitochondrial DNA. Proceedings of the National Academy of Sciences,

76(4), 1967–1971.

Bruno, M. C., Casciotta, J. R., Almirón, A. E., Riccillo, F. L., & Lizarralde, M. S.

(2015). Quaternary refugia and secondary contact in the southern

boundary of the Brazilian subregion: Comparative phylogeography of

freshwater fish.

Bufalino, A. P., & Mayden, R. L. (2010a). Molecular phylogenetics of North

American phoxinins (Actinopterygii: Cypriniformes: Leuciscidae) based on

RAG1 and S7 nuclear DNA sequence data. Molecular Phylogenetics and

Evolution, 55(1), 274–283.

Bufalino, A. P., & Mayden, R. L. (2010b). Phylogenetic relationships of North

American phoxinins (Actinopterygii: Cypriniformes: Leuciscidae) as

inferred from S7 nuclear DNA sequences. Molecular Phylogenetics and

Evolution, 55(1), 143–152.

Burbrink, F. T., Yao, H., Ingrasci, M., Bryson, R. W., Guiher, T. J., & Ruane, S.

(2011). Speciation at the Mogollon Rim in the Arizona Mountain

Kingsnake (Lampropeltis pyromelana). Molecular Phylogenetics and

Evolution, 60(3), 445–454. https://doi.org/10.1016/j.ympev.2011.05.009

California Department of Fish and Wildlife. (2015). Natural Diversity Database,

Special List. California Department of Fish and Wildlife.

Calsbeek, R., Thompson, J. N., & Richardson, J. E. (2003). Patterns of molecular

evolution and diversification in a biodiversity hotspot: The California

Floristic Province. Molecular Ecology, 12(4), 1021–1029.

Campbell, N. A., & Reece, J. B. (2005). Biology (7th ed.). Pearson Education.

Chatzimanolis, S., & Caterino, M. S. (2007). Toward a better understanding of

the “Transverse Range Break”: Lineage diversification in southern

California. Evolution: International Journal of Organic Evolution, 61(9),

2127–2141.

Chen, W.-J., Miya, M., Saitoh, K., & Mayden, R. L. (2008). Phylogenetic utility of

two existing and four novel nuclear gene loci in reconstructing Tree of Life

of ray-finned fishes: The order Cypriniformes (Ostariophysi) as a case

study. Gene, 423(2), 125–134.

Chenoweth, K., & Menzel, I. (2003). "For Good" [Musical Theatre]. Decca

Broadway.

Chow, S., & Hazama, K. (1998). Universal PCR primers for S7 ribosomal protein

gene introns in fish. Molecular Ecology, 7(9), 1255–1256.

Cody, M. L. (1986). Diversity, rarity, and conservation in mediterranean-climate

regions. Retrieved from http://agris.fao.org/agris-search/search.do?recordID=US880692188

Coyne, J. A., & Orr, H. A. (2004). Speciation. Sinauer Associates.

Darwin, C. (1859). On the origin of species by means of natural selection. 1859.

Murray, London.

Enzel, Y., Wells, S. G., & Lancaster, N. (2003). Paleoenvironments and

Paleohydrology of the Mojave and Southern Great Basin Deserts.

Geological Society of America.

Excoffier, L., Smouse, P. E., & Quattro, J. M. (1992). Analysis of Molecular

Variance Inferred from Metric Distances among DNA Haplotypes:

Application to Human Mitochondrial DNA Restriction Data. Genetics,

131(2), 479–491.

Falah, M., & Gupta, R. S. (1994). Cloning of the hsp70 (dnaK) genes from

Rhizobium meliloti and Pseudomonas cepacia: Phylogenetic analyses of

mitochondrial origin based on a highly conserved protein sequence.

Journal of Bacteriology, 176(24), 7748–7753.

Franklin, R. E., & Gosling, R. G. (1953). Molecular configuration in sodium

thymonucleate. Nature, 171(4356), 740.

Gold, A. (1978). "Thank You for Being a Friend" [Song]. Asylum Records.

Grossman, G. D., Hill, J., & Petty, J. T. (1995). Observations on habitat structure,

population regulation, and habitat use with respect to evolutionarily

significant units: A landscape perspective for lotic systems. American

Fisheries Society Symposium, 17, 381–391.

Grossman, L. I., Watson, R., & Vinograd, J. (1973). The presence of

ribonucleotides in mature closed-circular mitochondrial DNA. Proceedings

of the National Academy of Sciences, 70(12), 3339–3343.

Hastings, A., & Harrison, S. (1994). Metapopulation Dynamics and Genetics.

Annual Review of Ecology and Systematics, 25, 167–188.

He, S., Mayden, R. L., Wang, X., Wang, W., Tang, K. L., Chen, W.-J., & Chen, Y.

(2008a). Molecular phylogenetics of the family Cyprinidae (Actinopterygii:

Cypriniformes) as evidenced by sequence variation in the first intron of S7

ribosomal protein-coding gene: Further evidence from a nuclear gene of

the systematic chaos in the family. Molecular Phylogenetics and Evolution,

46(3), 818–829. https://doi.org/10.1016/j.ympev.2007.06.001

He, S., Mayden, R. L., Wang, X., Wang, W., Tang, K. L., Chen, W.-J., & Chen, Y.

(2008b). Molecular phylogenetics of the family Cyprinidae (Actinopterygii:

Cypriniformes) as evidenced by sequence variation in the first intron of S7

ribosomal protein-coding gene: Further evidence from a nuclear gene of

the systematic chaos in the family. Molecular Phylogenetics and Evolution,

46(3), 818–829.

Hoekzema, K., & Sidlauskas, B. L. (2014). Molecular phylogenetics and

microsatellite analysis reveal cryptic species of speckled dace

(Cyprinidae: Rhinichthys osculus) in Oregon’s Great Basin. Molecular

Phylogenetics and Evolution, 77, 238–250.

https://doi.org/10.1016/j.ympev.2014.04.027

Hollingsworth, P. R., Jr., & Hulsey, C. D. (2011). Reconciling gene trees of

eastern North American minnows. Molecular Phylogenetics and Evolution,

61(1), 149–156.

Houston, D. D., Shiozawa, D. K., & Riddle, B. R. (2010). Phylogenetic

relationships of the western North American cyprinid genus

Richardsonius, with an overview of phylogeographic structure. Molecular

Phylogenetics and Evolution, 55, 259–273.

https://doi.org/10.1016/j.ympev.2009.10.017

Hubbs, C. L., & Miller, R. R. (1948). The zoological evidence: Correlation

between fish distribution and hydrographic history in the desert basins of

western United States. University of Utah.

Huelsenbeck, J. P., & Ronquist, F. (2001). MRBAYES: Bayesian inference of

phylogenetic trees. Bioinformatics, 17(8), 754–755.

Illumina. (2015). An Introduction to Next-Generation Sequencing Technology.

Retrieved April 17, 2019, from

https://www.bing.com/search?q=An%20Introduction%20to%20Next-Generation%20Sequencing%20Technology.%20Retrieved%20from%20Illumina&qs=n&form=QBRE&sp=-1&pq=google%20scholar&sc=8-14&sk=&cvid=BE1119908D6A4135B1519F850BEB4BB9

Kim, D., & Conway, K. W. (2014). Phylogeography of Rhinichthys cataractae

(Teleostei: Cyprinidae): pre-glacial colonization across the Continental

Divide and Pleistocene diversification within the Rio Grande drainage.

Biological Journal of the Linnean Society, 111(2), 317–333.

https://doi.org/10.1111/bij.12209

Kinniburgh, A. J., Mertz, J. E., & Ross, J. (1978). The precursor of mouse β-

globin messenger RNA contains two intervening RNA sequences. Cell,

14(3), 681–693.

Kizirian, D., & Donnelly, M. A. (2004). The criterion of reciprocal monophyly and

classification of nested diversity at the species level. Molecular

Phylogenetics and Evolution, 32(3), 1072–1076.

Kumar, S., Stecher, G., Li, M., Knyaz, C., & Tamura, K. (2018). MEGA X:

Molecular Evolutionary Genetics Analysis across Computing Platforms.

Molecular Biology and Evolution, 35(6), 1547–1549.

https://doi.org/10.1093/molbev/msy096

Lee, D. S., Gilbert, C. R., Hocutt, C. H., Jenkins, R. E., McAllister, D. E., &

Stauffer, J. R., Jr. (1980). Atlas of North American freshwater fishes. North

Carolina State Museum of Natural History.

Li, C., Riethoven, J.-J. M., & Ma, L. (2010). Exon-primed intron-crossing (EPIC)

markers for non-model teleost fishes. BMC Evolutionary Biology, 10(1),

90.

Lunt, D. H., & Hyman, B. C. (1997). Animal mitochondrial DNA recombination.

Nature, 387(6630), 247.

Mantel, N. (1967). The detection of disease clustering and a generalized

regression approach. Cancer Research, 27(2 Part 1), 209–220.

Mayden, R. L., & Allen, J. (2015). Phylogeography of Pteronotropis signipinnis,

P. euryzonus, and the P. hypselopterus Complex (Teleostei:

Cypriniformes), with Comments on Diversity and History of the Gulf and

Atlantic Coastal Streams. BioMed Research International, 2015.

Mendel, G. (1865). Experiments in plant hybridization. Verhandlungen Des

Naturforschenden Vereins Brünn.

Mills, R. (1993, September 14). Win Big [Television]. In Animaniacs. FOX

Network.

Mooi, R. (2009). Evolution, Second Edition. Douglas J. Futuyma. Integrative and

Comparative Biology, 49, 722–723. https://doi.org/10.1093/icb/icp095

Morcillo, F., Ornelas-García, C. P., Alcaraz, L., Matamoros, W. A., & Doadrio, I.

(2016). Phylogenetic relationships and evolutionary history of the

Mesoamerican endemic freshwater fish family Profundulidae

(Cyprinodontiformes: Actinopterygii). Molecular Phylogenetics and

Evolution, 94, 242–251. https://doi.org/10.1016/j.ympev.2015.09.002

Moritz, C. (1994). Defining ‘evolutionarily significant units’ for conservation.

Trends in Ecology & Evolution, 9(10), 373–375.

Moyer, G. R., Remington, R. K., & Turner, T. F. (2009). Incongruent gene trees,

complex evolutionary processes, and the phylogeny of a group of North

American minnows (Hybognathus Agassiz 1855). Molecular Phylogenetics

and Evolution, 50(3), 514–525.

Moyle, P. B. (2002). Inland fishes of California: Revised and expanded. University of

California Press.

Moyle, P. B., Williams, J. E., & Wikramanayake, E. D. (1989). Fish species of

special concern of California. Sacramento: California Department of Fish

and Game.

Mullis, K. B. (1987). Process for amplifying nucleic acid sequences.

Mussmann, S. M. (2018). Diversification Across a Dynamic Landscape:

Phylogeography and Riverscape Genetics of Speckled Dace (Rhinichthys

osculus) in Western North America (PhD Thesis).

Myers, E. A., Rodríguez-Robles, J. A., DeNardo, D. F., Staub, R. E., Stropoli, A.,

Ruane, S., & Burbrink, F. T. (2013). Multilocus phylogeographic

assessment of the California Mountain Kingsnake (Lampropeltis zonata)

suggests alternative patterns of diversification for the California Floristic

Province. Molecular Ecology, 22(21), 5418–5429.

https://doi.org/10.1111/mec.12478

Myers, N., Mittermeier, R. A., Mittermeier, C. G., Da Fonseca, G. A., & Kent, J.

(2000). Biodiversity hotspots for conservation priorities. Nature,

403(6772), 853.

Naora, H., & Deacon, N. J. (1982). Relationship between the total size of exons

and introns in protein-coding genes of higher eukaryotes. Proceedings of

the National Academy of Sciences, 79(20), 6196–6200.

National Marine Fisheries Service. (1991). Policy on Applying the Definition of

Species Under the Endangered Species Act to Pacific Salmon. Federal

Register, 56(224), 58612–58618.

Nei, M., & Kumar, S. (2000). Molecular Evolution and Phylogenetics. Oxford

University Press.

Nerkowski, S. A. (2015). Microsatellite Analysis of Population Structure in the

Santa Ana Speckled Dace (Rhinichthys osculus) (M.S. Thesis).

Oakey, D. D., Douglas, M. E., & Douglas, M. R. (2004). Small fish in a large

landscape: Diversification of Rhinichthys osculus (Cyprinidae) in western

North America. Copeia, 2004(2), 207–221.

Palumbi, S. R., & Baker, C. S. (1994). Contrasting population structure from

nuclear intron sequences and mtDNA of humpback whales. Molecular

Biology and Evolution, 11(3), 426–435.

Peakall, R., & Smouse, P. E. (2006). GENALEX 6: Genetic analysis in Excel.

Population genetic software for teaching and research. Molecular Ecology

Notes, 6(1), 288–295.

Peakall, R., & Smouse, P. E. (2012). GENALEX 6.5: Genetic analysis in Excel.

Population genetic software for teaching and research-an update.

Bioinformatics, 28(19), 2537–2539.

Phillipsen, I. C., & Metcalf, A. E. (2009). Phylogeography of a stream-dwelling

frog (Pseudacris cadaverina) in southern California. Molecular

Phylogenetics and Evolution, 53(1), 152–170.

Rabinowitz, M., & Swift, H. (1970). Mitochondrial nucleic acids and their relation

to the biogenesis of mitochondria. Physiological Reviews, 50(3), 376–427.

Raymond, S. (1962). A convenient apparatus for vertical gel electrophoresis.

Clinical Chemistry, 8(5), 455–470.

Rodríguez-Robles, J. A., Denardo, D. F., & Staub, R. E. (1999). Phylogeography

of the California Mountain Kingsnake, Lampropeltis zonata (Colubridae).

Molecular Ecology, 8(11), 1923–1934.

Sanger, F., Nicklen, S., & Coulson, A. R. (1977). DNA sequencing with chain-

terminating inhibitors. Proceedings of the National Academy of Sciences,

74(12), 5463–5467.

Santa Ana Watershed Project Authority. (2004). Old, Grand Prix, and Padua

Fires (October, 2003) Burn Impacts to Water Systems and Resources.

U.S. National Forest Service.

Schaffer, J. P. (1993). California’s geological history and changing landscapes.

The Jepson Manual: Higher Plants of California, 49–54.

Schoenherr, A. A. (2017). A Natural History of California: Second Edition. University of

California Press.

Schultz, S. S., & Wallace, R. E. (2013). The San Andreas Fault. USGS.

Slade, R. W., Moritz, C., & Heideman, A. (1994). Multiple nuclear-gene

phylogenies: Application to pinnipeds and comparison with a mitochondrial

DNA gene phylogeny. Molecular Biology and Evolution, 11(3), 341–356.

https://doi.org/10.1093/oxfordjournals.molbev.a040117

Smith, G. R., & Dowling, T. E. (2008). Correlating hydrographic events and

divergence times of speckled dace (Rhinichthys: Teleostei: Cyprinidae) in

the Colorado River drainage. Special Papers-Geological Society of America,

439, 301.

Smith, G. R., Morgan, N., & Gustafson, E. (2000). Fishes of the Mio-Pliocene

Ringold Formation, Washington: Pliocene capture of the Snake River by the

Columbia River.

Smithies, O. (1955). Zone electrophoresis in starch gels: Group variations in the

serum proteins of normal human adults. Biochemical Journal, 61(4), 629.

Smithies, O. (1959a). An improved procedure for starch-gel electrophoresis:

Further variations in the serum proteins of normal individuals. Biochemical

Journal, 71(3), 585.

Smithies, O. (1959b). Zone electrophoresis in starch gels and its application to

studies of serum proteins. In Advances in protein chemistry (Vol. 14, pp.

65–113). Elsevier.

Spellman, G. M., Riddle, B., & Klicka, J. (2007). Phylogeography of the mountain

chickadee (Poecile gambeli): Diversification, introgression, and expansion

in response to Quaternary climate change. Molecular Ecology, 16(5),

1055–1068. https://doi.org/10.1111/j.1365-294X.2007.03199.x

Stepien, C. A., Snyder, M. R., & Elz, A. E. (2019). Invasion genetics of the silver

carp Hypophthalmichthys molitrix across North America: Differentiation of

fronts, introgression, and eDNA metabarcode detection. PLOS ONE,

14(3), e0203012. https://doi.org/10.1371/journal.pone.0203012

Swift, C. C., Haglund, T. R., Ruiz, M., & Fisher, R. N. (1993). The status and

distribution of the freshwater fishes of southern California. Bulletin of the

Southern California Academy of Sciences, 92(3), 101–167.

Taylor, E. B., McPhail, J. D., & Ruskey, J. A. (2015). Phylogeography of the

longnose dace (Rhinichthys cataractae) species group in northwestern

North America—The Nooksack dace problem. Canadian Journal of

Zoology, 93(10), 727–734. https://doi.org/10.1139/cjz-2015-0014

Torgerson, W. S. (1958). Theory and methods of scaling.

U.S. Congress. (1973). Endangered Species Act.

U.S. Congress. (1978). Endangered Species Act Amendments, § 3.

US EPA. (2015, March 17). Surf Your Watershed [Overviews and Factsheets].

Retrieved April 17, 2019, from US EPA website:

https://www.epa.gov/waterdata/surf-your-watershed

U.S. Fish and Wildlife Service. (1996). Interagency Policy Regarding the

Recognition of Distinct Vertebrate Population Segments Under the ESA.

Federal Register, 61, 4722.

Vandergast, A. G., Bohonak, A. J., Weissman, D. B., & Fisher, R. N. (2006).

Understanding the genetic effects of recent habitat fragmentation in the

context of evolutionary history: Phylogeography and landscape genetics of

a southern California endemic Jerusalem cricket (Orthoptera:

Stenopelmatidae: Stenopelmatus). Molecular Ecology, 16(5), 977–992.

https://doi.org/10.1111/j.1365-294X.2006.03216.x

VanMeter, J. J. (2017). The Santa Ana Speckled Dace (Rhinichthys osculus):

Phylogeography and Molecular Evolution of the Mitochondrial DNA

Control Region (M.S. Thesis). California State University, San Bernardino.

VanMeter, P. M. (2017). Molecular Evolution and Phylogeography of

Mitochondrial DNA Cytochrome B Gene in Southern California Santa Ana

Speckled Dace (Rhinichthys osculus) (M.S. Thesis). California State

University, San Bernardino.

Waples, R. S. (1991). Pacific salmon, Oncorhynchus spp., and the definition of

"species" under the Endangered Species Act. Marine Fisheries Review,

53(3), 11–22.

Williamson, R. S., Lieutenant. (1853). Report of Explorations in California for

Railroad Routes to Connect With the Routes Near the 35th and 32D

Parallels of North Latitude. Corps of Topographical Engineers.

Wilson, A. C., Cann, R. L., Carr, S. M., George, M., Gyllensten, U. B., Helm-

Bychowski, K. M., … Stoneking, M. (1985). Mitochondrial DNA and two

perspectives on evolutionary genetics. Biological Journal of the Linnean

Society, 26(4), 375–400. https://doi.org/10.1111/j.1095-8312.1985.tb02048.x

Wilson, E. O. (n.d.). E.O. Wilson Biodiversity Foundation. Retrieved July 18,

2019, from https://eowilsonfoundation.org/portfolio/e-o-wilson/

Wright, S. (1938). Size of population and breeding structure in relation to

evolution. Science, 87, 430–431.
