California State University, San Bernardino CSUSB ScholarWorks
Electronic Theses, Projects, and Dissertations Office of Graduate Studies
9-2019
GEOGRAPHIC POPULATION STRUCTURE AND TAXONOMIC IDENTITY OF RHINICHTHYS OSCULUS, THE SANTA ANA SPECKLED DACE, AS ELUCIDATED BY NUCLEAR DNA INTRON SEQUENCING
Liane Raynette Greaver California State University - San Bernardino
Follow this and additional works at: https://scholarworks.lib.csusb.edu/etd
Part of the Evolution Commons, Molecular Genetics Commons, and the Population Biology Commons
Recommended Citation Greaver, Liane Raynette, "GEOGRAPHIC POPULATION STRUCTURE AND TAXONOMIC IDENTITY OF RHINICHTHYS OSCULUS, THE SANTA ANA SPECKLED DACE, AS ELUCIDATED BY NUCLEAR DNA INTRON SEQUENCING" (2019). Electronic Theses, Projects, and Dissertations. 931. https://scholarworks.lib.csusb.edu/etd/931
This Thesis is brought to you for free and open access by the Office of Graduate Studies at CSUSB ScholarWorks. It has been accepted for inclusion in Electronic Theses, Projects, and Dissertations by an authorized administrator of CSUSB ScholarWorks. For more information, please contact [email protected].
GEOGRAPHIC POPULATION STRUCTURE AND TAXONOMIC IDENTITY OF
RHINICHTHYS OSCULUS, THE SANTA ANA SPECKLED DACE, AS
ELUCIDATED BY NUCLEAR DNA INTRON SEQUENCING
A Thesis
Presented to the
Faculty of
California State University,
San Bernardino
In Partial Fulfillment
of the Requirements for the Degree
Master of Science
in
Biology
by
Liane Raynette Greaver
September 2019
Approved by:
Dr. Anthony Metcalf, Committee Chair, Biology
Dr. James Ferrari, Committee Member
Dr. David Polcyn, Committee Member
© 2019 Liane Raynette Greaver
ABSTRACT
Rhinichthys osculus (Cyprinidae), the speckled dace, is the most widely distributed freshwater fish in the western United States. The southern California populations of R. osculus are identified as the Santa Ana speckled dace (SASD), though the SASD has not yet been formally recognized as a distinct taxon.
Current mtDNA analysis performed in the Metcalf Lab has shown a reciprocally monophyletic relationship among three California regions: southern, central coast, and Owens Valley. Similarly, microsatellite genotyping has shown significant levels of geographic population structure. The purpose of this study was to provide nuclear DNA sequence data to determine the taxonomic status of the SASD, to elucidate their evolutionary history and the relationships among the three regions, and to further define their evolutionary trajectory by comparing
SASD sequence data to that of speckled dace from the Colorado River of
Arizona. To examine this, three EPIC intron markers were sequenced on 54 samples representing all four regions. Based on the mtDNA and microsatellite data alone, there is strong support that the southern California populations of R. osculus are a reproductively isolated taxon at the species level. My study confirms this by showing the SASD to be reciprocally monophyletic for nuclear
DNA markers, in conjunction with the mitochondrial DNA marker analyses.
Because they are evolutionarily independent and face increased incidence of drought, fire, and flood, endangered species status should be considered.
ACKNOWLEDGEMENTS
I would like to acknowledge and thank my thesis advisor, Dr. Tony
Metcalf, who gave me the opportunity of an internship with the WRI as an undergrad, which led to this master’s thesis project. Most importantly, he allowed me to wander my way through at my pace, sometimes nudging me to finish sooner but hopefully realizing that my interest has always been in the exploration of all things as I stroll through. He gave me the inspiration and much of the funding for the project, plus the stories to entertain me when I was losing track of my path or the path became bumpy. Niiiice truuuuck…
I also thank Dr. Dave Polcyn and Dr. Jim Ferrari for giving their time to advise me all through my education at CSUSB. Each of you has influenced my path in various ways and I cannot thank you enough for your encouragement.
Every student that has had the benefit of learning from you has been inspired by you. I consider it an honor to have worked with all three of you over the years.
To my mentor, Pam MacKay, who is the reason I ended up in the field, literally, of ecology. At VVC, Pam didn’t just tell us about the field, she took us out into it every week, and showed us how to look at the world from a different perspective, one of not just curiosity but, now having the tools to find out the answers to those curiosities, one of understanding. She introduced us all to opportunities and people, some of whom have become those I call “my people.”
To my lab compadres over the years who have all assisted me in so many ways, Jay and Pia VanMeter for welcoming me into the lab and teaching me all the things I needed to know to survive the molecular pathway. To Joe Riley for suffering my first year as a grad student and teaching me all about British culture.
To all my lab assistants, without whom I’d never have finished: Nguyen Tran, Diane Villalvazo, Caitlin Hazelquist (Lab Lackey), and Lauren Morrison (Lab Elf), you have earned your sock.
To Stacey Nerkowski, the Brain to my Pinky (Mills, 1993), with whom I began the quest to take over the world at Victor Valley College in 2007, starting with Biology Club, then ASB, Phi Theta Kappa, and Biology Club at CSUSB.
Stacey is also the person who got me addicted to Starbucks in physics. I became part of another family over the years as Stacey, and her parents, Kim and Jerry
Nerkowski, “adopted” me and introduced me to many things I never would have experienced otherwise, including nearly being impaled by trees while whitewater rafting. I am thankful for all the additional advice and support I’ve received from them. Stacey and I have taken the divide and conquer path over the last couple of years, but the quest is not over.
To Suzy Neal, my sister Pinky, together we couldn’t take over an airplane restroom much less the world, “Ooh, what’s this button do?” Suzy is that person who gets my need to push the button. Often, she’s the one who shows me the button, especially if it’s shiny, but she also consoles me if something explodes when I push the button.
To Tricia Turturro Fredendall, the one who I believe has the other half of my brain. We are too much alike in so many ways. Thank you for being a friend
(Gold, 1978). It has been a blessing to have you as a friend through school,
“Student?” (Brooks, 1974) and through life. “Because I knew you…” (Chenowith
& Menzel, 2003).
I could never have completed this without the ever-supportive assistance of Debbie Reynolds. Thank you so much for being my cheerleader, my therapist, my enabler, my cohort, my accomplice, my social planner, my hostess, and my competitor. I can’t remember life before the Snoopy incident.
To my brothers, James and Matthew Greaver, for always understanding when I couldn’t be there for family events, for picking up my slack helping Mom, and every other bit of encouragement and humor you threw my way.
This work also would not have been possible without funding from the following sources: CSUSB Associated Students Incorporated, CSUSB Office of
Student Research, U.S. Forest Service, and California Department of Fish and
Wildlife.
DEDICATION
I dedicate this work to my mom, Raynette Greaver, who has supported me in EVERY way: emotionally, mentally, and financially. She may not understand why I love what I do, but she has suffered my journey. She is the reason I love all life and its place in the world. She taught me and my younger brothers to stop during a hike and just listen: to the wind in the trees, the calls of the birds, the sounds of the forest critters under the brush. She allowed me to run somewhat free, as a child, through the desert where I caught lizards and insects, and learned my place on the planet.
To my dad, Earl Greaver, who taught me by example, to help people whenever possible, and that we make a greater impact on the world by being that person who always tries to be there for others in whatever way we can. We are not always rewarded with monetary riches, but we gain much respect and trust from those we help and those are priceless. I miss you, Dad.
To my grandparents, Paul and Frances Freiling, there aren’t enough words to describe everything they gave to me. My grandfather inspired my curiosity and humor. He was an entertainer and inventor at heart. My grandmother inspired my drive to be helpful, to be accepting, and to be strong.
She battled so many things in life, all without loss of faith, humor, and love. She is and always will be the light that guides me when I’m lost in the dark.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
CHAPTER ONE: INTRODUCTION
Overview
Phylogeography
Model Habitat
Model Organism
Conservation Policies
Molecular Markers
Literature Reviews
Phylogenetics Studies Enhanced by the Addition of Nuclear DNA Data
Phylogeography as Influenced by California Floristic Province Topography
Phylogenetics and Population Structure of the Santa Ana Speckled Dace
CHAPTER TWO: MATERIALS AND METHODS
Research Objective
Sample Collection
Molecular Methods
Sequence Analysis
Population Genetics
Phylogeography
CHAPTER THREE: RESULTS
Molecular Methods
Preliminary Testing
Sequence Analysis
Population Genetics
Phylogeography
CHAPTER FOUR: DISCUSSION
Molecular Markers
Population Genetics
Phylogeography
Hydrographic History of Connectivity
Conservation Implications
APPENDIX A: FIGURES
APPENDIX B: TABLES
APPENDIX C: SEQUENCE DATA
APPENDIX D: INPUT FILES
REFERENCES
CHAPTER ONE
INTRODUCTION
Overview
The evolution of a species and its surrounding environment are unequivocally bound. As argued from Darwin (1859) through to the present (e.g., Coyne and Orr, 2004), the history of environmental change provides verification for hypothesized evolutionary changes in the species inhabiting those environments. The area of study that links biology with geography in this way is referred to as phylogeography.
Studying those evolutionary changes in the populations of organisms using molecular information is known as phylogenetics.
The study of molecular phylogenetics requires a study organism whose current population structure and geographic range have been molded by the forces of evolution in conjunction with geologic forces. The organism must belong to a species made up of many populations that have evolved over time, allowing for the accumulation of genetic differences among and within the populations.
This provides a tool to analyze the variability that exists as a result of the interrelationships between biological and geological forces. The populations must be natural ones living in natural habitats in order to reveal the true structure resulting from years of modification in response to ever-changing environments.
Stream-dwelling vertebrates are an excellent resource to study phylogeography and molecular evolution. They exist in discrete populations of varying population density and are intimately linked to the streams embedded within watersheds that are shaped by both geographic and biogeographic factors. Thus, many different species of stream-dwelling vertebrates have served as model organisms for the study of phylogeography (Bruno, Casciotta, Almirón, Riccillo, & Lizarralde, 2015; Mayden & Allen, 2015; Phillipsen & Metcalf, 2009).
Phylogeography
Phylogeography involves the study of the distribution and extent of genetic variation in geographically distributed populations. This may be either intraspecific populations or populations of closely related species. This area of study elucidates the evolutionary processes that have provided us with the level of biodiversity that exists currently. It also affords a way to reveal links between extant species and those for which science has concrete evidence of past existence. The forces responsible for the genealogical path of a population are most often climatic and topographic in nature (Cody, 1986). Historically documented events can be correlated to the geographic ranges of many species.
To apply phylogeography to the conservation of biodiversity, we must be able to demonstrate the correlation between species lineages and their present distribution (Moyer, Remington, & Turner, 2009).
While research has been done in this field of study for many years, it initially had no official name. The term “phylogeography” was not coined until 1987, by John C. Avise. While it is possible to draw inferences regarding the lineages of species based on differences in morphological or behavioral characteristics, which have a basis in genetics, such inferences are far less concrete than molecular evidence of genetic differentiation. By Avise’s definition, phylogeography in its purest form deals only with allelic distribution (Avise, 2000). With the advent of methods to visualize molecular markers, science now has a way to document, in fine-scale detail, the differences and similarities among populations of species. These markers allow individuals to be categorized into haplotypes, which often mirror the geographic distribution of those same individuals within their respective populations.
The analysis of evolutionary lineages can best be represented by the creation of phylogenetic trees. These trees are formed by categorizing individuals of different species into groups based on comparisons of the molecular sequences of interest. These sequences are referred to as haplotypes. A haplotype can be designated based on a single nucleotide polymorphism (SNP) or a series of differences that distinguish it from every other haplotype. A haplotype may contain a single individual or many individuals. The relationships among haplotypes are estimated by the application of models of molecular evolution.
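The grouping of individuals into haplotypes described above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used in this study: the sample names and sequences are invented, the sequences are assumed to be already aligned and of equal length, and the function names `collapse_haplotypes` and `variable_sites` are hypothetical.

```python
from collections import defaultdict

def collapse_haplotypes(aligned):
    """Group sample IDs that share an identical aligned sequence."""
    groups = defaultdict(list)
    for sample_id, seq in aligned.items():
        groups[seq.upper()].append(sample_id)
    # Label haplotypes H1, H2, ... in order of first appearance.
    return {f"H{i}": ids for i, ids in enumerate(groups.values(), start=1)}

def variable_sites(aligned):
    """Return the 0-based alignment columns at which the samples differ."""
    seqs = [s.upper() for s in aligned.values()]
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]

# Hypothetical aligned sequences, invented for illustration only.
samples = {
    "south_1": "ACGTACGTAC",
    "south_2": "ACGTACGTAC",
    "coast_1": "ACGTATGTAC",
    "owens_1": "ACGAATGTAC",
}

haplotypes = collapse_haplotypes(samples)
sites = variable_sites(samples)
print(haplotypes)
print(sites)
```

In this toy data set the two southern samples share one haplotype, while the other two samples each carry a haplotype of their own, distinguished by one or two SNPs.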
A correlation between the haplotype groups and the geographical groups is often seen when geography has played a role in the evolution of the species’ populations within the regions. For this reason, phylogenetic analyses of populations of a single species that inhabit areas isolated from other populations of the same species are highly useful. Watersheds and their tributaries, separated by geologic and topographic features, are prime examples of how landscape can influence the population structure of stream-dwelling vertebrates.
Model Habitat
California is rich in biodiversity, particularly the cismontane southern California region, which is known to have some of the highest levels of biodiversity in the world. This region is included in what is known as the California Floristic Province (Appendix A, Figure 1), one of the world’s 25 published biodiversity “hotspots” (Calsbeek, Thompson, & Richardson, 2003; Myers, Mittermeier, Mittermeier, Da Fonseca, & Kent, 2000).
Biodiversity can be measured by many variables, one of which is species richness, or the actual number of different species inhabiting an area. Species richness can increase in a variety of ways but ultimately results from the divergence of biological populations into distinct species. The pattern of dispersal of subpopulations and their subsequent speciation are the focal point of this thesis project.
Geographically distinct populations arise via two possible distribution patterns, vicariance or dispersal. Dispersal is defined by the distribution of a population across an existing geographical barrier such as over and across a mountain range via migration. Vicariance denotes the distribution of populations whereby a new geographical barrier arises thus fragmenting the ancestral population into smaller groups (Avise & Ferguson, 1995; Coyne & Orr, 2004).
There are four major proposed processes of speciation. Sympatric speciation allows populations to diversify while still within an overlapping range.
Parapatric speciation is the differentiation of a population inhabiting an adjacent region to the parent population. Peripatric speciation is the branching off of a small portion of the original population that is then isolated nearby, though not directly adjacent to, the parent population. Allopatric speciation results from the partitioning of a population, by the development of a physical barrier, completely isolating the fragmented groups from one another. These are not mutually exclusive, and extant populations of a species may have gone through several of these processes over time, distinguishing them further from their ancestors (Coyne & Orr, 2004; Mooi, 2009).
Changes in topographic landscape that lead to isolation of vertebrate populations occur intermittently or extremely slowly. Conversely, the frequent changes that occur in the freshwater environments of rivers, lakes, and streams can have a much greater impact on the diversification of freshwater fish species in a shorter span of time, as streams and tributaries are in constant flux within and across watersheds (Grossman, Hill, & Petty, 1995). Therefore, watershed environments play an important role in the creation of ichthyologic diversity and population structure.
John Wesley Powell, leader of the 1869 Powell Geographic Expedition, stated a watershed is "that area of land, a bounded hydrologic system, within which all living things are inextricably linked by their common water course and where, as humans settled, simple logic demanded that they become part of a community." The U.S. Environmental Protection Agency lists 153 watershed systems within California (US EPA, 2015). These meandering and typically isolated, though intermittently overlapping, stream systems provide a unique opportunity to study the effects of seasonal climate changes as well as large climatic events on the genetic structure of vertebrate populations. How are population lineages affected by natural seasonal fires and flooding, as well as by human impact? In southern California, the major river systems may interweave in many areas, while in others the streams are too distant for stream-dwelling organisms to migrate between them naturally. Such is the case among the southern California river systems of interest in this study.
Several studies have presented evidence suggesting the influence of topography in southern California on the development and evolution of population structure (Benson, 2006; Calsbeek et al., 2003; Phillipsen & Metcalf, 2009; Vandergast, Bohonak, Weissman, & Fisher, 2006). Specifically, the prominent tectonic breaks within the region have created distinct mountain ranges and river systems. The San Andreas Fault, in existence for approximately 15-20 million years, is the division between the Pacific and the North American plates and is the major fault in a large complex of faults throughout southern California, including the San Jacinto, Banning, and Elsinore faults (Schultz & Wallace, 2013). The movement of the earth’s plates over vast periods of time eventually led to the development and translocation of what we currently deem the Peninsular and Transverse mountain ranges. The north-south oriented Peninsular range and, particularly, the uniquely east-west oriented Transverse range (Appendix A, Figure 2) owe their orientation to the movements of several plates that alternately moved them north and broke up the Transverse range, rotating sections of it to give us the modern landscape arrangement (Schaffer, 1993). These mountain ranges have been shown to be an influential factor in the shaping of population structure, as well as the increase of species richness, in the region (Chatzimanolis & Caterino, 2007; Phillipsen & Metcalf, 2009; Spellman, Riddle, & Klicka, 2007).
The unique climate in this area, referred to as a Mediterranean climate, developed some 8 million years ago. Inland Southern California shares this designation with four other regions in the world: Central Chile, the Mediterranean Basin, the Cape of Africa, and Southwest and Southern Australia. This climate type is characterized by rainy winters and long, hot, dry summers. These regions can claim some of the highest levels of biodiversity in the world (Cody, 1986; Schoenherr, 2017).
The climatic characteristics of the Mediterranean ecosystem, in combination with the tectonic construction of the terrain, produced many unique features that led to the evolution of many endemic species. The tectonic activity that raised the mountain ranges also resulted in the formation of more varied soil types, often situated in pockets across the region.
For example, many of the plants that currently grow in these areas have evolved to endure the low-nutrient and often heavy-metal-rich content of the serpentine soils in this region (Anderson, Fralish, & Baskin, 2007). This same uplift created the valleys that direct the flow of the many waterways in the area. These valleys intertwine through the multi-directional mountain ranges. During the hot, dry summers, the flow of water running through these valleys decreases, creating similar isolated pockets of habitat for the stream-dwelling organisms. In addition, the flora becomes dry brush, fueling late summer wildfires. Conversely, during the rainy winters the waters not only rise but do so as flash floods. The soils become so dry during the summer that the sudden surge of rain does not soak in but instead floods along the surface of the ground. These flood waters from the various streams can collide at junctions or flood over areas that usually separate the streams, forcibly displacing individuals from populations in one area into a new area. Once the floods subside, these individuals must adapt to the new environment or perish. Flooding also erodes the terrain, leading to landslides. After fires, much of the ground that is pushed down the valleys is ash and dead timber. This debris clogs the waterways, creating further isolation following the translocation of organisms (Moyle, Williams, & Wikramanayake, 1989). Ancestral relationships between the individual populations that range throughout the watershed environments can become convoluted due to these tempestuous circumstances (Swift, Haglund, Ruiz, & Fisher, 1993).
Model Organism
The utilization of model organisms allows us to broadly study various aspects of the biological world on a scale that is manageable in the laboratory
(Campbell & Reece, 2005). Model organisms are non-human organisms that represent a larger group and are chosen specifically to assist in answering the specific biological questions of a researcher.
In the interest of studying phylogeography, as stated previously, freshwater streams and watersheds in southern California provide the optimal habitat conditions to study the effects of environmental change on biological organisms. Decidedly, stream-dwelling vertebrate species are perfect models for the documentation of the biological and environmental changes that are occurring within these habitats.
Over 30 years ago, John Avise noted the staggering lack of interest in studying freshwater organisms despite their usefulness (Avise, Giblin-Davidson, Laerm, Patton, & Lansman, 1979); since that time, a resurgence of interest has occurred. Many phylogeographic studies have been done utilizing freshwater vertebrate species (Bernatchez & Wilson, 1998; Bufalino & Mayden, 2010a; Chen, Miya, Saitoh, & Mayden, 2008; Hollingsworth & Hulsey, 2011; Houston, Shiozawa, & Riddle, 2010; Moyer et al., 2009; Phillipsen & Metcalf, 2009). Specifically, freshwater fish make excellent study organisms for this topic. Stream-dwelling fish typically have little ability to migrate, which means they are restricted to small areas of habitat. In addition, these habitats are extremely isolated (Avise, 2000). Because their habitats may be highly unstable, easily altered by the slightest environmental change, the evolution of stream-dwelling freshwater fish populations can parallel the historical geologic transitions of the environment (Bernatchez & Wilson, 1998).
One group of fish that makes for excellent study populations is the Cyprinidae family, also known as the minnow family. This family of fish is one of the most successful in the world, with over 250 species in North America alone (Moyle, 2002). The cyprinids inhabit a wide variety of environments and have been found in a large percentage of the dominant freshwater drainage systems of western North America (Lee et al., 1980).
Within the Cyprinidae family is found the species of concern for this study, Rhinichthys osculus, commonly referred to as the speckled dace. The speckled dace is the most widely distributed freshwater fish in the western United States, ranging from Canada south to Sonora, Mexico (Lee et al., 1980; Moyle, 2002). Rhinichthys osculus is expansively distributed throughout much of western North America (Appendix A, Figure 3; Oakey, Douglas, & Douglas, 2004). California is divided into five ichthyologic provinces: Klamath, Great Basin, Sacramento, South Coastal, and Colorado. R. osculus is known to reside in four of the five provinces, predominantly in watershed habitats where the streambeds are shallow gravel and the waters constantly flowing (Moyle et al., 1989). R. osculus grows no more than 80 mm in length, and its Latin name derives from its distinctive snout with a small subterminal mouth. Its coloring usually consists of dark blotches along its body (Appendix A, Figure 4), hence the species’ common name, speckled dace. Speckled dace commonly inhabit shallow gravel and riffle streams, utilizing the overhanging flora as protection from predators (Moyle, 2002).
The southern California populations of R. osculus are called the Santa
Ana speckled dace (SASD), though SASD has not yet been formally recognized as a distinct species. At one time, it inhabited much of the Santa Ana, San
Gabriel, and Los Angeles river systems. Through the years, the group has experienced a decline in its populations and now can only be documented to inhabit smaller isolated creeks intermittently throughout the Santa Ana and San
Gabriel riverways. The SASD has been reported to be completely extirpated from the Los Angeles river system (Santa Ana Watershed Project Authority, 2004).
This decline mirrors that of the entire species within all of California. The current range of the SASD has been manipulated by the frequent fires and floods in this area (Santa Ana Watershed Project Authority, 2004) as well as human interaction with the environment (Swift et al., 1993), creating a highly fragmented habitat and small populations that are isolated from one another.
According to the Department of Fish and Wildlife’s California Natural Diversity Database (CNDDB), the SASD is recognized as a species of special concern by the CA Department of Fish and Game, as a threatened species by the American Fisheries Society, and as a sensitive species by the US Forest Service (California Department of Fish and Wildlife, 2015). All three have identified it as a population worthy of conservation. The Santa Ana Speckled Dace was submitted for ESA consideration in a petition dated September 1994 that also addressed conservation needs of the Santa Ana Sucker and the Shay Creek Three-spine Stickleback. It was submitted by the Sierra Club Legal Defense Fund, Inc. on behalf of seven organizations. In 1996, despite these designations, the SASD was denied federal listing as an endangered species because genetic information and a peer-reviewed taxonomic description were lacking.
Conservation Policies
For this population to be considered for federal protection under the amended Endangered Species Act (ESA), it must be considered a “distinct population segment” (DPS) (U.S. Congress, 1978). Though the ESA does not officially define this term, the National Marine Fisheries Service (1991), in the interest of following the intent of the ESA with regard to listing Pacific salmon, determined that to be classified as a DPS, a population must be considered an evolutionarily significant unit (ESU) of a species. This is defined by two criteria: 1) the population must be reproductively isolated from other populations of the same species, and 2) it must be an important link in the evolutionary chain of the species (National Marine Fisheries Service, 1991; Waples, 1991). This policy was later applied to all other vertebrate populations (U.S. Fish and Wildlife Service, 1996). ESU, however, was not defined by any of the above referenced agencies, and as such, has been defined via several published “conversations” by authorities in the field of taxonomy and phylogenetics. The definition has evolved to be widely accepted as requiring that a species be reciprocally monophyletic at the mitochondrial DNA (mtDNA) level, as well as show significant divergence at the nuclear DNA (nDNA) level (Moritz, 1994). A monophyletic group is one which includes an ancestral population and all its descendant populations. To be reciprocally monophyletic means that each of the populations of interest forms its own monophyletic group, so that every lineage within a group shares a more recent common ancestor with the other members of that group than with any lineage outside it. Appendix A, Figure 5 illustrates this relationship using basic phylogeny diagrams. Diagram B demonstrates reciprocal monophyly. We see that the “a” populations and the “b” populations each form their own group, and that both groups descend from the same common ancestor. Diagram A demonstrates paraphyletic relationships (Kizirian & Donnelly, 2004). These conversations also produced another term, management unit (MU), which is used to describe populations showing allelic variation at any DNA locus, mitochondrial or nuclear. This is a more general level of genetic diversity that conservation biologists agree is deserving of conservation management (Moritz, 1994).
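The distinction between reciprocal monophyly and paraphyly can be made concrete with a short Python sketch. It is offered only as an illustration under stated assumptions: rooted trees are written as nested tuples with tip labels as strings, the two example trees are invented and are not reproductions of Appendix A, Figure 5, and the function names are hypothetical.

```python
def tips(node):
    """Return the set of tip labels under a node (a tip is a plain string)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips(child) for child in node))

def is_monophyletic(tree, group):
    """True if some clade of the rooted tree contains exactly the tips in `group`."""
    group = set(group)
    if tips(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)

def reciprocally_monophyletic(tree, group_a, group_b):
    """True if each group forms its own clade, to the exclusion of the other."""
    return is_monophyletic(tree, group_a) and is_monophyletic(tree, group_b)

# Hypothetical trees, invented for illustration only.
group_a = {"a1", "a2", "a3"}
group_b = {"b1", "b2"}
tree_reciprocal = ((("a1", "a2"), "a3"), ("b1", "b2"))
tree_paraphyletic = ("a1", ("a2", ("a3", ("b1", "b2"))))

print(reciprocally_monophyletic(tree_reciprocal, group_a, group_b))
print(is_monophyletic(tree_paraphyletic, group_a))
print(is_monophyletic(tree_paraphyletic, group_b))
```

In the second tree the "b" tips form a clade nested inside the "a" lineages, so the "a" group is paraphyletic with respect to "b" and the pair is not reciprocally monophyletic.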
The ultimate goal of this study has been to determine the genetic composition of multiple informative nuclear DNA markers to provide a more thorough taxonomic description of the Santa Ana speckled dace. A further goal has been to analyze the genetic differences between the local taxon and the populations that inhabit the most proximal stream environments in Southern California, specifically the Eastern Sierra and Central Coast watershed systems, as well as the Colorado River.
Molecular Markers
The fields of phylogenetics and phylogeography have expanded rapidly since the development of polymerase chain reaction and sequencing techniques.
Previously, taxa were grouped based on phenotypic polymorphisms and behavioral traits, as this was the extent of information that could be obtained by researchers (Avise, 2009). Science had long known of the connection between those visible or behavioral traits and the heritable markers that exist within the organism, but did not have a way to visualize those “characters”, as Mendel referred to them (Mendel, 1865). Charles Darwin described evidence of the evolution of traits within and among populations (Darwin, 1859), relating them to inheritance, but again did not have the knowledge that has since become available. The visualization and confirmation of the structure of DNA (Franklin & Gosling, 1953) paved the way for its utilization in molecular research from that time forward.
The development of protein electrophoresis in the 1950s and 1960s (Raymond, 1962; Smithies, 1955, 1959a, 1959b), which is still used today, was the precursor to DNA electrophoresis. The ability to amplify or clone DNA in order to obtain the quantities necessary for visualization was brought about through the polymerase chain reaction, PCR (Mullis, 1987). Utilizing the continuing development of nucleic acid sequencing methods (Illumina, 2015; Sanger, Nicklen, & Coulson, 1977), we have the ability to analyze the fundamental heritable elements affected by the forces of evolution, from a single nucleotide to an entire genome.
Proteins, the products of DNA transcription and translation, were among the original molecular markers used to analyze species variation. While protein variation may reveal a certain level of differentiation, it cannot give as true a description as nucleotide sequence can. Each amino acid of a polypeptide is specified by a three-base sequence, or codon, and the order of codons dictates the arrangement of amino acids along the chain. A basic principle of genetics is that certain codon positions can mutate and yet still code for the same amino acid. This is referred to as redundancy. Analysis of protein sequences can only tell us whether an amino acid substitution has occurred, and not every nucleotide substitution produces one. If the sequence experiences a mutation at the third position in the codon, also known as the wobble position, it is possible the same amino acid will result, thus masking the nucleotide substitution that has occurred.
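The masking effect of codon redundancy can be illustrated with a minimal sketch. The table below is only a small subset of the standard genetic code, written as DNA coding-strand codons, and the function name classify_substitution is a hypothetical helper introduced for illustration, not part of any analysis in this study.

```python
# Subset of the standard genetic code (DNA coding-strand codons).
# Only the codons needed for this illustration are listed.
CODON_TABLE = {
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GAT": "Asp", "GAC": "Asp",
    "GAA": "Glu", "GAG": "Glu",
}


def classify_substitution(codon_before, codon_after):
    """Return 'synonymous' if both codons encode the same amino acid."""
    same = CODON_TABLE[codon_before] == CODON_TABLE[codon_after]
    return "synonymous" if same else "nonsynonymous"


# Third-position change that is invisible at the protein level (Gly -> Gly).
print(classify_substitution("GGT", "GGC"))
# Third-position change that does alter the amino acid (Asp -> Glu).
print(classify_substitution("GAT", "GAA"))
```

The first substitution would go undetected in a protein-based comparison, while a nucleotide-based comparison records both.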
Along with advancement in molecular techniques came a transition to finer scale markers, the nucleic acid subunits that are the recipe for the protein products. Long sequences made up of just four nitrogenous bases, each linked to a sugar and phosphate molecule and twined together by mere hydrogen bonds, are the current genetic material of choice. In eukaryotic organisms, we can obtain DNA from a couple of different sources. In vertebrate organisms, among others, we can extract chromosomal DNA from the nucleus, or mitochondrial DNA from the mitochondria.
Mitochondrial DNA was revealed to be a highly effective tool in the analysis of phylogenetics (Brown, George, & Wilson, 1979). The mitochondria are the cellular organelles responsible for oxidative phosphorylation in the process of cellular respiration. These organelles reside in the cytoplasm of eukaryotic cells. Mitochondria are able to self-replicate due to the possession of their own DNA. This DNA is found in the mitochondrial matrix, the compartment enclosed by the inner mitochondrial membrane. The evolution of mitochondria within eukaryotic cells is believed to have resulted from an endosymbiotic event between an α-proteobacterium and an early form of eukaryotic organism. Phylogenetic analysis of the hsp70 gene supported a common ancestor linking several species of α-proteobacteria and multiple eukaryotic species (Falah & Gupta, 1994). Mitochondrial DNA (mtDNA) is a small, circular, usually double-stranded piece of DNA that is completely separate from nuclear DNA. It is most often uniparentally transmitted; specifically, it is commonly inherited only from the maternal parent, because the cytoplasm of the female gamete, which contains the mitochondria, is transmitted in its entirety to the zygote. Because of this, it does not go through genetic recombination (Dawid & Blackler, 1972).
The benefits of analysis based on mitochondrial DNA are plentiful. Maternal inheritance makes it effectively haploid, which provides a single template to reproduce and thereby avoids the confusion of allelic variation within an individual. There are multiple mitochondria in each cell; therefore, mtDNA has a high copy number and is easy to isolate (Bogenhagen & Clayton, 1974). It is much smaller than nuclear DNA and its length is predominantly composed of coding sequence. For this reason, the structure of mtDNA is highly conserved among a wide variety of taxa. Mitochondrial DNA has a single origin of replication (ORI), located in what is known as the d-loop or control region, where the instructions for control of mtDNA transcription also reside; the genome contains the genes for the small and large ribosomal RNA subunits, and its replication is unidirectional in all species (Brown et al., 1979).
Additionally, mtDNA evolution has been shown to be extremely rapid compared to nuclear DNA, including single copy nuclear DNA, despite the concept that genetic material with high functional constraint usually displays a slow evolutionary rate. It was hypothesized that mtDNA would evolve at a rate similar to that of single copy nuclear DNA (haploid DNA), but studies have shown that mtDNA evolves at a much higher rate, possibly due to an increased mutation rate (Brown et al., 1979). Mitochondria have few to no properly functioning repair mechanisms, which creates opportunity for the incorporation of more substitutions into the gene (L. I. Grossman, Watson, & Vinograd, 1973). In addition, more frequent replication combined with little repair of misincorporation errors increases the frequency of detectable mutations (Rabinowitz & Swift, 1970). Additionally, due to its haploid, maternally inherited nature, the mitochondrial genome has an effective population size ¼ that of the nuclear genome (Birky, Maruyama, & Fuerst, 1983). Therefore, mtDNA can demonstrate the development of population structure more quickly than nDNA (Brown et al., 1979), and thus phylogenetic analysis can demonstrate reciprocal monophyly earlier. From a conservation standpoint, populations of taxa may be considered for management and protection if their genetic description reveals them to be reciprocally monophyletic at the mtDNA level (Moritz, 1994). This allows the potential for a taxon to be designated as an evolutionarily significant unit (ESU), which is a subcategory of population distinction under the Endangered Species Act (1973).
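The ¼ ratio of effective population sizes cited above can be reproduced with a short sketch under idealized assumptions that are introduced here for illustration only: strictly maternal mtDNA inheritance, one effective mtDNA copy per breeding female, and Wright's standard formula for the effective size of a diploid population with separate sexes. The function names are hypothetical.

```python
def nuclear_effective_size(n_females, n_males):
    """Wright's effective size for a diploid autosomal locus with two sexes."""
    return 4 * n_females * n_males / (n_females + n_males)


def mito_to_nuclear_ratio(n_females, n_males):
    """Ratio of effective mtDNA gene copies to effective nuclear gene copies."""
    # Nuclear locus: two gene copies per effective individual (diploid).
    nuclear_copies = 2 * nuclear_effective_size(n_females, n_males)
    # mtDNA: one effective copy per breeding female (haploid, maternal).
    mito_copies = n_females
    return mito_copies / nuclear_copies


# With an equal sex ratio the ratio is one quarter.
print(mito_to_nuclear_ratio(500, 500))
```

Under these assumptions the ratio equals ¼ only when the numbers of breeding females and males are equal; a skewed sex ratio shifts it in either direction.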
While mtDNA has been highly beneficial in the advent of phylogenetics, it has come to the attention of many scientists that there are limitations to the accuracy of the phylogenetic relationships hypothesized based on mtDNA. Some of the characteristics that make mtDNA useful are also its shortcomings. Because the genome is haploid and maternally inherited, analyses based on mtDNA give only a one-sided view of the historical relationship among taxa. The lineage will reflect only the matrilineal succession. Groupings of taxa may not mirror the actual gene sequence descent in populations where the males and females migrate or group differently (Palumbi & Baker, 1994). Since there is no paternal chromosome donation and recombination is rarely found (Lunt & Hyman, 1997), the mitochondrial genome is transmitted as a whole entity and for all intents and purposes can be considered a single locus (A. C. Wilson et al., 1985). In addition, as the genome is circular and there are no intervening non-coding intron segments in vertebrate mitochondrial genes, there may be a lack of independence in the evolution of the genes due to their close proximity (He et al., 2008). Therefore, the use of multiple genetic loci in phylogenetic reconstruction has become necessary. Combining mtDNA with nuclear DNA loci has been shown to provide a more accurate hypothesis of evolutionary relationships (Palumbi & Baker, 1994). It has been recommended that phylogenies based solely on mtDNA sequence be revisited to incorporate nuclear DNA information and determine whether the original suppositions were accurate (E. A. Myers et al., 2013). One of the forms of nuclear DNA revealed to be useful in examining both inter- and intra-species phylogenetic relationships is the predominantly non-coding intron segments within the coding genes of vertebrate organisms.
In 1977, while studying a mature viral structural protein and its corresponding mapped mRNA sequence, researchers at M.I.T. discovered that when they hybridized the mature mRNA to the single-stranded DNA, the RNA included “tails” on the 5’ end that did not complement the DNA at that point along the strand. Rather, they found the complementary DNA segments that matched the mRNA further upstream. After ruling out several possibilities, it was deduced that these tail segments may result from alternate splicing of the precursor mRNA (Berget, Moore, & Sharp, 1977). This research led to the discovery of introns, the intervening segments that reside within a gene (Appendix A, Figure 6). This term refers to the stretches of RNA transcribed from the DNA template that are subsequently excised during mRNA processing and excluded from the final mRNA transcript, as well as the corresponding sequence of the template DNA from which the excised segments are transcribed (Kinniburgh, Mertz, & Ross, 1978). As evolutionary divergence times are derived from analysis of nucleotide substitution within DNA, it is important to use chromosomal DNA intron segments, rather than mRNA, as the basis for study when determining phylogenetic relationships between subpopulations experiencing isolation.
Introns are labeled “noncoding” segments of DNA as they frequently do not directly code for amino acids. They reside within genes, but as they themselves do not code for proteins, they are not under the same selective pressures as exons. They may, however, contain regulatory information pertaining to the transcription and/or replication of the gene. Introns are found interspersed throughout genes and can vary anywhere from 200 to 4900 base pairs in length (Naora & Deacon, 1982). Therefore, the sequences from several independent introns provide sufficient sequence data for phylogenetic analysis. In contrast, exons are those segments of the gene that are transcribed and translated into amino acid sequences forming protein products (Nei & Kumar, 2000); they are therefore under stronger selection and less informative in closely related taxa. This is one reason that introns are an excellent tool for studying population structure.
This is not to say that all intron segments can be utilized for evolutionary study. Choosing introns within genes that are conserved across taxa, and even across species, makes it possible to predict the presence of the gene among multiple species. Studying the coding sequence would be uninformative, as the most conserved sequences are those in which few to no base substitutions are present. The introns within those genes, though, will also be conserved to a point. Their presence is expected within the gene, while their actual sequence can often withstand nucleotide substitution without detriment to the development and/or function of the organism. This provides a level of variation that enables analysis of genetic differentiation among populations of the same species.
Literature Reviews
Phylogenetics Studies Enhanced by the Addition of Nuclear DNA Data
As previously noted, mitochondrial DNA was initially the foremost molecular marker utilized in genetic characterization of both marine and freshwater vertebrate organisms. However, a multi-marker approach, comparing mtDNA with multiple nuclear DNA markers, has become a much more widely effective method of population identification and differentiation of eukaryotic macroorganisms. Concordance among multiple genetic loci provides assurance of historically distant divergence, allowing for a more concrete image of current population structure (Avise, 2009).
Twenty-five years ago, Stephen Palumbi began using a combination of mtDNA and nuclear intron sequences to perform population studies on marine mammals. Palumbi and Baker (1994) analyzed the population structure of humpback whales, Megaptera novaeangliae, using previously sequenced mtDNA D-loop information (Baker et al., 1993). In conjunction with the D-loop sequences, they also amplified and sequenced the first intron within the highly conserved musculoskeletal actin protein gene. DNA samples were taken from ten free-ranging humpback whales located around Hawaii and California, as well as blue whales and bowhead whales from all ocean populations, for comparison. The 1,409 bp actin intron sequence revealed approximately 3% and 4% difference between the humpbacks and the blue whales and bowhead whales, respectively. The phylogenetic tree from the initial study (Baker et al., 1993), which utilized mtDNA, did distinguish the Hawaiian humpbacks from the California humpbacks but revealed no variation within the Hawaiian population and minimal variation within the California population. The most informative phylogenetic tree based on the intron marker, however, presented a more highly varied structure. The two comparison whale species clustered accurately. The humpback whales of the study group clustered into two clades with no clear geographic structure to the cladogram. This study shows a strong difference between the evolutionary relationships inferred from mtDNA and those inferred from a single nuclear DNA marker. The intron sequence revealed no particular geographic pattern, which could be heavily influenced by the fact that the males of these populations are migratory; the chromosomal DNA of a single male may thus be distributed throughout multiple populations, reducing the amount of variation between the populations. This also illustrates the necessity of utilizing multiple gene markers. Based on the mtDNA alone we would conclude distinct genetic separation between populations, but the nDNA reveals the level of introgression that actually occurs among the populations.
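Percent differences of the kind reported for the actin intron are commonly expressed as an uncorrected pairwise distance (p-distance), the proportion of aligned sites at which two sequences differ. The sketch below is an assumption about how such a figure could be computed, not the method used by Palumbi and Baker; the function name and the toy sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy 10 bp alignment with one differing site.
print(p_distance("ACGTACGTAC", "ACGTACGTAT"))
```

On a 1,409 bp alignment, a 3% difference would correspond to roughly 42 differing sites.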
Slade, Moritz, and Heideman (1994) demonstrated further advancement of these techniques by using three conserved nuclear intron sequences, as well as an exon sequence, for comparison with the mtDNA control region phylogenetic tree of six species of pinnipeds. Corroboration of multiple nuclear loci lends stronger support to the determination of phylogenetic relationships. In this case, phylogenetic analysis based on the combination of loci was run via multiple minimum-evolution models, each showing concurring relationship structure among the pinnipeds with canids as the outgroup, all members of order Carnivora. This study demonstrated the usefulness of conserved intron sequences in multigene phylogenetic analysis of closely related species.
Studies utilizing intron sequence to evaluate phylogenies of freshwater minnows have become more frequent. For example, Angelo Bufalino and Richard Mayden (2010b) recognized the need for nuclear DNA to lend greater support to previous mtDNA phylogenetic data on the relationships between many species of North American phoxinins, fish of the order Cypriniformes. The results confirmed the three major clades revealed in the original studies. Analysis utilizing three analytical models provided a consensus of results. This study was a very extensive attempt to resolve the relationships of several hundred species of fish. It lent further strong support to the previously identified relationships and additionally gave support to previously hypothesized relationships of many species. It resolved many loosely supported branches as well. Many researchers have elected to reevaluate their previous phylogenetic studies, adding nuclear DNA data in the form of microsatellites and introns to provide a more accurate picture of the relationships between closely related species (Bufalino & Mayden, 2010a; Chen et al., 2008; He et al., 2008a), and these reevaluations often reaffirm the pre-existing phylogenies.
Much more recently, a study detailed the phylogenetic resolution of eight species of freshwater fish endemic to southern Mexico and Central America (Morcillo, Ornelas-García, Alcaraz, Matamoros, & Doadrio, 2016). These eight species are part of the family Profundulidae. Utilizing three mtDNA markers and two introns of the S7 ribosomal protein gene, the authors tested two hypothesized models, 8-species and 12-species. At the time of the study, the family included a single genus, Profundulus, divided into two subgenera, Profundulus and Tlaloc. Each subgenus included four species. Morcillo et al. ran both Maximum Likelihood (ML) and Bayesian Inference (BI) phylogenetic analyses on two concatenated sets of sequence data. The first set was the concatenation of the three mtDNA gene segments only (1897 bp), while the second was the concatenation of the three mtDNA segments plus the two intron segments (998 bp), for a total of 2895 bp of sequence data.
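Concatenation of this kind joins the aligned sequences of each locus end to end for every individual. The following is a minimal sketch of the idea, assuming each locus alignment is held as a dictionary mapping a sample name to its aligned sequence; the function name, sample names, and toy sequences are hypothetical and are not drawn from Morcillo et al.

```python
def concatenate_alignments(partitions):
    """Join per-locus alignments (dicts of sample -> sequence) in order."""
    samples = set(partitions[0])
    for part in partitions:
        if set(part) != samples:
            raise ValueError("every locus must contain the same samples")
    return {s: "".join(part[s] for part in partitions) for s in sorted(samples)}


# Toy loci standing in for the mtDNA and intron partitions.
mtdna = {"fish_1": "ACGT", "fish_2": "ACGA"}
introns = {"fish_1": "GG", "fish_2": "GC"}
supermatrix = concatenate_alignments([mtdna, introns])
print(supermatrix["fish_1"])

# Length check for the partition sizes reported in the text.
print(1897 + 998)
```

The length of the concatenated alignment is the sum of the partition lengths, which is how 1897 bp of mtDNA and 998 bp of intron sequence give 2895 bp in total.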
The phylogeny based solely on the mtDNA concatenation resulted, in both the ML and BI analyses, in a somewhat convoluted division of lineages. Many individuals sampled from the same region were grouped with others from another geographic location. One species of the Tlaloc subgenus, P. (T.) candalarius, was absorbed by another of the same subgenus; this occurred in every phylogenetic reconstruction until the Bayesian Species Delimitation was constructed based on the hypothesized 12-species breakdown, which then separated out T. candalarius.
The phylogeny based on the mtDNA plus nDNA also resulted in the same number of clades but the individuals representing the geographic locations separated out with their nearest proximal relatives. The finer distinction is often the result of the more detailed genetic pool. Mitochondrial DNA alone will only provide a single parentage which often creates relationships among individuals that are more geographically distant. Including the biparental aspect of the recombinant nDNA intron sequences provided a tighter clustering of the individuals of the same geographic habitats.
In all permutations of the phylogenetic data, though, both the mtDNA alone and the combination of the mtDNA and nDNA loci yielded extremely high probabilities of completely reciprocally monophyletic relationships all the way back to the outgroup(s). Given the strong support in all calculations of the data, the authors state that while it is conceivable that there are actually twelve distinct species among two definitely distinct genera, to remain on the conservative side they proposed that the two subgenera should be recognized as two distinct genera, Profundulus and Tlaloc.
Phylogeography as Influenced by California Floristic Province Topography
The extensive biological variation that exists in Southern California makes it a prime location for phylogeographic studies. The varying ecosystems that make up this region, from evergreen mountaintop communities to coastal plains, vast deserts, and inland chaparral and scrublands, provide highly diverse conditions whose effects can be studied in the multitude of organisms that inhabit each of these environments. As evidence of this, many studies have documented the correlation between the distinct topographical formations present in this region and the population distribution of many plant and animal species.
The phylogeography of the California mountain kingsnake has been studied using multiple mtDNA sequences (Rodríguez-Robles, Denardo, & Staub, 1999), and later using multiple genetic loci (E. A. Myers et al., 2013). The 1999 study sequenced the Ndh4 gene as well as three tRNA genes of seven subspecies, including the Baja California, San Bernardino mountain, San Diego mountain, and Sierra mountain kingsnakes. A maximum parsimony tree concurred with two variations of maximum likelihood trees, resulting in two clades, a northern and a southern clade. The northern clade was further divided into two subclades, the coastal and the northeastern subclades. The two ML trees differed only in the determination of one relationship and otherwise agreed.
Edward Myers (E. A. Myers et al., 2013) revisited this original study with two questions. First, would the geographic distinctions deduced from the mtDNA study hold when nuclear DNA was also included? Second, what area of the California Floristic Province was integral in the divergence events of this species?
Using the original 34 DNA extraction samples from the 1999 study, Myers sequenced two anonymous nuclear loci, CL4 and 2CL8 (Burbrink et al., 2011). He then constructed three separate phylogenetic trees, one for the mtDNA, one for the CL4 nuclear locus, and one for 2CL8. The mtDNA sequence contained many more variable nucleotide sites than either of the nuclear loci, but only resolved the two clades, northern and southern, with the barrier between them proposed to be inland seaways in the approximate region of the southern tip of the Sierra Nevada (Rodríguez-Robles et al., 1999). The nuclear loci also supported two major clades, northern and southern, but provide support for the hypothesis that the barrier may be climatic in nature and that it is located further north, putting the separation between the two clades in the region of present-day Monterey Bay. This groups the former coastal subclade with the current southern clade. Both of the nuclear loci trees independently revealed a third prospective clade to the east that was teased out of the southern clade. Myers concludes conservatively that, at minimum, this species should be recognized as two separate species, with the southern clade a candidate for future division.
Another study utilized mtDNA analysis in the phylogeography of a California endemic invertebrate species (Vandergast et al., 2006). The authors of this study sought to distinguish genetic differentiation resulting from historical geological forces from that resulting from the relatively more recent effects of human presence. A single species of Jerusalem cricket, endemic to southern California, is geographically isolated from the many closely related species that range throughout the rest of western North America. This species, Stenopelmatus ‘mahogani’, inhabits the cismontane region of southern California bordered to the north and east by the Transverse and Peninsular mountain ranges, and to the south by the San Diego River. Analysis of sequence data for the cytochrome oxidase I gene resolved the individuals into 65 unique haplotypes. These haplotypes did, in fact, mirror their geographic sampling locations. In the few instances where this was not the case, the individuals were from a sampling location near the one with which they were phylogenetically grouped. One case of a single haplotype present in two sampling locations occurred along the Santa Ana River. It was proposed this could have been the result of a previous flooding event or possibly that the river acted as a dispersal corridor. The populations showed little to no genetic variation within them but very high variation between them, demonstrating the isolation of each population and the lack of migration between them.
Most locally, Ivan Phillipsen and Anthony Metcalf (2009) reported on the phylogeography of a local stream-dwelling frog, Pseudacris cadaverina. They hypothesized that genetic variation would be delineated by the watershed systems, by the mountain ranges, or by the division into coastal and desert habitats. This organism inhabits the shallow streams and creeks within the watershed systems and has very limited dispersal, within 25 meters. In this respect, it is similar to R. osculus in that much of its genetic dispersal occurs due to flooding events or transplantation, and thus populations become isolated from one another. Using the cytb and tRNA-Glu mtDNA genes, the authors tested multiple statistical scenarios based on the various hypothesized landscape features of influence. The resulting phylogenetic tree showed three distinct groups, which directly correlated with the three groups illustrated by the haplotype network diagram. Haplotypes grouped into the Northern, Central, or Southern clades, with some overlap of the Central and Southern groups. Overall, the evidence supports the hypothesis that the mountain ranges act as isolating barriers between the populations, with the Transverse Range break as the primary barrier between the Northern groups, west of the San Gabriel Mountains, and the Central/Southern groups.
Phylogenetics and Population Structure of the Santa Ana Speckled Dace
Initial study of the phylogenetic relationships of R. osculus in the southern California region to populations of R. osculus inhabiting the central coast and eastern Sierra regions began in the Metcalf lab with work on two mitochondrial DNA loci (J. J. VanMeter, 2017; P. M. VanMeter, 2017). Concurrently, a study of the relationships within and among the populations of the Santa Ana Speckled Dace utilizing seven microsatellite loci was undertaken (Nerkowski, 2015).
Jay VanMeter sequenced the mtDNA control region (d-loop) and its related tRNAs for a total of 1143 bp of sequence data in 74 dace samples representing three California regions. Pia VanMeter sequenced the mtDNA cytochrome b gene, which totaled 1155 bp, in 92 dace samples representing three California regions and the Colorado River, AZ region. Stacey Nerkowski genotyped seven microsatellite loci in 146 dace samples representing the three California regions.
Analysis of the d-loop sequence resulted in 14 unique haplotypes in the Southern California region alone (Santa Ana, San Gabriel, and San Jacinto Watersheds). Of the 14 haplotypes, only one was a mix of individuals from both Santa Ana and San Gabriel watershed tributaries. The other 13 were each composed of individuals from a single watershed. Bayesian inference phylogeny illustrated the relationship among the SASD, central coast, and eastern Sierra regions, showing a reciprocally monophyletic lineage between the Santa Ana Speckled Dace and the coast/Sierra dace (J. J. VanMeter, 2017).
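The haplotype summary above, counting unique haplotypes and noting which are shared between watersheds, can be sketched as follows. This is an illustrative assumption about the bookkeeping involved, not the software used in the cited work; the function names, sample identifiers, and toy sequences are hypothetical.

```python
from collections import defaultdict


def collapse_haplotypes(samples):
    """Group (sample_id, watershed, sequence) records by identical sequence."""
    haplotypes = defaultdict(list)
    for sample_id, watershed, sequence in samples:
        haplotypes[sequence].append((sample_id, watershed))
    return dict(haplotypes)


def shared_haplotypes(haplotypes):
    """Return the haplotypes found in more than one watershed."""
    return [
        seq for seq, members in haplotypes.items()
        if len({watershed for _, watershed in members}) > 1
    ]


# Toy data: three distinct sequences, one shared between two watersheds.
toy_samples = [
    ("d1", "Santa Ana", "ACGT"),
    ("d2", "Santa Ana", "ACGT"),
    ("d3", "San Gabriel", "ACGT"),
    ("d4", "San Gabriel", "ACGA"),
    ("d5", "San Jacinto", "TCGA"),
]
toy_haplotypes = collapse_haplotypes(toy_samples)
print(len(toy_haplotypes), shared_haplotypes(toy_haplotypes))
```

In the d-loop data described above, the analogous tally would be 14 haplotypes, with a single haplotype shared between the Santa Ana and San Gabriel watersheds.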
Analysis of the cytb gene, which included dace from the Colorado River, resulted in both Bayesian and Maximum Likelihood phylogenies revealing reciprocally monophyletic relationships between the SASD/Colorado and coast/Sierra dace. In both analyses, Colorado River dace were more closely related to southern California dace. The minimum spanning haplotype network of the three watersheds of the Santa Ana Speckled Dace mirrored that of the d-loop network (P. M. VanMeter, 2017).
In the microsatellite analysis, visualized as a Discriminant Analysis of Principal Components (DAPC) based on the characterization of alleles of the SASD, central coast, and eastern Sierra regions, all individuals clustered according to the region from which they were sampled when the number of populations (K) was set to 3 (Nerkowski, 2015).
Collectively, these studies strongly suggest that the southern California populations of R. osculus are genetically distinct from those of the Central California coast, the Eastern Sierra Nevada valley, and the Colorado River. Recommendation for formal species status must include not only evidence of genetic reciprocal monophyly based on mtDNA sequence data but also significant genetic differentiation based on nuclear DNA sequence data. The mtDNA information has been provided by Jay and Pia VanMeter. The nDNA microsatellite data provided by Stacey Nerkowski corroborate the VanMeters’ conclusions regarding the levels of genetic differentiation between the populations of the three California regions.
The goal of this study has been to identify multiple nuclear DNA loci that would be informative in phylogenetic analyses and, further, to determine whether these nuclear loci support the mtDNA and microsatellite DNA evidence. Given that only a single intron marker, the S7 ribosomal protein gene intron 1, had been sequenced in any R. osculus specimens, I questioned whether I could identify other novel introns for R. osculus. Secondly, would novel introns be useful in the characterization of molecular variation and divergence among populations of a species? If so, would they provide completion to the taxonomic description of the Santa Ana Speckled Dace as warranted by the ESA rejection of 1996?
CHAPTER TWO
MATERIALS AND METHODS
Research Objective
The intent of this study was to characterize the evolutionary relationships of Rhinichthys osculus populations among four regions of California and Arizona, using multiple intron markers. As only one intron of R. osculus has been sequenced and published, the S7 ribosomal protein gene intron 1 (Chow & Hazama, 1998), additional candidate intron sequences were needed to provide a more complete study. Based on the successful amplification of several intron sequences in two other genera of the family Cyprinidae, Hypophthalmichthys and Danio (Li, Riethoven, & Ma, 2010), research began by testing whether any of the primer sequences from that study would amplify the R. osculus tissue samples in the Metcalf lab.
Sample Collection
Rhinichthys osculus specimens obtained from the Santa Ana watershed were collected by the Metcalf Lab (CSUSB) under the auspices of the U.S. Forest Service (USFS) and California Department of Fish and Wildlife (CDFW). All other samples were collected under the auspices of the appropriate authorities and sent to the Metcalf Lab by the CDFW or USFS. Samples include those from the coastal region that lies north of the western Transverse Mountain range segment and west of the Sierra Nevada range, represented by tributaries of the San Luis Obispo and Santa Maria River watersheds; the eastern Sierra Nevada region, north of the central Transverse range segment, represented by two Owens River tributaries; and the southern inland region, south of the Transverse range and east of the Peninsular Mountain range, represented by the Santa Ana, San Gabriel, San Jacinto, and Los Angeles River watershed tributaries. Specimens from the Colorado River through the Grand Canyon were provided by the Arizona Department of Game and Fish, and one individual specimen was obtained from a southern tributary of the Colorado River via the Gila River. Sample sites are overlaid on the speckled dace range map in Appendix A, Figure 7.
R. osculus specimens were stored in 100% ethanol in 50 mL conical vials at -4ºC. A subset of the specimens, representing each tributary of each watershed, was identified for this study: eight each from the Owens Valley, Central Coast, San Gabriel, and Colorado River regions; three each from six tributaries of the Santa Ana River watershed; three samples from the San Jacinto River; and two samples from the Los Angeles River. In total, 55 dace samples were identified for use in this study (Appendix B, Table 1).
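The sample total can be checked with a short tally of the per-region counts given above. The dictionary keys are descriptive labels chosen here for illustration and are not identifiers from Appendix B, Table 1.

```python
# Per-group sample counts as stated in the text.
SAMPLES_PER_GROUP = {
    "Owens Valley": 8,
    "Central Coast": 8,
    "San Gabriel River": 8,
    "Colorado River": 8,
    "Santa Ana River (6 tributaries x 3)": 6 * 3,
    "San Jacinto River": 3,
    "Los Angeles River": 2,
}

total_samples = sum(SAMPLES_PER_GROUP.values())
print(total_samples)
```

The four groups of eight contribute 32 samples, the Santa Ana tributaries 18, and the remaining two rivers 5, which sums to the 55 samples reported.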
Molecular Methods
Approximately 0.25 grams of muscle tissue along the left or right lateral side was aseptically dissected from each designated specimen, taking care not to include any eggs from female specimens. Each tissue sample was placed on a sterilized, pre-weighed watch glass and a precise wet weight was obtained. Each sample was then treated for the removal of ethanol before beginning the DNA extraction procedure. The removal of ethanol was accomplished by immersing the tissue in purified H2O on the watch glass, letting it sit for 5 minutes, then removing the water by wicking with a clean Kimwipe. This process was repeated 2-3 more times. The watch glasses holding tissue samples were then placed in a vacuum desiccator for one hour under a fume hood. At the end of one hour, samples were placed in sterile MC-15 tubes to begin the DNA extraction process. Initial DNA was extracted using a Qiagen® DNeasy Blood and Tissue Kit according to the DNeasy® Blood and Tissue Handbook protocol for Purification of Total DNA from Animal Tissues (Spin-Column Protocol). Final DNA extract was suspended in 200 µL of Qiagen AE buffer. Extracted DNA was visualized on a 0.8% agarose gel run at 100 volts for 55 minutes. Concentration and purity were determined using a NanoDrop® ND1000 spectrophotometer (US patents 6,628,382 and 6,809,826) at absorbance of 340 nm. This DNA was used for all preliminary primer amplification testing. DNA was stored in MC-15 tubes at -20ºC. Aliquots were made of each DNA extraction stock for purposes of PCR. Aliquots were stored at -4ºC.
All ten primer sets that Li et al. (2010) successfully amplified in Cypriniformes species were obtained. Initial PCR was performed using the protocol described in the reference study (Li et al., 2010) but did not amplify any Rhinichthys osculus DNA, so an extended period of optimization was undertaken. On an Eppendorf Mastercycler Gradient, PCR was run on a small subset of R. osculus samples representing the Santa Ana watershed, with variations in reagent concentrations and temperature gradients, to determine the optimal PCR parameters for each of the 10 primer pairs. Once successful amplification was obtained on this subset of samples, the primer list was narrowed to the six sets that provided the cleanest amplification. These initial PCR products were purified with ExoSAP-IT™ according to the ExoSAP-IT™ protocol. Purified PCR product was verified a second time by 1% agarose gel electrophoresis. Purified PCR product was then prepared for Sanger sequencing according to submission requirements for premixed template by Retrogen Inc.
Before continuing, it was necessary to verify that the PCR amplicons and resulting sequences were, in fact, the same markers as those amplified by Li et al. (2010). Several parameters were analyzed, including band length via gel electrophoresis and sequence identity via nucleotide BLAST™.
Once amplicons were verified, the subset of R. osculus samples was expanded to represent most of the streams of the Santa Ana, San Gabriel, Santa
Maria, San Luis Obispo, and Owens Valley watersheds for which samples were available and viable. Upon further PCR optimization, three primer sets were most frequently successful at amplifying the expanded subset of samples and were therefore selected for use in this study. Upon further consideration, it was decided it would be remiss not to include the one published sequenced intron of various Rhinichthys species, including several R. osculus subspecies, which is the first intron of the S7 ribosomal protein gene, as referenced earlier in this section of the thesis. The purpose was to allow for continued analysis in later studies and to expand comparison with the published S7 intron sequences that many other studies have utilized to infer phylogenetic relationships of Rhinichthys species within and among other geographic areas (Bufalino & Mayden, 2010a; He et al., 2008b;
Hoekzema & Sidlauskas, 2014; Kim & Conway, 2014; Mussmann, 2018; Taylor,
McPhail, & Ruskey, 2015). One primer set of the three was substituted with the
S7 intron primers (Chow & Hazama, 1998). Primer sequences for all tested pairs are listed in Appendix B, Table 2 with final selected primers highlighted. PCR protocols for each of the final three are summarized in Appendix B, Table 3.
Once PCR protocols were optimized and primer selection was complete, the list of R. osculus specimens to be included in this study was expanded to include at least one representative of each tributary of all watersheds represented in the sample collection provided to the Metcalf lab. At this time, additional specimens representing the Colorado River through the Grand Canyon were obtained from Arizona Game and Fish, and an equivalent subset of these specimens was added to the list. In all, 55 specimens were identified to be included in this study. Outgroup choices were limited because of the novel introns being used; therefore, the two species of Hypophthalmichthys from the reference study (Li et al., 2010), H. molitrix and H. nobilis, being the most closely related to R. osculus, would serve as outgroups. It was determined that these two species had also been characterized for the S7 ribosomal protein intron, H. molitrix by He et al. (2008b) and, very recently, H. nobilis by Stepien, Snyder, and Elz (2019), making them the optimal choices. A summary of the Rhinichthys osculus outgroup specimens is listed in Appendix B, Table 4.
After expansion of the sample list and designation of outgroups, DNA was extracted from the selected R. osculus samples in the Metcalf lab using two phenol-chloroform-isoamyl alcohol (25:24:1) extractions and one chloroform-isoamyl alcohol (24:1) extraction.
The isolated DNA pellets were resuspended in 100 µL of TE (tris-EDTA) buffer, with stocks stored at -20 ºC and aliquots stored at -4ºC. DNA presence was verified by 1% gel electrophoresis, and quantity and purity were verified by
NanoDrop® One spectrophotometer.
PCR was performed on each DNA sample for all three introns under the optimized conditions. PCR results were verified by 1% agarose gel electrophoresis at 100 volts for 55 min. Earlier sets of successfully amplified DNA were purified using ExoSAP-IT™. Later PCR product was purified using the
Thermo Scientific™ GeneJet PCR Purification Kit.
All further Sanger sequencing was done by MacrogenUSA in the
Rockville, Maryland facility. Sequencing samples were prepared according to the
Macrogen sample preparation guidelines for premixed purified PCR product.
Sequence contigs were assembled, aligned, and base ambiguities resolved, using Geneious Prime 2019.1.3 (https://www.geneious.com). After multiple attempts to PCR all samples, a subset of individuals, though providing successful amplification, did not result in quality sequence data. To generate higher quality sequence, internal primers were designed for all three introns.
Internal primers were designed by aligning all successful sequence data per intron within Geneious Prime and primers were identified visually based on widely accepted optimal primer criteria. Annealing temperatures were calculated using the Thermo Fisher Scientific™ webtool, Tm Calculator. Internal primers are listed in Appendix B, Table 5. PCR on remaining problematic samples with internal primers was then completed.
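The kind of visual screening described above can be approximated in a few lines. The following is a minimal sketch only: the length and GC thresholds are assumed rule-of-thumb values rather than the criteria actually applied in this study, the candidate sequence is hypothetical, and the Wallace rule is substituted as a simple stand-in for the Tm Calculator webtool, whose own algorithm is not reproduced here.

```python
def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def primer_report(primer, min_len=18, max_len=25, min_gc=0.40, max_gc=0.60):
    """Check a candidate primer against assumed rule-of-thumb criteria."""
    p = primer.upper()
    gc_fraction = (p.count("G") + p.count("C")) / len(p)
    return {
        "length_ok": min_len <= len(p) <= max_len,
        "gc_ok": min_gc <= gc_fraction <= max_gc,
        "gc_clamp": p[-1] in "GC",  # G or C at the 3' end
        "wallace_tm_c": wallace_tm(p),
    }


# Hypothetical 20-mer, not one of the primers in Appendix B, Table 5.
candidate = "ACGTACGTACGTACGTACGC"
report = primer_report(candidate)
print(report)
```

The Wallace rule is generally considered reliable only for short oligonucleotides, so any value it returns would be a starting point for gradient PCR rather than a final annealing temperature.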
Sequence Analysis
In the phylogenetic analysis of a population, standard sequence statistics such as nucleotide frequency, GC content, transition/transversion ratios, polymorphic nucleotide site numbers, and haplotype identification are utilized to illustrate levels of genetic variation among and within the populations. These statistics were obtained using Geneious Prime 2019.1.3
(https://www.geneious.com), GenAlEx 6.5 (Peakall & Smouse, 2006, 2012), and
MEGA X (Kumar, Stecher, Li, Knyaz, & Tamura, 2018). In addition to running all analyses on each primer alignment individually, a concatenation of the three sequences was created for each individual, and these sequences were also aligned and analyzed under the same conditions.
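For readers who wish to see what these summary statistics measure, the sketch below computes them directly from aligned sequences. It is an illustration under stated assumptions, not the procedure used by Geneious Prime, GenAlEx, or MEGA X: the function names are hypothetical, gaps and ambiguity codes are simply ignored, and the toy sequences are invented.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def base_frequencies(seq):
    """Fraction of A, C, G and T among the unambiguous bases of one sequence."""
    counts = {b: seq.upper().count(b) for b in "ACGT"}
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def gc_content(seq):
    """GC content as a fraction of the unambiguous bases."""
    freqs = base_frequencies(seq)
    return freqs["G"] + freqs["C"]


def polymorphic_sites(alignment):
    """0-based alignment columns where more than one unambiguous base occurs."""
    sites = []
    for col in range(len(alignment[0])):
        bases = {s[col].upper() for s in alignment} & set("ACGT")
        if len(bases) > 1:
            sites.append(col)
    return sites


def transitions_transversions(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences."""
    ti = tv = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # skip matches, gaps and ambiguity codes
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ti += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    return ti, tv


# Invented toy alignment of three equal-length sequences.
toy_alignment = ["ACGTACGT", "ACGTACGA", "GCGTACGT"]
print(polymorphic_sites(toy_alignment))
print(transitions_transversions(toy_alignment[1], toy_alignment[2]))
print(gc_content(toy_alignment[0]))
```

Counting observed differences in this way does not correct for multiple substitutions at a site; model-based estimates such as those reported by MEGA X can therefore differ from raw counts.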
Population Genetics
Population genetics parameters describe the genetic variation within and among a defined population set. In this study, the central focus is the determination of the variation among the four regional populations of R. osculus.
Previous studies in the Metcalf lab have elucidated ancestral population genetics
within the regional populations and even within the tributary populations using microsatellite genotyping (Nerkowski, 2015). While regional analysis has also been completed utilizing the sequence analyses of two mitochondrial DNA markers, Control Region (D-loop) and Cytochrome B (J. J. VanMeter, 2017; P. M.
VanMeter, 2017), this study’s purpose is the further characterization of the genetic status of the Santa Ana Speckled Dace (SASD) in relation to the nearest regional neighbor populations to provide a complete description of the taxonomy which may aid in the designation of the SASD as a unique taxonomic entity. To this end, it is necessary to include nuclear DNA sequence analysis.
The frequency-based population genetic statistic, Wright’s F-statistic, also known as the fixation index, is used in describing population genetic structure. F-statistics provide tools for comparing the allele frequencies of individuals and/or subpopulations with those of the total population. Because of its slower mutation rate, DNA sequence does not provide the same scale of genetic variation as microsatellite loci, so rather than analyzing allele frequencies, an analog of the F-statistic, the Phi-statistic, is used to analyze haplotype diversity (Excoffier,
Smouse, & Quattro, 1992). Haplotypes are determined by the patterns of sequence variations (SNPs) within the nuclear DNA. Using GenAlEx 6.5 (Peakall
& Smouse, 2006, 2012), polymorphic nucleotide positions were identified. Based on the patterns of these polymorphisms, haplotypes within each of the intron alignment sets were distinguished by both watershed and region. The pairwise comparison of individuals results in a triangular matrix of the calculated genetic distance values between each pair of individuals, and consequently between each pair of regions. PhiPT (ΦPT) quantifies the variation among the subpopulations (P) relative to the total population (T). For our purposes, the analyses were run using the regions, and therefore (P) will designate the regions and (T) will designate the entire dataset of all four regions.
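The two steps described above, collapsing identical sequences into haplotypes and building a pairwise distance matrix, can be sketched as follows. This is an illustrative sketch only: the sample names and sequences are invented, the function names are hypothetical, and every mismatched column (including gaps) is counted as one difference, which may not match how GenAlEx treats gaps or ambiguous bases.

```python
from collections import defaultdict


def collapse_haplotypes(sequences):
    """Group sample names by identical aligned sequence."""
    groups = defaultdict(list)
    for name, seq in sequences.items():
        groups[seq.upper()].append(name)
    return dict(groups)


def count_differences(seq_a, seq_b):
    """Number of alignment columns at which two sequences differ."""
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)


def distance_matrix(sequences):
    """Square matrix of pairwise difference counts, in input order."""
    names = list(sequences)
    matrix = [
        [count_differences(sequences[a], sequences[b]) for b in names]
        for a in names
    ]
    return names, matrix


# Invented toy data: two Santa Ana, one Colorado, one Central Coast sample.
toy = {"SA1": "ACGTAC", "SA2": "ACGTAC", "CO1": "ACGTAT", "CC1": "GCTTAC"}

haplotypes = collapse_haplotypes(toy)
singletons = [h for h, members in haplotypes.items() if len(members) == 1]
names, dist = distance_matrix(toy)

print(len(haplotypes), "haplotypes,", len(singletons), "represented by one individual")
for name, row in zip(names, dist):
    print(name, row)
```

Only the lower (or upper) triangle of the matrix carries information, since the matrix is symmetric with zeros on the diagonal.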
Distance-based methods of analysis generate data by comparing the genetic divergence among the groups of the population. Analysis of Molecular
Variance (AMOVA) (Excoffier et al., 1992) analyzes the divergence of populations, or genetic distance, through the pairwise comparisons of the haplotypes creating matrices of these comparisons and data on the level of molecular variance among and within the groups. AMOVA also provides a pairwise analysis of the levels of migration (Nm) among the identified groups.
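To make the partitioning concrete, the following sketch computes ΦPT for a single level of grouping from a matrix of squared pairwise distances, following the general sums-of-squares approach of Excoffier et al. (1992). It is a simplified illustration under several assumptions: the function name `phi_pt` is hypothetical, the input values are treated as squared distances, only one hierarchical level (regions) is modeled, and the permutation test used to obtain p-values is omitted.

```python
def phi_pt(sq_dist, labels):
    """PhiPT from a square matrix of squared distances and one group label per row."""
    n = len(labels)
    groups = sorted(set(labels))

    # Total sum of squares: sum over all pairs of squared distances, divided by N.
    ss_total = sum(sq_dist[i][j] for i in range(n) for j in range(i + 1, n)) / n

    # Within-group sum of squares: the same quantity computed inside each group.
    ss_within = 0.0
    sizes = []
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        sizes.append(len(idx))
        pair_sum = sum(sq_dist[i][j] for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pair_sum / len(idx)

    ss_among = ss_total - ss_within
    df_among = len(groups) - 1
    df_within = n - len(groups)
    ms_among = ss_among / df_among
    ms_within = ss_within / df_within

    # n0 adjusts for unequal group sizes when converting mean squares to variances.
    n0 = (n - sum(s * s for s in sizes) / n) / df_among
    var_among = (ms_among - ms_within) / n0
    var_within = ms_within
    return var_among / (var_among + var_within)


# Invented toy data: two regions with identical individuals inside each region.
labels = ["A", "A", "B", "B"]
fully_structured = [
    [0, 0, 4, 4],
    [0, 0, 4, 4],
    [4, 4, 0, 0],
    [4, 4, 0, 0],
]
print(phi_pt(fully_structured, labels))
```

When all variation lies among groups, as in the toy matrix, ΦPT reaches its maximum of 1; when individuals are equally distant regardless of group, the among-group variance component, and hence ΦPT, falls to zero.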
Principal Coordinates Analysis (PCoA) (Torgerson, 1958) transforms a matrix of pairwise distances into a small number of orthogonal axes that summarize the major patterns of variation. PCoA, unlike PCA
(principal components analysis), analyzes the genetic dissimilarities among the sequences, and therefore focuses on the polymorphic nucleotide sites of each alignment. These statistical analyses were performed in GenAlEx 6.5 from the genetic distance matrices of each intron alignment.
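The core of a PCoA is classical (Torgerson) scaling: the squared distance matrix is double-centered and then decomposed into eigenvectors. The sketch below shows that calculation with NumPy. It is offered as an assumption-laden illustration; the function name `pcoa` is hypothetical, the toy distances are invented, and any standardization options that GenAlEx applies are not reproduced.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Classical scaling of a symmetric distance matrix.

    Returns the coordinates on the first n_axes axes and the share of the
    positive eigenvalue total that each of those axes accounts for.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering  # double-centered matrix
    eigvals, eigvecs = np.linalg.eigh(b)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_axes]   # largest eigenvalues first
    vals = np.clip(eigvals[order], 0.0, None)    # guard against tiny negatives
    coords = eigvecs[:, order] * np.sqrt(vals)
    explained = vals / eigvals[eigvals > 0].sum()
    return coords, explained


# Invented toy distances for three samples lying on a line at 0, 1 and 3.
toy_dist = [
    [0, 1, 3],
    [1, 0, 2],
    [3, 2, 0],
]
coords, explained = pcoa(toy_dist)
print(np.round(coords, 3))
print(np.round(explained, 3))
```

Because the toy samples lie on a single line, the first axis recovers their spacing exactly and accounts for essentially all of the variation; the sign of each axis is arbitrary.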
Mantel Tests (Mantel, 1967) compare matrices to determine whether a correlation exists and, if so, whether it is positive or negative and how strong it is. A pairwise genetic distance matrix can be analyzed against a geographic distance matrix to determine whether, and to what extent, the genetic distances that exist between the regions can be explained by the geographic distances between them. Likewise, pairwise genetic variance (ΦPT) matrices can be analyzed against geographic distances to determine whether the level of variation that exists among the regions is in any way correlated to the geographic distances between them. These two analyses are not the same. The first quantifies a “corrected distance” based on the number of base substitutions in a DNA locus. The second quantifies the amount of overall variance. To best fit the purpose of this study, the ΦPT vs geographic distance was the optimal analysis to use. Average geographic coordinates in decimal form, for each of the four regions, were entered into GenAlEx. Pairwise geographic distance matrices were calculated for each pair of regions. The genetic variance matrices and geographic distance matrices for each intron were analyzed using a Mantel Test in GenAlEx. The resulting R value defines the level of correlation on a scale of -1 to 1, where R=-1 indicates a completely negative correlation; R=0 indicates no correlation at all; and R=1 indicates a completely positive correlation.
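The logic of a Mantel test can be sketched in plain Python: correlate the off-diagonal elements of the two matrices, then repeatedly shuffle the row and column order of one matrix to see how often a correlation at least as large arises by chance. The sketch below is a simplified, one-tailed version under stated assumptions; the function names are hypothetical, the matrices are invented, and GenAlEx's implementation may differ in detail.

```python
import random
from math import sqrt


def _upper(matrix):
    """Flatten the upper triangle (excluding the diagonal) of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mantel(mat_a, mat_b, permutations=999, seed=0):
    """Return (R, one-tailed p) for the correlation between two distance matrices."""
    rng = random.Random(seed)
    n = len(mat_a)
    flat_a = _upper(mat_a)
    observed = _pearson(flat_a, _upper(mat_b))
    hits = 1  # the observed arrangement is counted as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns of mat_b together to keep it a valid matrix.
        shuffled = [[mat_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if _pearson(flat_a, _upper(shuffled)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Invented toy matrices for four regions; "geographic" is simply twice "genetic".
genetic = [[0, 1, 2, 3], [1, 0, 4, 5], [2, 4, 0, 6], [3, 5, 6, 0]]
geographic = [[2 * v for v in row] for row in genetic]
r_value, p_value = mantel(genetic, geographic)
print(round(r_value, 3), p_value)
```

With only four regions there are 4! = 24 possible orderings, so a permutation p-value from matrices of this size is necessarily coarse and is best read alongside the R value itself.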
Phylogeography
The probability of phylogenetic relationships can be inferred and displayed by way of phylogenetic trees, which are generated by statistical algorithms applied to the genetic data. Before computing a phylogenetic tree, a best-fit model of evolution must be determined for each DNA sequence alignment.
The various models of evolution utilize differing criteria on nucleotide substitution rates to infer the most statistically probable evolutionary relationships between the taxa being analyzed. The best-fit models of substitution, i.e. evolution, for each of the three intron alignments were determined using MEGA X (Kumar et al., 2018).
Methods of statistical inference are also varied. Bayesian inference
(Bayes, 1763) is the preferred method of statistical probability inference.
Bayesian analysis combines initial assumptions about the conditions (the prior probability) with the evidence provided by the data under each possible condition (the likelihood) to give the probability of each outcome after the evidence has been weighed (the posterior probability). Bayesian analysis was used to determine phylogenetic trees using the best-fit models of evolution determined by MEGA X.
A MrBayes 3.2.6 (Huelsenbeck & Ronquist, 2001) plugin was installed into Geneious Prime, allowing trees to be run through a graphical interface using MrBayes. All trees included both Hypophthalmichthys outgroups. Two Rhinichthys species, R. cataractae and R. atratulus, were included in the phylogenetic analysis of the S7 ribosomal protein gene intron in order to examine the outcome when samples representing both a common family and a common genus are taken into account.
CHAPTER THREE
RESULTS
Molecular Methods
Using novel molecular markers for a phylogenetic study results in increased requirements for verification of success. In light of this, presentation of results may be somewhat more in depth than would typically be documented in these types of studies.
Preliminary Testing
Many checks were performed at each step of early testing in order to verify the amplicons produced using the published primer sets (Li et al., 2010), which had successfully amplified fish from the same family as Rhinichthys osculus, the Cyprinidae. The initial determination to use three intron primer pairs (4174E20, 36298E1, and 19231E4) from the Li study for this research made it necessary to verify that the amplicons produced in the Metcalf lab were in fact introns of the same identity. Later, the 19231E4 primer set was replaced with the S7RPEX1 primer set for amplification of the S7 ribosomal protein gene intron 1. Consequently, amplification data for the removed intron will not be reported in this thesis.
Once PCR was successful, band lengths of the R. osculus amplicons were determined to be similar to those of H. molitrix and H. nobilis. Band length for the ccr4-not transcription complex subunit 1, intron 20 (cnot1) was approximately 750 bp for R. osculus, while H. molitrix and H. nobilis were 779 bp and 833 bp, respectively. Band length for the hypothetical protein gene intron 1 (hpg) for R. osculus was approximately 350 bp, while the H. molitrix band was 342 bp and H. nobilis was 345 bp (Appendix A, Figures 8 and 9). Initial sequence data were submitted to a GenBank BLAST™ search; for the cnot1 intron sequence, the highest identity results were H. nobilis, with 90.28% identity to R. osculus, and H. molitrix, with 88.2%. For the hpg intron, again the highest identity results were H. nobilis (86.14%) and H. molitrix (85.95%) (Appendix B, Table 6). Lastly, a single R. osculus sequence was aligned with the GenBank sequences for H. molitrix and H. nobilis for both introns, revealing high levels of consensus throughout the strand lengths, shown in Appendix C, Datasets 1 and 2. Collectively this verified that amplification of R. osculus DNA using these primers had resulted in the correct intron sequences.
Sequence Analysis
Sequence data was received electronically from Retrogen, Inc. and
MacrogenUSA. AB1 files were input into Geneious Prime and the forward and reverse contigs were assembled. Assemblies were then aligned. Alignments allowed ambiguity resolutions where base-calling was confident. In those cases where strong support for a base-call was not present, the ambiguous base was left in place.
Basic sequence traits are summarized in Appendix B, Table 7. Out of the 55 R. osculus specimens used to amplify the three introns, two specimens were removed from the data set, Mill Creek 4 (ML4) and West Fork San Gabriel River 9 (G9). The ML4 DNA extract was found to be of too poor quality to provide usable sequence. The G9 sequence varied highly from all other sequences and was excluded because this individual may not be a dace specimen. The most successful sequencing results were for hpg intron 1. All 53 viable samples provided quality sequence that could be aligned. Final trimmed alignment length was 292 bp out of the original 350 bp reads. Pairwise identity among the sequences was 98.0% with all four bases approximately equal in frequency (21.0%-27.7%). The cnot1 intron 20 final alignment length was 657 bp out of the original 750 bp. Only 45 sequences were of sufficient quality to include in the alignment. Pairwise identity was 97.4% and base frequencies ranged from 19.1%-30.0%. The s7rp intron 1, while being the longest amplified intron at nearly 1100 bp, resulted in only 507 bp of aligned sequence from 40 specimens after trimming to the shortest quality read. Pairwise identity was 97.6% and base frequencies were 13.6% (C) and 20.3% (G), with A’s and T’s at equal frequency of 33.0%. In order to have as much sequence length as possible for the concatenated alignments, sample size was reduced to the longest 34 specimens, with at least two individuals representing each region. To include more individuals would have required that the sequence alignment be trimmed to the shortest sequence, thus cutting out some informative sites. In the end, the concatenated alignments were 1493 bp in length with pairwise identity at 97.5%. Base frequencies were 17.8% C, 21.5% G, and 30.4% for both A’s and T’s. GC content for all sequences was between 33.9% and 45.4%. Transition-transversion data (Appendix B, Table 8) were included with the spreadsheet results of the best-fit model of evolution analysis by MEGA. Only the data for the concatenated sequences are reported, as they include all sequence variation for the three introns.
Population Genetics
Alignments were exported from Geneious Prime in FASTA file format and input into GenAlEx. FASTA input files are listed in Appendix D. FASTA alignments were analyzed and data were generated identifying all polymorphic nucleotide sites among the sequences of an alignment. Polymorphic nucleotide site lists are shown in Appendix C, Datasets 3, 4, and 5. The patterns of SNPs among the individuals determine the number and traits of the haplotypes. Basic haplotype counts are summarized in Appendix B, Table 9. The CNOT1 Intron 20 alignment resulted in 12 haplotypes, 8 of them unique, meaning only one individual represented that haplotype. The HPG Intron 1 alignment produced 15 haplotypes with 12 unique haplotypes. The S7RP Intron 1 alignment produced 18 haplotypes with 13 unique haplotypes. Haplotype specimen assignments for each intron alignment are shown in Appendix B, Tables 10, 11, and 12. In each instance, the Eastern Sierra dace separated out into individual haplotypes. Based on the haplotype analyses of HPG Intron 1 (Table 11) and S7RP Intron 1 (Table 12), the 7-8 Owens Valley dace were placed into 7-8 haplotypes. In the analyses of both the CNOT1 Intron 20 (Table 10) and HPG Intron 1, the eight Colorado River dace specimens separated into four haplotypes. The eight Central Coast dace separated into two (S7RP), three (CNOT1), or four (HPG) haplotypes depending on the intron analyzed. The southern California Santa Ana speckled dace specimens largely clustered into only a few haplotypes. For CNOT1, two individuals each formed their own unique haplotype, with all other individuals (n=26) clustered into one haplotype. The HPG intron haplotype analysis clustered all 29 SASD individuals into one haplotype, together with five of the eight Colorado River dace samples. The largest variation occurred in the analysis of the S7RP Intron, with 23 SASD samples being separated into seven haplotypes, though the majority (n=16) clustered into two haplotypes.
Using the processed sequence data, an AMOVA was run on each set of sequences representing the three introns individually as well as on the concatenation of the three. AMOVA results are summarized in Appendix B, Table
13. All analyses of molecular variance were run with 999 permutations, the DNA was designated as haploid (or binary haploid) to account for it being single copy nuclear DNA (Blyton & Flanagan, 2012), and results were generated from a regional perspective rather than for individual populations. The data table shows results as AR (among regions) or WR (within regions), though there are a few instances where population-level data are included. Analyses of the individual pairwise genetic distances are consolidated into a regional ΦPT table. PhiPT values typically indicate the level of separation of the individual populations relative to the total (PT). In these analyses, ΦPT evaluates the variation among the regions (n=4) relative to the total dataset. All intron AMOVAs estimated the among-region level of molecular variation (ΦPT) at 0.906-0.966. These values are converted to percentages and are illustrated by pie charts of molecular variance for each of the three intron analyses, as well as for the analysis of the concatenated sequences. These charts are shown in Appendix A, Figures 10-13. The level of separation is inversely related to the level of migration (Nm) occurring between the populations. Consequently, the Nm (haploid) based on the S7RP intron, at 0.052, is higher than in the other three analyses, while the migration value for HPG is
0.031, and CNOT1 and the concatenated analyses both indicate an Nm=0.017.
The P-values for each comparison lend additional support, p ≤ 0.001.
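The inverse relationship between ΦPT and Nm can be made explicit. The sketch below assumes the haploid relation Nm = [(1/ΦPT) - 1]/2, which is my understanding of how GenAlEx derives Nm for haploid data; the function name is hypothetical, and the formula should be checked against the GenAlEx documentation before being relied upon.

```python
def nm_haploid(phi_pt):
    """Estimated migrants per generation from PhiPT, assuming haploid data.

    Assumed relation: Nm = ((1 / PhiPT) - 1) / 2.
    """
    if not 0 < phi_pt <= 1:
        raise ValueError("PhiPT must be greater than 0 and at most 1")
    return (1.0 / phi_pt - 1.0) / 2.0


# The lower and upper ends of the PhiPT range reported above.
for value in (0.906, 0.966):
    print(value, round(nm_haploid(value), 3))
```

Under this assumed relation, a ΦPT of 0.906 corresponds to an Nm of about 0.052, and values near the top of the reported range correspond to an Nm of roughly 0.017-0.018, in line with the migration estimates given above.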
Breakdowns of regional pairwise ΦPT and migration data are included in
Appendix B, Tables 14-17. The upper portion of each table displays the pairwise
ΦPT and p-values. The lower portion shows the migration values. Further analyses of these datasets through Principal Coordinates Analysis (PCoA), for each set of intron data as well as the concatenated data, are shown in Appendix A, Figures
14-17. For the CNOT1, S7RP, and concatenated sequence representations, the individuals are clustered together based on their genetic relatedness, which also coincides with their geographic identification. In each case, the Colorado dace clustered close to, if not directly on top of, the SASD. In order to parse out a more accurate relationship between them, an additional PCoA was performed using the concatenated sequence data of only the HPG and CNOT1 introns.
This enabled me to include all eight Colorado dace represented in the study as all eight sequenced successfully for these two introns whereas only two of the eight successfully sequenced on the S7RP intron. Appendix A, Figure 22 shows an analysis of the SASD, Colorado River, and Central Coast dace populations providing a more precise distinction between the regions.
In GenAlEx, pairwise geographic distance matrices were calculated for each data set upon input of geographic coordinates correlated to the individuals of each region. Geographic coordinates were triangulated to create a central point representing the region. Average geographic coordinates are listed in
Appendix B, Table 18.
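The calculation of a central point and of the distances between central points can be sketched as follows. This is an illustration under assumptions: the coordinates are invented rather than taken from Appendix B, Table 18, the central point is taken as a simple mean of latitudes and longitudes, and the great-circle (haversine) formula with an assumed mean Earth radius is used, which may differ from GenAlEx's own distance calculation.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius
KM_PER_MILE = 1.609344


def centroid(points):
    """Simple mean of (latitude, longitude) pairs in decimal degrees."""
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return lat, lon


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


# Invented collection sites for two hypothetical regions.
region_one = [(34.0, -117.0), (34.2, -117.4), (34.4, -117.2)]
region_two = [(36.0, -112.0), (36.2, -112.4)]

center_one = centroid(region_one)
center_two = centroid(region_two)
distance_km = haversine_km(center_one, center_two)
print(round(distance_km, 1), "km,", round(distance_km / KM_PER_MILE, 1), "miles")
```

Averaging coordinates is adequate for sites that are close together relative to the Earth's curvature; for widely scattered sites a spherical mean would be the more careful choice.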
Mantel tests analyzing the correlation of the level of genetic variation
(ΦPT), derived in the AMOVA, to average geographic distances of each region, are shown in Appendix A, Figures 18-21. In each analysis, for the three introns independently as well as for the concatenation of the three, the data show little to no correlation between geographic distance and the level of genetic variation existing between the regions. In the Mantel tests for the HPG, S7RP, and tri-concatenation, the slopes are nearly horizontal to slightly negative with R-values of 0.055, -0.183, and -0.382, respectively. In the case of the CNOT1 analysis, the slope is evidently negative with an R-value of -0.4605.
Phylogeography
In every formulation, the Bayesian analyses inferred reciprocally monophyletic phylogenetic relationships between the SASD/Colorado clades and
the Central Coast/Eastern Sierra clades with 100% support for that branch node in the lineage. In every variation of the analyses, the haplotypes representing the individuals collected from a geographic location were grouped into the genetic clade corresponding to that location.
Analyses were run using individual sequence data as well as haplotype sequence data, and all indicate that the SASD and Colorado River dace are more closely related to each other than to either the Central Coast or Eastern Sierra dace. The consensus SASD/Colorado branch hierarchy among all phylogenetic trees (Appendix A, Figures 23-26) is consistent with the speckled dace having colonized the Colorado River region first, with a subsequent divergence event leading to the speckled dace that colonized southern California, as proposed by Smith and Dowling (2008).
CHAPTER FOUR
DISCUSSION
Molecular Markers
My foray into identifying novel introns within the DNA of Rhinichthys osculus for the purposes of phylogenetic analyses was successful. This success is evidenced not only by the PCR amplification of DNA segments using multiple primer sets designed for fish within the same family, Cyprinidae, but also by the comparable length of the PCR bands to those of Li et al. (2010) and by the GenBank BLAST of, and subsequent alignment with, the R. osculus sequences, which showed high identity values to the sequences uploaded to GenBank by Li et al.
Most important are the resulting data trends, which provided concrete support for previous mitochondrial and microsatellite analyses, both of which are highly regarded as trusted sources of molecular data in phylogenetic studies. This is not to state that any intron sequence would be useful in these types of studies, but the same can be said of some mtDNA sequences as well as microsatellite loci.
It is important for intron markers to be located within genes that are conserved among and/or within species, but the introns themselves must not be so conserved as to preclude mutation events. The intron markers utilized in this study showed sufficient sequence consensus but also sufficient nucleotide site variability to
allow for the identification of species as well as, to a degree, geographic populations. The polymorphic nucleotide site tables reveal several regional trends, such as the 8-base gap that exists in every included sample of the Colorado River and SASD sequences of the HPG intron (Appendix C,
Dataset 4). Regional patterns of base substitutions appear in multiple sites, for example sites 449-454 of the CNOT1 intron alignment (Appendix C, Dataset
3a/b). These represent the types of mutations that take place over millions of years between isolated populations of a species.
Population Genetics
The aforementioned population structure identified by the intron data is solidly supported by the various forms of population genetic analyses performed in this study. When trying to answer the question of whether there is evidence of geographic isolation between the regional populations, we look for either the presence or absence of genetic patterns unique to one or each of the regions, in the form of indels and/or substitutions, that tell us whether the individuals of each regional population are showing any indication of migration between populations.
High migration rates (Nm) would signify there is still connectivity between the populations, and we should expect less sequence variation. Populations with smaller ranges, less ability to move between habitat locations, and/or highly fragmented habitats, would be expected to have low levels of gene flow, i.e. migration (Hastings & Harrison, 1994).
The analyses of molecular variance (AMOVA) completed for each of the individual intron sequence alignments, as well as the concatenated sequence alignment, strongly indicate there is no current migration and there has not been for a substantial amount of time. The overall regional PhiPT values for every intron data analysis exceed 0.90, and the inversely related Nm values all fall between 0.01 and 0.06. As one would expect of populations experiencing long periods of separation and genetic isolation, genetic variance is extremely high while migration is correspondingly low. Pairwise comparisons of these values vary slightly from the overall values but maintain the same trend of high genetic variance and low migration. Equally supportive are the pie charts (Appendix A, Figures 10-13) illustrating that the vast majority of the genetic variance is accounted for by the differences among regions rather than among the tributaries within the regions or the individuals within the tributaries. This also makes sense when considering the separation of the populations over time.
Principal Coordinates Analysis (PCoA) lends further support to the conclusion that these populations are distinct genetically, which correlates to their geographic distinction. In this case, geography is simply a mark of location rather than distance.
The concept of Isolation by Distance (IBD), whereby the level of genetic variance is explained by the measured distance between the populations, is often the explanation for genetic differentiation (Wright, 1938). This is not the case here. The Mantel Tests (Appendix A, Figures 18-21) illustrate that there is little to no correlation between the geographic distances separating the regions and the level of genetic variance. This makes sense in light of the evolutionary trajectory hypothesized by Smith and Dowling. If we consider that the Santa Ana Speckled Dace came to occupy the Los Angeles basin via the Colorado River in a divergence event estimated at approximately 1.9 Ma, this would mean that dace of the Colorado River would be genetically more closely related to the SASD, yet distance-wise they are much farther apart than the SASD are from the Central Coast dace. The geographic distance table (Appendix B, Table 18) shows the geographic distance between the SASD and Colorado River regions to be approximately 520 miles, while the SASD region is separated from the Central Coast region by approximately 296 miles. Despite this, genetic variance between SASD and Colorado dace is lower than that between the SASD and Central Coast dace. The geographic distance appears to have no influence on the level of variance. Rather, historical dispersal and biogeographic variables played a much larger role, as described below.
Phylogeography
The Bayesian analyses further support the proposition that genetic variance is explained not by proximity but by the hypothesized path of historical migration and occupation of the areas (Smith & Dowling, 2008); they clearly display evidence of the relationship between the Colorado River dace and the Santa Ana Speckled Dace. As previously stated, each form of the analyses resulted in a reciprocally monophyletic branching of the SASD from the Central Coast/Eastern Sierra clade. Additionally, in every run the Colorado dace grouped with the SASD.
Hydrographic History of Connectivity
Many hypotheses have been put forth over the last century about the path taken by R. osculus to their current regional habitats. These hypotheses go hand-in-hand with hypotheses regarding historic hydrographic connectivity. The most likely explanation for dace taking up residence in southern California would seem to be that Rhinichthys species, noted to have been present in the Snake River during the Pliocene (Smith & Dowling, 2008), were able to expand their range via the Snake River connection to the Lahontan and Columbia River basins (Hubbs & Miller, 1948), then to the Northern California Owens Valley, before diverging west to the Central Coast area and southward to the Los Angeles basin. However, my genetic data do not support this trajectory. If this were the case, genetic variance among these three regions would be much lower, indicating greater relatedness. Additionally, divergence estimates based on genetic data indicate that the divergence of the Northern California (i.e., Owens Valley) dace from the Bonneville and Columbia Basins occurred at about the same time as the Bonneville Basin drainage into the Snake River that led to dace occupation of the Colorado River (Smith, Morgan, & Gustafson, 2000).
Alternative explanations include inferences from geologic evidence of connectivity between the Mojave River basin and the Colorado River. The Mojave River headwaters start in the San Bernardino Mountains, part of the Transverse Ranges, and the river flows east to the desert, culminating in the Silver and Soda Lakes (Williamson, 1853). The Mojave River Basin included the Pleistocene lakes Harper Lake, Lake Manix, and Mojave Lake. Geologic evidence connects these lakes, indicating they existed during the same time periods (Enzel, Wells, & Lancaster, 2003). Additionally, it has been proposed that the Mojave River and the lower Colorado River were connected by a series of overflow events from Troy Lake into Bristol Lake, Cadiz Lake, Danby Lake, and finally into the southern reaches of the Colorado River (Blackwelder, 1954). It is conceivable that this provided a pathway for Pleistocene Rhinichthys to migrate up through the series of lakes into the Mojave River mainstem and eventually to the headwaters in the San Bernardino Mountains. Present-day dace are quite adept at navigating shallow river systems and can move upstream over or around small barriers between pools (Moyle, 2002; personal observation, 2010).
Collectively, these pieces of information lend support to Smith and Dowling’s genetic evidence suggesting that the SASD and the Gila/Salt River dace experienced a divergence event approximately 1.9-1.7 Ma, subsequent to an earlier divergence of the Upper and Lower Colorado River segments from the Pacific Northwest regions approximately 3.6 Ma, which in turn followed the original appearance of Rhinichthys species in the Snake River and the consequent divergence of R. osculus from their sister species approximately 6.3 Ma.
Conservation Implications
“Biodiversity is the totality of all inherited variation in the life forms of Earth, of which we are one species. We study and save it to our great benefit. We ignore and degrade it to our great peril.” (E. O. Wilson, n.d.).
In light of the evidence described in this study, obtained through novel intron sequence analysis, and the evidence provided by my graduate student predecessors, Stacey Nerkowski, Jay VanMeter, and Pia VanMeter, I propose that the Rhinichthys osculus populations of Southern California, the Santa Ana Speckled Dace, show more than sufficient genetic distinctness from the most proximal dace populations to be considered a unique taxon at the species level. Not only have the SASD been shown to be reciprocally monophyletic with respect to the Eastern Sierra and Central Coast populations based on mitochondrial DNA analyses, but I have also shown the same based on nuclear intron DNA analyses.
The Santa Ana Speckled Dace are a unique taxon having experienced an extreme expanse of time of reproductive isolation amid seasonally chaotic environmental conditions. They are an anomaly in the evolutionary trajectory of the Rhinichthys species. The Santa Ana Speckled Dace should be afforded federal protection under the Endangered Species Act because populations are declining rapidly due to anthropogenic forces, increased drought, fires, and floods. We must not ignore the need to protect this small but important source of biodiversity endemic to the Southern California inland waters.
APPENDIX A
FIGURES
FIGURE 1: Geographic subdivisions of California outlining the CA Floristic Province from http://ucjeps.berkeley.edu/cguide.html#Map
FIGURE 2: California map of mountain ranges and valleys illustrating placement of the ranges included in this study, the Coast Ranges, Transverse Ranges, and Peninsular Ranges.
This image is in the public domain in the United States because it only contains materials that originally came from the United States Geological Survey, an agency of the United States Department of the Interior.
FIGURE 3: Rhinichthys osculus range throughout the western United States.
http://explorer.natureserve.org/servlet/NatureServe?searchName=Rhinichthys%20osculus
FIGURE 4: Adult Rhinichthys osculus specimens. The laterally positioned individual best displays the signature speckled phenotype. Dorsal and lateral views illustrate its subterminal mouth.
FIGURE 5: Diagram B displays reciprocal monophyly for haplotypes a and b, while diagram A displays populations that are paraphyletic (Kizirian & Donnelly, 2004).
FIGURE 6: Basic illustration of a gene with exon and intron segments color-coded through each step of mRNA transcription and processing. The intervening intron segments are the sources of the DNA used in this study.
https://www.britannica.com/science/transcription-genetics/media/602486/114928
FIGURE 7: Rhinichthys osculus range map throughout California. Sampling locations within California and Arizona are designated by red circles. 1: Eastern Sierra (Owens Valley), 2 & 3: Central Coast (Santa Maria and San Luis Obispo Rivers), 4, 5, & 6: southern California (San Gabriel, Santa Ana, and San Jacinto Rivers), 7: Colorado River (Grand Canyon), and 8: Colorado River (Sonoita Creek).
Map created on https://databasin.org.
FIGURE 8: Gel electrophoresis verification of band length for the hpg intron (locus 36298E1, ~350 bp; ladder bands marked at 1000, 500, and 250 bp; gel dated 01/21/2014).
FIGURE 9: Gel electrophoresis verification of band length for the cnot1 intron (ladder bands marked at 1000, 750, 500, and 250 bp).
Figure 10: Percentages of molecular variance within and among the four regions as a result of AMOVA analysis of cnot1 intron 20 sequence data.
[Pie chart: % Molecular Variance, CNOT1 Intron 20. Legend: Among Regions, Within Regions. Slice labels: 3%, 97%.]
Figure 11: Percentages of molecular variance within and among the four regions as a result of AMOVA analysis of HPG Intron 1 sequence data.
[Pie chart: % Molecular Variance, HPG Intron 1. Legend: Among Regions, Within Regions. Slice labels: 6%, 94%.]
Figure 12: Percentages of molecular variance within and among the four regions as a result of AMOVA analysis of S7RP Intron 1 sequence data.
[Pie chart: % Molecular Variance, S7RP Intron 1. Legend: Among Regions, Within Regions. Slice labels: 9%, 91%.]
Figure 13: Percentages of molecular variance within and among the four regions as a result of AMOVA analysis of the concatenated sequences of the three introns.
[Pie chart: % Molecular Variance, Concatenation. Legend: Among Regions, Within Regions. Slice labels: 3%, 97%.]
Figure 14: Principal Coordinates Analysis (PCoA) of genetic dissimilarity of CNOT1 Intron 20 DNA sequence [4 Regions-45 Samples-14.2% polymorphic nucleotide sites (PNS)].
[Scatter plot: Principal Coordinates (PCoA), CNOT1 Intron 20. Axes: Coord. 1, Coord. 2. Legend: Central Coast, Owens Valley, CO River, SASD.]
Figure 15: Principal Coordinates Analysis (PCoA) of genetic dissimilarity of HPG Intron 1 DNA sequence (4 Regions-53 Samples-7.2% PNS).
[Scatter plot: Principal Coordinates (PCoA), HPG Intron 1. Axes: Coord. 1, Coord. 2. Legend: Central Coast, Owens Valley, CO River, SASD.]
Figure 16: Principal Coordinates Analysis (PCoA) of genetic dissimilarity of S7RP Intron 1 DNA sequence (4 Regions-40 Samples-9.1% PNS).
[Scatter plot: Principal Coordinates (PCoA), S7RP Intron 1. Axes: Coord. 1, Coord. 2. Legend: Central Coast, Owens Valley, CO River, SASD.]
Figure 17: Principal Coordinates Analysis (PCoA) of genetic dissimilarity of three concatenated intron sequences (4 Regions-34 Samples-11% PNS).
[Scatter plot: Principal Coordinates (PCoA), 3 Intron Concatenation. Axes: Coord. 1, Coord. 2. Legend: Central Coast, Owens Valley, CO River, SASD.]
Figure 18: Mantel Test of Correlation between Genetic (ΦPT) and Geographic Regional Distances for CNOT1 Intron 20 for four regions.
[Scatter plot: Regional GGD x ΦPT, CNOT1. X-axis: geographic distance (0 to 900); y-axis: regional PhiPT. Trend line: y = -0.0001x + 1.009; R² = 0.2121; R = -0.4605.]
Figure 19: Mantel Test of Correlation between Genetic (ΦPT) and Geographic Regional Distances for HPG Intron 1 for four regions.
[Scatter plot: Regional GGD x ΦPT, HPG. X-axis: geographic distance (0 to 900); y-axis: regional PhiPT. Trend line: y = 7E-05x + 0.75; R² = 0.003; R = 0.055.]
Figure 20: Mantel Test of Correlation between Genetic (ΦPT) and Geographic Regional Distances for S7RP Intron 1 for four regions.
[Scatter plot: Regional GGD x ΦPT, S7RP. X-axis: geographic distance (0 to 900); y-axis: regional PhiPT. Trend line: y = -6E-05x + 0.8899; R² = 0.0335; R = -0.183.]
Figure 21: Mantel Test of Correlation between Genetic (ΦPT) and Geographic Regional Distances for three concatenated sequences for four regions.
[Scatter plot: Regional ΦPT x GGD, Concatenation. X-axis: geographic distance (0 to 900); y-axis: regional PhiPT. Trend line: y = -0.0001x + 1.0001; R² = 0.146; R = -0.382.]
Figure 22: PCoA of concatenated sequence data of two introns (cnot1, hpg) and three regions: CC.CA (n=7), CO.AZ (n=8), SASD (n=29). This analysis included all eight Colorado River samples.
[Scatter plot: Principal Coordinates (PCoA), 3 Regions. Axes: Coord. 1, Coord. 2. Legend: CC.CA, SASD, CO.AZ.]
Figure 23: Phylogenetic tree of concatenated sequences (n=34) inferred using the MrBayes plugin within Geneious.
[Tree diagram. Concatenation of 3 intron sequences: cnot1-hpg-s7rp. Bayesian Tree Model: GTR+G. Outgroups: H. molitrix & H. nobilis. Clade labels: Eastern Sierra Region (n=2); Central Coast Region (n=7); SASD: Southern California Region (n=23); CO River (AZ) Region (n=2).]
Geneious 2019.1 created by Biomatters. Available from https://www.geneious.com
Figure 24: Phylogenetic tree of CNOT1 Intron 20 (n=45) with two outgroups, inferred using the MrBayes plugin within Geneious 2019. Best fit model of substitution according to the determined models for each sequence segment listed in Table 19, Appendix B.
[Tree diagram. CCR4-not 1 transcription complex subunit gene, Intron 20. Bayesian Tree Model: HKY. Outgroup: H. molitrix, H. nobilis. Clade labels: Eastern Sierra Watershed Region (n=2); SASD: Southern California Region (n=28); CO River (AZ) Region (n=8); Central Coast Watershed Region (n=7).]
Geneious 2019.1 created by Biomatters. Available from https://www.geneious.com
Figure 25: Phylogenetic tree of HPG Intron 1 (n=53) with two outgroups, inferred using the MrBayes plugin within Geneious 2019. Best fit model of substitution according to the determined models for each sequence segment listed in Table 19, Appendix B.
[Tree diagram. Hypothetical Protein Gene, Intron 1. Bayesian Tree Model: HKY. Outgroup: H. molitrix, H. nobilis. #’s at nodes are bootstrap values. Clade labels: Central Coast Watershed Region (n=8); Eastern Sierra Watershed Region (n=8); SASD: Southern California Region (n=29); CO River (AZ) Region (n=8)*.]
Geneious 2019.1 created by Biomatters. Available from https://www.geneious.com
Figure 26: Phylogenetic tree of S7RP Intron 1 (n=40) with two outgroups, inferred using the MrBayes plugin within Geneious 2019. Best fit model of substitution according to the determined models for each sequence segment listed in Table 19, Appendix B.
[Tree diagram. S7 Ribosomal Protein Gene, Intron 1. Bayesian Tree Model: GTR+G. Outgroup: H. molitrix/H. nobilis. Clade labels: Central Coast Watershed Region (n=8); Eastern Sierra Watershed Region (n=7); SASD: Southern California Region (n=23); CO River (AZ) Region (n=2).]
Geneious 2019.1 created by Biomatters. Available from https://www.geneious.com
APPENDIX B
TABLES
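TABLE 1: Rhinichthys osculus specimen information utilized in this study.
Watershed | Mountain Range Headwaters | Tributary | Specimen ID Code | Sample Size (n) | Successful Sequencing Reactions
San Jacinto River | San Jacinto | Indian Creek | IN | 3 | 8
Santa Ana River | San Bernardino | City Creek | CC, CF | 3 | 9
Santa Ana River | San Bernardino | Mill Creek | ML | 3 | 3
Santa Ana River | San Bernardino | Plunge Creek | PC | 3 | 7
Santa Ana River | San Bernardino | Twin Creek | T | 3 | 9
Santa Ana River | San Bernardino | Cajon Creek | CJ | 3 | 9
Santa Ana River | San Bernardino | Lytle Creek | LC | 3 | 9
San Gabriel River | San Gabriel | Cattle Canyon | CT | 2 | 5
San Gabriel River | San Gabriel | East Fork SGR | E | 1 | 3
San Gabriel River | San Gabriel | Fish Canyon | F | 1 | 3
San Gabriel River | San Gabriel | North Fork SGR | N | 2 | 6
San Gabriel River | San Gabriel | West Fork SGR | G | 2 | 3
Los Angeles River | San Gabriel | Haine River | H | 2 | 6
Owens River | Sierra Nevada | Marvin’s Marsh | M | 3 | 6
Owens River | Sierra Nevada | Pine Creek | P | 5 | 11
San Luis Obispo River | Santa Lucia | Brizziolari Creek | BZ | 1 | 3
San Luis Obispo River | Santa Lucia | San Luis Obispo | SLO | 2 | 6
San Luis Obispo River | Santa Lucia | Stenner Creek | ST | 1 | 3
Santa Maria River | Coast Range, Los Padres National Forest, San Rafael | Cuyama River | CY | 1 | 2
Santa Maria River | Coast Range, Los Padres National Forest, San Rafael | Davy Brown Creek | D | 1 | 3
Santa Maria River | Coast Range, Los Padres National Forest, San Rafael | Manzana Creek | MZ | 1 | 3
Santa Maria River | Coast Range, Los Padres National Forest, San Rafael | Sisquoc River | SS | 1 | 3
Colorado River via Gila River | | Sonoita Creek | S | 1 | 3
Colorado River | Rocky Mtns | Colorado River (Grand Canyon) | CR | 7 | 15
Total | | | | 55 | 138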
TABLE 2: Complete list of intron primer sequences tested on R. osculus samples in the Metcalf lab. Shaded rows signify the three intron primer sets chosen for this study.
Intron Primer ID | Gene Description | Primer Sequences (5’-3’)
59107E2 | UPF0027 protein homolog | F: GGAGATGGGYGTGGACTGGTCYCT; R: ATTGTAGATCTCVTCCACCACCTGRAT
55378E1 | Peroxisome proliferator | F: ATGARGAAAATGAGGCCAACTTGCT; R: GCCACCTGKGTATTGATTATAGCTGAG
55305E1 | Ret proto-oncogene | F: CCTAGTGGACTGTARTAACGCCCCYCT; R: AAGCCATCCAGTTTGCATAAACACTATC
36298E1 | Hypothetical protein gene LOC415169 (hpg), intron 1 | F: GATCCTGAGGGAYTCCCAYGGTGT; R: GGGCCAGGACTCTCYTGGTCTTGTAGT
25073E1 | 60S ribosomal protein L18a | F: GTACTCTCKGTACATGTTGTGRGTKCC; R: GAAGGTGAARAACTTTGGBATCTGG
19231E4 | Spectrin alpha 2 | F: CGGARGACTACGGACGTGATTTGAC; R: CTCCYTCCAGTGSTCCACAAACT
14867E1 | 60S ribosomal protein L8 | F: CCACAARTACAAGGCCAAGAGRAACTG; R: GTTCTCCTTSTCCTGSACGGTCTT
8680E3 | Karyopherin (importin) beta 1 | F: GGAGGAGARTTYAAGAAGTAYCTGGACAT; R: CSCCCTTCAGGCCCTGGATGAT
4174E20 | CCR4-NOT trxn complex subunit 1 (cnot1), intron 20 | F: CTYTCGCTGGCTTTGTCTCAAATCA; R: CTTTTACCATCKCCACTRAAATCCAC
1777E4 | Nucleoporin 155 | F: AGGAGYTGGTGAACCAGAGCAAAGC; R: AGATCRGCCTGAATSAGCCAGTT
Primers courtesy of (Li, Riethoven, & Ma, 2010)
S7RPEX1 | S7 Ribosomal Protein (s7rp), intron 1 | F: TGGCCTCTTCCTTGGCCGTC; R: AACTGTCTGGCTTTTCGCC
Primers courtesy of (Chow & Hazama, 1998)
TABLE 3: Optimized PCR Protocols for EPIC primer sets.
Intron Loci | 36298E1 | 4174E20 | S7
~bp length amplified | ~350 | ~750 | ~1100
PCR Recipe (µL per reaction)
Template DNA | 1 | 1 | 1
10 µM Forward Primer | 1 | 1 | 1
10 µM Reverse Primer | 1 | 1 | 1
DreamTaq™ | 0.25 | 0.3 | 0.3
10 mM premixed dNTPs | 1 | 1 | 1
10x DreamTaq™ Buffer* | 5 | 5 | 5
Sterile H2O | 40.75 | 40.7 | 40.7
*10x Buffer contains 20 mM MgCl2
Amplification Protocol (on Eppendorf Mastercycler Gradient 5331)
Initial Denature | 94°C, 2 mins | 95°C, 3 mins | 95°C, 3-5 mins
Denature | 94°C, 45 sec | 95°C, 30 sec | 95°C, 30 sec
Anneal | 55°C, 45 sec | 60°C, 30 sec | 57°C, 60 sec
Elongate | 72°C, 2.5 mins | 72°C, 60 sec | 72°C, 2 mins
# of Cycles | 30 | 35 | 35
Final Extension | 72°C, 5 mins | 72°C, 5 mins | 72°C, 10 mins
Cool Down | 4°C ∞ | 4°C ∞ | 4°C ∞
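When several reactions are set up at once, the per-reaction volumes in Table 3 are typically scaled into a master mix. The sketch below is illustrative only: the dictionary keys, the `master_mix` helper, and the 10% pipetting overage are assumptions introduced here, not part of the protocol. It also checks that each recipe sums to a 50 µL reaction.

```python
# Per-reaction volumes (µL) transcribed from Table 3, keyed by intron locus.
RECIPES = {
    "36298E1": {"template_dna": 1, "fwd_primer": 1, "rev_primer": 1,
                "dreamtaq": 0.25, "dntps": 1, "buffer_10x": 5, "water": 40.75},
    "4174E20": {"template_dna": 1, "fwd_primer": 1, "rev_primer": 1,
                "dreamtaq": 0.3, "dntps": 1, "buffer_10x": 5, "water": 40.7},
    "S7": {"template_dna": 1, "fwd_primer": 1, "rev_primer": 1,
           "dreamtaq": 0.3, "dntps": 1, "buffer_10x": 5, "water": 40.7},
}


def reaction_volume(recipe):
    """Total volume of a single reaction in µL."""
    return sum(recipe.values())


def master_mix(recipe, n_reactions, overage=0.10):
    """Scale shared components for n reactions.

    Template DNA is excluded because it is added to each tube separately.
    The default 10% overage is an assumed allowance for pipetting loss.
    """
    factor = n_reactions * (1 + overage)
    return {name: round(volume * factor, 2)
            for name, volume in recipe.items() if name != "template_dna"}


for locus, recipe in RECIPES.items():
    print(locus, reaction_volume(recipe), master_mix(recipe, n_reactions=8))
```

Setting `overage=0` returns the exact multiples of the tabulated volumes.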
TABLE 4: Published Genbank Accession Numbers and base pair lengths of outgroup samples.
Intron Information [Gene Name, Accession #, Length (bp)], by fish species (superscript numbers refer to the sources listed below):
H. molitrix: CCR4-NOT trxn complex subunit, HM012488 (1), 779 bp; Hypothetical protein LOC415169, HM012562 (1), 342 bp; S7 Ribosomal protein, AY325778 (2), 589 bp
H. nobilis: CCR4-NOT trxn complex subunit, HM012489 (1), 833 bp; Hypothetical protein LOC415169, HM012563 (1), 345 bp; S7 Ribosomal protein, MH938839.1 (3), 710 bp
R. atratulus: GU134262 (4), 815 bp
R. cataractae*: KF640208.1 (5), 791 bp
Sources: (1) Li et al., 2010; (2) He et al., 2008a; (3) Stepien et al., 2019; (4) Bufalino & Mayden, 2010; (5) Kim & Conway, 2014
*Attempts were made to acquire specimens of R. cataractae from sources that have utilized this species in an effort to include a more closely related outgroup for all three intron markers, but none were accessible.
TABLE 5: Internal primers designed from alignments of a subset of successful sequencing reactions. Primer number indicates the base location along the alignment where the primer begins.
Internal Primer ID | Segment length | Primer Sequences (5’-3’)
S7 ribosomal protein gene (s7rp), Intron 1
S7RP-31F / S7RP-575R | 544 bp | F: TAGAGGTGAGTCTAGTGAATGTGCC; R: ACAGGTAAGCTAGGTGACATGC
S7RP-97F / S7RP-724R | 627 bp | F: TATTTACCTCCACGCATGAGCTTC; R: CCGTCAGGTCATAACATTACGCAC
S7RP-554F / S7RP-800R | 246 bp | F: GCATGTCACCTAGCTTACCTGT; R: TAASCCTCACTTTGCTCCAAACC
CCR4-NOT transcription complex subunit 1 gene (cnot1), Intron 20
Cnot-209F / Cnot-651R | 442 bp | F: CTACAGAGCCAGCCAGCAAG; R: CACTCTTGACACGACACACAAC
Cnot-630F / Cnot-837R | 207 bp | F: GTTGTGTGTCGTGTCAAGAGTG; R: CAGCGTAATAAATGCCGGTCTG
(LOC415169) Hypothetical protein gene (hpg), Intron 1
Hpg-156F* | ~200 bp | F: AAGGCTGTTGCTGTGAGGAAG
Hpg-201R* | ~200 bp | R: TTACCTTTCTGTTCCTTTCCAAGTG
*Due to the short length of the hpg intron, internal primers were paired with the complement primer of the original pair, i.e. 36298E1F/hpg-201R and hpg-156F/36298E1R.
TABLE 6: NCBI Genbank BLAST® score data for one forward sequence representing each EPIC primer pair’s PCR amplification of R. osculus. The “% identity” column represents the percentage of matching sequence.
Columns for each hit: Sequence ID|Accession #; Description; Ident (%); Ident; E Value; Max Score

Request ID PR4391H7014. R. osculus ID: CJ5-1771124F-1777 (Cajon Creek Sample #5; Primer Locus 1777E4; PCR Date 11_24_2014)
gi|295821984|gb|HM012513.1|; H. nobilis nucleoporin 155 (nup155) gene, exons 29, 30; 76.09; 184; 3.00E-25; 127
gi|295821982|gb|HM012512.1|; H. molitrix nucleoporin 155 (nup 155) gene, exons 29, 30; 75.54; 184; 4.00E-24; 123
gi|295821988|gb|HM012515.1|; D. rerio nucleoporin 155 (nup155) gene, exons 29, 30; 81.43; 70; 9.00E-07; 66.2
Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

Request ID PRRZ3VR6014. R. osculus ID: E1-4741211F-4174 (Fish Creek Sample #1; Primer Locus 4174E20; PCR Date 12_11_2014)
gi|295821936|gb|HM012489.1|; H. nobilis CCR4-NOT transcription complex subunit 1 (cnot1) gene, exons 2, 3; 90.28; 216; 2.00E-69; 274
gi|295821934|gb|HM012488.1|; H. molitrix CCR4-NOT transcription complex subunit 1 (cnot1) gene, exons 2, 3; 88.2; 178; 7.00E-49; 206
gi|295821942|gb|HM012492.1|; D. rerio CCR4-NOT transcription complex subunit 1 (cnot1) gene, exons 2, 3; 85.71; 175; 4.00E-41; 180

Request ID PRT4KRF1015. R. osculus ID: F1-505426F-55305 (Fish Creek Sample #1; Primer Locus 55305E1; PCR Date 04_26_2014)
gi|295822014|gb|HM012528.1|; H. nobilis ret proto-oncogene (ret1) gene, exons 18, 19; 92.3; 714; 0; 1014
gi|295822012|gb|HM012527.1|; H. molitrix ret proto-oncogene (ret1) gene, exons 18, 19; 91.92; 718; 0; 1005
gi|295822020|gb|HM012531.1|; D. rerio ret proto-oncogene (ret1) gene, exons 18, 19; 89.71; 243; 1.00E-77; 302

Request ID PRTGWG68014. R. osculus ID: F1-578515F-55378 (Fish Creek Sample #1; Primer Locus 55378E1; PCR Date 05_15_2014)
gi|295822000|gb|HM012521.1| Range 1; H. nobilis peroxisome proliferator activated receptor gamma gene, exons 3, 4; 93; 543; 0; 789
gi|295822000|gb|HM012521.1| Range 2; H. nobilis peroxisome proliferator activated receptor gamma gene, exons 3, 4; 79.16; 451; 2.00E-75; 294
gi|295821998|gb|HM012520.1| Range 1; H. molitrix peroxisome proliferator activated receptor gamma gene, exons 3, 4; 92.82; 543; 0; 784
gi|295821998|gb|HM012520.1| Range 2; H. molitrix peroxisome proliferator activated receptor gamma gene, exons 3, 4; 79.6; 446; 6.00E-80; 309
gi|295822004|gb|HM012523.1|; D. rerio peroxisome proliferator activated receptor gamma gene, exons 3, 4; 86.33; 490; 7.00E-144; 521

Request ID PRUEJHGC014. R. osculus ID: T2-111216F-19231E (Twin Creek Sample #2; Primer Locus 19231E4; PCR Date 12_16_2013)
gi|295822054|gb|HM012548.1|; H. nobilis spectrin alpha 2 (spna2) gene, exons 38, 39; 87.74; 261; 1.00E-77; 300
gi|295822052|gb|HM012547.1|; H. molitrix spectrin alpha 2 (spna2) gene, exons 38, 39; 86.59; 261; 3.00E-73; 285
gi|295822060|gb|HM012551.1|; D. rerio spectrin alpha 2 (spna2) gene, exons 38, 39; 79.69; 261; 1.00E-36; 163

Request ID PRUUBP13015. R. osculus ID: T2-381216F-36298E1F (Twin Creek Sample #2; Primer Locus 36298E1; PCR Date 12_16_2013)
gi|295822084|gb|HM012563.1|; H. nobilis hypothetical protein gene, exons 4, 5; 86.14; 303; 4.00E-82; 315
gi|295822082|gb|HM012562.1|; H. molitrix hypothetical protein gene, exons 4, 5; 85.95; 299; 1.00E-81; 313
gi|295822088|gb|HM012565.1|; D. rerio hypothetical protein gene, exons 4, 5; 82.24; 304; 4.00E-62; 248

Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000), "A greedy algorithm for aligning DNA sequences", J Comput Biol 2000; 7(1-2):203-14.
Reference - database indexing: Aleksandr Morgulis, George Coulouris, Yan Raytselis, Thomas L. Madden, Richa Agarwala, Alejandro A. Schäffer (2008), "Database Indexing for Production MegaBLAST Searches", Bioinformatics 24:1757-1764.

TABLE 7: Summary of nucleotide statistics.
Nucleotide Statistics | ccr4-not1 transcription complex subunit 1 | Hypothetical protein gene | S7 ribosomal protein gene | Concatenated Sequences
Intron # | 20 | 1 | 1 | NA
Final Aligned Length | 657 bp | 292 | 507 | 1493
# Sequences | 45 | 53 | 40 | 34
Identical Sites | 564 (85.8%) | 271 (92.8%) | 461 (90.9%) | 1340 (89.8%)
Pairwise Identity | 97.4% | 98.0% | 97.6% | 97.5%
Base Frequency, A | 8326 (29.7%)* | 4197 (27.7%)* | 6550 (33.0%) | 14866 (30.4%)
Base Frequency, C | 5347 (19.1%) | 3178 (21.0%) | 2703 (13.6%) | 8688 (17.8%)
Base Frequency, G | 5959 (21.2%) | 3706 (24.4%) | 4020 (20.3%) | 10502 (21.5%)
Base Frequency, T | 8432 (30.0%) | 4079 (26.9%) | 6544 (33.0%) | 14844 (30.4%)
Base Frequency, Amb. | 1 (0.0%)* | 4 (0.0%) | 12 (0.0%) | 9 (0.0%)
GC Content | 11306 (40.3%) | 6884 (45.4%) | 6723 (33.9%) | 19190 (39.2%)
All Bases | 28065 | 15164 | 19829 | 48909
# of gaps | 1500 (5.1%) | 312 (2.0%) | 451 (2.2%) | 1853 (3.7%)
*% of non-gaps
Table 8: Transition/Transversion data culled from analysis of the concatenated sequences of all three introns using the HKY+gamma model of substitution as determined to be the best fit model for the concatenated sequences.
From\To | A | T | C | G
A | - | 0.092305898 | 0.054021801 | 0.084158004
T | 0.092436475 | - | 0.069621476 | 0.065301215
C | 0.092436475 | 0.118960728 | - | 0.065301215
G | 0.119129012 | 0.092305898 | 0.054021801 | -
Transitions (Ti) | 0.39186922
Transversions (Tv) | 0.60813078
Ti/Tv | 0.644383138
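The Ti and Tv totals in Table 8 can be recovered directly from the substitution matrix by summing the purine-to-purine and pyrimidine-to-pyrimidine entries (transitions) separately from all other off-diagonal entries (transversions). The short sketch below does this with the tabulated values; the variable names are illustrative and are not taken from any analysis software.

```python
# Substitution matrix from Table 8: RATES[source][target].
RATES = {
    "A": {"T": 0.092305898, "C": 0.054021801, "G": 0.084158004},
    "T": {"A": 0.092436475, "C": 0.069621476, "G": 0.065301215},
    "C": {"A": 0.092436475, "T": 0.118960728, "G": 0.065301215},
    "G": {"A": 0.119129012, "T": 0.092305898, "C": 0.054021801},
}

# Transitions stay within purines (A, G) or within pyrimidines (C, T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

ti = sum(rate for src, row in RATES.items()
         for dst, rate in row.items() if (src, dst) in TRANSITIONS)
tv = sum(rate for src, row in RATES.items()
         for dst, rate in row.items() if (src, dst) not in TRANSITIONS)

print(f"Ti = {ti:.8f}, Tv = {tv:.8f}, Ti/Tv = {ti / tv:.6f}")
```

A ratio below 1 here reflects that the eight transversion entries together outweigh the four transition entries, even though individual transition rates are among the largest in the matrix.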
TABLE 9: Summary of haplotype statistics.
Number of Individuals per Haplotype
Haplotype # | CNOT1 Intron 20 | HPG Intron 1 | S7RP Intron 1
1 | 1 | 1 | 1
2 | 3 | 1 | 1
3 | 3 | 1 | 1
4 | 1 | 1 | 1
5 | 26 | 2 | 7
6 | 1 | 1 | 1
7 | 1 | 1 | 1
8 | 1 | 1 | 1
9 | 1 | 1 | 1
10 | 5 | 5 | 1
11 | 1 | 1 | 1
12 | 1 | 34 | 1
13 | | 1 | 1
14 | | 1 | 2
15 | | 1 | 8
16 | | | 2
17 | | | 8
18 | | | 1
Total Individuals | 45 | 53 | 40
Unique Haplotypes* | 8 | 12 | 13
*Unique haplotypes are those displayed by only one individual.
TABLE 10: Haplotype table based on the alignment of 45 sequences of CNOT1 Intron 20.
Regions and populations: CO River (CO River, Sonoita Creek); Eastern Sierra (Owens Valley); Central Coast (San Luis Obispo, Santa Maria); SASD (San Jacinto, Santa Ana, San Gabriel, Los Angeles).
Haplotype | Region | Specimen ID Codes
1 | CO River | CR28
2 | CO River | CR25, CR29, CR36
3 | CO River | CR26, CR27, CR35
4 | SASD | CC2
5 | SASD | T6, N1, N2, H2, G1, IN4, CJ5, LC1, CF1, PC4, LC19, PC10, T2, F1, E1, H1, IN2, IN3, CJ1, CT1, CT2, ML1, PC7, CJ12, LC11, CC11
6 | SASD | T10
7 | Eastern Sierra | P4
8 | Eastern Sierra | P2
9 | Central Coast | SL2
10 | Central Coast | D1, SL3, BZ1, SS1, MZ1
11 | Central Coast | ST1
12 | CO River | S1
TABLE 11: Haplotype table based on the alignment of 53 sequences of HPG Intron 1.
Regions and populations: CO River (CO River, Sonoita Creek); Eastern Sierra (Owens Valley); Central Coast (San Luis Obispo, Santa Maria); SASD (San Jacinto, Santa Ana, San Gabriel, Los Angeles).
Haplotype | Region | Specimen ID Codes
1 | Eastern Sierra | P3
2 | Eastern Sierra | P1
3 | Eastern Sierra | M3
4 | Eastern Sierra | P4
5 | Eastern Sierra | P2, M1
6 | Eastern Sierra | M2
7 | Central Coast | ST1
8 | Central Coast | CY3
9 | Central Coast | MZ1
10 | Central Coast | SL2, SL3, D1, BZ1, SS1
11 | CO River | S1
12 | SASD | T2, T6, N1, N2, H2, G1, IN4, T10, PC7, ML1, ML3, PC4, PC10, F1, E1, H1, IN2, IN3, CJ1, CJ5, LC1, CF1, CT1, CT2, CC2, CJ12, LC11, LC19, CC11
12 | CO River | CR29, CR36, CR26, CR27, CR28
13 | CO River | CR35
14 | Eastern Sierra | P5
15 | CO River | CR25
TABLE 12: Haplotype table based on the alignment of 40 sequences of S7RP Intron 1.
Regions and populations: CO River (CO River, Sonoita Creek); Eastern Sierra (Owens Valley); Central Coast (San Luis Obispo, Santa Maria); SASD (San Jacinto, Santa Ana, San Gabriel, Los Angeles).
Haplotype | Region | Specimen ID Codes
1 | Eastern Sierra | P4
2 | Eastern Sierra | M2
3 | Eastern Sierra | M3
4 | Eastern Sierra | P1
5 | Central Coast | D1, SL2, SL3, BZ1, ST1, SS1, CY3
6 | Central Coast | MZ1
7 | Eastern Sierra | P2
8 | Eastern Sierra | P3
9 | Eastern Sierra | M1
10 | CO River | CR27
11 | CO River | S1
12 | SASD | CF1
13 | SASD | CC2
14 | SASD | IN2, IN4
15 | SASD | T6, CJ1, CJ5, LC1, PC7, LC11, LC19, CC11
16 | SASD | N1, CT2
17 | SASD | T2, F1, E1, H1, H2, G1, T10, CJ12
18 | SASD | N2
TABLE 13: Basic AMOVA statistics for all three intron haplotype analyses as well as the concatenation of the three introns. *AR=among regions; WR=within regions.
No. of Populations: 8
No. of Regions: 4
No. of permutations: 999
No. of PW permutations: 999
Statistic | cnot1 | hpg | s7rp | concat
No. of Samples | 45 | 53 | 40 | 34
PhiPT | 0.966 | 0.941 | 0.906 | 0.966
P-value | 0.001 | 0.001 | 0.001 | 0.001
Nm (haploid) | 0.017 | 0.031 | 0.052 | 0.017
Degrees of Freedom, AR* | 3 | 3 | 3 | 3
Degrees of Freedom, WR* | 41 | 49 | 36 | 30
Sum of Squares, AR | 345.064 | 138.13 | 39.891 | 571.344
Sum of Squares, WR | 19.625 | 12.625 | 19.036 | 35.45
Mean Squares, AR | 115.021 | 46.043 | 13.297 | 190.448
Mean Squares, WR | 0.479 | 0.258 | 0.529 | 1.182
Est Variance, AR | 13.757 | 4.099 | 1.606 | 33.869
Est Variance, WR | 0.479 | 0.258 | 0.529 | 1.182
%, AR | 97% | 94% | 75% | 97%
%, WR | 3% | 6% | 25% | 3%
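The ΦPT and Nm rows of Table 13 follow arithmetically from the estimated variance components. As a hedged illustration, assuming the standard relationships ΦPT = AR / (AR + WR) and, for haploid data, Nm = [(1/ΦPT) - 1] / 2, the sketch below recomputes both from the tabulated variances. The function names are hypothetical.

```python
# Estimated variance components (among regions, within regions) from Table 13.
EST_VARIANCE = {
    "cnot1": (13.757, 0.479),
    "hpg": (4.099, 0.258),
    "s7rp": (1.606, 0.529),
    "concat": (33.869, 1.182),
}


def phi_pt(var_among, var_within):
    """Proportion of total estimated variance found among regions."""
    return var_among / (var_among + var_within)


def nm_haploid(phi):
    """Effective migrants per generation implied by PhiPT (haploid form)."""
    return (1.0 / phi - 1.0) / 2.0


for marker, (among, within) in EST_VARIANCE.items():
    phi = phi_pt(among, within)
    print(f"{marker}: PhiPT = {phi:.3f}, Nm = {nm_haploid(phi):.3f}")
```

For cnot1, hpg, and the concatenation, these calculations agree with the tabulated ΦPT and Nm to three decimal places, and the same Nm formula reproduces the pairwise values in Tables 14-17 (for example, ΦPT = 0.842 gives Nm of about 0.094). For s7rp, the tabulated variance components give a ratio of about 0.752, which matches the 75%/25% split in the table but not the tabulated ΦPT of 0.906, so that column may merit rechecking against the original analysis output.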
TABLE 14: Regional pairwise ΦPT values, Nm, and corresponding P-values based on analysis of the haplotype differences of CNOT1 Intron 20.
Pairwise Population PhiPT Values
Central Coast Eastern Sierra CO River, AZ SASD Central Coast 0.000 0.001 0.001 0.001 Central Coast Eastern Sierra 0.978 0.000 0.023 0.002 Eastern Sierra CO River, AZ 0.929 0.943 0.000 0.001 CO River, AZ SASD 0.994 0.996 0.842 0.000 SASD Central Coast Eastern Sierra CO River, AZ SASD
PhiPT Values below diagonal. Probability, P (rand >= data) based on 999 permutations is shown above diagonal.
Pairwise Population Nm (Haploid) Values Based on PhiPT Values
Central Coast Eastern Sierra CO River, AZ SASD Central Coast 0.000 Central Coast Eastern Sierra 0.011 0.000 Eastern Sierra CO River, AZ 0.038 0.030 0.000 CO River, AZ SASD 0.003 0.002 0.094 0.000 SASD Central Coast Eastern Sierra CO River, AZ SASD
Nm (Haploid) Values below diagonal.
92
TABLE 15: Regional pairwise ΦPT values, Nm, and corresponding P-values based on analysis of the haplotype differences of HPG Intron 1.
Pairwise Population PhiPT Values
Central Coast Eastern Sierra CO River, AZ SASD Central Coast 0.000 0.001 0.001 0.001 Central Coast Eastern Sierra 0.689 0.000 0.001 0.001 Eastern Sierra CO River, AZ 0.930 0.895 0.000 0.007 CO River, AZ SASD 0.986 0.971 0.245 0.000 SASD Central Coast Eastern Sierra CO River, AZ SASD
PhiPT Values below diagonal. Probability, P (rand >= data) based on 999 permutations is shown above diagonal.
Pairwise Population Nm (Haploid) Values Based on PhiPT Values
Central Coast Eastern Sierra CO River, AZ SASD Central Coast 0.000 Central Coast Eastern Sierra 0.226 0.000 Eastern Sierra CO River, AZ 0.038 0.059 0.000 CO River, AZ SASD 0.007 0.015 1.537 0.000 SASD Central Coast Eastern Sierra CO River, AZ SASD
Nm (Haploid) Values below diagonal.
93
TABLE 16: Regional pairwise ΦPT values, Nm, and corresponding P-values based on analysis of the haplotype differences of S7RP Intron 1.
Pairwise Population PhiPT Values
Central Coast Eastern Sierra CO River, AZ SASD Central Coast 0.000 0.001 0.021 0.001 Central Coast Eastern Sierra 0.752 0.000 0.032 0.001 Eastern Sierra CO River, AZ 0.882 0.784 0.000 0.001 CO River, AZ SASD 0.946 0.927 0.853 0.000 SASD Central Coast Eastern Sierra CO River, AZ SASD
PhiPT Values below diagonal. Probability, P (rand >= data) based on 999 permutations is shown above diagonal.
Pairwise Population Nm (Haploid) Values Based on PhiPT Values
Central Coast Eastern Sierra CO River, AZ SASD Central Coast 0.000 Central Coast Eastern Sierra 0.165 0.000 Eastern Sierra CO River, AZ 0.067 0.138 0.000 CO River, AZ SASD 0.029 0.039 0.086 0.000 SASD Central Coast Eastern Sierra CO River, AZ SASD
Nm (Haploid) Values below diagonal.
TABLE 17: Regional pairwise ΦPT values, Nm, and corresponding P-values based on analysis of the haplotype differences of the concatenation of the three sequences.
Pairwise Population PhiPT Values
                 Central Coast   Eastern Sierra   CO River, AZ   SASD
Central Coast    0.000           0.010            0.070          0.001
Eastern Sierra   0.944           0.000            0.337          0.005
CO River, AZ     0.928           0.871            0.000          0.001
SASD             0.979           0.986            0.878          0.000
PhiPT Values below diagonal. Probability, P (rand >= data) based on 999 permutations is shown above diagonal.
Pairwise Population Nm (Haploid) Values Based on PhiPT Values
                 Central Coast   Eastern Sierra   CO River, AZ   SASD
Central Coast    0.000
Eastern Sierra   0.030           0.000
CO River, AZ     0.039           0.074            0.000
SASD             0.011           0.007            0.070          0.000
Nm (Haploid) Values below diagonal.
TABLE 18: Regional Pairwise Geographic Distances
                 Central Coast   Eastern Sierra   CO River, AZ   SASD
Central Coast    0.000
Eastern Sierra   304.999         0.000
CO River, AZ     803.161         742.536          0.000
SASD             295.846         391.023          520.180        0.000
TABLE 19: Summary of maximum likelihood fits of nucleotide substitution models determined by MEGA X [2].

Markers: cnot1 Intron 20; hpg Intron 1; s7rp Intron 1; Concatenation.
Columns: Marker, Model, #Param, BIC, AICc, Gamma, R, Freq A, Freq T, Freq C, Freq G.

Values by column (the order within each column does not reliably follow the marker order above):
Model:   HKY, HKY, HKY+G, GTR+G
#Param:  70, 95, 92, 111
BIC:     4630.65, 5844.187, 4715.712, 2553.407
AICc:    3924.48, 3861.24, 5228.552, 1691.164
Gamma:   n/a, n/a, 1.10855, 0.307297
R:       0.615463, 1.030044, 2.141557, 1.008816
Freq A:  0.303958, 0.298681, 0.279326, 0.314876
Freq T:  0.303529, 0.302254, 0.255017, 0.336975
Freq C:  0.17764, 0.190634, 0.224151, 0.151898
Freq G:  0.21473, 0.208366, 0.241336, 0.196126

Abbreviations: GTR: General Time Reversible; HKY: Hasegawa-Kishino-Yano; G: Gamma distribution; R: Transition/Transversion Bias; AICc: Akaike Information Criterion, corrected; BIC: Bayesian Information Criterion [1].
1. Nei M. and Kumar S. (2000). Molecular Evolution and Phylogenetics. Oxford University Press, New York.
2. Kumar S., Stecher G., Li M., Knyaz C., and Tamura K. (2018). MEGA X: Molecular Evolutionary Genetics Analysis across computing platforms. Molecular Biology and Evolution 35:1547-1549.
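Table 19 ranks substitution models by BIC and AICc. As a rough illustration of how these criteria trade fit against the number of parameters, the sketch below uses the standard textbook definitions, AICc = -2 lnL + 2k + 2k(k + 1)/(n - k - 1) and BIC = -2 lnL + k ln(n). The function names and the input numbers are hypothetical, and the way MEGA X defines the sample size n is not stated here, so the sketch is not expected to reproduce the tabulated values.

```python
import math


def aicc(ln_l: float, k: int, n: int) -> float:
    """Corrected Akaike Information Criterion for log-likelihood ln_l, k parameters, n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return -2.0 * ln_l + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def bic(ln_l: float, k: int, n: int) -> float:
    """Bayesian Information Criterion for log-likelihood ln_l, k parameters, n observations."""
    return -2.0 * ln_l + k * math.log(n)


# Hypothetical comparison of two candidate models fitted to the same data.
candidates = {"model_a": (-1000.0, 91), "model_b": (-995.0, 96)}
n_obs = 5000
for name, (ln_l, k) in candidates.items():
    print(f"{name}: AICc = {aicc(ln_l, k, n_obs):.2f}, BIC = {bic(ln_l, k, n_obs):.2f}")
```

Lower values indicate the preferred model under either criterion; BIC penalizes additional parameters more heavily than AICc once n is large.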
APPENDIX C
SEQUENCE DATA
DATA SET 1: R. osculus sequence alignment consensus for intron 20 of the ccr4-not transcription complex subunit 1 gene (cnot1)
DATA SET 2: R. osculus sequence alignment consensus for intron 1 of the hypothetical protein gene.
DATA SET 3a: Polymorphic nucleotide sites among the individual sequences of the CNOT1 Intron 20, base positions 48-450.

Columns: Specimen ID, Region, and one column per polymorphic base position.
Polymorphic base positions (47): 48, 89, 119, 126, 128, 146, 156, 163, 165, 169, 170, 171, 172, 173, 184, 190, 199, 219, 223, 224, 225, 226, 227, 228, 229, 232, 241, 257, 260, 270, 271, 272, 303, 347, 348, 353, 354, 359, 362, 364, 378, 385, 398, 421, 422, 449, 450.
Regions (number of specimens): SASD (28), Owens (2), CO River (8), Central Coast (7).
Specimen IDs (45): T2, T6, P4, P2, S1, E1, F1, H1, H2, N1, N2, D1, G1, IN4, IN3, IN2, T10, ST1, SL2, SL3, BZ1, CJ5, CJ1, LC1, CT1, CT2, SS1, MZ1, ML1, PC4, PC7, CF1, CC2, CJ12, LC19, LC11, CR35, CR26, CR27, CR25, CR36, CR29, CR28, PC10, CC11.
[Site-by-specimen matrix omitted; the aligned cnot1 sequences are listed in Appendix D.]
DATA SET 3b: CNOT1 Intron 20, base positions 451-605.

Columns: Specimen ID, Region, and one column per polymorphic base position.
Polymorphic base positions (46): 451, 452, 454, 465, 484, 485, 486, 487, 488, 489, 490, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 547, 548, 549, 605.
Regions (number of specimens): SASD (28), Owens (2), CO River (8), Central Coast (7).
Specimen IDs (45): T2, T6, P4, P2, S1, E1, F1, H1, H2, N1, N2, D1, G1, IN4, IN3, IN2, T10, ST1, SL2, SL3, BZ1, CJ5, CJ1, LC1, CT1, CT2, SS1, MZ1, ML1, PC4, PC7, CF1, CC2, CJ12, LC19, LC11, CR35, CR26, CR27, CR25, CR36, CR29, CR28, PC10, CC11.
[Site-by-specimen matrix omitted; the aligned cnot1 sequences are listed in Appendix D.]
DATA SET 4: Polymorphic nucleotide sites among the individual sequences of hpg intron 1, base positions 36-281.

Specimen ID  Region            36  76 105 158 161 164 175 183 184 185 186 187 188 189 190 201 204 217 244 245 281
CR25         CO River           G   T   G   T   A   G   Y   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CR35         CO River           G   C   G   T   A   G   T   -   -   -   -   -   -   -   -   G   C   C   T   T   T
S1           CO River           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   A   C   T   T   T
CR26         CO River           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CR28         CO River           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CR29         CO River           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CR36         CO River           G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CC2          SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CC11         SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CJ1          SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CJ5          SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CJ12         SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CT1          SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CT2          SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
F1           SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
G1           SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
H1           SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
H2           SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
IN2          SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
IN4          SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
LC1          SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
LC11         SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
LC19         SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
ML1          SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
ML3          SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
N1           SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
N2           SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
PC4          SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
PC7          SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
PC10         SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
T2           SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
T6           SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
T10          SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CR27         SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
CF1          SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
E1           SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
IN3          SASD               G   C   G   T   A   G   C   -   -   -   -   -   -   -   -   G   C   C   T   T   T
ST1          Central Coast      G   C   G   K   A   A   C   A   G   A   T   T   T   C   T   A   C   T   -   T   T
BZ1          Central Coast      G   C   G   T   A   A   C   A   G   A   T   T   T   C   T   A   C   T   -   T   T
D1           Central Coast      G   C   G   T   A   A   C   A   G   A   T   T   T   C   T   A   C   T   -   T   T
SL2          Central Coast      G   C   G   T   A   A   C   A   G   A   T   T   T   C   T   A   C   T   -   T   T
SL3          Central Coast      G   C   G   T   A   A   C   A   G   A   T   T   T   C   T   A   C   T   -   T   T
SS1          Central Coast      G   C   G   T   A   A   C   A   G   A   T   T   T   C   T   A   C   T   -   T   T
CY3          Central Coast      G   C   G   G   A   A   C   A   G   A   T   T   T   C   T   A   C   T   -   T   T
MZ1          Central Coast      G   C   G   G   A   A   C   A   G   A   T   T   T   C   T   A   C   T   -   T   -
P3           Eastern Sierra     A   C   A   T   A   A   C   A   G   A   T   T   T   C   T   G   C   G   -   Y   T
P4           Eastern Sierra     G   C   A   T   A   A   C   A   G   A   T   T   T   C   T   G   C   G   -   C   T
P5           Eastern Sierra     G   C   G   T   G   A   C   A   G   A   T   T   T   C   T   G   C   G   -   T   T
M2           Eastern Sierra     G   C   A   T   G   A   C   A   G   A   T   T   T   C   T   G   C   G   -   T   T
M3           Eastern Sierra     G   C   A   T   A   A   C   A   G   A   T   T   T   C   T   G   C   G   -   Y   T
P1           Eastern Sierra     G   C   A   T   A   A   C   A   G   A   T   T   T   C   T   G   C   G   C   T   T
M1           Eastern Sierra     G   C   A   T   A   A   C   A   G   A   T   T   T   C   T   G   C   G   -   T   T
P2           Eastern Sierra     G   C   A   T   A   A   C   A   G   A   T   T   T   C   T   G   C   G   -   T   T
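Besides A, C, G, T and the gap symbol, the polymorphic-site tables contain letters such as Y, K, R, W and M. These are standard IUPAC nucleotide ambiguity codes; reading them as heterozygous sites in a diploid individual is an assumption made here, not something stated in this appendix. A small sketch for expanding the codes follows; the names `IUPAC_CODES`, `expand_site` and `is_ambiguous` are hypothetical helpers.

```python
# Standard IUPAC two-base ambiguity codes plus the four unambiguous bases.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
}


def expand_site(code: str) -> set:
    """Return the set of bases a site code can represent; a gap ('-') maps to the empty set."""
    if code == "-":
        return set()
    return set(IUPAC_CODES[code.upper()])


def is_ambiguous(code: str) -> bool:
    """True when the code stands for more than one base."""
    return len(expand_site(code)) > 1


# Site codes seen at position 175 of hpg intron 1 in the CO River specimens.
for code in ("Y", "T", "C"):
    print(code, sorted(expand_site(code)), is_ambiguous(code))
```

For instance, the Y reported for specimen CR25 at position 175 covers both the T seen in CR35 and the C seen in the remaining CO River specimens.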
DATA SET 5: Polymorphic nucleotide sites among the individual sequences of s7rp intron 1, base positions 8-493.

Columns: Specimen ID, Region, and one column per polymorphic base position.
Polymorphic base positions (45): 8, 38, 92, 128, 151, 164, 166, 176, 196, 201, 235, 250, 270, 271, 272, 273, 274, 275, 276, 277, 278, 287, 335, 355, 396, 403, 411, 422, 426, 428, 430, 432, 433, 434, 435, 441, 446, 451, 455, 457, 468, 476, 491, 492, 493.
Regions (number of specimens): SASD (23), CO River (2), Central Coast (8), Eastern Sierra (7).
Specimen IDs (40): T2, T6, P4, P3, P2, P1, H2, H1, N2, N1, F1, E1, S1, M3, M2, M1, D1, G1, IN4, IN2, T10, ST1, SL3, SL2, BZ1, CT2, LC1, CJ5, CJ1, SS1, MZ1, CY3, PC7, CF1, CC2, LC19, LC11, CJ12, CC11, CR27.
[Site-by-specimen matrix omitted.]
APPENDIX D
INPUT FILES
GenAlEx and MEGA FASTA Input Files:
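The records in this appendix use headers of the form `>REGION_SPECIMEN_marker` (for example `>SASD_CC2_cnot1`), with the sequence wrapped across several space-separated segments. The sketch below shows one way to read such text into per-specimen records. It is an illustration only: the `Record` type and `parse_fasta` function are hypothetical, the toy input is not thesis data, and treating everything after the first token of a header line as sequence is an assumption that suits this layout but not FASTA files whose headers carry free-text descriptions.

```python
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    region: str
    specimen: str
    marker: str
    sequence: str


def parse_fasta(text: str) -> Dict[str, Record]:
    """Parse FASTA-like text whose headers look like >REGION_SPECIMEN_marker."""
    raw: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].startswith(">"):
            current = tokens[0][1:]   # header name without the leading '>'
            raw[current] = []
            tokens = tokens[1:]       # any sequence fused onto the header line
        if current is not None:
            raw[current].extend(tokens)
    records = {}
    for header, chunks in raw.items():
        # Split from the right so a region name may itself contain underscores.
        region, specimen, marker = header.rsplit("_", 2)
        records[header] = Record(region, specimen, marker, "".join(chunks))
    return records


# Toy input (hypothetical specimens, not thesis data).
toy = ">SASD_XX1_cnot1 ACGT AC-T\nGGCC\n>SASD_XX2_cnot1\nACGT\n"
for name, rec in parse_fasta(toy).items():
    print(name, rec.region, rec.specimen, rec.marker, len(rec.sequence))
```

Joining the segments without separators restores each aligned sequence as a single string, with gap characters kept in place.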
Cnot1 Intron 20
>SASD_CC2_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAAAACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>SASD_IN2_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>SASD_CF1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT ATCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTA CTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGC ATTTATTACGCTGTCT
>SASD_CT2_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC
TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>SASD_LC11_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT ATCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTA CTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGC ATTTATTACGCTGTCT
>SASD_IN3_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>SASD_CT1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT ATCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTA CTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGC ATTTATTACGCTGTCT
>SASD_T10_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTTCAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT ATCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGT
ACTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGG CATTTATTACGCTGTCT
>SASD_CC11_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>SASD_LC1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT ATCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTA CTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGC ATTTATTACGCTGTCT
>SASD_PC7_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>SASD_PC4_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT ATCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTA
CTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGC ATTTATTACGCTGTCT
>SASD_N2_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>SASD_CJ1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTA CTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGC ATTTATTACGCTGTCT
>SASD_LC19_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTA CTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGC ATTTATTACGCTGTCT
>SASD_T6_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC
TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>SASD_PC10_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>SASD_N1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>SASD_CJ12_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>SASD_CJ5_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC
TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>SASD_IN4_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>SASD_T2_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>SASD_ML1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>SASD_H2_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC
TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>SASD_H1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>SASD_G1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>SASD_F1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT ATCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGT ACTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGG CATTTATTACGCTGTCT
>SASD_E1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCATACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATTATTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCATCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC
TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>COR_S1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA CACTATCCACACTATTAATTGCAAGTGATGAAATAAGATTTGT-----CAAAACCAACCTATCTGC AAGGGTTGCTTTGGTAAAGATGTAGGCGTACCCACGTACGTGTATGTGATGAGGCTTTCAAA AAGGATGCAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTGCAG TGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATTCCATTTAAATATCG TATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGT--TGTGTCGTGTCAAGAGTGA TGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACCTGG AATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTACTCA CAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCATTT ATTACGCTGTCT
>COR_CR28_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATATGTCAAAACAAAACCAACCTATC TACAAGGGTTGCTTTGGTAAAGATGTAGGCGTACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGG-AAAATCTTTGTACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>COR_CR29_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATATGTCAAAACAAAACCAACCTATC TGCAAGGGTTACTTTGGTAAAGATGTAGGCGTACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT ATCGTATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGT ACTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGG CATTTATTACGCTGTCT
>COR_CR36_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATATGTCAAAACAAAACCAACCTATC TGCAAGGGTTACTTTGGTAAAGATGTAGGCGTACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC- AAAATCTTTGGACATACCATTTAAATATCGTATGAATACTGGCGTTGTCGTCACCAGTTTTGA TGTTATTGTTGTGTGTCGTGTCAAGAGTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAA CATGGCCGTAGTTATCAGTTTTTAACCTGGAATATTAAGT------GAACATTG
TTAGTTCCATCAGTGAACTAAAGGTACTCACAATTTGATGCATCACTTTCTCAGATTGTGAAT CGTCACGGCCCTGAGGCAGACCGGCATTTATTACGCTGTCT
>COR_CR25_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATATGTCAAAACAAAACCAACCTATC TGCAAGGGTTACTTTGGTAAAGATGTAGGCGTACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT ATCGTATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTA CTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGC ATTTATTACGCTGTCT
>COR_CR27_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATATGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAAT ATCGTATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGA GTGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAAC CTGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGT ACTCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGG CATTTATTACGCTGTCT
>COR_CR26_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATATGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTAC TCACAATTTGATGCATCACTTTCTCAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCA TTTATTACGCTGTCT
>COR_CR35_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTGTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGTAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GACTATCCACACTATTAATTGCAAGTGATGAAATAAGATATGTCAAAACAAAACCAACCTATC TGCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACCCACGTACGTGTATGTGATGAGGCTTTC AAAAAGGATGCAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCTATCGAGTTG CAGTGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACATACCATTTAAATA TCGTATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAG TGATGCAAAAAATGTGATTTTTGCTTGCAGTCTGAACATGGCCGTAGTTATCAGTTTTTAACC TGGAATATTAAGT------
GAACATTGTTAGTTCCATCAGTGAACTAAAGGTACTCACAATTTGATGCATCACTTTCTCAGA TTGTGAATCGTCACGGCCCTGAGGCAGACCGGCATTTATTACGCTGTCT
>CC_BZ1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGCAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GATTATCCACACTATTAATTACAAGTGATGCAATAAGATTTGTCAAAACAAAACCAACCTATCT GCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACC------CGCGTATGTGAGGAGGCTTTCAAA AAGAATGCAGGAAAATATCTGGATCATCTCGCTCTCCAAATAAGCGCCCCATCGAGTTGCAG TGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACTTACCATTTAAATATCA TATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAGTGA TGCAAAATTGTTTATTTTTGCTTGCAGTCTGAACATGGCCGT------AGTTTTTAACCTGGAATA TTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTACTCACAAT TTGATGCATCACTTTCTTAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCATTTATTA CGCTGTCT
>CC_D1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGCAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GATTATCCACACTATTAATTACAAGTGATGCAATAAGATTTGTCAAAACAAAACCAACCTATCT GCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACC------CGCGTATGTGAGGAGGCTTTCAAA AAGAATGCAGGAAAATATCTGGATCATCTCGCTCTCCAAATAAGCGCCCCATCGAGTTGCAG TGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACTTACCATTTAAATATCA TATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAGTGA TGCAAAATTGTTTATTTTTGCTTGCAGTCTGAACATGGCCGT------AGTTTTTAACCTGGAATA TTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTACTCACAAT TTGATGCATCACTTTCTTAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCATTTATTA CGCTGTCT
>CC_MZ1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGCAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GATTATCCACACTATTAATTACAAGTGATGCAATAAGATTTGTCAAAACAAAACCAACCTATCT GCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACC------CGCGTATGTGAGGAGGCTTTCAAA AAGAATGCAGGAAAATATCTGGATCATCTCGCTCTCCAAATAAGCGCCCCATCGAGTTGCAG TGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACTTACCATTTAAATATCA TATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAGTGA TGCAAAATTGTTTATTTTTGCTTGCAGTCTGAACATGGCCGT------AGTTTTTAACCTGGAATA TTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTACTCACAAT TTGATGCATCACTTTCTTAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCATTTATTA CGCTGTCT
>CC_SL3_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGCAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GATTATCCACACTATTAATTACAAGTGATGCAATAAGATTTGTCAAAACAAAACCAACCTATCT GCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACC------CGCGTATGTGAGGAGGCTTTCAAA AAGAATGCAGGAAAATATCTGGATCATCTCGCTCTCCAAATAAGCGCCCCATCGAGTTGCAG TGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACTTACCATTTAAATATCA TATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAGTGA TGCAAAATTGTTTATTTTTGCTTGCAGTCTGAACATGGCCGT------AGTTTTTAACCTGGAATA TTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTACTCACAA
TTTGATGCATCACTTTCTTAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCATTTATTA CGCTGTCT
>CC_SS1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGCAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GATTATCCACACTATTAATTACAAGTGATGCAATAAGATTTGTCAAAACAAAACCAACCTATCT GCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACC------CGCGTATGTGAGGAGGCTTTCAAA AAGAATGCAGGAAAATATCTGGATCATCTCGCTCTCCAAATAAGCGCCCCATCGAGTTGCAG TGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACTTACCATTTAAATATCA TATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAGTGA TGCAAAATTGTTTATTTTTGCTTGCAGTCTGAACATGGCCGT------AGTTTTTAACCTGGAATA TTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTACTCACAAT TTGATGCATCACTTTCTTAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCATTTATTA CGCTGTCT
>CC_SL2_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGCAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GATTATCCACACTATTAATTACAAGTGATGCAATAAGRTTTGTCAAAACAAAACCAACCTATCT GCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACC------CGCGTATGTGAGGAGGCTTTCAAA AAGAATGCAGGAAAATATCTGGATCATCTCGCTCTCCAAATAAGCGCCCCATCGAGTTGCAG TGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACTTACCATTTAAATATCA TATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAGTGA TGCAAAATTGTTTATTTTTGCTTGCAGTCTGAACATGGCCGT------AGTTTTTAACCTGGAATA TTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTACTCACAAT TTGATGCATCACTTTCTTAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCATTTATTA CGCTGTCT
>CC_ST1_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGCAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GATTATCCACACTATTAATTACAAGTGATGCAATAAGGTTTGTCAAAACAAAACCAACCTATCT GCAAGGGTTGCTTTGGTAAAGATGTAGGCGTACC------CGCGTATGTGAGGAGGCTTTCAAA AAGAATGCAGGAAAATATCTGGATCATCTCGCTCTCCAAATAAGCGCCCCATCGAGTTGCAG TGCAGAGCTCCTCCGTCGCACAGGAAAATGGC-AAAATCTTTGGACTTACCATTTAAATATCA TATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAGTGA TGCAAAATTGTTTATTTTTGCTTGCAGTCTGAACATGGCCGT------AGTTTTTAACCTGGAATA TTAAGT------GAACATTGTTAGTTCCATCAGTGAACTAAAGGTACTCACAAT TTGATGCATCACTTTCTTAGATTGTGAATCGTCACGGCCCTGAGGCAGACCGGCATTTATTA CGCTGTCT
>ORV_P2_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGCAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GATTATCCACACTATTAATTACAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACTTATCT GCAAGGGTTGCTTTGGTAAAGATGTAGGCGTAC------ACGCGTATGTGATGAGGCTTTCAAAA AGAATACAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCGATCGAGTTGCAGT GCAGAGCTCCTCCGTCGCACAGGAAAATGGCAAAAAATTTTGGACTTACCATTTAAATATCAT ATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAGTGAT GCAAAATTGTTTATTTTTGCTTCCAGTCTGAACATGGCCGT------AGTTTTTAACCTGGAATAT
TAAGTTCCAATTGTTATTAAGTTGTTATAATATTAAGTTAATTGTTAGTTCCATCAGTGAACTAA AGGTACTCACAATTTGATGCATCACTTTCTTAGATTGTGAATCGTCACGGCCCTGAGGCAGA CCGGCATTTATTACGCTGTCT
>ORV_P4_cnot1 TACAGAGCCAGCCAGCAAGAAATACAACATGTGAGTGCATTAGTGTTTTGGTGTCATAAATA CACTTATGCTATGTTCAGACTGCAGGCAAATTGGATTTTTTCCTCAAATCAGATCTACAGGCA GATTATCCACACTATTAATTACAAGTGATGAAATAAGATTTGTCAAAACAAAACCAACTTATCT GCAAGGGTTACTTTGGTAAAGATGTAGGCGTACC------CGCGTATGTGATGAGGCTTTCAAAA AGAATACAGGAAAATATTTGGATCATCTCGCTCTCCAAATAAGCGCCCGATCGAGTTGCAGT GCAGAGCTCCTCCGTCGCACAGGAAAATGGCAAAAAATTTTGGACTTACCATTTAAATATCAT ATGAATACTGGCGTTGTCGTCACCAGTTTTGATGTTATTGTTGTGTGTCGTGTCAAGAGTGAT GCAAAATTATTTATTTTTGCTTCCAGTCTGAACATGGCCGT------AGTTTTTAACCTGGAATAT TAAGTTCCAATTGTTATTAAGTTGTTATAATATTAAGTTAATTGTTAGTTCCATCAGTGAACTAA AGGTACTCACAATTTGATGCATCACTTTCTTAGATTGTGAATCGTCACGGCCCTGAGGCAGA CCGGCATTTATTACGCTGTCT
Hpg Intron 1
>COR_CR25_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATTTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGGA AGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATYATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>COR_CR35_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATTATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>COR_S1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTAAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>COR_CR26_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>COR_CR28_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>COR_CR29_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>COR_CR36_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_CC2_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_CC11_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_CJ1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_CJ5_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_CJ12_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_CT1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG
AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_CT2_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_F1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_G1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_H1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_H2_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_IN2_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_IN4_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_LC1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_LC11_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_LC19_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_ML1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_ML3_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_N1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_N2_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_PC4_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG
AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_PC7_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_PC10_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_T2_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_T6_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_T10_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>COR_CR27_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_CF1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_E1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>SASD_IN3_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACGCATGCAAGATCATTTCGG------AGAAAAATCTGTTCAGTGCACTTCAACAGTTTTCTCACCATTGTCACTTGTTTTTGCAGGACA AGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>CC_ST1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATKAAAACACATGCAAGATCATTTCGGAGA TTTCTAGAAAAATCTATTCAGTGCACTTCAATAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>CC_BZ1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGA TTTCTAGAAAAATCTATTCAGTGCACTTCAATAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>CC_D1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGA TTTCTAGAAAAATCTATTCAGTGCACTTCAATAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>CC_SL2_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGA TTTCTAGAAAAATCTATTCAGTGCACTTCAATAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>CC_SL3_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGA TTTCTAGAAAAATCTATTCAGTGCACTTCAATAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>CC_SS1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG
AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGA TTTCTAGAAAAATCTATTCAGTGCACTTCAATAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>CC_CY3_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATGAAAACACATGCAAGATCATTTCGGAGA TTTCTAGAAAAATCTATTCAGTGCACTTCAATAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>CC_MZ1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATGAAAACACATGCAAGATCATTTCGGAGA TTTCTAGAAAAATCTATTCAGTGCACTTCAATAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGG-TGAGAGCAGAA
>ORV_P3_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAAATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAAAAGGCTGTTGCTGTGAGGA AGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGAT TTCTAGAAAAATCTGTTCAGTGCACTTCAAGAGTTTTCTCACCATTGTCACTTGTTT-YGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>ORV_P4_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAAAAGGCTGTTGCTGTGAGGA AGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGAT TTCTAGAAAAATCTGTTCAGTGCACTTCAAGAGTTTTCTCACCATTGTCACTTGTTT-CGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>ORV_P5_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAGAAGGCTGTTGCTGTGAGG AAGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAGACACATGCAAGATCATTTCGGAGA TTTCTAGAAAAATCTGTTCAGTGCACTTCAAGAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>ORV_M2_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAAAAGGCTGTTGCTGTGAGGA AGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAGACACATGCAAGATCATTTCGGAGAT TTCTAGAAAAATCTGTTCAGTGCACTTCAAGAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>ORV_M3_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAAAAGGCTGTTGCTGTGAGGA AGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGAT TTCTAGAAAAATCTGTTCAGTGCACTTCAAGAGTTTTCTCACCATTGTCACTTGTTT-YGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>ORV_P1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAAAAGGCTGTTGCTGTGAGGA AGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGAT TTCTAGAAAAATCTGTTCAGTGCACTTCAAGAGTTTTCTCACCATTGTCACTTGTTTCTGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>ORV_M1_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAAAAGGCTGTTGCTGTGAGGA AGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGAT TTCTAGAAAAATCTGTTCAGTGCACTTCAAGAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
>ORV_P2_hpg GGTGTTGCTCAGGTGCGTTTTGTCACTGGCAACAAGATCCTTAGGATCCTGAAGTCCAAAGG CCTGGCCCCTGATCTGCCTGAGGATCTCTACCACCTTATCAAAAAGGCTGTTGCTGTGAGGA AGCACTTGGAAAGGAACAGAAAGGTAATGAAATTAAAACACATGCAAGATCATTTCGGAGAT TTCTAGAAAAATCTGTTCAGTGCACTTCAAGAGTTTTCTCACCATTGTCACTTGTTT-TGCAG GACAAGGATGCTAAGTTCCGCCTGATTCTGGTTGAGAGCAGAA
S7rp Intron 1
>COR_CR27_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACTTA TATTTGTGATAGAATTAACTTGTAATAACTTGTAATGTAAATGTAGTGATTCTGTGCTAGCTAA TCAATGCTAACGGCATGCTAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTA GCCGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAATTTTAAATTAGA-GATTAATA ATTCGATT-AAATGCCGAGACATTTTGGTTAATGTAKTAAATTATGGTTTCTTATGAATAGCAT G
>COR_S1_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTAATGTAAATGTAGTGATTCTGTGCTAGCTAATCAAT GCTAACGGCATGCTAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGCCG CCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAATTTTAAATTAGA-GATTAATAATTCG ATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_IN2_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTATGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAAT TCGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_IN4_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTATGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAAT TCGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_CF1_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AKGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAAT TCGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_CC2_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTMTGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_CC11_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_PC7_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_T6_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC
AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_T2_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_T10_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_CJ1_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_CJ5_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_CJ12_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA
ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_LC1_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_LC11_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_LC19_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGT ATATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_CT2_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTWATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_E1_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_F1_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_G1_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_N1_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTWATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_N2_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTTATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTAT ATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_H1_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>SASD_H2_S7 AAGGGATCTCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCTAGTATGTACGAAAATGGATGGCTATTTTAGAAAC ATGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCCA TGTGGTCTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACGTA TATTTGTGATAGAATTA------AACTTGTAGTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCAAAGAACACACTTGGAGTTAGTAGAAAATGTTTTTCTTAACGGTAGC CGCCTAGCCGGTGAATTACTTGGAACAGCCGTAGTTAAAAATTTAAATTAGA-GATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATGTATTAAATTATGGTTTCTTATGAATAGCATG
>CC_CY3_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTATTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTACTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATAATTAAGAATT CGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGTTTCTTATGAATAGCATG
>CC_D1_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTATTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTACTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATAATTAAGAATT CGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGTTTCTTATGAATAGCATG
>CC_MZ1_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTATTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTACTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATCTAATTAAKAATT CGATT-AAAWGACGAGACATTTAGGTTAATTTATTAAATTATGGTTTCTTATGAATAGCATG
>CC_SS1_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTATTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTACTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATAATTAAGAATT CGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGTTTCTTATGAATAGCATG
>CC_BZ1_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC
AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTATTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTACTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATAATTAAGAATT CGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGTTTCTTATGAATAGCATG
>CC_SL2_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTATTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTACTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATAATTAAGAATT CGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGTTTCTTATGAATAGCATG
>CC_SL3_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTATTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTACTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATAATTAAGAATT CGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGTTTCTTATGAATAGCATG
>CC_ST1_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTATTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTATTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTACTGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATAATTAAGAATT CGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGTTTCTTATGAATAGCATG
>ORV_P1_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCATTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTGTTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTAATGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCTGTAGTTAAAA-TTTGATTAATATGATTAATAATT CGATT-AAAWGCCGAGACATTTTGGTTAATTTATTAAATTATGGT---TTATGAATAGCATG
>ORV_P2_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTGTTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTAATGTAAATGTAGTGATTCTGTGCTAGCTAATCA
ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATAATTAATAATT AGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGT---TTATGAATAGCATG
>ORV_P3_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTGTTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTAATGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATAATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGT---TTATGAATAGCATG
>ORV_P4_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCRTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTGTTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTAATGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATRATTAATAATT CGATTAAAAWGCCGAGACATTTTGGTTAATTTATTAAATTATGGT---TTATGAATAGCATG
>ORV_M1_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCGTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGGTAGCAGACGTGTCTTTACTACAGTGTTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTAATGTAAATGTAGTGATTCTGTGCTAGCTAATCAAT GCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCCG CCTAGCCGGTGACTTACTTAGAACAGCTGTAGTTAAAA-TTTGATTAATATAATTAATAATT CGATT-AAATGCCGAGACATTTTGGTTAATTTATTAAATTATGGT---TTATGAATAGCATG
>ORV_M2_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCRTTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTGTTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTAATGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATGATTAATAATT CGATT-AAAAGCCGAGACATTTTGGTTAATTTATTAAATTATGGT---TTATGAATAGCATG
>ORV_M3_S7 AAGGGATATCAAGTTAAAATGTAGAAAATAAGTAATATTTACCTCCACGCATGAGCTTCATAT GTTGTTTTCACTGAGATGCATTAGATGCGAGTATGTACGAAAATGGATGGCTATTTTAGAAAC AGGAATTTATAAATGTAAGATGTGCTAGCAGACGTGTCTGTACTACAGTGTTACTGCGGCCC ATGTGGTGTTCTAATATGCCCGAAAATGCCTCTATTAAGTAAAGTACTATATATTGAAGACTTA TATTTGTGATAGAATTA------AACTTGTAATGTAAATGTAGTGATTCTGTGCTAGCTAATCA ATGCTAACGGCATGCTAAGAACACACTTGGAGTTACTAGAAAATGTTTTTCTTAACGGTAGCC GCCTAGCCGGTGACTTACTTAGAACAGCCGTAGTTAAAA-TTTGATTAATATGATTAATAATT CGATT-AAAAGCCGAGACATTTTGGTTAATTTATTAAATTATGGT---TTATGAATAGCATG
REFERENCES
Anderson, R. C., Fralish, J. S., & Baskin, J. M. (2007). Savannas, Barrens, and
Rock Outcrop Plant Communities of North America. Cambridge University
Press.
Avise, J. C. (2009). Phylogeography: Retrospect and prospect. Journal of
Biogeography, 36(1), 3–15.
Avise, J. C., & Ferguson, M. M. (1995). Molecular markers, natural history and
evolution. Systematic Biology, 44(1), 117–119.
Avise, J. C., Giblin-Davidson, C., Laerm, J., Patton, J. C., & Lansman, R. A.
(1979). Mitochondrial DNA clones and matriarchal phylogeny within and
among geographic populations of the pocket gopher, Geomys pinetis.
Proceedings of the National Academy of Sciences, 76(12), 6694–6698.
Avise, J. C. (2000). Phylogeography: The
History and Formation of Species. Harvard University Press.
Baker, C. S., Perry, A., Bannister, J. L., Weinrich, M. T., Abernethy, R. B.,
Calambokidis, J., … Vasquez, O. (1993). Abundant mitochondrial DNA
variation and world-wide population structure in humpback whales.
Proceedings of the National Academy of Sciences, 90(17), 8239–8243.
Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances.
1763. M.D. Computing: Computers in Medical Practice, 8(3), 157–171.
Benson, T. A. (2006). Population Genetics and Phylogeography of the Pygmy
Nuthatch in Southern California (M.S. Thesis). California State University,
San Bernardino.
Berget, S. M., Moore, C., & Sharp, P. A. (1977). Spliced segments at the 5′
terminus of adenovirus 2 late mRNA. Proceedings of the National
Academy of Sciences, 74(8), 3171–3175.
Bernatchez, L., & Wilson, C. C. (1998). Comparative Phylogeography of Nearctic
and Palearctic fishes. Molecular Ecology, 7(4), 431–452.
Birky, C. W., Maruyama, T., & Fuerst, P. (1983). An approach to population and
evolutionary genetic theory for genes in mitochondria and chloroplasts,
and some results. Genetics, 103(3), 513–527.
Blackwelder, E. (1954). Pleistocene lakes and drainage in the Mojave region,
southern California. Geology of Southern California: California Division of
Mines Bulletin, (170), 35–40.
Blyton, M. D. J., & Flanagan, N. S. (2012). A Comprehensive Guide to: GenAlEx
6.5. Retrieved from Australian National University website:
http://biology.anu.edu.au/GenAlEx
Bogenhagen, D., & Clayton, D. A. (1974). The number of mitochondrial
deoxyribonucleic acid genomes in mouse L and human HeLa cells.
Quantitative isolation of mitochondrial deoxyribonucleic acid. Journal of
Biological Chemistry, 249(24), 7991–7995.
Brooks, M. (1974). Blazing Saddles [Comedy Western]. Warner Bros.
Brown, W. M., George, M., & Wilson, A. C. (1979). Rapid evolution of animal
mitochondrial DNA. Proceedings of the National Academy of Sciences,
76(4), 1967–1971.
Bruno, M. C., Casciotta, J. R., Almirón, A. E., Riccillo, F. L., & Lizarralde, M. S.
(2015). Quaternary refugia and secondary contact in the southern
boundary of the Brazilian subregion: Comparative phylogeography of
freshwater fish.
Bufalino, A. P., & Mayden, R. L. (2010a). Molecular phylogenetics of North
American phoxinins (Actinopterygii: Cypriniformes: Leuciscidae) based on
RAG1 and S7 nuclear DNA sequence data. Molecular Phylogenetics and
Evolution, 55(1), 274–283.
Bufalino, A. P., & Mayden, R. L. (2010b). Phylogenetic relationships of North
American phoxinins (Actinopterygii: Cypriniformes: Leuciscidae) as
inferred from S7 nuclear DNA sequences. Molecular Phylogenetics and
Evolution, 55(1), 143–152.
Burbrink, F. T., Yao, H., Ingrasci, M., Bryson, R. W., Guiher, T. J., & Ruane, S.
(2011). Speciation at the Mogollon Rim in the Arizona Mountain
Kingsnake (Lampropeltis pyromelana). Molecular Phylogenetics and
Evolution, 60(3), 445–454. https://doi.org/10.1016/j.ympev.2011.05.009
California Department of Fish and Wildlife. (2015). Natural Diversity Database,
Special Animals List. California Department of Fish and Wildlife.
Calsbeek, R., Thompson, J. N., & Richardson, J. E. (2003). Patterns of molecular
evolution and diversification in a biodiversity hotspot: The California
Floristic Province. Molecular Ecology, 12(4), 1021–1029.
Campbell, N. A., & Reece, J. B. (2005). Biology (7th ed.). Pearson Education.
Chatzimanolis, S., & Caterino, M. S. (2007). Toward a better understanding of
the “Transverse Range Break”: Lineage diversification in southern
California. Evolution: International Journal of Organic Evolution, 61(9),
2127–2141.
Chen, W.-J., Miya, M., Saitoh, K., & Mayden, R. L. (2008). Phylogenetic utility of
two existing and four novel nuclear gene loci in reconstructing Tree of Life
of ray-finned fishes: The order Cypriniformes (Ostariophysi) as a case
study. Gene, 423(2), 125–134.
Chenoweth, K., & Menzel, I. (2003). "For Good" [Musical Theatre]. Decca
Broadway.
Chow, S., & Hazama, K. (1998). Universal PCR primers for S7 ribosomal protein
gene introns in fish. Molecular Ecology, 7(9), 1255–1256.
Cody, M. L. (1986). Diversity, rarity, and conservation in mediterranean-climate
regions. Retrieved from
http://agris.fao.org/agris-search/search.do?recordID=US880692188
Coyne, J. A., & Orr, H. A. (2004). Speciation. Sinauer Associates.
Darwin, C. (1859). On the origin of species by means of natural selection.
John Murray.
Enzel, Y., Wells, S. G., & Lancaster, N. (2003). Paleoenvironments and
Paleohydrology of the Mojave and Southern Great Basin Deserts.
Geological Society of America.
Excoffier, L., Smouse, P. E., & Quattro, J. M. (1992). Analysis of Molecular
Variance Inferred from Metric Distances among DNA Haplotypes:
Application to Human Mitochondrial DNA Restriction Data. Genetics,
131(2), 479–491.
Falah, M., & Gupta, R. S. (1994). Cloning of the hsp70 (dnaK) genes from
Rhizobium meliloti and Pseudomonas cepacia: Phylogenetic analyses of
mitochondrial origin based on a highly conserved protein sequence.
Journal of Bacteriology, 176(24), 7748–7753.
Franklin, R. E., & Gosling, R. G. (1953). Molecular configuration in sodium
thymonucleate. Nature, 171(4356), 740.
Gold, A. (1978). "Thank You for Being a Friend" [Song]. Asylum Records.
Grossman, G. D., Hill, J., & Petty, J. T. (1995). Observations on habitat structure,
population regulation, and habitat use with respect to evolutionarily
significant units: A landscape perspective for lotic systems. American
Fisheries Society Symposium, 17, 381–391.
Grossman, L. I., Watson, R., & Vinograd, J. (1973). The presence of
ribonucleotides in mature closed-circular mitochondrial DNA. Proceedings
of the National Academy of Sciences, 70(12), 3339–3343.
Hastings, A., & Harrison, S. (1994). Metapopulation Dynamics and Genetics.
Annual Review of Ecology and Systematics, 25, 167–188.
He, S., Mayden, R. L., Wang, X., Wang, W., Tang, K. L., Chen, W.-J., & Chen, Y.
(2008a). Molecular phylogenetics of the family Cyprinidae (Actinopterygii:
Cypriniformes) as evidenced by sequence variation in the first intron of S7
ribosomal protein-coding gene: Further evidence from a nuclear gene of
the systematic chaos in the family. Molecular Phylogenetics and Evolution,
46(3), 818–829. https://doi.org/10.1016/j.ympev.2007.06.001
He, S., Mayden, R. L., Wang, X., Wang, W., Tang, K. L., Chen, W.-J., & Chen, Y.
(2008b). Molecular phylogenetics of the family Cyprinidae (Actinopterygii:
Cypriniformes) as evidenced by sequence variation in the first intron of S7
ribosomal protein-coding gene: Further evidence from a nuclear gene of
the systematic chaos in the family. Molecular Phylogenetics and Evolution,
46(3), 818–829.
Hoekzema, K., & Sidlauskas, B. L. (2014). Molecular phylogenetics and
microsatellite analysis reveal cryptic species of speckled dace
(Cyprinidae: Rhinichthys osculus) in Oregon’s Great Basin. Molecular
Phylogenetics and Evolution, 77, 238–250.
https://doi.org/10.1016/j.ympev.2014.04.027
Hollingsworth, P. R., Jr., & Hulsey, C. D. (2011). Reconciling gene trees of
eastern North American minnows. Molecular Phylogenetics and Evolution,
61(1), 149–156.
Houston, D. D., Shiozawa, D. K., & Riddle, B. R. (2010). Phylogenetic
relationships of the western North American cyprinid genus
Richardsonius, with an overview of phylogeographic structure. Molecular
Phylogenetics and Evolution, 55, 259–273.
https://doi.org/10.1016/j.ympev.2009.10.017
Hubbs, C. L., & Miller, R. R. (1948). The zoological evidence: Correlation
between fish distribution and hydrographic history in the desert basins of
western United States. University of Utah.
Huelsenbeck, J. P., & Ronquist, F. (2001). MRBAYES: Bayesian inference of
phylogenetic trees. Bioinformatics, 17(8), 754–755.
Illumina. (2015). An Introduction to Next-Generation Sequencing Technology.
Retrieved April 17, 2019, from
https://www.bing.com/search?q=An%20Introduction%20to%20Next-Generation%20Sequencing%20Technology.%20Retrieved%20from%20Illumina&qs=n&form=QBRE&sp=-1&pq=google%20scholar&sc=8-14&sk=&cvid=BE1119908D6A4135B1519F850BEB4BB9
Kim, D., & Conway, K. W. (2014). Phylogeography of Rhinichthys cataractae
(Teleostei: Cyprinidae): pre-glacial colonization across the Continental
Divide and Pleistocene diversification within the Rio Grande drainage.
Biological Journal of the Linnean Society, 111(2), 317–333.
https://doi.org/10.1111/bij.12209
Kinniburgh, A. J., Mertz, J. E., & Ross, J. (1978). The precursor of mouse β-
globin messenger RNA contains two intervening RNA sequences. Cell,
14(3), 681–693.
Kizirian, D., & Donnelly, M. A. (2004). The criterion of reciprocal monophyly and
classification of nested diversity at the species level. Molecular
Phylogenetics and Evolution, 32(3), 1072–1076.
Kumar, S., Stecher, G., Li, M., Knyaz, C., & Tamura, K. (2018). MEGA X:
Molecular Evolutionary Genetics Analysis across Computing Platforms.
Molecular Biology and Evolution, 35(6), 1547–1549.
https://doi.org/10.1093/molbev/msy096
Lee, D. S., Gilbert, C. R., Hocutt, C. H., Jenkins, R. E., McAllister, D. E., &
Stauffer, J. R., Jr. (1980). Atlas of North American freshwater fishes. North
Carolina State Museum of Natural History.
Li, C., Riethoven, J.-J. M., & Ma, L. (2010). Exon-primed intron-crossing (EPIC)
markers for non-model teleost fishes. BMC Evolutionary Biology, 10(1),
90.
Lunt, D. H., & Hyman, B. C. (1997). Animal mitochondrial DNA recombination.
Nature, 387(6630), 247.
Mantel, N. (1967). The detection of disease clustering and a generalized
regression approach. Cancer Research, 27(2 Part 1), 209–220.
Mayden, R. L., & Allen, J. (2015). Phylogeography of Pteronotropis signipinnis,
P. euryzonus, and the P. hypselopterus Complex (Teleostei:
Cypriniformes), with Comments on Diversity and History of the Gulf and
Atlantic Coastal Streams. BioMed Research International, 2015.
Mendel, G. (1865). Experiments in plant hybridization. Verhandlungen Des
Naturforschenden Vereins Brünn.
Mills, R. (1993, September 14). Win Big [Television]. In Animaniacs. FOX
Network.
Mooi, R. (2009). Evolution, Second Edition. Douglas J. Futuyma. Integrative and
Comparative Biology, 49, 722–723. https://doi.org/10.1093/icb/icp095
Morcillo, F., Ornelas-García, C. P., Alcaraz, L., Matamoros, W. A., & Doadrio, I.
(2016). Phylogenetic relationships and evolutionary history of the
Mesoamerican endemic freshwater fish family Profundulidae
(Cyprinodontiformes: Actinopterygii). Molecular Phylogenetics and
Evolution, 94, 242–251. https://doi.org/10.1016/j.ympev.2015.09.002
Moritz, C. (1994). Defining ‘evolutionarily significant units’ for conservation.
Trends in Ecology & Evolution, 9(10), 373–375.
Moyer, G. R., Remington, R. K., & Turner, T. F. (2009). Incongruent gene trees,
complex evolutionary processes, and the phylogeny of a group of North
American minnows (Hybognathus Agassiz 1855). Molecular Phylogenetics
and Evolution, 50(3), 514–525.
Moyle, P. B. (2002). Inland fishes of California: Revised and expanded. University of
California Press.
Moyle, P. B., Williams, J. E., & Wikramanayake, E. D. (1989). Fish species of
special concern of California. Sacramento: California Department of Fish
and Game.
Mullis, K. B. (1987). Process for amplifying nucleic acid sequences.
Mussmann, S. M. (2018). Diversification Across a Dynamic Landscape:
Phylogeography and Riverscape Genetics of Speckled Dace (Rhinichthys
osculus) in Western North America (PhD Thesis).
Myers, E. A., Rodríguez-Robles, J. A., DeNardo, D. F., Staub, R. E., Stropoli, A.,
Ruane, S., & Burbrink, F. T. (2013). Multilocus phylogeographic
assessment of the California Mountain Kingsnake (Lampropeltis zonata)
suggests alternative patterns of diversification for the California Floristic
Province. Molecular Ecology, 22(21), 5418–5429.
https://doi.org/10.1111/mec.12478
Myers, N., Mittermeier, R. A., Mittermeier, C. G., Da Fonseca, G. A., & Kent, J.
(2000). Biodiversity hotspots for conservation priorities. Nature,
403(6772), 853.
Naora, H., & Deacon, N. J. (1982). Relationship between the total size of exons
and introns in protein-coding genes of higher eukaryotes. Proceedings of
the National Academy of Sciences, 79(20), 6196–6200.
National Marine Fisheries Service. (1991). Policy on Applying the Definition of
Species Under the Endangered Species Act to Pacific Salmon. Federal
Register, 56(224), 58612–58618.
Nei, M., & Kumar, S. (2000). Molecular Evolution and Phylogenetics. Oxford
University Press.
Nerkowski, S. A. (2015). Microsatellite Analysis of Population Structure in the
Santa Ana Speckled Dace (Rhinichthys osculus) (M.S. Thesis).
Oakey, D. D., Douglas, M. E., & Douglas, M. R. (2004). Small fish in a large
landscape: Diversification of Rhinichthys osculus (Cyprinidae) in western
North America. Copeia, 2004(2), 207–221.
Palumbi, S. R., & Baker, C. S. (1994). Contrasting population structure from
nuclear intron sequences and mtDNA of humpback whales. Molecular
Biology and Evolution, 11(3), 426–435.
Peakall, R., & Smouse, P. E. (2006). GENALEX 6: Genetic analysis in Excel.
Population genetic software for teaching and research. Molecular Ecology
Notes, 6(1), 288–295.
Peakall, R., & Smouse, P. E. (2012). GENALEX 6.5: Genetic analysis in Excel.
Population genetic software for teaching and research-an update.
Bioinformatics, 28(19), 2537–2539.
Phillipsen, I. C., & Metcalf, A. E. (2009). Phylogeography of a stream-dwelling
frog (Pseudacris cadaverina) in southern California. Molecular
Phylogenetics and Evolution, 53(1), 152–170.
Rabinowitz, M., & Swift, H. (1970). Mitochondrial nucleic acids and their relation
to the biogenesis of mitochondria. Physiological Reviews, 50(3), 376–427.
Raymond, S. (1962). A convenient apparatus for vertical gel electrophoresis.
Clinical Chemistry, 8(5), 455–470.
Rodríguez-Robles, J. A., DeNardo, D. F., & Staub, R. E. (1999). Phylogeography
of the California Mountain Kingsnake, Lampropeltis zonata (Colubridae).
Molecular Ecology, 8(11), 1923–1934.
Sanger, F., Nicklen, S., & Coulson, A. R. (1977). DNA sequencing with chain-
terminating inhibitors. Proceedings of the National Academy of Sciences,
74(12), 5463–5467.
Santa Ana Watershed Project Authority. (2004). Old, Grand Prix, and Padua
Fires (October, 2003) Burn Impacts to Water Systems and Resources.
U.S. National Forest Service.
Schaffer, J. P. (1993). California’s geological history and changing landscapes.
The Jepson Manual: Higher Plants of California, 49–54.
Schoenherr, A. A. (2017). A Natural History of California: Second Edition. University of
California Press.
Schultz, S. S., & Wallace, R. E. (2013). The San Andreas Fault. USGS.
Slade, R. W., Moritz, C., & Heideman, A. (1994). Multiple nuclear-gene
phylogenies: Application to pinnipeds and comparison with a mitochondrial
DNA gene phylogeny. Molecular Biology and Evolution, 11(3), 341–356.
https://doi.org/10.1093/oxfordjournals.molbev.a040117
Smith, G. R., & Dowling, T. E. (2008). Correlating hydrographic events and
divergence times of speckled dace (Rhinichthys: Teleostei: Cyprinidae) in
the Colorado River drainage. Geological Society of America Special Papers,
439, 301.
Smith, G. R., Morgan, N., & Gustafson, E. (2000). Fishes of the Mio-Pliocene
Ringold Formation, Washington: Pliocene capture of the Snake River by the
Columbia River.
Smithies, O. (1955). Zone electrophoresis in starch gels: Group variations in the
serum proteins of normal human adults. Biochemical Journal, 61(4), 629.
Smithies, O. (1959a). An improved procedure for starch-gel electrophoresis:
Further variations in the serum proteins of normal individuals. Biochemical
Journal, 71(3), 585.
Smithies, O. (1959b). Zone electrophoresis in starch gels and its application to
studies of serum proteins. In Advances in protein chemistry (Vol. 14, pp.
65–113). Elsevier.
Spellman, G. M., Riddle, B., & Klicka, J. (2007). Phylogeography of the mountain
chickadee (Poecile gambeli): Diversification, introgression, and expansion
in response to Quaternary climate change. Molecular Ecology, 16(5),
1055–1068. https://doi.org/10.1111/j.1365-294X.2007.03199.x
Stepien, C. A., Snyder, M. R., & Elz, A. E. (2019). Invasion genetics of the silver
carp Hypophthalmichthys molitrix across North America: Differentiation of
fronts, introgression, and eDNA metabarcode detection. PLOS ONE,
14(3), e0203012. https://doi.org/10.1371/journal.pone.0203012
Swift, C. C., Haglund, T. R., Ruiz, M., & Fisher, R. N. (1993). The status and
distribution of the freshwater fishes of southern California. Bulletin of the
Southern California Academy of Sciences, 92(3), 101–167.
Taylor, E. B., McPhail, J. D., & Ruskey, J. A. (2015). Phylogeography of the
longnose dace (Rhinichthys cataractae) species group in northwestern
North America—The Nooksack dace problem. Canadian Journal of
Zoology, 93(10), 727–734. https://doi.org/10.1139/cjz-2015-0014
Torgerson, W. S. (1958). Theory and methods of scaling.
U.S. Congress. (1973). Endangered Species Act.
U.S. Congress. (1978). Endangered Species Act Amendments, § 3.
US EPA. (2015, March 17). Surf Your Watershed [Overviews and Factsheets].
Retrieved April 17, 2019, from US EPA website:
https://www.epa.gov/waterdata/surf-your-watershed
U.S. Fish and Wildlife Service. (1996). Interagency Policy Regarding the
Recognition of Distinct Vertebrate Population Segments Under the ESA.
Federal Register, 61, 4722.
Vandergast, A. G., Bohonak, A. J., Weissman, D. B., & Fisher, R. N. (2006).
Understanding the genetic effects of recent habitat fragmentation in the
context of evolutionary history: Phylogeography and landscape genetics of
a southern California endemic Jerusalem cricket (Orthoptera:
Stenopelmatidae: Stenopelmatus). Molecular Ecology, 16(5), 977–992.
https://doi.org/10.1111/j.1365-294X.2006.03216.x
VanMeter, J. J. (2017). The Santa Ana Speckled Dace (Rhinichthys osculus):
Phylogeography and Molecular Evolution of the Mitochondrial DNA
Control Region (M.S. Thesis). California State University, San Bernardino.
VanMeter, P. M. (2017). Molecular Evolution and Phylogeography of
Mitochondrial DNA Cytochrome B Gene in Southern California Santa Ana
Speckled Dace (Rhinichthys osculus) (M.S. Thesis). California State
University, San Bernardino.
Waples, R. S. (1991). Pacific salmon, Oncorhynchus spp., and the definition of
"species" under the Endangered Species Act. Marine Fisheries Review,
53(3), 11–22.
Williamson, R. S., Lieutenant. (1853). Report of Explorations in California for
Railroad Routes to Connect With the Routes Near the 35th and 32D
Parallels of North Latitude. Corps of Topographical Engineers.
Wilson, A. C., Cann, R. L., Carr, S. M., George, M., Gyllensten, U. B., Helm-
Bychowski, K. M., … Stoneking, M. (1985). Mitochondrial DNA and two
perspectives on evolutionary genetics. Biological Journal of the Linnean
Society, 26(4), 375–400. https://doi.org/10.1111/j.1095-
8312.1985.tb02048.x
Wilson, E. O. (n.d.). E.O. Wilson Biodiversity Foundation. Retrieved July 18,
2019, from https://eowilsonfoundation.org/portfolio/e-o-wilson/
Wright, S. (1938). Size of population and breeding structure in relation to
evolution. Science, 87, 430–431.