UNIVERSITY OF CALIFORNIA, MERCED

Diversification, speciation, and phylogeography of freshwater sculpins (, Cottopsis) in California

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy

in

Quantitative and Systems Biology

by

Jason David Baumsteiger

Chair: Professor Michael Dawson Professor Andres Aguilar Professor Andrew Kinziger Professor Peter Moyle

2013

Copyright Jason David Baumsteiger, 2013 All rights reserved

The Dissertation of Jason David Baumsteiger is approved, and is acceptable in quality and form for publication on microfilm and electronically:

Advisor: Professor Andres Aguilar

Professor Andrew P. Kinziger

Professor Peter B. Moyle

Chair: Professor Michael N. Dawson

Date

University of California, Merced

2013

iii

DEDICATION

I would like to dedicate this dissertation to my parents, who taught me the strength and fortitude necessary to complete a task as arduous as this dissertation. From their perseverance came my success.

iv

TABLE OF CONTENTS

Signature Page ...... iii

Dedication ...... iv

Table of Contents ...... v

List of Abbreviations ...... vii

List of Tables ...... viii

List of Figures ...... x

Acknowledgements ...... xii

Vita...... xiv

Abstract ...... xvii

Chapter 1: Introduction ...... 1

1.1. Freshwater Sculpin ...... 1 1.2. Speciation and Hybridization ...... 2 1.3. Cryptic and Incipient Species ...... 2 1.4. Phylogeography ...... 2 1.5. Molecular Marker Development ...... 3 1.6. Objective ...... 4

Chapter 2: Life history and biogeographic diversification of an endemic Western North American freshwater clade using a comparative species tree approach...... 5

2.1. Abstract ...... 5 2.2. Introduction ...... 5 2.3. Materials and Methods ...... 7 2.4. Results ...... 11 2.5. Discussion ...... 14 2.6. Tables and Figures ...... 18 2.7. Supplemental Material ...... 26

Chapter 3: Complex phylogeography and historical hybridization in two closely related sister taxa of freshwater sculpin ...... 42

3.1. Abstract ...... 42 3.2. Introduction ...... 42 3.3. Materials and Methods ...... 44 3.4. Results ...... 47 3.5. Discussion ...... 50 3.6. Tables and Figures ...... 56 3.7. Supplemental Material ...... 64

v

Chapter 4: Crypsis and potential parapatric incipient speciation in a species of freshwater sculpin, Cottus asper ...... 74

4.1. Abstract ...... 74 4.2. Introduction ...... 74 4.3. Materials and Methods ...... 76 4.4. Results ...... 78 4.5. Discussion ...... 81 4.6. Tables and Figures ...... 86 4.7. Supplemental Material ...... 98

Chapter 5: Synthesis and Future Directions ...... 103

References…………………………………………………………………………………. 107

vi

LIST OF ABBREVIATIONS

ABI Applied Biosystems Incorporated AIC Akaike Information Criterion AICC Akaike Information Criterion, corrected AMOVA Analysis of Molecular Variance BAY San Jose Region BSA Bovine Serum Albumin CA California CO1 Cytochrome Oxidase 1 CR Control Region or D-loop CYTB Cytochrome B DNA Deoxyribonucleic acid ESS Effective Sample Size G Gamma Distribution HKY Hasegawa, Kishino, and Yano Model HPD Highest Posterior Density HSU Humboldt State University I Invariable sites IBD Isolation by Distance ILS Incomplete Lineage Sorting K Clusters LLR Log Likelihood Ratio MCMC Markov Chain Monte Carlo MDC Minimize Deep Coalescence ML Maximum Likelihood mtDNA Mitochondrial DNA MYA Million Years Ago NCBI National Center for Biotechnology Information nDNA Nuclear DNA NMT Nuclear and Mitochondrial Sequences NNI Nearest Neighbor Interchange NUC Nuclear sequence only OR Oregon PCR Polymerase Chain Reaction PSRF Potential Scale Reduction Factors RM River Mile SNP Single Nucleotide Polymorphism SPR Subtree Pruning and Regrafting TrN Tamura-Nei Model UCSC University of California, Santa Cruz UCLA University of California, Los Angeles µSAT Microsatellite WA Washington

vii

LIST OF TABLES

Table 2.1: Samples collected for molecular systematic analysis, showing identification number, number of samples and locations ...... 18

Table 2.2: Maker information for each nuclear (Nuc) or mitochondrial (mt) sequence used to generate individual gene trees or concatenated tree (Nmt) ...... 19

Table 3.1: Sampling locations and numbers of individuals covering known ranges of each species……………………………………………………………………………………… 56

Table 3.2: Descriptive statistics, by location, for three nuclear (508,517,520), one mitochondrial (CB), and six microsatellites (µsats) loci. Number of individuals (N), substitutions (S), nucleotide diversity (π), number of haplotypes (H), haplotype diversity (Hd), allele number corrected for rarefaction (AR), expected heterozygosity (HE), observed heterozygosity (HO), and inbreeding coefficient (FIS). *outside HWE (p < 0.01) ...... 57

Table 4.1: List of sampling locations, by name, and numbers sampled. Microsatellite number (usat #) refers to numbering used for locations in all microsatellite analyses……. 86

Table 4.2: Overall table of individuals used for either mitochondrial (cytb) or microsatellites analyses, showing general measures of diversity for each marker type. Number of individuals (N), substitutions (S), nucleotide diversity (π), number of haplotypes (H), haplotype diversity

(Hd), allele number corrected for rarefaction (AR), expected heterozygosity (HE), observed

heterozygosity (HO), and inbreeding coefficient (FIS). * outside HWE (p < 0.01)…………. 87

Table 4.3A: Population diversity measures for unique haplotypes of cytb sequence data. Intermediate locations (Napa and Suisun Bay) included with Inland group..……………… 88

Table 4.3B: Measure of DNA divergence between groups using the average number nucleotide substitutions per site for 962 base pairs of mitochondrial cytochrome b in DNASP. Intermediate locations (Napa and Suisun Bay) are included with inland grouping…………………………………………………………….……………………… 88

Table 4.4A: Average pairwise FST estimates between coastal, , intermediate, and inland groups…...... 89

Table 4.4B: Summary statistics from 11 microsatellites are shown, separated by region (rarefaction adjusted values according to lowest sample size, Clear Lake - 27). N = number

of individuals, A = number of alleles, AR = adjustment for rarefaction, pAR = proportion of

private alleles, HE = expected heterozygosity, HO = observed heterozygosity, FIS = inbreeding coefficient. * indicates values out of HWE (p < 0.01)……………………………………. 89

viii

Table 4.5: Results from two separate analyses concerning hybridization and comparative assignments of coastal individuals (A or B) by marker type. Accuracy reflects individual

agreement between haplotype and genotype, not just overall numbers. F1 = First generation cross, F2 = F1 x F1, BCC = backcross of F1 x coastal parent, and BCI = backcross of F1 x inland parent……………………………………………………………………………….. 90

ix

LIST OF FIGURES

Figure 2.1: Contemporary distributions of 10 species of freshwater sculpin within the Cottopsis clade……………………...... 20

Figure 2.2: Species delimitation using 83 SNP’s in STRUCTURE (Pritchard et al. 2000); with two individuals per population. Three initial clusters are shown followed by additional structuring within each initial cluster……………………………………………………… 21

Figure 2.3: Concatenated mitochondrial only (A), nuclear only (B) and nuclear/mitochondrial combined (C) phylogenetic trees for sculpin species within the Cottopsis clade. Unmarked nodes represent > 0.9 & > 75% Bayesian and maximum likelihood support, respectively, whereas marked nodes are as shown..……………………………………………………... 22

Figure 2.4: MetaTree showing topological differences between species tree methods (numbers on trees match nodes on MetaTree). All species trees were generated using nuclear and mitochondrial (Nmt) sequence gene trees. Species trees for internal nodes 4 and 9 also shown…………………………………………………………………………. 23

Figure 2.5: 50% majority rule consensus trees of all species trees generated for the nuclear/mitochondrial (Nmt) dataset. Five distinct clusters/clades are identified in the consensus tree. The outgroup C. bairdi is not shown. Numerical values on nodes are for reference to Supplemental Table 2B. Green indicates amphidromous and blue indicates fluvial/lacustrine (freshwater resident) life history modes. Black dashed lines indicate uncertainty in the ancestral node for the group…………………………………………….. 24

Figure 2.6: Hybridization network showing potential gene flow between known and putative species or lineages for two different sets of Bayesian gene trees. Nuc refers to only gene trees from the nine nuclear markers whereas Nmt refers to those same nine loci plus the addition of two mitochondrial markers…………………………………………………… 25

Figure 3.1: Collection location (black dots) and distribution map for A) contemporary species ranges, B) nuclear haplotypes from each individual nuclear locus (508, 517, and 520, reading left to right) according to 3 individuals per location, C) mitochondrial haplotypes, and D) putative populations according to clustering obtained in STRUCTURE for six microsatellite loci. Black bars indicate possible breaks in species/populations and colors are indicative of species haplotypes: green (C. pitensis), red (C. gulosus), yellow (coastal C. gulosus), and purple (Sacramento basin C. pitensis)…….....……………………………………………………. 58

Figure 3.2: Bayesian phylogenetic species tree developed from three nuclear loci. A single coastrange sculpin (C. aleuticus) served as an out-group, as in Baumsteiger et al. (2012). Three clades are evident: coastal C. gulosus, C. gulosus, and C. pitensis. Individuals from locations within the are intermediate between C. gulosus and C. pitensis clades. Branch support indicated by posterior probabilities (above)……………………… 59

x

Figure 3.3: mtDNA haplotype networks from unique cytochrome b sequences for each of three putative species (C. gulosus-riffle, C. pitensis-Pit, and C. gulosus-coast). Circle size correlates to number of individuals, with circle in legend approximately equal to one individual. Grey outlines represent locations containing a particular haplotype. Gray unlabeled box in coastal C. gulosus represents haplotype also found in Penitencia Creek. The gray box imbedded in the Kings River group is the Kaweah individuals. Black dots represent inferred substitutions between haplotypes. A complete list of haplotypes (CBH), by location, can be found in Table S2…………………………………………………………………………... 60

Figure 3.4: Mitochondrial coalescent tree estimated in BEAST, showing age of diversification based on a 0.9% mutation rate (per million years). Parentheses represent upper and lower bounds of estimate. Numbers above branches are posterior probabilities based on 37,500 trees. Colors match Figure 3.1 distinctions……………………………………………….. 61

Figure 3.5: Clustering analysis based on microsatellite allele frequencies, as determined in STRUCTURE. Clusters (K) are indicated by color or shade. Individuals appear as slight vertical bars within each location (bordered by black lines)………………………………. 62

Figure 3.6: Unrooted neighbor-joining tree showing Cavalli-Sforza Edwards chord distances obtained from 6 microsatellite loci (locus 318 removed). Values on nodes represent bootstrap percentages (out of 1000). Nodes without values showed less than 65% bootstrap support. 63

Figure 4.1: Map on left shows all sampling locations (Washington and Oregon locations not shown). Map on right indicates distribution and proportion of unique mtDNA haplotypes over a select sub-sampling of locations…………………………………………………… 91

Figure 4.2: Consensus phylogenetic tree of Bayesian and maximum likelihood statistical approaches showing regional groupings of locations. Branch support is given as a posterior probability (over 7500 trees) and bootstrap values (from 1000 bootstraps), respectively… 92

Figure 4.3: Haplotype network showing regional (shaded) diversification amongst haplotypes. Numbers refer to specific haplotypes (see Table SA) whereas circle size represents the number of individuals with that particular haplotype…………………… 93

Figure 4.4: Coalescent trees with only a 0.9% mutation rate (per million years) (A) or with the same mutation rate and a dated node (Clear Lake divergence Z = 480,000 years ago) (B). Divergence times and ranges in millions of years (m.y.a.) or thousands of years (t.y.a.) for nodes related to regional groupings (given as capital letters). Branch supports are posterior probabilities from 3750 trees. The group labeled as Coastal C is three individuals, one from the Russian and Klamath Rivers, and one from the Oregon Coast………………..………………………… 94

Figure 4.5: Plot of Bayesian skyline showing changes in effective population size over time. Both axes are log transformed……………………………………………………………… 95

Figure 4.6: Initial STRUCTURE analysis showing three primary clusters: coastal, inland, and Clear Lake. Each primary cluster was run separately to revel substantial sub-structuring, though Clear Lake showed no additional clustering. K refers to number of clusters according to Evanno et al. (2005). Unique clustering was witnessed between Sacramento and San

xi

Joaquin River regions, but only the San Joaquin had additional sub-structuring………… 96

Figure 4.7: Isolation by distance (IBD) measures across all samples and at each regional scale. Overall, Coastal, Inland, Sacramento, and San Joaquin groups are within group IBD whereas Clear Lake is a comparison between the two Clear Lake locations and all other locations in this study (except point with gray outline, which is the pairwise comparison)…………… 97

xii

ACKNOWLEDGEMENTS

First, I would like to thank God for giving me the opportunity to try and understand this world a little better. I would like to thank my parents, Jean and Dave (and Luke and Karen) for raising me to always be curious, to be strong and determined, and to never relent on something I wanted to do. Those sacrifices to send me to private school certainly paid off in the long run. I would like to thank my best friend Nathan for always reminding me to laugh and enjoy life, not just work through it. Special thanks go to Jake, Jon, and Steve, who pushed me to finally do this crazy Ph.D. thing. And finally to my fiancée Mallisa and her daughter Nicole, who are my life outside of academia, went through this whole thing with me, and who now share in my future; you are loved. As for my academic appreciation, primary I want to thank Dr. Andy Aguilar, who despite making a tough choice to leave UC Merced, stuck with me all the way to the end. I appreciate your willingness to take me on, to tolerate my frustrations, and to make sure I became the best scientist I could be. I came to UC Merced to work with you and I am glad I did. I would like to thank the rest of my committee, Mike, Andrew, and Peter, each of whom challenged and advised me in their unique way. There is no one way to do things and the combined advice of the committee was instrumental in elevating my approach and understanding. I also want to thank a number of faculty members here at UC Merced (Dave, Jen, Marcos, and others) who gave me such great insight and advice for not just understanding science, but the people in it. This will be invaluable going forward. And upon that note, I lastly want to thank the great support staff here at UC Merced, Carrie, Rachel, Paul, Jen, Jesus, Anne, Belinda, and on and on were such a supportive and thoughtful group that often go unappreciated. Believe me, you are appreciated. And finally to my fellow students, Andy and Joe, the Aguilar lab! You guys are terrific and fun and made doing a dissertation a whole lot more exciting. Sometimes you are just lucky to get great people to be around for 5 years and I believe we made the most of our time here. I wish you both the best going forward. And to the ladies of the Dawson lab (Sarah, Lauren, Holly, Joan, Lisa), our “rivals”, you ladies are fantastic. Though I will not be here to continue our banter, having you in the suite next to us was just great luck. I hope each of you finds your calling and that one day we can all sit at a conference and laugh. A special thanks to the ladies of GSA as well. Thank you for your inspiration and for not killing me during my time on GSA. I have no doubt that each of you will be successful. There are a number of graduate students here at UC Merced that I have been able to interact with over my 5 years who have been just a joy to talk to or spend time with. Too many names to mention but each you is remembered. Finally, thanks go to my poor undergraduate students, who endured my moods, my tedious work, and whatever other challenges I could find for you. Each of you (Michelle, Tyler, Aldwin, Charlene, and Lisa) is capable of so much that it is exciting for me to see what the future holds for you. I know that if you put your mind to it, each of you can be successful. There is no way I could have pulled off all the sampling without the help of a laundry list of people. I will attempt to mention just some of the great folks who shared their time and effort to help better understand sculpin. To that end I wish to thank D. Goodman, R.Bilski, and J. Kindopp (United States Fish and Wildlife Service), D. Markel and B. Sidlauskas (Oregon State Univ.), J. Smith (San Jose State Univ.), D. Girman (Sonoma State Univ.), R. Vincik, E. Guzman, D. Coulon, and K Gibson (California Department of Fish and Game), L Long and H. Isner (Kings River Conservation District), Teejay O’Rear (UC Davis), D Rishbieter (California State Parks), J.Seigel and C. Swift (Los Angeles Natural History Museum), J. Koehler (Napa County Resource Conservation District), and R. Douglas (Mendocino Redwood Company).

xiii

VITA Education 2013 Doctor of Philosophy, Quantitative and Systems Biology, University of California, Merced 2002 Masters of Science in Natural Resources (Fish Genetics), Humboldt State University 1998 Bachelors of Science in Biology, University of California, Los Angeles Professional Experience 08/06 – 08/08 Amphibian Genetics Laboratory Manager Washington State University, Pullman, WA Supervisor: Dr. Andrew Storfer 11/02 – 08/06 Fisheries Biologist/Geneticist Abernathy Fish Technology Center U.S. Fish and Wildlife Service, Longview, WA Supervisors: Dr. William Ardren and Dr. Donald Campton 08/99 – 08/02 Genetics Laboratory Manager/ Graduate Student Department of Fisheries Biology, Humboldt State University Advisor: Dr. Eric Loudenslager. 08/99 – 08/01 Teaching Hatchery Assistant/Technician Department of Fisheries Biology, Humboldt State University Teaching Experience Teaching Assistant Intro to Cell and Molecular Biology (UCM) F+Sp 2010, F2011, F+Sp2012 Conservation Biology Field (UCM) Sp2009, Sp2011 Conservation Biology (UCM) F2008, Sp2011 Guest Lecturer Conservation Biology (UCM) March 2012 Introduction to Cell and Molecular Biology (UCM) March, Oct, Nov 2012 Fisheries Genetics (HSU) March 2002 Introduction to Sampling Theory (HSU) February 2002 Undergraduate TA Physiology (UCLA) Fall 1997 Volunteer Experience 08/12 – 06/13 Academic Affairs Officer for Graduate Student Association 08/12 – 06/13 Graduate Student Rep. on Academic Planning and Resource Allocation 08/12 – 06/13 Graduate Student Rep. on Ombuds Oversight Committee 08/11 – 08/13 Co-founder of Environmental Science, Ecology and Evolution Discussion Group 08/09 – 08/13 Instructional Intern for Center for Research on Teaching Excellence (CRTE) 06/98 – 08/99 Laboratory Technician for Dr. Robert K. Wayne, UCLA Awards • UC Merced School of Natural Sciences Summer Fellowship - 2013 Stipend($6100) Travel ($1000) • UC Merced QSB Dissertation Fellowship – Spring 2013 ($9000) • Sigma Xi Fellowship Grant - 2012, One year ($1000) • UC Merced Graduate Research Council - Graduate Student Fellowship 2010 - 2012 o Summer Stipend ($6000), Research and Travel ($1500) per year

xiv

• Certificate of Teaching Accomplishment and Instructional Training – UC Merced Center for Research on Teaching Excellence (CRTE) 2010-2011 • Nominated for K. Patricia Cross Future Leaders Award (Association of American Colleges and Universities) by UC Merced Center for Research on Teaching Excellence • Nominated for Teaching Assistant of the Year - 2011 • Governmental STAR award - 2003 and 2005 Publications Baumsteiger, J., and Aguilar, A. Impact of dams on distribution and hybridization of two species of California freshwater sculpin (Cottus). Submitted to Conservation Genetics. Takahashi, M., Eastman, J., Griffin, D., Baumsteiger, J., Parris, M., and A. Storfer. A stable niche assumption-free test of ecological divergence. Submitted to Molecular Ecology. Trumbo, D.R., Spear, S.F., Baumsteiger, J., and A. Storfer. 2013. Rangewide landscape genetics of an endemic Pacific northwestern salamander. Molecular Ecology 22(5): 1250-1266. Aguilar, A., Douglas, R.B., Gordon, E., Baumsteiger, J. and M.O. Goldsworthy. 2013. Elevated genetic structure in the coastal tailed frog (Ascaphus truei) in managed redwood forests. Journal of Heredity 104(2): 202-216. Baumsteiger, J., Kinziger, A.P., and A. Aguilar. 2012. Life history and biogeographic diversification of an endemic western North American freshwater fish clade using a comparative species tree approach. Molecular Phylogenetics and Evolution 65: 940–952. Baumsteiger, J. and A. Aguilar. 2012. Nine original microsatellite loci in prickly sculpin (Cottus asper) and their applicability to other closely related Cottus species. Conservation Genetics Resources 5(1): 279-282. Baumsteiger, J., Swift, H.F., Lehman, J.M., Heras, J., and L. Gomez-Daglio. 2010. Getting to the root of phylogenetics. Frontiers of Biogeography 2(3): 68-69. Johnson, J.R., Baumsteiger, J., J.M. Hudson, Zydlewski, J. and W.R. Ardren. 2010. Relationship between sympatric life history forms of coastal cutthroat trout in two lower Columbia River tributaries. North American Journal of Fisheries Management 30: 691- 701. Ardren, W.R., Baumsteiger, J., Allen, C. and D.E. Campton. 2009. Genetic analysis of threatened Foskett Springs Speckled Dace. Conservation Genetics 11(4): 1299-1315. Kennedy, B.M., Baumsteiger, J., Ostrand, K.G., Ardren, W.R., and W.L. Gale. 2009. Morphological, physiological, and genetic methods for improving field identification of steelhead trout, coastal cutthroat trout, and hybrid smolts. Marine and Coastal Fisheries 1: 45-56. Steele, C., Baumsteiger, J., and A. Storfer. 2009. Influence of life history variation on the genetic structure of two sympatric salamander taxa. Molecular Ecology 18: 1629-1639. Baumsteiger, J. and J. Kerby. 2009. The effectiveness of Chinook salmon carcasses as an alternative source of tissue for DNA extraction in conservation genetic studies. North American Journal of Fisheries Management 29(1): 40-49. Baumsteiger, J., Hand, D., Olson, D.E., Spateholts, R., FitzGerald, G.W. and W.R. Ardren. 2008. Parentage analysis reveals successful reproduction of hatchery-origin Chinook salmon outplanted into an Oregon stream. North American Journal of Fisheries Management 28(5): 1472-1485.

xv

Spear, S.F., Baumsteiger, J., and A. Storfer. 2008. A newly developed suite of polymorphic microsatellite markers for the coastal tailed frog, Ascaphus truei. Molecular Ecology Research 8(4): 936-938. Steele, C. Baumsteiger, J., and A. Storfer. 2008. Development of microsatellite markers for Dicamptodon copei and tenebrosus. Molecular Ecology Resources 8(5): 1071-1073. Baumsteiger, J., Hankin, D., and E.J. Loudenslager. 2005. Genetic analysis of juvenile steelhead, coastal cutthroat trout, and their hybrids differ substantially from field identifications. Transactions of the American Fisheries Society 134: 829-840.

xvi

ABSTRACT

Freshwater systems across North America and Eurasia are home to inconspicuous species of fish known as sculpins (genus Cottus). As generally small, benthic species, sculpins have largely been ignored by humans. One lineage in particular, Cottopsis, is endemic to the west coast of North America and inhabits diverse geographic areas and ecological conditions. Such complexity is the perfect setting for understanding issues related to speciation, divergence, population structure, gene flow, and phylogeography of freshwater . Comprehensive phylogenetic work using multiple nuclear and mitochondrial markers generated species trees and delimitations which varied slightly by technique and statistical approach, but which revealed a number of cryptic lineages and relationships within the clade. Three species groups in particular showed remarkable variation in California, riffle (C. gulosus) and Pit (C. pitensis) sculpin, and prickly sculpin (C. asper). Following an extensive distributional sampling, a number of phylogeographic breaks, cryptic lineages, and speciation mechanisms were discovered. Within C. gulosus and C. pitensis, the Sacramento River proved to be distinctive showing potential historical hybridization, complete C. pitensis mitochondrial introgression, and restricted contemporary population structure. Consistent with other studies, a phylogeographic break along the Sacramento/San Joaquin River Delta led to the prospective divergence of C. pitensis into the Sacramento/Pit River basin and C. gulosus into the San Joaquin River basin; with different measures of isolation by distance and intra-population structure. A similar break and population structure was seen in inland C. asper populations. Estimated to have diverged from amphidromous coastal populations 110,000 years ago, inland C. asper represent incipient speciation, as evident by poor phylogenetic distinctiveness. Low measures of gene flow and hybridization in the narrow connective corridor between coastal and inland populations, along with no sign of physical barriers suggest inland C. asper are diverging by way of parapatric speciation. A novel lineage of C. asper in Clear Lake is also well-supported, originating from coastal, not inland populations. Overall, results clearly signify sculpin are a complex species group, with structure reflective of their historical hydrology, and worthy of intensive conservation and management strategies.

xvii

Chapter 1: Introduction

“The sculpins of the genus Cottus comprise one of the most perplexing groups of North American freshwater fishes. Variation is so marked and haphazard that interpretation of species limits is difficult” Robins and Miller (1957)

1.1 Freshwater Sculpin With the advent of modern genetic approaches, one group of organisms long overdue for a comprehensive systematic overview is freshwater sculpin (genus Cottus). Distributed over inland and coastal habitats across North America and Eurasia, freshwater sculpins are a complex group of species that are difficult to identify (Adams and Schmetterling 2007). Their systematics have largely been contingent upon morphological characters and meristic measurements; the literature is filled with variation, different taxonomic names, and contrasting range limits owing to plasticity between and within species (Hubbs and Lagler 1941, Barlow 1961, Krejsa 1967a, Strauss and Bond 1990). Additionally, some locations are exceptionally speciose, such as Lake Baikal in eastern Siberia, (boasting an endemic flock of close to 33 species - Kontula et al. 2003) whereas others in the Rhine River, Germany contain hybridizing sympatric species, invading novel locations based on specialized habitat adaptations (Nolte et al. 2005a,b, Nolte et al. 2006). Derived from a marine based family (; 70 genera, 300 species), Cottus contains 42 known species (Kinziger et al. 2005, Page and Burr 2011). Most Cottus are small, benthic fish with flattened heads, fanlike pectoral fins, smooth scaleless bodies, and no swim bladders (Moyle 2002). Dark in color and mottled, many species contain various amounts of prickling (small raised ridges above and below the skin) as well as variation in pre-opercular spines, dorsal fin connectiveness, palatine teeth, and occasional independent morphological characters (Moyle 2002). As sit and wait predators, Cottus primarily feed on aquatic insects (Mertz 2002, Tabor et al. 2007), though some piscivory has been shown, preying on conspecifics and juvenile salmon and trout species (Bond 1963, Moyle 1977, Foote and Brown 1998). Despite being considered relatively poor swimmers, often hindered by in-stream obstructions (Utzinger et al. 1998), many still make extensive in-stream migrations (Kresja 1965, Englebrecht et al. 2000, Moyle 2002). Along the Pacific slope of North America is an endemic clade of freshwater sculpin known as Cottopsis (Kinziger et al. 2005). Named after one of the original names for Cottus (see Girard 1856, Cooper 1868, Jordan 1878), Cottopsis consists of 11 (suspected 12; C. tenuis) species distributed from Alaska to Southern California, radiating as far inland as western Idaho in the Columbia River system (Page and Burr 2011). Three well supported species groups are recognized within Cottopsis: the C. klamathensis group, C. aleuticus, and the C. asper group (Kinziger et al. 2005). The first is composed of three primary species (one species contains potentially three subspecies); all contained within the greater Klamath or upper Pit River basins (Daniels and Moyle 1984, Moyle 2002). The second, coastrange sculpin (C. aleuticus), is a strictly coastal species distributed tentatively from Santa Barbara, California to Alaska. The final group, C. asper, consists of five known species, two of which are confined to coastal Oregon and Washington (C. perplexus) or the Walla Walla/Umatilla drainage of the Columbia River (C. marginatus) (Page and Burr 2011). The remaining three species in the C. asper group (prickly - C. asper, riffle - C. gulosus, and Pit sculpin - C. pitensis) have large distributions within California (the latter two being endemic), particularly in the Great Central Valley (Moyle 2002). Strong evolutionary and geographic forces have shaped the distribution and ecology of each species, leading to ecological specialists (C. gulosus, C. pitensis) and generalists (C. asper) (Moyle 2002). The limited but telling literature on each species suggests a great deal more genetic complexity is present but yet 1

2

to be explored (Kresja 1965, Hopkirk 1973, Daniels 1987, Swift et al. 1993, Moyle et al. 1996, Moyle 2002, Kinziger et al. 2005, Page and Burr 2011). Therefore, with interesting potential questions and advances in genetic research, it is time to investigate these species.

1.2 Speciation and Hybridization Species are the operational taxonomic unit of biology, an essential component in understanding, conserving, and managing biodiversity today. However, the concept of what constitutes a species is still debated. Novel molecular and genetic approaches have only muddled ideas further; as inherent in the number of “species concepts” available (Coyne and Orr 2004). Integral in defining a species is identifying what causes divergence initially. Early on, scientists attributed speciation to geographic differences, looking at how species were forced to adapt to novel environments (Endler 1977). Lately, knowledge of additional speciation mechanisms such as sexual selection, genetic drift, polyploidy, hybridization, and ecological reinforcement have all been shown to drive speciation regardless of geographic barrier (Bolnick and Fitzpatrick 2007) When it comes to speciation, one of the most difficult factors to interpret is hybridization (Coyne and Orr 2004). Hybrids are often viewed as a breakdown of speciation, where closely related species or potential (incipient) species have reached some level of separation, be it geographic, ecological, or genetic, only to resume contact and inter-mate. Simply identifying how separation was achieved can be debatable, with most studies leaning toward some form of allopatry (Butlin et al. 2012). Additionally, hybrids can occasionally be ecologically superior in intermediate locations, representing yet another speciation mechanism (Mallet 2007). Lastly, ascertaining whether hybridization is natural or artificial can be exceedingly difficult, with questions of fitness related to pre- and post-zygotic barriers influencing introgression (Arnold and Hodges 1995, Rhymer and Simberloff 1996). Clearly the presence of hybrids can greatly interfere with the ability to manage and conserve a species, as most regulations are contingent upon a narrow definition of species (Allendorf et al. 2004, Campton and Kaeding 2005). Therefore, components of hybridization must be addressed whenever speciation factors are determined.

1.3 Cryptic and Incipient Species Advances in genetic technology on non-model organisms have opened the door to novel diversity in the form of cryptic species. Previously unavailable due to limited morphological differences, crypsis may be attributable to physiological differences, unknown reproductive behavior, chemical pheromones, or other non-visible cues that are difficult to quantify using standard taxonomic methods (Bickford et al. 2007). Cryptic species represent important adaptive biodiversity, shedding light on ecological or evolutionary differences in range, niche, or gene flow. Potential biodiversity is also being added through incipient speciation, as divergence between populations reaches the level of species. This important time is vital to understanding factors which cause evolutionary divergence, whether genetic drift, natural or sexual selection, or simple geography (through allopatric, parapatric, or sympatric methods) (Bolnick and Fitzpatrick 2007). Identifying species undergoing incipient speciation is often challenging, but once observed, represents a window into evolution in action.

1.4 Phylogeography Using multiple fields or lines of evidence to assess questions in evolutionary biology is becoming commonplace. For example, phylogenetics focuses on gene trees for discerning species and higher taxa, often concatenating multiple genes within mitochondrial DNA (mtDNA) or incorporating mtDNA and nuclear DNA (nDNA) (Brito and Edwards 2009, Edwards 2009). 3

Population genetics, on the other hand, focuses on understanding forces that shape patterns of molecular genetic variation within a species (populations), using summary statistics (FST and Θ) and mathematical models (Wright-Fisher, Island, Isolation by distance) (Hey and Machado 2003). Combining concepts from each field, John Avise and colleagues proposed the field of phylogeography, intended to bridge the gap between microevolution (population genetics) and macroevolution (phylogenetics) (Avise et al. 1987, Avise 2000, Hickerson et al. 2010). It sought to address “the principles and processes governing the geographic distributions of genealogical lineages, especially those within and among closely related species” (Avise 2000). By focusing on the coalescent (the ancestral relationship between chromosomal segments - Kingman 1982), one could develop evolutionary trees based on patterns of DNA ancestry in a population (often through a molecular clock) as opposed to patterns of taxonomic ancestry, thereby inferring historical demographic parameters such as population size, divergence times, and migration rates (Beerli and Felsenstein 2001, Hey and Machado 2003, Majoram and Tavare 2006, Wakeley 2008, Kuhner 2008). These demographic parameters could then be tied to geographic events, lending additional information on patterns of speciation and population structure (Avise 1987). By incorporating aspects of both fields, a thorough picture of speciation, population structure, and the role of ecological and evolutionary forces can be deduced. Phylogeography has not always been comprehensive. Initially, genealogical trees were based solely on mtDNA sequences due to reduced effective population size, lack of recombination, and higher rates of variation than nuclear sequences (Moore 1995, Avise 2000, Avise 2009). Single gene trees were used to infer phylogenetic relationships as well, as in the barcoding gene cytochrome oxidase 1 (CO1) (Rubinoff et al. 2006, Brito and Edwards 2009, Edwards 2009). Additional limitations of computational power, cost, and marker availability reduced the scope of many early studies. With the advent of modern techniques, limitations have been alleviated, allowing biologists to incorporate mitochondrial and nuclear DNA into studies (Marjoram and Tavare 2006, Edwards et al. 2007). In fact, single gene approaches often fail to confer an accurate assessment of a lineage due to lineage sorting, being influenced by individual forces of selection and genetic drift (Maddison and Knowles 2006). One solution was the development of “species trees”, where species or distinct populations are determined based on multiple independent sequences or genes, forming large lineage comprehensive trees (Edwards et al. 2007, Liu and Pearl 2007, Liu et al. 2008, Edwards 2009, Heled and Drummond 2010). Additionally, the dawn of statistical phylogeography has greatly improved estimations and accuracy, using complex simulations and likelihood and Bayesian algorithms to infer parameters based on multiple genes (Knowles and Maddison 2002, Knowles 2009).

1.5 Molecular Marker Development Because sculpin are non-model organisms, obtaining the necessary molecular markers to examine patterns suggested here was initially challenging. Fortunately, advances in next- generation sequencing now allow for rapid coverage of large portions of the genome without prior knowledge of the species. Using liver tissue obtained from a single fresh sample of C. asper, DNA was extracted and prepared for sequencing. A Genome Sequencer FLX (GS-FLX) System (Roche Inc.) using a shotgun sequencing method based on Roche 454 technology (Abdelkrim et al. 2009) was used at the UCLA Genomics Core facility. Approximately 29,836 reads 100-600 base pairs in length were obtained and mined for potential Cottus specific nuclear markers. Raw reads were filtered for adaptor sequences and subsequently scanned for repetitive elements with REPEATMASKER (Smit et al. 2004) using available information from E. coli (for contamination), humans, and five model teleosts (danio, fugu, stickleback, medaka, and tetraodon). Sequences were filtered for length (>400 bp) and 4

subjected to BLAST searches against Genbank (NCBI) to remove sequences of mitochondrial or bacterial origin. Additional searching for repetitive elements was performed by BLAST and BLAT searches against the genomes of model teleosts available on the UCSC Genome Browser. Sequences showing elevated homology to bacterial or mitochondrial accessions or any repetitive elements were removed. The remaining subset of large sequence fragments was assembled with CAP3 (Huang and Madan 1999), from which PCR primers were designed for 20 ‘loci’. Of these 20 ‘loci’, 10 amplified reliably across multiple species of Cottopsis, contained variation, and did not show signs of being paralogs. These initial 10 loci served as the nuclear sequences for a number of phylogenetic reconstructions. Two mitochondrial markers were chosen based on previous studies of sculpin: cytochrome b (L14724 and H15915) (Kinziger and Wood 2003) and control region or D-loop (CR-A and CR-E) (Lee et al. 1995). For microsatellites, a full description of available markers developed and tested for this study was published in the journal Conservation Genetics Resources (Baumsteiger and Aguilar 2012).

1.5 Objective This dissertation explores the speciation and distribution of C. gulosus, C. pitensis, and C. asper in the state of California by identifying factors which historically gave rise to each species and led to their contemporary distribution. Before species could be investigated individually, a full phylogenetic re-appraisal of Cottopsis was necessary to identify biogeographic factors influencing the entire clade as well as determine the true evolutionary relationship between species. From here, a comprehensive molecular analysis was applied to each target species, with the idea of looking at questions from different perspectives. From a theoretical perspective, identified factors explore and expand on some of the most basic questions in ecology and evolutionary biology including species delimitation, speciation, hybridization, and phylogeography. From a practical perspective, real usable information was obtained for these species including an improved list of cryptic species, an updated species distribution (historic and current), and a recognition of factors influencing a number of other native sympatric species in the state. All of which should greatly improve long-term management and conservation of California’s ichthyofauna and the factors endemic species need to persist.

1) Life history and biogeographic diversification of an endemic Western North American freshwater fish clade using a comparative species tree approach.

2) Complex phylogeography and historical hybridization in two closely related sister taxa of freshwater sculpin.

3) Crypsis and potential incipient parapatric speciation in a species of freshwater sculpin, Cottus asper. 5

Chapter 2: Life history and biogeographic diversification of an endemic Western North American freshwater fish clade using a comparative species tree approach

2.1 Abstract The west coast of North America contains a number of biogeographic freshwater provinces which reflect an ever-changing aquatic landscape. Clues to understanding this complex structure are often encapsulated genetically in the ichthyofauna, though frequently as unresolved evolutionary relationships and putative cryptic species. Advances in molecular phylogenetics through species tree analyses now allow for improved exploration of these relationships. Using a comprehensive approach, we analyzed two mitochondrial and nine nuclear loci for a group of endemic freshwater fish (sculpin-Cottus) known for a wide ranging distribution and complex species structure in this region. Species delimitation techniques identified three novel cryptic lineages, all well supported by phylogenetic analyses. Comparative phylogenetic analyses consistently found five distinct clades reflecting a number of unique biogeographic provinces. Some internal node relationships varied by species tree reconstruction method, and were associated with either Bayesian or maximum likelihood statistical approaches or between mitochondrial, nuclear, and combined datasets. Limited cases of mitochondrial capture were also evident, suggestive of putative ancestral hybridization between species. Biogeographic diversification was associated with four major regions and revealed historical faunal exchanges across regions. Mapping of an important life-history character (amphidromy) revealed two separate instances of trait evolution, a transition that has occurred repeatedly in Cottus. This study demonstrates the power of current phylogenetic methods, the need for a comprehensive phylogenetic approach, and the potential for sculpin to serve as an indicator of biogeographic history for native ichthyofauna in the region.

2.2 Introduction Recent studies have shown single gene approaches often fail to resolve true species relationships as a result of individual gene specific forces of selection and genetic drift (Maddison and Knowles 2006, Degnan and Rosenberg 2009). To account for this problem, biologists have begun to employ “species trees” approaches, where species relationships are determined based on multiple independent genes (Edwards et al. 2007, Liu and Pearl 2007, Liu et al. 2008, Edwards 2009, Heled and Drummond 2010). For many years mitochondrial DNA (mtDNA) was the marker of choice for phylogenetic reconstruction due to its reduced effective population size, lack of recombination, and higher rates of variation than nuclear sequences (Moore 1995, Avise 2000, Avise 2009). However, with the advent of modern approaches, many of the limitations of nuclear DNA have been alleviated, allowing biologists to incorporate both mitochondrial (mtDNA) and nuclear DNA (nDNA) into species tree studies (Marjoram and Tavare 2006, Edwards et al. 2007). Numerous methods have been developed for species tree estimation (STEM- Kubatko et al. 2009, *BEAST- Drummond and Rambaut 2007, MDC- Than and Nakhleh 2009, SPLITSTREE4 - Huson and Bryant 2006), BUCKy - Larget et al. 2010, Ane et al. 2007 and BEST - Liu 2008). Each method incorporates multi-gene data but differs in statistical approach (maximum likelihood or Bayesian), data input format (sequence or individual gene tree), estimates of discordance (incomplete lineage sorting or hybridization) and marker types (organellar or nDNA, or combinations of both). The most accurate method is dataset dependent, influenced by underlying biological processes relating to divergence history and attributes of the sampling protocol (number and choice of loci, and taxon sampling) (Knowles 2009). Thus, applications of multiple approaches in a comparative framework have proven useful for species tree estimation in taxonomic groups with complex phylogenetic structure (Galaxias fishes: Waters et al. 2010,

6

Emys turtles: Spinks et al. 2010, and Sistrurus rattlesnakes: Kubatko et al. 2011). Correct species identification is a vital part of any molecular systematics study. No true evolutionary history can be inferred if all contemporary lineages are not accounted for (Knowles 2009). deQueiroz (2007), in trying to generate a unified approach, separated species conceptualization from species delimitation. Whether species delimitation uses reproductive isolation, (biological species concept: Mayr 1963), ecological adaptive zones (ecological species concept: Van Valen 1976), or unique basal clustering (phylogenetic species concept: Cracraft 1989), the data must strongly support the delimitation. By focusing on delimiting factors (reproductive isolation, diagnosability, and monophyly) rather than a single concept, lines of evidence (operational criteria) are developed for assessing a species based on all possible data (Sites and Marshall 2003, 2004, deQueiroz 2007). These data can then be updated with new information, providing the most up to date, comprehensive approach to species identification possible. Freshwater sculpin (genus Cottus) are a widely distributed group of fish found in inland and coastal aquatic habitats across North America and Eurasia. Taxonomic status of this group has largely been based upon morphometrics and meristics (Hubbs and Lagler 1941, Barlow 1961, Krejsa 1967a, Strauss and Bond 1990). However, putatively diagnostic characters exhibit high intraspecific variation and overlap among species, making relatedness and species delimitation uncertain. In such a complex system, a molecular systematic approach seems ideal to tease apart these subtle differences. Kinziger et al. (2005) was one of the first to apply molecular systematics to freshwater sculpin. Using two mitochondrial markers they successfully identified a number of distinct clades within Cottus. One clade in particular, Cottopsis (from early nomenclature of the genus - see Girard 1856, Cooper 1868, Jordan 1878) is composed of 10 species occupying inland lakesrivers, and coastal estuarine environments along western North American from California to Alaska (Kinziger et al. 2005). Phylogenetic estimates based upon mtDNA provide strong support for the monophyly of this clade (Kinziger et al. 2005). However a phylogenetic reappraisal of relationships among taxa seems warranted. First, the use of mtDNA alone may have resulted in inaccurate tree reconstruction (Maddison and Knowles 2006). Second, many of the nodes in the existing phylogeny are unresolved (e.g., polytomies) or weakly supported (e.g., low bootstrap and Bayesian posterior probabilities). Lastly, several taxa hypothesized to represent distinct lineages were not included in the analysis (e.g., Kresja 1965, Hopkirk 1973, Moyle 2002, Kinziger et al., 2005), a factor which may have impeded efforts to resolve the true evolutionary history of the group (Knowles 2009). Much of the phylogenetic complexity within Cottopsis may be attributable to a highly variable biogeographic landscape. For example, within California alone, there are 22 different zoogeographic sub-provinces (Moyle 2002). Geologic events, such as glaciations, volcanism, seismic shifts, and marine incursions, have also shaped the landscape (Janzen et al. 2002). These factors have likely contributed to complex diversity in Cottopsis, including variation in life history (amphidromy and freshwater residents), hybridization, and ecological generalist and specialist species (Moyle 2002). If a strong correlation can be made between phylogeny and biogeography in Cottopsis, information can be projected onto similar sympatric species, improving conservation and management of numerous species in this region. The goal of this study was to resolve the phylogenetic relationships within the Cottopsis clade using a comparative species tree approach. We incorporated multiple phylogenetic techniques and methods with a suite of mitochondrial and nuclear markers. Owing to the possibility that several species within Cottopsis are polytypic, we first conducted analyses to identify cryptic lineages using two different delimiting methods. Next, delimited lineages underwent a comprehensive species tree analysis to investigate both marker and methodological

7

influences on estimates of phylogenetic relationships. Final phylogenetic structure was linked to biogeographic regions along the west coast of North America, providing insight into potential historical and contemporary patterns influencing sculpin and other native sympatric fish in the region. Lastly, we used our estimate of the phylogenetic relationships to investigate the evolutionary history of freshwater and amphidromous life histories in Cottopsis.

2.3 Materials and Methods Sampling The data matrix included all 10 known species currently recognized as members of the monophyletic Cottopsis clade (Kinziger et al. 2005) (Table 2.1). Also included were multiple individuals of each species, with a particular focus upon species that previous literature indicated were polytypic: C. asper (Kresja 1965, Hopkirk 1973), C. gulosus (Moyle 2002), C. asperrimus (Robins and Miller 1957), and C. klamathensis (Daniels and Moyle 1984). Fin clips and whole individuals were collected to augment taxon sampling from previous investigations (Kinziger et al. 2005). For all new collections, ventral caudal fin clips were stored in 100% non-denatured ethanol prior to extraction and whole individuals were vouchered into the Humboldt State University Fish Collection. After testing a number of possible outgroups (according to Kinziger et al. 2005), a single mottled sculpin (Cottus bairdi) served as the best outgroup based on its relatedness to the Cottopsis clade and amplification success across all genetic markers in this study. DNA was extracted from fin clips using the DNeasy tissue extraction kit (Qiagen Inc.), following manufacturer’s protocols. DNA was quantified on a NanoDrop 2000 spectrophotometer (Thermo Scientific), with concentrations ranging from 25to150 ng/µl. Nuclear marker development – 454 sequencing Genomic DNA was extracted from liver tissue obtained from a single prickly sculpin (C. asper). Genomic DNA was shotgun sequenced on a Genome Sequencer FLX (GS-FLX) System (Roche Inc.) at the UCLA Genomics Core facility. Approximately 29,836 reads, 100-600 base pairs in length, were obtained and mined for potential Cottus specific nuclear markers. After filtering and assembly (see Supplementary Material for details) we proceeded with 10 nuclear loci for phylogenetic reconstruction (Table 2.2 & Supplementary Material for primer sequences). Mitochondrial sequences Two mitochondrial markers were chosen for phylogeny estimation: cytochrome b (L14724 5’-GTGACTTGAAAAACCACCGTT-3’, H15915 5’- CAACGATCTCCGGTTTACAAG-3’) (Schmidt and Gold 1993, Kinziger and Wood 2003) and control region (D-loop) (CR-A 5’-CCTGAAGTAGGAACCAGATG-3’, CR-E 5’- TTCCACCTCTAACTCCCAAAGCTAG-3’) (Lee et al. 1995). Amplification Polymerase chain reactions (PCRs) were performed in 30 µl volumes with the following components: 1X PCR buffer (10 mM Tris HCl [pH 8.3] and 50 mM KCl), 1.5 mM MgCl2, 0.5 units of Taq DNA Polymerase (Applied Biosystems Inc.), 20 mM of each dNTP, 5 mg BSA, 0.67 mM of each primer, and 2μl of DNA. Amplifications were run at 94°C for 5 min., followed by 30-35 cycles of 94°C for 1 min., 48°C for 1 min., and 72°C for 1 min., and a final 72°C for 5 min. for the cytochrome b fragment. Conditions for the control region differed by having a 30 sec denaturation, 50°C annealing, and a 1.5 min. extension. All nuclear markers were run under identical conditions as the control region except for an annealing temp of 52°C. Amplified products were cleaned for sequencing with ExoSap-IT (USB Inc.) following manufacturers’ protocols and sent to the University of California, Berkeley’s DNA Sequencing Facility for sequencing on an ABI 3730XL Sequencing Analyzer using BigDye chemistry. All fragments

8

were sequenced in both directions and deposited in GenBank (accession numbers: JX484297- JX484737). Sequences were imported into SEQUENCHER (Gene Codes Corp.) and MEGA5 (Tamura et al. 2010), visually inspected for quality and used to generate consensus sequences from forward and reverse sequence reads of each individual sample. Gene specific alignments were performed with MUSCLE (Edgar 2004), which uses search algorithms without gap limitations. Unique haplotypes for mitochondrial sequences were obtained with DNASP (Rozas et al. 2003, Librado and Rozas 2009). Nucleotide substitution testing was conducted with JMODELTEST v0.1.1 (Posada 2008) using the corrected Akaike Inference Criterion (AICc) delta scores (Posada and Buckley 2004). Individual loci (nuclear and mitochondrial) were tested for adherence to a molecular clock using the likelihood scores (LSCORES) function in PAUP* (Swofford 2002). We examined each nuclear locus for recombination using the program GARD (Genetic Algorithm for Recombination Detection), embedded in the online server DATAMONKEY (www.datamonkey.com) (Pond et al. 2006). This method tests for incongruence within sequences at a particular locus and applies statistical support for recombinants via AIC. Species Delimitation Due to potential cryptic lineages within the Cottopsis clade, we examined our data for the presence of potentially valid yet unrecognized taxa. Given these unrecognized taxa are not fully described, we will simply refer to these as ‘lineages’ from here on. For all delimitations, we used only nuclear data due to inconsistent and variable signals observed in mitochondrial DNA (McGuire et al. 2007, Galtier et al. 2009). First, we applied the Bayesian modeling approach described by Yang and Rannala (2010) and implemented in the program BP&P. This approach assumes no recent gene flow, consistent with the biological species concept. A guide tree (Supplemental Figure A) representative of the phylogenetic relationships among the most subdivided possible delimitations of individuals was discerned from known morphological traits, geographic areas, and from initial concatenated tree estimates. Our initial search for cryptic taxa focused on riffle (C. gulosus) and prickly sculpin (C. asper), as previous investigations have suggested species are polytypic (Kresja 1965, Hopkirk 1973, Moyle 2002). A separate search was performed for species within the Klamath basin (C. asperrimus, C. princeps, C.klamathensis, and C. tenuis) where polytypic species also exist (Robins and Miller 1957, Daniels and Moyle 1984). Nine nuclear loci were used, with provided program priors resulting in fine tuning parameters within directed ranges (0.15< x < 0.80). A total of 1.5 million iterations (minus 500,000 for burn-in) were run to insure convergence. Runs were performed three times to test for consistency. A second method was employed to ascertain putative cryptic lineages utilizing an individual-based clustering approach. This method has been shown to be useful in the initial steps of species delimitation, identifying units that are likely candidates for species recognition (Shaffer and Thompson 2007, Carstens and Dewey 2010). Focusing separately again on C. asper / C. gulosus and Klamath basin sculpin, single nucleotide polymorphisms (SNPs) were obtained from all nuclear loci within this study. Following initial SNP discovery, SNPs were filtered, removing loci shown to be in linkage disequilibrium. This resulted in an 83 SNP dataset for C. asper / C. gulosus and a 76 SNP dataset for the Klamath basin sculpin. These datasets were analyzed independently with the Bayesian assignment program STRUCTURE (Pritchard et al. 2000). The program is useful for discerning a hierarchy of relationships between samples by establishing which samples cluster together without a priori knowledge of an individual’s species or population. Samples were run with an initial burn-in of 100,000 followed by 500,000 runs using an admixture model, with allele frequencies assumed to be correlated, K (clusters) ranging from 1-6, and a minimum of five iterations per K. Correct K estimation was achieved with

9

STRUCTURE HARVESTER (Earl 2011), which uses the methods of Evanno et al. (2005), and CLUMPP (Jakobsson and Rosenberg 2007). Once primary clusters were established, a secondary run was performed on each well supported primary cluster using the same parameters to infer if additional clustering was present within the primary cluster. Final visual presentation of data was performed in DISTRUCT (Rosenberg 2007). Phylogenetic analysis Individual gene trees were obtained via two different phylogenetic methods: maximum likelihood and Bayesian Markov Chain Monte Carlo (MCMC). Maximum likelihood (ML) trees were generated with PHYML (Guindon and Gascuel 2003) using results derived from JMODELTEST, including model, across site rate variation, and proportion of invariable sites. The best tree topology was searched over two heuristic algorithms, NNI (Nearest Neighbor Interchange) and SPR (Subtree Pruning and Regrafting). Node support was assessed using 1000 bootstrap replicates. MRBAYES (Huelsenbeck and Ronquist 2001, Ronquist and Huelsenbeck 2003) was used to construct Bayesian trees. Runs were conducted over 5 million iterations and sampled every 100th iteration, resulting in 50,000 trees of which the first 12,500 (25%) were discarded as burn-in. We utilized two simultaneous runs with six chains each and a heated chain temperature of 0.1. Initial testing showed these conditions successfully achieved convergence, with a split frequency p-value well below 0.01 and all Potential Scale Reduction Factors (PSRF) close to 1.0 in all loci. Three supermatrices were initially used for phylogeny estimation using the two mitochondrial markers (mtDNA ~1400bp), nine nuclear loci (Nuc ~4800bp), and a combination of nuclear and mitochondrial loci (Nmt ~6200bp). Trees were generated using maximum likelihood and Bayesian methods, as described above, with appropriate models identified in JMODELTEST. We modified the MRBAYES runs to include 100 million iterations sampled every 1000th iteration for a total of 75,000 trees following a 25% burn-in. These parameters ensured convergence, with a split frequency p-value well below 0.01 and all PSRF values close to 1.0. We recognize that concatenated data from multiple loci can lead to unreliable phylogenetic inference and inflated support values (see Kubatko and Degnan 2007); however such approaches provide a useful comparison to other methods. Species tree Estimation Most species tree estimators require species be known a priori so as to assign known individuals together (BUCKy- Larget et al. 2010, Ane et al. 2007, STEM- Kubatko et al. 2009, *BEAST- Drummond and Rambaut 2007, MDC- Than and Nakhleh 2009). Violation of this assumption can lead to topological and branch length errors or reduction in nodal support as the assumption of no gene flow is violated (Edwards 2009). Since the identification of species or cryptic lineage is important for discerning the proper phylogenetic relationship between lineages, we used results from BP&P for each species tree analysis. This method partitioned lineages in the most liberal manner, thereby allowing uniformity between methods and assuring individuals were not grouped improperly. Unless otherwise stated, all methods were run on two datasets, one composed solely of nuclear (Nuc) gene trees, and a second composed of both nuclear and mitochondrial (Nmt) gene trees. An initial attempt was made to generate species trees with the program BEST (Liu 2008). BEST is a modification of MRBAYES, where multilocus data is combined to find the joint posterior distribution of gene trees and the subsequent species tree under a hierarchical Bayesian model. As with previous MRBAYES runs, two runs of six chains were used, with all heated chains at a temperature of 0.1. Despite varying runs over 200 million iterations, no convergence for either split frequencies or PSRF values was achieved for either dataset. Subsequently we are unable to present results from this method.

10

Initial species trees were estimated with *BEAST (Drummond and Rambaut 2007). This program uses Bayesian MCMC methods, partitioning loci in such a way as to allow them to vary according to their specific nucleotide substitution model. Preliminary analysis revealed some loci do not adhere to a strict clock, requiring initial testing in *BEAST for the best molecular clock model. Following a number of initial runs, a relaxed lognormal clock gave the best parameter estimation and was subsequently used in all remaining analyses. Runs consisted of 300 million iterations sampled every ten thousand iterations, resulting in 22,500 trees following 25% discarded as burn-in. The two mitochondrial markers were considered as linked and not independent loci in the Nmt dataset. The program BUCKy (Larget et al. 2010, Ane et al. 2007) was implemented to estimate species trees using individual gene trees generated from MRBAYES. BUCKy employs a Bayesian concordance approach to assess clade agreement between trees, assessed using a concordance factor. Because BUCKy does not currently allow for missing data, two loci (507, 514) were removed, along with a few individual samples, including all three samples of C. tenuis. All other species and lineages were still represented in this analysis. Runs were varied from default values, especially alpha (the a priori discordance parameter) and MCMC values (1, 100,000 respectively) to ensure uniformity in all outputs. Species trees were estimated in STEM (Kubatko et al. 2009) using gene trees generated from ML approaches in PHYML. Using a maximum round of one million, likelihood scores are allowed to vary until no new values are obtained after 10,000 rounds. To assess tree topology, STEM requires an a priori estimate of θ (= 4Neμ) and the relative rates of evolution for each gene. Using suggestions from Kubatko et al. (2009) and Yang (2002), rates were estimated for each gene using branch length distances between outgroup C. bairdi and C. pitensis, C. gulosus (Kings R.), and C. gulosus (Sacramento R.) and averaged over all genes in the dataset. Values of θ were also varied over orders of magnitude to discern sensitivity in overall topology and branch lengths to this parameter. The final method employed to estimate a species tree was amendable to either Bayesian or ML gene trees as input. Using rooted or unrooted gene trees, the Minimize Deep Coalescence (MDC) approach was implemented in PHYLONET (Than and Nakhleh 2009, Than et al. 2008). This method assumes gene tree discordance is due to incomplete lineage sorting and can run on a single individual per species or with multiple individuals per species assigned to the correct “species” when known. Four different sets of gene trees were analyzed: only nuclear trees (Bayesian and ML) and combined nuclear and mitochondrial trees (Bayesian and ML). Additional runs were performed on single or multiple individuals per species to ensure groupings were accurate. PHYLONET was also used to generate a simple majority-rule consensus species tree from each set of gene trees. To assess different species tree methods, we compared alternate tree topologies in the program METATREE (Nye 2008). Here the most parsimonious tree was constructed based on overall character partitions or splits of the set of taxa in each species tree. Relationships were presented diagrammatically in a simple tree, with branch lengths indicating dissimilarity between trees. A final consensus tree, incorporating results from all species tree methods, was estimated in R (R Core Development Team) using a program written by J.Eastman (Univ. of Idaho). A 50% majority-rule tree was obtained, with nodes labeled for comparative differences between methods. Species Network Estimation Given the sympatric distribution of a number of species and potential cryptic lineages, hybridization or introgression is a possible contributing factor to the observed evolutionary relationships and could obscure the true species tree. To account for this possibility, a species

11

network was developed. Using the program SPLITSTREE4 (Huson and Bryant 2006), an initial super-network was created using a total score assessed over all inputted trees and a maximum distortion score that accounted for > 95% of possible networks. This initial network was modified by running a hybridization network using the Recomb2007 method, with edge weights equal to the tree size weighted mean, the maximum reticulations per tangle equal to four, and a percent offset of 10 to 15 percent. As with the all analysis, individual runs were performed with Bayesian and maximum likelihood gene trees using only nuclear (9 loci) and both nuclear and mitochondrial loci (11 loci). Lineages shown with connections may represent introgression, ongoing hybridization, or simply gene flow within the same species.

2.4 Results At least two individuals were collected from each species or proposed cryptic lineage, except those identified as Washington C. gulosus, C. marginatus, or outgroup C. bairdi (Table 2.1, Figure 2.1). All samples were successfully extracted and amplified at each nuclear and mitochondrial locus except four species from the upper the Klamath Basin region at locus 502 and C. gulosus from the Kings River at locus 514 (Table 2.2). Because the Klamath Basin represents four species, one which has three subspecies, this locus (502) was removed, leaving nine nuclear loci. BLAST searches of nuclear loci revealed no matches to known genomic regions, suggesting markers could represent coding or non-coding nuclear regions. Tests for recombination within loci revealed only a single candidate region in the 516 locus, which was corrected for by taking the largest non-recombining fragment (279 of 459 bp) to represent this locus. All loci conformed to a molecular clock except 508, 510, 518, control region, and cytochrome b (Table 2.2). Best-fit nucleotide substitution models varied by locus and are reported in Table 2.2. Relative rate estimates for each locus ranged from 0.33 to 2.02, and were higher in mitochondrial loci than in nuclear loci (Table 2.2). Species Delimitation Species delimitation was conducted according to geographic location, including five sites for C. asper and six sites for C. gulosus. Using only nuclear sequences, analyses from BP&P identified six distinct lineages (>97% posterior probability) in C. gulosus, one from each geographic location. In C. asper, all locations were resolved as distinct lineages (>97% posterior probability) except coastal C. asper from the Smith River and Columbia River which only had a 44% posterior probability. Of the overall trees generated for this analysis, the most prevalent tree (54%) collapsed the coastal C. asper node , while the next two most prevalent trees showed no nodes collapsed (43%), and a tree with the coastal C. asper and Clear Lake and Bear Creek C. asper nodes collapsed (1.5%). Using the Bayesian assignment program STRUCTURE, primary clustering results indicated the presence of three distinct clusters within the C. asper / C. gulosus group (Figure 2.2) represented by C. asper, C. gulosus, and a new lineage composed of C. gulosus from the San Jose region and the Russian River (Figure 2.2). Secondary cluster analysis identified lineage breaks similar to those seen in the BP&P analysis, with K = 3 in the C. asper cluster and K = 2 in the C. gulosus and newly identified lineage (Figure 2.2). Though coastal populations of C. asper were again seen as a single cluster, STRUCTURE analyses combined C. asper from Bear Creek and Clear Lake, C. gulosus from the Kings and Sacramento River, and both populations (Uvas and Guadalupe Creeks) of San Jose C. gulosus into single lineages. Similar species delimitation analyses were performed on the four known species of Klamath basin sculpin, including all three subspecies of C. klamathensis. BP&P indicated no cryptic lineages within any species (100% posterior probability). All three subspecies of C.

12

klamathensis were collapsed into a single species in the most frequent tree (80%), followed by a collapsed C. k. klamathensis and C. k. macrops group (16%) and each subspecies as distinct (4%). Analyses with STRUCTURE found similar results using 76 nuclear SNPs (results not shown). Initially two clusters were identified, one of all three subspecies of C. klamathensis and the second of the remaining three species (C. princeps, C. tenuis, C .asperrimus). Secondary clustering found no differences between subspecies within the C. klamathenis cluster and fully supported clusters for each of the remaining species. Phylogenetic analysis Individual gene tree estimates revealed some nuclear fragments were not highly variable for each cryptic lineage, resulting in a number of polytomies at any single gene tree. This was not the case for mitochondrial fragments, which exhibited high levels of variation that successfully discerned each identified lineage above, though nodes were not always strongly supported (either bootstrap or posterior probability) (Supplemental Figure D). Concatenated datasets identified similar lineages as those using species delimiting techniques. A number of differences in topology were observed between trees generated using only mitochondrial markers, only nuclear markers (Nuc), and those generated with both nuclear and mitochondrial markers (Nmt) (Figure 2.3). One difference was the relationship between C. gulosus from the San Jose and Russian River regions. Nuc markers indicate a common ancestor of these lineages with other C.gulosus (minus C. gulosus_WA) whereas these lineages are closer to C. asper in the Nmt concatenated tree (Figure 2.3). A second difference was found in the placement of C. marginatus, C. perplexus, and C.gulosus_WA individuals. This group forms a “clade” basal to C. asper and C. gulosus with the Nuc dataset, but the addition of mtDNA (Nmt) indicates a number of C. gulosus samples are more basal to these three species (Figure 2.3). Additionally, the relationship of C. asper from the San Joaquin River is found to be sister to Clear Lake C. asper individuals in the Nuc tree while these same individuals are found to be more closely related to coastal C. asper (Columbia and Smith Rivers) with mtDNA added (Figure 2.3). Species trees estimation, while exhibiting a number of topological discordances between each method and dataset, revealed the existence of four consistently resolved clades (Figures 2.4 and 2.5): 1) C.klamathensis, C. tenuis, C. princeps, and C. asperrimus from the Klamath Lakes region (Klamath group), 2) a cluster of samples containing C. gulosus, C. pitensis, C. gulosus_RR, and C. gulosus_BAY (Riffle group), 3) a cluster of C. asper (Prickly group), and 4) a final group from OR/WA (C. perplexus, C. marginatus, and C. gulosus_WA: OR/WA Coast group) (Figures 2.4 and 2.5). Species network analysis confirmed these clusters, but resolved different evolutionary relationships among them (Figure 2.6). A number of cryptic lineages were supported, consistent with findings from the species delimitation and concatenation analyses. These include the C. gulosus_WA (from Washington), C. gulosus_BAY (Uvas and Guadalupe Creeks), and C. gulosus_RR (Russian River), as well as C. asper_CL and BC (Clear Lake and Bear Creek). The inland (San Joaquin River) C. asper lineage was found to be divergent from the more coastal locations (Smith and Columbia Rivers). While this lineage is not well defined using only nuclear markers, the addition of mitochondrial data greatly increases distinctiveness (Figures 2.4 and 2.5). Species tree estimation proved to be strongly contingent upon which method (ML or Bayesian) or dataset (Nuc or Nmt) was employed. Species trees generated with ML approaches gave similar topologies (Figure 2.4; Supplemental Figure B). A comparable pattern was found within Bayesian species tree reconstructions (Figure 2.4; Supplemental Figure B). Bayesian and ML methods primarily differ in their placement of C. marginatus, C. perplexus, and C. gulosus_WA individuals. Maximum likelihood methods produce trees with these three species most closely related to C. asper individuals whereas Bayesian methods show them to be basal to

13

C. asper or C. gulosus. This pattern holds true regardless of dataset. A similar topological discrepancy between methods exists for the sister taxa of C. tenuis (Figure 2.4), where maximum likelihood estimates indicate C. klamathensis and Bayesian methods indicate C. asperrimus. This discrepancy likely led to the polytomy in the consensus trees witnessed between different species tree methods (Figure 2.4, Supplemental Figure B). A final discrepancy exists between Nuc and Nmt datasets, where the nuclear data does not give consistent estimates of relationships among C. asper lineages from the coast (Smith and Columbia R), inland (San Joaquin R), Clear Lake, or Bear Creek like those seen in the Nmt dataset (Figure 2.5, Supplemental Figure C). Of the individual species tree estimates, only the MDC comparison using ML gene trees seemed to be noticeably different from all other generated species trees. The primary difference was the basal node estimate of the Klamath clade and more derived node estimate for C. aleuticus, a species which appears basal in all other species tree estimates (Figure 2.4). Because this discrepancy occurs for both the Nuc and Nmt datasets, it is likely method specific and not reflective of the true evolutionary relationships of these two groups. A second discrepancy was observed between the Nuc and Nmt *BEAST trees, which may be indicative of possible mitochondrial capture of the novel lineages of C. gulosus (RR and BAY) by the C. asper clade (Figure 2.4, Supplemental Figure B). All remaining discrepancies are attributable to single topological changes and do not affect the topology between the major four clades. To identify instances of hybridization, a species network was created using the Nuc and Nmt maximum likelihood and Bayesian gene trees (Figure 2.6). Patterns that emerge from the Nuc gene trees (either ML or Bayesian) show some initial gene flow between C. princeps and C. asperrimus as well as between C. tenuis and C. klamathensis, though each of these lineages has continued without outside gene flow for some time (Figure 2.6). Additional gene flow is evident between the following lineages: C. perplexus and C. gulosus_WA, C. gulosus_BAY and C. gulosus_RR, C. gulosus and C. pitensis, and C. asper from the coastal and inland regions. There was no evidence for hybridization among contemporary species, and most gene flow predictions are either in the past or within lineages of the same species (such as the C. asper complex).

2.5 Discussion Using a comprehensive approach, our study successfully resolved the evolutionary relationships among endemic freshwater sculpin species along the west coast of North America, as well as identified a number of cryptic lineages in Cottopsis. Building upon initial phylogenetic relationships of Kinziger et al. (2005), multiple contemporary species tree methods using nuclear and mitochondrial data consistently identified five primary clades; one consisting of a single species, C. aleuticus (coastrange), a second composed of species from the Klamath and Pit basins (C. princeps, C. klamathensis, C. tenuis, C. asperrimus), a third from the Oregon/Washington coast (C. marginatus, C. perplexus, and C. gulosus_WA), a fourth group composed of C. asper, and a final group composed of C. gulosus, and C. pitensis. Our molecular analysis supported predictions regarding cryptic taxa based upon morphological and biogeographic considerations (Kresja 1967, Hopkirk 1973, Moyle 2002). Further, we identified a number of new lineages within C. gulosus (Russian River “RR”, San Jose region “BAY”, Oregon/Washington coast). One of these lineages (C.gulosus_WA) was only distantly related to the C. gulosus sampled from the type locality and may represent convergent or parallel evolution, as species diagnosis of C. gulosus is currently based on morphological characters alone (Moyle 2002). Additionally, three C. asper lineages were found to differ from the Columbia River C. asper (type locality - Richardson 1836) including inland populations from the San Joaquin River (C. asper_San Joaquin), Clear Lake (C. asper_CL), and Bear Creek (C. asper_BC). Lastly, indications of potential introgression through mitochondrial capture were discovered, which led to topological

14

differences between trees generated with and without cytonuclear data. Overall, our results are a clear indication of the power of molecular studies on difficult taxonomic questions, especially when morphological characters are limited, and represent a considerable improvement in the understanding of freshwater sculpin systematics. Species Delimitation Given the uncertain status of many Cottopsis lineages, it was expected that the method of Yang and Rannala (2010) would support recognition of several cryptic lineages. Excluding the two coastal populations of C. asper, all other locations were identified as distinct by BP&P. Normally one might expect a reduction in species estimates if gene flow assumptions are violated or the proposed guide tree too broad (Yang and Rannala 2010). Yet the partitioning here is almost too fine, as if each individual location or population has developed species level variation. This fine scale partitioning might be expected if cytonuclear data were included in this analysis (a faster evolutionary rate), but we only used nuclear loci. Yang and Rannala (2010) admit additional ground-truthing on empirical datasets is still needed for this method. Comparatively, the STRUCTURE analysis gave a more conservative estimation of putative lineages. Clusters included Central Valley C. gulosus, both C. gulosus_BAY locations, and all C. asper from historically connected Bear Creek and Clear Lake. These analyses are useful for identification of candidate lineages, however further analysis will still be necessary to fully delimit species. Species Tree Estimation Differences in the evolutionary placement of C. marginatus, C. perplexus, and C.gulosus_WA (OR/WA Coast group) seem to be driving the discordance between Bayesian and ML gene and species trees. Maximum likelihood species trees show OR/WA Coast group species are most closely related to C. asper, whereas Bayesian analyses show this group to be basal to all C. asper and C. gulosus. Other than outgroup C. bairdi, C. marginatus and C. gulousus_WA are the only samples which have but a single representative individual. To what extent this influences gene and or species tree estimation is unknown. Knowles (2009) stressed the importance of sampling more than one individual is cases where taxa have originated recently, though the data suggest these individuals are fairly well diverged. Perhaps future sampling will help resolve this ambiguity and solidify the evolutionary history of these species. Discrepancies between statistical approaches bring up a larger concern about species tree generation and the inherent assumptions of each method. Most methods assume gene trees are generated without error (Kubatko et al. 2009, Than and Nakhleh 2009). Factors such as missing data, incomplete sampling, hybridization, paralogy, isolation, and cryptic lineages can all obscure one’s ability to identify the correct method for species tree reconstruction (Edwards 2009). Other than the mitochondrial capture evident in the Nmt dataset (see below), no hybridization was identified in this study, making this factor unlikely to contribute to tree discordance. Our liberal estimation of species limits should also remove any bias associated with artificially grouping species for species tree analysis. With no evidence to suggest otherwise, it would seem incomplete lineage sorting is the most likely reason for discordance between trees. Therefore species tree assumptions made about incomplete lineage sorting (Blair and Murphy 2011) would likely hold here. In the end, each method attempted in this study had a slightly different assumptive approach which potentially accounted for the differences seen topologically. One final measure of discordance between species trees was linked to the inclusion of cytonuclear data. Considerable numbers of studies have combined nuclear and mitochondrial marker information in their phylogenetic analyses (Sota and Vogler 2001, Pereira et al. 2002, O’Grady and Kidwell 2002, Near and Cheng 2008). We were relatively cautious about doing so without addressing possible discrepancies. Mitochondrial DNA is essentially a single marker, undergoing evolutionary changes that are quicker and sometimes detached from the nuclear

15

genome (Avise et al. 1987, Avise 2009). In this study, estimated per locus evolutionary rates for mitochondrial loci are nearly 1.5 to 2 times those of nuclear loci, indicating a great deal more variation in the cytonuclear data than in the nuclear data. When one considers that the nine nuclear loci combine for approximately 4500 base pairs of sequence, while the mtDNA account for approximately 1400 base pairs, but at twice the rate, we can see that inclusion of mtDNA strongly influences the overall phylogenetic signal in the species tree. Additional concerns have also been raised about mitochondrial markers in relation to clonality, near-neutrality, and perceived clock-like substitution rates that should be strongly considered and tested when including in any species tree study (Galtier et al. 2009). Potential hybridization is also identifiable through comparison of trees constructed using nuclear and mitochondrial loci. One example is through mitochondrial capture, a pattern seen in a number of studies employing species tree methods (McGuire et al. 2007, Spinks and Shaffer 2009, Near et al. 2011), though the mechanism responsible remains unclear. We found newly identified lineages of C. gulosus (San Jose “BAY” and Russian River “RR”) share a common ancestor with C. asper lineages and not the C. gulosus/pitensis lineage in the combined dataset (Nmt), while the strictly nuclear sequence dataset grouped C. gulosus_BAY and C. gulosus_RR lineages with C. gulosus/pitensis. Individuals of C. gulosus_BAY and C. gulosus_RR are sympatric with C. asper in these locations, making contemporary and/or historical hybridization a possibility. However, this pattern is not consistent in all species trees or network analyses. Further exploration of this idea seems warranted but is outside the purview of this study. Regardless, our findings stress the importance of running nuclear and mtDNA separately and the discordance that can occur when only a single species tree method is employed. Biogeography Lineages identified in this study show strong correlations with distinct biogeographic regions. Three regions in particular correspond well with identified phylogenetic groups of sculpin (excluding C. aleuticus): coastal Oregon/Washington, the Klamath/Pit Rivers, and California's Great Central Valley. Each region is correlated with different genera or species of fish like lamprey, tule perch, dace, suckers, roach, hitch, and sticklebacks (Minckley et al. 1986). The region encompassing coastal Oregon and Washington is characterized by fish originating from the Columbia River basin (Minckley et al. 1986) but has been isolated from the Columbia River by the Coastal Range since the early Pleiocene (~5 million years ago) (Baldwin 1981). This is largely consistent with findings of the OR/WA Coastal group. Interestingly, the C. gulosus_WA lineage from this region is more closely related to C. marginatus and C. perplexus than any other C. gulosus lineages. In all likelihood, this lineage is not actually C. gulosus, but merely a misidentified new species originating in this region. If true, it may represent convergent evolution similar to that suspected in different inland radiations of C. asper (Kresja 1967). Endemic suckers Catostomus and Ptychocheilus have also been found in this same region, known as the Oregon Coastal Subprovince (Kettratad 2008). Taken together, this sub-radiation continues to support the biogeographic distinctiveness of the region and the ability of cottids to evolve new forms in situ. Perhaps the most unique biogeographic region amongst those recognized in this study is the Klamath/Pit River system. Identified as a distinct ichthyogeographic province (Hughes et al. 1987, Moyle 2002), fish from this region reflect a complex geologic history resulting in ancestors from the Great Basin/Columbia River (via the Snake River) and Sacramento River (Pease 1965). Phylogenetic relationships in sculpin support this complex history, with three sympatric species in the Pit River representing two very different ancestral lineages (C. pitensis – Riffle group and C. asperrimus and C. klamathensis macrops – Klamath group). Prior to the mid Pleistocene, the Pit River flowed northward into the Klamath basin (Pease 1965). Subsequent volcanic activity in

16

the region reversed the river’s course and connected it with the Sacramento River. Gene flow in ancestral C. asperrimus and C. klamathensis was then effectively blocked and the ancestor for C. pitensis / C. gulosus allowed to invade this new region, leading to the evolution of C. pitensis and the modern day sympatric distributions of each species. Similar molecular studies on Catostomus suckers, lamprey, and trout suggest this scenario has played out in other taxa (Nielsen et al. 1999, Smith et al. 2009, Kettratad 2008, S. Reid – personal communication), reflecting patterns similar to those in sculpin. However, the evolutionary relationships here in sculpin are by far the most definitive evidence that fish in the Pit River ichthyofauna had two separate origins. The final region of biogeographic diversification in this study is the Great Central Valley of California, which is composed of two groups within Cottopsis: the Riffle group and the Prickly group. Phylogenetic patterns suggest C. asper populations in this region are consistent with an inland radiation from coastal populations, whereas C. gulosus populations are more consistent with remnant isolated populations. For example, the newly identified lineages of C. gulosus (BAY and RR) appear to be isolated by the coastal range mountains, a barrier to other terrestrial (Carlsbeek et al. 2003; Burns and Barhoum 2006) and aquatic species (Spinks et al. 2009). Thus, biogeographic influences appear to substantially influence C. gulosus distributions and have apparently driven the allopatric origin of two 'new' C. gulosus lineages or species. The Clear Lake region represents a special case of biogeographic isolation of C. asper brought on by the rise of the coastal range. As a relatively old lake (480,000+ years), Clear Lake contains an exclusive ichthyofauna, including a distinct species of splittail (Pogonichthys ciscoides - now extinct) and a subspecies of hitch (Lavinia exilicauda chi - Aguilar and Jones 2009), tule perch (Hysterocarpus traskii lagunae - Hopkirk 1973), and prickly sculpin (Hopkirk 1973). Clear Lake displays a complex geologic history that reflects fish populations that have been isolated for long periods of time (Casteel et al. 1977), with ancestral sources for this region likely coming from the Russian (coastal) or Sacramento Rivers (inland). Along these lines we found highly distinct lineages of C. asper from the Clear Lake region (C. asper_CL and C. asper_BC) which probably indicates a distinct species that needs to be further evaluated. Unfortunately our limited sampling did not allow us to definitively identify the sister group to this lineage. More intensive sampling is necessary to discern whether the original ancestor of the contemporary population hails from coastal, inland, or combinations of both regions. Life History Evolution Similar to other clades of the genus Cottus (Yokoyama and Goto 2005), Cottopsis contains members with two different life histories, freshwater residents and amphidromous. Freshwater resident species conduct their entire life cycle in freshwater whereas amphidromous species spawn in freshwater and their pelagic eggs and larvae passively drift on river currents to estuaries where juveniles rear for a few months before migrating upstream. Two species of Cottopsis, C. asper and C. aleuticus, exhibit an amphidromous life history while the remainder of the species is freshwater residents. Life history types in Cottopsis are correlated with geographic distribution, as the two amphidromous species range from central California to Alaska and are restricted to coastal areas, rarely occurring farther than 20-50 km inland from estuarine habitats. In contrast, freshwater residents have much narrower geographic distributions and some species are found deep inland (Figure 2.1). Mapping life history type (freshwater or amphidromous) on the consensus species tree indicates two separate instances of the evolution of amphidromy in Cottopsis; in the basal lineage of C. aleuticus and the apically positioned C. asper (Figure 2.5). As necessitated by their life history, both amphidromous species are tolerant of full strength saltwater (Bond 1963: 30 ppt). Retention of saltwater tolerance is evident in freshwater resident species (C. perplexus and C. gulosus from Washington: 20 ppt, Bond 1963); suggesting members of Cottopsis are exapted to a

17

transition to an amphidromous life history. Life history type also appears to be strongly correlated with reproductive attributes, as the two amphidromous species, C. asper and C. aleuticus, have higher fecundities and smaller eggs than freshwater residents (Bond 1963, Krejsa 1965, Millikan 1968, Patten 1971, Moyle et al. 1983, Daniels 1987; Daniels and Moyle 1978, Bentivoglio 1998). Importantly, identical patterns of egg size evolution have been observed in four freshwater/amphidromous lineage pairs of Cottus from Japan and adjacent waters (Yokoyama and Goto 2005). Thus, parallel evolution of life history type and its correlation to reproductive attributes appears to be a general trend within Cottus and likely plays a significant role in the diversification process of the genus. Conclusion We feel our study reflects how beneficial multiple marker sequencing can be. For nearly 100 years, the true species identification of many of these freshwater sculpin species has been debated, including work done by a number of prominent ichthyologists. For the first time, technological and methodological advances have allowed for a more thorough examination of the systematics of these species. Though interpretation of the data with species trees is not without error, it represents a significant improvement over initial molecular attempts with single genes. Considering the geographic scope and distribution of many sculpin species, informed conservation and management decisions can now be made that apply to a number of sympatric species, greatly improving our understanding of species living within a vital resource, our fresh water.

18

2.6 Tables and Figures

Table 2.1 Samples collected for molecular systematic analysis, showing identification number, number of samples, and locations.

Species Figure ID Common Name # Location Latitude Longitude Cottus gulosus Cgulosus_Sac Riffle 2 Sacramento River, Tehama County, CA 40.155 -122.205 Cottus gulosus Cgulosus Riffle 2 Kings River (Avocado), Fresno County, CA. 36.783 -119.417 Cottus gulosus Cgulosus_WA Riffle 1 Wynoochee River, Grays Harbor County, WA. 47.001 -123.681 Cottus gulosus Cgulosus_RR Riffle 2 Russian River, Sonoma County, CA. 38.484 -122.824 Cottus gulosus Cgulosus_BAY Riffle 2 Uvas Creek, Santa Clara County, CA. 37.012 -121.627 Cottus gulosus Cgulosus_BAY Riffle 2 Guadalupe Creek, Santa Clara County, CA. 37.225 -121.905 Cottus pitensis Cpitensis Pit 1 Pit River along Clark Creek Rd, Shasta County, CA. 40.985 -121.777 Cottus pitensis Cpitensis Pit 1 Hat Creek., trib to Pit R., Shasta County, CA. 40.981 -121.571 Cottus asper Casper_coastal Prickly 2 Smith River, Del Norte County, CA. 41.853 -124.122 Cottus asper Casper_inland Prickly 2 San Joaquin River, Madera County, CA. 36.933 -119.750 Cottus asper Casper_CL Prickly 2 Clear Lake, Lake County, CA. 38.995 -122.705 Cottus asper Casper_BC Prickly 2 Bear Creek, east of Clear Lake, Lake County, CA. 39.012 -122.561 Cottus asper Casper_coastal Prickly 2 Mouth of Columbia River, Clatsop County, OR. 46.241 -123.797 Cottus aleuticus Caleuticus Coastrange 1 Scott Creek, Santa Cruz County, CA. 37.042 -122.227 Cottus aleuticus Caleuticus Coastrange 1 Smith River, Del Norte County, CA. 41.815 -123.093 Cottus marginatus Cmarginatus Margined 1 Little Tucannon River, Columbia County, WA. 46.358 -117.688 Cottus bairdi Cbairdi Mottled 1 Cobb Creek, Laclede County, MO. 37.478 - 92.501 Cottus perplexus Cperplexus Reticulate 1 Bear Creek, Jackson County, OR. 42.196 -122.680 Cottus perplexus Cperplexus Reticulate 1 Greasy Creek, Benton County, OR. 44.534 -123.374 Cottus princeps Cprinceps Klamath Lake 2 Klamath Lake, Klamath County, OR. 42.382 -121.813 Cottus asperrimus Casperrimus Rough 1 Crystal Lake Springs at Hat Creek, Shasta County, CA. 40.934 -121.548 Cottus asperrimus Casperrimus Rough 1 Above Hat Creek Falls, Shasta County, CA. 41.114 -121.392 Cottus tenuis Ctenuis Slender 2 Upper Klamath Lake, Klamath County, OR. 42.382 -121.813 Cottus klamathensis polyporus Cklamathensis Marbled-Lower 2 Shasta River, Siskiyou County, CA. 41.781 -122.597 Cottus klamathensis klamathensis Cklamathensis Marbled-Upper 2 Upper Klamath Lake, Klamath County, OR. 42.382 -121.813 Cottus klamathensis macrops Cklamathensis Marbled-Big Eyed 1 Hat Creek, Shasta County, CA. 40.961 -121.548 Cottus klamathensis macrops Cklamathensis Marbled-Big Eyed 1 Big Lake, Shasta County, CA. 41.114 -121.392

19

Table 2.2Marker information for each nuclear (Nuc) or mitochondrial (mt) sequence used to generate individual gene trees or concatenated tree (Nmt).

Marker Length (bp) Model Recomb? Relative Molecular Notes (AICc) Rate Clock 505 492 K80+G No 0.82 Yes - 507 511 HKY+G No 1.09 Yes Fails in C. princeps 508 488 HKY+G No 0.8 No - 510 490 F81 No 1.5 No - 514 491 HKY+G No 1.06 Yes Fails in C. gulosus_Kings 516 459(279) HKY+G Yes (179) 0.41 Yes - 517 437 K80+G No 0.41 Yes - 518 440 K80+G No 0.33 No - 520 443 TPM+G No 1.01 Yes - Total Nuc 4071 TPM+G CR 435 HKY+G 2.02 No - Cyt _B 962 TrN+I+G 1.55 No - Total mt 1397 TrN+I+G Total Nmt 5468 GTR+G

20

Figure 2.1 Contemporary distributions of 10 species of freshwater sculpin within the Cottopsis clade.

21

Figure 2.2 Species delimitation using 83 SNP’s in STRUCTURE (Pritchard et al. 2000); with two individuals per population. Three initial clusters are shown followed by additional structuring within each initial cluster.

22

Figure 2.3 Concatenated mitochondrial only (A), nuclear only (B) and nuclear/mitochondrial combined (C) phylogenetic trees for sculpin species within the Cottopsis clade. Unmarked nodes represent > 0.9 & > 75% Bayesian and maximum likelihood support, respectively, whereas marked nodes are as shown.

23

Figure 2.4 MetaTree showing topological differences between species tree methods (numbers on trees match nodes on MetaTree). All species trees were generated using nuclear and mitochondrial (Nmt) sequence gene trees. Species trees for internal nodes 4 and 9 also shown.

24

Figure 2.5 50% majority rule consensus trees of all species trees generated for the nuclear/mitochondrial (Nmt) dataset. Five distinct clusters/clades are identified in the consensus tree. The outgroup C. bairdi is not shown. Numerical values on nodes are for reference to Supplemental Table 2B. Green indicates amphidromous and blue indicates fluvial/lacustrine (freshwater resident) life history modes. Black dashed lines indicate uncertainty in the ancestral node for the group.

25

Figure 2.6 Hybridization network showing potential gene flow between known and putative species or lineages for two different sets of Bayesian gene trees. Nuc refers to only gene trees from the nine nuclear markers whereas Nmt refers to those same nine loci plus the addition of two mitochondrial markers.

26

2.7 Supplemental Material

Supplemental Table 2A Primer sequences for a collection of 10 nuclear markers indicating sequence, direction, and accession numbers. Accession Marker Primer # Cas502 F 5' CGTATGAGGTATCGTTATTCATTTT 3' R 5' GCGCACCGTTACGGACTA 3'

Cas505 F 5' TTCCTCTGTGATGTGCTTCG 3' R 5' CGTCCAGTAGCCGCTTTC 3'

Cas507 F 5' TTGTAAGTCTTTAATACAGCATGACG 3' R 5' GACTGCAGCTATTTGGTACTGG 3'

Cas508 F 5' AAATAGTCAAAGCATCTTAAATCCTAA 3' R 5' TCACTCTTCTTCCATCAAAATGAA 3'

Cas510 F 5' ACTCGATCTGGCCACAGAAA 3' R 5' TGTTAGGGCAGGAAAAGGTG 3'

Cas514 F 5' ACCTGCCCTCACCCATCTAC 3' R 5' TCCCAAACCTGAAACAGATTTTA 3'

Cas516 F 5' TTTTTCCTTCTGTGACATTGGTT 3' R 5' CGCGCCTGTCTACGTTTAAG 3'

Cas517 F 5' ACACCCAGTGGTCGGATAAG 3' R 5' CAGCACAACGTCTCTGCATT 3'

Cas518 F 5' TGCCGGTCATCATAGTCATAA 3' R 5' GGCGTACCTTGCTCAGAGTC 3'

Cas520 F 5' CCCTGTCAATGGTCTCCAGT 3' R 5' TTCCACAACAAGCATAAAGCA 3'

27

Supplemental Table 2B Comparative matrix of maximum likelihood and Bayesian species tree methods for each consensus tree in Figure 5 (Nmt) and supplementary Figure B (Nuc). A value of one indicates nodal presence in a particular species tree method, whereas a value of zero indicates its absence.

Nuc Count PHYLONE STEM PHYLONE Bucky PHYLONE StarBEAS PHYLONET 1 7 1 1 1 1 1 1 1 2 7 1 1 1 1 1 1 1 3 7 1 1 1 1 1 1 1 4 7 1 1 1 1 1 1 1 5 7 1 1 1 1 1 1 1 6 7 1 1 1 1 1 1 1 7 6 0 1 1 1 1 1 1 8 6 0 1 1 1 1 1 1 9 6 1 1 0 1 1 1 1 10 6 1 0 1 1 1 1 1 11 6 1 1 1 0 1 1 1 12 4 0 0 0 1 1 1 1

Nmt Count PHYLONE STEM PHYLONE Bucky PHYLONE StarBEAS PHYLONET 1 7 1 1 1 1 1 1 1 2 7 1 1 1 1 1 1 1 3 7 1 1 1 1 1 1 1 4 7 1 1 1 1 1 1 1 5 7 1 1 1 1 1 1 1 6 7 1 1 1 1 1 1 1 7 6 0 1 1 1 1 1 1 8 6 0 1 1 1 1 1 1 9 6 1 1 0 1 1 1 1 10 6 0 1 1 1 1 1 1 11 6 0 1 1 1 1 1 1 12 6 1 1 1 1 0 1 1 13 6 1 1 1 1 1 0 1 14 4 0 0 0 1 1 1 1

28

Casper_coastal_Smith

Casper_coastal_Columbia

Casper_inland_Kings

Casper_inland_ClearLake

Casper_inland_BearCr

Cgulosus_Russian

Cgulosus_Guadalupe

Cgulosus_Uvas

Cpitensis_Pit

Cgulosus_Kings

Cgulosus_Sacramento 0.5

Supplemental Figure 2A Guide tree for initial species delimitation using BP&P by Yang and Rannala (2010) based on current morphological identifications and concatenated tree estimates.

29

Supplemental Figure 2B MetaTree showing differences between species tree methods (numbers match nodes on MetaTree). All species trees were generated using nuclear (Nuc) sequence gene trees. Projected species trees for internal nodes 4 and 9 also shown.

30

Supplemental Figure 2C 50% majority rule consensus trees of all species trees generated for the nuclear (Nuc) dataset. Five distinct clusters/clades are identified in each consensus tree with outgroup C. bairdi not shown. Numerical values on nodes are for reference to Supplementary Table B. Black dashed lines indicate uncertainty in the ancestral mode for the group.

31

Cottus_bairdi Cottus_bairdi Caleuticus2 Caleuticus1 Caleuticus1 Caleuticus2 Cgulosus_RR1 Casper_coastal3 Cgulosus_RR2 Casper_BC1 Cgulosus1 Casper_inland2 Cgulosus2 Casper_inland1 Cpitensis3 Casper_coastal4 Cpitensis4 Casper_coastal1 Cpitensis2 Casper_coastal2 Cpitensis1 Casper_CL2 Cgulosus_BAY3 Casper_CL1 Cgulosus_BAY4 Casper_BC2 Cgulosus_BAY1 Cpitensis1 Cgulosus_BAY2 Cpitensis2 Cgulosus_WA Cpitensis4 Cmarginatus Cpitensis3 Cperplexus1 Cgulosus1 Cperplexus2 Cgulosus2 Casper_inland1 Cgulosus_BAY2 Casper_inland2 Cgulosus_BAY1 Casper_BC1 Cgulosus_BAY4 Casper_BC2 Cgulosus_BAY3 Casper_CL1 Cgulosus_RR1 Casper_CL2 Cgulosus_RR2 Casper_coastal3 Cgulosus_WA Casper_coastal4 Cmarginatus Casper_coastal1 Cperplexus2 Casper_coastal2 Cperplexus1 Ctenuis2 Cprinceps1 Ctenuis3 Cprinceps2 Ctenuis1 Casperrimus2 Casperrrimus1 Casperrimus1 Casperrimus2 Ctenuis2 Ckpolyporus1 Ctenuis3 Ckpolyporus2 Ctenuis1 Ckmacrops2 Ckmacrops1 Ckklamathensis2 Ckklamathensis1 Ckklamathensis1 Ckklamathensis2 Ckmacrops1 Ckmacrops2 Cprinceps2 Ckpolyporus2 Cprinceps1 Ckpolyporus1

0.2 0.0030

505

Supplemental Figure 2D Phylogenetic gene trees for each individual locus used to generate overall species trees. Trees on the left represent a Bayesian approach whereas trees on the right represent a maximum likelihood approach.

32

Cottus_bairdi Caleuticus1 Caleuticus1 Caleuticus2 Caleuticus2 Casper_coastal3 Casper_coastal4 Cgulosus_RR1 Casper_coastal2 Cgulosus_RR2 Casper_inland1 Cpitensis1 Casper_inland2 Cpitensis2 Casper_BC1 Cgulosus_BAY4 Casper_BC2 Cgulosus_BAY3 Casper_CL1 Cgulosus_BAY2 Casper_CL2 Cgulosus_BAY1 Casper_coastal1 Cpitensis3 Cgulosus1 Cgulosus1 Cgulosus2 Cgulosus2 Cpitensis3 Cpitensis4 Cpitensis4 Casper_inland1 Cgulosus_BAY3 Casper_inland2 Cgulosus_BAY4 Casper_CL1 Cgulosus_RR1 Casper_CL2 Cgulosus_RR2 Casper_coastal3 Cgulosus_BAY1 Casper_coastal1 Cgulosus_BAY2 Casper_BC1 Cpitensis2 Casper_BC2 Cpitensis1 Cmarginatus Cgulosus_WA Cperplexus1 Cperplexus2 Cgulosus_WA Cperplexus1 Cperplexus2 Cmarginatus Casper_coastal4 Casperrimus2 Casper_coastal2 Casperrimus1 Ctenuis3 Ctenuis3 Ctenuis1 Ckpolyporus1 Ckklamathensis1 Ckmacrops1 Ckklamathensis2 Ctenuis1 Ckmacrops2 Ckmacrops2 Ckmacrops1 Ckklamathensis1 Ckpolyporus1 Ckklamathensis2 Casperrimus2 Cottus_bairdi Casperrimus1

0.2 0.0050

507

Supplemental Figure 2D (cont’d) Phylogenetic gene trees for each individual locus used to generate overall species trees. Trees on the left represent a Bayesian approach whereas trees on the right represent a maximum likelihood approach.

33

Cottus_bairdi Cottus_bairdi Cgulosus_WA Cgulosus_WA Cmarginatus Cmarginatus Cgulosus_RR1 Caleuticus2 Cgulosus_RR2 Caleuticus1 Cgulosus_BAY3 Cperplexus1 Cgulosus_BAY4 Cperplexus2 Cgulosus_BAY1 Casper_inland1 Cgulosus_BAY2 Casper_inland2 Cgulosus1 Casper_coastal4 Cgulosus2 Casper_coastal3 Cpitensis3 Casper_coastal1 Cpitensis2 Casper_BC2 Cpitensis1 Casper_coastal2 Cpitensis4 Casper_CL1 Cperplexus1 Casper_CL1 Casper_coastal3 Cgulosus_RR1 Casper_coastal4 Cgulosus_BAY3 Casper_inland1 Cgulosus_BAY4 Casper_inland2 Cgulosus_RR2 Casper_coastal1 Cgulosus_BAY1 Casper_coastal2 Cgulosus_BAY2 Casper_BC2 Cpitensis4 Casper_CL2 Cgulosus1 Casper_CL1 Cgulosus2 Cperplexus2 Cpitensis1 Ckklamathensis1 Cpitensis3 Ckklamathensis2 Cpitensis2 Ckmacrops1 Casperrimus2 Ckmacrops2 Casperrimus1 Ckpolyporus1 Ckmacrops1 Ckpolyporus2 Ckpolyporus2 Ctenuis2 Ckklamathensis1 Ctenuis3 Ckklamathensis2 Cprinceps2 Ckmacrops2 Cprinceps1 Ckpolyporus1 Casperrimus2 Ctenuis2 Casperrimus1 Ctenuis3 Caleuticus2 Cprinceps2 Caleuticus1 Cprinceps1

0.2 0.0040

508

Supplemental Figure 2D (cont’d) Phylogenetic gene trees for each individual locus used to generate overall species trees. Trees on the left represent a Bayesian approach whereas trees on the right represent a maximum likelihood approach.

34

Cgulosus_RR1 Cottus_bairdi Cgulosus_RR2 Caleuticus1 Cgulosus_BAY3 Caleuticus2 Cgulosus_BAY4 Cperplexus2 Cgulosus_BAY1 Cperplexus1 Cgulosus_BAY2 Cmarginatus Cgulosus1 CguUC1 Cgulosus2 CguUC2 Cpitensis3 CguGC1 Cpitensis4 CguGC2 Casper_coastal3 Cgulosus_RR1 Casper_coastal4 Cgulosus_RR2 Casper_coastal2 Cpitensis3 Casper_coastal1 Cpitensis4 Casper_inland1 Cgulosus1 Casper_inland2 Cgulosus2 Casper_BC1 Cgulosus_WA Casper_BC2 Casper_BC1 Cpitensis2 Casper_inland1 Cpitensis1 Casper_inland2 Casper_CL1 Casper_BC2 Casper_CL2 Casper_CL1 Cgulosus_WA Casper_CL2 Ctenuis2 Cpitensis2 Ctenuis1 Cpitensis1 Ckklamathensis1 Casper_coastal2 Ckklamathensis2 Casper_coastal1 Ckmacrops1 Casper_coastal3 Ckmacrops2 Casper_coastal4 Ckpolyporus1 Cprinceps2 Ckpolyporus2 Cprinceps1 Casperrimus2 Ctenuis2 Casperrimus1 Ctenuis1 Cprinceps2 Casperrimus2 Cprinceps1 Casperrimus1 Cperplexus1 Ckklamathensis2 Cmarginatus Ckmacrops1 Cperplexus2 Ckmacrops2 Caleuticus1 Ckklamathensis1 Caleuticus2 Ckpolyporus1 Cottus_bairdi Ckpolyporus2

0.0050 0.0040

510

Supplemental Figure 2D (cont’d) Phylogenetic gene trees for each individual locus used to generate overall species trees. Trees on the left represent a Bayesian approach whereas trees on the right represent a maximum likelihood approach.

35

Caleuticus2 Cottus_bairdi Caleuticus1 Caleuticus2 Cprinceps1 Caleuticus1 Cprinceps2 Casper_coastal2 Ckklamathensis1 Casper_coastal3 Ckklamathensis2 Casper_BC2 Ckmacrops1 Casper_coastal4 Ckmacrops2 Casper_CL1 Ckpolyporus1 Casper_inland1 Ckpolyporus2 Casper_inland2 Ctenuis1 Casper_CL2 Ctenuis2 Casper_coastal1 Ctenuis3 Casper_BC1 Casperrimus1 Cgulosus_WA Casperrimus2 Cperplexus1 Cpitensis1 Cmarginatus Cpitensis3 Cprinceps1 Cpitensis4 Cprinceps2 Cpitensis2 Casperrimus1 Cgulosus_RR2 Casperrimus2 Cgulosus_BAY3 Ctenuis2 Cgulosus_BAY4 Ctenuis1 Cgulosus_BAY1 Ctenuis3 Cgulosus_BAY2 Ckklamathensis2 Cperplexus1 Ckmacrops1 Cmarginatus Ckpolyporus1 Casper_coastal3 Ckmacrops2 Casper_coastal4 Ckklamathensis1 Casper_inland1 Ckpolyporus2 Casper_inland2 Cgulosus_RR2 Casper_BC1 Cgulosus_BAY3 Casper_BC2 Cgulosus_BAY4 Cgulosus_WA Cgulosus_BAY1 Casper_coastal2 Cgulosus_BAY2 Casper_coastal1 Cpitensis1 Casper_CL1 Cpitensis2 Casper_CL2 Cpitensis3 Cottus_bairdi Cpitensis4

0.09 0.0030

514

Supplemental Figure D (cont’d). Phylogenetic gene trees for each individual locus used to generate overall species trees. Trees on the left represent a Bayesian approach whereas trees on the right represent a maximum likelihood approach.

36

Cgulosus_RR1 Cottus_bairdi Cgulosus_RR2 Caleuticus2 Casper_coastal3 Caleuticus1 Casper_coastal4 Casper_coastal4 Casper_inland1 Casper_CL1 Casper_inland2 Cgulosus_RR1 Casper_BC1 Casper_CL2 Casper_BC2 Cpitensis3 Cgulosus_BAY3 Cgulosus1 Cgulosus_BAY4 Casper_inland1 Cgulosus_BAY1 Casper_inland2 Cgulosus_BAY2 Cgulosus2 Casper_CL1 Cpitensis4 Casper_CL2 Cgulosus_BAY4 Casper_coastal2 Casper_BC2 Ckklamathensis1 Casper_BC1 Ckklamathensis2 Cgulosus_BAY3 Ckmacrops1 Cgulosus_BAY1 Ckmacrops2 Cgulosus_BAY2 Ckpolyporus1 Cgulosus_RR2 Ckpolyporus2 Cpitensis2 Cprinceps2 Cpitensis1 Cprinceps1 Casper_coastal3 Ctenuis1 Casper_coastal2 Ctenuis3 Casperrimus1 Casperrimus2 Cprinceps2 Casperrimus1 Cprinceps1 Cgulosus1 Casperrimus2 Cgulosus2 Ctenuis3 Cpitensis3 Ctenuis1 Cpitensis4 Ckmacrops1 Cpitensis2 Ckklamathensis1 Cgulosus_WA Ckpolyporus1 Cperplexus1 Ckklamathensis2 Cmarginatus Ckmacrops2 Cperplexus2 Ckpolyporus2 Cpitensis1 Cmarginatus Caleuticus2 Cperplexus2 Caleuticus1 Cgulosus_WA Cottus_bairdi Cperplexus1

0.2 0.0030

516

Supplemental Figure 2D (cont’d) Phylogenetic gene trees for each individual locus used to generate overall species trees. Trees on the left represent a Bayesian approach whereas trees on the right represent a maximum likelihood approach.

37

Cgulosus2 Cottus_bairdi Cgulosus1 Caleuticus2 Cpitensis3 Caleuticus1 Cpitensis4 Cgulosus2 Cpitensis2 Cgulosus1 Cpitensis1 Cpitensis1 Cperplexus2 Cpitensis3 Cgulosus_WA Cpitensis4 Cperplexus1 Cpitensis2 Casper_inland2 Cmarginatus Cgulosus_RR1 Casper_inland1 Cgulosus_RR2 Casper_coastal3 Casper_BC1 Casper_coastal4 Casper_BC2 Cgulosus_BAY1 CasCL1 Cgulosus_BAY3 CasCL2 Cgulosus_BAY4 Casper_coastal3 Cgulosus_BAY2 Casper_coastal4 Casper_CL1 Casper_inland1 Casper_BC1 Cmarginatus Casper_CL2 Cgulosus_BAY3 Casper_inland2 Cgulosus_BAY4 Cgulosus_WA Cgulosus_BAY1 Cperplexus2 Cgulosus_BAY2 Cperplexus1 Cprinceps1 Casper_BC2 Cprinceps2 Cgulosus_RR1 Casper_coastal1 Cgulosus_RR2 Casper_coastal2 Casper_coastal2 Ckklamathensis1 Casper_coastal1 Ckmacrops1 Ckklamathensis1 Ckmacrops2 Ckmacrops1 Ckmacrops1 Ckmacrops2 Ckpolyporus1 Ckklamathensis2 Ctenuis1 Ckpolyporus1 Ctenuis2 Cprinceps1 Ctenuis3 Cprinceps2 Casperrimus2 Ctenuis3 Casperrimus1 Ctenuis1 Caleuticus2 Ctenuis2 Caleuticus1 Casperrimus2 Cottus_bairdi Casperrimus1

0.1 0.0040

517

Supplemental Figure 2D (cont’d) Phylogenetic gene trees for each individual locus used to generate overall species trees. Trees on the left represent a Bayesian approach whereas trees on the right represent a maximum likelihood approach.

38

Cmarginatus Cmarginatus Cperplexus2 Casper_CL1 Ctenuis2 Cgulosus_RR1 Ckklamathensis1 Casper_inland1 Ckklamathensis2 Casper_inland2 Ckmacrops1 Cgulosus_RR2 Ckmacrops2 Caleuticus1 Ckpolyporus1 Cgulosus_BAY1 Ckpolyporus2 Cgulosus_BAY3 Ctenuis1 Cgulosus_BAY2 Ctenuis3 Cgulosus1 Cprinceps2 Cgulosus2 Cprinceps1 Casper_BC1 Casperrimus2 Cpitensis3 Casperrimus1 Casper_BC2 Casper_coastal4 Cpitensis4 Casper_coastal3 Cgulosus_BAY4 Casper_coastal1 Cpitensis1 Cgulousus_WA Casper_coastal2 Cperplexus1 Casper_CL2 Cgulosus_RR1 Cpitensis2 Cgulosus_RR2 Cperplexus2 Casper_coastal2 Cgulosus_WA Casper_inland1 Casper_coastal1 Casper_inland2 Casper_coastal4 Casper_BC1 Casper_coastal3 Casper_BC2 Cperplexus1 Cgulosus1 Ctenuis1 Cgulosus2 Ctenuis3 Cpitensis3 Ctenuis2 Cpitensis4 Ckklamathensis1 Cgulosus_BAY3 Ckpolyporus1 Cgulosus_BAY4 Ckmacrops2 Cgulosus_BAY1 Ckpolyporus2 Cgulosus_BAY2 Ckklamathensis2 Casper_CL1 Ckmacrops1 Casper_CL2 Cprinceps2 Cpitensis2 Cprinceps1 Cpitensis1 Casperrimus2 Caleuticus1 Casperrimus1 Cottus_bairdi Cottus_bairdi

0.09 0.0030

518

Supplemental Figure 2D (cont’d) Phylogenetic gene trees for each individual locus used to generate overall species trees. Trees on the left represent a Bayesian approach whereas trees on the right represent a maximum likelihood approach.

39

Cperplexus2 Cottus_bairdi Caleuticus1 Casperrimus2 Casper_coastal1 Casperrimus1 Casper_coastal2 Ctenuis2 Cmarginatus Ctenuis3 Casper_BC1 Ctenuis1 Casper_BC2 Ckklamathensis1 Cgulosus_WA Ckpolyporus1 Cperplexus1 Ckmacrops1 Cprinceps1 Ckmacrops2 Cprinceps2 Ckpolyporus2 Ckklamathensis1 Cprinceps1 Ckmacrops1 Cprinceps2 Ckmacrops2 Caleuticus1 Ckpolyporus1 Casper_BC1 Ckpolyporus2 Casper_BC2 Ctenuis1 Cperplexus2 Ctenuis2 Casper_coastal1 Ctenuis3 Casper_coastal2 Casperrimus2 Cmarginatus Casperrimus1 Casper_coastal4 Cgulosus_RR1 Casper_coastal3 Cgulosus_RR2 Cpitensis1 Casper_CL1 Cpitensis2 Casper_CL2 Casper_inland1 Casper_coastal3 Cgulosus1 Casper_coastal4 Cgulosus_BAY4 Casper_inland1 Cgulosus_BAY3 Cgulosus1 Cgulosus2 Cgulosus2 Cpitensis3 Cpitensis3 Cpitensis4 Cpitensis4 Cgulosus_RR1 Cgulosus_BAY3 Cgulosus_RR2 Cgulosus_BAY4 Casper_CL1 Cpitensis1 Casper_CL2 Cpitensis2 Cgulosus_BAY1 Cgulosus_BAY1 Cgulosus_BAY2 Cgulosus_BAY2 Cgulosus_WA Cottus_bairdi Cperplexus1

0.07 0.0040

520

Supplemental Figure 2D (cont’d) Phylogenetic gene trees for each individual locus used to generate overall species trees. Trees on the left represent a Bayesian approach whereas trees on the right represent a maximum likelihood approach.

40

Cprinceps2 Cgulosus_RR2 Cprinceps1 Cgulosus_RR1 Casperrimus1 Cgulosus_WA Casperrimus2 Casper_CL2 Ctenuis1 Casper_CL1 Ctenuis3 Casper_BC2 Ckklamathensis1 Casper_BC1 Ckklamathensis2 Cmarginatus Ckmacrops1 Cgulosus_BAY2 Ckmacrops2 Cgulosus_BAY1 Ckpolyporus1 Cgulosus_BAY4 Ckpolyporus2 Cgulosus_BAY3 Cpitensis2 Casper_coastal4 Cpitensis1 Casper_inland2 Cpitensis3 Casper_inland1 Cpitensis4 Casper_coastal1 Cgulosus1 Casper_coastal3 Cgulosus2 Casper_coastal2 Casper_coastal1 Cpitensis1 Casper_inland1 Cpitensis2 Casper_inland2 Cpitensis4 Casper_coastal4 Cpitensis3 Casper_coastal2 Cgulosus2 Casper_coastal3 Cgulosus1 Cgulosus_BAY3 Cperplexus2 Cgulosus_BAY4 Cperplexus1 Cgulosus_BAY1 Caleuticus1 Cgulosus_BAY2 Caleuticus2 Casper_BC1 Cprinceps1 Casper_BC2 Cprinceps2 Casper_CL1 Ctenuis3 Casper_CL2 Ctenuis1 Cgulosus_WA Ckpolyporus2 Cmarginatus Ckpolyporus1 Cgulosus_RR1 Ckmacrops2 Cgulosus_RR2 Ckmacrops1 Cperplexus1 Ckklamathensis2 Cperplexus2 Ckklamathensis1 Caleuticus2 Casperrimus2 Caleuticus1 Casperrimus1 Cottus_bairdi Cottus_bairdi

0.02 0.0090

CB

Supplemental Figure 2D (cont’d) Phylogenetic gene trees for each individual locus used to generate overall species trees. Trees on the left represent a Bayesian approach whereas trees on the right represent a maximum likelihood approach.

41

Casper_coastal1 Cottus_bairdi Casper_inland1 Cperplexus1 Casper_inland2 Cgulosus_WA Casper_coastal4 Cperplexus2 Cmarginatus Casper_BC1 Casper_BC1 Casper_CL2 Casper_BC2 Casper_BC2 Casper_CL1 Casper_CL1 Casper_CL2 Cgulosus_RR1 Cprinceps2 Cgulosus_RR2 Cprinceps1 Cgulosus_BAY3 Ctenuis1 Cgulosus_BAY4 Ctenuis2 Cgulosus_BAY1 Ctenuis3 Cgulosus_BAY2 Casperrimus1 Cmarginatus Casperrimus2 Cgulosus1 Ckklamathensis1 Cgulosus2 Ckklamathensis2 Cpitensis3 Ckmacrops1 Cpitensis4 Ckpolyporus1 Cpitensis1 Ckmacrops2 Cpitensis2 Ckpolyporus2 Casper_inland1 Cgulosus_RR1 Casper_inland2 Cgulosus_RR2 Casper_coastal2 Cgulosus_BAY3 Casper_coastal3 Cgulosus_BAY4 Casper_coastal1 Cgulosus_BAY1 Casper_coastal4 Cgulosus_BAY2 Ckmacrops2 Cpitensis1 Ckklamathensis1 Cpitensis2 Ckklamathensis2 Cgulosus1 Ckmacrops1 Cgulosus2 Ckpolyporus1 Cpitensis3 Ckpolyporus2 Cpitensis4 Casperrimus1 Casper_coastal2 Casperrimus2 Casper_coastal3 Ctenuis3 Caleuticus2 Ctenuis1 Caleuticus1 Ctenuis2 Cperplexus1 Cprinceps2 Cgulosus_WA Cprinceps1 Cperplexus2 Caleuticus2 Cottus_bairdi Caleuticus1

0.04 0.02

CR

Supplemental Figure 2D (cont’d) Phylogenetic gene trees for each individual locus used to generate overall species trees. Trees on the left represent a Bayesian approach whereas trees on the right represent a maximum likelihood approach.

Chapter 3: Complex phylogeography and historical hybridization between sister taxa of freshwater sculpin (Cottus).

3.1 Abstract Native fish species diversification over time is often linked to regional differences within the hydrologic landscape. Exploring this temporal diversification often requires a multi-tiered genetic approach. In California, much of the hydrologic landscape is encapsulated in the San Joaquin, Sacramento, and Pit River basins. Two sister taxa of fish, riffle (Cottus gulosus) and Pit (Cottus pitensis) sculpin, inhabit these basins but their , distribution, and phylogeographic histories are unclear. Using a comprehensive geographic sampling and nuclear, mitochondrial, and microsatellite markers, we identified a genetic signature very different from morphological predictions. Coastal populations of C. gulosus were found to be a unique lineage, and “true” C. gulosus were limited to tributaries of the San Joaquin basin in the western Sierra Nevada. Low levels of ancestral hybridization were found in populations inhabiting the Sacramento River basin, while all contemporary individuals exhibit C. pitensis mitochondrial haplotypes. The historical range of C. pitensis appears to have extended from the headwaters of the Pit River to the mouth of the Sacramento, although contemporary populations within the Sacramento River appear to be genetically distinct. Geologic and geographic evidence along with genetic data are consistent with two distinct phylogeographic breaks, one associated with the Coast Range Mountains and the second somewhere around the current Sacramento-San Joaquin Delta. C. pitensis populations show limited genetic structure, likely from high gene flow whereas inland C. gulosus populations show extensive population structure, consistent with isolation of individual tributaries to the San Joaquin River. Results indicate that conservation strategies are needed for these two species especially isolated inland C. gulosus populations in the western Sierra Nevada.

3.2 Introduction How sister taxa established their current geographical distributions requires understanding the historical context wherein divergence occurred. Often, underlying genetic information in contemporary organisms can be linked to geographic, geologic, or ecological conditions which potentially facilitated that divergence (Avise 2000). Conditions can then be inferred from other sympatric species or populations, providing important information on whole regions, both historically and currently (Carlsbeek et al. 2003). To discover this information requires a comprehensive molecular approach, using molecular markers reflective of different evolutionary time frames (Crandall et al. 2000, Hewitt 2001, Schlötterer 2004, Ouborg et al. 2008). If different time frames are ignored, unknown factors such as secondary contact and hybridization may go unnoticed, obscuring the true divergence history (Lu et al. 2001, Brown et al. 2011). Once the historical context is identified, causative links between conditions and current species or population structure is possible. In California, a complex geologic history has combined with extensive historical and contemporary forces to shape species and their population structure (Shaffer et al. 2004, Burns and Barhoum 2006, Starrett and Hedin 2006). Nowhere is this more evident than in the flora (Hickman 1993), as conditions have led to extensive diversification of plants, consistent with California’s designation as a biodiversity hotspot (Brooks et al. 2002, Carlsbeek et al. 2003). Evidence of volcanism (Pease 1965), glaciation (Mulch et al. 2008), marine incursions (Dupré 1991), and mountain formation (Howard 1979) within the last five million years (Dupré et al. 1991) and more recent anthropogenic changes (Leu et al. 2008) have all provided a backdrop on which to study different biogeographic influences. Numerous studies have looked at how forces shaped the distribution of terrestrial organisms (Wake 1997, Lapointe and Rissler 2005, Feldman 42

43

and Spicer 2006, Spinks et al. 2010), yet fewer (King et al. 1996) have looked at species restricted to freshwater environments and how their contemporary phylogeographic distributions were developed. With an extensive array of rivers reflecting both coastal and inland freshwater drainages, California’s riverine system is relatively specialized, as evident from its highly diverse and endemic snail (Hershler 1995), crayfish (Light et al. 2002), and fish fauna (Moyle 2002, McGinnis 2006). Rivers have historically been shaped and reshaped by the ever changing landscape, continuously connecting and isolating systems (Macdonald and Gay 1968, Mount 1995). Extensive contemporary modifications such as dams and water diversions have only further modified the state’s riverine landscape (Moyle and Williams 2005, Kondolf and Batalla 2005). Unlike terrestrial organisms which migrate according to specific landscape variables (Manel et al. 2003, Storfer et al. 2010), freshwater organisms are often restricted by closed riverine or lacustrine networks, unable to migrate unless there is connectivity between systems (Poissant et al. 2005, Hughes et al. 2009). This isolation creates a unique genetic signature, which reflects a history of not just the organism but its geographic location. Genetic differences between populations can also provide information as to when or how connectivity was established or broken, even temporarily, potentially providing a timeline for when different species or populations formed. In certain cases, the recognized distribution of species does not fit a reasonable geographic or geologic scenario. For example the current ranges of riffle (C. gulosus) and Pit sculpin (C. pitensis), two sister taxa of California fish which exhibit a disjunct and parapatric distribution (Page and Burr 2011). C. gulosus inhabits a small coastal region around southern San Jose and also the foothill and high elevation headwaters of most major river drainages within the Sacramento and San Joaquin basins whereas C. pitensis is confined to the Pit River drainage (Figure 1) (Moyle 2002, Page and Burr 2011). Morphologically these two species differ only by C. gulosus usually having a joined dorsal fin, palatine teeth, a single chin pore, and a mouth slightly wider than the body post-pectoral fin (Moyle 2002, Page and Burr 2011). Meristically both species are also very similar, the lone difference being the lower bound of lateral line pores in C. gulosus versus C. pitensis. The distribution of C. gulosus is unusual given its relatively poor re-colonization abilities (Moyle et al. 1996). For example, regions surrounding the Sacramento and San Joaquin Rivers are influenced by contrasting climatic conditions based on elevation in bordering Sierra Nevada Mountains (Knowles and Cayan 2002). These differences led to variable glacial influences during the mid-Pleistocene (~1.5-1 mya) (Gillespie et al. 2004, Mulch et al. 2008). The San Joaquin basin was simultaneously influenced by Lake Turlock (Dupré et al. 1991), a shallow inland lake formed during the early to mid-Pleistocene (Frink and Kues 1954). Additionally, the confinement of C. pitensis to the Pit River is of note because no current geographic barrier exists between the lower Pit and Sacramento Rivers. However geologic evidence shows both the upper and lower Pit Rivers were repeatedly isolated by tectonic and volcanic activity in the Pliocene and Pleistocene, resulting in the formation of large lakes and alterations in river position (Pease 1965, Macdonald and Gay 1968, Dupras 1999). The most recent isolation and reestablishment of the connection between the Sacramento and the lower Pit River above its confluence with the Sacramento apparently occurred around the Plio-Pleistocene boundary (ca. 10.8 mya) and mid- Pleistocene, respectively, with the formation and subsequent breaching of ancestral Lake Britton (Dupras 1999). Based on geographic events, strong evolutionary forces are in play which has led to extensive speciation, population-level differences, or hybridization in these species.

44

Disentangling the discordance between proposed morphological species ranges and large scale geographic events requires a thorough genetic analysis. By partitioning molecular markers to specific questions related to speciation (nuclear or mitochondrial sequence) or population structure and ongoing hybridization (mitochondrial sequence and microsatellites), comparisons to historical or contemporary geologic or geographic factors can be accurately estimated (Hewitt 2001). Once these correlations have been ascertained, broader comparative studies can be initiated which test whether similar biogeographic factors are reflected in all sympatric species. Our study uses a comprehensive multi-tier genetic approach to identify putative genetic differences between C. gulosus and C. pitensis individuals across their species ranges in California. Using nuclear and mitochondrial sequence markers along with microsatellites, we look to test the following hypotheses: 1) Does the proposed divergence between C. gulosus and C. pitensis correspond with a phylogeographic break? 2) Is there evidence of hybridization between species in the Sacramento River? 3) What is the directionality of gene flow between and within species? 4) Is a phylogeographic break present between the Pit and Sacramento River? and 5) Do species show levels of population structure attributable to specific geographic/ecological conditions? If correlations can be drawn between these two species and specific geographic events, projections can be made for past or future events, such as climate changes, which will influence aquatic species across the state (Moyle et al. 2013).

3.3 Materials and Methods Sampling Using backpack electroshocking and seine nets, one to 50 specimens were collected at nine locations for C. pitensis and 24 locations for C. gulosus, including five coastal locations perceived to be a new lineage of Cottus gulosus (see Baumsteiger et al. 2012) (Table 1 and Figure 1). Ventral caudal fin clips and whole individuals were collected and stored in 100% non- denatured ethanol prior to DNA extraction, with whole individuals vouchered into the Humboldt State University Fish Collection. DNA was extracted from fin clips using the DNeasy tissue extraction kit (Qiagen Inc.), following manufacturer’s protocols. DNA was quantified on a NanoDrop 2000 spectrophotometer (Thermo Scientific) resulting in concentrations from 25 to 150 ng/µl. Sequencing marker development Three nuclear sequence markers (508, 517, and 520) and one mitochondrial marker (cytochrome b – cytb) were selected from Baumsteiger et al. (2012). Polymerase chain reactions (PCRs) were performed in 30 µl volumes with the following components: 1X PCR buffer (10 mM Tris HCl [pH 8.3] and 50 mM KCl), 1.5 mM MgCl2, 0.5 units of Taq DNA Polymerase (Applied Biosystems Inc.), 20 mM of each dNTP, 5 mg BSA, 0.67 mM of each primer, and 2μl of DNA. Amplifications were run at 94°C for 5 min., followed by 30-35 cycles of 94°C for 1 min., 48°C for 1 min., and 72°C for 1 min., and a final 72°C for 5 min. for the cytochrome b fragment. Conditions for nuclear markers differed by having a 30 sec denaturation, 52°C annealing, and a 1.5 min. extension. Three individuals were chosen from each sampling location for each nuclear locus while eight individuals were chosen for cytb. Amplified products were cleaned with ExoSap-IT (USB Inc.) following manufacturers’ protocols and sent to the University of California, Berkeley’s DNA Sequencing Facility for analysis on an ABI 3730XL Sequencing Analyzer. Fragments were sequenced bi-directionally. Sequences were imported into SEQUENCHER (Gene Codes Corp.) or MEGA5 (Tamura et al. 2010), visually inspected for quality, and both directions used to generate a consensus sequence for each individual. Gene

45

specific alignments were performed with MUSCLE (Edgar 2004) and all sequences deposited in GenBank (accession numbers: XXXXX-XXXXX). Individual loci (nuclear and mitochondrial) were tested for adherence to a molecular clock using the likelihood scores function (LSCORES) in PAUP* (Swofford 2002). Additionally, we examined each nuclear locus for recombination using GARD (Genetic Algorithm for Recombination Detection), found on the online server DATAMONKEY (www.datamonkey.com) (Pond et al. 2006). This method tests for incongruence within sequences at each locus and applies statistical support for recombinants via the Akaike Inference Criterion (AIC). Unique haplotypes for mitochondrial sequences were obtained with DNASP (Rozas et al. 2003, Librado and Rozas 2009) as were simple measures of sequence diversity. Nuclear sequence haplotypes were identified using PHASE (Stephens et al. 2001, Stephens and Scheet 2005) and later confirmed with DNASP. Nucleotide substitution testing was conducted with JMODELTEST v0.1.1 (Posada 2008) using the corrected (AICc) delta score (Posada and Buckley 2004). Haplotype networks were developed for each locus with TCS V1.21 (Clement et al. 2000). Numbers of segregating sites (S), number of haplotypes (H), haplotype diversity (Hd), and nucleotide diversity (π), were generated with DNASP (Table 2). Phylogenetics and Coalescence To identify putative species structure within our sampling locations, individual gene trees for all four markers (nuclear and mitochondrial) were generated using a maximum likelihood approach in PHYML (Guindon and Gascuel 2003). A single mottled sculpin (Cottus bairdi) or coastrange sculpin (C. aleuticus) served as an outgroup (Kinziger et al. 2005). Results derived from JMODELTEST, including model, across site rate variation, and proportion of invariable sites were used in each tree estimate. The best tree topology was searched over two heuristic algorithms, NNI (Nearest Neighbor Interchange) and SPR (Subtree Pruning and Regrafting). Node support was assessed using 1000 bootstrap replicates. A more comprehensive species tree approach, employing information from just the nuclear markers was performed in *BEAST (Drummond and Rambaut 2007). This program uses Bayesian Markov Chain Monte Carlo (MCMC) methods and partitions loci according to their specific nucleotide substitution model. Consistent with species tree analysis in Baumsteiger et al. (2012), a relaxed lognormal molecular clock was employed. Runs consisted of 100 million iterations sampled every 10,000th iteration, resulting in 7,500 trees after removing the first 25% as burn-in. TreeAnnotator, a subsidiary program in *BEAST, was used to obtain a consensus tree representing the most consistent topology overall all trees. Estimates of divergence times were generated based upon a standardized molecular clock substitution rate as lack of fossil evidence precluded molecular clock calibration. Nuclear sequences between species exhibited little polymorphic variation in which to use a coalescence approach. Therefore we estimated divergences based on the standard molecular clock for mtDNA (~1-2 million years per substitution) and restricted this analysis to only the mtDNA sequence data. A conservative estimate of mutation rates consistent with these times was calculated at 0.9% per million years (0.5% - 1.3%) (Bermingham et al. 1997, Mueller 2006, Simon et al. 2006). Estimation of time of divergence for C. gulosus and C. pitensis were obtained with BEAST V.1.6 (Drummond and Rambaut 2007) beginning with assumptions of a constant population size, strict molecular clock, and using uniform priors. The final Bayesian distribution was generated using 100 million iterations sampled every 10,000th iteration. Different priors, including the calculated upper and lower bounds for the molecular clock, were then modified until values of effective samples sizes (ESS) and other measures were within acceptable ranges,

46

as shown in TRACER (a sub-program within BEAST). A burn-in of 25% was employed to insure convergence of the MCMC chains Identification of potential population expansion over time was obtained through effective population size estimates in BEAST. Bayesian skyline plots were generated using similar information and approaches as those used to identify coalescence times. Twenty five million iterations were sampled with a 25% burn-in. Skyline plots were developed from datasets for each species or unique lineage (C. gulosus, C. pitensis, and C. gulosus_coastal). Gene flow between species To examine the possibility of historical versus contemporary gene flow between inland C. gulosus and C. pitensis, we used the isolation with migration approach implemented in the program IMA2 (Hey and Nielsen 2007). Using only nuclear DNA sequences, we estimated the species break to be consistent with that seen in the mitochondrial DNA (see below). Three independent runs were performed; each consisting of a burn-in period of 250,000 steps followed by 2.5 x 106 steps sampled every 100 steps (25,000 total sampled steps). Runs gave highly similar results and were combined to estimate migration parameters. A conservative estimate of 2.2 × 10−9 substitutions/site/year (Kumar & Subramanian 2002) was multiplied by the number of base pairs in each locus to give a mean estimate across loci of 9.79 x 10-7 (substitutions/site). A generation time of three years was assumed based on life-history data from other closely related cottids (Moyle 2002). Microsatellite markers Seven microsatellite loci were chosen for this study based on previous work in closely related prickly sculpin (Cottus asper) (Baumsteiger and Aguilar 2013). Markers were run for sampling locations with at least 15 individuals (Table 1). Polymerase chain reaction (PCR) conditions followed those of Baumsteiger and Aguilar (2013) and used the M13 protocol of Schuelke (2000). All loci were amplified in 10 µl reactions (2 µl of genomic DNA) with a multiplex kit (Qiagen Inc.) or standard PCR mix of 10X PCR buffer (10 mM Tris-HCl, 50mM KCl), 1.5 mM MgCl2, 10mg/ml Bovine Serum Albumin, 0.25 mM of each dNTP, and 2U of Amplitaq DNA Polymerase (ABI). Multiplex PCR conditions were 95°C for 15 min., 20 cycles of 95°C for 30 sec., primer specific annealing temperature for 30 sec., and 72°C for 45 sec. followed by 15 cycles at a fixed 48°C annealing temperature and two holds of 60° C for 30 min. and 12° for infinity. Non-multiplex PCRs used identical conditions except for an initial hold of 5 min. and final extension of 72°C for 7 min. Locus specific PCR products were combined and run on an ABI 3130XL Fragment Analyzer (Life Technologies Inc.). Fragment analysis was carried out using ABI GENEMAPPER software and by eye using the Liz500 size standard. A select number of individuals were rerun to insure allele calls and runs were consistent and had minimal error. Exported allelic tables of individual multi-locus genotypes from GENEMAPPER were reformatted for downstream analyses using CONVERT software (Glaubitz 2004). Initial descriptive statistics including numbers of alleles, allelic richness, observed and expected heterozygosity (HO and HE), inbreeding (FIS), conformance to Hardy-Weinberg Equilibrium, and linkage disequilibrium were all generated through the programs GDA (Lewis and Zaykin 2001) and ARLEQUIN V3.5 (Excoffier et al. 2005, Excoffier and Lischer 2010). Allelic richness was standardized according to sample size using HP-RARE (Kalinowski 2005). In cases where multiple test adjustments are necessary, will use the correction suggested in Narum (2006). GENETIX V4.05 (Belkhir et al. 2004) was used to assess population level differences between sampling locations using pairwise FST estimates (significance compared over 100 permutations) and ARLEQUIN for an analysis of molecular variance (AMOVA). An additional population level analysis using a Bayesian MCMC clustering approach was employed in STRUCTURE (Pritchard et

47

al. 2000). Samples were run with an initial burn-in of 100,000 followed by 100,000 runs using an admixture model, with allele frequencies assumed to be correlated, K (clusters) ranging from 1- 25, and 10 iterations per K. The number of genetically distinct clusters represented in the data were determined using the method of Evanno et al. (2005) or by plotting log likelihood values versus K to assess where the curve begins to plateau using STRUCTURE HARVESTER (Earl 2011). To average over the number of iterations of the chosen K, the program CLUMPP (Jakobsson and Rosenberg 2007) was run using the Greedy algorithm with 1000 replicates. Additional K values were also tested to insure the best K was obtained. Final visual presentation of clustering was prepared in DISTRUCT (Rosenberg 2007). Cavalli-Sforza Edwards chord distances were estimated with PHYLIP (v3.69 – Felsenstein 2005). We removed locus 318 due to the fact data was missing from some locations. A consensus unrooted Neighbor joining tree with 1,000 bootstrap replicates was also performed with the PHYLIP package. Trees were visualized with FIGTREE. Gene flow between locations Isolation-by-distance (IBD), a relationship between genetic and geographic distance, was calculated for each sampled location. Conformance to a strict model of IBD, with intercept zero, is indicative of equilibrium between migration and genetic drift (Hutchinson and Templeton 1999). Approximate geographic distance between each location was estimated in kilometers using Google Earth following riverine connectivity. In two locations, the Kings River and the Kaweah River, connectivity with the San Joaquin River is no longer available due to agricultural diversions, making connectivity with these locations largely based on river topology and historical accounts. Genetic distances were obtained from pairwise FST estimates in GENETIX v4.5. A Mantel test was performed on each pairwise comparison to statistically assess the correlation between genetic and geographic distance using the program IBD (Bohonak 2002).

3.4 Results A total of 873 individuals were collected, representing 655 putative C. gulosus and 217 C. pitensis over 24 and nine locations, respectively (Figure 3.1A and Table 3.1). For the nuclear and mitochondrial sequencing, only a subset of individuals was assayed (Table 3.2). One to four individuals were successfully sequenced for each nDNA marker. Mitochondrial sequencing averaged about eight individuals (range 2-13) and included all locations minus Bird Creek and the lower Tuolumne and American Rivers (Table 3.2). Sequence substitution model testing for each nDNA loci (508,517,520) was consistent with the Hasegawa, Kishino, Yano model (HKY – Hasegawa et al. 1985), whereas the cyt b sequences were best represented by the Tamura-Nei model (Trn + G – Tamura and Nei 1993). No recombination was evident for any of the three nuclear loci. Descriptive statistics for the nDNA sequences exhibited low nucleotide and haplotype diversity, with most locations containing a single haplotype (Table 2). One location, Guadalupe Creek, showed elevated levels of nucleotide substitutions in the 520 locus, but not other loci. The highest levels of sequence variation consistently occurred in the Sacramento River locations, with up to three different nuclear haplotypes at the Red Bluff diversion dam. Identical patterns were evident in mitochondrial haplotypes, again with highest levels of variation from Sacramento River locations, including the Feather River (Table 3.2). The Pit River collection sites exhibited low nucleotide and haplotype diversity (with the exception of Clark Creek). nDNA sequences Species tree analysis using only nuclear DNA resolved three primary clades (Figure 3.2). A coastal C. gulosus clade, including small isolated coastal rivers (Russian, Guadalupe, and

48

Uvas), was found to be basal to all other locations, although branch support was only moderate (posterior probability 0.845). Locations from the Pit River basin form a well-supported (0.906) monophyletic C. pitensis clade, though no additional clades are supported within the basin itself. Finally, an inland C. gulosus clade representing locations south of the American River (see Figure 3.1A) was discovered with moderate to low support (0.814). Locations in the Sacramento basin, plotted in a phylogenetic intermediate position between the C. pitensis and inland C. gulosus clades, do not assign to any specific clade, presumably a consequence of the slower evolving nDNA. These latter locations are situated geographically intermediate between locations assigned to either C. pitensis or inland C. gulosus clades. Using the mtDNA species break associated with the mouth of the Feather River (see below - Figure 3.1C and Table S3A), nuclear haplotypes were assigned to prospective species. Haplotype networks generated from unique nDNA sequences revealed similar patterns to the species tree (Figure S1), although some species specific haplotypes are found in the same location (Figure 1B and Figure S1). Locations from the Sacramento River basin, Feather, and American River all show haplotypes from both species and are usually the intermediate haplotypes between species in each haplotype network. For example, haplotypes 508H7-9, 517H6, 517H8, 517H11, 517H13-15, and 520H4, 520H12 are all found in the greater Sacramento River basin (Figure S3A and Table S3A). The three independent IMA2 runs converged on the same parameter estimates. Here we focus exclusively on the estimate of asymmetric migration rates, since our goal was to assess migration rates between the two species. Results from these analyses indicate low levels of historical migration occurred from inland C. gulosus to C. pitensis (2NmRiffle > 2NmPit = 0.3231; 95% HPD = 0.001 – 0.0884) while levels of migration in the alternate direction were not different from zero (2NmPit > 2NmRiffle = 0.0018; 95% HPD = 0 – 0.0595). Based on likelihood ratio tests, the population level migration rates from inland C. gulosus to C. pitensis were statistically significant (LLR = 4.32; p < 0.05) and those from C. pitensis to inland C. gulosus were not significant (LLR = 0.0 p = n.s.). mtDNA sequences Analysis of mtDNA resolved the same three major clades identified using nDNA including coastal C. gulosus, inland C. gulosus, and C. pitensis (Figure 3.1C). Individual species or lineages were distinct when mitochondrial haplotypes were used, with sufficient differences between sequences to warrant individual haplotype networks (> 15 changes per clade) (Figure 3.4). This can also be seen in a simple sequence diversity measure between clades in DNASP, with differences between coastal C. gulosus and C. pitensis around 4%, coastal C. gulosus and inland C. gulosus around 3%, and finally inland C. gulosus and C. pitensis around 2%. The geographic distribution of inland C. gulosus, and C. pitensis resolved using mtDNA was different than with nDNA. In the mtDNA analysis, locations north of the Feather River contained C. pitensis haplotypes while locations south contained C. gulosus haplotypes (Figure 3.1C). In the nDNA locations within the Sacramento River basin could not be clearly assigned to either C. pitensis or inland C. gulosus. Mitochondrial haplotype diversity was higher than in nDNA sequences, resulting in increased numbers of mtDNA haplotypes (Table 3.2). Coastal C. gulosus showed elevated levels of mtDNA diversity, with nine haplotypes distributed over four locations. Further diversity was apparent in the inland C. gulosus locations, each with its own distinct haplotype (Table S3B and Figure 3.3). This is dissimilar to the C.pitensis locations which contain a single highly frequent haplotype (CBH4) (Figure 3.3).

49

Bayesian skyline plots were developed to ascertain changes in effective population size through time. Using strictly mtDNA and separating locations by species/lineage, as above, plots of changes to log transformed effective population size over time were obtained (Figure S3B). Locations of coastal C. gulosus showed no change over time nor did locations associated with C. pitensis. Inland C. gulosus showed a small, constant increase over time to contemporary population sizes. Estimation of time of divergence between inland C. gulosus and C. pitensis proved to be difficult. Using a generalized mitochondrial clock and assuming a coalescent based on constant population size, BEAST estimated divergence at 3.29 m.y.a (Figure 3.4). The 95% confidence interval around this estimate was large, ranging from 1.46 m.y.a. to 6.32 m.y.a. However, this range falls within the estimated date of capture of the Pit River by the Sacramento River (≈ 2 m.y.a. or between the Pliocene/Pleistocene epochs). The remaining phylogenetic tree contains a number of well supported branches and is similar to species tree and haplotype predictions, with one exception (Figure 3.4). Coastal C. gulosus were suspected of possible mitochondrial introgression by prickly sculpin (C. asper), a sympatric species found throughout most of California (Baumsteiger et al. 2012). Our mtDNA tree is consistent with this hypothesis, showing strong branch support when including C. asper with coastal C. gulosus locations. Microsatellites The average number of individuals genotyped per location was 26 (range: 10-43 - Table 3.2). Allelic richness or mean number of alleles per locus was low, especially after correcting for sample size (rarefaction), with most locations around 1.25. Some exceptions were noted in the Feather and Merced River, where values were as high as 6 (Table 3.2). Most locations conform to Hardy-Weinberg Equilibrium (p > 0.05), with the exception of Bird Creek and the Kaweah River. Observed heterozygosity values tended to be low, with no location showing an observed heterozygosity value higher than 0.465 (Table 3.2). A review of locus specific alleles showed high levels of fixation to particular locations, especially in locations recognized as coastal or inland C. gulosus. FIS values were also extremely low, suggesting most locations are not exhibiting inbreeding constraints. Exceptions were the Kaweah River (0.673) and Bird Creek (0.829), two populations out of Hardy Weinberg equilibrium (Table 3.2). Bayesian clustering analyses with STRUCTURE suggested a conservative estimate of K = 8 using the ad hoc approach of Evanno et al. (2005), though a value of six had some support as well (data not shown). A secondary test for the appropriate K, plotting log likelihood values versus K, gives a value of K = 9 (Figure S3B). Plots of both K = 8 and K = 9 (Figure 3.5) show four distinct color coded groups representing three previously identified clades (coastal and inland C. gulosus, C. pitensis) and a new population restricted to the Sacramento River (Figure 3.1D and Figure 3.5). Variable amounts of population sub-structure were also found within each of these groups. Within the Sacramento River, locations are subdivided into two additional clusters, those located above Shasta Reservoir (10-12) and those below (13-16) (as numbered in Table 3.1). Inland C. gulosus locations exhibited substantial genetic sub-structuring, with four additional clusters consisting of: A –Feather and lower American, B – Stanislaus, upper Tuolumne, and Merced, C – SF and NMF American, Mokelumne, lower Tuolumne, and Kaweah, and D – upper and lower Kings. Geographically these clusters share a relationship, with A representing the northern most locations, B representing rivers which empty in the Sacramento/San Joaquin delta (minus the Kaweah), C representing tributaries to the San Joaquin River, and D being the now isolated Kings River. When K = 9 is used, additional clustering is found in C. pitensis, separating locations 1-3 from 4-9 (Figure 3.6). Geographically this coincides with the presence of Pit River Falls, a suspected geographic break in other species (A.P. Kinziger – unpubl. data).

50

An unrooted neighbor-joining tree using Cavalli-Sforza Edwards (CSE) chord distances found results similar to clustering in STRUCTURE (Figure 3.6). Three major groups are observable: a Pit River group, a Sacramento group and a San Joaquin group. With the exception of Sacramento_Dunsmuir, a geneal trend of increasing branch lengths is discoved moving from the northnermost locations to the southernmost locations. Long branches were found for most locations within the San Joaquin basin, showing their distinctiveness compared to Sacramento and Pit River locations with much shorter branching. Structuring was also found for locations above and below Pit River Falls and above and below Shasta Dam. Assuming four genetically distinct clusters (coastal and inland C. gulosus, C. pitensis, Sacramento), analysis of molecular variance (AMOVA) revealed 47% of the variance assigned between populations, 26% between individuals among populations, and 27% among individuals (p < 0.001) (Table S3D). If 15 clusters are used (the maximum structure seen from plotting K – Figure S3C), the variance shifts to 60% between populations, 10% between individuals among populations, and 30% among individuals (p < 0.001). The redistribution of variance between individuals among populations in the different clustering groups suggests a great deal of sub- structuring is evident below the species/lineage level, similar to Bayesian clustering in STRUCTURE. In pairwise comparisons of location/population differentiation (Fst) most locations significantly diverged in allele frequencies from one another in the inland C. gulosus cluster, some significant differences in the Sacramento cluster, and few significant differences between locations in the C. pitensis cluster (p < 0.001) (with the exception of Bear Creek) (Table S3C). Results confirm significant population structure within inland C. gulosus (isolated locations with little gene flow), moderate population structuring between Sacramento River locations (reduced gene flow), and limited population structure within C. pitensis (connected locations with high gene flow). Isolation by distance (IBD) was tested in C. pitensis, Sacramento, and inland C. gulosus (Table S3C). Coastal C. gulosus were collected from a limited number of locations and therefore excluded from IBD analysis. Comparing Fst versus distance in kilometers for each cluster with a Mantel test revealed no isolation by distance for C. pitensis (r = 0.37, p = 0.1), but significant IBD for the Sacramento (r = 0.466, p = 0.026) and inland C. gulosus (r = 0.4689, p = 0.00004) cluster. C. pitensis was found to exhibit significant IBD when samples from Bear Creek, a tributary to the Fall River, were removed (r = 0.76, p = 0.0001). A corrected Fst (Fst/1 – Fst) also gave similar results in all tests, including the removal of Bear Creek.

3.5 Discussion California’s complex geologic history apparently shaped multiple within and between species divergences for C. pitensis and C. gulosus, leading to novel phylogeographic breaks, cryptic lineages and population-level differences. First, was the continued support for a new lineage of sculpin, coastal C. gulosus, in the San Jose/Monterey Bay area (see Baumsteiger et al. 2012); isolated prior to the divergence between C. pitensis and Central Valley C. gulosus. Second, we discovered the original range of C. pitensis once extended into the Sacramento River basin, with individuals experiencing low levels of historical hybridization with inland C. gulosus; potentially with inland C. gulosus males mating with C. pitensis females. This ancestral hybrid zone appears confined to the Sacramento River basin and its tributaries (excluding the Pit River), with evidence for a geographic break around the Feather River. Interestingly, although hybridization does not appear to be ongoing, contemporary population structure identifies this region with distinctive clustering on par with coastal C. gulosus, C. pitensis, or inland C. gulosus.

51

The remaining genetic data largely support geographic and ecological differences between species which led to high levels of gene flow and little population structuring in C. pitensis, whereas inland C. gulosus showed little gene flow and high levels of population structuring. Taken as a whole, these two freshwater sculpin species reveal important components in the history of riverine systems within the region and could serve as organisms to monitor for future changes associated with climate. Novel lineage of coastal C. gulosus A primary phylogeographic break in our study area is the divergence of the coastal C. gulosus. Although never subject to a thorough morphological evaluation, coastal C. gulosus are reciprocally monophyletic and highly diverged in all of our analyses presented here and elsewhere (Baumsteiger et al. 2012); therefore we strongly believe this lineage represents a new species of sculpin. The divergence estimate in our coalescence tree (3.2 – 10 m.y.a.) coincides nicely with uplifting in the coastal mountain range during the late Miocene / early Pliocene (4 – 10 m.y.a.), separating habitats in the Great Central Valley from the coast (Howard 1979). Bayesian skyline plots show no sign of expansion, further confirming the isolation of these coastal locations. Interestingly, we found evidence for mitochondrial introgression between this lineage and the sympatric species C. asper. Cottus asper is generally a coastal species, often reliant on estuaries for natal rearing, though evidence for different inland radiations is available (Krejsa 1965, Baumsteiger et al. 2012). Whether this novel lineage evolved sympatrically with C. asper or is simply an example of secondary contact with C. asper is unknown. Divergence of inland C. gulosus and C. pitensis Unlike current morphological characterizations which confine C. pitensis to the Pit River basin, we believe the original divergence between species was associated with a phylogeographic break occurring somewhere around the confluence of Feather and Sacramento Rivers. This break is consistent with a known geologic break coinciding with the northernmost extent of the Sierra Nevada (Durrell 1987). According to mtDNA, this divergence (~1.4 – 6 m.y.a.) led to C. pitensis development in the Sacramento River basin (including the Pit River) and inland C. gulosus in the San Joaquin River basin. Geologically, both basins were experiencing decidedly different conditions, which would account for potential speciation (Minckley et al. 1986, Durrell 1987, Wagner et al. 1997). The Sacramento and Pit River basins were undergoing extensive volcanic and tectonic activity during much of the Plio-Pleistocene, including apparent isolations and reconnections associated with ancestral Lake Britton around the end of the Pliocene (Pease 1965, Dupras 1999). Higher levels of nucleotide diversity, no shared haplotypes with inland C. gulosus in the 508 and 520 nuclear loci, no evidence of gene flow from Pit to Sacramento Rivers, along with higher mtDNA variation in a number of Sacramento River locations all support an ancestral C. pitensis population within the Sacramento River. It is our hypothesis that these early C. pitensis within the Sacramento River invaded the lower Pit River and were temporarily isolated during the end of the Pliocene, creating some genetic divergence in the lower Pit River. Individuals in the lower Pit River, even after re-establishing connectivity with the Sacramento River, then invaded upper Pit and Goose Lake drainages as fault block lakes breached, thereby establishing the contemporary range of the species (Pease 1965). This is consistent with the invasion of other elements of the Sacramento fish fauna (Moyle 2002). As no published data support connectivity between Pit and Sacramento Rivers prior to the Pliocene, it seems unlikely C. pitensis were in the Pit River prior to its confluence with the Sacramento River. The San Joaquin River basin was decidedly different than Sacramento and Pit River systems historically, having been influenced by an inland sea for the last 16 million years

52

(Howard 1979, Dupré 1990). At some point in the Pliocene (1.8 – 5.3 m.y.a.), connections to the sea were cut off by uplifting in the coastal mountain range (Howard 1979). For the next three million years, the San Joaquin River basin slowly filled with freshwater, leading to the shallow Lake Turlock around the mid-Pleistocene (~1.5 m.y.a.) (Frink and Kues 1954, Dupré et al. 1991). Confined to the San Joaquin basin, the lake stretched from modern day Kern County to the town of Tracy and is believed to have been relatively long lived (720K-600K years), eventually draining south into Tulare Lake (Frink and Kues 1954, Dupré 1990). Concurrent with lake formation was substantial uplifting in the Sierra Nevada during the Pliocene, especially in the southern portion of the range (Figueroa and Knott 2010). Uplifting events brought new ecological conditions to eastern portions of the San Joaquin River basin, creating different conditions than those in the Great Central Valley (Beesley 2007). Events leading to the formation of Lake Turlock, along with changes in the Sacramento River basin, could easily have isolated ancestral C. gulosus/C. pitensis into their prospective regions, leading to speciation. The presence of Turlock Lake may have accounted for the low levels of hybridization seen in our nDNA. If ancestral C. gulosus were able to migrate north through the lake or were driven out of the Great Central Valley by its formation (as a strictly riverine species), invasion of the Sacramento River may have been possible, creating secondary contact between species. IMA2 results support this conclusion, showing evidence of gene flow in nDNA from inland C. gulosus into C. pitensis when the proposed phylogeographic break is used. Migration of inland C. gulosus northward could also explain the presence of inland C. gulosus mtDNA in the Feather and American Rivers, both tributaries to the Sacramento River. The Feather River basin (and American River basin to some extent) represents the northern-most regions of the true Sierra Nevada, with the Diamond Mountains (from the Great Basin) just northeast and the geologically younger volcanic plateau just northwest of the Feather basin (Durrell 1987). Thus both the Feather and American River basins are more geologically similar to the San Joaquin basin where inland C. gulosus presumably evolved than volcanic regions where C. pitensis evolved. If invading inland C. gulosus were better adapted for these conditions (e.g. strongly seasonal flows), natural selection may have driven populations toward inland C. gulosus genotypes, including mtDNA, creating modern day distributions. However, additional studies are needed to confirm whether geologic differences equate to ecological or geographic variances in habitat. Hybridization in the Sacramento River basin Identification of hybridization is often contentious (Allendorf et al. 2001). One competing hypothesis is always incomplete lineage sorting (ILS), where newly formed species have yet to evolve distinctiveness in certain genes (Carstens and Knowles 2007 and citations within). However we feel ILS is less likely in this case, because mixed species haplotypes were restricted to the Sacramento River, signifying each species sufficiently evolved from its last common ancestor in locations outside of the Sacramento River. If ILS were present, some individuals in locations outside the Sacramento River would have shown mixed ancestry. We recognize the possibility of a larger population size within the Sacramento River may sort slower than other regions or locations, yet little data from our summary statistics support this claim. A number of additional analyses substantiate low levels of historical hybridization including polyphyletic individuals in the species tree, shared species nuclear haplotypes, an intermediate geographic location between pure species, and IMA2 data showing migration from inland C. gulosus into C. pitensis in the Sacramento River. Hybridization appears to be unidirectional, as no inland C. gulosus mtDNA haplotypes were found in the Sacramento River basin. Because C. pitensis are suspected in the Sacramento River prior to secondary contact and the mitochondrial genome is inherited maternally (Dawid and Blackler 1972), we believe matings

53

were between inland C. gulosus males and C. pitensis females. However, we are unaware of any sexual or selective pressure which would dictate this uni-directional mating or fixation of C. pitensis haplotypes. Finally, a review of the morphological literature shows discrepancies exist in the distribution of the two species in regards to the Sacramento River, with Moyle (2002) placing inland C. gulosus in stretches above Shasta Reservoir, while Bailey and Bond (1963) and Page and Burr (2011) propose C. pitensis may also be found above the reservoir. Likely biologists are finding mixed morphological characters and meristic measurements indicative of both species. Unique contemporary genetic structuring of locations within the Sacramento River basin, as confirmed by microsatellite data, gives rise to an unusual scenario: Are individuals in this region representative of a hybrid taxon? Widely accepted among plant taxonomists but more heavily debated amongst animal taxa, speciation by way of hybridization is often found in nature (Dowling and Secor 1997, Seehausen 2004). The premise usually revolves around hybrids being better adapted for a novel environment, one not currently occupied by either parental species. Freshwater sculpin have two suspected cases of this occurring. The first was in Lake Baikal, where Kontula et al. (2003) and Hunt et al. (1997) potentially found two sculpin species hybridized prior to invading a new deep water pelagic habitat, later undergoing extensive adaptive radiations leading to speciation. The second was by Nolte et al. (2005) who found two old phylogenetic lineages of Cottus gobio were able to undergo hybridization and invade and colonize a novel warm downstream section of the Rhine River in Germany (also see Stemshorn et al. 2011). In both cases, hybrids produced were fitter for a particular habitat and able to successfully survive and establish population or species-level structure over a relatively short evolutionary time period (1 - 4 million years). However our genetic data does not support speciation at present in the Sacramento River, merely distinct population structure. Population-level differentiation Pit River basin Cottus pitensis confined to the Pit River exhibited high gene flow, little population structure, and strong evidence of a contemporary phylogeographic break at the mouth of the Pit River. Bayesian skyline plots show no increase in effective population size (no expansion) and IMA2 found no evidence for contemporary gene flow into the Sacramento River. A strong signal of IBD (when Bear Creek individuals are removed) supports a migration-drift equilibrium. Bear Creek, a distant tributary to the Fall River, is a low running trout stream which recently underwent a significant stream restoration effort (Hammersmark et al. 2008). However, quantifying its significance to IBD could not be determined with current information. Haplotype networks (mtDNA) and Bayesian cluster analyses show little population structuring by location, with a single common haplotype found throughout much of the range; suggestive of individual movement within the basin despite apparent IBD. Yet despite intra-basin movement, data support a recent phylogeographic break between Pit and Sacramento Rivers. Periodic Plio- Pleistocene isolation of the Pit River due to localized basalt flows and lake formation (Pease 1965) may explain a proposed historical break, but not a contemporary one. Because contemporary freshwater sculpin are relatively poor swimmers (Utzinger et al. 1998), slow continuous increases in stream flow caused by uplifting in the new headwater region of the Pit River may prevent expansion from the Sacramento River, but this remains untested. Distributions of endemic fish species of tule perch (Baltz and Loudenslager 1984), roach (Aguilar and Jones 2009) and sucker (Reid 2007, Smith et al. 2011) in the Pit River (with ancestry from the Sacramento basin) contradict this idea of isolation by flow strength, as all show superior swimming abilities to sculpin. On a more recent time frame, the break could be explained by the presence of Shasta Dam, creating Shasta Reservoir in 1944 (Fisher 1994). If C. pitensis are

54

strictly riverine, as appears to be the case, the dam and subsequent reservoir may serve as a barrier to contemporary gene flow, though this time frame is extremely recent (~70 years) . Incorporation of an additional level of clustering (K = 9) in STRUCTURE and the unrooted CSE tree provides a novel phylogeographic break in the Pit River. Systems upstream of Pit River Falls (Bear, Ash, South Fork, and Lassen Creek) are found to cluster together separately from all remaining locations downstream. Geographic analysis of the region indicates Pit River Falls lies just downstream of the mouth of Fall River, a formidable barrier to upstream sculpin migration. Whether the falls represent a true barrier to upstream fish passage is unknown, but some species such as tule perch (Hysterocarpus traski) are only found below the falls (Moyle 2002). Future comparative analysis across aquatic species that span Pit River Falls should provide insight into whether historical and ecological factors limit gene flow across this apparent dispersal barrier. Sacramento River basin Although physically connected to pure populations of both inland C.gulosus and C. pitensis, locations sampled within the Sacramento River basin appear to support a discrete population, showing no signs of contemporary inter-mating. STRUCTURE, the CSE tree, and AMOVA analyses support this, whereas nDNA and mtDNA do not, suggesting isolation is recent (Holocene). And though we see statistical evidence for IBD, r values are smaller than those seen in the Pit River, implying the Sacramento River locations exhibit greater variation. It is not surprising isolation would lead to a distinct population in the Sacramento River, given its geologic and ecological differences from either the Pit River or rivers flowing out of the southern Sierra Nevada (Howard 1979, Durrell 1987, Dupré 1991). The Sacramento River is the largest, in terms of width, length, and flow, of any river inhabited by C. pitensis or C. gulosus (Moyle 2002). Ecologically it is unlike the spring-fed systems of the Pit River (Daniels and Moyle 1987, Brown 1989) or the cold headwater river regions of the southern Sierra Nevada (Knowles and Cayan 2002). And the Sacramento River experiences distinctive climatic conditions, based on elevational differences in the Sierra Nevada (Mulch et al. 2008). These differences, first established in the Pleistocene (675K-500K years ago), lead directly to modern- day precipitation patterns. Collectively, these differences could drive the contemporary population structure seen in the Sacramento River. San Joaquin River basin The San Joaquin River basin is a complex hydrologic region, as the result of its dynamic geology. Since the Pleistocene, multiple factors have influenced individual inland C. gulosus, creating an extensive mtDNA haplotype network and fine-scale population-level structuring resolved by our CSE tree, STRUCTURE and AMOVA analyses. Ascertaining which factors directly led to contemporary population structure is unclear but likely these effects were combinatorial. In addition to the presence of a slow southerly-draining freshwater lake and rapid uplifting in the southern Sierra Nevada (Dupré 1990), much of the region was undergoing intermittent glaciation during the mid to late Pleistocene, based on the height of the mountains (Dupré et al. 1991, Gillespie et al. 2004). Advance and retreat of glaciers throughout this time period (up to Tioga/Hilgard ~50K-10K years ago – Gillespie et al. 2004) may have caused continual separation and reconnection of populations of ancestral inland C. gulosus. These factors could explain the relational clustering of different sampling locations, as seen in the CSE tree and STRUCTURE analyses. Newly available habitat left by retreating glacial episodes or invasion by generalist C. asper from coastal populations may have driven ancestral inland C. gulosus upstream, accounting for current population distributions. Contemporary populations of C. gulosus face additional threats from anthropogenic factors. All major rivers within the San Joaquin River basin are dammed somewhere in the

55

foothills of the Sierra Nevada (Moyle 2002). Waters in the basin are highly impacted by human use, experiencing significant diversions, dams and herbicides/pesticide applications from nearby farms (Fisher and Shaffer 2002). These factors have all but isolated inland C. gulosus populations to their home streams. With no potential for gene flow or migration, populations are quickly becoming endemic to a particular river, as Brown et al. (1992) found for California roach (Lavinia symmetricus) populations. This is evident in the fixation of many mtDNA haplotypes and microsatellite alleles to a particular location and a decreasing (though still statistically significant) r value from either C. pitensis or Sacramento River population in the IBD analyses. For example, the clustering of the Kaweah River populations with other more northern populations is likely a product of isolation and genetic drift, fixing similar alleles seen in other locations, rather than a true biological clustering of locations. Comparative Phylogeography Many of the phylogeographic breaks in our study coincide with breaks seen in other taxa. The Jepson manual (Hickman 1993), which focuses on ecological plant regions in California, shows floristic breaks consistent with the coastal range mountains and Great Central Valley and between the Sacramento and San Joaquin River basins. These separations were also present in other organisms, such as amphibians (Fisher and Shaffer 1996) and shrews (Maldonado et al. 2001), or in levels of biodiversity in plants (Carlsbeek et al. 2003). Freshwater fish species such as California roach (Lavinia symmetricus) are not exempt, showing regional diversity consistent with breaks presented in this study (Aguilar and Jones 2009). Future comparative studies are also needed to help authenticate proposed breaks. For example, predicted patterns of migration of ancestral C. pitensis into the Pit River are consistent with its confluence with the Sacramento River (Pease 1965). Yet some data point to an earlier invasion by C. pitensis into the Pit River. Establishing whether other historical invasions from the Sacramento River show genetic breaks consistent with sculpin could help validate this possibility. Species such as Sacramento sucker (Smith et al. 2011) or California roach (Aguilar and Jones 2009) may provide different evolutionary time frames for divergence, differences which could either corroborate or confound patterns observed in sculpin. Additional breaks could also be explored, such as Pit River Falls, where a comparative study over multiple species may shed light on physical or temporal restrictions resulting from this barrier. Finally, phylogeographic breaks can be linked to predicted zoogeographic provinces (Moyle and Ellison 1991, Moyle 2002), relating several species distributions to genetic, geologic, and ecological conditions. However, Lapointe and Rissler (2005) argue the congruence in comparative phylogeographic patterns between species is more a representation of the climatic variables associated with these phylogeographic breaks than the physical breaks themselves. Assuming Lapointe and Rissler (2005) are correct, populations of inland C. gulosus may be at risk, considering their isolation and the proposed climate change influences on Sierra Nevada snowpack (the primary source of water for these systems) (Lettenmaier and Gan 1990, Knowles and Cayan 2002, Miller et al. 2003, Maurer 2007). If present trends continue, habitats may become unsuitable for inland C. gulosus during parts of the year, threatening populations. Inability to migrate downstream due to dams and ecological requirements (Moyle 2002) can make most inland C. gulosus populations subject to extinction. Therefore conservation strategies should be implemented to monitor and protect this, and other, endemic California species.

56

3.6 Tables and Figures

Table 3.1 Sampling locations and numbers of individuals covering known ranges of each species. Pop Collected # Sites as N Location, County Latitude Longitude 1 Pit_Goose1 C. pitensis 21 Lassen Cr., Modoc Co. 41.8296 -120.2983 2 Pit_SF1 C. pitensis 29 South Fork Pit R., Lassen Co. 41.2325 -120.3451 3 Pit_Ash1 C. pitensis 28 Ash Cr., Modoc Co. 41.1592 -120.8285 4 Pit_Fall2 C. pitensis 30 Bear Cr., Fall R., Shasta Co. 41.1902 -121.7356 5 Pit_1 C. pitensis 26 Pit R. #1, Shasta Co. 40.9919 -121.5076 6 Pit_Hat2 C. pitensis 10 Hat Cr., Shasta Co. 40.9809 -121.5712 7 Pit_Rock C. pitensis 25 Rock Cr., Shasta Co. 41.0107 -121.7045 8 Pit_Clark2 C. pitensis 18 Clark Cr., Shasta Co. 40.9853 -121.7766 9 Pit_5 C. pitensis 30 Pit R. #5, Shasta Co. 41.0019 -121.9667 10 Sacramento_Dunsmuir2 C. gulosus 13 Big Springs Cr., Siskiyou Co. 41.3106 -122.3303 11 Sacramento_Cantara C. gulosus 24 Sacramento R. (Cantara), Siskiyou Co. 41.2661 -122.3078 12 Sacramento_Sims C. gulosus 24 Sacramento R. (Sims), Siskiyou Co. 41.0631 -122.3599 13 Sacramento_Clear3 C. gulosus 31 Clear Cr., Shasta Co. 40.4990 -122.4150 14 Sacramento_Battle3 C. gulosus 27 Battle Cr., Tehama Co. 40.3983 -122.1500 15 Sacramento_RBDD3 C. gulosus 43 Red Bluff Diversion Dam, Shasta Co. 40.1550 -122.2050 16 Sacramento_RM2053 C. gulosus 21 Sacramento R. (RM205.5), Colusa Co. 39.7910 -122.0350 17 Feather3 C. gulosus 28 Feather R., Butte Co. 39.3630 -121.6010 18 NMF_American C. gulosus 25 North/Middle F American R., Placer Co. 38.9168 -121.0370 19 SF_American C. gulosus 24 South Fork American R., El Dorado Co. 38.8078 -120.9001 20 Lower_American C. gulosus 16 American R. (Nimbus), Sacramento Co. 38.6343 -121.2243 21 Mokelumne C. gulosus 25 Mokelumne R., San Joaquin Co. 38.3159 -120.7104 22 Stanislaus C. gulosus 22 Stanislaus R., Stanislaus Co. 38.1430 -120.3760 23 Lower_Tuolumne C. gulosus 27 Tuolumne R., Stanislaus Co. 37.6642 -120.4548 24 Upper_ Tuolumne C. gulosus 22 Tuolumne R., Tuolumne Co. 37.8750 -119.9630 25 Merced C. gulosus 30 Merced R., Mariposa Co. 37.6532 -119.7825 26 Lower_Kings C. gulosus 89 Kings R., Kings Co. 36.7833 -119.4167 27 Upper_Kings C. gulosus 30 Kings R., Fresno Co. 36.8590 -119.0958 28 Kaweah C. gulosus 16 Kaweah R., Tulare Co. 36.4567 -118.8512 29 Russian2 C. gulosus 2 Russian R., Sonoma Co. 38.4844 -122.8244 30 Guadalupe4 C. gulosus 31 Guadalupe Cr., Santa Clara Co. 37.2250 -121.9050 31 Penitencia4 C. gulosus 30 Penitencia Cr., Santa Clara Co. 37.3980 -121.8000 32 Uvas4 C. gulosus 29 Uvas Cr., Santa Clara Co. 37.0120 -121.6270 33 Bird C. gulosus 27 Bird Cr., San Benito Co. 36.7803 -121.4011 1 samples obtained with the assistance of Stewart Reid of Western Fishes 2 samples obtained from Humboldt State University Fish Collection or personal collection of Andrew Kinziger 3 samples collected by U.S. Fish and Wildlife Service personnel 4 samples collected by Jerry Smith and students, San Jose State University

57

Table 3.2 Descriptive statistics, by location, for three nuclear (508,517,520), one mitochondrial (CB), and six microsatellites (µsats) loci. Number

of individuals (N), substitutions (S), nucleotide diversity (π), number of haplotypes (H), haplotype diversity (Hd), allele number corrected for rarefaction (AR), expected heterozygosity (HE), observed heterozygosity (HO), and inbreeding coefficient (FIS). * outside HWE (p < 0.01) 508 517 520 CB usats Sites N S π H Hd N S π H Hd N S π H Hd N S π H Hd N AR HE HO FIS Pit_Goose 3 1 0.000 1 0.000 3 0 0.000 1 0.000 3 0 0.000 1 0.000 8 0 0.000 1 0.000 21 1.41 0.2498 0.1361 0.4621* Pit_SF 3 1 0.001 2 0.600 3 0 0.000 1 0.000 3 0 0.000 1 0.000 8 0 0.000 1 0.000 29 1.28 0.1082 0.1113 -0.0292 Pit_Ash 3 1 0.000 1 0.000 3 0 0.000 1 0.000 3 0 0.000 1 0.000 5 0 0.000 1 0.000 28 1.28 0.1001 0.0884 0.1188 Pit_Fall 1 1 0.000 1 0.000 1 0 0.000 1 0.000 1 0 0.000 1 0.000 8 0 0.000 1 0.000 30 1.03 0.0111 0.0111 0.0000 Pit_1 ------6 1 0.001 2 0.533 26 1.49 0.1494 0.1488 0.0037 Pit_Hat ------2 1 0.001 2 1.000 10 1.33 0.0752 0.0556 0.2727 Pit_Rock 1 1 0.000 1 0.000 1 0 0.000 1 0.000 1 1 0.003 2 1.000 8 0 0.000 1 0.000 25 1.34 0.1288 0.1363 -0.0596 Pit_Clark 1 1 0.000 1 0.000 1 0 0.000 1 0.000 1 0 0.000 1 0.000 7 3 0.001 4 0.714 18 1.41 0.1378 0.1455 -0.0573 Pit_5 3 1 0.000 1 0.000 3 0 0.000 1 0.000 3 1 0.001 2 0.533 8 0 0.000 1 0.000 30 1.55 0.1824 0.1965 -0.0784 Sac_Dunsmuir ------2 1 0.001 2 1.000 13 2.33 0.4339 0.3236 0.2559* Sac_Cantara 3 1 0.001 2 0.533 3 1 0.003 2 0.600 3 0 0.000 1 0.000 7 0 0.000 1 0.000 24 1.27 0.1207 0.0910 0.2505 Sac_Sims 3 1 0.001 2 0.533 3 1 0.003 2 0.533 3 0 0.000 1 0.000 8 0 0.000 1 0.000 24 1.44 0.2018 0.2225 -0.1050 Sac_Clear ------13 3 0.001 2 0.282 31 1.77 0.2924 0.2793 0.0456 Sac_Battle ------6 0 0.000 1 0.000 27 2.25 0.4140 0.4043 0.0236 Sac_RBDD 3 3 0.002 3 0.600 3 0 0.000 1 0.000 3 2 0.003 2 0.533 8 3 0.001 4 0.750 43 2.25 0.4003 0.3028 0.2454* Sac_RM205 3 3 0.003 2 0.533 3 1 0.003 2 0.533 3 1 0.001 2 0.333 8 6 0.002 5 0.786 21 2.07 0.3888 0.2818 0.2850* Feather 3 0 0.000 1 0.000 3 1 0.003 2 0.600 3 0 0.000 1 0.000 9 6 0.002 6 0.833 28 6.88 0.4674 0.4746 -0.0157 NMF_American ------8 0 0.000 1 0.000 25 1.36 0.1350 0.1333 0.0131 SF_American 3 0 0.000 1 0.000 3 0 0.000 1 0.000 3 0 0.000 1 0.000 7 2 0.001 3 0.524 24 1.14 0.0473 0.0278 0.4177 Lower_American ------16 3.33 0.4560 0.4277 0.0638 Mokelumne ------8 0 0.000 1 0.000 25 1.16 0.0816 0.0800 0.0204 Stanislaus 3 0 0.000 1 0.000 3 0 0.000 1 0.000 3 0 0.000 1 0.000 8 0 0.000 1 0.000 22 1.29 0.1084 0.1288 -0.1940 Lower_Tuolumne ------27 2.67 0.3407 0.3020 0.1145 Upper_Tuolumne 3 0 0.000 1 0.000 3 0 0.000 1 0.000 3 0 0.000 1 0.000 7 1 0.001 2 0.476 22 1.16 0.1469 0.0898 0.3946 Merced 3 0 0.000 1 0.000 3 0 0.000 1 0.000 3 0 0.000 1 0.000 8 0 0.000 1 0.000 30 6.01 0.0992 0.0933 0.0602 Lower_Kings 3 0 0.000 1 0.000 3 0 0.000 1 0.000 3 2 0.003 3 0.800 10 1 0.000 2 0.200 89 1.25 0.1109 0.0945 0.1485 Upper_Kings ------7 1 0.000 2 0.286 30 1.27 0.1166 0.1222 -0.0493 Kaweah 3 0 0.000 1 0.000 3 0 0.000 1 0.000 3 0 0.000 1 0.000 8 3 0.001 4 0.643 16 1.36 0.1145 0.0489 0.6730* Russian 2 1 0.000 1 0.000 2 0 0.000 1 0.000 2 0 0.000 1 0.000 2 4 0.004 2 1.000 2 1.67 0.4167 0.0833 0.8571* Guadalupe 4 1 0.000 1 0.000 4 0 0.000 1 0.000 4 9 0.003 2 0.429 8 3 0.001 3 0.464 31 1.15 0.0600 0.0591 0.0149 Penitencia ------8 5 0.003 5 0.893 30 2.64 0.0795 0.0667 0.1643 Uvas 2 1 0.000 1 0.000 2 0 0.000 1 0.000 2 0 0.000 1 0.000 7 0 0.000 1 0.000 29 1.76 0.3282 0.2782 0.1360 Bird ------27 1.31 0.1077 0.0185 0.8298* Total/Mean 59 14 0.007 9 0.752 59 7 0.005 4 0.206 59 17 0.007 8 0.678 7 87 0.018 44 0.895 26 2.18 0.2013 0.1656 0.1691

58

Figure 3.1 Collection location (black dots) and distribution map for A) contemporary species ranges, B) nuclear haplotypes from each individual nuclear locus (508, 517, and 520, reading left to right) according to 3 individuals per location, C) mitochondrial haplotypes, and D) putative populations according to clustering obtained in STRUCTURE for six microsatellite loci. Black bars indicate possible breaks in species/populations and colors are indicative of species haplotypes: green (C. pitensis), red (C. gulosus), yellow (coastal C. gulosus), and purple (Sacramento basin C. pitensis).

59

Figure 3.2 Bayesian phylogenetic species tree developed from three nuclear loci. A single coastrange sculpin (C. aleuticus) served as an out- group, as in Baumsteiger et al. (2012). Three clades are evident: coastal C. gulosus, C. gulosus, and C. pitensis. Individuals from locations within the Sacramento River are intermediate between C. gulosus and C. pitensis clades. Branch support indicated by posterior probabilities (above).

60

Figure 3.3 mtDNA haplotype networks from unique cytochrome b sequences for each of three putative species (C. gulosus-riffle, C. pitensis-Pit, and C. gulosus-coast). Circle size correlates to number of individuals, with circle in legend approximately equal to one individual. Grey outlines represent locations containing a particular haplotype. Gray unlabeled box in coastal C. gulosus represents haplotype also found in Penitencia Creek. The gray box imbedded in the Kings River group is the Kaweah individuals. Black dots represent inferred substitutions between haplotypes. A complete list of haplotypes (CBH), by location, can be found in Table S2.

61

Figure 3.4 Mitochondrial coalescent tree estimated in BEAST, showing age of diversification based on a 0.9% mutation rate (per million years). Parentheses represent upper and lower bounds of estimate. Numbers above branches are posterior probabilities based on 37,500 trees. Colors match Figure 3.1 distinctions.

62

Figure 3.5 Clustering analysis based on microsatellite allele frequencies, as determined in STRUCTURE. Clusters (K) are indicated by color or shade. Individuals appear as slight vertical bars within each location (bordered by black lines).

63

Figure 3.6 Unrooted neighbor-joining tree showing Cavalli-Sforza Edwards chord distances obtained from 6 microsatellite loci (locus 318 removed). Values on nodes represent bootstrap percentages (out of 1000). Nodes without values showed less than 65% bootstrap support.

64

3.7 Supplementary Material

Supplementary Table 3A Haplotypes for three nDNA markers, showing numbers of individuals from each location with that haplotype. Dotted lines represent putative breaks between species/lineages. 508 517 H1 H3 H4 H5 H7 H8 H9 H10 H1 H2 H4 H6 H8 H10 H11 H13 H14 H15 H17 H18 H19 H20 Pit_Goose 3 3 3 Pit_SF 3 3 3 Pit_Ash 3 3 3 Pit_Fall 1 1 1 Pit_Rock 1 1 Pit_Clark 1 1 Pit_5 3 3 3 Sac_Cant 2 1 3 3 Sac_Sims 1 2 3 3 Sac_RBDD 1 1 1 3 Sac_RM205 2 1 1 1 1 1 Feather 3 1 1 2 1 1 1 American_SF 3 3 Stanislaus 3 3 1 Tuolumne 3 3 2 Merced 3 3 Kings_low 3 3 Kaweah 3 3 1 Russian 2 2 1 Guadalupe 4 2 1 Uvas 2 2

65

Supplementary Table 3A (continued) Haplotypes for three nDNA markers, showing numbers of individuals from each location with that haplotype. Dotted lines represent putative breaks between species/lineages. 520 H1 H3 H4 H6 H7 H8 H9 H11 H12 H15 H16 H17 H18 H19 H20 Pit_Goose 3 Pit_SF 3 Pit_Ash 3 Pit_Fall 1 Pit_Rock 1 1 Pit_Clark 1 Pit_5 3 2 Sac_Cant 2 1 Sac_Sims 3 Sac_RBDD 3 2 Sac_RM205 3 Feather 3 American_SF 3 Stanislaus 3 Tuolumne 3 Merced 3 Kings_low 2 2 Kaweah 3 Russian 2 Guadalupe 2 1 Uvas 2

66

Supplementary Table 3B Numbers of individuals, by location, containing specific mtDNA haplotypes. Dotted lines represent putative breaks between species/lineages. mtDNA – CB N H H H H H H H H H H H H H H H H H H H H H H H H H H H 1 2 3 4 5 7 8 9 10 11 15 16 17 18 21 22 23 24 31 32 33 34 35 36 37 39 40 Pit_Goose 8 8 Pit_SF 8 Pit_Ash 5 Pit_Fall 8 8 Pit_1 6 4 Pit_Hat 2 1 Pit_Rock 8 8 Pit_Clark 7 Pit_5 8 8 Sac_Duns 3 2 1 Sac_Cant 7 7 Sac_Sims 8 7 Sac_Clear 13 1 Sac_Battle 6 6 Sac_RBDD 8 4 1 2 1 Sac_RM205 8 4 1 1 1 1 Feather 9 1 3 1 1 1 1 1 American_NMF 8 American_SF 7 Mokelumne 8 Stanislaus 8 Tuolumne 7 5 2 Merced 8 8 Kings_low 10 9 1 Kings_up 7 5 1 1 Kaweah 8 4 1 1 1 1 Russian 2 1 1 Guadalupe 8 Uvas 7 Penitencia 8

67

Supplementary Table 3B (continued) Numbers of individuals, by location, containing specific mtDNA haplotypes. Dotted lines represent putative breaks between species/lineages. mtDNA - CB N H H H H H H H H H H H H H H H H H H H H H H H H H H 44 47 49 50 51 52 53 54 55 57 58 61 62 63 65 67 69 70 74 75 77 78 79 80 81 83 Pit_Goose 8 Pit_SF 8 7 1 Pit_Ash 5 4 1 Pit_Fall 8 Pit_1 6 2 Pit_Hat 2 1 Pit_Rock 8 Pit_Clark 7 1 1 Pit_5 8 Sac_Duns 3 Sac_Cant 7 Sac_Sims 8 1 Sac_Clear 13 11 1 Sac_Battle 6 Sac_RBDD 8 Sac_RM205 8 Feather 9 American_NMF 8 8 American_SF 7 3 1 1 1 1 Mokelumne 8 8 Stanislaus 8 8 Tuolumne 7 Merced 8 Kings_low 10 Kings_up 7 Kaweah 8 Russian 2 Guadalupe 8 3 3 1 1 Uvas 7 6 1 Penitencia 8 2 1 2 1 2

68

Supplementary Table 3C Genetic distance (Fst or Weir and Cockerham’s θ) (above diagonal) and geographic distance (in km) (below diagonal) found between locations; measurements used for pairwise comparison of isolation by distance. A = C. pitensis, B = Sacramento River, and C = C. gulosus locations. *not significant based on p < 0.05 over 100 permutations.

A Pit_Goose Pit_SF Pit_Ash Pit_Fall Pit_#1 Pit_Hat Pit_Rock Pit_Clark Pit_#5 Pit_Goose ------0.318 0.267 0.434 0.353 0.479 0.402 0.377 0.323 Pit_SF 104.00 ------0.017* 0.308 0.170 0.323 0.261 0.222 0.187 Pit_Ash 179.90 170.70 ------0.170 0.216 0.395 0.310 0.273 0.223 Pit_Fall 284.80 275.60 158.30 ------0.490 0.825 0.596 0.584 0.460 Pit_#1 221.20 157.00 108.00 65.40 ------0.122 0.032 0.000 0.100 Pit_Hat 228.48 164.28 115.28 72.68 7.28 ------0.122 0.100* 0.107* Pit_Rock 245.48 178.80 132.28 89.68 24.28 17.00 ------0.005* 0.022* Pit_Clark 253.33 186.13 140.13 97.53 32.13 24.85 7.85 ------0.005* Pit_#5 279.73 212.53 166.53 123.93 58.53 51.25 34.25 26.40 ------

B Sac_Duns Sac_Cant Sac_Sims Sac_Clear Sac_Battle Sac_RBDD Sac_RM205 Sac_Duns ------0.505 0.392 0.461 0.327 0.352 0.339 Sac_Cant 6.79 ------0.096 0.642 0.520 0.520 0.651 Sac_Sims 36.39 29.60 ------0.540 0.396 0.406 0.543 Sac_Clear 126.99 120.20 90.60 ------0.206 0.204 0.190 Sac_Battle 155.69 148.90 119.30 28.70 ------0.016 0.031* Sac_RBDD 205.09 198.30 145.00 79.50 49.40 ------0.004* Sac_RM205 259.09 252.30 199.00 133.50 103.40 54.00 ------

C Feather NMF_Amer SF_Amer Low_Amer Mokel Stan Low_Tuol Up_Tuol Merced Low_Kings Up_Kings Kaweah Feather ------0.304 0.364 0.036 0.403 0.535 0.223 0.541 0.500 0.621 0.583 0.432 NMF_Amer 174.00 ------0.624 0.297 0.587 0.820 0.363 0.801 0.774 0.799 0.788 0.648 SF_Amer 172.50 54.80 ------0.405 0.745 0.885 0.268 0.818 0.832 0.833 0.847 0.634 Low_Amer 135.00 39.90 40.00 ------0.325 0.609 0.160 0.581 0.553 0.624 0.558 0.397 Mokel 301.00 273.00 271.50 234.00 ------0.859 0.348 0.837 0.811 0.768 0.786 0.697 Stan 376.00 348.00 346.50 309.00 295.00 ------0.615 0.755 0.704 0.831 0.825 0.739 Low_Tuol 338.29 310.29 308.79 271.29 257.29 208.29 ------0.530 0.506 0.631 0.582 0.161 Up_Tuol 421.59 393.59 392.09 354.59 340.59 291.59 83.30 ------0.476 0.747 0.709 0.650 Merced 453.79 425.79 424.29 386.79 372.79 323.79 271.70 355.00 ------0.628 0.721 0.691 Low_Kings 555.79 527.79 526.29 488.79 474.79 425.79 373.70 457.00 420.00 ------0.072 0.777 Up_Kings 602.19 574.19 572.69 535.19 520.19 472.19 420.10 503.40 466.40 46.40 ------0.778 Kaweah 577.89 549.89 548.39 510.89 495.89 447.89 395.80 479.10 442.10 230.10 276.50 ------

69

Supplementary Table 3D Analysis of Molecular Variance (AMOVA) for two different potential groupings of locations based on STRUCTURE K outputs. All values highly significant (p < 0.001 for 1023 permutations).

For 4 Groups Source of d.f. Sum of Variance Percentage of Variation Squares Components Variation Among 3 919.824 0.72489 Va 47.11 Populations Among 887 1077.104 0.40037 Vb 26.02 Individuals within Populations Within 891 368.500 0.41358 Vc 26.88 Individuals Total 1781 2365.428 1.53885 Fixation Indices FST: 0.49189 FSC: 0.47106 FCT: 0.73124

For 15 Groups Source of d.f. Sum of Variance Percentage of Variation Squares Components Variation Among 14 1481.643 0.85079 Va 59.97 Populations Among 17 137.841 0.14163 Vb 9.98 Individuals within Populations Within 1750 745.944 0.42625 Vc 30.05 Individuals Total 1781 2365.428 1.41867 Fixation Indices FST: 0.69954 FSC: 0.24940 FCT: 0.55971

70

Supplementary Figure 3A Haplotype networks for each of three nuclear markers. Circle size correlates to number of individuals, with circle in legend equivalent to one individual. Locations matching haplotypes can be found in Supplementary Table 3A.

71

Supplementary Figure 3B Bayesian Skyline plot for each putative species or lineage using mtDNA.

72

Supplementary Figure 3C Plot of log likelihood values versus clusters (K) showing the correct K (eight) identified by the Evanno et al. (2010) method in STRUCTUREHARVESTER. The correct K is chosen based on where the curvature of the line plateaus. Shown also is the relative placement of K = 9, an additional cluster which may indicate a phylogeographic break at Pit River Falls.

73

Supplementary Figure 3D Plots of Isolation-by-Distance showing genetic distance (FST) versus geographic distance (in km) within identified clades.

Chapter 4: Crypsis and potential incipient parapatric speciation in a species of freshwater sculpin, Cottus asper

4.1 Abstract The prickly sculpin (Cottus asper) is a small freshwater species of fish found in coastal streams along the west coast of North America; it has a vast distribution requiring adaptation to multiple geographic environments. Though amphidromous, separate inland radiations into lacustrine and exclusively freshwater riverine environments has prompted the possibility of cryptic taxa. Two such radiations in California appear to exhibit different spatial models of speciation (allopatric and parapatric) at different temporal scales of development. Using molecular approaches, a novel species of C. asper was corroborated in Clear Lake, although its origin was found to be coastal and not the Sacramento River as previous suspected. A second cryptic lineage was discovered in rivers of the Great Central Valley, still connected to coastal populations through a narrow estuarine corridor. Not supported phylogenetically, unique mtDNA haplotypes, measures of isolation by distance, hybrids between forms, and microsatellite clustering portray this inland radiation to be experiencing incipient divergence. Additional population structuring within the inland form between Sacramento and San Joaquin River basins likely reflects the different ecological environments between basins as well as the adaptive capabilities of inland C. asper. Estimated to have diverged approximately 110,000 years ago, no perceived geographic barriers have existed between coastal and Central Valley forms. Combined with the presence of limited gene flow and hybridization restricted to the intermediate corridor and divergence appears to be an extreme form of parapatric speciation.

4.2 Introduction Current models of speciation tend to center around three primary spatial mechanisms, allopatry, parapatry, and sympatry (Butlin et al. 2012). And while the most prevalent model continues to be allopatry, a more intensive effort has been made to explore speciation with gene flow (Coyne and Orr 2004, Feder et al. 2012, Via 2012). Much of this attention has been devoted to assessing sympatric or ecological speciation in nature (Via 2001, Bolnick and Fitzpatrick 2007, Schluter 2009, Bird et al. 2012), though problems with this mechanism continue (Nosil 2008, Smadja and Butlin 2011). Improved knowledge of additional speciation mechanisms such as sexual selection, genetic drift, polyploidy, hybridization, and ecological reinforcement have all been shown to drive speciation regardless of geographic barrier (Bolnick and Fitzpatrick 2007, Butlin et al. 2012). Yet the spatial component remains an integral part of speciation in most organismal studies (Kisel and Barraclough 2010). Often lost in attempts to distinquish between allopatric and sympatric speciation is a clear understanding and appreciation of parapatric speciation. Described as a continuum between the other two extremes, parapatry is defined as connected populations which are diverging despite continuous gene flow (Bull 1991, Gavrilets et al. 2000, Butlin et al. 2008). Often this parapatry can occur via a very narrow hybrid zone (limited contact between diverging forms) or through practically intertwined populations (equal probability of encountering both forms) (see Box 3 in Butlin et al. 2012). Perhaps key to defining parapatry is the potential for continuous gene flow. In many cases, parapatry is suspected of being alloparapatry, where speciation is initially by allopatry followed quickly by secondary contact and apparent parapatry (Coyne and Orr 2004). Theoretical studies via simulation have shown parapatry to be a viable mode of speciation (Gavrilets 2000, Ödeen and Florin 2000, Kondrashov 2003); yet only some empirical examples have been found. Studies on ants (Seifert 2010), salamanders (Kozak and Wiens 2006), cave snails (Schilthuizen et al. 2005), butterflies (Hall 2005), fish (Giuffra et al. 1996, Kramer et al. 2003), birds (Aliabadian et al. 2009), and plants (Arnold et al. 1990) have all suspected divergence by way of parapatric speciation but more studies are needed to fully explore this model of speciation. Eliminating the possibility of contemporary gene flow or hybridization caused by secondary contact is one of the biggest challenges for demonstrating parapatric speciation (Butlin et al. 2008). Too 74

75

often organisms are already full species, subject to contemporary evolutionary pressures different from those experienced during the original diversification (Etges 2002). To best characterize true parapatric diversification, one needs to observe the process during its inception. Discovery of such a model system is not easy and there are no guarantees incipient species will become full species (Nosil et al. 2009). A few empirical examples of incipient parapatric speciation are available, including brown trout (Salmo trutta fario and S. marmoratus) (Giuffra et al. 1996), cichlids (Knight and Turner 2004), threespine sticklebacks (Gasterosteus aculeatus) (McKinnon and Rundle 2002), and marine and estuary silversides (Odontesthes argentinensis) (Beheregaray and Sunnucks 2001). Prickly sculpin (Cottus asper), a small, benthic freshwater fish species, has long been suspected of possessing cryptic forms (Krejsa 1965). With a wide-ranging distribution along the Pacific coast of North America (Southern California to Kenai Peninsula, Alaska), the species exists in coastal streams, estuaries, freshwater lakes, and most large coastal rivers (Page and Burr 2011). C. asper has a pelagic, larval stage, incorporating estuaries into larval rearing in coastal populations (Brown et al. 1995, Harvey et al. 2002). However there is strong evidence for two deep inland radiations into the Columbia and Sacramento/San Joaquin Rivers (Moyle 2002, Page and Burr 2011). As has been seen in other studies (Lee and Bell 1999, Schluter 2000, Reusch et al. 2001, Anger 2013), inland radiations often lead to novel speciation. Krejsa (1965) proposed based on morphology that two sympatric coastal ecotypes existed: an “inland” ecotype, which led to inland adaptive radiations, and a more ancestral “coastal” ecotype. Weakly prickled individuals (“coastal”) in coastal streams undertake downstream migrations to lay eggs and allow pelagic larvae to rear in estuaries or the sea (amphidromy). Strongly prickled individuals (“inland”) living both within the same coastal streams and more distant inland streams (no sea access), forgo this migration, choosing instead to lay eggs and rear pelagic larvae within strictly freshwater localities (Krejsa 1967). Bohn and Hoar (1965) affirmed Krejsa’s conclusions, using iodine metabolism to reveal physiological differences between coastal and inland ecotypes. Hopkirk (1973) followed up on Krejsa’s work by morphologically identifying what he believed was a new species of C. asper in Clear Lake, California. Long isolated from any coastal connections, Hopkirk believed C. asper had invaded Clear Lake by way of the Sacramento River, having become isolated and quickly diverging based on unique ecological conditions in the lake (Broadway and Moyle 1978). Additionally, Hopkirk discovered what he called “intermediates” between inland and coastal ecotypes in the San Francisco Bay and Sacramento/San Joaquin River Delta (Hopkirk 1973). As “intermediates” are likely hybrids, this area may represent a transition zone between types, where gene flow is limited but continuous. Within the Delta is a narrow geographic region known as the Carquinez Strait, a mixture of marine and freshwater components representing a constantly changing environment (Lucas et al. 2002). Estimated to have formed around 1.5 million years ago (Howard 1979), the Carquinez Strait closed around 600,000 years ago and reopened 450,000 years ago, having remained open since that time (Dupre 1990, Dupre et al. 1991). Other than contemporary anthropogenic changes, the only other potential influence to flows within the Carquinez Strait has been glaciation, which ended approximately 10,000 years ago (Howard 1979). This study looks to explore crypsis within C. asper and identify whether allopatric/parapatric models of speciation best explain inland radiations and potential incipient speciation of C. asper in Clear Lake and the Sacramento/San Joaquin River basins. Using mitochondrial sequence and microsatellite data, we examine 1) the origin and distinctiveness of Clear Lake C. asper, 2) whether an inland ecotype/form is present that constitutes an undescribed species, or is an incipient species, 3) the relative age of divergence for hypothesized cryptic forms and what factors correlate with their divergence, and 4) the role of gene flow in the proposed intermediate area and the role of hybridization in gene flow.

76

4.3 Materials and Methods We collected samples from multiple coastal and inland locations, making use of public and private museum, University, and agency collections. The remaining locations were sampled by way of backpack electroshocker (Smith Root Inc.) under a California Scientific Collecting Permit (10670). Thirty individuals were targeted per location to obtain a reasonable sample of the genetic pool at each particular location (per Hoban et al. 2013). Lower caudal fin clips were collected and stored in 100% non-denatured ethanol prior to DNA extraction, with limited whole individuals vouchered for future morphological analyses. DNA was extracted from fin clips using a DNeasy tissue extraction kit (Qiagen Inc.), following manufacturer’s protocols, and quantified on a NanoDrop 2000 spectrophotometer (Thermo Scientific), resulting in concentrations of 25 to 200 ng/µl. Two types of molecular markers were selected for this study; a 962 base pair mitochondrial region of cytochrome b (cytb) and a suite of 11 microsatellites (Baumsteiger et al. 2012, Baumsteiger and Aguilar 2012). Markers were chosen as different evolutionary time frames for comparative analyses. Mitochondrial Polymerase chain reactions (PCRs) for cytb were performed in 30 µl volumes with the following components: 1X PCR buffer (10 mM Tris HCl [pH 8.3] and 50 mM KCl), 1.5 mM MgCl2, 0.5 units of Taq DNA Polymerase (Applied Biosystems Inc.), 20 mM of each dNTP, 5 mg BSA, 0.67 mM of each primer, and 2μl of DNA. Amplifications were run at 94°C for 5 min., followed by 30-35 cycles of 94°C for 1 min., 48°C for 1 min., and 72°C for 1 min., and a final 72°C for 5 min. Eight individuals were chosen for cytb whenever possible. Amplified products were cleaned with ExoSap-IT (USB Inc.) following manufacturers’ protocols and sent to the University of California, Berkeley’s DNA Sequencing Facility for analysis on an ABI 3730XL Sequencing Analyzer. Fragments were sequenced bi- directionally and imported into SEQUENCHER (Gene Codes Corp.) or MEGA5 (Tamura et al. 2010). A consensus sequence was generated for each individual by overlapping bi-directional sequences, visually inspecting for quality. Alignments were performed with MUSCLE (Edgar 2004) and all sequences deposited in GenBank. Unique haplotypes for mitochondrial sequences were obtained with DNASP (Rozas et al. 2003, Librado and Rozas 2009) and the best-fit model of nucleotide substitution identified with JMODELTEST v0.1.1 (Posada 2008), using a corrected (AICc) delta score (Posada and Buckley 2004). A haplotype network was developed with TCS V1.21 (Clement et al. 2000). Summary statistics, including numbers of segregating sites (S), haplotypes (H), haplotype diversity (Hd), average number of nucleotide differences (K) and nucleotide diversity (π), were generated with DNASP. Phylogenetic analysis was conducted on unique haplotypes using a maximum likelihood (PHYML - Guindon and Gascuel 2003) and Bayesian (MRBAYES - Ronquist et al. 2011) statistical approach. A close ancestor, coastrange sculpin (Cottus aleuticus), was used as an outgroup (see Baumsteiger et al. 2012). Using information from JMODELTEST, PHYLML was run using the best tree searching operation from both nearest neighbor interchange (NNI) and subtree pruning and regrafting (SPR), accounting for invariable sites and across site rate variation if necessary. One thousand bootstrap replicates were run for branch support. MRBAYES was run for 10 million iterations, sampling every 1000. Convergence of two runs of six chains (heated chain 0.1) was assessed by insuring the standard deviation of split frequencies was below 0.01 and all potential scale reduction factors reasonably close to 1.0 for all parameters. A 25% burn-in was used to insure chains approached a stationary distribution, resulting in 7500 trees total. Using a coalescent approach, as implemented in the program BEAST V.1.6 (Drummond and Rambaut 2007), an estimate of time of divergence was calculated for nodes corresponding with specific regional or group separations. Two trees were generated. The first used a mutation rate of 0.9% per million years (0.5% - 1.3%), consistent with prior studies using cyb (Bermingham et al. 1997, Mueller 2006, Simon et al. 2006). The second used the predicted age of Clear Lake (480,000 yrs – Casteel and

77

Rymer 1981, Sims et al. 1988) as the maximum age for the Clear Lake divergence (with a standard deviation equivalent to 30,000 years). Though divergence was most likely caused by an isolation event in the Clear Lake region (predicted to be in the Pleistocene), it should have occurred after formation of Clear Lake proper, giving us a one-sided bounding. The mutation rate of 0.9% per million years (0.5% - 1.3%) was also used. Program variables were tested and narrowed to a constant population size, relaxed lognormal molecular clock, and uniform priors, except on the root node, which used a normal distribution. Fifty million iterations were sampled, only taking one in every ten thousand as an actual value. A final 25% burn-in was used to insure convergence of the MCMC chain on the true parameter space. Additional priors for the upper and lower bounds of the molecular clock were tested with TRACER (a sub-program within BEAST). Bayesian skyline plots were employed with BEAST to assess changes in effective population size. Applying a coalescent approach, a piecewise constant skyline model was used with a relaxed uncorrelated lognormal clock and mutation rate of 0.9% per million years. Seventy five million iterations were run, sampling every 1000, leaving 67,500 trees after a 10 percent burn-in. Operator performance was optimized to insure all Pr values were within acceptable limits. A final parameter test was performed with TRACER, where all parameters had acceptable ESS values. Plots were generated using a stepwise (constant) Bayesian skyline variant, with argument traces of popSize and groupSize used over 100 bins. Microsatellites Eleven microsatellite markers were run for sampling locations with at least14 individuals, using the M13 protocol of Schuelke (2000). All loci were amplified in 10 µl reactions (2 µl of genomic DNA) with a multiplex kit (Qiagen Inc.) or standard PCR mix (same as cytb). Multiplex PCR conditions were 95°C for 15 min., 20 cycles of 95°C for 30 sec., primer specific annealing temperature for 30 sec., and 72°C for 45 sec. followed by 15 cycles at a fixed 48°C annealing temperature and two holds of 60° C for 30 min. and 12° for infinity. Non-multiplex PCRs were identical except for an initial 5 min. hold and final extension of 72°C for 7 min. Locus specific PCR products were combined and run on an ABI 3130XL Fragment Analyzer (Life Technologies Inc.) and fragments analyzed using ABI GENEMAPPER software. A Liz500 size standard was used to confirm fragment sizes. Reruns were performed to insure allele calls and minimize error. Finalized allelic tables from GENEMAPPER were formatted for downstream programs with CONVERT (Glaubitz 2004). General descriptive statistics including numbers of alleles, allelic richness, observed and expected heterozygosity (HO and HE), inbreeding (FIS), conformance to Hardy-Weinberg Equilibrium, and linkage disequilibrium were generated through the programs GDA (Lewis and Zaykin 2001) and ARLEQUIN V3.0 (Excoffier et al. 2005). Allelic richness was standardized for sample size using HP-RARE (Kalinowski 2005). We used GENETIX v4.5 (Belkhir et al. 2004) to assess population level differences between groups or sampling locations via pairwise FST estimates and ARLEQUIN for an analysis of molecular variance (AMOVA). Genetic distances were obtained from pairwise FST estimates according to θ (Weir and Cockerham 1984). An additional population level analysis using a Bayesian Markov Chain Monte Carlo (MCMC) clustering approach was employed in STRUCTURE (Pritchard et al. 2000). Samples were run with an initial burn-in of 100,000 followed by 100,000 runs using an admixture model, with allele frequencies assumed to be correlated, K (clusters) ranging from 1-25, and seven iterations per K. The number of genetically distinct clusters represented in the data were determined using the method of Evanno et al. (2005) and by plotting log likelihood values versus K to assess where the curve begins to plateau using STRUCTURE HARVESTER (Earl 2011). To average over the number of different population assignments (based on 7 iterations) for a chosen K, the program CLUMPP (Jakobsson and Rosenberg 2007) was run using a Greedy algorithm and 1000 replicates. To assess and interpret the chosen K, additional K values were tested and compared. Final visual presentation of clustering was prepared in DISTRUCT (Rosenberg 2007).

78

Gene flow Measures of migration as well as divergence times between coastal and inland C. asper through the potential intermediate region of the Sacramento/San Joaquin River Delta were estimated with the isolation with migration approach implemented in the program IMA2 (Hey and Nielsen 2007). Locations were pooled into a coastal versus inland and a Sacramento River basin versus San Joaquin River basin comparison. Two independent runs were performed for each comparison; each consisting of a burn-in period of 100,000 steps followed by 1.5 x 106 steps sampled every 100 steps (15,000 total sampled steps). Runs gave highly similar results and were combined to estimate migration and divergence parameters. Secondary analyses were attempted on coastal versus Sacramento, coastal versus San Joaquin, and Sacramento versus San Joaquin samples. A mutation rate of 9.0 x 10−9 substitutions for cytochrome b was multiplied by 962base pairs to give an estimate of 8.7 x 10-6 (substitutions/site). A generation time of three years was assumed based on life-history data from other closely related cottids (Moyle 2002), resulting in a mutation rate per generation of 2.6 x10−9. Measures of migration were tested for differences from zero using a log-likelihood ratio test according to Nielsen and Wakeley (2001). Isolation-by-distance (IBD), a relationship between genetic and geographic distances, was evaluated for each sampled location or regional grouping. Approximate geographic distance between each location was estimated in kilometers using Google Earth following strictly riverine connectivity. In the Kings River connectivity with the San Joaquin River is no longer available due to agricultural diversions, making estimates largely based on historical river topology and accounts. A Mantel test was performed on each pairwise comparison to statistically assess the correlation between genetic and geographic distance using the program IBDWS (Bohonak 2002). Overall and regional results were plotted with Microsoft Excel to assess the linear regression between FST (from microsatellite data) and geographic distance. For the Clear Lake region, the two sampling locations were compared to all other locations sampled in this study (coastal and inland) as well as a single pairwise comparison between locations within the Clear lake region. The remaining regions (coastal and inland) were restricted to intra-regional comparisons. The presence of hybridization between inland and coastal regional groupings was assessed with the program NEWHYBRIDS (Anderson and Thompson 2002). Assignment to parental lineage (coastal, inland), F1, F2, or backcross (BCC or BCI) was achieved using a Bayesian MCMC algorithm to obtain a posterior distribution (≥ 0.500) for each hybrid type. A series of simulations were performed with HYBRIDLAB (Nielsen et al. 2006) to assess the power of the 11 microsatellites for successfully identifying the presence of hybrids or levels of hybridization. Using the baseline allele frequencies from parental lines of coastal (318 individuals) and inland (253 individuals) lineages, 50 individuals of F1, F2, BCP, or BCR were simulated and run through NEWHYBRIDS to assess accuracy.

4.4 Results Forty-three locations were sampled across the range of C. asper in California, including three locations in Oregon and Washington (Columbia River, Oregon coast, and Winchuck estuary) (Table 4.1 and Figure 4.1). One location was outside any known range for C. asper (Mojave River). Twenty-six locations were coastal in origin, three from Clear Lake/Bear Creek, twelve from inland locations (five from Sacramento River/ tributaries, one tributary to the delta –Mokelumne, five from San Joaquin River/tributaries), and two were intermediate between coastal and inland locations (Suisun Bay and Napa River). Overall numbers were reduced for mitochondrial analysis, with 34 locations used (twenty coastal, three Clear Lake, two intermediate, and nine inland) totaling 238 individuals (Table 4.2). To maximize allelic sampling from locations, 28 locations (fourteen coastal, two Clear Lake, two intermediate, and ten inland) were chosen for microsatellite analyses based on 14 minimum individuals per location (Table 2).

79

Exceptions were Winchuck (11), Wilson Creek (7), Russian River (13), Bear Creek (5), and the American River (12). Overall numbers for microsatellite analysis totaled 654 individuals. Mitochondrial Final cytb sequence length for all samples was 962 base pairs and Tamura-Nei (Trn + I – Tamura and Nei 1993) the best evolutionary model. The next closest score, with an AICC ∆ score of 1.84 was the same model with a gamma component (Trn+I+G). Patterns of sequence diversity were always highest in coastal locations, and of those, the more northerly locations were noticeably higher (Table 4.2). After these regions, locations within Clear Lake had the second highest haplotype diversity (HD) and numbers of segregating sites (S). Locations earmarked as intermediate or inland had intermittent diversity measures, with no consistent pattern across locations (Table 4.2). Looking at the consensus phylogenetic tree of cytb sequences, very little structure was evident (Figure 4.2). Two primary clades could be identified; a Clear Lake clade (posterior probability 0.96, bootstrap 83%) and a larger overall C.asper clade (0.83 and 100%). Two poorly supported sub-clades, Coastal A (0.62 and 59%) and Coastal B/Inland (0.70 and 43%) were discovered within the overall C. asper clade, with most coastal locations found to have individuals in both sub-clades (Figure 4.2). Improved group structure was apparent in the haplotype network for unique mtDNA haplotypes (Figure 4.3). As with the phylogenetic analysis, Clear Lake and its tributaries are noticeably different from the remaining C. asper locations, with nine changes before the next nearest haplotype. A pattern of radiation can also be discerned in this group, with the founding Clear Lake haplotypes giving rise to all tributary haplotypes (Figure 4.3). Evidence for a distinct Coastal A, Coastal B, and Inland grouping is also observable, with each predominated by a single haplotype (Table S4A). Plotting unique haplotypes onto a geographic map of sampling locations reveals a number of patterns (Figure 4.1). The first is the presence of haplotypes of both coastal groupings (A and B) in most northern coastal sites. Although not shown, coastal Oregon and Washington locations also share this mixed pattern (see Table S4A). A strong phylogeographic break appears to occur within the distinct Inland group between Sacramento and San Joaquin River basins, with no shared haplotypes between basins. The single exception is the Mokelumne River, which contains mixed haplotypes. Mixed haplotypes between coastal and inland groupings were also observed in the Napa River and Suisun Bay, areas intermediate between proposed groupings (Coastal and Inland) (Figure 4.1). Lastly, two unique haplotypes appear solely in southern California locations, an apparent extension of the primary Coastal B haplotype (Figure 4.3), and two individual samples, one in the Santa Clara River and one in the Mojave River, have unusual inland haplotypes (San Joaquin River basin and Sacramento River basin sources, respectively). Using a Bayesian coalescent approach in BEAST, estimates of divergence times were obtained (Figure 4.4). As with the phylogenetic tree, three clades were strongly supported (Clear Lake, Coastal A, Coastal B/Inland). Clades/group names are simply a reflection of the primary location for most samples contained within that clade/group. Using only the 0.9% mutation rate, Clear Lake is estimated to have diverged from other C. asper approximately 2.35 million years ago (m.y.a.) (1.15 – 4.19, 95% C.I.). This would put the divergence of the inland radiation and Coastal B some time around 560,000 years ago (0.27 – 0.91). Using the estimated geologic age of Clear Lake (~480,000 years ago), the divergence time between Coastal B and the Inland groups is estimated at around 111,089 years ago (40,899 – 191,544, 95% C.I.). Also rather than 1.19 m.y.a. (0.57 – 2.08), Coastal A is estimated to have diverged 234,154 years ago; it would not become a monophyletic clade until approximately 70,000 years ago (as opposed to 330,000 years ago). An unusual and unexplainable clade (Coastal C) appeared in this analysis, consisting of three individuals from the coastal Oregon, Klamath and Russian River locations. Clades indicative of Sacramento and San Joaquin basin groups are not monophyletic, especially the Sacramento basin group with numerous individuals also found in the Coastal B clade/group.

80

Due to a lack of distinctiveness between locations for cytb sequences, we reanalyzed general levels of diversity according to unique haplotype groups observed in Figure 4.3 (Clear Lake, Kelsey/Bear, Coastal A, Coastal B, and Inland groupings). Further statistical support for these groups can be attributed to haplotype diversity scores of 1.0 for each grouping (Table 4.3). As with individual locations, Coastal groups had the highest numbers of segregating sites and unique haplotypes, followed by the Inland group, the Kelsey/Bear group, and the Clear Lake group (Table 4.3). Interestingly, Kelsey/Bear had an unusually high number of unique haplotypes and segregating sites, despite only sampling from two locations totaling 20 individuals (Table 4.2). A simple measure of mtDNA divergence between groups found all comparisons to have very low divergence (< 2%), similar to phylogenetic tree analyses. Kelsey/Bear Creeks appears to have the highest divergence from either Coastal or Inland groups (1.7%- 1.9%), though virtually none from Clear Lake (0.3%). Divergence between Coastal groups is also relatively low (0.8%), and even lower between Coastal B and Inland groups (0.5%). Lumping Coastal groups together, a Bayesian skyline plot found evidence of a strong recent increase in effective population size whereas the Inland group showed virtually no change in size, with only the slightest increase over time (Figure 4.5). Measures of migration obtained from IMA2 found very few migrant genes between inland and coastal populations (2N1m1 > 2N0m0 = 0.948, 95% HPD 0.054 – 5.995). A likelihood ratio test found the migration value to be statistically significant (LLR = 9.159, p < 0.01). The divergence time estimate for the coastal/inland radiation was approximately 1.595 (0.665 – 9.985), which when adjusted for substitutions per generation puts divergence around 61,407 years ago (25,602 - 384,422). Analyses run between Sacramento and coastal populations (1.413, 0.0614 – 7.883; LLR = 7.697, p < 0.01) and San Joaquin and coastal populations (0.883, 0.054 – 5.531; LLR = 8.705, p < 0.01) were similar to overall values. No evidence of migration between Sacramento and San Joaquin River basins was found (0.8096, 0 – 5.667), with a LLR of 2.487 (not significant). Microsatellites A select set of locations based on the numbers of individuals sampled were used for all microsatellite analyses, with locations renumbered to reflect changes (Table 4.1). Summary statistics were generated per sampling site, with sites broken into Coastal, Clear Lake, inland, and intermediate locations (Table 4.2). The overall average allelic richness (after correcting for rarefaction) was 3.02, with all coastal locations ranging from (3.0 – 3.9). Inland locations were slightly lower, ranging from 2.0 – 3.9, (mean 2.7). Intermediate locations (Suisun Bay and Napa) had allelic richness similar to coastal locations while Clear Lake and surrounding Bear Creek were noticeably lower than the average (2.7 and 1.6 respectively). The highest proportion of private alleles was found in the Napa River (0.23) compared to the overall average of 0.06. Wilson (0.18), Smith (0.13), Clear Lake (0.15), San Joaquin (0.15) and the Kings River (0.14) were also noticeably higher than the average. Average heterozygosity was only slightly higher in coastal locations than inland locations (.400 vs. .332), with all locations being in Hardy Weinberg equilibrium except Freshwater Creek (Table 4.2). Pairwise FST values grouped by coastal (locations 1-14), Clear Lake (15-16), intermediate (19- 20), and inland locations (17, 18, 21-28) always indicated higher values between groups than within (Table 4.4A). The highest average FST value was between Clear Lake and the Inland grouping (0.501) and the lowest between the Intermediate and Inland grouping (0.116). Within group FST averages were highest within the Inland group (0.143) and lowest within the Intermediate group (0.013). Individual location FST comparisons were similar to overall averages, with the highest observed value between Suisun Bay and the Calaveras River (0.699) and the lowest between coastal locations Big Lagoon and Redwood Creek (0.010) (Table S4B). Summary statistics on each of the four groups found allelic richness was highest in coastal locations and lowest in Clear Lake locations (Table 4.4B). However, Clear Lake had the highest proportion of private alleles (1.25).

81

An analysis of molecular variance (AMOVA) on a proposed five group division (Inland divided into Sacramento and San Joaquin groups – see STRUCTURE analysis below) found among group variation to be 20%, whereas among population within group (6%) and among individuals within populations (5%) were low. The majority of variance was attributed to within individuals (69%). Fixation indices among groups were 0.0717 (FIS), 0.0762 (FSC), 0.200 (FCT) and 0.314 (FIT), with significance tests (1023 permutations) showing all values to be highly significant (p < 0.0001). Clustering analyses with STRUCTURE found three primary clusters (K), attributable to a Coastal group, an Inland group, and a Clear Lake group (Figure 4.6). While the Evanno et al. (2005) method strongly supported three clusters (∆K 168.6), additional but less supported clustering shows the Santa Clara River population to be unique (Figure 4.6). Suisun Bay, Napa River, and to a lesser degree the Mokelumne River show clear intermediate allele frequencies between coastal and inland populations. Additional sub-structuring within original clusters was evident in both Coastal (K = 3) and Inland (K = 6) groups, but not in Clear Lake (K = 1). The Coastal group contained the unique Santa Clara population and a mixture of two sub-clusters spread across clusters. The proportion of individuals assigned to a particular cluster changed between Cottoneva Creek and the Van Duzen River. Inland locations had significantly more sub-structure, with a Sacramento River basin population, an individual Calaveras, Stanislaus, Merced, and Kings River population and an admixed San Joaquin River (Figure 4.6). Intermediate locations (Suisun Bay and Napa River) were found to contain the highest proportion of identified hybrid individuals based on both STRUCTURE and NEWHYBRIDS analyses (Table 4.5). Greater than 50 percent of samples taken in these locations were identified as hybrid. The Mokelumne River also had a relatively high number of identified hybrids (28%) when compared to other coastal or inland locations (ave. 3%). Interestingly, all hybrids identified in this study were F2. Examination power to identify hybrids to type (F1, F2, etc.) was moderate to low, especially BCC (only 38% accuracy). However, markers were successful at differentiating hybrid from parental type (91%), with most errors occurring in the identification of parental inland from inland back-crosses (Table 4.5). The presence of two coastal clades (A and B) in the mtDNA analysis prompted an investigation into whether similar patterns were apparent in the microsatellite data (Table 4.5). Using individuals from eight locations found in both mtDNA and microsatellite analyses (Table 4.2), we compared Coastal A and B haplotypes to putative clustering found within STRUCTURE analyses (according to Figure 4.6, two sub- clusters are present). Assignment to cluster was based on probabilities greater than 0.50 from the CLUMP output (based on multiple iterations), though most individuals were > 0.70. A Fisher’s exact test was performed between a specific coastal clade and a specific microsatellite cluster (Table S5), to assess whether patterns were different than random and found a strong statistical difference (p = 0.002). Overall correlation was around 70%, though values ran as high as 88% (Navarro River) and as low as 57% (Big Lagoon and Russian River) in any one particular location (Table S5). We screened for potential isolation by distance (IBD) between all locations using pairwise FST values and geographic distance (Table S4B). Using a Mantel test, overall results were positive for patterns of IBD (r = 0.476, p < 0.001) (Figure 4.7). Grouping locations into either Coastal or Inland found strong IBD in coastal locations (r = 0.960, p < 0.001) and no IBD in inland locations (r = 0.277, p = 0.148). A final analysis of separated Sacramento and San Joaquin River basins within the Inland group found no IBD in the Sacramento group (r = 0.266, p = 0.236) and very low IBD in the San Joaquin group (r = 0.628, p =0.049). Because the Clear Lake group consists of only two locations, intragroup IBD and subsequent Mantel tests could not be calculated.

4.5 Discussion Overall, this study found C. asper to be a highly dynamic and complex species group within California. Clear evidence points to Clear Lake populations being a distinct species, diverging from

82

ancestral coastal locations via allopatry prior to the inland radiation of C. asper. Two phylogenetic clades of coastal C. asper were also discovered, supported by microsatellite data, often with representatives of both clades/clusters in the same coastal location. The presence of a distinct inland form of C. asper was confirmed, with unique haplotypes, population structure, and gene flow patterns despite weak phylogenetic structure; all which strongly support incipient divergence. Estimates of divergence between coastal and inland forms were relatively recent (~110,000 years ago), too recent for known geologic event to cause populations to be allopatric. Thus with no geologic evidence of physical barriers between forms, a geographically restricted hybrid zone, and limited gene flow through a narrow connective corridor, it is reasonable to infer inland C. asper may represent a form of parapatric divergence. Crypsis While we are confident the Clear Lake C. asper is a new species (distinct in every analysis), the idea of crypsis may be somewhat misleading. Clear Lake, a large, warm shallow inland lake in California harbors a number of endemic fish species including splittail (Pogonicthys ciscoides), hitch (Lavinia exilicauda chi) and tuleperch (Hysterocarpus traski pomo (Moyle 2002, Moyle et al. 2011). In 1973, Hopkirk found morphological and morphometric differences among coastal, inland, and Clear Lake C. asper based on increased fin rays and skin prickles (minimal, moderate, and extensive prickling, respectively) (Hopkirk 1973). A distinctive lake-adapted ecology in the Clear Lake population has also been recorded (Broadway and Moyle 1978). A recent genetic study by Baumsteiger et al. (2012) proposed Clear Lake C. asper to be a putative species based on a nine nuclear and two mitochondrial marker species tree along with species delimitation using 83 single nucleotide polymorphisms (SNPs). Therefore we might argue this study is simply a genetic reaffirmation of Clear Lake C. asper as a new lineage, one in need of a formal documented description for full recognition as a species. A similar argument might be made for the two phylogenetic mtDNA clades and microsatellite clusters of coastal individuals, many of which were found apparently coexisting in the same sampling locations. Based on the haplotype network, the Coastal B clade likely gave rise to the Inland group proposed in this study. Patterns also suggest newly founded inland haplotypes within the Sacramento River radiated into the San Joaquin River, establishing unique haplotypic structure in each sampled location. Coastal A haplotypes appear genetically closest to Clear Lake haplotypes and are possibly ancestral for this group. This is reasonable given Clear Lake once drained into the Russian River, a coastal river system (Hopkirk 1973; Moyle 2002). It is possible these clades/clusters represent the two ecotypes described by Krejsa (1965), with a lightly prickled amphidromous “coastal” ecotype and a heavily prickly freshwater “inland” ecotype. However we did not collect individuals according to ecotype and therefore cannot definitively correlate ecotype with clade/cluster. But given the number of other studies which identified phenotypic differences in coastal and inland C. asper based on physiology, morphology, migration or freshwater isolation (Bohn and Hoar 1965, McLarney 1968, Hopkirk 1973, White and Harvey 1999), it would be inappropriate to define these clades/clusters as cryptic unless clear genetic distinctions are linked to proposed phenotypic differences. The recognition of a distinct Inland group in the Sacramento/San Joaquin River basin is a true cryptic lineage in our estimation. No previous morphological study has been attempted to ascertain differences between coastal and Central Valley inland forms, with Krejsa (1965) and Hopkirk (1973) incorporating this group into the “inland” ecotype. But genetic results show clear differences between forms, differences which reflect a species in transition. Estimated to have diverged around (500,000 – 66,000 years ago; 110,000 years ago using the origin of Clear Lake bounding), little to no phylogenetic signal is available from the mtDNA. However, unique, lineage-specific mtDNA haplotypes were discovered. Population structure (coastal – limited, inland – prevalent but mainly in the San Joaquin River basin) and measures of gene flow (coastal – strong IBD, inland – absent) were also found to be different between forms. Summary statistics for mtDNA and microsatellites consistently demonstrate

83

higher levels of haplotype, nucleotide, and allelic variation and heterozygosity in coastal locations and Bayesian skyline plots found dissimilarities between patterns of effective population size (coastal – strong recent increase, inland – limited but continuous growth). Though extremely limited in sample size, Baumsteiger et al. (2012) also found evidence for differences between lineages using 83 SNPs. But perhaps the final evidence for two distinct lineages or forms of C. asper (other than Clear Lake) is the presence of hybrids in the Sacramento/San Joaquin Delta. Largely isolated to the “intermediate” region between coastal and inland forms (see Krejsa 1965, Hopkirk 1973), this area represents a constantly changing estuarine environment, with fluctuations in salinity, dissolved oxygen, temperature, and marine and freshwater influences (Lucas et al. 2002) that are very different from either parental habitat. Because hybrids are predicted to most often occur between defined ecological habitats (Anderson 1948, Allendorf et al. 2001), the almost 8-fold average difference in hybrid discovery between coastal/inland locations and intermediate locations reinforces the notion of ecological distinctiveness between habitats. Interestingly, of the hybrids identified in the intermediate region, all were F2, suggesting no outbreeding depression and potential pre-zygotic barriers to cross-lineage mating (including hybrids) are present (Dowling and Secor 1997, Mallet 2005). We recognize our microsatellites were not particularly adept at separating hybrids into classes (F1, F2, etc.) although they are > 91% effective at identifying hybrids from parental types. But even with some error, the overwhelming proportion of F2 hybrids may represent the beginning of a new population of C. asper, adapted to this intermediate, heavily altered environment (Arnold 1992, but see Servedio et al. 2013). Speciation We propose two classic geographic models of speciation are evident in C. asper, with one still at an incipient stage. Although studies continue to introduce speciation factors outside of these classical models (White 1968, Lynch 1989, Gavrilets 2003, Bolnick and Fitzpatrick 2007, Butlin et al. 2012, Barton 2013), their applicability has not been questioned. The Clear Lake lineage identified here is a typical example of allopatric speciation. Originally inhabited through connections with the Russian River (see below), invading species were cut off some time during the Pleiostocene (Hopkirk 1973) and evolved in isolation, leading to the contemporary species. Parapatric speciation is the most probable, but not the only, explanation for the formation of the inland lineage or form of C. asper in the Sacramento/San Joaquin River basin. From a geologic perspective, divergence occurred (for both divergence estimates) after proposed 1.8 and 0.65 m.y.a. geologic events (Dupre et al. 1991), allowing systems to maintain a continuous connection; though only through a very narrow corridor, Carquinez Strait (Warner et al. 2002). According to Losos and Glor (2003), this fact alone would qualify this lineage as parapatric, as they define parapatric speciation solely on whether populations are geographically adjacent to one another. This ecologically intermediate area is also the primary hybrid zone between forms, with restricted regional gene flow. As proposed by Endler (1977), this would limit contact between forms, allowing lineages to evolve essentially in isolation. In addition to a narrow zone of contact, we found an overall signal of IBD, a factor which has been shown to drive parapatric speciation in simulation models through spatial self-organization of the gene pool (Hoelzer et al. 2008). Lastly, our study found coastal and inland haplotypes in the intermediate region, as well as mixed allele frequencies in microsatellites, suggesting individuals from both lineages are still moving through this region in addition to hybrids. The presence of this bimodal hybrid zone (as defined by Jiggins and Mallet 2000) is taken as empirical evidence of a stable contact zone, one that does not require allopatric speciation for divergence. The lone antithesis of our parapatric hypothesis is overall measures of gene flow were extremely low, which could be an artifact of using a single gene (Morjan and Rieseberg 2004, Hedrick 2006), the narrow corridor in which species are connected, or simply a reflection of improving pre- or post-zygotic barriers between groups (as seen in the number of F2 hybrids) as the

84

result of anthropenic manipulation of waterways. In the end, we feel our system coincides well with an extreme form of parapatric speciation, as defined by Butlin et al. (2012). As is often the case with proposals of parapatric speciation, there is the potential for alloparapatric speciation (Coyne and Orr 2004, Butlin et al. 2008). While there is less evidence for a geologic barrier between coastal and inland forms since their divergence (one divergence estimate has a 95% confidence interval which overlaps known geologic breaks), there may be an ecological one. During most of the Pleistocene, the Sierra Nevada was experiencing fluctuating glaciation (Howard 1979, Phillips et al. 1996). This glaciation resulted in changes in sea level, temperature, and riverine flows (Howard 1979). Any one of these may have modified flows through the Carquinez Strait, isolating forms for some period of time. Additionally, the presence of an “intermediate” ecological area may serve as a barrier as well, reducing or preventing gene flow between forms. In any case, allopatric speciation would have led to the initial divergence, followed by secondary contact between now isolated forms, leading to the hybridization seen in the Carquinez Strait area. The lack of a phylogenetic signal seen in our study would suggest this hypothesis is less likely, as countless species have shown species-level divergence since the Pleistocene (Avise et al. 1998, Knowles 2001, Moyle et al. 2009). The presence of two ecotypes with morphological, life history, and physiological differences (Bohn and Hoar 1965, Krejsa 1965, Hopkirk 1973) in coastal C. asper locations as well as two mtDNA clades (and corresponding microsatellite clusters) within the same location introduces the idea of sympatric divergence. Unfortunately we lacked the information to directly correlate these ecotypes with our genetic results. Sympatric ecotypes have been seen in other freshwater fish species including rainbow trout (Oncorhynchus mykiss – Zimmerman and Reeves 2000), whitefish (Coregonus clupeaformis – Lu and Bernatchez 1999), and rainbow smelt (Osmerus mordax – Saint-Laurent et al. 2003), often related to life history type. However, even if genetic results were congruent with ecotype, the only 70% correlation between mtDNA and microsatellite data likely represents incomplete reproductive barriers between types (Kondrashov and Kondrashov 1999, Bolnick and Fitzpatrick 2007). Therefore these C. asper may be prime candidates for future research related to sympatric speciation or parallel and convergent evolution (Wood et al. 2005), especially with advances in genomics research (Michel et al 2010). Clear Lake One novel feature of the proposed new species of Clear Lake C. asper is its origin. Due to the resemblance of most native Clear Lake ichthyofauna with species of the Sacramento River basin, it was presumed species invaded Clear Lake via its only current outlet, Cache Creek - a Sacramento River tributary (Hopkirk 1973, Moyle 2002). Yet, geologic data support a prior outlet or connection with the Russian River through the Cold Creek/ region to the north, a connection severed by volcanic activity sometime in the Pleistocene (Hodges 1966, Hopkirk 1973). Modern day Clear Lake is estimated to be about 480,000 years old (Casteel and Rymer 1981, Hearn et al. 1988), also in the Pleistocene. It is our contention the severing of the Russian River connection led to the divergence of C. asper in Clear Lake and that this most likely occurred following lake formation. Therefore, using this value as the oldest possible age of divergence of the Clear Lake lineage (within the Pleistocene), coalescent and phylogenetic analyses clearly illustrate divergence occurred prior to the existence of an inland lineage. Our haplotype network also reveals far fewer genetic changes are required between Clear Lake and coastal locations than Clear Lake and inland locations. Finally, though not the final clustering (K) value in STRUCTURE, initial clustering always assigned Clear Lake and coastal, not inland, locations together. Taken as a whole, we feel the Clear Lake lineage originated from individuals in a coastal location, most likely the Russian River as described by Casteel and Rymer (1981) and Hearn et al. (1988), and were isolated upon the permanent dissolution of that connection. It is worth noting that the Clear Lake C. asper has spread to parts of its outlet, Cache Creek, as indicated by its presence in Bear Creek (a tributary to Cache Creek).

85

Population Structure Outside of groups proposed in this study (Clear Lake, Inland, Coastal ecotypes), limited population structure was identified. AMOVA analyses found 20% of the variance attributable to these groupings but less than 5% to any sub-structuring within those groups. Of the structure observed, some could be explained by known breaks in other species. For example, in the coastal group, two clusters were observed throughout, though the proportion of each seems to change geographically somewhere between Cottoneva Creek and the Van Duzen River. This region, known as Cape Mendocino, represents a phylogeographic break in a number of other species (Dawson et al. 2001, Edmands 2001, Marko and Hart 2011). With an amphidromous life history, coastal C. asper exhibits some marine dispersal during its pelagic larval stage, a factor which could explain this potential break (Moyle 2002). Perhaps the most extensive population structure was found within the Inland group. Here evidence of a phylogeographic break between locations associated with the Sacramento River and San Joaquin River basins was discovered, with no structure in the former and individual locational structure in the latter. Basin specific haplotypes were also discovered as were differing measures of IBD (only present in the San Joaquin). The proposed break is consistent with breaks seen in other species of plants (Carlsbeek et al. 2003), amphibians (Fisher and Shaffer 1996), reptiles (Spinks and Shaffer 2005), shrews (Maldonado et al. 2001) and fish (Aguilar and Jones 2009, Chapter 2). Each basin is characterized by different historical and contemporary ecological and geographic factors which could explain the structure seen here (Hickman 1993). For example, the San Joaquin River basin was strongly impacted by glaciation in the bordering Sierra Nevada as well as the presence of a freshwater lake throughout much of the Pleistocene (Howard 1979, Dupré 1990, Phillips et al. 1996). Unlike the Coastal group, which shows signs of significant IBD attributable to a migration-drift equilibrium, the overall Inland group does not show IBD. Much of the Sacramento/San Joaquin River basin is subject to extensive contemporary anthropogenic influence (Leu et al. 2008), a factor which may have erased any signal of IBD and may also explain the population structure observed. The limited IBD discovered in the San Joaquin River basin may simply be a remnant of the original inhabiting coastal individuals or specific water resource allocations unique to this system (Lettenmaier and Gan 1990, Matthews 2007). In any event, individual population structure is seen for every sampled tributary in the San Joaquin River whereas those of the Sacramento River show no structure. This pattern has also been found in California roach (Lavinia symmetricus) (Brown et al. 1992). Conservation and management personnel may wish to adopt this information into future strategies, weighing tributaries to the San Joaquin River more intensively than Sacramento River locations. Potential Anthropogenic Range Expansion With the extensive anthropogenic modifications to California’s waterways, some native species have managed to expand their range into otherwise inaccessible regions (Moyle and Williams 1990, Moyle 2002). Two examples were evident in our C. asper sampling: the presence of C. asper and its Sacramento River basin haplotype in the Mojave River and the presence of a San Joaquin River basin haplotype in the coastal Santa Clara River. Both locations can be easily explained by their connection to the California State Water Project (CA Department of Water Resources). Pumping water from the Sacramento/San Joaquin River Delta, aqueducts run south to southern California to provide water for the region (Dewey and Madsen 1976, Israel and Lund 1995); serving as the likely vector for the sculpin. The water is stored in reservoirs, in which C asper can form abundant populations (Moyle 2002) allowing them to colonize downstream areas. Whether C. asper individuals were transported as adults or pelagic larvae is unclear, but our data reinforce the notion that anthropogenic factors are moving fish species outside their native ranges in California (Moyle and Marchetti 2006).

86

4.6 Tables and Figures

Table 4.1 List of sampling locations, by name, and numbers sampled. Microsatellite number (usat #) refers to numbering used for locations in all microsatellite analyses. # usat # Name N Location Latitude Longitude Coastal 1 Washington 2 Columbia River, Clatsop/Pacific Counties 46.2655 -123.5508 2 Oregon Coast 8 Woahink Lake, Lane County 43.9190 -124.0990 3 1 Winchuck 10 Winchuck Estuary, Curry County 42.0450 -124.2710 4 2 Smith 20 Smith River, Del Norte County 41.8530 -124.1220 5 3 Wilson 7 Wilson Creek, Humboldt County 41.6040 -124.1000 6 Klamath 28 Klamath River, Humboldt County 41.4980 -124.0020 7 4 Redwood 40 Redwood Creek, Humboldt County 41.2980 -124.0410 8 5 Big Lagoon 27 Big Lagoon, Humboldt County 41.1640 -124.1310 9 6 Mad River 20 Mad River, Humboldt County 40.8740 -123.9930 10 7 Freshwater 8 Freshwater Creek, Humboldt County 40.7560 -124.0500 11 8 Van Duzen 37 Van Duzen River, Humboldt County 40.5980 -124.1610 12 9 Cottoneva 30 Cottoneva Creek, Mendocino County 39.7527 -123.8130 13 12 EF Russian 30 East Fork Russian River, Sonoma County 39.2350 -123.1520 14 Russian 9 Russian River, Mendocino County 38.4757 -123.0480 15 10 Navarro 29 Navarro River, Mendocino County 39.2069 -123.5362 16 11 Garcia 21 Garcia River, Mendocino County 38.8557 -123.5598 17 Salmon 4 Salmon Creek, Sonoma County 38.3512 -123.0627 18 13 Estero Americano 18 Estero Americano, Sonoma/Marin Counties 38.2960 -123.0030 19 Abbott's 9 Abbott’s Lagoon, Marin County 38.1171 -122.9536 20 Rodeo 13 Rodeo Lagoon, Marin County 37.8110 -122.5170 21 San Gregoris 1 San Gregoris Creek, San Francisco County 37.3259 -122.3872 22 San Luis Obispo 2 San Luis Obispo Creek, San Luis Obispo County 35.1945 -120.6973 23 Wood Canyon 5 Mouth of Wood Canyon, Santa Barbara County 34.4575 -120.4148 24 Phelps 6 Phelps Creek, Santa Barbara County 34.4227 -119.8797 25 Ventura 5 Ventura River, Ventura County 34.2853 -119.3088 26 14 Santa Clara 34 Santa Clara River, Ventura County 34.2350 -119.2640 Clear Lake Region 27 15 Clear Lake 26 Clear Lake, Lake County 38.9950 -122.7050 28 Kelsey 12 Kelsey Creek, Lake County 38.9260 -122.8450 29 16 Bear 21 Bear Creek, Lake County 39.0120 -122.5610 Intermediate 30 19 Napa 32 Napa River, Napa County 38.2680 -122.2840 31 20 Suisun Bay 26 Suisun Bay, Solano County 38.0700 -122.0700 Inland 32 17 Sacramento 40 Sacramento River, Red Bluff Diversion Dam, Tehama Co. 40.1530 -122.2030 33 18 Sacramento 40 Sacramento River mile 205.5,Colusa County 39.7910 -122.0350 34 Sacramento 3 Sacramento River, Knight’s Landing, Yolo County 38.8029 -121.7145 35 21 Feather 39 Feather River, Butte County 39.3630 -121.6010 36 22 American 3 American River, Sacramento County 38.6343 -121.2243 37 23 Mokelumne 30 Mokelumne River, San Joaquin County 38.2249 -121.0332 38 24 Calaveras 29 Calaveras River, Calaveras County 38.0494 -121.0142 39 25 Stanislaus 30 Stanislaus River, Stanislaus County 37.8067 -120.7199 40 26 Merced 30 Merced River, Merced County 37.4707 -120.5006 41 27 San Joaquin 30 San Joaquin River, Fresno County 36.9333 -119.7500 42 28 Kings 36 Kings River, Kern County 36.7833 -119.4167 43 Mojave 1 Deep Creek, Mojave River, San Bernardino County 34.3362 -117.2806

87

Table 4.2 Overall table of individuals used for either mitochondrial (cytb) or microsatellites analyses, showing general measures of diversity for each marker type. Number of individuals (N), substitutions

(S), nucleotide diversity (π), number of haplotypes (H), haplotype diversity (Hd), allele number corrected for rarefaction (AR), expected heterozygosity (HE), observed heterozygosity (HO), and inbreeding coefficient (FIS). * outside HWE (p < 0.01) Mitochondrial Sequence Microsatellites # Locations N S H Hd K π N AR pAR HE HO FIS Coastal 1 Washington 2 4 2 1.000 4.000 0.0042 ------2 Oregon Coast 8 9 5 0.786 2.429 0.0025 ------3 Winchuck 8 6 5 0.857 3.250 0.0034 11 3.4 0.00 0.447 0.420 0.064 4 Smith 10 8 6 0.889 2.820 0.0030 32 3.5 0.13 0.442 0.360 0.188 5 Wilson ------7 3.1 0.18 0.373 0.300 0.217 6 Klamath 8 10 7 0.964 4.321 0.0032 ------7 Redwood ------27 3.2 0.00 0.417 0.349 0.153 8 Big Lagoon 8 7 5 0.857 3.036 0.0032 30 3.3 0.02 0.429 0.400 0.069 9 Mad ------32 3.3 0.02 0.436 0.374 0.144 10 Freshwater 8 7 4 0.750 2.321 0.0024 14 3.1 0.09 0.425 0.288 0.325* 11 Van Duzen 8 8 7 0.964 3.536 0.0037 31 3.4 0.07 0.432 0.407 0.060 12 Cottoneva ------24 3.1 0.00 0.456 0.458 -0.006 13 EF Russian ------14 Russian 16 8 4 0.642 3.025 0.0032 13 3.9 0.04 0.462 0.439 0.037 15 Navarro 7 7 4 0.714 2.762 0.0029 30 3.3 0.03 0.467 0.449 0.112 16 Garcia ------21 3.5 0.08 0.479 0.426 0.053 17 Salmon 4 0 1 0.000 0.000 0.0000 ------18 Estero Americano 8 4 2 0.429 1.714 0.0018 16 3.2 0.01 0.469 0.539 -0.156 19 Abbott's 8 3 3 0.464 0.750 0.0018 ------20 Rodeo 4 1 2 0.500 0.500 0.0005 ------21 San Gregoris 1 0 1 0.000 0.000 0.0000 ------22 San Luis Obispo 3 4 3 1.000 2.667 0.0028 ------23 Wood Canyon 5 0 1 0.000 0.000 0.0000 ------24 Phelps 6 1 2 0.333 0.333 0.0004 ------25 Ventura 5 1 2 0.600 0.600 0.0006 ------26 Santa Clara 10 3 3 0.644 0.933 0.0010 23 3.0 0.05 0.353 0.315 0.109 Clear Lake Region 27 Clear Lake 10 5 4 0.644 1.467 0.0015 22 2.7 0.15 0.371 0.390 -0.055 28 Kelsey 12 5 3 0.621 1.156 0.0016 ------29 Bear 8 7 7 0.964 2.286 0.0024 5 1.6 0.00 0.194 0.164 0.172 Intermediate 30 Napa 3 2 3 1.000 1.333 0.0014 27 3.5 0.23 0.564 0.456 0.230 31 Suisun Bay 8 1 2 0.250 0.250 0.0003 22 3.1 0.08 0.473 0.380 0.197 Inland 32 Sac_RBDD 8 4 4 0.643 1.000 0.0010 16 2.9 0.01 0.413 0.332 0.204 33 Sac_RM205.5 11 4 4 0.071 1.091 0.0011 27 2.9 0.00 0.416 0.409 0.017 34 Sac_KL 3 0 1 0.00 0.000 0.0000 ------35 Feather 10 3 5 0.756 1.178 0.0012 27 2.5 0.09 0.370 0.292 0.180 36 American ------12 2.4 0.00 0.424 0.375 0.120 37 Mokelumne 8 2 3 0.607 0.786 0.0008 28 3.4 0.06 0.453 0.398 0.123 38 Calaveras ------30 2.0 0.00 0.242 0.230 0.049 39 Stanislaus ------22 2.3 0.01 0.317 0.286 0.101 40 Merced 3 5 3 1.000 3.333 0.0035 29 2.7 0.09 0.363 0.324 0.110 41 San Joaquin 8 1 2 0.571 0.571 0.0006 25 2.4 0.15 0.405 0.394 0.035 42 Kings 10 1 2 0.356 0.356 0.0004 54 3.9 0.14 0.345 0.282 0.185 43 Mojave 1 0 1 0.000 0.000 0.0000 ------

Total 238 48 53 0.929 6.114 0.0064 654 3.02 0.06 0.408 0.366 0.107

88

Table 4.3A Population diversity measures for unique haplotypes of cytb sequence data. Intermediate locations (Napa and Suisun Bay) included with Inland group.

Group H S Hd K π Coastal_A 10 7 1.000 2.33 0.00243 Coastal_B 19 20 1.000 3.57 0.00371 Inland 12 10 1.000 2.55 0.00265 Clear_Lake 3 4 1.000 2.67 0.00277 Kelsey_Bear 8 10 1.000 3.25 0.00338 Total 52 50 1.000 9.08 0.00945

Table 4.3B Measure of DNA divergence between groups using the average number nucleotide substitutions per site for 962 base pairs of mitochondrial cytochrome b in DNASP. Intermediate locations (Napa and Suisun Bay) are included with inland grouping.

Group Coastal_A Coastal_B Inland Clear_Lake Kelsey_Bear Coastal_A - 0.00769 0.0092 0.01403 0.01689 Coastal_B - 0.00458 0.01528 0.01769 Inland - 0.01648 0.01949 Clear_Lake - 0.00394 Kelsey_Bear -

89

Table 4.4A Average pairwise FST estimates between Coastal, Clear Lake, Intermediate, and Inland groups.

Group Coastal Clear Lake Intermediate Inland Coastal 0.066 0.459 0.186 0.219 Clear Lake 0.054 0.479 0.501 Intermediate 0.013 0.116 Inland 0.143

Table 4.4B Summary statistics from 11 microsatellites are shown, separated by region (rarefaction adjusted values according to lowest sample size, Clear Lake - 27). N = number of individuals, A = number of alleles, AR = adjustment for rarefaction, pAR = proportion of private alleles, HE = expected heterozygosity, HO = observed heterozygosity, FIS = inbreeding coefficient. * indicates values out of HWE (p < 0.01).

Group N A AR pAR HE HO FIS Coastal 308 8.9 5.31 0.82 0.470 0.400 0.149* Clear Lake 27 3.8 3.82 1.25 0.336 0.335 0.002 Intermediate 49 5.5 4.85 0.67 0.528 0.414 0.221* Inland 265 7.5 4.75 0.61 0.430 0.331 0.230* Ave 162 6.4 4.68 0.84 0.441 0.370 0.163

90

Table 4.5 Results from two separate analyses concerning hybridization. F1 = First generation cross, F2 =

F1 x F1, BCC = backcross of F1 x coastal parent, and BCI = backcross of F1 x inland parent. Acc = Accuracy (power of markers to recognize a particular cross via simulation).

NewHybrids Location N Coastal Inland F1 F2 BCC BCI Acc Winchuck 11 10 - - 1 - - - Smith 32 31 - - 1 - - - Wilson 7 7 ------Redwood 28 26 - - 2 - - - Big_Lagoon 30 28 - - 2 - - - Mad 33 31 - - 2 - - - Freshwater 16 14 - - 2 - - - Van_Duzen 31 29 - - 2 - - - Cottoneva 25 24 - - 1 - - - Navarro 30 30 ------Garcia 21 21 ------Russian 13 13 ------Estero_Amer 17 17 ------Napa 30 6 8 - 16 - - - Suisun_Bay 25 8 3 - 14 - - - Sac_RBDD 18 - 16 - 2 - - - Sac_RM205 27 - 22 - 5 - - - Feather 30 - 28 - 2 - - - American 13 - 13 - - - - - Mokelumne 29 - 21 - 8 - - - Calaveras 30 - 30 - - - - - Stanislaus 22 - 22 - - - - - Merced 30 - 30 - - - - - San_Joaquin 29 - 28 - 1 - - - Kings 54 1 53 - - - - - Total 631 296 275 - 61 - - - Hybridlab F1 50 - 2 33 6 1 8 66% F2 50 - 1 6 31 4 9 62% BCC 50 5 - 9 17 19 0 38% BCI 50 - 10 1 9 0 30 60%

91

Figure 4.1 Map on left shows all sampling locations (Washington and Oregon locations not shown). Map on right indicates distribution and proportion of unique mtDNA haplotypes over a select sub-sampling of locations.

92

Figure 4.2 Consensus phylogenetic tree of Bayesian and maximum likelihood statistical approaches showing regional groupings of locations. Branch support is given as a posterior probability (over 7500 trees) and bootstrap values (from 1000 bootstraps), respectively.

93

Figure 4.3 Haplotype network showing regional (shaded) diversification amongst haplotypes. Numbers refer to specific haplotypes (see Table SA) whereas circle size represents the number of individuals with that particular haplotype.

94

Figure 4.4 Coalescent trees with only a 0.9% mutation rate (per million years) (A) or with the same mutation rate and a dated node (Clear Lake divergence Z = 480,000 years ago) (B). Divergence times and ranges in millions of years (m.y.a.) or thousands of years (t.y.a.) for nodes related to regional groupings (given as capital letters). Branch supports are posterior probabilities from 3750 trees. The group labeled as Coastal C is three individuals, one from the Russian and Klamath Rivers, and one from the Oregon Coast.

95

Figure 4.5 Plot of Bayesian skyline showing changes in effective population size over time. Both axes are log transformed.

96

Figure 4.6 Initial STRUCTURE analysis showing three primary clusters: coastal, inland, and Clear Lake. Each primary cluster was run separately to reveal substantial sub-structuring, though Clear Lake showed no additional clustering. K refers to number of clusters according to Evanno et al. (2005). Unique clustering was witnessed between Sacramento and San Joaquin River regions, but only the San Joaquin had additional sub- structuring.

97

Figure 4.7 Isolation by distance (IBD) measures across all samples and at each regional scale. Overall, Coastal, Inland, Sacramento, and San Joaquin groups are within group IBD whereas Clear Lake is a comparison between the two Clear Lake locations and all other locations in this study (except point with gray outline, which is the pairwise comparison).

98

4.7 Supplemental Material

Supplementary Table 4A The distribution of unique haplotypes across Coastal clades A and B (denoted by vertical lines), with specific haplotype numbers along the top. *One individual had an inland haplotype (21). A B SC N 1 7 11 12 13 30 38 39 40 55 2 8 9 10 14 15 16 17 18 26 27 29 31 32 36 37 52 53 54 51 Coastal Washington 2 1 1 Oregon 8 1 4 1 1 1 Winchuck 8 1 2 2 1 1 1 Smith 10 2 3 2 1 1 1 Klamath 8 1 1 1 2 1 1 1 Big Lagoon 8 1 1 2 3 1 Freshwater 8 1 4 2 1 Van Duzen 8 1 1 2 1 1 1 1 Russian 8 6 1 1 Navarro 7 4 1 1 1 Salmon 4 4 EsteroAmericano 8 6 2 Abbott's 8 6 1 1 Rodeo 4 3 1 San Gregoris 1 1 San Luis Obispo 2 1 1 Wood Canyon 5 5 Phelps 6 5 1 Ventura 5 3 2 Santa Clara* 10 5 4 Total 120 32 2 3 1 1 1 1 1 1 1 36 1 9 2 3 2 1 2 1 5 1 1 1 1 2 1 1 1 1 11

99

Supplemental Table 4B A distribution of unique haplotypes across Clear Lake, Intermediate, and Inland groups (denoted by vertical lines), with specific haplotype numbers along the top. *One individual had a coastal haplotype (17). N 3 4 5 6 28 46 47 48 19 20 21 25 49 50 22 23 24 33 34 35 41 42 43 44 45 Clear Lake Clear Lake 8 6 1 1 Kelsey 12 5 6 1 Bear 8 2 1 1 1 1 1 1 Total 28 6 1 1 7 7 1 1 1 1 1 1 Intermediate Napa* 3 1 1 Suisun Bay 8 7 1 Total 11 8 2 Inland Sac-RBDD 8 1 5 1 1 Sac-RM205 11 4 5 1 1 Sac-KnLand 3 3 Feather 10 5 1 2 1 1 Mokelumne 8 5 1 2 Stanislaus 8 7 1 Merced 3 1 1 1 San Joaquin 8 4 4 Kings 10 8 2 Mojave 1 1 Total 70 13 24 1 2 2 2 1 1 12 1 16 1 1 1

100

Supplementary Table 4C Isolation by distance measures with FST in upper portion and geographic distance (in km) in lower. Bold indicates values non-significant values. FST/D 1 2 3 4 5 6 7 8 9 10 11 12 13 14 1 - 0.015 0.056 0.012 0.007 0.004 0.007 -0.001 0.067 0.043 0.052 0.044 0.092 0.268 2 28 - 0.011 0.005 0.005 0.012 -0.008 0.022 0.064 0.067 0.074 0.066 0.092 0.286 3 52 67 - -0.003 0.009 0.036 -0.003 0.036 0.058 0.072 0.084 0.042 0.097 0.284 4 94 109 42 - 0.010 0.014 0.003 0.013 0.074 0.067 0.082 0.059 0.109 0.301 5 103 118 51 21 - 0.005 0.008 0.010 0.046 0.048 0.059 0.038 0.084 0.261 6 153 168 101 71 50 - 0.011 0.003 0.048 0.045 0.053 0.038 0.076 0.257 7 170 191 124 94 73 66 - 0.017 0.069 0.058 0.066 0.058 0.076 0.280 8 195 211 144 114 93 86 66 - 0.058 0.046 0.061 0.047 0.078 0.249 9 290 305 239 209 188 181 160 150 - 0.004 -0.001 -0.002 0.017 0.201 10 395 410 343 313 292 285 265 254 111 - -0.003 -0.002 0.016 0.189 11 420 435 369 339 318 311 290 280 136 89 - -0.003 0.003 0.168 12 485 500 433 403 382 376 355 345 201 153 116 - 0.013 0.196 13 512 528 461 431 410 403 383 342 229 181 143 51 - 0.182 14 1182 1197 1130 1100 1079 1072 1052 1041 898 850 812 720 687 - 15 629 644 578 548 527 520 499 489 346 298 260 144 195 864 16 735 750 684 654 633 626 605 595 452 404 366 250 301 970 17 652 667 600 571 550 543 523 512 369 321 283 191 149 672 18 660 675 608 579 558 551 531 520 377 329 291 199 157 680 19 929 944 877 848 827 820 800 789 646 598 560 468 426 949 20 888 903 836 807 786 779 759 748 605 557 519 427 385 908 21 807 822 755 726 705 698 678 667 524 476 438 346 304 827 22 794 809 742 713 692 685 665 654 511 463 425 333 291 814 23 768 783 716 687 666 659 639 628 485 437 399 307 265 788 24 768 783 716 687 666 659 639 628 485 437 399 307 265 788 25 826 841 774 745 724 717 697 686 543 495 457 365 323 846 26 852 867 800 771 750 743 723 712 569 521 483 391 349 872 27 983 998 931 902 881 874 854 843 700 652 614 522 480 1003 28 1033 1048 981 952 931 924 904 893 750 702 664 572 530 1071

101

Supplementary Table 4C (continued) Isolation by distance measures with FST in upper portion and geographic distance (in km) in lower. Bold indicates values non-significant values. FST/ D 15 16 17 18 19 20 21 22 23 24 25 26 27 28 1 0.391 0.484 0.083 0.120 0.171 0.180 0.212 0.221 0.119 0.336 0.317 0.251 0.231 0.323 2 0.407 0.480 0.091 0.114 0.164 0.172 0.193 0.217 0.128 0.294 0.295 0.229 0.216 0.311 3 0.486 0.608 0.075 0.060 0.159 0.175 0.197 0.202 0.105 0.335 0.335 0.190 0.191 0.280 4 0.425 0.505 0.106 0.129 0.170 0.174 0.199 0.225 0.123 0.298 0.302 0.234 0.231 0.314 5 0.411 0.490 0.081 0.098 0.157 0.157 0.183 0.207 0.110 0.287 0.291 0.212 0.223 0.297 6 0.404 0.482 0.077 0.099 0.170 0.171 0.198 0.226 0.117 0.286 0.291 0.235 0.221 0.313 7 0.435 0.530 0.048 0.068 0.144 0.167 0.166 0.215 0.108 0.286 0.281 0.207 0.154 0.309 8 0.404 0.481 0.093 0.103 0.176 0.178 0.208 0.218 0.115 0.293 0.293 0.226 0.220 0.305 9 0.400 0.482 0.083 0.069 0.192 0.179 0.227 0.220 0.141 0.336 0.325 0.216 0.243 0.293 10 0.383 0.458 0.084 0.067 0.181 0.172 0.226 0.215 0.140 0.326 0.312 0.222 0.240 0.285 11 0.380 0.455 0.089 0.077 0.204 0.193 0.248 0.240 0.155 0.356 0.339 0.239 0.250 0.309 12 0.412 0.465 0.058 0.068 0.188 0.185 0.225 0.208 0.139 0.367 0.353 0.246 0.269 0.308 13 0.395 0.485 0.087 0.077 0.215 0.203 0.246 0.246 0.164 0.376 0.351 0.231 0.234 0.304 14 0.431 0.540 0.160 0.167 0.326 0.285 0.357 0.335 0.255 0.416 0.440 0.320 0.322 0.364 15 - 0.054 0.353 0.387 0.394 0.374 0.437 0.426 0.387 0.598 0.553 0.482 0.478 0.475 16 63 - 0.439 0.466 0.465 0.439 0.546 0.510 0.459 0.699 0.635 0.561 0.583 0.541 17 335 398 - -0.020 0.047 0.077 0.018 0.124 0.028 0.145 0.177 0.116 0.104 0.204 18 343 406 8 - 0.071 0.079 0.128 0.157 0.056 0.217 0.205 0.115 0.083 0.144 19 612 675 277 269 - 0.013 0.044 0.110 0.035 0.206 0.203 0.133 0.120 0.166 20 571 634 236 228 46 - 0.069 0.106 0.044 0.219 0.220 0.108 0.115 0.147 21 490 553 155 147 211 173 - 0.168 0.038 0.147 0.173 0.132 0.067 0.204 22 477 540 142 134 239 213 87 - 0.105 0.316 0.268 0.204 0.208 0.193 23 451 514 116 108 318 282 160 147 - 0.127 0.127 0.094 0.055 0.152 24 451 514 116 108 338 276 176 150 132 - 0.151 0.193 0.140 0.301 25 509 572 174 166 356 306 205 183 163 129 - 0.177 0.122 0.228 26 535 598 200 192 383 355 235 227 179 152 142 - 0.112 0.135 27 666 729 331 323 479 455 335 306 267 237 235 162 - 0.144 28 716 779 381 373 556 497 411 387 357 352 303 247 164 -

102

Supplementary Table 5 Pairwise comparisons between mitochondrial haplotype (A or B) and microsatellite cluster (A or B) within individuals of a particular coastal location. North and South refer to locations above or below Cape Mendocino (indicated by a dotted line).

Location usat/mtDNA Clade A Clade B Overall Accuracy Winchuck Cluster A 1 1 63% Cluster B 2 4 Smith Cluster A 2 2 82% Cluster B 0 7 Big Lagoon Cluster A 0 1 57% Cluster B 2 4 Freshwater Cluster A 0 1 75% Cluster B 1 6 Van Duzen Cluster A 0 1 63% Cluster B 2 5 Navarro Cluster A 5 0 88% Cluster B 1 2 Russian Cluster A 4 0 57% Cluster B 3 0 Estero Amer. Cluster A 5 2 71% Cluster B 0 0 North Cluster A 3 6 69% Cluster B 7 26 South Cluster A 14 2 73% Cluster B 4 2 Total Cluster A 17 8 70% Cluster B 11 28

Chapter 5: Synthesis and Future Directions

A genetic exploration of freshwater sculpin in California has long been needed to address limitations in morphological identifications between species (Moyle 2002). Though not of commercial or sportfishing value, sculpin are important windows into California’s hydrologic past, linking vicariance events and connectivity across the state. Too often anthropogenic influences have modified the genetic link between historical and contemporary distributions in species (Zellmer and Knowles 2009). But the relative anonymity and plasticity of sculpin made them ideal species for questions of this kind. This recurrent theme is prevalent throughout this dissertation, revealing geographic breaks and crypsis which were previously only speculative. Additionally important was the widespread distribution of each target species. Such sweeping distributions, though a sampling nightmare, were ideal for confronting geographic and ecological differences over a highly variable landscape. The contribution of results from this dissertation could be divided into three themes: 1) Direct information on Cottus species along the west coast of North America, 2) Indirect information on endemic species co-occurring alongside sculpin, through forces influencing each’s distribution, and 3) Exploration of theoretical analyses and statistical approaches necessary for the information in 1) and 2). Such a three-pronged approach allowed for the addressing of fundamental questions related to speciation, divergence, inland radiation, hybridization, gene flow, and factors which correlate with these concepts from both a theoretical and a practical standpoint. From this comprehensive approach, a number of prior studies were addressed, providing answers to questions previously unattainable. At the same time, countless new questions were discovered, making future research on sculpin a worthy endeavor. As a conservation biologist, it was important to generate data which could be immediately used by biologists and managers alike. Too often important theoretical work is discovered, but provides little in the way of “on the ground” understanding of a species. Here I was able to locate new species or populations, describe how they came about, and suggest some manner of each species true distribution throughout the state. I was also pleasantly surprised by the number and relative ease in which most samples could be attained. With the influx of so many invasive species in California (Marchetti et al. 2004), it was reassuring to discover endemic species doing so well. Perhaps the unique niche filled by sculpin has not been in direct competition with an invasive species, allowing them to persist. However, my concern is for the novel species of “coastal” C. gulosus in the San Jose area, as many of these streams are very small, not managed for fisheries, and sit in highly urban areas. With hope, the introduction of their potential species status will facilitate greater protection of these isolated systems. Additionally, though relatively healthy at this point (at least from a sampling standpoint), individual populations within the southern Sierra Nevada are liable to be at future risk, with their isolation above dams and threat of reduced snowpack (Lettenmaier and Gan 1990). Increased monitoring and management should allow for long term survival of these isolated populations. Phylogeographic breaks identified in Cottus may have similar patterns with other endemic and sympatric fish species. Already studies are similar to breaks seen in roach (Aguilar and Jones 2009), hitch, and tuleperch (Moyle 2002, Moyle et al. 2011). Additional work is needed to explore differences in Sacramento sucker (Catostomus occidentalis), speckled dace (Rhinichthys osculus), threespine stickleback (Gasterosteus aculeatus), and other endemic species (Moyle and Nichols 1973, Baltz et al. 1982, Baumgartner and Bell 1984, Moyle 2002). Correlating these breaks with identified zoogeographic provinces (Moyle 2002) will continue to strengthen the understanding of species and their distribution throughout California. From a theoretical standpoint, this dissertation explored a number of contemporary analytical approaches which were facilitated by the use of multiple marker types. Next 103

104

generation sequencing provided multiple nuclear loci as well as novel microsatellites in Cottus (Baumsteiger and Aguilar 2012). Rather than settle for single gene phylogenetic approaches, multi-locus species trees were explored from both Bayesian and maximum likelihood statistical methodologies. Species were delimited using novel software and differences between marker types used to estimate evolutionary time frames. Methods involving the coalescent and isolation with migration were explored over multiple nuclear loci and through a single mitochondrial locus, requiring understanding of marker specific mutation and migration principles for bounding priors. Global information systems (GIS) were used to identify localities, map distributions, and identify distances between locations. Finally, simulated datasets were generated to test the veracity of empirical datasets, as well as predict patterns in analytical approaches. The breadth of techniques and methodologies employed here greatly enhanced the scope and direction of the dissertation as well as the defensibility of its conclusions. Completing a phylogenetic re-appraisal of the Cottopsis clade (Chapter 2) was an essential component of this dissertation. Although the work by Kinziger et al. (2005) was instrumental in identifying the monophyletic clade Cottopsis, sampling and species representation were limited or absent in many cases, making true evolutionary lineages less certain. By expanding sampling of target species (Cottus asper, C. gulosus, C. pitensis), phylogenetic resolution was greatly improved, identifying a number of putative species or lineages within these three target taxa. Additionally, the incorporation of multiple nuclear and mitochondrial markers enhanced accuracy through the use of species trees, where patterns of multiple genes were used to infer evolutionary relationships. This approach represents a major improvement over single gene approaches, where the conveyed phylogenetic relationship is that of the gene and not the organism (Edwards 2009). Current software associated with species trees often use different statistical approaches, necessitating multiple programs and runs to infer if any differences in topology exist. What little discordance was witnessed could largely be attributed to potential introgression between species, a factor shown to occasionally modify tree topology (Edwards et al. 2007, Knowles 2009). While relationships within the species tree are statistically sound, future expanded sampling of C. marginatus, C. perplexus, C. aleuticus, or any Klamath Basin species throughout its range may show additional structure similar to that discovered in the three target species. This would complete the overall phylogeny of the Cottopsis clade and provide the best information for taxonomists to entirely describe freshwater sculpin along the west coast of North America. Perhaps one of the more novel recognitions from the Cottopsis species tree was individual C. gulosus sampled along the Oregon and Washington coast are not even sister taxa to the holotype C. gulosus species. Unfortunately, availability of this potentially novel species was limited to a few samples, restricting further exploration. Attempts to collect additional samples through D.Markel (Oregon State University) were unsuccessful, prompting concerns over its current distribution or potential extinction. If new samples could be obtained for this lineage, a rigorous taxonomic investigation between holotype C. gulosus and this lineage may reveal differences not apparent previously. If no differences are evident, this species may represent an interesting example of convergent evolution, where two species evolve morphological similarities through different genetic lineages (e.g. sharks and tuna - Donley et al. 2004). Exploration of the relationship between riffle (C. gulosus) and Pit (C. pitensis) sculpin proved to be more complicated than originally anticipated (Chapter 3). Recognition of a novel species in the San Jose area was unanticipated, though logical once the rise of the Coastal Mountain Range is considered. More confounding was the true phylogeographic break between riffle and Pit sculpin species and the potential hybridization in the Sacramento River. It is

105

reasonable to surmise species are restricted to either the Sacramento River basin and its unique geologic and ecological history or drainages emanating from the Sierra Nevada (Durrell 1987). This has been seen in a number of species, most notably in plants (Hickman 1993). What is less clear is the origin and directionality of species in the Sacramento River. Though most of the data point to hybridization in this region, ILS cannot be ruled out, especially with the limited migration uncovered. Complicating this fact is the region now appears to be a unique population, not sharing alleles with either C. pitensis from the Pit River or C. gulosus from the Feather, American, or San Joaquin Rivers. To really understand sculpin in the Sacramento River, a more comprehensive genomic approach is needed, with improved sampling in the southern most portions of the system (around the Sacramento/San Joaquin Delta) and around the mouth of the Pit River. By identifying large regions of the genome, not just a few nuclear loci, more thorough analyses can be initiated to confirm whether individuals are actually representatives of historical hybridization. Additionally, laboratory studies could be performed, artificially mating C. gulosus, C. pitensis, and any potential hybrids to look for signs of post-zygotic barriers. Or natural conditions could be mirrored in a test stream, looking for pre-zygotic barriers to mating since mtDNA introgression appears in only one direction. Combinations of these future ideas would greatly improve knowledge of Sacramento River sculpin, plus spark interest from evolutionary biologists wishing to explore a potential case of incipient hybrid speciation (Mallet 2007). The extensive amount of population structure seen in the San Joaquin River basin could largely be attributed to hyper-variable conditions during the Pleistocene (Howard 1979). The presence of a marine inlet, followed by a large freshwater lake and constant glaciation during the same time period is ample explanation for the unique population structure in this region (Dupré 1990). Much of this structure is now reinforced by dams, permanently isolating populations to a particular system. This region is also highly susceptible to climate change (Brekke et al. 2004). Therefore a future comparative study of other endemic aquatic species in the region would be informative as well, ascertaining whether conditions led to similar structure in other taxa. For example, results from Chapter 4 on C. asper showed analogous structure to C. gulosus in this region, signifying comparative studies may find consistent outcomes. For future directions, perhaps the best candidate is prickly sculpin (C. asper). Confirmation of Clear Lake C. asper as a new species seems certain, as does the presence of a coastal and inland Central Valley form (Chapter 4). But the presence of two ecotypes within most coastal streams is still very much a mystery. Although two different mtDNA clades and microsatellite clusters were found in these same locations, direct correlative links between ecotype and genetic data were not available. So while it seems plausible a correlation exists, direct sampling must be done to confirm the two life history types, similar to the initial morphological work by Krejsa (1965). But the opportunity to validate two genetically different ecotypes living sympatrically in not just a single system but multiple systems is appealing. Questions about how ecotypes initially diverged and factors which led to that divergence are important to evolutionary theory (Via 2009). Ascertaining whether ecotypes are more closely related between or within streams strikes at ideas on convergent and parallel evolution (Elmer and Meyer 2011). Are distributions of ecotypes within a system separate or could they lead to potential hybrids in areas where species are found to overlap? Or are ecotypes truly sympatric, relying of life history characters to main genetic distinctiveness? Clearly follow-up studies are needed in this system. Of equal potential is the presence of hybrids in the so called “intermediate” area between coastal and inland forms of C. asper. Sampling in this study was limited in this region, relying

106

essentially on two locations for inferences. Improved sampling in and around the whole intermediate area has the capacity to substantiate claims of improved fitness for hybrids. An in depth study of habitat types for coastal, inland, and intermediate forms are also essential to claims of intermediacy. As with sculpin in the Sacramento River, genetic studies of this region would greatly benefit from a genomics approach. With thorough sampling and genome coverage, improved analytical power can be used to confirm the presence of an isolated hybrid population, limited to this habitat. The Sacramento/San Joaquin Delta is a highly managed and anthropogenetically impacted region (Lund 2010). Attributing conditions to human management strategies may confirm whether hybrids are responding to a natural system or simply changes brought on by human needs. Lastly, improved measures of gene flow can be executed, providing a better window into movement into and out of this unique system.

107

References Abdelkrim J, Robertson BC, Stanton JAL, Gemmell NJ (2009) Fast, cost-effective development of species-specific microsatellite markers by genomic sequencing. BioTechniques, 46, 185-192. Abell R, Thieme ML, Revenga C, Bryer M, Kottelat M, et al. (2008) Freshwater ecoregions of the World: a new map of biogeographic units for freshwater biodiversity conservation. BioScience, 58, 403-414. Adams SB, Schmetterling DA (2007) Freshwater sculpins: phylogenetics to ecology. Transactions of the American Fisheries Society 136, 1736–1741. Aguilar A, Jones WJ (2009) Nuclear and mitochondrial diversification in two native California minnows: insights into taxonomic identity and regional phylogeography. Molecular Phylogenetics and Evolution, 51(2), 373-381. Aliabadian M, Kaboli M, Nijman V, Vences M (2009) Molecular identification of birds: performance of distance-based DNA barcoding in three genes to delimit parapatric species. PLoS One, 4(1), e4119. Allendorf FW, Leary RF, Hitt NP, Knudsen KL, Lundquist LL, Spruell P (2004) Intercrosses and the U.S. Endangered Species Act: Should hybridization populations be included as Westslope cutthroat trout? Conservation Biology, 18, 1203-1213. Allendorf FW, Leary RF, Spruell P, Wenburg JK (2001) The problems with hybrids: setting conservation guidelines. Trends in Ecology and Evolution, 16(11), 613-622. Anderson E (1948) Hybridization of the habitat. Evolution, 2, 1-9. Ane C, Larget B, Baum DA, Smith SD, Rokas A (2007) Bayesian estimation of concordance among gene trees. Molecular Biology and Evolution 24, 412-426. Anger K (2013) Neotropical Macrobrachium (Caridea: Palaemonidae): on the biology, origin, and radiation of freshwater-invading shrimp. Journal of Crustacean Biology, 33(2), 151- 183. Arnold ML (1992) Natural hybridization as an evolutionary process. Annual review of Ecology and Systematics, 23, 237-261. Arnold ML, Bennett BD, Zimmer EA (1990) Natural hybridization between Iris fulva and Iris hexagona: pattern of ribosomal DNA variation. Evolution, 44, 1512-1521. Arnold ML, Hodges SA (1995) Are natural hybrids fit or unfit relative to their parents? Trends in Ecology and Evolution 10(2), 67-71. Avise JC (2009) Phylogeography: retrospect and prospect. Journal of Biogeography, 36, 3-15. Avise JC (2000) Phylogeography: the history and formation of species. Harvard University Press. Avise JC, Arnold J, Ball RM, Bermingham, E, Lamb T, Neigel JE, Reeb CA, Saunders NC (1987) Intraspecific phylogeography: the mitochondrial DNA bridge between population genetics and systematic. Annual Review of Ecological Systems, 18, 489-522. Avise JC, Walker D, Johns GC (1998) Speciation durations and Pleistocene effects on vertebrate phylogeography. Proceedings of the Royal Society of London. Series B: Biological Sciences, 265, 1707-1712. Avise JC, Vrijenhoek RJ (1987) Mode of inheritance and variation of mitochondrial DNA in hybridogenetic fishes of the genus Poeciliopsis. Molecular Biology and Evolution, 4, 514-525. Bailey RM, Bond CE (1963) Four new species of freshwater sculpins, genus Cottus, from western North America. Occasional Papers of the Museum of Zoology University of Michigan, 634, 1-27. Baldwin EM (1981) Geology of Oregon. Kendall Hunt, Dubuque, Iowa.

108

Baltz DM, Loudenslager EJ (1984) Electrophoretic variation among subspecies of tule perch (Hysterocarpus traski). Copeia, 1984(1), 223-227. Baltz DM, Moyle PB, Knight NJ (1982) Competitive interactions between benthic stream fishes, riffle sculpin, Cottus gulosus, and speckled dace, Rhinichthys osculus. Canadian Journal of Fisheries and Aquatic Sciences, 39(11), 1502-1511. Barlow GW, (1961) Intra- and interspecific differences in rate of oxygen consumption in gobid fishes of the genus Gillichthys. Biological Bulletin, 121, 209-221. Barton NH (2013). Does hybridization influence speciation? Journal of Evolutionary Biology, 26(2), 267-269. Baumgartner JV, Bell MA (1984) Lateral plate morph variation in California populations of the threespine stickleback, Gasterosteus aculeatus. Evolution, 38, 665-674. Baumsteiger J, Aguilar A (2013) Nine original microsatellite loci in prickly sculpin (Cottus asper) and their applicability to other closely related Cottus species. Conservation Genetics Resources, 1-4. Baumsteiger J, Kinziger AP, Aguilar A (2012) Life history and biogeographic diversification of an endemic Western North American freshwater fish clade using a comparative species tree approach. Molecular Phylogenetics and Evolution, 65, 940-952. Beheregaray LB, Sunnucks P (2001) Fine‐scale genetic structure, estuarine colonization and incipient speciation in the marine silverside fish Odontesthes argentinensis. Molecular Ecology, 10, 2849-2866. Beerli P (2009) How to use migrate or why are markov chain monte carlo programs difficult to use? In G. Bertorelle, M. W. Bruford, H. C. Hau_e, A. Rizzoli, and C. Vernesi, editors, Population Genetics for Animal Conservation, volume 17 of Conservation Biology, pages 42-79. Cambridge University Press, Cambridge, UK. Beerli P, Felsenstein J (2001) Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proceedings of the National Academy of Science USA, 98, 4563–4568. Beesley D (2007) Crow's Range: An Environmental History of the Sierra Nevada. University of Nevada Press. Belkhir K, Borsa P, Chikhi L, Raufaste N, Bonhomme F (1996-2004) GENETIX 4.05, logiciel sous Windows TM pour la génétique des populations. Laboratoire Génome, Populations, Interactions, CNRS UMR 5000, Université de Montpellier II, Montpellier (France). Bentivoglio AA (1998) Investigations into the endemic sculpins, (Cottus princeps, Cottus tenuis), in Oregon's Upper Klamath Lake watershed with information on sculpins of interest (C. evermanni, C. ssp., C. "pretendor"). Report to US Geological Survey. Bermingham E, McCafferty SS, Martin AP (1997) Fish biogeography and molecular clocks: perspectives from the Panamanian Isthmus. Pp. 113–128 in T. D.Kocher and C. A.Stepien, eds. Molecular systematics of fishes. Academic Press, San Diego, CA. Bickford D, Lohman DJ, Sodhi NS, Ng PK, Meier R, Winker K, ... Das I (2007) Cryptic species as a window on diversity and conservation. Trends in Ecology and Evolution, 22(3), 148- 155. Bird CE, Fernandez-Silva I, Skillings DJ, Toonen RJ (2012) Sympatric speciation in the post “modern synthesis” era of evolutionary biology. Evolutionary Biology, 39(2), 158-180. Blair C, Murphy RW (2011) Recent trends in molecular phylogenetic analysis: Where to next? Journal of Heredity, 102(1), 130-138. Bohonak AJ (2002) IBD (Isolation by Distance): a program for analyses of isolation by distance. Journal of Heredity, 93, 153-154.

109

Bolnick DI, Fitzpatrick BM (2007) Sympatric speciation: models and empirical evidence. Annual Review of Ecology and Evolution, and Systematics, 38, 459-487. Bond CE (1963) Distribution and ecology of freshwater sculpins, genus Cottus, in Oregon. Ph.D. Thesis, Univ. Mich., Ann Arbor. Brekke, LD, Miller NL, Bashford KE, Quinn NW, Dracup JA (2004) Climate change impacts uncertainty for water resources in the San Joaquin River Basin, California. JAWRA Journal of the American Water Resources Association, 40(1), 149-164. Brito PH, Edwards SV (2009) Multilocus phylogeography and phylogenetics using sequence- based markers. Genetica, 135, 439-455. Broadway JE, Moyle PB (1978) Aspects of the ecology of the prickly sculpin, Cottus asper Richardson, a persistent native species in Clear Lake, Lake County, California. Environmental Biology of Fishes, 3(4), 337-343. Brooks TM, Mittermeier RA, Mittermeier CG, Da Fonseca GA, Rylands AB, Konstant WR, ... Hilton‐Taylor C (2002) Habitat loss and extinction in the hotspots of biodiversity. Conservation Biology, 16(4), 909-923. Brown RM, Jordan WC, Faulkes CG, Jones CG, Bugoni L, Tatayah V, ... Nichols RA (2011) Phylogenetic relationships in Pterodroma petrels are obscured by recent secondary contact and hybridization. PloS One, 6(5), e20350. Brown LR, Moyle PB, Bennett WA, Quelvog BD (1992) Implications of morphological variation among populations of California roach Lavinia symmetricus (Cyprinidae) for conservation policy. Biological Conservation, 62, 1-10. Brown LR (1989) Temperature preferences and oxygen consumption of three species of sculpin (Cottus) from the Pit River drainage, California. Environmental Biology of Fishes, 26(3), 223-236. Bull CM (1991) Ecology of parapatric distributions. Annual Review of Ecology and Systematics, 22, 19-36. Burns KJ, Barhoum DN (2006) Population-level history of the wrentit (Chamaea fasciata): Implications for comparative phylogeography in the California Floristic Province. Molecular Phylogenetics and Evolution, 38(1), 117-129. Butlin R, Debelle A, Kerth C, et al. (26 co-authors) (2012). What do we need to know about speciation? Trends in Ecology and Evolution, 27, 27–39. California Department of Water Resources. www.water.ca.gov/swp/ Calsbeek R, Thompson JN, Richardson JE (2003) Patterns of molecular evolution and diversification in a biodiversity hotspot: the California Floristic Province. Molecular Ecology, 12(4), 1021-1029. Campton DE, Kaeding LR (2005) Westslope cutthroat trout, hybridization, and the U.S. Endangered Species Act. Conservation Biology, 19, 1323-1325. Casteel RW, Adam DP, Sims JD (1977) Late-Pleistocene and Holocene remains of Hysterocarpus traski (tule perch) from Clear Lake, California and inferred Holocene temperature fluctuations. Quaternary Research, 7, 133-143 Carstens BC, Knowles LL (2007) Estimating species phylogeny from gene-tree probabilities despite incomplete lineage sorting: an example from Melanoplus grasshoppers. Systematic Biology, 56(3), 400-411. Clement M, Posada D, Crandall K (2000) TCS: a computer program to estimate gene genealogies. Molecular Ecology, 9(10), 1657-1660. Cooper JG (1868) Fishes. in: JG Cooper and TF Cronise, The Natural Wealth of California. Chap. 7. HH Bancroft and Co., San Francisco 487-498.

110

Coyne JA, Orr HA (2004) Speciation. Sunderland, MA: Sinauer Associates, 1-545. Cracraft J (1989) Speciation and its ontology: The empirical consequences of alternative species concepts for understanding patterns and processes of differentiation. In Speciation and its consequences, D. Otte and J.A. Endler (eds.). Sunderland, Mass.: Sinauer, 28-59. Crandall KA, Bininda-Emonds OR, Mace GM, Wayne RK (2000) Considering evolutionary processes in conservation biology. Trends in Ecology and Evolution, 15(7), 290-295. Daniels RA (1987) Comparative life histories and microhabitat use in three sympatric sculpins (Cottidae: Cottus) in northeastern California. Environmental Biology of Fishes, 19, 93- 110. Daniels RA, Moyle PB (1984) Geographic variation and a taxonomic reappraisal of the Marbled Sculpin, Cottus klamathensis. Copeia. 1984, 949-959. Daniels RA, Moyle PB (1978) Biology, distribution and status of Cottus asperrimus in the Pit River drainage, northeastern California. Copeia, 1978, 673-679. Dawid IB, Blackler AW (1972) Maternal and cytoplasmic inheritance of mitochondrial DNA in Xenopus. Developmental biology, 29(2), 152-161. Dawson MN, Staton JL, Jacobs DK (2001) Phylogeography of the tidewater goby, Eucyclogobius newberryi (Teleostei, Gobiidae), in coastal California. Evolution, 55(6), 1167-1179. Degnan JH, Rosenberg NA (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends in Ecology and Evolution, 24(6), 332-340. deQueiroz K (2007) Species concepts and species delimitation. Systematic Biology, 56, 879-886. Dewey HC, Madsen WR (1976) Flow control and transients in the California aqueduct. Journal of the Irrigation and Drainage Division, 102(3), 335-348. Donley JM, Sepulveda CA, Konstantinidis P, Gemballa S, Shadwick RE (2004) Convergent evolution in mechanical design of lamnid sharks and tunas. Nature, 429, 61-65. Douady CJ, Delsuc F, Boucher Y, Doolittle WF, Douzery EJ (2003) Comparison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability. Molecular Biology and Evolution, 20, 248-254. Dowling TE, Secor CL (1997) The role of hybridization and introgression in the diversification of . Annual review of Ecology and Systematics, 28, 593-619. Drummond AJ, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology, 7, 214. Dudgeon D, Arthington AH, Gessner MO, et al. (2006) Freshwater biodiversity: importance, threats, status and conservation challenges. Biological Reviews, 81, 163–182. Dupras D (1999) Plio-Pleistocene fossil trees found in ancestral Lake Britton diatomite deposits. California Geology, 52, 15–19. Dupré WR (1990) Quaternary geology of the Monterey Bay region, California. Dupré WR, Morrison RB, Clifton HE, Lajoie KR, Ponti DJ, Powell III CL, ... Yeats RS (1991) Quaternary geology of the Pacific margin. Quaternary Nonglacial Geology; Conterminous US, 2, 141-189. Durrell, C (1987) Geologic History of the Feather River County, California. University of California Press. Earl DA, vonHoldt BM (2012) STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources, 1-3. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research, 32(5), 1792-1797.

111

Edmands S (2001) Phylogeography of the intertidal copepod Tigriopus californicus reveals substantially reduced population differentiation at northern latitudes. Molecular Ecology, 10(7), 1743-1750. Edwards SV (2009) Is a new and general theory of molecular systematics emerging? Evolution, 63(1),1-19. Edwards SV, Liu L, Pearl DK (2007) High-resolution species trees without concatenation. Proceedings of the National Academy of Sciences, 104(14), 5936-5941. Elmer KR, Meyer A (2011) Adaptation in the age of ecological genomics: insights from parallelism and convergence. Trends in Ecology and Evolution, 26(6), 298-306. Endler JA (1977) Geographic variation, speciation and clines. Princeton (New Jersey): Princeton University Press. Englebrecht CC, Largiader CR, Hanfling B, Tautz D (1999) Isolation and characterization of polymorphic microsatellite loci in the European bullhead Cottus gobio L. (Osteichthyes) and their applicability to related taxa. Molecular Ecology, 8, 1957-1969. Etges WJ (2002) Divergence in mate choice systems: does evolution play by rules? Genetica, 116(2-3), 151-166. Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology, 14(8), 2611-2620. Excoffier L, Lischer HEL (2010) Arlequin suite ver 3.5: A new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources. 10, 564-567. Excoffier L, Laval G, Schneider S (2005) Arlequin ver. 3.0: An integrated software package for population genetics data analysis. Evolutionary Bioinformatics Online, 1, 47-50. Feder JL, Egan SP, Nosil P (2012) The genomics of speciation-with-gene-flow. Trends in Genetics. 28, 342-350. Feldman CR, Spicer GS (2006) Comparative phylogeography of woodland reptiles in California: repeated patterns of cladogenesis and population expansion. Molecular Ecology, 15(8), 2201-2222. Felsenstein J (2005) PHYLIP (Phylogeny Inference Package) version 3.65. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle. Figueroa AM, Knott JR (2010) Tectonic geomorphology of the southern Sierra Nevada Mountains (California): Evidence for uplift and basin formation. Geomorphology, 123(1), 34-45. Fisher RN, Shaffer HB (2002) The decline of amphibians in California’s Great Central Valley. Conservation Biology, 10(5), 1387-1397. Fisher FW (1994) Past and present status of Central Valley Chinook salmon. Conservation Biology, 8, 870-873. Foote CJ, Brown GS (1998) Ecological relationship between freshwater sculpins (genus Cottus) and beach-spawning sockeye salmon (Oncorhynchus nerka) in Illiamna Lake, Alaska. Canadian Journal of Fish and Aquatic Sciences, 55(6), 1524-1533 Frink JW, Kues HA (1954) Corcoran clay-A Pleistocene Lacustrine deposit in San Joaquin Valley, California. Bulletin of the American Association of Petroleum Geologists, 38(11), 2357-2371. Galtier N, Nabholz B, Glemin S, Hurst GDD (2009) Mitochondrial DNA as a marker of molecular diversity: a reappraisal. Molecular Ecology, 18, 4541–4550. doi: 10.1111/j.1365-294X.2009.04380.x

112

Gavrilets S (2003) Perspective: models of speciation: what have we learned in 40 years? Evolution, 57(10), 2197-2215. Gavrilets S, Li H, Vose MD (2000) Patterns of parapatric speciation. Evolution, 54(4), 1126- 1134. Gavrilets S (2000) Waiting time to parapatric speciation. Proceedings of the Royal Society of London. Series B: Biological Sciences, 267, 2483-2492. Gillespie AR, Porter SC, Atwater BF (2004) The Quaternary Period in the United States (No. 1). Elsevier Science Limited. Girard C (1856) Researches upon the cyprinoid fishes inhabiting the fresh waters of the United States of America, west of the Mississippi Valley, from specimens in the museum of the Smithsonian Institution. Ibid, 8, 165–213. Giuffra E, Guyomard R, Forneris G (1996) Phylogenetic relationships and introgression patterns between incipient parapatric species of Italian brown trout (Salmo trutta L. complex). Molecular Ecology, 5(2), 207-220. Glaubitz JC (2004) CONVERT: A user-friendly program to reformat diploid genotypic data for commonly used population genetic software packages. Molecular Ecology Notes, 4, 309- 310. Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology, 52, 696-704. Hall JP (2005) Montane speciation patterns in Ithomiola butterflies (Lepidoptera: Riodinidae): are they consistently moving up in the world? Proceedings of the Royal Society B: Biological Sciences, 272, 2457-2466. Hammersmark CT, Rains MC, Mount JF (2008) Quantifying the hydrological effects of stream restoration in a montane meadow, northern California, USA. River Research and Applications, 24(6), 735-753. Hasegawa M, Kishino H, Yano T (1985) Dating of human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution, 22, 160–174. Hearn BC, McLaughlin RJ, Donnelly-Nolan JM (1988) Tectonic framework of the Clear Lake basin, California. Geological Society of America Special Papers, 214, 9-20. Hedrick PW (2006) Genetic polymorphism in heterogeneous environments: the age of genomics. Annual Review of Ecology and Evolutionary Systematics, 37, 67–93 Heled J, Drummond AJ (2010) Bayesian inference of species trees from multilocus data. Molecular Biology and Evolution. 27(3), 570-580. Hershler R (1995) New freshwater snails of the genus Pyrgulopsis (Rissooidea: Hydrobiidae) from California. Veliger, 38(4), 343-373. Hewitt, GM (2001). Speciation, hybrid zones and phylogeography—or seeing genes in space and time. Molecular Ecology, 10(3), 537-549. Hey J, Machado CA (2003) The study of structured populations-new hope for a difficult and divided science. Genetics, 4, 535–543. Hey J, Nielsen R (2007) Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics. Proceedings of the National Academy of Sciences of the United States of America, 104, 2785–2790. Hickerson MJBC, Cavender-Bares J, Crandall KA, Graham CH, Johnson JB, Rissler L, Victoriano PF, Yoder AD (2010) Phylogeography’s past, present, and future: 10 years after Avise, 2000. Molecular Phylogenetics and Evolution, 54, 291-301. Hickman JC (ed) (1993) The Jepson manual: higher plants of California. University of California Press.

113

Hoban S, Gaggiotti O, Bertorelle G (2013) Sample Planning Optimization Tool for conservation and population Genetics (SPOTG): a software for choosing the appropriate number of markers and samples. Methods in Ecology and Evolution, 4(3), 299-303. Hodges CA (1966) Geomorphic history of Clear Lake, California. Ph. D. Thesis, Stanford University. Hoelzer GA, Drewes R, Meier J, Doursat R (2008) Isolation-by-distance and outbreeding depression are sufficient to drive parapatric speciation in the absence of environmental influences. PLoS Computational Biology, 4(7), e1000126. Hopkirk JD (1973) Endemism in fishes of the Clear Lake region. University of California Publishings on Zoology. 96, 1-160. Howard AD (1979) Geologic history of middle California (Vol. 43). University of California Press. Huang X, Madan A (1999) CAP3: A DNA sequence assembly program. Genome Resources, 9, 868-877. Hubbs CL, Lagler KF (1941) Guide to the fishes of the Great Lakes and tributary waters. Cranbrook Institute of Science, Scientific Bulletin. Huelsenbeck JP, Ronquist F (2001) MRBAYES: Bayesian inference of phylogeny. Bioinformatics. 17, 754-755. Hughes JM, Schmidt DJ, Finn DS (2009) Genes in streams: using DNA to understand the movement of freshwater fauna and their riverine habitat. BioScience, 59(7), 573-583. Hunt DM, Fitzgibbon J, Slobodyanyuk SJ, Bowmaker JK, Dulai KS (1997) Molecular evolution of the cottoid fish endemic to Lake Baikal deduced from nuclear DNA evidence. Molecular Phylogenetics and Evolution, 8(3), 415-422. Hutchinson DW, Templeton AR (1999) Correlation of pairwise genetic and geographic distance measures: inferring the relative influences of gene flow and drift on the distribution of genetic variability. Evolution 53, 1898–1914. Huson DH, Bryant D (2006) Application of Phylogenetic Networks in Evolutionary Studies. Molecular Biology and Evolution, 23(2), 254-267. Israel M, Lund JR (1995) Recent California water transfers: Implications for water management. Natural Resources Journal, 35, 1-32. Jakobsson M, Rosenberg NA (2007) CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics, 23(14), 1801-1806. Janzen FJ, Krenz JG, Haselkorn TS, Brodie Jr.ED, Brodie III, ED (2002) Molecular phylogeography of common garter snakes (Thamnophis sirtalis) in Western North America: implications for regional historical forces. Molecular Ecolology, 11, 1739– 1751. Jordan DS (1878) A catalogue of the fishes of the fresh waters of North America. Bulletin of the United States Geological and Geographical Survey of the Territories / Department of the Interior, 9(2), 1-442. Kalinowski ST (2005) HP-Rare: a computer program for performing rarefaction on measures of allelic diversity. Molecular Ecology Notes, 5, 187-189.

114

Kettratad J (2008) The Oregon Coastal Subprovince, a new biogeographic subprovince for primary freshwater fishes in Oregon. Ph. D. Dissertation. Oregon State University, Corvalis, OR. King JL, Simovich MA, Brusca RC (1996) Species richness, endemism and ecology of crustacean assemblages in northern California vernal pools. Hydrobiologia, 328(2), 85- 116. Kingman JFC (1982) On the Genealogy of Large Populations. Journal of Applied Probability, 19A, 27–43. Kisel Y, Barraclough TG (2010) Speciation has a spatial scale that depends on levels of gene flow. The American Naturalist, 175, 316-334. Kinziger AP, Wood RM, Neely DA (2005) Molecular systematics of the genus Cottus (: Cottidae). Copeia, 2, 303-311. Kinziger AP, Wood RM (2003) Molecular Systematics of the Polytypic Species Cottus hypselurus (Teleostei: Cottidae). Copeia, 2003(3), 624-627. Knight ME, Turner GF (2004) Laboratory mating trials indicate incipient speciation by sexual selection among populations of the cichlid fish Pseudotropheus zebra from Lake Malawi. Proceedings of the Royal Society of London, Series B: Biological Sciences, 271, 675-680. Kocher TD (2004) Adaptive evolution and explosive speciation: the cichlid fish model. Nature Reviews Genetics, 5, 288-298. Knowles LL (2009) Estimating species trees: methods of phylogenetic analysis when there is incongruence across genes. Systematic Biology, 58(5), 463-467. Knowles LL (2009) Statistical Phylogeography. Annual Review of Ecology, Evolution, and Systematics, 40, 593-612 Knowles LL, Maddison WP (2002) Statistical phylogeography. Molecular Ecology, 11, 2623– 2635. Knowles N, Cayan DR (2002) Potential effects of global warming on the Sacramento/San Joaquin watershed and the San Francisco estuary. Geophysical Research Letters, 29(18), 1891. Kondrashov AS (2003) Accumulation of Dobzhansky-Muller incompatabalities within a spatially structured population. Evolution, 57(1), 151-153. Kondrashov AS, Kondrashov FA (1999) Interactions among quantitative traits in the course of sympatric speciation. Nature, 400, 351-354. Kontula T, Kirilchik SV, Väinölä R (2003) Endemic diversification of the monophyletic cottoid fish species flock in Lake Baikal explored with mtDNA sequencing. Molecular phylogenetics and evolution, 27(1), 143-155. Kozak KH, Wiens J (2006) Does niche conservatism promote speciation? A case study in North American salamanders. Evolution, 60(12), 2604-2621. Kramer B, Van der Bank H, Flint N, Sauer-Gürth H, Wink M (2003) Evidence for parapatric speciation in the Mormyrid fish, Pollimyrus castelnaui (Boulenger, 1911), from the Okavango–Upper Zambezi River Systems: P. marianne sp. nov., defined by electric organ discharges, morphology and genetics. Environmental Biology of Fishes, 67(1), 47- 70. Krejsa RJ (1967a) The systematics of prickly sculpin, Cottus asper Richardson, a polytypic species; Part I. Synonomy, nomenclatural history, and distribution. Pacific Science, 21, 241-251.

115

Krejsa RJ (1967b) The systematics of prickly sculpin, Cottus asper Richardson, a polytypic species; Part II. Studies on the Life History, with especial reference to migration. Pacific Science, 21, 414-422. Krejsa RJ (1965) The systematic of the prickly sculpin Cottus asper: An investigation of genetic and non-genetic variation within the polytypic species. Unpublished Ph.D. thesis. University British Columbia. Kumar S, Subramanian S (2002) Mutation rates in mammalian genomes. Proceedings of the National Academy of Science, United States of America, 99, 803-808. Kubatko LS, Gibbs HL, Bloomquist EW (2011) Inferring species-level phylogenies and taxonomic distinctiveness using multilocus data in Sistrurus rattlesnakes. Systematic Biology. 60(4), 393-409. Kubatko LS, Carstens BC, Knowles LL (2009) STEM: species tree estimation using maximum likelihood for gene grees under coalescence. Bioinformatics, 25(7), 971-973. Kubatko LS, Degnan JH (2007) Inconsistency of phylogenetic estimates from concatenated data under coalescence. Systematic Biology. 56, 17-24. Kuhner MK (2008) Coalescent genealogy samplers: windows into population history. Trends in Ecology and Evolution, 24, 86-93. Lapointe FJ, Rissler LJ (2005) Congruence, consensus, and the comparative phylogeography of codistributed species in California. The American Naturalist, 166(2), 290-299. Larget B, Kotha SK, Dewey CN, Ané C (2010) BUCKy: Gene tree / species tree reconciliation with the Bayesian concordance analysis. Bioinformatics, 26 (22), 2910-2911. Laval G, Excoffier L (2004) SIMCOAL 2.0: a program to simulate genomic diversity over large recombining regions in a subdivided population with a complex history. Bioinformatics, 20(15), 2485-2487. Lee CE, Bell MA (1999) Causes and consequences of recent freshwater invasions by saltwater animals. Trends in Ecology and Evolution, 14(7), 284-288. Lee WJ, Conroy J, Howell WH, Kocher T (1995) Structure and evolution of teleost mitochondrial control regions. Journal of Molecular Evolution, 41, 54–66. Lettenmaier DP, Gan TY (1990) Hydrologic sensitivities of the Sacramento-San Joaquin River basin, California, to global warming. Water Resources Research, 26(1), 69-86. Leu M, Hanser SE, Knick ST (2008) The human footprint in the west: a large-scale analysis of anthropogenic impacts. Ecological Applications, 18(5), 1119-1139. Lewis PO, Zaykin D (2001) Genetic Data Analysis: Computer program for the analysis of allelic data. Version 1.0 (d16c). Free program distributed by the authors over the internet from http://lewis.eeb.uconn.edu/lewishome/software.html Librado P, Rozas J (2009) DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics, 25(11), 1451-1452. Light T, Erman DC, Myrick C, Clarke J (2002) Decline of the Shasta crayfish (Pacifastacus fortis Faxon) of northeastern California. Conservation Biology, 9(6), 1567-1577. Liu L (2008) BEST: Bayesian estimation of species trees under the coalescent model. Bioinformatics. 24, 2542-2543. Liu L, Pearl DK, Brumfield RT, Edwards SV (2008) Estimating species trees using multiple- allele DNA sequence data. Evolution. 62(8), 2080-2091. Liu L, Pearl DK (2007) Species trees from gene trees: reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions. Systematic Biology, 56, 504-514.

116

Losos JB, Glor RE (2003) Phylogenetic comparative methods and the geography of speciation. Trends in Ecology and Evolution, 18(5), 220-227. Lu G, Basley DJ, Bernatchez L (2001) Contrasting patterns of mitochondrial DNA and microsatellite introgressive hybridization between lineages of lake whitefish (Coregonus clupeaformis); relevance for speciation. Molecular Ecology, 10(4), 965-985. Lucas LV, Cloern JE, Thompson JK, Monsen NE (2002) Functional variability of habitats within the Sacramento-San Joaquin Delta: Restoration implications. Ecological Applications, 12(5), 1528-1547. Lund JR (2010) Comparing futures for the Sacramento-San Joaquin delta (Vol. 3). University of California Press. Lynch JD (1989) The gauge of speciation: on the frequencies of modes of speciation. In Speciation and Its Consequences (Otte, D. and Endler, J.A. eds) Sinauer Associates. Macdonald GA, Gay TE (1968) Geology of the Cascade Range [repr.]. California Division of Mines and Geology, Mineral Information Service, 21, 108-112. Maddison W, Knowles LL (2006) Inferring phylogeny despite incomplete lineage sorting. Systematic Biology, 55, 21-30. Marjoram P, Tavare S (2006) Modern computational approaches for analyzing molecular genetic variation data. Nature Reviews – Genetics, 7, 759-770. Maldonado JE, Vila C, Wayne RK (2001) Tripartite genetic subdivisions in the ornate shrew (Sorex ornatus). Molecular Ecology, 10(1), 127-147. Mallet J (2007) Hybrid speciation. Nature, 446, 279-283. Manel S, Schwartz M K, Luikart G, Taberlet P (2003) Landscape genetics: combining landscape ecology and population genetics. Trends in Ecology and Evolution, 18(4), 189-197. Marchetti MP, Moyle PB, Levine R (2004) Invasive species profiling? Exploring the characteristics of non‐native fishes across invasion stages in California. Freshwater Biology, 49(5), 646-661. Marko PB, Hart MW (2011) The complex analytical landscape of gene flow inference. Trends in Ecology and Evolution, 26(9), 448-456. Mathias Kondolf G, Batalla RJ (2005) Hydrological effects of dams and water diversions on rivers of Mediterranean-climate regions: examples from California. Developments in Earth Surface Processes, 7, 197-211. Matthews N (2007) Rewatering the San Joaquin River: A Summary of the Friant Dam Litigation. Ecology LQ, 34, 1109. Maurer EP (2007) Uncertainty in hydrologic impacts of climate change in the Sierra Nevada, California, under two emissions scenarios. Climatic Change, 82(3), 309-325. Mayr E (1963) Animal species and evolution. Cambridge, Mass.: Harvard University Press. McGinnis SM (2006) Field guide to freshwater fishes of California (Vol. 77). University of California Press. McGuire J, Linkem CW, Koo MS, Hutchison DW, Lappin AK, Orange DI, Lemos-Espinal J, Riddle BR, Jaeger JR (2007) Mitochondrial introgression and incomplete lineage sorting through space and time: Phylogenetics of crotaphytid lizards. Evolution. 61, 2879-2897. McKinnon JS, Rundle HD (2002) Speciation in nature: the threespine stickleback model systems. Trends in Ecology & Evolution, 17(10), 480-488. McLarney WO (1968) Spawning habits and morphological variation in the coastrange sculpin, Cottus aleuticus, and the prickly sculpin, Cottus asper. Transactions of the American Fisheries Society, 97(1), 46-48.

117

Mertz JE (2002) Comparison of diets of prickly sculpin and juvenile fall-run Chinook salmon in the lower Mokelumne River, California. The Southwestern Naturalist, 47(2) 195-204. Michel AP, Sim S, Powell TH, Taylor MS, Nosil P, Feder JL (2010) Widespread genomic divergence during sympatric speciation. Proceedings of the National Academy of Sciences, 107(21), 9724-9729. Miller NL, Bashford KE, Strem E (2007) Potential impacts of climate change on California hydrology. Journal of the American Water Resources Association, 39(4), 771-784. Millikan AE (1968) The life history and ecology of Cottus asper Richardson and Cottus gulosus (Girard) in Conner Creek, Washington. M.S. Thesis, Univ. Wash., Seattle, Wash. Minckley WL, Hendrickson DA, Bond CE (1986) Geography of western North American freshwater fishes: description and relationships to intracontinental tectonism. The zoogeography of North American freshwater fishes, 519-613. Moore WS (1995) Inferring phylogenies from mtDNA variation: Mitochondrial-gene trees versus nuclear-gene trees. Evolution, 49(4), 718-726. Morjan CL, Rieseberg LH (2004) How species evolve collectively: implications of gene flow and selection for the spread of advantageous alleles. Molecular Ecology, 13(6), 1341-1356. Mount JF (1995) California rivers and streams: the conflict between fluvial process and land use. University of California Press. Moyle PB, Katz JV, Quiñones RM (2011) Rapid decline of California’s native inland fishes: a status assessment. Biological Conservation, 144(10), 2414-2423. Moyle PB, Williams JE (2005) Biodiversity loss in the temperate zone: decline of the native fish fauna of California. Conservation Biology, 4(3), 275-284. Moyle PB (2002) Inland fishes of California. University of California Press. Moyle PB (1977) In defense of sculpins. Fisheries 2, 20-23. Moyle PB, Marchetti MP (2006) Predicting invasion success: freshwater fishes in California as a model. Bioscience, 56, 515-524. Moyle PB, Vondracek B, Grossman GD (1983) Response of fish populations in the North Fork of the Feather River, California to treatment with fish toxicants. North American Journal of Fisheries Management, 3, 48-60. Moyle PB, Yoshiyama RM, Williams JE, Wikramanayake E (1996) Fish species of special concern in California. Sacramento: California Department of Fish and Game. Moyle PB, Nichols RD (1973) Ecology of some native and introduced fishes of the Sierra Nevada foothills in central California. Copeia, 1973, 478-490. Moyle RG, Filardi CE, Smith CE, Diamond J (2009) Explosive Pleistocene diversification and hemispheric expansion of a “great speciator”. Proceedings of the National Academy of Sciences, 106(6), 1863-1868. Mueller RL (2006) Evolutionary rates, divergence dates, and the performance of mitochondrial genes in Bayesian phylogenetic analysis. Systematic Biology, 55, 289-300. Mulch A, Sarna-Wojcicki AM, Perkins ME, Chamberlain CP (2008) A Miocene to Pleistocene climate and elevation record of the Sierra Nevada (California). Proceedings of the National Academy of Sciences, 105(19), 6819-6824. Myers N, Mittermeier RA, Mittermeier CG, Da Fonseca GAB, Kent J (2000) Biodiversity hotspots for conservation priorities. Nature. 403 Narum SR (2006) Beyond Bonferroni: less conservative analyses for conservation genetics. Conservation Genetics, 7, 783-787.

118

Near TJ, Bossu CM, Bradburd GS, Carlson RL, Harrington RC, Hollingsworth PR, Keck BP, Etnier DA (2011) Phylogeny and temporal diversification of darters (Percidae: Etheostomatinae). Systematic Biology, 60(5), 565-595. Near TJ, Cheng C-HC (2008) Phylogenetics of notothenioid fishes (Teleostei: Acanthomorpha): Inferences from mitochondrial and nuclear gene sequences. Molecular Phylogenetics and Evolution, 47(2), 832-840. Nielsen JL, Crow KD, Fountain MC (1999) Microsatellite diversity and conservation of a relic trout population: McCloud River redband trout. Molecular Ecology, 8(1), 129-142. Nolte AW, Freyhof J, Tautz D (2006) When invaders meet locally adapted types: rapid moulding of hybrid zones between sculpin (Cottus) in the Rhine system. Molecular Ecology 15, 1983–1993. Nolte AW, Freyhof J, Stemshorn KC, Tautz D (2005) An Invasive lineage of sculpins, Cottus sp. (Pisces, Teleostei) in the Rhine with new habitat adaptations has originated from hybridization between old phylogeographic groups. Proceedings of the Royal Society B, 272, 2379-2387. Nolte AW, Stemshorn KC, Tautz D (2005) Direct cloning of microsatellite loci from Cottus gobio through a simplified enrichment procedure. Molecular Ecology Research 5, 628- 636. Nosil P, Harmon LJ, Seehausen O (2009) Ecological explanations for (incomplete) speciation. Trends in Ecology and Evolution, 24(3), 145-156. Nosil P (2008) Speciation with gene flow could be common. Molecular Ecology, 17(9), 2103- 2106. Nye TMW (2008) Trees of Trees: An approach to comparing multiple alternative phylogenies. Systematic Biology, 57(5), 785-794. Ödeen A, Florin AB (2000) Effective population size may limit the power of laboratory experiments to demonstrate sympatric and parapatric speciation. Proceedings of the Royal Society of London. Series B: Biological Sciences, 267, 601-606. O'Grady PM, Kidwell MG (2002) Phylogeny of the Subgenus Sophophora (Diptera: Drosophilidae) Based on Combined Analysis of Nuclear and Mitochondrial Sequences. Molecular Phylogenetics and Evolution. 22, 442-453. Ouborg NJ, Piquot Y, Van Groenendael JM (2008) Population genetics, molecular markers and the study of dispersal in plants. Journal of Ecology, 87(4), 551-568. Page LM, Burr BM (2011) A field guide to freshwater fishes: North America north of Mexico. Houghton Mifflin Harcourt. Patten BG (1971) Spawning and fecundity of seven species of northwest American Cottus. American Midland Naturalist, 85,493-506. Pease RM (1965) Modoc County: A geographic time continuum on the California volcanic tableland. Berkeley: California University Press. Pereira SL, Baker AJ, Wajntal A (2002) Combined Nuclear and Mitochondrial DNA Sequences Resolve Generic Relationships within the Cracidae (Galliformes, Aves). Systematic Biology, 51 (6), 946-958. Phillips FM, Zreda MG, Benson LV, Plummer MA, Elmore D, Sharma P (1996) Chronology for fluctuations in late Pleistocene Sierra Nevada glaciers and lakes. Science, 274, 749-751. Poissant J, Knight TW, Ferguson MM (2005) Nonequilibrium conditions following landscape rearrangement: the relative contribution of past and current hydrological landscapes on the genetic structure of a stream‐dwelling fish. Molecular Ecology, 14(5), 1321-1331.

119

Pond SLK, Posada D, Gravenor MB, Woelk CH, Frost SD (2006) Automated phylogenetic detection of recombination using a genetic algorithm. Molecular Biology and Evolution, 23(10), 1891-1901. Posada D (2008) jModelTest: phylogenetic model averaging. Molecular Biology and Evolution, 25(7), 1253-1256. Posada D, Buckley TR (2004) Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Systematic Biology, 53(5), 793-808. Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics, 155(2), 945-959. Rannala B, Yang Z (2003) Bayes estimates of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics. 164, 1645-1656 Reid SB (2007) Conservation Review - Modoc sucker, Catostomus microps. Final Report submitted to US Fish and Wildlife Service* Klamath Falls Field Office. Reusch TBH, Wegner KM, Kalbe M (2001) Rapid genetic divergence in postglacial populations of threespine stickleback (Gasterosteus aculeatus): the role of habitat type, drainage and geographical proximity. Molecular Ecology, 10(10), 2435-2445. Richardson J (1836) The Fish. In: Fauna Boreali-Americana; or the zoology of the northern parts of British America: containing descriptions of the objects of natural history collected on the late northern land expeditions, under the command of Sir John Franklin, R.N. Fauna Boreali-Americana; or the zoology of the northern parts of British America: ... Part 3: i- xv + 1-327, Pls. 74-97. Roberts C M, McClean CJ, Veron JEN, Hawkins JP, Allen GR, McAllister DE, Mittermeier CG, Schueler FW, Spalding M, Wells F, et al. (2002) Marine Biodiversity Hotspots and Conservation Priorities for Tropical Reefs. Science, 295, 1280–1284. Robins R, Miller RR (1957) Classification, variation, and distribution of the sculpins, genus Cottus, inhabiting Pacific slope waters in California and southern Oregon with a key to the species. California Fish and Game, 43(3), 213-233. Ronquist F, Huelsenbeck JP (2003) MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 19, 1572-1574.Rosenberg NA (2003) DISTRUCT: a program for the graphical display of population structure. Molecular Ecology Notes, 4(1), 137-138. Rubinoff D, Cameron S, Will K (2006) A Genomic Perspective on the Shortcomings of Mitochondrial DNA for "Barcoding" Identification. Journal of Heredity, 97(6), 581-594. Rozas J, Sánchez-DelBarrio JC, Messeguer X, Rozas R (2003) DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics, 19(18), 2496-2497. Rundle HD, Nagel L, Boughman JW, Schluter D (2000) Natural Selection and Parallel Speciation in Sympatric Sticklebacks. Science, 287, 306-308. Rhymer JM, Simberloff D (1996) Extinction by hybridization and introgression. Annual Review of Ecology and Systematics, 27, 83-109. Saint‐Laurent R, Legault M, Bernatchez L (2003) Divergent selection maintains adaptive differentiation despite high gene flow between sympatric rainbow smelt ecotypes (Osmerus mordax Mitchill). Molecular Ecology, 12(2), 315-330. Schilthuizen M, Cabanban AS, Haase M (2005) Possible speciation with gene flow in tropical cave snails. Journal of Zoological Systematics and Evolutionary Research, 43(2), 133- 138. Schlötterer C (2004) The evolution of molecular markers—just a matter of fashion? Nature Reviews Genetics, 5(1), 63-69.

120

Schluter D (2000) The ecology of adaptive radiation. OUP Oxford. Schluter D (2009) Evidence for ecological speciation and its alternative. Science, 323, 737-741. Schmidt TR, Gold JR (1993) Complete sequence of the mitochondrial cytochrome b gene in the cherryfin shiner, Lythrurus roseipinnis (Teleostei: Cyprinidae). Copeia. 1993, 880–893. Schuelke M (2000) An economic method for the fluorescent labeling of PCR fragments. National Biotechnology, 18, 233-234. Seehausen O (2004) Hybridization and adaptive radiation. Trends in Ecology and Evolution, 19(4), 198-207. Seifert B (2010) Intranidal mating, gyne polymorphism, polygyny, and supercoloniality as factors for sympatric and parapatric speciation in ants. Ecological Entomology, 35, 33-40. Servedio MR, Hermisson J, Doorn GS (2013) Hybridization may rarely promote speciation. Journal of Evolutionary Biology, 26(2), 282-285. Shaffer HB, Pauly GB, Oliver JC, Trenham PC (2004) The molecular phylogenetics of endangerment: cryptic variation and historical phylogeography of the California tiger salamander, Ambystoma californiense. Molecular Ecology, 13(10), 3033-3049. Shaffer HB, Thompson RC (2007) Delimiting species in recent radiations. Systematic Biology, 2007, 56: 896-906. Simon C, Buckley TR, Frati F, Stewart JB, Beckenbach AT (2006) Incorporating molecular evolution into phylogenetic analysis, and a new compilation of conserved polymerase chain reaction primers for animal mitochondrial DNA. Annual Review of Ecology, Evolution, and Systematics, 2006, 545-579. Sites JWJr, Marshall JC (2004) Operational criteria for delimiting species. Annual Review of Ecology and Evolutionary Systematics, 35, 199-227. Sites JWJr, Marshall JC (2003) Delimiting species: a Renaissance issue in systematic biology. Trends in Ecology and Evolution, 18(9), 462-470. Smadja CM, Butlin RK (2011) A framework for comparing processes of speciation in the presence of gene flow. Molecular Ecology, 20, 5123–40. Smit AFA, Hubley R, Green P (2004) RepeatMasker Open-3.0. (http://www.repeatmasker.org). Smith CT, Reid SB, Godfrey L, Ardren WR (2011) Gene Flow Among Modoc Sucker and Sacramento Sucker Populations in the Upper Pit River, California and Oregon. Journal of Fish and Wildlife Management, 2(1), 72-84. Sota T, Vogler AP (2001) Incongruence of mitochondrial and nuclear gene trees in the Carabid beetles Ohomopterus. Systematic Biology, 50, 39-59. Spinks PQ, Thomson RC, Shaffer BH (2010) Nuclear gene phylogeography reveals the historical legacy of an ancient inland sea on lineages of the western pond turtle, Emys marmorata in California. Molecular Ecology, 19(3), 542-556. Starrett J, Hedin M (2006) Multilocus genealogies reveal multiple cryptic species and biogeographical complexity in the California turret spider Antrodiaetus riversi (Mygalomorphae, Antrodiaetidae). Molecular Ecology, 16(3), 583-604. Stemshorn KC, Reed FA, Nolte AW, Tautz D (2011) Rapid formation of distinct hybrid lineages after secondary contact of two fish species (Cottus sp.). Molecular Ecology, 20, 1475- 1491. Stephens M, Scheet P (2005) Accounting for decay of linkage disequilibrium in haplotype inference and missing data imputation. American Journal of Human Genetics, 76, 449- 462. Stephens M, Smith NJ, Donnelly P (2001) A new statistical method for haplotype reconstruction from population data. American Journal of Human Genetics, 68, 978-989.

121

Storfer A, Murphy MA, Spear SF, Holderegger R, Waits LP (2010) Landscape genetics: where are we now? Molecular Ecology, 19, 3496-3514. Strauss RE, Bond CE (1990) Taxonomic Methods: Morphology. In: Methods of Fish Biology. Schreck, C.B. and P.B. Moyle, editors. American Fisheries Society, Bethesda, Maryland. Swift CC, Haglund TR, Ruiz M, Fisher RN (1993) The status and distribution of the freshwater fishes of southern California. Bulletin of Southern California Academy of Science, 92, 101-167. Swofford DL (2002) PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). v4. Sinauer Associates, Sunderland, Mass. Tabor RA, Footen BA, Fresh KL, Celedonia MT, Mejia F, Low DL, Park L (2007) Smallmouth bass and predation on juvenile Chinook salmon and other Salmonids in the Lake Washington basin. North American Journal of Fisheries Management, 27, 1174-1188. Tamura K, Nei M (1993) Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution, 10, 512–526. Than C, Nakhleh L (2009) Species tree inference by minimizing deep coalescences. PLoS Computational Biology, 5(9), 1-12. e1000501. Than C, Ruths D, Nakhleh L (2008) PhyloNet: a software package for analyzing and reconstructing reticulate evolutionary relationships. BMC Bioinformatics. 9, 322-338. Utzinger J, Roth, C, Peter A (1998) Effects of environmental parameters on the distribution of bullhead Cottus gobio with particular consideration of the effects of obstructions. Journal of Applied Ecology 35, 882-892. Van Valen L (1976) Ecological species, multispecies, and oaks. Taxon. 25, 233-239 Via S (2012) Divergence hitchhiking and the spread of genomic isolation during ecological speciation-with-gene-flow. Philosophical Transactions of the Royal Society B: Biological Sciences, 367, 451-460. Via S (2001) Sympatric speciation in animals: the ugly duckling grows up. Trends in Ecology and Evolution, 16(7), 381-390. Wagner HM, Hanson CB, Gustafson EP, Gobalet KW, Whistler DP (1997) Biogeography of Pliocene and Pleistocene vertebrate faunas of northeastern California and their temporal significance to the development of the Modoc Plateau and the Klamath Mountains Orogen. San Bernardino County Museum Association Quarterly, 44, 13-21. Wake DB (1997) Incipient species formation in salamanders of the Ensatina complex. Proceedings of the National Academy of Sciences, 94, 7761-7767. Wakeley J (2008) Coalescent Theory: An Introduction. Roberts and Company. Greenwood Village, CO. Warner J, Schoellhamer D, Burau J, Schladow G (2002) Effects of tidal current phase at the junction of two straits. Continental Shelf Research, 22(11), 1629-1642. Waters JM, Rowe DL, Burridge CP, Wallis GP (2010) Gene trees versus species trees: Reassessing life-history evolution in a freshwater fish radiation. Systematic Biology, 59(5), 504-517. Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L (2010) Abctoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics, 11(1), 116. Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution, 40(1), 1358-1370.

122

Wood TE, Burke JM, Rieseberg LH (2005) Parallel genotypic adaptation: when evolution repeats itself. Genetica, 123, 157-170. White JL, Harvey BC (1999) Habitat separation of prickly sculpin, Cottus asper, and coastrange sculpin, Cottus aleuticus, in the mainstem Smith River, northwestern California. Copeia, 1999, 371-375. White MJD (1968) Models of Speciation New concepts suggest that the classical sympatric and allopatric models are not the only alternatives. Science, 159, 1065-1070. Yang Y (2005) Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation. Biometrika. 92(4), 937-950. Yang Z, Rannala B (2010) Bayesian species delimitation using multilocus sequence data. Proc Natl Acad Sci U.S.A. 107, 9264-9269. Yokoyama R, Goto A (2005) Evolutionary history of freshwater sculpins, genus Cottus (Teleostei; Cottidae) and related taxa, as inferred from mitochondrial DNA phylogeny. Molecular Phylogenetics and Evolution 36, 654-668. Zellmer AJ, Knowles LL (2009) Disentangling the effects of historic vs. contemporary landscape structure on population genetic divergence. Molecular Ecology, 18(17), 3593-3602.