UC Santa Cruz UC Santa Cruz Electronic Theses and Dissertations

Title Genomic Analysis of Disjunct Marine Fish Populations of the Northeastern Pacific and Sea of Cortez

Permalink https://escholarship.org/uc/item/33r6f026

Author Garcia, Eric

Publication Date 2018

License https://creativecommons.org/licenses/by-nc-nd/4.0/ 4.0

Peer reviewed|Thesis/dissertation

eScholarship.org Powered by the California Digital Library University of California UNIVERSITY OF CALIFORNIA SANTA CRUZ

GENOMIC ANALYSIS OF DISJUNCT MARINE FISH POPULATIONS OF THE NORTHEASTERN PACIFIC AND SEA OF CORTEZ

A dissertation submitted in partial satisfaction of the requirements for the degree of

DOCTOR OF PHILOSOPHY

in

ECOLOGY AND EVOLUTIONARY BIOLOGY

by

Eric Garcia

December 2018

The Dissertation of Eric Garcia is approved:

______Professor Giacomo Bernardi, Chair

______Professor Pete Raimondi

______Brian Simison, Ph.D.

______Professor Octavio Aburto

______Lori Kletzer Vice Provost and Dean of Graduate Studies

Copyright © by

Eric Garcia

December, 2018

Table of Contents

General Introduction

Studying evolution in the genomic era………………………..……………………..01

The Baja California disjunct fishes…………………………………………………..06

Chapter 1

Tempo and mode of divergence in Baja California disjunct fishes based on a genome- wide calibrated substitution rate……………………………..………………………13

Chapter 2

Patterns of genomic divergence and signals of selection in sympatric and allopatric northeastern Pacific and Sea of Cortez populations of the sargo (Anisotremus davidsonii) and longjaw mudsucker (Gillichthys mirabilis)……………..………..…42

Chapter 3

Genomic divergence and signals of convergent selection between northeastern Pacific and Sea of Cortez disjunct populations of four marine fishes…………………...…..81

General Conclusion...... 126

iii

List of Tables

General Introduction

Table 1 - List and distribution of the 19 Baja California disjunct fishes……………07

Tempo and mode of divergence in Baja California disjunct fishes based on a genome-wide calibrated substitution rate.

Table 1 - Number of samples collected per species and region...... 19

Table 2 - Divergence time estimates based on internally-calibrated Anisotremus RAD loci substitution rate...... 24

Patterns of genomic divergence and signals of selection in sympatric and allopatric northeastern Pacific and Sea of Cortez populations of the sargo

(Anisotremus davidsonii) and longjaw mudsucker (Gillichthys mirabilis)

Table 1 - General characteristic and previously available genetic information of studied species...... 49

Table 2 - Location and number of samples per species…...... 51

Table 3 - Locus, polymorphism, and genetic diversity statistics of the Pacific and Sea of Cortez (Gulf) populations per species…...... 58

Table 4 - Distance matrix reporting FST values and significance between sympatric populations of Anisotremus davidsonii and Gillichthys mirabilis...... 59

iv

Genomic divergence and signals of convergent selection between northeastern

Pacific and Sea of Cortez disjunct populations of four marine fishes

Table 1 - Species characteristics and previously available genetic information….…86

Table 2 - Pacific and Sea of Cortez collecting sites and samples per species...... 90

Table 3 - Locus, polymorphism, and genetic statistics of the Pacific and Sea of Cortez populations per species ……………….……………………………………………..95

Table 4 - List and function of presumed under selection and that are diverging between Pacific and Gulf populations of more than one species…………………...101

v

List of Figures

General Introduction

Figure 1 - Visual representation of expected differentiation along the genome…..…04

Figure 2 - Typical Baja California disjunct distribution …………………….………06

Figure 3 - The studied species……………………………….………………………09

Tempo and mode of divergence in Baja California disjunct fishes based on a genome-wide calibrated substitution rate.

Figure 1 - Time-calibrated phylogeny and distribution of studied Anisotremus species………………………………………………………………………………..25

Patterns of genomic divergence and signals of selection in sympatric and allopatric northeastern Pacific and Sea of Cortez populations of the sargo

(Anisotremus davidsonii) and longjaw mudsucker (Gillichthys mirabilis)

Figure 1 - Pacific and Sea of Cortez sampling localities and well-established phylogeographic breaks, Point Conception and Punta Eugenia...... 50

Figure 2 - DAPC cluster plots of Gillichthys mirabilis and Anisotremus davidsonii populations…………………………………….……………………………………..61

Figure 3 - STRUCTURE plots with Bayesian assignment of individual into distinct genetic clusters or populations……………..….……………………………………..62

vi Figure 4 - Outlier loci statistics in pairwise comparisons of populations of

Anisotremus davidsonii and Gillichthys mirabilis..……………………………...…..65

Genomic divergence and signals of convergent selection between northeastern

Pacific and Sea of Cortez disjunct populations of four marine fishes

Figure 1 - Pacific and Sea of Cortez sampling localities…………..……...... …...…..89

Figure 2 - DAPC cluster plots of Pacific and Sea of Cortez per species based on all loci……………………………………………..……………………………………..96

Figure 3 - STRUCTURE plots based on presumed neutral loci and outlier loci or loci suspected to be under selection…………..……...... …...………………………….....97

Figure 4 - FST distribution and corresponding relative density of outlier loci per species…...………………...…………………..……………………………………..99

Figure 5 - Stacked barplot showing the percentages of outliers giving a GenBank match to a , a sequence or location without gene information, or not match found…...………………...…………………..……………………………...100

Figure 6 - KEGG assignments of outlier loci………………...... …...……………...100

vii Abstract

Genomic analysis of disjunct marine fish populations of the northeastern Pacific and Sea of Cortez

Eric Garcia

The formation of the Baja California peninsula separated the distributions of

19 marine fishes into disjunct Pacific and Sea of Cortez populations. Similarly, their

Pacific distributions cross phylogeographic points that diminish the genetic connectivity of their populations. This resulted in multiple species experiencing a gradient of gene flow and an extraordinary framework to study mechanisms of divergence and signals of selection under different scenarios of isolation. Genetic isolation in these species has previously been studied using only a handful of markers.

In this dissertation, Restriction Site-Associated DNA (RADseq) is used to genotype thousands of genome-wide makers to study the evolutionary history, characterized genomic patterns of divergence among populations, and search for signals of drift and selection, in four of these species: the sargo, Anisotremus davidsonii (ADA); the longjaw mudsucker, Gillichthys mirabilis (GMI), the California sheephead,

Semicossyphus pulcher (SPU); and the zebraperch, Kyphosus azureus (KAZ). While, no evidence of isolation was found for SPU, estimated dates the disjunction of ADA,

GMI, and KAZ, were 60, 284, and 23 thousand years, respectively. A dispersal episode where species migrated northwards and then became isolated by the warming of the seawater temperate in the south of the Baja California peninsula at the end of

viii the last glaciation period, and a vicariant isolation resulting from the closure of a mid- peninsular seaway, are the plausible historical events that produced the distributions of these species. Lower than expected levels of genomic gene flow

(significant p-values) were seen across Point Conception in GMI (FST=0.15), Punta

Eugenia in ADA (FST=0.02), and across the peninsula in KAZ, GMI and ADA

(FST=0.03, 0.11, and 0.23, respectively). Furthermore, when comparing disjunct populations, 19 to 46 % of outlier loci matched coding genes in all species and analyses identified 15 genomic regions, potentially involved in processing environmental information, , immune response, and possibly reproduction, diverging in more than one of these species. Results are interpreted as supporting (1) the idea that ADA and GMI disjunct populations are in the initial phases of allopatric speciation and (2) the presence of convergent selection in these species.

ix Dedication

To my family, friends, and mentors.

x Acknowledgements

I would like to express my deep gratitude to my partner and family for supporting me throughout my Ph.D. adventure. Thank you for making my life so much better. I am forever in doubt to my committee members, Drs. Giacomo

Bernardi, Pete Raimondi, Brian Simison, and Octavio Aburto, for the guidance and support at all times during this dissertation. Thank you to all EEB professors and fellow Ph.D. students for your lessons and discussions, which improved my understanding of evolution and academic life. Finally, I would like to thank all my friends for making Santa Cruz an incredible place to call home for all these years.

This work was supported financially by the Consejo Nacional de Ciencia y

Tecnología (CONACYT) in Mexico, UC Mexus, The Myers Trust, STARTS scholarship, Friends of the Long Marine Lab, and by the Graduate Student

Association and the department of Ecology and Evolutionary Biology at UCSC. I would also like to thank Dr. Larry G. Allen for providing fish illustrations.

xi

GENERAL INTRODUCTION

Studying Evolution in the Genomic Era

Genetic differences are the building blocks of biodiversity. How this variation distinguishes populations and eventually creates new species has been a central theme of investigation among evolutionary biologists. The way scientists have perceived the speciation process and the methodologies they used to answer questions about evolution have been evolving through time. Speciation was first proposed by Darwin to be the result of natural selection but with the rise of the modern synthesis, isolation, where populations can divergence without any selective , was instead considered to be the main driver of differentiation (Coyne & Orr 2004;

Darwin 1859; Gaither et al. 2015). Much of the research in the field had thus focused in populations with some geographic separation until recently, when many authors have documented an active role of natural selection in several sympatric systems

(Schluter 2009; Crow et al 2010; Nosil 2012; Bowen et al 2013). (Nosil 2012; Crow et al. 2010)As a result, views of speciation are currently more holistic and include spatial and ecological components that often operate in conjunction during speciation events (Diekmann 1999; Bernardi 2013).

Regardless of the driving differentiation, a population must first have genetic variation in order to diverge from others. Variation is acquired by a population through random mutations, recombination or the gene shuffling in sexual organisms, or the introduction of new alleles by individuals migrating from a

1

different population. Each population has then a unique gene pool that is the outcome of complex historical and contemporary evolutionary trajectories. Existing variation within a population can be maintained by micro-evolutionary processes such as balancing selection or diminished by drift and other types of selection.

Structure between populations, whether or not they are isolated, can be produced randomly through genetic drift (acting mostly upon neutral loci) or selectively by natural selection (by definition acting only upon non-neutral loci).

When populations interbreed, structure will be shaped by the interplay of gene flow, drift and selection. On one hand, populations might become more similar to each other by gene flow resulting in an overall homogenization of variation throughout the genome. On the other hand, natural selection is considered to be the major responsible for any existing differentiation between populations with high gene flow

(Nosil et al. 2009; Gaither et al. 2015), except for small effective populations where drift can also substantially change the frequencies of random alleles.

Therefore, it is important to include neutral and non-neutral loci to obtain a more complete picture of the evolutionary histories of populations and species alike.

Due to technological limitations, geneticists have analyzed the speciation process for a long time with only a handful of loci (Hohenlohe et al. 2010), which in some studies were a mix of neutral and selected loci but in most were only of one type. Yet, a new suite of genetic tools often referred as Next Generation Sequencing (NGS) have revolutionized the study of genomics and increased the ability of researchers to investigate components of evolution in both, individual and holistic approaches.

2

Genetic variation can now be studied at the genomic level with technologies such as

Restriction site-Associated DNA (RADseq) that can produce amounts of loci many orders of magnitude greater than First Generation Sequencing methodologies (Baird et al. 2008; Hohenlohe et al. 2010). Traditional population parameters can be measured at the genomic level and analyses yield higher confidence levels and statistical power.

Furthermore, discrepancies in the allele frequencies between populations allow calculating statistics such as FST that estimate differentiation. These statistics can be measured throughout the entire genomic data and models can be run to predict the range of differentiation that is expected to occur randomly through genetic drift.

Loci embedded within this range represent neutral loci that evolve freely (Figure 1;

Hohenlohe et al. 2010; Nosil et al. 2009). In contrast, loci with lower differentiation

(lower FST values) than neutral loci are candidate regions to be under balancing selection. This maintains a high diversity of alleles which can be useful in genes involved, for instance, in pathogen defenses, immune and inflammation responses

(Hohenlohe et al. 2010). Similarly, loci with higher differentiation than neutral loci are candidate regions to be under divergent selection which are the loci experiencing different selective pressures and driving local adaptation in populations (Hohenlohe et al. 2010; Nosil et al. 2009). However, outlier loci are initially only considered

“candidates” since some of these loci might illustrate higher levels of differentiation than neutral loci only because they are present in adjacent, non-adaptive areas, linked to the actual selected loci (Hohenlohe et al. 2010; Nosil et al. 2009; Bernardi 2013).

3

Outlier loci can then either be the actual genes under selection or simply linked loci that are “hitchhiking” the effects of selection and creating islands of speciation, or areas in the genome with substantially higher differentiation (Figure 1; Nosil et al.

2009; Bernardi 2013).

Island of Speciation! Selected locus! Sea level = upper!

! limit of expected! Tightly- linked! neutral divergence!

! neutral loci! ) Fst

( Sea floor = purely! neutrally evolving! regions!

Genetic divergence Genetic Loosely- linked! neutral loci! Trenches = loci! under balancing! selection!

Figure 1. Visual representation of expected differentiation along the genome Figure 4! (orange line). Loci with significantly higher differentiation create islands of speciation uplifted by loci under selection. Areas with significantly lower differentiation represent loci under balancing selection. Loci between these two extremes are neutral loci (modified from Nosil et al. 2009).

RADseq uses enzymes that cut DNA at particular restriction sites producing thousands of sequences which are distributed throughout the genome (Baird et al.

2008). Once outlier loci are filtered, they can be compared to the online database

Basic Local Alignment Search Tool (BLAST) to reveal if any of them match known gene sequences (Altschul et al. 1997). Subsequently, with knowledge of gene

4

function, we can then infer the possible ecological or environmental mechanisms that might be responsible for the differential selective between populations

(Gaither et al. 2015). Recent studies discovered that on average, 5 to 10% of the total number of loci obtained by RADseq showed signs of divergent selection (i.e. higher differentiation than neutral loci) and some of these loci where successfully correlated to an ecological process in the studied system (Nosil et al. 2009).

Any organism and even any population can have unique evolutionary trajectories shaped by many factors such as behavior, historical events affecting population size, and the combined forces of micro-evolutionary processes acting in the past and present time. Thus, much of our ability to investigate how the different drivers of evolution (gene flow, drift, and selection) produce divergence in allopatric and sympatric populations, depends as well on finding good systems where to apply these modern genetic tools. Marine fishes offer the opportunity to study these components in both individual and holistic approaches (Bernardi 2013). For instance, while some fishes exhibit large distributions with little or no genetic structure (Bowen et al. 2006; Schultz et al. 2007), the formation of the Isthmus of Panama approximately 2.8 to 3.5 million years ago (Coates and Obando, 1996; Collins, 1996;

Craig et al., 2004; Knowlton, 1993; Lessios, 2008; Marko, 2002; Thomson et al.,

2000), acted as an absolute barrier to gene flow for several others. In-between these two extremes, the Baja California disjunct fishes form an extraordinary system of marine fishes with shared evolutionary histories and a mix of allopatric and sympatric populations where to study genomic divergence at different levels of isolation.

5

The Baja California Disjunct Fishes

The Baja California disjunct fishes are a group of 19 temperate species that shared similar distributions from central California to southern Baja California, in the northern Eastern Pacific, and northern and central zones of the Sea of Cortez (here also referred as the “Gulf”, Figure 2, Table 1, Bernardi et al. 2003).

40

USA San Francisco 35

Point San Diego Conception 30

Punta Eugenia

25 MEXICO

N Pacific Ocean

0 200 400 600 km 20

-130 -125 -120 -115 -110 -105

Figure 2. Typical Baja California disjucnt distribution.

6

Pacific distributions are more specific than Gulf. Notes in the Gulf distribution are given when available Table 1. Pleuronectidae Pleuronectidae Agonidae Polyprionidae Embiotocidae Scorpaenidae Scorpaenidae Labridae Labridae Kyphosidae Blenniidae Gobiidae Gobiidae Serranidae Chaenopsidae Blenniidae Haemulidae Girellidae Atherinidae Family

List and distribution of the 19 Baja California disjunct fish species

Pleuronichthys verticalis Hypsopsetta guttulata Xenetremus ritteri Stereolepis gigas Zalembius rosaceus Scorpaena guttata Sebastes macdonaldi Semicossyphus pulcher Halichoeres semicinctus Kyphosus azureus Hypsoblennius gentilis Lythrytpnus dalli Gillichthys mirabilis Paralabrax maculatofasciatus Chaenopsis alepidota Hypsoblennius jenkinsi Anisotremus davidsonii /simplicidens Girella nigricans Leuresthes tenuis/sardina Species

hornyhead turbot diamond turbot poacher flagfin giant seabass pink surfperch scorpionfish Mexican rockfish california sheephead wrasse rock zebraperch bay blenny blue banded boby mudsucker longjaw bass spotted sand oragethroat pikeblenny mussel blenny sargo opaleye grunion name Common

(Froese & Pauly; Thomson et al. Thomson Pauly; & (Froese

Point Reyes (San Francisco) to southern Cape Mendocino, northern California to southern Baja Malibu to central Baja. Upper Gulf Humboldt Bay to southern Baja. Upper Gulf Baja southern Point Delgada, northern California to Bahia San Cristobal, Guadalupe too) Santa Cruz to Punta Abreojos (southern Baja, isla California to southern Baja Guardia and Guaymas) Monterey to Baja (uncommon south of Isla Angel de la Bahia de la Paz Point Conception to Bahia Magdalena (Isla Guadalupe too). Monterey to Baja Monterey to Bahia Magdalena Southern California to central Baja Maria, Culiacan, Sinaloa to the north of the gulf. Tomales bay to Bahia Magdanela. Mulege Monterey to Baja Southern Southern California to Baja Conception (Miller and lea 1972) Monterey to Magdalena Bay (rare north of Point San Francisco to southern Baja (common in upper gulf) San Francisco to Magdalena Bay (southern Pacific Baja) Range

California to Baja. Upper gulf.

-

Mazatlan and north in the gulf .

, 2000; Bernardi et al., 2003) al., et Bernardi , 2000;

Baja

and and Bahia Santa

.

7

Populations along the Pacific coast occur without geographic barriers but gene flow might be limited by and oceanographic discontinuities at Point

Conception and Punta Eugenia (Allen et al. 2006). In contrast, there are no obvious barriers within the Gulf but these populations are isolated from the Pacific by the actual Baja California peninsula and the warmer seawater temperate at southernmost regions (Bernardi et al. 2003; Medina & Walsh 2000; Thomson et al. 2000; Present

1987; Tranah & Allen 1999). The end result is 19 natural cases of allopatric and sympatric populations with different levels of gene flow.

The taxonomy, ecology and life histories of these species do not illustrate patterns that can help explain the similar distributions. These species belong to 14 different families and have diverse ecologies from inhibiting estuaries to deep reefs, or having small to large body sizes, different diets, or short to long pelagic larval durations (i.e. 30 days in some and 120 in others). Yet, two genetic patterns based on few mitochondrial and nuclear markers were revealed among these species

(Bernardi et al. 2003). In the first, 8 of 12 studied species showed deep genetic differences between the Pacific and Gulf populations as well as shallower differences across Point Conception and Punta Eugenia. The other 4 species did not show noticeable divergence between any of their populations. Here, we employed NGS techniques to the two species showing the highest structure: the sargo, Anisotremus davidsonii (grunts, Haemulidae) and the longjaw mudsucker, Gillichthys mirabilis

(gobies, Gobiidae), as well as two of the species showing no divergence: the

California sheephead, Semicossyphus pulcher (wrasses, Labridae) and the zebraperch,

8

Kyphosus azureus (sea chubs, Kyphosidae), to examine how micro-evolutionary

processes affect populations in a gradient of isolation (Figure 3).

a)# b)#

a)# a)#b)#c)# d)#b)#

c)# d)#c)# d)# a)# b)#

c)# d)#

Figure 3. The studied species: a) longjaw mudsucker, Gillichthys mirabilis; b) sargo, Anisotremus davidsonii; c) zebraperch, Hermosilla azurea; d) California sheephead, Semicossyphus pulcher.

Permission for illustrations provided by Dr. Larry .G Allen

9

In this dissertation, two pairs of congeneric geminate grunt species are sequenced using RADseq to internally calibrate the genomic mutation rate within the

Anisotremus genus. This substitution rate is then utilized to estimate time and predict most likely hypothesis of divergence that produced the disjunct population in the focal species. Subsequently, thousands of loci are used to characterize genome-wide gene flow and to compare patterns of genomic structure and signals of selection between sympatric and allopatric Baja California populations of the sargo and longjaw mudsucker. Finally, this study compares patterns of genomic differentiation between the allopatric populations and identifies coding genes diverging convergently in more than one of these temperate marine fishes with diverse ecological backgrounds. While each chapter is written and should be treated as an independent, stand-alone study, all three combined conform a useful framework to continue exploring the mechanisms in which isolation and selection shape the genome and drive adaptation in organisms.

10

References

Allen, L.G., Pondella, D.J. & Horn, M. h., 2006. The Ecology of Marine Fishes:

California and Adjacent Waters, UC Press.

Altschul, S.F. et al., 1997. Gapped BLAST and PSI-BLAST: a new generation of

database search programs. Nucleic acids research, 25(17), pp.3389–402.

Baird, N.A. et al., 2008. Rapid SNP discovery and genetic mapping using sequenced

RAD markers. PLoS ONE, 3(10), pp.1–7.

Bernardi, G., 2013. Speciation in fishes. Molecular Ecology, 22(22), pp.5487–5502.

Bernardi, G., Findley, L. & Rocha-Olivares, A., 2003. Vicariance and dispersal across

Baja California in disjunct marine fish populations. Evolution; international

journal of organic evolution, 57(7), pp.1599–1609.

Coyne, J.A. & Orr, H.A., 2004. Speciation, Sunderland, Massachusetts: Sinauer

Associates.

Crow, K.D., Munehara, H. & Bernardi, G., 2010. Sympatric speciation in a genus of

marine fishes. Molecular Ecology, 19(10), pp.2089–2105.

Darwin, C., 1859. On the Origin of Species by Means of Natural Selection, Or, the

Preservation of Favoured Races in the Struggle for Life, London.

Gaither, M.R. et al., 2015. Genomic signatures of geographic isolation and natural

selection in fishes. Molecular Ecology, 24(7), pp.1543–1557.

Hohenlohe, P.A. et al., 2010. Population genomics of parallel adaptation in threespine

11

stickleback using sequenced RAD tags. PLoS Genetics, 6(2).

Medina, M. & Walsh, P.J., 2000. Comparison of Four Mendelian Loci of the

California Sea Hare ( Aplysia californica ) from Populations of the Coast of

California and the Sea of Cortez. Marine Biotechnology, 2, pp.449–455.

Nosil, P., 2012. Ecological speciation, New York: Oxford University Press Inc.

Nosil, P., Funk, D.J. & Ortiz-Barrientos, D., 2009. Divergent selection and

heterogeneous genomic divergence. Molecular Ecology, 18(3), pp.375–402.

Present, T.M.C., 1987. Genetic Differentiation of Disjunct Gulf of California and

Pacific Outer Coast Populations of Hypsoblennius jenkinsi Genetic

Differentiation of Disjunct Gulf of California and Pacific Outer Coast

Populations of Hypsoblennius jenkinsi. Copeia, 1987(4), pp.1010–1024.

Thomson, D.A., Findley, L.T. & Kersitch, A.N., 2000. Reef fishes of the Sea of

Cortez: the rocky shore fishes of the Gulf of California., Austin, TX: The

University of Texas Press.

Tranah, G.J. & Allen, L.G., 1999. Morphologic and genetic variation among six

populations of the spotted sand bass, Paralabrax maculatofasciatus, from

southern California to the upper Sea of Cortez, Los Angeles, CA.

12

CHAPTER 1

Tempo and Mode of Divergence in Baja California Disjunct Fishes Based on a

Genome-Wide Calibrated Substitution Rate.

Abstract

The distribution of the Baja California disjunct species, where populations in the Sea of Cortez occurred isolated from the Pacific, was a product of multiple vicariant and dispersal events. This study first produces a time calibrated genome- wide substitution rate from thousands of RADseq markers by genotyping two trans-

Isthmian Anisotremus geminate fish clades. Then, this rate is utilized to estimate divergence time of disjunct populations of the congeneric sargo, Anisotremus davidsonii, and three other Baja California disjunct fishes: the longjaw mudsucker,

Gillichthys mirabilis, the California sheephead, Semicossyphus pulcher, and the zebraperch, Kyphosus azureus. Finally, produced dates of divergence are compared to previous estimates and used to predict the most likely hypothesis of isolation in each species. Observed genomic substitution rate was faster, and suggested narrower and younger ranges of divergence time than previous estimates based on individual markers. No evidence of isolation was found for the sheephead. Estimated dates of divergence were 60, 284, and 23 thousand years, for the sargo, mudsucker, and zebraperch, respectively. A dispersal episode where species migrated northwards and then became isolated by the warming of the seawater temperate in the south of the

Baja California peninsula at the end of the last glaciation period, and a vicariant

13

isolation resulting from the closure of a mid-peninsular seaway, are the plausible historical events that produced the current disjunction in the distributions of these species.

Keywords: allopatric populations, dispersal, isolation, speciation, vicariance.

14

Introduction

Disjunct populations are formed after a physical or other type of barrier separates the populations of a species (Palumbi 1992), a process that might lead to allopatric speciation events and can be used to study early drivers of evolution (Mayr

1942; Endler 1977; Coyne & Orr 2004; Bernardi 2013). Compared to terrestrial or fresh water systems, barriers to gene flow in marine systems are less obvious but do form in the way of geographical separation as well as differences in oceanographic features such as current regimes, temperature, and salinity (Helfman et al. 2009;

Bernardi 2013; Rocha et al. 2002). The Baja California Peninsula in Northeastern

Mexico, offers a remarkable opportunity to examine natural cases of population disjunctions including several terrestrial organisms (Upton & Murphy 1997; B. R.

Riddle et al. 2000; Brett R. Riddle et al. 2000; B R Riddle et al. 2000) as well as marine invertebrates (Medina & Walsh 2000), mammals (Maldonado et al. 1995) and fishes (Miller & Lea 1972; Thomson et al. 2000; Bernardi & Lape 2005; Bernardi et al. 2003; Present 1987; Huang & Bernardi 2001; Stepien et al. 2001). Among these, fishes present diverse mechanisms of isolation discussed next.

There are 19 temperate marine fishes that inhabit the Pacific coast of

California and the Baja California and also have populations in the northern and central portions of the Sea of Cortez (also called Gulf of California and here also referred simply as the “Gulf”), where populations are thought to be isolated in part by the higher seawater temperature at the south of the Baja California peninsula (where these species are absent, Bernardi et al., 2003; Bernardi and Lape, 2005; Huang and

15

Bernardi, 2001; Miller and Lea, 1972; Present, 1987; Stepien et al., 2001; Thomson et al., 2000, Supplementary Table 1, Supplementary Figure 1).

Three hypotheses lead the efforts to explain the vicariant and dispersal events that produced these disjunct distributions. Approximately 5 millions years ago (Mya), the San Andres fault began separating the Baja California peninsula from the mainland until creating the Sea of Cortez an island within (Grismer 2000; Bernardi

2014; B. R. Riddle et al. 2000). During this process, a seaway in the north of the peninsula and another one in the south (near the current location of La Paz), connected the Pacific Ocean and the Gulf but were uplifted and closed approximately

3 Mya (Bernardi and Lape, 2005; Riddle et al., 2000). This event is here referred as the “ancient seaway divergence hypothesis.” An additional seaway was later created and uplifted approximately 1 to 1.6 Mya in the middle of the peninsula (Riddle et al.

2000). This closure represents a second possible divergent scenario referred by this study as the “Mid-peninsular seaway divergence hypothesis.” Finally, a third possibly explanation, here refereed as the “Post-glacial divergence hypothesis,” predicts that after the closure of these seaways, it remained possible for temperate fishes to migrate in cold water around the peninsula until the last glaciation period ended approximately 12,000 years ago (ya) and seawater temperature was increased in the southern portion of the peninsula (Bernardi et al., 2003; Brusca, 1973; Riddle et al.,

2000).

Genetic differences in a handful of markers between Pacific and Gulf populations of several disjunct species have been linked to these hypotheses before

16

(Bernardi & Lape 2005; Huang & Bernardi 2001; Bernardi et al. 2003; Stepien et al.

2001; Thomson et al. 2000; Miller & Lea 1972; Bernardi 2014; Terry et al. 2000;

Brusca 1973; Present 1987; Poortvliet et al. 2013). The most comprehensive study analyzed the genetic patterns of 12 disjunct species and found deep divergence across the Pacific and Gulf populations of 8 species but high gene flow levels for the other 4

(Bernardi et al. 2003). Yet, these patterns might change if populations are examined with an increased number of loci. Here, thousands of markers are used to assess divergence time between disjunct populations of two species with some of the highest observed divergence, the sargo, Anisotremus davidsonii (ADA) and the longjaw mudsucker, Gillichthys mirabilis (GMI), and two species that previously showed little or no divergence, the California sheephead, Semicossyphus pulcher (SPU) and the zebraperch, Kyphosus azureus (KAZ).

The Pacific and Gulf populations of the sargo and longjaw mudsucker were proposed to have diverged from 0.16 to 0.64 Mya (based on divergence and coalescence analysis using mtDNA cyt-b and nDNA S7) by the post-glacial warming of the seawater temperature and from 0.76 to 2.3 Mya (based on mtDNA cyt-b) by the closure of the mid-peninsular seaway, respectively (Huang & Bernardi 2001;

Bernardi et al. 2003; Bernardi & Lape 2005). However, given the large range of dates for separation, other hypotheses of isolation were not ruled out but considered less parsimonious. In the case of the California sheephead and the zebraperch, previous studies found little genetic evidence of isolation and estimated high gene flow across the peninsula (Bernardi et al. 2003; Poortvliet et al. 2013). Yet again, as the utilized

17

markers in these studies might lack sufficient resolution to detect a very recent cessation to gene flow and as mutation rates vary among different species and loci, different patterns of divergence are likely to emerged by examining many more loci

(Avise 2000; Bernardi et al. 2003; Coyne & Orr 2004).

This study estimates internally-calibrated times for the divergence in the focal species by using two pairs of Anisotremus geminates to obtained an average rate of substitution from thousands of genome-wide RADseq loci. Geminate taxa refer to organisms that had some populations separated by the uplifting of the Isthmus of

Panama and speciated into sister species in the Pacific and Atlantic Oceans (Jordan

1908). As a closure of the Isthmus, at approximately 3 Mya, has been heavily supported (O’Dea et al. 2016; Coates & Obando 1996; Collins 1996; Craig et al.

2004; Lessios 2008; Marko 2002; Thomson et al. 2000; Knowlton et al. 2003), trans-

Isthmian geminate lineages have been used regularly to calibrate the time of population divergence, speciation events, and phylogeographic patterns in fishes

(Bernardi & Lape 2005; Tavera et al. 2012; Tariel et al. 2016; Campbell et al. 2018;

Thacker 2017; Bowen et al. 2013; Helfman et al. 2009). While examining the utility of RADseq to estimate dates of separation, we test if analyzing disjunct populations with large number of loci might uncover unseen divergence in sheephead and zebraperch, and estimate different or smaller ranges for the time of disjunction in sargo and mudsucker. Finally, we investigate the phylogeographic history of these species by matching produced estimates of divergence time to the mentioned hypotheses of isolation: post-glacial, mid-peninsular, or ancient seaways divergence.

18

Materials and Methods

Sampling

Samples were collected from Northeastern Pacific (NEP) and Gulf populations of the four focal Baja California disjunct species and from the Tropical

Eastern Pacific (TEP) and the Western Atlantic (WA) for two pairs of geminate species (Table 1, see supplementary Table 2 and 3 for specific locations). Minnow traps were used to collect mudsuckers. Zebraperch and all Anisotremus samples were collected by spearing or by hand netting juveniles in pools. California sheephead individuals were speared or obtained from local .

Table 1. Number of samples collected per species and region. Baja California disjunct species Northern Eastern Sea of Cortez (Gulf) Pacific (NEP) Anisotremus davidsonii (ADA) 15 22 Gillichthys mirabilis (GMI) 34 21 Semicossyphus pulcher (SPU) 16 10 Kyphosus azureus (KAZ) 10 22

Geminate species Tropical Eastern Western Atlantic Pacific (TEP) (WA) Anisotremus interruptus (AIN) 3 Anisotremus taeniatus (ATA) 3 Anisotremus surinamensis (ASU) 3 Anisotremus virginicus (AVI) 3

DNA Extraction and Library Preparation

Genomic DNA was isolated from fin, muscle, or gill, following the Qiagen

DNeasy 96 Tissue Kit for purification of DNA recommendations (QIAGEN,

19

Valencia, California, USA). A single RAD library was constructed with the original protocol and using SBf1 for digestion, sonication for fragmentation, and magnetic beads for size selection (Miller et al. 2007; Baird et al. 2008). Individual barcodes were ligated to samples which were then sequenced at UC Berkeley (Vincent J.

Coates Genomics Sequencing Laboratory) on a single illumina HiSeq 2000 lane.

Filtering and Discovering Markers

Sample de-multiplexing, and loci and Single Nucleotide Polymorphisms

(SNPs) filtering and discovering, were performed using modified Perl scripts (Miller et al. 2012) and STACKS version 2 (Catchen et al. 2011; Catchen et al. 2013). Only reads with quality scores of 90% or higher (Phred score=33) and containing exact matches to the SBf1 restriction site were maintained for the analysis. Final sequence length was 80-bp after quality filters and the removal of restriction site and barcodes.

In a single de novo map in STACKS 2 for all five species of Anisotremus, stacks (suspected orthologous loci) were created if they showed a minimum depth of three (-m) and a maximum of three mismatches per loci in each sample (-M). Given that this analysis contained five different (albeit closely related) species, we allowed a maximum of 7 mismatches between catalog loci (-n). Separate analyses were performed for the other three species with an increased depth of 8x (-m), the same number of mismatches between loci in the individuals (-M) but allowing only two differences when building catalogs (-n). Subsequent Anisotremus data consisted of only loci that were present in all geminate individuals and at the same time, in least

20

80 percent of the individuals from the Pacific and Gulf sargo populations

(populations –p 6 –r 0.8). Similarly, loci in the other species must have been present in at least 80 percent of the Pacific and Gulf samples to be considered for downstream analyses (populations –p 2 –r 0.8).

Estimating Time of Disjunction

The first analysis consisted of calculating the calibrated rate of substitution in

Anisotremus RADseq loci. We accomplished this by using the closure of the Isthmus of Panama as the divergence time for the geminates A. taeniatus / A. virginicus and for A. interruptus / A. surinamensis. Including all five Anisotremus species in the same analysis permitted calculating this rate simultaneously with estimating the time of the divergence between sargo disjunct populations. For this, sequence alignments were constructed to build a Bayesian phylogeny with BEAST v. 2 (Bouckaert et al.

2014) while using a General Time Reversible (GTR) substitution model in combination with a lognormal relaxed clock-model and a birth-death model for priors. To allow for direct comparison to previous divergence estimates (Bernardi &

Lape 2005), a prior divergence range of 3.1 to 3.5 My was specified using a normal distribution in both pairs of geminates. 10 billion generations were run with tree sampling every million.

Alignment were produced by converting a phylip output file from STACKS maintaining all the individuals into nexus format in Mesquite (Maddison & Maddison

2018). This nexus file was then converted into a BEAST .xml input file using

21

mentioned parameters in BEAUti 2. Output from the Markov chain Monte Carlo and

ESS values were monitored using Tracer v.1.7 and a maximum clade credibility time tree with median node heights was selected using TreeAnnotator v.2.5 (Bouckaert et al. 2014). This tree was then visualized and modified in FigTree .1.4.2 (Bouckaert et al. 2014).

Subsequently, the Anisotremus RAD loci substitution rate was used to estimate the divergence between the disjunct populations of the other species. In order to do this, we first calculated the average number of differences between Pacific and Gulf population of all disjunct species. The genepop output file from STACKS was converted into Arlequin format using PGDSpider (Lischer & Excoffier 2012) and the corrected average pairwise difference between populations (which also accounts for differences within populations) was calculated in Arlequin v.3.5.1.2 (Excoffier &

Lischer 2010). The Anisotremus RAD loci rate, as well as previously published Cyt-b mutation rates for the same geminate pairs (Bernardi & Lape 2005), were then applied to back calculate time estimates of divergence on the other species lacking external sources of calibration.

Results and discussion

Markers and Single Nucleotide Polymorphisms

The number of orthologous loci passing all quality and population filters range from 4,232 to 9,340 per species (Figure 2). Single nucleotide polymorphisms

(SNPs) from these loci were used to estimate the time in which disjunct populations

22

became isolated. The alignment used to estimate calibrated Anisotremus substitution rates and the divergence in sargo populations had a length of 14,309 polymorphic characters. Two A. virginicus individuals were dropped from the analysis as they showed very low coverage in early de novo analyses.

Genomic Rate of Evolution

Misleading relationships can be observed between species if taxon sampling is low when analyzing a group of organisms (Lessios 2008; Hamilton et al. 2017). In our case, a phylogeny of the entire Anisotremus genus confirms that the species within our geminate clades are indeed sister taxa (Bernardi et al. 2008). Furthermore, given the cooling conditions already happening at the northern and southern latitudes during and after the closure of the Isthmus, it is unlikely that these species maintained gene flow by migrating around the continents (Nof & Van Gorder 2003; Hodell &

Warnke 1991; Lessios 2008). In this regard, our estimated rate of evolution is obtained with a degree of confidence. However, specifying times of divergence to geminate clades is not a trivial process that can affect estimated rates. We encountered a considerable range of dates for the closure of the Isthmus

(approximately 2.8 to 3.5 Mya) even when we considered only the traditionally accepted dates (O’Dea et al. 2016; Coates & Obando 1996; Collins 1996; Craig et al.

2004; Lessios 2008; Marko 2002; Thomson et al. 2000; Knowlton et al. 2003). As the principal goal of this study was to compare the produced RADseq loci substitution rate to previous estimates based on individuals markers (Bernardi & Lape 2005), our

23

BEAST analysis was performed indicating both geminate clades as priors with a possible divergence from 3.1 to 3.5 My. The Anisotremus average genomic substitution rate from was 0.0088 mutations per million years. In turn, the Bayesian phylogeny predicted a divergence of 60 thousand years (ky) for the sargo disjunct populations (ESS values were high and stable. 95% Highest Posterior Density, HPD,

0–220 ky) (Table 2, Figure 1).

Table 2. Divergence time estimates based on internally-calibrated Anisotremus RAD loci substitution rate (3.1-3.5 My geminate priors) and observed average distance between disjunct populations (% Diff.). A substitution rate of 8.8 µ (10−3) was estimated in this study and the median (Est. Div.) and range of divergence (Est. Range of Div.) were calculated for sargo. Bernardi and Lape (2005) estimated the range of divergence in Cyt-b for the sargo by applying a rate of 1.7-1.5 µ (10−8) to A. virginicus / A. taeniatus and a rate of 0.9-1.0 µ (10−8) to A. surinamensis / A. interruptus. These rates are applied to the observed number of differences in RAD loci to back calculate divergence in the other species. The lower end of divergence for sargo was zero (low end of HPD) thus preventing this calculation in the other species.

RAD loci rate Cyt- b rate

Loci % Diff. Est. Div. Est. Range Est. Range of Div. of Div.

Sargo 5915 19.54 60 ky up to 220 ky 330 - 641 ky

Longjaw 8468 92.56 284 ky up to 1 My 1.56 - 3.04 My mudsucker

Zebraperch 4232 7.58 23 ky up to 90 ky 130 - 250 ky

California 9340 n/a n/a n/a n/a sheephead

24 932 G. Bernardi et al. / Molecular Phylogenetics and Evolution 48 (2008) 929–935

100/100/100 COM 1 COM 2 Plectorhinchus chaeotodonoides (outgroup)

MMP 2 MMP 5 MMP 7 ZFI 4 A. dovii ZFI 1 TEP 932 G. Bernardi et al. /ZFI Molecular 5 Phylogenetics and Evolution 48 (2008) 929–935 100/100/100 EIM 1 ZFI 2 99/100/100 ZFI 3 100/100/100 PSM 4 Inshore, sandy, COM 1 PSM 1 (outgroup) COM 2 PlectorhinchusMMP 3 chaeotodonoides or MMP 4 PAR 1 A. pacifici TEP muddy bottoms 96/100/100 PAR 2 MMP 2 MMP 5 MMP 7 ZFI 4 A. dovii ZFI 1 TEP ZFI 5 100/100/100 EIM 1 ZFI 2 CAY 1 ZFI 3 G. luteus WA 99/100/100 PSM 4 Inshore, sandy, PSM 1 MMP 3 or MMP 4 BRA 1 A. pacifici TEP 96/100/100 100/100/100PAR 1 BRA 2 muddy bottoms PAR 2 BRA 5 BRA 3 BRA 4 A. moricandi WA Inshore reefs CAY 1 G. luteus WA BEL 1 sciurus 84/98/89 BRA 1 100/100/100100/100/100BRA 2BEL 1 BRA BEL5 2 932 G. Bernardi et al. / MolecularBRA BEL3 3 Phylogenetics and Evolution 48 (2008) 929–935 --/79 BRA 4 parra A. moricandi WA Inshore reefs PAR 2 Haemulon spp. G. Bernardi et al. / Molecular100/100/100 PhylogeneticsPAR 15 and Evolution 48 (2008) 929–935 932 76/98/94 PAR 18 PAR 17 scudderi 100/100/100 BEL 1 sciurus 84/98/89 COM 1 100/100/100 ZFI1BEL 1 Plectorhinchus85/100/100 BEL 2 chaeotodonoides (outgroup) COM 2 BELZFI 43 100/100/100 --/79 ZFI2 n.parra sp. COM 1 ZFI 3 PAR 2 Haemulon spp. COM 2 Plectorhinchus100/100/100 chaeotodonoidesPAR 15 (outgroup) 76/98/94 PAR 18 MMP 2 PAR 17 scudderi MMP 5 MMP 1 ZFI1 100/100/100 PTC 6 85/100/100MMP 7 ZFI 4 PNMMMP 1 2 ZFI 4 ZFI2 n. sp. A. dovii TEP PNMMMP 2 5 ZFI 1 ZFI 3 A. caesius TEP 100/100/100PNMMMP 3 7 ZFI 5 ZFI 4 EIM 1 A. dovii ZFI 1 ZFI 2 TEP MMPZFI 1 3 99/100/100 100/100/100100/100/100 ZFI PTC5 6 Inshore, sandy, EIM 1PNMMMPPSM 1 42 ZFI 2PNMMMP PSM2 16 A. caesius TEP ZFIPNM 3 MMPCPU 3 13 99/100/100 PSM 4 MMP 3 Inshore,or sandy, MMP 4 PSM 1 PAR 4 MMP 3 PANMMP 184 2 A. pacifici 100/100/99 PAR 1 PANMMP 146 6 A. taeniatus TEPTEP or muddy bottoms 96/100/100 PAR 2 100/100/100 MMPCPU 4 1 MMP 4 PAR 3MMP 3 PAR 1 MMP 5PAR 4 A. pacifici TEP 96/100/100 PAN 184 muddy bottoms 100/100/99PAR 2 CPU 2PAN 146 100/100/100 PAR 1MMP 4 A. taeniatus TEP PAR 2PAR 3 MMP 5 CAY 1 CPU 2 G. luteus WA 932 PAR 1 G. Bernardi et al. / Molecular Phylogenetics and Evolution 48 (2008) 929–935 PAN 7 PAR 2 CAY 1 BRA 3 G. luteus WA BRA 4 99/100/100 BRA 2PAN 7 Pacific BRA 1BRA 3 100/100/100 BRA 1 . 100/100/100 BRA 25BRA 4 COM 1 Plectorhinchus chaeotodonoides (outgroup) 99/100/100 PANBRA 23 BRA5 2 COM 2 BRA 1 BRA 1 A. virginicus WA 100/100/100 0.06PANBRA 21 3BRA 5 84/100/100BRA 2 BELBRA 1PAN 4 23 A. virginicus WA Inshore reefs a) BRA 5 PAN 21 A. moricandi WA BEL 3 93/93/78 84/100/100BRA 3 BEL 2BEL 1 MMP 2 BRA 4 BEL 3 A. davidsoniiMMP 5 Inshore reefs 93/93/78 BEL 1 BEL 2 A. moricandi MMP 7WA NEP BEL 1 ZFI 4 A. dovii TEP 100/100/100 ZFI 1 100/100/100 BEL 1 PER 1 100/100/100 ZFI 5 PER 1 EIM 1 . PER 2sciurus ZFI 2 84/98/89 BEL 1 PER 2 ZFI 3 100/100/100BEL 1 99/100/100 A.A. scapularis scapularisPSM 4 TEPTEP Inshore, sandy, ML / NJ /ML MP / NJ / MP BEL 2 sciurus PSM 1 84/98/89 MMP 1MMPBEL 1 1 BEL 3 MMP 3 or 9.88--/79 100/100/100MMP MMP2 2 Seaparra of Cortez RCP 1BELRCP 12 MMP 4 5 changes BLA 1BLABEL 1 3 PAR 2 A. pacifici TEP 5 changes --/79 100/100/100MMP 4 PARparra 15 96/100/100 HaemulonPAR 1 spp. muddy bottoms 76/98/94 MMP 4PTC 5 PAR 2 96/100/95PTC 5 ZFI 1 PAR 2 PAR 18 scudderi Coral or 96/100/95 100/100/100ZFI 1 PTC 3 PAR 15 PAR 17 Haemulon spp. Coral or 76/98/94 PTC 3 PNM 1PAR 18 PNM PNM1 2 scudderi PTC 1 PAR 17ZFI1 A. interruptus TEPCAY 1 rocky reefs PNM85/100/100 PTC2 4 ZFI 4 G. luteus WA . A. interruptus rocky reefs PTC 1 BLA 2ZFI1 ZFI2 n. sp. A. interruptus TEP PTC 4 MMP 3 TEP 85/100/100 PAR 6ZFI 4 ZFI 3 3.12BLA 2 ZFI2 n. sp. MMPZF 3 2 BRA 1 PAR 6 ZFI 3 100/100/100 BRA 2 98/100/100 BRA 5 ZF 2 BRA 3 932 G. Bernardi et al. / Molecular PhylogeneticsMMP 1 and Evolution 48 (2008) 929–935 BRA 4 A. moricandi Inshore reefs 98/100/100100/100/100 PTC 6 WA MMP 1 PNM 1 FDN 1 13.39100/100/100 932PTC 6 PNM 2 FLO 3 G. Bernardi et al.A. / Molecularcaesius PhylogeneticsTEP and Evolution 48 (2008) 929–935 PNM 1 VEN 1 PNM 3 PAN 1 A. caesiussurinamensisBEL 1 100/100/100 PNM 2 FDN 1FLO 2 TEPWA COM 1 74/100/89 . A. surinamensissciurus PNM 3 FLO FLO3 1 (outgroup)84/98/89 100/100/100 BEL 1 COM 2 Plectorhinchus chaeotodonoidesVEN 1 BEL 2 WA PAN 1MMP 2 100/100/100 A. surinamensisBEL 3 parra FLO 2 MMP 6 COM 1--/79 Plectorhinchus chaeotodonoidesWA (outgroup) 74/100/89 MMP 2 CPU 1 COM 2 PAR 2 Haemulon spp. MMPFLO 61 100/100/100 PAR 15 MMP 2 MMP 3 76/98/94 PAR 18 MMP 5 CPU 1 PAR 4 PAR 17 scudderi MMP 7 MMP 3 PAN 184 ZFI 4 PAR 4 ZFI1 100/100/99 PAN 146 A. dovii TEP MMP85/100/100 2 ZFI 1 PAN 184MMP 4 A. taeniatusMMP 5 ZFI 4 TEP 100/100/99 ZFI 5 100/100/100PAN 146 BLA1 ZFI2 n. sp. 100/100/100 EIM 1 PAR 3 A. taeniatus A. davidsoniiMMP 7 TEP ZFI 3 100/100/100 MMP 4 BKI 1 ZFI 4 TEP ZFI 2 MMP 5. A. dovii TEP ZFI 3 PAR 3 PCH 2 A. taeniatusZFI 1 99/100/100 CPU 2 BKI 2 (Gulf of California)Inshore, sandy, PSM 4 MMP 5 PAR 1 100/100/100 ZFI 5 TEP PSM 1 100/100/100CPU 2 PAR 2 MMPEIM 1 MMP 3 3.12 PEU1 100/100/100 PTCZFI 62 PAR 1 ZFI 3or PAR 2 BLA1CAT299/100/100 A. davidsoniiPNM 1 A. caesius TEPInshore, sandy, MMP 4 BKI 1 PNMPSM 2 4 TEP PAN 7 SDI 2 A. pacifici TEP PNMPSM 3 1 96/100/100 PAR 1 PCH 2 ISR1 A. davidsoniiMMP 3muddy bottoms PAR 2 BRA 3 ASU1 (Gulf of California) or PAN 7 BRA 4 BKI 2 TEP BRA 3 SDI 1 MMP 4 (Pacific coast)MMP 2 99/100/100100/100/100BRA 4 BRA 2 CAT1 MMP 6 A. pacifici TEP BRA 1 PSRPEU1 5 PAR 1 CPU 1 muddy bottoms 99/100/100 BRA 2 96/100/100 PAR 2 MMP 3 BRA 5 . A. virginicus BRACAY 1 1 CAT2 PAR 4 WA PAN 23 BRA 5 SDI 2 G. luteus WAA. virginicusPAN 184 WA PAN 23 PAN 21 100/100/99 PAN 146 A. taeniatus TEP Fig. 2. Molecular phylogeny of Anisotremus based on mitochondrialBEL 1 andISR1 nuclear molecular markersA. (cytochrome virginicus100/100/100A. davidsoniib andMMP the4 WA 1st intron of the ribosomal protein S7) using the 84/100/100PAN 21 ASU1 PAR 3 84/100/100 BEL 1 BEL 3 MMP 5 CAY 1 TEP Maximum Likelihood93/93/78 method (Maximum Parsimony andBEL 3 Neighbor-JoiningBEL 2 SDI 1 reconstructions resulted in the same topology).CPU 2 Labels are described in TableG. 1. Bootstrapluteus supportWA 2.0 BRA 1 BEL 1 (PacificPAR coast) 1 93/93/78 100/100/100 BEL 2 CAT1 PAR 2 BRA 2 BEL 1 Plectorhinchus is shown when above 50%, for the threeBRA methods 5 100/100/100 used, Maximum Likelihood,PSR 5 Neighbor-Joining, and Maximum Parsimony, in that order. Harlequin sweetlip, BRA 3 PER 1 chaetodonoides, individuals were used100/100/100 asBRA outgroups. 4 Drawings of fish speciesA. are moricandi from FAO (or our own for A.PAN moricandiInshore 7 ,reefsA. scapularis , and G. luteus) and are drawn to PER 1 WA BRA 1 BRA 3 PER 2 100/100/100 BRA 2 BRA 4 932approximate30 25 relative scale20 of adult15 size. 10G. Bernardi5 et al. /PER Molecular 20 Phylogenetics and99/100/100 EvolutionA. 48scapularisBRA (2008) 5 BRA 2 929–935 TEP Fig. 2. Molecular30 phylogeny 25 of20Anisotremus 15based 10 on mitochondrial 5 and 0 nuclear MYA molecular markersA. (cytochromescapularisBRAb 1and the 1st intron of the ribosomal protein S7) using the BRA 3 TEP ML / NJ / MP MMP 1 BRA 5 Inshore reefs ML / NJ / MP BEL 1 MMP 2 BRA 4PAN 23 A. moricandiA. virginicusWA WA Maximum Likelihood method (Maximum Parsimony andMMP Neighbor-Joining1 sciurus reconstructions resulted in the samePAN topology). 21 Labels are described in Table 1. Bootstrap support 84/98/89 MMP 2 RCP 1 84/100/100 BEL 1 5 changes 100/100/100 BEL 1 RCP 1 BLA 1 BEL 3 is shown when above 50%, for the three100/100/100 methodsBEL used, 2 MaximumMMP 4 Likelihood, Neighbor-Joining,93/93/78 and MaximumBEL Parsimony, 2 in that order. Harlequin sweetlip, Plectorhinchus 5 changes BEL 3 BLA 1 parra BEL 1 While Anisotremus--/79 was found not toMMPCOM be 4 1PTC monophyletic, 5 within and A. surinamensis (outgroup)withBEL 1 A. interruptus. (4) Anisotremus caesius 96/100/95PTCCOM 5 2ZFI 1 Plectorhinchus chaeotodonoides100/100/100 sciurus chaetodonoides, individuals were used as outgroups.PAR Drawings 2 of fish speciesHaemulon are from84/98/89 spp FAO. (or our own for A. moricandiPER, 1A. scapularis, andCoralG. or luteus ) and are drawn to 100/100/10096/100/95 PARZFI 151 PTC 3 100/100/100 BEL 1 Coral or our dataset, very well supported76/98/94 patternsPARPTC 18 3 scudderiPNM were 1 found: (1) The was sister to the A.BEL taeniatus 2 PER–A. 2 virginicus clade (strongA. scapularis support), TEP approximate relative scale of adult size. PARPNM 17 1 PNM 2 BEL 3 parra PNM 2 PTC 1 ML / NJ / MP --/79 MMP 1 rocky reefs two sympatric sandy/bottom species A.ZFI1 dovii and A. pacificiMMP 2 grouped and A. scapularisA. interruptusMMPwas 2 PAR sister 2 toTEP the A. surinamensis–A. interruptus PTC 1 PTC 4 A. interruptusRCP 1 Haemulonrocky reefs spp. b) 85/100/100 ZFI 4 MMP 5 100/100/100 PAR 15 TEP PTC 4 BLA 2 ZFI2 n. sp. MMP 57 changes 76/98/94 BLA 1 PAR 18 together in a very well supported clade.ZFIBLA 3 (2) 2 MMP High 3 bootstrap (both clade (weak support).MMP 4 (5)PAR 17 Asscudderi expected, the two Baja California dis- 40 MMP 3PAR 6 ZFI 4 PTC 5 A. dovii TEP PAR 6 ZFI 1 96/100/95 ZFI 1 Coral or markers and with all methods) supported theZF 2 groupingZFI 5 of seven junct populationsPTC of 3 A.ZFI1 davidsonii grouped together, confirming the ZF 2 100/100/100 EIM 1 85/100/100PNM 1 ZFI 4 Atlantic Ocean MMP 1 PNM 2 n. sp. While Anisotremus was100/100/100 found not98/100/100 to be monophyletic,ZFI 2 within and A. surinamensisPTC 1 ZFI2with A. interruptus. (4) AnisotremusA. interruptus caesiusTEP rocky reefs Anisotremus species. Within98/100/100 thisPTC 6 group. (3) The presumedZFI geminate3 results of BernardiPTC 4 andZFI 3 Lape (2005). There was also some weak 99/100/100PNM 1 BLA 2 Inshore, sandy, PNM 2 PSM 4 A. caesius TEP MMP 3 our dataset,species very grouped well together: supportedAnisotremusPNM 3 patterns virginicus were found:with A.PSM (1) 1 taeniatus The wassupport sister for to two thePAR more A.6 taeniatus patterns.–GenyatremusA. virginicus luteusclade, which (strong occurs support), MMP 3 ZF 2 FDN 1 MMP 1 or FDN 1 FLO 3 100/100/100 two sympatric sandy/bottom species A. dovii andFLOMMP 3 VENA.4 1 pacifici grouped and A.98/100/100 scapularisPTC 6 was sister to the A. surinamensis–A. interruptus MMP 2 VEN 1 PNM 1 MMP 6 PAN 1 PNM 2 A. pacifici TEP A. caesius TEP CPU 1 PANPAR 1 1 A. surinamensis WA muddy bottoms 96/100/100 PAR 2 FLO 2 A. surinamensisPNM 3 WA together in a very well supported clade.MMP 3 (2)74/100/89FLO High 2 FLO 1 bootstrap (both clade (weak support).FDN 1 (5) As expected, the two Baja California dis- 74/100/89PAR 4 FLO 1 FLO 3 PAN 184 VEN 1 100/100/99 PAN 146 A. taeniatus TEP MMPPAN 2 1 A. surinamensis WA

markers and30 with all methods) supportedMMP 4 the grouping of seven junct populationsFLO of 2 A. davidsonii grouped together, confirming the 100/100/100 74/100/89 MMP 6 PAR 3 CPUFLO 1 1 MMP 5 CAY 1 MMP 3 Anisotremus species. Within this group.CPU (3)2 The presumed geminate results of BernardiPARG. 4 andluteus Lape (2005)WA . There was also some weak PAR 1 PAN 184 PAR 2 100/100/99 PAN 146 A. taeniatus TEP species grouped together: Anisotremus virginicus with A. taeniatus support100/100/100 for twoMMP more 4 patterns. Genyatremus luteus, which occurs PAN 7 PAR 3 A. surinamensis BRA 3 BRA 1 MMP 5 BRA 4 100/100/100 BRA 2 CPU 2 A. davidsonii BLA1 BLA1 A. davidsoniiPAR 1 99/100/100 BRA 2 BKIBRA 1 5 A. davidsoniiPAR 2 BLA1 TEP A. davidsonii BRA 1 BKI 1 BRA 3 BKI 1TEP TEP BRA 5 PCH 2 Inshore reefs PAN 23 PCH 2 BRA 4 A. virginicus WA(Gulf A.of California)moricandiPCH 2 WA (Gulf of California) PAN 21 BKI 2 BKI 2 (Gulf of California)PAN 7 BKI 2 84/100/100 BEL 1 100/100/100BRA 3 BEL 3100/100/100100/100/100 BRA 4 PEU1 93/93/78 BEL 2 PEU1 PEU1 99/100/100 BRA 2 CAT2 20 BEL 1 CAT2 CAT2 BEL 1 BRA 1 SDI 2 BRA 5 ISR1 A. davidsonii 100/100/100 SDI 2 SDI 2 sciurus PAN 23 84/98/89 PER 1 ASU1 A. virginicus WA TEP 100/100/100ISR1 ISR1 BEL 1 A. davidsoniiA. PANdavidsonii 21 SDI 1 (Pacific coast) PER 2 ASU1 ASU1 BEL 2 A. scapularis 84/100/100TEP BEL 1 CAT1TEPTEP BEL 3 BEL 3 PSR 5 A. virginicus ML / NJ / MP --/79 SDI 1 SDI 1 93/93/78 parra BEL 2 MMP 1 CAT1 CAT1 (Pacific(PacificBEL coast) 1 coast) MMP 2 A. interruptus PAR 2 RCP 1 PSR 5100/100/100PSR 5 PAR 15 100/100/100Haemulon spp. 5 changes BLA 1 Fig. 2.76/98/94Molecular phylogeny of Anisotremus based on mitochondrial andPER 1 nuclear molecular markers (cytochrome b and the 1st intron of the ribosomal protein S7) using the MMP 4 PAR 18 scudderi PTC 5 Maximum Likelihood methodPAR 17 (Maximum Parsimony and Neighbor-JoiningPER 2 reconstructions resulted in the same topology). Labels are described in Table 1. Bootstrap support 96/100/95 ZFI 1 Coral or A. scapularis TEP PTC 3 ZFI1 Fig. 2.Fig.Molecular 2. Molecular phylogeny phylogeny of Anisotremus of AnisotremusbasedPNMbased on1 mitochondrial on mitochondrialis shown and when nuclear and85/100/100 aboveML nuclear / NJ molecular 50%,/ MP molecular for the markers three markers methods (cytochrome (cytochrome used,MMP Maximumb 1 andb theand Likelihood, 1st the intron 1st intron ofNeighbor-Joining, the of ribosomal the ribosomal proteinand proteinMaximum S7) usingS7) Parsimony, using the the in that order. Harlequin sweetlip, Plectorhinchus PNM 2 ZFI 4 MMP 2 PTC 1 chaetodonoides, individualsZFI2 A. were interruptusn. used sp. as outgroups.TEP DrawingsRCP 1 ofrocky fish speciesreefs are from FAO (or our own for A. moricandi, A. scapularis, and G. luteus) and are drawn to MaximumMaximum Likelihood Likelihood method method (Maximum (Maximum ParsimonyPTC Parsimony 4 and Neighbor-Joining and Neighbor-Joining reconstructions5 reconstructionschangesZFI 3 resulted resulted in the in same the sameBLA topology). 1 topology). Labels Labels are described are described in Table in Table 1. Bootstrap 1. Bootstrap support support 10 BLA 2 MMP 4 MMP 3 approximate relative scale of adult size. PTC 5 is shownis shown when when above above 50%, for 50%, the for three the three methods methodsPAR 6 used, used, Maximum Maximum Likelihood, Likelihood, Neighbor-Joining, Neighbor-Joining, and Maximum and96/100/95 MaximumZFI Parsimony, 1 Parsimony, in that in that order. order. Harlequin Harlequin sweetlip sweetlip, Plectorhinchus, Plectorhinchus Coral or ZF 2 PTC 3 MMP 1 PNM 1 chaetodonoideschaetodonoides, individuals, individuals were98/100/100 were used usedas outgroups. as outgroups. Drawings Drawings100/100/100 of fish ofPTC species fish 6 species are from are from FAO FAO (or our (or own our own for A.PNM for moricandi2 A. moricandi, A. scapularis, A. scapularis, and, andG. luteusG. luteus) and) and are aredrawn drawn to to PNM 1 PTC 1 A. interruptus TEP rocky reefs PTC 4 A. caesius TEP approximateapproximate relative relative scale ofscale adult of adult size. size. WhilePNMAnisotremus 2 was found not to beBLA monophyletic, 2 within and A. surinamensis with A. interruptus. (4) Anisotremus caesius FDN 1 PNM 3 MMP 3 FLO 3 PAR 6 VEN 1 our dataset, very well supportedA. taeniatus patternsZF 2 were found: (1) The was sister to the A. taeniatus–A. virginicus clade (strong support), PAN 1 A. surinamensis WA FLO 2 MMP 2 98/100/100 74/100/89 FLO 1 two sympatricMMP sandy/bottom 6 species A. dovii and A. pacifici grouped and A. scapularis was sister to the A. surinamensis–A. interruptus CPU 1 together in a veryMMP 3 well supported clade. (2) High bootstrap (both clade (weak support). (5) As expected, the two Baja California dis- WhileWhileAnisotremus0 Anisotremuswaswas found found not tonot be to monophyletic, be monophyletic, withinPAR within4 andandA. surinamensisA. surinamensisFDN 1withwithA. interruptusA. interruptus. (4). (4)AnisotremusAnisotremus caesius caesius markers and withPAN 184 all methods) supported theFLO 3 grouping of seven junct populations of A. davidsonii grouped together, confirming the 100/100/99 PAN 146 VEN 1A. taeniatus TEP our dataset, very well supported patterns were found:100/100/100 (1)MMP The 4 was sister to the A.PAN taeniatus1 –A. virginicus cladeA. surinamensis (strong support), our dataset, very well supported patternsAnisotremus were found:species.PAR (1) 3 The Within thiswas group. sister (3) to The theFLO 2 presumedA. taeniatus geminate–A. virginicusresultsclade of (strongBernardi support), andWA Lape (2005). There was also some weak MMP 5 74/100/89 FLO 1 two sympatric sandy/bottom species A. dovii and A.species pacifici groupedgroupedCPU 2 together:andAnisotremusA. scapularis virginicuswas sisterwith A. to taeniatus the A. surinamensissupport– forA. two interruptus more patterns. Genyatremus luteus, which occurs two sympatric sandy/bottom species A. doviiBLA1 and A. pacificiPARgrouped 1 A. davidsoniiand A. scapularis was sister to the A. surinamensis–A. interruptus N Pacific Ocean BKI 1 PAR 2 TEP together in a very well supported clade. (2)PCH High 2 bootstrap (both clade (weak support). (5) As expected, the two Baja California dis- together in a very well supported clade. (2)BKI 2 High bootstrap (both(Gulf of cladeCalifornia) (weak support). (5) As expected, the two Baja California dis- PAN 7 100/100/100 BRA 3 markers and with all methods) supported thePEU1 grouping ofBRA seven 4 junct populations of A. davidsonii grouped together, confirming the markers and with all0 methods)500 1000 supported1500 kmCAT2 the groupingBRA 2 of seven junct populations of A. davidsonii grouped together, confirming the 99/100/100SDI 2 BRA 1 BLA1 Anisotremus species. Within this group. (3) TheISR1 presumed geminateBRA 5 A.results davidsonii of Bernardi and Lape (2005). There was A. also davidsonii some weak TEP Anisotremus species. Within this group. (3)ASU1 The presumed geminate results of BernardiTEP andBKI 1 Lape (2005). There was also some weak PAN 23 A. virginicusPCH 2 WA -10 SDI 1 PAN 21 (Pacific coast) BKI 2 (Gulf of California) species grouped together: Anisotremus virginicusCAT1 with84/100/100A. taeniatusBEL 1 support for two more patterns. Genyatremus luteus, which occurs species grouped together: Anisotremus virginicusPSR 5 with BELA. 3 taeniatus support for100/100/100 two more patterns. Genyatremus luteus, which occurs 93/93/78 BEL 2 PEU1 BEL 1 CAT2 Fig. 2. Molecular phylogeny of Anisotremus based on mitochondrial and nuclear100/100/100 molecular markers (cytochrome b and the 1st intron of the ribosomalSDI 2 protein S7) using the -140 -120 PER-100 1 ISR1-80 A. davidsonii-60 ASU1 TEP Maximum Likelihood method (Maximum Parsimony and Neighbor-Joining reconstructions resulted in thePER same 2 topology). Labels are describedA. scapularis inSDITable 1 1. BootstrapTEP support (Pacific coast) CAT1 is shown when above 50%, for the three methodsML / NJ / used, MP Maximum Likelihood, Neighbor-Joining,MMP 1 and Maximum Parsimony, in that order. HarlequinPSR 5 sweetlip, Plectorhinchus MMP 2 chaetodonoides, individuals were used as outgroups. Drawings of fish species are fromRCP FAO 1 (or our own for A. moricandi, A. scapularis, and G. luteus) and are drawn to Figure 1. a) Time5 changes -calibrated phylogenyBLA 1 of studied Anisotremus species using both approximate relative scale of adult size. Fig. 2. MolecularMMP 4 phylogeny of Anisotremus based on mitochondrial and nuclear molecular markers (cytochrome b and the 1st intron of the ribosomal protein S7) using the PTC 5 96/100/95 ZFI 1 Coral or Maximum LikelihoodPTC 3 method (Maximum Parsimony and Neighbor-Joining reconstructions resulted in the same topology). Labels are described in Table 1. Bootstrap support geminate clades as priors with prePNM -1 set divergence time from 3.1 to 3.5 My. Numbers is shown whenPNM above 2 50%, for the three methods used, Maximum Likelihood, Neighbor-Joining, and Maximum Parsimony, in that order. Harlequin sweetlip, Plectorhinchus PTC 1 A. interruptus TEP rocky reefs chaetodonoidesPTC, 4 individuals were used as outgroups. Drawings of fish species are from FAO (or our own for A. moricandi, A. scapularis, and G. luteus) and are drawn to While Anisotremus was found not to be monophyletic, within BLAand 2 A. surinamensis with A. interruptus. (4) Anisotremus caesius specify the estimated date approximateof node relativeMMP 3divergence scale of adult size. (in million years). Blue bars represent our dataset, very well supported patterns were found: (1) The PARwas 6 sister to the A. taeniatus–A. virginicus clade (strong support), ZF 2 two sympatric sandy/bottomthe 95% species HighestA. dovii andPosteriorA. pacifici98/100/100grouped Densities.and A. scapularisNEP, Northeasternwas sister to the A. surinamensisPacific;–A. TEP, interruptus tropical together in a very well supported clade. (2) High bootstrap (both clade (weak support). (5) As expected, the two Baja California dis- While AnisotremusFDN 1 was found not to be monophyletic, within and A. surinamensis with A. interruptus. (4) Anisotremus caesius markers and withEastern all methods) Pacific; supported WA, the grouping Western of seven Atlantic.FLOjunct 3 populations b) Species of A. davidsonii distributions.grouped together, confirming the VEN 1 A. taeniatus A. virginicus our dataset,PAN very 1 well supported patterns wereA. surinamensis found: (1) The was sister to the – clade (strong support), Anisotremus species. Within this group. (3) The presumed geminate FLOresults 2 of Bernardi and Lape (2005). There was also someWA weak 74/100/89 species grouped together: Anisotremus virginicus with A. taeniatustwo sympatricFLOsupport 1 sandy/bottom for two speciesmore patterns.A. dovii andGenyatremusA. pacifici grouped luteus, whichand occursA. scapularis was sister to the A. surinamensis–A. interruptus together in a very well supported clade. (2) High bootstrap (both clade (weak support). (5) As expected, the two Baja California dis- markers and with all methods) supported the grouping of seven junct populations of A. davidsonii grouped together, confirming the Anisotremus species. Within this group. (3) The presumed geminate results of Bernardi and Lape (2005). There was also some weak species grouped together:BLA1 Anisotremus virginicus with A. davidsoniiA. taeniatus support for two more patterns. Genyatremus luteus, which occurs BKI 1 TEP PCH 2 BKI 2 (Gulf of California) 100/100/100 PEU1 CAT2 25 SDI 2 ISR1 A. davidsonii ASU1 TEP SDI 1 CAT1 (Pacific coast) PSR 5

Fig. 2. Molecular phylogeny of Anisotremus based on mitochondrial and nuclear molecular markers (cytochrome b and the 1st intron of the ribosomal protein S7) using the Maximum Likelihood method (Maximum Parsimony and Neighbor-Joining reconstructions resulted in the same topology). Labels are described in Table 1. Bootstrap support is shown when above 50%, for the three methods used, Maximum Likelihood, Neighbor-Joining, and Maximum Parsimony, in that order. Harlequin sweetlip, Plectorhinchus chaetodonoides, individuals were used as outgroups. Drawings of fish species are from FAO (or our own for A. moricandi, A. scapularis, and G. luteus) and are drawn to approximate relative scale of adult size.

While Anisotremus was found not to be monophyletic, within and A. surinamensis with A. interruptus. (4) Anisotremus caesius our dataset, very well supported patterns were found: (1) The was sister to the A. taeniatus–A. virginicus clade (strong support), two sympatric sandy/bottom species A. dovii and A. pacifici grouped and A. scapularis was sister to the A. surinamensis–A. interruptus together in a very well supported clade. (2) High bootstrap (both clade (weak support). (5) As expected, the two Baja California dis- markers and with all methods) supported the grouping of seven junct populations of A. davidsonii grouped together, confirming the Anisotremus species. Within this group. (3) The presumed geminate results of Bernardi and Lape (2005). There was also some weak species grouped together: Anisotremus virginicus with A. taeniatus support for two more patterns. Genyatremus luteus, which occurs

Time and Mode of Divergence in Disjunct Populations

Considering the dynamic evolutionary histories of these species, our genomic substitution rate produced dates of divergence somewhat comparable with previous estimates (Table 2). Overall, our time estimates are younger as a result of the surprisingly faster average rate of evolution observed in produced RAD loci than in mitochondrial Cyt-b (Bernardi & Lape 2005). Our estimates also have smaller ranges of variation.

As in other studies (Poortvliet et al. 2013; Bernardi et al. 2003), evidence of isolation was not found between disjunct populations of sheephead. The variation seen between populations within the Gulf and Pacific was higher than between the two regions. This produced a negative corrected distance between sheephead disjunct populations which prevented further calculations and rose questions about its classification as a disjunct species. In contrast, this study revealed previously unseen divergence between disjunct populations of zebraperch. As expected, zebraperch divergence was young and shallow using any of the three substitution rates (Table 2).

This divergence time was considerable far from the one million mark leaving the post-glacial divergence hypothesis has the only plausible explanation for the observed divergence.

Results for the longjaw mudsucker are also very interesting as well. To recall, divergence was previously estimated to range from 0.76 to 2.3 My (Huang &

Bernardi 2001). Authors attributed this divergence to a possible vicariant event via the closure of the mid-peninsular seaway but given the large variation in these

26

estimates, a post-glacial separation was not ruled out but considered less parsimonious. When applying the mutation rates from Bernardi and Lape (2005) to the average distance in RAD loci between disjunct populations, produced estimates ranged from approximately 1.5 to 3 My. While, this range opens the door to a possible ancient divergence, the RAD loci estimates are more in tune with either an early dispersal and isolation event or vicariance produced by the mid-peninsular seaway hypothesis. The probability of the latter appears higher as that it occurs in the middle of combined time estimates.

In the case of the sargo, previous estimates of divergence ranged from 330 to

641 ky (or from 160 to 641 ky if coalescence analyses are included). This was considered to be the result of dispersal around the peninsula or the closure of an additional seaway that has not been discovered yet (Bernardi & Lape 2005). Our analysis produced considerably younger estimates (up to 220 ky, Table 2) cementing the idea of sargo populations migrating north and later becoming isolated by the warming of the sea water temperature in the southern tip of the peninsula. Although it is outside the focus of this study, it is worth noting that the produced phylogeny suggests that the separation of sargo and the geminate clade of A. surinamensis and A. interruptus occurred before the closure of the Isthmus.

Potential Sources of Discordance

In order to allow direct comparisons with previous estimates, this study assumed a divergence time of 3.1 to 3.8 My for both Anisotremus geminate clades.

27

However, previous studies have highlighted the possibility of at least one of these trans-Isthmian pairs to have diverged at a different time than at the closure of the

Isthmus, as sequence divergence in the clade conformed by A. taeniatus and A. virginicus, was found to be consistently higher than that between A. surinamensis and

A. interruptus in multiple markers (Bernardi & Lape 2005; Lessios 2008; Bernardi et al. 2008; Bermingham et al. 1997). This can be the result of either different rates of evolution or different divergence time in the two clades, or a combination of both.

Evidence of differential substitution rates (such as very different branch lengths) in the two clades is not present in previous phylogenies (Bernardi & Lape 2005;

Bernardi et al. 2008). Alternately, the different dates of divergence might be a result of differences in ecology such as the usage of mangrove and reef habitat. For example, Anisotremus taeniatus and A. virginicus could have separated before the closure of the Isthmus if they were more dependent on reef habitat that might have disappeared in the Isthmus area as this became shallower (Thomson et al. 2000;

Lessios 2008; Froese & Pauly n.d.; Knowlton et al. 2003). Finally, a vicariance date of 2 my has also been considered for A. surinamensis and A. interruptus, based on a proposed Pleistocene breach of the Isthmus (Bernardi & Lape 2005). When using these two pairs of geminates and traditional dates for the closure of the Isthmus of

Panama, indicating older dates of divergence for A. taeniatus / A. virginicus and younger for A. surinamensis / A. interruptus, might help finding the upper and lower limits of possible divergence times in disjunct populations.

28

An additionally potential issue when estimating divergence dates using

RADseq is that produced loci, and associated rates of substitution, are likely not the same in different species. However, while different rates of change among loci are unequivocal, a consensus rate might be reachable purely giving the large numbers of loci produced by this technology. If other analyses of geminate lineages produce similar RAD loci substitution rates, estimates like ours might be widely applicable to date divergence, and other evolutionary events, in systems without external sources of temporal calibration. Time calibration analyses include many variables, which require iterative processing and adjusting, our analysis was a comparison of estimates given fixed variables. Futures analyses will include diverse dates for geminates species, different models of evolution, and estimating dates of divergence using only suspected neutral RAD loci.

Conclusion

Unlike the sheephead, which along with its congenerics show anti-tropical distributions, ancestral lineages of the sargo, zebraperch, and mudsucker, were likely present in tropical waters in the Pacific and Atlantic oceans. Genetic evidence from this is other studies, as well as the current distributions of related species in the same genera and families, are concordant with a phylogeographic scenario in which these species migrated northwards from lower latitudes (Bernardi et al. 2003; Bernardi &

Lape 2005; Bernardi et al. 2008; Tavera et al. 2012; Thomson et al. 2000; Froese &

Pauly n.d.; Helfman et al. 2009; Huang & Bernardi 2001). These species might have

29

been peripheral populations at the edge as their physiological and adaptive capabilities that accumulated genetic differences from divergent selective pressures until eventually speciated as a result of the invasion to new environments (Rocha et al. 2005; Coyne & Orr 2004). Disjunct pupations in these species were likely isolated by either the closure of a mid-peninsular seaway approximately 1 Mya, or by the retreat of the ocean into higher latitudes by the end of the last glaciation, in combination with an earlier migration of these species into the Pacific coast of Baja

California. While this scenario is probably true for other disjunct species as well, it is definitively not universal within Baja California disjuncts. For instance, the Mexican rockfish, Sebastes macdonaldi, and the pink surfperch, Zalembius rosaceus, evolved from two spectacular Northeastern Pacific species radiations. The Baja California

Disjunct species continues to offer a remarkable framework where to test hypothesis of species biogeography and study the early mechanisms of speciation.

References

30

Avise, J.C., 2000. Phylogeography: The History and Formation of Species. Harvard

University Press, Cambridge, MA.

Baird, N.A., Etter, P.D., Atwood, T.S., Currey, M.C., Shiver, A.L., Lewis, Z.A.,

Selker, E.U., Cresko, W.A., Johnson, E.A., 2008. Rapid SNP discovery and

genetic mapping using sequenced RAD markers. PLoS One 3, 1–7.

https://doi.org/10.1371/journal.pone.0003376

Bermingham, E., McCafferty, S.S., Martin, A.P., 1997. Fish biogeography and

molecular clocks: perspectives from the Panamian Isthmus, in: T.D., K., Stepien,

C.A. (Eds.), Molecular Systematics of Fishes. Academic Press Inc., San Diego,

CA, pp. 113–128.

Bernardi, G., 2014. Baja California disjunctions and phylogeographic patterns in

sympatric California blennies. Front. Ecol. Evol. 2, 1–9.

https://doi.org/10.3389/fevo.2014.00053

Bernardi, G., 2013. Speciation in fishes. Mol. Ecol. 22, 5487–5502.

https://doi.org/10.1111/mec.12494

Bernardi, G., Alva-Campbell, Y.R., Gasparini, J.L., Floeter, S.R., 2008. Molecular

ecology, speciation, and evolution of the reef fish genus Anisotremus. Mol.

Phylogenet. Evol. 48, 929–35. https://doi.org/10.1016/j.ympev.2008.05.011

Bernardi, G., Findley, L., Rocha-Olivares, A., 2003. Vicariance and dispersal across

Baja California in disjunct marine fish populations. Evolution 57, 1599–1609.

https://doi.org/10.1554/02-669

31

Bernardi, G., Lape, J., 2005. Tempo and mode of speciation in the Baja California

disjunct fish species Anisotremus davidsonii. Mol. Ecol. 14, 4085–4096.

https://doi.org/10.1111/j.1365-294X.2005.02729.x

Bouckaert, R., Heled, J., Kühnert, D., Vaughan, T., Wu, C.H., Xie, D., Suchard,

M.A., Rambaut, A., Drummond, A.J., 2014. BEAST 2: A Software Platform for

Bayesian Evolutionary Analysis. PLoS Comput. Biol. 10, 1–6.

https://doi.org/10.1371/journal.pcbi.1003537

Bowen, B.W., Rocha, L.A., Toonen, R.J., Karl, S.A., 2013. The origins of tropical

marine biodiversity. Trends Ecol. Evol. 28, 317–376.

Brusca, R.C., 1973. A handbook to the common intertidal invertebrates of the Gulf of

California. University of Arizona Press, Tucson, AZ.

Campbell, M.A., Robertson, D.R., Vargas, M.I., Allen, G.R., McMillan, W.O., 2018.

Multilocus molecular systematics of the circumtropical reef-fish genus

Abudefduf (Pomacentridae): history, geography and ecology of speciation. PeerJ

6, e5357. https://doi.org/10.7717/peerj.5357

Catchen, J., Hohenlohe, P.A., Bassham, S., Amores, A., Cresko, W.A., 2013. Stacks:

An analysis tool set for population genomics. Mol. Ecol. 22, 3124–3140.

https://doi.org/10.1111/mec.12354

Catchen, J.M., Amores, A., Hohenlohe, P., Cresko, W., Postlethwait, J.H., 2011.

Stacks : Building and Genotyping Loci De Novo From Short-Read Sequences.

G3: Genes|Genomes|Genetics 1, 171–182.

32

https://doi.org/10.1534/g3.111.000240

Coates, A., Obando, J., 1996. The geologic evolution of the Central American

isthmus, in: Jackson, J., Budd, A., Coates, A. (Eds.), Evolution and Environment

in Tropical America. University of Chicago Press, Chicago, pp. 21–56.

Collins, T., 1996. Molecular comparisons of transisthmian species pairs: rates and

patterns of evolution., in: Jackson, J.B.C., Budd, A.F., Coates, A. (Eds.),

Evolution and Environment in Tropical America. University of Chicago Press,

Chicago, pp. 303–334.

Coyne, J.A., Orr, H.A., 2004. Speciation. Sinauer Associates, Sunderland,

Massachusetts.

Craig, M.T., Hastings, P.A., Pondella, D.J., 2004. Speciation in the Central American

Seaway: The importance of taxon sampling in the identification of trans-

isthmian geminate pairs. J. Biogeogr. 31, 1085–1091.

https://doi.org/10.1111/j.1365-2699.2004.01035.x

Endler, J.A., 1977. Geographic Variation, Speciation and Clines. Princeton University

Press.

Excoffier, L., Lischer, H.E.L., 2010. Arlequin suite ver 3.5: A new series of programs

to perform population genetics analyses under Linux and Windows. Mol. Ecol.

Resour. 10, 564–567. https://doi.org/10.1111/j.1755-0998.2010.02847.x

Froese, R., Pauly, D., n.d. Fishbase [WWW Document]. URL

33

http://www.fishbase.org (accessed 5.5.16).

Grismer, L.L., 2000. Evolutionary biogeography on Mexico’s Baja California

peninsula: A synthesis of molecules and historical geology. Proc. Natl. Acad.

Sci. 97, 14017–14018. https://doi.org/10.1073/pnas.260509697

Hamilton, H., Saarman, N., Short, G., Sellas, A.B., Moore, B., Hoang, T., Grace,

C.L., Gomon, M., Crow, K., Brian Simison, W., 2017. Molecular phylogeny and

patterns of diversification in syngnathid fishes. Mol. Phylogenet. Evol. 107,

388–403. https://doi.org/10.1016/j.ympev.2016.10.003

Helfman, G.S., Collette, B.B., Facey, D.E., Bowen, B.W., 2009. The diversity of

fishes, Second. ed. Wiley-Blackwell.

Hodell, D.A., Warnke, D.A., 1991. Climatic evolution of the Southern Ocean during

the Pliocene epoch from 4.8 to 2.6 million years ago. Quat. Sci. Rev. 10, 205–

214. https://doi.org/10.1016/0277-3791(91)90019-Q

Huang, D., Bernardi, G., 2001. Disjunct Sea of Cortez-Pacific Ocean Gillichthys

mirabilis populations and the evolutionary origin of their Sea of Cortez endemic

relative , Gillichthys seta. Mar. Biol. 138, 421–428.

Jordan, D.S., 1908. The law of geminate species. Am. Nat. 42, 73–80.

Knowlton, N., , L. a, Solorzano, L.A., Mills, D.K., Bermingham, E., 1993.

Divergence in , Mitochondria! DNA, and Reproductive Compatibility

Across the Isthmus of Panama. Science (80-. ). 260, 1629–1632.

34

Lessios, H.A., 2008. The Great American Schism: Divergence of Marine Organisms

After the Rise of the Central American Isthmus. Annu. Rev. Ecol. Evol. Syst. 39,

63–91. https://doi.org/10.1146/annurev.ecolsys.38.091206.095815

Lischer, H.E.L., Excoffier, L., 2012. PGDSpider: An automated data conversion tool

for connecting population genetics and genomics programs. Bioinformatics 28,

298–299. https://doi.org/10.1093/bioinformatics/btr642

Maddison, W.P., Maddison, D.R., 2018. Mesquite: a modular system for evolutionary

analysis.

Maldonado, J.E., Davila, F.O., Stewart, B.S., Geffen, E., 1995. Intraspecific genetic

differentiation in California sea lions (Zalophus californianus) from Southern

California and the Gulf of California. Mar. mammal Sci. 11, 46–58.

Marko, P.B., 2002. Fossil calibration of molecular clocks and the divergence times of

geminate species pairs separated by the Isthmus of Panama. Mol. Biol. Evol. 19,

2005–2021. https://doi.org/10.1093/oxfordjournals.molbev.a004024

Mayr, E., 1942. Systematics and the Origin of Species from the View- point of a

Zoologist. Columbbia University Press, New York, New York.

Medina, M., Walsh, P.J., 2000. Comparison of Four Mendelian Loci of the California

Sea Hare ( Aplysia californica ) from Populations of the Coast of California and

the Sea of Cortez. Mar. Biotechnol. 2, 449–455.

https://doi.org/10.1007/s101260000020

35

Miller, D.J., Lea, R.N., 1972. Guide to the coastal marine fishes of California.

Berkeley, CA.

Miller, M., Dunham, J., Amores, a, Cresko, W., Johnson, E., 2007. genotyping using

restriction site associated DNA (RAD) markers. Genome Res. 17, 240–248.

https://doi.org/10.1101/gr.5681207

Miller, M.R., Brunelli, J.P., Wheeler, P.A., Liu, S., Rexroad, C.E., Palti, Y., Doe,

C.Q., Thorgaard, G.H., 2012. A conserved haplotype controls parallel adaptation

in geographically distant salmonid populations. Mol. Ecol. 21, 237–249.

https://doi.org/10.1111/j.1365-294X.2011.05305.x

Nof, D., Van Gorder, S., 2003. Did an open Panama Isthmus correspond to an

invasion of Pacific water into the Atlantic? J. Phys. Ocean 33, 1324–1336.

https://doi.org/10.1175/1520-0485(2003)033<1324:DAOPIC>2.0.CO;2

O’Dea, A., Lessios, H.A., Coates, A.G., Eytan, R.I., Restrepo-Moreno, S.A., Cione,

A.L., Collins, L.S., De Queiroz, A., Farris, D.W., Norris, R.D., Stallard, R.F.,

Woodburne, M.O., Aguilera, O., Aubry, M.P., Berggren, W.A., Budd, A.F.,

Cozzuol, M.A., Coppard, S.E., Duque-Caro, H., Finnegan, S., Gasparini, G.M.,

Grossman, E.L., Johnson, K.G., Keigwin, L.D., Knowlton, N., Leigh, E.G.,

Leonard-Pingel, J.S., Marko, P.B., Pyenson, N.D., Rachello-Dolmen, P.G.,

Soibelzon, E., Soibelzon, L., Todd, J.A., Vermeij, G.J., Jackson, J.B.C., 2016.

Formation of the Isthmus of Panama. Sci. Adv. 2, 1–12.

https://doi.org/10.1126/sciadv.1600883

36

Palumbi, S.R., 1992. Marine speciation on a small planet. Trends Ecol. Evol. 7, 114–

118.

Poortvliet, M., Longo, G.C., Selkoe, K., Barber, P.H., White, C., Caselle, J.E., Perez-

Matus, A., Gaines, S.D., Bernardi, G., 2013. Phylogeography of the California

sheephead, Semicossyphus pulcher: The role of deep reefs as stepping stones

and pathways to antitropicality. Ecol. Evol. 3, 4558–4571.

https://doi.org/10.1002/ece3.840

Present, T.M.C., 1987. Genetic Differentiation of Disjunct Gulf of California and

Pacific Outer Coast Populations of Hypsoblennius jenkinsi Genetic

Differentiation of Disjunct Gulf of California and Pacific Outer Coast

Populations of Hypsoblennius jenkinsi. Copeia 1987, 1010–1024.

Riddle, B.R., Hafner, D.J., Alexander, L.F., 2000. Comparative phylogeography of

Baileys’ pocket mouse (Chaetodipus baileyi) and the Peromyscus eremicus

species group: historical vicariance of the Baja California Peninsular Desert.

Mol. Phylogenet. Evol. 17, 161–72. https://doi.org/10.1006/mpev.2000.0842

Riddle, B.R., Hafner, D.J., Alexander, L.F., 2000. Phylogeography and systematics of

the Peromyscus eremicus species group and the historical biogeography of North

American warm regional deserts. Mol. Phylogenet. Evol. 17, 145–160.

https://doi.org/10.1006/mpev.2000.0841

Riddle, B.R., Hafner, D.J., Alexander, L.F., Jaeger, J.R., 2000. Cryptic vicariance in

the historical assembly of a Baja California Peninsular Desert biota. Proc. Natl.

37

Acad. Sci. 97, 14438–14443. https://doi.org/10.1073/pnas.250413397

Rocha, L.A., Bass, A.L., Robertson, D.R., Bowen, B.W., 2002. Adult habitat

preferences, larval dispersal, and the comparative phylogeography of three

Atlantic surgeonfishes (Teleostei: Acanthuridae). Mol. Ecol. 11, 243–252.

https://doi.org/10.1046/j.0962-1083.2001.01431.x

Rocha, L.A., Robertson, D.R., Roman, J., Bowen, B.W., 2005. Ecological speciation

in tropical reef fishes. Proc. R. Soc. B 272, 573–579.

https://doi.org/10.1098/2004.3005

Stepien, C.A., Rosenblatt, R.H., Bargmeyer, B.A., 2001. Phylogeography of the

Spotted Sand Bass , Paralabrax maculatofasciatus : Divergence of Gulf of

California and Pacific Coast Populations. Evolution (N. Y). 55, 1852–1862.

Tariel, J., Longo, G.C., Bernardi, G., 2016. Tempo and mode of speciation in

Holacanthus angelfishes based on RADseq markers. Mol. Phylogenet. Evol. 98,

84–88. https://doi.org/10.1016/j.ympev.2016.01.010

Tavera, J.J., Acero, A.P., Balart, E.F., Bernardi, G., 2012. Molecular phylogeny of

grunts (Teleostei, Haemulidae), with an emphasis on the ecology, evolution, and

speciation history of new world species. BMC Evol. Biol. 12, 57.

https://doi.org/10.1186/1471-2148-12-57

Terry, A., Bucciarelli, G., Bernardi, G., 2000. Restricted Gene Flow and Incipient

Speciation in Disjunct Pacific Ocean and Sea of Cortez Populations of a Reef

Fish Species , Girella nigricans. Evolution (N. Y). 54, 652–659.

38

Thacker, C.E., 2017. Patterns of divergence in fish species separated by the Isthmus

of Panama. BMC Evol. Biol. 17, 1–14. https://doi.org/10.1186/s12862-017-

0957-4

Thomson, D.A., Findley, L.T., Kersitch, A.N., 2000. Reef fishes of the Sea of Cortez:

the rocky shore fishes of the Gulf of California. The University of Texas Press,

Austin, TX.

Upton, D.E., Murphy, R.W., 1997. Phylogeny of the Side-Blotched Lizards

(Phrynosomatidae:Uta) Based on mtDNA Sequences: Support for a

Midpeninsular Seaway in Baja California. Mol. Phylogenet. Evol. 8, 104–113.

https://doi.org/10.1006/mpev.1996.0392

Supplementary Information:

Supplementary Table 1. List of the 19 Baja California disjunct fish species

39

(Bernardi et al. 2003).

Family Species Common name Atherinidae Leuresthes tenuis/ L. grunion sardina Girellidae Girella nigricans / G. opaleye simplicidens Haemulidae Anisotremus davidsonii sargo

Blenniidae Hypsoblennius jenkinsi mussel blenny

Blenniidae Hypsoblennius gentilis bay blenny

Chaenopsidae Chaenopsis alepidota oragethroat pikeblenny Serranidae Paralabrax spotted sand bass maculatofasciatus Gobiidae Gillichthys mirabilis longjaw mudsucker

Gobiidae Lythrytpnus dalli blue banded boby

Kyphosidae Kyphosus azureus zebraperch Labridae Halichoeres semicinctus rock wrasse

Labridae Semicossyphus pulcher California sheephead

Scorpaenidae Sebastes macdonaldi Mexican rockfish Embiotocidae Zalembius rosaceus pink surfperch

Scorpaenidae Scorpaena guttata scorpionfish Polyprionidae Stereolepis gigas giant seabass Agonidae Xenetremus ritteri flagfin poacher Pleuronectidae Hypsopsetta guttulata diamond turbot Pleuronectidae Pleuronichthys verticalis hornyhead turbot

40

40

USA San Francisco 35

San Diego 30

25 MEXICO

N Pacific Ocean Cabo San Lucas

0 200 400 600 km 20

-130 -125 -120 -115 -110 -105 Supplementary Figure 1. Typical distribution of Baja California disjucnt fishes.

Supplementary Table 2. Sampling sites for the Anisotremus geminate species. A. interruptus, AIN; A. taeniatus, ATA; A. surinamensis, ASU; A. virginicus, AVI.

Site AIN ATA ASU AVI

Tropical Eastern Pacific

Playa Troncoles, Costa Rica 3 Puntarenas, Costa Rica 3

Western Atlantic

Florida, USA 3 Guarapari Islands, Brazil 3

41

Supplementary Table 3. Pacific and Sea of Cortez (Gulf) sampling sites (ordered by latitude) and number of samples per species. SPU, Semicossyphus pulcher; KAZ, Kyphosus azureus; GMI, Gillichthys mirabilis; ADA, Anisotremus davidsonii.

Region Site SPU KAZ GMI ADA

Pacific 16 10 34 15 Monterey Bay (MB) 7 Goleta Slough (GOS) 10 Carpinteria Slough(CAS) 7 Catalina Island (CI) 5 2 San Diego (SD) 10 7

Guadalupe Island (GUA) 4 Guerrero Negro (GNE) 4 Punta Eugenia (PEU) 1 Bahia Tortugas (TOR) 4 Punta San Roque (PSR) 5 Estero El Coyote (ECO) 6 Puerto San Carlos (PSC) 3

Gulf 10 12 21 22 Estero La Choya (ELC) 10 Punta Choya (PC) 3 5 Estero Morua (EMO) 5 Estero La Pinta (ELP) 6

Bahia Kino (BK) 12 Bahia de los Angeles 5 12 5 (BLA) 2 Los Frailes (LFR)

42

CHAPTER 2

Patterns of Genomic Divergence and Signals of Selection in Sympatric and

Allopatric Northeastern Pacific and Sea of Cortez populations of the Sargo

(Anisotremus davidsonii) and Longjaw Mudsucker (Gillichthys mirabilis).

Abstract

The Pacific and Sea of Cortez allopatric populations of the sargo, Anisotremus davidsonii, (Haemulidae), and the longjaw mudsucker, Gillichthys mirabilis,

(Gobiidae), are separated by the Baja California peninsula, and sympatric populations along the Northeastern Pacific cross established phylogeographic breaks that impact gene flow. Thus, these taxa offer an excellent multi-species framework where to study the mechanisms of divergence and signatures of selection under a gradient of isolation. Here, thousands of loci are genotyped from 48 sargos and 73 mudsuckers using RADseq, to characterize population connectivity, and investigate if sympatric and allopatric populations show unique genomic signatures of divergence, drift, or selection. Statistically significant divergence was seen across Point Conception in

mudsucker (FST=0.15), Punta Eugenia in sargo (FST=0.02), and across the peninsula in

both species (FST=0.11, and 0.23, in sargo and mudsucker, respectively). Structure across the peninsula in presumed neutral loci was highest in the mudsucker but both species showed evidence of strong selection. Larger number of presumed loci under selection were seen between allopatric populations (586-721 vs 13-224 between

sympatric) and these were more differentiated (FST=0.36-1 vs 0.19-0.94 between

43

sympatric). Furthermore, the majority of outliers between sympatric populations accumulated at the lower end of the range of differentiation in contrast to a more even distribution between allopatric populations. This might be an indication of a prevalence of soft selective sweeps when sympatric populations are adapting to a mosaic of adaptive peaks and a more even contribution of soft and hard sweeps during adaptation to very distinct peaks in allopatry.

Keywords: adaptive evolution, Baja California, incipient speciation, differential selection, ecological divergence, selective sweeps, sympatric populations.

44

Introduction

Under the biological species concept, speciation is the process by which two or more populations accumulate sufficient genetic differences to reach reproductive isolation (Mayr 1942). In allopatry, divergence can be initiated by the appearance of a barrier, physically separating populations (vicariance), or by occasional dispersal events where individuals are able to cross an existing barrier which normally impedes open migration between sites (dispersal). In the absence of gene flow populations may diverge randomly through genetic drift acting on neutral loci or selectively by local adaptation changing the allele frequencies of the coding portions of the genome.

Alternatively, populations might also differentiate in sympatry (without a physical barrier), or close geographic proximity (Rocha et al. 2005), if they experience distinct selective pressures from different environments and local adaptation overwhelms the effects on gene flow (Rundle & Nosil 2005; Bernardi

2013; Coyne & Orr 2004). In contrast, high levels of gene flow may prevent the accumulation of advantageous alleles in populations when gene flow is higher than selective pressure (Coyne & Orr 2004; Sexton et al. 2014). Yet, the effects of gene flow are more complex than a simple unequivocal homogenization of genetic variation. In fact low levels of gene flow might be beneficial to local adaptation by providing new alleles from other populations that might impart higher local fitness

(Sexton et al. 2014). Therefore, divergence is the outcome of the interplay between drift, selection, and the level of gene flow between populations. Comprehensive studies examining each of these processes can shed light on the evolutionary histories

45

of species and the similarities and differences between the way different modes of speciation operate in populations and allow them to persist and adapt to new environments.

Sympatric and Allopatric Populations of the Baja California Disjunct Fishes

Marine fishes with allopatric populations are scarce due to the paucity of absolute barriers to gene flow in the ocean (such as the isthmus of Panama) and the great dispersal potential of the pelagic larval stage of most fishes (Helfman et al.

2009; Bernardi 2013; Rocha et al. 2002). The Baja California disjunct species are a rare example of temperate marine fishes (19 species) in which populations have been isolated by specific vicariant and dispersal events (Bernardi et al. 2003; Brusca 1973;

B. R. Riddle et al. 2000) (see Supplementary Table for the full list of species). These species count with populations that have no geographic barriers to dispersal throughout their Pacific distribution from central California to approximately Bahia

Magdalena, in southern Baja California. Likewise, populations occur without obvious barriers in the northern and central zones of the Sea of Cortez (also known as Gulf of

California and here simply referred as the Gulf). However, Gulf populations are separated from the Pacific by the Baja California peninsula and the warmer seawater at the southern point of the peninsula, where these species are either rare or absent (see Supplementary Figure S1 for a representation of the typical Baja

California disjunct distribution).

46

Based on the respective presence or absence of geographic barriers, this study considers Gulf populations to be allopatric to the Pacific and populations within the

Pacific and Gulf distributions to be sympatric. While disjunct populations are considered an early step toward allopatric speciation (Endler 1977), the absence of geographic barriers in the Northeastern Pacific does not intrinsically translate into panmixia as gene flow in marine fishes in this area is affected by environmental factors (Allen et al. 2006; Bernardi & Lape 2005; Huang & Bernardi 2001; Bernardi et al. 2003) . Point Conception (near Santa Barbara, California) is an established biogeographic boundary where fish communities north and south of this point change drastically due to temperature and oceanographic discontinuities (Briggs 1974;

Dawson et al. 2006). Similarly, Punta Eugenia (near the middle of the Baja California peninsula) has been shown to be a phylogeographic barrier that lowers genetic connectivity of some fish populations (Bernardi 2000; Bernardi and Talley 2000;

Terry el al. 2000; Huang and Bernardi 2001; Stepien et al. 2001; Schinske et al.

2010). Bernardi et al. (2013) found that disjunct species with low gene flow across the Baja California peninsula (8 out of 12 studied species), also presented decreased gene flow across Punta Eugenia and Point Conception. The end result is multiple species with populations experiencing a gradient of gene flow and an extraordinary system to study the mechanisms of divergence and signals of selection under different scenarios of isolation.

Despite significant differences in their ecologies, the sargo, Anisotremus davidsonii (grunts, Haemulidae), and the mudsucker, Gillichthys mirabilis (gobies,

47

Gobiidae), are among the species that showed most divergence (based on mitochondrial cytochrome b, mtCYTB) across Point Conception and Punta Eugenia as well as high differentiation between disjunct populations (Table 1). In fact, their disjunct populations have been proposed to be in the process of incipient speciation as they formed well-supported clades that are sister to each other and reciprocally monophyletic (Bernardi et al. 2003; Bernardi & Lape 2005). Given that divergence is not homogeneous along the genome, however, these patterns might differ when assessing divergence with an increased number of markers.

Here, we scanned thousands of loci throughout the genome using Restriction site-Associated DNA sequencing (RADseq) to characterize the gene flow between sympatric and allopatric Northeastern and Sea of Cortez populations of sargo and mudsucker. Subsequently, by analyzing differentiation in neutral and outlier loci in combination and separately, we investigated the role of drift and selection in creating population structure. We finally searched for commonalities in the patterns of genomic divergence and signals of selection between sympatric and allopatric populations of the focal species. We specifically ask (1) if the genomic patterns of divergence across the peninsula and mentioned Pacific phylogeographic breaks are concordant with the previously observed mtCTYB divergence? (2) Do genomic patterns of differentiation change when analyzing neutral and outlier loci separately? and finally (3), can we identify patterns of outlier loci divergence that are specific to sympatric and allopatric populations?

48

1

Table 1. General characteristics and previously available geneti Haemulidae Gobiidae Family

(sargo) davidsonii Anisotremus mudsucker) (longj mirabilis Gillichthys Species aw

Pacific/Gulf D central upper & Baja to central Monterey central /upper & Magdanela Bahia bay to Tomales istribution

/ gulf gulf

Up to 60m bottoms. y sand Occasionall Rocky reef. sloughs. and estuaries bottoms, shallow Intertidal d Habitat and epth

s oft

40 to 50 8 settle at larvae but unknown, (days) PLD - 12mm

n/a high to Moderate Conception Point across Gene flow c information (based on mt CYT

Low l Extremely Eugenia Punta across flow Gene ow

l Extremely l Extremely peninsula across flow Gene ow ow

0.64 mya. 0.16 to mya. 0.76 to 2.3 mode time & divergence Gulf/Pacific Proposed B

)

of studied specie

Fi (2005) Bernardi & Lape Bernardi et al. (2003) FishBase Bernardi e (2001) Huang & Bernardi References shBase

t al. (2003) s.

49

Methods 1

Collections and DNA extractions 2

Sargo specimens were collected by polespear on SCUBA or free diving. 3

Mudsucker individuals were collected using minnow traps at multiple locations 4 throughout their Pacific and Gulf distributions (Figure 1). See Table 2 for numbers of 5 samples per population and region. 6

7 40

USA San Francisco ELK GOS CAS

35 Point Conception ELC PC CI EMO ELP SD BLA 30 GNE BK Punta Eugenia

PEU

25 PSR MEXICO ECO N Pacific Ocean

0 200 400 600 km 20 -130 -125 -120 -115 -110 -105 8 Figure 1. Pacific and Sea of Cortez sampling localities and 9 well-established phylogeographic breaks, Point Conception 10 and Punta Eugenia. See Table 2 for the number of samples 11 per species. ELK, Elkhorn Slough; GOS, Goleta Slough; 12 CAS, Carpinteria Sough; CI, Catalina Island; SD, San 13 Diego; GNE, Guerrero Negro; PEU, Punta Eugenia; PSR, 14 Punta San Roque; ECO, Estero El Coyote; ELC, Estero La 15 Choya; PC, Punta Choya; EMO, Estero Morua; ELP, Estero 16 La Pinta; BK, Bahia Kino; BLA, Bahia de Los Angeles. 17 18 19

50

Table 2. Location and number of samples per species. 20 21 Region Location Gillichthys Anisotremus mirabilis davidsonii 22 Pacific (Pac) 45 18 (CAL) 30 10 23 California Elkhorn Slough (ELK) 11 Goleta Slough (GOL) 10 24 Carpinteria Slough(CAS) 9 Catalina Island (CI) 3 25 San Diego (SD) 7 Baja 26 (BCS) 15 8 California Gerrero Negro (GNE) 8 27 Punta Eugenia (PEU) 2 Punta San Roque (PSR) 6 28 Estero El Coyote (ECO) 7 29 Sea of Cortez (Gulf) 28 30 Estero La Choya (ELC) 10 30 Punta Choya (PC) 10 Estero Morua (EMO) 9 31 Estero La Pinta (ELP) 9 Bahia Kino (BK) 12 32 Bahia de los Angeles 8 (BLA) 33

34

35

Only one of our sampling sites for the mudsucker (Elkhorn Slough) occurs 36 north of Point Conception. While the sargo might be found at these latitudes during 37 warm water events such as El Nino, it is generally rare and does not have established 38 populations north of Point Conception. Between Point Conception and Punta 39

Eugenia, samples were collected in three sites for the mudsucker and two for the 40 sargo (Figure 1, Table 2). However, since only three sargos were obtained from 41

Catalina Island (CI), samples from CI and San Diego (SD) were pooled into a single 42

51

Californian population (CAL) for the subsequent analyses. South of Punta Eugenia, 43 mudsuckers were collected from one site and sargos from two. Yet, these two sargo 44 sites (Punta Eugenia, PEU and Punta San Roque, PSR) were also pooled together to 45 form a single Baja California sargo population (BCS). We sampled three distinct 46 locations for each species within the northern and central zones of the Sea of Cortez. 47

Fin clips, gill, or muscle tissues, were extracted from specimens and stored in 95% 48 ethanol at room temperature in the field and at -80 in the lab. Qiagen DNeasy 96 49

Tissue Kits for purification of DNA from animal tissues (QIAGEN, Valenica, 50

California, USA) were utilized following the manufacturer’s protocol to extract 51 genomic DNA from subsampled tissue. 52

53

Marker Genotyping, Discovery, and Validation. 54

We followed the original protocol utilizing the Sbf1 restriction enzyme to 55 digest DNA (Miller et al. 2007; Baird et al. 2008) and produced two RAD libraries. 56

Each genomic DNA sample contained a of 400ng which was then 57 physically sheared using a Covaris S2 sonicator using an intensity of 5 for 30 58 seconds, 10% duty cycle, and cycles/burst of 200. Final amplification PCR was 59 performed with 50µl reaction volumes and 18 amplification cycles. Subsequent 60 purification and size selection steps were accomplished using Ampure XP beads 61

(Agencourt). Unique barcodes were ligated to all samples which were later sequenced 62 using two illumina HiSeq 2000 lanes at UC Berkeley (Vincent J. Coates Genomics 63

Sequencing Laboratory). 64

52

Discovery and genotyping of single nucleotide polymorphism (SNP) was 65 performed using the STACKS software version 1.29 (Catchen et al. 2011; Catchen et 66 al. 2013) and modified Perl scripts (Miller et al. 2012). We excluded any reads 67 without the 6-bp barcode or an exact match to the SBf1 restriction site sequence, or 68 with a probability of sequencing error higher than 10% (Phred score=33). Final 69 quality-filtered reads consisted of 80-bp after removing barcodes and restriction sites. 70

Population scripts in STACKS used these reads as the input for the population 71 genomic analysis. The analysis of allopatric populations of both species started by 72 running population scripts with all the Pacific individuals pooled as a single 73 population and all the Gulf individuals as another. For the analysis of sympatric 74 populations of G. mirabilis, one population script was run considering each site as a 75 separate population. For sympatric populations of A. davidsonii, Pacific individuals 76 were pooled into two populations, California (CAL) and Baja California (BCS), and 77

Gulf sites were kept as separate populations. 78

After substantial exploration of the different datasets and to enable using the 79 same parameters in all population analyses, only reads with a minimum depth 80 coverage of 6x (m=6) that were present in at least 60% (r=0.60) of the corresponding 81 population were included in the analyses. We further minimized the probability of 82 linkage in our data by using the write_single_SNP option to select only the first SNP 83 in reads that had more than one SNP. Reads passing the quality and population filters 84 are presented as our total loci in Table 3. 85

86

53

Analysis of Population Genomic Divergence 87

After the various population analyses in STACKS, the genepop output file 88 from the population program of STACKS was converted into Arlequin format using 89

PGDSpider (Lischer & Excoffier 2012). Arlequin version 3.5.1.2 (Excoffier & 90

Lischer 2010) was then used to compute genomic indexes of diversity for each 91 population and determine genetic distances between populations. Fixation index (FST) 92 was calculated using the total number of loci and computing a distance matrix under 93

10000 permutations, a significance level of 0.05, and allowing a maximum of 0.1 94 missing level per site. Usable loci, polymorphic loci, percent polymorphism 95

(polymorphic loci divided by usable loci), and gene diversity or theta Θ, are also 96 reported from the output of Arlequin (Table 3). Z-tests of proportions (with the 97 significance level set to 0.01) were conducted in the Ausvet EpiTools calculator 98

(http://epitools.ausvet.com.au/content.php?page=z-test-2) to determine if the average 99 amount of polymorphism observed in sympatric Pacific and Gulf population, as well 100 as between allopatric populations, was statistically different. 101

We used different approaches to explore and visualize genomic structure 102 using all, neutral, and outlier loci. First, we performed a Discriminant Analysis of 103

Principal Components (DAPC) (Jombart et al. 2010) on our dataset containing all 104 loci. DAPC combines the benefits of discriminant and principal component analyses 105 and is particularly useful to study differences between clusters (or populations) as it 106 utilizes a multivariate approach to explore the entire variation in the data and 107 minimizes that within clusters. This analysis was performed using the ADEGENET 108

54

package (Jombart 2008) in R (R Core Team 2013) with the Structure file produced by 109 the populations program in STACKS as input. The algorithm find.clusters identified 110 the plausible number of clusters by comparing Bayesian Information Criterion (BIC) 111 values and the cross-validation tool xvalDapc determined the number of principal 112 components that were retained. 113

We subsequently examined genetic structure in neutral and outlier loci, 114 separately, to investigate if genetic drift and selection have shaped population 115 divergence in unique patterns. Putative neural loci were obtained by excluding the 116 outlier loci from the total number of loci. The cataloging of outlier loci, or putative 117 loci under selection, is explained below. We ran population scripts employing the 118 blacklist and whitelist options, for neutral and outlier loci respectively. Structure 119 output files were analyzed using a Bayesian approach in STRUCTURE version 2.3.4 120

(Pritchard et al. 2000) to determine if the assignment of samples into genetic clusters 121 differed based on neutral and outlier loci. For the neutral loci data set, a range of K 122 from one to seven for A. davidsonii and one to ten for G. mirabilis (corresponding to 123 the number of sites for each species plus two more possible hypothetical populations) 124 were performed with 10 000 as the burn-in parameter and 100 000 replicates under 125 the admixture model. STRUCTURE runs with the outlier loci followed the same 126 parameters but with a range of K from one to five for both species. Only the K with 127 the highest likelihood according to the Evanno method (Evanno et al. 2005) 128 implemented in Structure Harvester (Earl & VonHoldt 2012) are illustrated. 129

130

55

Cataloging and Analyzing FST Outliers 131

Although working with FST outliers might incorporate a series of shortfalls 132

(Bierne et al. 2011, 2013; Lotterhos & Whitlock 2015), they are commonly used to 133 show evidence of the diverging effects of selection between populations (Bernardi et 134 al. 2016; Gaither et al. 2015; Longo & Bernardi 2015; Stockwell et al. 2016). Outlier 135 loci were identified directly from the phistats output file of STACKS by selecting loci 136 above a threshold value equal to three standard deviations above the mean AMOVA 137

FST (or loci with the top 0.03% divergence values). 138

We then investigated the patterns of loci abundance and divergence between 139 populations. To begin the outlier analysis, a population script was run using a 140 whitelist containing the outlier identifications and the resulting structure files were 141 used to create STRUCTURE plots following the same parameters as with the neutral 142 loci (see above). Subsequently, violin plots (density and boxplot hybrid graphics) 143 were produced using the package ggplot2 and geom_violin( ) (Wickham 2016) in R 144 to observe the FST range of outliers and their abundance along this range. We also 145 performed Z-tests to determine whether the relative proportion of outliers (i.e. the 146 number of outlier loci according to the corresponding total number of loci) was 147 statistically different between intraspecific and interspecific populations. In both 148 species, we tested if the proportion corresponding to the average number of outliers 149 found in the sympatric populations was different from that of the allopatric 150 populations and whether differences in Pacific and Gulf outlier proportions were 151 significant. 152

56

Outlier sequences were then uploaded into the GenBank database specifying 153 an Expect threshold (E-value) of 10-6, which returns only matches with a one in a 154 million chance to be paired with a record by chance alone. We finally documented 155 matches to protein coding genes as well as matches to sequences without annotation 156 and present the percentages of each category in horizontal stacked bar plots built in R. 157

Similarly, Z-tests were conducted to determine if the percent of outliers matching to 158 coding genes was statically different between populations. 159

160

Results 161

Loci and Polymorphism Statistics 162

Genomic DNA was sequenced from a total of 121 samples (73 G .mirabilis 163 and 48 A. davidsonii) in two illumina lanes producing approximately 200 millions 164 reads. The de novo program from STACKS created a total of 3,994,606 unique stacks 165 and identified a total of 708,055 SNPs (averaging to 33,013 stacks and 5,852 SNPs 166 per individual). The number of loci passing all filters in the population scripts with 167

Pacific and Gulf pooled populations of G. mirabilis and A. davidsonii were 15,058 168 and 4, 379, respectively (Table 3). Total loci resulting from the scripts with multiple 169 populations were 4, 316 for G. mirabilis and 15,338 for A. davidsonii. For both 170 species, percent polymorphism was statistically higher in the pooled Gulf population 171 than in the pooled Pacific population (72% to 25% in G. mirabilis and 72% to 49% A. 172 davidsonii; z- value=10.1, p-value=0, and z-value=13.5, p-value=0). Percent 173

174 175

57

Table 3. Locus, polymorphism, and genetic diversity statistics of the Pacific and Sea 176 of Cortez (Gulf) populations per species (after quality and population filters). Percent 177 polymorphism was calculated by dividing the number of polymorphic loci by the 178 usable loci. ELK, Elkhorn Slough; GOL, Goleta Slough; CAS, Carpinteria Slough; 179 GNE, Gerrero Negro; ECO, Estero El Coyote; Pac, Pacific; Gulf, Sea of Cortez; ELC, 180 Estero La Choya; EMO, Estero Morua; ELP, Estero La Pinta; CAL, California; BCS, 181 Baja California; PC, Punta Choya; BKI, Bahia Kino; BLA, Bahia de los Angeles. 182 Species Site Samples Total Usable Polym. % Tetha Θ loci loci loci Polym.

G.mirabilis ELK 11 4316 3002 270 9 0.01845 GOL 10 4316 3900 304 8 0.01991 CAS 9 4316 159 13 8 0.02902 GNE 8 4316 24 7 29 0.08680 ECO 7 4316 124 19 15 0.05113 Pacific 45 15058 352 89 25 0.02770

ELC 10 4316 3344 1315 39 0.07139 EMO 9 4316 129 38 29 0.05618 ELP 9 4316 44 17 39 0.06654 Gulf 28 15058 168 121 72 0.05166

A. davidsonii CAL 10 15338 8182 2609 32 0.06383 BCS 8 15338 5780 2046 35 0.07524 Pacific 18 4379 2107 1031 49 0.06214

PC 10 15338 3108 1111 36 0.05951 BK 12 15338 8464 3693 44 0.06592 BLA 8 15338 3191 1141 36 0.07564 Gulf 30 4379 1391 1004 72 0.05963 183

184 polymorphism ranged from 8% to 39% and from 32% to 49%, in discrete mudsucker 185 and sargo populations, respectively. The average polymorphism in sympatric Gulf 186

58

populations was also statistically higher than in Pacific populations for either specie 187

(z-value=13.4 and p-value=0 for mudsucker populations; z-value=5.8 and p- 188 value=0.0001 for sargo populations). See Table 3 for full loci and polymorphism 189 statistics. 190

191 Table 4. Distance matrix reporting FST values (below diagonal) and 192 significance (above diagonal) between sympatric populations of Anisotremus 193 davidsonii and Gillichthys mirabilis. Results from pooled allopatric Pacific 194 and Gulf populations are given below matrices. ELK, Elkhorn Slough; GOS, 195 Goleta Slough; CAS, Carpinteria Sough; CI, Catalina Island; SD, San Diego; 196 GNE, Guerrero Negro; PEU, Punta Eugenia; PSR, Punta San Roque; ECO, 197 Estero El Coyote; ELC, Estero La Choya; PC, Punta Choya; EMO, Estero 198 Morua; ELP, Estero La Pinta; BK, Bahia Kino; BLA, Bahia de Los Angeles. 199 200 Gillichthys mirabilis ELK GOL CAS GNE ECO ELC EMO ELP ELK + + + + + + + GOL 0.15 - + - + + + CAS 0.12 0 + - + + + GNE 0.18 0.05 0.04 - + + + ECO 0.15 0 0 0 + + + ELC 0.30 0.25 0.20 0.13 0.16 - - EMO 0.31 0.26 0.21 0.14 0.19 0 - ELP 0.28 0.23 0.19 0.10 0.14 0 0

Pacific vs Gulf Fst= 0.23 +

Anisotremus davidsonii CAL BCS PC BK BLA CAL + + + + BCS 0. 02 + + + PC 0.12 0.11 - - BK 0.12 0.11 0 - BLA 0.12 0.11 0.01 0

Pacific vs Gulf Fst = 0.11 + 201 59

Genomic Divergence and Structure between Sympatric and Allopatric Populations 202

The first step in our population comparison was to characterize the genomic 203

FST in Arlequin using the total number of loci. The allopatric sargo and mudsucker 204 populations (Pacific and Gulf pooled populations) respectively presented a genomic 205

FST of 0.11 and 0.23, and both were significant (using a significance level of 0.05). 206

Pacific populations of the sargo (CAS and BCS) diverged by a FST=0.02278 and this 207 was significant as well. In contrast, differentiation between any pair of the Gulf 208 populations only reached FST values lower than 0.01. Latitudinal pairwise FST 209 between Pacific mudsucker populations ranged from 0 to 0.14575 and all Gulf 210 comparisons resulted in FST=0 (See Table 4 for the complete distance matrix). 211

Subsequently, in order to determine if drift and/or selection have produced 212 unique patterns of population structure, we dissected the observed genomic 213 divergence by performing a DAPC analysis using all loci together and STRUCTURE 214 plots with neutral and outlier loci individually. Differences in the genetic 215 composition of allopatric populations for both species resulted in distinct clusters in 216 the DAPC graphs (Figure 2). Interestingly, Pacific sargo population and Gulf 217 mudsucker populations are also visibly well-separated. STRUCTURE plots showed 218 different patterns of genomic structure when we analyzed either neutral or outlier loci 219

(Figure 3). When using neutral loci in both species, Pacific individuals largely 220 appeared to belong to a common cluster with Gulf individuals, but not without 221 showing noticeable differences. In contrast, when outlier loci are utilized, plots 222 showed an extremely high probability that every Pacific individual belong to a 223

60

separate population to that of all Gulf individuals. Structure Harvester selected a K of 224

4 for the analysis of neutral loci in A. davidsonii and a K of 2 for every other 225 treatment. 226

227

228

229 Figure 2. DAPC cluster plots of Gillichthys mirabilis (a) and Anisotremus davidsonii 230 (b) populations. For both species, Pacific populations cluster to the left of vertical line 231 (ELK, GOL, CAS, GNE and ECO for G. mirabilis; CAL and BCS for A. davisonii) 232 and Sea of Cortez populations cluster to the right (ELC, EMO and ELP for G. 233 mirabilis; PC, BK and BLA for A. davisonii). Plots were created in R using the 234 adegenet package with 3 and 4 discriminant functions and retaining 14 and 12 235 principal components for G. mirabilis and A. davisonii, respectively. Number of 236 principal components was selected by the validation tool xvalDapc. 237 238 239 240 241 242

61

243

244 Gillichthys mirabilis Anisotremus davidsonii 1.00 1.00 Neutral0.80 0.80 loci 0.60 0.60 0.40 0.40

0.20 0.20

0.00 0.00 1 3 5 7 1 3 5 2 4 6 8 2 4 1.00 1.00 Outlier 0.80 0.80 loci 0.60 0.60 0.40 0.40

0.20 0.20

0.00 0.00 1 3 5 7 1 3 5 ELK GOL2 CAS GNE4 ECO ELC6 EMO ELP8 CAL BCS2 PC BK4 BLA 245 Figure 3. STRUCTURE plots with Bayesian assignment of individual into 246 distinct genetic clusters or populations (green, red, blue, or yellow) based on 247 presumed neutral loci (top panels) and outlier loci or loci suspected to be 248 under selection (bottom panels). Structure Harvester selected a k of 4 for 249 neutral loci in A. davidsonii and a K of 2 for every other analysis. Structure 250 Harvester results are presented in Supplementary Figure 2. For G. mirabilis, 251 neutral loci= 4169 and outlier loci= 147. For A. davidsonii, neutral loci= 252 15058 and outlier loci= 275. ELK, Elkhorn Slough; GOS, Goleta Slough; 253 CAS, Carpinteria Sough; CI, Catalina Island; SD, San Diego; GNE, 254 Guerrero Negro; PEU, Punta Eugenia; PSR, Punta San Roque; ECO, Estero 255 El Coyote; ELC, Estero La Choya; PC, Punta Choya; EMO, Estero Morua; 256 ELP, Estero La Pinta; BK, Bahia Kino; BLA, Bahia de Los Angeles. 257 258

259

260

Outlier Loci and Evidence of Selection 261

Outliers were identified for latitudinal pairwise Pacific populations and for all 262 pairwise comparisons within the Gulf by selecting loci with a differentiation higher 263 than three standard deviation from the mean AMOVA FST in the phistats file 264

62

produced by the STACKS population scripts. The number of outliers between 265 allopatric population of the sargo and mudsucker were 721 loci with a FST range from 266

0.36 to 1 and 586 loci with a FST range from 0.72 to 1, respectively (Figure 4). 267

Among these, allopatric populations of sargo and mudsucker had 40 and 49 fixed 268 loci, respectively. When comparing CAL and BCS sargo populations, 202 outlier loci 269 were identified with FST values ranging from 0.24 to 0.77. In the Gulf, population 270 pairwise comparisons resulted in 192 to 224 outlier loci and FST values ranging from 271

0.19 to 0.81. Analyses of mudsucker sympatric populations yielded a range of outlier 272 loci from 13 to 22 with an FST range of 0.19 to 0.94 in the Pacific and 57 to 76 outlier 273 loci, with an FST range of 0.18 to 0.55 in the Gulf (Figure 4). 274

Given that the population program did not produce equal total number of loci 275 in the sympatric and allopatric analyses, we compared the proportions of the total loci 276 that were classified as outliers. We found that the average proportion of outliers 277 observed in sympatric sargo populations was not different than that in the sympatric 278 mudsucker populations (z-value=2.3, p-value=0.02). However, the proportions of 279 outliers between allopatric populations of the two species were significantly different 280

(z-value=29.2, p-value=0). In fact, with the exception of finding no statistically 281 differences between the average outlier proportion in the Pacific and Gulf sargo 282 populations, all other comparison (Pacific sargo v.s. Pacific mudsucker populations; 283 sympatric v.s. allopatric populations within each species; and Pacific v.s. Gulf 284 mudsucker populations) were statistically different. 285

286

63

287

We further classified outliers based on whether they matched known coding 288 regions, sequences without annotation, or produced no match at all in GenBank. Out 289 of the 721 and 586 outlier loci identified between allopatric sargo and mudsucker 290 populations, 25% and 17% were paired to coding genes by the Blastn tool, 291 respectively (Figure 4). This percent difference was statistically significant (z- 292 value=3.6, p-value=0.0004). On average, 38% and 27% of the outliers between 293 sympatric Pacific sargo and mudsucker populations produced matches to coding 294 genes and 35% and 16% did the same in Gulf sargo and mudsucker populations, 295 respectively. 296

While the percent difference between Pacific sargo and mudsucker 297 populations was not statistically different (z-value=0.9, p-value=0.3424), the gene 298 matches among Gulf populations was significantly different (z-value=2.8, p- 299 value=0.0046). Sargo presented a statistically higher average percentage of outliers in 300 sympatric v.s. allopatric populations (z-value=4.2, p-value=0.0001). Percentages and 301 number of loci falling into each category for every population analysis are given in 302

Figure 4. Supplementary File 1 provides the list of all the loci (symparic and 303 allopatric) matching a coding gene with their respective FST values and GenBank gene 304 descriptions. 305

306

307

308

64

Category A(none) B(sequence) C(genes) Anisotremus davidsonii Category A(none) B(sequence) CategoryC(genes)codingA(none) B(sequence)non-codingC(genes)Category A(none)no matchB(sequence) C(genes)

M(CAL-BCS) Category A(none) B(sequence) C(genes) CAL-BCS M(CAL-BCS) 77 202M(CAL-BCS) 45 77 8077M(CAL-BCS)45 45 77 80 80 45 80

L(Pac-Gulf) Pac-Gulf L(Pac-Gulf) 183 146L(Pac-Gulf) 721183 392183146L(Pac-Gulf) 146 183392 146392 392 M(CAL-BCS) 77 45 80 K(PC-BKI) PC-BKI K(PC-BKI) 19579 K(PC-BKI) 45 79 7179K(PC-BKI) 45 45 79 71 71 45 71 L(Pac-Gulf) 183 146 392 J(PC-BLA) PC-BLA J(PC-BLA) 69 J(PC-BLA)34 69 89 69 J(PC-BLA)34 34 6989 89 34 89 192 K(PC-BKI) 79 45 71 I(BK-BLA) BK-BLA I(BK-BLA) 61 I(BK-BLA)48 61 11661 48I(BK-BLA) 48 61116 11648 116 224 J(PC-BLA) 69 34 89 H(ELK-GOL) H(ELK-GOL) 4 2 H(ELK-GOL) 4 13 42 H(ELK-GOL)2 134 213 13 Gillichthys mirabilis I(BK-BLA) 61 48 116 G(GOL-CAS) G(GOL-CAS) 4 G(GOL-CAS)2 4 7 4 G(GOL-CAS)2 2 47 7 2 7 ELK-GOL H(ELK-GOL)19 4 2 13 F(CAS-GNE) F(CAS-GNE) 6 2F(CAS-GNE) 6 14 62 F(CAS-GNE)2 146 142 14 GOL-CAS 13 G(GOL-CAS) 4 2 7 E(GNE-ECO) E(GNE-ECO) 6 E(GNE-ECO)6 6 106 E(GNE-ECO)6 6 6 10 106 10 CAS-GNE 22 F(CAS-GNE) 6 2 14 D(Pac-Gulf) D(Pac-Gulf) 101 60 D(Pac-Gulf) 101425 60101 D(Pac-Gulf)60 101425 60425 425 GNE-ECO 22 E(GNE-ECO) 6 6 10 C(ELC-EMO) C(ELC-EMO) 9 11 C(ELC-EMO) 9 37 911 C(ELC-EMO)11 9 37 11 37 37 Pac-Gulf D(Pac-Gulf)586 101 60 425 B(EMO-ELP) B(EMO-ELP) 11 6 B(EMO-ELP) 11 41 611 B(EMO-ELP)6 1141 641 41 ELC-EMO 57 C(ELC-EMO) 9 11 37 A(ELC-ELP) A(ELC-ELP) 11 13 A(ELC-ELP) 11 52 1311 A(ELC-ELP)13 1152 13 52 52 EMO-ELP 58 B(EMO-ELP) 11 6 41 0 25 500 0 2575 25 501000 50 75 25 75100 50 100 75 100 ELC-ELP 0.25 0.5076 0.75 PercentA(ELC-ELP)1.00 11 13 Percent Percent 52 Percent Fst 0.25 0.50 0.75 1.00 0 25 50 75 100 Percent Fst 309 Figure 4. Outlier loci statistics in pairwise comparisons of populations of 310 Anisotremus davidsonii and Gillichthys mirabilis. Violin plots (left panels) 311 depict the relative density of outliers over the overall outlier FST distribution. 312 Numbers at the end of shapes give the total number of outlier loci. Horizontal 313 stacked bar plots (right panels) illustrate the percent of loci matching to a 314 protein-coding gene (blue stacks), or a sequence with no annotation (green 315 stacks), or returning no matches using the BLAST tool in GenBank (salmon 316 colored stacks). Numbers inside stacked bars give the actual number of 317 outliers in each category. ELK, Elkhorn Slough; GOL, Goleta Slough; CAS, 318 Carpinteria Slough; GNE, Gerrero Negro; ECO, Estero El Coyote; Pac, 319 Pacific; Gulf, Sea of Cortez; ELC, Estero La Choya; EMO, Estero Morua; 320 ELP, Estero La Pinta; CAL, California; BCS, Baja California; PC, Punta 321 Choya; BKI, Bahia Kino; BLA, Bahia de los Angeles. 322

65

Discussion 323

The Baja California disjunct species are an exceptional group of species with 324 a shared evolutionary history where genomic divergence, local adaptation, and modes 325 of selections can be examined. This study analyzed two disjunct species that 326 previously showed population structure based on mtCYTB and searched for common 327 patterns of genomic divergence and adaptation in sympatric and allopatric 328 populations. Overall, the output from illumina sequencing yielded less reads in 329 mudsucker than sargo individuals, most likely due to poorer condition and older 330 tissue samples. This was also reflected in the lower number of loci observed in 331 mudsucker populations. The lower number of loci in allopatric than in sympatric 332 analysis of sargo populations is likely a product of pooling populations that already 333 contained intra-population variation and thus impeding a large number of loci from 334 passing established filters. Regardless of these limitations, a sufficient number of loci 335 passed all filters to allow us to reach some comparative and meaningful results (Table 336

3). 337

338

Divergence and Structure between Allopatric Populations 339

According to every evaluation in this study (fixation indexes, DAPCs, and 340

STRUCTURE analyses) which consistently revealed strong structure between the 341

Pacific and Gulf populations, gene flow across the peninsula appears to be extremely 342 low. Genomic distance based on FST values was large and significant, and DAPCs 343 graphs clearly separated the two regions in a single plot axis. The different genetic 344

66

patterns observed in STRUCTURE plots when analyzing putative neutral and loci 345 under selection provide potential evidence that this divergence has been produced in 346 combination by drift and selection. When analyzing putative neutral loci, Pacific and 347

Gulf sargo individuals showed distinctive clusters (blue and yellow) and mudsuckers 348 from the Gulf possessed a visibly higher probability of belonging to an alternate 349 genetic pool (red). When analyzing the outlier loci, the probability of any Pacific 350 individual belonging to the same genetic pool than any Gulf individual, for both 351 species, is extremely low or null. The low levels of gene flow could have allowed 352 drift to make such noticeable impacts on disjunct populations producing the observed 353 unique genetic patters. Similarly, isolation might have also aided selection in creating 354 the drastically different genetic compositions seemed in the outlier loci. The observed 355 higher levels of polymorphism in Gulf than in Pacific populations, might be a 356 reflection of the environmental diversity in the Gulf (Thomson et al. 2000), or for 357 instance, the proposed older age and bigger effective sizes of the sargo population in 358 this region (Bernardi & Lape 2005). 359

360

Divergence and Structure between Sympatric Populations 361

Point Conception and Punta Eugenia appear to be an important discontinuity 362 for gene flow between Pacific populations of both species as suggested by the 363 significant FST values between ELK and rest of the mudsucker populations as well as 364 between the CAL and BCS sargo populations. Yet, GNE and ECO mudsucker 365 populations did not show differentiation across Punta Eugenia preventing the 366

67

genomic divergence from entirely agreeing with previously observed mitochondrial 367

FST patterns. Besides the comparisons of populations north and south of these 368 phylogeographic points, results also reported a low but significant FST value between 369 the mudsucker populations from the Carpinteria Slough (CAS) and Gerrero Negro 370

(GNE). The large geographic distance as well as strong regimes (at the San 371

Quintin region) between them, might be responsible for such divergence. 372

Interestingly, Arlequin detected very little or no divergence between Gulf 373 populations for both species but the DAPC showed some displacement in the sargo 374 populations and substantial differentiation between the three studied mudsucker 375 locations. The fact that FST values could not always predict how distantly populations 376 would be placed in DAPC plots, which in turn revealed structure not visible by FST 377 values alone, highlights how complex structure between these populations can be and 378 the importance of analyzing it with different methods. On one hand, although the 379 three Esteros where we collected the gobies are close to each other, isolation might 380 exist as these are very large estuaries and mudsuckers prefer habitats occuring in 381 outcrops or small channels remarkably high in the intertidal zone. Moreover, this area 382 of the Sea of Cortez experiences very large tidal fluxes that often disconnect 383 mudsucker habitat from the main flow in the estuary which might then prevent the 384 exportation of mudsucker larvae. On the other hand, sargo populations are relatively 385 far from each other but the series of islands between the peninsula and mainland 386 might be acting as stepping stones allowing larvae to cross the Gulf. 387

388

68

Outlier loci and Signals of Selection 389

Our results strongly suggest the presence of differential selection in the 390

Pacific and Gulf populations of each species. The two regions contain very different 391 outlier gene pools as analyses identified large number of substantially diverged loci, 392 including many with fixed differences (40 for sargo and 49 for mudsucker allopatric 393 populations). This divergence is strikingly illustrated in the STRUCTURE plots 394 where almost every individual is homogeneous for the gene pool of its region. The 395 relative proportion of outliers (considering the initial total loci in each analysis) was 396 higher in allopatric than in sympatric comparisons. Moreover, our exploratory 397 examination revealed that outliers from allopatric populations enclosed substantially 398 higher ranges of differentiation than in sympatry and these were distributed more 399 evenly throughout such range. In general, the great majority of outliers diverging 400 between sympatric populations lied close to the lower end of the Fst range instead. 401

This “outlier evenness” in allopatry might be a byproduct distribution of a group of 402 outliers with higher differentiation but it might also be an exclusive signature of 403 allopatric speciation. For instance, while selection can be favoring different alleles in 404 the loci that accumulate in the lower end of the FST range in sympatric populations, 405 existing gene flow could be holding back the fixation process of these potentially 406 adaptive alleles. Local adaptation in sympatry might therefore be reached 407 predominantly by soft selective sweeps. In contrast, at some level of isolation 408 presumably only attainable in allopatry, some of these alleles can avoid the effects of 409 gene flow and diverge more easily obtaining the mentioned “outlier evenness.” This 410

69

in turn, might indicate a more balanced operation of soft and hard selective sweeps in 411 allopatric adaptation. While this study did not test this hypothesis given that for 412 example, the outliers found in sympatry and allopatry are not the same loci, this 413 phenomenon merits further investigation in other systems. 414

Overall, the relative proportion of outliers between latitudinal pairwise 415 sympatric populations was concordant with the observed levels of genomic distance. 416

For instance, this proportion was statistically higher in Pacific populations of the 417 sargo than of the mudsucker but the average proportion between intra- and 418 interspecific Gulf populations did not differ. As mentioned above, the range of FST 419 values of sympatric outliers was not as high as in allopatry but it was still 420 considerably high and outliers were more densely distributed toward the lower end of 421 this range. The two Pacific mudsucker population pairs with significant FST values 422

(ELK-GOL and CAS-GNE) accordingly illustrated the two highest ranges of 423 differentiation of all sympatric comparisons of mudsucker populations. 424

Subsequent evidence of selection includes having large proportions of outliers 425

(up to 25% in allopatry and 38% in sympatry) matching a known coding gene in 426

GenBank. While sympatric comparisons produced overall less numbers of outliers 427 than in allopatry, there was no statistical difference in the average proportion of loci 428 matching genes. Mentioned phylogenetic breaks, upwelling events, habitat 429 discontinuities, as well as the natural latitudinal temperature gradient, might be part of 430 the selective pressures producing the observed divergence in Pacific populations. 431

Similarly, environmental differences between central and northern zones of the Sea of 432

70

Cortez and the tidal regime in combination with habitat choice might do the same for 433

Gulf sargo and mudsucker populations, respectively. This study was successful in 434 identifying patterns of outlier differentiation between the studied sympatric and 435 allopatric populations. A more in-depth revision of the identified outliers (as well as 436 those loci matching a sequence without annotation but that could be adaptive) might 437 shine light on specific details about how sympatric an allopatric operate in these and 438 possibly other populations as well. 439

440

Conclusion 441

The Baja California Peninsula acts as an effective barrier to gene flow 442 between Pacific and Sea of Cortez populations of both species, which show 443 signatures of drift and adaptation to their specific environments. Point Conception 444 and Punta Eugenia moderate the migration of individuals in the Pacific but might 445 create different patterns of divergence depending on the utilized markers. Divergence 446 patterns in outlier loci might be directly correlated with the level of isolation between 447 corresponding populations. Comparisons of allopatric populations produced 448 substantially larger number of outlier loci than sympatric analysis and these were also 449 more evenly distributed along higher ranges of differentiation. Yet, for every 450 population analysis (sympatric and allopatric) at least 16% of the outlier loci matched 451 a known coding gene. These results pinpoint signatures of adaptation in every 452 population and potentially indicate the prevalence of soft sweeps between sympatric 453 populations and a more balanced contribution of soft and hard sweeps in allopatry. 454

71

This study illustrates how RADseq data can be used to explore signatures of 455 sympatric and allopatric speciation as well as the utility of examining genomic 456 divergence using diverse methodologies and while separating neutral and non-neutral 457 loci in analyses. Future population genomic studies are urged to employ multiple 458 techniques and scrutinize the type of loci to use the most appropriate approach to 459 answer formulated questions. 460

461

462

463

464

465

466

467

468

469

470

471

472

473

474

475

476

72

References 477

Allen, L. G., Pondella, D. J. & Horn, M. h. (2006) The Ecology of Marine Fishes: 478

California and Adjacent Waters. UC Press. 479

Baird, N. A., Etter, P. D., Atwood, T. S., Currey, M. C., Shiver, A. L., Lewis, Z. A., 480

Selker, E. U., Cresko, W. A. & Johnson, E. A. (2008) Rapid SNP Discovery 481

and Genetic Mapping Using Sequenced RAD Markers. PLoS ONE 3, 1–7. 482

Bernardi, G. & Lape, J. (2005) Tempo and Mode of Speciation in the Baja California 483

Disjunct Fish Species Anisotremus Davidsonii. Molecular Ecology 14, 4085– 484

4096. 485

Bernardi, G., Findley, L. & Rocha-Olivares, A. (2003) Vicariance and Dispersal 486

across Baja California in Disjunct Marine Fish Populations. Evolution; 487

international journal of organic evolution 57, 1599–1609. 488

Bernardi, G., Azzurro, E., Golani, D. & Miller, M. R. (2016) Genomic Signatures of 489

Rapid Adaptive Evolution in the Bluespotted Cornetfish, a Mediterranean 490

Lessepsian Invader. Molecular ecology 25, 3384–3396. 491

Bierne, N., Welch, J., Loire, E., Bonhomme, F. & David, P. (2011) The Coupling 492

Hypothesis: Why Genome Scans May Fail to Map Local Adaptation Genes. 493

Molecular Ecology 20, 2044–2072. 494

Bierne, N., Roze, D. & Welch, J. J. (2013) Pervasive Selection or Is It.? Why Are 495

FSToutliers Sometimes so Frequent? Molecular Ecology 22, 2061–2064. 496

73

Briggs, J. C. (1974) Marine Zoogeography. New York: McGraw-Hill. 497

Brusca, R. C. (1973) A Handbook to the Common Intertidal Invertebrates of the Gulf 498

of California. Tucson, AZ: University of Arizona Press. 499

Catchen, J., Hohenlohe, P. A., Bassham, S., Amores, A. & Cresko, W. A. (2013) 500

Stacks: An Analysis Tool Set for Population Genomics. Molecular Ecology 501

22, 3124–3140. 502

Catchen, J. M., Amores, A., Hohenlohe, P., Cresko, W. & Postlethwait, J. H. (2011) 503

Stacks : Building and Genotyping Loci De Novo From Short-Read Sequences. 504

G3&#58; Genes|Genomes|Genetics 1, 171–182. 505

Dawson, M. N., Waples, R. S. & Bernardi, G. (2006) Phylogeography. In The 506

Ecology of Marine Fishes: California and Adjacent Waters p. UC Press Larry 507

G. Allen, Daniel J. Pondella, II, and Michael H. Horn Eds. 508

Earl, D. & VonHoldt, B. (2012) STRUCTURE HARVESTER: A Website and 509

Program for Visualizing STRUCTURE Output and Implementing the Evanno 510

Method. Conservation Genetics Resources 4, 359–361. 511

Endler, J. A. (1977) Geographic Variation, Speciation and Clines. Princeton 512

University Press. 513

Evanno, G., Regnaut, S. & Goudet, J. (2005) Detecting the Number of Clusters of 514

Individuals Using the Software STRUCTURE: A Simulation Study. 515

Molecular Ecology 14, 2611–2620. 516

74

Excoffier, L. & Lischer, H. E. L. (2010) Arlequin Suite Ver 3.5: A New Series of 517

Programs to Perform Population Genetics Analyses under Linux and 518

Windows. Molecular Ecology Resources 10, 564–567. 519

Gaither, M. R., Bernal, M. A., Coleman, R. R., Bowen, B. W., Jones, S. A., Simison, 520

W. B. & Rocha, L. A. (2015) Genomic Signatures of Geographic Isolation 521

and Natural Selection in Coral Reef Fishes. Molecular Ecology 24, 1543– 522

1557. 523

Jombart, T. (2008) Adegenet: A R Package for the Multivariate Analysis of Genetic 524

Markers. Bioinformatics 24, 1403–1405. 525

Jombart, T., Devillard, S., Balloux, F., Falush, D., Stephens, M., Pritchard, J., 526

Pritchard, J., Stephens, M., Donnelly, P., Corander, J., et al. (2010) 527

Discriminant Analysis of Principal Components: A New Method for the 528

Analysis of Genetically Structured Populations. BMC Genetics 11, 94. 529

Lischer, H. E. L. & Excoffier, L. (2012) PGDSpider: An Automated Data Conversion 530

Tool for Connecting Population Genetics and Genomics Programs. 531

Bioinformatics 28, 298–299. 532

Longo, G. & Bernardi, G. (2015) The Evolutionary History of the Embiotocid 533

Surfperch Radiation Based on Genome-Wide RAD Sequence Data. Molecular 534

Phylogenetics and Evolution 88, 55–63. 535

Lotterhos, K. E. & Whitlock, M. C. (2015) The Relative Power of Genome Scans to 536

Detect Local Adaptation Depends on Sampling Design and Statistical Method. 537

75

Molecular Ecology 24, 1031–1046. 538

Mayr, E. (1942) Systematics and the Origin of Species from the View- Point of a 539

Zoologist. New York, New York: Columbbia University Press. 540

Miller, M., Dunham, J., Amores, a, Cresko, W. & Johnson, E. (2007) Genotyping 541

Using Restriction Site Associated DNA (RAD) Markers. Genome Research 542

17, 240–248. 543

Miller, M. R., Brunelli, J. P., Wheeler, P. A., Liu, S., Rexroad, C. E., Palti, Y., Doe, 544

C. Q. & Thorgaard, G. H. (2012) A Conserved Haplotype Controls Parallel 545

Adaptation in Geographically Distant Salmonid Populations. Molecular 546

Ecology 21, 237–249. 547

Pritchard, J. K., Stephens, M. & Donnelly, P. (2000) Inference of Population 548

Structure Using Multilocus Genotype Data. Genetics 155, 945–959. 549

R Core Team. (2013) R: A Language and Environment for Statistical Computing. 550

Vienna, Austria: R Foundation for Statistical Computing 2013, doi:ISBN 3- 551

900051-07-0. 552

Riddle, B. R., Hafner, D. J., Alexander, L. F. & Jaeger, J. R. (2000) Cryptic 553

Vicariance in the Historical Assembly of a Baja California Peninsular Desert 554

Biota. Proceedings of the National Academy of Sciences 97, 14438–14443. 555

Rocha, L. A., Robertson, D. R., Roman, J. & Bowen, B. W. (2005) Ecological 556

Speciation in Tropical Reef Fishes. Proceedings of the Royal Society B 272, 557

76

573–579. 558

Sexton, J. P., Hangartner, S. B. & Hoffmann, A. A. (2014) Genetic Isolation by 559

Environment or Distance: Which Pattern of Gene Flow Is Most Common? 560

Evolution 68, 1–15. 561

Stockwell, B. L., Larson, W. A., Waples, R. K., Abesamis, R. A., Seeb, L. W. & 562

Carpenter, K. E. (2016) The Application of Genomics to Inform Conservation 563

of a Functionally Important Reef Fish (Scarus Niger) in the Philippines. 564

Conservation Genetics 17, 239–249. 565

Thomson, D. A., Findley, L. T. & Kersitch, A. N. (2000) Reef Fishes of the Sea of 566

Cortez: The Rocky Shore Fishes of the Gulf of California. Austin, TX: The 567

University of Texas Press. 568

Wickham, H. (2016) Ggplot2: Elegant Graphics for Data Analysis. New York. 569

570

571 572 573 574 575 576 577 578 579 580 581 582 583 584 585

77

Supplementary Information 586 587 Supplementary Table 1. List of the 19 Baja California disjunct fish species 588 (Bernardi et al. 2003). 589 590 Family Species Common name Atherinidae Leuresthes tenuis/ L. grunion sardina Girellidae Girella nigricans / G. opaleye simplicidens Haemulidae Anisotremus davidsonii sargo

Blenniidae Hypsoblennius jenkinsi mussel blenny

Chaenopsidae Chaenopsis alepidota oragethroat pikeblenny Serranidae Paralabrax spotted sand bass maculatofasciatus Gobiidae Gillichthys mirabilis longjaw mudsucker

Gobiidae Lythrytpnus dalli blue banded boby

Blenniidae Hypsoblennius gentilis bay blenny Kyphosidae Kyphosus azureus zebraperch Labridae Halichoeres semicinctus rock wrasse

Labridae Semicossyphus pulcher California sheephead

Scorpaenidae Sebastes macdonaldi Mexican rockfish Embiotocidae Zalembius rosaceus pink surfperch

Scorpaenidae Scorpaena guttata scorpionfish Polyprionidae Stereolepis gigas giant seabass Agonidae Xenetremus ritteri flagfin poacher Pleuronectidae Hypsopsetta guttulata diamond turbot Pleuronectidae Pleuronichthys verticalis hornyhead turbot 591 592 593 594 595 596 597 598 599 78

40

USA San Francisco 35

Point San Diego Conception 30

Punta Eugenia

25 MEXICO

N Pacific Ocean

0 200 400 600 km 20

-130 -125 -120 -115 -110 -105 600 Supplementary Figure 1. Typical distribution of the Baja California disjunct 601 species and location of the well-established biogeographic breaks in the Pacific. 602 603 604 605 606 Supplementary Figure 2. Structure Harvester Output: 607 608 Anisotremus davidsonii 609 15058 neutral loci (15338 total loci -280 outlier loci). K=4 was selected. 610 K Reps Mean LnP(K) Stdev LnP(K) Ln'(K) |Ln''(K)| Delta K 1 10 -205806.92000 14.135440 — — — 2 10 -200435.53000 224.562300 5371.390000 8836.390000 39.349392 3 10 -203900.53000 564.896762 -3465.000000 3439.920000 6.089467 4 10 -203925.61000 622.339092 -25.080000 194718.330000 312.881406

79

K Reps Mean LnP(K) Stdev LnP(K) Ln'(K) |Ln''(K)| Delta K 5 10 -398669.02000 328814.801505 -194743.410000 362571.210000 1.102661 6 10 -230841.22000 85264.319795 167827.800000 288243.680000 3.380590 7 10 -351257.10000 235171.117020 -120415.880000 — — 611 280 outlier loci. K=2 was selected. 612 K Reps Mean LnP(K) Stdev LnP(K) Ln'(K) |Ln''(K)| Delta K 1 10 -10718.04000 0.365756 — — — 2 10 -3313.470000 0.266875 7404.570000 7136.180000 26739.792691 3 10 -3045.080000 0.561348 268.390000 805.260000 1434.512273 4 10 -3581.950000 1579.984620 -536.870000 952.640000 0.602943 5 10 -3166.180000 398.719154 415.770000 — — 613 Gillichthys mirabilis 614 4169 neutral loci (4316 total loci -147 outlier loci). K=2 was selected. 615 K Reps Mean LnP(K) Stdev LnP(K) Ln'(K) |Ln''(K)| Delta K 1 10 -59229.560000 6.831496 — — — 2 10 -53640.810000 105.544392 5588.750000 189325.41000 1793.79882 3 10 -237377.47000 239035.29124 -183736.66000 161469.280000 0.675504 4 10 -259644.85000 341260.719055 -22267.38000 100243.130000 0.293744 5 10 -382155.36000 319468.970318 -122510.51000 101526.080000 0.317796 6 10 -403139.79000 334488.287904 -20984.430000 113259.570000 0.338605 7 10 -537383.79000 258593.19520 -134244.00000 180627.570000 0.698501 8 10 -491000.22000 212888.002932 46383.570000 316609.720000 1.487213 9 10 -761226.370000 295510.172897 -270226.15000 359754.900000 1.217403 10 10 -671697.620000 348340.494014 89528.750000 — — 616 147 outlier loci. K=2 was selected. 617 K Reps Mean LnP(K) Stdev LnP(K) Ln'(K) |Ln''(K)| Delta K 1 10 -10718.040000 0.365756 — — — 2 10 -3313.470000 0.266875 7404.570000 7136.180000 26739.792691 3 10 -3045.080000 0.561348 268.390000 805.260000 1434.512273 4 10 -3581.950000 1579.984620 -536.870000 952.640000 0.602943 5 10 -3166.180000 398.719154 415.770000 — — 618 80

CHAPTER 3 619

620 Genomic Divergence and Signals of Convergent Selection between Northeastern 621

Pacific and Sea of Cortez Disjunct Populations of Four Marine Fishes 622

623 Abstract 624

Disjunct populations offer the opportunity to study adaptation as well as the 625 early mechanisms of allopatric speciation and the genomic effects of isolation. Unlike 626 terrestrial organisms, or even freshwater fishes, disjunct distributions are uncommon 627 in marine fishes. Yet, the formation of the Baja California peninsula has produced 19 628 natural cases of marine fishes with disjunct populations in the Pacific and Sea of 629

Cortez. Genetic isolation in some of these species has repeatedly been studied based 630 on few markers. Here, we explored the Baja California disjunction further, using 631 genomic markers. This study, based on RADseq, genotyped thousands of loci from 632 genome-wide scans to document patterns of genomic divergence in disjunct 633 population of the sargo, Anisotremus davidsonii (ADA); the longjaw mudsucker, 634

Gillichthys mirabilis (GMI); the California sheephead, Semicossyphus pulcher (SPU); 635 and the zebraperch, Kyphosus azureus (KAZ). Furthermore, we examine the unique 636 patterns of genomic structure drafted by drift and selection, and search for coding 637 regions showing higher than expected differentiation between disjunct populations of 638 multiple species. Depending on the species, approximately 5 to 10 thousand loci were 639 genotyped. Disjunct populations of the sheephead showed little genetic evidence of 640 isolation (FST=0.004). While populations at the south end of the peninsula have not 641

81

been detected near the surface, migrant sheephead might be maintaining gene flow 642 between Pacific and Sea of Cortez populations using deep sea reefs as stepping- 643 stones. Population structure between zebraperch populations was detected for the first 644 time ever. This was shallow (FST=0.03) but statistically significant (p-value=0.0004), 645 and consistent with an early time of isolation. In contrast, divergence and structure 646 for sargo (FST=0.1) and mudsucker (FST=0.31), were considerably higher, thus 647 supporting the idea that these species are in the initial stages of allopatric speciation. 648

We observed different patterns of divergence in presumed neutral and non-neutral 649 loci. All species showed evidence of drift (except for the sheephead) and selection 650

(including sheephead) in STRUCTURE plots, strongly suggesting the presence of 651 unique selective pressures in the two regions. Large numbers of outlier loci 652

(presumed loci under selection) were found (from 120 to 330) and a high proportion 653 of these were also matched to protein coding genes (from 19% to 46%). We identified 654

15 genomic regions of convergence in more than one of these species (40 outlier loci 655 from all four species were matched to these regions). These regions are suspected to 656 be involved in processing environmental information, metabolism, development, 657 immune response, and possible reproduction. Yet, validation of these loci across 658 larger population samples and analyses of the specific mechanisms in which these 659 loci might impart adaptive advantages in these populations are still needed. 660

661

Keywords: adaptive evolution, allopatric populations, Baja California, incipient 662 speciation, population divergence, temperature isolation. 663

82

Introduction 664

Species with disjunct distributions have the potential to remain the same 665 species or diverge depending on how effectively the barrier separating their 666 populations impedes gene flow (Palumbi, 1992). Disjunct populations provide an 667 excellent opportunity to study the evolutionary processes of one of the main forces of 668 speciation, allopatry (Endler 1977; Coyne & Orr 2004). 669

Given the long pelagic larval durations (PLD) in most marine fishes 670

(averaging 30 days but may be as long as 2 years) and the scarcity of obvious 671 physical barriers preventing gene flow in the oceans, marine fishes were traditionally 672 considered to exhibit open populations with larvae that disperse long distances and 673 undermine the genetic structure of the species (Helfman et al. 2009) However, the 674 lack of support for a correlation between PLD and genetic structure in a wide array of 675 marine fishes (Bowen et al. 2006; Shulman & Bermingham 1995; Bay et al. 2006), 676 the development of the idea of self-recruitment (Jones et al. 1999; Swearer et al. 677

2002), and the documentation of the ability of larvae to navigate and make active 678 choices about where to settle (Leis et al. 2003; Leis & Lockett 2005; Leis & Carson- 679

Ewart 2002; Leis & Carson-Ewart 2000; Planes et al. 2002; Selkoe et al. 2006; 680

Bernardi et al. 2012; Pujolar et al. 2006), have highlighted the need to investigate the 681 role of adults and the overall ecology of the species in producing genetic variation 682 between fish populations (Rocha et al. 2002; Poortvliet et al. 2013) 683

Additionally, although hard physical barriers in the oceans are few (e.g. the 684 closure of the isthmus of Panama), other types of barriers such as seawater 685

83

temperature, salinity and freshwater discharge can also isolate fish populations and 686 provide systems where to study the speciation process in diverse scenarios of 687 isolation (Rocha et al. 2002; Bernardi & Lape 2005). 688

689

The Baja California Disjunct Fishes 690

The Baja California disjunct fishes are a group of 19 temperate species 691 inhabiting the Pacific coast from central California to central Baja California Sur (the 692 southern state in the peninsula) and northern and central zones of the Sea of Cortez 693

(also called Gulf of California and here simply referred as the Gulf), where 694 populations are isolated by the warmer seawater temperature at the south of the 695 peninsula (and where these species are absent; supplementary Figure 1 and 696 supplementary Table 1). These taxa offer an extraordinary system of independent 697 natural cases of allopatry with hypothesized isolation dates ranging from at least 2 698 million to 12,000 years (Miller & Lea 1972; Medina & Walsh 2000; Thomson et al. 699

2000; Bernardi et al. 2003). The majority of previous genetic analyses studying 700 allopatry in fishes of the Baja California peninsula have focused on one or two 701 disjunct species (Present 1987; Tranah & Allen 1999; Terry et al. 2000; Huang & 702

Bernardi 2001; Stepien et al. 2001). Yet, Bernardi et al. (2003) examined few 703 mitochondrial and nuclear markers in 12 of these taxa and found 8 species with fixed 704 differences (and low gene flow) and 4 with no notable divergence (and high gene 705 flow) between Pacific and Gulf populations. However, the results of this and the other 706 previous studies were limited in that (1) they utilized only a handful of markers and 707

84

(2) these do not possess enough resolution to detect a very recent cessation of gene 708 flow. Given that differentiation is not uniform throughout the genome, utilizing 709 modern genetic tools (Next Generation Sequencing, NGS) that allow for massive 710 parallel sequencing of neutral and selected loci, might reveal undetected divergence 711 and distinct patterns of differentiation across the peninsula. Similarly, by examining 712 separately each type of loci (neutral vs selected), the effects of drift and selection on 713 allopatric populations and their patterns among species can be investigated. 714

This study performs genome-wide scans to search for signals of selection and 715 determine the levels of genomic differentiation and gene flow between Gulf and 716

Pacific populations of a subset of the Baja California disjunct species: the sargo, 717

Anisotremus davidsonii (ADA); the longjaw mudsucker, Gillichthys mirabilis (GMI); 718 the California sheephead, Semicossyphus pulcher (SPU); and the zebraperch, 719

Kyphosus azureus (KAZ).These four species were selected in order to provide a range 720 of population divergence time and ecological characteristics where to compare 721 genomic variation (Table 1). 722

On one hand, with low gene flow and high differentiation based on mtDNA, 723 sargo and longjaw mudsucker Pacific and Gulf populations are proposed to have 724 diverged 0.160 to 0.64 mya and 0.76 to 2.3 mya, respectively (Huang & Bernardi 725

2001; Bernardi et al. 2003; Bernardi & Lape 2005). Moreover, Bernardi et al. (2003) 726 reported signals of incipient speciation in these species as their Pacific and Gulf 727 populations illustrated reciprocal monophylies, i.e. they formed separate sister clades 728

85

Table 1. Species characteristic and previousl Haemulidae Gobiidae Kyphosidae Labridae Family

(sargo) davidsonii Anisotremus mudsucker) (longjaw mirabilis Gillichthys (zebraperch) azureus Kyphosus sheephead) (california pulcher Semicossyphus Species

central Gulf Northern, Magdalen to Conception Point central Gulf Northern, Magdalen to Bahia Tomales central Gulf Northern Baja central Monterey to Gulf Northern in Baja/ Cape Region Monterey to distribution / Gulf d Pacific istribution Bahia

/

bay , a a / /

y available genetic information.

to 60m bottoms. Up sand Occasionally Rocky reef. Shallow only. sloughs. estuaries and bottoms, Intertidal soft 27m. reefs. Up to Shallow rocky 55m. beads. Up to and kelp Rocky reefs depth Habitat and

40 to 50 8 settle till larvae but unknown, unknown 30 (days) PLD - 12mm

mt CYB mt CYB mt CR microsat mt CR mt CYB Markers

low Extremely low Extremely High High peninsula across flow Gene

0.64 mya. 0.16 to mya. 0.76 to 2.3 n/a n/a time divergence Gulf/Pacific Proposed

(2000) Thomson et al. Lape (2005) (2003) Bernardi et al. et al (2003) Bernardi et al. (2001) Huang & Bernardi et al. (2000) (2003) Bernardi et al. (1972) Miller & Lea et al. (2013) (2003) Bernardi et al. Refe . (2000) rences , , , , ,

Bernardi & Thomson Thomson Poortvliet

, ,

729 86

(supplementary Figure 2). On the other hand, California sheephead and zebraperch 730 showed high levels of gene flow across the peninsula (Bernardi et al. 2003). 731

Ecologically, zebraperch and sargo are most similar as they tend to prefer 732 shallow rocky reefs, although sargo is also found on sandy bottoms and deeper reefs. 733

Juveniles of both species are also common in tide pools. In contrast, California 734 sheephead can be found anywhere from shallow to deep reefs and longjaw mudsucker 735 is restricted to upper intertidal habitat (Thomson et al. 2000; Huang & Bernardi 2001; 736

Bernardi et al. 2003; Poortvliet et al. 2013). 737

Using Restriction site-Associated DNA sequencing (RADseq), we generated 738 thousands of genome-wide markers to explore the patterns of genomic divergence 739 among these four species and searched for coding genes that have diverged in 740 disjunct populations. Our main goal was to determine if the genomic divergence 741 mirrors the previously observed mtDNA patterns. In addition, we assessed divergence 742 differences using neutral and outlier loci (or loci suspected to be under selection). 743

Finally, search for common genomic regions that might have diverged simultaneously 744 in disjunct populations of multiple species, as evidence of convergent selection. 745

746

Materials and Methods 747

Sample Collection 748

Specimens of the sargo, zebraperch, and California sheephead were collected 749 from multiple locations throughout the species distribution (Figure 1), while tide 750 pooling, on SCUBA or free diving, or tissue was sampled from local 751

87

fishermen. Longjaw mudsuckers were obtained from several sites using minnow traps 752 in their habitats. For this population analysis, however, we only focused on the 753 differences between the two main regions. Sample sizes for the Pacific and Sea of 754

Cortez varied from 10 to 34 and from 10 to 22 individuals per species, respectively 755

(Table 2). Fin or gill tissue was sampled from specimens and placed in 95% ethanol 756 at room temperature while in the field but at -80ºC in the lab. 757

758

Genotyping, Discovery and Validation of Single Nucleotide Polymorphisms. 759

Genomic DNA was extracted from fin or gill tissue from collected specimens 760 using the Qiagen DNeasy 96 Tissue Kit for purification of DNA from animal tissues 761

(QIAGEN, Valencia, California, USA). The construction of two RAD libraries 762 followed the original protocol utilizing the Sbf1 restriction enzyme to digest DNA 763

(Miller et al. 2007; Baird et al. 2008). Samples were individually barcoded and 764 sequenced in two lanes on an illumina HiSeq 2000 at UC Berkeley (Vincent J. Coates 765

Genomics Sequencing Laboratory). 766

We used the STACKS software version 1.29 (Catchen et al. 2011; Catchen et 767 al. 2013) and modified Perl scripts (Miller et al. 2012) for single nucleotide 768 polymorphism (SNP) discovery and genotyping. Reads with more than 10% 769 probability of sequencing error (Phred score=33) and without the exact sequence of 770 the SBf1 restriction site or the 6-bp barcode were excluded from the analysis. 771

Barcodes and restriction sites were also removed from fragments resulting in 80-bp 772 quality filtered reads. 773

88

40

USA San Francisco MB GOS CAS

35 ELC PC CI EMO ELP SD BLA 30 GUA BK GNE PEU TOR PSR

25 MEXICO ECO LFR N Pacific Ocean PSC

0 200 400 600 km 20

-130 -125 -120 -115 -110 -105 774

Figure 1. Pacific and Sea of Cortez sampling localities. See Table 2 for the 775 number of samples per species. MB, Monterey Bay; GOS, Goleta Slough; 776 CAS, Carpinteria Sough; CI, Catalina Island; SD, San Diego; GUA, Guadalupe 777 Island; GNE, Guerrero Negro; PEU, Punta Eugenia; TOR, Bahia Tortugas; 778 PSR, Punta San Roque; ECO, Estero El Coyote; PSC, Puerto San Carlos; ELC, 779 Estero La Choya; PC, Punta Choya; EMO, Estero Morua; ELP, Estero La 780 Pinta; BK, Bahia Kino; BLA, Bahia de Los Angeles; LFR, Los Frailes. 781 782

These reads were utilized as the input for the population genomic analyses 783 performed using the population scripts in STACKS. We selected only putative SNPs 784 that exhibited a minimum depth coverage of 8x (m=8) and were present in at least 785

80% (r=0.80) of pooled individuals from the Pacific and Sea of Cortez. If any read 786 had more than one SNP, only the first SNP in that read was considered in the analysis 787 89

to avoid linkage disequilibrium. Reads passing the quality and population filters 788 represent our total loci in Table 3. 789

790

Table 2. Pacific and Sea of Cortez (Gulf) collecting sites (ordered by latitude) 791 and samples per species. SPU, Semicossyphus pulcher; KAZ, Kyphosus 792 azureus; GMI, Gillichthys mirabilis; ADA, Anisotremus davidsonii. 793 794 Region Site SPU KAZ GMI ADA 795 Pacific 16 10 34 15 Monterey Bay (MB) 7 796 Goleta Slough (GOS) 10 797 Carpinteria Slough(CAS) 7 Catalina Island (CI) 5 2 798 San Diego (SD) 10 7 799 Guadalupe Island (GUA) 4 Guerrero Negro (GNE) 4 800 Punta Eugenia (PEU) 1 Bahia Tortugas (TOR) 4 801 Punta San Roque (PSR) 5 Estero El Coyote (ECO) 6 802 Puerto San Carlos (PSC) 3

803 Gulf 10 12 21 22 Estero La Choya (ELC) 10 804 Punta Choya (PC) 3 5 Estero Morua (EMO) 5 805 Estero La Pinta (ELP) 6

806 Bahia Kino (BK) 12 Bahia de los Angeles 5 12 5 807 (BLA) 2 Los Frailes (LFR) 808

809

810

811

90

Analysis of Population Genomic Diversity and Differentiation 812

To determine population genomic diversity indexes and distance, we 813 converted the genepop output file from the population script of STACKS into 814

Arlequin format using PDGSpider (Lischer & Excoffier 2012). Fixation index (FST) 815 between populations, based on the total loci, was calculated in Arlequin version 816

3.5.1.2 (Excoffier & Lischer 2010) with 10000 permutations, a significance level of 817

0.05 and computing a distance matrix. From the output of Arlequin, we also report 818 the total number of loci, usable loci, polymorphic loci, percent polymorphism 819

(polymorphic loci divided by total loci), and genetic diversity or theta (Table 3). 820

Genomic population structure was explored and visualized using three 821 different approaches. First, we manually modified the STRUCTURE output file from 822 the population script of STACKS to perform a Discriminant Analysis of Principal 823

Components (DAPC) (Jombart et al. 2010) using the total number of loci. DAPC is a 824 multivariable approach that is particularly useful to analyze the differences between 825 clusters (or populations) because, while combining the benefits of discriminant and 826 principal component analyses, it explores the entire variation in the data and 827 minimizes that within clusters. DAPC plots are used to illustrate the relationship 828 between Pacific and Gulf genetic pools. DAPC was performed in R (R Core Team 829

2013) using the ADEGENET package (Jombart 2008). The plausible number of 830 clusters was identified comparing Bayesian Information Criterion (BIC) values in the 831 algorithm find.clusters and the number of principal components retained was obtained 832 using the cross-validation tool xvalDapc. 833

91

Subsequently, by analyzing genetic structure using neutral and outlier loci, we 834 investigated the possibility that genetic drift and selection created different patterns of 835 divergence between the two regions (our two other structure approaches). The 836 identification of outlier loci, or candidate loci to be under selection, is described 837 below. Neutral loci consist of the total number of loci minus the outlier loci. A 838 population script was run for each type of loci using the whitelist and blacklist flags 839 and the output genepop file was converted into STRUCTURE format in PDGSpider. 840

These files, which included population identification, were utilized to analyze genetic 841 clusters within neutral and outlier loci using a Bayesian approach in STRUCTURE 842 version 2.3.4 (Pritchard et al. 2000). We performed 10 replicates for each K (from one 843 to five) with a burn-in parameter of 10 000 and running 100 000 replicates under the 844 admixture model. We present only the K with the highest likelihood according to the 845

Evanno method (Evanno et al. 2005) in STRUCTURE HARVESTER (Earl & 846

VonHoldt 2012). 847

848

FST Outliers 849

We used the loci with the highest differentiation to highlight the potential 850 diverging effects of selection between the Gulf and the Pacific. While caveats were 851 identified when using FST outliers as a proxy for regions under selection (Bierne et al. 852

2011; Bierne et al. 2013; Lotterhos & Whitlock 2015), they remain a common 853 approach to show evidence of selection in populations (Bernardi et al. 2016; 854

Stockwell et al. 2016; Longo & Bernardi 2015; Gaither et al. 2015). Loci were 855

92

considered as outliers when falling within the top 0.03% (or three standard deviations 856 above the mean) of FST values from the output of the population scripts in STACKS. 857

Outlier loci were used in the STRUCTURE plots and subsequent violin plot 858 which shows the FST range of outliers and their relative density along their 859 differentiation distribution. Violin plots correspond to the density and boxplot hybrid 860 graphic created in R using the package ggplot2 and geom_violin( ) (Wickham 2016). 861

Outliers identified by both methods were compared against the GenBank database 862 using an Expect threshold (E-value) of 0.000001 corresponding to a one in a million 863 chance to get a match by chance alone. We recorded matches to protein coding genes, 864 as well as chromosomal regions. Finally, we classified protein coding genes diverging 865 simultaneously in more than one species into KEGG assignments (Kanehisa et al. 866

2008; Ogata et al. 1999) and searched for their functions or pathways in the 867

GeneCard online database (www..org), since it also provides information 868 from the , UniProt, GTEx, Entrez Gene, and other databases. 869

870

Results 871

Single Nucleotide Polymorphism Discovery 872

A total of 140 individuals (75 from the Pacific and 65 from the Gulf) among 873 the four disjunct species (Table 2) were sequenced in two illumina lanes producing 874 approximately 230 million reads. After quality filters, our database consisted of 875 approximately 6.5 million reads and one million SNPs, which averaged to 45997 876 reads and 7081 SNPs per individual. After population filters selecting for reads with 877

93

a minimum of an 8x coverage and present in at least 80% of the corresponding 878 population (Pacific or Gulf), the total number of loci (RADtags) per species ranged 879 from 4858 to 10532 (mean = 8345). Species showed consistently higher numbers of 880 usable and polymorphic loci in the Gulf than in Pacific populations. Locus, 881 polymorphism, and genetic diversity statistics per species and region are given in 882

Table 3. 883

884

Genomic structure between Pacific and Sea of Cortez populations 885

Percent polymorphism was very similar between Pacific and Gulf populations 886 in the zebraperch but considerably higher in the Gulf for sargo and mudsucker 887 populations. Only the California sheephead clearly showed greater polymorphism in 888 the Pacific but the difference was smaller than in the sargo and especially for the 889 mudsucker (Table 3). All species showed higher genetic diversity, theta, in Gulf 890 populations. The genomic FST (using the total number or loci) between Pacific and 891

Gulf populations ranged from 0.004 in the California sheephead to 0.31 in the 892 longjaw mudsucker (Table 3). Genomic divergence was low and non-significant for 893 the sheephead, low but significant for the zebraperch, and large and significant for the 894 last two species, sargo and mudsucker (Table 3). 895

896

94

represent the total loci minus the outlier loci from STACKS. population filters). Percent polymorphism calculated was by dividing the number of polymorphic loci by the Table 3. Locus, polymorphism, and genetic statistics of the Pacific and Sea of Cortez (Gulf) populations per species Neutral loci Outlier loci F Genetic diversity Percent polymorphism Polymor Usable loci Total l Number of Region Species ST

oci

phic loci

individuals

Pacific Semiccosyphus 0.191 10532

1486 1716 14% 16 4

Gulf

pulcher 0.004 ( 0.19905 10532 10273

1509 2066 14% 259 10 -

)

Pacific Kyphosus azureus 0.1760

4858 10% 500 694 10 3

Gulf

0.1943 0.03 (+)

4738 3297 4631 4858 68% 120 12 8

Pacific Gillichthys 0.0243 1233 8700

281 3% 34 6

mirabilis

Gulf 0.0843 0.31 (+)

8370 1717 2143 8700

20% 330 21 6

total Pacific Anisotremus

0.077 loci. Neutral loci 1825 9288

10% (after quality and 907

15 3

Gulf

davisonii 0.0797

0.1 (+) 8031 1437 2046 9288 15% 207 22 3

897 95

DAPC analyses 898

Two patterns emerged from the DAPC plots using one discriminant function 899 and all loci. Pacific and Gulf clusters of the California sheephead and zebraperch 900 overlapped while they did not for the sargo and mudsucker (Figure 2). In each species 901

(except for the zebraperch), multiple peaks within each region might indicate further 902 structure within our pooled populations. 903

904 Figure 2. DAPC cluster plots of Pacific (Pac; left – blue peaks) and Sea of 905 Cortez (Gulf; right- red peaks) per species based on all loci. Plots were 906 created in R using the adegenet package with one discriminant and retaining 907 the number of principal components selected by xvalDapc; 18 for 908 Semicossyphus pulcher (a); 8 for Kyphosus azureus (b); 10 for Gillichthys 909 mirabilis (c); and 20 for Anisotremus davidsonii (d). 910

96

STRUCTURE analyses 911

Structure Harvester selected a k of 2 for all STRUCTURE plots with the 912 exception of the plots using neutral loci for GMI and ADA, where a k of 3 was 913

914

Neutral loci Outlier loci

1.00 1.00

0.80 0.80

0.60 0.60

SPU 0.40 0.40

0.20 0.20

0.00 0.00

1.00 1.00

0.80 0.80

0.60 0.60

KAZ 0.40 0.40

0.20 0.20

0.00 0.00

1.00

0.80 GMI 0.60 0.40

0.20

0.00

1.00

0.80 ADA 0.60 0.40

0.20

0.00 Pacific Sea of Cortez Pacific Sea of Cortez 915 Figure 3. STRUCTURE plots based on presumed neutral loci (left panels) 916 and outlier loci or loci suspected to be under selection (right panels) and 917 individual Bayesian assignment into distinct genetic clusters or populations 918 (Green, Red or blue). For each panel, individuals to the left of the black bar 919 were collected in the Pacific and the others in the Sea of Cortez. Structure 920 Harvester selected a k of 2 (from 1 to 5) for both types of loci for SPU 921 (Semicossyphus pulcher: neutral loci= 10,273, outlier loci= 259) and KAZ 922 (Kyphosus azureus: neutral loci= 4,738, outlier loci=120 loci). A k of 3 was 923 selected for neutral loci in GMI (Gillichthys mirabilis: neutral loci= 8,370) 924 and ADA (Anisotremus davidsonii: neutral loci= 8,031) but a k of 2 for 925 outlier loci (GMI: outlier loci=300; ADA: outlier loci= 207). Full Structure 926 Harvester results are presented in Supplementary Figure 3. 927

97

928 selected (Supplementary Figure 3). Significantly higher structure was seen in plots 929 using outlier loci (except for SPU) demonstrating different patterns of divergence 930 when we compare Pacific and Gulf populations using neutral and outlier loci (Figure 931

3). 932

933

Outlier loci and evidence of selection 934

The number of outliers ranged from 120 to 303 as identified keeping loci three 935 standard deviations above the mean FST (Table 3). The FST values for loci ranged from 936

0.15 to 0.48 for SPU, from 0.23 to 0.52 for KAZ, from 0.33 to 1 for ADA, and from 937

0.66 to 1 for GMI (Figure 4). Overall, all loci were significantly more differentiated 938 in GMI and ADA including 20 fixed loci between Pacific and Gulf populations of 939

GMI and 15 for ADA. The density of outliers along their FST distribution is shown in 940

Figure 4. STRUCTURE plots based on STACKS outlier loci show well differentiated 941 genetic clusters between the two regions, except for SPU. Several individuals in both 942 regions were found to be genetically homogeneous for their respective clusters 943

(Figure 3). 944

98

259 SPU

120 KAZ

330 GMI

207 ADA

0.25 0.50 0.75 1.00 Fst 945

Figure 4. FST distribution and corresponding relative density 946 of outlier loci (from STACKS) per species. Numbers above 947 shapes depict the total number of outlier loci per species. 948 SPU, Semicossyphus pulcher; KAZ, Kyphosus azureus; 949 GMI, Gillichthys mirabilis; ADA, Anisotremus davidsonii. 950 951

The percent of outliers with a GenBank match to a protein coding gene ranged 952 from 19% to 46%. The percent of outliers with a GenBank match to a sequence or a 953 location in a chromosome, but with not gene information, ranged from 13% to 18%. 954

Loci that did not produce a match in GenBank consisted of 42% to 65% of the total 955 outliers. Figure 5 shows the total number of loci per species matching a gene or 956 sequence and those returning not match at all. 957

99

958

100

81 75 57

166 213

Matched 31 none 50 none seqnon-coding Percent 21 genecoding

44 43 25 95 42

49 74

0 SPU KAZ GMI ADA 959 Figure 5. Stacked barplot showing the percentages 960 of outliers giving a GenBank match to a gene 961 (protein or mRNA coding) (blue bottom stack), a 962 sequence or chromosome location without gene 963 information (green middle stack) or not match 964 found (red top stack). Actual numbers of outlier 965 loci are given inside each stack. 966 967 KEGG Orthology and Annotation Results

5% 5% Protein families: genetic information processing Environmental Information Processing 8% 30% Organismal Systems 7% Protein families: signaling and cellular processes Genetic Information Processing 10% Unclassified: metabolism Cellular Processes 10% 25% Energy metabolism 968 Figure 6. KEGG assignments of outlier loci (matching to the same gene in GenBank 969 in more than one species) using the KEGG’s orthology and link annotation tool 970 (KOALA). Pie depicts the relative percentage of each category. 971

100

in GenBank. KAZ, matched the same gene Gulf populations of more than one species. Numbers give F Table 4. List and function of genes (from outlier loci) presumably under selection and that are diverging between Pacific and Development Metabolism Category Kyphosus azureus; azureus; Kyphosus

Gene f

Gene organelles. Cell shape, membrane proteins, signaling Ce Ce R smooth muscles bile & acids. O processing D Function E espiration, heat production lfactory egradation pathways of several amino xtracellular matrix structural constituent ll adhesion, differentiation, migration & ll unction growth, signaling expression and neural activity

family or gene type GMI, GMI,

transduction - s were s

Pathway Functioning Gillichthys mirabilis;

obtained from

, . T cytoskeleton remodeling ransport of salt, glucose,

of cardiac, skeletal & ) between

organization of KEGG, KEGG,

disjunct ADA, ADA, acids. Gen

ST Anisotremus e

values (or range of values if multiple loci within a species populations in each species. SPU, Cards,

Gene Collagen Spectrin Laminin ArfGAP cytochrome c 1 Ubiquinol Anoctamin Dehydrogenase gene Gene family or davidsonii. - Ontology protein type

-

, UniProt, Gene names

0.2 0.25 0.28 SPU

GTEx,

Semichossyphus pulcher;

were 0.41 0.28 0.4 0.26 KAZ

and/

obta

or Entrez Gene ined from Blastn 0.84 0.79 0.81 GMI - 0.98

. 0.4 0.63 0.54 0.38 ADA

- 0.64 - 0.46

972

101

Multiple response Immune R D Category Table 4 ( eproduction evelop.

Continued.)

or

dependent phospholipid binding chemical synapses. S cellularsuppresses proliferation Mediate proteolysis, N activity Apoptosis and T and cell transformation Ap migration C P Pituitary development Function

ynaptic athogen r ell cycle as well as dendrite growth and neuronal ucleic acid binding ansmembrane signaling receptor a optosis,

carbohydrate binding

and - transmission:

associated molecular patterns . O -

integrin Pathway

palmitoyltransferase activity bsolete signal transducer

- Calcium ion mediated signal transduction,

autocatalytic degradation and

Transmission across

chromatin binding

an d

ctivity calcium

.

-

g Gene family or Synaptotagmin E3 ubiquitin Zinc finger Sialoadhesin Guanine F NLRC3 LIM homeobox ene - box - protein type

0.2 0.21 0.15 0.16 0.16 SPU

0.27 0.34 0.33 0.26 0.27 KAZ

0.9 0.97 0.95 GMI - 0.95

0.57 0.76 0.52 0.39 0.38 ADA - - 0.97 0.94

973

102

Furthermore, 15 outlier loci for sargo, 8 for mudsucker, 9 for zebraperch, and 974

8 for sheephead, were matched the same gene or gene family than another loci from a 975 different species (Table 4). All together, these 40 loci matched 15 different genes or 976 gene families in GenBank. GeneCards’ function annotations for these genes provided 977 numerous cellular processes and gene pathways generally classified within 978 metabolism, development, immune response, and possibly reproduction (see 979 discussion). KEGG’s orthology and annotation results classified convergent genes 980 into 9 different functional categories (Figure 6). More than half of all loci were 981 classified into two categories: genes used to process genetic information, and genes 982 processing environmental information. 983

984 Discussion 985 986 The Baja California peninsula offers a rare opportunity to study the effects of 987 isolation in populations of multiple marine fishes with diverse ecologies that have 988 been subjected to similar selective pressures. Here, we obtained RADseq data from 989 two disjunct fishes with hypnotized low gene flow, mudsucker and sargo, and two 990 with high gene flow across the peninsula, zebraperch and California sheephead, to 991 explore the genomic signatures of different micro-evolutionary processes (drift and 992 selection) and search for patterns of convergent evolution in these species. The 993 established pipeline was successful in producing substantial number of loci in which 994 to scanned these populations and investigate our questions. While previous analyses 995 studied disjunct species with a handful of markers (Present 1987; Tranah & Allen 996

1999; Terry et al. 2000; Huang & Bernardi 2001; Stepien et al. 2001; Bernardi et al. 997

103

2003), this study compared populations with approximately 5,000 to 10,000 998

(depending on the species) unlinked loci that passed rigorous filters. (Table 3). An 999 important number of these loci, approximately 300 to 3000, were also polymorphic in 1000 the different populations. 1001

1002

Differentiation, Gene Flow, and Polymorphism 1003

Genomic analyses agreed with previously observed genetic divergence across 1004 the peninsula in the sargo and mudsucker but uncovered undetected patterns of 1005 differentiation in the zebraperch and California sheephead. The high average of 1006 genomic differentiation from thousands of loci in the sargo and mudsucker (FST= 0.1 1007 and 0.3, respectively; Table 3) was consistent with the hypothesis that these species 1008 are in the initial stages of allopatric speciation (Bernardi et al. 2003). Moreover, the 1009 higher level of differentiation in mudsucker than in sargo was also concordant with a 1010 proposed older divergence between Pacific and Gulf goby populations (Bernardi & 1011

Lape 2005; Huang & Bernardi 2001; Bernardi et al. 2003). 1012

While no divergence between zebraperch populations across the peninsula 1013 was previously found and high levels of gene flow were predicted (Bernardi et al. 1014

2003), our results revealed a low but significant differentiation (i.e. higher than 1015 expected, FST=0.03, p-value <0.05) and refuted the potential panmixia between 1016 disjunct zebraperch populations. Similarly, when analyzing microsatellite and 1017 mitochondrial variation throughout the distribution of the sheephead, Poortvliet et al. 1018

(2013) found that even in the lack of divergence between populations in their data, 1019

104

the partition between Pacific and Gulf clusters explained 9.1% of the total variance. 1020

Authors concluded that genetic divergence could be present but it has not been 1021 detected yet. Although high structure was here seen in outlier loci, our results rather 1022 confirm high levels of gene flow as indicated by the lack of structure in neutral loci 1023 and the extremely low and non-significant average divergence between sheephead 1024 populations. 1025

Interestingly, in every species except for the sheephead, the polymorphism 1026 was higher in the Gulf than in Pacific populations (especially for zebraperch and 1027 mudsucker). This might be a reflection of older origins for the Gulf populations and 1028 the environmental variability within the Sea of Cortez. Bernardi and Lape (2005) 1029 estimated based on coalescence analysis applied to mitochondrial cyt b that Gulf 1030 populations of the sargo might be 330-641 thousand years old and Pacific populations 1031

163-317 thousand years old. While congenerics of the sheephead only occur in 1032 temperate latitudes, ancestral lineages for the zebraperch, sargo, and mudsucker, were 1033 present in tropical waters. Gulf populations in the last three species have likely 1034 maintained high proportions of standing variation from their lineages. These species 1035 might have been migrating north and the uplifting of the peninsula could have 1036 produced peripheral populations in the Pacific with less genetic variation. 1037

1038

Population Structure and Signatures of Drift 1039

Results showed very distinct genetic pools for Pacific and Gulf sargo and 1040 mudsucker populations providing more evidence that these populations are in the 1041

105

process of speciation. Their genetic variation clustered with no overlap in our DAPC 1042 analyses and both, neutral and outlier loci, showed high structure between the two 1043 regions (Figure 2). Sargo populations illustrated deeper structure in neutral loci than 1044 the mudsucker, which might be a sign of stronger drift due to smaller effective 1045 population size in this species (Figure 3). Yet, the structure seen in the mudsucker 1046 was also significant. In contrast, the genetic variation of Pacific and Gulf populations 1047 of both zebraperch and sheephead, showed some differentiation but their clusters do 1048 overlap in the DAPC plots. Interestingly, zebraperch also illustrated high structure 1049 when using suspected neutral loci. Thus, the separation of Pacific and Gulf 1050 zebraperch populations might have happened at a relatively recent time, enough for 1051 drift to create the observed structure but too short to produce divergence that can 1052 detected by first generation genetic tools. Concordant with the proposed high levels 1053 of gene flow, sheephead does not show discernable patterns of structure between its 1054 populations in neutral loci. Pacific and Gulf populations for all species appeared to 1055 belong to distinct genetic pools when analyzing outlier loci. 1056

1057

Outlier Loci and Signatures of Convergent Selection 1058

Our analyses support the presence of unique selective pressures in the Pacific 1059 and Gulf. On one side of the Baja California peninsula, the Pacific distributions of 1060 these species cross multiple upwelling regions and geographic points such as Point 1061

Conception and Punta Eugenia where oceanographic conditions change drastically 1062

(Allen et al. 2006). On the other side, the Sea of Cortez represents a very dynamic 1063

106

and diverse environment for fishes as well. It holds multiple upwelling regimes, 1064 average salinity is higher than in the Pacific, and temperature changes drastically with 1065 the seasons and from north to south (Thomson et al. 2000). Oceanographic 1066 parameters are also affected by more than one hundred islands throughout the Gulf, 1067 deep trenches in the middle, and a shallow shelf as well as exaggerated in the 1068 northern Gulf (Thomson et al. 2000). Thus, it is not surprising that very high levels of 1069 structure were observed in the outlier loci of every species (Figure 3). Moreover, we 1070 detected a large number of outlier loci and these contained high FST ranges. The 1071 separation of disjunct populations of sargo and mudsucker might be older than in 1072 zebraperch (sheephead populations might have never been separated at all) producing 1073 many fixed differences (see results section) and noticeably higher average 1074 differentiation not only in outlier but all loci. 1075

Strong evidence of selection is presented by the fact that almost half of all the 1076 outlier loci in sargo (95 loci or 46%), 74 (22%) in mudsucker, 42 (35%) in 1077 zebraperch, and 49 (19%) in sheephead, were directly matched to coding genes by the 1078

Blastn tool in GenBank. Many other outlier loci (13 to 18% per species) were 1079 similarly matched to a chromosomal location but further investigation is needed to 1080 determine whether or not these regions in the genome are adaptive. More importantly, 1081 we identified 40 loci diverging convergently in two or three of the four studied 1082 species (Supplementary Table 2). The pairing of sargo with zebraperch, which are the 1083 two species with the closest ecologies, produced a slightly larger number of diverging 1084 genes than any other species pairing. Nonetheless, assertively associating groups of 1085

107

genes to particular ecologies requires analyze a larger number of species as diverging 1086 genes were well-distributed among almost every possible species combination (Table 1087

4). 1088

KEGG annotation classified 25% these loci into genes used to process 1089 environmental information (Figure 6). This was KEGG’s second largest 1090 classification, only after the expected differences in “housekeeping genes” or genes 1091 used to process genetic information. General categories of suggested gene pathways 1092 and functions for these genes included metabolism, development, immune response 1093 and possibly reproduction. The last one refers to two loci, one in zebraperch and one 1094 in mudsucker, that were matched to LIM Homeobox genes which have been found to 1095 be overexpress in the pituitary and testes (National Institutes of Health’s data base 1096

“Genotype-Tissue Expression, GTEx”). These genes might be involved in the 1097 development of the pituitary (GeneCards) which regulates many fish reproductive 1098 processes and mediates the interaction between the environment and reproductive 1099 organs (Schreibman et al. 1973; Yamazaki 1965). Other potential pathways of these 1100 genes worth noticing include lipid processing, olfactory transduction, salt transport, 1101 and heat production (Table 4). However, the annotation search engines used in this 1102 study and much of the knowledge of the gene functions focus in providing 1103 information pertinent to humans. While the function of orthologous genes is often 1104 maintained a large variety of organisms, the particular manner in which genes 1105 matched to our identified outlier loci partake in cellular pathways and provide 1106 adaptive advantages to these fishes is outside the scope of our analyses. Overall, we 1107

108

interpret our findings as evidence of convergent evolution in these four species and 1108 suspect that other disjunct species might be sharing similar evolutionary trajectories 1109 as well. 1110

1111

Isolation in the Disjunct Populations 1112

The warmer seawater in the southern portions of the peninsula is probably 1113 limiting the gene flow between Pacific and Gulf disjunct populations as commonly 1114 assumed. Yet, temperature alone cannot be responsible for the isolation (Bernardi et 1115 al. 2003). During the summer, the surface temperature in the north of the Gulf can be 1116 the same and sometimes even higher than in the south of the peninsula. Observed 1117 divergence has probably resulted from the combination of several biotic and abiotic 1118 factors such as competition, diet and habitat specializations, as well as differences in 1119 salinity and oceanographic regimes. For instance, sargo and zebraperch both showed 1120 comparable structure trends in neutral and outlier loci suggesting that they might be 1121 following related evolutionary routes. The lack of tide pools (which they both use as 1122 juveniles) and the presence of congeneric species that are more numerous at the 1123 southern portion of the Sea of Cortez might be affecting dispersal in both of these 1124 species. As an exclusive herbivore, zebraperch might be more dependent on food 1125 availability or also displaced by competition from other species of herbivores such as 1126 the opaleye, Girella simplicidens, which is very abundant in the south. 1127

Furthermore, tidal influences and the choice of habitat might be strong sources 1128 of isolation in the mudsucker to the extent that we see substantial genetic separation 1129

109

between our three Gulf Esteros, which are geographically really close to each other. 1130

In fact, the other two recognized species in the genus (the shortjaw mudsucker, G. 1131 seta, and the delta mudsucker, G. detrusus) likely speciated from the longjaw 1132 mudsucker by specializing to distinct habitats (Swift et al. 2011; Barlow 1961). The 1133 longjaw mudsucker is often found very high in the intertidal in small soft bottom 1134 channels of very large estuaries (as in many of our sites). Thus, it is not uncommon 1135 for this habitat to get completely isolated from the rest of the estuaries at low tides. 1136

The northern Sea of Cortez is a particularly extreme case where the ocean is very 1137 shallow, estuaries very large, and tides can be very long (Thomson et al. 2000). This 1138 along with the paucity of estuaries systems in the southern coast of the peninsula can 1139 help explain the isolation between the disjunct populations as well as the substantial 1140 higher differentiation seen within the Gulf than the Pacific. In the case of the 1141

California sheephead, it has been suggested that this species might migrate around the 1142 peninsula following deep reefs as records of its presence in the south are rare but do 1143 exist from deep collections (Bernardi et al. 2003; Poortvliet et al. 2013). In this 1144 manner, this species is able to maintain gene flow across the peninsula without 1145 conspicuous shallow populations. Therefore, the classification of this species as a 1146

Baja California disjunct might have to be reconsidered. 1147

1148

Conclusion 1149

The purpose of this study was to search for common patterns of evolution 1150 among the studied Baja California disjunct fishes which illustrate very different 1151

110

ecologies and life histories but at the same time share a very similar distribution. All 1152 of our analyses are concordant with the formerly proposed hypothesis where sargo 1153 and mudsucker disjunct populations are in the process of allopatric speciation. 1154

Signatures of prolonged isolation in these species were evident as deep genomic 1155 divergence, strong structure in neutral loci, and high differentiation in outlier loci. 1156

Results further uncovered previously unseen structure in disjunct zebraperch 1157 populations which might have only recently become isolated. All species, except for 1158 sheephead, show signs of genetic drift in the form of structure in neutral loci. 1159

California sheephead Pacific and Gulf populations did not show evidence implying 1160 that they ever separated. Instead, this species presented the highest levels of gene 1161 flow, most like achieved by migrating around the peninsula by following deep reefs. 1162

Disjunct species experience very different environments along their distributions 1163 within and between the northern Eastern Pacific and the Sea of Cortez. Our results 1164 strongly support the presence of unique selective pressures in the Pacific and Gulf 1165 environments. We identified specific adaptive genomic regions that are 1166 simultaneously contributing to the divergence across the peninsula and that might be 1167 assisting local adaptation of species with diverse ecologies. While exemplifying the 1168 utility of RADseq data, this study provides a useful framework to characterize 1169 genomic patterns of neutral and non-neutral population structure, and highlights the 1170 potential of outlier loci analysis to pinpoint markers that can identify specific 1171 selective pressures in isolated populations that might be in the process of allopatric 1172 speciation. 1173

111

References 1174 1175 Allen, L. G., Pondella, D. J. & Horn, M. h. (2006) The Ecology of Marine Fishes: 1176

California and Adjacent Waters. UC Press. 1177

Baird, N. A., Etter, P. D., Atwood, T. S., Currey, M. C., Shiver, A. L., Lewis, Z. A., 1178

Selker, E. U., Cresko, W. A. & Johnson, E. A. (2008) Rapid SNP Discovery 1179

and Genetic Mapping Using Sequenced RAD Markers. PLoS ONE 3, 1–7. 1180

Barlow, G. W. (1961) Gobies of the Genus Gillichthys , with Comments on the 1181

Sensory Canals as a Taxonomic Tool. American Society of Ichthyologists and 1182

Herpetologists 423–437. 1183

Bay, L. K., Crozier, R. H. & Caley, M. J. (2006) The Relationship between 1184

Population Genetic Structure and Pelagic Larval Duration in Coral Reef 1185

Fishes on the . 149, 1247–1256. 1186

Bernardi, G. & Lape, J. (2005) Tempo and Mode of Speciation in the Baja California 1187

Disjunct Fish Species Anisotremus Davidsonii. Molecular Ecology 14, 4085– 1188

4096. 1189

Bernardi, G., Findley, L. & Rocha-Olivares, A. (2003) Vicariance and Dispersal 1190

across Baja California in Disjunct Marine Fish Populations. Evolution; 1191

international journal of organic evolution 57, 1599–1609. 1192

Bernardi, G., Beldade, R., Holbrook, S. J. & Schmitt, R. J. (2012) Full-Sibs in 1193

Cohorts of Newly Settled Coral Reef Fishes. PLoS ONE 7. 1194

112

Bernardi, G., Azzurro, E., Golani, D. & Miller, M. R. (2016) Genomic Signatures of 1195

Rapid Adaptive Evolution in the Bluespotted Cornetfish, a Mediterranean 1196

Lessepsian Invader. Molecular ecology 25, 3384–3396. 1197

Bierne, N., Welch, J., Loire, E., Bonhomme, F. & David, P. (2011) The Coupling 1198

Hypothesis: Why Genome Scans May Fail to Map Local Adaptation Genes. 1199

Molecular Ecology 20, 2044–2072. 1200

Bierne, N., Roze, D. & Welch, J. J. (2013) Pervasive Selection or Is It.? Why Are 1201

FSToutliers Sometimes so Frequent? Molecular Ecology 22, 2061–2064. 1202

Bowen, B. W., Bass, A. L., Muss, A., Carlin, J. & Robertson, D. R. (2006) 1203

Phylogeography of Two Atlantic Squirrelfishes (Family Holocentridae): 1204

Exploring Links between Pelagic Larval Duration and Population 1205

Connectivity. Marine Biology 149, 899–913. 1206

Catchen, J., Hohenlohe, P. A., Bassham, S., Amores, A. & Cresko, W. A. (2013) 1207

Stacks: An Analysis Tool Set for Population Genomics. Molecular Ecology 1208

22, 3124–3140. 1209

Catchen, J. M., Amores, A., Hohenlohe, P., Cresko, W. & Postlethwait, J. H. (2011) 1210

Stacks : Building and Genotyping Loci De Novo From Short-Read Sequences. 1211

G3&#58; Genes|Genomes|Genetics 1, 171–182. 1212

Coyne, J. A. & Orr, H. A. (2004) Speciation. Sunderland, Massachusetts: Sinauer 1213

Associates. 1214

113

Earl, D. & VonHoldt, B. (2012) STRUCTURE HARVESTER: A Website and 1215

Program for Visualizing STRUCTURE Output and Implementing the Evanno 1216

Method. Conservation Genetics Resources 4, 359–361. 1217

Endler, J. A. (1977) Geographic Variation, Speciation and Clines. Princeton 1218

University Press. 1219

Evanno, G., Regnaut, S. & Goudet, J. (2005) Detecting the Number of Clusters of 1220

Individuals Using the Software STRUCTURE: A Simulation Study. 1221

Molecular Ecology 14, 2611–2620. 1222

Excoffier, L. & Lischer, H. E. L. (2010) Arlequin Suite Ver 3.5: A New Series of 1223

Programs to Perform Population Genetics Analyses under Linux and 1224

Windows. Molecular Ecology Resources 10, 564–567. 1225

Gaither, M. R., Bernal, M. A., Coleman, R. R., Bowen, B. W., Jones, S. A., Simison, 1226

W. B. & Rocha, L. A. (2015) Genomic Signatures of Geographic Isolation 1227

and Natural Selection in Coral Reef Fishes. Molecular Ecology 24, 1543– 1228

1557. 1229

Helfman, G. S., Collette, B. B., Facey, D. E. & Bowen, B. W. (2009) The Diversity of 1230

Fishes, Second. Wiley-Blackwell. 1231

Huang, D. & Bernardi, G. (2001) Disjunct Sea of Cortez-Pacific Ocean Gillichthys 1232

Mirabilis Populations and the Evolutionary Origin of Their Sea of Cortez 1233

Endemic Relative , Gillichthys Seta. Marine Biology 138, 421–428. 1234

114

Jombart, T. (2008) Adegenet: A R Package for the Multivariate Analysis of Genetic 1235

Markers. Bioinformatics 24, 1403–1405. 1236

Jombart, T., Devillard, S., Balloux, F., Falush, D., Stephens, M., Pritchard, J., 1237

Pritchard, J., Stephens, M., Donnelly, P., Corander, J., et al. (2010) 1238

Discriminant Analysis of Principal Components: A New Method for the 1239

Analysis of Genetically Structured Populations. BMC Genetics 11, 94. 1240

Jones, G. P., Jones, G. P., Milicich, M. J., Milicich, M. J., Emslie, M. J., Emslie, M. 1241

J., Lunow, C. & Lunow, C. (1999) Self-Recruitment in a Coral Reef Fish 1242

Population. Nature 402, 802–804. 1243

Kanehisa, M., Araki, M., Goto, S., Hattori, M., Hirakawa, M., Itoh, M., Katayama, T., 1244

Kawashima, S., Okuda, S., Tokimatsu, T., et al. (2008) KEGG for Linking 1245

Genomes to Life and the Environment. Nucleic Acids Research 36, 480–484. 1246

Leis, J. M. & Carson-Ewart, B. M. (2000) Behaviour of Pelagic Larvae of Four 1247

Coral-Reef Fish Species in the Ocean and an Atoll Lagoon. Coral Reefs 19, 1248

247–257. 1249

Leis, J. M. & Carson-Ewart, B. M. (2002) In Situ Settlement Behaviour of 1250

Damselfish (Pomacentridae) Larvae. Journal of Fish Biology 61, 325–346. 1251

Leis, J. M. & Lockett, M. M. (2005) Localization of Reef Sounds by Settlement Stage 1252

Larvae of Coral Reef Fishes (Pomacentridae). Bullentin of Marine Science 76, 1253

715–724. 1254

115

Leis, J. M., Carson-Ewart, B. M., Hay, A. C. & Cato, D. H. (2003) Coral-Reef 1255

Sounds Enable Nocturnal Navigation by Some Reef-Fish Larvae in Some 1256

Places and at Some Times. Journal of Fish Biology 63, 724–737. 1257

Lischer, H. E. L. & Excoffier, L. (2012) PGDSpider: An Automated Data Conversion 1258

Tool for Connecting Population Genetics and Genomics Programs. 1259

Bioinformatics 28, 298–299. 1260

Longo, G. & Bernardi, G. (2015) The Evolutionary History of the Embiotocid 1261

Surfperch Radiation Based on Genome-Wide RAD Sequence Data. Molecular 1262

Phylogenetics and Evolution 88, 55–63. 1263

Lotterhos, K. E. & Whitlock, M. C. (2015) The Relative Power of Genome Scans to 1264

Detect Local Adaptation Depends on Sampling Design and Statistical Method. 1265

Molecular Ecology 24, 1031–1046. 1266

Medina, M. & Walsh, P. J. (2000) Comparison of Four Mendelian Loci of the 1267

California Sea Hare ( Aplysia Californica ) from Populations of the Coast of 1268

California and the Sea of Cortez. Marine Biotechnology 2, 449–455. 1269

Miller, D. J. & Lea, R. N. (1972) Guide to the Coastal Marine Fishes of California. 1270

Berkeley, CA. 1271

Miller, M., Dunham, J., Amores, a, Cresko, W. & Johnson, E. (2007) Genotyping 1272

Using Restriction Site Associated DNA (RAD) Markers. Genome Research 1273

17, 240–248. 1274

116

Miller, M. R., Brunelli, J. P., Wheeler, P. A., Liu, S., Rexroad, C. E., Palti, Y., Doe, 1275

C. Q. & Thorgaard, G. H. (2012) A Conserved Haplotype Controls Parallel 1276

Adaptation in Geographically Distant Salmonid Populations. Molecular 1277

Ecology 21, 237–249. 1278

Ogata, H., Goto, S., Sato, K., Fujibuchi, W., Bono, H. & Kanehisa, M. (1999) KEGG: 1279

Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 27, 29– 1280

34. 1281

Palumbi, S. R. (1992) Marine Speciation on a Small Planet. Trends in Ecology & 1282

Evolution 7, 114–118. 1283

Planes, S., Lecaillon, G., Lenfant, P. & Meekan, M. (2002) Genetic and Demographic 1284

Variation in New Recruits of Naso Unicornis. Journal of Fish Biology 61, 1285

1033–1049. 1286

Poortvliet, M., Longo, G. C., Selkoe, K., Barber, P. H., White, C., Caselle, J. E., 1287

Perez-Matus, A., Gaines, S. D. & Bernardi, G. (2013) Phylogeography of the 1288

California Sheephead, Semicossyphus Pulcher: The Role of Deep Reefs as 1289

Stepping Stones and Pathways to Antitropicality. Ecology and Evolution 3, 1290

4558–4571. 1291

Present, T. M. C. (1987) Genetic Differentiation of Disjunct Gulf of California and 1292

Pacific Outer Coast Populations of Hypsoblennius Jenkinsi Genetic 1293

Differentiation of Disjunct Gulf of California and Pacific Outer Coast 1294

Populations of Hypsoblennius Jenkinsi. Copeia 1987, 1010–1024. 1295

117

Pritchard, J. K., Stephens, M. & Donnelly, P. (2000) Inference of Population 1296

Structure Using Multilocus Genotype Data. Genetics 155, 945–959. 1297

1298

Pujolar, J. M., Maes, G. E. & Volckaert, F. A. M. (2006) Genetic Patchiness among 1299

Recruits in the European Eel Anguilla Anguilla. Marine Ecology Progress 1300

Series 307, 209–217. 1301

R Core Team. (2013) R: A Language and Environment for Statistical Computing. 1302

Vienna, Austria: R Foundation for Statistical Computing 2013, doi:ISBN 3- 1303

900051-07-0. 1304

Rocha, L. A., Bass, A. L., Robertson, D. R. & Bowen, B. W. (2002) Adult Habitat 1305

Preferences, Larval Dispersal, and the Comparative Phylogeography of Three 1306

Atlantic Surgeonfishes (Teleostei: Acanthuridae). Molecular Ecology 11, 1307

243–252. 1308

Schreibman, M. P., Leatherland, J. F. & McKeown, B. A. (1973) Functional 1309

Morphology of the Teleost Pituitary Gland. American Zoologist 13, 719–742. 1310

Selkoe, K. A., Gaines, S. D., Caselle, J. E. & Warner, R. R. (2006) Current Shifts and 1311

Kin Aggregation Explain Genetic Patchiness in Fish Recruits. Ecology 87, 1312

3082–3094. 1313

Shulman, M. & Bermingham, E. (1995) Early Life Histories , Ocean Currents , and 1314

the Population Genetics of Caribbean Reef Fishes. Evolution 49, 897–910. 1315

118

Stepien, C. A., Rosenblatt, R. H. & Bargmeyer, B. A. (2001) Phylogeography of the 1316

Spotted Sand Bass , Paralabrax Maculatofasciatus : Divergence of Gulf of 1317

California and Pacific Coast Populations. Evolution 55, 1852–1862. 1318

Stockwell, B. L., Larson, W. A., Waples, R. K., Abesamis, R. A., Seeb, L. W. & 1319

Carpenter, K. E. (2016) The Application of Genomics to Inform Conservation 1320

of a Functionally Important Reef Fish (Scarus Niger) in the Philippines. 1321

Conservation Genetics 17, 239–249. 1322

Swearer, S. E., Shima, J. S., Hellberg, M. E., Thorrold, S. R., Jones, G. P., Robertson, 1323

D. R., Morgan, S. G., Selkoe, K. a, Ruiz, G. M. & Warner, R. R. (2002) 1324

Evidence of Self Recruitment in Demersal Marine Populations. Bulletin of 1325

Marine Science 70, 251–271. 1326

Swift, C. C., Findley, L. T., Ellingson, R. a, Flessa, K. W. & Jacobs, D. K. (2011) The 1327

Delta Mudsucker, Gillichthys Detrusus, a Valid Species (Teleostei: Gobiidae) 1328

Endemic to the Colorado River Delta, Northernmost Gulf of California, 1329

Mexico. Copeia 2011, 93–102. 1330

Terry, A., Bucciarelli, G. & Bernardi, G. (2000) Restricted Gene Flow and Incipient 1331

Speciation in Disjunct Pacific Ocean and Sea of Cortez Populations of a Reef 1332

Fish Species , Girella Nigricans. Evolution 54, 652–659. 1333

Thomson, D. A., Findley, L. T. & Kersitch, A. N. (2000) Reef Fishes of the Sea of 1334

Cortez: The Rocky Shore Fishes of the Gulf of California. Austin, TX: The 1335

University of Texas Press. 1336

119

Tranah, G. J. & Allen, L. G. (1999) Morphologic and Genetic Variation among Six 1337

Populations of the Spotted Sand Bass, Paralabrax Maculatofasciatus, from 1338

Southern California to the Upper Sea of Cortez. Los Angeles, CA. 1339

Wickham, H. (2016) Ggplot2: Elegant Graphics for Data Analysis. New York. 1340

Yamazaki, F. (1965) Endocrinological Studies on the Reproduction of the Femail 1341

Goldfish, Carassius Auratus L., with Special Reference to the Function of the 1342

Pituitary Gland. Vol. 13. 1343

1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371

120

Supplemental Information: 1372 40

USA San Francisco 35

Point San Diego Conception 30

Punta Eugenia

25 MEXICO

N Pacific Ocean

0 200 400 600 km 20

-130 -125 -120 -115 -110 -105 1373 Supplementary Figure 1. Typical Baja California disjunct 1374 distribution and relevant phylogeograpic breaks in the 1375 Northeastern Pacific Ocean (Point Conception and Punta Eugenia). 1376 1377

1378 Supplementary Figure 2. Phylogenetic relationships (mtDNA) between populations of 1379 the studied Baja California disjuncts. Pacific populations north of Punta Eugenia 1380 (black circles), south of Punta Eugenia (gray circles), and in the Sea of Cortez (white 1381 circles). Note reciprocal monophyly by Pacific and Sea of Cortez populations in sargo 1382 and mudsucker. The two white circles within the mudsucker gray clade are suspected 1383 anthropogenic releases by sport fishermen (Bernardi et al. 2003). 1384

121

1385 1386 1387 1388 1389 1390 1391 1392 Supplementary Table 1. The 19 Baja California disjunct species 1393 (Bernardi et al. 2003). 1394 Family Species Common name Atherinidae Leuresthes tenuis/ L. grunion sardina Girellidae Girella nigricans / G. opaleye simplicidens Haemulidae Anisotremus davidsonii sargo Blenniidae Hypsoblennius jenkinsi mussel blenny Chaenopsidae Chaenopsis alepidota oragethroat pikeblenny Serranidae Paralabrax spotted sand bass maculatofasciatus Gobiidae Gillichthys mirabilis longjaw mudsucker Gobiidae Lythrytpnus dalli blue banded boby Blenniidae Hypsoblennius gentilis bay blenny Kyphosidae Kyphosus azureus zebraperch Labridae Halichoeres semicinctus rock wrasse Labridae Semicossyphus pulcher California sheephead Embiotocidae Zalembius rosaceus pink surfperch Scorpaenidae Sebastes macdonaldi Mexican rockfish Scorpaenidae Scorpaena guttata scorpionfish Polyprionidae Stereolepis gigas giant seabass Agonidae Xenetremus ritteri flagfin poacher Pleuronectidae Hypsopsetta guttulata diamond turbot Pleuronectidae Pleuronichthys verticalis hornyhead turbot 1395 1396 1397 1398 1399 1400 1401

122

Supplementary Figure 3. Structure Harvester Output: 1402 1403 Semicossyphus pulcher 1404 10,273 neutral loci (10532 total loci -259 outlier loci). K=2 was selected. 1405

K Reps Mean LnP(K) Stdev LnP(K) Ln'(K) |Ln''(K)| Delta K

1 10 -354276.510000 55.834148 — — — 2 10 -305608.620000 80.766451 48667.890000 361094.08000 4470.84 3 10 -618034.810000 165002.207277 -312426.190000 2376.310000 0.01440

4 10 -932837.310000 706850.782204 -314802.500000 1720110.51222 2.43348

5 9 -2967750.3222 2130977.65850 -2034913.01222 — — 259 outlier loci (potentially under selection). K=2 was selected. 1406 K Reps Mean LnP(K) Stdev LnP(K) Ln'(K) |Ln''(K)| Delta K 1 10 -11003.310000 1.568757 — — — 2 10 -9736.060000 1.581279 1267.250000 715.790000 452.665110 3 10 -9184.600000 73.169605 551.460000 218.750000 2.989629 4 10 -8851.890000 8.444518 332.710000 608.250000 72.028976 5 10 -9127.430000 1883.600076 -275.540000 — — Kyphosus azureus 1407 1408 4738 neutral loci (4858 total loci -120 outlier loci). K=2 was selected. 1409 K Reps Mean LnP(K) Stdev LnP(K) Ln'(K) |Ln''(K)| Delta K 1 10 -115762.090000 29.875871 — — — 2 10 -131925.930000 17087.715398 -16163.840000 2401051.460 140.513311 3 10 -2549141.2300 1680878.7350 -2417215.3000 1992290.290 1.185267

4 10 -2974066.2400 1793501.1344 -424925.01000 5426858.530 3.025846 5 10 -8825849.7800 5254278.8003 -5851783.5400 — — 120 outlier loci (potentially under selection). K=2 was selected. 1410 1411 K Reps Mean LnP(K) Stdev LnP(K) Ln'(K) |Ln''(K)| Delta K 1 11 -4032.263636 1.145664 — — — 2 11 -3226.272727 1.842331 805.990909 660.236364 358.370200 3 11 -3080.518182 60.214497 145.754545 109.163636 1.812913 4 11 -2825.600000 2.618778 254.918182 167.009091 63.773664 5 11 -2737.690909 33.018130 87.909091 — —

123

Gillichthys mirabilis 1412 8,370 neutral loci (8700 total loci -330 outlier loci). K=2 was selected. 1413 Delta K Reps Mean LnP(K) Stdev LnP(K) Ln'(K) |Ln''(K)| K 1 10 -394436.1400 830.231446 — — — 2 10 -545660.7100 476498.123 -151224.5700 397687.250 0.8346 3 10 -299198.0300 479.341210 246462.6800 3172066.25 6617.5 4 10 -3224801.600 1706842.710 -2925603.5700 5429106.71 3.1807 5 8 -11579511.88 6499308.876 -8354710.287 — — 330 outlier loci (potentially under selection). K=2 was selected. 1414 K Reps Mean LnP(K) Stdev LnP(K) Ln'(K) |Ln''(K)| Delta K 1 10 -31118.480000 2.626277 — — — 2 10 -17679.410000 1.779794 13439.070000 10190.760000 5725.808731 3 10 -14431.100000 4.053805 3248.310000 3203.300000 790.195917 4 10 -14386.090000 396.973090 45.010000 2480.440000 6.248383 5 10 -16821.520000 3373.243943 -2435.430000 — — 1415 Anisotremus davidsonii 1416 8,031 neutral loci (8,238total loci -207 outlier loci). K=3 was selected. 1417 K Reps Mean LnP(K) Stdev LnP(K) Ln'(K) |Ln''(K)| Delta K 1 10 -338481.170000 32.570063 — — — 2 10 -291762.360000 7195.96301 46718.8100 38095.27000 5.293978 3 10 -283138.820000 116.141167 8623.540000 9843788.000 84757.09

4 10 -10118303.370 5180965.32 -9835164.55 18794462.72 3.627599

5 10 -38747930.640 13001531.73 -28629627.2 — — 1418 207 outlier loci (potentially under selection). K=2 was selected. 1419 1420 K Reps Mean LnP(K) Stdev LnP(K) Ln'(K) |Ln''(K)| Delta K 1 10 -13941.350000 2.055480 — — —

2 10 -10560.720000 0.814862 3380.630000 1883.990000 2312.035795

3 10 -9064.080000 1.209040 1496.640000 1801.310000 1489.867966 4 10 -9368.750000 1473.669507 -304.670000 391.240000 0.265487 5 10 -10064.660000 1613.562619 -695.910000 — —

124

Supplementary Table 2. List of genes which were matched to outlier loci diverging simultaneously in disjunct populations of more than one species. Gene names are actual GenBank descriptions. FST values show the divergence between Pacific and Gulf populations of Gillichthys mirabilis (GMI), Anisotremus davidsonii (ADA), Kyphosus azureus (KAZ), or Semicossyphus pulcher (SPU).

Gene Species Fst: Pac-Gulf

Anoctamin 5 (ano5), mRNA KAZ 0.2558

Anoctamin-1-like (LOC109997874), mRNA SPU 0.2473 ArfGAP with SH3 domain, ankyrin repeat and PH GMI 0.8058

domain 1 (asap1), mRNA ArfGAPP with dual PH domain-containing protein 1- KAZ 0.2838

like (LOC111671840), mRNA Collagen alpha-4(IV) chain-like (LOC111654436), ADA 0.6367

mRNA Collagen alpha-5(IV) chain-like (LOC111663337), ADA 0.5631

mRNA Collagen type VII alpha 1 chain (col7a1), transcript KAZ 0.4079

variant X4, mRNA Collagen type XIX alpha 1 chain (col19a1), transcript ADA 0.4478

variant X6, mRNA

Collagen type XXIV alpha 1 chain (col24a1), mRNA ADA 0.4044 Cytochrome b-c1 complex subunit 1, mitochondrial-like SPU 0.1978

(LOC109980887), mRNA Cytochrome b-c1 complex subunit 2, mitochondrial-like KAZ 0.397

(LOC109985804), mRNA Dehydrogenase E1 and transketolase domain containing ADA 0.3765

1 (dhtkd1), mRNA Dehydrogenase/reductase SDR family member on ADA 0.4571

chromosome X-like (LOC109139865), partial mRNA

Dehydrogenase/reductase X-linked (dhrsx), mRNA SPU 0.2844 E3 ubiquitin-protein ligase HECW2-like SPU 0.1978

(LOC110001920), mRNA E3 ubiquitin-protein ligase Midline-1 (LOC104924095), GMI 0.9507

mRNA E3 ubiquitin-protein ligase RBBP6-like ADA 0.7601

(LOC106531352), transcript variant X2, mRNA E3 ubiquitin-protein ligase SMURF2-like GMI 0.9023

(LOC110175789), partial mRNA

F-box protein 16 (fbxo16), mRNA KAZ 0.3278

F-box protein 31 (fbxo31), transcript variant X2, mRNA ADA 0.3819

125

Supplementary Table 2 (Continued) Gene Species Fst: Pac-Gulf

Guanine nucleotide exchange factor 1 (sos1), SOS ADA 0.9382

Ras/Rac , mRNA Guanine nucleotide exchange factor 5-like, rap SPU 0.1552

(LOC111668812), mRNA Guanine nucleotide exchange factor DBS-like ADA 0.3907

(LOC109951381), partial mRNA

Laminin subunit alpha 1 (lama1), mRNA ADA 0.5385

Laminin subunit alpha 5 (lama5), mRNA GMI 0.976 Laminin subunit gamma-3-like (LOC110157350), GMI 0.7902

mRNA

LIM homeobox 2 (lhx2), transcript variant X2, mRNA KAZ 0.2684 LIM homeobox transcription factor 1-alpha-like GMI 0.9543

(LOC110169526), mRNA

NLRC3-like protein (LOC104930296), mRNA SPU 0.1607 NLRC3, NLR family CARD domain-containing protein KAZ 0.2558

3-like (LOC111670284), mRNA Sialoadhesin (LOC105008515), transcript variant X4, SPU 0.1464

mRNA

Sialoadhesin-like (LOC110153737), mRNA GMI 0.9749 Spectrin alpha, non-erythrocytic 1 (sptan1), transcript ADA 0.6268

variant X3, mRNA

Spectrin beta, non-erythrocytic 4 (sptbn4), mRNA GMI 0.8411

Synaptotagmin 12 (syt12), mRNA ADA 0.5689

Synaptotagmin 14 (syt14), mRNA KAZ 0.269 Zinc finger DHHC-type containing 13 (zdhhc13), ADA 0.5214

mRNA Zinc finger E-box-binding homeobox 2-like ADA 0.9707

(LOC108874500), transcript variant X2, mRNA Zinc finger protein 385A-like (LOC111668975), KAZ 0.3392

transcript variant X4, mRNA

Zinc finger protein 385B (znf385b), mRNA SPU 0.2127

126

GENERAL CONCLUSION

The field of evolutionary biology has undergone major shifts throughout its short history, from Darwin’s theory of natural selection and Mendel’s fundamental work, to the reconciliation of both in the modern synthesis. Likewise, the methodologies used to study evolution have also drastically changed, from the characterization of DNA as the unit of change in organisms, to the successional development of techniques to detect changes in allele frequencies in natural populations, and to the current parallel sequencing and high throughoutput technologies. The technological advances in last few decades have forever changed the field once again, as they allow scientists to examine the mechanisms of evolutionary change with tools more powerful than ever before. In this genomic era, finding appropriate natural systems where to apply these technologies is also more crucial than ever.

The Baja California disjunct fishes represent an extraordinary natural experiment where the formation of the Baja California peninsula divided the distribution of 19 temperate marine fishes into allopatric Pacific and Sea of Cortez populations, as well as sympatric populations within each region. This natural repetition of species with populations experiences different levels of gene flow presents an excellent framework to test hypothesis of biogeography and to search for genomic signatures of drift and selection.

127

This dissertation has employed Restriction Site-Associated DNA sequencing

(RADseq) to four of these species and revealed previously unseen genetic structure and narrower estimates for the time of disjunction helping to pinpoint the specific events that isolated the populations of these species. This dissertation also analyzed these populations with large numbers of neutral and non-neutral loci and found evidence of convergent drift and selection. Results suggested that the Baja California peninsula acts as an effective barrier to gene flow between populations of the sargo, mudsucker and to a lesser extent zebraperch, but not for the sheephead which might be maintaining connectivity using deep reefs as stepping stones from Pacific to Sea of

Cortez. Point Conception and Punta Eugenia appeared to also significantly impact gene flow between Pacific populations of mudsucker and sargo, respectively.

Sympatric populations within the Pacific and Sea of Cortez, showed large number of outlier loci matching protein-coding genes and emulating a complex selective landscape where individual populations might be pursuing different adaptive peaks in the face of ongoing gene flow. Observed differentiation patterns in these loci, where most fell near the lower end of the range of their differentiation values, supported a possible prevalence of soft selective sweeps brining these populations closer to their respective selective summits.

The strongest evidence of selection, however, was seen between allopatric

Pacific and Sea of Cortez populations of all four species. Results strongly supported the idea of differential selection in these regions as indicated by the observed highest non-neutral differentiation (including dozens of fixed differences in populations of

128

sargo and mudsucker), and the largest number of outlier loci matching genes (up to

49% of all outliers in sargo, for instance). These findings, along with the higher

“outlier loci evenness” or the more even distribution of loci along differentiation values observed in disjunct populations, suggested that the Pacific and Sea of Cortez might represent two very distinct selective peaks and that a more even contribution of soft and hard selective sweeps might be involved in allopatric adaptation. Therefore, low levels of gene flow might facilitate or be required for the presence of hard sweeps in the adaptation of these populations. Furthermore, the examination of outlier loci matching genes among allopatric populations of these species, exposed 15 genomic regions, potentially involved in processing environmental information, metabolism, immune response, and possibly reproduction, diverging in more than one of these species at the same time. While the differentiation of these loci should be validated larger population samples, these results implied the presence of convergent evolution in these species in spite of their very different ecologies and life history strategies.

This dissertation highlights the potential of RADseq to reveal genomic signatures of evolutionary mechanisms, and study evolutionary histories of organisms as well as the interaction between populations and their environments. The unprecedented capabilities to generate information of modern genetic tools, offer a real opportunity to improve our ability to protect natural populations at a time when many organisms are endangered by anthropogenic activity, climatic changes, and other natural pressures.

129