Brigham Young University BYU ScholarsArchive

Theses and Dissertations

2021-07-29

Synthesizing Phylogeography and Community Ecology to Understand Patterns of Community Diversity

Trevor J. Williams
Brigham Young University

Part of the Life Sciences Commons

BYU ScholarsArchive Citation Williams, Trevor J., "Synthesizing Phylogeography and Community Ecology to Understand Patterns of Community Diversity" (2021). Theses and Dissertations. 9176. https://scholarsarchive.byu.edu/etd/9176

This Dissertation is brought to you for free and open access by BYU ScholarsArchive. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of BYU ScholarsArchive. For more information, please contact [email protected].

Synthesizing Phylogeography and Community Ecology to Understand Patterns of Community Diversity

Trevor J. Williams

A dissertation submitted to the faculty of Brigham Young University in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Jerald B. Johnson, Chair
Byron J. Adams
Mark C. Belk
Seth M. Bybee
Lacey L. Knowles

Department of Biology

Brigham Young University

Copyright © 2021 Trevor J. Williams

All Rights Reserved

ABSTRACT

Synthesizing Phylogeography and Community Ecology to Understand Patterns of Community Diversity

Trevor J. Williams
Department of Biology, BYU
Doctor of Philosophy

Community ecology is the study of the patterns and processes governing species abundance, distribution, and diversity within and between communities. Likewise, phylogeography is the study of the historic processes controlling genetic diversity across space. Both fields investigate diversity, albeit at different temporal, spatial, and taxonomic scales, and therefore rest on different assumptions. Community ecology typically focuses on contemporary mechanisms, whereas phylogeography studies historic ones. However, recent research has shown that both genetic and community diversity can be influenced by contemporary and historic processes in tandem. As such, a growing number of researchers have called for greater integration of phylogeography and ecology to better understand the mechanisms structuring diversity. In this dissertation I attempt to add to this integration by investigating ways that phylogeography and population genetics can enhance studies of community ecology. First, I review traditional studies of freshwater community assembly that use null model analyses of species co-occurrence; this review shows that fish communities are largely structured by deterministic processes, though the importance of different mechanisms varies across climates, habitats, and spatial scales. Next, I show how phylogeographic data can greatly enhance inferences of community assembly in freshwater fish communities in Costa Rica and in Utah. My Costa Rican analyses indicate that historic eustatic sea-level change can predict community structure within a biogeographic province better than contemporary processes. In comparison, my Utah analyses show that historic dispersal between isolated basins, in conjunction with contemporary habitat filtering, dispersal limitation, and extinction dynamics, has influenced community assembly through time. Finally, I adapt a forward-time population genetics stochastic simulation model to work in a metacommunity context and integrate it with Approximate Bayesian Computation to infer the processes that govern observed community composition patterns. Overall, I show that community ecology can be greatly enhanced by including information and methods from different but related fields and encourage future ecologists to further this research to gain a greater understanding of biological diversity.

Keywords: Approximate Bayesian Computation, community ecology, dispersal, diversity, habitat filtering, history, metacommunity ecology, phylogeography, population genetics, process-pattern, synthesis

ACKNOWLEDGMENTS

I first want to thank Jerry, without whom none of this would be possible. His encouragement and advice have helped me grow not just academically, but in all aspects of life. He not only taught me a greater love for the study of evolution and ecology but also how to have fun while doing it. I am also incredibly thankful for Byron, Mark, Seth, and Lacey for all their help in improving my research. Graduate school is challenging, but it would have been even more challenging had it not been for the amazing friends and colleagues I have made along the way. I am especially grateful to my fellow graduate(d) students Scott George, Alli Duffy, Gareth Powell, Ellie Nielsen, Andrea Roth-Monzon, Kaitlyn Golden, Peter Searle, Kevin Lamb, Kandace Flanary, Josh Verde, and Spencer Ingley for their friendship and support. I will never forget my fond memories of congratulatory J-Dawgs and Chub Club adventures. I also want to thank Becca White and the amazing undergraduate students of the Johnson Lab as well as Steve Peck for helping me find my fondness for modeling and theoretical biology.

I cannot thank my family enough for their constant support, without which I could never have gotten this far. To the never-ending encouragement of my parents, Winslow and Kristi, and the countless hours of babysitting and emotional support from my siblings, Zach, Paige, and Nathan, their spouses, Janelle, Thomas, and Sariah, and my aunts, Janelle and Carie, all I can say is a heartfelt thank you, and I love you. Finally, and most importantly, I am the most thankful for my amazing wife, Aimee, and my two daughters, Addison and Hadley. You mean all the world to me, and your love helped me through all the ups and downs of this crazy ride.

TABLE OF CONTENTS

TITLE PAGE
ABSTRACT
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES

CHAPTER ONE: GLOBAL INSIGHTS INTO FRESHWATER COMMUNITY DYNAMICS USING A META-ANALYSIS OF CO-OCCURRENCE NULL MODELS
  ABSTRACT
  INTRODUCTION
  METHODS
    Data Collection
    Statistical Analyses
  RESULTS
  DISCUSSION
  FUTURE DIRECTIONS AND CONCLUSION
  ACKNOWLEDGMENTS
  REFERENCES

CHAPTER TWO: HISTORY PREDICTS COMMUNITY DIVERSIFICATION WITHIN A BIOGEOGRAPHIC PROVINCE OF FRESHWATER FISH BETTER THAN THE ENVIRONMENT OR SPACE
  ABSTRACT
  INTRODUCTION
  MATERIALS AND METHODS
    Study System
    Data Acquisition
    Data Analysis
  RESULTS
  DISCUSSION
    Principal Findings
    Historical Processes Can Affect Local Communities
    Integrating Genetic, Community, and Biogeographic Information
  ACKNOWLEDGMENTS
  REFERENCES

CHAPTER THREE: COMPARATIVE PHYLOGEOGRAPHY INFORMS COMMUNITY STRUCTURE AND ASSEMBLY DURING AND AFTER HISTORIC LAKE BONNEVILLE
  ABSTRACT
  INTRODUCTION
  METHODS
    Study System
    Data Collection
    Data Analyses
  RESULTS
  DISCUSSION
    Historic Migration
    Did History Influence Modern Local Communities?
    Conclusion
  ACKNOWLEDGMENTS
  REFERENCES

CHAPTER FOUR: ADAPTED POPULATION GENETICS MODELS WITH APPROXIMATE BAYESIAN COMPUTATION INFORM PROCESSES CONTROLLING METACOMMUNITIES
  ABSTRACT
  INTRODUCTION
  METHODS
    Description of the Model
    Validation of the Model
    Approximate Bayesian Computation with the Model
  RESULTS
  DISCUSSION
    Model Validation
    ABC in Community Ecology
  ACKNOWLEDGMENTS
  LITERATURE CITED

APPENDIX A: ADDITIONAL METHODS FOR CHAPTER ONE
  METHODS FOR ANALYSES RUN ON RAW DATA
    Data from Jackson et al. (1992)
    Data from Snodgrass et al. (1996)
    Data from Peres-Neto (2004)
    Data from Cordero and Jackson (2019)
    Data from Giam and Olden (2016)
    Data from Zbinden (2021)
  REFERENCES

APPENDIX B: SUPPLEMENTARY FIGURES AND TABLES FOR CHAPTER ONE

APPENDIX C: KEY TO THE NATIVE CICHLIDAE OF COSTA RICA
  KEY TO THE GENERA OF COSTA RICAN CICHLIDAE
  KEY TO THE SPECIES OF PARACHROMIS
  KEY TO THE SPECIES OF
  KEY TO THE SPECIES OF CRIBROHEROS
  KEY TO THE SPECIES OF AMPHILOPHUS
  REFERENCES

APPENDIX D: SUPPLEMENTARY FIGURES AND TABLES FOR CHAPTER TWO
  REFERENCES

APPENDIX E: SUPPLEMENTARY FIGURES AND TABLES FOR CHAPTER THREE

APPENDIX F: ADDITIONAL METHODS FOR CHAPTER FOUR
  DESCRIPTION OF THE MODEL EQUATIONS
  VALIDATING THE MODEL
    Creation of spatial landscape for simulations
    Creation of Resource-Utilization Niches from Environmental variables
    Calculation of Migration Probabilities from Distances
  STATISTICAL ANALYSES
  MODEL SELECTION USING APPROXIMATE BAYESIAN COMPUTATION
    Pseudo-communities
    Barro Colorado Island Trees
    Bolivia Freshwater Fish Communities
  REFERENCES

APPENDIX G: SUPPLEMENTARY FIGURES AND TABLES FOR CHAPTER FOUR

LIST OF TABLES

CHAPTER 1

Table 1: List of the studies included in the meta-analysis and mapping of climate, habitat, number of matrices, and major mechanisms by study

Table 2: Results of linear regression models testing the significance of habitat type, spatial scale, climate, and matrix size on the standardized effect size of freshwater fish communities using co-occurrence null models

Table 3: Results of logistic regression analyses using habitat type, climate, and standardized effect size as predictors on the 11 proposed mechanisms governing community structure

CHAPTER 2

Table 1: Results of PERMANOVA analyses

Table 2: Results of Redundancy Analysis for the different explanatory variable types

CHAPTER 3

Table 1: Results of model selection using Bayes Factors to distinguish between different demographic history models

Table 2: Results of matrix-wide co-occurrence null model analyses

CHAPTER 4

Table 1: Results of ABC analyses for pseudo-communities

Table 2: Results of ABC RF analyses for real-world datasets

Table 3: Results of traditional ABC model selection analyses for real-world datasets

APPENDIX D

Table D-1: Estimates of timing of diversification from sister taxa and the geographic association of the sister taxa for some of the sampled species

APPENDIX E

Table E-1: Locality, Genbank accession numbers, and sources for genetic data

Table E-2: Presence absence matrix derived from museum collection of Carl Hubbs

APPENDIX F

Table F-1: Prior distributions and parameters used in pseudo-community ABC analyses

Table F-2: Prior distributions and parameters used in BCI simulations

Table F-3: Prior distributions and parameters used in the Wsr simulations

APPENDIX G

Table G-1: Factorial arrangement for each treatment for pseudo-communities

Table G-2: p-values for the effects of each mixed model analysis on variation partition types

Table G-3: Pairwise comparisons for Fisher’s Exact Tests between treatments

ix

LIST OF FIGURES

CHAPTER 1

Figure 1: Maps displaying the global arrangement of studies and matrices analyzed

Figure 2: Barplot showing the mean standardized effect size between taxonomic groups

Figure 3: Plots displaying data used in linear regression analyses

Figure 4: Percent of studies reporting each of the mechanisms and processes explaining patterns of community structure as proposed within the discussions of the analyzed studies

Figure 5: Biplots of Logistic PCA analyses on the binary matrix of mechanisms proposed to explain patterns of community structure within the analyzed studies

CHAPTER 2

Figure 1: Map showing the sampled localities overlaid on biogeographic provinces of freshwater fish in the Nicaraguan Depression in Costa Rica

Figure 2: Non-metric multidimensional scaling of Bray-Curtis dissimilarities of species presence absence

Figure 3: Comparison of genetic groupings to cluster analysis groupings

Figure 4: NMDS plot of species scores overlaid on the community groupings according to genetics and hierarchical cluster analysis

CHAPTER 3

Figure 1: Map of Lake Bonneville and localities for both the community and DNA data used in this study

Figure 2: Between basin hypotheses of migration and splitting used in the Migrate-n hypotheses

Figure 3: Observed presence absence matrix of Carl Hubbs collections ordered to show patterns of nestedness

Figure 4: NMDS biplot of species and site scores in the Bonneville Basin

Figure 5: Heatmap showing the significant species associations between species pairs in the Bonneville Basin

CHAPTER 4

Figure 1: Boxplots showing the results of variation partitioning proportions according to each treatment run during the model validation

Figure 2: Plots of linear discriminant function scores from ABC RF analyses for A) BCI, and B) Wsr

APPENDIX A

Figure A-1: Comparison between the SES of A) Cordero and Jackson (2019) and B) this study

Figure A-2: Comparison in community structure between A) Giam and Olden (2016) and B) this study

APPENDIX B

Figure B-1: Plots displaying data used in linear regression analyses where values for matrices in temporal studies that were sampled in the same location were averaged

APPENDIX C

Figure C-1: Examples of differences between G and C type teeth

APPENDIX D

Figure D-1: Approximate distributions of the species collected for this project as estimated using the drainage basins in which each species has been observed to occur

APPENDIX E

Figure E-1: Parameter posterior distributions for Catostomus ardens

Figure E-2: Parameter posterior distributions for Catostomus platyrhynchus

Figure E-3: Parameter posterior distributions for Cottus bairdii

Figure E-4: Parameter posterior distributions for Gila atraria

Figure E-5: Parameter posterior distributions for Iotichthys phlegethontis

Figure E-6: Parameter posterior distributions for Lepidomeda aliciae

Figure E-7: Parameter posterior distributions for Prosopium williamsoni

Figure E-8: Parameter posterior distributions for Rhinichthys osculus

Figure E-9: Parameter posterior distributions for Richardsonius balteatus

APPENDIX F

Figure F-1: Plots showing the different types of landscapes used in validation simulations

Figure F-2: Plots showing a three species, three community example of the resource-utilization niches used in validation simulations

APPENDIX G

Figure G-1: Schematic showing the algorithm used to run simulations in a hypothetical metacommunity with three species inhabiting three communities

Figure G-2: Bar graphs showing the counts of different types of structure for the elements of metacommunity structure analyses

Figure G-3: LDA projection of niche simulation runs for ABC RF on pseudo-communities along the first discriminant axes showing overlap in summary statistics between the prior distributions used

CHAPTER ONE

GLOBAL INSIGHTS INTO FRESHWATER FISH COMMUNITY DYNAMICS USING A META-ANALYSIS OF CO-OCCURRENCE NULL MODELS

Trevor J. Williams1, Xingli Giam2, Julian D. Olden3, Jerald B. Johnson1,4

1 Department of Biology and Evolutionary Ecology Laboratories, Brigham Young University, Provo UT, 84602

2 Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, Knoxville TN, 37996

3 School of Aquatic and Fishery Sciences, University of Washington, Seattle WA, 98195

4 Monte L. Bean Life Science Museum, Brigham Young University, Provo UT, 84602

Formatted in the style of The American Naturalist

ABSTRACT

For decades community ecologists have debated whether communities assemble following deterministic assembly rules or stochastically. Using a meta-analysis approach on co-occurrence null models, Gotelli and McCabe (2002) indicated that most taxonomic groups exhibit evidence for non-random community assembly. However, their finding that fish communities are largely randomly structured disagrees with a large body of work in freshwater ecosystems suggesting otherwise. Using a much larger dataset, we conducted a meta-analysis of co-occurrence null model studies on freshwater fish communities to re-evaluate the findings of Gotelli and McCabe (2002) and assess how community structure, and the mechanisms that govern it, vary globally. We found that freshwater fish communities are mostly non-randomly structured, but that the amount of structure varies across habitat, climate, and spatial extent, with lotic habitats and temperate and cold climates showing higher standardized effect sizes than lentic habitats and tropical climates, respectively. These findings are driven mainly by high habitat heterogeneity in lotic habitats and an increase in the importance of stochastic processes during flood pulse dynamics in the tropics. We suggest that future studies can use null models in innovative ways to further investigate patterns of community structure in freshwater fish and help inform conservation strategies.

INTRODUCTION

Community ecology is the study of the patterns and processes that govern species interactions, diversity, and abundance in biological assemblages (Leibold et al. 2004; Loreau 2009; Vellend 2010). Among the central goals of this field is to determine the biotic and abiotic mechanisms that dictate community structure and assembly (Gotelli and McCabe 2002; Kraft et al. 2015; Mittelbach and McGill 2019). In 1975, Jared Diamond proposed the ‘assembly rules’ concept to accomplish this goal, arguing that certain combinations of species would never co-occur due to competitive exclusion (Diamond 1975). Since then, the notion of assembly rules has evolved to reflect the mechanistic constraints and processes that determine patterns of community structure and assembly in a predictive manner (Keddy 1992; Weiher et al. 1995).

Diamond’s initial publication sparked a heated debate on whether assembly rules exist in nature; this scientific discourse focused on evaluating whether communities are structured randomly or non-randomly. Null models of species co-occurrence were the primary approach used in these investigations (reviewed in Strong et al. 1984; Wiens 1989; Gotelli and Graves 1996; Weiher and Keddy 1999b; Leibold and Chase 2018).

Null models use the randomization of observed data to create a null distribution against which ecological patterns can be statistically evaluated (Gotelli and Graves 1996). These models compare observed patterns in ecological data to patterns that are expected in the absence of a particular ecological mechanism such as interspecific competition or abiotic environmental filtering (Gotelli and Graves 1996; Gotelli and Entsminger 2003). The expected patterns are generated by randomizing species occurrences across sites using Monte Carlo simulations such that species ‘colonize’ sites randomly with respect to each other (Connor and Simberloff 1979; Gotelli 2000; Gotelli and McGill 2006). Several mechanisms can cause species to co-occur non-randomly, including biotic interactions (i.e. predation or competition, forming ecological checkerboards; Diamond 1975; Englund et al. 2009), shared or dissimilar habitat requirements (i.e. habitat checkerboards; Gilpin and Diamond 1982; Gotelli et al. 1997; Gotelli and McCabe 2002), evolutionary history (i.e. historical checkerboards; Cracraft 1988; Sfenthourakis et al. 2006), and even neutral dynamics (Ulrich 2004), though most studies have used null models to search for evidence of competitive exclusion (Gotelli and McCabe 2002).

Typically, a co-occurrence index or metric that measures species associations is averaged over all species pairs in the observed presence-absence matrix and compared to the null distribution of the same index from the simulated, randomized matrices. If the observed metric differs significantly from the null distribution, this is taken as evidence that deterministic assembly rules (such as the mechanisms listed above) structure communities. If the observed metric fails to differ from the null expectation, this suggests that species occurrences are random with respect to other species occurrences and that the metacommunity shows little structure.
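
As a minimal sketch of the "index averaged over all species pairs" step described above, the Python snippet below computes one widely used co-occurrence index, the matrix-wide C-score of Stone and Roberts (1990), which is also the index adopted in the Methods of this chapter. The function name `c_score`, the list-of-lists layout (rows as species, columns as sites), and the toy matrix are illustrative assumptions and are not taken from the original analyses.

```python
from itertools import combinations


def c_score(matrix):
    """Mean number of checkerboard units over all species pairs.

    `matrix` is a list of rows, one row per species, holding 0/1
    presence-absence values for each site (assumed layout).
    """
    pairs = list(combinations(matrix, 2))
    total = 0
    for a, b in pairs:
        # Sites where both species of the pair are present.
        shared = sum(1 for x, y in zip(a, b) if x and y)
        # Checkerboard units for this pair: (r_a - S) * (r_b - S).
        total += (sum(a) - shared) * (sum(b) - shared)
    return total / len(pairs)


# Hypothetical toy matrix: 3 species (rows) by 4 sites (columns).
toy = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]

print(c_score(toy))  # 2.0
```

Larger values indicate more segregated species pairs; on its own the number means little until it is compared with the same index computed on randomized matrices.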

Close to two decades ago, Gotelli and McCabe (2002) conducted a meta-analysis using co-occurrence null models on 96 presence-absence matrices representing several taxonomic groups to specifically test Diamond’s assembly rules of competitive exclusion. The authors calculated a standardized effect size (SES) of community structure and found that most taxonomic groups exhibit significant structure indicative of assembly rules and deterministic processes. A striking exception, however, was that fish communities had the lowest SES of all taxonomic groups evaluated, thus demonstrating evidence for random structure. This finding, although based on just three metacommunities, conflicted with previous research showing that freshwater fish community structure is largely non-random (Yant et al. 1984; Townsend 1991; Jackson et al. 1992, 2001). As various studies have shown evidence for both stochastic and deterministic control of freshwater fish community structure, it remains unclear whether freshwater fish communities are predominantly random or non-random in their organization, and how the degree of structure differs in comparison to other major taxonomic groups. Furthermore, it is unknown how climate, habitat type, and spatial extent modify patterns of fish species co-occurrence.

In this study, we analyze patterns in freshwater fish co-occurrence through the investigation of three objectives. First, we seek to clarify the findings of Gotelli and McCabe (2002) in the context of freshwater fish communities by conducting a current review and meta-analysis of studies that have used co-occurrence null models over a roughly two-decade period. Second, we investigate how community structure in these communities relates to differences in habitat type, climate, and spatial extent. We hypothesize that habitats with flowing water (i.e. lotic) will show larger SES and greater community structure than habitats with standing water (i.e. lentic) because flowing waters are generally more spatially variable than standing water environments and therefore likely to exhibit greater environmental filtering of species across sites (Poff 1997; Jackson et al. 2001; Dodds and Whiles 2020). Additionally, we hypothesize that higher latitudes with more temperate climate regimes will be more structured than tropical climates. Tropical climates typically (although not exclusively) experience more hydrological variability and disturbance than temperate climates (Poff et al. 2006), allowing for potentially greater levels of random dispersal and colonization to occur and erase evidence of species associations (Zalewski and Naiman 1985; Schlosser 1987; Grossman et al. 1998). Alternatively, tropical climates may show higher SES than temperate climates due to the greater diversity in the tropics, leading to higher likelihoods for competition and predation to structure communities (Oberdorff et al. 2011). Lastly, we hypothesize that increased spatial extents will be associated with higher SES. Larger spatial extents promote greater habitat heterogeneity, leading to higher intensity of environmental filtering. Additionally, larger spatial extents can result in more historical checkerboards by including species that speciated allopatrically (Gotelli and McCabe 2002). For our final objective, we synthesize authors’ interpretations of the mechanisms responsible for community structure in freshwater fish communities and relate these findings to differences in climate, habitat, spatial extent, and SES.

METHODS

Data Collection

We conducted a literature search using the parameters “Topic: ("co-occur*" OR "presence absence" OR "communit* structure") AND Topic: ("null model" OR "randomization model" OR "Randomization test" OR "Random*") AND Topic: (fish*)” using the Institute of Scientific Information Web of Science online database (http://webofknowledge.com). From the results, we retained all studies that used a co-occurrence null model approach in which the observed species presence-absence matrix was repeatedly randomized to create a null distribution as described in Gotelli (2000). Though this method has recently been applied to functional traits, we restricted our meta-analysis to studies using taxonomic data. We removed studies where multiple sampling times were included in the same matrix as separate sites, studies that included taxa other than fish, studies where sites in the matrix consisted of entire wetlands or watersheds, and studies where the construction of the presence-absence matrix was unclear. To these we added any additional studies that fit our criteria and were known to us but were not included in the Web of Science search.

From the retained studies, we calculated the standardized effect size (SES) of each null model analysis from each matrix using the equation:

$$\mathrm{SES} = \frac{I_{obs} - \bar{I}_{sim}}{\sigma_{sim}}$$

where $I_{obs}$ is the observed co-occurrence index, and $\bar{I}_{sim}$ and $\sigma_{sim}$ are the mean and standard deviation of the co-occurrence index from the simulated null distribution (Gotelli and McCabe 2002). We obtained the SES for each matrix in one of three ways: directly from the studies (when reported), by calculating it from the reported observed index and the mean and standard deviation of the null distribution, or by analyzing the raw presence-absence matrices if they were available (for additional information see Appendix A). As most studies used the matrix-wide C-score (Stone and Roberts 1990) and a randomization algorithm which maintained site and species marginal totals, we chose to use these methods when re-running analyses on raw presence-absence matrices to maximize comparability among datasets, even if the original study used a different co-occurrence index (e.g. Jackson et al. 1992; Snodgrass et al. 1996) or compared species-pairs rather than using the whole matrix approach (e.g. Giam and Olden 2016; Cordero and Jackson 2019; Zbinden 2021). Null models on raw data were run with 5000 total simulations per analysis using the swap algorithm of Gotelli and Entsminger (2003) and a burn-in of 500 matrices in the ‘vegan’ package in R v 4.0.3 (Oksanen et al. 2020; R Core Team 2020).
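
The workflow in this paragraph (a margin-preserving swap randomization, a null distribution of C-scores, and the SES) can be sketched in a few lines of Python. This is an illustrative approximation only: the analyses themselves were run in the ‘vegan’ package in R, and the function names, the `thin` parameter, the seed, and the toy matrix below are assumptions introduced for the example. The sketch also uses the sample standard deviation for the spread of the null distribution.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def c_score(m):
    """Matrix-wide C-score; rows are species, columns are sites."""
    pairs = list(combinations(m, 2))
    total = 0
    for a, b in pairs:
        shared = sum(x & y for x, y in zip(a, b))
        total += (sum(a) - shared) * (sum(b) - shared)
    return total / len(pairs)


def try_swap(m, rng):
    """Attempt one checkerboard swap in place.

    A 2x2 submatrix of the form [[1, 0], [0, 1]] or [[0, 1], [1, 0]]
    is flipped, which leaves every row and column total unchanged.
    """
    r1, r2 = rng.sample(range(len(m)), 2)
    c1, c2 = rng.sample(range(len(m[0])), 2)
    a, b = m[r1][c1], m[r1][c2]
    c, d = m[r2][c1], m[r2][c2]
    if a == d and b == c and a != b:
        m[r1][c1], m[r1][c2] = b, a
        m[r2][c1], m[r2][c2] = d, c


def null_c_scores(observed, n_sim=5000, burn_in=500, thin=10, seed=1):
    """C-scores of sequentially swapped matrices (thin is hypothetical)."""
    rng = random.Random(seed)
    m = [row[:] for row in observed]
    for _ in range(burn_in):
        try_swap(m, rng)
    scores = []
    for _ in range(n_sim):
        for _ in range(thin):
            try_swap(m, rng)
        scores.append(c_score(m))
    return scores


def ses(observed_index, simulated):
    """Standardized effect size: (I_obs - mean(I_sim)) / sd(I_sim)."""
    return (observed_index - mean(simulated)) / stdev(simulated)


# Hypothetical toy matrix: 4 species (rows) by 5 sites (columns).
toy = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 1],
]

if __name__ == "__main__":
    sims = null_c_scores(toy, n_sim=200)
    print(ses(c_score(toy), sims))
```

A positive SES means the observed matrix is more segregated than the randomized matrices; with a matrix this small the value is not interpretable, which is one reason the chapter restricts its analyses to larger matrices.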

To analyze how climate, habitat, spatial extent, and matrix size influence SES, we assigned variables to each matrix according to the following criteria. For climate, we assigned the location of each fish occurrence matrix to a Köppen-Geiger climate type (i.e. “Arid”, “Cold”, “Polar”, “Temperate”, and “Tropical”; Beck et al. 2018; Figure 1A) according to the climate value for the majority of sites, or the majority of the watershed area if metacommunities were designated according to basin boundaries (e.g. Giam and Olden 2016; Cordero and Jackson 2019). We determined spatial extent for matrices where the latitude and longitude or Easting and Northing were available by calculating the Euclidean distance between the two furthest sites in the matrix. For studies where coordinate data were unavailable, but maps were included as figures, we estimated the spatial extent using the scale bar to measure the approximate distance across the study area in ImageJ V 1.53e (Schneider et al. 2012). We assigned habitat type by demarcating lakes, wetland ponds, lagoons, and sinkholes as “lentic” (standing water), rivers, streams, and irrigation canals as “lotic” (flowing water), and any matrix that included a combination of at least one type of habitat from lotic and lentic environments as “both.” Finally, we calculated matrix size as the product of the number of sites and the number of species in the matrix. Spatial analyses for climate and spatial extent were conducted in QGIS V 3.10 (QGIS Development Team 2020) or the ‘sf’ package in R v 4.0.3 (Pebesma 2018; R Core Team 2020).
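
The two derived predictors defined above, spatial extent and matrix size, are simple to compute. The Python sketch below assumes site coordinates are already in a projected system (for example Easting and Northing in metres); the function names and the coordinates are hypothetical and serve only to illustrate the definitions.

```python
from itertools import combinations
from math import dist


def spatial_extent(coords):
    """Largest Euclidean distance between any two sites.

    `coords` is a list of (easting, northing) tuples in a projected
    coordinate system (an assumption of this sketch).
    """
    return max(dist(p, q) for p, q in combinations(coords, 2))


def matrix_size(n_sites, n_species):
    """Matrix size as the product of site and species counts."""
    return n_sites * n_species


# Hypothetical site coordinates in metres.
sites = [(0.0, 0.0), (3000.0, 4000.0), (1000.0, 1000.0)]

print(spatial_extent(sites))   # 5000.0
print(matrix_size(12, 15))     # 180
```

For unprojected latitude and longitude a great-circle distance would normally be more appropriate than a planar one; the sketch deliberately sidesteps that choice.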

To assess how authors’ interpretations related to our predictor variables and the SES, we scanned the Discussion section of each retained study for phrases that discussed how the following mechanisms contributed to community structure and organization: competition, predation, habitat filtering, facilitation, dispersal, disturbance, colonization/extinction dynamics, stochasticity, anthropogenic alteration, or life history. We then created a binary matrix with rows as studies and columns as mechanisms. Cells within this matrix were assigned a ‘1’ if the Discussion considered the mechanism to influence community structure and a ‘0’ if the mechanism was not mentioned or was considered unimportant.
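
The binary study-by-mechanism matrix described above can be represented very simply. In the Python sketch below, the mechanism names come from the paragraph above, while the function name, the study labels, and the coded mentions are hypothetical placeholders rather than data from the meta-analysis.

```python
# Mechanisms listed in the text, in a fixed column order.
MECHANISMS = [
    "competition",
    "predation",
    "habitat filtering",
    "facilitation",
    "dispersal",
    "disturbance",
    "colonization/extinction dynamics",
    "stochasticity",
    "anthropogenic alteration",
    "life history",
]


def mechanism_matrix(mentions):
    """Build a binary matrix keyed by study.

    `mentions` maps a study label to the set of mechanisms its
    Discussion treats as influencing community structure.
    """
    return {
        study: [1 if mech in found else 0 for mech in MECHANISMS]
        for study, found in mentions.items()
    }


# Hypothetical coding of two studies.
example = {
    "Study A": {"predation", "dispersal"},
    "Study B": {"habitat filtering"},
}

for study, row in mechanism_matrix(example).items():
    print(study, row)
```

Keeping the column order fixed in one list makes it straightforward to pass each column to a separate regression or the whole matrix to an ordination.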

Statistical Analyses

For all statistical analyses where SES was included, we only included datasets where the matrices contained at least 10 sites and 10 species to ensure adequate statistical power for the underlying null model analysis (Gotelli and Ulrich 2010; Veech 2013; Giam and Olden 2016). We evaluated whether freshwater fish communities were primarily non-randomly structured by conducting a one-sample Wilcoxon signed rank test to see if the mean SES was significantly different from zero. To test for macroecological patterns in community structure, we ran a multiple linear regression model with SES predicted by climate type, habitat type, spatial extent, and matrix size using datasets with no missing data (n = 372 matrices). As several of the included matrices were built from temporal data where the same metacommunity was sampled through time, we re-ran the linear regression model with the values for temporal datasets averaged across all variables to assess how temporal variability affects observed patterns and to account for the non-independence of observations. We log-transformed SES, spatial extent, and matrix size to better match assumptions of normality. Diagnostic plots of model runs showed no significant outliers or deviations from the assumptions of linear regression analyses for either model.
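
Two of the data-preparation steps in this paragraph, the inclusion threshold of at least 10 sites and 10 species and the averaging of temporal replicates from the same metacommunity, can be illustrated as follows. The record layout, field names, and values in this Python sketch are assumptions made for the example, and only SES is averaged here for brevity.

```python
from collections import defaultdict
from statistics import mean


def eligible(records, min_sites=10, min_species=10):
    """Keep matrices meeting the minimum site and species counts."""
    return [
        r for r in records
        if r["sites"] >= min_sites and r["species"] >= min_species
    ]


def average_temporal(records):
    """Average SES across matrices sharing a metacommunity label."""
    groups = defaultdict(list)
    for r in records:
        groups[r["metacommunity"]].append(r["ses"])
    return {label: mean(values) for label, values in groups.items()}


# Hypothetical records: two time points for "River X", one small matrix.
records = [
    {"metacommunity": "River X", "sites": 12, "species": 15, "ses": 2.0},
    {"metacommunity": "River X", "sites": 12, "species": 14, "ses": 4.0},
    {"metacommunity": "Lake Y", "sites": 8, "species": 20, "ses": 1.5},
]

kept = eligible(records)
print(average_temporal(kept))  # {'River X': 3.0}
```

Averaging first and then fitting the regression gives each metacommunity one observation, which is the sense in which this step addresses non-independence.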

To determine if author interpretations of the mechanisms governing community structure are associated with any of our analyzed variables, we ran a logistic PCA on the binary matrix of interpreted mechanisms with k = 2 and m = 3, where k is the number of principal components and m is a positive, tunable constant used to approximate the natural parameters from the saturated model (Landgraf and Lee 2020). We selected an appropriate value for m using row-wise cross validation (Landgraf and Lee 2020). We also ran generalized linear models with the logit link function where each column of the binary mechanism matrix was the response and either majority climate type, habitat type, or mean SES of each study was the predictor. Significance for each predictor was determined by conducting likelihood ratio tests between the model with the predictor and a model with only the intercept, with subsequent p-values adjusted using the Holm-Bonferroni correction (Holm 1979) to account for the large number of tests. We decided not to parameterize logistic regression models with spatial extent as a predictor since only 11 studies contained spatial extent information. All statistical analyses were conducted in R v 4.0.3 using the ‘stats’ package (R Core Team 2020) or the ‘logisticPCA’ package (Landgraf and Lee 2020).
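The likelihood ratio tests and the Holm-Bonferroni adjustment described above can be illustrated with a short sketch. It is not the R code used for the analyses: the function names (`lrt_deviance`, `chi2_sf`, `holm`) are hypothetical, the closed-form deviance applies only to a single categorical predictor such as habitat or climate (a continuous predictor such as SES would require iterative model fitting), and the chi-square tail probability is implemented only for 1 to 3 degrees of freedom.

```python
import math


def log_lik(successes, n):
    """Bernoulli log-likelihood evaluated at the MLE p = successes / n."""
    if successes in (0, n):
        return 0.0
    p = successes / n
    return successes * math.log(p) + (n - successes) * math.log(1 - p)


def lrt_deviance(groups):
    """Deviance between a one-factor logistic model and an intercept-only model.

    groups: list of (studies citing the mechanism, total studies) per factor
    level. With a single categorical predictor the fitted probabilities equal
    the per-level proportions, so no iterative fitting is needed.
    """
    full = sum(log_lik(k, n) for k, n in groups)
    null = log_lik(sum(k for k, _ in groups), sum(n for _, n in groups))
    return 2.0 * (full - null)


def chi2_sf(x, df):
    """Upper-tail chi-square probability (closed forms for df = 1, 2, 3)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("this sketch only supports df = 1, 2, or 3")


def holm(p_values):
    """Holm (1979) step-down adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Hypothetical counts for a three-level predictor (not data from this study).
deviance = lrt_deviance([(6, 8), (3, 9), (1, 5)])
print(deviance, chi2_sf(deviance, df=2))
print(holm([0.01, 0.04, 0.03]))
```

Because the Holm procedure multiplies the smallest raw p-value by the total number of tests, a raw p-value of 0.003561 among 33 tests becomes roughly 0.1175 after adjustment.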

RESULTS

Our literature search resulted in 22 published studies that matched our criteria and 374 matrices with at least 10 species and 10 sites (Table 1). Studies were located across all the major biogeographic realms (Udvardy 1975) except for the Oceania, Australian, and Antarctic realms and represented all major climate divisions except for polar (Figure 1). However, studies were largely biased toward the Western Hemisphere and matrices toward North America (Figure 1).

Freshwater fish communities were largely non-randomly structured (mean SES = 6.31, Wilcoxon signed rank test p-value < 0.001). However, the magnitude of community structure varied across climates and habitat types, with temperate and cold climates showing higher SES than tropical climates and SES values in lotic habitats exceeding those in lentic habitats (Table 2, Figure 3A, B). As expected, SES was positively related to spatial extent as well as matrix size (Figure 3C, D). Results did not change drastically when temporal datasets were averaged; however, the difference between lotic and lentic habitats weakened slightly, becoming marginally statistically significant (Table 2, Figure B-1).
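The one-sample test reported above can be sketched as follows. This is an illustrative implementation using the large-sample normal approximation without a continuity correction, so its p-values may differ slightly from those returned by R's `wilcox.test`; the function name `wilcoxon_signed_rank` and the example values are hypothetical.

```python
import math


def wilcoxon_signed_rank(values, mu=0.0):
    """One-sample Wilcoxon signed-rank test (normal approximation).

    Returns (W_plus, two_sided_p). Differences equal to zero are dropped and
    tied absolute differences share their average rank.
    """
    diffs = [v - mu for v in values if v != mu]
    n = len(diffs)
    ordered = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < n and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    return w_plus, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical SES values, mostly positive (not data from this study).
example_ses = [6.2, 3.1, 0.4, -0.8, 9.5, 2.2, 4.7, -1.3, 5.0, 7.9]
print(wilcoxon_signed_rank(example_ses))
```

A small p-value here indicates that SES values are systematically shifted away from zero, which is the pattern expected when matrices are more structured than the null model predicts.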

In agreement with previous work (Giam and Olden 2016), habitat filtering or species environmental affinities were the most mentioned processes contributing to non-random community structure, followed closely by mechanisms related to disturbance and predation (Figure 4). Competition, on the other hand, was not highly cited as an important structuring mechanism (Figure 4). The logistic PCA accounted for 42.31% of the deviance. Lotic habitats varied largely along PC axis 1, which described a gradient related to disturbance, stochasticity, and anthropogenic alteration, whereas lentic habitats showed little association (Figure 5A). More distinct grouping was apparent when comparing climates, with tropical climates showing higher levels of disturbance, stochasticity, anthropogenic alteration, and competition while temperate and cold climates showed higher associations with history, environmental affinities, predation, and facilitation, though there was considerable overlap between temperate and tropical datasets (Figure 5B). By contrast, no apparent associations exist between the mean SES or spatial extent and the proposed structuring mechanisms (Figure 5C, D). Despite these qualitative associations, no logistic regression was statistically significant after Holm-Bonferroni correction; however, the power of these tests may be limited by the relatively small sample size (n < 30, Table 3).

DISCUSSION

This study provides strong evidence that freshwater fish metacommunities across regions of the world are deterministically structured and show patterns indicative of high levels of species segregation that exceed those of all other major taxonomic groups presented by Gotelli and McCabe (2002). In their meta-analysis, Gotelli and McCabe (2002) attributed the lower SES observed in poikilothermic organisms (i.e. taxa that must regulate body temperature through behavior) to energy partitioning. As poikilothermic organisms require less energy than homeotherms (i.e. taxa with internal temperature regulation; Peters 1983; Karasov 2012), it has been suggested that resource competition will be less severe for these species, resulting in less competitive exclusion and subsequent segregation between species pairs (Giam and Olden 2016). The high, positive C-score values in our meta-analysis indicate that species pairs of freshwater fish show high levels of species segregation due to mechanisms such as habitat filtering and predation, but not competition (Table 1, Figure 4). The low SES of fish in Gotelli and McCabe (2002) may have stemmed from sampling bias due to low sample size (n = 3) or sampling location (see discussion below on differences in SES due to climate and habitat). As fish are obligately constrained to freshwater habitats, and as these habitats are highly variable within drainage basins (see below), the higher levels of segregation seen in fish in comparison to terrestrial taxa are likely a result of strong environmental filters in combination with dispersal limitation between communities (Heino et al. 2015).

Of the proposed processes governing community structure, habitat filtering, disturbance, and predation were the three most cited mechanisms in previous studies (Table 4). Habitat filtering affects community assembly through the segregation of species pairs with disjoint habitat requirements (i.e. habitat checkerboards; Gotelli and McCabe 2002) and through the aggregation of species pairs with similar habitat requirements (Gilpin and Diamond 1982). Studies in our meta-analysis typically reported either positive associations (e.g. Jackson et al. 1992; Peres-Neto 2004; Giam and Olden 2016) or negative associations (e.g. Hoeinghaus et al. 2007; Escalera-Vázquez and Zambrano 2010; Ohira et al. 2015) related to environmental heterogeneity; however, both of these associations work in tandem to shape community structure (Zbinden 2021).


Our review of the literature found that the main sources of disturbance investigated were related to hydrological variability associated with flooding (e.g. Ortega et al. 2015; Echevarría and González 2017) or seasonal drying (e.g. Capone and Kushlan 1991; Snodgrass et al. 1996; Escalera-Vázquez and Zambrano 2010) and anthropogenic alteration (e.g. Kobza et al. 2004; Bhat and Magurran 2007; Sá-Oliveira et al. 2016; Bhat 2017). Hydrologic variability can enhance the effects of both stochastic and deterministic processes in riverine ecosystems. Many studies showed random patterns of species co-occurrence during high water periods where the effects of stochastic dispersal and colonization homogenize communities (Kobza et al. 2004; Ortega et al. 2015; Sá-Oliveira et al. 2016; Echevarría and González 2017). However, these same communities often showed higher levels of structure during lower water levels when the effects of environmental filtering and biotic interactions were enhanced, leading to local extinction of some taxa (Capone and Kushlan 1991; Kobza et al. 2004; Ortega et al. 2015; Sá-Oliveira et al. 2016; but see Echevarría and González 2017). Anthropogenic disturbance may decrease community structure across sites by shifting communities to contain more eurytopic species or through disturbance caused by dam operations regulating downstream flow regimes (Bhat and Magurran 2007; Bhat 2017; Liu et al. 2021). Alternatively, some authors demonstrated that invasive species introductions resulted in greater spatial segregation between native and non-native species (Kobza et al. 2004; Bhat 2017), though this was not always the case (Novak et al. 2011). Finally, predation largely influenced community composition patterns through negative associations between predatory species and small-bodied prey species (Jackson et al. 1992; Englund et al. 2009; Giam and Olden 2016; Cordero and Jackson 2019; Zbinden 2021).

Our meta-analysis is also consistent with a growing body of literature showing that competitive processes play a minor role in governing negative co-occurrence patterns in freshwater fish communities (Jackson et al. 2001), at least at the scale of a stream reach. Several of the studies reviewed in our analysis explicitly stated that they found little evidence of competitive exclusion leading to higher levels of segregation between species pairs (Snodgrass et al. 1996; Peres-Neto 2004; Hoeinghaus et al. 2007; Giam and Olden 2016; Cordero and Jackson 2019; Zbinden 2021). In streams, most authors attributed the minor role of competitive exclusion to low population densities and therefore lower levels of competition (Jackson et al. 2001; Peres-Neto 2004; Hoeinghaus et al. 2007). Exceptions where competitive exclusion may be important at the reach scale pertain to habitats where fish are confined in high densities due to drying (Capone and Kushlan 1991; Kobza et al. 2004; Sá-Oliveira et al. 2016) or the exclusion of native species from communities where invasives have been introduced (Kobza et al. 2004). Instead, competitive interactions in freshwater fish are more likely to result in greater species aggregation and stable coexistence in response to resource partitioning and character displacement (Townsend 1991; Robinson and Wilson 1994; Roth-Monzón et al. 2020).

Although most of our analyzed studies exhibited non-random patterns of species co-occurrence, stochasticity was the fourth most cited mechanism governing community dynamics, and close to a quarter (24%) of the 374 matrices we analyzed showed random structure (Figure 4). These results highlight that deterministic and stochastic processes work in tandem during community assembly (Chase and Myers 2011; Vellend et al. 2014). The importance of different processes is ultimately driven by contingencies related to variability in habitat, climate, and the spatial extent investigated.

Our analyses indicate that the amount of community structure and, to a lesser extent, authors’ interpretations of the mechanisms controlling community structure vary across habitats, climates, and spatial extents (Tables 1, 3, Figures 3, 5, B-1). In terms of habitat, we found higher SES in lotic environments compared to lentic environments (Figures 3, B-1). Community structure in lentic environments is known to be affected by gradients in abiotic factors, predation, and habitat permanence (Wellborn et al. 1996). In addition to these mechanisms, lotic environments also vary longitudinally (e.g. Vannote et al. 1980) and display considerable hydrological variability (Poff et al. 2006). We suspect that the larger SES values observed in lotic environments are attributable to this variability, especially since habitat filtering is the most cited mechanism structuring freshwater fish communities. Interestingly, lotic communities also face higher levels of disturbance than lentic communities (Resh et al. 1988). Despite higher levels of disturbance, our analyses indicate that deterministic habitat filtering overrides the effects of stochastic processes in many lotic environments, especially when communities are sampled as snapshots in time, as many were in our reviewed studies.

We found that climate also affects the amount of deterministically driven community structure in freshwater fish communities. Our results indicate that tropical climates are more randomly structured than temperate and cold climates, with more studies citing stochasticity as a governing process in these regions (Table 1, Figures 3, 5). These findings appear to be linked to flood pulse-related disturbance in tropical aquatic ecosystems, where stochastic colonization during the wet season homogenizes and randomizes community composition. Despite our findings, we caution that our results may not fully encompass the dynamics of tropical aquatic ecosystems due to biases in the sampled habitats. Most of the previous studies conducted in tropical climates investigated lentic habitats directly connected to flood-plain dynamics or that were disturbed by dams (Table 1, Figure 1). The single tropical lotic study available for our meta-analysis had an SES value in the 87th percentile and was comparable to those measured in temperate and cold lotic systems (Peres-Neto 2004). Overall, there were fewer tropical matrices (69) than matrices from temperate (72) or cold climates (219). The relative paucity of tropical data is consistent with a recent global-scale synthesis of long-term fish community trajectories (Comte et al. 2021) and reflects the need for further research in the biodiverse tropics. We expect that further research will show climate differences to be less influential than habitat differences in governing community structure in fish communities.

FUTURE DIRECTIONS AND CONCLUSION

Null models are valuable tools for investigating ecological patterns and processes (Gotelli and Graves 1996). However, null models by themselves are ultimately unable to elucidate the underlying mechanisms that govern these patterns without additional analyses (Gotelli and McCabe 2002; de Oliveira et al. 2005; Vellend et al. 2014; Liu et al. 2021). By investigating variation in SES due to habitat, climate, and spatial extent, and by analyzing authors’ interpretations of their own results, we were able to make inferences regarding the processes that govern freshwater fish community structure at the macroecological scale. Using null models in this manner can be especially useful in uncovering how processes governing community assembly vary over large spatial and/or temporal scales. Alternatively, several studies have suggested that patterns in species co-occurrence should be investigated using pairwise approaches rather than a matrix-wide approach (Sfenthourakis et al. 2006; Gotelli and Ulrich 2010; Veech 2013, 2014) or by including other information, such as species habitat requirements, during the randomization of the null model procedure (Peres-Neto et al. 2001). These approaches specifically test whether individual species pairs are aggregated, segregated, or randomly structured and are better able to reveal whether species interactions are driving co-occurrence patterns in comparison to other mechanisms. Other null model advances have been advocated that can statistically evaluate the effects of competition, predation, habitat filtering, and dispersal limitation on species co-occurrence patterns at both small and large spatial scales using meta-analysis or multivariate techniques (e.g. Blois et al. 2014; Cordero and Jackson 2019; Zbinden 2021). These methodological advances, in conjunction with other process-based approaches, are exciting avenues for investigating community structure.

In summary, our meta-analysis indicates that freshwater fish community structure is largely non-random and that the importance of the mechanisms that govern the assembly process varies according to habitat type, climate, and spatial extent. We found that lotic habitats show higher levels of community structure than lentic habitats whereas tropical climates were more randomly structured than temperate and cold climates. Furthermore, increasing spatial extent corresponded to higher levels of structure. Habitat filtering was the most cited mechanism governing freshwater fish community assembly, indicating that high levels of habitat heterogeneity are likely what drove the high SES in comparison to previously analyzed data.

Additionally, our results have broad implications for the conservation and restoration of freshwater ecosystems by highlighting the need to prioritize management strategies to match the mechanisms most responsible for controlling community structure. Future analyses should focus on identifying if these patterns are generalizable across less sampled habitats and climates to gain better information for management and conservation.

ACKNOWLEDGMENTS

We thank the authors of all the studies used in our analyses, especially Mac Kobza and Zachary Zbinden for sharing their data with us. We also thank Cindy Chu at the Ontario Ministry of Natural Resources & Forestry for helping us with data associated with Ontario's Aquatic Habitat Survey (AHI) to replicate the findings of Cordero and Jackson (2019). Funding for this research was provided by the Department of Biology and Graduate Studies at Brigham Young University.


REFERENCES

Beck, H. E., N. E. Zimmermann, T. R. McVicar, N. Vergopolan, A. Berg, and E. F. Wood. 2018. Present and future Köppen-Geiger climate classification maps at 1-km resolution. Scientific Data 5:1–12.

Bhat, A. 2017. Towards building conservation prioritization strategies for tropical freshwater

systems: A case study based on fish assemblages from the Western Ghats, India. Aquatic

Ecosystem Health & Management 20:175–187.

Bhat, A., and A. E. Magurran. 2007. Does disturbance affect the structure of tropical fish

assemblages? A test using null models. Journal of Fish Biology 70:623–629.

Blois, J. L., N. J. Gotelli, A. K. Behrensmeyer, J. T. Faith, S. K. Lyons, J. W. Williams, K. L.

Amatangelo, et al. 2014. A framework for evaluating the influence of climate, dispersal

limitation, and biotic interactions using fossil pollen associations across the late

Quaternary. Ecography n/a-n/a.

Canavero, A., D. Hernández, M. Zarucki, and M. Arim. 2014. Patterns of co-occurrences in a

killifish metacommunity are more related with body size than with species identity.

Austral Ecology 39:455–461.

Capone, T. A., and J. A. Kushlan. 1991. Fish community structure in dry-season stream pools.

Ecology 72:983.

Chase, J. M., and J. A. Myers. 2011. Disentangling the importance of ecological niches from

stochastic processes across scales. Philosophical Transactions of the Royal Society B:

Biological Sciences 366:2351–2363.

Comte, L., J. D. Olden, P. A. Tedesco, A. Ruhi, and X. Giam. 2021. Climate and land-use changes interact to drive long-term reorganization of riverine fish communities globally. Proceedings of the National Academy of Sciences 118:e2011639118.

Connor, E. F., and D. Simberloff. 1979. The assembly of species communities: chance or

competition? Ecology 60:1132.

Cordero, R. D., and D. A. Jackson. 2019. Species‐pair associations, null models, and tests of

mechanisms structuring ecological communities. Ecosphere 10.

Cracraft, J. 1988. Deep-history biogeography: retrieving the historical pattern of evolving continental biotas. Systematic Zoology 37:221–236.

de Oliveira, E. F., C. V Minte-Vera, and E. Goulart. 2005. Structure of fish assemblages along spatial gradients in a deep subtropical reservoir (Itaipu Reservoir, Brazil-Paraguay border). Environmental Biology of Fishes 72:283–304.

Diamond, J. M. 1975. Assembly of species communities. Pages 342–444 in M. L. Cody and J.

M. Diamond, eds. Ecology and Evolution of Communities. Harvard University Press,

Cambridge, MA.

Dodds, W. K., and M. R. Whiles. 2020. Freshwater ecology: concepts and environmental

applications of limnology (3rd ed.). Academic Press, London.

Echevarría, G. E., and N. González. 2017. Co-occurrence patterns of fish communities in littorals

of three floodplain lakes of the Orinoco River, Venezuela. Journal of Threatened Taxa

9:10249.

Englund, G., F. Johansson, P. Olofsson, J. Salonsaari, and J. Öhman. 2009. Predation leads to

assembly rules in fragmented fish communities. Ecology Letters 12:663–671.

Erős, T., J. Heino, D. Schmera, and M. Rask. 2009. Characterising functional trait diversity and

trait-environment relationships in fish assemblages of boreal lakes. Freshwater Biology

54:1788–1803.


Escalera-Vázquez, L. H., and L. Zambrano. 2010. The effect of seasonal variation in abiotic

factors on fish community structure in temporary and permanent pools in a tropical

wetland. Freshwater Biology 55:2557–2569.

Giam, X., and J. D. Olden. 2016. Environment and predation govern fish community assembly in

temperate streams. Global Ecology and Biogeography 25:1194–1205.

Gilpin, M. E., and J. M. Diamond. 1982. Factors contributing to non-randomness in species co-

occurrences on islands. Oecologia 52:75–84.

Gotelli, N. J. 2000. Null model analysis of species co-occurrence patterns. Ecology 81:2606–

2621.

Gotelli, N. J., N. J. Buckley, and J. A. Wiens. 1997. Co-occurrence of Australian land birds:

Diamond’s assembly rules revisited. Oikos 80:311.

Gotelli, N. J., and G. L. Entsminger. 2003. Swap algorithms in null model analysis. Ecology

84:532–535.

Gotelli, N. J., and G. R. Graves. 1996. Null models in ecology. Smithsonian Institution Press,

Washington D.C.

Gotelli, N. J., and D. J. McCabe. 2002. Species co-occurrence: a meta-analysis of J.M.

Diamond’s assembly rules model. Ecology 83:2091–2096.

Gotelli, N. J., and B. J. McGill. 2006. Null versus neutral models: what’s the difference?

Ecography 29:793–800.

Gotelli, N. J., and W. Ulrich. 2010. The empirical Bayes approach as a tool to identify non-

random species associations. Oecologia 162:463–477.

Grossman, G. D., R. E. Ratajczak Jr., M. Crawford, and M. C. Freeman. 1998. Assemblage organization in stream fishes: effects of environmental variation and interspecific interactions. Ecological Monographs 68:395–420.

Heino, J., A. S. Melo, T. Siqueira, J. Soininen, S. Valanko, and L. M. Bini. 2015.

Metacommunity organisation, spatial extent and dispersal in aquatic systems: Patterns,

processes and prospects. Freshwater Biology 60:845–869.

Hoeinghaus, D. J., K. O. Winemiller, and J. S. Birnbaum. 2007. Local and regional determinants of stream fish assemblage structure: inferences based on taxonomic vs. functional groups. Journal of Biogeography 34:324–338.

Holm, S. 1979. A simple sequentially rejective multiple test procedure. Scandinavian Journal of

Statistics 6:65–70.

Jackson, D. A., P. R. Peres-Neto, and J. D. Olden. 2001. What controls who is where in

freshwater fish communities — the roles of biotic, abiotic, and spatial factors. Canadian

Journal of Fisheries and Aquatic Sciences 58:157–170.

Jackson, D. A., K. M. Somers, and H. H. Harvey. 1992. Null models and fish communities:

evidence of nonrandom patterns. The American Naturalist 139:930–951.

Karasov, W. 2012. Terrestrial vertebrates. Pages 212–224 in R. M. Sibly, J. H. Brown, and A.

Kodric-Brown, eds. Metabolic Ecology: A Scaling Approach. Wiley-Blackwell, West

Sussex, UK.

Keddy, P. A. 1992. Assembly and response rules: two goals for predictive community ecology.

Journal of Vegetation Science 3:157–164.

Kobza, R. M., J. C. Trexler, W. F. Loftus, and S. A. Perry. 2004. Community structure of fishes

inhabiting aquatic refuges in a threatened Karst wetland and its implications for

ecosystem management. Biological Conservation 116:153–165.

Kraft, N. J. B., P. B. Adler, O. Godoy, E. C. James, S. Fuller, and J. M. Levine. 2015. Community assembly, coexistence and the environmental filtering metaphor. Functional Ecology 29:592–599.

Landgraf, A. J., and Y. Lee. 2020. Dimensionality reduction for binary data through the

projection of natural parameters. Journal of Multivariate Analysis 180:104668.

Leibold, M. A., and J. M. Chase. 2018. Metacommunity ecology. Princeton University

Press, Princeton.

Leibold, M. A., M. Holyoak, N. Mouquet, P. Amarasekare, J. M. Chase, M. F. Hoopes, R. D.

Holt, et al. 2004. The metacommunity concept: A framework for multi-scale community

ecology. Ecology Letters 7:601–613.

Liu, F., J. Wang, F. Zhang, H. Liu, and J. Wang. 2021. Spatial organisation of fish assemblages

in the Chishui River, the last free‐flowing tributary of the upper Yangtze River, China.

Ecology of Freshwater Fish 30:48–60.

Loreau, M. 2009. Communities and ecosystems. Pages 253–255 in S. A. Levin, S. R. Carpenter,

H. C. J. Godfrey, A. P. Kinzig, M. Loreau, J. B. Losos, B. Walker, et al., eds. The

Princeton Guide to Ecology. Princeton University Press, Princeton, NJ.

Mittelbach, G. G., and B. J. McGill. 2019. Community ecology (2nd ed.). Oxford University Press, Oxford.

Novak, M., J. W. Moore, and R. A. Leidy. 2011. Nestedness patterns and the dual nature of

community reassembly in California streams: A multivariate permutation-based

approach. Global Change Biology 17:3714–3723.

Oberdorff, T., P. A. Tedesco, B. Hugueny, F. Leprieur, O. Beauchard, S. Brosse, and H. H. Dürr.

2011. Global and regional patterns in riverine fish species richness: a review.

International Journal of Ecology 2011:1–12.


Ohira, M., H. Tsunoda, K. Nishida, Y. Mitsuo, and Y. Senga. 2015. Niche processes and

conservation implications of fish community assembly in a rice irrigation system.

Aquatic Conservation: Marine and Freshwater Ecosystems 25:322–335.

Oksanen, J., F. G. Blanchet, M. Friendly, R. Kindt, P. Legendre, D. McGlinn, P. R. Minchin, et

al. 2020. vegan: community ecology package.

Ortega, J. C. G., R. M. Dias, A. C. Petry, E. F. Oliveira, and A. A. Agostinho. 2015. Spatio-

temporal organization patterns in the fish assemblages of a Neotropical floodplain.

Hydrobiologia 745:31–41.

Parnell, N. F., and J. T. Streelman. 2011. The macroecology of rapid evolutionary radiation.

Proceedings of the Royal Society B: Biological Sciences 278:2486–2494.

Pebesma, E. 2018. Simple features for R: standardized support for spatial vector data. The R

Journal 10:439–446.

Peres-Neto, P. R. 2004. Patterns in the co-occurrence of fish species in streams: the role of site

suitability, morphology and phylogeny versus species interactions. Oecologia 140:352–

360.

Peres-Neto, P. R., J. D. Olden, and D. A. Jackson. 2001. Environmentally constrained null models: site suitability as occupancy criterion. Oikos 93:110–120.

Peters, R. H. 1983. The ecological implications of body size. Cambridge University Press,

Cambridge, UK.

Poff, N. L. 1997. Landscape filters and species traits: towards mechanistic understanding and

prediction in stream ecology. Journal of the North American Benthological Society

16:391–409.


Poff, N. L., J. D. Olden, D. M. Pepin, and B. P. Bledsoe. 2006. Placing global stream flow

variability in geographic and geomorphic contexts. River Research and Applications

22:149–166.

QGIS Development Team. 2020. QGIS Geographic Information System.

R Core Team. 2020. R: a language and environment for statistical computing. Vienna, Austria.

Resh, V. H., A. V Brown, A. P. Covich, M. E. Gurtz, H. W. Li, G. W. Minshall, S. R. Reice, et

al. 1988. The role of disturbance in stream ecology. Journal of the North American

Benthological Society 7:433–455.

Robinson, B. W., and D. S. Wilson. 1994. Character release and displacement in fishes: A

neglected literature. The American Naturalist 144:596–627.

Roth-Monzón, A. J., M. C. Belk, J. J. Zúñiga-Vega, and J. B. Johnson. 2020. Beyond pairwise interactions: Multispecies character displacement in Mexican freshwater fish communities. American Naturalist 195:983–996.

Sá-Oliveira, J. C., V. J. Isaac, A. S. Araújo, and S. F. Ferrari. 2016. Factors structuring the fish

community in the area of the Coaracy Nunes hydroelectric reservoir in Amapá, northern

Brazil. Tropical Conservation Science 9:16–33.

Schlosser, I. J. 1987. A conceptual framework for fish communities in small warmwater streams.

Pages 17–24 in W. J. Matthews and D. C. Heins, eds. Community and Evolutionary

Ecology of North American Stream Fishes. University of Oklahoma Press, Norman.

Schneider, C. A., W. S. Rasband, and K. W. Eliceiri. 2012. NIH Image to ImageJ: 25 years of

image analysis. Nature Methods 9:671–675.

Sfenthourakis, S., E. Tzanatos, and S. Giokas. 2006. Species co-occurrence: the case of congeneric species and a causal approach to patterns of species associations. Global Ecology and Biogeography 15:39–49.

Snodgrass, J. W., A. L. Bryan Jr., R. F. Lide, and G. M. Smith. 1996. Factors affecting the occurrence and structure of fish assemblages in isolated wetlands of the upper coastal plain, U.S.A. Canadian Journal of Fisheries and Aquatic Sciences 53:443–454.

Stone, L., and A. Roberts. 1990. The checkerboard score and species distributions. Oecologia

85:74–79.

Strong, D. R., D. Simberloff, L. G. Abele, and A. B. Thistle, eds. 1984. Ecological communities:

conceptual issues and the evidence. Princeton University Press, Princeton, NJ.

Townsend, C. R. 1991. Community organization in marine and freshwater environments. Pages 125–144 in R. S. K. Barnes and K. H. Mann, eds. Fundamentals of Aquatic Ecology (2nd ed.). Blackwell Scientific Publications, Oxford.

Udvardy, M. D. F. 1975. A classification of the biogeographic provinces of the world. IUCN Occasional Paper. International Union for Conservation of Nature and Natural Resources, Morges, Switzerland.

Ulrich, W. 2004. Species co-occurrences and neutral models: reassessing J. M. Diamond’s

assembly rules. Oikos 107:603–609.

Vannote, R. L., G. W. Minshall, K. W. Cummins, J. R. Sedell, and C. E. Cushing. 1980. The

river continuum concept. Canadian Journal of Fisheries and Aquatic Sciences 37:130–

137.

Veech, J. A. 2013. A probabilistic model for analysing species co-occurrence. Global Ecology

and Biogeography 22:252–260.

———. 2014. The pairwise approach to analysing species co-occurrence. Journal of

Biogeography 41:1029–1035.


Vellend, M. 2010. Conceptual synthesis of community ecology. The Quarterly Review of

Biology 85:183–206.

Vellend, M., D. S. Srivastava, K. M. Anderson, C. D. Brown, J. E. Jankowski, E. J. Kleynhans,

N. J. B. Kraft, et al. 2014. Assessing the relative importance of neutral stochasticity in

ecological communities. Oikos 123:1420–1430.

Weiher, E., and P. A. Keddy, eds. 1999. Ecological assembly rules: perspectives, advances,

retreats. Cambridge University Press, Cambridge, UK.

Weiher, E., and P. A. Keddy. 1995. Assembly rules, null models, and trait dispersion: New questions from old patterns. Oikos 74:159–164.

Wellborn, G. A., D. K. Skelly, and E. E. Werner. 1996. Mechanisms creating community

structure across a freshwater habitat gradient. Annual Review of Ecology and

Systematics 27:337–363.

Wiens, J. A. 1989. The ecology of bird communities: volume 1 foundations and patterns. (R. S.

K. Barnes, H. J. B. Birks, E. F. Connor, J. L. Harper, & R. T. Paine, eds.). Cambridge

University Press, Cambridge, UK.

Yant, P. R., J. R. Karr, and P. L. Angermeier. 1984. Stochasticity in stream fish communities: an

alternative interpretation. The American Naturalist 124:573–582.

Zalewski, M., and R. J. Naiman. 1985. The regulation of riverine fish communities by a

continuum of abiotic-biotic factors. Pages 3–9 in J. S. Alabaster, ed. Habitat Modification

and Freshwater Fisheries. Butterworths, London.

Zbinden, Z. D. 2021. A needle in the haystack? Applying species co‐occurrence frameworks

with fish assemblage data to identify species associations and sharpen ecological

hypotheses. Journal of Fish Biology In Press.


Table 1 List of the studies included in the meta-analysis and mapping of climate, habitat, number of matrices, and major mechanisms by study.

Climate | Habitat | Number of Matrices | Major Mechanisms | Studies
Arid | Lentic | - | - | -
Arid | Lotic | 14 | Habitat Filtering, Predation | Hoeinghaus et al. (2007), Giam and Olden (2016)
Arid | Both | - | - | -
Cold | Lentic | 90 | Habitat Filtering, Predation, History | Jackson et al. (1992), Englund et al. (2009), Erős et al. (2009), Cordero and Jackson (2019)
Cold | Lotic | 129 | Habitat Filtering, Predation | Giam and Olden (2016)
Cold | Both | - | - | -
Temperate | Lentic | 1 | Disturbance, Dispersal, Colonization/Extinction | Snodgrass et al. (1996)
Temperate | Lotic | 71 | Habitat Filtering, Predation, Dispersal, Disturbance, Anthropogenic Alteration | Capone and Kushlan (1991), Novak et al. (2011), Ohira et al. (2015), Giam and Olden (2016), Liu et al. (2021), Zbinden (2021)
Temperate | Both | - | - | -
Tropical | Lentic | 18 | Competition, Predation, Habitat Filtering, Disturbance, Stochasticity | Kobza et al. (2004), Escalera-Vázquez and Zambrano (2010), Parnell and Streelman (2011), Canavero et al. (2014), Echevarría and González (2017)
Tropical | Lotic | 1 | Habitat Filtering, Disturbance, Stochasticity, Anthropogenic Alteration | Peres-Neto (2004), Bhat and Magurran (2007), Bhat (2017)
Tropical | Both | 50 | Disturbance, Stochasticity | Ortega et al. (2015), Sá-Oliveira et al. (2016)


Table 2 Results of linear regression models testing the significance of habitat type, spatial scale, climate, and matrix size on the standardized effect size of freshwater fish communities using co-occurrence null models. Predictors under the “Averaged Temporal” section are for the model in which the standardized effect size for communities sampled more than once was averaged.

Model | Predictor | DF | Sum of Squares | F | p-value
All Matrices | Habitat | 2 | 8.640 | 27.646 | < 0.001
All Matrices | Climate | 3 | 5.265 | 11.231 | < 0.001
All Matrices | Spatial Scale | 1 | 8.629 | 55.218 | < 0.001
All Matrices | Matrix Size | 1 | 16.391 | 104.891 | < 0.001
Averaged Temporal | Habitat | 2 | 1.410 | 4.285 | 0.015
Averaged Temporal | Climate | 3 | 4.092 | 8.2879 | < 0.001
Averaged Temporal | Spatial Scale | 1 | 4.936 | 29.990 | < 0.001
Averaged Temporal | Matrix Size | 1 | 15.759 | 95.753 | < 0.001


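Table 3 Results of logistic regression analyses using habitat type, climate, and standardized effect size (SES) as predictors on the 11 proposed mechanisms governing community structure.

| Mechanism | Predictor | Resid Df | Resid Dev | Df | Deviance (χ2) | p-value | Holm-Bonferroni adjusted p-value |
|---|---|---|---|---|---|---|---|
| Competition | Habitat | 19 | 24.9979 | 2 | 0.783992 | 0.675707 | 1 |
| Competition | Climate | 18 | 20.7411 | 3 | 5.040832 | 0.168831 | 1 |
| Competition | SES | 12 | 15.6824 | 1 | 1.069144 | 0.301139 | 1 |
| Predation | Habitat | 19 | 29.6931 | 2 | 0.805421 | 0.668506 | 1 |
| Predation | Climate | 18 | 26.0220 | 3 | 4.476483 | 0.214398 | 1 |
| Predation | SES | 12 | 18.2676 | 1 | 0.853867 | 0.355461 | 1 |
| Habitat Filtering | Habitat | 19 | 15.7759 | 2 | 1.749605 | 0.416944 | 1 |
| Habitat Filtering | Climate | 18 | 15.2763 | 3 | 2.249173 | 0.522328 | 1 |
| Habitat Filtering | SES | 12 | 8.89885 | 1 | 2.584408 | 0.107921 | 1 |
| Facilitation | Habitat | 19 | 13.0033 | 2 | 0.400669 | 0.818457 | 1 |
| Facilitation | Climate | 18 | 12.0206 | 3 | 1.383382 | 0.709435 | 1 |
| Facilitation | SES | 12 | 11.3500 | 1 | 0.13329 | 0.715045 | 1 |
| Dispersal | Habitat | 19 | 26.2409 | 2 | 1.280749 | 0.527095 | 1 |
| Dispersal | Climate | 18 | 19.0954 | 3 | 8.426195 | 0.037978 | 1 |
| Dispersal | SES | 12 | 16.1287 | 1 | 2.12045 | 0.145344 | 1 |
| Disturbance | Habitat | 19 | 27.3232 | 2 | 2.443965 | 0.294645 | 1 |
| Disturbance | Climate | 18 | 24.0995 | 3 | 5.667693 | 0.128946 | 1 |
| Disturbance | SES | 12 | 15.3737 | 1 | 3.747755 | 0.052878 | 1 |
| Colonization/Extinction | Habitat | 19 | 20.0161 | 2 | 0.846033 | 0.655068 | 1 |
| Colonization/Extinction | Climate | 18 | 17.9105 | 3 | 2.951644 | 0.399141 | 1 |
| Colonization/Extinction | SES | 12 | 14.3032 | 1 | 0.245023 | 0.620602 | 1 |
| Stochasticity | Habitat | 19 | 24.4346 | 2 | 4.406626 | 0.110437 | 1 |
| Stochasticity | Climate | 18 | 15.2763 | 3 | 13.56486 | 0.003561 | 0.117529 |
| Stochasticity | SES | 12 | 13.6076 | 1 | 3.14392 | 0.07621 | 1 |
| History | Habitat | 19 | 18.7190 | 2 | 2.143184 | 0.342463 | 1 |
| History | Climate | 18 | 13.0091 | 3 | 7.853035 | 0.049149 | 1 |
| History | SES | 12 | 11.1766 | 1 | 0.306705 | 0.579709 | 1 |
| Anthropogenic Alteration | Habitat | 19 | 26.6436 | 2 | 2.197617 | 0.333268 | 1 |
| Anthropogenic Alteration | Climate | 18 | 26.9301 | 3 | 1.911152 | 0.591051 | 1 |
| Anthropogenic Alteration | SES | 12 | 15.5418 | 1 | 1.209772 | 0.271377 | 1 |
| Life History | Habitat | 19 | 12.7806 | 2 | 4.744876 | 0.093253 | 1 |
| Life History | Climate | 18 | 15.2763 | 3 | 2.249173 | 0.522328 | 1 |
| Life History | SES | 12 | 11.0517 | 1 | 0.431556 | 0.511226 | 1 |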
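The Holm-Bonferroni column of Table 3 can be illustrated with a short sketch. The function below is a generic step-down Holm adjustment written for illustration, not the author's script. Treating all 33 rows of the table as one family of tests is an inference from the table layout: under that assumption the smallest raw p-value (0.003561) is multiplied by 33, giving about 0.1175, which is close to the tabulated 0.117529.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order.

    Generic step-down procedure: sort p-values ascending, multiply the
    k-th smallest (k = 0, 1, ...) by (m - k), cap at 1, and enforce
    monotonicity with a running maximum.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical family of 33 tests: one small p-value, the rest large.
raw = [0.003561] + [0.5] * 32
print(holm_adjust(raw)[0])
```

Because every other raw p-value in the table is at least 0.037978, multiplying by the remaining number of tests pushes each of them past 1, which is why the rest of the adjusted column is reported as 1.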


Figure 1 Maps displaying the global arrangement of studies and matrices analyzed. A) Map showing the centroid position of sampled sites for each matrix for which we were able to extract spatial data overlaid on climate designations extracted from modern Köppen-Geiger classifications (Beck et al., 2018). B) Same as A but enlarged on North America where most matrices were located. C) Heatmap displaying the number of studies analyzed split between global biogeographic realms (Udvardy 1975, shapefile modified from The Nature Conservancy Geospatial Conservation Atlas Terrestrial Ecoregions https://geospatial.tnc.org)


Figure 2 Bar plot showing the standardized effect size between taxonomic groups. Adapted from Figure 2 of Gotelli and McCabe (2002) with the inclusion of the data analyzed in this study (indicated by Fish*). “Herps” refers to amphibians and reptiles. Grey bars refer to homeothermic organisms, white to poikilothermic organisms, and black to plants.


Figure 3 Plots displaying data used in linear regression analyses. A) Boxplots showing the differences in Standardized Effect Size (SES) of co-occurrence null model analyses between habitat types. B) Boxplots showing the differences in SES values between climates. C) Scatterplot of SES by spatial extent of matrix, measured as the Euclidean distance between the two furthest sites in the analyzed metacommunity. D) Scatterplot of SES by matrix size, calculated as the number of species multiplied by the number of sites within a matrix. Note that both C and D are on logarithmic scales. Stars show significance of Tukey’s Honest Significant Difference for pairwise comparisons where ‘*’ < 0.05, ‘**’ < 0.01, ‘***’ < 0.001. These plots were generated with all the analyzed data matrices. Plots generated using data averaged for sites sampled multiple times can be found in Appendix B.


Figure 4 Percent of studies reporting each of the mechanisms and processes explaining patterns of community structure as proposed within the discussions of the analyzed studies.


Figure 5 Biplots of Logistic PCA analyses on the binary matrix of mechanisms proposed to explain patterns of community structure within the analyzed studies. The combination of PC axes 1 and 2 explained 42.3% of the deviance. Different panels have each study coded according to different variables: A) Habitat, B) Climate, C) Mean Standardized Effect Size D) Mean Spatial Extent. PCA loadings for the 11 variables were multiplied by a factor of 10 to match the magnitude of the PC values of observations. “Col/Ext” stands for Colonization Extinction and “Anthro” stands for Anthropogenic Alteration.


CHAPTER TWO

HISTORY PREDICTS COMMUNITY DIVERSIFICATION WITHIN A BIOGEOGRAPHIC PROVINCE OF FRESHWATER FISH BETTER THAN THE ENVIRONMENT OR SPACE

Trevor J. Williams1, Jerald B. Johnson1,2

1 Department of Biology and Evolutionary Ecology Laboratories, Brigham Young University, Provo UT, 84602

2 Monte L. Bean Life Science Museum, Brigham Young University, Provo UT, 84602

Formatted in the style of the Journal of Biogeography

ABSTRACT

Aim Biogeography and community ecology both seek to uncover the patterns and processes governing the structure and composition of communities, though typically at different temporal and spatial scales. Recent calls to integrate these two fields suggest that both historic and contemporary processes shape biological diversity at large biogeographic scales; however, it is still typically assumed that history plays little role in assembling contemporary, local communities. To determine if history can play a role in shaping local community structure, we incorporated phylogeographic data into traditional metacommunity analyses on freshwater fish communities and analyzed if contemporary environmental and spatial factors masked the effects of history.

Location Streams in the Nicaraguan Depression in Northeastern Costa Rica.


Taxon Freshwater fish communities.

Methods We collected data on fish community composition, local environmental variables, and landscape-scale environmental variables, and generated spatial variables for each sampled stream community. Additionally, we generated community grouping variables using genetic data from three livebearing fish species within our study area to act as proxies for community history. PERMANOVAs and Redundancy Analyses were then used to determine the importance of each predictor type (environmental, spatial, and historic) on community composition patterns.

Results We found that large-scale patterns of genetic diversity predicted patterns of local community diversity, and that history explained a greater proportion of variability in community composition than either the environment or space.

Main Conclusions Our results suggest that history can impact local community assembly and structure. Species-specific incidence patterns within community groupings varied according to biogeographic history, likely due to colonization history following eustatic sea-level change, though the exact mechanisms could not be elucidated. These findings indicate that phylogeography can bridge the gap between classical biogeography and community ecology by enhancing inferences regarding community assembly through time and space.

INTRODUCTION

Understanding the organization of species diversity across the geographic landscape, and identifying factors that shape this diversity, are central to the contemporary study of biogeography. Interestingly, the field of community ecology has the same objective: to elucidate the mechanisms controlling species diversity. However, these two fields typically operate on different temporal and/or spatial scales (Gonçalves-Souza et al., 2014; Jenkins & Ricklefs, 2011; Ricklefs & Jenkins, 2011; Weiher et al., 2011). Historically, biogeographers investigated large-scale patterns created by historical processes, whereas ecologists focused on smaller-scale patterns governed by local determinism, with little communication between the two camps (Ricklefs, 2004). However, Ricklefs' (1987) seminal paper shattered this dichotomy by explaining how local diversity is governed by both local and regional (i.e. historic) scale processes. Since then, the scope of community ecology has broadened with an increased focus on how historical and regional scale processes influence community diversity (Ricklefs & Schluter, 1993). This has led to advances in methodology and inference through the use of techniques such as phylogenetic community ecology (Cavender-Bares et al., 2009; Vamosi et al., 2009; Webb et al., 2002), metacommunity ecology (Holyoak et al., 2005; Leibold & Chase, 2018; Leibold et al., 2004) and investigations at multiple, hierarchical scales (Heino et al., 2017; Hoeinghaus et al., 2007).

As biogeography and community ecology have become more integrated, comparative phylogeography, a subdiscipline of biogeography, has become another promising conduit for investigating historical effects on community diversity and assembly (Hickerson et al., 2010). Comparative phylogeography has been able to test hypotheses of community assembly through time and space by investigating if co-distributed taxa have responded synchronously to historic processes such as climatic change or geologic events (e.g. Chan et al., 2014; Xue & Hickerson, 2015). More recently, new methods have even been used to incorporate species interactions into phylogeographic models (Ortego & Knowles, 2020). Although these methods have greatly enhanced our knowledge of the factors structuring genetic diversity in co-distributed taxa, they typically do not include community data and are limited to investigations at large spatial scales. They can therefore say little about whether historic events structure local communities as defined traditionally by community ecology (i.e. “a set of species that live together in some place” (Loreau, 2009)). Even studies that have combined genetic data with community data still typically limit analyses to comparisons between regions or subregions (see Bonada et al., 2009; Borregaard et al., 2014; Eldon et al., 2013; Leibold et al., 2010), though both ecological and evolutionary processes are important at the metacommunity scale (Vellend, 2010, 2016; Vellend & Geber, 2005). As a result, it is still unclear if and how historical processes affect community structure at more local scales (but see Heino et al., 2017; Herault & Honnay, 2005; Hoeinghaus et al., 2007) and if spatially explicit genetic structure, as identified through phylogeographic methods, can be predictive of metacommunity structure. We need studies that investigate the congruence between genetic and community diversity at local scales and explicitly test the importance of both contemporary and historic processes on said diversity using a combination of community and genetic data.

Herein, we evaluate if genetic breaks discovered in livebearing fishes in Costa Rica correspond to diversity in species composition within a metacommunity along the same spatial gradient. To do so, we compared the explanatory power of environmental, spatial, and historic variables on community composition and tested for congruence between genetic results and community results. Freshwater fish communities are ideal for studying the effects of contemporary and historic processes on local community structure because both contemporary and historic factors should be equally relevant for structuring fish communities. Several works have shown that both biotic and abiotic (particularly environmental) factors are important in structuring fish communities (Giam & Olden, 2016; Jackson et al., 2001; Peres-Neto, 2004). Furthermore, historic processes that alter drainage patterns can have significant effects on both species’ distributions and genetic diversity (Bagley & Johnson, 2014a; Hubbs & Miller, 1948; G. R. Smith et al., 2002).

In this study we have two objectives. First, we determine if community diversity, as measured by differences in species composition, can be predicted by spatial genetic diversity within a single biogeographic region. If historic processes are important in structuring contemporary local communities, we hypothesize that community composition will differ significantly if grouped according to genetic breaks. Alternatively, if contemporary processes overwhelm the effects of history, then genetic groupings should not predict community groupings of species composition. Second, we investigate whether contemporary or historic factors are more important in structuring community composition. Theory and past work suggest that at local scales contemporary processes should be more evident than historic, therefore we hypothesize that local environmental variables should be the most important factor structuring community composition.

MATERIALS AND METHODS

Study System

To address our objectives, we sampled stream fish assemblages in the Nicaraguan Depression in Lower Central America (Figure 1). Lower Central America, roughly encompassing modern day Panama and Costa Rica, is an ideal location to investigate the effects of historic and contemporary processes on terrestrial and aquatic taxa as it is a relatively new land mass with a complex geologic and climatic history (Bagley & Johnson, 2014b). The Nicaraguan Depression is a large lowland extending from El Salvador to Northwestern Costa Rica which formed during the Miocene-Pliocene due to extensional forces (Funk et al., 2009; Figure 1). Several taxonomic groups display biogeographic or phylogeographic boundaries across the Nicaraguan Depression including birds (D’Horta et al., 2013; Jiménez & Ornelas, 2015; Prieto-Torres et al., 2019; Rocha-Méndez et al., 2019), mice (Hardy et al., 2013), snakes (Daza et al., 2010), bees (Duennes et al., 2017), and plants (Cavender-Bares et al., 2015; Rodríguez-Correa et al., 2017). Many of these breaks were likely created due to marine incursions or volcanic activity during the formation of the depression (Cavender-Bares et al., 2015; Patten & Smith-Patten, 2008). Interestingly, freshwater fish do not seem to exhibit similar biogeographic structure across the Nicaraguan Depression, as the San Juan biogeographic province spans this area (Bussing, 1976; S. A. Smith & Bermingham, 2005). However, Bagley & Johnson (2014a) discovered similar phylogeographic spatial breaks in three livebearing fish species, Poecilia gillii, Alfaro cultratus, and Xenophallus umbratilis, within the Nicaraguan Depression between the Río Frio and Río San Carlos sub-basins of the Río San Juan drainage basin in Northwestern Costa Rica (Figure 1). This genetic break was dated to the Miocene to mid-Pliocene for Alfaro cultratus and Xenophallus umbratilis, and to the mid-late Pleistocene for Poecilia gillii, matching similar time estimates for other taxa (e.g. Cavender-Bares et al., 2015; Daza et al., 2010; Hardy et al., 2013).

Since large-scale biogeographic patterns are mismatched with small-scale genetic patterns for freshwater fish in this portion of the Nicaraguan Depression, it is unclear which patterns local-scale community composition will display and whether historic or contemporary factors are more important in structuring community diversity. If grouping variables created using the results of Bagley & Johnson (2014a) are important predictors of community composition, it suggests that the same historic processes that created the genetic diversity likely have had a lasting effect on community diversity. Conversely, if communities are structured according to contemporary environmental or landscape-scale factors, it suggests that history has not played a role in structuring communities.

Data Acquisition

Fish Sampling

We sampled communities in small streams in Northwestern Costa Rica within the Nicaraguan Depression in May of 2019 (Figure 1). Fish were captured using a handheld seine with 1 mm mesh and identified in the field if possible. When taxonomic identifications were uncertain, we collected voucher specimens of each species, humanely euthanized them with an overdose of MS-222, and keyed them in the lab using the keys in Angulo et al. (2018), Angulo & Gracian-Negrete (2013), Bussing (1998), Matamoros et al. (2013), and Schmitter-Soto (2017). For species in the family Cichlidae, we modified the key in Bussing (1998) using updated taxonomic information from Říčan et al. (2008) and Říčan et al. (2016; Appendix C). Communities were sampled until no new species were captured, with each locality receiving approximately equal sampling effort (number of seines: 23–62, mean = 42.6). We used the fish incidence data to create a presence-absence matrix for all analyses. However, to assess the predictive power of the phylogeographic predictors and avoid circularity, we removed Poecilia gillii, Alfaro cultratus, and Xenophallus umbratilis from the matrix. We also removed all non-native species and species which occurred in only one community.
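The filtering of the presence-absence matrix described above can be sketched as follows. This is a minimal illustration, not the author's script: the dictionary layout, the function name `filter_incidence`, and the placeholder species (`species_a` to `species_d`) are all assumptions; only the three focal species names come from the text.

```python
def filter_incidence(matrix, focal_species, non_native_species):
    """Drop focal, non-native, and single-community species.

    matrix maps a species name to a list of 0/1 incidences, one per site.
    """
    kept = {}
    for species, row in matrix.items():
        if species in focal_species or species in non_native_species:
            continue  # avoid circularity; exclude introduced taxa
        if sum(row) <= 1:
            continue  # occurs in at most one community
        kept[species] = row
    return kept


focal = {"Poecilia gillii", "Alfaro cultratus", "Xenophallus umbratilis"}
non_native = {"species_d"}  # hypothetical introduced species
incidence = {
    "Poecilia gillii": [1, 1, 1, 1],
    "species_a": [1, 0, 1, 0],
    "species_b": [0, 0, 1, 0],  # singleton, removed
    "species_c": [1, 1, 1, 0],
    "species_d": [1, 1, 0, 0],  # non-native, removed
}
filtered = filter_incidence(incidence, focal, non_native)
print(sorted(filtered))
```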

Environmental and Landscape Variables

We measured several water chemistry variables and local habitat characteristics to classify the environmental variation between sampling sites. We measured pH, temperature, and electroconductivity using a Hanna Combo Multiparameter Meter and canopy cover using a model A spherical densiometer. We measured average stream width along the middle of the sampled stream reach or pool using a tape measure; average depth was calculated by averaging the depth every foot along the width transect. We also estimated substrate composition using a zig-zag Wolman pebble count method with a Wildco gravelometer field sieve. We broke substrate composition into five classes: 1) organic material (woody debris, leaves, etc.); 2) sand (<2 mm); 3) gravel (2–63 mm); 4) cobble (64–180 mm); and 5) boulder (>180 mm). To assess substrate complexity, we calculated the Shannon-Wiener diversity index (Shannon and Weaver 1963) and Pielou’s evenness (Pielou 1966) on the percentages of each substrate class.
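As a worked illustration of the substrate complexity measures, the sketch below computes the Shannon index H' = -Σ p_i ln p_i and Pielou's evenness J = H' / ln S from class percentages. Two choices here are assumptions rather than details given in the text: natural logarithms are used, and S is taken as the number of substrate classes actually present at a site (dividing by ln 5 for all five classes would be an equally plausible convention).

```python
import math


def shannon_diversity(percentages):
    """Shannon index H' from class percentages (natural log)."""
    total = sum(percentages)
    proportions = [p / total for p in percentages if p > 0]
    return -sum(p * math.log(p) for p in proportions)


def pielou_evenness(percentages):
    """Pielou's J = H' / ln(S), with S = number of classes present."""
    richness = sum(1 for p in percentages if p > 0)
    if richness < 2:
        return 0.0  # evenness is undefined for a single class; report 0
    return shannon_diversity(percentages) / math.log(richness)


# Hypothetical pebble count: organic, sand, gravel, cobble, boulder (%).
substrate = [10.0, 25.0, 40.0, 20.0, 5.0]
print(shannon_diversity(substrate), pielou_evenness(substrate))
```

A site with equal cover in every class gives J = 1, while a site dominated by a single class gives values near 0.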

In addition to the local variables, we also calculated several landscape variables known to affect stream community composition using geographic information system (GIS) techniques. Elevation was estimated by extracting the elevation value from the Shuttle Radar Topography Mission 1 arc-second global digital elevation model (Farr et al., 2007) available through the USGS EarthExplorer webpage (https://earthexplorer.usgs.gov/). Stream gradient was calculated by extracting the elevation from points 500 meters up and downstream from the sampling location and dividing the difference between these points by the distance travelled (1 km). If the stream emptied into a larger river before reaching 500 meters downstream, the confluence point was used instead, and the distance shortened appropriately. Strahler order, an indicator of stream size and complexity, was calculated by snapping points to the nearest river of the HydroRivers v1.0 shapefile and extracting the value in QGIS version 3.10 (Lehner et al., 2008). Lastly, we extracted the following variables by snapping points to the global 1 km freshwater environmental variables datasets available from EarthEnv (http://www.earthenv.org/; Domisch, Amatulli, & Jetz, 2015): the mean of average monthly maximum and minimum air temperature; the mean of monthly total precipitation (a proxy for stream discharge); the average upstream land cover percentage of evergreen broadleaf trees, mixed and other trees, and cultivated and managed vegetation; and the number of upstream flow grid cells and upstream catchment cells. Elevation was extracted using the ‘raster’ package (Hijmans, 2020), stream gradient using the ‘streamgradientR’ package (https://github.com/trevorjwilli/streamgradientR), and all other variables (other than Strahler order) using custom scripts in R version 4.0.3 (R Core Team, 2020).
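The stream gradient calculation reduces to a rise-over-run ratio, sketched below. This is an illustrative stand-in and does not reproduce the ‘streamgradientR’ package; the function name, argument names, and elevation values are hypothetical.

```python
def stream_gradient(elev_upstream_m, elev_downstream_m, distance_m=1000.0):
    """Elevation drop between two points divided by channel distance.

    distance_m defaults to 1 km (500 m upstream plus 500 m downstream).
    Pass a shorter distance when a confluence truncates the downstream leg.
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return (elev_upstream_m - elev_downstream_m) / distance_m


# Full 1 km reach with made-up elevations in meters.
print(stream_gradient(112.0, 104.0))

# Confluence reached 300 m downstream, so the reach is 500 + 300 = 800 m.
print(stream_gradient(112.0, 106.0, distance_m=800.0))
```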

Spatial Variables

To assess the importance of dispersal and the spatial arrangement of sites on community structure we calculated spatial variables using Principal Coordinates of Neighbourhood Matrix (PCNM) with a threshold set to the minimum distance that gave a connected network (Borcard & Legendre, 2002). PCNM analyses were conducted using a distance matrix calculated along river segments to better represent dispersal routes between sampling localities. To calculate river distance, we digitized the river network in ArcMap version 10.7.1 (ESRI 2019) using ESRI’s world imagery tile layer for satellite imagery. In areas where river paths were uncertain due to poor satellite imagery or cloud cover, rivers were digitized following paths in the cuace y drenaje shapefile available at the Sistema Nacional de Informacion Territorial (SNIT) website (snitcr.go.cr). River distance between sites was then calculated using the ‘riverdist’ package in R (Tyers, 2020). PCNM variables were calculated using the ‘vegan’ package (Oksanen et al., 2020).
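The first two steps of a PCNM analysis, finding the threshold and truncating the distance matrix, can be sketched without any external library. The minimum distance that keeps all sites connected equals the longest edge of the minimum spanning tree, found here with Prim's algorithm. Distances above the threshold are then replaced by four times the threshold, the convention in Borcard & Legendre (2002). The final step, a principal coordinates analysis of the truncated matrix that retains the axes with positive eigenvalues, is omitted. The site positions are invented, and this is not the ‘vegan’ implementation.

```python
def connectivity_threshold(dist):
    """Longest edge of the minimum spanning tree (Prim's algorithm)."""
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known link into the tree
    best[0] = 0.0
    longest = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        longest = max(longest, best[u])
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
    return longest


def truncate_distances(dist, threshold):
    """Keep distances <= threshold; set larger ones to 4 * threshold."""
    n = len(dist)
    return [
        [
            0.0 if i == j
            else dist[i][j] if dist[i][j] <= threshold
            else 4.0 * threshold
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical sites along one river, positions in km from the mouth.
positions = [0.0, 1.0, 3.0, 7.0]
river_dist = [[abs(a - b) for b in positions] for a in positions]
threshold = connectivity_threshold(river_dist)
truncated = truncate_distances(river_dist, threshold)
print(threshold)
print(truncated)
```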

History Variables

We created three categorical variables to map sampled communities to the genetic groupings identified by Bagley & Johnson (2014a). Genetic groupings were identified from sequences of the mitochondrial cytochrome b gene from populations of Poecilia gillii, Alfaro cultratus, and Xenophallus umbratilis using SAMOVA 1.0 (Dupanloup et al., 2002). SAMOVA defines genetically homogeneous, maximally differentiated spatial population clusters using simulated annealing (Dupanloup et al., 2002). We assigned communities to the genetic clusters of the closest sampled population within the same drainage for each of the three species analyzed in Bagley & Johnson (2014a). When communities and populations did not occur in the same drainage, the closest population in the nearest drainage was used to assign cluster values. For more information on the genetic data see Bagley & Johnson (2014a).
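The assignment rule can be sketched as a nearest-neighbor lookup with a drainage constraint. Several details are assumptions made for illustration: straight-line distance between coordinates stands in for whatever distance measure was actually used, the fallback simply takes the closest population in any drainage, and the record layout, drainage labels, coordinates, and cluster numbers are all hypothetical.

```python
import math


def assign_cluster(community, populations):
    """Return the genetic cluster of the nearest reference population.

    Populations in the community's own drainage take priority. If there
    are none, the nearest population overall is used as a fallback.
    """
    same_drainage = [
        p for p in populations if p["drainage"] == community["drainage"]
    ]
    candidates = same_drainage if same_drainage else populations
    nearest = min(candidates, key=lambda p: math.dist(community["xy"], p["xy"]))
    return nearest["cluster"]


# Hypothetical genotyped populations for one focal species.
populations = [
    {"drainage": "Frio", "xy": (0.0, 0.0), "cluster": 1},
    {"drainage": "San Carlos", "xy": (10.0, 0.0), "cluster": 2},
]

# Same drainage wins even though the other population is closer in space.
site_a = {"drainage": "Frio", "xy": (8.0, 0.0)}
# No genotyped population in this drainage, so the nearest one is used.
site_b = {"drainage": "Other", "xy": (8.0, 0.0)}
print(assign_cluster(site_a, populations), assign_cluster(site_b, populations))
```

Running the function once per focal species, each with its own population table, would yield the three categorical history variables.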

Data Analysis

To see if community diversity could be predicted by and was correlated with genetic diversity, we ran a Permutational Multivariate Analysis of Variance (PERMANOVA; Anderson, 2001) on pairwise presence-absence standardized Bray-Curtis dissimilarities using the genetic cluster history variables as predictors. Since each of the three species produced similar but slightly different groupings, we ran three separate PERMANOVAs using only the groupings of each species individually as predictors. We compared the results of the PERMANOVAs to groups identified by hierarchical cluster analysis run with no a priori information. We ran hierarchical cluster analysis on the Bray-Curtis dissimilarity matrix using a complete linkage algorithm. To visualize groupings of community dissimilarity we ordinated Bray-Curtis dissimilarity scores using Nonmetric Multidimensional Scaling on three dimensions with 100 random starts.
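A one-factor PERMANOVA can be sketched directly from the pairwise dissimilarities. On presence-absence data the Bray-Curtis index reduces to the Sørensen dissimilarity, 1 - 2a / (2a + b + c), where a is the number of shared species and b and c are the numbers unique to each site. The pseudo-F statistic follows the sums-of-squares partition in Anderson (2001), and significance comes from shuffling group labels. The six sites and two groups below are invented, and this sketch does not reproduce the ‘vegan’ implementation.

```python
import random


def sorensen(a, b):
    """Bray-Curtis dissimilarity on presence-absence rows (Sørensen)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 1.0 - 2.0 * shared / total if total else 0.0


def pseudo_f(dist, groups):
    """Pseudo-F for a single grouping factor from squared dissimilarities."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for label in labels:
        members = [i for i in range(n) if groups[i] == label]
        pairs = sum(
            dist[i][j] ** 2 for k, i in enumerate(members) for j in members[k + 1:]
        )
        ss_within += pairs / len(members)
    ss_among = ss_total - ss_within
    a = len(labels)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, groups, permutations=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical incidence rows (sites x species) and genetic groupings.
sites = [
    [1, 1, 1, 0, 0], [1, 1, 0, 0, 0], [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1], [0, 0, 0, 1, 1], [0, 0, 1, 1, 1],
]
groups = ["north", "north", "north", "south", "south", "south"]
dist = [[sorensen(a, b) for b in sites] for a in sites]
f_stat, p_value = permanova(dist, groups, permutations=99, seed=1)
print(f_stat, p_value)
```

With only six sites the number of distinct label arrangements is small, so the attainable p-values are coarse; real analyses would use the full site set and more permutations.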

To assess whether historic or contemporary factors were more important in structuring community composition we ran Redundancy Analyses (RDA) on the Hellinger transformed incidence matrix for each type of predictor (local environmental, landscape, spatial, and historic). To better fit the assumption of normality we log transformed stream width, electroconductivity, total dissolved solids, elevation, mean precipitation, and the number of upstream flow and catchment cells. We ran initial RDAs using all variables and tested for significance of the global model using Monte Carlo permutation tests (Legendre et al., 2011). If the global model was significant, we ran model selection using a forward stepwise approach to find the best explanatory variables and then fit them to a final RDA model (Blanchet et al., 2008). We initially planned to run variation partitioning on our RDAs to quantify the contribution of each predictor type to community structure and to investigate interactions between predictor types. However, because the global model was significant for only a single predictor type, we did not run that analysis, as interpreting variation partitions on non-significant predictors can be misleading. All statistical tests were conducted in R using the ‘vegan’ package (Oksanen et al., 2020) or ‘stats’ package (R Core Team, 2020).
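For intuition about the RDA step, the sketch below applies the Hellinger transformation (square root of each value divided by its row total) and then computes the unadjusted R² of an RDA with a single categorical predictor. In that special case the fitted values are simply the group centroids, so R² equals the between-group sum of squares divided by the total sum of squares. This shortcut is an illustration only: it does not handle continuous or multiple predictors, and it is not the ‘vegan’ routine. The incidence data and group labels are invented.

```python
import math


def hellinger(matrix):
    """Hellinger transform: sqrt of each value over its row total."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([math.sqrt(x / total) if total else 0.0 for x in row])
    return out


def rda_r2_single_factor(y, groups):
    """Unadjusted R^2 of an RDA whose only predictor is one factor."""
    n, p = len(y), len(y[0])
    grand = [sum(row[j] for row in y) / n for j in range(p)]
    ss_total = sum((row[j] - grand[j]) ** 2 for row in y for j in range(p))
    if ss_total == 0:
        return 0.0  # no variation in composition to explain
    ss_fit = 0.0
    for label in set(groups):
        rows = [y[i] for i in range(n) if groups[i] == label]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
        ss_fit += len(rows) * sum((centroid[j] - grand[j]) ** 2 for j in range(p))
    return ss_fit / ss_total


# Hypothetical incidence rows (sites x species) and one history grouping.
sites = [
    [1, 1, 1, 0, 0], [1, 1, 0, 0, 0], [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1], [0, 1, 0, 1, 1], [0, 0, 1, 1, 1],
]
groups = ["north", "north", "north", "south", "south", "south"]
r2 = rda_r2_single_factor(hellinger(sites), groups)
print(r2)
```

A permutation test of the global model would shuffle the group labels and recompute this statistic, in the same spirit as the Monte Carlo tests described above.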

RESULTS

Each genetic grouping for the three focal species significantly predicted community composition patterns, indicating that genetic diversity patterns are predictive of community diversity patterns (Table 1). NMDS plots along the first two axes corroborated these results, showing clear demarcations in community groupings following the genetic groupings of each species (Figure 2). Interestingly, hierarchical cluster analysis also identified two clusters of community types; however, the separation between clusters was not clearly demarcated between the Río Frio and Río San Carlos drainage basins and did not always align with genetic clusters (Figures 2, 3). Instead, our cluster analysis identified one group containing communities with a higher prevalence of species endemic or near-endemic to the study area and further south (e.g. Priapichthys annectens, Amatitlania septemfasciata, and Atherinella hubbsi) and another containing communities with a high prevalence of widespread species, species whose ranges extend further northward, and some endemic species (Figures 4, D-1).


Our redundancy analyses indicate that only the historical variables, identified using the SAMOVA groupings from genetic results, were significant predictors of community composition; local environmental, landscape, and spatial predictors were non-significant (Table 2). This result suggests that historic factors, rather than contemporary ones, are more important in structuring community composition in this area. Model selection on the RDA for historical groups found the model containing only the Alfaro cultratus predictor as the best model. This is likely because the Poecilia and Xenophallus groupings were similar and contained little additional information (Figure 2). R-squared values for the history RDA indicate that it explained around 9% of the variation in community composition (Table 2).

DISCUSSION

Principal Findings

Our results show that historical processes can have lasting impacts on local community organization within a single biogeographic region, contrasting with traditionally held beliefs in community ecology and biogeography (Ricklefs, 2004; Wiens, 2011; Wiens & Donoghue, 2004). Additionally, our hypothesis that environmental factors would largely explain community structure was unsupported. In typical metacommunity analyses, this result, in conjunction with the non-significance of the spatial variables, would indicate that the sampled communities show an undetermined metacommunity type (Cottenie, 2005). However, our results highlight that including genetic data in these types of analyses can have profound impacts on inferences regarding community assembly.


Historical Processes Can Affect Local Communities

Previous research has been mixed on whether regional and historic factors remain important for local community diversity and composition (Heino et al., 2017; Herault & Honnay, 2005; Hoeinghaus et al., 2007; Souffreau et al., 2015; Van der Gucht et al., 2007; Vyverman et al., 2007). When significant, these historic effects are largely attributed to turnover in regional species pools, high levels of dispersal limitation, or colonization-extinction dynamics (e.g. Herault & Honnay, 2005; Hoeinghaus et al., 2007; Vyverman et al., 2007). Whereas these processes typically occur over long temporal scales, other research has indicated that historical contingency through priority effects can also influence community composition on shorter temporal scales (Chase, 2003; Fukami, 2015). In our study system, it is unlikely that regional turnover or dispersal limitation are structuring community composition as most of the sampled species have distributions that span the study area, and we limited the investigation to a single biogeographical region (Figure 1). Additionally, we found no evidence for dispersal limitation given that the spatial variables were non-significant predictors. Rather, it may be that long-term extinction-recolonization dynamics and priority effects following historic eustatic sea-level fluctuations are the factors creating the split in community composition between the Río Frio and Río San Carlos sub-basins.

Sea water is an effective barrier for freshwater organisms; therefore eustatic sea level change can affect genetic diversity in freshwater organisms by isolating basins during high stands or connecting drainages through river coalescence during low stands (Bagley et al., 2013; Bagley & Johnson, 2014a; Jones & Johnson, 2009; Unmack et al., 2012, 2013). For example, genetic diversification in Xenophallus umbratilis within our study area shows striking concordance with marine inundation events over the last 5 million years, suggesting that extinction in inundated areas was followed by recolonization and diversification (Jones & Johnson, 2009). For our three focal species, Bagley and Johnson (2014a) suggested that a combination of climatic sea level change and geologic change associated with the volcanic rise of the central cordillera were the mechanisms that likely created the shared genetic subdivisions between the Río Frio and Río San Carlos drainages. Though it cannot be determined with certainty, our results suggest that these same mechanisms have had lasting effects on community composition. Our study area has had a complex geologic history and was inundated by marine incursions several times during the Cenozoic, which would have resulted in extinction of freshwater taxa (Bagley & Johnson, 2014b; Coates & Obando, 1996). During times of low sea level, fish would then recolonize the area from refugia to the north or south of the Nicaraguan Depression. Though not perfectly congruent, our analyses indicate that most species with higher incidences in the Río San Carlos group or Cluster One of the hierarchical cluster analysis likely originated from southern refugia and diversified from their sister taxa to the south, whereas the opposite is true for species with higher incidences in the Río Frio group and Cluster Two (Figures 2, 3, 4, D-1; Table D-1). Our results align well with other studies which have discovered a complex colonization history of the region with different species colonizing at different times and from different sources (Table D-1; Reeves & Bermingham, 2006; Reznick et al., 2017; Říčan et al., 2013). Our results indicate that this complex colonization history has had lasting effects on the likelihood of species incidences within local communities.

Though our results indicate that history is the dominant factor governing community composition in our study area, we stress that other biotic and abiotic variables are still important factors for controlling community structure. Our sampled communities lie in a relatively small portion of the Nicaraguan Depression. Had we sampled a larger geographic extent, the effects of environmental filtering, dispersal limitation, or biotic interactions would likely have been much more significant (Leibold & Chase, 2018; Viana & Chase, 2019). Even so, the importance of historical processes for the composition of local communities within so small an extent is remarkable, given that traditional biogeographic studies claim that history is much more likely to be influential at large spatial scales (Jenkins & Ricklefs, 2011; Ricklefs & Jenkins, 2011). Although the signal of historic processes will be apparent between biogeographic provinces or regions, we have shown that history can also be important within a province or region. Our samples all fall within the San Juan Ichthyological Province (Bussing, 1976; S. A. Smith & Bermingham, 2005); however, by using local community data rather than basin-wide data, we found evidence of micro-structure that was missed in these previous studies. The sampled species occur in both clusters and can be found throughout the region, but our results indicate that the number or likelihood of species incidences within local communities varies between clusters or groups because of historical colonization dynamics. This suggests that species which colonized the different portions of the region first following basin reconnection exhibit a sort of long-term, large-scale priority effect. Initial colonizers did not completely exclude later colonizers; however, initial colonizers are more likely to be found in a larger portion of local communities than later colonizers.

Integrating Genetic, Community, and Biogeographic Information

A growing number of researchers have called for greater integration between ecology, biogeography, population genetics, and phylogenetics (Cavender-Bares et al., 2009; Hickerson et al., 2010; Jenkins & Ricklefs, 2011; Ricklefs & Jenkins, 2011; Vellend, 2010, 2016; Wiens, 2011). With large data sets and genomic information becoming more readily available, this integration is becoming much more feasible. For example, metacommunity ecology adds aspects of biogeography to community ecology by explicitly incorporating spatial scale and dispersal into investigations of community structure (Holyoak et al., 2005; Leibold & Chase, 2018; Leibold et al., 2004). Phylogenetic methods have even begun to be incorporated into metacommunity analyses (e.g. Borregaard et al., 2014; Leibold et al., 2010; Pillar & Duarte, 2010). These studies all used community-wide phylogenies; in contrast, we show that intraspecific genetic variation can also be used to predict variation in community composition and help infer the processes governing community structure. By incorporating phylogeographic information as a proxy for historical events into traditional analyses of metacommunity ecology (i.e., variation partitioning), we were able to more effectively infer the factors that structure community composition. Without the inclusion of historic variables, we would have been unable to detect any significant structure and would have wrongly concluded that community composition was randomly determined.
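To make the arithmetic behind variation partitioning concrete, the following is a minimal sketch for two sets of predictors. It assumes the standard approach of subtracting adjusted R² values from models fitted with each set alone and with both together. The function names and all input values are hypothetical and are not results from this study.

```python
def adjusted_r2(r2, n_sites, n_predictors):
    """Ezekiel's adjustment of R^2 for sample size and number of predictors."""
    return 1.0 - (1.0 - r2) * (n_sites - 1) / (n_sites - n_predictors - 1)


def partition_variation(r2_a, p_a, r2_b, p_b, r2_ab, n_sites):
    """Split explained variation between predictor sets A and B.

    r2_a, r2_b : unadjusted R^2 of models fitted with A alone and B alone
    r2_ab      : unadjusted R^2 of the model fitted with A and B together
    p_a, p_b   : number of predictors in each set
    """
    adj_a = adjusted_r2(r2_a, n_sites, p_a)
    adj_b = adjusted_r2(r2_b, n_sites, p_b)
    adj_ab = adjusted_r2(r2_ab, n_sites, p_a + p_b)
    return {
        "unique_a": adj_ab - adj_b,       # explained by A after accounting for B
        "unique_b": adj_ab - adj_a,       # explained by B after accounting for A
        "shared": adj_a + adj_b - adj_ab, # jointly explained by A and B
        "residual": 1.0 - adj_ab,         # unexplained
    }


# Hypothetical inputs: A = contemporary environment, B = genetic (historic) groups.
fractions = partition_variation(r2_a=0.30, p_a=2, r2_b=0.20, p_b=1,
                                r2_ab=0.40, n_sites=21)
for name, value in fractions.items():
    print(f"{name}: {value:.3f}")
```

The four fractions always sum to one, and the shared fraction can be negative when the two predictor sets suppress one another.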

We propose that species histories should be included in analyses of community ecology to determine the effects of local, regional, and historic factors on community structure more accurately. Phylogeographic methods are particularly well suited to these types of studies because they investigate historic effects on intraspecific genetic diversification and are therefore appropriate for use on the same spatial scale as metacommunity ecology. Even with our small genetic dataset (a single mitochondrial gene for three species), combining it with biogeographic information allowed us to infer how historical processes structured communities. However, our methods are still largely qualitative, and more statistically robust techniques are needed for better inference. Larger datasets with more loci could be used to more explicitly explore patterns of colonization and extinction in our study area using a mixture of species ecological modeling and model choice with Approximate Bayesian Computation (e.g. Ortego & Knowles, 2020; Papadopoulou & Knowles, 2016). Alternatively, new methods of simulating metacommunities by incorporating population genetics techniques have recently been developed (Overcast et al., 2019). These types of models show promise because they could also be used in conjunction with Approximate Bayesian Computation to estimate how communities assemble under different historic or biogeographic scenarios (Williams & Johnson, personal communication). The integration of population genetics, phylogeography, community ecology, and biogeography shows exciting promise for the elucidation of community assembly in time and space. As demonstrated, this integration can be particularly useful for uncovering the effects of local and historical factors on community composition.
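As an illustration of how model choice with Approximate Bayesian Computation works in principle, the sketch below uses simple rejection sampling to compare two toy assembly scenarios. Everything in it is an assumption made for illustration: the scenario names, the uniform priors, the binomial simulator, the summary statistic (number of species shared between two basins), and the observed value. It is not the method of any of the cited studies.

```python
import random


def simulate_shared_species(p_shared, n_species, rng):
    """Number of species (out of n_species) found in both basins."""
    return sum(rng.random() < p_shared for _ in range(n_species))


def abc_model_choice(observed, n_species, n_sims, tolerance, seed=1):
    """Rejection ABC: estimate posterior probabilities of two toy scenarios."""
    rng = random.Random(seed)
    # Hypothetical priors on the per-species probability of being shared.
    priors = {
        "historic_connection": (0.6, 1.0),  # past dispersal homogenized basins
        "long_isolation": (0.0, 0.4),       # basins assembled independently
    }
    accepted = {name: 0 for name in priors}
    for _ in range(n_sims):
        model = rng.choice(list(priors))            # equal prior model weights
        low, high = priors[model]
        p_shared = rng.uniform(low, high)           # draw parameter from prior
        simulated = simulate_shared_species(p_shared, n_species, rng)
        if abs(simulated - observed) <= tolerance:  # keep close simulations
            accepted[model] += 1
    total = sum(accepted.values())
    if total == 0:
        raise ValueError("no simulations accepted; increase tolerance or n_sims")
    # Posterior model probability = share of accepted simulations per model.
    return {name: count / total for name, count in accepted.items()}


# Made-up observation: 16 of 20 species occur in both basins.
posterior = abc_model_choice(observed=16, n_species=20, n_sims=10000, tolerance=1)
print(posterior)
```

In practice the simulator would be a coalescent or metacommunity model and the summary statistics would be multivariate, but the accept-or-reject logic is the same.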

ACKNOWLEDGMENTS

We appreciate the support of Javier Guevara Siquiera, Lourdes Vargas Fallas, and Sandra Díaz Alvarado at the Vida Silvestre, Ministerio del Ambiente y Energía (MINAE), Sistema Nacional de Áreas de Conservación (SINAC), Costa Rica, who processed our collecting permits. Fish were collected under permit number R-SINAC-PNI-ACAHN-011-2019. We thank Kaitlyn and Spencer Golden for help collecting specimens. Funding for this research was provided by the Department of Biology and Graduate Studies at Brigham Young University.

REFERENCES

Anderson, M. J. (2001). A new method for non-parametric multivariate analysis of variance.

Austral Ecology, 26(1), 32–46. https://doi.org/10.1111/j.1442-9993.2001.01070.pp.x

Angulo, A., & Gracian-Negrete, J. M. (2013). A new species of Brycon (Characiformes:

Characidae) from Nicaragua and Costa Rica, with a key to the lower Mesoamerican species

of the genus. Zootaxa, 3731(2), 255–266. https://doi.org/10.11646/zootaxa.3731.2.6

Angulo, A., Santos, A. C., López, M., Langeani, F., & McMahan, C. D. (2018). A new species of

Astyanax (Characiformes: Characidae) from Costa Rica and Panama, with a key to the

lower Central American species of the genus. Journal of Fish Biology, 92(6), 1866–1887.

https://doi.org/10.1111/jfb.13626

Bagley, J. C., & Johnson, J. B. (2014a). Testing for shared biogeographic history in the lower

Central American freshwater fish assemblage using comparative phylogeography:

concerted, independent, or multiple evolutionary responses? Ecology and Evolution, 4(9),

1686–1705. https://doi.org/10.1002/ece3.1058

Bagley, J. C., & Johnson, J. B. (2014b). Phylogeography and biogeography of the lower Central

American Neotropics: diversification between two continents and between two seas.

Biological Reviews, 89(4), 767–790. https://doi.org/10.1111/brv.12076

Bagley, J. C., Sandel, M., Travis, J., Lozano-Vilano, M. de L., & Johnson, J. B. (2013).

Paleoclimatic modeling and phylogeography of least killifish, Heterandria formosa: insights

into Pleistocene expansion-contraction dynamics and evolutionary history of North

American Coastal Plain freshwater biota. BMC Evolutionary Biology, 13, 223.

https://doi.org/10.1186/1471-2148-13-223

Blanchet, F. G., Legendre, P., & Borcard, D. (2008). Forward selection of explanatory variables.

Ecology, 89(9), 2623–2632. https://doi.org/10.1890/07-0986.1

Bonada, N., Múrria, C., Zamora-Muñoz, C., El Alami, M., Poquet, J. M., Puntí, T., Moreno, J.

L., Bennas, N., Alba-Tercedor, J., Ribera, C., & Prat, N. (2009). Using community and

population approaches to understand how contemporary and historical factors have shaped

species distribution in river ecosystems. Global Ecology and Biogeography, 18(2), 202–

213. https://doi.org/10.1111/j.1466-8238.2008.00434.x

Borcard, D., & Legendre, P. (2002). All-scale spatial analysis of ecological data by means of

principal coordinates of neighbour matrices. Ecological Modelling, 153(1–2), 51–68.

https://doi.org/10.1016/s0304-3800(01)00501-4

Borregaard, M. K., Rahbek, C., Fjeldså, J., Parra, J. L., Whittaker, R. J., & Graham, C. H.

(2014). Node-based analysis of species distributions. Methods in Ecology and Evolution,

5(11), 1225–1235. https://doi.org/10.1111/2041-210x.12283

Bussing, W. A. (1976). Geographic distribution of the San Juan ichthyofauna of Central America

with remarks on its origin and ecology. In T. B. Thorson (Ed.), Investigations of the

Ichthyofauna of Nicaraguan Lakes (pp. 156–175). School of Life Sciences, University of

Nebraska Lincoln.

Bussing, W. A. (1998). Peces de las Aguas Continentales de Costa Rica (2nd ed.). Editorial de la

Universidad de Costa Rica.

Cavender-Bares, J., Kozak, K. H., Fine, P. V. A., & Kembel, S. W. (2009). The merging of

community ecology and phylogenetic biology. Ecology Letters, 12(7), 693–715.

https://doi.org/10.1111/j.1461-0248.2009.01314.x

Cavender‐Bares, J., González‐Rodríguez, A., Eaton, D. A. R., Hipp, A. A. L., Beulke, A., &

Manos, P. S. (2015). Phylogeny and biogeography of the American live oaks (Quercus

subsection Virentes): a genomic and population genetics approach. Molecular Ecology,

24(14), 3668–3687. https://doi.org/10.1111/mec.13269

Chan, Y. L., Schanzenbach, D., & Hickerson, M. J. (2014). Detecting Concerted Demographic

Response across Community Assemblages Using Hierarchical Approximate Bayesian

Computation. Molecular Biology and Evolution, 31(9), 2501–2515.

https://doi.org/10.1093/molbev/msu187

Chase, J. M. (2003). Community assembly: when should history matter? Oecologia, 136(4),

489–498. https://doi.org/10.2307/4223703

Coates, A. G., & Obando, J. A. (1996). The geologic evolution of the Central American isthmus.

In J. B. C. Jackson, A. F. Budd, & A. G. Coates (Eds.), Evolution and Environment in

Tropical America (pp. 21–56). The University of Chicago Press.

Cottenie, K. (2005). Integrating environmental and spatial processes in ecological community

dynamics. Ecology Letters, 8(11), 1175–1182. https://doi.org/10.1111/j.1461-0248.2005.00820.x

D’Horta, F. M., Cuervo, A. M., Ribas, C. C., Brumfield, R. T., & Miyaki, C. Y. (2013).

Phylogeny and comparative phylogeography of Sclerurus (Aves: Furnariidae) reveal constant

and cryptic diversification in an old radiation of rain forest understorey specialists. Journal

of Biogeography, 40(1), 37–49. https://doi.org/10.1111/j.1365-2699.2012.02760.x

Daza, J. M., Castoe, T. A., & Parkinson, C. L. (2010). Using regional comparative

phylogeographic data from snake lineages to infer historical processes in Middle America.

Ecography, no-no. https://doi.org/10.1111/j.1600-0587.2010.06281.x

Domisch, S., Amatulli, G., & Jetz, W. (2015). Near-global freshwater-specific environmental

variables for biodiversity analyses in 1 km resolution. Scientific Data, 2, 150073.

https://doi.org/10.1038/sdata.2015.73

Duennes, M. A., Petranek, C., De Bonilla, E. P. D., Mérida-Rivas, J., Martinez-López, O., Sagot,

P., Vandame, R., & Cameron, S. A. (2017). Population genetics and geometric

morphometrics of the Bombus ephippiatus species complex with implications for its use as

a commercial pollinator. Conservation Genetics, 18(3), 553–572.

https://doi.org/10.1007/s10592-016-0903-9

Dupanloup, I., Schneider, S., & Excoffier, L. (2002). A simulated annealing approach to define

the genetic structure of populations. Molecular Ecology, 11(12), 2571–2581.

https://doi.org/10.1046/j.1365-294x.2002.01650.x

Eldon, J., Price, J. P., Magnacca, K., & Price, D. K. (2013). Patterns and processes in complex

landscapes: testing alternative biogeographical hypotheses through integrated analysis of

phylogeography and community ecology in Hawai’i. Molecular Ecology, 22(13), 3613–

3628. https://doi.org/10.1111/mec.12326

Farr, T. G., Rosen, P. A., Caro, E., Crippen, R., Duren, R., Hensley, S., Kobrick, M., Paller, M.,

Rodriguez, E., Roth, L., Seal, D., Shaffer, S., Shimada, J., Umland, J., Werner, M., Oskin,

M., Burbank, D., & Alsdorf, D. (2007). The shuttle radar topography mission. Reviews of

Geophysics, 45(2), 1–33. https://doi.org/10.1029/2005RG000183

Fukami, T. (2015). Historical Contingency in Community Assembly: Integrating Niches, Species

Pools, and Priority Effects. Annual Review of Ecology, Evolution, and Systematics, 46(1),

1–23. https://doi.org/10.1146/annurev-ecolsys-110411-160340

Funk, J., Mann, P., McIntosh, K., & Stephens, J. (2009). Cenozoic tectonics of the Nicaraguan

depression, Nicaragua, and Median trough, El Salvador, based on seismic-reflection

profiling and remote-sensing data. Bulletin of the Geological Society of America, 121(11–

12), 1491–1521. https://doi.org/10.1130/B26428.1

Giam, X., & Olden, J. D. (2016). Environment and predation govern fish community assembly in

temperate streams. Global Ecology and Biogeography, 25(10), 1194–1205.

https://doi.org/10.1111/geb.12475

Gonçalves-Souza, T., Romero, G. Q., & Cottenie, K. (2014). Metacommunity versus

Biogeography: A Case Study of Two Groups of Neotropical Vegetation-Dwelling

Arthropods. PLoS ONE, 9(12), e115137. https://doi.org/10.1371/journal.pone.0115137

Hardy, D. K., González-Cózatl, F. X., Arellano, E., & Rogers, D. S. (2013). Molecular

phylogenetics and phylogeographic structure of Sumichrast’s harvest mouse

(Reithrodontomys sumichrasti: Cricetidae) based on mitochondrial and nuclear DNA

sequences. Molecular Phylogenetics and Evolution, 68(2), 282–292.

https://doi.org/10.1016/j.ympev.2013.03.028

Heino, J., Soininen, J., Alahuhta, J., Lappalainen, J., & Virtanen, R. (2017). Metacommunity

ecology meets biogeography: effects of geographical region, spatial dynamics and

environmental filtering on community structure in aquatic organisms. Oecologia, 183(1),

121–137. https://doi.org/10.1007/s00442-016-3750-y

Herault, B., & Honnay, O. (2005). The relative importance of local, regional and historical

factors determining the distribution of plants in fragmented riverine forests: an emergent

group approach. Journal of Biogeography, 32(12), 2069–2081.

https://doi.org/10.1111/j.1365-2699.2005.01351.x

Hickerson, M. J., Carstens, B. C., Cavender-Bares, J., Crandall, K. A., Graham, C. H., Johnson,

J. B., Rissler, L., Victoriano, P. F., & Yoder, A. D. (2010). Phylogeography’s past, present,

and future: 10 years after Avise, 2000. Molecular Phylogenetics and Evolution, 54(1), 291–

301. https://doi.org/10.1016/j.ympev.2009.09.016

Hijmans, R. J. (2020). raster: Geographic Data Analysis and Modeling. https://cran.r-project.org/package=raster

Hoeinghaus, D. J., Winemiller, K. O., & Birnbaum, J. S. (2007). Local and

regional determinants of stream fish assemblage structure: inferences based on taxonomic

vs. functional groups. Journal of Biogeography, 34, 324–338.

http://www.jstor.org/stable/4125126

Holyoak, M., Leibold, M. A., Mouquet, N., Holt, R. D., & Hoopes, M. F. (2005).

Metacommunities: A framework for large-scale community ecology. In M. Holyoak, M. A.

Leibold, & R. D. Holt (Eds.), Metacommunities: Spatial Dynamics and Ecological

Communities (pp. 1–31). University of Chicago Press.

Hubbs, C. L., & Miller, R. R. (1948). The Zoological Evidence: Correlation between Fish

Distribution and Hydrographic History in the Desert Basins of Western United States. In

The Great Basin: With Emphasis on Glacial and Postglacial Times (pp. 18–166).

University of Utah.

Jackson, D. A., Peres-Neto, P. R., & Olden, J. D. (2001). What controls who is where in

freshwater fish communities — the roles of biotic, abiotic, and spatial factors. Canadian

Journal of Fisheries and Aquatic Sciences, 58, 157–170. https://doi.org/10.1139/cjfas-58-1-157

Jenkins, D. G., & Ricklefs, R. E. (2011). Biogeography and ecology: two views of one world.

Philosophical Transactions: Biological Sciences, 366(1576), 2331–2335.

https://doi.org/10.2307/23035484

Jiménez, R. A., & Ornelas, J. F. (2015). Historical and current introgression in a Mesoamerican

hummingbird species complex: a biogeographic perspective. PeerJ, 4, e1556.

https://doi.org/10.7717/peerj.1556

Jones, C. P., & Johnson, J. B. (2009). Phylogeography of the livebearer Xenophallus umbratilis

(Teleostei: Poeciliidae): glacial cycles and sea level change predict diversification of a

freshwater tropical fish. Molecular Ecology, 18(8), 1640–1653.

https://doi.org/10.1111/j.1365-294X.2009.04129.x

Legendre, P., Oksanen, J., & Ter Braak, C. J. F. (2011). Testing the significance of canonical

axes in redundancy analysis. Methods in Ecology and Evolution, 2(3), 269–277.

https://doi.org/10.1111/j.2041-210x.2010.00078.x

Lehner, B., Verdin, K., & Jarvis, A. (2008). New Global Hydrography Derived From Spaceborne

Elevation Data. Eos, Transactions American Geophysical Union, 89(10), 93.

https://doi.org/10.1029/2008eo100001

Leibold, M. A., & Chase, J. M. (2018). Metacommunity Ecology. Princeton University

Press.

Leibold, M. A., Economo, E. P., & Peres-Neto, P. R. (2010). Metacommunity

phylogenetics: separating the roles of environmental filters and historical biogeography.

Ecology Letters, 13(10), 1290–1299. https://doi.org/10.1111/j.1461-0248.2010.01523.x

Leibold, M. A., Holyoak, M., Mouquet, N., Amarasekare, P., Chase, J. M., Hoopes, M. F.,

Holt, R. D., Shurin, J. B., Law, R., Tilman, D., Loreau, M., & Gonzalez, A. (2004). The

metacommunity concept: A framework for multi-scale community ecology. Ecology

Letters, 7(7), 601–613. https://doi.org/10.1111/j.1461-0248.2004.00608.x

Loreau, M. (2009). Communities and Ecosystems. In S. A. Levin, S. R. Carpenter, H. C. J.

Godfrey, A. P. Kinzig, M. Loreau, J. B. Losos, B. Walker, & D. S. Wilcove (Eds.), The

Princeton Guide to Ecology (pp. 253–255). Princeton University Press.

Matamoros, W. A., Chakrabarty, P., Angulo, A., Garita-Alvarado, C. A., & McMahan, C. D.

(2013). A new species of Roeboides (Teleostei: Characidae) from Costa Rica and Panama,

with a key to the middle American species of the genus. Neotropical Ichthyology, 11(2),

285–290. https://doi.org/10.1590/s1679-62252013000200006

Oksanen, J., Blanchet, F. G., Friendly, M., Kindt, R., Legendre, P., McGlinn, D., Minchin, P. R.,

O’Hara, R. B., Simpson, G. L., Solymos, P., Stevens, M. H. H., Szoecs, E., & Wagner, H.

(2020). vegan: Community Ecology Package. https://cran.r-project.org/package=vegan

Ortego, J., & Knowles, L. L. (2020). Incorporating interspecific interactions into

phylogeographic models: A case study with Californian oaks. Molecular Ecology, 29(23),

4510–4524. https://doi.org/10.1111/mec.15548

Overcast, I., Emerson, B. C., & Hickerson, M. J. (2019). An integrated model of population

genetics and community ecology. Journal of Biogeography, 46(4), 816–829.

https://doi.org/10.1111/jbi.13541

Papadopoulou, A., & Knowles, L. L. (2016). Toward a paradigm shift in comparative

phylogeography driven by trait-based hypotheses. Proceedings of the National Academy of

Sciences, 113(29), 8018–8024. https://doi.org/10.1073/pnas.1601069113

Patten, M. A., & Smith-Patten, B. D. (2008). Biogeographical boundaries and Monmonier’s

algorithm: a case study in the northern Neotropics. Journal of Biogeography, 35(3), 407–

416. https://doi.org/10.1111/j.1365-2699.2007.01831.x

Peres-Neto, P. R. (2004). Patterns in the co-occurrence of fish species in streams: the role of site

suitability, morphology and phylogeny versus species interactions. Oecologia, 140(2), 352–

360. http://www.jstor.org/stable/40005673

Pillar, V. D., & Duarte, L. D. S. (2010). A framework for metacommunity analysis of

phylogenetic structure. Ecology Letters, 13(5), 587–596. https://doi.org/10.1111/j.1461-0248.2010.01456.x

Prieto‐Torres, D. A., Rojas‐Soto, O. R., Bonaccorso, E., Santiago‐Alarcon, D., & Navarro‐

Sigüenza, A. G. (2019). Distributional patterns of Neotropical seasonally dry forest birds: a

biogeographical regionalization. Cladistics, 35(4), 446–460.

https://doi.org/10.1111/cla.12366

R Core Team. (2020). R: A Language and Environment for Statistical Computing. https://www.r-project.org/

Reeves, R. G., & Bermingham, E. (2006). Colonization, population expansion, and lineage

turnover: phylogeography of Mesoamerican characiform fish. Biological Journal of the

Linnean Society, 88(2), 235–255. https://doi.org/10.1111/j.1095-8312.2006.00619.x

Reznick, D. N., Furness, A. I., Meredith, R. W., & Springer, M. S. (2017). The origin and

biogeographic diversification of fishes in the family Poeciliidae. PLOS ONE, 12(3),

e0172546. https://doi.org/10.1371/journal.pone.0172546

Říčan, O., Piálek, L., Dragová, K., & Novák, J. (2016). Diversity and evolution of the Middle

American cichlid fishes (Teleostei: Cichlidae) with revised classification. Vertebrate

Zoology, 66(1), 1–102.

Říčan, O., Piálek, L., Zardoya, R., Doadrio, I., & Zrzavý, J. (2013). Biogeography of the

Mesoamerican Cichlidae (Teleostei: Heroini): Colonization through the GAARlandia land

bridge and early diversification. Journal of Biogeography, 40(3), 579–593.

https://doi.org/10.1111/jbi.12023

Říčan, O., Zardoya, R., & Doadrio, I. (2008). Phylogenetic relationships of Middle American cichlids

(Cichlidae, Heroini) based on combined evidence from nuclear genes, mtDNA, and

morphology. Molecular Phylogenetics and Evolution, 49(3), 941–957.

https://doi.org/10.1016/j.ympev.2008.07.022

Ricklefs, R. E. (1987). Community diversity: relative roles of local and regional processes.

Science, 235(4785), 167–171.

Ricklefs, R. E. (2004). A comprehensive framework for global patterns in biodiversity. Ecology

Letters, 7(1), 1–15. https://doi.org/10.1046/j.1461-0248.2003.00554.x

Ricklefs, R. E., & Jenkins, D. G. (2011). Biogeography and ecology: towards the integration of

two disciplines. Philosophical Transactions of the Royal Society B: Biological Sciences,

366(1576), 2438–2448. https://doi.org/10.1098/rstb.2011.0066

Ricklefs, R. E., & Schluter, D. (Eds.). (1993). Species Diversity in Ecological Communities:

Historical and Geographical Perspectives. University of Chicago Press.

Rocha-Méndez, A., Sánchez-González, L. A., González, C., & Navarro-Sigüenza, A. G. (2019).

The geography of evolutionary divergence in the highly endemic avifauna from the Sierra

Madre del Sur, Mexico. BMC Evolutionary Biology, 19(1). https://doi.org/10.1186/s12862-019-1564-3

Rodríguez-Correa, H., Oyama, K., Quesada, M., Fuchs, E. J., Quezada, M., Ferrufino, L.,

Valencia-Ávalos, S., Cascante-Marín, A., & González-Rodríguez, A. (2017). Complex

phylogeographic patterns indicate Central American origin of two widespread

Mesoamerican Quercus (Fagaceae) species. Tree Genetics & Genomes, 13(3).

https://doi.org/10.1007/s11295-017-1147-7

Schmitter-Soto, J. J. (2017). A revision of Astyanax (Characiformes: Characidae) in Central and

North America, with the description of nine new species. Journal of Natural History,

51(23–24), 1331–1424. https://doi.org/10.1080/00222933.2017.1324050

Smith, G. R., Dowling, T. E., Gobalet, K. W., Lugaski, T., Shiozawa, D. K., & Evans, R. P. (2002).

Biogeography and timing of evolutionary events among Great Basin fishes. In R. Hershler,

D. B. Madsen, & D. R. Currey (Eds.), Great Basin Aquatic Systems History (pp. 175–234).

Smithsonian Institution Press.

Smith, S. A., & Bermingham, E. (2005). The Biogeography of Lower Mesoamerican Freshwater

Fishes. Journal of Biogeography, 32(10), 1835–1854. http://www.jstor.org/stable/3566353

Souffreau, C., Van Der Gucht, K., Van Gremberghe, I., Kosten, S., Lacerot, G., Lobão, L. M.,

De Moraes Huszar, V. L., Roland, F., Jeppesen, E., Vyverman, W., & De Meester, L.

(2015). Environmental rather than spatial factors structure bacterioplankton communities in

shallow lakes along a > 6000 km latitudinal gradient in South America. Environmental

Microbiology, 17(7), 2336–2351. https://doi.org/10.1111/1462-2920.12692

Tyers, M. (2020). riverdist: River Network Distance Computation and Applications.

https://cran.r-project.org/package=riverdist

Unmack, P. J., Bagley, J. C., Adams, M., Hammer, M. P., & Johnson, J. B. (2012). Molecular

phylogeny and phylogeography of the Australian freshwater fish genus Galaxiella, with an

emphasis on dwarf galaxias (G. pusilla). PLoS ONE, 7(6).

https://doi.org/10.1371/journal.pone.0038433

Unmack, P. J., Hammer, M. P., Adams, M., Johnson, J. B., & Dowling, T. E. (2013). The role of

continental shelf width in determining freshwater phylogeographic patterns in south-eastern

Australian pygmy perches (Teleostei: Percichthyidae). Molecular Ecology, 22(6), 1683–

1699. https://doi.org/10.1111/mec.12204

Vamosi, S. M., Heard, S. B., Vamosi, J. C., & Webb, C. O. (2009). Emerging patterns in the

comparative analysis of phylogenetic community structure. Molecular Ecology, 18(4), 572–

592. https://doi.org/10.1111/j.1365-294x.2008.04001.x

Van der Gucht, K., Cottenie, K., Muylaert, K., Vloemans, N., Cousin, S., Declerck, S., Jeppesen,

E., Conde-Porcuna, J.-M., Schwenk, K., Zwart, G., Degans, H., Vyverman, W., & De

Meester, L. (2007). The power of species sorting: local factors drive bacterial community

composition over a wide range of spatial scales. Proceedings of the National Academy of

Sciences of the United States of America, 104(51), 20404–20409.

https://doi.org/10.2307/25450904

Vellend, M. (2010). Conceptual synthesis of community ecology. The Quarterly Review of

Biology, 85(2), 183–206. https://doi.org/10.1086/652373

Vellend, M. (2016). The Theory of Ecological Communities. Princeton University Press.

Vellend, M., & Geber, M. A. (2005). Connections between species diversity and genetic

diversity. Ecology Letters, 8(7), 767–781. https://doi.org/10.1111/j.1461-0248.2005.00775.x

Viana, D. S., & Chase, J. M. (2019). Spatial scale modulates the inference of metacommunity

assembly processes. Ecology, 100(2), e02576. https://doi.org/10.1002/ecy.2576

Vyverman, W., Verleyen, E., Sabbe, K., Vanhoutte, K., Sterken, M., Hodgson, D. A., Mann, D.

G., Juggins, S., Vijver, B. Van De, Jones, V., Flower, R., Roberts, D., Chepurnov, V. A.,

Kilroy, C., Vanormelingen, P., & Wever, A. De. (2007). Historical processes constrain

patterns in global diatom diversity. Ecology, 88(8), 1924–1931. https://doi.org/10.1890/06-1564.1

Webb, C., Ackerly, D. D., McPeek, M. A., & Donoghue, M. J. (2002). Phylogenies and

Community Ecology. Annual Review of Ecology and Systematics, 33, 475–505.

Weiher, E., Freund, D., Bunton, T., Stefanski, A., Lee, T., & Bentivenga, S. (2011). Advances,

challenges and a developing synthesis of ecological community assembly theory.

Philosophical Transactions of the Royal Society B: Biological Sciences, 366(1576), 2403–

2413. https://doi.org/10.1098/rstb.2011.0056

Wiens, J. J. (2011). The niche, biogeography, and species interactions. Philosophical

Transactions: Biological Sciences, 366(1576), 2336–2350.

https://doi.org/10.2307/23035485

Wiens, J. J., & Donoghue, M. J. (2004). Historical biogeography, ecology and species richness.

Trends in Ecology & Evolution, 19(12), 639–644. https://doi.org/10.1016/j.tree.2004.09.011

Xue, A. T., & Hickerson, M. J. (2015). The aggregate site frequency spectrum for comparative

population genomic inference. Molecular Ecology, 24, 6223–6240.

https://doi.org/10.1111/mec.13447

Table 1 Results of PERMANOVA analyses. Rows correspond to the predictor variables created by each of the three species SAMOVA groupings.

Species                   DF   F       p-value
Alfaro cultratus          1    4.374   0.007
Poecilia gillii           1    3.257   0.009
Xenophallus umbratilis    2    2.111   0.058

Table 2 Results of Redundancy Analysis for the different explanatory variable types. Starred p-values are significant.

Variable Type       Adjusted R²   F       p-value
Local Environment   0.181         1.319   0.127
Landscape           0.210         1.315   0.192
Spatial             0.029         1.039   0.459
Genetic Group       0.098         2.414   0.012*

Figure 1 Map showing the sampled localities overlaid on biogeographic provinces of freshwater fish in the Nicaraguan Depression in Costa Rica. Localities are colored according to the genetic break of Alfaro cultratus described in Bagley and Johnson (2014) between the Río Frio and Río San Carlos major drainage basins. Biogeographic province boundaries follow those described in Smith and Bermingham (2005) and Bussing (1998). Dotted lines indicate the approximate extent of the Nicaraguan Depression.

Figure 2 Non-metric multidimensional scaling of Bray-Curtis dissimilarities of species presence-absence. Dots represent communities and are colored according to the genetic groupings of each species, a) Alfaro cultratus, b) Poecilia gillii, c) Xenophallus umbratilis, used as predictors in the PERMANOVAs. Blue = Río Frio, Red = San Carlos/Low San Carlos, Green = High San Carlos. Shapes of points indicate the cluster that communities were placed in using hierarchical cluster analyses.

Figure 3 Comparison of genetic groupings to cluster analysis groupings. a) Dendrogram showing results of hierarchical cluster analysis on Bray-Curtis dissimilarities between communities. b) Map showing the spatial distribution of cluster analysis results. Dashed lines delineate the Nicaraguan Depression while the dotted line shows the split between the Río Frio and Río San Carlos Drainage basins following the genetic split in Alfaro cultratus.

Figure 4 NMDS plot of species scores overlaid on the community groupings according to genetics and hierarchical cluster analysis. Red and blue polygons mark the extent of the DNA groupings whereas dotted polygons show the extent of the results of the cluster analysis. Species abbreviations are as follows: Asep – Amatitlania septemfasciata, Asiq – Amatitlania siquia, Aors – Astyanax orstedii, Ahub – Atherinella hubbsi, Bhold – Brachyrhaphis holdridgei, Bpar – Brachyrhaphis parismina, Bcos – Brycon costaricensis, Bscl – Bryconamericus scleroparius, Calf – Cribroheros alfari, Cros – Cribroheros rostratus, Hnic – Hypsophrys nicaraguensis, Nnem – Neetroplus nematopus, Pdov – Parachromis dovii, Pman – Parachromis managuensis, Pama – Phallichthys amates, Pann – Priapichthys annectens, Rnic – Rhamdia nicaraguensis, Rbou – Roeboides bouchellei.

CHAPTER THREE

COMPARATIVE PHYLOGEOGRAPHY INFORMS COMMUNITY STRUCTURE AND

ASSEMBLY DURING AND AFTER HISTORIC LAKE BONNEVILLE

Trevor J. Williams1, Dennis K. Shiozawa1,2, Jerald B. Johnson1,2

1Department of Biology and Evolutionary Ecology Laboratories, Brigham Young University, Provo, UT

2Monte L. Bean Life Science Museum, Brigham Young University, Provo, UT 84602

Formatted in the style of the Biological Journal of the Linnean Society

ABSTRACT

Dispersal is one of the major processes controlling both genetic and species diversity and is frequently studied in both phylogeography and community ecology. As such, integrating these fields to uncover how both historic and contemporary dispersal have affected local community structure can provide greater insights into community assembly. We used comparative phylogeography to test whether freshwater fish species in the Bonneville Basin show evidence of recent dispersal, which would likely have occurred when the basin was inundated by Lake Bonneville in the late Pleistocene. We then used museum records to uncover patterns of contemporary community structure and relate them to the results of the phylogeographic analyses. We found evidence for recent dispersal throughout the Bonneville Basin in most of the fish species studied, which would have homogenized ancient communities. However, modern communities show evidence of non-random community structure and dispersal limitation between major sub-basins and habitats. Together, these results suggest that the Bonneville Basin fish fauna assembled through a combination of historic dispersal and contemporary habitat filtering and extinction dynamics following isolation. Further work should continue to combine different data sets to achieve more accurate inferences regarding community assembly.

INTRODUCTION

Dispersal, the movement of individuals or genes from one population or community to

another, is one of the major processes governing both genetic diversity in species (Wright, 1931;

Nielsen & Slatkin, 2013; Constable & McKane, 2014) and species diversity in communities

(Leibold et al., 2004; Vellend, 2010, 2016; Leibold & Chase, 2018). Early in the theoretical development of population genetics, dispersal (also called gene flow or migration) was recognized as a force that can homogenize diversity across subdivided populations (Wright,

1931; Slatkin, 1985). As such, intricate model-based approaches have been developed to allow researchers to directly estimate dispersal rates between populations (Marko & Hart, 2011). These methods have been used to investigate contemporary patterns of gene flow (e.g. Wilson and

Rannala 2003; Berry et al. 2004) as well as historical migration and colonization (Hey &

Nielsen, 2004; Hickerson & Meyer, 2008; Bagley & Johnson, 2014). Phylogeographical studies, in particular, have tried to elucidate how historical climatic or geological changes have influenced past patterns of gene flow and divergence (Hickerson et al., 2010). Such studies have helped researchers understand patterns of colonization and divergence during and after glacial cycles (Brown & Knowles, 2012; Bagley et al., 2013), in response to sea level change (Jones &

Johnson, 2009; Bagley & Johnson, 2014), and determine whether species show patterns of isolation, isolation with migration, or secondary contact (Pinho & Hey, 2010; Souza, Thomaz, &

Fagundes, 2020).

In contrast to population genetics, community ecology has only recently begun to investigate how dispersal influences community diversity. However, the advent of

metacommunity ecology has revealed the enormous impact that dispersal can have on community structure (Leibold, 2009). In general, dispersal increases alpha diversity through the introduction of species from the regional species pool into local communities (Leibold, 2009;

Grainger & Gilbert, 2016). Conversely, dispersal decreases beta diversity through the homogenization of species composition between communities (Leibold, 2009; Grainger &

Gilbert, 2016; Vellend, 2016). Although we know that dispersal is important during community assembly, it is uncertain whether and how historical dispersal can have lasting effects on community structure. Unlike phylogeography, community ecology typically ignores historical processes when investigating diversity patterns (Ricklefs, 2004). Priority effects, the effects of colonization sequence during assembly, are an exception to this rule, as several studies have shown that the order of species colonization can impact subsequent community structure (reviewed in

Fukami, 2015). However, most studies still consider historical contingency on ecological timescales (Grman & Suding 2010; Fukami 2015; but see Seehausen 2007), although it is known that both contemporary and historical factors can influence community assembly (Ricklefs, 1987;

Ricklefs & Schluter, 1993). As such, it is likely that historical migration can influence contemporary communities by altering the regional species pool. Studies that combine phylogeographic data with local community data could inform the degree to which historic dispersal has lasting effects on the assembly of communities. It has been suggested that the incorporation of genetic analyses can help inform dispersal dynamics in metacommunities

(Heino et al., 2015); however, few studies have empirically combined genetic and metacommunity data in explicit tests of dispersal and community assembly (but see Eldon et al.

2013; Salces-Castellano et al. 2021). We need additional studies that test whether and how historic

dispersal influences contemporary community structure to determine whether past migration can

have lasting effects on community composition.

Herein, we combine genetic and local community data in freshwater fish in the

Bonneville Basin to test for evidence of historic migration and its lasting influence on

community structure. Specifically, we evaluate if fish species were able to move throughout the

Bonneville Basin during Pleistocene pluvial cycles and then show how this information can help

elucidate processes governing community assembly. Freshwater fish are perfect organisms to

investigate how historic dispersal influences contemporary community structure since their

distributional patterns are largely influenced by hydrologic history (Smith et al., 2002; Bagley &

Johnson, 2014). Furthermore, the hydrologic history of the Bonneville Basin during the

Pleistocene is relatively well known and can therefore inform hypotheses regarding fish movement and isolation during wet and dry cycles associated with climatic change.

For this study we had two objectives: 1) use genetic data on different fish species to test

for evidence of historic migration between now disconnected sub-basins of the Bonneville Basin and 2) investigate patterns of contemporary community structure and relate this to species migratory history. As such, we have four hypotheses for how history can influence community assembly in this system, depending on whether fish show evidence of historic migration and whether local communities display random or non-random structure. If fish were unable to historically migrate throughout the basin and local communities show random structure, this suggests that the Bonneville Basin fish fauna exhibit long-term homogeneous communities that are unaffected by historic contingency or spatial or environmental gradients. Alternatively, if local communities show non-random structure in conjunction with no historic migration, this suggests that communities assembled according to contemporary processes following expansion from refugia.

If migration occurred and communities show random structure, this suggests that stochastic

ecological drift is largely responsible for community structure where fish species composition

was determined by random extinction following basin disconnection. Finally, if we find evidence

for historic migration and non-random structure then we can infer that historic dispersal allowed

community homogenization during high water stands but that local communities are governed by

deterministic contemporary processes.

METHODS

Study System

The Bonneville Basin is an internal drainage system at the eastern margin of the Great

Basin that extends throughout much of Utah and partially into Wyoming, Nevada, and Idaho

(Figure 1). During Pleistocene pluvial cycles, the Bonneville Basin was periodically inundated

by large lakes that connected previously (and currently) disjoint sub-basins (Oviatt & Currey,

1987; Oviatt, 1997; Reheis et al., 2014). The most recent of these large pluvial lakes to cover the

Bonneville Basin was Lake Bonneville, which persisted from approximately 30,000 to 12,000

years ago (Oviatt, Currey, & Sack, 1992; Oviatt, 2015; Oviatt & Shroder Jr., 2016). At its

highest level (termed the Bonneville Shoreline), Lake Bonneville covered an estimated 51,000 km²

(Currey 1990; Grayson 2011; Figure 1) before exceeding the lowest point on its northern rim, resulting in a catastrophic overflow into the Snake River Basin called the Bonneville Flood

(Oviatt, 2015; Oviatt & Shroder Jr., 2016). After this period, Lake Bonneville receded to the

Provo Shoreline level, which still covered approximately 38,000 km² and was comparable in size

to other modern large lakes (Miller, 2016). Both the Bonneville and Provo periods of Lake

Bonneville connected several closed sub-basins, the largest three being the Great Salt Lake Basin

(GSL), Great Salt Lake Desert Basin (GSLD), and Sevier Basin (SEV; Figure 1; Sack 2002). As

lake levels that connected these three basins persisted for several thousand years (Oviatt, 2015;

Miller, 2016), it is possible that aquatic organisms were able to migrate between them during this time (Hubbs & Miller, 1948; Brown, 1971).

Studies on Bonneville Basin aquatic organisms frequently assume that Lake Bonneville homogenized populations and communities across the basin (Hubbs & Miller, 1948; Smith et al.,

2002). This hypothesis largely stems from the pioneering work of Carl Hubbs and Robert Rush

Miller on Great Basin fishes, particularly their monograph on the distribution of fish and hydrographic history in the Great Basin (Hubbs & Miller, 1948). Several studies seem to corroborate this hypothesis through evidence obtained from the fossil record, distributional patterns, or genetics (e.g. Johnson 2002; Broughton and Smith 2016) while others refute it (e.g.

Taylor and Bright 1987; Billman et al. 2010). As such, it is unclear exactly how these large pluvial lakes influenced the assembly of freshwater fish species into contemporary communities.

We do not know of any large-scale comparative study that has explicitly tested if fish species were able to migrate through Lake Bonneville and then related those findings to local community structure. As the entirety of the Bonneville Basin fish fauna pre-dates Lake Bonneville and was therefore affected by its hydrological variability (Broughton & Smith, 2016), the Bonneville

Basin is a perfect system to investigate the effects of history on local community organization.

Data Collection

Genetic Data

To investigate whether fish were able to migrate throughout the Bonneville Basin during pluvial high-water periods, we collected sequence data from the literature and the National

Center for Biotechnology Information’s (NCBI) GenBank database for species native to the

Bonneville Basin (see Table E-1 for accession numbers). Sequences came from a variety of

studies investigating the systematics, phylogeography, and evolutionary history of Great Basin fishes (Johnson & Jordan, 2000; Smith et al., 2002, 2017; Johnson, 2002; Crowley, 2004; Mock

& Miller, 2005; Miller, 2006; Mock et al., 2006; Mock & Bjerregaard, 2007; Smith & Dowling,

2008; Billman et al., 2010; Houston, Shiozawa, & Riddle, 2010; Unmack et al., 2014). Most datasets we found consisted of only one to a few mitochondrial genes. Although accurate estimation of parameters from complex models in a coalescent framework requires several loci

(Felsenstein, 2006) ideally from both nuclear and mitochondrial sources (Rubinoff & Holland,

2005), recent work has shown that mitochondrial DNA can be used to effectively select between different demographic histories with and without migration (Beerli & Palczewski, 2010; Souza et al., 2020). As the models we employed are relatively simple and the results of model selection analyses were consistent across different analytical frameworks, we are confident that the data we collected can accurately determine whether fish in the Bonneville Basin exhibit evidence of recent migration.

Prior to analyses, sequences were aligned using the default settings in MAFFT v 7.222

(Katoh & Standley, 2013). We used ModelTest-NG to find the best fit model of DNA substitution for each sequence alignment in each species keeping rate heterogeneity uniform

(Darriba et al., 2020). For species genotyped at multiple loci where the best model disagreed, we used the substitution model that was most consistently considered a good fit between different loci. Sequences for individuals genotyped at multiple loci were concatenated using MEGA X v

10.2.4, which was also used to estimate transition/transversion rates for each locus (Kumar et al.,

2018). Downstream analyses required a single value for the transition/transversion rate, so for datasets with multiple loci we used the average rate, except for Prosopium williamsoni, for which we used only the rate estimated for the ND2 locus, as the rate estimated for cytb was

unusually high. Additionally, we used the default value for Iotichthys phlegethontis as its rate

was also unusually high.

Community Data

For community analyses, we used museum records of fish specimens collected by Carl

Hubbs between 1915 and 1950, which were an integral dataset used in Hubbs and Miller (1948).

Carl Hubbs, in collaboration with John Snyder, Robert Miller, and his wife and children,

conducted extensive sampling expeditions throughout the Bonneville Basin in 1915, 1934, 1938,

and 1942 with the purpose of surveying and documenting fish distribution and habits (Hubbs &

Miller, 1948; Miller & Shor, 1997). We created a dataset of fish occurrence by extracting

information on fish collections from these surveys from the databases of the University of

Michigan Museum of Zoology, the Brigham Young University Monte L. Bean Life Science

Museum, and the Ichthyology Collection of the California Academy of Sciences either directly

from each collection’s database or from the Fishnet2 portal (www.fishnet2.org). From this

dataset, we extracted site locality information to assign each sampling locality to one of the three

major sub-basins and created a presence-absence matrix of native species for community analyses. To limit the effects of rare species, we removed all species found at only a single site

from the presence-absence matrix.
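The filtering step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure actually used for the museum records: the function name `presence_absence`, the `min_sites` parameter, and the toy records are hypothetical, and we assume that a site left with no retained species is dropped from the matrix.

```python
from collections import defaultdict

def presence_absence(records, min_sites=2):
    """Build a site-by-species presence-absence matrix from (site, species) records.

    Species recorded at fewer than `min_sites` distinct sites are dropped.
    Assumption: sites left without any retained species are dropped as well.
    """
    sites_by_species = defaultdict(set)
    for site, species in records:
        sites_by_species[species].add(site)

    # Keep only species found at enough distinct sites.
    kept = sorted(sp for sp, found in sites_by_species.items() if len(found) >= min_sites)
    kept_set = set(kept)

    # Keep only sites that still hold at least one retained species.
    sites = sorted({site for site, sp in records if sp in kept_set})

    matrix = [[1 if site in sites_by_species[sp] else 0 for sp in kept] for site in sites]
    return sites, kept, matrix

# Hypothetical records for illustration only (not the Hubbs collection data).
records = [
    ("site_A", "species_1"), ("site_A", "species_2"),
    ("site_B", "species_1"), ("site_B", "species_3"),
    ("site_C", "species_1"), ("site_C", "species_2"),
]
sites, species, matrix = presence_absence(records)
for site, row in zip(sites, matrix):
    print(site, row)
```

In this toy example, species_3 occurs at a single site and is therefore excluded, which mirrors the rare-species rule stated above.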

Data Analyses

Genetic Analyses

To test specific hypotheses of migration and colonization in Bonneville fishes, we

estimated parameters of effective population size, migration rate, and splitting times using

Bayesian inference and coalescent theory (Kingman, 1982a,b) under various models in the

program MIGRATE v 4.4.4 (Figure 2; Beerli and Felsenstein 2001; Beerli 2006). MIGRATE

estimates these parameters using Markov Chain Monte Carlo under user specified models of

demographic history and migration. These models can then be compared in a model selection

approach using Bayes Factors to estimate the best fit model matching observed genetic data

(Beerli & Palczewski, 2010). For each species, except Prosopium williamsoni, we ran four separate models detailing the history of population splitting between the three major sub-basins either with or without subsequent migration (Figure 2). Since Prosopium williamsoni is only native to GSL, we instead ran a migration model between each sampling locality vs. a no-migration model with no specific divergence history specified. For species sampled in only the

GSL and SEV basins, models contained a single split with one population being derived from the other either with or without subsequent migration between basins (Figure 2 A-D). Alternatively, for species that had genetic data sampled from all three basins, we ran models where species originated in either GSL or SEV with subsequent colonization through GSLD to the other basin

(Figure 2 E-H). These scenarios are the most plausible given the geologic and hydrologic history of the Bonneville Basin. Fossil and genetic evidence suggest that speciation of the Bonneville

Basin fish fauna predates Lake Bonneville but is younger than the Cache Valley Member of the

Salt Lake Formation which formed in the Miocene approximately 10 Ma (Smith et al., 2002;

Long et al., 2006; Broughton & Smith, 2016). Modern fish fauna likely colonized the Bonneville

Basin either through northern connections with the Western Snake River Plain sometime in the

Mio-Pliocene or ancient southern connections with the pre-modern Virgin River, though evidence also exists for some direct connections between the Lahontan and Bonneville Basins

(Smith et al., 2002; Billman et al., 2010; Houston et al., 2014; Broughton & Smith, 2016). As these connections occurred before Pleistocene pluvial cycles, it is likely that fish then either moved northward or southward through the Bonneville Basin using early pluvial lake

connections or through river capture followed by subsequent diversification in interglacial

periods and possible reconnection in later high-water episodes. If colonization occurred in this

manner through early pluvial connections we would expect the colonization history outlined in

Figure 2; however, direct connections between GSL and SEV are also possible (e.g. Oviatt 1987).
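The Bayes factor comparison of candidate models mentioned above can be illustrated with a short sketch. It assumes that each model run yields a log marginal likelihood and that all models have equal prior probability; the function name, the model labels, and the numerical values are hypothetical and are not results from this study.

```python
import math

def model_probabilities(log_marginal_likelihoods):
    """Return {model: (log Bayes factor vs. the best model, model probability)}.

    Assumption: every model has the same prior probability, so the model
    probability is its marginal likelihood divided by the sum over models.
    """
    best = max(log_marginal_likelihoods.values())
    # Subtract the best value before exponentiating to avoid numerical underflow.
    weights = {m: math.exp(l - best) for m, l in log_marginal_likelihoods.items()}
    total = sum(weights.values())
    return {
        m: (log_marginal_likelihoods[m] - best, weights[m] / total)
        for m in log_marginal_likelihoods
    }

# Hypothetical log marginal likelihoods for four candidate models.
example = {
    "GSL_origin_with_migration": -2100.0,
    "GSL_origin_no_migration": -2105.0,
    "SEV_origin_with_migration": -2102.0,
    "SEV_origin_no_migration": -2110.0,
}
ranked = sorted(model_probabilities(example).items(), key=lambda kv: -kv[1][1])
for model, (log_bf, prob) in ranked:
    print(f"{model}: log BF = {log_bf:.1f}, probability = {prob:.3f}")
```

The best-supported model has a log Bayes factor of zero by construction, and the remaining models are ranked by how far their log marginal likelihoods fall below it.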

For all MIGRATE runs we ran two replicates each with 10000 recorded steps and a sampling increment of 1000 for a total of 20 million parameter values sampled. We ran each replicate with 4 chains with temperature values being 1, 1.5, 3, and 1000000 and a burn-in of

1000 trees per chain. Starting parameter values were chosen from priors at random with the mutation model and transition/transversion rates set according to results of ModelTest-NG and

MEGA X analyses. All other settings, except those specifying the migration models and inheritance scalars, were kept as the default. We assessed convergence of model runs by checking the effective sample sizes for parameter values as well as posterior distributions. We attributed instances where the posteriors were multimodal but effective sample sizes were high to low power due to low numbers of loci or individuals (Figures E-1 – E-9).

Community Analyses

To investigate if communities showed evidence of random or non-random structure, we ran null model analyses testing for species associations using a co-occurrence framework

(Gotelli, 2000) as well as a nestedness framework (Patterson & Atmar, 1986). For co-occurrence analyses, we used the re-scaled C-score (Stone & Roberts, 1990; Gotelli & Ulrich, 2010) as the co-occurrence index and simulated 5000 null models with a 500-community burn-in. In all simulations we kept column and row sums fixed using the curveball algorithm (Strona et al.,

2014). We ran these analyses on the entire dataset as well as within each major sub-basin and used two-tailed p-values for significance testing. For nestedness analyses we used the NODF

metric (Almeida-Neto et al., 2008) to test for species nestedness, site nestedness, and total

nestedness under the same simulation settings as those used for the co-occurrence analyses. In

addition to these matrix wide approaches, we also wanted to investigate if individual species

pairs showed significant positive or negative associations (Veech, 2014). To accomplish this, we

used the probabilistic framework where observed species co-occurrences are compared to the

number of co-occurrences expected by chance (Veech, 2013).
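To make the co-occurrence null model concrete, the following sketch computes the plain C-score (the mean number of checkerboard units per species pair) and generates null matrices with curveball-style trades that keep row and column sums fixed. It is a simplified illustration rather than the 'vegan' implementation: the re-scaled C-score is not reproduced, only one trade is performed between sampled matrices, and the function names, default settings, and demo matrix are hypothetical.

```python
import random
from itertools import combinations

def c_score(matrix):
    """Mean checkerboard units per species pair (rows = sites, columns = species)."""
    n_species = len(matrix[0])
    cols = [{i for i, row in enumerate(matrix) if row[j]} for j in range(n_species)]
    units = []
    for a, b in combinations(cols, 2):
        shared = len(a & b)
        units.append((len(a) - shared) * (len(b) - shared))
    return sum(units) / len(units)

def curveball_trade(row_sets, rng):
    """One trade: two random sites reshuffle the species held by only one of them."""
    i, j = rng.sample(range(len(row_sets)), 2)
    a, b = row_sets[i], row_sets[j]
    shared = a & b
    n_only_a = len(a - b)
    tradable = sorted(a ^ b)  # species present in exactly one of the two sites
    rng.shuffle(tradable)
    # Each site gets back as many tradable species as it gave, so margins are kept.
    row_sets[i] = shared | set(tradable[:n_only_a])
    row_sets[j] = shared | set(tradable[n_only_a:])

def null_c_scores(matrix, n_sim=200, burn_in=50, seed=1):
    """C-scores of null matrices sampled from a chain of curveball trades."""
    rng = random.Random(seed)
    n_species = len(matrix[0])
    row_sets = [{j for j, v in enumerate(row) if v} for row in matrix]
    scores = []
    for step in range(burn_in + n_sim):
        curveball_trade(row_sets, rng)
        if step >= burn_in:
            sim = [[1 if j in s else 0 for j in range(n_species)] for s in row_sets]
            scores.append(c_score(sim))
    return scores

# Hypothetical presence-absence matrix (5 sites x 4 species).
demo = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
]
observed = c_score(demo)
null = null_c_scores(demo)
print("observed C-score:", observed)
print("mean null C-score:", sum(null) / len(null))
```

An observed score in either tail of the null distribution would indicate non-random structure, with high values pointing to segregation and low values to aggregation.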

We also ran a Permutational Multivariate Analysis of Variance (PERMANOVA;

Anderson 2001) to test for differences in community composition between sub-basins. We visualized PERMANOVA results using Nonmetric Multidimensional Scaling (Kruskal, 1964;

Minchin, 1987) on two dimensions with 20 random starts. For both PERMANOVA and NMDS we used Bray-Curtis dissimilarities. Finally, to test for evidence of dispersal limitation, we created spatial variables using Moran’s eigenvector maps (Dray, Legendre, & Peres-Neto, 2006) and used them as predictors in a Redundancy Analysis (RDA) on a Hellinger-transformed incidence matrix following the procedure in Legendre and Legendre (2012). We used the optimization method of Bauman et al. (2018) to select the best spatial weighting matrix with options for the B matrix being Delaunay triangulation, Gabriel’s graphs, and the minimum spanning tree and either no weights or linear weights. We tested for dispersal limitation on the basin-wide dataset as well as within each sub-basin. If spatial predictors could be constructed, the RDA was tested for significance using Monte Carlo permutation tests (Legendre, Oksanen, & Ter

Braak, 2011). We ran the null model analyses, the PERMANOVA, NMDS, and the RDA using the ‘vegan’ package in R v 4.0.3 (Oksanen et al., 2020; R Core Team, 2020). Creation of spatial variables was done using the ‘adespatial’ package (Dray et al., 2021) while probabilistic co-occurrence analyses were run using the ‘cooccur’ package (Griffith, Veech, & Marsh, 2016).
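For readers unfamiliar with the two transformations named above, the sketch below shows how a Bray-Curtis dissimilarity and a Hellinger transformation are computed on presence-absence data. It is an illustrative stand-alone version, not the 'vegan' code used in the analyses; the function names and example vectors are hypothetical, and sites are assumed to contain at least one species.

```python
import math

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum of absolute differences over sum of totals.

    For presence-absence vectors this equals the Sorensen dissimilarity.
    Assumption: the two sites are not both empty.
    """
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator

def hellinger(matrix):
    """Hellinger transformation: square root of each value divided by its row total."""
    transformed = []
    for row in matrix:
        total = sum(row)
        transformed.append([math.sqrt(v / total) if total else 0.0 for v in row])
    return transformed

# Hypothetical sites that share one of their two species.
site_1 = [1, 1, 0]
site_2 = [1, 0, 1]
print(bray_curtis(site_1, site_2))
print(hellinger([site_1, site_2]))
```

After the Hellinger transformation each row has unit Euclidean length, which is why the transformed matrix is suitable as the response in a linear method such as RDA.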

RESULTS

We were able to collect sequence data from nine species, all but one of which

showed evidence of recent inter-basinal migration (Tables 1, E-1; Figure 1). The model with expansion from SEV was the best fit for four species while three species showed evidence of expansion from GSL (Table 1). The only species that did not show evidence of recent migration between basins was Rhinichthys osculus, which matches results from previous research (Billman et al., 2010). Although the parameter effective sample sizes were high, many of the posterior distributions were very wide or multimodal (Figures E-1 – E-9). This is likely due to the low number of loci sampled within species.

Our database search resulted in a presence-absence matrix with 10 species distributed across 43 sites (Figure 1, Table E-2). Our community analyses indicated that communities are non-randomly structured across the Bonneville Basin with compositional patterns differing significantly between sub-basins (Table 2, PERMANOVA F value = 4.537, R² = 0.185, p-value

< 0.001). We also found significant patterns of nestedness across species, sites, and the entire

matrix (Figure 3, species p-value = 0.014, sites p-value < 0.001, whole matrix p-value < 0.001).

Differences in community composition between basins seem to be driven by habitat differences

between desert spring systems within GSL and GSLD and mountain streams in GSL and SEV

(Figures 3, 4). Desert springs were composed of a few generalist species such as Gila atraria,

Iotichthys phlegethontis, and Rhinichthys osculus, whereas mountain streams contained more

specialist species such as Catostomus discobolus, Rhinichthys cataractae, and Lepidomeda

aliciae (Figures 3, 4). Our pairwise species co-occurrence analyses corroborated this finding,

showing mostly positive relationships between mountain stream species (Figure 5). Although we

found significant inter-basinal differences, there was still considerable overlap in species

composition for some communities found in different sub-basins (Figure 4). Furthermore, we

found random community structure within sub-basins, contrasting with the results of the whole basin analysis (Table 2). Lastly, we found evidence for spatial structure and dispersal limitation for the whole basin dataset (F = 5.213, p-value < 0.001, adjusted R² = 0.474) as well as within

GSL (F = 5.666, p-value < 0.001, adjusted R² = 0.564), but no evidence for spatial structuring in

SEV or GSLD.

DISCUSSION

Historic Migration

Our results show that many fish species in the Bonneville Basin exhibit genetic diversity

indicative of recent migration between sub-basins following colonization and divergence (Table

1). Although we were unable to date instances of secondary contact, this migration most likely

occurred during high water periods when currently isolated sub-basins were connected by large

pluvial lakes, the most recent being Lake Bonneville. As each sub-basin within the Bonneville

Basin is an independent closed system (Sack, 2002), and as the native fish are obligate

freshwater species, the most parsimonious explanation for timing of recent migrations

throughout the basin is during the high water stand of Lake Bonneville. This inference is further

supported by fine scale phylogeographic analyses which show evidence of genetic structure

related to recent isolation (Johnson & Jordan, 2000; Johnson, 2002; Mock & Miller, 2005; Mock

et al., 2006). Furthermore, fossil records throughout the Bonneville Basin, in combination with

DNA evidence, indicate that many of our studied species were present in some capacity within

the lake and that some, such as Gila atraria, were widespread (Broughton & Smith, 2016). When taken all together, our analyses and previous work suggest that a majority of the Bonneville

Basin freshwater fish taxa were able to migrate throughout the Bonneville Basin during Lake

Bonneville. We encourage future researchers to gather additional genetic data to further corroborate or refute this assessment.

Did History Influence Modern Local Communities?

Our results showing historical migration for many of the Bonneville Basin species suggest that Lake Bonneville homogenized the regional species pool throughout the basin.

However, we found strong levels of community structure between the closed sub-basins (Table

2, Figure 4). This structure was largely a result of the differences between species-poor communities found in the desert springs of GSL and GSLD and the species-rich communities closer to the Wasatch Mountains in GSL and SEV. We found significant levels of nested structure within our presence-absence matrix, which can be created by a variety of mechanisms

(Ulrich, Almeida-Neto, & Gotelli, 2009). Our analysis of pairwise species associations showed mostly positive interactions between specialist or stream adapted species, with only a few negative associations between the generalist Gila atraria and two other species (Figure 5). These analyses suggest that the nested structure was likely caused by a combination of habitat heterogeneity, selective environmental tolerances, and selective colonization and extinction

(Ulrich et al., 2009). Species-poor sites were found in relatively small and potentially harsh desert spring systems and contained generalist cyprinid species (Figure 3). Alternatively, species-rich sites were either in larger streams or ponds and contained a higher proportion of specialist species (Figure 3). In addition to the nestedness analyses, we found evidence for dispersal limitation for the whole basin as well as within GSL. Overall, these results suggest that species composition is largely controlled by spatially structured environmental heterogeneity, and that community assembly following basin isolation was governed by contemporary deterministic processes such as habitat filtering, dispersal limitation, and local extinction.

At first glance, our genetic results of long-distance dispersal seem to contradict our

community results showing dispersal limitation. However, closer inspection reveals unique

inferences that could not be made without both datasets. Although communities varied greatly in

composition and structure at the entire basin level, we found random patterns of species co-

occurrence within all sub-basins and no evidence of dispersal limitation in SEV and GSLD

(Table 2). These results suggest that, at least at smaller spatial scales, Lake Bonneville likely

allowed homogenization of communities within independent closed sub-basins. As water levels in the lake fell and isolated these systems from each other (as well as some sites within GSLD and GSL), communities within GSLD and SEV assembled in a manner that was no different than expected following random colonization from the regional species pool (Gotelli, 2000).

Alternatively, across basins, or within GSLD, fish that became trapped in isolated desert springs for which they were evolutionarily maladapted quickly became extinct as the climate continued to dry. This inference is further supported by fossil evidence of Prosopium species found in the

Old River Bed between SEV and GSLD (Broughton & Smith, 2016). Additionally, others have suggested that the Great Basin fish fauna experienced high levels of extinction during dry periods (Smith et al., 2002). Taken together, we hypothesize that Lake Bonneville allowed both interbasin and intrabasin community homogenization. As the climate dried and freshwater habitats became isolated between basins, species composition in distant localities began to diversify due to dispersal limitation and intense habitat filtering, leading to local extinction of maladapted species. Alternatively, communities in the relatively homogeneous habitats in SEV and GSLD likely assembled no differently than expected under stochastic extinction, colonization, and demography.

Conclusion

Our analyses suggest that the community assembly of freshwater fish species in the

Bonneville Basin has been governed by a complex set of interacting historic and contemporary processes. Had we not analyzed both genetic and community data, inferences regarding the assembly process would likely have been misleading. Though our community dataset pointed to contemporary ecological processes governing community assembly, our genetic dataset showed evidence of historic regional species pool homogenization, which helped inform inferences regarding community patterns within the major sub-basins. While other studies have found correlations between dispersal limitation in communities and genetic structure in populations

(e.g. Salces-Castellano et al. 2021), our results found an apparent contradiction between species dispersal and the effects of dispersal limitation on communities. This finding was due to the mismatch between historic and present climate conditions, but ultimately informed inferences regarding community assembly. Future work should evaluate if the correlation between species-specific dispersal history and community-level dispersal limitation is ubiquitous and what these correlations mean for interpreting community assembly. Furthermore, we encourage a continued effort to increase the integration of ecology, biogeography, population genetics, and evolutionary biology to further uncover the processes governing biological diversity (Vellend, 2010, 2016;

Ricklefs & Jenkins, 2011; Overcast, Emerson, & Hickerson, 2019).

ACKNOWLEDGMENTS

We would like to thank Carl Hubbs and Robert Miller for their contributions to the ecology and evolution of Bonneville Basin fish. We also want to thank the authors of the studies that have investigated the genetic diversity of Bonneville Basin fish for the sequences and previous work they have provided for this system. Specifically, we thank Paul Evans for his

contributions to the understanding of Bonneville Basin fish and their genetics as well as Becky

Miller and Jared Crowley for their graduate work on mountain whitefish and sculpins, respectively. We appreciate the helpful comments that Alli Duffy made on the manuscript, which greatly enhanced this work. Funding for this project came from the

Department of Biology and Graduate Studies at Brigham Young University.

REFERENCES

Almeida-Neto M, Guimarães P, Guimarães PR, Loyola RD & Ulrich W. 2008. A consistent

metric for nestedness analysis in ecological systems: reconciling concept and measurement.

Oikos 117: 1227–1239.

Anderson MJ. 2001. A new method for non-parametric multivariate analysis of variance.

Austral Ecology 26: 32–46.

Bagley JC, Sandel M, Travis J, Lozano-Vilano M de L & Johnson JB. 2013. Paleoclimatic

modeling and phylogeography of least killifish, Heterandria formosa: insights into Pleistocene

expansion-contraction dynamics and evolutionary history of North American Coastal Plain

freshwater biota. BMC Evolutionary Biology 13: 223.

Bagley JC & Johnson JB. 2014. Testing for shared biogeographic history in the lower Central

American freshwater fish assemblage using comparative phylogeography: concerted,

independent, or multiple evolutionary responses? Ecology and Evolution 4: 1686–1705.

Bauman D, Drouet T, Fortin MJ & Dray S. 2018. Optimizing the choice of a spatial weighting matrix in eigenvector-based methods. Ecology 99: 2159–2166.

Beerli P. 2006. Comparison of Bayesian and maximum-likelihood inference of population genetic parameters. Bioinformatics 22: 341–345.

Beerli P & Felsenstein J. 2001. Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proceedings of the

National Academy of Sciences of the United States of America 98: 4563–4568.

Beerli P & Palczewski M. 2010. Unified framework to evaluate panmixia and migration

direction among multiple sampling locations. Genetics 185: 313–326.

Berry O, Tocher MD & Sarre SD. 2004. Can assignment tests measure dispersal? Molecular

Ecology 13: 551–561.

Billman EJ, Lee JB, Young DO, McKell MD, Evans RP & Shiozawa DK. 2010. Phylogenetic

divergence in a desert fish: Differentiation of Speckled Dace within the Bonneville, Lahontan,

and Upper Snake River Basins. Western North American Naturalist 70: 39–47.

Broughton JM & Smith GR. 2016. The fishes of Lake Bonneville: Implications for drainage

history, biogeography, and lake levels. In: Oviatt C, Shroder J, eds. Lake Bonneville: A Scientific

Update. Elsevier, 292–351.

Brown JH. 1971. Mammals on mountaintops: nonequilibrium insular biogeography. The

American Naturalist 105: 467–478.

Brown JL & Knowles LL. 2012. Spatially explicit models of dynamic histories: examination of

the genetic consequences of Pleistocene glaciation and recent climate change on the American

Pika. Molecular Ecology 21: 3757–3775.

Constable GWA & McKane AJ. 2014. Population genetics on islands connected by an

arbitrary network: An analytic approach. Journal of Theoretical Biology 358: 149–165.

Crowley JM. 2004. A phylogenetic analysis of sculpin (Teleostei: Cottidae) in the basin and

range province of western North America based on the ND4-L/ND4 region of the mitochondrial

genome.

Currey DR. 1990. Quaternary palaeolakes in the evolution of semidesert basins, with special

emphasis on Lake Bonneville and the Great Basin, U.S.A. Palaeogeography, Palaeoclimatology,

Palaeoecology 76: 189–214.

Darriba D, Posada D, Kozlov AM, Stamatakis A, Morel B & Flouri T. 2020. ModelTest-NG:

A New and Scalable Tool for the Selection of DNA and Protein Evolutionary Models. Molecular

Biology and Evolution 37: 291–294.

Dray S, Bauman D, Blanchet G, Borcard D, Clappe S, Guenard G, Jombart T, Larocque

G, Legendre P, Madi N & Wagner HH. 2021. adespatial: Multivariate Multiscale Spatial

Analysis.

Dray S, Legendre P & Peres-Neto PR. 2006. Spatial modelling: a comprehensive framework

for principal coordinate analysis of neighbour matrices (PCNM). Ecological Modelling 196:

483–493.

Eldon J, Price JP, Magnacca K & Price DK. 2013. Patterns and processes in complex

landscapes: testing alternative biogeographical hypotheses through integrated analysis of

phylogeography and community ecology in Hawai’i. Molecular Ecology 22: 3613–3628.

Felsenstein J. 2006. Accuracy of Coalescent Likelihood Estimates: Do We Need More Sites,

More Sequences, or More Loci? Molecular Biology and Evolution 23: 691–700.

Fukami T. 2015. Historical Contingency in Community Assembly: Integrating Niches, Species

Pools, and Priority Effects. Annual Review of Ecology, Evolution, and Systematics 46: 1–23.

Gotelli NJ. 2000. Null model analysis of species co-occurrence patterns. Ecology 81: 2606–2621.

Gotelli NJ & Ulrich W. 2010. The empirical Bayes approach as a tool to identify non-random

species associations. Oecologia 162: 463–477.


Grainger TN & Gilbert B. 2016. Dispersal and diversity in experimental metacommunities:

linking theory and practice. Oikos 125.

Grayson DK. 2011. The Great Basin: A Natural Prehistory. Berkeley: University of California

Press.

Griffith DM, Veech JA & Marsh CJ. 2016. cooccur: Probabilistic Species Co-Occurrence

Analysis in R. Journal of Statistical Software 69: 1–17.

Grman E & Suding KN. 2010. Within-Year Soil Legacies Contribute to Strong Priority Effects

of Exotics on Native California Grassland Communities. Restoration Ecology 18: 664–670.

Heino J, Melo AS, Siqueira T, Soininen J, Valanko S & Bini LM. 2015. Metacommunity

organisation, spatial extent and dispersal in aquatic systems: Patterns, processes and prospects.

Freshwater Biology 60: 845–869.

Hey J & Nielsen R. 2004. Multilocus Methods for Estimating Population Sizes, Migration Rates

and Divergence Time, With Applications to the Divergence of Drosophila pseudoobscura and D.

persimilis. Genetics 167: 747–760.

Hickerson MJ, Carstens BC, Cavender-Bares J, Crandall KA, Graham CH, Johnson JB,

Rissler LJ, Victoriano PF & Yoder AD. 2010. Phylogeography’s past, present, and future: 10

years after Avise, 2000. Molecular Phylogenetics and Evolution 54: 291–301.

Hickerson MJ & Meyer CP. 2008. Testing comparative phylogeographic models of marine

vicariance and dispersal using a hierarchical Bayesian approach. BMC Evolutionary Biology 8:

322.

Houston DD, Shiozawa DK, Smith BT & Riddle BR. 2014. Investigating the effects of Pleistocene events on genetic divergence within Richardsonius balteatus, a widely distributed western North American minnow. BMC Evolutionary Biology 14: 111.

Houston DD, Shiozawa DK & Riddle BR. 2010. Phylogenetic relationships of the western

North American cyprinid genus Richardsonius, with an overview of phylogeographic structure.

Molecular Phylogenetics and Evolution 55: 259–273.

Hubbs CL & Miller RR. 1948. The Zoological Evidence: Correlation between Fish Distribution and Hydrographic History in the Desert Basins of Western United States. In: The Great Basin: With Emphasis on Glacial and Postglacial Times. Salt Lake City, UT: University of Utah, 18–166.

Johnson JB. 2002. Evolution after the flood: Phylogeography of the desert fish Utah chub.

Evolution 56: 948–960.

Johnson JB & Jordan S. 2000. Phylogenetic divergence in leatherside chub (Gila copei) inferred from mitochondrial cytochrome b sequences. Molecular Ecology 9: 1029–1035.

Jones CP & Johnson JB. 2009. Phylogeography of the livebearer Xenophallus umbratilis

(Teleostei: Poeciliidae): glacial cycles and sea level change predict diversification of a

freshwater tropical fish. Molecular Ecology 18: 1640–1653.

Katoh K & Standley DM. 2013. MAFFT Multiple Sequence Alignment Software Version 7:

Improvements in Performance and Usability. Molecular Biology and Evolution 30: 772–780.

Kingman JFC. 1982a. On the genealogy of large populations. Journal of Applied Probability

19: 27–43.

Kingman JFC. 1982b. The coalescent. Stochastic Processes and their Applications 13: 235–248.

Kruskal JB. 1964. Nonmetric multidimensional scaling: a numerical method. Psychometrika 29:

115–129.

Kumar S, Stecher G, Li M, Knyaz C & Tamura K. 2018. MEGA X: molecular evolutionary genetics analysis across computing platforms. Molecular Biology and Evolution 35: 1547–1549.

Legendre P & Legendre L. 2012. Numerical Ecology. Elsevier.

Legendre P, Oksanen J & Ter Braak CJF. 2011. Testing the significance of canonical axes in

redundancy analysis. Methods in Ecology and Evolution 2: 269–277.

Leibold MA, Holyoak M, Mouquet N, Amarasekare P, Chase JM, Hoopes MF, Holt RD,

Shurin JB, Law R, Tilman D, Loreau M & Gonzalez A. 2004. The metacommunity concept:

A framework for multi-scale community ecology. Ecology Letters 7: 601–613.

Leibold MA. 2009. Spatial and metacommunity dynamics in biodiversity. In: Levin SA,

Godfray HCJ, Kinzig AP, Loreau M, Losos JB, Walker B, Wilcove DS, eds. The Princeton

Guide to Ecology. Princeton, NJ: Princeton University Press, 312–319.

Leibold MA & Chase JM. 2018. Metacommunity Ecology. Princeton: Princeton University

Press.

Long SP, Link PK, Janecke SU, Perkins ME & Fanning CM. 2006. Multiple phases of

Tertiary extension and synextensional deposition of the Miocene-Pliocene Salt Lake Formation

in an evolving supradetachment basin, Malad Range, Southeast Idaho, U.S.A. Rocky Mountain

Geology 41: 1–27.

Marko PB & Hart MW. 2011. The complex analytical landscape of gene flow inference. Trends in Ecology and Evolution 26: 448–456.

Miller BA. 2006. The Phylogeography of Prosopium in Western North America.

Miller DM. 2016. The Provo Shoreline of Lake Bonneville. In: Oviatt CG, Shroder Jr. JF, eds.

Lake Bonneville: A Scientific Update. Amsterdam: Elsevier, 127–144.

Miller RR & Shor EN. 1997. Carl L. Hubbs (1894-1979): Collection builder extraordinaire. In:

Pietsch TW, Anderson WD, eds. Collection Building in Ichthyology and Herpetology. Lawrence,

KS: American Society of Ichthyologists and Herpetologists, Allen Press, 367–376.

Minchin PR. 1987. An evaluation of the relative robustness of techniques for ecological ordination. Vegetatio 69: 89–107.

Mock KE, Evans RP, Crawford M, Cardall BL, Janecke SU & Miller M. 2006. Rangewide molecular structuring in the Utah sucker (Catostomus ardens). Molecular Ecology 15: 2223–2238.

Mock KE & Bjerregaard LS. 2007. Genetic analysis of a recently discovered population of the

Least Chub (Iotichthys phlegethontis). Western North American Naturalist 67: 142–146.

Mock KE & Miller MP. 2005. Patterns of Molecular Diversity in Naturally Occurring and

Refugial Populations of the Least Chub. Transactions of the American Fisheries Society 134:

267–278.

Nielsen R & Slatkin M. 2013. An Introduction to Population Genetics. Sunderland, MA:

Sinauer Associates.

Oksanen J, Blanchet FG, Friendly M, Kindt R, Legendre P, McGlinn D, Minchin PR, O’Hara RB, Simpson GL, Solymos P, Stevens MHH, Szoecs E & Wagner H. 2020. vegan: Community Ecology Package.

Overcast I, Emerson BC & Hickerson MJ. 2019. An integrated model of population genetics and community ecology. Journal of Biogeography 46: 816–829.

Oviatt CG. 1987. Probable late Cenozoic capture of the Sevier River into the Sevier Desert

Basin, Utah. In: Kopp RS, Cohenour RE, eds. Cenozoic Geology of Western Utah. Provo, UT:

Brigham Young University Press, 265–270.

Oviatt CG. 1997. Lake Bonneville fluctuations and global climate change. Geology 25: 155–158.

Oviatt CG. 2015. Chronology of Lake Bonneville, 30,000 to 10,000 yr B.P. Quaternary Science

Reviews 110: 166–171.

Oviatt CG & Currey DR. 1987. Pre-Bonneville Quaternary lakes in the Bonneville Basin,

Utah. In: Kopp RS, Cohenour RE, eds. Cenozoic Geology of Western Utah. Provo, UT: Brigham

Young University Press, 257–264.

Oviatt CG, Currey DR & Sack D. 1992. Radiocarbon chronology of Lake Bonneville, Eastern

Great Basin, USA. Palaeogeography, Palaeoclimatology, Palaeoecology 99: 225–241.

Oviatt CG & Shroder Jr. JF (Eds.). 2016. Lake Bonneville: A Scientific Update. Amsterdam:

Elsevier.

Patterson BD & Atmar W. 1986. Nested subsets and the structure of insular mammalian faunas and archipelagos. Biological Journal of the Linnean Society 28: 65–82.

Pinho C & Hey J. 2010. Divergence with Gene Flow: Models and Data. Annual Review of Ecology, Evolution, and Systematics 41: 215–230.


R Core Team. 2020. R: A Language and Environment for Statistical Computing.

Reheis MC, Adams KD, Oviatt CG & Bacon SN. 2014. Pluvial lakes in the Great Basin of the

western United States-a view from the outcrop. Quaternary Science Reviews 97: 33–57.

Ricklefs RE. 1987. Community diversity: relative roles of local and regional processes. Science

235: 167–171.

Ricklefs RE. 2004. A comprehensive framework for global patterns in biodiversity. Ecology

Letters 7: 1–15.

Ricklefs RE & Jenkins DG. 2011. Biogeography and ecology: towards the integration of two disciplines. Philosophical Transactions of the Royal Society B: Biological Sciences 366: 2438–2448.

Ricklefs RE & Schluter D (Eds.). 1993. Species Diversity in Ecological Communities:

Historical and Geographical Perspectives. Chicago, IL: University of Chicago Press.

Rubinoff D & Holland BS. 2005. Between two extremes: mitochondrial DNA is neither the panacea nor the nemesis of phylogenetic and taxonomic inference. Systematic Biology 54: 952–961.

Sack D. 2002. Fluvial linkages in Lake Bonneville subbasin integration. In: Hershler R, Madsen

DB, Currey DR, eds. Great Basin Aquatic Systems History. Washington D.C.: Smithsonian

Institution Press, 129–144.

Salces-Castellano A, Andújar C, López H, Pérez-Delgado AJ, Arribas P & Emerson BC. 2021. Flightlessness in insects enhances diversification and determines assemblage structure across whole communities. Proceedings of the Royal Society B: Biological Sciences 288: 20202646.

Seehausen O. 2007. Evolution and ecological theory: Chance, historical contingency and ecological determinism jointly determine the rate of adaptive radiation. Heredity 99: 361–363.

Slatkin M. 1985. Gene flow in natural populations. Annual Review of Ecology and Systematics

16: 393–430.

Smith GR, Dowling TE, Gobalet KW, Lugaski T, Shiozawa DK & Evans RP. 2002.

Biogeography and timing of evolutionary events among Great Basin fishes. In: Hershler R,

Madsen DB, Currey DR, eds. Great Basin Aquatic Systems History. Washington D.C.:

Smithsonian Institution Press, 175–234.

Smith GR, Chow J, Unmack PJ, Markle DF & Dowling TE. 2017. Fishes of the Mio-Pliocene Western Snake River Plain and Vicinity.

Smith GR & Dowling TE. 2008. Correlating hydrographic events and divergence times of

speckled dace (Rhinichthys: Teleostei: Cyprinidae) in the Colorado River drainage. In: Late

Cenozoic Drainage History of the Southwestern Great Basin and Lower Colorado River Region:

Geologic and Biotic Perspectives: Geological Society of America., 301–317.

Souza MS, Thomaz AT & Fagundes NJR. 2020. River capture or ancestral polymorphism: an

empirical genetic test in a freshwater fish using approximate Bayesian computation. Biological

Journal of the Linnean Society.

Stone L & Roberts A. 1990. The Checkerboard Score and Species Distributions. Oecologia 85:

74–79.

Strona G, Nappo D, Boccacci F, Fattorini S & San-Miguel-Ayanz J. 2014. A fast and unbiased procedure to randomize ecological binary matrices with fixed row and column totals. Nature Communications 5.

Taylor DW & Bright RC. 1987. Drainage History of the Bonneville Basin. In: Kopp RS,

Cohenour RE, eds. Cenozoic Geology of Western Utah. Provo, UT: Brigham Young University

Press, 239–256.

Ulrich W, Almeida-Neto M & Gotelli NJ. 2009. A consumer’s guide to nestedness analysis.

Oikos 118: 3–17.

Unmack PJ, Dowling TE, Laitinen NJ, Secor CL, Mayden RL, Shiozawa DK & Smith GR.

2014. Influence of Introgression and Geological Processes on Phylogenetic Relationships of

Western North American Mountain Suckers (Pantosteus, Catostomidae). PLoS ONE 9: e90061.

Veech JA. 2013. A probabilistic model for analysing species co-occurrence (P Peres-Neto, Ed.).

Global Ecology and Biogeography 22: 252–260.

Veech JA. 2014. The pairwise approach to analysing species co-occurrence (M Araújo, Ed.).

Journal of Biogeography 41: 1029–1035.

Vellend M. 2010. Conceptual synthesis of community ecology. The Quarterly Review of Biology

85: 183–206.

Vellend M. 2016. The Theory of Ecological Communities. Princeton, NJ: Princeton University

Press.

Wilson GA & Rannala B. 2003. Bayesian inference of recent migration rates using multilocus

genotypes. Genetics 163: 1177–1191.

Wright S. 1931. Evolution in mendelian populations. Genetics 16: 97–159.


Table 1 Results of model selection using Bayes Factors to distinguish between different demographic history models. Letters in parentheses in the “Model” column correspond to the model panes in Figure 2. Models with the highest probability for each species are marked with an asterisk (*). LmL = Log(maximum likelihood), LBF = Log(Bayes Factors)

Species                     Model                          LmL         LBF        Probability
Catostomus ardens           GSL → SEV Migration (C)        -1000.13    -1.24      0.224
                            GSL → SEV No Migration (A)     -1016.96    -18.07     0.000
                            SEV → GSL Migration (D) *      -998.89     0          0.776
                            SEV → GSL No Migration (B)     -1014.84    -15.95     0.000
Catostomus platyrhynchus    GSL → SEV Migration (C)        -12745.2    -0.27      0.433
                            GSL → SEV No Migration (A)     -12817.1    -72.14     0.000
                            SEV → GSL Migration (D) *      -12744.9    0          0.567
                            SEV → GSL No Migration (B)     -12816.2    -71.28     0.000
Cottus bairdii              GSL → SEV Migration (C) *      -1158.38    0          0.754
                            GSL → SEV No Migration (A)     -1238.51    -80.13     0.000
                            SEV → GSL Migration (D)        -1159.5     -1.12      0.246
                            SEV → GSL No Migration (B)     -1254.3     -95.92     0.000
Gila atraria                GSL → SEV Migration (G) *      -1517.42    0          1.000
                            GSL → SEV No Migration (E)     -1616.92    -99.5      0.000
                            SEV → GSL Migration (H)        -1549.32    -31.9      0.000
                            SEV → GSL No Migration (F)     -1619.18    -101.76    0.000
Iotichthys phlegethontis    GSL → SEV Migration (G) *      -1721.29    0          0.608
                            GSL → SEV No Migration (E)     -1749.51    -28.22     0.000
                            SEV → GSL Migration (H)        -1721.73    -0.44      0.392
                            SEV → GSL No Migration (F)     -1750.6     -29.31     0.000
Lepidomeda aliciae          GSL → SEV Migration (C)        -3615.16    -1.23      0.226
                            GSL → SEV No Migration (A)     -3652.38    -38.45     0.000
                            SEV → GSL Migration (D) *      -3613.93    0          0.774
                            SEV → GSL No Migration (B)     -3650.83    -36.9      0.000
Prosopium williamsoni       Migration *                    -3224.72    0          1.000
                            No Migration                   -4926.94    -1702.22   0.000
Rhinichthys osculus         GSL → SEV Migration (G)        -4315.15    -22.69     0.000
                            GSL → SEV No Migration (E)     -4303.02    -10.56     0.000
                            SEV → GSL Migration (H)        -4314.75    -22.29     0.000
                            SEV → GSL No Migration (F) *   -4292.46    0          1.000
Richardsonius balteatus     GSL → SEV Migration (G)        -3394.5     -0.26      0.435
                            GSL → SEV No Migration (E)     -3446.17    -51.93     0.000
                            SEV → GSL Migration (H) *      -3394.24    0          0.565
                            SEV → GSL No Migration (F)     -3457.48    -63.24     0.000
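The relationship between the LmL, LBF, and Probability columns of Table 1 can be illustrated with a short calculation. The sketch below assumes that each LBF is the difference between a model's LmL and the best model's LmL, and that model probabilities are the exponentiated LBFs normalized to sum to one; under those assumptions it reproduces the Catostomus ardens rows. The function name is hypothetical and is not part of Migrate-n.

```python
import math

def model_probabilities(log_marginal_likelihoods):
    """Return (log Bayes factors, model probabilities) for a set of models.

    Each log Bayes factor is taken relative to the best model, so the best
    model has LBF = 0 and every other model has a negative LBF.
    """
    best = max(log_marginal_likelihoods)
    lbf = [lml - best for lml in log_marginal_likelihoods]
    # Working with differences from the maximum avoids underflow in exp().
    weights = [math.exp(x) for x in lbf]
    total = sum(weights)
    return lbf, [w / total for w in weights]

# LmL values listed for Catostomus ardens in Table 1 (models C, A, D, B).
lml = [-1000.13, -1016.96, -998.89, -1014.84]
lbf, probs = model_probabilities(lml)
print([round(x, 2) for x in lbf])
print([round(p, 3) for p in probs])
```

Rounded to three decimals, the probabilities for models C and D come out as 0.224 and 0.776, matching the table.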


Table 2 Results of matrix-wide co-occurrence null model analyses. Statistically significant p-values are marked with an asterisk (*).

Basin        Number of Sites   Number of Species   Obs C-Score   Mean C-Score   SES      p-value
All Basins   43                10                  0.437         0.347          2.575    0.023*
GSL          19                9                   0.321         0.324          -0.151   0.928
GSLD         11                3                   0.667         0.717          -0.835   0.278
SEV          13                7                   0.052         0.044          1.438    0.180
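For readers unfamiliar with the quantities in Table 2, the following is a minimal sketch of a checkerboard score (Stone and Roberts 1990) and a standardized effect size (SES). It assumes the unscaled C-score, averaged over all species pairs, and an SES defined as the observed value minus the null mean, divided by the null standard deviation. The caption does not state which C-score variant or randomization algorithm was used, so the function names, the toy matrix, and the null values here are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev

def c_score(matrix):
    """Mean number of checkerboard units per species pair.

    `matrix` is a list of species rows; each row holds 0/1 values per site.
    """
    units = []
    for a, b in combinations(matrix, 2):
        shared = sum(x and y for x, y in zip(a, b))  # sites holding both species
        units.append((sum(a) - shared) * (sum(b) - shared))
    return mean(units)

def standardized_effect_size(observed, null_values):
    """(observed - mean of null distribution) / SD of null distribution."""
    return (observed - mean(null_values)) / stdev(null_values)

# Toy presence-absence matrix: three species (rows) across four sites (columns).
toy = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
]
observed = c_score(toy)
# Hypothetical null C-scores; in practice these come from randomized matrices.
ses = standardized_effect_size(4.0, [1.0, 2.0, 3.0])
```

A positive SES indicates more segregation between species than the null model expects, and a negative SES indicates more aggregation.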


Figure 1 Map of Lake Bonneville and localities for both the community and DNA data used in this study. The light grey area outlines the extent of the Bonneville Basin, while the darker polygons show the extent of Lake Bonneville during the Bonneville shoreline phase, colored according to the three major sub-basins connected during pluvial inundation. A) Localities sampled by Carl Hubbs and colleagues between 1915 and 1950 and used in the community analyses. B) Localities for species used in the genetic analyses, with species differentiated according to shape. The shapefile for Lake Bonneville was modified from the one available at https://gis.utah.gov/data/water/historic-lake-bonneville.


Figure 2 Between-basin hypotheses of migration and splitting used in the Migrate-n analyses. A) Sevier Basin derived from Great Salt Lake Basin with no subsequent migration, B) Great Salt Lake Basin derived from Sevier Basin with no subsequent migration, C) Sevier Basin derived from Great Salt Lake Basin with subsequent migration, D) Great Salt Lake Basin derived from Sevier Basin with subsequent migration, E) Sevier Basin derived from Great Salt Lake Desert Basin after its derivation from the Great Salt Lake Basin with no subsequent migration, F) Great Salt Lake Basin derived from the Great Salt Lake Desert Basin after its derivation from the Sevier Basin with no subsequent migration, G) Sevier Basin derived from Great Salt Lake Desert Basin after its derivation from the Great Salt Lake Basin with subsequent migration, H) Great Salt Lake Basin derived from the Great Salt Lake Desert Basin after its derivation from the Sevier Basin with subsequent migration.


Figure 3 Observed presence-absence matrix of Carl Hubbs collections ordered to show patterns of nestedness. Note how the communities at the top of the matrix are predominately from the Sevier Basin (SEV) and the Great Salt Lake Basin (GSL) while those at the bottom of the matrix are predominately from GSL and GSLD.


Figure 4 NMDS biplot of species and site scores in the Bonneville Basin. Sites are colored according to basin affinity. The stress for the NMDS plot equaled 0.097.


Figure 5 Heatmap showing the significant species associations between species pairs in the Bonneville Basin.


CHAPTER FOUR

ADAPTED POPULATION GENETICS MODELS WITH APPROXIMATE BAYESIAN COMPUTATION INFORM PROCESSES CONTROLLING METACOMMUNITIES

Trevor J. Williams1, Jerald B. Johnson1,2

1 Department of Biology and Evolutionary Ecology Laboratories, Brigham Young University, Provo UT, 84602

2 Monte L. Bean Life Science Museum, Brigham Young University, Provo UT, 84602

Formatted in the style of Ecology

ABSTRACT

Community and metacommunity ecology are highly complex and contingent fields where the interpretation of process from pattern is often uncertain. Current methods of analyzing empirical metacommunity datasets, such as variation partitioning and elements of metacommunity structure null models, have shown unique patterns but are unable to provide robust inferences of underlying processes. We developed a simulation model adapted from the population genetics Moran model that can simulate metacommunity dynamics by explicitly outlining the effects of selection, dispersal, and ecological drift on metacommunity structure. We show that this model recreates expected patterns when evaluated using variation partitioning and elements of metacommunity structure and then use it in Approximate Bayesian Computation (ABC) analyses to select metacommunity models that best fit observed data. ABC analyses were able to accurately assign observed data to appropriate models for both simulated datasets with known parameter values and real-world datasets where metacommunity dynamics have been previously researched. ABC methods have been used extensively for inference in population genetics, phylogeography, and other fields. Our results indicate that they can also be used in a metacommunity context to infer process from pattern.

INTRODUCTION

A fundamental challenge in biological research is the process-pattern conundrum; in many disciplines it is difficult to ascertain underlying process from observed patterns (Endler 1982, Leibold and Mikkelson 2002, Csilléry et al. 2010, McGill 2010, He et al. 2013). This is especially true within community ecology, where many processes can create very similar patterns (Chase et al. 2005, Vellend et al. 2014). For example, a pattern of segregated species co-occurrences could be explained by several alternative processes, such as habitat filtering, competitive interactions, or differing evolutionary histories (Gotelli and McCabe 2002). This problem, in connection with the inherent complexity of communities, has caused some to comment that a holistic theory of community ecology is non-existent due to the highly contingent aspects of community diversity (Lawton 1999).

Despite these challenges, a recent synthesis has addressed this process-pattern conundrum by positing that all patterns of community diversity are governed by four high-level processes: selection, ecological drift, dispersal, and speciation (Vellend 2010, 2016). This synthesis draws heavily from theoretical population genetics, similar to the way that Neutral Theory was influenced by neutral models of evolution (Bell 2001, Hubbell 2001). Metacommunity theory has adopted this framework by showing how the importance of these four processes differs across the major metacommunity archetypes: Species Sorting (SS), Patch Dynamics (PD), Mass Effects (ME), and Neutral Theory (NT) (Leibold and Chase 2018). Selection, specifically niche selection across heterogeneous environments, is important in SS and ME (Loreau and Mouquet 1999, Chase and Leibold 2003), whereas NT and PD are driven largely by ecological drift and dispersal limitation (Levins and Culver 1971, Hubbell 2001, Leibold et al. 2004). Despite this theoretical framework, it is still unclear how ecologists can quantify the processes underlying metacommunity structure by looking at observed metacommunity patterns (Leibold and Chase 2018). Community ecology needs methods that can explicitly link processes to observed patterns of community structure to better understand how communities assemble.

As community ecology is inherently complex and contains multiple sources of stochasticity (e.g. ecological drift and demographic stochasticity), one tool that ecologists have used to investigate community assembly is stochastic simulation models, which draw samples from simulations of complex stochastic processes (Grimm and Railsback 2005, Hartig et al. 2011). Such simulation techniques have been used to model community dynamics where the four major processes governing community structure are explicitly incorporated, allowing ecologists to explore how patterns vary under different parameterizations. For example, Ruokolainen et al. (2009) used stochastic simulation modelling to investigate patterns between niche and neutral dynamics and Fournier et al. (2017) used a similar approach to model communities under differing metacommunity scenarios. However, linking stochastic simulation models with empirical data is challenging since computing the likelihoods for these complex models is usually intractable (Hartig et al. 2011, Sisson et al. 2019a). Fortunately, a technique developed originally in population genetics called Approximate Bayesian Computation (ABC) can approximate likelihoods by comparing simulated summary statistics to those observed in empirical data (Beaumont 2010, Sisson et al. 2019b). Within population genetics, this technique has allowed researchers to untangle the relative importance of selective and demographic processes on genetic diversity as well as determine which models best fit genetic data (Csilléry et al. 2010). Furthermore, it is couched in a robust statistical framework which allows evaluation of uncertainty and quantification of underlying processes (Knowles and Maddison 2002, Knowles 2009). Despite its versatility, ABC has not been widely used in ecological research (but see May et al. 2013, Fasiolo and Wood 2019, Pontarp et al. 2019) even though it has the potential to explicitly link differences in the importance of the four high-level processes to observed community patterns.

We propose a new method that will allow ecologists to link metacommunity patterns with underlying processes by incorporating the ABC framework used in population genetics and phylogeography. First, we demonstrate how a population genetics stochastic simulation model, the Moran model (Moran 1958), can be adapted to simulate metacommunity dynamics (Vellend 2016). We then evaluate our model using variation partitioning (Borcard et al. 1992, Legendre et al. 2005) and elements of metacommunity structure null models (Leibold and Mikkelson 2002, Presley et al. 2010) by showing how it matches theoretical and empirical predictions, and reiterate how different processes can produce similar patterns. Finally, we show how these simulation models can be incorporated into an ABC framework to quantify the importance of dispersal and/or selection on metacommunity structure. By using this approach, ecologists will be able to evaluate how differences in the importance of selection, dispersal, and ecological drift create observed patterns in empirical datasets.

METHODS

Description of the Model

We built upon the adapted Moran models in Vellend (2016) to produce simulations for N species across n communities, allowing for the input of any arbitrary migration matrix. Our simulations followed the general procedure of a multi-allelic haploid Moran model with migration and selection using a stochastic birth-death algorithm (Figure G-1, adapted from Constable and McKane 2014; see Appendix F for model details). In short, an individual is randomly chosen to give birth to a single offspring, which then replaces a randomly chosen individual either within its local birth community or, through dispersal, within a connected community. This algorithm is continued until t “generations” or time-steps have occurred. A generation is defined as J birth-death sequences, where J is the number of individuals within the metacommunity.
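The birth-death procedure described above can be sketched in a few lines. This is a simplified illustration and not the code used in this chapter (see Appendix F for the actual model). It assumes that selection acts through fitness-weighted choice of the parent and that dispersal is drawn from a row-stochastic migration matrix. The function name and the data layout are hypothetical.

```python
import random

def moran_generation(communities, migration, fitness, rng):
    """Run one generation (J birth-death events), modifying communities in place.

    communities: list of lists; communities[i] holds one species id per individual.
    migration:   migration[i][j] is the probability that an offspring born in
                 community i replaces an individual in community j (rows sum to 1).
    fitness:     fitness[i][s] is the relative birth weight of species s in
                 community i (all equal under neutrality).
    """
    n = len(communities)
    J = sum(len(c) for c in communities)  # metacommunity size
    for _ in range(J):
        # Birth: choose a parent from the whole metacommunity, weighted by local fitness.
        parents = [(i, sp) for i, comm in enumerate(communities) for sp in comm]
        weights = [fitness[i][sp] for i, sp in parents]
        home, species = rng.choices(parents, weights=weights, k=1)[0]
        # Dispersal: the offspring stays home or moves according to the migration matrix.
        target = rng.choices(range(n), weights=migration[home], k=1)[0]
        # Death: the offspring replaces a randomly chosen resident of the target community.
        victim = rng.randrange(len(communities[target]))
        communities[target][victim] = species

# Two communities of 10 individuals, three species, neutral dynamics.
rng = random.Random(1)
communities = [[0] * 5 + [1] * 5, [1] * 5 + [2] * 5]
migration = [[0.9, 0.1], [0.1, 0.9]]
neutral = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
for _ in range(20):
    moran_generation(communities, migration, neutral, rng)
```

Because every birth is paired with a death, community sizes stay fixed; only species composition changes through time. Rebuilding the parent list at every event keeps the sketch short but is slow for large metacommunities.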

Validation of the Model

To test the validity of the model and its ability to produce community patterns as predicted by metacommunity theory, we simulated 800 pseudo-communities (pseudo-observed presence-absence matrices) using a factorial design where each treatment consisted of a unique combination of selection (niche vs. neutral), dispersal limitation (dispersal limited vs. dispersal sufficient), and spatial landscape (random vs. autocorrelated) (Table G-1; see Appendix F for treatment and simulation details). We ran 100 simulations for each treatment using metacommunities with 30 species in 20 communities, each containing 500 individuals. Starting metacommunities were randomly generated with the constraint that each community must start with at least two species present. Each simulation was run for 50 time steps. To test the effect of time we also sampled the metacommunity at 10 and 25 time steps. All further analyses were then conducted on the simulated presence-absence matrices output from each run.
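The factorial layout described above (2 × 2 × 2 treatments, 100 replicates each) can be enumerated as follows. The labels and the helper that draws a starting community are hypothetical; in particular, drawing species uniformly at random and reading "500 individuals" as the size of each community are assumptions made only for illustration.

```python
import random
from itertools import product

selection = ["niche", "neutral"]
dispersal = ["limited", "sufficient"]
landscape = ["random", "autocorrelated"]
replicates = 100

# Every unique combination of the three factors is one treatment.
treatments = list(product(selection, dispersal, landscape))
runs = [(s, d, l, rep) for (s, d, l) in treatments for rep in range(replicates)]
print(len(treatments), len(runs))

def random_start(n_species, size, rng):
    """Random community of `size` individuals containing at least two species."""
    while True:
        community = [rng.randrange(n_species) for _ in range(size)]
        if len(set(community)) >= 2:
            return community

start = random_start(n_species=30, size=500, rng=random.Random(0))
```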

We conducted two different tests using the simulations described above. In the first test, we used variation partitioning to determine the importance of environmental vs. spatial factors in explaining community composition (Cottenie 2005, Peres-Neto et al. 2006; see Appendix F for analysis details). Theory predicts that communities following the species sorting archetype should have a greater proportion of variance explained by the environment, since species will track environments for which they are selected, whereas neutral models should have a larger spatial proportion since dispersal limitation will enhance community diversity with increasing spatial distance (Cottenie 2005, Soininen 2014, Leibold and Chase 2018). To evaluate whether our simulations produced this expected pattern we ran three separate linear mixed models with the proportion of variance explained by the environment conditioned on space, space conditioned on environment, and the overlap between environment and space as the response variables. Each mixed model included model type, dispersal limitation type, landscape type, and run length (i.e. 10, 25, 50 time steps), and all second-order and third-order interactions with model type as fixed factors and simulation run as a random factor.
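The three response variables named above are the standard fractions of a two-set variation partition. As a minimal sketch, assuming the coefficients of determination from the environment-only, space-only, and combined models are already available (in practice adjusted R² values are commonly used), the fractions follow by subtraction. The function name and the example values are hypothetical.

```python
def partition_variation(r2_env, r2_space, r2_both):
    """Split explained variation into pure and shared fractions.

    r2_env, r2_space: R^2 of models using only environmental or only spatial predictors.
    r2_both:          R^2 of the model using both sets of predictors.
    """
    return {
        "env_given_space": r2_both - r2_space,   # environment conditioned on space
        "space_given_env": r2_both - r2_env,     # space conditioned on environment
        "shared": r2_env + r2_space - r2_both,   # overlap of environment and space
        "residual": 1.0 - r2_both,               # unexplained variation
    }

fractions = partition_variation(r2_env=0.5, r2_space=0.4, r2_both=0.7)
```

With these made-up inputs the pure environmental fraction is 0.3, the pure spatial fraction is 0.2, the shared fraction is 0.2, and 0.3 remains unexplained.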

Second, we analyzed the metacommunity structure exhibited in each treatment using null models on coherence, turnover, and boundary clumping (Leibold and Mikkelson 2002, Presley et al. 2010). This method explicitly evaluates species distribution patterns in binary presence-absence matrices by classifying each matrix into one of fourteen metacommunity structures (Leibold and Mikkelson 2002, Presley et al. 2010). We ran a Fisher’s exact test with post-hoc comparisons using Bonferroni corrections on the counts of each metacommunity type from each treatment to see if the different treatments created different counts of metacommunity structures.
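To give a concrete sense of the first element, coherence, the sketch below counts embedded absences in a presence-absence matrix. It assumes the matrix has already been ordinated and that embedded absences are tallied along both rows and columns. The ordination step and the null-model comparison that turns the count into a test are omitted, and the function name is hypothetical.

```python
def embedded_absences(matrix):
    """Count absences lying between the first and last presence in each row
    and each column of an ordered presence-absence matrix."""
    def gaps(vector):
        present = [i for i, v in enumerate(vector) if v]
        if not present:
            return 0
        span = vector[present[0]:present[-1] + 1]
        return sum(1 for v in span if not v)

    row_gaps = sum(gaps(list(row)) for row in matrix)
    col_gaps = sum(gaps(list(col)) for col in zip(*matrix))
    return row_gaps + col_gaps

coherent = [
    [1, 1, 0],
    [0, 1, 1],
]
interrupted = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 1],
]
```

Fewer embedded absences than expected under a null model indicate a coherent structure; the toy `coherent` matrix has none, while `interrupted` has one gap in its third row and one in its first column.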

Approximate Bayesian Computation with the Model

To assess if ABC could be used in conjunction with our methods, we ran ABC model selection analyses on our simulated pseudo-communities from our validation step as well as on two real-world datasets: 1) the Barro Colorado Island Tree dataset (hereafter referred to as BCI;

Condit et al. 2002) and 2) freshwater fish communities from the White and Savanna water river

114

channels in Bolivia (hereafter referred to as Wsr; Yunoki and Velasco 2016). The BCI dataset

was used extensively in the development of Neutral theory (Hubbell 2001) while results from

Yunoki and Velasco (2016) suggest that the Wsr communities are structured mainly by SS dynamics. For each dataset, except for BCI for which we only ran 1000 simulations due to the large size of the metacommunity, we ran 5000 simulations for four separate models: 1) Niche with dispersal limitation (hereafter referred to as SSDL), 2) Niche with dispersal sufficiency (SS sensu stricto, hereafter referred to as SSNDL), 3) Neutral with dispersal limitation (NT sensu stricto, hereafter referred to as NTDL), and 4) Neutral with dispersal sufficiency (hereafter referred to as NTNDL, see Appendix F for priors, parameters, and summary statistics used). We chose these four models because they are built around differences in the two most widely discussed high-level processes: selection (e.g. niche vs. neutral) and dispersal (Cottenie 2005,

Peres-neto et al. 2006). The importance of selection in driving community structure has been a topic of debate since the inception of Neutral Theory however most ecologists now agree that these processes work in tandem (Gravel et al. 2006, Vellend et al. 2014). By including both sensu stricto models for niche and Neutral Theory as well as two models that relax the assumptions of the amounts of dispersal limitation within these paradigms, we can test to see if our results agree with and build upon former research. We then used two methods of model selection to select which model best fit the observed datasets: 1) Approximate Bayesian

Computation with Random Forests (ABC RF; Pudlo et al. 2016, Estoup et al. 2018, Marin et al.

2019), and 2) traditional ABC model selection (Grelaud et al. 2009, Toni and Stumpf 2010,

Sunnåker et al. 2013). ABC RF is a reliable method for model selection with several improvements over typical ABC model choice analyses (Pudlo et al. 2016), however we ran both types of analyses to compare results. For ABC RF analyses, we created model predictions using

115

a forest of 1000 trees and approximated the posterior using 1000 trees. For pseudo-communities

we also ran additional simulations using three different prior distributions (uniform, normal, and

gamma) for selection coefficients in niche models in the pseudo-community dataset to see if the

priors altered model selection results (Appendix F). We ran model selection analyses between

neutral models and niche models separately for each prior distribution type as well as within

neutral and niche models to assess how well ABC could distinguish between dispersal limited

and sufficient models and if the prior distribution affected selection outcomes. We assessed the

effectiveness of ABC by analyzing out-of-bag error rate, creating a confusion matrix for model selection of pseudo-communities, and calculating the mean and standard deviation of posterior probabilities for the model selected in ABC RF. We note that the selection of summary statistics can have drastic impacts on the results of ABC analyses (Robert et al. 2011, Marin et al. 2019,

Prangle 2019), which was also the case in our analyses when using insufficient or uninformative statistics (data not shown). However, our results were consistent using the set of summary statistics we chose. These statistics are typically used in ecological analyses to measure diversity and appear to be informative for discriminating between niche and neutral models in an ABC context

(Appendix F).
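The logic of traditional (rejection-based) ABC model choice can be sketched as follows. This is an illustration only and not the code used in our analyses: the function name `abc_model_posterior`, the tolerance `keep_fraction`, and the toy summary statistics are all hypothetical. The sketch assumes that the same number of simulations is drawn from each model, so that the share of accepted simulations belonging to a model approximates its posterior probability.

```python
import math
from collections import Counter


def abc_model_posterior(observed, simulations, keep_fraction=0.1):
    """Approximate posterior model probabilities by rejection ABC.

    observed      -- summary statistics of the observed data
    simulations   -- list of (model_label, summary_statistics) pairs
    keep_fraction -- share of simulations closest to the data to accept
    """
    n_stats = len(observed)

    # Scale each statistic by its standard deviation across simulations
    # so that no single statistic dominates the distance.
    scales = []
    for j in range(n_stats):
        column = [stats[j] for _, stats in simulations]
        mean = sum(column) / len(column)
        sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
        scales.append(sd if sd > 0 else 1.0)

    def distance(stats):
        return math.sqrt(sum(((s - o) / sc) ** 2
                             for s, o, sc in zip(stats, observed, scales)))

    # Accept the simulations closest to the observed statistics.
    ranked = sorted(simulations, key=lambda pair: distance(pair[1]))
    n_keep = max(1, int(len(ranked) * keep_fraction))
    accepted = Counter(label for label, _ in ranked[:n_keep])

    labels = sorted({label for label, _ in simulations})
    return {label: accepted[label] / n_keep for label in labels}


# Toy, made-up summary statistics for two hypothetical models.
toy_simulations = (
    [("neutral", [0.1 * i, 1.0]) for i in range(10)]
    + [("niche", [5.0 + 0.1 * i, 2.0]) for i in range(10)]
)
toy_observed = [0.2, 1.0]
posterior = abc_model_posterior(toy_observed, toy_simulations, keep_fraction=0.25)
print(posterior)
```

In this toy case every accepted simulation comes from the "neutral" model, because the observed statistics sit inside its simulated range. ABC RF replaces the distance-and-tolerance step with a random forest classifier trained on the simulated summary statistics (Pudlo et al. 2016).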

RESULTS

Variation partitioning results agreed with metacommunity theory: in general, niche

models had a higher proportion of beta-diversity explained by environmental variables whereas

neutral models had a higher proportion explained by spatial variables (Figure 1, Cottenie 2005,

Soininen 2014, Leibold and Chase 2018). All effects were significant within our mixed model

analyses except run length when the proportion explained by the environment conditioned on

space was the response (Table G-2). The amount of variation explained solely by the

environmental variables remained relatively stable through time for all models, whereas the proportions explained by space alone and the correlation between space and environment grew over time for neutral models and dispersal sufficient models respectively (Figure 1).

Our elements of metacommunity structure analyses showed significant differences in the counts of metacommunity structure types between the different models, especially when sampling the metacommunity at low and medium numbers of time-steps (Table G-3). However, at the latest time-step, several of the niche and neutral models could not be distinguished from each other, resulting in very similar counts of metacommunity structure types (Table G-3), indicating that different metacommunity archetypes can produce very similar patterns of coherence, turnover, and boundary clumping.

ABC analyses on pseudo-communities showed high accuracy for selecting the correct model (Table 1). Niche and neutral models were rarely misidentified as the other; most misclassifications were between dispersal limited and dispersal sufficient models within niche or neutral models. Prior error rates were low for all analyses except in the comparison between models using different prior distributions. However, this can be attributed to the fact that the separate niche models with differing prior distributions produced very similar results (Figure G-3).

Results did not vary substantially depending upon the prior used for selection coefficients, although gamma and normal priors had lower prior error rates and higher ABC RF accuracy than uniform priors (Table 1). For the BCI dataset, model selection analyses selected NTNDL as the best-fit model, whereas for the Wsr dataset they selected SSDL (Tables 2 and 3), matching predictions from previous research regarding the importance of selection. Model selection results did not differ between ABC RF analyses and typical model selection analyses or for differing run lengths

(Tables 1-3).
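For readers unfamiliar with how accuracy on pseudo-communities is tallied, a minimal sketch is given here. The labels and predictions are invented for illustration, and the helper names are hypothetical. The source does not state which interval underlies the 95% CIs in Table 1; the Wilson score interval used here is a stand-in chosen because it needs only the standard library.

```python
import math
from collections import defaultdict


def confusion_matrix(true_labels, predicted_labels):
    """Count how often each true model was classified as each model."""
    matrix = defaultdict(lambda: defaultdict(int))
    for truth, predicted in zip(true_labels, predicted_labels):
        matrix[truth][predicted] += 1
    return matrix


def accuracy(true_labels, predicted_labels):
    """Share of pseudo-communities assigned to their generating model."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (assumed CI type)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Invented example: eight pseudo-communities with known generating models.
truth = ["NTDL"] * 4 + ["SSDL"] * 4
predicted = ["NTDL", "NTDL", "NTDL", "NTNDL",
             "SSDL", "SSDL", "SSNDL", "SSDL"]

cm = confusion_matrix(truth, predicted)
acc = accuracy(truth, predicted)
low, high = wilson_interval(6, 8)
print(acc, (low, high))
```

Both misclassifications in this invented example fall between dispersal limited and dispersal sufficient variants of the same paradigm, which is the pattern described in the text.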

DISCUSSION

Model Validation

Metacommunity ecology arose from a need to explain how spatial dynamics and the interactions between local and regional scale processes influence community composition and structure (Leibold et al. 2004, Holyoak et al. 2005). Empirical work on spatial patterns has largely focused on disentangling the effects of niche and neutral processes or uncovering the importance of dispersal limitation on metacommunity structure by fitting species abundance distributions (SADs) to model predictions or through the use of null models and variation partitioning (Presley et al. 2010, Logue et al. 2011, Leibold and Chase 2018). Variation partitioning has been an exceptionally popular tool in this endeavor, where variation in community composition (as measured by β-diversity) is partitioned into components explained by the environment and space (Borcard et al. 1992, Legendre et al. 2005, Peres-Neto et al. 2006,

Logue et al. 2011). However, simulation studies have cast doubt on this easily delineated interpretation by showing that dispersal and landscape type can alter multiple proportions

(Gilbert and Bennett 2010, Smith and Lundholm 2010, Fournier et al. 2017). Our simulations indicated that the proportions explained by environmental heterogeneity and space vary according to theoretical predictions. Our sensu stricto NT (models A and B) and SS (models G and H) simulations showed that NT models had high levels of variation explained by space whereas SS models had high levels explained by the environment (Figure 1). We also found that differences in dispersal can alter the amount of variation explained by the environment and that spatial landscape also plays an important role in determining variation partitioning outcomes, consistent with other simulation studies (Figure 1; Smith and Lundholm 2010, Fournier et al.

2017). As such, our model validates the findings of previous research and adds to a growing

body of literature calling to move beyond the four metacommunity archetypes to analyze

metacommunity structure along a continuum of various processes and assumptions (Brown et al.

2017, Thompson et al. 2020).
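The arithmetic behind the three variation partitioning responses (environment conditioned on space, space conditioned on environment, and their shared fraction) can be sketched from three coefficients of determination. The function name and the input values are hypothetical, and in practice adjusted R² values are generally recommended as inputs (Peres-Neto et al. 2006).

```python
def partition_variation(r2_env, r2_space, r2_both):
    """Split explained variation into the classic four fractions.

    r2_env   -- variation explained by environmental predictors alone
    r2_space -- variation explained by spatial predictors alone
    r2_both  -- variation explained by both predictor sets together
    """
    return {
        "env_given_space": r2_both - r2_space,   # pure environment
        "space_given_env": r2_both - r2_env,     # pure space
        "shared": r2_env + r2_space - r2_both,   # environment with space
        "residual": 1.0 - r2_both,               # unexplained
    }


# Hypothetical inputs, not values from our simulations.
fractions = partition_variation(r2_env=0.40, r2_space=0.25, r2_both=0.50)
print(fractions)
```

Because the fractions are differences among the same three quantities, a change in dispersal that shifts any one of the fitted models can move several fractions at once, which is one reason the proportions are hard to map onto single processes.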

Our results further highlight “that multiple processes can generate patterns that are very

similar” (Leibold and Chase 2018, see also Chase 2005, McGill 2010). Our elements of

metacommunity structure analyses showed that niche and neutral models resulted in similar

counts of structure types after running simulations for 50 time steps (Table G-3). Whereas niche models reached equilibrium rather quickly, neutral models took more time to reach more stable dynamics, which likely explains why greater differences in EMS types were seen in earlier time steps. Likewise, although our mixed model analyses on variation partitioning results showed significant differences between model types (Table G-2), very similar proportions were produced by different models (Figure 1). These results support recent suggestions that variation partitioning provides weak inference for underlying processes as variation proportions can be influenced by multiple dynamics (Brown et al. 2017). Though both of these methods are widely used, new methods are clearly needed to accurately quantify the effects of the various processes governing metacommunity structure and pattern observed in empirical datasets.

ABC in Community Ecology

The links between population genetics and community ecology – including metacommunity ecology – have been discussed since Caswell's (1976) seminal paper on neutral community structure. However, the use and adaptation of population genetics models and theory in community ecology did not start in earnest until the development of the Unified Neutral

Theory of Biodiversity (Bell 2001, Hubbell 2001). In fact, early in the development of metacommunity theory Chase et al. (2005) suggested that population genetics models other than

strict Neutral Theory could be used as analogs for metacommunity ecology. These ideas were

conceptualized as a unified theory of community ecology by Vellend (2010, 2016), who showed

that ecological phenomena could be simulated using models such as the Moran model. Despite

these theoretical underpinnings, little empirical work has investigated how population genetics

techniques might inform metacommunity structure (but see Finn and Poff 2011, García-Girón et al.

2019). Our results show that ABC analyses can effectively select the appropriate

metacommunity model for both simulated and real-world datasets. This adds to the growing literature indicating that ABC can be an effective tool for ecological applications (May et al.

2013, Fasiolo and Wood 2019, Pontarp et al. 2019) and, to our knowledge, is the first time ABC has been used to assess the importance of species equivalence and dispersal limitation for metacommunity structure. The methods used herein also add to the number of frameworks which can model across the continuum of metacommunity theory (Shoemaker and Melbourne 2016,

Fournier et al. 2017). Depending on parameterization, researchers using our model can simulate metacommunities under any of the four metacommunity paradigms as well as anywhere along metacommunity axes (e.g. Figure 1 of Logue et al. 2011). Though we focused on the differences between NT and SS dynamics in this paper, we stress that additional insight will be gained when viewing metacommunities as falling along a continuum instead of lumped into discrete

classifications (Brown et al. 2017, Leibold and Chase 2018). This is shown in our results by

comparing NT and SS sensu stricto models with models that relaxed assumptions about the

importance of dispersal limitation. Furthermore, our model selection approach indicated that the

observed data was best explained by models not considered sensu stricto (Tables 2, 3, Figure 2).
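To make the population genetics analogy concrete, a Moran-type birth-death step for a single local community might look like the sketch here. It is a simplified illustration and not our simulation framework: the function `moran_step`, its arguments, and the toy community are hypothetical. Equal fitness values give neutral drift, unequal values mimic selection, and the immigration probability stands in for dispersal.

```python
import random


def moran_step(community, fitness, rng, immigration=0.0, regional_pool=None):
    """One Moran-type event: one individual dies and is replaced.

    community     -- list of species labels, one per individual
    fitness       -- dict of species -> relative fitness (default 1.0)
    rng           -- random.Random instance, for reproducibility
    immigration   -- probability the replacement comes from the region
    regional_pool -- list of species labels available as immigrants
    """
    dead = rng.randrange(len(community))
    if regional_pool and rng.random() < immigration:
        # Replacement disperses in from the regional pool.
        newcomer = rng.choice(regional_pool)
    else:
        # Replacement is the offspring of a local individual, chosen in
        # proportion to fitness (all weights equal under neutrality).
        weights = [fitness.get(species, 1.0) for species in community]
        newcomer = rng.choices(community, weights=weights, k=1)[0]
    community[dead] = newcomer
    return community


# Neutral toy run: two species, ten individuals, no immigration.
rng = random.Random(1)
community = ["sp1"] * 5 + ["sp2"] * 5
for _ in range(200):
    moran_step(community, fitness={}, rng=rng)
print(community)
```

Community size stays fixed at every step, so abundances change only through drift, selection, or immigration, mirroring the zero-sum assumption shared by the Moran model and Neutral Theory.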

Our results show that population genetics models adapted for simulating

metacommunities can effectively model metacommunity dynamics and can be used in

mechanistic and statistically based model selection analyses through Approximate Bayesian

Computation to accurately quantify the importance of various processes such as selection and

dispersal in shaping metacommunity structure. Furthermore, our framework only requires a

species × site abundance or incidence matrix as well as the spatial arrangement of sites as inputs

(though model fitting will likely be necessary). ABC analyses can overcome the weak inference

problem of null model analyses and variation partitioning by explicitly modelling the underlying

processes creating the observed patterns (Gotelli et al. 2009, Pontarp et al. 2019). Though we

focused on model selection in this paper, ABC methods were originally developed for parameter

estimation (Beaumont et al. 2002) and can be used for such analyses in ecology as well (May et

al. 2013, Pontarp et al. 2019). Future work should elucidate the effectiveness of using ABC

analyses to quantify processes such as the amount and maximum distance of dispersal, which can

help limit the use of spatial predictors as proxies for dispersal (Heino et al. 2015). Such questions

can likely be addressed using our modelling framework (code and tutorials available at

https://github.com/trevorjwilli/CommSimABCR) or other models with different assumptions as

ABC is a highly versatile method (Sisson et al. 2019b). Already new advances in ecological

modelling are being developed, such as the adaptation of the coalescent model for ecological

communities (Griswold 2019). These are exciting developments as ABC in conjunction with

coalescent models has played a significant role in developing the fields of population genetics

and phylogeography (Csilléry et al. 2010, Papadopoulou and Knowles 2016). We envision that similar advances can be made in metacommunity and community ecology, increasing our knowledge of the interactive effects of local and regional scale processes on metacommunity structure as well as providing the ability to ascertain process from pattern.

ACKNOWLEDGMENTS

We thank Steven Peck for his help with individual-based and population genetics modelling. We also thank Becca White and Alli Duffy whose helpful comments greatly enhanced the manuscript. Funding for this work was provided by Brigham Young University’s Department of

Biology and Graduate Studies.

LITERATURE CITED

Beaumont, M. A. 2010. Approximate Bayesian Computation in evolution and ecology. Annual

Review of Ecology, Evolution, and Systematics 41:379–406.

Beaumont, M. A., W. Zhang, and D. J. Balding. 2002. Approximate Bayesian Computation in

population genetics. Genetics 162:2025–2035.

Bell, G. 2001. Neutral macroecology. Science 293:2413–2418.

Borcard, D., P. Legendre, and P. Drapeau. 1992. Partialling out the spatial component of

ecological variation. Ecology 73:1045–1055.

Brown, B. L., E. R. Sokol, J. Skelton, and B. Tornwall. 2017. Making sense of

metacommunities: dispelling the mythology of a metacommunity typology. Oecologia

183:643–652.

Caswell, H. 1976. Community structure: a neutral model analysis. Ecological Monographs

46:327.

Chase, J. M., P. Amarasekare, K. Cottenie, A. Gonzalez, R. D. Holt, M. Holyoak, M. F. Hoopes,

M. A. Leibold, M. Loreau, N. Mouquet, J. B. Shurin, and D. Tilman. 2005. Competing

theories for competitive metacommunities. Pages 335–354 in M. Holyoak, M. A. Leibold,

and R. D. Holt, editors. Metacommunities: Spatial Dynamics and Ecological Communities.

University of Chicago Press, Chicago, IL.

Chase, J. M., and M. A. Leibold. 2003. Ecological Niches: Linking Classical and Contemporary

Approaches. University of Chicago Press, Chicago.

Condit, R., N. Pitman, E. G. J. Leigh, J. Chave, J. Terborgh, R. B. Foster, P. Núñez, S. Aguilar,

R. Valencia, G. Villa, H. C. Muller-Landau, E. Losos, and S. P. Hubbell. 2002. Beta-

diversity in tropical forest trees. Science 295:666–669.

Constable, G. W. A., and A. J. McKane. 2014. Population genetics on islands connected by an

arbitrary network: An analytic approach. Journal of Theoretical Biology 358:149–165.

Cottenie, K. 2005. Integrating environmental and spatial processes in ecological community

dynamics. Ecology Letters 8:1175–1182.

Csilléry, K., M. G. B. Blum, O. E. Gaggiotti, and O. François. 2010. Approximate Bayesian

Computation (ABC) in practice. Trends in Ecology & Evolution 25:410–418.

Endler, J. A. 1982. Problems in distinguishing historical from ecological factors in

biogeography. American Zoologist 22:441–452.

Estoup, A., L. Raynal, P. Verdu, and J. Marin. 2018. Model choice using Approximate Bayesian

Computation and Random Forests: analyses based on model grouping to make inferences

about the genetic history of Pygmy human populations. Journal de la Société Française de

Statistique 159:167–190.

Fasiolo, M., and S. N. Wood. 2019. ABC in ecological modelling. Pages 597–622 in S. A.

Sisson, Y. Fan, and M. A. Beaumont, editors. Handbook of Approximate Bayesian

Computation. CRC Press, Boca Raton, Florida.

Finn, D. S., and N. L. Poff. 2011. Examining spatial concordance of genetic and species diversity

patterns to evaluate the role of dispersal limitation in structuring headwater

metacommunities. Journal of the North American Benthological Society 30:273–283.

Fournier, B., N. Mouquet, M. A. Leibold, and D. Gravel. 2017. An integrative framework of

coexistence mechanisms in competitive metacommunities. Ecography 40:630–641.

García-Girón, J., P. García, M. Fernández-Aláez, E. Bécares, and C. Fernández-Aláez. 2019.

Bridging population genetics and the metacommunity perspective to unravel the

biogeographic processes shaping genetic differentiation of Myriophyllum alterniflorum DC.

Scientific Reports 9.

Gilbert, B., and J. R. Bennett. 2010. Partitioning variation in ecological communities: Do the

numbers add up? Journal of Applied Ecology 47:1071–1082.

Gotelli, N. J., M. J. Anderson, H. T. Arita, A. Chao, R. K. Colwell, S. R. Connolly, D. J. Currie,

R. R. Dunn, G. R. Graves, J. L. Green, J.-A. Grytnes, Y.-H. Jiang, W. Jetz, S. Kathleen

Lyons, C. M. McCain, A. E. Magurran, C. Rahbek, T. F. L. V. B. Rangel, J. Soberón, C. O.

Webb, and M. R. Willig. 2009. Patterns and causes of species richness: a general simulation

model for macroecology. Ecology Letters 12:873–886.

Gotelli, N. J., and D. J. McCabe. 2002. Species co-occurrence: a meta-analysis of J.M.

Diamond’s assembly rules model. Ecology 83:2091–2096.

Gravel, D., C. D. Canham, M. Beaudet, and C. Messier. 2006. Reconciling niche and neutrality:

the continuum hypothesis. Ecology Letters 9:399–409.

Grelaud, A., C. P. Robert, J.-M. Marin, F. Rodolphe, and J.-F. Taly. 2009. ABC likelihood-free

methods for model choice in Gibbs random fields. Bayesian Analysis 4:317–336.

Grimm, V., and S. F. Railsback. 2005. Individual-based Modeling and Ecology. Princeton

University Press, Princeton.

Griswold, C. K. 2019. An ancestral process with selection in an ecological community. Journal

of Theoretical Biology 466:128–144.

Hartig, F., J. M. Calabrese, B. Reineking, T. Wiegand, and A. Huth. 2011. Statistical inference

for stochastic simulation models - theory and application. Ecology Letters 14:816–827.

He, Q. X., D. L. Edwards, and L. L. Knowles. 2013. Integrative testing of how environments

from the past to the present shape genetic structure across landscapes. Evolution 67:3386–

3402.

Heino, J., A. S. Melo, T. Siqueira, J. Soininen, S. Valanko, and L. M. Bini. 2015.

Metacommunity organisation, spatial extent and dispersal in aquatic systems: Patterns,

processes and prospects. Freshwater Biology 60:845–869.

Holyoak, M., M. A. Leibold, N. Mouquet, R. D. Holt, and M. F. Hoopes. 2005.

Metacommunities: a framework for large-scale community ecology. Pages 1–31 in M.

Holyoak, M. A. Leibold, and R. D. Holt, editors. Metacommunities: Spatial Dynamics and

Ecological Communities. University of Chicago Press, Chicago, IL.

Hubbell, S. P. 2001. The unified neutral theory of biodiversity and biogeography. Princeton

University Press.

Knowles, L. L. 2009. Statistical phylogeography. Annual Review of Ecology, Evolution, and

Systematics 40:593–612.

Knowles, L. L., and W. P. Maddison. 2002. Statistical phylogeography. Molecular Ecology

11:2623–2635.

Lawton, J. H. 1999. Are there general laws in ecology? Oikos 84:177–192.

Legendre, P., D. Borcard, and P. R. Peres-Neto. 2005. Analyzing beta diversity: partitioning the

spatial variation of community composition data. Ecological Monographs 75:435–450.

Leibold, M. A., and J. M. Chase. 2018. Metacommunity Ecology. Princeton University Press,

Princeton.

Leibold, M. A., M. Holyoak, N. Mouquet, P. Amarasekare, J. M. Chase, M. F. Hoopes, R. D.

Holt, J. B. Shurin, R. Law, D. Tilman, M. Loreau, and A. Gonzalez. 2004. The

metacommunity concept: a framework for multi-scale community ecology. Ecology Letters

7:601–613.

Leibold, M. A., and G. M. Mikkelson. 2002. Coherence, species turnover, and boundary

clumping: elements of meta-community structure. Oikos 97:237–250.

Levins, R., and D. Culver. 1971. Regional coexistence of species and competition between rare

species. Proceedings of the National Academy of Sciences of the United States of America

68:1246–1248.

Logue, J. B., N. Mouquet, H. Peter, and H. Hillebrand. 2011. Empirical approaches to

metacommunities: A review and comparison with theory. Trends in Ecology and Evolution

26:482–491.

Loreau, M., and N. Mouquet. 1999. Immigration and the maintenance of local species diversity.

The American Naturalist 154:427–440.

Marin, J.-M., P. Pudlo, A. Estoup, and C. Robert. 2019. Likelihood-free model choice. Page in S.

A. Sisson, Y. Fan, and M. A. Beaumont, editors. Handbook of Approximate Bayesian

Computation. CRC Press, Boca Raton, Florida.

May, F., I. Giladi, M. Ristow, Y. Ziv, and F. Jeltsch. 2013. Metacommunity, mainland-island

system or island communities? Assessing the regional dynamics of plant communities in a

fragmented landscape. Ecography 36:842–853.

McGill, B. J. 2010. Towards a unification of unified theories of biodiversity. Ecology Letters

13:627–642.

Moran, P. A. P. 1958. Random processes in genetics. Mathematical Proceedings of the

Cambridge Philosophical Society 54:60–71.

Papadopoulou, A., and L. L. Knowles. 2016. Toward a paradigm shift in comparative

phylogeography driven by trait-based hypotheses. Proceedings of the National Academy of

Sciences 113:8018–8024.

Peres-Neto, P. R., P. Legendre, S. Dray, and D. Borcard. 2006. Variation partitioning of species

data matrices. Ecology 87:2614–2625.

Pontarp, M., Å. Brännström, and O. L. Petchey. 2019. Inferring community assembly processes

from macroscopic patterns using dynamic eco‐evolutionary models and Approximate

Bayesian Computation (ABC). Methods in Ecology and Evolution 10:450–460.

Prangle, D. 2019. Summary statistics. Pages 125–152 in S. A. Sisson, Y. Fan, and M. A.

Beaumont, editors. Handbook of Approximate Bayesian Computation. CRC Press, Boca

Raton, Florida.

Presley, S. J., C. L. Higgins, and M. R. Willig. 2010. A comprehensive framework for the

evaluation of metacommunity structure. Oikos 119:908–917.

Pudlo, P., J.-M. Marin, A. Estoup, J.-M. Cornuet, M. Gautier, and C. P. Robert. 2016. Reliable

ABC model choice via random forests. Bioinformatics 32:859–866.

Robert, C. P., J.-M. Cornuet, J.-M. Marin, and N. S. Pillai. 2011. Lack of confidence in

approximate Bayesian computation model choice. Proceedings of the National Academy of

Sciences of the United States of America 108:15112–15117.

Ruokolainen, L., E. Ranta, V. Kaitala, and M. S. Fowler. 2009. When can we distinguish

between neutral and non-neutral processes in community dynamics under ecological drift?

Ecology Letters 12:909–919.

Shoemaker, L. G., and B. A. Melbourne. 2016. Linking metacommunity paradigms to spatial

coexistence mechanisms. Ecology 97:2436–2446.

Sisson, S. A., Y. Fan, and M. A. Beaumont. 2019a. Overview of ABC. Pages 3–54 in S. A.

Sisson, Y. Fan, and M. A. Beaumont, editors. Handbook of Approximate Bayesian

Computation. CRC Press, Boca Raton, Florida.

Sisson, S. A., Y. Fan, and M. A. Beaumont, editors. 2019b. Handbook of Approximate Bayesian

Computation. CRC Press, Boca Raton, Florida.

Smith, T. W., and J. T. Lundholm. 2010. Variation partitioning as a tool to distinguish between

niche and neutral processes. Ecography 33:648–655.

Soininen, J. 2014. A quantitative analysis of species sorting across organisms and ecosystems.

Ecology 95:3284–3292.

Sunnåker, M., A. G. Busetto, E. Numminen, J. Corander, M. Foll, C. Dessimoz, and S. Wodak.

2013. Approximate Bayesian Computation. PLOS Computational Biology 9:e1002803.

Thompson, P. L., L. M. Guzman, L. De Meester, Z. Horváth, R. Ptacnik, B. Vanschoenwinkel,

D. S. Viana, and J. M. Chase. 2020. A process‐based metacommunity framework linking

local and regional scale community ecology. Ecology Letters 23:1314–1329.

Toni, T., and M. P. H. Stumpf. 2010. Simulation-based model selection for dynamical systems in

systems and population biology. Bioinformatics 26:104–110.

Vellend, M. 2010. Conceptual synthesis of community ecology. The Quarterly Review of

Biology 85:183–206.

Vellend, M. 2016. The Theory of Ecological Communities. Princeton University Press,

Princeton, NJ.

Vellend, M., D. S. Srivastava, K. M. Anderson, C. D. Brown, J. E. Jankowski, E. J. Kleynhans,

N. J. B. Kraft, A. D. Letaw, A. A. M. Macdonald, J. E. Maclean, I. H. Myers-Smith, A. R.

Norris, and X. Xue. 2014. Assessing the relative importance of neutral stochasticity in

ecological communities. Oikos 123:1420–1430.

Yunoki, T., and L. T. Velasco. 2016. Fish metacommunity dynamics in the patchy heterogeneous

habitats of varzea lakes, turbid river channels and transparent clear and black water bodies

in the Amazonian Lowlands of Bolivia. Environmental Biology of Fishes 99:391–408.

Table 1 Results of ABC analyses for pseudo-communities. PP = Posterior Probability, CI = Confidence Interval. Mean and standard deviations are for the selected models using ABC RF analyses. Prior error rates were also estimated using ABC RF.

Analysis                    Mean PP  SD PP  Prior Error Rate  Accuracy ABC RF (95% CI)  Accuracy ABC Traditional (95% CI)
Within Neutral              0.865    0.119  0.119             0.935 (0.906, 0.957)      0.965 (0.942, 0.981)
Within Niche                0.592    0.123  0.299             0.838 (0.798, 0.872)      0.865 (0.828, 0.897)
Neutral vs Niche (Uniform)  0.838    0.145  0.107             0.836 (0.808, 0.861)      0.923 (0.902, 0.940)
Neutral vs Niche (Normal)   0.838    0.144  0.095             0.886 (0.862, 0.907)      0.928 (0.907, 0.945)
Neutral vs Niche (Gamma)    0.841    0.128  0.084             0.924 (0.903, 0.941)      0.916 (0.895, 0.935)

Table 2 Results of ABC RF analyses for real-world datasets.

Dataset  Selected Model  Posterior Probability  Prior Error Rate  Votes NTDL  Votes NTNDL  Votes SSDL  Votes SSNDL
BCI      NTNDL           0.779                  0.099             291         707          0           2
Wsr 25   SSDL            0.686                  0.086             0           0            646         354
Wsr 50   SSDL            0.713                  0.053             3           1            647         349
Wsr 75   SSDL            0.660                  0.045             5           0            632         363
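As a reading aid for the vote columns of Table 2, the short sketch here converts the reported random forest votes into vote shares and pools them by paradigm. The vote counts are copied from Table 2; the helper names are hypothetical, and the Wsr suffixes are taken to be the number of simulated time steps. Vote shares are not the posterior probabilities reported in the table, which ABC RF estimates in a separate step (Pudlo et al. 2016).

```python
# Random forest votes per model, copied from Table 2.
votes = {
    "BCI":    {"NTDL": 291, "NTNDL": 707, "SSDL": 0,   "SSNDL": 2},
    "Wsr 25": {"NTDL": 0,   "NTNDL": 0,   "SSDL": 646, "SSNDL": 354},
    "Wsr 50": {"NTDL": 3,   "NTNDL": 1,   "SSDL": 647, "SSNDL": 349},
    "Wsr 75": {"NTDL": 5,   "NTNDL": 0,   "SSDL": 632, "SSNDL": 363},
}


def vote_summary(counts):
    """Return the model with the most votes and its share of all votes."""
    total = sum(counts.values())
    winner = max(counts, key=counts.get)
    return winner, counts[winner] / total


def neutral_share(counts):
    """Share of votes going to either neutral model (NTDL or NTNDL)."""
    total = sum(counts.values())
    return (counts["NTDL"] + counts["NTNDL"]) / total


for dataset, counts in votes.items():
    winner, share = vote_summary(counts)
    print(dataset, winner, round(share, 3), round(neutral_share(counts), 3))
```

Pooling the votes this way shows that nearly all trees agree on the paradigm (neutral for BCI, niche for Wsr), while the split between dispersal limited and dispersal sufficient variants is less decisive.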

Table 3 Results of traditional ABC model selection analyses for real-world datasets.

Dataset  Selected Model  Posterior Probability NTDL  Posterior Probability NTNDL  Posterior Probability SSDL  Posterior Probability SSNDL
BCI      NTNDL           0.285                       0.715                        0.00                        0.00
Wsr 25   SSDL            0.000                       0.000                        0.611                       0.389
Wsr 50   SSDL            0.000                       0.000                        0.616                       0.384
Wsr 75   SSDL            0.000                       0.000                        0.583                       0.417

Figure 1 Boxplots showing the results of variation partitioning proportions according to each treatment run during the model validation. Responses are a) environment conditioned on space, b) space conditioned on the environment, c) environment with space.

Figure 2 Plots of linear discriminant function scores from ABC RF analyses for A) BCI, and B) Wsr. Colored dots correspond to simulated values while the black star shows the values for the observed datasets. The Wsr plot shown is for simulations run for 50 time steps; plots for the 25 and 75 time step simulations were very similar.

APPENDIX A

ADDITIONAL METHODS FOR CHAPTER ONE

METHODS FOR ANALYSES RUN ON RAW DATA

Data from Jackson et al. (1992)

Jackson et al. (1992) originally ran four different null model methods on lakes from five regions (Manitoulin Island, Bruce Peninsula, Black River, LaCloche Mountains, and Wawa) in

Ontario. We extracted the presence absence matrices and spatial arrangement of sites for

Manitoulin Island from Harvey and Coombs (1971) and Harvey (1978), the Bruce Peninsula from Harvey (1981), and Wawa from Somers and Harvey (1984).

Data from Snodgrass et al. (1996)

Species presence absence was extracted from Table 1, while the spatial arrangement of sites was extracted from Figure 8 using WebPlotDigitizer (Rohatgi 2020).

Data from Peres-Neto (2004)

Presence absence data and the spatial arrangement of sites were extracted from Peres-

Neto (2002). Before null model analyses, we removed all species that occurred at less than 5% of sites as well as Synbranchus marmoratus and Gymnotus pantherinus to match the methods of the original study.

Data from Cordero and Jackson (2019)

We obtained presence-absence and spatial data from the Aquatic Habitat Inventory (AHI) program of the Ontario Ministry of Natural Resources and Forestry. We then created raw presence-absence matrices by clipping site data to the tertiary level watersheds of the Ontario

Watershed Boundaries dataset (https://data.ontario.ca/dataset/ontario-watershed-boundaries) which we modified to better match the boundaries delineated in Figure 1 of Cordero and Jackson

(2019) using QGIS v3.10 (QGIS Development Team 2020). Matrices from the 86 watersheds analyzed in Cordero and Jackson (2019) were retained. We then excluded any species that occurred in fewer than three sites in at least three watersheds to match the original study. Our final datasets contained six fewer species than Cordero and Jackson (2019), perhaps due to updates to the AHI database; however, our results from the null model analyses were consistent with those of the original study (Figure A-1), and we therefore retained these studies for our meta-analysis.

Data from Giam and Olden (2016)

We used the original presence-absence matrices used in Giam and Olden (2016), retaining species that occurred in 2 to N-2 sites. Although Giam and Olden (2016) did not use a matrix-wide approach to assess community structure, results for individual watersheds were largely congruent between both approaches (Figure A-2), and thus were retained in our meta-analysis.
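The occupancy filter described above (retaining species found in 2 to N-2 sites) can be sketched as follows. The function name, the dictionary layout of the matrix, and the toy data are assumptions made for illustration and do not reproduce the original scripts.

```python
def filter_by_occupancy(matrix, min_sites=2, max_missing=2):
    """Keep species occurring in min_sites to (N - max_missing) sites.

    matrix -- dict mapping species name to a list of 0/1 values,
              one value per site (N sites in total)
    """
    kept = {}
    for species, row in matrix.items():
        n_sites = len(row)
        occupied = sum(row)
        if min_sites <= occupied <= n_sites - max_missing:
            kept[species] = row
    return kept


# Toy presence-absence matrix with N = 6 sites (invented data).
toy_matrix = {
    "rare":        [1, 0, 0, 0, 0, 0],  # 1 site: dropped
    "moderate":    [1, 1, 1, 0, 0, 0],  # 3 sites: kept
    "common":      [1, 1, 1, 1, 0, 0],  # 4 sites (N - 2): kept
    "near_ubiq":   [1, 1, 1, 1, 1, 0],  # 5 sites: dropped
}
filtered = filter_by_occupancy(toy_matrix)
print(sorted(filtered))
```

Very rare and nearly ubiquitous species carry little information about co-occurrence, since almost any rearrangement of their occurrences yields the same pairwise overlap.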

Data from Zbinden (2021)

Presence-absence matrices and the spatial arrangement of sites were obtained from https://github.com/zdzbinden/Species_co-occurrence_frameworks.

REFERENCES

Cordero, R. D., and D. A. Jackson. 2019. Species‐pair associations, null models, and tests of

mechanisms structuring ecological communities. Ecosphere 10.

Giam, X., and J. D. Olden. 2016. Environment and predation govern fish community assembly in

temperate streams. Global Ecology and Biogeography 25:1194–1205.

Harvey, H. H. 1978. Fish communities of the Manitoulin Island lakes. SIL Proceedings, 1922-

2010 20:2031–2038.

———. 1981. Fish communities of the lakes of the Bruce Peninsula. Internationale Vereinigung

für theoretische und angewandte Limnologie: Verhandlungen 21:1222–1230.

Harvey, H. H., and J. F. Coombs. 1971. Physical and chemical limnology of the lakes of

Manitoulin Island. Journal of the Fisheries Research Board of Canada 28:1883–1897.

Jackson, D. A., K. M. Somers, and H. H. Harvey. 1992. Null models and fish communities:

evidence of nonrandom patterns. The American Naturalist 139:930–951.

Peres-Neto, P. R. 2002. The Distribution of Fishes across Stream Landscapes: Analytical

Approaches and Ecological Patterns. University of Toronto.

———. 2004. Patterns in the co-occurrence of fish species in streams: the role of site suitability,

morphology and phylogeny versus species interactions. Oecologia 140:352–360.

QGIS Development Team. 2020. QGIS Geographic Information System.

Rohatgi, A. 2020. Webplotdigitizer: Version 4.4.

Snodgrass, J. W., A. L. Bryan, Jr., R. F. Lide, and G. M. Smith. 1996. Factors affecting the

occurrence and structure of fish assemblages in isolated wetlands of the upper coastal

plain, U.S.A. Canadian Journal of Fisheries and Aquatic Sciences 53:443–454.

Somers, K. M., and H. H. Harvey. 1984. Alteration of fish communities in lakes stressed by acid

deposition and heavy metals near Wawa, Ontario. Canadian Journal of Fisheries and

Aquatic Sciences 41:20–29.

Zbinden, Z. D. 2021. A needle in the haystack? Applying species co‐occurrence frameworks

with fish assemblage data to identify species associations and sharpen ecological

hypotheses. Journal of Fish Biology In Press.

Figure A-1 Comparison between the SES of A) Cordero and Jackson (2019) and B) this study.

Figure A-2 Comparison in community structure between A) Giam and Olden (2016) and B) this study.

APPENDIX B

SUPPLEMENTARY FIGURES AND TABLES FOR CHAPTER ONE

Figure B-1 Plots displaying data used in linear regression analyses where values for matrices in temporal studies that were sampled in the same location were averaged. A) Boxplots showing the differences in Standardized Effect Size (SES) of co-occurrence null model analyses between habitat types. B) Boxplots showing the differences in SES values between climates. C) Scatterplot of SES by spatial scale of matrix, as measured as the Euclidean distance between the furthest two sites in the analyzed metacommunity. D) Scatterplot of SES by matrix size calculated as the number of species multiplied by the number of sites within a matrix. Note both C and D are on logarithmic scales. Stars show significance of Tukey’s Honest Significant Difference for pairwise comparisons where ‘*’ < 0.05, ‘**’ < 0.01, ‘***’ < 0.001.
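For reference, the standardized effect size plotted in these panels is conventionally the observed co-occurrence index minus the mean of the null distribution, divided by the standard deviation of the null distribution. A minimal sketch follows; the function name and the numbers are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import statistics


def standardized_effect_size(observed, null_values):
    """SES = (observed - mean of null) / standard deviation of null."""
    null_mean = statistics.mean(null_values)
    null_sd = statistics.stdev(null_values)  # sample SD (assumed)
    return (observed - null_mean) / null_sd


# Invented index values from five null-model randomizations.
toy_null = [1.0, 2.0, 3.0, 4.0, 5.0]
ses = standardized_effect_size(6.0, toy_null)
print(round(ses, 3))
```

Positive values indicate an observed index larger than expected under the null model, and negative values indicate a smaller one.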

APPENDIX C

KEY TO THE NATIVE CICHLIDAE OF COSTA RICA

Taken and modified from Bussing (1998) using information from Schmitter-Soto (2007a,b) and

Říčan et al. (2008, 2016)

KEY TO THE GENERA OF COSTA RICAN CICHLIDAE

1. Anal fin with 3 spines (only in Río Coto drainage) …… Andinoacara coeruleopunctatus
1a. Anal fin with 4 or more spines …… 2
2. Teeth labiolingually flattened with tricuspid tips (Říčan et al. (2016) Figure 10H); dorsal fin with 18-19 spines; anal fin with 11-12 spines; distributed through San Juan Ichthyological Province …… Herotilapia multispinosa
2a. Teeth not as above; dorsal fin with 13-19 spines; anal fin with 4-12 spines …… 3
3. Teeth truncated, incisor-like, and labiolingually flattened (Říčan et al. (2016) Figure 10J); lower jaw frequently shorter than upper jaw; oral jaws short; distributed through San Juan Ichthyological Province …… Neetroplus nematopus
3a. Teeth not as above; jaws similar or variable …… 4
4. Distinctly enlarged anterior-most canines in upper jaw and two on each side of the anterior pair on lower jaw; lower jaw longer than upper jaw (can also occur in Cribroheros but not to the same extent; jaws mostly equal in Cribroheros) …… Parachromis
4a. Canine teeth not as above; jaws of equal length or with lower jaw shorter than upper jaw …… 5
5. Outer row teeth labiolingually flattened; may contain second cusp on lingual side (at least at tip; Říčan et al. (2016) Figure 10E,G) (be careful not to confuse these with the C and D type teeth, which are not flattened at tip; Figure C-1); snout relatively rounded …… 6
5a. Teeth conical, cylindrical, and sharp (Říčan et al. (2016) Figure 10C,D); snout relatively pointed …… 10
6. Frenum absent; teeth robust, conical, with tip flattened but no second cusp (Říčan et al. (2016) Figure 10E); 13-14 dorsal fin rays, 9-10 anal fin rays; 4-5 anal fin spines; two lines between typically red eyes (also in Talamancaheros underwoodi); distributed through Atlantic versant …… Tomocichla tuba
6a. Frenum present; tooth type as above or variable; ≤ 13 dorsal fin rays, ≤ 10 anal fin rays …… 7
7. Second lower lip present, though reduced; teeth labiolingually flattened at tip but delicate, with relatively large teeth in the 2nd and 3rd rows (Říčan et al. (2016) Figure 10G); ≥ 17 dorsal fin spines; > 6 anal fin spines …… 8
7a. Second lower lip absent; teeth labiolingually flattened at tip and robust, may have second cusp (Říčan et al. (2016) Figure 10E); < 6 anal fin spines; < 17 dorsal fin spines …… 9
8. Lower jaw shorter than upper jaw with mouth in ventral position (similar profile to Neetroplus); adult coloration L-type, frequently with a large black blotch on mid-side crossed by a lateral stripe; 32 scales in lateral line; 18-19 dorsal fin spines (modally 19), 7-9 anal fin spines (modally 8); lotic postcranial morphology (elongated body and caudal peduncle) ……
8a. Oral jaws short, subequal or lower jaw shorter than upper (mouth typically in terminal position) with rounded profile; adult coloration B-type, commonly with vertical bars across body (though may only have mid-side blotch and caudal blotch); < 32 scales in lateral line; 17-19 dorsal fin spines, 7-11 anal fin spines; lentic postcranial morphology (shortened caudal peduncle, deep body) …… Amatitlania
9. Mouth subterminal; lotic postcranial morphology (elongated body and caudal peduncle, caudal peduncle blotch small); teeth without second cusp; 4-5 anal fin spines; distributed on Pacific slope of the Talamanca mountains …… Talamancaheros underwoodi
9a. Mouth terminal; lentic postcranial morphology (shortened caudal peduncle, deep body width, caudal peduncle blotch large); teeth with second cusp; > 4 anal fin spines; Atlantic versant between 0 and 5 m elevation (coastal) …… Vieja maculicauda
10. Frenum present; second lower lip present but reduced …… Cribroheros
10a. Frenum absent; second lower lip large …… 11
11. Number of anal fin spines 9-10; rounder body shape (compared to Amphilophus), < 150 mm SL; distributed in the San Juan ichthyological province on Caribbean slope; two distinctive operculum blotches …… Archocentrus centrarchus
11a. Number of anal fin spines < 9; can reach 200-300 mm SL; without operculum blotches …… Amphilophus

KEY TO THE SPECIES OF PARACHROMIS

1. Posterior edge of the preopercle usually with a conspicuous lobe at its lower angle …… Parachromis managuensis
1a. Posterior edge of the preopercle without conspicuous lobe …… 2
2. Greatest body depth equal to or greater than head length; longitudinal scales 27-31; modally 8 anal fin spines; 31 scales along lateral line …… Parachromis loisellei
2a. Greatest body depth less than head length; longitudinal scales 31-34; modally 6 anal fin spines; 33 scales along lateral line …… Parachromis dovii


KEY TO THE SPECIES OF AMATITLANIA

1. 6-8 anal fin spines (usually 7); caudal blotch tenuous and on base of caudal fin; third vertical bar (at tip of pectoral fin) most prominent and of almost equal width along its length; Chiriqui province of Smith and Bermingham (2005) …… Amatitlania sajica
1a. 7-11 anal fin spines (usually 9); caudal blotch more prominent and on base of caudal fin or on caudal peduncle; several vertical bars of equal intensity or third bar the most conspicuous and broadened at midlength …… 2
2. Secondary caudal pores absent (pored scales that do not continue the lateral line, but appear between other caudal fin rays); Río Sixaola drainage …… Amatitlania kanna
2a. Secondary caudal pores present …… 3
3. Caudal blotch mainly on base of caudal fin itself; a black blotch on upper part of opercle; first vertical bar Y-shaped; live fish with a golden iris; modally 1.5 scales from lateral line to first dorsal fin ray (Note: this is the species that Bussing considered to be Archocentrus nigrofasciata) …… Amatitlania siquia
3a. Caudal blotch mainly on caudal peduncle; opercular blotch absent; first vertical bar not bifurcated; live fish with a blue iris …… 4
4. Vertical bars, if present, diffuse and incomplete; upper symphysial teeth large with strong lingual cusp; > 27 scales in lateral line; Río Estrella drainage southward …… Amatitlania myrnae
4a. Vertical bars generally evident, sometimes reduced to an oval blotch on midside and a bar at base of tail; upper symphysial teeth not as large and without second cusp; 27 scales in lateral line; north of Río Estrella drainage …… Amatitlania septemfasciatus

KEY TO THE SPECIES OF CRIBROHEROS

1. Teeth robust, pointed, conical and gradually increasing in size towards the symphysis (Říčan et al. (2016) Figure 10C); caudal fin rounded …… 2
1a. Teeth slender, conical, and pointed, not or only very slightly increasing in size towards symphysis (Říčan et al. (2016) Figure 10D); caudal fin slightly emarginate …… 6
2. Anal fin spines 4 or 5; lower lip with broad lateral lobes (Bussing (1998) figure 54); Pacific versant …… Cribroheros altifrons
2a. Anal fin spines 6 to 11; rarely 5; both versants …… 3
3. A dark blotch in axil of pectoral fin; dorsal fin spines 15 to 17 (usually 16); broad vertical bars generally present; Pacific versant …… Cribroheros diquis
3a. Axil of pectoral fin without a dark blotch; dorsal fin spines 17 to 19 (usually 17 or 18); broad vertical bars present or absent …… 4
4. Without a band between eye and lateral blotch or base of tail; each scale of body with a blue spot (live); no dark vertical bars on body; no transparent spots on caudal and dorsal fins of females; black and clear spots present on caudal fin of males; dorsal fin spines increasing in length posteriorly; Río Sixaola drainage …… Cribroheros rhytisma
4a. A band, sometimes tenuous, between eye and lateral blotch, usually continuing to base of caudal fin; scales of body with blue spots only on anterior half of body in some populations of C. alfari; generally with broad diffuse bars on dorsal half of posterior part of body; dorsal fin spines increasing, equal, increasing in length posteriorly …… 5
5. A black spot on each scale of sides, on dorsal fin, soft anal-fin rays and proximal half of caudal fin of males; some blue markings on head, but none on body; a thin, intensive red margin on tail of both sexes; sensory papillae of lips very conspicuous under magnification; Río Sixaola drainage southward; Atlantic versant …… Cribroheros bussingi
5a. Scales of sides without black spots, but with blue irregular specks sometimes on anterior part of body and blue spots on the dorsal fin, soft anal-fin rays and on caudal fin of both sexes; sensory papillae inconspicuous; drainages to the north of the Río Sixaola and north Pacific sector …… Cribroheros alfari
6. Caudal fin dusky, covered with transparent spots; eye and lateral blotch not connected by a dark band; increasing-equal-increasing lengths of dorsal spines posteriorly; jaws of equal length; pectoral fin reaches to middle of anal spines; Atlantic versant …… Cribroheros rostratus
6a. Caudal fin without transparent spots; eye and lateral blotch connected by a dark, sometimes discontinuous, band; increasing-equal-decreasing-increasing lengths of dorsal spines posteriorly; lower jaw can be longer than upper jaw (Říčan et al. 2008); pectoral fin reaches to last anal spines; both versants …… Cribroheros longimanus

KEY TO THE SPECIES OF AMPHILOPHUS

1. Pectoral fin length 3.5 to 4.0 times in standard length; sum of the number of spines of the anal and dorsal fins 20 to 22; Pacific versant …… Amphilophus lyonsi
1a. Pectoral fin length 2.7 to 3.3 times in standard length; sum of the number of spines of the anal and dorsal fins 23 or 24; sometimes with thick lips; Atlantic versant …… Amphilophus citrinellus


REFERENCES

Bussing, W. A. (1998). Peces de las Aguas Continentales de Costa Rica (2nd ed.). Editorial de la Universidad de Costa Rica.

Říčan, O., Piálek, L., Dragová, K., & Novák, J. (2016). Diversity and evolution of the Middle American cichlid fishes (Teleostei: Cichlidae) with revised classification. Vertebrate Zoology, 66(1), 1–102.

Říčan, O., Zardoya, R., & Doadrio, I. (2008). Phylogenetic relationships of Middle American cichlids (Cichlidae, Heroini) based on combined evidence from nuclear genes, mtDNA, and morphology. Molecular Phylogenetics and Evolution, 49(3), 941–957. https://doi.org/10.1016/j.ympev.2008.07.022

Schmitter-Soto, J. J. (2007a). A systematic revision of the genus Archocentrus (Perciformes: Cichlidae), with the description of two new genera and six new species. Zootaxa, 1603, 1–78.

Schmitter-Soto, J. J. (2007b). Phylogeny of species formerly assigned to the genus Archocentrus (Perciformes: Cichlidae). Zootaxa, 1618, 1–50.

Smith, S. A., & Bermingham, E. (2005). The Biogeography of Lower Mesoamerican Freshwater Fishes. Journal of Biogeography, 32(10), 1835–1854. http://www.jstor.org/stable/3566353


Figure C-1 Examples of differences between G and C type teeth. From left to right: Lower mandible tooth from Amatitlania septemfasciata, notice the second cusp and labiolingually flattened tip; Lower mandible tooth from Amatitlania siquia, notice the labiolingual flattening when compared to far-right tooth; Lower mandible tooth from Amatitlania siquia; Lower mandible tooth from Cribroheros alfari. The first three teeth are type G teeth whereas the last tooth is a type C tooth. The Amatitlania siquia specimen was collected from the Río San Rafael tributary in Costa Rica (Latitude: 10.246029, Longitude: -84.426494); the other specimens were collected in the Río Sucio in Costa Rica (Latitude: 10.246029, Longitude: -83.903431).


APPENDIX D

SUPPLEMENTARY FIGURES AND TABLES FOR CHAPTER TWO

Table D-1 Estimates of timing of diversification from sister taxa and the geographic association of the sister taxa for some of the sampled species. Confidence intervals for estimates from Říčan et al. 2013 were extracted from their figure 2 using the online version of WebPlotDigitizer (Rohatgi, 2020).

| Species | Estimate (mya) | Confidence Interval (mya) | Sister Taxa | Sister Taxa Biogeography | Source |
|---|---|---|---|---|---|
| Cluster 1 | | | | | |
| Brachyrhaphis parismina | 0.7 | 0.1-1.9 | Brachyrhaphis cascajalensis | South | Reznick et al. 2017 |
| Priapichthys annectans | 11.42 | 5.5-17.7 | Priapichthys puetzi | South | Reznick et al. 2017 |
| Amatitlania septemfasciatus | 5.6 | 3.5-7.7 | Amatitlania spp. | South | Říčan et al. 2013 |
| Hypsophrys nematopus | 8 | 5.4-10.6 | Neetroplus nicaraguensis | Same | Říčan et al. 2013 |
| Cluster 2 | | | | | |
| Brachyrhaphis holdridgei | 13.63 | 7.5-19.4 | Brachyrhaphis spp. | West/North | Reznick et al. 2017 |
| Phallichthys amates | 17.65 | 13.8-21 | Phallichthys quadripunctatus | South | Reznick et al. 2017 |
| Amatitlania siquia | 1.35-1.6 | 0.504-2.378 | Amatitlania spp. | Same/South | Bagley et al. 2017; Říčan et al. 2013 |
| Cribroheros rostratus | 2.5 | 1.3-3.7 | Cribroheros longimanus | Same/North | Říčan et al. 2013 |
| Parachromis managuensis | 7.8 | 5.2-9.9 | Parachromis dovii | Same | Říčan et al. 2013 |
| Neither Cluster | | | | | |
| Cribroheros alfari | 4.3 | 2.1-7.1 | Cribroheros spp. | South | Říčan et al. 2013 |
| Neetroplus nicaraguensis | 8 | 5.4-10.6 | Hypsophrys nematopus | Same | Říčan et al. 2013 |
| Parachromis dovii | 7.8 | 5.2-9.9 | Parachromis managuensis | Same | Říčan et al. 2013 |


Figure D-1 Approximate distributions of the species collected for this project as estimated using the drainage basins in which each species has been observed to occur. Red highlighted maps belong to species that have higher incidences in communities forming Cluster One of the hierarchical cluster analysis, blue highlighted maps belong to species that have higher incidences in Cluster Two, and yellow highlighted maps belong to species that show no incidence preference for Cluster One or Two. The dotted lines indicate the approximate boundaries of the Nicaraguan Depression. Species distributions were estimated using information from Bussing (1998) and FishBase (Froese & Pauly, 2020).


REFERENCES

Bagley, J. C., Matamoros, W. A., McMahan, C. D., Tobler, M., Chakrabarty, P., & Johnson, J. B. (2017). Phylogeography and species delimitation in convict cichlids (Cichlidae: Amatitlania): implications for taxonomy and Plio–Pleistocene evolutionary history in Central America. Biological Journal of the Linnean Society, 120(4), 155–170. https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/biolinnean/120/1/10.1111_bij.12845/3/bij.12845.pdf?Expires=1498947632&Signature=EJFHvoitxRavLEbf0Vd6vSkVZ9Mm-8e7D7aAPhWQurNzYaU4l9v58vFZNKNIK1EmN8iSNEojatB3Xb0i~LPOCfN4KWRMFMsxbw4fueDDvlr

Bussing, W. A. (1998). Peces de las Aguas Continentales de Costa Rica (2nd ed.). Editorial de la Universidad de Costa Rica.

Froese, R., & Pauly, D. (2020). FishBase. www.fishbase.org

Reznick, D. N., Furness, A. I., Meredith, R. W., & Springer, M. S. (2017). The origin and biogeographic diversification of fishes in the family Poeciliidae. PLOS ONE, 12(3), e0172546. https://doi.org/10.1371/journal.pone.0172546

Říčan, O., Piálek, L., Zardoya, R., Doadrio, I., & Zrzavý, J. (2013). Biogeography of the Mesoamerican Cichlidae (Teleostei: Heroini): Colonization through the GAARlandia land bridge and early diversification. Journal of Biogeography, 40(3), 579–593. https://doi.org/10.1111/jbi.12023

Rohatgi, A. (2020). Webplotdigitizer: Version 4.4. https://automeris.io/WebPlotDigitizer


APPENDIX E

SUPPLEMENTARY FIGURES AND TABLES FOR CHAPTER THREE

Table E-1 Locality, GenBank accession numbers, and sources for genetic data.

| Species | N | Locality | Longitude | Latitude | GenBank | Genes | Reference |
|---|---|---|---|---|---|---|---|
| Catostomus ardens | 13 | Chalk Creek | -111.066 | 40.98801 | DQ360075 (10), DQ360076 (2), DQ360085 (1) | ND2 | Mock et al. (2006) |
| Catostomus ardens | 15 | Deer Creek | -111.493 | 40.44549 | DQ360075 (10), DQ360080 (2), DQ360082 (1), DQ360091 (1), DQ360098 (1) | ND2 | Mock et al. (2006) |
| Catostomus ardens | 2 | Jordan River | -111.916 | 40.57177 | DQ360098 (1), DQ360103 (1) | ND2 | Mock et al. (2006) |
| Catostomus ardens | 2 | Lehi Pond | -111.829 | 40.3804 | DQ360098 (2) | ND2 | Mock et al. (2006) |
| Catostomus ardens | 3 | Mammoth Creek | -112.614 | 37.62511 | DQ360075 (1), DQ360101 (1), DQ360107 (1) | ND2 | Mock et al. (2006) |
| Catostomus ardens | 1 | Otter Creek Reservoir | -111.989 | 38.20393 | DQ360086 (1) | ND2 | Mock et al. (2006) |
| Catostomus ardens | 9 | Sevier River | -112.563 | 39.39461 | DQ360099 (5), DQ360100 (1), DQ360104 (1), DQ360105 (1), DQ360106 (1) | ND2 | Mock et al. (2006) |
| Catostomus ardens | 1 | Spanish Fork River | -111.736 | 40.16591 | DQ360098 (1) | ND2 | Mock et al. (2006) |
| Catostomus ardens | 5 | Upper Weber River | -111.433 | 40.96492 | DQ360075 (5) | ND2 | Mock et al. (2006) |
| Catostomus ardens | 43 | Utah Lake | -111.739 | 40.2371 | DQ360075 (1), DQ360079 (1), DQ360080 (1), DQ360086 (2), DQ360098 (37), DQ360102 (1) | ND2 | Mock et al. (2006) |
| Catostomus ardens | 6 | Yuba Reservoir | -111.898 | 39.27133 | DQ360099 (6) | ND2 | Mock et al. (2006) |
| Catostomus platyrhynchus | 1 | Salina Creek | -111.849 | 38.93635 | KJ441203, KJ441254, KJ441101, KJ441356, KJ441305, KJ441152 | ATP-Cytb-NAD-ND1-4 | Unmack et al. (2014) |
| Catostomus platyrhynchus | 1 | Mammoth Creek | -112.614 | 37.62511 | KJ441202, KJ441253, KJ441100, KJ441355, KJ441304, KJ441151 | ATP-Cytb-NAD-ND1-4 | Unmack et al. (2014) |
| Catostomus platyrhynchus | 1 | San Pitch River | -111.643 | 39.36534 | KJ441204, KJ441255, KJ441102, KJ441357, KJ441306, KJ441153 | ATP-Cytb-NAD-ND1-4 | Unmack et al. (2014) |
| Catostomus platyrhynchus | 1 | Soldier Creek | -111.495 | 39.9937 | KJ441200, KJ441251, KJ441098, KJ441353, KJ441302, KJ441149 | ATP-Cytb-NAD-ND1-4 | Unmack et al. (2014) |
| Catostomus platyrhynchus | 1 | Weber River | -111.163 | 40.626 | KJ441198, KJ441249, KJ441096, KJ441351, KJ441300, KJ441147 | ATP-Cytb-NAD-ND1-4 | Unmack et al. (2014) |
| Cottus bairdii | 29 | Little Bear South Fork | -111.83 | 41.44988 | | ND4 | Crowley (2004) |
| Cottus bairdii | 24 | Beaver River | -113.011 | 38.37167 | | ND4 | Crowley (2004) |
| Cottus bairdii | 22 | Blacksmiths Fork | -111.777 | 41.62176 | | ND4 | Crowley (2004) |
| Cottus bairdii | 15 | Logan River 89/91 | -111.835 | 41.72092 | | ND4 | Crowley (2004) |
| Cottus bairdii | 28 | Logan River Ricks Spring | -111.589 | 41.83974 | | ND4 | Crowley (2004) |
| Cottus bairdii | 7 | Logan River Temple Fork | -111.594 | 41.77251 | | ND4 | Crowley (2004) |
| Cottus bairdii | 27 | Mammoth Creek | -112.512 | 37.62395 | | ND4 | Crowley (2004) |
| Cottus bairdii | 35 | Otter Creek | -111.871 | 38.48176 | | ND4 | Crowley (2004) |
| Cottus bairdii | 52 | Provo River Midway | -111.469 | 40.48094 | | ND4 | Crowley (2004) |
| Cottus bairdii | 37 | Provo River Jordanelle | -111.43 | 40.59424 | | ND4 | Crowley (2004) |
| Cottus bairdii | 8 | San Pitch River Fairview | -111.449 | 39.63151 | | ND4 | Crowley (2004) |
| Cottus bairdii | 7 | San Pitch River Below Fair | -111.575 | 39.52286 | | ND4 | Crowley (2004) |
| Cottus bairdii | 29 | Sevier River East Fork | -112.005 | 38.11079 | | ND4 | Crowley (2004) |
| Cottus bairdii | 11 | Sevier River at Kingston | -112.18 | 38.20629 | | ND4 | Crowley (2004) |
| Cottus bairdii | 13 | Weber River Henefer | -111.491 | 41.01523 | | ND4 | Crowley (2004) |
| Gila atraria | 5 | Beaver River | -112.682 | 38.25504 | AF481740 (1), AF481741 (2), AF481751 (2) | CR | Johnson and Belk (2002) |
| Gila atraria | 4 | Big Springs | -112.662 | 40.76897 | AF481747 (3), AF481761 (1) | CR | Johnson and Belk (2002) |
| Gila atraria | 5 | Bishop Spring | -113.872 | 39.3955 | AF481743 (1), AF481744 (3), AF481745 (1) | CR | Johnson and Belk (2002) |
| Gila atraria | 5 | East Fork Sevier | -112.026 | 38.16392 | AF481740 (1), AF481751 (3), AF481752 (1) | CR | Johnson and Belk (2002) |
| Gila atraria | 5 | Fish Springs | -113.372 | 39.85912 | AF481748 (1), AF481749 (1), AF481750 (1), AF481753 (1), AF481754 (1) | CR | Johnson and Belk (2002) |
| Gila atraria | 5 | Leland Harris Spring | -113.894 | 39.55265 | AF481739 (1), AF481743 (2), AF481757 (2) | CR | Johnson and Belk (2002) |
| Gila atraria | 5 | Locomotive Springs | -112.993 | 41.69732 | AF481743 (1), AF481745 (3), AF481746 (1) | CR | Johnson and Belk (2002) |
| Gila atraria | 5 | Mona Springs | -111.876 | 39.80474 | AF481739 (3), AF481747 (1), AF481759 (1) | CR | Johnson and Belk (2002) |
| Gila atraria | 5 | Rush Lake | -112.386 | 40.44268 | AF481739 (3), AF481742 (2) | CR | Johnson and Belk (2002) |
| Gila atraria | 3 | Spring Creek | -111.721 | 40.01956 | AF481739 (2), AF481760 (1) | CR | Johnson and Belk (2002) |
| Iotichthys phlegethontis | 5 | Leland Harris Spring | -113.894 | 39.55265 | AY641420 (3), AY641421 (2) | Cytb | Mock and Miller (2005) |
| Iotichthys phlegethontis | 5 | Gandy Salt Marsh | -113.924 | 39.47629 | AY641419 (1), AY641420 (4) | Cytb | Mock and Miller (2005) |
| Iotichthys phlegethontis | 5 | Bishop Springs | -113.872 | 39.3955 | AY641413 (1), AY641414 (1), AY641415 (1), AY641420 (2) | Cytb | Mock and Miller (2005) |
| Iotichthys phlegethontis | 17 | Mona Springs | -111.876 | 39.80474 | AY641420 (2), AY641422 (2), AY641423 (8), AY641424 (4), AY641425 (1) | Cytb | Mock and Miller (2005) |
| Iotichthys phlegethontis | 5 | Mills Valley Springs | -112.055 | 39.38651 | AY641416 (1), AY641417 (1), AY641418 (1), AY641420 (2) | Cytb | Mock and Miller (2005) |
| Iotichthys phlegethontis | 6 | Clear Lake | -112.625 | 39.10882 | AY641420 (5), DQ065820 | Cytb | Mock and Miller (2005), Mock and Bjerregaard (2007) |
| Lepidomeda aliciae | 3 | East Fork Sevier | -112.005 | 38.10979 | AF270912-AF270914, AY825487, AY825380 | Cytb-S7-TPI | Johnson and Jordan (2000), Dowling et al. (2002) |
| Lepidomeda aliciae | 3 | Main Creek | -111.409 | 40.37407 | AF270894-AF270896 | Cytb | Johnson and Jordan (2000), Dowling et al. (2002) |
| Lepidomeda aliciae | 3 | Salina Creek | -111.75 | 38.91869 | AF270906-AF270908, AY825489, AY825490, AY825496, AY825381, AY825382 | Cytb-S7-TPI | Johnson and Jordan (2000), Dowling et al. (2002) |
| Lepidomeda aliciae | 3 | San Pitch River | -111.684 | 39.29016 | AF270903-AF270905, AY825488, AY825495, AY825497, AY825387-AY825490 | Cytb-S7-TPI | Johnson and Jordan (2000), Dowling et al. (2002) |
| Lepidomeda aliciae | 4 | Sevier River | -112.236 | 39.54792 | AF270909-AF270911, AF452085, AY825498-AY825501, AY825383-AY825386 | Cytb-S7-TPI | Johnson and Jordan (2000), Dowling et al. (2002) |
| Lepidomeda aliciae | 4 | Spanish Fork River | -111.638 | 40.07841 | AF270897-AF270899, AF452084, AY825491, AY825494, AY825391, AY825392 | Cytb-S7-TPI | Johnson and Jordan (2000), Dowling et al. (2002) |
| Lepidomeda aliciae | 3 | Thistle Creek | -111.509 | 39.81781 | AF270900-AF270902, AY825492, AY825493, AY825393, AY825394 | Cytb-S7-TPI | Johnson and Jordan (2000), Dowling et al. (2002) |
| Prosopium williamsoni | 25 | Logan River | -111.651 | 41.77496 | MF382117 (25), MF382125 (2), MF382126 (1), MF382554 (23), MF382563 (2) | Cytb-ND2 | Miller (2006) |
| Prosopium williamsoni | 35 | Weber River | -111.408 | 40.90868 | MF382117 (23), MF382124 (2), MF382443 (20), MF382554 (2), MF382557 (6), MF382559 (4), MF382565 (2), MF382569 (1) | Cytb-ND2 | Miller (2006) |
| Prosopium williamsoni | 25 | Provo River | -111.54 | 40.39407 | MF382117 (12), MF382121 (4), MF382443 (7), MF382554 (14), MF382564 (2), MF382566 (2) | Cytb-ND2 | Miller (2006) |
| Rhinichthys osculus | 9 | Blue Creek | -112.458 | 41.82234 | FJ528941-FJ528949 | Cytb | Smith and Dowling (2008), Billman et al. (2010), Smith et al. (2017) |
| Rhinichthys osculus | 5 | Cotton Creek | -113.832 | 41.8 | FJ528929-FJ528933 | Cytb | Smith and Dowling (2008), Billman et al. (2010), Smith et al. (2017) |
| Rhinichthys osculus | 1 | East Canyon Creek | -111.672 | 40.97122 | KY399015, KY398930 | Cytb-ND4 | Smith and Dowling (2008), Billman et al. (2010), Smith et al. (2017) |
| Rhinichthys osculus | 2 | Fish Springs | -113.372 | 39.85912 | DQ990283, FJ528985, DQ990182 | Cytb-ND4 | Smith and Dowling (2008), Billman et al. (2010), Smith et al. (2017) |
| Rhinichthys osculus | 1 | Indian George Wash | -114.027 | 39.47677 | KY399001, KY398916 | Cytb-ND4 | Smith and Dowling (2008), Billman et al. (2010), Smith et al. (2017) |
| Rhinichthys osculus | 10 | Lake Creek | -114.086 | 38.74916 | DQ990252-DQ990253, DQ990302-DQ990304, FJ528954-FJ5285958, DQ990151-DQ990152, DQ990201-DQ990203 | Cytb-ND4 | Smith and Dowling (2008), Billman et al. (2010), Smith et al. (2017) |
| Rhinichthys osculus | 2 | Park Canyon Creek | -113.886 | 37.49002 | DQ990297, DQ990312, DQ990196, DQ990211 | Cytb-ND4 | Smith and Dowling (2008), Billman et al. (2010), Smith et al. (2017) |
| Rhinichthys osculus | 4 | Rock Spring | -114.376 | 41.72335 | DQ990278-DQ990281, DQ990177-DQ990180 | Cytb-ND4 | Smith and Dowling (2008), Billman et al. (2010), Smith et al. (2017) |
| Rhinichthys osculus | 4 | Salina Canyon Creek | -111.852 | 38.94143 | DQ990298-DQ990301, DQ990197-DQ990200 | Cytb-ND4 | Smith and Dowling (2008), Billman et al. (2010), Smith et al. (2017) |
| Rhinichthys osculus | 13 | Salina Creek | -111.849 | 38.93635 | FJ528967-FJ528979 | Cytb | Smith and Dowling (2008), Billman et al. (2010), Smith et al. (2017) |
| Rhinichthys osculus | 2 | Sevier River | -111.866 | 38.98898 | FJ528950, FJ528951 | Cytb | Smith and Dowling (2008), Billman et al. (2010), Smith et al. (2017) |
| Richardsonius balteatus | 4 | Blue Creek | -112.723 | 41.952 | GU182719-GU182722, GU182514-GU182517 | Cytb-CR | Houston et al. (2010) |
| Richardsonius balteatus | 5 | Lake Creek | -114.048 | 38.767 | GU182782-GU182286, GU182577-GU182581 | Cytb-CR | Houston et al. (2010) |
| Richardsonius balteatus | 5 | Little Reservoir | -112.48 | 38.25 | GU182795-GU182799, GU182590-GU182594 | Cytb-CR | Houston et al. (2010) |
| Richardsonius balteatus | 5 | Main Creek | -111.442 | 40.394 | GU182810-GU182814, GU182605-GU182609 | Cytb-CR | Houston et al. (2010) |
| Richardsonius balteatus | 5 | Tropic Reservoir | -112.25 | 37.58 | GU182855-GU182859, GU182650-GU182654 | Cytb-CR | Houston et al. (2010) |
| Richardsonius balteatus | 4 | Weber River | -111.163 | 40.626 | GU182865-GU182868, GU182660-GU182663 | Cytb-CR | Houston et al. (2010) |


Table E-2 Presence absence matrix derived from museum collection of Carl Hubbs.

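| Site | Catostomus ardens | Catostomus discobolus | Catostomus platyrhynchus | Cottus bairdii | Gila atraria | Iotichthys phlegethontis | Lepidomeda aliciae | Rhinichthys cataractae | Rhinichthys osculus | Richardsonius balteatus |
|---|---|---|---|---|---|---|---|---|---|---|
| DD-01 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| DD-03 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| DD-05 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| DD-07 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 |
| DD-09 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| DD-100 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| DD-101 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| DD-102 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| DD-103 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| DD-107 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| DD-108 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| DD-109 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| DD-11 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 |
| DD-110 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| DD-13 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| DD-15 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| DD-21 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| DD-22 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| DD-24 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 |
| DD-27 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| DD-28 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| DD-35 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| DD-38 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 |
| DD-45 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| DD-46 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| DD-49 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| DD-51 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 |
| DD-57 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| DD-58 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| DD-60 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| DD-62 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| DD-63 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| DD-66 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 |
| DD-89 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| DD-90 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| DD-91 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 |
| DD-93 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| DD-94 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| DD-95 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| DD-96 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| DD-97 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| DD-98 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| DD-99 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |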


Figure E-1 Parameter posterior distributions for Catostomus ardens.


Figure E-2 Parameter posterior distributions for Catostomus platyrhynchus.


Figure E-3 Parameter posterior distributions for Cottus bairdii.


Figure E-4 Parameter posterior distributions for Gila atraria.


Figure E-5 Parameter posterior distributions for Iotichthys phlegethontis.


Figure E-6 Parameter posterior distributions for Lepidomeda aliciae.


Figure E-7 Parameter posterior distributions for Prosopium williamsoni.


Figure E-8 Parameter posterior distributions for Rhinichthys osculus.


Figure E-9 Parameter posterior distributions for Richardsonius balteatus.


APPENDIX F

ADDITIONAL METHODS FOR CHAPTER FOUR

DESCRIPTION OF THE MODEL EQUATIONS

The Moran model is a stochastic birth-death process modelling the change of alleles in a

finite-sized population with overlapping generations and can be modified to include processes such as mutation, selection, and migration between demes (Moran 1958). In our model species

were treated as ‘haploid alleles’ and demes were treated as communities.

Selection occurs during the birth process where each species within the birth community

is given a probability of reproducing according to the equation:

$$p_i = \frac{s_i f_i}{\sum_{j=1}^{n} s_j f_j}$$

where $n$ is the number of species present in the community, $f_i$ ($f_j$) is the frequency of species $i$ ($j$) in the birth community, and $s_i$ ($s_j$) is the selection coefficient for species $i$ ($j$) in the birth community.
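The birth probability can be illustrated with a minimal Python sketch. The function name `birth_probabilities` and the three-species example values are hypothetical and are not taken from the CommSimABC package; the sketch only assumes that each species' probability of reproducing is its frequency weighted by its selection coefficient and normalized over the community.

```python
def birth_probabilities(freqs, sel_coefs):
    """Return p_i = s_i * f_i / sum_j(s_j * f_j) for one birth community."""
    if len(freqs) != len(sel_coefs):
        raise ValueError("freqs and sel_coefs must have the same length")
    # Weight each species' frequency by its selection coefficient
    weights = [s * f for s, f in zip(sel_coefs, freqs)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one species needs a positive weight")
    # Normalize so the probabilities sum to one
    return [w / total for w in weights]


# Hypothetical three-species community (illustrative values only)
freqs = [0.5, 0.3, 0.2]
sel_coefs = [1.0, 1.2, 0.8]
probs = birth_probabilities(freqs, sel_coefs)
```

When all selection coefficients are equal, as in the neutral parameterization, the birth probabilities reduce to the species frequencies.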

Selection coefficients in this formulation are locality dependent and are expressed as the ratio in

expected fitness of species i to the expected fitness of a reference species for that community

(Ewens 2004). We also included stabilizing mechanisms within the model by making the

selection coefficients density dependent (Chesson 2000, 2018, Adler et al. 2007). The strength of

selection was able to vary according to the following equation adapted from Vellend (2016):

$$s_i(f_i) = e^{L_i \left(f_i - \frac{1}{n}\right) + \ln s_{i,ave}}$$

where $f_i$ is the frequency of species $i$ in the community, $n$ is the number of species in the community, $s_{i,ave}$ is the average selection coefficient of species $i$ (following the formulation above) when all species are at equal frequencies, and $L_i$ is the frequency dependence parameter, which can be thought of as the strength of intraspecific relative to interspecific competition. If $L_i$ is

negative, it denotes that intraspecific competition for species i is stronger than interspecific

competition, therefore allowing the possibility of species coexistence (Chesson 2000, 2018,

Adler et al. 2007). Under different parameterizations, this model can simulate the processes

governing metacommunity structure following the assumptions for any of the metacommunity

archetypes (Leibold et al. 2004, Leibold and Chase 2018).
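As a rough illustration of the frequency dependence described above, the following Python sketch assumes the selection coefficient takes the form exp(L_i (f_i - 1/n) + ln s_i,ave). The function name and the parameter values are hypothetical and chosen only for demonstration.

```python
import math


def selection_coefficient(f_i, n, s_ave, L_i):
    """Frequency-dependent selection coefficient for one species.

    Assumed form: s_i = exp(L_i * (f_i - 1/n) + ln(s_ave)), so that
    s_i equals s_ave when the species sits at the equal-frequency
    value 1/n.
    """
    return math.exp(L_i * (f_i - 1.0 / n) + math.log(s_ave))


# Hypothetical four-species community with negative frequency dependence
rare = selection_coefficient(f_i=0.05, n=4, s_ave=1.0, L_i=-0.8)
common = selection_coefficient(f_i=0.60, n=4, s_ave=1.0, L_i=-0.8)
```

With a negative frequency dependence parameter, a species below the equal-frequency value gains a selective advantage and a species above it is penalized, which is the stabilizing behavior the text describes.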

VALIDATING THE MODEL

For each simulation, a 100 x 100 grid landscape was created; each cell was given values

for five arbitrary environmental measures, with assignments dependent on the landscape

treatment (see below). Twenty cells were then randomly chosen from the grid to give the twenty

communities environmental and spatial components. Neutral models were parameterized so that

all species selection coefficients were equal to one and all frequency dependence parameters

were set to 0. Niche models assigned selection coefficients to species for each community

following multidimensional resource-utilization niches based upon the five environmental

measures (see below). Niche models also randomly assigned each species a frequency

dependence parameter between -1 and 0. Migration matrices were the same for each species

within simulations, where the probability of migrating was inversely proportional to the distance

from the birth community (see below). Dispersal limited models allowed species to migrate 30

grid units from their birth communities, whereas dispersal sufficient models allowed species to

migrate 150 grid units from their birth communities (effectively allowing them to reach any

community within the metacommunity).


Creation of spatial landscape for simulations

Environmental landscapes were generated in one of two ways: 1) completely random, or

2) spatially autocorrelated (Figure F-1). For completely random landscapes, each cell in the 100 x 100 grid landscape was randomly assigned five values between 0 and 1 – one for each arbitrary environmental variable – using a uniform distribution. In spatially autocorrelated landscapes, for each environmental variable 10 cells were randomly chosen and assigned one of the following values: 0, 0.25, 0.5, 0.75, and 1. Values were assigned such that each value appeared twice in the landscape (i.e. two of the ten sampled cells were assigned 0.25, etc.). Using these cells, inverse distance weighted (IDW) interpolation was used to assign environmental values to the rest of the landscape. Landscapes were created using the make_spatialenv function in the CommSimABC package in R (https://github.com/trevorjwilli/CommSimABCR).
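The spatially autocorrelated landscape for a single environmental variable can be sketched in Python as follows. This is not the `make_spatialenv` implementation; the function name `idw_landscape`, the distance power of 2, and the seeding scheme are assumptions made for illustration.

```python
import random


def idw_landscape(size=100, power=2.0, seed=0):
    """Build one autocorrelated environmental layer by IDW interpolation.

    Ten anchor cells receive the values 0, 0.25, 0.5, 0.75, and 1 (each
    used twice); every other cell is the inverse-distance-weighted mean
    of the anchors. The power of 2 is an assumed setting.
    """
    rng = random.Random(seed)
    cells = [(r, c) for r in range(size) for c in range(size)]
    anchors = rng.sample(cells, 10)
    values = [0.0, 0.25, 0.5, 0.75, 1.0] * 2
    rng.shuffle(values)
    known = dict(zip(anchors, values))

    grid = [[0.0] * size for _ in range(size)]
    for r, c in cells:
        if (r, c) in known:
            grid[r][c] = known[(r, c)]  # anchor cells keep their value
            continue
        num = den = 0.0
        for (ar, ac), v in known.items():
            # weight = 1 / distance**power, using squared distance directly
            w = 1.0 / (((r - ar) ** 2 + (c - ac) ** 2) ** (power / 2.0))
            num += w * v
            den += w
        grid[r][c] = num / den
    return grid, known


# Small demonstration grid; the simulations described above use 100 x 100
grid, known = idw_landscape(size=20, seed=1)
```

Repeating the call with five different seeds would give one layer per arbitrary environmental variable.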

Creation of Resource-Utilization Niches from Environmental variables

To create species-specific selection coefficients from simulated environmental landscapes, we gave each species a multivariate normal distribution to describe the expected frequency of the species at differing values of each arbitrary environmental variable (Figure F-2).

This formulation follows that of the resource-utilization niche developed by MacArthur and Levins (1967). We created multivariate niches by randomly sampling means (i.e. the value of an environmental variable at which the species occurs most frequently) and variances for each environmental variable for each species. Means were sampled using a uniform distribution with a minimum of 0 and a maximum of 1. Variances were also sampled using a uniform distribution but with a minimum of 0.4 and a maximum of 0.6. All covariances were set to 0 for all species. These settings produced random niches that allowed enough niche breadth for species persistence within the metacommunity. The density of each species in each

community was then calculated using the environmental variable values found at each

community. We assigned selection coefficients for species within each site by taking the ratio of

the density of species i to the density of the species with the highest density for that site.

Selection matrices were created using the make_selfromenv function in the CommSimABC

package in R.
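The conversion from niches to selection coefficients can be sketched in Python as below. This is an illustrative stand-in rather than the `make_selfromenv` function: the helper names and the random seed are hypothetical, and because all covariances are zero the multivariate normal density is computed as a product of univariate normal densities.

```python
import math
import random


def niche_density(env, means, variances):
    """Multivariate normal density with zero covariances at one site."""
    density = 1.0
    for x, mu, var in zip(env, means, variances):
        density *= math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    return density


def selection_coefficients(env, niches):
    """Ratio of each species' density to the highest density at the site."""
    densities = [niche_density(env, means, variances) for means, variances in niches]
    top = max(densities)
    return [d / top for d in densities]


rng = random.Random(42)  # hypothetical seed, for reproducibility only
n_species, n_env = 4, 5
# Means ~ U(0, 1) and variances ~ U(0.4, 0.6), as described in the text
niches = [
    ([rng.uniform(0.0, 1.0) for _ in range(n_env)],
     [rng.uniform(0.4, 0.6) for _ in range(n_env)])
    for _ in range(n_species)
]
site_env = [rng.uniform(0.0, 1.0) for _ in range(n_env)]
coefs = selection_coefficients(site_env, niches)
```

Under this scaling the best-suited species at each site has a selection coefficient of exactly one and all other species fall between zero and one.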

Calculation of Migration Probabilities from Distances

In our simulations, migration matrices were parameterized with local birth communities

as columns and destination communities as rows with cell values equal to the probability of

dispersing from the local birth community to the destination community. We assumed that

migration probability was inversely proportional to the distance between communities, and

calculated migration matrix cell values using the following equation:

$$y_i = \frac{p(M)}{d_i \sum_{j} \frac{1}{d_j}}, \quad \text{where } d_j < D_{max}$$

where $y_i$ is the probability of the new offspring migrating to community $i$, $p(M)$ is the total probability of migrating from the birth community, $d_i$ is the distance to destination community $i$, the sum runs over all communities $j$ within dispersal range, and $D_{max}$ is the maximum distance the species can migrate. This formulation distributes the total probability of an offspring dispersing from its birth community among the communities within dispersal range, using the user-supplied parameters.
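As a minimal sketch of this calculation (in Python rather than R, and not taken from the CommSimABC package), the function below computes one column of the migration matrix for a single birth community. The function name, the exclusion of zero distances (the birth community itself), and the example values are assumptions for illustration.

```python
def migration_probabilities(distances, p_migrate, d_max):
    """Split the total migration probability among communities in range.

    distances: distance from the birth community to each community.
    p_migrate: total probability of leaving the birth community, p(M).
    d_max:     maximum dispersal distance, D_max.
    Communities at distance 0 or at d_max and beyond receive probability 0.
    """
    in_range = {j: d for j, d in enumerate(distances) if 0 < d < d_max}
    probs = [0.0] * len(distances)
    if not in_range:
        return probs
    inverse_sum = sum(1.0 / d for d in in_range.values())
    for j, d in in_range.items():
        # Inversely proportional to distance, scaled so that values sum to p(M).
        probs[j] = p_migrate / (d * inverse_sum)
    return probs


# Hypothetical example: three communities in range, one beyond D_max.
probs = migration_probabilities([10.0, 20.0, 40.0, 500.0], p_migrate=0.2, d_max=50.0)
```

In this example the nearest community receives twice the probability of one that is twice as far away, and the in-range probabilities sum to the total migration probability.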

STATISTICAL ANALYSES

We validated the model using variation partitioning (Legendre et al. 2005, Peres-Neto et al. 2006) and elements of metacommunity structure (Leibold and Mikkelson 2002, Presley et al. 2010). For variation partitioning, we partitioned the variation between environmental and spatial predictors. For the environmental predictors we used the five environmental values used to create the spatial landscape. Spatial predictors were generated using Moran eigenvector maps (MEM; Dray et al. 2006). MEMs were calculated using a distance-based connectivity matrix with the connectivity distance threshold set as the minimum distance that kept all points connected. The weighting matrix was constructed using the concave-up function:

$$f = \frac{1}{d_{max}^{\alpha}}$$

where $d_{max}$ is the maximum distance between two sites and $\alpha = 0.05$ (Bauman et al. 2018). The concave-up function was used because it matched our model assumption that migration probability is inversely proportional to distance. Only positive eigenvectors were generated, and MEM variables were tested in a global test for significance. If the global test was significant, a subset of MEM variables was selected using forward selection with a double stopping criterion

(Blanchet et al. 2008). If the global test was non-significant, the first five MEM variables were used to match the number of environmental variables. MEM variables were generated using the ‘adespatial’ package in R v 3.6.2 (R Core Team 2020, Dray et al. 2021). Variation partitioning was then conducted on Hellinger-transformed abundance counts (Legendre and Gallagher 2001) using the ‘vegan’ package in R (Oksanen et al. 2019). Linear mixed models testing for differences among models in the proportions of variation explained by space and environment were run using the ‘lme4’ package (Bates et al. 2015).
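For readers unfamiliar with the Hellinger transformation (Legendre and Gallagher 2001), it replaces each abundance with the square root of its relative abundance within the site. The sketch below is a plain Python illustration, not the ‘vegan’ implementation used in the analyses; the function name and the handling of empty sites are assumptions.

```python
import math


def hellinger(counts):
    """Hellinger-transform a sites-by-species abundance table.

    Each value becomes sqrt(count / site total). Sites with no individuals
    are returned as rows of zeros (an assumption made for this sketch).
    """
    transformed = []
    for row in counts:
        total = sum(row)
        transformed.append([math.sqrt(x / total) if total > 0 else 0.0 for x in row])
    return transformed


# Hypothetical two-site, three-species example.
transformed = hellinger([[1, 3, 0], [4, 4, 8]])
```

After the transformation, the squared values in every non-empty row sum to 1, so Euclidean distances between rows equal Hellinger distances between sites.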

Elements of metacommunity structure were investigated using the coherence, turnover, and boundary clumping null model approach of Leibold and Mikkelson (2002) and Presley et al. (2010). Each null model was run using a non-sequential algorithm that fixed site frequencies but made species frequencies equiprobable, as suggested in Presley et al. (2010). Each incidence matrix (taken from our simulation runs) was evaluated against 1000 null model simulations and classified into one of fourteen different metacommunity structure types (Presley et al. 2010). Counts of structure types were then calculated for each simulation treatment and used in Fisher’s exact tests. Null model analyses were run using the ‘metacom’ package in R (Dallas 2014).
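The null model constraint described above (site frequencies fixed, species frequencies equiprobable) can be illustrated with a short Python sketch. It is not the ‘metacom’ implementation; it assumes an incidence matrix with sites as rows and species as columns, and the function name and example matrix are hypothetical.

```python
import random


def fixed_equiprobable_null(incidence, rng):
    """Generate one null incidence matrix.

    Each site keeps its observed richness (row total), but the species
    occupying it are drawn with equal probability, independently per site.
    """
    n_species = len(incidence[0])
    null = []
    for row in incidence:
        richness = sum(row)
        occupied = set(rng.sample(range(n_species), richness))
        null.append([1 if j in occupied else 0 for j in range(n_species)])
    return null


# Hypothetical example: 3 sites by 5 species.
observed_incidence = [[1, 1, 0, 0, 0], [1, 0, 1, 1, 0], [0, 0, 0, 0, 1]]
null_matrix = fixed_equiprobable_null(observed_incidence, random.Random(7))
```

Repeating this draw many times (1000 in the analyses above) gives the null distribution against which an observed statistic is compared.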

MODEL SELECTION USING APPROXIMATE BAYESIAN COMPUTATION

Approximate Bayesian Computation (ABC) is a method of estimating the posterior probability of a vector of model parameters $\theta$ for Bayesian inference, where the posterior equals:

$$p(\theta \mid x) = \frac{p(x \mid \theta)\,\pi(\theta)}{p(x)}$$

where $x$ are the observed data and $\pi(\theta)$ are the priors on the model parameters (Beaumont et al. 2002, Beaumont 2010, Sisson et al. 2019a). Frequently, numerical evaluation of the likelihood function $p(x \mid \theta)$ is difficult or impossible due to high dimensionality (Beaumont 2010, Sisson et al. 2019a). An alternative way of approximating the posterior is to use a rejection-based algorithm that relies on simulations and summary statistics to lower the dimensionality. The basic algorithm (Beaumont et al. 2002) is as follows:

1. Calculate the value $s$ of a summary statistic $S$ for the observed data $x$

2. Simulate $\theta'$ from the prior distribution $\pi(\theta)$

3. Simulate $x'$ from model $M$ using $\theta'$

4. Compute $s'$ as the value of $S$ for $x'$

5. If $\lVert s' - s \rVert \le \delta$, where $\delta$ is a predefined tolerance, accept $\theta'$

6. Repeat steps 2-5 until $k$ acceptances have been obtained

The accepted $\theta'$ values then form an approximation to $p(\theta \mid x)$. Multiple adjustments and advances on this basic algorithm have been put forward (reviewed in Sisson et al. 2019a), allowing the estimation of both discrete and continuous parameters as well as use in model selection.
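The six steps of the basic rejection algorithm can be sketched in a few lines of Python. The toy problem here (estimating the mean of a normal distribution with known standard deviation, using the sample mean as the single summary statistic) is an assumption chosen only to keep the example small; it is not one of the metacommunity models, and all function names are hypothetical.

```python
import random


def sample_mean(values):
    return sum(values) / len(values)


def abc_rejection(observed, simulate, sample_prior, summary, tolerance, k, rng):
    """Basic ABC rejection sampler with a scalar summary statistic."""
    s_obs = summary(observed)                    # step 1: observed summary
    accepted = []
    while len(accepted) < k:                     # step 6: repeat until k acceptances
        theta = sample_prior(rng)                # step 2: draw from the prior
        x_sim = simulate(theta, rng)             # step 3: simulate data
        s_sim = summary(x_sim)                   # step 4: simulated summary
        if abs(s_sim - s_obs) <= tolerance:      # step 5: accept if close enough
            accepted.append(theta)
    return accepted


# Toy example: data from Normal(2, 1); the prior on the mean is Uniform(-5, 5).
rng = random.Random(42)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]
posterior = abc_rejection(
    observed,
    simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    sample_prior=lambda r: r.uniform(-5.0, 5.0),
    summary=sample_mean,
    tolerance=0.1,
    k=200,
    rng=rng,
)
```

With a scalar summary statistic the norm in step 5 reduces to an absolute difference. Smaller tolerances give a closer approximation to the posterior at the cost of more rejected simulations.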

Model selection using ABC methods can provide improper inference if done incorrectly, especially with respect to the choice of summary statistics (Marin et al. 2019, Prangle 2019). Recently, a new approach using machine learning, Approximate Bayesian Computation with Random Forests (ABC RF), has been developed that is robust to the number and choice of summary statistics and has lower prior error rates and higher reliability than typical ABC model choice algorithms (Pudlo et al. 2016, Estoup et al. 2018, Marin et al. 2019). ABC RF alters the typical ABC model choice procedure by first training a random forest classifier on a reference table of simulated summary statistics, then estimating the posterior probability of the best-fit model using additional random forests (Pudlo et al. 2016, Marin et al. 2019). Because of these properties, we used ABC RF for all model choice analyses, but we also ran typical ABC model selection analyses for comparison.

Pseudo-communities

For each model we ran 5000 simulations on random metacommunities composed of 30 species distributed across 20 sites with equal community sizes on a 100 x 100 grid landscape. Each simulation consisted of 50 time-steps. For prior distributions and values see Table F-1. We assigned selection coefficients by site rather than across the entire metacommunity to ensure that no site contained a disproportionate number of low or high selection coefficients. To assess the variability due to the prior distribution, we ran simulations for six different niche models (three dispersal limited and three dispersal sufficient) using either a uniform, normal, or gamma distribution for selection coefficients. All other parameters were assigned using uniform priors. For these analyses we used the following summary statistics: mean and standard deviation of Pielou’s evenness, mean and standard deviation of the number of sites species occupied, and weighted alpha and beta diversity using Hill numbers with q = 1 (Jost 2006, 2007). For ABC RF analyses we also included the discriminant scores from a Linear Discriminant Analysis of the summary statistics (Pudlo et al. 2016, Estoup et al. 2018).
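The summary statistics listed above can be computed from a sites-by-species abundance table as in the following Python sketch. It is an illustration under stated assumptions rather than the code used in the analyses: sites are weighted by their share of all individuals, the sample standard deviation is used, Pielou’s evenness is set to 0 for sites with fewer than two species, and occupancy is calculated over every column of the table.

```python
import math
import statistics


def shannon(counts):
    """Shannon entropy (natural log) of a vector of abundances."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou(counts):
    """Pielou's evenness J = H / ln(S); 0 when fewer than two species (assumption)."""
    richness = sum(1 for c in counts if c > 0)
    return shannon(counts) / math.log(richness) if richness > 1 else 0.0


def hill_alpha_beta(abundance):
    """Weighted alpha and beta diversity of order q = 1 (Jost 2007)."""
    site_totals = [sum(row) for row in abundance]
    grand_total = sum(site_totals)
    weights = [t / grand_total for t in site_totals]
    # Alpha: exponential of the weighted mean of the site entropies.
    alpha = math.exp(sum(w * shannon(row) for w, row in zip(weights, abundance)))
    # Gamma: with these weights the pooled counts give the weighted mean frequencies.
    gamma = math.exp(shannon([sum(col) for col in zip(*abundance)]))
    return alpha, gamma / alpha


def summary_statistics(abundance):
    evenness = [pielou(row) for row in abundance]
    occupancy = [sum(1 for c in col if c > 0) for col in zip(*abundance)]
    alpha, beta = hill_alpha_beta(abundance)
    return {
        "evenness_mean": statistics.mean(evenness),
        "evenness_sd": statistics.stdev(evenness),
        "occupancy_mean": statistics.mean(occupancy),
        "occupancy_sd": statistics.stdev(occupancy),
        "alpha": alpha,
        "beta": beta,
    }


# Hypothetical example: two equally sized sites that share no species.
stats = summary_statistics([[5, 5, 0, 0], [0, 0, 5, 5]])
```

In this example each site holds two equally abundant species and the sites share none, so alpha is 2 effective species and beta is 2 effective communities.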

Barro Colorado Island Trees

The BCI dataset is composed of 50 one-hectare plots and contains 225 species (Condit et al. 2002). Because species are likely to be extirpated during simulation runs, we parameterized our simulations to contain metacommunities with 50 sites and 235 species. As the BCI dataset is large, we ran 1000 simulations for each model. Priors used for BCI simulations are found in Table F-2 and are only slightly modified from the priors used for pseudo-community simulations. For selection coefficients for both the BCI and Wsr datasets we chose a gamma distribution prior, as it yielded the lowest prior error rate and highest accuracy in pseudo-community analyses. Lastly, we allowed community size to vary because community sizes varied in the observed dataset. The summary statistics used for ABC analyses were the mean and standard deviation of Pielou’s evenness, the mean and standard deviation of the number of sites species occupied, and weighted alpha and beta diversity using Hill numbers with q = 1 (Jost 2006, 2007). For ABC RF analyses we also included the discriminant scores from a Linear Discriminant Analysis of the summary statistics (Pudlo et al. 2016, Estoup et al. 2018).

Bolivia Freshwater Fish Communities

The Wsr dataset is composed of 162 species across 21 sites in the Río Madre de Dios basin in Bolivia (Yunoki and Velasco 2016). Simulations were run using the priors in Table F-3. We parameterized our metacommunities to contain a total of 172 species and used the river distance matrix from Yunoki and Velasco (2016) as the spatial context for sites. Before running ABC analyses, cell values in simulated metacommunities were divided by four and then Hellinger transformed to match the catch per unit effort (CPUE) data from Yunoki and Velasco (2016). The summary statistics used for ABC analyses were the mean and standard deviation of Pielou’s evenness, the mean and standard deviation of the number of sites species occupied, and weighted alpha and beta diversity using Hill numbers with q = 1 (Jost 2006, 2007). For ABC RF analyses we also included the discriminant scores from a Linear Discriminant Analysis of the summary statistics (Pudlo et al. 2016, Estoup et al. 2018).

REFERENCES

Adler, P. B., J. HilleRisLambers, and J. M. Levine. 2007. A niche for neutrality. Ecology Letters

10:95–104.

Bates, D., M. Mächler, B. M. Bolker, and S. C. Walker. 2015. Fitting linear mixed-effects

models using lme4. Journal of Statistical Software 67:1–48.

Bauman, D., T. Drouet, M. J. Fortin, and S. Dray. 2018. Optimizing the choice of a spatial

weighting matrix in eigenvector-based methods. Ecology 99:2159–2166.

Beaumont, M. A. 2010. Approximate Bayesian Computation in evolution and ecology. Annual

Review of Ecology, Evolution, and Systematics 41:379–406.

Beaumont, M. A., W. Zhang, and D. J. Balding. 2002. Approximate Bayesian Computation in

population genetics. Genetics 162:2025–2035.

Blanchet, F. G., P. Legendre, and D. Borcard. 2008. Forward selection of explanatory variables.

Ecology 89:2623–2632.

Chesson, P. 2000. Mechanisms of maintenance of species diversity. Annual Review of Ecology

and Systematics 31:343–366.

Chesson, P. 2018. Updates on mechanisms of maintenance of species diversity. Journal of

Ecology 106:1773–1794.

Condit, R., N. Pitman, E. G. J. Leigh, J. Chave, J. Terborgh, R. B. Foster, P. Núñez, S. Aguilar,

R. Valencia, G. Villa, H. C. Muller-Landau, E. Losos, and S. P. Hubbell. 2002. Beta-

diversity in tropical forest trees. Science 295:666–669.

Dallas, T. 2014. Metacom: An R package for the analysis of metacommunity structure.

Ecography 37:402–405.

Dray, S., D. Bauman, G. Blanchet, D. Borcard, S. Clappe, G. Guenard, T. Jombart, G. Larocque,

P. Legendre, N. Madi, and H. H. Wagner. 2021. adespatial: Multivariate Multiscale Spatial

Analysis.

Dray, S., P. Legendre, and P. R. Peres-Neto. 2006. Spatial modelling: a comprehensive

framework for principal coordinate analysis of neighbour matrices (PCNM). Ecological

Modelling 196:483–493.

Estoup, A., L. Raynal, P. Verdu, and J. Marin. 2018. Model choice using Approximate Bayesian

Computation and Random Forests: analyses based on model grouping to make inferences

about the genetic history of Pygmy human populations. Journal de la Société Française de

Statistique 159:167–190.

Ewens, W. J. 2004. Mathematical Population Genetics: I. Theoretical Introduction. 2nd ed.

Springer-Verlag, New York.

Jost, L. 2006. Entropy and diversity. Oikos 113:363–375.

Jost, L. 2007. Partitioning diversity into independent alpha and beta components. Ecology

88:2427–2439.

Legendre, P., D. Borcard, and P. R. Peres-Neto. 2005. Analyzing beta diversity: partitioning the

spatial variation of community composition data. Ecological Monographs 75:435–450.

Legendre, P., and E. D. Gallagher. 2001. Ecologically meaningful transformations for ordination

of species data. Oecologia 129:271–280.

Leibold, M. A., and J. M. Chase. 2018. Metacommunity Ecology. Princeton University Press,

Princeton.

Leibold, M. A., M. Holyoak, N. Mouquet, P. Amarasekare, J. M. Chase, M. F. Hoopes, R. D.

Holt, J. B. Shurin, R. Law, D. Tilman, M. Loreau, and A. Gonzalez. 2004. The

metacommunity concept: a framework for multi-scale community ecology. Ecology Letters

7:601–613.

Leibold, M. A., and G. M. Mikkelson. 2002. Coherence, species turnover, and boundary

clumping: elements of meta-community structure. Oikos 97:237–250.

MacArthur, R., and R. Levins. 1967. The limiting similarity, convergence, and divergence of

coexisting species. The American Naturalist 101:377–385.

Marin, J.-M., P. Pudlo, A. Estoup, and C. Robert. 2019. Likelihood-free model choice. Page in S.

A. Sisson, Y. Fan, and M. A. Beaumont, editors. Handbook of Approximate Bayesian

Computation. CRC Press, Boca Raton, Florida.

Moran, P. A. P. 1958. Random processes in genetics. Mathematical Proceedings of the

Cambridge Philosophical Society 54:60–71.

Peres-Neto, P. R., P. Legendre, S. Dray, and D. Borcard. 2006. Variation partitioning of species

data matrices. Ecology 87:2614–2625.

Prangle, D. 2019. Summary statistics. Pages 125–152 in S. A. Sisson, Y. Fan, and M. A.

Beaumont, editors. Handbook of Approximate Bayesian Computation. CRC Press, Boca

Raton, Florida.

Presley, S. J., C. L. Higgins, and M. R. Willig. 2010. A comprehensive framework for the

evaluation of metacommunity structure. Oikos 119:908–917.

Pudlo, P., J.-M. Marin, A. Estoup, J.-M. Cornuet, M. Gautier, and C. P. Robert. 2016. Reliable

ABC model choice via random forests. Bioinformatics 32:859–866.

R Core Team. 2020. R: a language and environment for statistical computing. Vienna, Austria.

Sisson, S. A., Y. Fan, and M. A. Beaumont. 2019a. Overview of ABC. Pages 3–54 in S. A.

Sisson, Y. Fan, and M. A. Beaumont, editors. Handbook of Approximate Bayesian

Computation. CRC Press, Boca Raton, Florida.

Sisson, S. A., Y. Fan, and M. A. Beaumont, editors. 2019b. Handbook of Approximate Bayesian

Computation. CRC Press, Boca Raton, Florida.

Yunoki, T., and L. T. Velasco. 2016. Fish metacommunity dynamics in the patchy heterogeneous

habitats of varzea lakes, turbid river channels and transparent clear and black water bodies

in the Amazonian Lowlands of Bolivia. Environmental Biology of Fishes 99:391–408.

Table F-1 Prior distributions and parameters used in pseudo-community ABC analyses. Value 1 corresponds to the minimum for uniform distributions, the mean for normal distributions, and the shape parameter for gamma distributions. Value 2 corresponds to the maximum for uniform distributions, the standard deviation for normal distributions, and the scale parameter for gamma distributions.

| Model | Parameter | Distribution | Value 1 | Value 2 |
| --- | --- | --- | --- | --- |
| Neutral – Dispersal Limited | Community Size | Uniform | 100 | 1000 |
| | Selection Coefficients | Uniform | 1 | 1 |
| | Frequency Dependence | Uniform | 0 | 0 |
| | Migration Distance | Uniform | 10 | 50 |
| | Migration Probability | Uniform | 0.05 | 0.2 |
| Neutral – Non-Dispersal Limited | Community Size | Uniform | 100 | 1000 |
| | Selection Coefficients | Uniform | 1 | 1 |
| | Frequency Dependence | Uniform | 0 | 0 |
| | Migration Distance | Uniform | 100 | 200 |
| | Migration Probability | Uniform | 0.05 | 0.2 |
| Niche – Dispersal Limited, Uniform | Community Size | Uniform | 100 | 1000 |
| | Selection Coefficients | Uniform | 0 | 1 |
| | Frequency Dependence | Uniform | -1 | 0 |
| | Migration Distance | Uniform | 10 | 50 |
| | Migration Probability | Uniform | 0.05 | 0.2 |
| Niche – Non-Dispersal Limited, Uniform | Community Size | Uniform | 100 | 1000 |
| | Selection Coefficients | Uniform | 0 | 1 |
| | Frequency Dependence | Uniform | -1 | 0 |
| | Migration Distance | Uniform | 100 | 200 |
| | Migration Probability | Uniform | 0.05 | 0.2 |
| Niche – Dispersal Limited, Normal | Community Size | Uniform | 100 | 1000 |
| | Selection Coefficients | Normal | 0.5 | 0.25 |
| | Frequency Dependence | Uniform | -1 | 0 |
| | Migration Distance | Uniform | 10 | 50 |
| | Migration Probability | Uniform | 0.05 | 0.2 |
| Niche – Non-Dispersal Limited, Normal | Community Size | Uniform | 100 | 1000 |
| | Selection Coefficients | Normal | 0.5 | 0.25 |
| | Frequency Dependence | Uniform | -1 | 0 |
| | Migration Distance | Uniform | 100 | 200 |
| | Migration Probability | Uniform | 0.05 | 0.2 |
| Niche – Dispersal Limited, Gamma | Community Size | Uniform | 100 | 1000 |
| | Selection Coefficients | Gamma | 2 | 0.15 |
| | Frequency Dependence | Uniform | -1 | 0 |
| | Migration Distance | Uniform | 10 | 50 |
| | Migration Probability | Uniform | 0.05 | 0.2 |
| Niche – Non-Dispersal Limited, Gamma | Community Size | Uniform | 100 | 1000 |
| | Selection Coefficients | Gamma | 2 | 0.15 |
| | Frequency Dependence | Uniform | -1 | 0 |
| | Migration Distance | Uniform | 100 | 200 |
| | Migration Probability | Uniform | 0.05 | 0.2 |

Table F-2 Prior distributions and parameters used in BCI simulations. Migration distance parameters are in meters.

| Model | Parameter | Distribution | Value 1 | Value 2 |
| --- | --- | --- | --- | --- |
| Neutral – Dispersal Limited | Community Size | Uniform | 400 | 600 |
| | Selection Coefficients | Uniform | 1 | 1 |
| | Frequency Dependence | Uniform | 0 | 0 |
| | Migration Distance | Uniform | 50 | 400 |
| | Migration Probability | Uniform | 0.01 | 0.25 |
| Neutral – Dispersal Sufficient | Community Size | Uniform | 400 | 600 |
| | Selection Coefficients | Uniform | 1 | 1 |
| | Frequency Dependence | Uniform | 0 | 0 |
| | Migration Distance | Uniform | 400 | 1000 |
| | Migration Probability | Uniform | 0.01 | 0.25 |
| Niche – Dispersal Limited | Community Size | Uniform | 400 | 600 |
| | Selection Coefficients | Gamma | 2 | 0.3 |
| | Frequency Dependence | Uniform | -2 | 0 |
| | Migration Distance | Uniform | 50 | 400 |
| | Migration Probability | Uniform | 0.01 | 0.25 |
| Niche – Dispersal Sufficient | Community Size | Uniform | 400 | 600 |
| | Selection Coefficients | Gamma | 2 | 0.3 |
| | Frequency Dependence | Uniform | -2 | 0 |
| | Migration Distance | Uniform | 400 | 1000 |
| | Migration Probability | Uniform | 0.01 | 0.25 |

Table F-3 Prior distributions and parameters used in the Wsr simulations.

| Model | Parameter | Distribution | Value 1 | Value 2 |
| --- | --- | --- | --- | --- |
| Neutral – Dispersal Limited | Community Size | Uniform | 500 | 1500 |
| | Selection Coefficients | Uniform | 1 | 1 |
| | Frequency Dependence | Uniform | 0 | 0 |
| | Migration Distance | Uniform | 500 | 1000 |
| | Migration Probability | Uniform | 0.05 | 0.2 |
| Neutral – Non-Dispersal Limited | Community Size | Uniform | 500 | 1500 |
| | Selection Coefficients | Uniform | 1 | 1 |
| | Frequency Dependence | Uniform | 0 | 0 |
| | Migration Distance | Uniform | 1000 | 2000 |
| | Migration Probability | Uniform | 0.05 | 0.2 |
| Niche – Dispersal Limited | Community Size | Uniform | 500 | 1500 |
| | Selection Coefficients | Gamma | 2 | 0.3 |
| | Frequency Dependence | Uniform | -1 | 0 |
| | Migration Distance | Uniform | 500 | 1000 |
| | Migration Probability | Uniform | 0.05 | 0.2 |
| Niche – Non-Dispersal Limited | Community Size | Uniform | 500 | 1500 |
| | Selection Coefficients | Gamma | 2 | 0.3 |
| | Frequency Dependence | Uniform | -1 | 0 |
| | Migration Distance | Uniform | 1000 | 2000 |
| | Migration Probability | Uniform | 0.05 | 0.2 |

Figure F-1 Plots showing the different types of landscapes used in validation simulations.

Figure F-2 Plots showing a three-species, three-community example of the resource-utilization niches used in validation simulations. The blue contours in each panel represent a species’ niche for two different environmental variables, whereas red diamonds show the environmental variables measured in each of the communities. Species 1 and 2 are both able to exist in community A; however, since community A’s environmental values fall closer to the mean of species 1 than to that of species 2, species 1 would have a higher selection coefficient than species 2. Likewise, species 2 and species 3 coexist in communities B and C, with species 2 having a higher selection coefficient in community C and species 3 having a higher selection coefficient in community B. Additionally, the plot shows that species 1 can only exist in community A (i.e., it would have a selection coefficient of 0 in communities B and C), species 2 can exist in all communities, and species 3 can exist in communities B and C.

APPENDIX G

SUPPLEMENTARY FIGURES AND TABLES FOR CHAPTER FOUR

Table G-1 Factorial arrangement for each treatment for pseudo-communities.

| Treatment | Model Type | Dispersal Limitation | Landscape |
| --- | --- | --- | --- |
| A | Neutral | Limited | Random |
| B | Neutral | Limited | Spatially autocorrelated |
| C | Neutral | Not Limited | Random |
| D | Neutral | Not Limited | Spatially autocorrelated |
| E | Niche | Limited | Random |
| F | Niche | Limited | Spatially autocorrelated |
| G | Niche | Not Limited | Random |
| H | Niche | Not Limited | Spatially autocorrelated |

Table G-2 p-values for the effects of each mixed model analysis on variation partition types. F-values are given in parentheses.

| Effect | E\|S | S\|E | E with S |
| --- | --- | --- | --- |
| Model Type | < 0.001 (2737.02) | < 0.001 (682.33) | < 0.001 (392.89) |
| Dispersal Type | < 0.001 (434.62) | < 0.001 (3857.56) | < 0.001 (278.40) |
| Spatial Landscape | < 0.001 (304.05) | < 0.001 (443.43) | < 0.001 (2660.88) |
| Run Length | 0.263 (1.34) | < 0.001 (141.01) | < 0.001 (317.21) |
| Model Type * Run Length | < 0.001 (7.88) | < 0.001 (79.51) | < 0.001 (7.47) |
| Model Type * Dispersal Type | < 0.001 (252.73) | < 0.001 (80.84) | < 0.001 (34.67) |
| Model Type * Spatial Landscape | < 0.001 (267.23) | < 0.001 (36.58) | < 0.001 (60.25) |
| Model Type * Dispersal Type * Spatial Landscape | < 0.001 (75.18) | < 0.001 (184.66) | < 0.001 (105.39) |
| Model Type * Dispersal Type * Run Length | < 0.001 (32.11) | < 0.001 (90.67) | < 0.001 (29.61) |
| Model Type * Spatial Landscape * Run Length | 0.021 (2.89) | < 0.001 (20.72) | < 0.001 (142.64) |

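Table G-3 Pairwise comparisons for Fisher’s Exact Tests between treatments. Numbers shown are Bonferroni corrected p-values. Bolded rows are rows comparing Neutral treatments with Species Sorting treatments. Letters correspond to those in Table G-1.

| Comparison | 10 | 25 | 50 |
| --- | --- | --- | --- |
| A : B | 1 | 1 | 1 |
| A : C | 0.0056 | 0.0056 | 0.0056 |
| A : D | 0.0056 | 0.0056 | 0.0056 |
| **A : E** | **0.0056** | **0.0112** | **1** |
| **A : F** | **0.0056** | **0.0056** | **0.0112** |
| **A : G** | **0.0056** | **0.0056** | **0.0056** |
| **A : H** | **0.0056** | **0.0056** | **0.0056** |
| B : C | 0.0056 | 0.0056 | 0.0056 |
| B : D | 0.0056 | 0.0056 | 0.0056 |
| **B : E** | **0.0056** | **0.0056** | **0.6220** |
| **B : F** | **0.0056** | **0.0056** | **0.0056** |
| **B : G** | **0.0056** | **0.0056** | **0.0056** |
| **B : H** | **0.0056** | **0.0056** | **0.0056** |
| C : D | 1 | 1 | 1 |
| **C : E** | **0.0056** | **0.0056** | **0.0056** |
| **C : F** | **0.0056** | **0.0224** | **0.2070** |
| **C : G** | **0.0056** | **0.0728** | **1** |
| **C : H** | **0.0056** | **0.5040** | **1** |
| **D : E** | **0.0056** | **0.0056** | **0.0056** |
| **D : F** | **0.0056** | **0.0168** | **0.0280** |
| **D : G** | **0.0056** | **0.0672** | **0.1010** |
| **D : H** | **0.0056** | **0.0784** | **1** |
| E : F | 0.1010 | 1 | 1 |
| E : G | 0.8010 | 0.1620 | 0.1790 |
| E : H | 0.1180 | 0.1570 | 0.0056 |
| F : G | 1 | 1 | 0.6380 |
| F : H | 1 | 1 | 0.0056 |
| G : H | 1 | 1 | 1 |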

Figure G-1 Schematic showing the algorithm used to run simulations in a hypothetical metacommunity with three species inhabiting three communities. Communities are represented by big black circles and individuals of each species are represented by the smaller colored circles. Dotted circles represent communities or species chosen for birth, whereas an x-ed circle is the individual chosen to die. a) A community is randomly chosen in which a birth will occur. In this step, the probability of choosing a community is weighted according to community size as shown in the equations. b) Once a community is chosen an individual within that community is chosen to give birth. This is the step where selection occurs as shown in the equations (this example ignores stabilizing mechanisms; see Appendix S2). c) After birth, the offspring either disperses to a different community or stays within its local birth community. Arrows represent dispersal pathways with mij representing the probability of dispersing from community j to community i. In this instance, the solid arrows are showing the possible migration routes for the offspring born, whereas dotted arrows show migration routes that are present but not used in this birth-death sequence. d) After the offspring migrates (or stays within the local community) an individual from the community is randomly chosen to die according to the frequencies of each species within the community.

Figure G-2 Bar graphs showing the counts of different structure types from the elements of metacommunity structure analyses. Run lengths are shown in rows and dispersal types in columns.

Figure G-3 LDA projection of niche simulation runs for ABC RF on pseudo-communities along the first discriminant axes showing overlap in summary statistics between the prior distributions used.
