Downloaded from Dryad (Doi:10.5061/Dryad.J4802), and the Subtrees Corresponding to the Four Fish Groups Investigated in Our Study Were Extracted

PHYLOGENETIC ANALYSIS OF CHROMOSOME NUMBERS AND GENETIC

MARKERS

by

Shing Hei Zhan

B.Sc. (Hon.), The University of British Columbia, 2010

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF

THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES

(Bioinformatics)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

August 2020

© Shing Hei Zhan, 2020

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled:

Evolutionary analysis of chromosome numbers and genetic markers

submitted by Shing Hei Zhan in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Bioinformatics

Examining Committee:

Sarah P. Otto, Professor, Department of Zoology, The University of British Columbia Supervisor

Wayne P. Maddison, Professor, Department of Zoology, The University of British Columbia Supervisory Committee Member

Darren Irwin, Professor, Department of Zoology, The University of British Columbia University Examiner

Alexandre Bouchard-Côté, Associate Professor, Department of Statistics, The University of British Columbia University Examiner

Additional Supervisory Committee Members:

Sean W. Graham, Professor, Department of Botany, The University of British Columbia Supervisory Committee Member

Itay Mayrose, Associate Professor, Department of Molecular Biology and Ecology of Plants, Tel Aviv University Supervisory Committee Member

Abstract

A phylogenetic tree captures the evolutionary relationships among sampled taxa – major taxonomic groups, species, infraspecific taxa, or isolates. Phylogenetic analysis is a central component of evolutionary and ecological studies, as it lends a unifying framework to draw inferences about evolutionary and ecological processes that form biodiversity. Via phylogenetic comparative methods, trait data (for example, morphological or physiological data) and geographical data may be analyzed jointly with a given phylogeny to test specific hypotheses about the evolution and ecology of focal groups of organisms. In this dissertation, I present four studies demonstrating how phylogenetic analysis can yield new evolutionary and ecological insights. In the first two studies, I compare the evolutionary fates of polyploid versus diploid lineages in fishes and test whether polyploidization coincides with speciation events in land plants. Polyploid species arise from whole genome duplication and often exhibit morphological, physiological, and ecological differentiation from their diploid parents. Understanding their evolutionary patterns against the background of diploid species helps us to understand why polyploidization is abundant in some organisms (plants) but not in others (animals). In the other two studies, I explore the biodiversity of freshwater red algae in the wild and in aquarium shops, using phylogenetic analyses to reveal potential introductions of these organisms via the global aquarium trade. Furthermore, I identify candidate genetic markers that may be more suitable than commonly used markers to facilitate future studies of phylogenetic community ecology of the red algae. Not only do these studies illustrate the utility of phylogenetic analyses to tackle diverse questions in evolution and ecology, but they have also advanced the discussion on those four distinct topics.

Lay Summary

A phylogenetic tree represents the evolutionary relationships among organisms. Phylogenetic trees allow biologists to put questions into a larger evolutionary context. In the four studies described herein, I utilized phylogenetic methods to reap insights into the evolution and ecology of different organisms – fishes, plants, and algae. In the first two studies, I compare the evolutionary fates of polyploids (with more than two sets of chromosomes) and their diploid relatives (with two sets of chromosomes) in fishes and plants. Polyploids arise from genome duplication and often possess characteristics distinct from their diploid parents. In the other two studies, I investigate the biodiversity of the red algae in the wild and in aquarium shops and develop a new molecular tool to explore their diversity better. The studies illustrate how phylogenetic analyses can help investigate diverse questions in evolution and ecology.

Preface

The studies presented in this dissertation have either been published or submitted for publication. Below I indicate my contributions and the contributions of my co-authors to each of the studies.

A version of Chapter 2 has been published as “Zhan, S. H., Glick, L., Tsigenopoulos, C. S., Otto, S. P., & Mayrose, I. 2014. Comparative analysis reveals that polyploidy does not decelerate diversification in fish. J. Evol. Biol. 27: 391–403”. I assembled the chromosome number and phylogenetic data, and performed the BiSSE analyses. L. Glick performed the ChromEvol analysis on the data set. C. S. Tsigenopoulos, S. P. Otto, I. Mayrose, and I interpreted the results. S. P. Otto, I. Mayrose, and I jointly wrote the manuscript.

A version of Chapter 3 has been published as “Zhan, S. H., Drori, M., Goldberg, E. E., Otto, S. P., & Mayrose, I. 2016. Phylogenetic evidence for cladogenetic polyploidization in land plants. Am. J. Bot. 103: 1252–1258”. I performed the analysis. M. Drori collected the PloiDB data. S. P. Otto performed the mathematical analysis showing the uniformity of the prior distribution of the proportion of cladogenetic shifts. E. E. Goldberg, S. P. Otto, I. Mayrose, and I interpreted the results together. I wrote the manuscript with editing help from S. P. Otto and I. Mayrose.

A version of Chapter 4 has been submitted as “Zhan, S. H., Hsieh, T. Y., Yeh, L. W., Kuo, T. C., Suda, S., & Liu, S. L. Hidden introductions of freshwater red algae via the aquarium trade exposed by DNA barcodes.” The submitted version is also available on bioRxiv (https://www.biorxiv.org/content/10.1101/2020.06.30.180042v1). At the time that this document was finalized, some of the supplemental materials were being deposited in the Dryad repository (DOI: 10.5061/dryad.3n5tb2rf8). S. L. Liu and I conceived the project. S. L. Liu and T. Y. Hsieh designed the study. S. L. Liu, T. Y. Hsieh, S. Suda, and I performed specimen collection and laboratory experiments. S. L. Liu, L. W. Yeh, and T. C. Kuo analyzed the data. S. L. Liu and I jointly interpreted the results and wrote the manuscript.

A version of Chapter 5 has been published as “Zhan, S. H., Shih, C. C., & Liu, S. L. 2020. Reappraising plastid markers of the red algae for phylogenetic community ecology in the genomic era. Ecol. Evol. 10: 1299–1310.” S. L. Liu and I conceived the project. I developed and implemented the methodology and analyzed the data. C. C. Shih and S. L. Liu conducted the experiments. S. L. Liu and I interpreted the results and wrote the manuscript together.

Table of Contents

Abstract

Lay Summary

Preface

Table of Contents

List of Tables

List of Figures

Acknowledgements

Dedication

Introduction

Scientific background

Evolutionary consequences of whole genome duplication

Invasion biology of freshwater red algae

Community ecology of the red algae

Remarks

Comparative analysis reveals that polyploidy does not decelerate diversification in fish

Overview

Introduction

Materials and Methods

2.3.1 Sequence datasets and ploidy level assignments

2.3.2 Phylogenetic reconstruction

2.3.3 Inference of polyploidy

2.3.4 BiSSE diversification analysis

2.3.5 BiSSE analysis on time-calibrated phylogenies

Results

2.4.1 Phylogenetic distribution of polyploidy

2.4.2 Comparing the diversification rates of diploids and polyploids

Discussion

Phylogenetic evidence for cladogenetic polyploidization in land plants

Overview

Introduction

Materials and Methods

3.3.1 Database construction

3.3.2 Models of polyploid evolution

3.3.3 Bayesian analysis

3.3.4 Maximum likelihood analysis

3.3.5 Implementation

Results

Discussion

Hidden introductions of freshwater red algae via the aquarium trade exposed by DNA barcodes

Overview

Introduction

Materials and Methods

4.3.1 Sample collection

4.3.2 DNA extraction, PCR, and Sanger sequencing

4.3.3 Sequence acquisition and curation

4.3.4 Phylogenetic tree reconstruction

4.3.5 Species delimitation

4.3.6 Rarefaction Analysis

4.3.7 Identification of introduced species

4.3.8 Trade data

Results

Discussion

Reappraising plastid markers of the red algae for phylogenetic community ecology in the genomic era

Overview

Introduction

Materials and Methods

5.3.1 Sequence data collection and processing

5.3.2 Partitioning analysis

5.3.3 Phylogenetic tree comparisons

5.3.4 Estimation of degrees of sequence variation and rates of molecular evolution

5.3.5 PCR experiments and Sanger sequencing

Results and Discussion

Conclusions

Conclusion

Evolutionary consequences of polyploidy revisited

6.1.1 Polyploidization and evolutionary success in fish

6.1.2 Cladogenetic polyploidization in plants

6.1.3 Do processes of chromosomal evolution other than polyploidization contribute to lineage success in plants?

Community and invasion ecology of the red algae

6.2.1 Human-mediated introductions of freshwater red algae

6.2.2 New molecular tools to study the community ecology of the red algae

Final remarks

Bibliography

Appendices

Appendix A Comparative analysis suggests that polyploidy does not decelerate diversification in fish

A.1 Data Availability

A.2 Supporting Methods

A.3 Supporting Figures

A.4 Supporting Tables

Appendix B Phylogenetic evidence for cladogenetic polyploidization in land plants

B.1 Data Availability

B.2 Supporting Tables

Appendix C Hidden introductions of freshwater red algae via the aquarium trade exposed by DNA barcodes

C.1 Data Availability

C.2 Supporting Figures

C.3 Supporting Tables

Appendix D Reappraising plastid markers of the red algae for phylogenetic community ecology in the genomic era

D.1 Data Availability

D.2 Supporting Figures

D.3 Supporting Tables

List of Tables

Table 1: Fish groups examined in the current study

Table 2: Best-fitting BiSSE model (with the constraint qPD = 0) according to AIC

Table 3: MCMC-based estimates of evolutionary rates using the BiSSE model (with the constraint qPD = 0)

Table 4: Maximum pairwise geographical distance, haplotype diversity, and nucleotide diversity of five molecular operational taxonomic units found in the aquarium shops surveyed in Taiwan in this study

List of Figures

Figure 1: Distribution of 95% highest posterior density intervals of the relative proportion of cladogenetic transitions in the PloiDB and M2011 data sets

Figure 2: Examples of epiphytic and epizoic freshwater red macroalgae as aquatic hitchhikers

Figure 3: Sites in Taiwan, Okinawa (Japan), Hong Kong, Thailand, and the Philippines sampled in this study

Figure 4: Number of freshwater red algal mOTUs and proportion of chantransia (versus gametophyte) found in the field and aquarium samples from Taiwan

Figure 5: Haplotype network of Montagnia macrospora, a potential introduced species identified in this study

Figure 6: Schematic illustrating how phylogenetic misplacement of a taxon may inflate the phylogenetic diversity of an ecological community or deflate it

Figure 7: Phylogenies based on the AA alignment concatenated from the core plastid genes and rbcL

Figure 8: Phylogenies based on the AA alignment concatenated from the core plastid genes and rpoC1

Figure 9: Negative correlation between the normalized Robinson-Foulds distance to a target tree and p-distance across the plastid genes

Figure 10: PCR primers designed for rpoC1 and their amplification efficacy across major taxonomic groups

Acknowledgements

My PhD journey was rather unconventional: two leaves of absence and multiple transitions into different research areas. During the first leave, I gladly took on an entrepreneurial venture, helping to build a company that develops diagnostic assays for infectious diseases. During the second leave, I learned how impermanent life is – getting diagnosed with acute leukemia and luckily surviving it. Throughout this journey, I received constant support from my mentors, collaborators, family, and friends. To them, I am deeply grateful.

First, I thank Prof. Itay Mayrose (Tel Aviv Univ.), Prof. Michael Barker (Univ. of Arizona), Prof. Nolan Kane (Univ. of Colorado at Boulder), and Prof. Rose Andrew (Univ. of New England, Australia) for their mentorship when I was an undergraduate. They inspired me to pursue scientific research, and the lessons I learned from them have guided me this whole time.

My studies began at the Michael Smith Genome Sciences Centre with Prof. Steven Jones, to whom I am thankful for providing me with a research home and mentorship early on. Also, I thank Prof. Inanc Birol and Prof. Sohrab Shah for being my first committee members, and members of the Centre (in particular, Dr. Anthony Fejes, Dr. Daryanaz Dargahi, Louise Clark, and Sharon Ruschkowski) for helping me become a better student.

I thank Prof. Sarah Otto, my doctoral advisor, for being an inspiring and caring mentor and for guiding me to become a better scientist. I thank my supportive committee members, Prof. Sean Graham, Prof. Wayne Maddison, and Prof. Itay Mayrose for many insightful discussions. Also, I thank the members of the Otto lab and Biodiversity Research Centre for their constructive comments and Bryn Wiley and Dr. Elizabeth Kleynhans for helping me get this document into shape. The works presented herein were supported by the Bioinformatics Training Program, the UBC Four Year Fellowship, CIHR (Doctoral Research Award), and NSERC.

Along this journey, I have met and collaborated with many creative individuals who have expanded my intellectual horizons: Prof. Andre Mattman and Prof. Mollie Carruthers (UBC); Prof. Tamara Munzner and Zipeng Liu (UBC); Prof. Leon French (Univ. of Toronto); Prof. Samira Mubareka (Univ. of Toronto); Dr. Yohannes Berhane (National Centre for Foreign Animal Disease, Canada); Prof. Benjamin Deverman, Dr. Alina Chan, and Albert Chen (Broad Institute); Dr. Emma Goldberg (Los Alamos National Laboratory); Prof. Andres Munoz (Univ. of Melbourne). These collaborations have led to manuscripts (published or in preparation) on diverse topics ranging from laboratory medicine to machine learning, which are beyond the scope of this dissertation.

I thank the warmhearted members of the Leukemia/Bone Marrow Transplant unit of the Vancouver General Hospital who have saved my life: Dr. Abou Mourad (my oncologist), fellows and other attending physicians, nurses, and support staff. Also, I thank Michael Hingley (my hospital roommate whose humor illumed some dark hours), blood donors, and the kind anonymous donor who gave me blood stem cells so that I have the best shot at long-term survival. Had it not been for them, I would not have the breath to finish this work.

Most of all, I thank my mom, dad, sister, and auntie for their unconditional love all these years. I thank my friends who still laugh with me even after seeing me at my worst: Alex Liang, Prof. Allen Liu (Trisha and Kai), Dr. Anthony Baniaga, Calvin Lefebvre, Dennis PretteJohn, Dr. Jean Gelinas, Mark Lee, Dr. Niels Hanson, Tyler Funnell, Dr. Vik Chopra, and Zheng Li. I thank my coworkers at Fusion Genomics Corp. – especially, Dr. Mohammad Qadir, Brian Kwok, Greg Stazyk, Akmal Jameel, and Dr. Sepideh Alamouti – for an incredible adventure and for being there for me during hard times. There are collaborators and friends who have not been named due to space constraints, but their support I have not forgotten and to them I am ever thankful.

Dedication

To my grandparents.

Introduction

Scientific background

Phylogenetic analysis is an important part of evolutionary and ecological studies. It provides a unifying framework for drawing inferences about myriad evolutionary and ecological phenomena. Phylogenetic analysis involves the collection of appropriate data (e.g., discretely coded characters or molecular sequences – nucleotides or amino acids), followed by inference of a phylogenetic tree using parsimony-, distance-, or likelihood-based methods (Felsenstein, 2004).

A phylogenetic tree represents the evolutionary relatedness of sampled taxa – major taxonomic groups (e.g., a family or an order of organisms), species, infraspecific taxa (e.g., ), or isolates. The tree may or may not be scaled to time using fossil records or sample collection dates. Additionally, using phylogenetic comparative methods, trait data (e.g., morphological or physiological) and geographical data may be analyzed jointly with a given phylogeny to test specific hypotheses about the evolution and ecology of focal groups of organisms (Harmon, 2018).

In this dissertation, I present four studies demonstrating how phylogenetic analysis can lead to evolutionary and ecological insights. Specifically, I employed phylogenetic methodologies to investigate the evolutionary fate of polyploid versus diploid lineages in fishes (Chapter 2), to test whether polyploidization coincides with speciation events in land plants (Chapter 3), to identify instances of introduction of freshwater red algae via the global aquarium trade (Chapter 4), and to find alternative genetic markers suitable for phylogenetic community analysis in the red algae (Chapter 5). Below, I provide some background for these Chapters.

Evolutionary consequences of whole genome duplication

Polyploidy, or the doubling of an entire genome, is a widespread phenomenon in eukaryotes – occurring in fungi (most famously, the budding yeast; Albertin & Marullo, 2012), animals (mainly fishes and amphibians; Mable et al., 2011), and plants (most prominently, the flowering plants; Stebbins, 1971). Polyploid species are frequently reported in plants, but rarely in animals (Otto & Whitton, 2000). About 35 to 40% of extant flowering plant species are estimated to be of recent polyploid origin (Wood et al., 2009; Scarpino et al., 2014). Going further back in time, most extant species of plants – and also vertebrates – have descended from some ancient polyploid ancestor (Taylor et al., 2003; Dehal & Boore, 2005; Van de Peer et al., 2009; Jiao et al., 2011; One Thousand Plants Transcriptomes Initiative, 2019). In terms of morphological, physiological, and life history features, polyploid species often differ from their diploid progenitors (Levin, 1983; Ramsey & Schemske, 2002). These differences may contribute to the establishment – and even success – of newly formed polyploid species in novel ecological settings. Given its ubiquity and potential impact on phenotypic and ecological differentiation, polyploidy is thought to play an important role in the evolution of eukaryotes, particularly the flowering plants (Soltis et al., 2009; Fawcett & Van de Peer, 2010; Van de Peer et al., 2017).

A long-standing debate concerning polyploidy is whether it influences a lineage's evolutionary success. Historically, researchers have focused on plants because of the rich documentation of polyploidy in this group. Polyploids were traditionally regarded as evolutionary ‘dead-ends’ because of the hypothesized deleterious effects associated with ploidy level increase, such as gene dosage imbalance of the sex chromosomes (Orr, 1990), reduced fertility in heteroploid hybrids (Ramsey & Schemske, 2002), and inefficiency of selection when genes are masked by multiple copies (Haldane, 1933; Fisher, 1935; Wright, 1969). It was further argued that if polyploids were more successful than their diploid relatives, polyploidy should have replaced diploidy as the predominant genetic system in extant eukaryotes (Stebbins, 1971). Supporting these views, a statistical analysis showed that the high prevalence of polyploidy in plants can be explained by frequent polyploid formation and slow reversal to diploidy rather than elevated lineage diversification following polyploidy (Meyers & Levin, 2006). Recent comparative analyses of plant genomes, however, revealed signatures of ancient polyploidization events (i.e., paleopolyploidy) that occurred multiple times during evolution (Van de Peer et al., 2009; Jiao et al., 2011), indicating that nearly all extant flowering plants have experienced at least one round of polyploidy in their evolutionary past. This suggests that rather than being evolutionary ‘dead ends’, polyploids can indeed persist and even blossom into diverse and successful clades. Greater genetic degrees of freedom, increased heterosis, different niche tolerances and altered colonizing abilities, as well as molecular mechanisms such as functional divergence of duplicated genes and buffering of crucial functions, are some of the hypotheses proposed to explain the success of polyploid lineages (Werth & Windham, 1991; Soltis & Soltis, 2000; Taylor et al., 2001; Comai, 2005; Otto, 2007; Semon & Wolfe, 2007).

Although genomic evidence has rekindled the ‘polyploid-success’ view, large-scale phylogenetic investigations have suggested otherwise, at least for relatively short evolutionary time scales. Using a comprehensive phylogenetic and cytological data set of vascular plants, Wood et al. (2009) reported that polyploidy accompanied 15% and 31% of speciation events in flowering plants and in ferns, respectively. However, they discovered no significant association between polyploid incidence and elevated diversification in plants. Using likelihood-based methodologies, we further found that recently formed polyploids (termed “neo-polyploids”) in plants experience lower diversification rates compared to their diploid congeners, as a consequence of both lower speciation and higher extinction rates (Mayrose et al., 2011). While evidence suggests that in plants, neo-polyploids generally diversify less rapidly, this hypothesis has not been rigorously assessed in animals.

Among animals, fish exhibit the most appreciable degree of polyploid incidence (Leggatt & Iwama, 2003; Mable et al., 2011) – particularly sturgeons, cyprinid fishes (including goldfish), and salmonids. Genomic analyses established that the exceptionally species-rich ray-finned fishes descended from a polyploid ancestor, highlighting the potentially profound impact of polyploidy on fish evolution (Taylor et al., 2003). A couple of phylogenetic studies found that the ancient polyploidization event coincides with the phylogenetic branch leading to the radiation of the teleost fishes (Hoegg et al., 2004) and that there is a significant rise in diversification rate around the timing of that event (Santini et al., 2009). These studies, however, focused on a single event, whose association with increased diversification may be coincidental (Maddison & FitzJohn, 2015). To further our understanding of the evolutionary impact of polyploidy on fishes, this hypothesis needs to be revisited with the consideration of additional polyploidy events, which is the subject of Chapter 2.

Polyploidy is thought to be a major mechanism of speciation in plants (Coyne & Orr, 2004). This view has been inspired by the high frequency of polyploidization and the common observation of reproductive incompatibilities between polyploids and related diploids (“triploid block”; Ramsey & Schemske, 1998). Previous phylogenetic estimates of the rate of polyploidy have not, however, assessed whether polyploidization is coupled in time with speciation itself. Rather, prior work has focused on methods that estimate the rate of polyploidization per unit time (i.e., “anagenesis”) or on methods that do not distinguish when ploidy shifts occur (Stebbins, 1938; Grant, 1981; Masterson, 1994; Wood et al., 2009; Mayrose et al., 2011; Scarpino et al., 2014). It is possible that anagenetic transitions in ploidy occur without full reproductive isolation ever evolving (i.e., without speciation) and/or by simple displacement of diploids by polyploid descendants.

Indeed, it is known that polyploidy does not always lead to immediate reproductive isolation. For example, Slotte et al. (2008) found that polyploidy does not terminate gene flow between the diploid parent and its polyploid progeny in the genus Capsella. Furthermore, extensive intra-specific variation in ploidy levels (Stebbins, 1971; Wood et al., 2009; Rice et al., 2015) and evidence of multiple origins in many polyploid lineages (Soltis & Soltis, 1999) suggest that multiple cytotypes often segregate within species. Gene flow between diploids and polyploids remains possible via a number of mechanisms (Ramsey & Schemske, 1998, 2002), including the occasional production of viable seeds from triploid intermediates (“triploid bridge”), crosses involving unreduced gametes produced by diploids, or genome reduction yielding offspring bearing half the genome size of their polyploid parents (“polyhaploids”). Evidence for gene flow between diploids and polyploids has been found in the genomes of several plants, particularly between crops and their wild relatives (Chapman & Abbott, 2010). These observations demonstrate that the speciation of polyploid lineages may be a dynamic – rather than instantaneous – process, which generates and maintains genetic variation within species for some time (Thompson & Lumaret, 1992). Prior to this thesis (Chapter 3), there had been no systematic investigation of whether polyploidization is coupled in time with speciation events (i.e., cladogenesis) or occurs gradually over time (i.e., anagenesis).

Until recently, phylogenetic methods and resources have not been available to examine the effects of chromosomal evolution on species diversification at a taxonomically broad scale. There exists a vast, but scattered, collection of chromosome number data for plants in the literature. Access to these data was difficult until the Chromosome Count Database (CCDB; Rice et al., 2015) effort to centralize all chromosome number data. An equivalent database is not yet available for animals; the closest alternatives are the Animal Genome Size Database (Gregory, 2016) and FishBase (Froese & Pauly, 2015). Another database initiative by the CCDB group, named PloiDB, is combining the chromosome number data with phylogenetic data to enable large-scale comparative analyses of polyploids in plants. Central to the PloiDB effort is a statistical method, called ChromEvol (Mayrose et al., 2010; Glick & Mayrose, 2014), which extracts information about polyploidy using chromosome number and phylogenetic data. ChromEvol co-estimates the rate of polyploidization and the rates of chromosome gain and loss by modelling transitions in chromosome number as a continuous-time Markov process along the branches in a phylogenetic tree. By computing the expected number of diploid-to-polyploid transitions along each root-to-tip path, ChromEvol can be utilized to infer the ploidy level (polyploid versus diploid) of extant taxa.
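The continuous-time Markov formulation can be illustrated with a toy example. The sketch below is not ChromEvol itself: it assumes a deliberately simplified model with constant rates of single-chromosome gain, single-chromosome loss, and exact duplication on a truncated range of chromosome numbers. The function names (build_rate_matrix, transition_probs) and all rate values are hypothetical and chosen only for illustration.

```python
import math

# Toy continuous-time Markov chain on chromosome numbers; NOT ChromEvol itself.
# Assumptions (hypothetical): constant rates, states 1..max_n, and
# polyploidization modelled as an exact doubling of the chromosome number.

def build_rate_matrix(max_n, gain, loss, dupl):
    """Return Q, where Q[i][j] is the rate of moving from i+1 to j+1 chromosomes."""
    q = [[0.0] * max_n for _ in range(max_n)]
    for i in range(max_n):
        n = i + 1
        if n + 1 <= max_n:
            q[i][n] += gain            # single chromosome gain: n -> n + 1
        if n >= 2:
            q[i][n - 2] += loss        # single chromosome loss: n -> n - 1
        if 2 * n <= max_n:
            q[i][2 * n - 1] += dupl    # duplication: n -> 2n
        q[i][i] = -math.fsum(q[i])     # each row of a rate matrix sums to zero
    return q

def mat_mul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    size = len(a)
    return [[math.fsum(a[i][k] * b[k][j] for k in range(size))
             for j in range(size)] for i in range(size)]

def transition_probs(q, t, squarings=10, terms=12):
    """Approximate P(t) = exp(Q t) by a Taylor series with scaling and squaring."""
    size = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(size)] for i in range(size)]
    identity = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]  # a^k / k!
        result = [[result[i][j] + term[i][j] for j in range(size)]
                  for i in range(size)]
    for _ in range(squarings):
        result = mat_mul(result, result)   # undo the scaling: exp(A)^(2^s)
    return result

# Example with made-up rates: chance that a lineage with 6 chromosomes
# has 12 (one duplication) or 7 (one gain) after one unit of branch length.
q = build_rate_matrix(max_n=12, gain=0.05, loss=0.05, dupl=0.02)
p = transition_probs(q, t=1.0)
print("P(6 -> 12):", p[5][11])
print("P(6 -> 7):", p[5][6])
```

In a likelihood analysis, transition probabilities of this kind would be combined across all branches of the tree; that step, and the richer rate parameterizations of the published method, are omitted here.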

To explore the effects of polyploidy on diversification, models of trait evolution and statistical methods have been applied (see, e.g., Mayrose et al., 2011; Goldberg et al., 2010; Sabath et al., 2016). A widely used approach to study trait evolution involves the BiSSE (Binary State Speciation and Extinction) family of models, which includes the original BiSSE (Maddison et al., 2007) and its various extensions such as ClaSSE (Cladogenetic State change Speciation and Extinction Model; Goldberg & Igic, 2012). Given a phylogeny with complete data for a two-state trait at the tips, the original BiSSE model describes the rates of speciation and extinction for different trait states, as well as transitions in the trait along branches. As originally formulated, the trait evolves over time from one state to the other state, assuming that only anagenetic changes (i.e., along branches) are possible. A couple of equivalent extensions of BiSSE allow for cladogenetic changes (i.e., at internal nodes), modeled either as the probability that speciation generates daughter species whose traits differ from the parent (as in BiSSE-ness; Magnuson-Ford & Otto, 2012) or the rate at which speciation with trait change occurs (as in ClaSSE). BiSSE and its extensions can be used in Bayesian or maximum likelihood analyses to assess the parameter combinations that best account for both the present-day trait distribution and the shape of the phylogeny, thus providing a framework within which character-dependent macro-evolutionary hypotheses may be statistically tested (FitzJohn, 2012). Recently, criticisms of BiSSE and related methods have been raised (Maddison & FitzJohn, 2015; Rabosky & Goldberg, 2015). A major concern is that a neutral character can be falsely associated with differences in net diversification in a given phylogeny. False correlations are exacerbated when there are no replications (e.g., when there is a single character transition). This issue may be partially mitigated when a clade contains multiple trait transitions or when evidence is gathered across a multitude of clades (Maddison & FitzJohn, 2015; Rabosky & Goldberg, 2015).
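For reference, the anagenetic BiSSE model described above can be summarized by a pair of differential equations per trait state, following the standard formulation of Maddison et al. (2007). The notation is an assumption made here for illustration: the two states are labelled 0 and 1, \lambda_i and \mu_i are the speciation and extinction rates in state i, q_{ij} is the transition rate from state i to state j, D_{Ni}(t) is the probability that a lineage in state i at time t (measured backward from the present) gives rise to the observed clade N, and E_i(t) is the probability that such a lineage leaves no surviving descendants at the present.

```latex
\begin{align}
\frac{dD_{N0}}{dt} &= -(\lambda_0 + \mu_0 + q_{01})\,D_{N0}(t)
                      + q_{01}\,D_{N1}(t)
                      + 2\lambda_0\,E_0(t)\,D_{N0}(t), \\
\frac{dE_0}{dt}    &= \mu_0 - (\lambda_0 + \mu_0 + q_{01})\,E_0(t)
                      + q_{01}\,E_1(t)
                      + \lambda_0\,E_0(t)^2 .
\end{align}
% At an internal node with daughter clades L and R, the probabilities combine as
% D_{N0} = \lambda_0 \, D_{L0} \, D_{R0}.
```

The equations for state 1 follow by exchanging the indices 0 and 1. The net diversification rate in state i is r_i = \lambda_i - \mu_i, which is the quantity compared between trait states in the analyses discussed here.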

In Chapters 2 and 3, I investigate long-standing hypotheses regarding the evolutionary consequences of polyploidy in fishes and in plants, respectively. I used publicly available chromosome number data and phylogenetic data assembled by me and/or my collaborators (the authors of CCDB and PloiDB), and then applied statistical methods and evolutionary models (ChromEvol and BiSSE/ClaSSE) to assess support for various hypotheses.

Invasion biology of freshwater red algae

Phylogenetic trees provide a framework for understanding the evolution of traits such as ploidy level, as discussed above. Trees can also be applied to reap insights into ecological phenomena such as biological invasions and formation of biological communities.

Invasive species can have negative impacts on their non-native environment, for example, by harming the local biodiversity of the introduced habitat. Once widespread, invasive species are often difficult to control and eliminate. Therefore, early efforts are crucial to detect introduced species and to prevent their expansion in the non-native habitat before they cause significant damage. To support such efforts, important knowledge gaps about the taxonomic diversity and transportation modes of potential introduced species need to be filled.

Human activities can introduce and disperse invasive species. For example, improper practices of aquarium and fish tank owners (e.g., disposing of tank water into nearby streams) can unintentionally spread hitchhiking aquatic organisms (e.g., Patoka et al., 2016; Duggan & Pullan, 2017; Duggan et al., 2018). Other human activities that spread aquatic alien species include ballast water discharge (e.g., Lin & Blum, 1977; Manny et al., 1991) and accidental release from laboratories (Hawes et al., 1991). Unlike ornamental animals and plants, the diversity and introduction potential of aquatic hitchhikers are not well recognized because species are often cryptic, that is, difficult to identify and differentiate based on morphology alone (Stoyneva et al., 2006; Kato et al., 2009). One kind of aquatic hitchhiker occasionally found in aquarium tanks is freshwater algae (e.g., Kaufmann, 2010). Freshwater algae are rarely observed near human habitation, presumably due to their vulnerability to water pollution from urbanization and industrialization (Sheath & Hambrook, 1990; Sheath & Vis, 2015).

Conventional biodiversity monitoring programs rely upon morphological examination to identify species (Riedel et al., 2013). However, hitchhiking freshwater red algal species are often morphologically indistinguishable and therefore require alternative identification approaches. DNA barcoding has been applied to detect introduced and invasive organisms (Armstrong & Ball, 2005; Pečnikar & Buzan, 2014). Typically, the population of a newly introduced species contains lower genetic diversity than its source population(s) as a result of a recent genetic bottleneck (Nei et al., 1975; Barrett & Husband, 1990). Given this prediction, one can use DNA barcode sequence data to identify candidate introduced taxa by estimating the genetic diversity in their putative non-native range and comparing it to the genetic diversity of their putative source population(s) (e.g., Bonett et al., 2007; Kinziger et al., 2011).
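
As a rough illustration of this comparison, the following Python sketch computes nucleotide diversity (the mean proportion of pairwise differences) for a putative source population and a putative introduced population. The choice of nucleotide diversity as the diversity measure, the function name, and the toy sequences are assumptions made for illustration; they are not the data or necessarily the statistic used in Chapter 4.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    if len(seqs) < 2:
        return 0.0
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) / length for x, y in pairs)
    return total / len(pairs)

# Hypothetical toy barcodes (not data from the study).
source_population = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACGTTCGT"]
putative_introduced = ["ACGTACGT", "ACGTACGT", "ACGTACGT"]

pi_source = nucleotide_diversity(source_population)
pi_introduced = nucleotide_diversity(putative_introduced)

# A candidate introduction is suggested when diversity is lower in the putative non-native range.
reduced_diversity = pi_introduced < pi_source
```

In practice, such a comparison would also need to account for unequal sample sizes and sequencing error, which this sketch ignores.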

In Chapter 4, we employed DNA barcoding to survey the biodiversity of freshwater red algae from the field and aquarium shops in East Asia, in particular Taiwan – a key hub in the global aquarium trade (Padilla & Williams, 2004). Using phylogenetic and species delimitation methods, we estimated the number of operational taxonomic units of freshwater red algae in Taiwan. This study not only painted the first detailed landscape of the biodiversity of freshwater red algae in the field and aquaria in Taiwan, but also allowed us to identify algal taxa putatively introduced via the global aquarium trade.

Community ecology of the red algae

A central goal of community ecology is to understand the processes and factors that shape biodiversity and drive community assembly in various habitats. Community ecology has been enriched by the integration of phylogenetic analysis to place biodiversity information in a phylogenetic context. This type of information can be acquired via environmental DNA (eDNA) metabarcoding, which enables the identification of all taxa in an environmental sample simultaneously, thereby obviating the need to sequence isolates (i.e., DNA barcoding of individual specimens). eDNA metabarcoding commonly utilizes a single genetic marker or a few markers (e.g., cox1, rbcL, and 16S rRNA). Most eDNA metabarcoding studies employ established markers for practical reasons. One reason is the availability of a large reference database (e.g., Barcode of Life Data System; Ratnasingham & Hebert, 2007).

In phylogenetic community ecology, two key quantities are relatedness among species within a community (i.e., phylogenetic alpha diversity) and relatedness among species between communities (i.e., phylogenetic beta diversity) (Faith, 1992; Webb, 2000; reviewed in Webb et al., 2002). Phylogenetic diversity indices measure the amount and evenness of the species diversity of biological communities (e.g., Kembel et al., 2010; Daru et al., 2017). Estimates of phylogenetic diversity can depend on the choice of genetic marker. For example, some markers may yield a poor phylogenetic signal, thus leading to under- or overestimation of the phylogenetic alpha and beta diversity of biological communities.
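
To make the two quantities concrete, the sketch below computes one simple index of each kind from pairwise (patristic) distances between taxa: the mean pairwise distance within a community and the mean distance between taxa of two communities. These particular indices, the function names, and the distance values are assumptions chosen for illustration; many other alpha and beta diversity indices exist.

```python
from itertools import combinations, product

# Hypothetical patristic distances for the tree ((A,B),(C,D)); illustrative values only.
DIST = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("A", "D")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("B", "D")): 6.0,
    frozenset(("C", "D")): 4.0,
}

def mean_pairwise_distance(community):
    """Alpha-type index: mean distance among taxa within one community (needs >= 2 taxa)."""
    pairs = list(combinations(sorted(community), 2))
    return sum(DIST[frozenset(p)] for p in pairs) / len(pairs)

def mean_between_distance(comm_a, comm_b):
    """Beta-type index: mean distance between taxa drawn from two communities."""
    pairs = [(a, b) for a, b in product(comm_a, comm_b) if a != b]
    return sum(DIST[frozenset(p)] for p in pairs) / len(pairs)

alpha_close = mean_pairwise_distance({"A", "B"})       # community of close relatives
alpha_mixed = mean_pairwise_distance({"A", "C", "D"})  # community spanning both clades
beta = mean_between_distance({"A", "B"}, {"C", "D"})
```

Because both indices are computed from tree-derived distances, a marker that recovers a poor tree would distort them directly.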

Our understanding of biodiversity and community ecology has benefitted from the use of classic organellar marker genes, such as plastid genes (e.g., Heise et al., 2015; Porter et al., 2016). But, for poorly studied eukaryotes (such as algae), it is unclear whether there are better markers to explore their diversity. In the red algae, commonly used plastid markers poorly approximate the red algal Tree of Life when analyzed individually (e.g., Verbruggen et al., 2010). Hence, multi-locus and whole plastid genome approaches have been taken in order to test phylogenetic hypotheses about deep nodes in the red algal Tree of Life (e.g., Nelson et al., 2015; Boo et al., 2016; Lam et al., 2016; Díaz-Tapia et al., 2017). In phylogenetic community ecology studies, however, a single appropriate genetic locus can still yield useful information if it can simultaneously approximate the red algal phylogeny and produce accurate estimates of alpha and beta diversity.

There has been a recent influx of studies describing the plastid genomes of the red algae. Phylogenetic analyses of these plastid genomes have produced robust species trees of the red algae (e.g., Janouškovec et al., 2013; Costa et al., 2016; Díaz-Tapia et al., 2017). This growing genome database presents an unprecedented opportunity to discover novel candidate phylogenetic markers and to build novel resources to support their usage for biodiversity surveys and community ecology.

In Chapter 5, we devised a simple ranking strategy that involves comparing the topologies of individual plastid gene trees (reconstructed from the published genomes) to the topology of a single target phylogeny – the species tree inferred using all the plastid genes in the published genomes. Using this approach, we assessed widely used markers (e.g., psaA, psaB, psbA, and rbcL), as well as lesser-known markers, to identify candidate markers that may be more suitable for future studies of biodiversity and community ecology.
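
The general idea of ranking gene trees by their topological agreement with a target tree can be sketched as follows. This is only an illustration under stated assumptions: it uses a rooted, clade-based Robinson-Foulds distance as the topological measure (the measure actually used in Chapter 5 is not specified here), and the tree encoding, function names, gene names, and topologies are hypothetical.

```python
def clades(tree):
    """Return the non-trivial clades (frozensets of tip names) of a rooted tree.

    A tree is a nested tuple; a tip is a string.
    """
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    all_tips = tips(tree)
    found.discard(all_tips)  # the root clade is shared by every tree on these tips
    return found

def rf_distance(tree_a, tree_b):
    """Rooted Robinson-Foulds distance: number of clades present in exactly one tree."""
    return len(clades(tree_a) ^ clades(tree_b))

# Hypothetical species tree and gene trees on five tips.
species_tree = ((("A", "B"), "C"), ("D", "E"))
gene_trees = {
    "geneX": ((("A", "B"), "C"), ("D", "E")),  # identical topology
    "geneY": ((("A", "C"), "B"), ("D", "E")),  # one conflicting clade
    "geneZ": ((("A", "D"), "B"), ("C", "E")),  # no clade shared with the species tree
}

# Markers whose gene trees are closest to the species tree rank first.
ranking = sorted(gene_trees, key=lambda g: rf_distance(gene_trees[g], species_tree))
```

A distance of zero means the gene tree recovers the target topology exactly; larger values indicate more conflicting clades.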

Remarks

This dissertation explores four distinct evolutionary and ecological questions using phylogenetic methods. By framing the questions in the context of phylogenetic trees (which are hypotheses of evolutionary history), I illustrate how phylogenetic analysis can yield new insights not otherwise obtainable using trait data or geographical data alone. These example studies reinforce the utility of phylogenetic analyses as an insightful tool to address a broad array of evolutionary and ecological questions.

In Chapters 2 and 3, I investigate the evolutionary consequences of polyploidy in fishes and in plants. Before the advent of phylogenetic methods, researchers collected ploidy level data (e.g., as estimated from chromosome numbers) and conducted simple correlation analyses, such as relating the frequency of polyploid species within genera with the species richness of genera in plants. By employing phylogenetic models, such as BiSSE and ClaSSE, I test whether polyploidization leads to evolutionary success and whether polyploidization contributes to speciation given the same ploidy level data and phylogenetic trees. In Chapter 4, I utilize phylogenetic methods to explore the hidden biodiversity of freshwater red algae in the field and in aquarium shops in East Asia. Finally, in Chapter 5, I develop a phylogenomic methodology to find promising alternative genetic markers to facilitate community ecology studies such as that described in Chapter 4.

The debate about the evolutionary importance of polyploidy in eukaryotes is far from over. Chapters 2 and 3 have contributed to this discussion, but our understanding of phylogenetic comparative methods has advanced significantly since the publication of the two chapters. In the Conclusions, I discuss open questions as well as future studies to revisit the questions investigated in Chapters 2 and 3, as new data become available and phylogenetic methods become more sophisticated. For example, do we still see a positive correlation between polyploidization and net diversification in the cyprinid fishes, as seen in Chapter 2, after accounting for a hidden neutral character? Moreover, I discuss studies being conducted as a follow-up to Chapters 4 and 5. In the long run, the efforts described in Chapters 4 and 5 may enhance our biodiversity monitoring methodologies and thereby contribute to the broader objective to understand the ecology (community assembly and introduction/invasion potential) of the red algae.

Comparative analysis reveals that polyploidy does not decelerate diversification in fish

Overview

While the proliferation of the species-rich teleost fishes has been ascribed to an ancient genome duplication event at the base of this group, the broader impact of polyploidy on fish evolution and diversification remains poorly understood. Here, we investigate the association between polyploidy and diversification in several fish lineages: the sturgeons (Acipenseridae: Acipenseriformes), the botiid loaches (Botiidae: Cypriniformes), the cyprinid fishes (Cyprinidae: Cypriniformes) and the salmonids (Salmonidae: Salmoniformes). Using likelihood-based evolutionary methodologies, we co-estimate speciation and extinction rates associated with polyploid vs. diploid fish lineages. Family-level analysis of Acipenseridae and Botiidae revealed no significant difference in diversification rates between polyploid and diploid relatives, while analysis of the subfamily Cyprininae revealed higher polyploid diversification. Additionally, order-level analysis of the polyploid Salmoniformes and its diploid sister clade, the Esociformes, did not support a significantly different net diversification rate between the two groups. Taken together, our results suggest that polyploidy is generally not associated with decreased diversification in fishes – a pattern that stands in contrast to that previously observed in plants. While there are notable differences in the time frame examined in the two studies, our results suggest that polyploidy is associated with different diversification patterns in these two major branches of the tree of life.

Introduction

From vertebrates to fungi, polyploidy (or whole genome duplication) is widely recognized as a key feature of eukaryotic genomes (Taylor et al., 2003; Jaillon et al., 2004; Kellis et al., 2004; Dehal & Boore, 2005). Polyploidy reaches its zenith in plants, with all seed plants thought to have experienced one or more genome duplications in their evolutionary past (Bowers et al., 2003; Cui et al., 2007; Soltis et al., 2009; Van de Peer et al., 2009; Jiao et al., 2011). While polyploidy is widespread in plants, it is more sparsely documented in animals (instances summarized in Otto & Whitton, 2000; Mable et al., 2011). Nevertheless, genome-wide analyses have revealed several ancient genome duplications in animals: two episodes early in vertebrate evolution (Dehal & Boore, 2005) and one specific to teleost fish (Taylor et al., 2003). Evolutionarily recent cases are reported in amphibians and reptiles, and most notably in fish where entire polyploid lineages have been described (reviewed in Otto & Whitton, 2000; Mable et al., 2011). Polyploids often differ markedly from their diploid progenitors in morphological, physiological, and life history characteristics (Levin, 1983; Ramsey & Schemske, 2002), and these differences may contribute to the establishment and success of polyploid species in novel ecological settings. It is thus hypothesized that polyploidy may serve as an important mechanism for niche differentiation and ecological diversification, especially in harsh environments (reviewed in Otto, 2007; Fawcett & Van de Peer, 2010).

A long-standing debate concerning polyploidy is whether it influences a lineage’s evolutionary success. Historically, researchers have focused on plants because of the rich documentation of polyploidy in this group. Polyploids were traditionally regarded as evolutionary “dead-ends” because of the hypothesized deleterious effects associated with ploidy level increase, such as gene dosage imbalance of the sex chromosomes (Orr, 1990), reduced fertility in heteroploid hybrids (Ramsey & Schemske, 2002), and inefficiency of selection when genes are masked by multiple copies (Haldane, 1933; Fisher, 1935; Wright, 1969). It was further argued that if polyploids were more successful than their diploid relatives, polyploidy should have replaced diploidy as the predominant genetic system in extant eukaryotes (Stebbins, 1971). Supporting these views, a statistical analysis showed that the high prevalence of polyploidy in plants can be explained by frequent polyploid formation and slow reversal to diploidy rather than elevated lineage diversification following polyploidy (Meyers & Levin, 2006). Recent comparative analyses of plant genomes, however, revealed signatures of ancient polyploidization events (i.e., paleopolyploidy) that occurred multiple times during flowering plant evolution (Van de Peer et al., 2009; Jiao et al., 2011), indicating that all extant flowering plants have experienced at least one round of polyploidy in their evolutionary past. This suggests that rather than being evolutionary “dead-ends”, polyploids can indeed persist and even blossom into diverse and successful clades. Greater genetic degrees of freedom, increased heterosis, different niche tolerances, and altered colonizing abilities, as well as molecular mechanisms such as functional divergence of duplicated genes and buffering of crucial functions, are a few of the hypotheses proposed to explain the success of polyploid lineages (Werth & Windham, 1991; Soltis & Soltis, 2000; Taylor et al., 2001; Comai, 2005; Chapman et al., 2006; Otto, 2007; Semon & Wolfe, 2007).

Although genomic evidence has rekindled the “polyploid-success” view, large-scale phylogenetic investigations have suggested otherwise, at least for relatively short evolutionary time scales. Using a comprehensive phylogenetic and cytological dataset of vascular plants, Wood et al. (2009) reported that polyploidy accompanied 15% and 31% of speciation events in angiosperms and in ferns, respectively. However, they discovered no significant association between polyploid incidence and elevated diversification in plants. Using likelihood-based methodologies, Mayrose et al. (2011) further found that recently formed polyploid plant lineages experience lower diversification rates compared to their diploid congeners as a consequence of both lower speciation and higher extinction rates. While the current picture in plants illustrates that neopolyploids generally diversify less rapidly, this hypothesis has not been rigorously investigated in animals.

Among animals, fish exhibit the most appreciable degree of polyploid incidence (reviewed in Leggatt & Iwama, 2003; Le Comber & Smith, 2004; Mable et al., 2011). Polyploid assemblages have been well documented in Acipenseridae (Birstein et al., 1997; Ludwig et al., 2001), Ostariophysi (e.g., Botiidae [Šlechtová et al., 2006]), and most notably in Cyprinidae (Machordom & Doadrio, 2001a; Tsigenopoulos et al., 2002; Tsigenopoulos et al., 2010; Levin et al., 2012). In addition, the families Salmonidae (Allendorf & Thorgaard, 1984; Johnson et al., 1987) and Catostomidae (Uyeno & Smith, 1972; Ferris, 1984) are thought to have undergone genome duplication in their ancestry. Genomic analyses established that the exceptionally species-rich ray-finned fishes descended from a polyploid ancestor, highlighting the potentially profound impact of polyploidy on fish evolution (Taylor et al., 2003). A later study by Hoegg et al. (2004) narrowed the phylogenetic window of the ancient polyploidization event, pinpointing it to the branch leading to the radiation of the teleost fishes – the main constituent of the ray-finned fish clade. Hoegg and colleagues suggested that the teleost-specific ancient polyploidy was linked to the evolutionary success and phenotypic diversification of teleost fishes. More recently, Santini et al. (2009) tested the same association using a model-based method that incorporates both phylogenetic and diversity information. Their analysis detected a significant rise in diversification rate around the timing of the ancient polyploidization event. However, the authors cautioned that the ancient polyploidization event (or any other transition along the same branch in the tree) may explain merely ~10% of extant teleost diversity, as much of the remaining diversity may be ascribed to two subsequent radiations that are not associated with known genome duplication events. Importantly, these studies focused on a single ancient polyploidy event, whose link to higher diversification may be coincidental. To improve our understanding of the contribution of polyploidy to fish evolution, more events must be considered and the generality of the association robustly investigated.

In the current study, we assess the link between polyploidy and diversification in a few fish lineages where polyploid species have been extensively documented. By applying likelihood-based phylogenetic methodologies, we estimate the diversification rates of polyploids and their diploid kin, and explore the relative contribution of speciation and extinction to the evolutionary fate of polyploid fish lineages. As our methods are comparable to those used by Mayrose et al. (2011), our study enables the first quantitative comparison of polyploid diversification patterns between plants and fishes, thus providing important insights into the long-term consequences of polyploidy in eukaryotes.

Materials and Methods

2.3.1 Sequence datasets and ploidy level assignments

We gathered phylogenetic and ploidy level data from the literature for four fish groups that display notable variation in ploidy levels: (1) the sturgeons (Acipenseridae: Acipenseriformes), (2) the botiid loaches (Botiidae: Cypriniformes), (3) the Cyprininae subfamily (Cyprinidae: Cypriniformes), and (4) the salmonids and their relatives, the pikes and mudminnows (Salmonidae: Salmoniformes and Esocidae/Umbridae: Esociformes). Table 1 summarizes information regarding the fish groups examined in this study.

Sturgeon diversity consists of 25 species belonging to 4 genera (Ludwig, 2008). To reconstruct a sturgeon phylogeny, we assembled a multi-locus dataset using cytochrome b (cytb), 12S ribosomal RNA (12S rRNA), 16S ribosomal RNA (16S rRNA), cytochrome c oxidase subunit II (COII), NADH dehydrogenase subunit 5 (ND5), tRNA-Asp, and tRNA-Phe sequence data gathered from Krieger et al. (2008), Birstein et al. (2002), references cited in those studies, as well as additional sequences retrieved from NCBI GenBank; for Acipenser dabryanus, sequence data for all loci except cytb were extracted from the mitochondrial genome (GenBank accession: AY510085.1) published by Peng et al. (2007). Two paddlefish species, Polyodon spathula and Psephurus gladius, were included as outgroup taxa. The combined sequence dataset encompasses the entire sturgeon diversity except Pseudoscaphirhynchus fedtschenkoi, which has been considered critically endangered and possibly extinct (Birstein, 1993). Ploidy level estimates for 21 species were taken from Table 2 in Peng et al. (2007). These estimates were based on microsatellite locus analysis (Ludwig et al., 2001) and genome size data (Zhang et al., 1999 and references therein), and agree well with the chromosome number distribution (available for 19 species; Table 3 in Ludwig et al., 2001). The GenBank accession numbers and ploidy level estimates used to assemble the Acipenseridae data set are provided in Table S1 (Appendix A).

The family Botiidae includes 47 species belonging to 7 genera (Kottelat, 2004). The cytb sequence data from Šlechtová et al. (2006) represent ~74% of extant botiid loach diversity. Three species (Cobitis bilineata, Sabanejewia balcanica, and S. larvata) were used as outgroup taxa. Šlechtová et al. (2006) inferred a single polyploidy event in the Botiidae that occurred along the ancestral lineage leading to the well-supported monophyletic group Botiinae. Thus, we treated all taxa in subfamily Botiinae as polyploid and all taxa in its sister subfamily Leptobotiinae as diploid.

Cyprininae, a remarkably diverse subfamily within Cyprinidae, is estimated to encompass over 1,300 species belonging to roughly 110 genera (Yang et al., 2010). We pooled together cytb sequence data from several sources to create a dataset that is as taxonomically and cytologically comprehensive as possible. For species belonging to the sensu lato group (including Labeobarbus, Luciobarbus, , and Barbus sensu stricto), cytb sequence data were compiled from Machordom and Doadrio (2001a), Machordom and Doadrio (2001b), Tsigenopoulos et al. (2002), Tsigenopoulos et al. (2003), Marková et al. (2010), and Tsigenopoulos et al. (2010); for species in the genera , , and Sinocyclocheilus, sequence data were primarily taken from Levin et al. (2012), He and Chen (2006), and Xiao et al. (2005), respectively; for the tribe Cyprinini sensu stricto (including Carassioides, Carassius, , and ; Yang et al., 2010), most sequence data used for the current study were also used in Yang et al. (2010); additionally, for species belonging to various other genera within Cyprininae (as described in Yang et al., 2010), sequence data were retrieved from NCBI GenBank. Only species considered valid according to FishBase (Froese & Pauly, 2012; http://www.fishbase.org/) were used. Sequence entries having ambiguous species designation (e.g., “Barbus sp.” and “Capoeta cf. banarescui”) as well as multiple subspecies entries were excluded (e.g., among Carassius auratus subspecies only Carassius auratus langsdorfii was retained). Two non-Cyprininae species (Tinca tinca and Gobio gobio) were used as outgroup taxa. Chromosome numbers, used for the inference of ploidy levels (see below), were retrieved primarily from FishBase, Tsigenopoulos et al. (2002), references listed in Yang et al. (2010), and miscellaneous sources provided in Appendix A. Table S2 (Appendix A) provides the GenBank accession numbers and chromosome numbers used for the Cyprininae data set.

In addition to these three fish groups, in which variation in ploidy level exists within the family, we constructed a dataset to examine the between-order consequences of polyploidy in Salmoniformes following its divergence from its sister clade Esociformes (the relationship between these two orders was demonstrated by Ishiguro et al., 2003, López et al., 2004, and Li et al., 2008). Salmoniformes is thought to have evolved from a tetraploid ancestor (Allendorf & Thorgaard, 1984; Johnson et al., 1987). Esociformes, on the other hand, did not undergo genome duplication after its divergence from Salmoniformes – a hypothesis supported by karyotypic data (Phillips & Ráb, 2001; Mank & Avise, 2006) and genome size data (C values of ~1.87-4.90 pg in salmonid species and ~0.85-2.70 pg in esocid and umbrid species according to the Animal Genome Size Database; Gregory, 2012). Thus, we regarded all salmonid species as polyploid and all esociform species as diploid. We assembled a cytb sequence dataset that includes 60 salmonid species from 9 genera and 9 esociform species from 4 genera. Two cyprinid species, Barbus bocagei and Gobiobotia abbreviata, were incorporated as outgroup taxa. Table S3 (Appendix A) lists the GenBank accession numbers of the cytb sequences used for the Salmoniformes/Esociformes data set.

2.3.2 Phylogenetic reconstruction

Multiple sequence alignments were constructed using MUSCLE 3.8 (Edgar, 2003). The best-fitting nucleotide substitution model was selected using jModelTest 0.1.1 (Posada, 2008). Using the Akaike Information Criterion (AIC) (Akaike, 1974), the GTR+G model was chosen as the most appropriate model for all single-locus cytb datasets (i.e., Botiidae, Cyprininae, and Salmoniformes/Esociformes). The best-fitting model in the Acipenseridae dataset was identified independently for each locus. The GTR+G model provided the best fit for 12S rRNA and ND5; the HKY+G model for 16S rRNA, COII, and cytb; the SYM+G model for tRNA-Phe; and the K80+G model for tRNA-Asp. Next, a set of ultrametric Bayesian trees was sampled for each dataset using MrBayes 3.2.1 (Huelsenbeck & Ronquist, 2001) under a relaxed molecular clock according to a Brownian motion model (Thorne et al., 1998), running for 2,000,000 steps with a sampling frequency of once per 2,000 steps; the initial 50% of the steps were discarded as burn-in. For the Acipenseridae data set, we conducted a partitioned MrBayes analysis allowing each defined locus to evolve under its best-fitting model identified by jModelTest. For each dataset, tree topology was constrained so that the ingroup taxa formed a monophyletic group separate from the outgroup taxa. Outgroup taxa were pruned from the resulting MrBayes trees prior to the inference of ploidy levels and the diversification analysis. To summarize the distribution of MrBayes trees, 50% majority-rule consensus trees were built using phyutility 2.1.1 (Smith & Dunn, 2008; http://code.google.com/p/phyutility/) (Figure S1, Appendix A). We assessed the quality of our phylogenetic reconstruction by comparing the MrBayes consensus trees to the phylogenies presented in the previous studies when available.
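
The AIC-based model choice described above can be illustrated with a short Python sketch. It is not the jModelTest implementation: the log-likelihoods are made-up values, and the parameter counts are an assumption that covers only the free parameters of each substitution model (branch lengths are ignored).

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fits for one locus: model -> (log-likelihood, number of free parameters).
fits = {
    "K80+G": (-5230.0, 2),  # transition/transversion ratio + gamma shape
    "HKY+G": (-5195.0, 5),  # adds three free base frequencies
    "GTR+G": (-5180.0, 9),  # five relative rates + three frequencies + gamma shape
}

scores = {model: aic(lnl, k) for model, (lnl, k) in fits.items()}
best_model = min(scores, key=scores.get)
```

With these illustrative numbers, the gain in likelihood of the richer model outweighs its penalty for extra parameters, so it receives the lowest AIC.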

2.3.3 Inference of polyploidy

The Cyprininae data set contained species with available chromosome number information but for which ploidy level had not previously been determined. Thus, given the reconstructed phylogeny (a 50% majority-rule consensus tree built from the MrBayes trees was initially used; see below for a procedure that accounted for phylogenetic uncertainties), we next aimed to classify extant taxa as diploid or polyploid relative to the base chromosome number of the group examined. By doing so, we implicitly treated the root of the phylogeny as diploid. Thus, polyploids are defined here as those lineages that underwent a polyploidization event since the divergence from the common ancestor of the group examined. Specifically, the ChromEvol methodology (Mayrose et al., 2010) was used to assign ploidy levels. This likelihood-based method assesses the fit of several models that allow for various types of chromosome number change along the phylogeny, infers the expected number of polyploid and dysploid (chromosome number changes by one, due, for example, to processes such as chromosome fission or fusion) transitions along each branch of the phylogeny, and reconstructs chromosome numbers at ancestral nodes of the tree. The models available in ChromEvol include six parameters for various types of chromosome-number transition: ascending and descending dysploidy, polyploidy (i.e., doubling of the number of chromosomes), and “demi-polyploidy” (i.e., multiplication of the chromosome number by 1.5, leading to, e.g., triplication events), plus two rate parameters that allow the ascending and descending dysploidy rates to depend on the current number of chromosomes. We ran all eight available ChromEvol models and used AIC to select the best model. The expected number of ploidy transitions along each branch of the phylogeny was recorded based on the best-fitting model. In Mayrose et al. (2011), an extant taxon was categorized as a polyploid if the estimated expected number of ploidy transitions from the root to the tip exceeded a certain pre-defined threshold and as diploid otherwise. However, by arbitrarily setting a strict (or lenient) threshold for assigning polyploidy, the number of polyploid taxa may be underestimated (or overestimated). This misclassification may be particularly pronounced for groups with sparse chromosome number data. Thus, to prevent misestimating polyploid diversity, a simulation-based approach was developed (see below) and was applied to the Cyprininae dataset in order to determine the optimal threshold that should be used (i.e., 0.48). Diversification analyses obtained using the 0.90 threshold (as in Mayrose et al., 2011) resulted in similar conclusions regarding the relative diversification rates of diploids and polyploids (Table S10, Appendix A).

The ChromEvol methodology allowed us to categorize an extant species as polyploid or diploid regardless of whether chromosome number data were available for that specific taxon. However, because sampling of chromosome number data in certain clades may be comparatively sparse, ploidy levels may not always be estimated reliably. Thus, simulations were used to assign a confidence measure to the ploidy assignment of each extant taxon. Specifically, using the best-supported ChromEvol model, chromosome numbers were evolved from the root of the tree to the tips starting with the maximum a posteriori ancestral chromosome number estimated by the original ChromEvol analysis. In these simulations, we recorded for each tip taxon the evolutionary path leading to it (i.e., the number of polyploidization events from the root of the tree) and thus the “true” (simulated) ploidy level is known for all taxa. The resulting chromosome numbers at the tips of the tree were used as the data input to ChromEvol. To make the inference step as realistic as possible, simulated chromosome numbers at the tips were retained only for those species with available chromosome number data in the original dataset and were converted to “unknown” for species with missing chromosome number information prior to the inference step. An extant taxon was then inferred to be a polyploid if the estimated number of ploidy transitions from the root to the tip exceeded a certain threshold and as diploid otherwise. The assigned ploidy levels (either diploid or polyploid) were then compared with the ploidy levels that were actually simulated, and the number of correctly assigned taxa (i.e., true positives) was determined. This procedure was repeated for 100 randomly selected MrBayes trees and the threshold that resulted in the highest number of true positives was considered the optimal threshold for that dataset (for Cyprininae, the optimal threshold was determined to be 0.48; varying this threshold in the range 0.44-0.65 resulted in nearly identical ploidy assessments, with true positive rates in the range 95.9-96.0%). Finally, using the optimal threshold, taxa with correctly inferred ploidy levels in at least 95% of the simulated runs were considered reliable, while the ploidy levels inferred for species that did not meet this cut-off were treated as unreliable.
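
The threshold-selection step of this simulation procedure can be sketched in Python as follows. The sketch assumes that each simulation replicate has already been summarized as, per taxon, the inferred expected number of ploidy transitions and the simulated ("true") ploidy status; the function names, candidate thresholds, and numbers are hypothetical and do not reproduce ChromEvol itself.

```python
def classify(expected_transitions, threshold):
    """Call a taxon polyploid when its expected root-to-tip ploidy transitions exceed the threshold."""
    return expected_transitions > threshold

def count_correct(replicates, threshold):
    """Number of taxon assignments that match the simulated ploidy across all replicates."""
    return sum(
        classify(expected, threshold) == is_polyploid
        for replicate in replicates
        for expected, is_polyploid in replicate.values()
    )

def best_threshold(replicates, candidates):
    """Candidate threshold with the most correct assignments (the first one on ties)."""
    return max(candidates, key=lambda t: count_correct(replicates, t))

# Hypothetical summaries of two simulation replicates:
# taxon -> (inferred expected number of ploidy transitions, simulated polyploid status).
replicates = [
    {"sp1": (0.05, False), "sp2": (0.95, True), "sp3": (0.55, True), "sp4": (0.30, False)},
    {"sp1": (0.10, False), "sp2": (0.99, True), "sp3": (0.45, False), "sp4": (0.60, True)},
]
candidates = [0.1, 0.5, 0.9]
optimal = best_threshold(replicates, candidates)
```

In this toy example, the lenient threshold overcalls polyploids and the strict threshold undercalls them, so the intermediate value yields the most correct assignments.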

A second, complementary procedure was taken to account for phylogenetic uncertainties in the ChromEvol inferences – that is, to identify taxa whose inferred ploidy levels were sensitive to the underlying phylogeny. Specifically, ChromEvol was run on the set of 100 MrBayes trees using the best-fitting model determined from the initial ChromEvol analysis (which was based on the consensus tree), resulting in ploidy level estimates per tree as detailed above. Taxa whose ploidy level assignment differed from their consensus assignment (the assignment that was most commonly inferred) in more than 5% of the trees were deemed unreliable.
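
The 5% rule above can be expressed compactly. The following Python sketch assumes that the per-tree ploidy calls are already available as one dictionary per tree; the function name, the "D"/"P" labels, and the example calls are hypothetical.

```python
from collections import Counter

def unreliable_taxa(assignments_per_tree, max_disagreement=0.05):
    """Taxa whose ploidy call differs from their most common call in more than
    `max_disagreement` of the trees."""
    n_trees = len(assignments_per_tree)
    flagged = set()
    for taxon in assignments_per_tree[0]:
        calls = Counter(tree[taxon] for tree in assignments_per_tree)
        consensus_count = calls.most_common(1)[0][1]
        if (n_trees - consensus_count) / n_trees > max_disagreement:
            flagged.add(taxon)
    return flagged

# Hypothetical calls over 100 trees: sp1 is always diploid, sp2 is polyploid in
# 96 trees, and sp3 is polyploid in 90 trees.
trees = [
    {"sp1": "D", "sp2": "P" if i < 96 else "D", "sp3": "P" if i < 90 else "D"}
    for i in range(100)
]
flagged = unreliable_taxa(trees)
```

Here only the third taxon is flagged, because its call disagrees with the consensus in 10% of the trees, whereas 4% disagreement stays within the cut-off.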

The ploidy status for all taxa whose ploidy assignment was flagged as unreliable according to either one of the two approaches described above was converted to missing data in subsequent BiSSE analyses (assigned as ‘NA’ when using the diversitree package, see below). The ploidy levels estimated by ChromEvol for Cyprininae are provided in Table S2 (Appendix A).

2.3.4 BiSSE diversification analysis

To estimate diversification rates for diploids and polyploids, we applied the binary state speciation and extinction (BiSSE) model (Maddison et al., 2007). BiSSE co-estimates six parameters: speciation rates of lineages in state P (polyploid) and D (diploid) ($\lambda_P$ and $\lambda_D$, respectively); extinction rates of lineages in state P and D ($\mu_P$ and $\mu_D$, respectively); and transition rates from P to D ($q_{PD}$) and D to P ($q_{DP}$). Using these estimates, the net diversification rate in each state ($r_D$ and $r_P$) was calculated as, for example, $r_D = \lambda_D - \mu_D$. Because we defined polyploids as those species that had undergone a polyploidization event sometime since divergence from the base of the group examined, we forced the root state to the diploid state and fixed $q_{PD}$ to zero (see Appendix A for results allowing for polyploid-to-diploid reversals). This constraint is also compatible with the common assumption that polyploidy is largely an irreversible process (Meyers & Levin, 2006). Our analyses were performed using the “skeletal” tree approach

(FitzJohn et al., 2009) implemented in the R package diversitree version 0.9.3 (FitzJohn, 2012; www.zoology.ubc.ca/prog/diversitree/), which accounts for the sampling fraction of species in the given phylogeny out of the total number of species in the clade. Diversity estimates for the various groups analyzed here were drawn from the literature (summarized in Table 1). Moreover, uneven sampling of polyploids and diploids in different clades of a phylogeny may influence estimates of diversification. Therefore, we accounted for uneven sampling using the “split” extension of the

BiSSE model (as implemented in diversitree) for the analysis of Salmoniformes and Esociformes as well as for Cyprininae. In the Salmoniformes and Esociformes phylogeny, we assumed that all members of Salmoniformes are polyploid and all members of Esociformes are diploid. Thus, we corrected for uneven sampling by specifying that 28% of polyploids and 100% of diploids in

Salmoniformes were represented in the phylogeny, while 100% of polyploids and 69% of diploids in Esociformes were represented (Table 1). Similarly, we adjusted for the biased sampling in

Cyprininae. We were able to confidently assign diversity estimates to four genera (Capoeta,

Pseudobarbus, Schizothorax, and Sinocyclocheilus) which were well corroborated to be monophyletic (with at least 95% posterior probability support; Figure S1C, Appendix A). Using genus diversity estimates from FishBase (Table 1), the sampling fractions for Capoeta,

Pseudobarbus, Schizothorax, and Sinocyclocheilus were estimated as 82%, 100%, 56%, and 58%, respectively. Assuming that Cyprininae consists of 1,300 species (Yang et al. 2010), the

“background” clade (the rest of the phylogeny, excluding the above four genera) has a sampling fraction of ~21%. Unlike in the case of Salmoniformes and Esociformes, we assumed that within

each specified clade, polyploids and diploids were being sampled at the same rates (e.g., in

Capoeta, 82% of polyploids and 82% of diploids were represented in the data set). Results obtained using the complete sampling assumption were nearly identical (but with broader confidence intervals for the model parameters compared to those obtained while accounting for incomplete sampling; results not shown).
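The sampling fractions quoted above can be reproduced from the counts in Table 1. The sketch below is only a worked check of that arithmetic; the way the “background” fraction is computed (remaining sampled taxa divided by remaining species) is an assumption that happens to recover the reported ~21%.

```python
# Taxa sampled and species richness for the four genera, as listed in Table 1.
sampled = {"Capoeta": 18, "Pseudobarbus": 7, "Schizothorax": 33, "Sinocyclocheilus": 33}
richness = {"Capoeta": 22, "Pseudobarbus": 7, "Schizothorax": 59, "Sinocyclocheilus": 57}

# Totals for Cyprininae (Table 1; richness after Yang et al., 2010).
total_sampled = 329
total_richness = 1300

# Per-genus sampling fraction = taxa in the phylogeny / known species.
fractions = {genus: sampled[genus] / richness[genus] for genus in sampled}

# Assumed definition of the background clade: everything outside the four genera.
background = (total_sampled - sum(sampled.values())) / (
    total_richness - sum(richness.values())
)

for genus, fraction in fractions.items():
    print(f"{genus}: {fraction:.0%}")
print(f"background: {background:.0%}")
```

Under the equal-sampling assumption stated in the text, the same fraction would be supplied for both the diploid and the polyploid state within each clade.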

First, we conducted a maximum likelihood (ML) analysis to test whether (1) diploids and polyploids speciate at different rates, (2) diploids and polyploids go extinct at different rates, or (3) diploids and polyploids have both different speciation and different extinction rates. These three hypotheses can be tested by comparing the following BiSSE models, starting with the null model, M0, where $\lambda_D = \lambda_P$ and $\mu_D = \mu_P$: (1) Ms, where only speciation rates differ and $\mu_D = \mu_P$; (2) Me, where only extinction rates differ and $\lambda_D = \lambda_P$; and (3) Mse, in which both the speciation rates and the extinction rates are allowed to differ between diploids and polyploids. To account for uncertainty in the trees, we fitted the four models to 100 randomly selected post burn-in MrBayes trees. The AIC model selection criterion was used to identify the best-fitting model given each individual tree. The best-fitting model across all trees was then chosen as the model that was best supported most frequently (we note that in all four datasets examined, the best supported model was also chosen across all 100 trees when tested individually). In addition, most model comparisons were nested (except Ms versus Me), and all of the conclusions drawn from those comparisons were also supported by likelihood-ratio tests.
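The per-tree AIC comparison and the tally across trees can be sketched as below. The log-likelihood values are invented for illustration, and the parameter counts assume the $q_{PD} = 0$ constraint described earlier (so M0 has one speciation rate, one extinction rate, and $q_{DP}$); neither is taken from the fitted models.

```python
from collections import Counter
from typing import Dict, List

# Assumed number of free parameters per model with q_PD fixed to zero.
N_PARAMS = {"M0": 3, "Ms": 4, "Me": 4, "Mse": 5}


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(log_likelihoods: Dict[str, float]) -> str:
    """Return the model with the lowest AIC for a single tree."""
    return min(log_likelihoods,
               key=lambda m: aic(log_likelihoods[m], N_PARAMS[m]))


def tally_best_models(per_tree: List[Dict[str, float]]) -> Counter:
    """Count how often each model is the best-supported one across trees."""
    return Counter(best_model(lls) for lls in per_tree)


# Hypothetical maximized log-likelihoods for two trees.
trees = [
    {"M0": -100.0, "Ms": -99.8, "Me": -100.0, "Mse": -99.7},
    {"M0": -150.0, "Ms": -135.0, "Me": -120.0, "Mse": -100.0},
]
counts = tally_best_models(trees)
print(counts.most_common())
```

In the first toy tree the richer models gain almost no likelihood, so the AIC penalty favors M0; in the second the gain is large enough that Mse is preferred.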

Table 1: Fish groups examined in the current study.

| Group | Taxa sampled^a | Taxa overlap with Rabosky et al. subtrees^b | Species richness | % polyploids | Diploid-to-polyploid transitions |
|---|---|---|---|---|---|
| Acipenseridae | 24 [24] | 0/24/0 | 25 | 57^c | 3 |
| Botiidae | 35 [34] | 5/30/4 | 47 | 69 | 1 |
| Cyprininae | 329 [420] | 39/290/130 | 1,300^d | 71^e | 10^f |
| Capoeta | 18 [11] | 8/10/1 | 22^g | 100^c | 0 |
| Pseudobarbus | 7 [7] | 0/7/0 | 7^g | 100^c | 0 |
| Schizothorax | 33 [34] | 2/31/3 | 59^g | 100^c | 0 |
| Sinocyclocheilus | 33 [34] | 2/31/3 | 57^g | 100^c | 0 |
| Salmoniformes | 60 [59] | 8/52/7 | 215^h | 100 | 1 |
| Esociformes | 9 [11] | 0/9/2 | 13^h | 0 | 0 |

^a Bracketed is the number of taxa in the subtrees extracted from the time-calibrated mega-phylogeny published by Rabosky et al. (2013).
^b Taxa only in tree built for the current study/taxa in both trees/taxa only in Rabosky et al. (2013) subtree.
^c Calculated out of 21 species because ploidy level estimates for 3 species are not available.
^d According to Yang et al. (2010).
^e Estimated using the best-fitting ChromEvol model, excluding 81 species whose ploidy level could not be reliably inferred according to a ChromEvol power analysis (see Materials and Methods).
^f Inferred using the best-fitting ChromEvol model.
^g Estimated by counting the number of valid species entries in FishBase, while excluding subspecies entries.
^h According to the California Academy of Sciences Catalog of Fishes database (Eschmeyer & Fong, 2012).

In addition to the ML-based analysis, the Markov chain Monte Carlo (MCMC) approach described in FitzJohn et al. (2009) was applied to obtain posterior probability distributions for each of the five parameters ($\lambda_D$, $\lambda_P$, $\mu_D$, $\mu_P$, and $q_{DP}$), accounting for uncertainty in parameter estimation and incomplete sampling. Specifically, exponential priors (mean set to 2 [or 0.5 for $q_{DP}$] × log(number of tip taxa) / tree height) were placed on the five parameters. The BiSSE analysis was again conducted across the set of 100 MrBayes trees. For the first tree in the sample, the initial starting point was determined based on a heuristic estimated by diversitree according to the state-independent birth-death model. The subsequent 99 trees were started from the last point sampled in the previous tree. For each of the MrBayes trees, MCMC analysis was run for 1,500 steps (except the initial tree, which was run for 2,000 steps) and sampled every 10th step. The first 500 steps of the chain for each tree were regarded as burn-in and discarded from the analysis (first 1,000 steps for the initial tree). The 100 chains (each corresponding to one tree sampled by MrBayes) were then concatenated to form a single sample. We note that individual MCMC chains converged rather quickly (graphical analysis suggests that the MCMC chains stabilized within several hundred steps), and that results were indistinguishable whether we pooled MCMC samples from 10, 50, or 100 trees (we nonetheless used the larger sample set).
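The prior mean and the size of the pooled posterior sample follow from simple arithmetic, sketched below. Two assumptions are made that the text does not state: the logarithm is taken to be natural, and thinning is applied to the post burn-in steps without off-by-one adjustments. The tip count and tree height in the example are hypothetical.

```python
import math


def prior_mean(n_tips: int, tree_height: float, multiplier: float = 2.0) -> float:
    """Mean of the exponential prior: multiplier * log(n_tips) / tree_height.
    A multiplier of 2 applies to the speciation and extinction rates and
    0.5 to q_DP; the natural logarithm is an assumption."""
    return multiplier * math.log(n_tips) / tree_height


def retained_samples(n_steps: int, burn_in: int, thin: int = 10) -> int:
    """Approximate number of samples kept after burn-in and thinning."""
    return (n_steps - burn_in) // thin


# Hypothetical tree with 100 tips and a height of 1.0.
rate_prior = prior_mean(100, 1.0)
q_prior = prior_mean(100, 1.0, multiplier=0.5)

# Chain lengths from the text: 2,000 steps (burn-in 1,000) for the first
# tree and 1,500 steps (burn-in 500) for each of the remaining 99 trees.
first_tree = retained_samples(2000, 1000)
later_tree = retained_samples(1500, 500)
pooled = first_tree + 99 * later_tree
print(round(rate_prior, 3), round(q_prior, 3), pooled)
```

Under these assumptions every tree contributes the same number of retained samples, so no single tree dominates the concatenated posterior.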

To test whether estimated extinction and speciation rates differ between polyploids and diploids, we calculated the percentage of BiSSE MCMC steps in which the diploid rate was higher than that of polyploids (the posterior probability, PP, of diploids having a higher rate than polyploids). For example, to test whether extinction rates differ, we calculated the percentage of post burn-in steps in which $\mu_D > \mu_P$; $PP(\mu_D > \mu_P) \geq 0.975$ is interpreted as significant support for the conclusion that diploids go extinct at a higher rate than polyploids, while $PP(\mu_D > \mu_P) \leq 0.025$ supports higher polyploid extinction.
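The posterior probability defined above is simply the fraction of retained MCMC steps in which the inequality holds. A minimal sketch, with hypothetical function names and toy samples rather than the posterior samples from this study:

```python
from typing import Sequence


def posterior_prob_greater(diploid: Sequence[float],
                           polyploid: Sequence[float]) -> float:
    """Fraction of paired MCMC steps in which the diploid rate exceeds
    the polyploid rate."""
    if len(diploid) != len(polyploid) or not diploid:
        raise ValueError("need two non-empty samples of equal length")
    return sum(d > p for d, p in zip(diploid, polyploid)) / len(diploid)


def interpret(pp: float, upper: float = 0.975, lower: float = 0.025) -> str:
    """Apply the two-sided cut-offs used in the text."""
    if pp >= upper:
        return "diploid rate higher"
    if pp <= lower:
        return "polyploid rate higher"
    return "no significant difference"


# Hypothetical post burn-in samples of the two extinction rates.
mu_d = [0.9, 1.2, 0.4, 0.3]
mu_p = [1.0, 1.0, 1.0, 1.0]
pp_mu = posterior_prob_greater(mu_d, mu_p)
print(pp_mu, interpret(pp_mu))
```

The comparison is made step by step on paired samples, so correlations between the two rates within the chain are carried through to the PP value.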


2.3.5 BiSSE analysis on time-calibrated phylogenies

The BiSSE analyses described above were repeated using time-calibrated phylogenies obtained through a recently assembled mega-phylogeny that encompasses 7,822 extant fish species and spans the entire actinopterygian diversity (Rabosky et al., 2013). This mega-phylogeny was reconstructed based on a 13-gene matrix and was time-calibrated using 60 fossil dates. The mega-phylogeny was downloaded from Dryad (doi:10.5061/dryad.j4802), and the subtrees corresponding to the four fish groups investigated in our study were extracted. Table 1 provides information regarding the overlap between the taxa in the time-calibrated trees and the taxa in the trees reconstructed in the current study. For taxa found in the time-calibrated tree only, additional chromosome numbers were taken from FishBase and the literature (Table S2, Appendix A).

Ploidy levels were assigned as detailed above except for Cyprininae, for which the simulation procedure detailed above was conducted along the single time-calibrated tree rather than a set of trees. The procedures for the diversification analyses followed those described above except that the BiSSE MCMC chain was run using a single tree for 20,000 generations (with the first 10,000 discarded as burn-in) instead of 1,500.

Results

2.4.1 Phylogenetic distribution of polyploidy

The 50% majority-rule consensus tree for each group is presented in Figure S1 (Appendix A). The reconstructed Acipenseridae phylogeny is highly similar to recently published Acipenseridae phylogenies (Peng et al., 2007; Krieger et al., 2008). All partitions in our phylogeny that received ≥ 95% posterior probability supported the same descendant species as the corresponding partitions in the phylogeny shown in Figure 1 of Peng et al. (2007), while the three partitions that were not identical received low support (< 85% posterior probability) and involved species with the same ploidy levels. Our phylogeny confirms the basal relationships among the major lineages: the monophyly of the genus Scaphirhynchus, the basal clade containing Acipenser oxyrinchus and A. sturio, and the basal split into the monophyletic Atlantic clade and Pacific clade (Figure S1A, Appendix A). Our phylogeny also recovered the three monophyletic lineages (one in the Atlantic clade and two in the Pacific clade) that experienced a common polyploidization event (as shown in Peng et al., 2007).

The cytb Botiidae phylogeny reconstructed in this study (Figure S1B, Appendix A) is in strong agreement with the cytb phylogeny published by Šlechtová et al. (2006). Our phylogenetic analysis recovered the well-supported subfamilies Botiinae and Leptobotiinae, the monophyly of Botia, Leptobotia, Parabotia, Sinibotia, and Syncrossus, and the paraphyly of Yasuhikotakia; the basal relationships among these lineages were supported, as demonstrated previously by Šlechtová et al. (2006), except for Chromobotia macracanthus, which is clustered with the Sinibotia/Syncrossus/Yasuhikotakia lineage with relatively poor support (Figure S1B, Appendix A). Botiinae was previously inferred to have experienced a whole genome duplication event (Šlechtová et al., 2006), a conclusion supported by a ChromEvol analysis using the chromosome number data referred to in Šlechtová et al. (2006) and using the consensus phylogeny of Figure S1B (Appendix A; results not shown).

The evolutionary relationships among the major clades within Cyprininae (Barbus sensu stricto, Capoeta, , Schizothorax, and Sinocyclocheilus) are represented in our phylogenetic reconstruction of the subfamily (Figure S1C, Appendix A). The monophyly of each of the genera Capoeta, Pseudobarbus, Schizothorax, and Sinocyclocheilus was supported, as well

as the monophyly of several clades consisting primarily of Barbus species. Given the consensus

Bayesian phylogeny, we next aimed to use chromosome number data to infer shifts in ploidy levels. Because chromosome number data are incomplete and not uniformly distributed across the

Cyprininae (available for 103 species out of the 329 species in the reconstructed phylogeny) and the phylogeny of this group is still debated, we performed two tests – one based on parametric simulations and one to account for phylogenetic uncertainties – to determine the species for which ploidy levels can be reliably inferred (see Materials and Methods). This procedure resulted in

175 species inferred as polyploids, 73 as diploids, and 81 as “unreliable”. ChromEvol performs

ML reconstruction of chromosome numbers at each internal node. Using these ML estimates derived under the best supported ChromEvol model, we inferred ploidy transitions along the

Cyprininae phylogeny. The reconstructed haploid chromosome numbers consist of three ploidal levels (corresponding to haploid chromosome numbers of ~25, ~50, and ~75) with the root being at 25; chromosome number estimates at inferred ploidy shifts are indicated in Figure S1C

(Appendix A). We estimated ploidy transitions along 11 internal branches and 3 terminal branches of the Cyprininae phylogeny; 10 of these shifts involve transitions from diploidy to tetraploidy (8 along internal branches and 2 along terminal branches), while the other four shifts involve transitions to higher ploidy levels (3 internal and 1 terminal). These ploidy shifts lead to or occur within Cyprininae clades that contain a notable number of reported polyploid taxa (Capoeta,

Carassius/Cyprinus, Luciobarbus, Sinocyclocheilus, a Barbus/Pseudobarbus group described by

Tsigenopoulos et al. (2002), and a large assemblage of miscellaneous genera that include Tor,

Labeobarbus, and Varicorhinus).

Previous molecular phylogenetic studies investigated the evolutionary relationships among the major genera of Salmonidae (Crespi & Fulton, 2004) and among the subfamilies of Salmonidae

(Coregoninae, Salmoninae, and Thymallinae; Yasuike et al., 2010). To the best of our knowledge, however, no comprehensive species-level phylogeny exists for Salmonidae (the only family within

Salmoniformes). In brief, our phylogeny recovers the monophyly of the major salmonid genera

(Salmo, Salvelinus, and Oncorhynchus) and the monophyly of each of the 3 subfamilies (Figure

S1D, Appendix A). Within Esociformes, all the genera (Esox, Dallia, Novumbra, and Umbra) are each monophyletic, and the phylogenetic relationships among them that are supported by our phylogeny were also shown by López et al. (2004).

2.4.2 Comparing the diversification rates of diploids and polyploids

In the datasets examined in this study, the vast majority of diploid-to-polyploid transitions were inferred along internal branches of the phylogeny (Figure S1, Appendix A). Additional ploidy level increases (e.g., tetraploidy to octaploidy), however, were inferred to occur along terminal branches in Acipenseridae (Figure S1A, Appendix A) and in Cyprininae (Figure S1C,

Appendix A). This phylogenetic distribution of ploidy level shifts contrasts with that observed in plants, where diploid-to-polyploid shifts mainly occur along terminal branches, not internal ones

(Mayrose et al., 2011). The difference in the phylogenetic distribution of ploidy shifts between fishes and plants suggests that polyploidy may have a different impact on the persistence and diversification in the groups examined.

For Acipenseridae, the BiSSE ML analysis indicated that the best supported model according to AIC was M0, which assumes equal speciation and extinction rates of diploids and polyploids (Table 2). Thus, there is no significant support for the more complex models that allow for unequal diversification rates. The BiSSE MCMC results were consistent with this conclusion ($PP(\lambda_D > \lambda_P) = 0.25$ and $PP(\mu_D > \mu_P) = 0.40$). The medians of the MCMC posterior distributions suggested that, if anything, polyploids tend to exhibit higher speciation rates and higher extinction rates than diploids (both by nearly 1.4-fold), but these differences were not statistically significant. Together, the MCMC analysis did not support the hypothesis that polyploidy is associated with different net-diversification rates among sturgeons ($PP(r_D > r_P) = 0.37$; Table 3). We note that Huso dauricus, which was estimated to be diploid by microsatellite locus analysis (Ludwig et al., 2001), was recently shown to be polyploid by karyotyping (Vasil’ev et al., 2009). Our results were qualitatively the same whether we treated H. dauricus as diploid (presented here) or as polyploid (not shown).

According to the ML analysis, the best-fitting model for Botiidae was again M0, suggesting that diploid and polyploid lineages in this group do not differ substantially in their speciation or extinction rates. In agreement, the MCMC results did not support significantly different speciation rates ($PP(\lambda_D > \lambda_P) = 0.81$) or extinction rates ($PP(\mu_D > \mu_P) = 0.79$). Based on the MCMC posterior distributions, the median speciation and extinction rates were higher for diploids (roughly by 1.4-fold and 3.4-fold, respectively), but there was virtually no difference in the overall diversification rates between diploids and polyploids ($PP(r_D > r_P) = 0.46$; Table 3).

For Cyprininae, the ML analysis supported the hypothesis that polyploids possess both higher speciation rates and higher extinction rates than diploids, with Mse being the best supported model. In agreement, the MCMC analysis showed that speciation rates are about 5.5-fold higher in polyploids versus diploids ($PP(\lambda_D > \lambda_P) = 0$) and that extinction rates are about 47.1-fold higher in polyploids versus diploids ($PP(\mu_D > \mu_P) = 0$). Given the lower absolute extinction rates, however, the net result was a marginally significant higher diversification rate of polyploid lineages ($PP(r_D > r_P) = 0.03$; Table 3).

Table 2: Best-fitting BiSSE model (with the constraint $q_{PD} = 0$) according to AIC. The percent of trees (out of 100) where each model received the best (lowest) AIC score is presented. The distribution of the difference in AIC score (ΔAIC) relative to the best-fitting model is presented by the 5th and 95th percentiles and the median. M0 = diploid and polyploid lineages have equal rates of both speciation and extinction; Ms = speciation rates of diploid and polyploid lineages are allowed to vary, while extinction rates are equal; Me = extinction rates of diploid and polyploid lineages are allowed to vary, while speciation rates are equal; Mse = speciation rates as well as extinction rates of diploid and polyploid lineages are allowed to vary.

| | Acipenseridae | Botiidae | Cyprininae | Salmoniformes/Esociformes |
|---|---|---|---|---|
| Best-fitting model | M0 | M0 | Mse | Mse |
| % of trees supporting: | | | | |
| Mse | 0 | 0 | 100 | 100 |
| Me | 0 | 0 | 0 | 0 |
| Ms | 0 | 0 | 0 | 0 |
| M0 | 100 | 100 | 0 | 0 |
| 5/50/95 percentiles of ΔAIC compared to best-fitting model: | | | | |
| Mse | 2.27/2.73/3.06 | 2.81/3.47/3.75 | 0/0/0 | 0/0/0 |
| Me | 0.96/1.16/1.41 | 2.00/2.00/2.00 | 63.44/68.67/75.15 | 13.80/16.08/19.22 |
| Ms | 0.29/0.76/1.18 | 1.19/1.56/1.79 | 27.30/32.94/38.40 | 12.99/15.19/18.83 |
| M0 | 0/0/0 | 0/0/0 | 88.98/94.86/99.74 | 15.82/17.88/20.82 |

Table 3^1: MCMC-based estimates of evolutionary rates using the BiSSE model (with the constraint $q_{PD} = 0$). The rates of speciation ($\lambda$) and extinction ($\mu$) for diploids (D) and polyploids (P) were estimated from the median of the posterior distributions generated by MCMC sampling over all 100 trees. For each MCMC step, the diversification rate was derived as the speciation rate minus the extinction rate for diploids and polyploids ($r_D$ and $r_P$, respectively). The posterior probability (PP) that diploids exhibit higher rates than polyploids is represented by the percentage of MCMC steps where the inequality was satisfied. $PP \leq 0.025$ is considered as support for higher rates in polyploids than diploids (highlighted in bold; marginal significance is in italics).

| Group | $\lambda_D$ | $\lambda_P$ | $\mu_D$ | $\mu_P$ | $q_{DP}$ | $PP(\lambda_D > \lambda_P)$ | $PP(\mu_D > \mu_P)$ | $PP(r_D > r_P)$ |
|---|---|---|---|---|---|---|---|---|
| Acipenseridae | 69.45 | 99.46 | 32.05 | 45.44 | 13.76 | 0.25 | 0.40 | 0.37 |
| Botiidae | 27.21 | 18.89 | 11.91 | 3.46 | 2.26 | 0.81 | 0.79 | 0.46 |
| Cyprininae | 21.80 | 121.45 | 1.95 | 91.69 | 1.55 | **0.00** | **0.00** | *0.03* |
| Salmoniformes/Esociformes | 14.21 | 157.30 | 8.77 | 143.81 | 2.50 | **0.00** | **0.00** | 0.11 |

^1 In the published version of this chapter, the legend states that “all rates are scaled relative to a total tree depth of 1, before pruning off the outgroup taxa.” This is incorrect. The trees were not rescaled prior to BiSSE analysis.

Our BiSSE ML analysis also revealed that polyploid salmonids experience higher speciation rates and higher extinction rates than the diploid esociforms, with Mse being the best supported model. In agreement, the MCMC analysis indicated that the polyploid salmonids have significantly higher speciation rates (by about 11.1-fold; $PP(\lambda_D > \lambda_P) = 0$) and higher extinction rates (by about 16.4-fold; $PP(\mu_D > \mu_P) = 0$) than diploid esociforms. We caution, however, that the association between polyploidy and higher speciation and extinction rates is based on the single polyploid transition leading to the salmonid clade, and so the evidence in this case is not replicated. Furthermore, the inflated speciation rates and extinction rates did not result in a significantly increased net diversification rate in the salmonids ($PP(r_D > r_P) = 0.11$; Table 3).
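The fold differences quoted in the text can be checked against the posterior medians in Table 3. The sketch below is a consistency check only; the dictionary layout and variable names are hypothetical, and ratios of medians need not match exactly the fold changes reported from the full posterior sample.

```python
# Posterior medians from Table 3: (lambda_D, lambda_P, mu_D, mu_P).
table3 = {
    "Acipenseridae": (69.45, 99.46, 32.05, 45.44),
    "Botiidae": (27.21, 18.89, 11.91, 3.46),
    "Cyprininae": (21.80, 121.45, 1.95, 91.69),
    "Salmoniformes/Esociformes": (14.21, 157.30, 8.77, 143.81),
}

# Polyploid-to-diploid ratio of the median speciation and extinction rates.
fold = {
    group: {"speciation": lam_p / lam_d, "extinction": mu_p / mu_d}
    for group, (lam_d, lam_p, mu_d, mu_p) in table3.items()
}

for group, ratios in fold.items():
    print(f"{group}: speciation x{ratios['speciation']:.2f}, "
          f"extinction x{ratios['extinction']:.2f}")
```

Ratios below 1, as for Botiidae, indicate higher median rates in diploids; their reciprocals correspond to the roughly 1.4-fold and 3.4-fold differences reported for that group.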

The above diversification analyses were repeated but this time using, for each group, a single time-calibrated tree extracted from a mega-phylogeny reconstructed for the ray-finned fishes (Rabosky et al., 2013). Results obtained for Acipenseridae, Botiidae, and Cyprininae were very similar to those obtained using the sample of trees reconstructed here (see Table S9, Appendix A). For the Salmoniformes/Esociformes comparison, this analysis also supported significantly higher polyploid speciation rates ($PP(\lambda_D > \lambda_P) = 0$; Table S9, Appendix A) and extinction rates ($PP(\mu_D > \mu_P) = 0$; Table S9, Appendix A). However, this analysis supported a significantly higher net-diversification rate for polyploids ($PP(r_D > r_P) = 0.01$).

The above BiSSE analyses were conducted under the assumption that polyploid-to-diploid reversals do not occur (i.e., $q_{PD} = 0$). Results obtained while allowing for polyploidy reversals generally mirror those obtained under the irreversibility assumption. In the ML analyses, the best models were the same, except for Acipenseridae for which Ms instead of M0 had a marginally superior, but not significant, AIC score (Table S6, Appendix A). ML parameter estimates with and without the irreversibility constraint are given in Table S5 and Table S7 (Appendix A), respectively, while results obtained using MCMC reached the same conclusions and are given in Table S8 (Appendix A).

Discussion

Recent literature surveys indicate that polyploidization is generally a rare occurrence among vertebrates, but it is particularly prominent in fish where entire polyploid assemblages have arisen (Mable et al., 2011). Genome-wide analyses have unearthed ancient polyploidization events across the eukaryotic tree of life, promoting the view that polyploidy has played an important role in eukaryotic evolution. However, at least over a relatively short time scale, large-scale phylogenetic studies have suggested that recently-formed polyploid plant species generally experience lower diversification rates compared to their diploid congeners (Mayrose et al., 2011).

To the best of our knowledge, the question whether polyploidy is associated with a shift in diversification patterns has not been rigorously explored in fish.

Our results suggest that polyploid fish lineages do not exhibit lower diversification rates compared to closely related diploids. In all four groups examined, the diversification rate for polyploids was higher than that of diploids in the majority of MCMC steps. Using the collection of Bayesian trees, this difference was marginally significant in the Cyprininae and not significant in the three other groups (Table 3). Results obtained using a single time-calibrated phylogeny were similar except that the net diversification rate of the polyploid Salmoniformes was significantly higher than that of their diploid Esociformes relatives. In addition, both the speciation rates and the extinction rates were higher in polyploids than in diploids in the Acipenseridae, Cyprininae, and Salmoniformes/Esociformes comparisons, but the differences were only significant in the latter two groups, while Botiidae exhibited the opposite trend (Tables 2 and 3). These general conclusions also held when we relaxed the assumption that polyploids could not revert to diploid (allowing $q_{PD} \neq 0$; see Tables S6 and S8, Appendix A).

The subfamily Cyprininae encompasses the largest number of known polyploidization events in fish, with multiple transitions leading to the tetraploid and hexaploid Barbus groups (Tsigenopoulos et al., 2002). Because polyploidy has been frequently reported in Cyprininae and because this clade is so species-rich, it is tempting to hypothesize that the evolutionary success of Cyprininae is attributable to polyploidy. Our study indicates that polyploid Cyprininae lineages indeed diversify more rapidly than diploid lineages. These results nonetheless should be interpreted with caution owing to taxonomic biases as discussed below.

In a recent diversification analysis comparing diploid and polyploid plant lineages, Mayrose et al. (2011) demonstrated that polyploid plant species tend to undergo lower speciation rates and higher extinction rates than their diploid congeners, thereby leading to markedly lower polyploid net-diversification rates. In stark contrast to plants, none of the datasets analyzed here supported lower diversification in polyploids than in diploids (Table 3). This may seem counterintuitive considering that polyploidy is widespread in plants and relatively uncommon in fish. Below, we discuss potential explanations for and caveats about this difference.

First, taxonomic biases may influence our results. While polyploidy has been investigated for decades in plants, this phenomenon has generally been much less appreciated by fish taxonomists. In particular, ploidy level is a central character used in the plant systematic literature, but it has only recently been incorporated in fish systematics (Mable et al., 2011), perhaps due to the difficulty in identifying cryptic polyploids. Consequently, the frequency of polyploid fishes may well be underestimated, with non-trivial implications for downstream diversification analyses. In particular, polyploidization events are more likely to be recognized when they affect many animal species (i.e., at internal nodes leading to diverse groups), as events along terminal branches would go unnoticed until the ploidy level of the resulting species were measured. Such a bias would inflate the estimated diversification rate associated with polyploidy. Furthermore, in groups where polyploidy is known to occur, the frequency of polyploid fishes may be overestimated if phylogenetic and cytological studies tend to focus on those sub-clades exhibiting polyploidy, which may be the case with the Cyprininae. Thus, while our current results demonstrate higher diversification in polyploid Cyprininae fishes, they should be interpreted cautiously pending future re-analyses with a more complete dataset.

Second, low or sparse taxonomic sampling may limit the power of our analyses, thus rendering our results tentative until sufficient data are collected. In the cases of Acipenseridae and Botiidae, while the clades are well sampled (96% and 74%, respectively), the overall species richness in these clades is rather small. Therefore, it is possible that there is insufficient power to draw robust conclusions from the BiSSE analyses for Acipenseridae and Botiidae even as more data accumulate. For the large Cyprininae group, however, the current number of species with sequence data is rather low (below 25%, assuming that the group contains over 1,300 species). It is also important to note that while alternative Bayesian phylogenies were considered in the diversification analyses presented here, the taxonomy of the Cyprininae clade is unsettled and its phylogeny largely unresolved, adding considerable noise to the current dataset. Thus, the diversification results for Cyprininae reported here should be interpreted with caution until more data are gathered in terms of both the number of species with available sequence data and the number of loci sampled for each species.

Third, polyploidy may be associated with different suites of characters in plants and animals, and it might be these associated characters that drive differences in diversification rates.

In particular, polyploidy is associated with self-fertilization in plants (Barringer, 2007; Robertson et al., 2010), perhaps as a preadaptation or an evolved response to reduce minority cytotype

disadvantage (Levin, 1975). In fish, however, self-fertilization is rare^2. While theory predicts that self-compatibility would increase the establishment success of polyploids (Rausch & Morgan, 2005), on a longer time scale having reduced levels of genetic mixing may make such lineages prone to extinction (Goldberg et al., 2010). Interestingly, in animals, newly arisen polyploid taxa may avoid minority cytotype disadvantage through phenotypic shifts that cause polyploids to assortatively mate with other polyploids. For example, mating calls are altered in polyploid anurans (Keller & Gerhardt, 2001), and a similar mechanism has been hypothesized to operate in fish (reviewed in Mable et al., 2011), thereby increasing the probability that polyploids establish without the long-term detrimental effects of self-fertilization.

Fourth, in plants, the high frequency of heteroploid speciation (i.e., speciation events involving a shift in ploidy) from the diploid state (estimated to be as high as 32% in plants; Mayrose et al., 2011) may be a major component of the elevated speciation rates of diploids compared to neopolyploids, which may have a lower capacity to further speciate via polyploidy. In fish, heteroploid speciation appears to be less frequent, ranging from 5% in Cyprininae to 21% in Acipenseridae (estimated using an extension of the BiSSE model described in Magnuson-Ford & Otto, 2012; see Supporting Methods and Table S4, Appendix A), thereby contributing less to the diploid speciation rate. Thus, the high polyploid abundance in plants may, in fact, be driven by the elevated speciation rates of diploids generating polyploids. In addition, it seems that in fish, the rate of heteroploid speciation is not strikingly different between diploids and polyploids (e.g., in Acipenseridae, we inferred four heteroploid speciation events, with these occurring at 3 of the 14 nodes where the ancestral lineage was estimated to be diploid and one of the 9 nodes where the ancestral lineage was estimated to be polyploid; Figure S1A, Appendix A).

^2 In the published version of this chapter, Alves et al. (2001) was incorrectly cited here.

Fifth, it may be the case that polyploidization is particularly advantageous in lineages that have not undergone previous rounds of polyploidization. It is established that angiosperms have an extensive history of ancient polyploidy (Jiao et al., 2011; Van de Peer et al., 2009; Soltis et al., 2009), and that polyploid formation in plants is a common and ongoing phenomenon. Fish, on the other hand, have undergone rather few ancient polyploidization events (two events early in vertebrate evolution (Dehal & Boore, 2005), and another one preceding the radiation of the ray-finned fishes (Taylor et al., 2003)), and polyploids are rarely reported except in certain lineages, such as Cyprininae. It is thus possible that the advantageous effects of polyploidy, such as increased genetic degrees of freedom, diminish with the number of prior rounds of polyploidization and might even be absent in many groups of plants where polyploidy has been particularly rampant.

Finally, it is important to consider the time frame of the analyses. The analyses in Mayrose et al. (2011) focused on a set of 63 genus-level plant groups. However, the currently available resolution of fish systematics has only allowed us to investigate a heterogeneous set of taxonomic ranks, all above the genus level. As stated by Levin (1983), “chromosome doubling may propel a population into a new adaptive sphere, and render it capable of occupying habitats beyond the limits of its diploid progenitor. For this to occur, however, polyploids must survive long enough for chromosome doubling to influence subsequent evolution.” It is possible that the extra degree of genetic freedom of polyploids, provided by the additional paralogous gene set, is advantageous only over longer periods of evolutionary time. Similarly, the costs of polyploidy may be most acute over shorter time scales, such as reduced genetic variation when only one or a few individuals undergo polyploidization, a lack of prior adaptation to the phenotypic and ecological shifts induced by polyploidization, as well as minority cytotype disadvantage. It is thus possible that polyploidy does not affect diversification rates differently between animals and plants, but rather that older polyploid lineages enjoy an evolutionary success that the younger polyploid lineages do not. As we obtain richer datasets, with phylogenies spanning multiple taxonomic levels and more complete ploidy information, such possibilities can be explicitly examined. Nonetheless, our current investigation raises the possibility that polyploidy has had different evolutionary repercussions in fishes and plants.

Phylogenetic evidence for cladogenetic polyploidization in land plants

Overview

Polyploidization is a common and recurring phenomenon in plants and is often thought to be a mechanism of “instant speciation”. Whether polyploidization is associated with the formation of new species (cladogenesis) or simply occurs over time within a lineage (anagenesis), however, has never been assessed systematically. We tested this hypothesis using phylogenetic and karyotypic information from 235 plant genera (mostly angiosperms). We first constructed a large database of combined sequence and chromosome number data sets using an automated procedure. We then applied likelihood models (ClaSSE) that estimate the degree of synchronization between polyploidization and speciation events in maximum likelihood and Bayesian frameworks. Our maximum likelihood analysis indicated that 35 genera supported a model that includes cladogenetic transitions over a model with only anagenetic transitions, whereas three genera supported a model that incorporates anagenetic transitions over one with only cladogenetic transitions. Furthermore, the Bayesian analysis supported a preponderance of cladogenetic change in four genera but did not support a preponderance of anagenetic change in any genus. Overall, these phylogenetic analyses provide the first broad confirmation that polyploidization is temporally associated with speciation events, suggesting that it is indeed a major speciation mechanism in plants, at least in some genera.

Introduction

Polyploidization, or whole genome duplication, has been a rampant and ongoing process contributing to plant evolution. Building on the early work by Stebbins (1938), the latest estimates suggest that 35-40% of extant flowering plant species are recent polyploids, or “neo-polyploids” (Wood et al., 2009; Scarpino et al., 2014), with genomes that have doubled since the initial divergence of their genus. Deeper in time, all seed plants are thought to have undergone a polyploidization event some time during their evolutionary history (Jiao et al., 2011). The frequency of polyploidization, along with the common observation of reproductive incompatibilities between polyploids and related diploids (“triploid block”; Ramsey & Schemske, 1998), has led to the view that polyploidization is a mechanism of “instant speciation” and a relatively easy path to sympatric speciation, particularly in plants (Coyne & Orr, 2004). Previous phylogenetic estimates of the rate of polyploidy have not, however, assessed whether polyploidization is indeed coupled in time with speciation itself. Rather, prior work has focused on methods that estimate the rate of polyploidization per unit time (“anagenesis”) or on methods that do not distinguish when ploidy shifts occur (Stebbins, 1938; Grant, 1963; Masterson, 1994; Wood et al., 2009; Mayrose et al., 2011; Scarpino et al., 2014). It is indeed possible that transitions in ploidy occur without full reproductive isolation ever evolving (i.e., without speciation) and/or by simple displacement of diploids by polyploid descendants. Here, we ask whether there is a phylogenetic signal that polyploidization is coupled in time with speciation events (“cladogenesis”), using recent phylogenetic methods that tease apart anagenetic and cladogenetic processes.

In addition to initiating reproductive incompatibilities, polyploidization is thought to be a driver of speciation because newly formed polyploids often differ from their diploid ancestors in morphological, physiological, and life history characteristics (e.g., Levin, 1983; Ramsey & Schemske, 2002). Polyploidy therefore may serve as an important mechanism for niche differentiation and ecological diversification, which may contribute to the successful establishment of new polyploid species (Levin, 1983; Otto, 2007). Establishing the causative link between polyploidization and speciation is challenging, however. For example, ecological differences between related diploid and polyploid taxa may have occurred independently of the polyploidization event (before or after). Similarly, it is difficult to determine whether polyploidization itself was a major early driver of reproductive isolation or occurred later in the speciation process.

Indeed, it is known that polyploidy does not always lead to immediate reproductive isolation. For example, Slotte et al. (2008) found that polyploidy does not terminate gene flow between the diploid parent and its polyploid progeny in Capsella. Furthermore, extensive intra-specific variation in ploidy levels (Stebbins, 1971; Wood et al., 2009; Rice et al., 2015) and evidence of multiple origins in many polyploid lineages (Soltis & Soltis, 1999) suggest that multiple cytotypes often segregate within species. Gene flow between diploids and polyploids remains possible via a number of mechanisms (Ramsey & Schemske, 1998; Ramsey & Schemske, 2002), including the occasional production of viable seeds from triploid intermediates (“triploid-bridge”), from crosses involving unreduced gametes produced by diploids, or from genome reduction yielding offspring bearing half the genome size of their polyploid parents (“polyhaploids”). Evidence for gene flow between diploids and polyploids has been found in the genomes of several plants, particularly between crops and their wild relatives (reviewed in Chapman & Abbott, 2010). These observations demonstrate that the speciation of polyploid lineages may be a dynamic – rather than instantaneous – process, which generates and maintains genetic variation within species for some time (Thompson & Lumaret, 1992).

Recent advances in methods to analyze trait evolution across phylogenetic trees allow researchers to infer rates of anagenetic versus cladogenetic change in a trait and to assess the degree to which a change in a trait, such as polyploidization, occurs concurrently with the formation of species. These methods build upon the BiSSE (Binary State Speciation and Extinction) model (Maddison et al., 2007; FitzJohn et al., 2009), which models the evolution of a two-state trait that can affect speciation rates (λi) and extinction rates (µi). BiSSE can be used in Bayesian or maximum likelihood (ML) analyses to assess the parameter combinations that best account for both the present-day trait distribution and the shape of the phylogeny, thus providing a framework within which character-dependent macro-evolutionary hypotheses may be statistically tested (e.g., Goldberg et al., 2010; Hugall & Stuart-Fox, 2012; Beaulieu & Donoghue, 2013; Zhan et al., 2014; Sabath et al., 2016). As originally formulated, the trait evolves over time from state i to state j at rate qij, assuming that only anagenetic changes are possible. Subsequent work also allowed for cladogenesis, modeled either as the probability that speciation generates daughter species whose traits differ from the parent (BiSSE-ness; Magnuson-Ford & Otto, 2012) or as the rate at which speciation with trait change occurs (ClaSSE; Goldberg & Igic, 2012). The models are interchangeable in a likelihood framework but have different natural prior distributions when used in Bayesian analyses. Here, we use ClaSSE with a uniform prior on the fraction of trait changes that are cladogenetic, φ (see Appendix S1, Supporting Information; https://bsapubs.onlinelibrary.wiley.com/doi/10.3732/ajb.1600108).

In the current study, we tested the main prediction of the hypothesis that polyploidization is a major speciation mechanism: ploidy shifts should coincide with speciation events (either at internal nodes of the phylogeny or at “hidden speciation nodes” along the branches due to subsequent extinction of a daughter lineage). To do so, we applied the ClaSSE model in both Bayesian and ML frameworks to a large cohort of plant genera (mostly angiosperms) for which adequate sequence data and chromosome number data are available. Our study provides the first broad confirmation that polyploidization is frequently cladogenetic in plants.

Materials and Methods

3.3.1 Database construction

For this study, we assembled a database of plant genera exhibiting variation in ploidy levels. We created 223 angiosperm genus data sets, which are collectively referred to as PloiDB (Table S1, Appendix B), by retrieving and combining sequence and karyotypic data from various public data sources. Phylogenetic trees for each data set were reconstructed following a procedure similar to that described in Sabath et al. (2016). Briefly, ultrametric Bayesian phylogenies were inferred using sequence data available at NCBI GenBank (www.ncbi.nlm.nih.gov/genbank). Sequences were binned by locus using OrthoMCL v2.0.3 (Li et al., 2003). An appropriate outgroup, which was used to root the phylogeny, was selected and added to the list of sequences, which were aligned using MAFFT v7.149b (Katoh & Standley, 2013). GUIDANCE v1.41 (Penn et al., 2010) was applied to the resulting multiple sequence alignment (MSA) of each cluster to discard sequences and positions that reduce the MSA reliability. The best-supported model of sequence evolution was determined for each locus independently using jModelTest v2.1.7 (Guindon & Gascuel, 2003; Darriba et al., 2012). MSAs for multiple clusters were concatenated to form a multi-locus MSA. Phylogenies were estimated by applying MrBayes v3.2.1 (Ronquist et al., 2012) using two independent runs, each with one cold and three heated chains of 2,000,000 generations each (the average standard deviation of split frequencies of 195 out of 223 genera falls below 0.1), and their results were then combined. In each run, the best-supported nucleotide model determined for each locus was used and branch lengths were allowed to vary according to a birth-death relaxed clock model (Thorne et al., 1988). Finally, the outgroup species were pruned from all resulting trees.

For this expanded data set, chromosome numbers were taken from the Chromosome Counts Database v1.1 (Rice et al., 2015; ccdb.tau.ac.il), a database that houses chromosome numbers from multiple compendia. Using 100 randomly sampled MrBayes trees combined with chromosome numbers, ploidy levels (diploid or polyploid) were inferred using ChromEvol v2.0 (Mayrose et al., 2010; Glick & Mayrose, 2014). The reliability of estimated ploidy levels was assessed by comparing ploidy inferences across phylogenies and by using a simulation-based approach (Glick & Mayrose, 2014). For each genus, the ML parameter estimates inferred using ChromEvol were used to simulate ploidy levels across each of the 100 trees, after which ploidy levels were inferred again using ChromEvol. The simulation reliability score was defined for each species as the percentage of accurate ChromEvol inferences out of 100 simulations, while the phylogenetic reliability score was defined as the fraction of phylogenies with the same ploidy inference as the majority rule as defined in the ChromEvol manual (http://www.tau.ac.il/~itaymay/cp/chromEvol/). A taxon was considered uncertain and treated as ‘NA’ (data not available) if (1) chromosome number data is available for it and its phylogenetic reliability score was below 0.95, or (2) its combined reliability score (across trees and simulations) was below 0.95.

Although the automated procedure included all taxa with sequence information (including infra-specific taxa, such as subspecies and varieties), we chose a single representative in the following analyses to focus on diversification at the species level. For species with multiple infra-specific entries present, we randomly selected one representative and pruned out the remainder.

The above procedure produced over 1,000 genus data sets, but only 223 data sets that met the following criteria were retained: (1) the phylogeny contained at least 30 taxa; (2) at most 50% of taxa had uncertain ploidy assignment (‘NA’); (3) at least 20% of the taxa had chromosome number data; and (4) at least one taxon was polyploid and one was diploid. These PloiDB data sets are available for download at Dryad (doi.org/10.5061/dryad.gr732).
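The four retention criteria can be summarized as a single predicate. The following Python sketch is illustrative only: the function name, the argument names, and the encoding of ploidy calls as the strings "D", "P", and "NA" are assumptions made here and are not taken from the PloiDB pipeline.

```python
def retain_data_set(ploidy_calls, n_with_counts):
    """Return True if a genus data set passes the four retention criteria.

    ploidy_calls  -- one entry per taxon on the tree: "D", "P", or "NA"
                     (hypothetical encoding)
    n_with_counts -- number of taxa with chromosome number data
    """
    n_taxa = len(ploidy_calls)
    n_uncertain = ploidy_calls.count("NA")

    if n_taxa < 30:                  # (1) at least 30 taxa
        return False
    if 2 * n_uncertain > n_taxa:     # (2) at most 50% of taxa uncertain
        return False
    if 5 * n_with_counts < n_taxa:   # (3) at least 20% with chromosome counts
        return False
    # (4) at least one diploid and at least one polyploid taxon
    return "D" in ploidy_calls and "P" in ploidy_calls
```

Integer comparisons are used for the percentage thresholds so that boundary cases (exactly 50% or exactly 20%) are not affected by floating-point rounding.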

Additionally, we analyzed a previously assembled dataset (Mayrose et al., 2011) encompassing 63 genus-level plant groups (hereafter referred to as M2011), but dropped two genera (Cuphea and Cerastium) because of errors in the data (Soltis et al., 2014; Mayrose et al., 2015). The same criteria described above to filter out data sets with low coverage were applied to the M2011 data sets, thereby retaining 29 M2011 data sets (16 are in common with the PloiDB data sets). We used the same set of MrBayes trees and ChromEvol ploidy estimates used previously (dx.doi.org/10.5061/dryad.6hf21).

3.3.2 Models of polyploid evolution

Because nearly all plant species descend from a polyploid ancestor if we trace their evolutionary history back far enough in time (Jiao et al., 2011), we cannot examine recent polyploidization events without using a reference point (Mayrose et al., 2015). Therefore, we defined a polyploid lineage with respect to the base of the genus, as we did previously in Mayrose et al. (2011) (see also Stebbins, 1938). Thus, in this study a species is denoted as polyploid if it was detected by ChromEvol to have undergone a polyploidization event over its evolutionary history since divergence from the root of the genus phylogeny, regardless of whether its genome subsequently diploidized. While exceptions exist (Mandáková et al., 2016), this assumption is also consistent with the notion that polyploidy is largely an irreversible process over relatively short evolutionary time scales (Meyers & Levin, 2006; Scarpino et al., 2014).

To estimate the mode of ploidy transitions, we employed the ClaSSE model (Goldberg & Igic, 2012) with the following trait-dependent parameters (D for diploid and P for polyploid): diploid and polyploid speciation rates without a change in state (λD and λP), diploid and polyploid extinction rates (µD and µP), the rate of polyploidization along a branch (“anagenesis”, qDP), and the rate of speciation coupled with a ploidy shift in one of the daughter species (“cladogenesis”, λDDP). We refer to this full six-parameter model as the “dual” model, which allows for both cladogenetic and anagenetic ploidy shifts.

The “dual” model makes several assumptions about the directionality and symmetry of ploidy level transitions. First, we assumed that diploid-to-polyploid transitions do not reverse within the evolutionary history of a genus. This assumption is consistent with the definition of polyploidy used in this study. Second, because all ploidy shifts are measured relative to the genus ancestor, we fixed the ancestral state of each genus to “diploid”. Finally, we assumed that cladogenesis causes a trait shift in only one daughter species (BiSSE-ness and ClaSSE allow the possibility that both daughter species may differ from the parent, which one might observe with niche or range traits).

Estimates of speciation and extinction rates will be biased if one does not account for missing taxa. Thus, in all analyses detailed below, we used the “skeletal tree” method of FitzJohn (2009) to adjust the likelihoods for missing data, which assumes that the taxa on the tree are randomly sampled from all taxa of the same state in the clade. The skeletal tree method requires an estimate of the size of a genus, which we obtained from The Plant List v1.1 database (TPL; www.theplantlist.org/) by counting the number of accepted species, without regard to the confidence level and excluding entries with infra-specific ranks. (In cases where the TPL count was less than the observed number of species represented on the phylogeny, a sampling fraction of 100% was assumed.) For the un-sampled species, we assumed the same fraction of polyploid versus diploid species as among the observed taxa.
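As a worked illustration of the sampling-fraction rule, the Python sketch below computes the fraction passed to the skeletal tree correction. The function and argument names are hypothetical. Because the un-sampled species are assumed to have the same polyploid-to-diploid ratio as the observed taxa, the same fraction applies to both states; how taxa with uncertain ploidy (‘NA’) enter the count is not specified here and is left out of the sketch.

```python
def sampling_fraction(n_species_on_tree, n_accepted_species_tpl):
    """Fraction of the genus represented on the phylogeny.

    The fraction is capped at 1.0, mirroring the rule that a sampling
    fraction of 100% is assumed when the TPL count is smaller than the
    number of species on the tree. Both argument names are hypothetical.
    """
    if n_accepted_species_tpl <= 0:
        raise ValueError("genus size must be positive")
    return min(1.0, n_species_on_tree / n_accepted_species_tpl)


def per_state_sampling(n_species_on_tree, n_accepted_species_tpl):
    """Same fraction for diploids (D) and polyploids (P)."""
    f = sampling_fraction(n_species_on_tree, n_accepted_species_tpl)
    return {"D": f, "P": f}
```

For example, a genus with 40 species on the tree and 80 accepted species in TPL would receive a sampling fraction of 0.5 for both states.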

3.3.3 Bayesian analysis

First, we took a Bayesian approach to measure the relative proportion of cladogenetic versus anagenetic ploidy shifts, defined as φ = λDDP /(λDDP + qDP). Values of φ close to 1 imply that polyploidy occurs cladogenetically more often than anagenetically, and values close to 0 imply the reverse.

For each PloiDB data set, a single Markov chain Monte Carlo (MCMC) chain was run using the “dual” model (denoted as MD) on each of 20 randomly chosen trees for 2,000 generations, discarding the first 50% as burn-in, thereby resulting in 20,000 retained generations (trace plots showed that the MCMC chain moved across the parameter space rapidly, with effective sample sizes of ~400 to ~8,600 for 220 out of 223 genera). The same MCMC procedure was conducted for each M2011 data set, except that 50 randomly selected MrBayes trees were used. For each MCMC, a heuristic starting point was calculated based on a character-independent birth-death model, and exponential priors were placed on the model parameters using the following rates: 1/2r (for λD), 1/2r (for λDDP), 1/r (for λP), 1/2r (for µD), 1/2r (for µP), and 1/2r (for qDP), where r = ln(number of taxa)/tree length. As shown in the Mathematica file (Appendix S1), these prior choices on the parameters lead to a uniform prior distribution for φ.
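The uniformity of the induced prior on φ follows from a standard result: if two quantities are independent and exponentially distributed with the same rate, their ratio X/(X + Y) is uniform on (0, 1). The Python sketch below checks this numerically for φ = λDDP/(λDDP + qDP). It is not the Mathematica derivation of Appendix S1; the function name is hypothetical, and the rate written “1/2r” in the text is assumed here to mean 1/(2r). The conclusion depends only on λDDP and qDP sharing the same rate, not on its value.

```python
import math
import random


def sample_phi_from_prior(n_taxa, tree_length, n_draws, seed=1):
    """Draw phi = lambda_DDP / (lambda_DDP + q_DP) from the priors.

    Assumes both parameters have independent exponential priors with
    rate 1/(2r), where r = ln(number of taxa) / tree length.
    """
    rng = random.Random(seed)
    r = math.log(n_taxa) / tree_length
    rate = 1.0 / (2.0 * r)
    draws = []
    for _ in range(n_draws):
        lam_ddp = rng.expovariate(rate)  # cladogenetic rate
        q_dp = rng.expovariate(rate)     # anagenetic rate
        draws.append(lam_ddp / (lam_ddp + q_dp))
    return draws


if __name__ == "__main__":
    phi = sample_phi_from_prior(n_taxa=50, tree_length=10.0, n_draws=20000)
    # Under a uniform prior, each quarter of (0, 1) holds about 25% of draws.
    for lo in (0.0, 0.25, 0.5, 0.75):
        share = sum(lo <= x < lo + 0.25 for x in phi) / len(phi)
        print(f"[{lo:.2f}, {lo + 0.25:.2f}): {share:.3f}")
```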

To assess support for one transition mode over the other, we report the 95% highest posterior density (HPD) interval of φ. In any one genus, strong support for a preponderance of cladogenetic change was inferred if the entire HPD fell above 0.5, and for anagenetic change if it fell below 0.5. We also examined the distribution of HPD across genera to detect departures from a uniform posterior distribution. HPD intervals of φ were constructed by pooling together values of φ calculated from the MCMC samples from all MrBayes trees analyzed.
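The decision rule based on the HPD interval can be sketched as follows. This Python example computes an empirical HPD interval as the shortest interval containing the requested fraction of the pooled samples, which is one common way to estimate it; it is offered as an illustration and is not claimed to reproduce the coda implementation exactly. The function names and the label "unresolved" are assumptions introduced here.

```python
def hpd_interval(samples, prob=0.95):
    """Shortest interval covering roughly `prob` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    # Number of steps between the two end points of a candidate interval.
    gap = max(1, min(n - 1, round(n * prob)))
    # Slide the window along the sorted samples and keep the narrowest one.
    start = min(range(n - gap), key=lambda i: xs[i + gap] - xs[i])
    return xs[start], xs[start + gap]


def classify_mode(phi_samples, prob=0.95, threshold=0.5):
    """Apply the rule in the text to pooled posterior samples of phi."""
    lower, upper = hpd_interval(phi_samples, prob)
    if lower > threshold:
        return "cladogenetic"   # entire HPD above 0.5
    if upper < threshold:
        return "anagenetic"     # entire HPD below 0.5
    return "unresolved"         # HPD spans 0.5
```

Here `phi_samples` would be the φ values pooled across the MCMC samples from all trees analyzed for a genus.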

3.3.4 Maximum likelihood analysis

We also used likelihood ratio tests to identify the best-fitting model. In addition to MD, we analyzed two reduced models, one permitting only cladogenetic shifts (denoted as MC, with qDP = 0) and the other permitting only anagenetic shifts (denoted as MA, with λDDP = 0). By comparing data fits to these three models, we were able to test whether there was significant evidence for the presence of cladogenesis (MA rejected in favour of MD) and/or whether there was significant evidence for anagenesis (MC rejected in favour of MD).

For the PloiDB data sets, ML fitting was performed on each of the 20 MrBayes trees analyzed in the MCMC analysis. Ten starting points were randomly drawn from the MCMC samples (described above), and the parameter set that yielded the maximum likelihood of the data across the ten attempts was kept. This procedure was conducted for each of MA, MC, and MD. To summarize the results across trees, we calculated, for each tree, twice the difference in the maximum log likelihood values (2ΔlnLik) between the “dual” model (MD) and a reduced model (MA or MC) (i.e., 2*[lnLik of MD – lnLik of MA or MC]), and then took the median over all trees. MA (or MC) was rejected in favour of MD when the median 2ΔlnLik was greater than χ² (p = 0.05) = 3.841. For the M2011 data sets, we performed ML fitting to 50 MrBayes trees (those used in the MCMC analysis) instead of 20, using the same procedure as for the PloiDB data sets.
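The summary across trees can be sketched in a few lines of Python. The function name and the list-based inputs are hypothetical; the threshold 3.841 is the value given in the text, which corresponds to the χ² critical value with one degree of freedom at p = 0.05 (each reduced model fixes one parameter of MD to zero).

```python
from statistics import median

CHI2_CRITICAL = 3.841  # threshold quoted in the text


def likelihood_ratio_summary(lnlik_dual, lnlik_reduced,
                             critical=CHI2_CRITICAL):
    """Median 2*delta-lnLik across trees and the resulting decision.

    lnlik_dual    -- maximum log likelihoods of M_D, one per tree
    lnlik_reduced -- maximum log likelihoods of M_A or M_C, same order
    Returns (median statistic, True if the reduced model is rejected).
    """
    if len(lnlik_dual) != len(lnlik_reduced):
        raise ValueError("one log likelihood per tree is required")
    stats = [2.0 * (d - r) for d, r in zip(lnlik_dual, lnlik_reduced)]
    med = median(stats)
    return med, med > critical
```

With per-tree log likelihoods of, say, -100, -101, and -99 under MD and -103, -102, and -101.5 under MA, the per-tree statistics are 6, 2, and 5, the median is 5, and MA would be rejected in favour of MD.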

3.3.5 Implementation

The MCMC and ML analyses were performed in the R statistical computing environment (R Core Team, 2015) using some phylogenetic utilities in the package ape 3.4 (Paradis et al., 2004) and the ClaSSE model and statistical methods in the package diversitree 0.9-8 (FitzJohn, 2012). HPD intervals were computed using the package coda 0.18-1. The R scripts implementing the analysis procedures are available in the Dryad repository.

Results

In this study, we tested, at a broad phylogenetic scale, whether the mode of polyploid transition in plants is mainly cladogenetic (i.e., coinciding with branching events) or anagenetic (i.e., arising along branches). To this end, we assembled a large database of angiosperm genus data sets (PloiDB) using an automated procedure, and then combined it with manually curated data sets from a previous study (M2011). The sampling fraction of the PloiDB data sets ranges from 5 to 100% (median of 50%), and the percentage of polyploids from 1 to 98% (median of 18%) (Table S1, Appendix B). Using the PloiDB and M2011 data, we took two complementary approaches (MCMC and ML) to examine whether the mode of polyploid transition, cladogenetic or anagenetic, can be identified in each genus data set.

In the MCMC analysis of the PloiDB data sets, we performed an MCMC procedure to determine the 95% HPD interval of the proportion of ploidy shifts that are cladogenetic (φ). The lower bound of the HPD interval of φ was greater than 0.5 in three genera, consistent with the cladogenesis hypothesis. The upper bound of the HPD interval of φ, however, was never less than 0.5, indicating no support for the hypothesis that anagenesis is the main mode of polyploid transition. The posterior distribution for φ revealed a slight, but significant, shift towards higher values of φ, with the median lying above 0.5 for 129 out of 223 genera (P = 0.0226, exact two-tailed binomial test with N = 223 and p = 0.5; Figure 1A; Table S2, Appendix B). Similarly, in the MCMC analysis of the M2011 data, the HPD intervals supported cladogenesis in a single genus (Achillea) and anagenesis in none of the data sets examined. The overall posterior distribution was also shifted upwards, with a median above 0.5 in 21 out of 29 genera (P = 0.0241, exact two-tailed binomial test with N = 29 and p = 0.5; Figure 1B; Table S3, Appendix B), again suggesting that cladogenesis is the dominant mode of polyploid transition.
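The exact binomial tests reported above can be reproduced with a short calculation. The Python sketch below uses one common definition of the two-tailed P-value, summing the probabilities of all outcomes that are no more likely than the observed count; the software used for the original tests is not stated here, so this choice of definition is an assumption, as are the function names.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1.0 - p)**(n - k)


def exact_binomial_two_tailed(x, n, p=0.5):
    """Two-tailed exact binomial P-value for x successes out of n."""
    observed = binom_pmf(x, n, p)
    # A small relative tolerance guards against floating-point ties.
    cutoff = observed * (1.0 + 1e-7)
    total = sum(pk for pk in (binom_pmf(k, n, p) for k in range(n + 1))
                if pk <= cutoff)
    return min(1.0, total)


if __name__ == "__main__":
    # Medians above 0.5 in 21 of 29 M2011 genera.
    print(round(exact_binomial_two_tailed(21, 29), 4))
    # Medians above 0.5 in 129 of 223 PloiDB genera.
    print(round(exact_binomial_two_tailed(129, 223), 4))
```

When p = 0.5 the distribution is symmetric, so this definition reduces to doubling the upper-tail probability.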

Figure 1: Distribution of 95% highest posterior density (HPD) intervals of the relative proportion of cladogenetic transitions (φ) in the PloiDB (A) and M2011 (B) data sets. HPD intervals that lie entirely above 0.5 are highlighted in green. The median of the posterior distribution of φ is marked by a black dot, notably more of which lie above φ = 0.5 (intersection of orange dashed lines), in favour of the cladogenetic shift hypothesis.

In the ML analysis of the PloiDB data sets, we performed likelihood ratio tests to determine whether reduced models (MA and MC, permitting either anagenesis-only or cladogenesis-only transitions, respectively) could be rejected in favour of the “dual” model (MD, where both anagenetic and cladogenetic transitions are possible). Among the 223 PloiDB genus data sets, MA was rejected in favour of MD in 29 genera, supporting the inclusion of cladogenesis in the model (Table S2, Appendix B). Conversely, MC was rejected in favour of MD in only six genera, supporting the inclusion of anagenesis in the model. In one genus (Veronica), both MA and MC were rejected in favour of MD. We discovered a consistent result from the analysis of the M2011 data sets, with MA rejected in favour of MD in 11 data sets and MC in only one genus (Physalis) (Table S3, Appendix B). The results for the overlapping genera are generally concordant (i.e., rejecting the same model significantly or, if not significantly, at least weakly rejecting the same model), but different models were rejected in four cases. These discrepancies are due to the following: (1) higher coverage of PloiDB datasets compared to M2011 (Campanula and Solanum), (2) different percentages of taxa with unreliable ploidy estimates (Physalis; 16% in PloiDB and 35% in M2011), or (3) different proportions of polyploids (Mimulus). Taken together, these likelihood analyses indicated that the anagenesis hypothesis is rejected significantly more often (53 genera) than the cladogenesis hypothesis (three genera; excluding the four genera with discrepancies; P = 8×10⁻¹³, exact two-tailed binomial test with N = 56 and p = 0.5).

Discussion

The tempo and mode by which traits evolve over time is one of the most enduring questions in evolutionary biology (Simpson, 1944). In this paper, we investigated the tempo and mode by which the genome evolves by considering the pattern of polyploidization events across the phylogenetic trees for 233 genus-level clades (223 in PloiDB and 29 in M2011, including 19 common clades). To this end, we used a likelihood method (ClaSSE; Goldberg & Igic, 2012) that estimates the extent to which trait changes are concentrated at speciation events or occur at a rate proportional to time. Anagenesis produces trees where the number of polyploidization events is proportional to the amount of time spent as a diploid, whereas cladogenesis produces trees where polyploidization events are proportional to the number of speciation events that diploids have undergone (which may or may not leave a node in the phylogeny of extant species, depending on subsequent extinctions). In addition, cladogenetic change is more likely to be inferred when very closely related sister species differ in ploidy, because the ploidy difference is more probable if speciation itself led to a polyploid daughter (cladogenesis) than if the polyploidization event followed speciation across the very short branch to the present (anagenesis).

In the ML analysis, we found that models with only anagenetic ploidy shifts were rejected in significantly more genera (53 genera) than were models with only cladogenetic shifts (three genera), providing a strong indication that ploidy shifts are associated with speciation events in many genera. Using a Bayesian approach, we also found that the HPD interval was consistent with a preponderance of cladogenesis (HPD falling entirely above 0.5) in four genera (across both the PloiDB and M2011 data sets), but never indicated a preponderance of anagenesis (HPD falling entirely below 0.5).

The majority of genera fail to provide a strong enough signal for ClaSSE to distinguish cladogenetic and anagenetic trait changes, which is not surprising given that polyploidization may have occurred only once or a few times in some genera and the signal in any one genus may be very weak. Simulations conducted by Magnuson-Ford & Otto (2012) demonstrated that power to detect cladogenesis, when it does occur, increases substantially with clade size, and that these methods are, if anything, conservative (Type I error rates were less than 5% for an α value of 0.05). Power to detect cladogenesis is likely to be substantially reduced in groups with a high extinction rate or a high fraction of species with missing data, as these would obscure the timing of ploidy transitions. For example, extinction or un-sampled species can cause nodes in the complete tree to be lost and appear as branches in the inferred tree, making cladogenetic and anagenetic events harder to distinguish.

Nevertheless, considering the PloiDB datasets, we rejected the MA model in favour of the MD model that includes cladogenesis in 29 out of 223 likelihood ratio tests (13%), which is substantially more often than expected by chance (P = 3×10⁻¹³; exact two-tailed binomial test with N = 223 and p = 0.05). These 29 genus data sets are of better quality than the other 194, on average, having more taxa (145 vs. 75) and a higher sampling fraction (60% vs. 51%), but a similar percentage of taxa with uncertain ploidy estimates (~22%), suggesting more power in higher quality data sets. By contrast, the MC model was rejected in favour of the MD model that includes anagenesis in only 6 out of 223 likelihood ratio tests (2.7%), which is lower than expected but not significantly so (P = 0.12; exact two-tailed binomial test with N = 223 and p = 0.05).

These results indicate that, at least in some genera, there is a strong signal that polyploidization is associated temporally with speciation events. Correlation does not, however, imply causation. Thus, while our data are consistent with polyploidization as an important mechanism leading to the formation of new species, it must be kept in mind that the direction of causality may be reversed: that speciation may lead to polyploidization. For example, hybrids often produce unreduced gametes at a higher rate (Harlan & deWet, 1975; Ramsey & Schemske, 1998), which implies that newly formed species may hybridize and generate polyploid descendants at a higher rate, leading to a temporal association without polyploidization directly causing speciation. For this to lead to a false signal of cladogenesis, however, hybridization would have to remain likely only within a short time frame after the speciation event (so that the two are temporally associated), which might not be the case (Levin, 2012).

Another caveat that must be considered is that likelihood models can only detect processes included within the model and may be sensitive to factors not included that may leave similar signals (see, e.g., FitzJohn, 2012; Rabosky & Goldberg, 2015). Although we do not know exactly what signals may mislead inferences about cladogenesis versus anagenesis from ClaSSE (or BiSSE-ness), one potential issue is if taxonomists elevate intraspecific ploidy variants to species status more readily than they would for variants exhibiting the same amount of reproductive isolation without ploidy differences. This may cause an excess of recently diverged species pairs to differ in ploidy, providing a misleading signal in favour of cladogenesis. Tree-building artefacts may also be an issue, particularly if they cause artificially short branch lengths between diploid and polyploid sister species. Conversely, taxonomists may ignore differences displayed by newly formed polyploid species (particularly, autopolyploids; Soltis et al., 2007), lumping together recently diverged diploids and polyploids. This delay in recognizing polyploid species may obscure signals of cladogenesis.

Keeping in mind the caveats mentioned above, this study contributes to our understanding of the role of polyploidy in speciation by providing statistical evidence that polyploidization events are synchronized over evolutionary time with the formation of new species in many groups of plants.

Hidden introductions of freshwater red algae via the aquarium trade exposed by DNA barcodes

Overview

The global aquarium trade can introduce alien freshwater invaders, potentially impacting local aquatic ecosystems and their biodiversity. The role of the aquarium trade in spreading freshwater red macroalgae that hitchhike on ornamental aquatic plants and animals is unassessed. We investigated this human-mediated phenomenon via a broad biodiversity survey and genetic analysis of freshwater red algae in the field and aquarium shops in East Asia. Using rbcL-based DNA barcoding, we surveyed 125 samples from 46 field sites and 88 samples from 53 aquarium shops (213 samples in total) mostly across Taiwan – a key hub in the global aquarium trade – as well as in Hong Kong, Okinawa (Japan), the Philippines, and Thailand. We augmented our rbcL sequences with GenBank rbcL sequences that represent 40 additional countries globally. We found 26 molecular operational taxonomic units (mOTUs) in Taiwan, some of which are cryptic. mOTUs are clusters of highly similar sequences, which may be interpreted as hypothetical species. Phylogeographical analysis revealed three potentially introduced mOTUs in Taiwan, which exhibit no local genetic variation in Taiwan and are distributed across continents. Also, we posit that some presumably endangered freshwater red algae may be preserved in aquaria, an unintentional ex situ conservation site for these organisms that are vulnerable to water pollution from anthropogenic disturbances. Collectively, these data suggest that freshwater red algae have been hitchhiking and dispersed via the aquarium trade, an important and overlooked mechanism of introduction of these organisms across the globe.

Introduction

The aquarium trade facilitates introduction of alien species and homogenization of biodiversity in aquatic ecosystems worldwide (e.g., Padilla & William, 2004; Rahel, 2007; Strecker et al., 2011). Many introduced species have subtle or hardly discernible impacts on the biodiversity of local environments, but some might become harmful invasive species (Strayer, 2010; Simberloff, 2014). Efforts to eradicate invasive species incur substantial social and economic costs, yet they are often unsuccessful (reviewed in Simberloff et al., 2013). Therefore, to prevent and control the spread of invasive species, it is important to establish an early warning and rapid response system. To support the implementation of such a system, knowledge about the taxonomic diversity and transportation of potential invasive species is critically needed.

As invasive species, aquatic hitchhikers are damaging to marine and freshwater ecosystems and potentially to public health (e.g., Patoka et al., 2016; Duggan & Pullan, 2017; Duggan et al., 2018). It has been suggested that the socioeconomic impact of aquatic hitchhikers, such as algae, is often overlooked (reviewed in Kaštovský et al., 2010). For instance, an introduced green alga, Hydrodictyon reticulatum, has been reported to cause the decline of local fish and aquatic plants through habitat transformation, but it went unnoticed for a long time until it bloomed (Wells et al., 1999; Wells & Clayton, 2001).

Compared with ornamental animals and plants in the aquarium trade, aquatic hitchhikers have received less attention regarding their diversity and introduction potential, largely because of their species crypticity (Stoyneva et al., 2006; Kato et al., 2009). Freshwater red macroalgae are aquatic hitchhikers that are occasionally found in aquarium tanks (e.g., Kaufmann, 2010), but these algae are rare in the field near cities due to their vulnerability to polluted water (Sheath & Hambrook, 1990; Sheath & Vis, 2015). These algae produce inconspicuous spores that remain suspended in water or adhere to aquatic organisms, such as aquatic plants (Figure 2a), crayfish (Figure 2b), and snails (Figure 2c-e). Spores can be generated by asexual reproduction (when the algae are turf-like sporophytes, or chantransia) or by sexual reproduction (when the algae appear as thread-like, mucilaginous gametophytes). The asexual form may facilitate the population establishment of introduced freshwater red algae in the non-native range where mating partners may be unavailable, but to our knowledge this has not been supported empirically.

Only a few freshwater algal species have been documented as biological invasions. These known invaders are Bangia atropurpurea (a red alga; Lin & Blum, 1977), Chara connivens (a green alga; Luther, 1979), Nitellopsis obtusa (a green alga; Schloesser, 1986), Compsopogon caeruleus (a red alga; Manny et al., 1991), Hydrodictyon reticulatum (a green alga; Hawes et al., 1991), and Ulva flexuosa (a green alga; Kaštovský et al., 2010). These algae have been introduced by discharge of ballast water (e.g., Lin & Blum, 1977; Manny et al., 1991) or accidental release from laboratory experiments (Hawes et al., 1991). Freshwater red macroalgae can be dispersed through the aquarium trade (Compsopogon caeruleus, Stoyneva et al., 2006; Montagnia macrospora, Kato et al., 2009). Recently, it has been speculated that the cosmopolitan distribution of freshwater red macroalgae may be facilitated by the aquarium trade (e.g., Carlile & Sherwood, 2013; Johnston et al., 2018). Beyond speculation, it has not been explored whether the aquarium trade can serve as a vector for the global dispersal of freshwater red algae.


Figure 2: Examples of epiphytic (a) and epizoic (b-e) freshwater red macroalgae as aquatic hitchhikers (black arrows). (a) Compsopogon caeruleus (THU.369) (black arrow) on an ornamental aquatic plant, Anubias barteri (white arrow), in the Gou-Lao-Ban Pet Shop (Taoyuan, Taiwan). (b) The chantransia stage of Sheathia dispersa (THU.575) (black arrow) on a red swamp crayfish, Procambarus clarkii (white arrow), in the Yu-Zhong-Yu Aquarium Shop (Taichung, Taiwan). (c) The chantransia stage of Montagnia macrospora (THU.537) (black arrow) on an apple snail, Pomacea canaliculata (white arrow), in Nanshi River (Taichung, Taiwan). (d) The chantransia stage of M. macrospora (THU.401) (black arrow) on a chopstick snail, Stenomelania sp. (white arrow), in the Mataian Wetland Ecological Park (Hualien, Taiwan). (e) A six-month lab culture showing the growth of the gametophyte of S. dispersa (black arrow) on the shell of a chopstick snail, Stenomelania sp. (white arrow), which was collected from the field. Photo credit: Shao-Lun Liu.

Traditional biodiversity monitoring programs depend on morphology-based approaches, which require time-consuming diagnostics by taxonomy experts (Riedel et al., 2013). Hitchhiking freshwater red macroalgae are often morphologically indistinguishable, which makes their species identification challenging. A powerful alternative approach to monitor biodiversity is DNA barcoding, which typically involves sequencing individual genetic loci, such as rbcL. DNA barcodes have been applied to detect introduced and invasive organisms (Armstrong & Ball, 2005; Pečnikar & Buzan, 2014). Recently, Vranken and coworkers (2018) investigated the spread of introduced and invasive marine macroalgae (i.e., seaweeds) via the aquarium trade in Europe using DNA barcodes. No study has examined the introduction of freshwater red macroalgae via the aquarium trade at a geographically broad scale.

DNA barcode sequence data can be utilized to estimate the effect of introduction(s) on the genetic diversity of an alien species (e.g., Bonett et al., 2007; Kinziger et al., 2011). The population of a newly introduced species is predicted to harbor lower genetic diversity than its source population(s) as a consequence of a recent genetic bottleneck (Nei et al., 1975; Barrett & Husband, 1990), but multiple introductions may increase the genetic diversity of an introduced population (Novak & Mack, 1993; Novak & Mack, 2005) (empirical evidence reviewed in Dlugosch & Parker, 2008). Following this logic, the genetic information in DNA barcodes, in combination with geographic occurrence data, can enable detection of putative introduced or invasive species.

In this study, we employed DNA barcoding to survey the biodiversity of freshwater red macroalgae from the field and aquarium shops across Taiwan – a key hub in the global aquarium trade (Padilla & William, 2004) – as well as in a few surrounding regions (Hong Kong, Okinawa in Japan, the Philippines, and Thailand). We collected 213 macroalgal specimens from 99 sites (mostly in Taiwan, 196 specimens from 90 sites), determined the sequences of their rbcL (a chloroplast gene), and estimated the number and species identity of molecular operational taxonomic units (mOTUs). These data yielded, for the first time, a detailed picture of the biodiversity of freshwater red macroalgae in Taiwan, in the field and aquaria. To identify potential introduced species in Taiwan, we examined the genetic diversity and global distribution of the mOTUs by analyzing our data jointly with previously published rbcL sequences of specimens collected worldwide. Also, we assessed the relative proportion of freshwater red algae in asexual and sexual forms in both the field and aquaria to test whether the asexual form is more frequently observed in aquaria. Finally, we examined ornamental fish trade data to assess the plausibility of an introduction route of an alga via Taiwan’s aquarium trade.

Materials and Methods

4.3.1 Sample collection

In total, 213 specimens were collected from the field and aquarium shops in Taiwan, Okinawa (Japan), Hong Kong, the Philippines, and Thailand (Figure 3; Table S1, Appendix C); of the 213 samples, 125 were collected from 46 field sites and 88 from 53 aquarium shops. One sample that was collected from a brackish area was included in the species delimitation analyses, but it was excluded from other analyses (Table S1, Appendix C). After collection, a portion of the material (about 100-200 mg) was preserved in silica gel or 95% ethanol for molecular analyses, whereas the rest of the material was preserved in 10-15% formalin solution for morphological examination.


Figure 3: Sites in Taiwan, Okinawa (Japan), Hong Kong, Thailand, and the Philippines sampled in this study.

4.3.2 DNA extraction, PCR, and Sanger sequencing

DNA was extracted from silica gel-dried specimens or 95% ethanol-preserved specimens by using the commercial ZR Plant/Seed DNA kit (Zymo Research, CA, USA) following the manufacturer's protocol. We amplified rbcL under the PCR conditions described in the protocol of Wang et al. (2005) using these gene-specific primers: two forward (F7: 5'-AACTCTGTAGAACGNACAAG-3' and F160: 5'-CCTCAACCAGGAGTAGATCC-3') and one reverse (R753: 5'-GCTCTTTCATACATATCTTCC-3') (Vis et al., 1998; Lin et al., 2001). The PCR products were sent to the Mission Biotech Company (Taipei, Taiwan) for Sanger sequencing. The GenBank accessions of the newly generated sequences are listed in Table S1 (Appendix C).

4.3.3 Sequence acquisition and curation

Poor taxon sampling may lower the accuracy of phylogenetic tree reconstruction and species delimitation (Esselstyn et al., 2012; Ahrens et al., 2016). Therefore, we combined the newly generated rbcL sequences with additional rbcL sequences from NCBI GenBank (accessed on Oct. 18, 2018). Using the literature as a guide (see the references in Table S1, Appendix C), we downloaded from GenBank 1,046 rbcL sequences of the freshwater and marine species of the following genera: Bangia, Bostrychia, Caloglossa, and Hildenbrandia. We built a preliminary phylogeny to identify potential contaminant sequences, which show either unexpected non-monophyletic placements or signs of long branch attraction. The following 12 potential contaminant GenBank sequences were removed: ten Hildenbrandia sequences (AF107812, AF107818, AF107822, AF107827 to AF107831, AF534406, AF534408), one Thorea sequence (GU953248), and one Bangia sequence (AF043379).

The 1,034 cleaned GenBank sequences and the 213 newly produced sequences (deposited in GenBank under the accessions: MH835465 - MH835677) were combined into a data set of 1,247 sequences (1,054 freshwater taxa and 193 non-freshwater taxa). We reduced sequence redundancy by removing identical sequences using the UCLUST function (hereafter as UCLUST100) implemented in USEARCH v8.1.1756 (Edgar, 2010). From each UCLUST100 cluster, we randomly selected the longest sequence. This resulted in a final dataset of 639 sequences (500 freshwater taxa, 135 non-freshwater taxa, and 4 taxa that can live in both types of habitat) for the phylogenetic analysis and species delimitation. The sequences were mostly contributed by certain geographical regions – including Australia, Brazil, the United States, Europe, Japan, and Taiwan (primarily from this study) – where there has been extensive taxonomic diversity work (Figure S1, Appendix C). Therefore, the biodiversity reported in this study is biased heavily towards those regions (Figure S2, Appendix C).
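The redundancy-reduction step described above can be illustrated with a minimal Python sketch. It is not UCLUST, which uses its own heuristic search. As a simplifying assumption, two sequences are treated as identical when the shorter one is an exact substring of the longer one, and ties between equally long sequences are broken at random. The function name `dereplicate` and the toy records are hypothetical.

```python
import random


def dereplicate(records, seed=0):
    """Cluster sequences at 100% identity and keep the longest as representative.

    records: dict mapping sequence id -> nucleotide string.
    Returns a dict mapping representative id -> list of member ids.
    Assumption: "identical" means the shorter sequence is an exact substring
    of the representative (a simplification of what UCLUST does).
    """
    rng = random.Random(seed)
    items = list(records.items())
    rng.shuffle(items)  # random order so ties in length are broken at random
    # Stable sort: longest first, shuffled order preserved among equal lengths.
    items.sort(key=lambda kv: len(kv[1]), reverse=True)

    clusters = {}  # representative id -> member ids
    reps = []      # (id, sequence) of each representative found so far
    for seq_id, seq in items:
        for rep_id, rep_seq in reps:
            if seq in rep_seq:
                clusters[rep_id].append(seq_id)
                break
        else:
            # No representative contains this sequence: start a new cluster.
            reps.append((seq_id, seq))
            clusters[seq_id] = [seq_id]
    return clusters


toy = {"a": "ACGTACGT", "b": "CGTA", "c": "ACGTACGT", "d": "TTTT"}
toy_clusters = dereplicate(toy)
```

In this toy input, sequences a, b, and c collapse into one cluster and d forms its own, so two representatives remain.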

We updated the species names by consulting the taxonomic nomenclature in AlgaeBase (Guiry & Guiry, 2020; accessed Mar. 7, 2020).

4.3.4 Phylogenetic tree reconstruction

The rbcL sequences were aligned using MUSCLE 3.8.31 with the default settings (Edgar, 2004) followed by manual inspection using BioEdit (Hall, 1999). Phylogenetic analyses were conducted using two different methods to produce a maximum likelihood (ML) tree and an ultrametric Bayesian tree. A non-ultrametric ML phylogeny was inferred under the best-fit nucleotide substitution model (GTR+G+I; according to the lowest Bayesian Information Criterion) with 1,000 bootstrap replicates using MEGA 6.0 (Tamura et al., 2013). An ultrametric Bayesian phylogeny was estimated using MrBayes 3.2.2 (Ronquist et al., 2012), assuming GTR+G+I (which was determined to be the best-fit model using MEGA) and an Independent Gamma Rate relaxed clock (Lepage et al., 2007). Four MCMC chains (one hot and three cold) were run for 200 million generations, sampling every 20,000 generations and discarding the first 80% of the posterior samples such that the average standard deviation of split frequency was less than 0.01. A majority-rule consensus tree was summarized from the MCMC trees with posterior probabilities as support values using Mesquite v2.75 (Maddison & Maddison, 2011). The ML tree was taken as input for the PTP species delimitation method and the Bayesian consensus tree for the GMYC method (see below).

4.3.5 Species delimitation

DNA-based algorithmic species delimitation methods are an important tool to differentiate species within morphologically indistinguishable taxonomic groups (reviewed in Leliaert et al., 2014). Several studies have raised concerns about the accuracy of inferring species boundaries using single-locus data (e.g., Knowles & Carstens, 2007; Dupuis et al., 2012). To minimize potential false positives, it is advised to infer mOTUs using multiple species delimitation methods and then to take the most conservative result, which has the fewest mOTUs (Carstens et al., 2013). After determining mOTUs, the non-freshwater taxa were excluded from downstream analyses.

For freshwater red algae, rbcL is the only marker with abundantly available sequence data. We estimated mOTUs based on rbcL sequences using three different methods: automated barcode gap discovery (ABGD), generalized mixed Yule-coalescent (GMYC), and Poisson tree process (PTP). The distance-based method, ABGD, identifies mOTUs by finding the threshold between intraspecific (i.e., within a mOTU) genetic distances and interspecific (i.e., between mOTUs) genetic distances. The threshold was determined based on the distribution of the pairwise genetic distance between any two given sequences (corrected under the Kimura 2-parameter model). We employed the ABGD online tool (https://bioinfo.mnhn.fr/abi/public/abgd/) using the default settings (Puillandre et al., 2012).
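To make the distance correction mentioned above concrete, the following Python sketch computes the standard Kimura 2-parameter (K80) distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions between two aligned sequences. It only illustrates the pairwise distance. It does not reproduce ABGD, which goes on to partition sequences recursively from the distribution of such distances. The function name `k2p_distance` and the choice to skip sites with gaps or ambiguity codes are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or an ambiguity code are skipped
    (an assumption of this sketch; real tools offer several options).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError when the sequences are too divergent
    # for the correction to be defined (saturation).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


d_same = k2p_distance("ACGTACGTAC", "ACGTACGTAC")
d_one_transition = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
```

With one transition in ten sites, the corrected distance is about 0.112, slightly above the uncorrected proportion of 0.1.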

Next, we inferred mOTUs using two coalescent-based methods, GMYC and PTP. GMYC determines the point of transition from interspecific branching (a pure birth process) to intraspecific branching (a neutral coalescent process) on a clock-calibrated ultrametric tree (Pons et al., 2006). For the GMYC analysis, the ultrametric tree was imported into Splits (Pons et al., 2006) in R, following the procedure described in Pons et al. (2006). PTP determines mOTUs by detecting the threshold between the intraspecific branching rate and the interspecific branching rate (Zhang et al., 2013). For the PTP analysis, the ML tree was imported into the PTP web server (https://species.h-its.org/).

4.3.6 Rarefaction Analysis

To evaluate sampling effort, we estimated the projected maximum number of mOTUs and sample size-based completeness using iNEXT (iNterpolation and EXTrapolation; Hsieh et al., 2016). The maximum number of mOTUs and the number of samples (i.e., the sample size, or more specifically the number of rbcL sequences in this study) needed to detect the maximum number of mOTUs were estimated by a rarefaction analysis. Sample size-based completeness is defined as the number of samples collected in this study divided by the number of samples needed to capture the maximum number of mOTUs.
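As a rough illustration of the ideas behind this analysis, the sketch below computes the classical rarefaction expectation (the expected number of mOTUs in a random subsample of m sequences) and the Chao1 lower-bound estimate of total richness. This is a minimal sketch under stated assumptions, not the iNEXT implementation: the estimators and interpolation rules used by iNEXT may differ in detail, and the function names and the toy counts are hypothetical.

```python
from math import comb


def rarefied_richness(counts, m):
    """Expected number of mOTUs in a random subsample of m sequences.

    counts: number of sequences observed for each mOTU.
    Uses the classical rarefaction formula
    E[S_m] = sum_i (1 - C(n - n_i, m) / C(n, m)).
    """
    n = sum(counts)
    if not 0 <= m <= n:
        raise ValueError("m must be between 0 and the total sample size")
    total = comb(n, m)
    return sum(1 - comb(n - c, m) / total for c in counts)


def chao1(counts):
    """Chao1 estimate of total richness from singleton and doubleton counts."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # mOTUs seen exactly once
    f2 = sum(1 for c in counts if c == 2)  # mOTUs seen exactly twice
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2       # bias-corrected form when f2 == 0


toy_counts = [5, 3, 1, 1]  # hypothetical sequence counts for four mOTUs
full = rarefied_richness(toy_counts, sum(toy_counts))
single = rarefied_richness(toy_counts, 1)
projected = chao1(toy_counts)
```

For these toy counts, rarefying to the full sample recovers the four observed mOTUs, and Chao1 projects five mOTUs in total because two mOTUs are singletons.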

4.3.7 Identification of introduced species

Low genetic diversity and large geographical distribution are suggested to be informative criteria to identify potential introduced taxa in the red algae (e.g., Necchi Jr. et al., 2013; Díaz-Tapia et al., 2018; Johnston et al., 2018). We estimated the genetic diversity and geographical range of each mOTU. Two indices of genetic variation, haplotype diversity (Hd) and nucleotide diversity (π), were computed using DnaSP 6 (Rozas et al., 2017). Before calculating these indices, the multiple sequence alignment was trimmed on both ends, where there were overhanging bases (or missing data). Also, we computed the maximum distance between any two field locations for a mOTU as a measure of the geographical range of the mOTU. These data were obtained for the five aquarium mOTUs with sufficient sequence data (at least three aquarium samples and three field samples in Taiwan) to estimate local (i.e., in Taiwan) and global genetic variation. We considered mOTUs with no local genetic variation (Hd = 0 and π = 0) and large geographical range (at least 10,000 km) as likely to be introduced through the aquarium trade in Taiwan.
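The two diversity indices and the screening rule described above can be sketched in a few lines of Python. The sketch uses the standard definitions, Hd = n/(n - 1) × (1 - Σ p_i²) for haplotype frequencies p_i and π as the mean number of pairwise differences per site. It assumes a trimmed alignment with no gaps or missing data, so values may differ from those of DnaSP in other situations. The function names and the toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi)."""
    n = len(seqs)
    length = len(seqs[0])
    if n < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need at least two aligned sequences of equal length")
    diffs = sum(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2
    return diffs / n_pairs / length


def is_candidate_introduction(local_seqs, max_range_km, min_range_km=10_000):
    """Screening rule from the text: no local variation and a wide range."""
    return (
        haplotype_diversity(local_seqs) == 0
        and nucleotide_diversity(local_seqs) == 0
        and max_range_km >= min_range_km
    )


# Hypothetical sample: ten copies of one haplotype plus one variant.
toy_sample = ["ACGT"] * 10 + ["ACGA"]
toy_hd = haplotype_diversity(toy_sample)
toy_pi = nucleotide_diversity(toy_sample)
```

A sample of 11 sequences with two haplotypes at counts of 10 and 1 gives Hd ≈ 0.182, whereas a sample with a single haplotype gives Hd = 0 and π = 0 and passes the screen only if its maximum geographical range is at least 10,000 km.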

Additionally, for the introduced mOTUs with enough sampling, their rbcL haplotype networks were inferred using PopArt 1.7 (Leigh & Bryant, 2015). Haplotype networks have been utilized to identify the potential geographical source(s) of introductions (e.g., Kato et al., 2009). Populations with higher genetic diversity may be the source of introduced species, which often undergo a genetic bottleneck (e.g., Allendorf & Lundquist, 2003; Roman & Darling, 2007).

4.3.8 Trade data

There are no customs records for freshwater red algae, as they are not traded. We sought trade data on ornamental aquatic plants and freshwater fish instead, because freshwater red algae hitchhike on those traded organisms. We found no customs records on ornamental aquatic plants imported to and exported from Taiwan in the Taiwan Customs database. However, we found import and export data for ornamental freshwater fish between 2013 and 2017 (under the HS code “0301110090 Other Ornamental Fish, Freshwater” from the Customs Administration, Ministry of Finance, Taiwan; https://portal.sw.nat.gov.tw/APGA/GA03).

Results

ABGD, PTP, and GMYC inferred 27, 28, and 28 mOTUs in the 213 samples collected for this study, respectively (Figure S3, Appendix C). We took the most conservative estimate of 27 mOTUs (26 are freshwater taxa and one is brackish) inferred using ABGD (initial partition with prior maximal distance, p = 7.74 × 10⁻³; distance K80 Kimura, MinSlope = 1.5). The ABGD analysis indicated that globally there are 170 mOTUs of freshwater red macroalgae, of which 26 (~15%) are found in Taiwan. A rarefaction analysis showed that excellent sampling effort was achieved to detect mOTUs (95% and 96% sample-size based completeness for the field and aquarium samples, respectively; Figure 4).


Figure 4: Distribution of freshwater red algal mOTUs (a) and proportion of chantransia (versus gametophyte) (b) found in the field and aquarium samples from Taiwan. Only the specimens of the taxa of Batrachospermales and Thoreales were counted for panel (b). The observed number of mOTUs, sampling effort, and the number of maximum mOTUs projected using iNEXT are shown in parentheses in panel (a). The type of tank where the aquarium samples were collected is indicated in panel (b). Taxa exhibiting no genetic variation in either the field or aquarium samples are marked by the hash sign. The asterisks indicate a significant difference as per Fisher’s exact test (p < 0.05).

Of the 26 freshwater mOTUs, 13 (50%) are found in the field only, four (~15%) in aquaria only, and nine (~35%) in both the field and aquaria (Figure 4). We have enough samples (at least three from the field and three from aquaria) for only five of the mOTUs to estimate nucleotide diversity and haplotype diversity (Table 4). Therefore, we identified potential introduced taxa among the five mOTUs based on genetic and geographical data. Three of these five mOTUs (Kumanoa mahlacensis mOTU067, Montagnia macrospora mOTU120, and Thorea hispida mOTU122) exhibit no local genetic variation (i.e., in Taiwan) in the field and aquarium samples (Table 4; Figure 4a) and are found across large geographical distances (i.e., across continents). We further examined these three mOTUs as taxa that might have been introduced into Taiwan.

For M. macrospora (which has decent global sampling), a haplotype network was reconstructed to identify the potential source population(s) of this introduced alga (Figure 5). Eight distinct rbcL haplotypes were recovered (designated H1 to H8). All eight haplotypes (H1 to H8) were found in South America (specifically, Bolivia, French Guiana, and Brazil), whereas only two haplotypes (H1 and H2) were found in East Asia. H1 appears to be the most geographically widespread haplotype (at least based on the present sampling), and it is the only haplotype found in Taiwan in the field and aquaria. Interestingly, H2 was not found in Taiwan but was found in Hong Kong, Thailand, and South America. These data suggest that South America might have been the source of M. macrospora (indicated by high haplotype diversity) and East Asia (with sampling densely concentrated in Taiwan in this study) might have been the sink (indicated by low haplotype diversity).

Table 4: Maximum pairwise geographical distance, haplotype diversity (Hd), and nucleotide diversity (π) of five molecular operational taxonomic units (mOTUs) found in the aquarium shops surveyed in Taiwan in this study. These taxa have been reported in East Asia, Europe, Oceania, North America, and South America. The potential introduced taxa are marked with a dagger (†). Hd and π were estimated for the aquarium samples and the field samples in Taiwan separately, together, and in combination with the global field samples from NCBI GenBank. Abbreviations: Cc = C. caeruleus; Mm = M. macrospora; Km = K. mahlacensis; Th = T. hispida; Sd = S. dispersa; EA = East Asia; EU = Europe; NA = North America; OC = Oceania; SA = South America.

                                                        Aquarium (Taiwan)            Field (Taiwan)
Taxa           Distance (km)  Distributional region(s)  n    Hd*        π            n    Hd*        π
Cc mOTU238     25,102         EA, EU, NA, SA, OC        3    0.000 (1)  0.00000      11   0.182 (2)  0.00171
Mm mOTU120†    24,792         EA, SA                    7    0.000 (1)  0.00000      18   0.000 (1)  0.00000
Km mOTU067†    16,394         EA, EU, NA                6    0.000 (1)  0.00000      5    0.000 (1)  0.00000
Th mOTU122†    16,186         EA, EU, NA                14   0.000 (1)  0.00000      7    0.000 (1)  0.00000
Sd mOTU010     10,175         EA, NA                    13   0.682 (4)  0.00633      26   0.538 (3)  0.00365

               Aquarium + field (Taiwan)     Aquarium + field (world, including Taiwan)
Taxa           n    Hd*        π             n    Hd*        π
Cc mOTU238     14   0.473 (3)  0.00219       72   0.487 (4)  0.00167
Mm mOTU120†    25   0.000 (1)  0.00000       63   0.620 (8)  0.01225
Km mOTU067†    11   0.000 (1)  0.00000       16   0.125 (2)  0.00293
Th mOTU122†    21   0.000 (1)  0.00000       38   0.149 (2)  0.00035
Sd mOTU010     38   0.572 (5)  0.00442       87   0.742 (7)  0.00344

*The number of distinct haplotypes is parenthesized.


Figure 5: Haplotype network of Montagnia macrospora, a potential introduced species identified in this study. A network of eight distinct haplotypes (H1 to H8) was inferred by TCS. The hatch marks on the links between the haplotypes represent mutational steps. The asterisks indicate the aquarium samples. The field and aquarium samples from East Asia contain only H1 and H2 (low haplotype diversity), whereas all eight haplotypes were found in the field samples from South America (high haplotype diversity). Abbreviations: BL, Bolivia; BR, Brazil; FG, French Guiana; HK, Hong Kong; MY, Malaysia; OK, Okinawa; TH, Thailand; TW, Taiwan.

Of the freshwater red algae we found in aquarium shops, only the algae of Batrachospermales (Kumanoa, Sheathia, and Virescentia in our data) and Thoreales (Nemalionopsis and Thorea in our data) have morphologically distinct alternating life stages. We observed that the relative proportion of gametophyte and chantransia (which were identified morphologically) is different between the field and aquarium samples (p = 6.34 × 10⁻¹², Fisher’s exact test; Figure 4b), indicating that chantransia (the asexual stage) is the more frequently occurring reproductive form in aquaria. Chantransia do not seem to have a preference between aquatic plant tanks and fish tanks (p = 0.86, Fisher’s exact test; Figure 4b). However, we observed a higher proportion of chantransia specimens in aquatic plant tanks than fish tanks (Figure 4b), suggesting that chantransia might hitchhike more often on ornamental aquatic plants than animals. Also, we observed that in both types of tanks, chantransia seem to grow preferentially on inanimate substrates (plastic materials or rocks) rather than aquatic plants or animals (Figure 4b).

Discussion

The diversity of freshwater red algae in the aquarium trade has been largely unexplored, because the algae are easy to miss as minuscule spores and are often morphologically indistinguishable (i.e., cryptic). Collectively, only 10 species of freshwater red macroalgae have been documented in Taiwan on the basis of molecular or morphological evidence (Wu, 1999; Wu, 2001; Liu et al., 2004; Chou & Wang, 2006; Vis et al., 2010; Chou et al., 2014; Chou et al., 2015). Through our broad DNA barcode-based survey of field sites and aquarium shops across Taiwan, we found 26 mOTUs of freshwater red macroalgae (13 mOTUs in the field only, four mOTUs in aquaria only, and nine mOTUs in the field and aquaria). To the extent that the mOTUs correspond to distinct species, this result indicates that the biodiversity of freshwater red macroalgae in Taiwan is substantially richer than previously reported.

In this study, we found that some freshwater red algae might have been introduced via the aquarium trade, in line with previous observations (e.g., Stoyneva et al., 2006; Kato et al., 2009). Using the amount of genetic variation and the extent of geographical distribution as criteria, we identified three mOTUs (Kumanoa mahlacensis, Montagnia macrospora, and Thorea hispida) as potential introduced species among the 13 mOTUs found in aquaria in Taiwan. The three mOTUs were also found in the field samples, suggesting successful introduction. M. macrospora has already been reported to be an alien species in a eutrophic artificial dam in Okinawa, Japan (Kato et al., 2009). T. hispida has been reported to be cosmopolitan with low genetic diversity, possibly partly due to the aquarium trade (Johnston et al., 2018). This epizoic alga can hitchhike on rusty crayfish (Orconectes rusticus), which is a common invasive species in Asia (Fuelling et al., 2012). As far as we know, K. mahlacensis is not known as an introduced species elsewhere in the world. On the basis of our data, we hypothesize that K. mahlacensis has been introduced to Taiwan via the global aquarium trade.

Routes of introduction can be inferred based on genetic and occurrence data (e.g., Bonett et al., 2007; Muirhead et al., 2008; Le Roux et al., 2011). We illustrate this using M. macrospora as an example. Kato et al. (2009) have established that M. macrospora was introduced by the aquarium trade between South America and Okinawa, Japan. Later, this alga was reportedly found in the field in Taiwan and Malaysia (Chou et al., 2014; Johnston et al., 2014). In our study, M. macrospora was found in the aquarium shops surveyed in Taiwan, Okinawa (Japan), Hong Kong, and Thailand. We observed that the nucleotide diversity and haplotype diversity of the rbcL sequences from the South America samples of M. macrospora (n = 25, π = 0.02183, Hd = 0.820, 8 haplotypes; from Bolivia, Brazil, and French Guiana) are higher than the nucleotide diversity and haplotype diversity of the sequences from the East Asia samples (n = 38, π = 0.00004, Hd = 0.149, 2 haplotypes; from Taiwan, Thailand, Okinawa in Japan, and Malaysia). Only two haplotypes (H1 and H2) were found in East Asia (Figure 5). Interestingly, the trade data of ornamental fish in Taiwan indicate that there is high import activity to Taiwan from South America (Figure S4 and Table S2, Appendix C) and therefore that an introduction route between the two regions is plausible. These data suggest that M. macrospora might have been introduced to East Asia (possibly more than once, given that there are two haplotypes in East Asia) from South America (but the country of origin is unknown due to limited field sampling) and may then have spread further among the regions in East Asia through the aquarium trade. Plausible introduction routes for the other two cases (K. mahlacensis and T. hispida), however, are less clear because of insufficient sampling from field populations (Table 4).

In addition to potential introduced taxa, we identified cryptic ones. Our analysis uncovered three cryptic taxa of Thorea (mOTU128, mOTU129, and mOTU130) (Figure 4; Figure S3, Appendix C). These cryptic taxa may be novel species that have not yet been circumscribed taxonomically. Interestingly, two of them (mOTU128 and mOTU130) were found to be common in the aquarium shops in Taiwan but not in the field anywhere in the world (at least based on the current sampling of rbcL sequences) (Figure 4; Figure S3, Appendix C). This observation raises the intriguing possibility that the aquarium trade can act as a reservoir for some algal species which may not be widespread in the field or may be even endangered (see below). These mOTUs are examples of cryptic taxa that have gone unnoticed under the current morphology-based biodiversity monitoring programs. For example, chantransia (the sporophytic stage) of freshwater red macroalgae are minute hitchhikers and morphologically alike. Such cryptic taxa have the potential to become invasive. Therefore, it is important to incorporate better tools, such as DNA barcodes, into biodiversity monitoring approaches.

Besides introduced taxa and cryptic taxa, the aquarium trade may preserve endangered freshwater red algae. For example, Nemalionopsis shawii (previously named N. tortuosa) and Thorea gaudichaudii are regarded as endangered in Japan, possibly due to anthropogenic disturbances (Brodie et al., 2009). Consistent with this, our preliminary survey did not find these algae in the four aquarium samples that we collected in Okinawa, Japan (Table S1, Appendix C). However, we did find a few uncommon instances of N. shawii (mOTU136; in 10 field samples and two aquarium samples) and T. gaudichaudii (mOTU131; in four field samples and two aquarium samples) in Taiwan (Figure 4), likely because of a much greater sampling effort (in total, 127 field samples and 76 aquarium samples were collected in Taiwan). If endangered freshwater red algae were accidentally transferred between the field and aquaria (which may be the case in Taiwan), then a wide survey of aquaria in Japan may reveal N. shawii and T. gaudichaudii. Thus, the aquarium trade may act as an ex situ preservation site for N. shawii, T. gaudichaudii, and probably other freshwater algae, which are being threatened by water pollution caused by urbanization or industrialization (Sheath & Hambrook, 1990; Sheath & Vis, 2015).

The criteria adopted in this study to identify potential introduced taxa may be too strict. Introduction of a species into a non-native habitat can cause a genetic bottleneck that results in a drastic reduction of genetic diversity; however, multiple introductions of a species into the same non-native habitat can lead to appreciable levels of genetic diversity (reviewed in Wilson et al., 2009). Thus, potential cases of introduction elsewhere (or in Taiwan) may be missed under our stringent criteria. One example is Compsopogon caeruleus, which is frequently found in aquaria (e.g., Stoyneva et al., 2006; Levent & Bud, 2010; Carlile & Sherwood, 2013; Necchi Jr. et al., 2013). This alga has been reported as a tropical invader in Belgium via the aquarium trade (Stoyneva et al., 2006). However, we ruled out this alga as an introduced species in Taiwan because it did not meet our criterion of no local nucleotide or haplotype diversity (Table 4).

For this study, we selected rbcL as a marker to track potential introductions because the sequence data for this gene are abundantly available and well represented geographically for freshwater red algae compared to other nuclear, mitochondrial, and plastid loci. The rbcL sequences of the specimens sampled worldwide enabled us to identify tentative cases of freshwater red algae introduced to Taiwan from distant regions and vice versa. In future investigations, it may be fruitful to explore the utility of alternative marker genes, such as rpoC1 (Zhan et al., 2020), and population-level markers, such as microsatellite loci (e.g., Kinziger et al., 2011), to unveil candidate algal introductions alongside rbcL. Population-level markers would help to better evaluate differences in genetic diversity between an introduced population and its potential source population(s) (e.g., Muirhead et al., 2008), thereby refining our criteria of potential introduced algae. Presently, a disadvantage of using these marker loci is their geographically restricted sampling, which limits our capacity to detect long-distance introductions (e.g., between Taiwan and South America). Enough sequence data for those marker loci (with geographically broad sampling) need to be obtained before employing the marker loci to detect potential instances of introduced freshwater red macroalgae.

Here, we analyzed rbcL sequences for evidence that freshwater red macroalgae are dispersed via the global aquarium trade. Besides the aquarium trade, waterfowl may serve as another mechanism of short- or long-distance dispersal of freshwater red algae. Waterfowl are thought to be active dispersers of aquatic alien organisms, including algae (reviewed in Reynolds et al., 2015). Short-distance dispersal of freshwater red algae (such as members of the Batrachospermales) by waterfowl is plausible. We found one case of potential introduction via short-distance dispersal – N. shawii, which is found only in East Asia based on the present sampling. However, long-distance, transoceanic dispersal of freshwater algae by waterfowl is presumably unlikely, because these algae cannot tolerate salt water or long-term desiccation. In this study, examples of freshwater algae dispersed over the oceans are M. macrospora and T. hispida. Our genetic diversity estimates suggest that M. macrospora has been introduced from South America to East Asia, and indicate that T. hispida is distributed in East Asia, Europe, and North America. Hence, for these two algae, the aquarium trade is a more probable mechanism of dispersal than waterfowl. Moreover, another possible mechanism for introducing alien freshwater algae is ballast water discharge, which is known to wreak environmental havoc by spreading invasive species (e.g., zebra mussel). Manny et al. (1991) speculated that C. caeruleus, a freshwater red alga, might be transported via ballast water discharge, but so far this remains speculation. Although ballast water discharge is an unlikely mechanism to introduce freshwater algae into Taiwan (an island surrounded by seawater), it may transport freshwater algae between distant locations connected by freshwater bodies, such as lakes and rivers, in other parts of the world.

DNA barcoding of individual specimens, as performed in this study, can be labor-intensive and time-consuming, and it provides a restricted view of the biodiversity in an ecosystem. A powerful alternative is environmental DNA metabarcoding enabled by high-throughput sequencing. Recently, this approach has been applied for large-scale biodiversity monitoring of various ecosystems (reviewed in Thomsen & Willerslev, 2015). We envisage that environmental DNA metabarcoding can accelerate the identification of introduced, cryptic, and endangered taxa of freshwater red algae in the field and aquaria, thereby expanding our knowledge of the biodiversity of these algae in the wild and artificial environments.

Hitchhiking freshwater red algae have been overlooked over the past decades. We surveyed the biodiversity of freshwater red macroalgae in the field and aquaria across Taiwan and in nearby regions. We identified potential introductions of freshwater red macroalgae into Taiwan through the global aquarium trade. We anticipate that our data will serve as a taxonomic resource for future large-scale monitoring programs that utilize DNA barcoding and environmental DNA metabarcoding, especially around large urban centers where aquarium trade activity is higher (Strecker et al., 2011). Overlooked hitchhikers, such as freshwater red macroalgae, are not regulated by CITES (Convention on International Trade in Endangered Species of Wild Fauna and Flora), hampering the prevention of their spread via the aquarium trade. It is important to educate aquarists and the public about the proper disposal of aquarium waste (e.g., putting it in solid waste for compost, microwaving or freezing it prior to disposal, or treating it with chemicals), as suggested by Padilla & Williams (2004), Patoka et al. (2016), and Vranken et al. (2018). Furthermore, detailed studies about the potential ecological and social concerns of algal introductions (e.g., the homogenization of global taxonomic diversity, loss of local biodiversity, and potential transfer of pests or pathogens) are needed to better determine whether hitchhikers should be actively controlled or eradicated in the aquarium trade.

Reappraising plastid markers of the red algae for phylogenetic community ecology in the genomic era

Overview

Selection of appropriate genetic markers to quantify phylogenetic diversity is crucial for community ecology studies. Yet, systematic evaluation of marker genes for this purpose is rarely done. Recently, the combined effort of phycologists has produced a rich plastid genome resource with taxonomic representation spanning all of the major lineages of the red algae (Rhodophyta). In this proof-of-concept study, we leveraged this resource by developing and applying a phylogenomic strategy to seek candidate plastid markers suitable for phylogenetic community analysis. We ranked the core genes of 107 published plastid genomes based on various sequence-derived properties and the difference between individual gene-based phylogenies and the overall genome-based phylogeny. The resulting ranking revealed that the most widely used marker, rbcL, is not necessarily the optimal marker, while other promising markers might have been overlooked. We designed and tested PCR primers for several candidate marker genes, and successfully amplified one of them, rpoC1, in a taxonomically broad set of red algal specimens. We suggest that our general marker identification methodology and the rpoC1 primers will be useful to the phycological community for investigating the biodiversity and community ecology of the red algae.

Introduction

Integration of phylogenetic information into community ecology has enjoyed an upsurge of interest in the past decade (e.g., Webb et al., 2002; Cavender-Bares et al., 2009; Weber et al., 2017). With this marriage of phylogenetics and ecology, we can better explore the processes shaping biodiversity and driving community assembly in an evolutionary context. The recent introduction of environmental DNA (eDNA) metabarcoding (i.e., identification of all species in an environmental sample via DNA sequencing) facilitates monitoring of community biodiversity of various organisms in virtually unlimited types of ecological niches. eDNA metabarcoding has been made widely accessible by high-throughput next-generation sequencing (HTS), by which millions of pieces of eDNA are sequenced in a massively parallel and cost-effective fashion.

Commonly, eDNA metabarcoding employs a single genetic marker that enables species identification, and the marker can be enriched via PCR amplification (reviewed in Deiner et al., 2017) or target hybridization (i.e., using molecular “baits”; e.g., Wilcox et al., 2018). HTS-based eDNA metabarcoding has been applied in community ecology (reviewed in Porter & Hajibabaei, 2018), for example, to investigate species turnover in a community (e.g., Pérez-Valera et al., 2015; Hugerth & Andersson, 2017) and to inform environmental management and conservation efforts (e.g., Kress et al., 2009; Brooks et al., 2015).

Most eDNA metabarcoding studies employ well-established genetic markers for pragmatic and historical reasons. In practice, a suitable genetic marker is amenable to primer design so as to maximize its PCR amplification efficacy across a variety of species within a group of interest. Considerations include (1) the length of the genetic region to be amplified (typically, it is easier to achieve good amplification for regions less than 1,000 base pairs long) and (2) an appropriate level of nucleotide conservation across the group (i.e., the marker gene should be conserved enough for efficient PCR amplification, and yet it should evolve fast enough for species differentiation; reviewed in Deiner et al., 2017). For animals, plants, and bacteria, there are established DNA barcode genes for biodiversity surveys and community ecology (e.g., cox1, rbcL, and 16S rRNA). These marker genes are also the cornerstone of molecular systematics and phylogenetics (e.g., Freshwater et al., 1994; Smith et al., 2006; Lahaye et al., 2008). Thus, for those popular markers, large and high-quality reference databases exist (e.g., Barcode of Life Data System; Ratnasingham & Hebert, 2007).

In phylogenetic community ecology, two important quantities to estimate are relatedness among species within a community (i.e., phylogenetic alpha diversity) and relatedness among species between communities (i.e., phylogenetic beta diversity). These alpha and beta diversity indices can tell us whether a given community has greater phylogenetic diversity or more distinct phylogenetic components than other communities (e.g., Kembel et al., 2010; Daru et al., 2017). Poor phylogenetic signal, however, may lead to erroneous inferences about phylogenetic relatedness among species within a community or among communities. For instance, considering alpha diversity, phylogenetic misplacement of taxa based on a marker with poor phylogenetic signal might misleadingly inflate the phylogenetic diversity of a community (e.g., increasing phylogenetic evenness; see Scenario 1 in Figure 6) or deflate it (e.g., increasing phylogenetic clustering; see Scenario 2 in Figure 6). Thus, careful selection of an appropriate marker may be crucial to phylogenetic community analysis.

[Figure 6 image. Panel labels: Scenario 1, Scenario 2; Good Phylogenetic Signal, Poor Phylogenetic Signal; Phylogenetic Evenness, Phylogenetic Clustering.]

Figure 6: Schematic illustrating how phylogenetic misplacement of a taxon (grey dot) may inflate the phylogenetic diversity of an ecological community (Scenario 1) or deflate it (Scenario 2). Dots at the terminal tips of the inferred phylogeny indicate taxa that are present within a community. Arrows indicate the correct phylogenetic positions of lineages.

Traditional organellar marker genes, such as plastid genes, have improved our understanding of biodiversity and community ecology (e.g., Heise et al., 2015; Porter et al., 2016). For many underexplored groups of eukaryotes (such as algae), it is unclear whether or not widely used markers (e.g., rbcL) are the “optimal” choice for phylogenetic community analysis.

In the red algae, the commonly used plastid markers – psaA (photosystem I P700 chlorophyll a apoprotein A1), psaB (photosystem I P700 chlorophyll a apoprotein A2), psbA (photosystem II protein D1 2), and rbcL (ribulose bisphosphate carboxylase large chain) – individually approximate the red algal Tree of Life poorly (e.g., Verbruggen et al., 2010). To resolve deep relationships across the red algal phylogeny, multi-locus and whole plastid genome approaches have been taken (e.g., Nelson et al., 2015; Boo et al., 2016; Lam et al., 2016; Díaz-Tapia et al., 2017). In phylogenetic community studies involving eDNA metabarcoding, a single well-selected locus can still be useful if it can approximate the red algal phylogeny, especially at shallow nodes (i.e., the species- or population-level). As mentioned previously, the reasons to choose one marker over the others have been pragmatic (e.g., the ease of PCR amplification and the availability of a rich sequence database) and grounded in its limited evaluation in focal taxonomic groups (e.g., psbA in the reef-building coralline algae; Broom et al., 2008). The phylogenetic utility of alternative plastid genes – such as the rpo (DNA-dependent RNA polymerase) genes (rpoA, rpoB, rpoC1, and rpoC2) – has been explored in several studies of cyanobacteria and land plants (e.g., Palenik & Swift, 1996; CBOL Plant Working Group, 2009; Gomolińska et al., 2017). Although it remains to be seen whether or not the rpo genes are better phylogenetic markers than other plastid genes, the rpo genes have often been selected as potential complementary markers for phylogenetic analyses in cyanobacteria and land plants due to their rapid rate of molecular evolution and their PCR amplification efficiency across different major lineages. In the red algae, other plastid genes have seldom been evaluated for biodiversity surveys and phylogenetic community analysis. To the best of our knowledge, only the phycoerythrin gene has been proposed, by Yang & Boo (2006), for the biodiversity survey of the order Ceramiales. We believe that there may be promising, overlooked plastid genes that are beneficial for investigating the biodiversity and community ecology of the red algae.

Recently, many complete plastid genomes that taxonomically span all the major lineages of the Rhodophyta have been published. Phylogenetic analyses of these genomes have yielded robust species trees of the red algae (e.g., Janouškovec et al., 2013; Costa et al., 2016; Díaz-Tapia et al., 2017). These plastid genomes form a good foundational resource for analyses requiring an adequate phylogenetic framework. Our group was the first to publish an HTS-based eDNA metabarcoding study of the red algae (Hsieh et al., 2018); related works performed DNA barcoding in coralline algae, but they did not sequence environmental DNA (Bittner et al., 2010; Carro et al., 2014). In our previous work which surveyed the biodiversity of cyanidia – a group of unicellular thermoacidophilic red algae (Hsieh et al., 2015; Hsieh et al., 2018) – we chose rbcL because of its PCR amplification efficiency, its single-copy nature, and the existence of a well-populated sequence database (with hundreds of entries deposited in NCBI GenBank). While rbcL is a powerful tool for eDNA metabarcoding, it is unknown whether or not superior markers may exist for phylogenetic community analysis (to measure phylogenetic alpha and beta diversities). To fill this gap, foundational work is needed that (1) identifies and evaluates candidate markers, (2) designs and tests new PCR primers, and (3) constructs a well-annotated database for the most promising candidate markers. The growing genomic resource collectively produced by the phycological community presents an unprecedented opportunity to take the first step towards building that foundation – that is, finding superior phylogenetic markers and creating resources to support their usage for biodiversity surveys and community ecology.

In this proof-of-concept study, we leveraged 107 reported red algal plastid genomes to scan for candidate plastid markers that fit our criteria. Using the idea of phylogenetic topological similarity, we devised a simple ranking strategy that involves the comparison of individual plastid gene trees to a single target phylogeny – here, the plastid genome species tree inferred using all core plastid genes. Because plastid genomes are inherited uniparentally in the red algae, the evolutionary history captured by the plastid genome phylogeny should not be affected by recombination, which causes genes to differ in their true ancestry. More specifically, we applied the normalized Robinson-Foulds distance, a measure of tree dissimilarity that gives the proportion of bipartitions unique to one of the two given phylogenetic trees (Robinson & Foulds, 1981); in our study, the greater the distance (i.e., closer to 1), the more disagreement there is in pairwise tree comparisons, and the more poorly a gene tree approximates the target plastid genome tree. This phylogenomics approach allowed us to assess the commonly used markers (e.g., psaA, psaB, psbA, and rbcL) in red algal studies (reviewed in Brodie & Lewis, 2007; Saunders & Moore, 2013; Leliaert et al., 2014), as well as less studied markers, to identify better candidates for biodiversity surveys and phylogenetic community ecology.

Materials and Methods

5.3.1 Sequence data collection and processing

We collected 107 publicly available plastid genomes from red algal taxa deposited in NCBI GenBank (Table S1, Appendix D; collected up to Dec. 2017). The sampled taxa represent most of the major orders and families of the Rhodophyta. Using the gene annotations of the NCBI GenBank entries, we extracted all of the protein-coding sequences and assembled them into 120 single-copy core gene families represented by at least 96 taxa (i.e., ~90% of the 107 taxa). In a few taxa, we removed genes that had multiple fragmented coding frames (i.e., poor coding sequence annotations), because they might be genome assembly artifacts and/or incorrect annotations. Also, we excluded one gene (ccs1), because it is duplicated across many taxa and paralogs are not ideal markers (for example, see the genus Polysiphonia). Next, we translated coding sequences into amino acid (AA) sequences using TransDecoder 3.0.0 (Haas et al., 2013), retaining the longest open reading frame with a minimum AA length of 50. We then aligned the AA sequences using MUSCLE 3.8.31 (Edgar, 2004). Additionally, we obtained the corresponding alignments of the nucleotide (NT) sequences by back-translating AAs to their original codons. This processing resulted in AA and NT alignments of 120 gene families, each of which includes up to 107 taxa. This procedure was implemented in Python using the sequence processing functionalities in BioPython 1.70 (Cock et al., 2009). The analysis scripts as well as the data and results files were deposited and archived in the GitHub repository: https://github.com/szhan/rhododb.
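The back-translation step can be illustrated with a minimal sketch. It is not taken from the archived scripts; the function name back_translate is hypothetical, and the sketch assumes that the coding sequence starts in frame and that gaps in the AA alignment are written as "-". Each aligned residue is replaced by its original codon and each gap by three gap characters.

```python
def back_translate(aligned_aa: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned coding sequence onto its aligned protein sequence.

    Hypothetical helper for illustration: assumes `cds` is in frame and
    encodes the residues of `aligned_aa` in order.
    """
    n_residues = sum(1 for aa in aligned_aa if aa != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is too short for the protein")
    codons = []
    pos = 0
    for aa in aligned_aa:
        if aa == gap:
            codons.append(gap * 3)          # one AA gap becomes a 3-nt gap
        else:
            codons.append(cds[pos:pos + 3])  # copy the original codon
            pos += 3
    return "".join(codons)


if __name__ == "__main__":
    print(back_translate("M-K", "ATGAAA"))
```

Because the original codons are copied rather than re-derived from the amino acids, synonymous differences among taxa are preserved in the NT alignment.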

5.3.2 Partitioning analysis

Using PartitionFinder2 2.1.1 (Lanfear et al., 2016) in conjunction with RAxML 8.2.11 (Stamatakis, 2014), we determined AA and NT data partition groupings (which possess similar substitution models and model parameters) under the r-clustering algorithm (Lanfear et al., 2014). We identified the best-fitting AA and NT models for each gene family under the corrected Akaike Information Criterion (Burnham & Anderson, 2002). Under the partition schemes and the associated substitution models, we inferred AA and NT plastid genome species trees and individual gene trees.

First, we inferred two plastid genome species trees (i.e., AA and NT trees), beginning with an AA tree. The AA alignments were partitioned by gene and then grouped using PartitionFinder2. All the AA models implemented in RAxML, including their +G variants, were considered. PartitionFinder2 found 77 AA partition groupings. Under this partition grouping scheme, RAxML was run using the best-fitting AA models. Next, we obtained an NT plastid genome tree using a similar approach. The NT alignments were partitioned according to the gene-by-codon scheme (“G x C”), which treats the first, second, and third codon sites of each NT alignment as separate partitions to be grouped. The NT substitution models GTR and GTR+G were fitted. This resulted in 282 NT partition groups, and GTR+G was the best model for all the partition groups. RAxML was run on the full NT alignment under the best partition grouping scheme.

Second, with the plastid genome phylogenies in hand, we reconstructed the trees of the individual genes. We estimated two trees for each gene family, one based on its AA alignment and the other based on its NT alignment. The best-fitting AA and NT models identified during inference of the plastid genome trees were also used to derive the gene trees.

All of the RAxML analyses were performed with 100 rounds of rapid bootstrapping.

Also, in all of the phylogenies, we treated as the outgroup of the remaining taxa, as have other workers (e.g., Yoon et al., 2006).

5.3.3 Phylogenetic tree comparisons

To rank the individual plastid genes, we computed the normalized Robinson-Foulds distance (nRF) between each of the plastid gene trees and a target plastid genome tree. Before calculating the distance between a gene tree and a target tree, taxa absent in the gene tree but present in the target tree were pruned from the target tree, and the trees were unrooted. We performed two sets of nRF distance calculations to compare (1) the AA gene trees and the AA plastid genome tree; and (2) the NT gene trees and the NT plastid genome tree. For tree processing and nRF distance calculations, we used the R packages ape 5.1 (Paradis et al., 2004) and phangorn 2.4.0 (Schliep, 2011). Visual juxtaposition of phylogenetic trees was performed with the aid of the R package phytools 0.6-44 (Revell, 2012).
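The pruning and distance calculation can be sketched in a few lines of Python. This is an illustration only: the analysis itself used ape and phangorn in R, and the function names, the nested-tuple tree representation, and the toy taxa below are all assumptions. The sketch prunes both trees to their shared taxa, treats them as unrooted, and divides the number of bipartitions found in only one tree by the total number of non-trivial bipartitions in both trees.

```python
def leaves(tree):
    """Set of tip labels in a tree written as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def prune(tree, keep):
    """Drop tips not in `keep`, collapsing nodes left with a single child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (prune(child, keep) for child in tree) if k is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def bipartitions(tree):
    """Non-trivial splits of the unrooted tree.

    Each split is stored as the side that lacks a fixed reference tip, so the
    position of the root does not affect the result.
    """
    all_tips = leaves(tree)
    ref = min(all_tips)
    splits = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        side = all_tips - tips if ref in tips else tips
        if 1 < len(side) < len(all_tips) - 1:
            splits.add(side)
        return tips

    walk(tree)
    return splits


def nrf(gene_tree, target_tree):
    """Normalized Robinson-Foulds distance on the taxa shared by both trees."""
    shared = leaves(gene_tree) & leaves(target_tree)
    if len(shared) < 4:
        raise ValueError("need at least four shared taxa")
    splits_gene = bipartitions(prune(gene_tree, shared))
    splits_target = bipartitions(prune(target_tree, shared))
    total = len(splits_gene) + len(splits_target)
    return len(splits_gene ^ splits_target) / total if total else 0.0


if __name__ == "__main__":
    target = ((("A", "B"), "C"), (("D", "E"), "F"))   # toy genome tree
    gene = ((("A", "C"), "B"), ("D", "E"))            # toy gene tree, lacks F
    print(nrf(gene, target))
```

In the toy example, taxon F is pruned from the target tree before comparison; the two trees then share one of their two bipartitions each, giving a distance of 0.5. Normalization conventions differ slightly among software packages, so values from this sketch should not be assumed to match phangorn output in every case (for example, with polytomies).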

5.3.4 Estimation of degrees of sequence variation and rates of molecular evolution

For each plastid gene family, we computed its pairwise p-distance (the proportion of mismatched nucleotide sites, which is a simple measure of sequence divergence) using a custom Python script.
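A minimal sketch of a pairwise p-distance calculation is given below. It is not the custom script itself; the function names are hypothetical, and two choices are assumptions made for illustration: columns with a gap in either sequence are skipped, and ambiguity codes are treated as ordinary characters.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of mismatched sites between two aligned sequences.

    Assumption: columns with a gap in either sequence are ignored.
    """
    sites = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != gap and b != gap]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)


def pairwise_p_distances(alignment: dict) -> dict:
    """p-distance for every pair of taxa in {taxon: aligned sequence}."""
    return {
        (t1, t2): p_distance(alignment[t1], alignment[t2])
        for t1, t2 in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    toy = {"taxon1": "ACGTAC", "taxon2": "ACGAAC", "taxon3": "AC-TTC"}
    for pair, dist in pairwise_p_distances(toy).items():
        print(pair, round(dist, 3))
```

How the pairwise values are summarized per gene (for instance, by a mean or a median) is left to the caller here.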

We also estimated its pairwise rate of non-synonymous substitution (dN) and its pairwise rate of synonymous substitution (dS) using CodeML (PAML 4.9h; Yang, 2007), taking the median across all the sequence pairs. Lastly, we calculated the proportion of parsimony informative sites using AMAS (Borowiec, 2016). The statistical analyses (regression analysis and correlation tests) were conducted using R (R Core Team, 2018).
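For reference, a site is usually called parsimony informative when at least two character states each occur in at least two sequences. The sketch below applies that definition; it is not the AMAS implementation, the function names are hypothetical, and dividing by the total number of alignment columns is an assumption.

```python
from collections import Counter


def is_parsimony_informative(column, ignore: str = "-?") -> bool:
    """True if at least two states each occur in at least two sequences."""
    counts = Counter(c.upper() for c in column if c not in ignore)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def proportion_parsimony_informative(seqs) -> float:
    """Fraction of alignment columns that are parsimony informative."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    informative = sum(is_parsimony_informative(col) for col in zip(*seqs))
    return informative / length


if __name__ == "__main__":
    toy = ["AAGT", "AAGT", "ACCT", "ACCA"]
    print(proportion_parsimony_informative(toy))
```

In the toy alignment, the second and third columns are informative, whereas the last column is not because the state A occurs in only one sequence.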

5.3.5 PCR experiments and Sanger sequencing

To examine the efficacy of the designed primers on a wide taxonomic spectrum of the Rhodophyta, we selected eleven species that span five different classes: two in Cyanidiophyceae, one in Porphyridiophyceae, one in Compsopogonophyceae, one in Bangiophyceae, and six in Florideophyceae (Table S3, Appendix D). The six samples in Florideophyceae cover four subclasses: one in Hildenbrandiphycidae, one in Nemaliophycidae, one in Corallinophycidae, and three in Rhodymeniophycidae (Table S3, Appendix D). Total genomic DNA (gDNA) from the eleven samples was extracted using the commercial ZR Plant/Seed DNA kit (Zymo Research, CA, USA), following the manufacturer’s instructions. We amplified rpoC1 (DNA-directed RNA polymerase subunit beta') from the gDNA using the manually designed gene-specific primers described below (Table S4, Appendix D). The degenerate rpoC1 primers were designed manually, based on a 50% consensus rule for the most conserved areas (e.g., low p-distance), using both the software BioEdit (Hall, 1999) and the sliding window sequence variation analyses. Polymerase chain reaction (PCR) was conducted using the commercial Titanium® Taq DNA Polymerase kit (Takara Bio USA, Inc., USA), following the manufacturer’s instructions. The PCR settings for the initial amplification tests were 96°C for 4 min; 40 cycles of 94°C for 40 s, 47°C for 40 s, and 72°C for 1 min; and a final step of 72°C for 10 min. To reduce non-specific amplification, a touchdown PCR protocol was carried out as follows: 96°C for 4 min; 4 cycles of 94°C for 40 s, 52°C for 40 s, and 72°C for 1 min; 2 cycles of 94°C for 40 s, 50°C for 40 s, and 72°C for 1 min; 34 cycles of 94°C for 40 s, 47°C for 40 s, and 72°C for 1 min; and a final step of 72°C for 10 min. The resulting PCR product was compared against a commercial DNA standard (DM2300 ExcelBand™ 100 bp+3K DNA Ladder, SMOBiO Technology, Inc., Taiwan) on a 1.5% agarose gel. DNA sequencing was conducted using an ABI3730 DNA Sequencer (Applied Biosystems, Foster City, CA) at Mission Biotechnology Company (Taipei, Taiwan).
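The two thermocycling programs can be written down as data to check that they are comparable. The sketch below is illustrative only: the variable and function names are hypothetical, and it assumes that the closing 72°C step for 10 min is a single hold after the last cycle rather than part of each cycle. Ramp times between temperatures are not included.

```python
# Each stage is (number of repeats, [(temperature in °C, seconds), ...]).
# Assumption: the last 72 °C / 10 min step is a single hold after cycling.
STANDARD = [
    (1, [(96, 240)]),
    (40, [(94, 40), (47, 40), (72, 60)]),
    (1, [(72, 600)]),
]

TOUCHDOWN = [
    (1, [(96, 240)]),
    (4, [(94, 40), (52, 40), (72, 60)]),   # highest annealing temperature
    (2, [(94, 40), (50, 40), (72, 60)]),
    (34, [(94, 40), (47, 40), (72, 60)]),  # same annealing step as STANDARD
    (1, [(72, 600)]),
]


def amplification_cycles(program) -> int:
    """Total number of multi-step (denature/anneal/extend) cycles."""
    return sum(repeats for repeats, steps in program if len(steps) > 1)


def hold_seconds(program) -> int:
    """Sum of programmed hold times, excluding ramping."""
    return sum(repeats * sum(sec for _, sec in steps)
               for repeats, steps in program)


if __name__ == "__main__":
    for name, program in (("standard", STANDARD), ("touchdown", TOUCHDOWN)):
        print(name, amplification_cycles(program), hold_seconds(program))
```

Under these assumptions, both programs comprise 40 amplification cycles and the same total hold time; the touchdown program differs only in using higher annealing temperatures (52°C, then 50°C) in its first six cycles.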

Results and Discussion

We developed a bioinformatics strategy to select phylogenetic markers informed by an analysis of 107 published plastid genomes, using these to assemble the AA and NT alignments and the gene trees of 120 single-copy core plastid gene families. Only 120 protein-coding genes were retained based on our filtering criteria (i.e., genes were excluded if they were poorly or inconsistently annotated, duplicated, had AA length less than 50, or occurred in fewer than ~90% of the taxa). We also inferred two trees that represent our best plastid genome-based estimates of the Rhodophyta phylogeny, one using the AA alignment concatenated from all the plastid genes and the other using the corresponding NT alignment. Overall, the AA plastid genome phylogeny (Figure 7) supports the major inter-class relationships observed in published multi-locus and plastid genome analyses (Yang et al., 2016; Cho et al., 2018), and the corresponding NT phylogeny is largely consistent with the AA phylogeny (nRF = 0.0673; Figure S1, Appendix D). For example, seven well-supported monophyletic classes in three subphyla were recovered: one in Cyanidiophytina (Cyanidiophyceae), four in Proteorhodophytina (Compsopogonophyceae, Porphyridiophyceae, Rhodellophyceae, and Stylonematophyceae), and two in Eurhodophytina (Bangiophyceae and Florideophyceae).

Next, we assessed how well each of the plastid genes topologically approximates the plastid genome trees. We ranked the plastid genes by the nRF distance between their trees (i.e., each plastid gene tree) and a target plastid genome tree. In both sets of the nRF rankings of the AA and NT gene trees (Table S2, Appendix D), we found that psaA and psaB approximate the plastid genome trees better than rbcL and psbA (i.e., having lower nRF distances to the target trees). A visual comparison of the AA plastid genome tree and the AA rbcL gene tree confirms that the rbcL gene tree poorly approximates the plastid genome tree (Figure 7). Our findings further support that each of those commonly used plastid markers (i.e., psaA, psaB, psbA, and rbcL) alone is not the optimal marker to approximate the red algal phylogeny, consistent with previous observations (e.g., Verbruggen et al., 2010; Nelson et al., 2015; Boo et al., 2016; Lam et al., 2016). Our results also demonstrate that those four popular markers provide limited phylogenetic resolution at the shallow (here, species) levels. This is a known issue with rbcL – the most widely employed marker in the red algae (Yang et al., 2008; Freshwater et al., 2010). In a recent multi-locus phylogenetic study of the Gelidiales (Boo et al., 2016), psaA, psbA, and rbcL were shown to have peak phylogenetic signals at the deeper levels of the Gelidiales tree rather than at the shallower levels.

[Figure 7 image. Panels: Species Tree (left), rbcL Tree (right). Bootstrap legend: 71-95, 50-70, <50. Clade labels: Florideophyceae, Bangiophyceae, Compsopogonophyceae, Stylonematophyceae, Rhodellophyceae, Porphyridiophyceae, Cyanidiophyceae.]

Figure 7: Phylogenies based on the AA alignment concatenated from the core plastid genes (left) and rbcL (right). The trees were inferred using RAxML with 100 rapid bootstraps and under the best-fitting AA models identified by PartitionFinder2. The nodes supported with bootstrap values below 0.95 are color-coded. Gray shading indicates the conflicting nodes between the trees.

Various quantities have been proposed as key criteria for marker gene selection (e.g., Yang & Boo, 2006; Lei et al., 2012; Janouškovec et al., 2013). They include p-distance, proportion of parsimony informative sites (Pi), and the rates of non-synonymous substitution (dN) and synonymous substitution (dS). Genes having higher p-distance, Pi, dN, and/or dS tend to be more suitable for phylogenetic analysis because they harbor more sequence variation, especially when the target clade is an evolutionarily young lineage. Based on the nRF distance rankings alone, it was not apparent how to determine a cutoff to select candidate markers. For instance, in the ranking of the AA trees, about 11 genes have similar nRF distances of ~0.2 (Figure 8); also, in this ranking, gltB appears to perform better than the other plastid genes.

Hence, we examined the p-distance, Pi, dN, and dS of the plastid genes (Table S2, Appendix D) jointly with the nRF distances to find a clearer cut-off. P-distance is negatively correlated with the nRF distance between the AA gene trees and the AA plastid genome tree (p = 2.16 × 10⁻⁷, Spearman’s test; Figure 8); likewise, Pi and dN are negatively correlated with nRF distance (p = 1.30 × 10⁻⁶; not shown). Indeed, p-distance is positively correlated with dN and Pi (p < 2.2 × 10⁻¹⁶ for both). However, dS is not correlated with nRF distance (p = 0.10; not shown), probably due to substitution saturation.
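For readers unfamiliar with the rank-based test, the sketch below computes Spearman’s correlation coefficient as the Pearson correlation of average ranks. It is an illustration only: the statistical tests reported here were run in R, the function names are hypothetical, the numbers in the usage example are invented toy values rather than results from this study, and the p-value calculation is omitted.

```python
from math import sqrt
from statistics import fmean


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    toy_p_distance = [0.18, 0.25, 0.31, 0.40, 0.47]   # invented toy values
    toy_nrf = [0.80, 0.62, 0.66, 0.41, 0.30]          # invented toy values
    print(round(spearman_rho(toy_p_distance, toy_nrf), 3))
```

A negative coefficient, as in the toy example, means that genes with higher sequence divergence tend to have lower nRF distance to the target tree.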

When examining the correlations, we noticed that some genes have trees more similar to the target plastid genome trees (i.e., lower nRF distance) than genes with similar levels of sequence divergence (p-distance) (Figure 8) or similar AA alignment length (Figure S2, Appendix D). To pinpoint such genes, we performed a linear regression analysis and determined a 95% prediction interval (PI) around the line of best fit (Figure 8). The genes that lie within the PI perform comparably to genes of similar p-distance. Using the PI as a guide, we found genes that fall below the lower bound of the 95% PI (i.e., having a better nRF distance ranking compared to genes of similar p-distances or AA alignment length); congruent results were found using NT-based p-distances (not shown). In the analysis of the AA data, three genes stood out: rpoC1, rpoB, and gltB (Figure 8), indicating that these outlying genes yield more “accurate” phylogenetic signal (i.e., closer to the target plastid genome tree) than expected based on the amount of sequence information. This approach revealed the same genes even when using dN or Pi instead of p-distance. In an additional bootstrapping analysis, we took into account uncertainty in tree topology due to sampling errors (i.e., the statistical support of bipartitions). We took 100 bootstrap replicates of a target gene tree and 100 replicates of the plastid genome tree (obtained from the RAxML analysis of the AA MSAs), randomly drew one of each with replacement 100 times, and then calculated the median nRF distance across the 100 draws. This analysis revealed that the three marker genes still fall outside the 95% PI (Figure S2, Appendix D), supporting the candidacy of the genes. A visual juxtaposition of the AA plastid genome tree and the AA rpoC1 gene tree confirms that the rpoC1 gene tree yields a better approximation of the plastid genome tree (Figure 9) than traditional marker genes, such as rbcL (Figure 7).
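The regression-based screening can be sketched as follows. This is not the R code used for the analysis: the function name is hypothetical, and the interval uses a normal quantile as a stand-in for the exact Student’s t quantile, which is a reasonable approximation only when many genes are fitted. The function fits an ordinary least-squares line of nRF distance on p-distance and returns the genes lying below the lower bound of the prediction interval.

```python
from math import sqrt
from statistics import NormalDist, fmean


def genes_below_prediction_interval(x, y, names, level=0.95):
    """Names of points below the lower prediction bound of an OLS fit.

    Assumption: a normal quantile approximates the t quantile.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least three points")
    x_mean, y_mean = fmean(x), fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    resid_sd = sqrt(sse / (n - 2))
    quantile = NormalDist().inv_cdf(0.5 + level / 2)
    flagged = []
    for xi, yi, name in zip(x, y, names):
        # Prediction (not confidence) interval: note the leading 1 under the root.
        half_width = quantile * resid_sd * sqrt(
            1 + 1 / n + (xi - x_mean) ** 2 / sxx
        )
        if yi < intercept + slope * xi - half_width:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Invented toy data: 20 points near a line plus one low outlier.
    xs = [0.15 + 0.0175 * i for i in range(20)]
    ys = [1.0 - xi + (0.01 if i % 2 else -0.01) for i, xi in enumerate(xs)]
    labels = [f"gene{i}" for i in range(20)]
    xs.append(0.30)
    ys.append(0.40)
    labels.append("outlier")
    print(genes_below_prediction_interval(xs, ys, labels))
```

Because the outlying points are included when the line is fitted, they widen the interval somewhat; a gene flagged by this procedure is therefore an outlier even under a fit that it has itself influenced.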

Widely employed genetic markers, such as rbcL and psbA, are amenable to efficient PCR amplification and Sanger sequencing. Such markers contain regions conserved enough for PCR primer binding (low sequence divergence), as well as a stretch of nucleotides of appropriate length for Sanger sequencing (i.e., 500 to 1,000 bp). Using these criteria, we performed an initial assessment of the potential of the three newly proposed markers for adoption. Among the three markers, rpoC1 and rpoB have relatively low p-distances and short sequence length, whereas gltB is rather long (~4,800 bp) and therefore not ideal as a marker gene (Figure 8; Figure S2 and Table S2, Appendix D). Hence, we decided to focus on rpoC1 and rpoB for PCR primer design and testing. We took a sliding window approach (30 bp) to measure the p-distance along the NT alignments of rpoC1 and rpoB, finding several regions that seemed suitable for PCR (Figure S3, Appendix D).
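A sliding window profile of this kind can be sketched as follows. The 30 bp window width comes from the text; the function names, the step size of one site, the use of the mean over all sequence pairs within a window, and the skipping of gapped columns are assumptions made for illustration.

```python
from itertools import combinations


def window_p_distance(seqs, start, width, gap="-"):
    """Mean pairwise p-distance within one alignment window.

    Returns None if no pair has any ungapped column in the window.
    """
    distances = []
    for a, b in combinations(seqs, 2):
        sites = [(x, y)
                 for x, y in zip(a[start:start + width], b[start:start + width])
                 if x != gap and y != gap]
        if sites:
            distances.append(sum(x != y for x, y in sites) / len(sites))
    return sum(distances) / len(distances) if distances else None


def sliding_p_distance(seqs, width=30, step=1):
    """List of (window start, mean p-distance) along an NT alignment."""
    length = len(seqs[0])
    return [(start, window_p_distance(seqs, start, width))
            for start in range(0, length - width + 1, step)]


if __name__ == "__main__":
    toy = ["ACGTACGTACGT", "ACGTACGAACGT", "ACGTACGTACCT"]
    for start, dist in sliding_p_distance(toy, width=4, step=2):
        print(start, round(dist, 3))
```

Windows with a low mean p-distance correspond to conserved stretches, which are the natural candidates for primer binding sites.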

[Figure 8 plot: normalized Robinson-Foulds distance (AA trees; y-axis, 0.2 to 1.0) against p-distance (x-axis, 0.15 to 0.50), with each plastid gene shown as a labeled point.]

Figure 8: Negative correlation between the normalized Robinson-Foulds (nRF) distance to a target tree and p-distance across the plastid genes. The nRF distance was calculated based on AA-based gene trees and an AA-based plastid genome tree. The dashed lines delineate the 95% prediction interval. Genes that fall below the lower bound of the interval (i.e., low distance and therefore more similar to the target tree) perform better than other plastid genes having similar p-distance. Located inside the interval are the popular plastid markers: rbcL, psbA, psaA, and psaB (blue). Below the lower bound of the interval are three genes, which were the focus of PCR primer design and testing: rpoC1, rpoB, and gltB (orange).
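The rule used in Figure 8 to flag candidate markers (genes lying below the lower bound of the 95% prediction interval) can be illustrated with a short sketch. It assumes an ordinary least-squares regression of nRF distance on p-distance and substitutes a normal quantile for Student's t quantile, which is only a reasonable approximation when many genes are fitted; the function names and the input layout are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def prediction_bounds(xs, ys, x0, level=0.95):
    """Approximate prediction interval at x0 for a simple linear regression."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    residual_sd = sqrt(sse / (n - 2))
    # Normal quantile used in place of the t quantile (large-sample assumption).
    quantile = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = quantile * residual_sd * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    fitted = intercept + slope * x0
    return fitted - half_width, fitted + half_width


def genes_below_interval(genes, level=0.95):
    """genes maps a gene name to (p_distance, nrf); return names below the lower bound."""
    names = list(genes)
    xs = [genes[name][0] for name in names]
    ys = [genes[name][1] for name in names]
    flagged = []
    for name in names:
        lower, _ = prediction_bounds(xs, ys, genes[name][0], level)
        if genes[name][1] < lower:
            flagged.append(name)
    return flagged
```

A gene returned by `genes_below_interval` has a gene tree closer to the target tree than expected for its level of sequence divergence, which is the property sought in a candidate marker.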

[Figure 9 panels: Species Tree (left) and rpoC1 Tree (right). Bootstrap legend: 71-95, 50-70, <50. Clade labels: Florideophyceae, Bangiophyceae, Compsopogonophyceae, Stylonematophyceae, Porphyrideophyceae, Cyanidiophyceae.]

Figure 9: Phylogenies based on the AA alignment concatenated from the core plastid genes (left) and rpoC1 (right). The trees were inferred using RAxML with 100 rapid bootstraps and under the best-fitting AA models identified by PartitionFinder2. The nodes supported with bootstrap values below 0.95 are color-coded. Gray shading indicates the conflicting nodes between the trees. Corynoplastis japonica was not included due to a missing coding sequence annotation.

Based on the p-distance profiles, we designed and optimized PCR primers for those two genes and then tested them on 11 red algal specimens (Galdieria partita, Galdieria maxima, Porphyridium cruentum, Compsopogon caeruleus, Bangia fuscopurpurea, Hildenbrandia sp., Kumanoa sp., Sporolithon sp., Peyssonelia sp., Caloglossa ogasawaraensis, and Champia sp.; Table S1, Appendix D), which were selected to represent some of the major lineages of the Rhodophyta. We designed and tested 11 primers for rpoC1 (five for the 5’ end and six for the 3’ end; Figure 10a; Table S2, Appendix D). We successfully amplified rpoC1 across all the specimens of Florideophyceae, as well as Bangiophyceae (Figure 10b); the amplification success rates were poor in the specimens of the extant descendants of early-branching lineages (Cyanidiophyceae, Porphyrideophyceae, and Compsopogonophyceae) (Figure 10b). Based on these PCR results, we suggest two primer pairs, F1-R3 and F4-R4, for amplifying rpoC1, as they have a high amplification success rate and their overlapping PCR products span most of rpoC1 (validated by Sanger sequence data, which were deposited in NCBI GenBank; Table S1, Appendix D). We also tested F1-R4 and F1-R5 a few times, but had a low success rate with F1-R4 (25%; only in Compsopogon caeruleus and Hildenbrandia sp.; data not shown) and no amplification for the rest of the specimens. Moreover, we could not achieve the same level and consistency of success with rpoB even after several attempts at primer design and testing, probably because this gene is more divergent (Figure S3, Appendix D), longer (3,386 bp), and lower in GC content (32.64%) than rpoC1.

[Figure 10, panel a: primer map of the DNA-directed RNA polymerase subunit beta' gene (rpoC1, ~1900 bp), showing forward primers F1 to F6, reverse primers R1 to R5, and two highly variable regions (HVR). Primer pairs and amplification rates for rpoC1-5P (ca. 450-1100 bp): F1-R3 (91%), F1-R1 (91%), F2-R1 (82%), F3-R2 (91%), F3-R1 (100%). Primer pairs and amplification rates for rpoC1-3P (ca. 350-800 bp): F4-R5 (82%), F4-R4 (91%), F5-R5 (82%), F5-R4 (82%), F6-R5 (82%), F6-R4 (91%).]

[Figure 10, panel b: gel images for primer pairs F1-R3 (10 out of 11), F1-R1 (10 out of 11), F2-R1 (9 out of 11), F3-R2 (10 out of 11), F3-R1 (11 out of 11), F4-R5 (9 out of 11), F4-R4 (10 out of 11), F5-R5 (9 out of 11), F5-R4 (9 out of 11), and F6-R5 (9 out of 11). Lane order: M, Gp, Pc, Bf, M, Gm, Cc, Hi, Ku, Sp, Pe, Co, Ch. The 500 bp and 1000 bp marker bands are indicated.]

Figure 10: PCR primers designed for rpoC1 (a) and their amplification efficacy (b) across major taxonomic groups (Cyanidiophyceae: Gm and Gp; Porphyrideophyceae: Pc; Compsopogonophyceae: Cc; Bangiophyceae: Bf; Hildenbrandiophycidae: Hi; Nemalionophycidae: Ku; Corallinophycidae: Sp; and Rhodymeniophycidae: Pe, Co, and Ch). The gene amplification rate was good (82%) or excellent (91% or 100%) for all the primer pairs (shown in parentheses). Two highly variable regions in rpoC1 exhibit a high level of sequence divergence according to the gene’s p-distance profile. PCR was considered successful if a band of the expected size (indicated by an arrowhead) was observed, even if the band was faint. Bands of unexpected sizes are non-specific (or off-target) PCR products. A large amplicon was observed in some specimens of Gp (marked by asterisks); Sanger sequencing confirmed that it is caused by an insertion in the highly variable region. Abbreviations: Bf, Bangia fuscopurpurea; Cc, Compsopogon caeruleus; Ch, Champia sp.; Co, Caloglossa ogasawaraensis; Gm, Galdieria maxima; Gp, Galdieria partita; Hi, Hildenbrandia sp.; Ku, Kumanoa sp.; M, 100 bp DNA marker; Pc, Porphyridium cruentum; Pe, Peyssonelia sp.; Sp, Sporolithon sp.

Many attractive phylogenetic markers may not be suitable for PCR primer design for various reasons, hampering their uptake by the research community. Furthermore, it is known that in amplicon-based eDNA metabarcoding studies, estimates of relative abundance are skewed, and so our estimates of community species diversity may be poorer than they could be (e.g., Wilcox et al., 2018). However, there exist alternative technologies that could enable researchers to sequence such markers without needing to go through the laborious process of PCR primer development. For example, one can utilize the plastid markers proposed using our in silico methodology in an approach that leverages both HTS and probe-based target hybridization (e.g., Weitemier et al., 2014; Shokralla et al., 2016). Probes (or baits) can be designed to bind to the plastid markers (“targets”), and the bait-target complexes would be pulled down or enriched (for example, using magnetic beads that bind to biotinylated baits) while non-target nucleic acids are washed away. This method effectively enhances the ratio of target to non-target nucleic acids, and the resulting target-enriched pool of nucleic acids can then be subjected to HTS (e.g., Mariac et al., 2018). This would exploit the scalability of HTS to facilitate eDNA metabarcoding studies of the red algae that have thus far been infeasible (e.g., due to PCR amplification failure).

Moreover, if the target genes are too long for the short-read HTS technologies of Illumina Inc., nanopore sequencing technologies, such as the MinION by Oxford Nanopore Technologies Ltd., provide a promising alternative approach. The handheld, affordable, and field-deployable MinION boasts sequencing read lengths of thousands to millions of base pairs (e.g., Krehenwinkel et al., 2019). This powerful feature enables the sequencing of entire genes without the need to correct for assembly errors (i.e., chimeric sequences) (Saunders & Moore, 2013). The MinION has been criticized for its high base-calling error rate, but the error rate is anticipated to improve in upcoming technological updates. Evaluating the utility of a target hybridization-based HTS eDNA metabarcoding approach, coupled with nanopore sequencing and with phylogenomic approaches such as ours, could be a productive avenue for future research.

Taxon sampling is an important consideration when choosing an appropriate phylogenetic marker. Here, we examined all the plastid genomes available to us at the beginning of the study (Dec., 2017). Nearly half of the taxa (53 of 107; 51%) were sampled from the most species-rich family Rhodomelaceae (Ceramiales), which encompasses roughly 15% of the recognized species diversity of the Rhodophyta (AlgaeBase; Guiry & Guiry, 2019). We intended to search for phylogenetic markers that would allow us to recover shallow relationships (e.g., species- or population-level) for phylogenetic community analysis, because we were not attempting to investigate the deep relationships of the red algal Tree of Life. Hence, our sampling is biased towards Rhodomelaceae, and therefore the marker rankings and the proposed rpoC1 marker may be more pertinent to this family. We anticipate identifying and testing candidate markers that are more specific for focal clades (orders, e.g., Corallinales, Gigartinales, and Rhodymeniales; or families within Ceramiales, e.g., Ceramiaceae and Delesseriaceae) as their plastid genomes become available. Moreover, we hope to maintain these marker rankings alongside curated sequences as a resource for the phycological community, beginning with rpoC1. Presently, we are conducting broader testing of the rpoC1 primers on more specimens across more diverse red algal lineages.

Conclusions

Much remains to be discovered about the processes shaping the biodiversity and community assembly of the red algae. HTS-based eDNA metabarcoding utilizing phylogenetic community analysis based on carefully selected markers will help to elucidate those processes.

There is a scarcity of tools and resources (robust phylogenetic markers, well-tested PCR primers, optimized wet-lab protocols, and high-quality reference sequence databases) for the eDNA metabarcoding of the red algae. By leveraging the genomic resources contributed cumulatively by the phycological community, we have taken the first step towards the long-term goal of building additional tools and resources. Finally, expanding similar efforts to mine mitochondrial and nuclear genomes, and periodically re-evaluating plastid genomes as more data become available, may help to augment the molecular toolbox for investigating the phylogenetic community ecology of the red algae.

Conclusion

Evolutionary consequences of polyploidy revisited

In this dissertation, I present four studies utilizing phylogenetic analyses to further our understanding about the evolutionary consequences of polyploidy in eukaryotes and about the biodiversity and community ecology of red algae.

The importance of polyploidy to the lineage success of eukaryotes is a topic of enduring debate. The scholarly discussion continues on whether polyploids are evolutionary ‘dead-ends’ or represent stories of evolutionary success and what role polyploidy plays in the speciation and diversification of eukaryotes. I have contributed to this discussion via the studies presented in Chapters 2 and 3, as well as other studies that have been published but are not presented in this dissertation (Mayrose et al., 2011; Mayrose et al., 2015; Mandáková et al., 2017). In these studies, I applied likelihood-based methods to infer trait-dependent diversification, specifically BiSSE and BiSSE-ness/ClaSSE. These methods offer a statistical framework to test whether a trait such as ploidy is correlated with changes in speciation, extinction, and net diversification rates. Criticisms of and cautions about these methods have been raised (Davis et al., 2013; Maddison & FitzJohn, 2015; Rabosky & Goldberg, 2015). Below, I discuss the findings of Chapters 2 and 3 and how they should be interpreted in light of those criticisms.
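For reference, the BiSSE likelihood is usually written as a pair of coupled differential equations per character state, integrated backward in time along each branch. The block below is the standard textbook form of those equations for state 0 (the equations for state 1 follow by swapping the indices); it is not specific to the analyses of Chapters 2 and 3, and the symbols are notation introduced here for illustration.

```latex
\begin{align}
\frac{dD_{N0}}{dt} &= -(\lambda_0 + \mu_0 + q_{01})\,D_{N0}(t) + q_{01}\,D_{N1}(t) + 2\lambda_0\,E_0(t)\,D_{N0}(t) \\
\frac{dE_0}{dt} &= \mu_0 - (\lambda_0 + \mu_0 + q_{01})\,E_0(t) + q_{01}\,E_1(t) + \lambda_0\,E_0(t)^2 \\
r_i &= \lambda_i - \mu_i, \qquad i \in \{0, 1\}
\end{align}
```

Here $D_{Ni}(t)$ is the probability that a lineage in state $i$ at time $t$ gives rise to the observed clade $N$, $E_i(t)$ is the probability that such a lineage leaves no descendants at the present, $\lambda_i$ and $\mu_i$ are the state-specific speciation and extinction rates, and $q_{01}$ is the transition rate from state 0 to state 1. A test of ploidy-dependent diversification then compares a model in which the net diversification rates $r_0$ and $r_1$ may differ against one in which they are constrained to be equal.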

6.1.1 Polyploidization and evolutionary success in fish

In Chapter 2, I examined whether polyploid lineages in fish are more successful than their diploid relatives (Zhan et al., 2014). We applied BiSSE to test whether polyploid and diploid lineages diversify at different rates in four fish clades with an appreciable amount of polyploidization: sturgeons (Acipenseridae), botiid loaches (Botiidae), salmonids (Salmoniformes), and cyprinid fishes (Cyprininae). By combining evidence across these four clades, we found no overall association between net diversification and polyploidy in fish (but in Cyprininae, we detected a positive, albeit marginally significant, correlation between polyploidy and diversification). It has been proposed that polyploidy is an important component of eukaryote evolution (e.g., Soltis et al., 2009; Fawcett & Van de Peer, 2010; Van de Peer et al., 2017), but in this study we found no overall support for the view that polyploidy leads to lineage success in fish. Since the publication of this study, new data and advances in phylogenetic methods have considerably furthered our understanding of the effect of polyploidization on diversification in plants (e.g., Landis et al., 2018; Zenil-Ferguson et al., 2019). But in fish (and animals in general), this discussion has lagged behind. Below, I explain some reasons for this and suggest some potentially fruitful analyses to advance the discussion in fish (and perhaps other animals with polyploid lineages).

Due to the rarity of polyploidy in fish (and animals in general), it is difficult to robustly test the relationship between evolutionary success and polyploidy in this organismal group. Our study was limited by scarce data: phylogenetic and ploidy data were available for four fish clades, and only two clades (the sturgeons and cyprinid fishes) underwent multiple diploid-to-polyploid transitions. Although no correlation between polyploidy and net diversification was detected in the analysis of the botiid loaches and salmonids, the analysis is still subject to the criticism raised by Maddison & FitzJohn (2015) and Rabosky & Goldberg (2015), namely that a neutrally evolving character could be associated with differences in speciation, extinction, or net diversification rates along an empirical phylogeny, particularly when there is only a single character transition that could be coincidentally correlated with a shift in evolutionary rates. The analysis of the sturgeons did not reveal any association between net diversification and polyploidy in the group, but the clade is small (24 sequenced taxa), so BiSSE would not have had sufficient statistical power even if there were indeed an association.

Lastly, we did find a marginally significant, positive correlation between net diversification and polyploidy in the cyprinid fishes, in which we inferred nine diploid-to-polyploid transitions using ChromEvol. One recent study (Li & Guo, 2020) citing Chapter 2 also reported a positive association between the rate of net diversification and polyploidy in the cyprinid fishes (using updated sequence and ploidy level data and BAMM). As highlighted by Rabosky & Goldberg (2015), this instance of association could still be a false positive due to a neutrally evolving trait or some other process affecting diversification but not included in the model. Analysis of this clade needs to be revisited by accounting for a hidden state (using the Hidden State Speciation and Extinction model; Beaulieu & O’Meara, 2016) as the next step to determine whether the signal is a statistical artefact. Recently, Zenil-Ferguson et al. (2019) re-analyzed the ploidy data of Solanaceae using various diversification models. When considering only ploidy, Zenil-Ferguson et al. (2019) detected a difference in diversification rates between polyploid and diploid lineages. However, when modelling breeding system (self-compatible versus self-incompatible lineages) as well as accounting for an additional hidden character, the authors found that ploidy no longer explains the difference in diversification rate. Neither the study of Chapter 2 nor Li & Guo (2020) considers a hidden state, which may better explain the putative association between polyploidy and net diversification; this hypothesis remains to be tested in a future study using updated data and methods. Additionally, Davis et al. (2013), published around the time when Chapter 2 was submitted for publication, conducted simulation-based analyses to caution that BiSSE results need to be interpreted carefully when the sample size is small (fewer than 300 tip taxa, but especially when fewer than 100 tip taxa) and/or when the tip ratio bias is high (one state present in over 90% of the tip taxa). At low sample sizes, the power of BiSSE to detect differences in speciation and extinction rates is poor (Davis et al., 2013). This may explain the lack of a correlation between polyploidy and net diversification in the sturgeon (n = 24), botiid loach (n = 35), and salmonid (n = 69) data sets. When sample size is sufficient (at least 300 tip taxa), tip ratio bias needs to be considered. In the case of the cyprinid fishes (n = 329), 71% of the sampled taxa are polyploids as inferred using ChromEvol. Thus, the marginally significant correlation between polyploidy and net diversification we observed in this clade may not be subject to the cautions of Davis et al. (2013), but the correlation may still be confounded by the issues raised by Maddison & FitzJohn (2015) and Rabosky & Goldberg (2015).
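The sample-size and tip-ratio cautions summarized above can be applied mechanically to the four fish data sets. The sketch below encodes only the thresholds quoted in the text (fewer than 300 tips, fewer than 100 tips, and one state in over 90% of tips); the function name and the wording of the messages are hypothetical, and the state fraction is supplied only for the cyprinid fishes because it is the only one reported here.

```python
def bisse_cautions(n_tips, state_fraction=None):
    """Return the sample-size and tip-ratio cautions that apply to a data set.

    n_tips: number of tip taxa in the phylogeny.
    state_fraction: fraction of tips in one of the two states, if known.
    """
    cautions = []
    if n_tips < 100:
        cautions.append("fewer than 100 tips: power is especially poor")
    elif n_tips < 300:
        cautions.append("fewer than 300 tips: power is poor")
    if state_fraction is not None:
        majority = max(state_fraction, 1.0 - state_fraction)
        if majority > 0.90:
            cautions.append("tip ratio bias: one state in over 90% of tips")
    return cautions


# Tip counts (and the polyploid fraction, where stated) taken from the text.
datasets = {
    "sturgeons": (24, None),
    "botiid loaches": (35, None),
    "salmonids": (69, None),
    "cyprinid fishes": (329, 0.71),
}

for name, (n_tips, fraction) in datasets.items():
    print(name, bisse_cautions(n_tips, fraction) or "no caution triggered")
```

Under these thresholds only the cyprinid data set escapes both cautions, which matches the reasoning in the paragraph above; the concerns about neutral traits and unmodelled processes are not captured by such a check.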

Rabosky & Goldberg (2015) brought up another concern, namely that among-lineage rate heterogeneity could lead to a false correlation between a trait and evolutionary rates. The authors proposed to first detect shifts in net diversification rates using BAMM (Rabosky, 2014) or MEDUSA (Alfaro et al., 2009) (which are likelihood-based methods to infer heterogeneity in the rates of lineage diversification or phenotypic evolution across a given phylogeny), and then apply that information to allow different regions of a phylogeny to evolve under different rates in a BiSSE-type analysis (they mentioned the example presented by FitzJohn (2010) on a continuous rather than a discrete trait, but in principle it should be applicable to any BiSSE-type analysis). While FitzJohn (2010) and Rabosky & Goldberg (2015) proposed a two-step procedure to model trait-dependent lineage diversification, among-lineage rate heterogeneity and trait evolution can now be jointly estimated using a recently proposed framework for phylogenetic modelling called RevBayes (Höhna et al., 2016).

Ultimately, in light of the criticisms of the BiSSE methodology (and other trait correlative tests), phylogenetic comparative methods may not be appropriate for examining whether there is an evolutionarily consequential relationship between whole genome duplication and lineage diversification in groups, like fish, with few transitions in few clades, which prevents robust analysis across clades.

6.1.2 Cladogenetic polyploidization in plants

Unlike in fish and other animals, occurrences of polyploidy are plentiful in plants. In Chapter 3, I tested whether there is a preponderance of phylogenetic evidence for cladogenetic polyploidization (i.e., speciation coinciding with polyploidy) versus anagenetic polyploidization (i.e., polyploidy occurring along branches). We applied the ClaSSE model, which permits diploid-to-polyploid transitions along branches (i.e., anagenesis) or at internal nodes (i.e., cladogenesis), to a large collection of phylogenetic data sets for many plant genera (Zhan et al., 2016). ClaSSE is a derivative of BiSSE and should suffer from the same statistical issues. We took a meta-analysis approach by leveraging phylogenetic signals of polyploidy across many clades (thereby gathering many independent pieces of evidence). We found evidence for cladogenetic polyploidization in 129 out of 223 genera examined (~58%), supporting the view that polyploidy plays a role in the formation of reproductive isolation and speciation in plants (Coyne & Orr, 2004). There may be more plant genera that have experienced cladogenetic polyploidization, but they might have been missed because the power of ClaSSE to detect cladogenetic ploidy shifts is affected by poor taxon sampling (or small clade size; Davis et al. (2013) analyzed the performance of BiSSE, but their cautions should be applicable to related methods such as ClaSSE) and high extinction rates (Magnuson-Ford & Otto, 2012). Freyman & Höhna (2018), too, found some evidence for cladogenetic polyploidization in plants. The authors developed a phylogenetic method that jointly models chromosome number evolution and lineage diversification (in Chapter 3, we took a two-stage process instead, first estimating ploidy level and then applying BiSSE) and re-analyzed data for five plant genera (using previously published data sets instead of automatically generated ones as in Chapter 3), finding model support for cladogenetic polyploidization in three of the five clades (60%). As pointed out previously (Maddison & FitzJohn, 2015; Rabosky & Goldberg, 2015), the meta-analysis approach we took in Chapter 3 (and also in Mayrose et al., 2010) is a potential way to mitigate the problem of a lack of replication (e.g., in the case of the botiid loaches, which experienced a single diploid-to-polyploid transition). Nevertheless, future studies would profitably revisit this question using the HiSSE method instead (as done in Zenil-Ferguson et al. (2019), but across many clades), to account for the effects of a hidden character when estimating diploid-to-polyploid transition rates, as well as differences in the rates of speciation, extinction, and net diversification in diploid versus polyploid lineages.

6.1.3 Do processes of chromosomal evolution other than polyploidization contribute to lineage success in plants?

In addition to polyploidization, other major processes of chromosomal evolution may play a role in speciation and diversification of lineages. Chromosomal rearrangements (fusion, fission, and translocations; called dysploidy) happen alongside polyploidization to shape eukaryote genomes (Schubert & Lysak, 2011). Dysploidy leads to single chromosome gains and losses. Together, polyploidization and dysploidy generate the widespread variation in chromosome numbers observed in plants, but debate about their importance to species diversification continues (Stebbins, 1971; Soltis & Soltis, 2000; Soltis et al., 2009). Recently, a phylogenetic study examined the rates and patterns of chromosome gain and loss events in 15 major clades of flowering plants. The authors found that chromosome gains and losses contribute to the high variation of chromosome numbers in plants much more than previously appreciated (Escudero et al., 2014). However, the authors did not closely examine the impact of chromosome gain and loss, or their impact relative to polyploidy, on species diversification (and also speciation and extinction) in plants. Presently, there are no other broad phylogenetic studies on the effect of chromosome gain or loss on the pattern of diversification in plants. At the time of this writing, I am completing a study in collaboration with Prof. Sarah P. Otto (UBC) and Prof. Michael S. Barker (University of Arizona) that explores the broad variation in the rates of chromosome number change across major angiosperm families and orders. In this study, we test whether the rates of chromosomal evolution are correlated with the rates of net diversification in the angiosperms, using a multi-clade meta-analysis approach. We confirm that the rates of polyploidy and dysploidy vary widely across angiosperm lineages and find empirical support that both components of chromosomal evolution contribute to speciation and lineage diversification in the angiosperms. I anticipate that future macroevolutionary studies of chromosomal evolution will focus more on dysploidy, an important component of chromosomal evolution that has been understudied compared to polyploidy.

Community and invasion ecology of the red algae

Chapters 2 and 3 demonstrate how phylogenetic analysis has enriched our discussion about the evolutionary consequences of polyploidy in fish and plants. Phylogenetic analysis can also be gainfully applied to explore the ecological processes forming the biodiversity of the red algae, as demonstrated in Chapters 4 and 5.

6.2.1 Human-mediated introductions of freshwater red algae

Anthropogenic activities have been harming or homogenizing biodiversity, for example, by trafficking invasive species around the globe. In Chapter 4, we examined the role of the global aquarium trade in dispersing freshwater red algae. We proposed that the aquarium trade is an overlooked mechanism of introduction of hitchhiking red algae and performed genetic analysis to identify several freshwater red algal taxa that have been introduced via this human-mediated route. We focused on the freshwater red algae of Taiwan (a hub in the global aquarium trade) and taxa that might have been introduced to Taiwan from other regions around the world. Here, we relied upon rbcL, which is a species-level marker, but we recognize that the use of population-level markers (such as microsatellite loci) may improve our ability to identify introduced taxa. Also, there is a sampling bias, as some regions appear to be more densely sampled (e.g., Brazil and the United States of America). Hence, we may be missing some source populations of taxa introduced to Taiwan. Similarly, we are unable to assess whether Taiwan or East Asia in general serves as the source of freshwater red algae introduced to other regions of the world. We hope that this study will encourage other researchers to pay more attention to freshwater red algae when conducting surveys of the field or aquaria. It is possible that some introduced freshwater red algae will become invasive in the future, and therefore we should not underestimate or dismiss their introduction potential. In a follow-up study in collaboration with Prof. Shao-Lun Liu (Tunghai University, Taiwan), I am assessing the invasion risk of the introduced freshwater red algae revealed in this study. We will perform ecological niche modelling to predict the habitat range expansion and contraction of the introduced algal taxa under several hypothetical global warming scenarios, particularly in Taiwan where occurrence data for those taxa are relatively rich.

6.2.2 New molecular tools to study the community ecology of the red algae

Studies of biodiversity and community ecology may benefit from better genetic markers that can simultaneously capture the evolutionary history of a focal clade of organisms and contain sufficient genetic information to estimate within- and between-community biological diversity. In Chapter 5, we developed a phylogenomics strategy that leverages existing genomic resources to propose candidate genetic markers for phylogenetic community analysis of the red algae. In that proof-of-concept work, we propose rpoC1 as an alternative and potentially superior marker to rbcL to study the phylogenetic community ecology of the red algae. However, adoption of a novel genetic marker is hampered by the unavailability of a good reference database.

To facilitate the potential adoption of rpoC1, I am initiating a follow-up study in collaboration with Prof. Shao-Lun Liu to build a database of rpoC1 sequences for a large collection of red algal species, to evaluate whether rpoC1 yields more information about the biodiversity of the red algae in comparison to rbcL and psbA (another commonly used marker in studies of the red algae), and to determine how many more mOTUs may be revealed when using either only rpoC1 or the three markers combined. Also, in this work, we did not thoroughly examine the effect of sampling bias (taxa of Rhodomelaceae comprised the majority of our data set) or the choice of topological distance on the ranking of candidate genetic markers. With those considerations in mind, we plan to refine our strategy and to update and expand our ranking of candidate genetic markers when more genomes (plastid, mitochondrial, and nuclear) of the red algae become available. We will continue to identify and experimentally test candidate genetic markers that are potentially useful for monitoring the biodiversity, and even tracking introductions, of the red algae. Although thus far we have examined only the red algae, our strategy may be employed to nominate candidate phylogenetic markers in other major groups of algae, such as the green and brown algae. The molecular tools and resources that we develop via these studies will contribute to our ultimate objective of understanding the ecological processes and environmental factors that influence the biodiversity and community assembly of myriad algae in diverse ecological habitats around the globe.

Final remarks

Phylogenetic analysis empowers evolutionary and ecological studies by enabling biologists to examine processes that shape biodiversity across time and lineages. In this dissertation, I present four studies that illustrate the types of evolutionary and ecological insights about different organismal groups that can be gleaned from phylogenetic analysis. Such insights could not have been obtained without putting the questions in a phylogenetic context.

Advancement of phylogenetic methodologies, by addressing some of the shortcomings discussed in this work and in others, may further our discussion about some contentious topics, such as the impact of polyploidy on lineage success in plants and animals, and may expand our knowledge about the biodiversity and ecology of some underexplored organismal groups, such as the red algae. Beyond evolutionary biology and ecology, phylogenetic analysis has a more far-reaching impact, yielding valuable insights into other scientific disciplines such as epidemiology (e.g., Gardy et al., 2011) and linguistics (Bouchard-Côté et al., 2013).

Bibliography

Ahrens, D., Fujisawa, T., Krammer, H. J., Eberle, J., Fabrizi, S., & Vogler, A. P. 2016. Rarity

and incomplete sampling in DNA-based species delimitation. Syst. Biol. 65: 478–494.

Akaike, H. 1974. A new look at the statistical model identification. IEEE Trans. Autom. Control.

19: 716–723.

Albertin, W., & Marullo, P. 2012. Polyploidy in fungi: evolution after whole-genome

duplication. Proc. R. Soc. B. 279: 2497–2509.

Alfaro, M. E., Santini, F., Brock, C., Alamillo, H., Dornburg, A., Rabosky, D. L., Carnevale, G.,

& Harmon, L. J. 2009. Nine exceptional radiations plus high turnover explain species

diversity in jawed vertebrates. Proc. Natl. Acad. Sci. U.S.A. 106: 13410–13414.

Allendorf, F. W., & Lundquist, L. L. 2003. Introduction: population biology, evolution, and

control of invasive species. Conserv. Biol. 17: 24–30.

Allendorf, F. W., & Thorgaard, G. H. 1984. Tetraploidy and the evolution of salmonid fishes. In:

Evolutionary Genetics of Fishes. (ed. Turner, B. J.), pp. 1–53. Plenum Press, New York,

USA.

Alves, M. J., Coelho, M. M., & Collares-Pereira, M. J. 2001. Evolution in action through

hybridisation and polyploidy in an Iberian freshwater fish: a genetic review. Genetica. 111:

375–385.

Armstrong, K. F., & Ball, S. L. 2005. DNA barcodes for biosecurity: invasive species

identification. Phil. Trans. R. Soc. B. 360: 1813–1823.

Barrett, S. C. H., & Husband, B. C. 1990. The genetics of plant migration and colonization. In

Plant Population Genetics, Breeding, and Genetic Resources (eds. Brown, A. H. D.;

115

Clegg, M. T.; Kahler, A. L.; Weir, B. S.), pp. 254–277. Sinauer Associates, Sunderland,

Massachusetts, USA.

Barringer, B. C. 2007. Polyploidy and self-fertilization in flowering plants. Am. J. Bot. 94: 1527–

1533.

Beaulieu, J. M., & Donoghue, M. J. 2013. Fruit evolution and diversification in campanulid

angiosperms. Evolution. 67: 3132–3144.

Beaulieu, J. M., & O’Meara, B. C. 2016. Detecting hidden diversification shifts in models of

trait-dependent speciation and extinction. Syst. Biol. 65: 583–601.

Berrebi, P., Kottelat, M., Skelton, P., & Ráb, P. 1996. Systematics of Barbus: state of the art and

heuristic comments. Folia Zool. 45: 5–12.

Birstein, V. J. 1993. Sturgeons and paddlefishes: threatened fishes in need of conservation.

Conserv. Biol. 7: 773–787.

Birstein, V. J., Hanner, R., & DeSalle, R. 1997. Phylogeny of the Acipenseriformes: cytogenetic

and molecular approaches. Environ. Biol. Fish. 48: 127–155.

Birstein, V. J., Doukakis, P., DeSalle, R., & McEachran, J. D. 2002. Molecular phylogeny of

Acipenseridae: nonmonophyly of Scaphirhynchinae. Copeia. 2002: 287–301.

Bittner, L., Halary, S., Payri, C., Cruaud, C., de Reviers, B., Lopez, P., & Bapteste, E. 2010.

Some considerations for analyzing biodiversity using integrative metagenomics and gene

networks. Biol. Direct. 5: 47.

Bonett, R. M., Kozak, K. H., Vieites, D. R., Bare, A., Wooten, J. A., & Trauth, S. E. 2007. The

importance of comparative phylogeography in diagnosing introduced species: a lesson

from the seal salamander, Desmognathus monticola. BMC Ecol. 7: 7.

Boo, G. H., Le Gall, L., Miller, K. A., Freshwater, D. W., Wernberg, T., Terada, R., Yoon, K. J.,

& Boo, S. M. 2016. A novel phylogeny of the Gelidiales (Rhodophyta) based on five genes

including the nuclear CesA, with descriptions of Orthogonacladia gen. nov. and

Orthogonacladiaceae fam. nov. Mol. Phylogenet. Evol. 101: 359–372.

Borowiec, M. L. 2016. AMAS: a fast tool for alignment manipulation and computing of

summary statistics. PeerJ. 4: e1660.

Bouchard-Côté, A., Hall, D., Griffiths, T. L., & Klein, D. 2013. Automated reconstruction of

ancient languages using probabilistic models of sound change. Proc. Natl. Acad. Sci.

U.S.A. 110: 4224–4229.

Bowers, J. E., Chapman, B. A., Rong, J., & Paterson, A. H. 2003. Unravelling angiosperm genome

evolution by phylogenetic analysis of chromosomal duplication events. Nature. 422: 433–

438.

Brodie, J., & Lewis, J. 2007. Unravelling the Algae: the Past, Present, and Future of Algal

Systematics. Systematics Association Special Volume Series, vol. 75. CRC Press, Boca

Raton, USA.

Brooks, T. M., Cuttelod, A., Faith, D. P., Garcia-Moreno, J., Langhammer, P., & Pérez-Espona,

S. 2015. Why and how might genetic and phylogenetic diversity be reflected in the

identification of key biodiversity areas? Philos. Trans. R. Soc. Lond. B Biol. Sci. 370:

20140019.

Broom, J. E. S., Hart, D. R., Farr, T. J., Nelson, W. A., Neill, K. F., Harvey, A. H., &

Woelkerling, W. J. 2008. Utility of psbA and nSSU for phylogenetic reconstruction in the

Corallinales based on New Zealand taxa. Mol. Phylogenet. Evol. 46: 958–973.

Burnham, K. P., & Anderson, D. R. 2002. Model Selection and Multimodel Inference: a

Practical Information-Theoretic Approach. Springer, New York, USA.

Carlile, A. L., & Sherwood, A. R. 2013. Phylogenetic affinities and distribution of the Hawaiian

freshwater red algae (Rhodophyta). Phycologia. 52: 309–319.

Carro, B., Lopez, L., Peña, V., Bárbara, I., & Barreiro, R. 2014. DNA barcoding allows the

accurate assessment of European maerl diversity: a Proof-of-Concept study. Phytotaxa.

190: 176–189.

Carstens, B. C., Pelletier, T. A., Reid, N. M., & Satler, J. D. 2013. How to fail at species

delimitation. Mol. Ecol. 22: 4369–4383.

Cavender-Bares, J., Kozak, K. H., Fine, P. V. A., & Kembel, S. W. 2009. The merging of

community ecology and phylogenetic biology. Ecol. Lett. 12: 693–715.

CBOL Plant Working Group. 2009. A DNA barcode for land plants. Proc. Natl. Acad. Sci.

U.S.A. 106: 12794–12797.

Chapman, B. A., Bowers, J. E., Feltus, F. A., & Paterson, A. H. 2006. Buffering of crucial

functions by paleologous duplicated genes may contribute cyclicality to angiosperm genome

duplication. Proc. Natl. Acad. Sci. U.S.A. 103: 2730–2735.

Chapman, M. A., & Abbott, R. J. 2010. Introgression of fitness genes across a ploidy barrier.

New Phytol. 186: 63–71.

Cho, C. H., Choi, J. W., Lam, D. W., Kim, K. M., & Yoon, H. S. 2018. Plastid genome analysis

of three Nemaliophycidae red algal species suggests environmental adaptation for iron

limited habitats. PLoS One. 13: e0196995.

Chou, J. Y., & Wang, W. L. 2006. Batrachospermum arcuatum Kylin (Batrachospermales,

Rhodophyta), a freshwater red alga newly recorded in Taiwan. Taiwania. 51: 58–63.

Chou, J. Y., Wen, Y. D., & Wang, W. L. 2014. Morphological and molecular data confirm new

records of three freshwater red algae, Batrachospermum macrosporum, Nemalionopsis

tortuosa and Caloglossa leprieurii in Taiwan. Nova Hedwigia. 98: 233–246.

Chou, J. Y., Liu, S. L., Wen, Y. D., & Wang, W. L. 2015. Phylogenetic analysis of Bangiadulcis

atropurpurea (A. Roth) W. A. Nelson and Bangia fuscopurpurea (Dillwyn) Lyngbye

(Bangiales, Rhodophyta) in Taiwan. Arch. Biol. Sci. 67: 445–454.

Cock, P. J. A., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., Friedberg, I.,

Hamelryck, T., Kauff, F., Wilczynski, B., & de Hoon, M. J. L. 2009. Biopython: freely

available Python tools for computational molecular biology and bioinformatics.

Bioinformatics. 25: 1422–1423.

Collares-Pereira, M. J. 1994. The karyology of barbins and the possible plesiomorphic condition

of polyploidy in Cyprinidae. Bull. Fr. Pêche Piscic. 334: 191–199.

Comai, L. 2005. The advantages and disadvantages of being polyploid. Nat. Rev. Genet. 6: 836–

846.

Costa, J. F., Lin, S. M., Macaya, E. C., Fernández-García, C., & Verbruggen, H. 2016.

Chloroplast genomes as a tool to resolve red algal phylogenies: a case study in the

Nemaliales. BMC Evol. Biol. 16: 205.

Coyne, J. A., & Orr, H. A. 2004. Speciation. Sinauer Associates, Inc., Sunderland, MA, USA.

Crespi, B. J., & Fulton, N. J. 2004. Molecular systematics of Salmonidae: combined nuclear data

yields a robust phylogeny. Mol. Phylogenet. Evol. 31: 658–679.

Cui, L., Wall, P. K., Leebens-Mack, J., Lindsay, B. G., Soltis, D. E., Doyle, J. J., et al. 2006.

Widespread genome duplications throughout the history of flowering plants. Genome Res.

16: 738–749.

Danzmann, R. G., Davidson, E. A., Ferguson, M. M., Gharbi, K., Koop, B. F., Høyheim, B., et al.

2008. Distribution of ancestral proto-Actinopterygian chromosome arms within the genomes

of 4R-derivative salmonid fishes (Rainbow trout and Atlantic salmon). BMC Genom. 9: 557.

Darriba, D., Taboada, G. L., Doallo, R., & Posada, D. 2012. jModelTest 2: more models, new

heuristics and parallel computing. Nat. Methods. 9: 772.

Daru, B. H., Elliott, T. L., Park, D. S., & Davies, T. J. 2017. Understanding the processes

underpinning patterns of phylogenetic regionalization. Trends Ecol. Evol. 32: 845–860.

Davis, M. P., Midford, P. E., & Maddison, W. 2013. Exploring power and parameter estimation

of the BiSSE method for analyzing species diversification. BMC Evol. Biol. 13: 38.

Dehal, P., & Boore, J. L. 2005. Two rounds of whole genome duplication in the ancestral

vertebrate. PLoS Biol. 3: e314.

Deiner, K., Bik, H. M., Mächler, E., Seymour, M., Lacoursière-Roussel, A., Altermatt, F., Creer,

S., Bista, I., Lodge, D. M., de Vere, N., Pfrender, M. E., & Bernatchez, L. 2017.

Environmental DNA metabarcoding: transforming how we survey animal and plant

communities. Mol. Ecol. 26: 5872–5895.

Díaz-Tapia, P., Maggs, C. A., West, J. A., & Verbruggen, H. 2017. Analysis of chloroplast

genomes and a supermatrix inform reclassification of the Rhodomelaceae (Rhodophyta). J.

Phycol. 53: 920–937.

Díaz-Tapia, P., Maggs, C. A., Macaya, E. C., & Verbruggen, H. 2018. Widely distributed red

algae often represent hidden introductions, complexes of cryptic species or species with

strong phylogeographic structure. J. Phycol. 54: 829–839.

Dlugosch, K. M., & Parker, I. M. 2008. Founding events in species invasions: genetic variation,

adaptive evolution, and the role of multiple introductions. Mol. Ecol. 17: 431–449.

Duggan, I. C., & Pullan, S. G. 2017. Do freshwater aquaculture facilities provide an invasion risk

for zooplankton hitchhikers? Biol. Invasions. 19: 307–314.

Duggan, I. C., Champion, P. D., & MacIsaac, H. J. 2018. Invertebrates associated with aquatic

plants bought from aquarium stores in Canada and New Zealand. Biol. Invasions. 20:

3167–3178.

Dupuis, J. R., Roe, A. D., & Sperling, F. A. 2012. Multi-locus species delimitation in closely

related animals and fungi: one marker is not enough. Mol. Ecol. 21: 4422–4436.

Durand, J. D., Tsigenopoulos, C. S., Ünlü, E., & Berrebi, P. 2002. Phylogeny and biogeography

of the family Cyprinidae in the Middle East inferred from cytochrome b DNA – evolutionary

significance of this region. Mol. Phylogenet. Evol. 22: 91–100.

Edgar, R. C. 2004. MUSCLE: multiple sequence alignment with high accuracy and high

throughput. Nucleic Acids Res. 32: 1792–1797.

Edgar, R. C. 2010. Search and clustering orders of magnitude faster than BLAST.

Bioinformatics. 26: 2460–2461.

Eschmeyer, W. N., & Fong, J. D. 2012. Species of Fishes by family/subfamily. World Wide Web

electronic publication. https://www.calacademy.org/scientists/projects/eschmeyers-catalog-of-fishes.

Version 03/2012.

Escudero, M., Martín-Bravo, S., Mayrose, I., Fernández-Mazuecos, M., Fiz-Palacios, O., Hipp,

A. L., Pimentel, M., Jiménez-Mejías, P., Valcárcel, V., Vargas, P., & Luceño, M. 2014.

Karyotypic changes through dysploidy persist longer over evolutionary time than polyploid

changes. PLoS One. 9: e85266.

Esselstyn, J. A., Evans, B. J., Sedlock, J. L., Anwarali Khan, F. A., & Heaney, L. R. 2012.

Single-locus species delimitation: a test of the mixed Yule-coalescent model, with an

empirical application to Philippine round-leaf bats. Proc. R. Soc. B. 279: 3678–3686.

Faith, D. P. 1992. Conservation evaluation and phylogenetic diversity. Biol. Conserv. 61: 1–10.

Fawcett, J., & Van de Peer, Y. 2010. Angiosperm polyploids and their road to evolutionary

success. Trends Evol. Biol. 2: 16–21.

Felsenstein, J. 2003. Inferring Phylogenies. Sinauer Associates, Sunderland, MA, USA.

Ferris, S. D. 1984. Tetraploidy and the evolution of the catostomid fishes. In: Evolutionary

Genetics of Fishes. (ed. Turner, B. J.), pp. 55–93. Plenum Press, New York, USA.

Fisher, R. A. 1935. The sheltering of lethals. Am. Nat. 69: 446–455.

FitzJohn, R. G., Maddison, W. P., & Otto, S. P. 2009. Estimating trait-dependent speciation and

extinction rates from incompletely resolved phylogenies. Syst. Biol. 58: 595–611.

FitzJohn, R. G. 2010. Quantitative traits and diversification. Syst. Biol. 59: 619–633.

FitzJohn, R. G. 2012. Diversitree: comparative phylogenetic analyses of diversification in R.

Methods Ecol. Evol. 3: 1084–1092.

Freshwater, D. W., Fredericq, S., Butler, B. S., Hommersand, M. H., & Chase, M. W. 1994. A

gene phylogeny of the red algae (Rhodophyta) based on plastid rbcL. Proc. Natl. Acad.

Sci. U.S.A. 91: 7281–7285.

Freshwater, D. W., Tudor, K., O’Shaughnessy, K., & Wysor, B. 2010. DNA barcoding in the red

algal order Gelidiales: comparison of COI with rbcL and verification of the "barcoding

gap". Cryptogamie, Algol. 31: 435–449.

Freyman, W. A., & Höhna, S. 2018. Cladogenetic and anagenetic models of chromosome

number evolution: a Bayesian model averaging approach. Syst. Biol. 67: 195–215.

Froese, R., & Pauly, D. 2012. FishBase. World Wide Web electronic publication.

www.fishbase.org. Version 02/2012.

Fuelling, L. J., Adams, J. A., Badik, K. J., Bixby, R. J., Caprette, C. L., Caprette, H. E.,

Chiasson, W. B., Davies, C. L., Decolibus, D. T., Glascock, K. I., Hall, M. M., Perry, W.

L., Schultz, E. R., Taylor, D. A., Vis, M. L., & Verb, R. G. 2012. An unusual occurrence of

Thorea hispida (Thore) Desvaux chantransia on rusty crayfish in West Central Ohio. Nova

Hedwigia. 94: 355–366.

Gaffaroğlu, M., & Yüksel, E. 2004. Karyotype analysis of Cyprinion macrostomus Heckel, 1843

(Pisces: Cyprinidae). J Kirsehir Education Faculty. 5: 235–239.

Gardy, J. L., Johnston, J. C., Ho Sui, S. J., Cook, V. J., Shah, L., Brodkin, E., Rempel, S., Moore,

R., Zhao, Y., Holt, R., Varhol, R., Birol, I., Lem, M., Sharma, M. K., Elwood, K., Jones, S.

J., Brinkman, F. S., Brunham, R. C., & Tang, P. 2011. Whole-genome sequencing and

social-network analysis of a tuberculosis outbreak. N. Engl. J. Med. 364: 730–739.

Glick, L., & Mayrose, I. 2014. ChromEvol: Assessing the pattern of chromosome number

evolution and the inference of polyploidy along a phylogeny. Mol. Biol. Evol. 31: 1914–

1922.

Goldberg, E. E., & Igic, B. 2012. Tempo and mode in plant breeding system evolution.

Evolution. 66: 3701–3709.

Goldberg, E. E., Kohn, J. R., Lande, R., Robertson, K. A., Smith, S. A., & Igic, B. 2010. Species

selection maintains self-incompatibility. Science. 330: 459–460.

Gomolińska, A. M., Szczecińska, M., Sawicki, J., Krawczyk, K., & Szkudlarz, P. 2017.

Phylogenetic analysis of selected representatives of the genus Erica based on the genes

encoding the DNA-dependent RNA polymerase I. Biodiv. Res. Conserv. 46: 1–18.

Gorshkova, G., Gorshkov, S., & Golani, D. 2002. Karyotypes of Barbus canis and Capoeta

damascina (Pisces, Cyprinidae) from the Middle East. Italian J. Zool. 69: 191–194.

Grant, V. 1963. The Origins of Adaptation. Columbia University Press, New York, New York,

USA.

Gregory, T. R. 2012. Animal Genome Size Database (release 2.0). World Wide Web electronic

publication. http://www.genomesize.com.

Gregory, T. R. 2016. Animal Genome Size Database. World Wide Web electronic publication.

http://www.genomesize.com.

Guégan, J. F., Ráb, P., Machordom, A., & Doadrio, I. 1995. New evidence of hexaploidy in ‘large’

African Barbus with some considerations on the origin of hexaploidy. J. Fish. Biol. 47: 192–

198.

Guindon, S., & Gascuel, O. 2003. A simple, fast and accurate method to estimate large

phylogenies by maximum-likelihood. Syst. Biol. 52: 696–704.

Guiry, M. D., & Guiry, G. M. 2019. AlgaeBase. World Wide Web electronic publication.

http://www.algaebase.org. Accessed on July 4, 2019.

Guiry, M. D., & Guiry, G. M. 2020. AlgaeBase. World Wide Web electronic publication.

https://www.algaebase.org. Accessed on March 7, 2020.

Haas, B. J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P. D., Bowden, J., et al. 2013.

De novo transcript sequence reconstruction from RNA-Seq: reference generation and

analysis with Trinity. Nat. Protoc. 8: 1494–1512.

Haldane, J. B. S. 1933. The Causes of Evolution. Longwood Green, London, United Kingdom.

Hall, T. A. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis

program for Windows 95/98/NT. Nuc. Acids Symp. Ser. 41: 95–98.

Harlan, J. R., & de Wet, J. M. J. 1975. On Ö. Winge and a prayer: The origins of polyploidy. Bot.

Rev. 41: 361–390.

Harmon, L. J. 2018. Phylogenetic Comparative Methods: Learning from Trees. CreateSpace

Independent Publishing Platform.

Hawes, I., Howard-Williams, C., Wells, R. D. S., & Clayton, J. S. 1991. Invasion of water net,

Hydrodictyon reticulatum: the surprising success of an aquatic plant new to our flora. New

Zealand J. Marine Freshwater Res. 25: 227–229.

He, D., & Chen, Y. 2006. Biogeography and molecular phylogeny of the genus Schizothorax

(Teleostei: Cyprinidae) in China inferred from cytochrome b sequences. J. Biogeog. 33:

1448–1460.

Heise, W., Babik, W., Kubisz, D., & Kajtoch, L. 2015. A three-marker DNA barcoding approach

for ecological studies of xerothermic plants and herbivorous insects from central Europe.

Bot. J. Linn. Soc. 177: 576–592.

Hoegg, S., Brinkmann, H., Taylor, J. S., & Meyer, A. 2004. Phylogenetic timing of the fish-

specific genome duplication correlates with the diversification of teleost fish. J. Mol. Evol.

59: 190–203.

Höhna, S., Landis, M. J., Heath, T. A., Boussau, B., Lartillot, N., Moore, B. R., Huelsenbeck, J.

P., & Ronquist, F. 2016. RevBayes: Bayesian phylogenetic inference using graphical

models and an interactive model-specification language. Syst. Biol. 65: 726–736.

Hsieh, C. J., Zhan, S. H., Lin, Y. C., Tang, S. L., & Liu, S. L. 2015. Analysis of rbcL sequences

reveals the global biodiversity, community structure, and biogeographical pattern of

thermoacidophilic red algae (Cyanidiales). J. Phycol. 51: 682–694.

Hsieh, C. J., Zhan, S. H., Liao, C. P., Tang, S. L., Wang, L. C., Watanabe, T., Geraldino, P. J., &

Liu, S. L. 2018. The effects of contemporary selection and dispersal limitation on the

community assembly of acidophilic microalgae. J. Phycol. 54: 720–733.

Hsieh, T. C., Ma, K. H., & Chao, A. 2016. iNEXT: an R package for rarefaction and

extrapolation of species diversity (Hill numbers). Methods Ecol. Evol. 7: 1451–1456.

Huelsenbeck, J. P., Nielsen, R., & Bollback, J. P. 2003. Stochastic mapping of morphological

characters. Syst. Biol. 52: 131–158.

Huelsenbeck, J. P., & Ronquist, F. 2001. MRBAYES: Bayesian inference of phylogenetic trees.

Bioinformatics. 17: 754–755.

Hugall, A. F., & Stuart-Fox, D. 2012. Accelerated speciation in colour-polymorphic birds.

Nature. 485: 631–634.

Hugerth, L., & Andersson, A. F. 2017. Analysing microbial community composition through

amplicon sequencing: from sampling to hypothesis testing. Front. Microbiol. 8: 1561.

Ishiguro, N. B., Miya, M., & Nishida, M. 2003. Basal euteleostean relationships: a mitogenomic

perspective on the phylogenetic reality of the "Protacanthopterygii". Mol. Phylogenet. Evol.

27: 476–488.

Jaillon, O., Aury, J. M., Brunet, F., Petit, J. L., Stange-Thomann, N., Mauceli, E., et al. 2004.

Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate

proto-karyotype. Nature. 431: 946–957.

Janouškovec, J., Liu, S. L., Martone, P., Collén, J., & Keeling, P. J. 2013. Evolution of red algal

plastid genomes: ancient architectures, introns, horizontal gene transfer and taxonomic

utility of plastid markers. PLoS One. 8: e59001.

Jiao, Y., Wickett, N. J., Ayyampalayam, S., Chanderbali, A. S., Landherr, L., Ralph, P. E., et al.

2011. Ancestral polyploidy in seed plants and angiosperms. Nature. 473: 97–100.

Johnson, K. R., Wright, J. E. J., & May, B. 1987. Linkage relationships reflecting ancestral

tetraploidy in salmonid fish. Genetics. 116: 579–591.

Johnston, E. T., Lim, P. E., Buhari, N., Keil, E. J., Djawad, M. I., & Vis, M. L. 2014. Diversity

of freshwater red algae (Rhodophyta) in Malaysia and Indonesia from morphological and

molecular data. Phycologia. 53: 329–341.

Johnston, E. T., Dixon, K. R., West, J. A., Buhari, N., & Vis, M. L. 2018. Three gene phylogeny

of the Thoreales (Rhodophyta) reveals high species diversity. J. Phycol. 54: 159–170.

Kaštovský, J., Hauer, T., Mareš, J., Krautová, M., Beša, T., Komárek, J., Desirtivá, B., Heteša,

J., Hindáková, A., Houk, V., Janeček, E., Kopp, R., Marvan, P., Pumann, P., Skácelová,

O., & Zapomělová, E. 2010. A review of the alien and expansive species of freshwater

cyanobacteria and algae in the Czech Republic. Biol. Invasions. 12: 3599–3625.

Kato, A., Morita, N., Hiratsuka, T., & Suda, S. 2009. Recent introduction of a freshwater red

alga Chantransia macrospora (Batrachospermales, Rhodophyta) to Okinawa, Japan. Aquat.

Invasions. 4: 567–574.

Katoh, K., & Standley, D. M. 2013. MAFFT multiple sequence alignment software version 7:

improvements in performance and usability. Mol. Biol. Evol. 30: 772–780.

Kaufmann, B. 2010. Algen-Fibel Aquarium: Kein Problem mit Süßwasseralgen, pp. 94. Dähne

Verlag Press, Germany.

Keller, M. J., & Gerhardt, H. C. 2001. Polyploidy alters advertisement call structure in gray

treefrogs. Proc. R. Soc. B. 268: 341–345.

Kellis, M., Birren, B. W., & Lander, E. S. 2004. Proof and evolutionary analysis of ancient genome

duplication in the yeast Saccharomyces cerevisiae. Nature. 428: 617–624.

Kembel, S. W., Cowan, P. D., Helmus, M. R., Cornwell, W. K., Morlon, H., Ackerly, D. D.,

Blomberg, S. P., & Webb, C. O. 2010. Picante: R tools for integrating phylogenies and

ecology. Bioinformatics. 26: 1463–1464.

Kılıç-Demirok, N., & Ünlü, E. 2001. Karyotypes of Cyprinid fish Capoeta trutta and Capoeta

capoeta umbla (Cyprinidae) from the Tigris River. Tr. J. Zool. 25: 389–395.

Kinziger, A. P., Nakamoto, R. J., Anderson, E. C., & Harvey, B. C. 2011. Small founding

number and low genetic diversity in an introduced species exhibiting limited invasion

success (speckled dace, Rhinichthys osculus). Ecol. Evol. 1: 73–84.

Knowles, L. L., & Carstens, B. C. 2007. Delimiting species without monophyletic gene trees.

Syst. Biol. 56: 887–895.

Kotlik, P., Tsigenopoulos, C. S., Ráb, P., & Berrebi, P. 2002. Two new Barbus species from the

Danube River basin, with redescription of B. petenyi (Teleostei: Cyprinidae). Folia Zool.

51: 227–240.

Kottelat, M. 2004. Botia kubotai, a new species of loach (Teleostei: Cobitidae) from the Ataran

River basin (Myanmar), with comments on botiine nomenclature and diagnoses of two new

genera. Zootaxa. 401: 1–18.

Krehenwinkel, H., Pomerantz, A., Henderson, J. B., Kennedy, S. R., Lim, J. Y., Swamy, V.,

Shoobridge, J. D., Graham, N., Patel, N. H., Gillespie, R. G., & Prost, S. 2019. Nanopore

sequencing of long ribosomal DNA amplicons enables portable and simple biodiversity

assessments with high phylogenetic resolution across broad taxonomic scale. Gigascience.

8: giz006.

Kress, W. J., Erickson, D. L., Jones, F. A., Swenson, N. G., Perez, R., Sanjur, O., &

Bermingham, E. 2009. Plant DNA barcodes and a community phylogeny of a tropical

forest dynamics plot in Panama. Proc. Natl. Acad. Sci. U.S.A. 106: 18621–18626.

Krieger, J., Hett, A. K., Fuerst, P. A., Artyukhin, E., & Ludwig, A. 2008. The molecular phylogeny

of the order Acipenseriformes revisited. J. Appl. Ichthyol. 24: 36–45.

Lahaye, R., van der Bank, M., Bogarin, D., Warner, J., Pupulin, F., Gigot, G., Maurin, O.,

Duthoit, S., Barraclough, T. G., & Savolainen, V. 2008. DNA barcoding the floras of

biodiversity hotspots. Proc. Natl. Acad. Sci. U.S.A. 105: 2923–2928.

Lam, D. W., Verbruggen, H., Saunders, G. W., & Vis, M. L. 2016. Multigene phylogeny of the

red algal subclass Nemaliophycidae. Mol. Phylogenet. Evol. 94: 730–736.

Landis, J. B., Soltis, D. E., Li, Z., Marx, H. E., Barker, M. S., Tank, D. C., & Soltis, P. S. 2018.

Impact of whole-genome duplication events on diversification rates in angiosperms. Am. J.

Bot. 105: 348–363.

Lanfear, R., Calcott, B., Kainer, D., Mayer, C., & Stamatakis, A. 2014. Selecting optimal

partitioning schemes for phylogenomic datasets. BMC Evol. Biol. 14: 82.

Lanfear, R., Frandsen, P. B., Wright, A. M., Senfeld, T., & Calcott, B. 2016. PartitionFinder 2:

new methods for selecting partitioned models of evolution for molecular and

morphological phylogenetic analyses. Mol. Biol. Evol. 34: 772–773.

Le Comber, S. C., & Smith, C. 2004. Polyploidy in fishes: patterns and processes. Biol. J. Linn.

Soc. 82: 431–442.

Le Roux, J. J., Brown, G. K., Byrne, M., Ndlovu, J., Richardson, D. M., Thompson, G. D., &

Wilson, J. R. U. 2011. Phylogeographic consequences of different introduction histories of

invasive Australian Acacia species and Paraserianthes lophantha (Fabaceae) in South

Africa. Divers. Distrib. 17: 861–871.

Leggatt, R. A., & Iwama, G. K. 2003. Occurrence of polyploidy in the fishes. Rev. Fish Biol.

Fish. 13: 237–246.

Lei, R., Rowley, T. W., Zhu, L., Bailey, C. A., Engberg, S. E., Wood, M. L., Christman, M. C.,

Perry, G. H., Louis, E. E. Jr., & Lu, G. 2012. PhyloMarker—a tool for mining phylogenetic

markers through genome comparison: application of the mouse lemur (genus Microcebus)

phylogeny. Evol. Bioinform. 8: 423–435.

Leigh, J. W., & Bryant, D. 2015. PopART: Full-feature software for haplotype network

construction. Methods Ecol. Evol. 6: 1110–1116.

Leliaert, F., Verbruggen, H., Vanormelingen, P., Steen, F., López-Bautista, J. M., Zuccarello, G.

C., & De Clerck, O. 2014. DNA-based species delimitation in algae. Eur. J. Phycol. 49:

179–196.

Lepage, T., Bryant, D., Philippe, H., & Lartillot, N. 2007. A general comparison of relaxed

molecular clock models. Mol. Biol. Evol. 24: 2669–2680.

Levente, V., & Bud, I. 2010. Effects of hydrogen peroxide on Compsopogon caeruleus

(Rhodophycophyta) and two superior plants. AACL Bioflux. 3: 367–372.

Levin, B. A., Freyhof, J., Lajbner, Z., Perea, S., Abdoli, A., Gaffaroğlu, M., et al. 2012.

Phylogenetic relationships of the algae scraping cyprinid genus Capoeta (Teleostei:

Cyprinidae). Mol. Phylogenet. Evol. 62: 542–549.

Levin, D. A. 1975. Minority cytotype exclusion in local plant populations. Taxon. 24: 35–43.

Levin, D. A. 1983. Polyploidy and novelty in flowering plants. Am. Nat. 122: 1–25.

Levin, D. A. 2012. The long wait for hybrid sterility in flowering plants. New Phytol. 196: 666–

670.

Li, C., Lu, G., & Ortí, G. 2008. Optimal data partitioning and a test case for ray-finned fishes

(Actinopterygii) based on ten nuclear loci. Syst. Biol. 57: 519–539.

Li, L., Stoeckert Jr., C. J., & Roos, D. S. 2003. OrthoMCL: identification of ortholog groups for

eukaryotic genomes. Genome Res. 13: 2178–2189.

Li, X., & Guo, B. 2020. Substantially adaptive potential in polyploid cyprinid fishes: evidence

from biogeographic, phylogenetic and genomic studies. Proc. R. Soc. B. 287: 20193008.

Lin, C. K., & Blum, J. L. 1977. Recent invasion of a red alga (Bangia atropurpurea) in Lake

Michigan. J. Fish. Res. Board Canada. 34: 2413–2416.

Lin, S. M., Fredericq, S., & Hommersand, M. H. 2001. Systematics of the Delesseriaceae

(Ceramiales, Rhodophyta) based on LSU rDNA and rbcL sequences, including the

Phycodryoideae, subfam. nov. J. Phycol. 37: 881–899.

Liu, S. L., Wang, L. C., & Wang, W. L. 2004. Inorganic carbon utilization of the freshwater red

alga Compsopogon coeruleus (Balbis) Montagne (Compsopogonaceae, Rhodophyta)

evaluated by in situ measurement of chlorophyll fluorescence. Taiwania. 49: 207–217.

López, J. A., Chen, W. J., & Ortí, G. 2004. Esociform phylogeny. Copeia. 2004: 449–464.

Ludwig, A. 2008. Identification of Acipenseriformes species in trade. J. Appl. Ichthyol. 24: 2–19.

Ludwig, A., Belfiore, N. M., & Pitra, C. 2001. Genome duplication events and functional reduction

of ploidy levels in sturgeons (Acipenser, Huso, and Scaphirhynchus). Genetics. 158: 1203–

1215.

Luther, H. 1979. Chara connivens in the Baltic Sea area. Annales Botanici Fennici. 16: 141–150.

Lysak, M. A. 2014. Live and let die: centromere loss during evolution of plant chromosomes.

New Phytol. 203: 1082–1089.

Mable, B. K., Alexandrou, M. A., & Taylor, M. I. 2011. Genome duplication in amphibians and

fish: an extended synthesis. J. Zool. 284: 151–182.

Machordom, A., & Doadrio, I. 2001a. Evolutionary history and speciation modes in the cyprinid

genus Barbus. Proc. R. Soc. Lond. 268: 1297–1306.

Machordom, A., & Doadrio, I. 2001b. Evidence of a Cenozoic Betic-Kabilian connection based

on freshwater fish phylogeography (Luciobarbus, Cyprinidae). Mol. Phylogenet. Evol. 18:

252–263.

Maddison, W. P., Midford, P. E., & Otto, S. P. 2007. Estimating a binary character's effect on

speciation and extinction. Syst. Biol. 56: 701–710.

Maddison, W. P., & Maddison, D. R. 2011. Mesquite: a modular system for evolutionary

analysis. Version 2.75. http://mesquiteproject.org.

Maddison, W. P., & FitzJohn, R. G. 2015. The unsolved challenge to phylogenetic correlation

tests for categorical characters. Syst. Biol. 64: 127–136.

Magnuson-Ford, K. S., & Otto, S. P. 2012. Linking the investigations of character evolution and

species diversification. Am. Nat. 180: 225–245.

Mandáková, T., Gloss, A. D., Whiteman, N. K., & Lysak, M. A. 2016. How diploidization

turned a tetraploid into a pseudotriploid. Am. J. Bot. 103: 1187–1196.

Mandáková, T., Pouch, M., Harmanová, K., Zhan, S. H., Mayrose, I., & Lysak, M. A. 2017.

Multispeed genome diploidization and diversification after an ancient allopolyploidization.

Mol. Ecol. 26: 6445–6462.

Mank, J. E., & Avise, J. C. 2006. Phylogenetic conservation of chromosome numbers in

Actinopterygiian fishes. Genetica. 127: 321–327.

Manny, B. A., Esdall, T., & Wujek, D. 1991. Compsopogon cf. coeruleus, a benthic red alga

(Rhodophyta) new to the Laurentian Great Lakes. Can. J. Bot. 69: 1237–1240.

Mariac, C., Vigouroux, Y., Duponchelle, F., García-Dávila, C., Nunez, J., Desmarais, E., &

Renno, J. F. 2018. Metabarcoding by capture using a single COI probe (MCSP) to identify

and quantify fish species in ichthyoplankton swarms. PLoS One. 13: e0202976.

Marková, S., Šanda, R., Crivelli, A., Shumka, S., Wilson, I. F., Vukić, J., et al. 2010. Nuclear and

mitochondrial DNA sequence data reveal the evolutionary history of Barbus (Cyprinidae) in

the ancient lake systems of the Balkans. Mol. Phylogenet. Evol. 55: 488–500.

Masterson, J. 1994. Stomatal size in fossil plants: evidence for polyploidy in majority of

angiosperms. Science. 264: 421–423.

Mayrose, I., Barker, M. S., & Otto, S. P. 2010. Probabilistic models of chromosome number

evolution and the inference of polyploidy. Syst. Biol. 59: 132–144.

Mayrose, I., Zhan, S. H., Rothfels, C. J., Magnuson-Ford, K. S., Barker, M. S., Rieseberg, L. H.,

& Otto, S. P. 2011. Recently-formed polyploid plants diversify at lower rates. Science.

333: 1257.

Mayrose, I., Zhan, S. H., Rothfels, C. J., Arrigo, N., Barker, M. S., Rieseberg, L. H., & Otto, S.

P. 2015. Methods for studying polyploid diversification and the dead end hypothesis: a

reply to Soltis et al. New Phytol. 206: 27–35.

Meyers, L. A., & Levin, D. A. 2006. On the abundance of polyploids in flowering plants.

Evolution. 60: 1198–1206.

Muirhead, J. R., Gray, D. K., Kelly, D. W., Ellis, S. M., Heath, D. D., & MacIsaac, H. J. 2008.

Identifying the source of species invasions: sampling intensity vs. genetic diversity. Mol.

Ecol. 17: 1020–1035.

Necchi Jr., O., Garcia Fo, A. S., Salomaki, E. D., West, J. A., Aboal, M., & Vis, M. L. 2013.

Global sampling reveals low genetic diversity within Compsopogon (Compsopogonales,

Rhodophyta). Eur. J. Phycol. 48: 152–162.

Nelson, W. A., Sutherland, J. E., Farr, T. J., Hart, D. R., Neill, K. F., Kim, H. J., & Yoon, H. S.

2015. Multi-gene phylogenetic analyses of New Zealand coralline algae: Corallinapetra

novaezelandiae gen. et sp. nov. and recognition of the Hapalidiales ord. nov. J. Phycol. 51:

454–468.

Novak, S. J., & Mack, R. N. 1993. Genetic variation in Bromus tectorum (Poaceae): comparison

between native and introduced populations. Heredity. 71: 167–176.

Novak, S. J., & Mack, R. N. 2005. Genetic bottlenecks in alien plant species: influences of

mating systems and introduction dynamics. In: Species Invasions: Insights into Ecology,

Evolution, and Biogeography (eds. Sax, D. F., Stachowicz, J. J., Gaines, S. D.), pp. 201–

228. Sinauer Associates, Sunderland, Massachusetts, USA.

Ohno, S. 1970. Evolution by gene duplication. Springer, New York, USA.

One Thousand Plant Transcriptomes Initiative. 2019. One thousand plant transcriptomes and the

phylogenomics of green plants. Nature. 574: 679–685.

Orr, H. A. 1990. ‘Why polyploidy is rarer in animals than in plants’ revisited. Am. Nat. 136: 759–

770.

Otto, S. P. 2007. The evolutionary consequences of polyploidy. Cell. 131: 452–462.

Otto, S. P., & Goldstein, D. B. 1992. Recombination and the evolution of diploidy. Genetics. 131:

745–751.

Otto, S. P., & Whitton, J. 2000. Polyploidy: incidence and evolution. Ann. Rev. Genet. 34: 401–

437.

Padilla, D. K., & Williams, S. L. 2004. Beyond ballast water: aquarium and ornamental trades as

sources of invasive species in aquatic ecosystems. Front. Ecol. Environ. 2: 131–138.

Palenik, B., & Swift, H. 1996. Cyanobacterial evolution and Prochlorophyte diversity as seen in

DNA-dependent RNA polymerase gene sequences. J. Phycol. 32: 638–646.

Paradis, E., Claude, J., & Strimmer, K. 2004. APE: Analyses of Phylogenetics and Evolution in

R language. Bioinformatics. 20: 289–290.

Patoka, J., Bláha, M., Kalous, L., Vrabec, V., Buřič, M., & Kouba, A. 2016. Potential pest

transfer mediated by international ornamental plant trade. Sci. Rep. 6: 25896.

Pečnikar, Z. F., & Buzan, E. V. 2014. 20 years since the introduction of DNA barcoding: from

theory to application. J. Applied Genet. 55: 43–52.

Peng, Z., Ludwig, A., Wang, D., Diogo, R., Wei, Q., & He, S. 2007. Age and biogeography of

major clades in sturgeons and paddlefishes (Pisces: Acipenseriformes). Mol. Phylogenet.

Evol. 42: 854–862.

Penn, O., Privman, E., Landan, G., Graur, D., & Pupko, T. 2010. An alignment confidence score

capturing robustness to guide tree uncertainty. Mol. Biol. Evol. 27: 1759–1767.

Pérez-Valera, E., Goberna, M., & Verdú, M. 2015. Phylogenetic structure of soil bacterial

communities predicts ecosystem functioning. FEMS Microbiol. Ecol. 91: fiv031.

Phillips, R. B., & Ráb, P. 2001. Chromosome evolution in the Salmonidae (Pisces): an update.

Biol. Rev. 76: 1–15.

Pons, J., Barraclough, T. G., Gomez-Zurita, J., Cardoso, A., Duran, D. P., Hazell, S., Kamoun,

S., Sumlin, W. D., & Vogler, A. P. 2006. Sequence-based species delimitation for the

DNA taxonomy of undescribed insects. Syst. Biol. 55(4): 595–609.

Porter, T. M., Shokralla, S., Baird, D., Golding, G. B., & Hajibabaei, M. 2016. Ribosomal DNA

and plastid markers used to sample fungal and plant communities from wetland soils

reveals complementary biotas. PLoS One. 11: e0142759.

Porter, T. M., & Hajibabaei, M. 2018. Scaling up: A guide to high-throughput genomic

approaches for biodiversity analysis. Mol. Ecol. 27: 313–338.

Posada, D. 2008. jModelTest: phylogenetic model averaging. Mol. Biol. Evol. 25: 1253–1256.

Puillandre, N., Lambert, A., Brouillet, S., & Achaz, G. 2012. ABGD, Automatic Barcode Gap

Discovery for primary species delimitation. Mol. Ecol. 21: 1864–1877.

R Core Team. 2015. R: a language and environment for statistical computing. R Foundation for

Statistical Computing, Vienna, Austria. http://www.R-project.org/.

R Core Team. 2018. R: a language and environment for statistical computing. R Foundation for

Statistical Computing, Vienna, Austria. http://www.R-project.org/.

Rabosky, D. L., Santini, F., Eastman, J., Smith, S. A., Sidlauskas, B., Chang, J., et al. 2013. Rates

of speciation and morphological evolution are correlated across the largest vertebrate

radiation. Nat. Commun. 4: 1958.

Rabosky, D. L. 2014. Automatic detection of key innovations, rate shifts, and diversity-

dependence on phylogenetic trees. PLoS One. 9: e89543.

Rabosky, D. L., & Goldberg, E. E. 2015. Model inadequacy and mistaken inferences of trait-

dependent speciation. Syst. Biol. 64: 340–355.

Ratnasingham, S., & Hebert, P. D. N. 2007. BOLD: The Barcode of Life Data System

(www.barcodinglife.org). Mol. Ecol. Notes 7: 355–364.

Rahel, F. J. 2007. Biogeographic barriers, connectivity and homogenization of freshwater faunas:

it's a small world after all. Freshwater Biol. 52: 696–710.

Ramsey, J., & Schemske, D. W. 1998. Pathways, mechanisms, and rates of polyploid formation

in flowering plants. Ann. Rev. Ecol. Syst. 29: 467–501.

Ramsey, J., & Schemske, D. W. 2002. Neopolyploidy in flowering plants. Ann. Rev. Ecol. Syst.

33: 589–639.

Rausch, J. H., & Morgan, M. T. 2005. The effect of self-fertilization, inbreeding depression, and

population size on autopolyploid establishment. Evolution. 59: 1867–1875.

Reynolds, C., Miranda, N. A., & Cumming, G. S. 2015. The role of waterbirds in the dispersal of

aquatic alien and invasive species. Divers. Distrib. 21: 744–754.

Revell, L. J. 2012. phytools: an R package for phylogenetic comparative biology (and other

things). Methods Ecol. Evol. 3: 217–223.

Rice, A., Glick, L., Abadi, S., Einhorn, M., Kopelman, N. M., Salman-Minkov, A., Mayzel, J., et

al. 2015. The Chromosome Counts Database (CCDB)–a community resource of plant

chromosome numbers. New Phytol. 206: 19–26.

Riedel, A., Sagata, K., Suhardjono, Y. R., Tänzler, R., & Balke, M. 2013. Integrative taxonomy

on the fast track - towards more sustainability in biodiversity research. Front. Zool. 10: 15.

Robertson, K., Goldberg, E. E., & Igic, B. 2010. Comparative evidence for the correlated evolution

of polyploidy and self-compatibility in Solanaceae. Evolution. 94: 1527–1533.

Robinson, D. F., & Foulds, L. R. 1981. Comparison of phylogenetic trees. Math. Biosci. 53:

131–147.

Roman, J., & Darling, J. A. 2007. Paradox lost: genetic diversity and the success of aquatic

invasions. Trends Ecol. Evol. 22: 454–464.

Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D. L., Darling, A., Höhna, S., Larget, B.,

Liu, L., Suchard, M. A., & Huelsenbeck, J. P. 2012. MrBayes 3.2: efficient Bayesian

phylogenetic inference and model choice across a large model space. Syst. Biol. 61: 539–

542.

Rozas, J., Ferrer-Mata, A., Sánchez-DelBarrio, J. C., Guirao-Rico, S., Librado, P., Ramos-

Onsins, S. E., & Sánchez-Gracia, A. 2017. DnaSP v6: DNA sequence polymorphism

analysis of large datasets. Mol. Biol. Evol. 34: 3299–3302.

Sabath, N., Goldberg, E. E., Glick, L., Einhorn, M., Ashman, T. L., Ming, R., Otto, S. P., et al.

2016. Dioecy does not consistently accelerate or slow lineage diversification across

multiple genera of angiosperms. New Phytol. 209: 1290–1300.

Santini, F. S., Harmon, L. J., Carnevale, G., & Alfaro, M. E. 2009. Did genome duplication drive

the origin of teleosts? A comparative study of diversification in ray-finned fishes. BMC Evol.

Biol. 9: 194.

Saunders, G. W., & Moore, T. E. 2013. Refinements for the amplification and sequencing of red

algal DNA barcode and RedToL phylogenetic markers: a summary of current primers,

profiles and strategies. Algae. 28: 31–43.

Scarpino, S. V., Levin, D. A., & Meyers, L. A. 2014. Polyploid formation shapes flowering plant

diversity. Am. Nat. 184: 456–465.

Schliep, K. P. 2011. phangorn: phylogenetic analysis in R. Bioinformatics. 27: 592–593.

Schloesser, D. W., Hudson, P. L., & Nichols, S. J. 1986. Distribution and habitat of Nitellopsis

obtusa (Characeae) in the Laurentian Great Lakes. Hydrobiologia. 133: 91–96.

Schubert, I., & Lysak, M. A. 2011. Interpretation of karyotype evolution should consider

chromosome structural constraints. Trends Genet. 27: 207–216.

Semon, M., & Wolfe, K. H. 2007. Consequences of genome duplication. Curr. Opin. Genet. Dev.

17: 505–512.

Sheath, R. G., & Hambrook, J. A. 1990. Freshwater ecology. In Biology of the Red Algae (eds.

Cole, K. M., & Sheath, R. G.), pp. 423–453. Cambridge University Press, New York,

USA.

Sheath, R. G., & Vis, M. L. 2015. Red Algae. In Freshwater Algae of North America (eds. Wehr, J.

D., Sheath, R. G., & Kociolek, J. P.), pp. 237–264. Elsevier Inc. Press, San Diego, CA,

USA.

Shokralla, S., Gibson, J., King, I., Baird, D., Janzen, D., Hallwachs, W., & Hajibabaei, M. 2016.

Environmental DNA barcode sequence capture: Targeted, PCR-free sequence capture for

biodiversity analysis from bulk environmental samples. bioRxiv. 87437.

Smith, M. A., Woodley, N. E., Janzen, D. H., Hallwachs, W., & Hebert, P. D. N. 2006. DNA

barcodes reveal cryptic host-specificity within the presumed polyphagous members of a

genus of parasitoid flies (Diptera: Tachinidae). Proc. Natl. Acad. Sci. U.S.A. 103: 3657–3662.

Stamatakis, A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of

large phylogenies. Bioinformatics. 30: 1312–1313.

Simberloff, D., Martin, J. L., Genovesi, P., Maris, V., Wardle, D. A., Aronson, J., Courchamp,

F., Galil, B., García-Berthou, E., Pascal, M., Pyšek, P., Sousa, R., Tabacchi, E., & Vilà, M.

2013. Impacts of biological invasions: what's what and the way forward. Trends Ecol.

Evol. 28: 58–66.

Simberloff, D. 2014. Biological invasions: What's worth fighting and what can be won? Ecol.

Eng. 65: 112–121.

Simpson, G. G. 1944. Tempo and Mode in Evolution. Columbia University Press, New York,

New York, USA.

Šlechtová, V., Bohlen, J., Freyhof, J., & Ráb, P. 2006. Molecular phylogeny of the Southeast

Asian freshwater fish family Botiidae (Teleostei: Cobitoidea) and the origin of polyploidy

in their evolution. Mol. Phylogenet. Evol. 39: 529–541.

Slotte, T., Huang, H., Lascoux, M., & Ceplitis, A. 2008. Polyploid speciation did not confer

instant reproductive isolation in Capsella (Brassicaceae). Mol. Biol. Evol. 25: 1472–1481.

Smith, S. A., & Dunn, C. W. 2008. Phyutility: a phyloinformatics tool for trees, alignments, and

molecular data. Bioinformatics. 24: 715–716.

Soltis, D. E., & Soltis, P. S. 1999. Polyploidy: Recurrent formation and genome evolution.

Trends Ecol. Evol. 14: 348–352.

Soltis, P. S., & Soltis, D. E. 2000. The role of genetic and genomic attributes in the success of

polyploids. Proc. Natl. Acad. Sci. U.S.A. 97: 7050–7057.

Soltis, D. E., Soltis, P. S., Schemske, D. W., Hancock, J. F., Thompson, J. N., Husband, B. C., &

Judd, W. S. 2007. Autopolyploidy in angiosperms: have we grossly underestimated the

number of species? Taxon. 56: 13–30.

Soltis, D. E., Albert, V. A., Leebens-Mack, J., Bell, C. D., Paterson, A. H., Zheng, C., et al. 2009.

Polyploidy and angiosperm diversification. Am. J. Bot. 96: 336–348.

Soltis, D. E., Segovia-Salcedo, M. C., Jordon-Thaden, I., Majure, L., Miles, N. M., Mavrodiev,

E. V., Mei, W., et al. 2014. Are polyploids really evolutionary dead-ends (again)? A

critical reappraisal of Mayrose et al. 2011. New Phytol. 202: 1105–1117.

Stebbins, G. L. 1938. Cytological characteristics associated with the different growth habits in

the dicotyledons. Am. J. Bot. 25: 189–198.

Stebbins, G. L. 1971. Chromosomal Evolution in Higher Plants. Edward Arnold Ltd., London,

United Kingdom.

Stoyneva, M., Vanhoutte, K., & Vyverman, W. 2006. First record of the tropical invasive alga

Compsopogon coeruleus (Balbis) Montagne (Rhodophyta) in Flanders (Belgium).

Advances in Phycological Studies: Festschrift in Honour of Prof. Dobrina Temniskova-

Topalova (eds. Ognjanova-Rumenova, N., & Manoylov, K.), pp. 203–212. PENSOFT

Publishers and University Publishing House Sofia-Moscow.

Strayer, D. L. 2010. Alien species in fresh waters: ecological effects, interactions with other

stressors, and prospects for the future. Freshwater Biol. 55: 152–174.

Strecker, A. L., Campbell, P. M., & Olden, J. D. 2011. The aquarium trade as an invasion

pathway in the Pacific Northwest. Fish. 36: 74–85.

Tamura, K., Stecher, G., Peterson, D., Filipski, A., & Kumar, S. 2013. MEGA6: Molecular

Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol. 30: 2725–2729.

Taylor, J. S., Van de Peer, Y., & Meyer, A. 2001. Genome duplication, divergent resolution and

speciation. Trends Genet. 17: 299–301.

Taylor, J. S., Braasch, I., Frickey, T., Meyer, A., & Van de Peer, Y. 2003. Genome duplication, a

trait shared by 22,000 species of ray-finned fish. Genome Res. 13: 382–390.

Thompson, J. D., & Lumaret, R. 1992. The evolutionary dynamics of polyploid plants: origins,

establishment and persistence. Trends Ecol. Evol. 7: 302–307.

Thomsen, P. F., & Willerslev, E. 2015. Environmental DNA – An emerging tool in conservation

for monitoring past and present biodiversity. Biol. Conserv. 183: 4–18.

Thorne, J. L., Kishino, H., & Painter, I. S. 1998. Estimating the rate of evolution. Mol. Biol.

Evol. 15: 1647–1657.

Tsigenopoulos, C. S., Ráb, P., Naran, D., & Berrebi, P. 2002. Multiple origins of polyploidy in the

phylogeny of southern African barbs (Cyprinidae) as inferred from mtDNA markers.

Heredity. 88: 466–473.

Tsigenopoulos, C. S., Durand, J. D., Unlü, E., & Berrebi, P. 2003. Rapid radiation of the

Mediterranean Luciobarbus species (Cyprinidae) after the Messinian salinity crisis of the

Mediterranean Sea, inferred from mitochondrial phylogenetic analysis. Biol. J. Linn.

Soc. 80: 207–222.

Tsigenopoulos, C. S., Kasapidis, P., & Berrebi, P. 2010. Phylogenetic relationships of hexaploid

large-sized barbs (genus Labeobarbus, Cyprinidae) based on mtDNA data. Mol. Phylogenet.

Evol. 56: 851–856.

Uyeno, T., & Smith, G. R. 1972. Tetraploid origin of the karyotype of catostomid fishes. Science.

175: 644–646.

Van de Peer, Y., Fawcett, J. A., Proost, S., Sterck, L., & Vandepoele, K. 2009. The flowering world:

a tale of duplications. Trends Plant Sci. 14: 680–688.

Van de Peer, Y., Mizrachi, E., & Marchal, K. 2017. The evolutionary significance of polyploidy.

Nat. Rev. Genet. 18: 411–424.

Vasil’ev, V. P., Vasil’eva, E. D., Shedko, S. V., & Novomodny, G. V. 2009. Ploidy levels in the

Kaluga, Huso dauricus and Sakhalin sturgeon Acipenser mikadoi (Acipenseridae, Pisces).

Doklady Akademii Nauk. 426: 275–278.

Verbruggen, H., Maggs, C. A., Saunders, G. W., Le Gall, L., Yoon, H. S., & De Clerck, O. 2010.

Data mining approach identifies research priorities and data requirements for resolving the

red algal tree of life. BMC Evol. Biol. 10: 16.

Vis, M. L., Saunders, G. W., Sheath, R. G., Dunse, K., & Entwisle, T. J. 1998. Phylogeny of the

Batrachospermales (Rhodophyta) inferred from rbcL and 18S ribosomal DNA gene

sequences. J. Phycol. 34: 341–350.

Vis, M. L., Feng, J., Chiasson, W. B., Xie, S. L., Stancheva, R., Entwisle, T. J., Chou, J. Y., &

Wang, W. L. 2010. Investigation of the molecular and morphological variability in

Batrachospermum arcuatum (Batrachospermales, Rhodophyta) from geographically

distant locations. Phycologia. 49: 545–553.

Vranken, S., Bosch, S., Peña, V., Leliaert, F., Mineur, F., & De Clerck, O. 2018. A risk

assessment of aquarium trade introductions of seaweed in European waters. Biol.

Invasions. 20: 1171–1187.

Wang, W. L., Liu, S. L., & Lin, S. M. 2005. Systematics of the calcified genera of the

Galaxauraceae (Nemaliales, Rhodophyta) with an emphasis on Taiwan species. J. Phycol.

41: 685–703.

Webb, C. O. 2000. Exploring the phylogenetic structure of ecological communities: an example

for rain forest trees. Am. Nat. 156: 145–155.

Webb, C. O., Ackerly, D. D., McPeek, M. A., & Donoghue, M. J. 2002. Phylogenies and

community ecology. Annu. Rev. Ecol. Syst. 33: 475–505.

Weber, M. G., Wagner, C. E., Best, R. J., Harmon, L. J., & Matthews, B. 2017. Evolution in a

community context: on integrating ecological interactions and macroevolution. Trends

Ecol. Evol. 32: 291–304.

Weitemier, K., Straub, S. C., Cronn, R. C., Fishbein, M., Schmickl, R., McDonnell, A., & Liston,

A. 2014. Hyb-Seq: Combining target enrichment and genome skimming for plant

phylogenomics. Appl. Plant Sci. 2: 1400042.

Wells, R. D. S., Hall, J. A., Clayton, J. S., Champion, P. D., Payne, G. W., & Hofstra, D. E.

1999. The rise and fall of water net (Hydrodictyon reticulatum) in New Zealand. J. Aquat.

Plant Manage. 37: 49–55.

Wells, R. D. S., & Clayton, J. S. 2001. Ecological impacts of water net (Hydrodictyon

reticulatum) in Lake Aniwhenua, New Zealand. New Zealand J. Ecol. 25: 55–63.

Wendel, J. F. 2000. Genome evolution in polyploids. Plant Mol. Biol. 42: 225–249.

Werth, C. R., & Windham, M. D. 1991. A model for divergent, allopatric speciation of polyploid

pteridophytes resulting from silencing of duplicate-gene expression. Am. Nat. 137: 515–

526.

Wilcox, T. M., Zarn, K. E., Piggott, M. P., Young, M. K., McKelvey, K. S., & Schwartz, M. K.

2018. Capture enrichment of aquatic environmental DNA: a first proof of concept. Mol.

Ecol. Resour. 18: 1392–1401.

Wilson, J. R. U., Dormontt, E. E., Prentis, P. J., Lowe, A. J., & Richardson, D. M. 2009.

Something in the way you move: dispersal pathways affect invasion success. Trends Ecol.

Evol. 24: 136–144.

Wood, T. E., Takebayashi, N., Barker, M. S., Mayrose, I., Greenspoon, P. B., & Rieseberg, L. H.

2009. The frequency of polyploid speciation in vascular plants. Proc. Natl. Acad. Sci.

U.S.A. 106: 13875–13879.

Wright, S. 1969. Evolution and Genetics of Populations. II. The Theory of Gene Frequencies.

University of Chicago Press, Chicago, IL, USA.

Wu, J. T. 1999. Occurrence of four freshwater rhodophytes in Taiwan. Taiwania. 44: 145–153.

Wu, J. T. 2001. Supplements to the freshwater rhodophytes in Taiwan. Taiwania. 46: 359–362.

Xiao, H., Chen, S. Y., Liu, Z. M., Zhang, R. D., Li, W. X., Zan, R. G., & Zhang, Y. P. 2005.

Molecular phylogeny of Sinocyclocheilus (Cypriniformes: Cyprinidae) inferred from

mitochondrial DNA sequences. Mol. Phylogenet. Evol. 36: 67–77.

Yang, E. C., & Boo, S. M. 2006. A red alga-specific phycoerythrin gene for biodiversity surveys

of callithamnioid red algae. Mol. Ecol. Notes. 6: 533–535.

Yang, E. C., Kim, M. S., Geraldino, P. J. L., Sahoo, D., Shin, J. A., & Boo, S. M. 2008.

Mitochondrial cox1 and plastid rbcL genes of Gracilaria vermiculophylla (Gracilariaceae,

Rhodophyta). J. Appl. Phycol. 20: 161–168.

Yang, E. C., Boo, S. M., Bhattacharya, D., Saunders, G. W., Knoll, A. H., Fredericq, S., Graf, L.,

& Yoon, H. S. 2016. Divergence time estimates and the evolution of major lineages in the

Florideophyte red algae. Sci. Rep. 6: 21361.

Yang, L., Mayden, R. L., Sado, T., He, S., Saitoh, K., & Miya, M. 2010. Molecular phylogeny of

the fishes traditionally referred to Cyprinini sensu stricto (Teleostei: Cypriniformes). Zool.

Scripta. 39: 527–550.

Yang, Z. 2007. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol. 24:

1586–1591.

Yasuike, M., Jantzen, S., Cooper, G. A., Leder, E., Davidson, W. S., & Koop, B. F. 2010. Grayling

(Thymallinae) phylogeny within salmonids: complete mitochondrial DNA sequences of

Thymallus arcticus and Thymallus thymallus. J. Fish. Biol. 76: 395–400.

Yoon, H. S., Muller, K. M., Sheath, R. G., Ott, F., & Bhattacharya, D. 2006. Defining the major

lineages of red algae (Rhodophyta). J. Phycol. 42: 482–492.

Yu, X., Zhou, T., Li, K., Li, Y., & Zhou, M. 1987. On the karyosystematics of cyprinid fishes and

a summary of fish chromosome studies in China. Genetica. 72: 225–236.

Zanne, A. E., Tank, D. C., Cornwell, W. K., Eastman, J. M., Smith, S. A., FitzJohn, R. G.,

McGlinn, D. J., O'Meara, B. C., et al. 2014. Three keys to the radiation of angiosperms into

freezing environments. Nature. 506: 89–92.

Zenil-Ferguson, R., Burleigh, J. G., Freyman, W. A., Igić, B., Mayrose, I., & Goldberg, E. E.

2019. Interaction among ploidy, breeding system and lineage diversification. New Phytol.

224: 1252–1265.

Zhan, S. H., Glick, L., Tsigenopoulos, C. S., Otto, S. P., & Mayrose, I. 2014. Comparative

analysis reveals that polyploidy does not decelerate diversification in fish. J. Evol. Biol. 27:

391–403.

Zhan, S. H., Shih, C. C., & Liu, S. L. 2020. Reappraising plastid markers of the red algae for

phylogenetic community ecology in the genomic era. Ecol. Evol. 10: 1299–1310.

Zhang, J., Kapli, P., Pavlidis, P., & Stamatakis, A. 2013. A general species delimitation method

with applications to phylogenetic placements. Bioinformatics. 29: 2869–2876.

Zhang, S. M., Yan, Y., Deng, H., Wang, D. Q., Wei, Q. W., & Wu, Q. J. 1999. Genome size,

ploidy characters of several species of sturgeons and paddlefishes with comment on cellular

evolution of Acipenseriformes. Acta Zool. Sin. 45: 200–206.

Appendices

Appendix A Comparative analysis suggests that polyploidy does not decelerate diversification in fish

A.1 Data Availability

These supporting materials are available online alongside the published version of Zhan et al.

(2014) (https://onlinelibrary.wiley.com/doi/full/10.1111/jeb.12308).

A.2 Supporting Methods

To estimate the frequency of heteroploid speciation in the fish groups examined, we applied a recently published extension of the BiSSE model, termed BiSSE-node enhanced state shift (BiSSE-ness) (Magnuson-Ford & Otto, 2012). In the original BiSSE model (Maddison et al., 2007), transitions between the two states of interest (here, diploid and polyploid) are assumed to be homogeneous with respect to time and to occur at equal rates at any point along the branches of a phylogeny. In particular, transitions between the two states are decoupled from speciation events, so a state change cannot co-occur with speciation. The BiSSE-ness model relaxes this assumption, allowing us to account for diploid-to-polyploid transitions both at the internal nodes and along the branches of a phylogeny. In addition to the five parameters of the original BiSSE model (λD, λP, µD, µP, and qDP; see Materials and Methods of Chapter 2), we reformulated the BiSSE-ness model to co-estimate a sixth parameter: the probability that a diploid-to-polyploid transition co-occurs with speciation (termed h). The sixth parameter is interpreted as the frequency of heteroploid speciation and was estimated following the same MCMC procedure (except that 50, instead of 100, trees were used) as described in Materials and Methods (Chapter 2). This analysis was conducted using the BiSSE-ness implementation in the R package diversitree 0.7.2 (FitzJohn, 2012). The BiSSE-ness model parameter estimates for each fish group are presented in Table S4.
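The distinction between transitions along branches and transitions at nodes can be illustrated with a toy simulation. The Python sketch below is not the diversitree implementation used in the analysis, and its parameterization is an assumption made for illustration: `p_node` is a hypothetical probability that a diploid speciation event yields a polyploid daughter, while `lambda_d` and `q_dp` stand in for λD and qDP. Extinction is omitted for brevity. Under these assumptions, the share of diploid-to-polyploid transitions that coincide with speciation is p_node·λD / (p_node·λD + qDP), which is one possible reading of h.

```python
import random


def expected_node_share(lambda_d, q_dp, p_node):
    """Expected share of D-to-P transitions that coincide with speciation."""
    node_rate = p_node * lambda_d   # transitions at nodes (assumed form)
    branch_rate = q_dp              # transitions along branches
    if node_rate + branch_rate <= 0:
        raise ValueError("no diploid-to-polyploid transitions are possible")
    return node_rate / (node_rate + branch_rate)


def simulate_transitions(lambda_d, q_dp, p_node, n_transitions, seed=1):
    """Count (at_node, along_branch) among n_transitions simulated transitions."""
    expected_node_share(lambda_d, q_dp, p_node)  # validates the rates
    rng = random.Random(seed)
    at_node = along_branch = 0
    total_rate = lambda_d + q_dp
    while at_node + along_branch < n_transitions:
        # Pick the next event on a diploid lineage in proportion to its rate.
        if rng.random() < lambda_d / total_rate:
            # Speciation: a polyploid daughter arises with probability p_node.
            if rng.random() < p_node:
                at_node += 1
        else:
            # Transition along a branch, decoupled from speciation.
            along_branch += 1
    return at_node, along_branch


if __name__ == "__main__":
    node, branch = simulate_transitions(lambda_d=1.0, q_dp=0.5, p_node=0.5,
                                        n_transitions=20000)
    print(node / (node + branch), expected_node_share(1.0, 0.5, 0.5))
```

Setting `p_node` to zero recovers the original BiSSE assumption in this toy setting, since every simulated transition then occurs along a branch.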

A.3 Supporting Figures

[Panels (A), (B), and (D) of Figure S1 are tree images that are not reproduced in this text version.]

(C) The phylogenetic tree of Cyprininae is too large to be included herein. It can be downloaded from the online Supporting Information document cited in Data Availability above.

Figure S1: Fifty percent majority-rule consensus trees summarizing the 500 trees obtained using MrBayes. Only posterior probability support values below 95% are shown. Branches where polyploidization events were inferred to have occurred are indicated as dashed lines (the question mark in A denotes uncertainty as shown in Peng et al., 2007). The fish groups investigated in the current study are: (A) the sturgeons (Acipenseridae: Acipenseriformes); (B) the botiid loaches (Botiidae: Cypriniformes); (C) the Cyprininae fishes (Cyprinidae: Cypriniformes); and (D) the salmonids (Salmonidae: Salmoniformes), along with their sister order Esociformes. The consensus trees were built using phyutility 2.1.1 (Smith & Dunn, 2008; http://code.google.com/p/phyutility/) and were visualized using FigTree 1.3.1 (http://tree.bio.ed.ac.uk/software/figtree/).

A.4 Supporting Tables

Table S1: Ploidy level estimates for the Acipenseridae dataset. D = diploid; P = polyploid; NA = not available.

Species | Ploidy level taken from Peng et al. (2007)
Acipenser baerii | P
Acipenser brevirostrum | P
Acipenser fulvescens | P
Acipenser gueldenstaedtii | P
Acipenser medirostris | P
Acipenser mikadoi | P
Acipenser naccarii | P
Acipenser nudiventris | D
Acipenser oxyrinchus | D
Acipenser persicus | P
Acipenser ruthenus | D
Acipenser schrenckii | P
Acipenser sinensis | P
Acipenser stellatus | D
Acipenser sturio | D
Acipenser transmontanus | P
Acipenser dabryanus | P
Huso huso | D
Huso dauricus | D*
Scaphirhynchus albus | D
Scaphirhynchus platorynchus | D
Scaphirhynchus suttkusi | NA
Pseudoscaphirhynchus hermanni | NA
Pseudoscaphirhynchus kaufmanni | NA

* Polyploid according to Vasil’ev et al. (2009) instead.

Table S2: GenBank accession numbers of cytochrome b (cytb) sequences used for the phylogenetic reconstruction of Cyprininae, chromosome numbers used for the inference of polyploidy, and ChromEvol estimates of ploidy level. The FishBase citation refers to Froese & Pauly (2012); parenthesized numbers indicate chromosome numbers taken from Tsigenopoulos et al. (2002), which were used in ChromEvol analysis instead of the FishBase counts. Asterisk denotes chromosome numbers that were inferred by Tsigenopoulos et al. (2002) based on karyotypes of closely related species and were excluded from analysis. Hyphen denotes unknown chromosome number. D = diploid; P = polyploid; NA = unreliably inferred according to ChromEvol analysis and thus treated as missing values in subsequent BiSSE diversification analysis (encoded as ‘NA’ in the R package diversitree). Additionally, if a taxon has a different ploidy level inference in the phylogeny built in the current study and in the time-calibrated mega-phylogeny published by Rabosky et al. (2013), then the ploidy level inferred in the time-calibrated tree is indicated in brackets.
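The notation described in this caption can be decoded mechanically. The following Python sketch is only an illustration (the thesis analyses were run in R); the function names and the 0/1 coding of diploid and polyploid states are assumptions, and Python's `None` stands in for R's `NA`.

```python
import re


def parse_chromosome_field(field):
    """Return (count_used, note) for a gametic chromosome number entry."""
    field = field.strip()
    if field == "-":
        return None, "unknown"
    if field.endswith("*"):
        # Inferred from related species; excluded from analysis.
        return None, "inferred from related species, excluded"
    match = re.fullmatch(r"(\d+)\((\d+)\)", field)
    if match:
        # FishBase count followed by the parenthesized count that was used.
        return int(match.group(2)), "Tsigenopoulos et al. (2002) count used"
    return int(field), "as reported"


def parse_ploidy_field(field):
    """Return (state in this study's tree, state in the time-calibrated tree)."""
    match = re.fullmatch(r"(D|P|NA)(?:\s*\[(D|P|NA)\])?", field.strip())
    if match is None:
        raise ValueError(f"unrecognized ploidy entry: {field!r}")
    current = match.group(1)
    # No bracket means both trees gave the same inference.
    calibrated = match.group(2) or current
    return current, calibrated


def to_binary_state(state):
    """Map D/P/NA to 0/1/None (an assumed coding, not taken from the thesis)."""
    return {"D": 0, "P": 1, "NA": None}[state]


if __name__ == "__main__":
    print(parse_chromosome_field("48(50)"))
    print(parse_ploidy_field("NA [D]"))
    print(to_binary_state("NA"))
```

For example, the entry `48(50)` yields a count of 50, and `NA [D]` is read as missing in the tree built in this study but diploid in the time-calibrated tree.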

Species | GenBank accession number | Gametic chromosome number (n) | Inferred ploidy status | Chromosome number data source
hemispinus | GQ406312.1 | 25 | D | Yu et al. (1987)
Acrossocheilus monticola | HM536795.1 | - | NA [D] |
Acrossocheilus paradoxus | HQ443699.1 | - | NA [D] |
Aulopyge huegelii | AF287415.1 | 50 | P | FishBase
Balantiocheilos melanopterus | HM536796.1 | - | D |
Barbodes carnaticus | HM010725.1 | - | P |
Barbonymus gonionotus | AB238966.1 | 25 | D | FishBase
Barbonymus schwanenfeldii | AF180823.1 | 25 | D | FishBase
Barbus ablabes | AF180835.1 | 25 | D | Tsigenopoulos et al. (2002)
Barbus albanicus | AY004723.1 | - | P |
Barbus andrewi | AF180843.1 | 50 | P | FishBase
Barbus anoplus | AF287417.1 | 24 | D | FishBase
Barbus balcanicus | GQ302798.1 | 50 | P | Kotlik et al. (2002)
Barbus barbus | AF112123.1 | 50 | P | FishBase
Barbus bigornei | AY004752.1 | - | D |
Barbus borysthenicus | AY331026.1 | - | P [NA] |

Barbus bynni bynni | AF287420.1 | 75 | P | FishBase
Barbus cadenati | AF180834.1 | 25* | NA [D] | Tsigenopoulos et al. (2002)
Barbus calidus | AF287422.1 | 48(50) | P | FishBase; Tsigenopoulos et al. (2002)
Barbus callensis | AF045974.1 | - | P |
Barbus caninus | AF287424.1 | - | P |
Barbus ciscaucasicus | AF095604.1 | - | P |
Barbus cyclolepis | AF237579.1 | 50 | P | Tsigenopoulos et al. (2002)
Barbus erubescens | AF180845.1 | 50 | P | Tsigenopoulos et al. (2002)
Barbus ethiopicus | AF180828.1 | 75 | P | FishBase
Barbus euboicus | AF090785.1 | - | P |
Barbus fasciolatus | HM536811.1 | 24 | D | FishBase
Barbus fritschii | AF287429.1 | - | P |
Barbus gruveli | AF287431.1 | - | P |
Barbus grypus | AF145945.1 | - | NA [P] |
Barbus guineensis | AF180833.1 | 25* | D | Tsigenopoulos et al. (2002)
Barbus gurneyi | AF287432.1 | 25 | D | FishBase
Barbus haasi | AF045976.1 | - | P |
Barbus harterti | AF180855.1 | - | P |
Barbus kerstenii | AF180840.1 | 25 | D | FishBase
Barbus lacerta | AF145935.1 | - | P |
Barbus longiceps | AF145942.1 | - | P |
Barbus macedonicus | AF112129.1 | - | P |
Barbus macrops | AF180832.1 | 25 | D | Tsigenopoulos et al. (2002)
Barbus mattozi | AF180838.1 | 25* | NA [D] | Tsigenopoulos et al. (2002)
Barbus meridionalis | AF112130.1 | 50(50) | P | FishBase; Tsigenopoulos et al. (2002)
Barbus motebensis | AF287435.1 | 25 | D | FishBase
Barbus nasus | AY004744.1 | - | P |
Barbus nyanzae | AF180841.1 | 25* | NA [D] | Tsigenopoulos et al. (2002)
Barbus oxyrhynchus | AF180874.1 | - | P |
Barbus paludinosus | AF287436.1 | 25 | D | FishBase
Barbus peloponnesius | AF287438.1 | - | P |
Barbus pergamonensis | AF112434.1 | - | P |
Barbus petenyi | AF287439.1 | - | P |
Barbus petitjeani | AF287443.1 | - | P |
Barbus plebejus | AY004750.1 | 50 | P | FishBase
Barbus pleurogramma | AY740721.1 | - | NA [D] |

Barbus prespensis | GQ302766.1 | - | P |
Barbus rebeli | GQ302803.1 | - | P |
Barbus reinii | AF287444.1 | - | NA [P] |
Barbus sacratus | AF287445.1 | - | P |
Barbus serra | AF287446.1 | 50 | P | Tsigenopoulos et al. (2002)
Barbus sperchiensis | AF287428.1 | - | P |
Barbus strumicae | AF112134.1 | - | P |
Barbus sublineatus | AF180837.1 | 25* | NA [D] | Tsigenopoulos et al. (2002)
Barbus subquincunciatus | AF145937.1 | - | P |
Barbus tanapelagius | AY740729.1 | - | NA [D] |
Barbus tauricus | AF095605.1 | - | P |
Barbus thessalus | AF090781.1 | - | P |
Barbus trevelyani | AF180847.1 | 48(50) | P | FishBase; Tsigenopoulos et al. (2002)
Barbus trimaculatus | AF180839.1 | 24(24) | D | FishBase; Tsigenopoulos et al. (2002)
Barbus tyberinus | AF397300.1 | - | P |
Barbus wurtzi | AF287448.1 | - | P |
Capoeta aculeate | JF798267.1 | - | P |
Capoeta angorae | HQ167620.1 | - | P |
Capoeta antalyensis | JF798269.1 | - | P |
Capoeta baliki | JF798271.1 | - | P |
Capoeta banarescui | GQ423991.1 | - | P |
Capoeta barroisi | JF798279.1 | - | P |
Capoeta bergamae | JF798280.1 | - | P |
Capoeta buhsei | JF798283.1 | - | P |
Capoeta caelestis | JF798286.1 | - | P |
Capoeta capoeta capoeta | AF145951.1 | - | P |
Capoeta damascina | JF798303.1 | 74 | P | Gorshkova et al. (2002)
Capoeta ekmekciae | GQ424027.1 | - | P |
Capoeta kosswigi | JF798320.1 | - | P |
Capoeta mauricii | JF798324.1 | - | P |
Capoeta sieboldii | JF798329.1 | - | P |
 | GQ424009.1 | - | P |
Capoeta trutta | AF145949.1 | 75 | P | Kılıç-Demirok and Ünlü (2001)
Capoeta turani | JF798335.1 | - | P |
Carasobarbus canis | AF288486.1 | - | P |
Carasobarbus chantrei | AF180852.1 | - | P |
Carasobarbus luteus | AF145944.1 | - | P |
Carassioides acuminatus | HM536806.1 | 50 | P | FishBase
Carassius auratus langsdorfii | AB006953.1 | 50 | P | FishBase

Carassius carassius | AY714387.1 | 50 | P | FishBase
Carassius cuvieri | AB045144.1 | 50 | P | FishBase
Carassius gibelio | AB605597.1 | 50 | P | FishBase
Catlocarpio siamensis | HM536812.1 | 49 | P | FishBase
Chuanchia labiosa | HM536799.1 | 46 | NA [P] | Collares-Pereira (1994)
Cirrhinus microlepis | HM536825.1 | - | NA [D] |
Cirrhinus molitorella | AY463098.1 | 25 | D | FishBase
burmanicus | GU086535.1 | - | NA [D] |
Crossocheilus nigriloba | EF151090.1 | - | NA [D] |
Crossocheilus reticulatus | HM536826.1 | - | D |
Cyclocheilichthys apogon | DQ366154.1 | 25 | D | FishBase
Cyclocheilichthys armatus | HM536827.1 | - | D |
Cyclocheilichthys janthochir | HM536808.1 | - | D |
Cyprinion kais | AF180860.1 | - | D |
Cyprinion macrostomus | AF180826.1 | 25 | D | Gaffaroğlu and Yüksel (2004)
Cyprinus carpio carpio | X61010.1 | 50 | P | FishBase
Cyprinus multitaeniata | HM536798.1 | - | P |
Eirmotus octozona | HM536819.1 | - | NA [D] |
Epalzeorhynchos bicolor | HM536818.1 | 25 | D | FishBase
cryptonemus | GU086532.1 | - | NA |
Garra dembecha | FJ196826.1 | - | NA [D] |
Garra imberba | GU086568.1 | 25 | D | FishBase
Garra micropulvinus | GU086567.1 | - | NA [D] |
Garra mirofrontis | GU086565.1 | - | D [NA] |
Garra orientalis | FJ196829.1 | 25 | D | Yu et al. (1987)
Garra regressus | FJ196828.1 | - | NA [D] |
Garra rufa | AF180857.1 | 22 | D | FishBase
Garra tana | FJ196827.1 | - | NA [D] |
Garra tengchongensis | GU086566.1 | - | NA [D] |
Garra variabilis | AF180825.1 | 25* | NA [D] | Tsigenopoulos et al. (2002)
eckloni | JQ082351.1 | - | NA [P] |
Gymnocypris namensis | DQ309353.1 | - | NA [P] |
Gymnocypris potanini | AY463499.1 | - | NA [P] |
Gymnocypris przewalskii | AB239595.1 | - | NA [P] |
bimaculata | AY697378.1 | - | NA [D] |
Hampala macrolepidota | HM536790.1 | - | NA [D] |
Hampala sabana | AY697410.1 | - | NA |
lineatus | HM536803.1 | - | NA [D] |
malcolmi | HM536816.1 | - | D |
Hypsibarbus vernayi | HM536794.1 | - | D |
Hypsibarbus wetmorei | DQ366155.1 | 25 | D | FishBase
Kosswigobarbus kosswigi | AF180853.1 | - | NA [P] |
bata | AP011198.1 | 25 | D | FishBase
Labeo batesii | AB238967.1 | - | NA [D] |

Labeo boggut | HQ645087.1 | - | NA [D] |
Labeo dussumieri | DQ520919.1 | - | D |
Labeo fimbriatus | GQ853089.1 | - | NA [D] |
Labeo forskalii | FJ196832.1 | - | D |
Labeo gonius | HQ645093.1 | 27 | D | FishBase
Labeo rajasthanicus | DQ520921.1 | - | D |
Labeo rohita | AY463099.1 | 25 | D | FishBase
Labeo sorex | AY791415.1 | - | NA [D] |
Labeo stolizkae | GU086536.1 | - | NA [D] |
Labeo yunnanensis | GQ406324.1 | - | D |
Labeobarbus acutirostris | GQ853202.1 | - | P |
Labeobarbus aeneus | AF180876.1 | 74 | P | FishBase
Labeobarbus brevicephalus | GQ853204.1 | - | P |
Labeobarbus capensis | AF180831.1 | 75 | P | FishBase
Labeobarbus crassibarbis | GQ853208.1 | - | P |
Labeobarbus dainellii | GQ853210.1 | - | P |
Labeobarbus gorgorensis | GQ853212.1 | - | P |
Labeobarbus gorguari | GQ853214.1 | - | P |
Labeobarbus habereri | AF180869.1 | - | P |
Labeobarbus intermedius | AF287433.1 | 75 | P | FishBase
Labeobarbus johnstonii | AF180867.1 | - | P |
Labeobarbus longissimus | GQ853216.1 | - | P |
Labeobarbus macrophtalmus | GQ853218.1 | - | P |
Labeobarbus marequensis | AF287434.1 | 68(75) | P | FishBase; Tsigenopoulos et al. (2002)
Labeobarbus megastoma | GQ853222.1 | - | P |
Labeobarbus nedgia | GQ853224.1 | - | P |
Labeobarbus platydorsus | GQ853228.1 | - | P |
Labeobarbus polylepis | AF180877.1 | 74 | P | FishBase
Labeobarbus surkis | GQ853229.1 | - | P |
Labeobarbus truttiformis | GQ853232.1 | - | P |
Labeobarbus tsanensis | GQ853234.1 | - | P |
lineatus | HM536789.1 | - | NA |
Luciobarbus bocagei | AY004727.1 | 50 | P | FishBase
Luciobarbus brachycephalus | AY004729.1 | 50 | P | FishBase
Luciobarbus capito | AF045975.1 | - | P |
Luciobarbus comizo | AY004735.1 | 50 | P | FishBase
Luciobarbus esocinus | AF145934.1 | - | P |
Luciobarbus graecus | AF145941.1 | - | P |
Luciobarbus graellsii | AF045973.1 | - | P |
Luciobarbus guiraonis | AF334090.1 | - | P |
Luciobarbus microcephalus | AF045971.1 | 50 | P | FishBase
Luciobarbus mursa | AF145943.1 | - | P |
Luciobarbus pectoralis | AF145933.1 | - | P |
Luciobarbus sclateri | AF045970.1 | 50 | P | FishBase
Luciobarbus xanthopterus | AF145939.1 | - | P |

Mystacoleucus marginatus | HM536814.1 | - | NA [D] |
heterostomus | AY463516.1 | - | P |
Neolissochilus hexagonolepis | DQ366150.1 | - | P |
Neolissochilus soroides | EF588165.1 | - | NA [P] |
Neolissochilus stracheyi | HM536823.1 | - | P |
Onychostoma alticorpus | HM142577.1 | - | NA [D] |
Onychostoma gerlachi (syn. Varicorhinus gerlachi) | GQ406314.1 | 25 | D | Yu et al. (1987)
Onychostoma lepturum | HM142578.1 | - | NA [D] |
Onychostoma simum (syn. Varicorhinus simus) | HM536801.1 | 25 | D | Yu et al. (1987)
Oreichthys cosuatis | HM536822.1 | - | NA |
salsburyi | FJ196830.1 | - | D |
Osteochilus spilurus | DQ366162.1 | - | NA [D] |
Oxygymnocypris stewartii | JQ082354.1 | - | NA [P] |
Platypharodon extremus | JQ082363.1 | 45 | NA [P] | FishBase
Poropuntius opisthoptera | HM536793.1 | - | D |
Probarbus jullieni | HM536810.1 | 49 | P | FishBase
Procypris rabaudi | EU082030.1 | 50 | P | FishBase
Pseudobarbus afer | AF287449.1 | 48(50) | P | FishBase; Tsigenopoulos et al. (2002)
Pseudobarbus asper | AF287451.1 | 48(50) | P | FishBase; Tsigenopoulos et al. (2002)
Pseudobarbus burchelli | AF180848.1 | 48(50) | P | FishBase; Tsigenopoulos et al. (2002)
Pseudobarbus burgi | AF180849.1 | 48(50) | P | FishBase; Tsigenopoulos et al. (2002)
Pseudobarbus phlegethon | AF287452.1 | 48 | P | FishBase
Pseudobarbus quathlambae | AY791833.1 | 48 | P | FishBase
Pseudobarbus tenuis | AF287453.1 | 48 | P | FishBase
bulu | DQ366163.1 | - | D |
Puntioplites falcifer | HM536805.1 | - | D |
Puntioplites proctozystron | HM536813.1 | 25 | D | FishBase
Puntioplites waandersi | HM536829.1 | - | D |
arulius | EU241450.1 | 25 | D | FishBase
Puntius bandula | EU604673.1 | - | D |
Puntius bimaculatus | EU241451.1 | - | NA [D] |
Puntius bramoides | DQ366164.1 | - | D |
Puntius brevis | HM536815.1 | - | NA [D] |
Puntius chalakkudiensis | GQ247557.1 | - | NA [D] |
Puntius chola | AY708243.1 | 25 | D | FishBase
Puntius conchonius | AY004751.1 | 25 | D | FishBase
Puntius cumingii | EU604676.1 | 25 | D | FishBase

Puntius denisonii EU241453.1 - NA [D] Puntius dorsalis AY708261.1 - D Puntius everetti EU241454.1 25 D FishBase Puntius fasciatus AY708262.1 25 D FishBase Puntius filamentosus EU241455.1 25 D FishBase Puntius gelius EU241456.1 - NA [D] Puntius jerdoni HM010722.1 - P Puntius lineatus EU241457.1 - NA [D] Puntius mahecola AY708238.1 - D Puntius martenstyni AY708266.1 - D Puntius melanomaculatus EU604678.1 - D [NA] Puntius nigrofasciatus HM536821.1 25 D FishBase Puntius oligolepis HM536820.1 25 D FishBase Puntius phutunio EU241459.1 - D Puntius pleurotaenia AY708267.1 - NA [D] Puntius sarana HM010726.1 25 D FishBase Puntius sealei DQ366165.1 - NA [D] Puntius semifasciolatus AY856113.1 26 D FishBase Puntius singhala AY708256.1 - D [NA] Puntius snyderi AY856103.1 - D Puntius sophore EU241461.1 24 D FishBase Puntius srilankensis AY708271.1 - D Puntius tetrazona EU287909.1 25 D FishBase Puntius ticto AB238969.1 25 D FishBase Puntius titteya AF287455.1 25 D FishBase Puntius vittatus AY708278.1 - NA Scaphognathops bandanensis HM536828.1 - D Scaphognathops stejnegeri HM536807.1 - D Schizothorax argentatus AY954269.1 - P Schizothorax biddulphi FJ931464.1 - P Schizothorax chongi DQ126118.1 - P Schizothorax curvifrons AF532086.1 - P Schizothorax davidi AY954257.1 49 P FishBase Schizothorax dolichonema DQ126116.1 - P Schizothorax dulongensis AY954284.1 - P Schizothorax esocinus JN600515.1 - P Schizothorax eurystomus AY954275.1 - P Schizothorax gongshanensis DQ126124.1 - P Schizothorax griseus AY954253.1 - P Schizothorax kozlovi AY954256.1 - P Schizothorax labiatus AY954281.1 - P Schizothorax lantsangensis DQ126126.1 - P Schizothorax lissolabiatus DQ126127.1 74 P FishBase Schizothorax macropogon (syn. Racoma AY463517.1 45 P Collares-Pereira macropogon) (1994) Schizothorax malacanthus AY954277.1 - P Schizothorax meridionalis AY954285.1 - P 159

Schizothorax molesworthi DQ126129.1 - P Schizothorax myzostomus GQ406331.1 - P Schizothorax nukiangensis DQ126125.1 - P Schizothorax oconnori AY463519.1 46 P FishBase Schizothorax paoshanensis AY954286.1 - P Schizothorax plagiostomus AF532074.1 - P Schizothorax prenanti AY954259.1 74 P FishBase Schizothorax progastus JN600507.1 49 P FishBase Schizothorax pseudoaksaiensis AF180827.1 49* P Tsigenopoulos et al. (2002) Schizothorax richardsonii AF532078.1 - P Schizothorax rotundimaxillaris AY954283.1 - P Schizothorax waltoni (syn. Racoma AY463518.1 46 P Collares-Pereira waltoni) (1994) Schizothorax wangchiachii AY954254.1 - P Schizothorax yunnanensis yunnanensis AY954252.1 74 P FishBase Schizothorax zarudnyi JN790241.1 - P Sikukia stejnegeri HM536800.1 - D Sinocyclocheilus altishoulderus FJ984568.1 - NA [D] Sinocyclocheilus anatirostris AY854708.1 - NA [D] Sinocyclocheilus angustiporus AY854702.1 - NA [P] Sinocyclocheilus anophthalmus AY854715.1 - P Sinocyclocheilus bicornutus AY854730.1 - NA [D] Sinocyclocheilus cyphotergous AY854711.1 - NA [D] Sinocyclocheilus furcodorsalis AY854709.1 - NA [D] Sinocyclocheilus grahami GQ148557.1 48 P FishBase Sinocyclocheilus guishanensis AY854722.1 - NA [P] Sinocyclocheilus huaningensis AY854718.1 - NA [P] Sinocyclocheilus hyalinus AY854721.1 - NA [D] Sinocyclocheilus jii AY854727.1 - NA [D] Sinocyclocheilus jiuxuensis AY854736.1 - NA [D] Sinocyclocheilus lateristriatus AY854703.1 - NA [P] Sinocyclocheilus lingyunensis AY854691.1 - NA [D] Sinocyclocheilus longibarbatus AY854714.1 - NA [D] Sinocyclocheilus macrocephalus AY854683.1 - NA [P] Sinocyclocheilus macrolepis AY854729.1 - NA Sinocyclocheilus macrophthalmus HM536792.1 - NA [D] Sinocyclocheilus maculatus EU366193.1 48 P Yu et al. 
(1987) Sinocyclocheilus maitianheensis AY854710.1 - P Sinocyclocheilus malacopterus AY854697.1 - NA [P] Sinocyclocheilus microphthalmus AY854690.1 - NA [D] Sinocyclocheilus multipunctatus AY854712.1 - NA [D] Sinocyclocheilus oxycephalus AY854685.1 - NA [P] Sinocyclocheilus purpureus EU366189.1 - NA [P] Sinocyclocheilus qiubeiensis EU366188.1 - P Sinocyclocheilus qujingensis AY854719.1 - NA Sinocyclocheilus rhinocerous AY854720.1 - NA [D] 160

Sinocyclocheilus tingi AY854701.1 - P Sinocyclocheilus xunlensis HM536791.1 - NA [D] Sinocyclocheilus yangzongensis AY854725.1 - NA [P] Sinocyclocheilus yimenensis EU366191.1 - P Spinibarbus denticulatus GU086544.1 50 P FishBase Spinibarbus hollandi AY195628.1 50 P FishBase Spinibarbus sinensis (syn. Barbus HM536797.1 50 P Yu et al. (1987) sinensis) Tor douronensis FJ211162.1 50 P FishBase Tor khudree DQ520923.1 50 P FishBase Tor macrolepis EF588203.1 - P Tor malabaricus HM585022.1 - P Tor putitora HM636801.1 50 P FishBase Tor qiaojiensis FJ211163.1 - P Tor sinensis HM536802.1 50 P FishBase Tor tambra DQ366170.1 50 P FishBase Tor tambroides HM536824.1 - P Varicorhinus beso AF180862.1 75 P FishBase Varicorhinus mariae AF180863.1 - P Varicorhinus maroccanus AF287457.1 - P Varicorhinus nelspruitensis AF180866.1 - NA [P] Varicorhinus steindachneri AF180865.1 - P

Outgroup
Gobio gobio  Y10452.1
Tinca tinca  Y10451.1

Species present only in the Rabosky et al. (2013) phylogeny
Aaptosyax grypus  -  P
Acrossocheilus beijiangensis  -  NA
Acrossocheilus yunnanensis  25  D  FishBase
Akrokolioplax bicornis  -  D
Anchicyclocheilus halfibindus  -  D
Aspiorhynchus laticeps  -  P
ariza  24  D  FishBase
Bangana behri  -  D
Bangana lemassoni  -  D
Bangana lippus  -  D
Bangana pierrei  -  D
Bangana tonkinensis  -  D
Barboides britzi  -  D
Barbonymus collingwoodii  -  D
Barbus bynni occidentalis  74  P  Guegan et al. (1995)
Barbus callipterus  -  D
Barbus humilis  -  D
Barbus leonensis  -  D
Capoeta pestai  -  P
Carasobarbus apoensis  -  D
Carassius auratus auratus  50  P  FishBase
Catla catla  25  D  FishBase
Cirrhinus cirrhosus  25  D  FishBase
Crossocheilus latius  24  D  FishBase
Cyclocheilichthys repasson  -  D
maculatus  50  P  FishBase
Discocheilus wui  -  D
Discogobio bismargaritus  -  D
Discogobio brachyphysallidos  -  D
Discogobio laticeps  -  D
Discogobio macrophysallidos  -  D
Discogobio tetrabarbatus  -  D
Discogobio yunnanensis  -  D
Discolabeo wuluoheensis  -  D
Epalzeorhynchos frenatus  -  D
Epalzeorhynchos kalopterus  -  D
Folifer brevifilis  25  D  FishBase
Garra annandalei  -  D
Garra barreimiae barreimiae  -  D
Garra cambodgiensis  26  D  FishBase
Garra ceylonensis  -  D
Garra congoensis  -  D
Garra fasciacauda  -  D
Garra flavatra  -  D
Garra fuliginosa  -  D
Garra gotyla gotyla  25  D  FishBase
Garra gravelyi  -  D
Garra kempi  -  D
Garra makiensis  25  D  FishBase
Garra spilota  -  D
Gymnocypris chilianensis  -  P
Gymnocypris waddellii  -  P
dybowskii  49  P  FishBase
Gymnodiptychus integrigymnatus  -  P
Gymnodiptychus pachycheilus  -  P
Henicorhynchus lobatus  -  D
Henicorhynchus siamensis  -  D
Herzensteinia microcephalus  -  P
Hongshuia microstomatus  -  D
Hongshuia paoli  -  D
Labeo boga  -  D
Labeo calbasu  25  D  FishBase
Labeo chrysophekadion  25  D  FishBase
Labeo coubie  -  D
Labeo dyocheilus  -  D
Labeo lineatus  -  D
Labeo senegalensis  -  D
Labeo weeksii  -  D
Linichthys laticeps  -  D
Lobocheilos bo  -  D
Lobocheilos melanotaenia  -  D
Luciobarbus mystaceus  -  P
Luciobarbus steindachneri  50  P  FishBase
Luciosoma spilopleura  -  D
erythrospila  -  D
Mystacoleucus chilopterus  -  D
Mystacoleucus lepturus  -  D
Onychostoma barbatulum  -  D
Onychostoma ovale  -  D
Osteochilus lini  -  D
Osteochilus melanopleurus  -  D
Osteochilus microcephalus  -  D
Osteochilus vittatus  25  D  FishBase
Osteochilus waandersii  25  D  FishBase
Paracrossochilus vittatus  -  D
Parasinilabeo assimilis  -  D
Percocypris pingi  49  P  FishBase
Percocypris tchangi  -  P
Phreatichthys andruzzii  -  D
Pseudocrossocheilus bamaensis  -  D
Pseudocrossocheilus liuchengensis  -  D
Pseudocrossocheilus longibullus  -  D
Pseudocrossocheilus nigrovittatus  -  D
Pseudocrossocheilus papillolabrus  -  D
Pseudocrossocheilus tridentis  -  D
Pseudogyrinocheilus prochilus  -  D
Psilorhynchus homaloptera  -  D
Psilorhynchus sucatio  25  D  FishBase
Ptychidio jordani  -  D
chungtienensis  -  P
Ptychobarbus conirostris  -  P
Ptychobarbus dipogon  223  P  FishBase
Ptychobarbus kaznakovi  -  P
Puntius binotatus  25  D  FishBase
Puntius johorensis  26  D  FishBase
Puntius parvus  -  D
Puntius reval  -  D
Qianlabeo striatus  -  D
Rectoris posehensis  -  D
Rhodeus sinensis  -  P
Sawbwa resplendens  -  D
Schizopyge curvifrons  -  P
anteroventris  -  P
Schizopygopsis kessleri  -  P
Schizopygopsis kialingensis  -  P
Schizopygopsis malacanthus  -  P
Schizopygopsis pylzovi  -  P
Schizopygopsis thermalis  -  P
Schizopygopsis younghusbandi  -  P
Schizothorax intermedius  -  P
Schizothorax macrophthalmus  -  P
Schizothorax pseudoaksaiensis pseudoaksaiensis  -  P
Schizothorax yunnanensis paoshanensis  -  P
notabilis  -  D
Semilabeo obscurus  -  D
Sinocrossocheilus labiatus  -  D
Sinocrossocheilus megalophthalmus  -  D
Sinocyclocheilus macroscalus  -  D
Sinocyclocheilus tianeensis  -  D
Sinocyclocheilus yishanensis  -  D
Tor progeneius  -  P
Tor tor  50  P  FishBase


Table S3: GenBank accession numbers of cytb sequences used for the phylogenetic reconstruction of Salmoniformes and Esociformes.

Species  GenBank accession number

Salmoniformes
Brachymystax lenok  AF125213.1
Coregonus albula  DQ173427.1
Coregonus artedi  DQ451310.1
Coregonus baerii  DQ185407.1
Coregonus baicalensis  AJ251589.1
Coregonus baunti  EU003522.1
Coregonus chadary  FJ589214.1
Coregonus clupeaformis  DQ451313.1
Coregonus lavaretus  AB034824.1
Coregonus migratorius  AJ251588.1
Coregonus muksun  DQ185411.1
Coregonus nasus  DQ185404.1
Coregonus oxyrinchus  DQ185405.1
Coregonus peled  DQ185402.1
Coregonus pidschian  DQ185406.1
Coregonus pollan  AJ251591.1
Coregonus pravdinellus  DQ185413.1
Coregonus sardinella  EU003526.1
Coregonus tugun  EU003529.1
Coregonus ussuriensis  EU003525.1
Hucho bleekeri  FJ597623.1
Hucho hucho  AF172397.1
Hucho perryi  D58396.1
Hucho taimen  HQ897271.1
Oncorhynchus clarkii  AY886762.1
Oncorhynchus gorbuscha  EF455489.1
Oncorhynchus keta  EF105341.1
Oncorhynchus kisutch  EF126369.1
Oncorhynchus masou  EF105342.1
Oncorhynchus mykiss  DQ288268.1
Oncorhynchus nerka  EF055889.1
Oncorhynchus tshawytscha  AF392054.1
Prosopium coulterii  AY382560.1
Prosopium cylindraceum  DQ185399.1
Prosopium williamsoni  AY008701.1
Salmo carpio  X77519.1
Salmo fibreni  X77520.1
Salmo obtusirostris  AF488534.1
Salmo ohridanus  AF053590.1
Salmo platycephalus  AY260506.1
Salmo salar  EU492280.1
Salmo trutta trutta  EU492108.1
Salmo marmoratus  X77522.1
Salvelinus albus  DQ298789.1
Salvelinus alpinus  AF154851.1
Salvelinus boganidae  AY286024.1
Salvelinus drjagini  AY286025.1
Salvelinus elgyticus  AY286039.1
Salvelinus fontinalis  AF154850.1
Salvelinus kronocius  DQ298797.1
Salvelinus leucomaenis  D58397.1
Salvelinus levanidovi  AY286046.1
Salvelinus malma  DQ298795.1
Salvelinus namaycush  DQ451375.1
Salvelinus neiva  AY286033.1
Salvelinus schmidti  DQ298788.1
Salvelinus taranetzi  AY286035.1
Stenodus leucichthys  DQ185400.1
Thymallus arcticus  FJ872559.1
Thymallus thymallus  FJ853655.1

Esociformes
Dallia pectoralis  AP004102.1
Novumbra hubbsi  AY497457.1
Umbra krameri  AF060437.1
Umbra limi  AY497458.1
Umbra pygmaea  AF060435.1
Esox americanus americanus  AY497432.1
Esox lucius  FJ425091.1
Esox niger  AY497437.1
Esox reichertii  AY497442.1

Outgroup
Barbus bocagei  AF045969.1
Gobiobotia abbreviata  AF051861.1


Table S4: MCMC-based estimates of evolutionary rates under the BiSSE-ness model. Rate estimates were computed as the median of the posterior distributions recovered from the MCMC sampling procedure implemented in the R package diversitree. λD and λP represent the speciation rates of diploid (D) and polyploid (P) lineages, respectively; µD and µP represent the extinction rates of diploid and polyploid lineages, respectively; h is the probability that a transition from diploidy to polyploidy occurs at an internal node (i.e., the frequency of heteroploid speciation). PP is the proportion of MCMC samples in which the rate for diploid lineages is greater than the rate for polyploid lineages. For example, PP(λD > λP) ≤ 0.025 is interpreted as evidence that the speciation rates of diploids are lower than those of polyploids.

Group                      λD    λP     µD    µP     qDP  h     PP(λD > λP)  PP(µD > µP)  PP(rD > rP)
Acipenseridae              67.1  98.2   19.2  57.6   6.0  0.21  0.22         0.21         0.55
Botiidae                   25.9  18.9   9.0   3.8    1.6  0.07  0.78         0.71         0.55
Cyprininae                 20.3  156.7  1.2   129.0  1.1  0.05  0.00         0.00         0.05
Salmoniformes/Esociformes  15.1  163.0  6.4   150.4  2.2  0.16  0.00         0.00         0.28
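
The two summaries reported in Table S4, the posterior median of each rate and the proportion PP of MCMC samples in which the diploid rate exceeds the polyploid rate, can be illustrated with a minimal Python sketch. It assumes the posterior samples have already been exported from diversitree as paired lists; the function name and the sample values are hypothetical and are not taken from the thesis.

```python
from statistics import median


def posterior_summary(diploid_rates, polyploid_rates):
    """Return (median diploid rate, median polyploid rate, PP(diploid > polyploid)).

    The two lists hold paired posterior samples, one pair per retained MCMC step.
    """
    if not diploid_rates or len(diploid_rates) != len(polyploid_rates):
        raise ValueError("need two non-empty, equally long sample lists")
    # Proportion of MCMC steps in which the diploid rate exceeds the polyploid rate.
    exceed = sum(d > p for d, p in zip(diploid_rates, polyploid_rates))
    pp = exceed / len(diploid_rates)
    return median(diploid_rates), median(polyploid_rates), pp


# Hypothetical posterior samples of speciation rates (not taken from the thesis).
lambda_d = [18.0, 21.0, 20.0, 22.0]
lambda_p = [150.0, 160.0, 19.0, 170.0]
med_d, med_p, pp = posterior_summary(lambda_d, lambda_p)
print(med_d, med_p, pp)
```

Under this definition, a PP close to 0 means that the diploid rate is almost always the smaller of the two, and a PP close to 1 means the opposite.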


Table S5: ML parameter estimates of the best supported BiSSE model. The rates were estimated by taking the median of the maximum likelihood estimates from 100 fitted trees. Similar estimates were obtained when the mean was used instead.

Group                      Best supported model  λD     λP      µD        µP      qDP
Acipenseridae              M0                    69.48  = λD    26.47     = µD    12.08
Botiidae                   M0                    17.40  = λD    2.49e-09  = µD    1.90
Cyprininae                 Mse                   20.30  131.61  3.40e-08  103.35  1.79
Salmoniformes/Esociformes  Mse                   13.69  312.66  7.20      304.06  4.55


Table S6: Best-fitting BiSSE model according to AIC while allowing for polyploid-to-diploid reversals (i.e., without the constraint qPD = 0). The voting system results are presented alongside the distribution of the difference in AIC (ΔAIC) relative to the best-fitting model. The best-fitting model is the same, whether qPD is constrained to 0 or not, except for a non-significant shift from M0 to Ms in Acipenseridae.

Group                                    Acipenseridae    Botiidae         Cyprininae          Salmoniformes/Esociformes
% of trees supporting Mse                0                0                100                 100
% of trees supporting Me                 0                0                0                   0
% of trees supporting Ms                 90               13               0                   0
% of trees supporting M0                 10               87               0                   0
Best-fitting model                       Ms               M0               Mse                 Mse
5th/50th/95th percentile of ΔAIC of Mse  1.82/1.95/2.00   2.06/3.03/3.72   0/0/0               0/0/0
5th/50th/95th percentile of ΔAIC of Me   0.16/0.78/1.24   1.96/2.00/2.00   62.95/72.78/87.05   13.80/15.84/19.46
5th/50th/95th percentile of ΔAIC of Ms   0/0/0            -0.45/0.79/1.75  29.58/37.51/57.23   12.99/15.19/18.47
5th/50th/95th percentile of ΔAIC of M0   -0.74/1.73/2.57  0/0/0            89.87/98.48/108.36  15.82/17.94/20.82
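
The voting system and the ΔAIC distributions in Table S6 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the analysis code used in the thesis: the function names and the AIC scores are hypothetical, and each tree is assumed to contribute one AIC score per model. Because ΔAIC is taken relative to the overall best-fitting model, it can be negative on trees where another model wins, as for M0 in Acipenseridae.

```python
from collections import Counter
from statistics import median


def vote(per_tree_aic):
    """Percentage of trees on which each model has the lowest AIC."""
    winners = Counter(min(scores, key=scores.get) for scores in per_tree_aic)
    return {m: 100.0 * c / len(per_tree_aic) for m, c in winners.items()}


def delta_aic(per_tree_aic, reference):
    """Per-tree AIC difference of every model relative to the reference model."""
    models = per_tree_aic[0].keys()
    return {m: [s[m] - s[reference] for s in per_tree_aic] for m in models}


# Hypothetical AIC scores for four trees (not taken from the thesis).
trees = [
    {"M0": 102.0, "Ms": 100.0, "Me": 101.0, "Mse": 102.0},
    {"M0": 99.0, "Ms": 100.0, "Me": 101.5, "Mse": 102.0},
    {"M0": 103.0, "Ms": 100.0, "Me": 100.5, "Mse": 101.5},
    {"M0": 104.0, "Ms": 101.0, "Me": 102.0, "Mse": 103.0},
]
votes = vote(trees)                   # share of trees won by each model
best = max(votes, key=votes.get)      # model supported by the most trees
deltas = delta_aic(trees, best)       # ΔAIC relative to that model
median_delta_m0 = median(deltas["M0"])
print(votes, best, median_delta_m0)
```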


Table S7: ML parameter estimates of the best supported BiSSE model while allowing for polyploid-to-diploid reversals. The rates were estimated by taking the median of the maximum likelihood estimates from 100 fitted trees. Similar estimates were obtained when the mean was used instead.

Group                      Best supported model  λD     λP      µD        µP      qDP       qPD
Acipenseridae              Ms                    39.16  93.12   29.49     = µD    3.93e-10  31.47
Botiidae                   M0                    17.40  = λD    1.17e-06  = µD    2.28e-05  0.52
Cyprininae                 Mse                   19.35  136.54  3.05e-08  113.72  0.53      0.52
Salmoniformes/Esociformes  Mse                   13.69  312.65  7.20      303.95  4.55      1.17e-11


Table S8: MCMC-based estimates of evolutionary rates using the BiSSE model while allowing for polyploid-to-diploid reversals. The rate estimates were derived using the same procedure as described in Chapter 2.

Group                      λD     λP      µD     µP      qDP    qPD    PP(λD > λP)  PP(µD > µP)  PP(rD > rP)
Acipenseridae              57.75  104.49  47.18  36.13   11.57  25.56  0.15         0.57         0.13
Botiidae                   27.90  18.92   14.28  3.10    2.26   0.52   0.83         0.84         0.40
Cyprininae                 20.91  124.44  1.73   99.21   0.77   0.54   0.00         0.00         0.08
Salmoniformes/Esociformes  15.30  154.79  10.65  140.73  2.48   0.092  0.00         0.00         0.10


Table S9: MCMC BiSSE analysis (with and without the constraint qPD = 0) conducted on time-calibrated trees extracted from the mega-phylogeny published by Rabosky et al. (2013).

                           With constraint qPD = 0                  Without constraint qPD = 0
Group                      PP(λD > λP)  PP(µD > µP)  PP(rD > rP)    PP(λD > λP)  PP(µD > µP)  PP(rD > rP)
Acipenseridae              0.256        0.221        0.596          0.116        0.617        0.095
Botiidae                   0.788        0.607        0.652          0.739        0.837        0.269
Cyprininae                 0.000        0.015        0.033          0.000        0.014        0.029
Salmoniformes/Esociformes  0.000        0.000        0.010          0.000        0.003        0.002


Table S10: MCMC-based estimates of evolutionary rates using the BiSSE model for Cyprininae (ploidy levels estimated using a threshold of 0.90 in ChromEvol power analysis). The results using either the threshold of 0.90 or 0.48 (optimal according to the ChromEvol power analysis) agree, with and without the constraint qPD = 0.

With constraint qPD = 0?  λD      λP       µD     µP      qDP    qPD          PP(λD > λP)  PP(µD > µP)  PP(rD > rP)
No                        20.885  124.580  1.747  99.295  0.768  0.539        0.000        0.000        0.085
Yes                       21.795  121.483  1.963  91.668  1.549  constrained  0.000        0.000        0.028


Appendix B Phylogenetic evidence for cladogenetic polyploidization in land plants

B.1 Data Availability

These supporting materials are available online alongside the published version of Zhan et al. (2016) (https://doi.org/10.3732/ajb.1600108) and on Dryad (https://datadryad.org/stash/dataset/doi:10.5061/dryad.gr73).


B.2 Supporting Tables

Table S1: Information about the PloiDB genus data sets used in this study. For each genus, the following pieces of information are provided: (1) the species count (estimated as the number of accepted species in The Plant List v1.1 database), (2) the number of observed taxa, (3) the percentage of taxa with uncertain ploidy ('NA'), and (4) the percentage of polyploid taxa (calculated disregarding the 'NA' taxa). In the ClaSSE analysis, the sampling fraction (i.e., number of taxa divided by the TPL 1.1 count) is capped at 1.

Genus TPL1.1 count Taxa % uncertain ploidy % polyploids Acer 164 93 0.06 0.06 Achillea 151 59 0.10 0.17 Aeonium 89 34 0.00 0.09 Aeschynanthus 194 48 0.33 0.09 Agrostis 228 41 0.12 0.83 Albizia 137 36 0.19 0.03 Allium 918 392 0.04 0.16 Alnus 44 30 0.20 0.54 Alyssum 207 104 0.32 0.31 Androsace 170 75 0.25 0.55 Angelica 116 61 0.05 0.09 Anthemis 178 117 0.35 0.08 Anthurium 926 101 0.29 0.06 Arabis 110 86 0.27 0.38 Arachis 81 48 0.00 0.06 Arctostaphylos 75 59 0.12 0.13 Arisaema 180 110 0.14 0.15 Aristolochia 485 123 0.33 0.29 Artemisia 481 242 0.13 0.33 Asarum 121 88 0.14 0.96 Aster 234 72 0.47 0.24 Atriplex 258 93 0.16 0.22 Axonopus 89 34 0.47 0.56 Bambusa 137 51 0.10 0.09 Berberis 580 148 0.03 0.02 Betula 119 40 0.30 0.46 175

Bidens 249 49 0.16 0.68 Boechera 110 77 0.10 0.06 Brachiaria 125 45 0.29 0.78 Bromus 169 69 0.22 0.59 Bupleurum 208 77 0.13 0.16 Caesalpinia 162 101 0.09 0.01 Calceolaria 277 101 0.34 0.66 Calligonum 158 32 0.44 0.44 Calochortus 74 64 0.30 0.07 Camellia 250 99 0.24 0.23 Campanula 440 305 0.17 0.91 Caragana 90 54 0.07 0.08 Caralluma 78 47 0.15 0.08 Cardamine 233 115 0.37 0.65 Carthamus 48 30 0.07 0.29 Castilleja 204 63 0.25 0.28 Centaurea 734 292 0.08 0.24 Ceropegia 217 59 0.31 0.07 Chaerophyllum 46 59 0.07 0.58 Chaetanthera 49 42 0.48 0.05 Chamaecrista 330 45 0.47 0.08 Chenopodium 150 36 0.06 0.24 Chlorophytum 196 50 0.40 0.47 Clematis 373 80 0.13 0.04 Coffea 125 99 0.03 0.01 Columnea 195 84 0.12 0.01 Combretum 289 49 0.39 0.10 Convolvulus 72 134 0.43 0.04 Corchorus 77 42 0.26 0.13 Coreopsis 100 60 0.12 0.09 Cornus 51 42 0.00 0.02 Corydalis 586 30 0.27 0.14 Cotoneaster 278 61 0.38 0.82 Coursetia 40 33 0.06 0.03 Crambe 39 30 0.37 0.37 Crepis 252 88 0.06 0.05 Crinum 106 51 0.04 0.02 Crotalaria 699 249 0.45 0.09 Cuphea 280 76 0.22 0.64 176

Cyanus 44 39 0.23 0.13 Cynanchum 283 119 0.34 0.05 Cytisus 73 34 0.15 0.97 Dahlia 42 31 0.00 0.10 Delphinium 457 158 0.06 0.06 Dendranthema 36 30 0.17 0.64 Descurainia 46 32 0.34 0.24 Dianthus 338 113 0.34 0.31 Dioscorea 613 132 0.36 0.82 Drosera 187 77 0.32 0.90 Dudleya 46 30 0.07 0.18 Echinocereus 64 56 0.43 0.19 Echium 67 46 0.00 0.09 Elymus 234 116 0.07 0.94 Epilobium 222 59 0.08 0.96 Ericameria 39 33 0.06 0.06 Erigeron 476 93 0.22 0.22 Eriogonum 255 101 0.19 0.04 Erodium 128 64 0.14 0.24 Eryngium 257 131 0.34 0.29 Eupatorium 126 35 0.06 0.30 Fraxinus 63 47 0.19 0.03 Fuchsia 113 36 0.11 0.31 Gagea 209 91 0.29 0.97 Gaultheria 141 88 0.22 0.52 Genista 125 82 0.15 0.84 Geranium 415 112 0.28 0.41 Gilia 40 30 0.50 0.27 Globba 98 55 0.24 0.88 Gossypium 54 39 0.00 0.05 Grindelia 65 39 0.03 0.11 Gymnocalycium 77 56 0.27 0.15 Haworthia 166 57 0.14 0.08 Hedysarum 201 34 0.47 0.06 Helianthus 71 48 0.02 0.19 Helichrysum 506 95 0.25 0.56 Helictotrichon 90 46 0.30 0.41 Heracleum 52 80 0.20 0.08 Hibiscus 241 70 0.30 0.53 177

Hieracium 2241 110 0.17 0.34 Hordeum 43 34 0.03 0.39 458 260 0.35 0.12 Hypochaeris 85 40 0.20 0.03 Impatiens 487 230 0.42 0.05 Indigofera 674 242 0.26 0.07 Inula 110 44 0.32 0.13 Ipomoea 449 97 0.35 0.08 Isodon 107 64 0.03 0.03 Jacobaea 43 34 0.18 0.11 Jasminum 198 30 0.23 0.09 Kalanchoe 133 37 0.22 0.07 Klasea 29 30 0.43 0.24 Lachenalia 115 35 0.06 0.27 Lactuca 147 40 0.05 0.16 Lathyrus 159 111 0.01 0.02 Lepidium 234 120 0.34 0.78 Leptinella 33 31 0.19 0.36 Lespedeza 58 40 0.03 0.05 Leucanthemum 42 30 0.00 0.57 Leymus 55 33 0.27 0.25 Lilium 111 91 0.00 0.02 Linanthus 45 44 0.20 0.06 Linaria 98 123 0.02 0.02 Linum 141 50 0.06 0.57 Liparis 428 42 0.43 0.04 Lippia 191 32 0.50 0.13 Lobelia 414 73 0.30 0.55 Lonicera 103 89 0.22 0.14 Lotononis 199 66 0.33 0.30 Lotus 141 96 0.13 0.17 Lupinus 626 128 0.19 0.79 Lycium 88 63 0.05 0.12 Magnolia 242 128 0.16 0.14 Malus 62 51 0.10 0.15 Mammillaria 185 152 0.18 0.05 Meconopsis 45 51 0.37 0.78 Medicago 103 76 0.18 0.21 Melaleuca 265 65 0.09 0.02 178

Melampodium 45 41 0.15 0.40 Micromeria 77 31 0.19 0.96 Mimulus 155 89 0.28 0.30 Muhlenbergia 162 132 0.37 0.37 Narcissus 116 39 0.00 0.33 Nicotiana 55 54 0.04 0.31 Nymphaea 44 45 0.44 0.84 Oenothera 150 76 0.33 0.08 Onobrychis 148 91 0.38 0.21 Ophiopogon 67 36 0.03 0.11 Orbea 30 34 0.38 0.24 Orobanche 119 69 0.00 0.71 Oxytropis 586 68 0.37 0.28 Packera 66 44 0.20 0.26 Panicum 442 165 0.44 0.63 Papaver 55 45 0.18 0.54 Parnassia 72 34 0.24 0.31 Paspalum 329 136 0.35 0.61 Passiflora 513 272 0.40 0.69 Pectis 93 60 0.27 0.16 Pedicularis 484 284 0.05 0.01 Pennisetum 83 37 0.27 0.70 Penstemon 301 174 0.48 0.12 Peucedanum 114 53 0.28 0.05 Phacelia 186 104 0.15 0.06 Phlox 85 30 0.20 0.17 Physalis 124 45 0.16 0.08 Pilosella 217 44 0.30 0.58 Pimpinella 106 67 0.07 0.03 Pinguicula 73 75 0.48 0.59 Platanthera 152 44 0.20 0.09 Polystachya 245 84 0.14 0.01 Potamogeton 160 53 0.17 0.48 Pultenaea 104 75 0.07 0.03 Reseda 31 47 0.43 0.81 Rhododendron 641 409 0.18 0.05 Rosa 366 134 0.38 0.41 Rubus 1494 250 0.39 0.64 Rumex 152 50 0.12 0.98 179

Rytidosperma 70 56 0.21 0.34 Sansevieria 69 32 0.47 0.94 Saussurea 433 154 0.30 0.06 Scabiosa 62 34 0.03 0.03 Scorzonera 199 41 0.12 0.08 Scutellaria 468 30 0.47 0.06 Senna 272 99 0.08 0.02 Sesbania 56 44 0.41 0.15 Sida 156 34 0.47 0.28 Silene 487 267 0.13 0.20 Sinosenecio 45 31 0.06 0.97 Sisyrinchium 201 99 0.49 0.74 Solanum 1199 516 0.16 0.09 Solidago 117 40 0.05 0.16 Sonchus 131 58 0.12 0.10 Spiraea 138 41 0.20 0.27 Sporobolus 184 35 0.40 0.95 Stachys 375 136 0.47 0.93 Stipa 329 171 0.37 0.89 Streptocarpus 134 91 0.05 0.01 Stylosanthes 46 36 0.17 0.17 Swertia 105 64 0.16 0.02 Symphyotrichum 108 57 0.18 0.74 Tabernaemontana 118 69 0.39 0.02 Tamarix 57 36 0.03 0.03 Tanacetum 167 65 0.38 0.10 Taraxacum 2332 118 0.34 0.77 Tragopogon 141 100 0.22 0.12 Trifolium 244 213 0.06 0.08 Trigonella 95 32 0.25 0.08 Trillium 50 43 0.05 0.10 Turnera 126 38 0.29 0.52 Urtica 53 45 0.40 0.37 Vanda 60 35 0.49 0.17 Vernonia 671 36 0.33 0.92 Veronica 198 261 0.10 0.71 Viburnum 166 124 0.13 0.43 Vicia 232 139 0.09 0.04 Vigna 118 73 0.00 0.01 180

Viola 478 189 0.37 0.93 Zieria 1 32 0.31 0.32
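
The cap on the sampling fraction mentioned in the caption of Table S1 matters for genera in which the number of observed taxa exceeds the TPL 1.1 count (for example, Zieria, with 32 taxa against a count of 1). A minimal Python sketch of that rule follows; the function name is hypothetical, and the two example genera use the counts listed in Table S1.

```python
def sampling_fraction(n_taxa, tpl_count):
    """Number of observed taxa divided by the TPL 1.1 species count, capped at 1."""
    if tpl_count <= 0:
        raise ValueError("the species count must be positive")
    return min(1.0, n_taxa / tpl_count)


# Counts taken from Table S1 (TPL 1.1 count, observed taxa).
acer = sampling_fraction(93, 164)    # fewer taxa than accepted species
zieria = sampling_fraction(32, 1)    # more taxa than accepted species, so capped
print(acer, zieria)
```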


Table S2: MCMC and ML results for the PloiDB data sets. In the MCMC analysis, the 95% highest posterior density (HPD) interval of the proportion of cladogenetic ploidy shifts (φ) was calculated (the lower and upper bounds of the HPD intervals are shown), along with the median of the posterior distribution of φ. In the ML analysis, twice the difference in the log likelihood values (2ΔlnLik) between the “dual” model (MD) and a reduced model (i.e., 2*[lnLik of MD – lnLik of MA or MC]) was calculated for each tree, and the median over all trees was taken. MA (or MC) was rejected in favor of MD if the median 2ΔlnLik was greater than χ²(p = 0.05) = 3.841.

Lower Upper Median 2ΔlnLik Median 2ΔlnLik Genus Median bound bound of against MA against MC of HPD HPD Acer 0.3622 0.0000 0.9038 0.0000 2.6968 Achillea 0.6671 0.1222 1.0000 0.5023 0.0000 Aeonium 0.5178 0.0517 0.9950 0.0000 0.4019 Aeschynanthus 0.5035 0.0254 0.9704 0.6917 0.0000 Agrostis 0.8543 0.4196 1.0000 4.1888 0.0002 Albizia 0.5102 0.0503 0.9976 0.0000 0.3515 Allium 0.8831 0.5760 1.0000 27.1975 0.0000 Alnus 0.5293 0.0544 0.9970 0.8383 0.0000 Alyssum 0.4045 0.0001 0.9179 0.0243 1.4204 Androsace 0.7093 0.1548 1.0000 2.7554 0.0000 Angelica 0.4636 0.0026 0.9413 0.0000 1.2329 Anthemis 0.6448 0.1303 1.0000 3.2480 0.0000 Anthurium 0.4031 0.0003 0.9232 0.0000 0.4396 Arabis 0.7265 0.2876 1.0000 8.8721 0.0000 Arachis 0.4442 0.0002 0.9356 0.8574 0.0000 Arctostaphylos 0.7298 0.2071 1.0000 2.5395 0.0237 Arisaema 0.4206 0.0014 0.9246 2.6015 0.0018 Aristolochia 0.5242 0.0514 0.9952 0.3900 0.0000 Artemisia 0.9284 0.7177 1.0000 29.4206 0.0000 Asarum 0.5251 0.0604 0.9996 0.5514 0.0000 Aster 0.5221 0.0540 0.9957 0.0000 0.9452 Atriplex 0.3638 0.0002 0.9021 0.0000 3.4690 Axonopus 0.6544 0.0994 1.0000 1.0843 0.0000 Bambusa 0.5394 0.0608 0.9999 0.5314 0.0088 Berberis 0.5933 0.0834 1.0000 0.6912 0.0000 182

Betula 0.7451 0.1872 1.0000 2.6173 0.0000 Bidens 0.6646 0.1396 1.0000 2.4631 0.0000 Boechera 0.4236 0.0001 0.9201 0.0000 0.1318 Brachiaria 0.7271 0.1577 1.0000 2.9703 0.0000 Bromus 0.6746 0.1830 1.0000 3.7681 0.0768 Bupleurum 0.5778 0.0719 0.9998 1.1844 0.0000 Caesalpinia 0.5864 0.0775 0.9997 0.7914 0.0000 Calceolaria 0.4926 0.0120 0.9475 0.6186 0.2193 Calligonum 0.5615 0.0780 0.9999 0.3287 0.2292 Calochortus 0.4799 0.0001 0.9457 0.0000 0.3308 Camellia 0.6717 0.1433 1.0000 5.5734 0.0000 Campanula 0.4013 0.0001 0.8752 1.8114 4.4985 Caragana 0.5651 0.0774 1.0000 0.9468 0.0000 Caralluma 0.6084 0.0849 1.0000 0.1998 0.0000 Cardamine 0.7234 0.1736 1.0000 4.6809 0.0034 Carthamus 0.5784 0.0764 0.9992 1.5318 0.0000 Castilleja 0.5911 0.0872 0.9999 1.5992 0.0000 Centaurea 0.8077 0.3067 1.0000 12.9157 0.0000 Ceropegia 0.3674 0.0000 0.9059 0.0000 1.3906 Chaerophyllum 0.3605 0.0003 0.9070 0.0000 0.6849 Chaetanthera 0.3319 0.0000 0.8895 0.0000 1.0382 Chamaecrista 0.3898 0.0000 0.9244 0.0000 0.3374 Chenopodium 0.3129 0.0000 0.8914 0.0000 0.7516 Chlorophytum 0.7117 0.1498 1.0000 1.6117 0.0000 Clematis 0.4648 0.0006 0.9439 0.0000 0.0921 Coffea 0.5498 0.0584 0.9971 0.0000 0.0822 Columnea 0.4936 0.0167 0.9598 0.0000 0.2130 Combretum 0.4338 0.0000 0.9339 0.0704 0.0008 Convolvulus 0.3196 0.0000 0.8850 0.3188 0.2691 Corchorus 0.3803 0.0000 0.9172 0.1291 0.1436 Coreopsis 0.3569 0.0000 0.9077 0.0000 1.7919 Cornus 0.4208 0.0004 0.9230 0.0000 0.6785 Corydalis 0.4212 0.0000 0.9287 0.0000 0.0743 Cotoneaster 0.7469 0.1802 1.0000 3.3309 0.0000 Coursetia 0.4470 0.0000 0.9316 0.0000 0.3197 Crambe 0.4593 0.0006 0.9315 0.0000 1.8831 Crepis 0.5174 0.0518 0.9975 0.1623 0.0173 Crinum 0.4556 0.0005 0.9410 0.0000 0.4530 Crotalaria 0.3932 0.0001 0.9179 0.0000 1.8895


Cuphea 0.6273 0.0833 0.9999 2.2173 0.0000 Cyanus 0.5177 0.0605 0.9959 0.1449 0.4780 Cynanchum 0.5205 0.0524 0.9963 0.0070 0.0175 Cytisus 0.5827 0.0906 0.9998 0.0150 1.6482 Dahlia 0.5513 0.0755 0.9998 1.1679 0.0000 Delphinium 0.5877 0.0814 0.9998 1.7629 0.0000 Dendranthema 0.6688 0.0947 0.9997 4.4624 0.0000 Descurainia 0.3645 0.0000 0.9014 0.0000 1.5088 Dianthus 0.6795 0.1719 0.9999 1.5660 0.4230 Dioscorea 0.5268 0.0587 0.9980 0.1231 0.0036 Drosera 0.5770 0.0638 0.9997 0.0044 0.7893 Dudleya 0.5664 0.0893 0.9999 3.4475 0.0000 Echinocereus 0.6290 0.1047 1.0000 3.2308 0.0000 Echium 0.4958 0.0166 0.9629 1.4599 0.0000 Elymus 0.8033 0.1739 1.0000 5.7397 0.0000 Epilobium 0.5158 0.0276 0.9731 0.0000 0.0216 Ericameria 0.5255 0.0561 0.9996 0.4601 0.0000 Erigeron 0.6355 0.1070 0.9996 3.9505 0.0000 Eriogonum 0.5099 0.0559 0.9989 0.3593 0.0000 Erodium 0.4317 0.0001 0.9301 0.9928 0.0000 Eryngium 0.7584 0.2462 1.0000 5.1605 0.0000 Eupatorium 0.4087 0.0001 0.9187 0.0000 1.6636 Fraxinus 0.6169 0.0730 0.9997 0.6535 0.0000 Fuchsia 0.4379 0.0001 0.9368 0.0000 1.6692 Gagea 0.4882 0.0004 0.9462 0.0000 2.2921 Gaultheria 0.3104 0.0000 0.8824 0.0000 1.7991 Genista 0.6979 0.1547 1.0000 5.0749 0.0000 Geranium 0.6654 0.1167 1.0000 1.8846 0.0000 Gilia 0.5311 0.0669 0.9997 0.4334 0.0253 Globba 0.3599 0.0000 0.9064 0.0000 2.3054 Gossypium 0.5998 0.0785 1.0000 3.4340 0.0216 Grindelia 0.6409 0.1050 1.0000 2.4714 0.0000 Gymnocalycium 0.4719 0.0003 0.9442 0.1863 0.0377 Haworthia 0.4175 0.0000 0.9333 0.4444 0.0000 Hedysarum 0.4157 0.0000 0.9304 0.0000 0.0590 Helianthus 0.7442 0.1852 1.0000 6.3266 0.0000 Helichrysum 0.5899 0.0908 0.9999 2.0114 0.2415 Helictotrichon 0.5113 0.0437 0.9801 0.8641 0.0648 Heracleum 0.3394 0.0001 0.8888 0.0000 1.0285


Hibiscus 0.6878 0.1426 1.0000 1.7290 0.0000 Hieracium 0.3651 0.0000 0.9189 0.1231 0.0000 Hordeum 0.3104 0.0001 0.8643 0.0000 2.2312 Hypericum 0.6756 0.1352 1.0000 4.2511 0.0000 Hypochaeris 0.4516 0.0000 0.9338 0.0000 0.5521 Impatiens 0.6285 0.1056 0.9999 2.2077 0.0000 Indigofera 0.5312 0.0609 1.0000 0.9453 0.0012 Inula 0.5186 0.0529 0.9999 0.0000 0.1505 Ipomoea 0.5864 0.0791 0.9999 0.1667 0.0900 Isodon 0.3748 0.0000 0.9087 0.0000 1.2102 Jacobaea 0.5311 0.0600 0.9995 0.4464 0.0000 Jasminum 0.5282 0.0547 0.9998 0.0000 0.4719 Kalanchoe 0.5443 0.0566 0.9996 0.0000 0.3333 Klasea 0.5876 0.0782 1.0000 3.2215 0.0000 Lachenalia 0.4704 0.0029 0.9419 1.0652 0.0000 Lactuca 0.5660 0.0697 0.9997 0.8118 0.0000 Lathyrus 0.6187 0.0877 1.0000 1.1164 0.0000 Lepidium 0.7384 0.1853 1.0000 4.2828 0.0060 Leptinella 0.6948 0.1624 1.0000 3.9481 0.0000 Lespedeza 0.5170 0.0553 0.9998 0.6309 0.0000 Leucanthemum 0.8134 0.3041 1.0000 7.5755 0.0000 Leymus 0.3894 0.0001 0.9098 0.0000 2.4740 Lilium 0.4432 0.0000 0.9355 0.0000 1.1125 Linanthus 0.7363 0.1657 1.0000 4.0538 0.0000 Linaria 0.4775 0.0024 0.9160 1.1696 0.0000 Linum 0.5258 0.0622 1.0000 0.6115 0.0000 Liparis 0.4469 0.0000 0.9423 0.0000 0.2590 Lippia 0.5410 0.0599 0.9999 0.4198 0.0000 Lobelia 0.5083 0.0063 0.9496 0.0000 0.3427 Lonicera 0.4217 0.0000 0.9241 1.3100 0.0000 Lotononis 0.3016 0.0000 0.9034 0.1817 0.0403 Lotus 0.6814 0.2147 1.0000 1.6777 0.4630 Lupinus 0.4144 0.0001 0.9239 0.0000 0.6952 Lycium 0.5318 0.0706 0.9995 1.1156 0.0000 Magnolia 0.3661 0.0001 0.9048 0.0000 1.3526 Malus 0.3766 0.0000 0.9064 0.0000 1.4735 Mammillaria 0.3463 0.0001 0.8834 0.9887 0.3011 Meconopsis 0.7117 0.1762 1.0000 5.4548 0.0000 Medicago 0.4976 0.0234 0.9601 2.1403 0.2742


Melaleuca 0.5161 0.0534 0.9971 0.0000 0.2546 Melampodium 0.6891 0.1474 1.0000 4.7811 0.0000 Micromeria 0.4306 0.0005 0.9348 0.0000 0.9021 Mimulus 0.3588 0.0001 0.8833 0.0445 3.8815 Muhlenbergia 0.5896 0.1673 0.9999 3.6483 0.3271 Narcissus 0.5198 0.0650 0.9947 1.6384 0.0000 Nicotiana 0.5570 0.0749 0.9999 2.0049 0.0020 Nymphaea 0.6169 0.0927 1.0000 0.8452 0.0593 Oenothera 0.3497 0.0000 0.9025 0.0000 0.5865 Onobrychis 0.5329 0.0780 0.9954 2.1794 0.0086 Ophiopogon 0.5848 0.0764 0.9998 2.6456 0.0000 Orbea 0.3627 0.0000 0.8967 0.3490 0.0342 Orobanche 0.4298 0.0000 0.9362 0.0000 0.1062 Oxytropis 0.4490 0.0049 0.9400 0.0000 0.1313 Packera 0.7064 0.2027 1.0000 5.9914 0.0000 Panicum 0.3877 0.0000 0.8576 0.0000 4.5284 Papaver 0.6733 0.2077 0.9999 2.7090 0.6655 Parnassia 0.4901 0.0001 0.9439 0.3047 0.0000 Paspalum 0.8453 0.3964 1.0000 9.9624 0.0000 Passiflora 0.1729 0.0000 0.5938 0.0000 12.0916 Pectis 0.4908 0.0173 0.9613 0.0000 1.4728 Pedicularis 0.4880 0.0015 0.9444 0.0000 0.0000 Pennisetum 0.5862 0.0782 0.9997 1.7336 0.0000 Penstemon 0.6856 0.1739 1.0000 6.9344 0.0000 Peucedanum 0.4821 0.0023 0.9484 0.3851 0.0000 Phacelia 0.5683 0.0762 0.9998 1.2046 0.0000 Phlox 0.5121 0.0427 0.9873 0.0000 0.4865 Physalis 0.6001 0.1061 0.9999 2.4744 0.0000 Pilosella 0.6069 0.1341 0.9999 1.2115 0.0003 Pimpinella 0.4108 0.0000 0.9248 0.4105 0.0000 Pinguicula 0.7809 0.3830 1.0000 12.8294 0.0000 Platanthera 0.5508 0.0646 1.0000 0.0000 0.4746 Polystachya 0.5768 0.0671 1.0000 1.0099 0.0000 Potamogeton 0.4363 0.0003 0.9347 0.0000 0.4376 Pultenaea 0.5202 0.0565 0.9998 0.0045 0.0000 Reseda 0.4351 0.0000 0.9319 0.0000 0.3602 Rhododendron 0.5887 0.0844 0.9997 2.3616 0.0051 Rosa 0.7751 0.2740 1.0000 7.5190 0.0000 Rubus 0.8755 0.4878 1.0000 12.7890 0.0000


Rumex 0.6408 0.0803 1.0000 0.0000 0.6337 Rytidosperma 0.4709 0.0004 0.9283 0.8787 1.1912 Sansevieria 0.6301 0.0847 1.0000 0.9092 0.0000 Saussurea 0.4700 0.0059 0.9458 1.9696 0.0000 Scabiosa 0.3885 0.0001 0.9179 0.0000 0.2939 Scorzonera 0.4265 0.0000 0.9277 0.0000 0.3985 Scutellaria 0.4381 0.0014 0.9345 0.0000 0.2213 Senna 0.4332 0.0044 0.9358 0.0000 0.6117 Sesbania 0.4380 0.0004 0.9354 0.7596 0.0000 Sida 0.5816 0.0808 1.0000 0.5162 0.0016 Silene 0.7640 0.3058 1.0000 12.9307 0.0000 Sinosenecio 0.6461 0.1073 1.0000 3.0679 0.0000 Sisyrinchium 0.5900 0.0813 0.9998 1.1915 0.0000 Solanum 0.4941 0.0678 0.9681 3.9658 2.6511 Solidago 0.5377 0.0649 1.0000 0.0000 0.3802 Sonchus 0.4170 0.0000 0.9197 0.0000 0.7633 Spiraea 0.5007 0.0516 0.9932 0.0000 0.9837 Sporobolus 0.5940 0.0710 0.9998 0.0000 1.3498 Stachys 0.4937 0.0114 0.9581 0.0167 0.8413 Stipa 0.2474 0.0001 0.7835 0.0000 5.7253 Streptocarpus 0.4346 0.0001 0.9267 0.0000 0.5177 Stylosanthes 0.4368 0.0001 0.9302 0.0000 1.7016 Swertia 0.4172 0.0000 0.9285 0.0000 1.4386 Symphyotrichum 0.4843 0.0069 0.9461 0.0773 0.6009 Tabernaemontana 0.5085 0.0497 0.9997 0.1785 0.0000 Tamarix 0.4226 0.0003 0.9312 0.0000 0.5377 Tanacetum 0.4314 0.0000 0.9280 0.0000 0.0000 Taraxacum 0.8494 0.5365 1.0000 1.5586 0.0000 Tragopogon 0.2942 0.0000 0.8664 0.0000 2.8059 Trifolium 0.4602 0.0729 0.9735 2.1434 0.8862 Trigonella 0.5097 0.0531 0.9994 0.0000 0.2935 Trillium 0.3269 0.0001 0.8840 0.0000 1.2453 Turnera 0.7113 0.1631 1.0000 1.1782 0.3422 Urtica 0.5609 0.0812 0.9978 2.8125 0.0276 Vanda 0.4474 0.0002 0.9338 0.0000 1.1386 Vernonia 0.5519 0.0610 0.9997 0.0000 0.6408 Veronica 0.5038 0.1042 0.9812 6.8878 5.1988 Viburnum 0.6204 0.1266 1.0000 3.4229 0.5640 Vicia 0.4828 0.0004 0.9390 0.0000 0.4537


Vigna | 0.6020 | 0.0748 | 1.0000 | 0.4568 | 0.0000
Viola | 0.5227 | 0.0545 | 0.9998 | 0.4208 | 2.7879
Zieria | 0.3810 | 0.0001 | 0.9015 | 0.0040 | 0.9406


Table S3: MCMC and ML results for the M2011 data sets.

Genus | Median | Lower bound of HPD | Upper bound of HPD | Median 2ΔlnLik against MA | Median 2ΔlnLik against MC
Achillea | 0.8811 | 0.5425 | 1.0000 | 19.3640 | 0.0000
Actinidia | 0.7865 | 0.3845 | 1.0000 | 30.5774 | 0.2424
Aeonium | 0.4463 | 0.0001 | 0.9391 | 0.0000 | 1.5874
Antirrhinum | 0.5429 | 0.0675 | 1.0000 | 0.5918 | 0.7991
Arisaema | 0.7072 | 0.2319 | 1.0000 | 19.7102 | 0.0188
Aristolochia | 0.4526 | 0.0000 | 0.9372 | 0.0000 | 0.3279
Asplenium_Ceterach | 0.7185 | 0.1611 | 1.0000 | 6.9503 | 0.0000
Campanula | 0.7869 | 0.3225 | 1.0000 | 8.4899 | 0.0000
Dodecatheon | 0.6067 | 0.1017 | 1.0000 | 1.8496 | 0.1491
Dryopteris | 0.8209 | 0.4773 | 1.0000 | 10.0720 | 0.0000
Dryopteris_China | 0.7526 | 0.2916 | 1.0000 | 4.8793 | 0.0000
Erodium | 0.5115 | 0.1246 | 0.9997 | 18.7804 | 0.0027
Fuchsia | 0.4158 | 0.0001 | 0.9262 | 0.0000 | 2.6071
Gagea | 0.5235 | 0.0668 | 0.9997 | 2.5394 | 0.0000
Hymenophyllum | 0.5219 | 0.0590 | 1.0000 | 0.0072 | 0.7096
| 0.4087 | 0.0001 | 0.9189 | 2.3718 | 0.0000
Lathyrus | 0.5034 | 0.0229 | 0.9707 | 0.0000 | 0.1556
Lemna | 0.5390 | 0.0594 | 0.9996 | 0.0000 | 1.0713
Mimulus | 0.6297 | 0.1158 | 1.0000 | 3.4213 | 0.0000
Myriopteris | 0.7872 | 0.4393 | 1.0000 | 24.2515 | 0.0000
Orobanche | 0.5033 | 0.0711 | 0.9869 | 1.5395 | 0.7484
Pelargonium | 0.7649 | 0.3099 | 1.0000 | 17.8535 | 0.0000
Penstemon | 0.6754 | 0.1801 | 1.0000 | 6.1310 | 0.0003
Phacelia | 0.6417 | 0.0947 | 1.0000 | 2.5630 | 0.0000
Physalis | 0.2824 | 0.0000 | 0.8403 | 0.0000 | 3.8516
Solanum | 0.3658 | 0.0000 | 0.9109 | 0.0000 | 1.7132
Tarasa | 0.5353 | 0.0682 | 0.9999 | 2.1100 | 0.0000
Vaccinium | 0.4057 | 0.0000 | 0.9163 | 0.6380 | 0.1310
Viburnum | 0.4528 | 0.0001 | 0.9383 | 0.0000 | 0.8716


Appendix C Hidden introductions of freshwater red algae via the aquarium trade exposed by DNA barcodes

C.1 Data Availability

At the time that this document was finalized, Tables S1 and S2 were being deposited in the Dryad repository (doi:10.5061/dryad.3n5tb2rf8). Both tables are also available on bioRxiv (https://www.biorxiv.org/content/10.1101/2020.06.30.180042v1.supplementary-material).

C.2 Supporting Figures

Figure S1: Global sampling of the GenBank records of the freshwater red macroalgae examined in this study: all taxa (a), Batrachospermales (e.g., Batrachospermum, Kumanoa, Lemanea, Sheathia, and Virescentia) (b), Thoreales (Nemalinopsis and Thorea) (c), Bangia atropurpurea (d), Hildenbrandia spp. (e), and Compsopogon caeruleus (f). The contours indicate the latitudinal and longitudinal density of the GenBank records (represented as red dots). Note that the contours do not indicate the geographical distribution of the freshwater red algae.

Figure S2: Geographical distribution of freshwater red macroalgae in different countries (a), along the latitude (b), and along the longitude (c) based on GenBank records. The number of molecular operational taxonomic units (mOTUs) was inferred using automated barcode gap discovery (ABGD).


Figure S3: Phylogeny and diversity of the freshwater red macroalgal mOTUs in the field and aquarium samples in this study. The mOTUs were inferred using three different species delimitation methods: ABGD, PTP, and GMYC. The full set of rbcL sequences was clustered using USEARCH (designated as “UCLUST”) before the determination of mOTUs (designated as “Genus species mOTU”). The identification codes of the samples are colored by the geographical region where the samples were collected. The legend on the right indicates whether the samples in the UCLUST clusters were collected from the field and/or aquarium shops (a black dot denotes “yes”). Abbreviations: ABGD, automated barcode gap discovery; Aqua., aquarium; GMYC, general mixed Yule-Coalescent model; MCCT, maximum clade credibility tree; mOTU, molecular operational taxonomic unit; PTP, Poisson tree process; UCLUST100, OTU clustering based on a 100% sequence similarity threshold using USEARCH.


Figure S4: Map of countries with which Taiwan traded freshwater ornamental fish between 2013 and 2017. The greyscale bar at the bottom indicates the mean annual weight of fish imported to Taiwan (a) and fish exported from Taiwan (b).


C.3 Supporting Tables

Table S1: GenBank accessions, molecular operational taxonomic units, and collection information for all the sequences analyzed in this study. Available via Dryad Repository and bioRxiv.

Table S2: Weight of fish traded (kg) and the countries from which Taiwan imported fish and to which Taiwan exported fish. Available via Dryad Repository and bioRxiv.


Appendix D Reappraising plastid markers of the red algae for phylogenetic community ecology in the genomic era

D.1 Data Availability

These supporting materials are available online alongside the published version of Zhan et al. (2020) (https://onlinelibrary.wiley.com/doi/abs/10.1002/ece3.5984).


D.2 Supporting Figures

Figure S1: A maximum-likelihood phylogeny based on the NT alignment concatenated from the core plastid genes. Bootstrap proportions based on 100 replicates of maximum-likelihood analysis are shown beside the nodes. The scale bar indicates substitutions per site.



Figure S2: Negative correlation between the median bootstrap normalized Robinson-Foulds (nRF) distance and p-distance (a) and between the nRF distance and the amino acid alignment length (b). The median nRF distance in (a) was estimated from 100 pairwise comparisons between the bootstrap replicates of an individual gene tree and the target plastid genome tree; the nRF distance in (b) was calculated based on a single tree. The dashed lines delineate the 95% prediction interval. The popular plastid markers (psaA, psaB, psbA, and rbcL) are shown in blue, while the three newly proposed markers (gltB, rpoB, and rpoC1) are shown in orange.
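As a rough illustration of the nRF distance referred to in Figure S2, the sketch below compares two unrooted, fully resolved trees on the same set of leaves by counting the splits (bipartitions) found in only one of them and dividing by the maximum possible count, 2(n - 3) for n leaves. It is not the software used in the study, which this appendix does not name. The nested-tuple tree encoding and all function names are assumptions made for this example.

```python
from itertools import chain
from statistics import median


def leaf_set(tree):
    """Return the frozenset of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial splits of a tree given as nested tuples."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)  # fixed leaf used to store each split one way only
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaf_set(node)
        other = all_leaves - clade
        # Keep only splits with at least two leaves on each side.
        if len(clade) >= 2 and len(other) >= 2:
            # Store the side that does not contain the anchor leaf.
            splits.add(other if anchor in clade else clade)
        for child in node:
            visit(child)

    visit(tree)
    return splits


def normalized_rf(tree_a, tree_b):
    """Normalized Robinson-Foulds distance (assumes identical leaf sets)."""
    leaves = leaf_set(tree_a)
    if leaves != leaf_set(tree_b):
        raise ValueError("trees must share the same leaves")
    if len(leaves) < 4:
        raise ValueError("need at least four leaves")
    differing = bipartitions(tree_a) ^ bipartitions(tree_b)
    return len(differing) / (2 * (len(leaves) - 3))


def median_nrf(replicate_trees, reference_tree):
    """Median nRF of bootstrap replicate trees against one reference tree."""
    return median(normalized_rf(t, reference_tree) for t in replicate_trees)


# Hypothetical five-leaf trees for demonstration only.
reference = ((("A", "B"), "C"), ("D", "E"))
replicate_1 = ((("A", "C"), "B"), ("D", "E"))
replicate_2 = ((("A", "D"), "B"), ("C", "E"))
```

With these toy trees, `median_nrf([reference, replicate_1, replicate_2], reference)` summarizes three replicate-versus-reference comparisons in the same way that panel (a) summarizes 100 of them per gene.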



Figure S3: P-distance profile of rpoC1 (a) and rpoB (b). P-distance was measured over the NT alignment using a sliding window of 30 bases. For each window, we calculated the median pairwise p-distance. At a p-distance of ~0.2, the regions should be conserved enough for PCR primer design.
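The sliding-window profile described in Figure S3 can be sketched as follows. This is a minimal illustration, not the code used for the figure: the step size, the choice to skip sites with a gap or N, and the function names are all assumptions, since the caption specifies only the 30-base window and the median pairwise p-distance.

```python
from itertools import combinations
from statistics import median


def p_distance(seq_a, seq_b):
    """Proportion of differing sites; sites with a gap or N are skipped (assumption)."""
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else None


def sliding_median_p_distance(alignment, window=30, step=1):
    """Return (window start, median pairwise p-distance) for each window.

    `alignment` is a list of equal-length aligned nucleotide strings.
    The step size is an assumption; the caption gives only the window size.
    """
    length = len(alignment[0])
    profile = []
    for start in range(0, length - window + 1, step):
        distances = []
        for seq_a, seq_b in combinations(alignment, 2):
            d = p_distance(seq_a[start:start + window], seq_b[start:start + window])
            if d is not None:
                distances.append(d)
        profile.append((start, median(distances) if distances else None))
    return profile


# Hypothetical toy alignment (three sequences, ten sites) for demonstration only.
toy_alignment = ["AAAAAAAAAA", "AAAAATTTTT", "AAAAAAAAAT"]
toy_profile = sliding_median_p_distance(toy_alignment, window=5, step=5)
```

Windows whose median falls at or below a chosen threshold (the caption suggests about 0.2) could then be shortlisted as candidate primer sites.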


D.3 Supporting Tables

Table S1: Information about the included taxa and the NCBI GenBank accessions of their plastid genomes.

Class Subclass Order Family Species Accession

Bangiophyceae - Bangiales Bangiaceae Bangia atropurpurea NC_030221.1

Bangiophyceae - Bangiales Bangiaceae Bangia fuscopurpurea KP714733.1

Bangiophyceae - Bangiales Bangiaceae Porphyra pulchra NC_029861.1

Bangiophyceae - Bangiales Bangiaceae Porphyra purpurea NC_000925.1

Bangiophyceae - Bangiales Bangiaceae Porphyra umbilicalis NC_035573.1

Bangiophyceae - Bangiales Bangiaceae Pyropia haitanensis NC_021189.1

Bangiophyceae - Bangiales Bangiaceae Pyropia perforata NC_024050.1

Bangiophyceae - Bangiales Bangiaceae Pyropia yezoensis KC517072.1

Compsopogonophyceae - Compsopogonales Compsopogonaceae Compsopogon caeruleus KY083067.1

Compsopogonophyceae - Erythropeltales Erythrotrichiaceae Erythrotrichia carnea KX284721.2

Compsopogonophyceae - Rhodochaetales Rhodochaetaceae Rhodochaete parvula KX284728.2

Cyanidiophyceae - Cyanidiales Cyanidioschyzon merolae NC_004799.1

Cyanidiophyceae - Cyanidiales Cyanidiaceae Cyanidium caldarium NC_001840.1

Cyanidiophyceae - Cyanidiales Cyanidiaceae Cyanidium sp. KJ569775.1

Cyanidiophyceae - Cyanidiales Galdieriaceae NC_024665.1

Florideophyceae Ahnfeltiophycidae Ahnfeltiales Ahnfeltiaceae Ahnfeltia plicata KX284715.1

Florideophyceae Corallinophycidae Corallinales Corallinaceae Calliarthron tuberculosum NC_021075.1

Florideophyceae Corallinophycidae Sporolithales Sporolithaceae Sporolithon durum NC_029857.1

Florideophyceae Hildenbrandiophycidae Hildenbrandiales Hildenbrandiaceae Apophlaea sinclairii KX284716.1

Florideophyceae Hildenbrandiophycidae Hildenbrandiales Hildenbrandiaceae Hildenbrandia rivularis KX284723.1

Florideophyceae Hildenbrandiophycidae Hildenbrandiales Hildenbrandiaceae Hildenbrandia rubra KX284724.1

Florideophyceae Nemaliophycidae Batrachospermales Batrachospermaceae Kumanoa americana NC_031178.1

Florideophyceae Nemaliophycidae Batrachospermales Batrachospermaceae Sheathia arcuata KY033529.1

Florideophyceae Nemaliophycidae Nemaliales Galaxauraceae Galaxaura rugosa NC_031657.1

Florideophyceae Nemaliophycidae Nemaliales Liagoraceae Helminthocladia australis NC_031658.1

Florideophyceae Nemaliophycidae Nemaliales Liagoraceae Liagora brachyclada NC_031667.1

Florideophyceae Nemaliophycidae Nemaliales Liagoropsidaceae Liagoropsis maxima NC_031662.1

Florideophyceae Nemaliophycidae Nemaliales Nemaliaceae Nemalion sp. LT622871.1


Florideophyceae Nemaliophycidae Nemaliales Scinaiaceae Scinaia undulata NC_031664.1

Florideophyceae Nemaliophycidae Nemaliales Yamadaellaceae Yamadaella caenomyce NC_031666.1

Florideophyceae Nemaliophycidae Palmariales Palmariaceae Palmaria palmata KX284726.1

Florideophyceae Nemaliophycidae Thoreales Thoreaceae Thorea hispida KY083065.1

Florideophyceae Rhodymeniophycidae Acrosymphytales Schimmelmanniaceae Schimmelmannia schousboei KX284711.1

Florideophyceae Rhodymeniophycidae Bonnemaisoniales Bonnemaisoniaceae Asparagopsis taxiformis KX284717.1

Florideophyceae Rhodymeniophycidae Ceramiales Ceramiaceae Ceramium cimbricum KR025491.1

Florideophyceae Rhodymeniophycidae Ceramiales Ceramiaceae Ceramium japonicum KX284719.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Acrosorium ciliolatum MF101411.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Bostrychia moritziana MF101419.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Bostrychia simpliciuscula MF101421.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Bostrychia tenella MF101417.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Bryothamnion seaforthii MF101430.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Caloglossa beccarii MF101422.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Caloglossa intermedia MF101418.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Caloglossa monosticha MF101416.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Chondria sp. MF101429.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Chondria sp. MF101431.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Chondria sp. MF101451.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Cliftonaea pectinata MF101450.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Dasya naccarioides MF101436.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Dasyclonium flaccidum MF101443.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Dictyomenia sonderi MF101455.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Digenea simplex MF101465.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Dipterocladia arabiensis MF101408.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Dipterosiphonia australica MF101444.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Gredgaria maugeana MF101446.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Herposiphonia versicolor MF101434.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Kuetzingia canaliculata MF101449.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Laurencia snackeyi LN833431.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Laurencieae sp. MF101412.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Laurenciella marilzae MF101410.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Lophocladia kuetzingii MF101448.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Melanothamnus harveyi MF101437.1


Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Ophidocladus simpliciusculus MF101440.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Osmundaria fimbriata MF101415.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Palisada sp. MF101453.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Periphykon beckeri MF101413.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Platysiphonia delicata MF101409.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Polysiphonia brodiei MF101425.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Polysiphonia elongata MF101427.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Polysiphonia infestans MF101432.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Polysiphonia schneideri MF101454.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Polysiphonia scopulorum MF101438.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Polysiphonia sertularioides MF101423.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Polysiphonia sertularioides MF101435.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Polysiphonia sp. MF101414.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Polysiphonia sp. MF101456.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Polysiphonia stricta MF101428.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Rhodomela confervoides MF101424.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Sonderella linearis MF101445.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Spyridia filamentosa MF101441.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Symphyocladia dendroidea MF101420.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Taenioma perpusillum MF101452.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Thaumatella adunca MF101447.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Thuretia quercifolia MF101442.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Tolypiocladia glomerulata MF101467.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Vertebrata australis MF101439.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Vertebrata isogona MF101433.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Vertebrata lanosa NC_026523.1

Florideophyceae Rhodymeniophycidae Ceramiales Rhodomelaceae Vertebrata thuyoides MF101426.1

Florideophyceae Rhodymeniophycidae Gelidiales Gelidiaceae Gelidium elegans NC_029858.1

Florideophyceae Rhodymeniophycidae Gelidiales Gelidiaceae Gelidium vagum NC_029859.1

Florideophyceae Rhodymeniophycidae Gigartinales Gigartinaceae Chondrus crispus NC_020795.1

Florideophyceae Rhodymeniophycidae Gracilariales Gracilariaceae Gracilaria chilensis NC_029860.1

Florideophyceae Rhodymeniophycidae Gracilariales Gracilariaceae Gracilaria chorda KX284722.1

Florideophyceae Rhodymeniophycidae Gracilariales Gracilariaceae Gracilaria salicornia NC_023785.1

Florideophyceae Rhodymeniophycidae Gracilariales Gracilariaceae Gracilaria tenuistipitata NC_006137.1


Florideophyceae Rhodymeniophycidae Halymeniales Halymeniaceae Grateloupia taiwanensis NC_021618.1

Florideophyceae Rhodymeniophycidae Nemastomatales Schizymeniaceae Schizymenia dubyi KX284712.1

Florideophyceae Rhodymeniophycidae Nemastomatales Schizymeniaceae Sebdenia flabellata KX284713.1

Florideophyceae Rhodymeniophycidae Peyssonneliales Peyssonneliaceae Riquetophycus sp. KX284710.1

Florideophyceae Rhodymeniophycidae Plocamiales Plocamiaceae Plocamium cartilagineum KX284727.1

Florideophyceae Rhodymeniophycidae Rhodymeniales Champiaceae Coeloseira compressa NC_030338.1

Florideophyceae Rhodymeniophycidae Rhodymeniales Rhodymeniaceae Rhodymenia pseudopalmata KX284709.1

Porphyridiophyceae - Porphyridiales Porphyridiaceae Porphyridium purpureum NC_023133.1

Porphyridiophyceae - Porphyridiales Porphyridiaceae Porphyridium sordidum KX284720.1

Rhodellophyceae - Rhodellales Rhodellaceae Corynoplastis japonica KY709210.1

Stylonematophyceae - Stylonematales Stylonemataceae Bangiopsis subsimplex KX284718.1


Table S2: Sequence-derived characteristics of the plastid genes.

Gene | Number of Taxa | Alignment Length | Median pairwise dS | Median pairwise dN | Median p-distance | nRF distance (AA trees) | nRF distance (NT trees) | Pi
accA | 100 | 1062 | 1.2535 | 0.2056 | 0.2993 | 0.3093 | 0.3814 | 0.673
accB | 99 | 636 | 1.3549 | 0.1536 | 0.3932 | 0.4896 | 0.4583 | 0.725
accD | 100 | 975 | 1.3599 | 0.1949 | 0.2985 | 0.3299 | 0.3299 | 0.623
acpP | 106 | 354 | 1.3218 | 0.1996 | 0.3038 | 0.6990 | 0.6408 | 0.596
apcA | 107 | 483 | 1.1829 | 0.0411 | 0.1781 | 0.8462 | 0.6827 | 0.516
apcB | 107 | 486 | 1.0540 | 0.0442 | 0.1718 | 0.7500 | 0.6731 | 0.471
apcD | 106 | 522 | 1.2787 | 0.2136 | 0.2836 | 0.4563 | 0.3398 | 0.676
apcE | 106 | 2841 | 1.3076 | 0.2367 | 0.3258 | 0.2039 | 0.1553 | 0.75
apcF | 105 | 546 | 1.3468 | 0.2655 | 0.3254 | 0.5490 | 0.4020 | 0.727
argB | 100 | 1002 | 1.0301 | 0.4098 | 0.3826 | 0.4639 | 0.3299 | 0.716
atpA | 106 | 1602 | 1.2104 | 0.0817 | 0.2081 | 0.4466 | 0.3883 | 0.522
atpB | 106 | 1479 | 1.1962 | 0.0629 | 0.2023 | 0.5922 | 0.3786 | 0.497
atpD | 106 | 588 | 1.3027 | 0.4162 | 0.4076 | 0.3204 | 0.3107 | 0.845
atpE | 107 | 423 | 1.1545 | 0.2022 | 0.2786 | 0.5192 | 0.4904 | 0.697
atpF | 106 | 708 | 1.4322 | 0.2500 | 0.3447 | 0.3981 | 0.3592 | 0.658
atpG | 107 | 522 | 1.4162 | 0.2295 | 0.3312 | 0.5577 | 0.4808 | 0.766
atpH | 106 | 258 | 1.0348 | 0.0164 | 0.1667 | 0.9126 | 0.8447 | 0.461
atpI | 107 | 810 | 1.3805 | 0.1017 | 0.2392 | 0.5577 | 0.4519 | 0.568
carA | 106 | 1335 | 1.2154 | 0.3511 | 0.3911 | 0.3204 | 0.3398 | 0.776
cbbX | 98 | 996 | 1.3983 | 0.0873 | 0.2181 | 0.4947 | 0.4000 | 0.534
cemA | 107 | 849 | 1.2907 | 0.2048 | 0.2746 | 0.4712 | 0.3654 | 0.67
chlI | 107 | 1125 | 1.3472 | 0.1636 | 0.2651 | 0.3558 | 0.3558 | 0.625
clpC | 106 | 2805 | 1.3742 | 0.0507 | 0.1933 | 0.4466 | 0.3786 | 0.463
cpcA | 106 | 486 | 1.2595 | 0.0687 | 0.1975 | 0.7767 | 0.5534 | 0.562
cpcB | 106 | 516 | 1.2403 | 0.0941 | 0.2151 | 0.6990 | 0.5146 | 0.568
cpcG | 107 | 813 | 1.2786 | 0.1957 | 0.2698 | 0.4519 | 0.4327 | 0.619
cpeA | 102 | 531 | 1.0971 | 0.1040 | 0.2134 | 0.6869 | 0.4949 | 0.525
cpeB | 102 | 561 | 1.0038 | 0.0963 | 0.2072 | 0.7475 | 0.5657 | 0.54
dnaB | 106 | 2235 | 1.4243 | 0.4645 | 0.4677 | 0.2233 | 0.2233 | 0.808
dnaK | 106 | 2007 | 1.2266 | 0.1343 | 0.2464 | 0.3204 | 0.3107 | 0.593


fabH | 99 | 1059 | 1.1591 | 0.2586 | 0.3131 | 0.3125 | 0.2188 | 0.709
ftrB | 104 | 432 | 1.2916 | 0.1771 | 0.3012 | 0.6139 | 0.5842 | 0.583
ftsH | 107 | 1935 | 1.3809 | 0.1207 | 0.2413 | 0.3462 | 0.3269 | 0.62
gltB | 105 | 4800 | 1.2083 | 0.2352 | 0.3139 | 0.1471 | 0.1569 | 0.7
groEL | 106 | 1620 | 1.1928 | 0.1952 | 0.2784 | 0.2816 | 0.2621 | 0.757
ilvB | 106 | 1845 | 1.2221 | 0.2079 | 0.2940 | 0.3204 | 0.2621 | 0.662
ilvH | 107 | 555 | 1.3874 | 0.2026 | 0.3008 | 0.5769 | 0.4327 | 0.688
infB | 106 | 2613 | 1.1365 | 0.4744 | 0.4437 | 0.2427 | 0.2233 | 0.857
infC | 106 | 780 | 1.5179 | 0.1737 | 0.2981 | 0.4757 | 0.3495 | 0.559
odpA | 102 | 1125 | 1.2238 | 0.1620 | 0.2729 | 0.5455 | 0.4848 | 0.61
odpB | 99 | 1044 | 1.2658 | 0.1314 | 0.2381 | 0.5833 | 0.5104 | 0.572
petA | 107 | 1035 | 1.4242 | 0.2396 | 0.3250 | 0.4231 | 0.3269 | 0.717
petB | 103 | 657 | 1.3833 | 0.0370 | 0.1829 | 0.8200 | 0.6900 | 0.469
petF | 106 | 351 | 1.0462 | 0.2899 | 0.3299 | 0.7767 | 0.6602 | 0.698
petJ | 107 | 411 | 1.1170 | 0.3319 | 0.3731 | 0.4904 | 0.5096 | 0.747
preA | 106 | 1071 | 1.3320 | 0.2877 | 0.3376 | 0.3495 | 0.3301 | 0.702
psaA | 106 | 2349 | 1.5200 | 0.0617 | 0.1964 | 0.4563 | 0.2621 | 0.478
psaB | 107 | 2211 | 1.5008 | 0.0747 | 0.2021 | 0.4135 | 0.3077 | 0.506
psaC | 96 | 336 | 1.0289 | 0.0390 | 0.1605 | 0.9677 | 0.8387 | 0.336
psaD | 107 | 441 | 1.4316 | 0.1477 | 0.2482 | 0.7115 | 0.6058 | 0.639
psaE | 107 | 315 | 1.1190 | 0.2267 | 0.2842 | 0.8269 | 0.7115 | 0.495
psaF | 104 | 591 | 1.2391 | 0.1584 | 0.3098 | 0.4950 | 0.4059 | 0.707
psaL | 107 | 492 | 1.2774 | 0.1008 | 0.2418 | 0.6058 | 0.5192 | 0.628
psbA | 106 | 1095 | 0.7416 | 0.0222 | 0.1296 | 0.7573 | 0.4369 | 0.416
psbB | 104 | 1563 | 1.3297 | 0.0290 | 0.1735 | 0.6634 | 0.5446 | 0.447
psbC | 107 | 1464 | 1.4290 | 0.0354 | 0.1851 | 0.6827 | 0.4808 | 0.443
psbD | 107 | 1059 | 1.2723 | 0.0202 | 0.1624 | 0.7885 | 0.6250 | 0.405
psbE | 106 | 354 | 0.9631 | 0.0166 | 0.1429 | 0.9709 | 0.7573 | 0.311
psbV | 107 | 546 | 1.3076 | 0.1732 | 0.2963 | 0.4615 | 0.4904 | 0.69
psbW | 97 | 390 | 1.4346 | 0.1627 | 0.2596 | 0.6277 | 0.5106 | 0.59
rbcL | 107 | 1485 | 0.9848 | 0.0557 | 0.1721 | 0.7596 | 0.4712 | 0.487
rbcS | 106 | 477 | 1.4055 | 0.1934 | 0.2464 | 0.6505 | 0.6311 | 0.591
rne | 103 | 1725 | 1.3165 | 0.2996 | 0.3714 | 0.2200 | 0.1900 | 0.777
rpl1 | 107 | 729 | 1.1550 | 0.2661 | 0.3188 | 0.3942 | 0.3558 | 0.753


rpl11 | 107 | 456 | 1.1930 | 0.1759 | 0.2695 | 0.5962 | 0.5769 | 0.614
rpl12 | 107 | 402 | 1.1654 | 0.1694 | 0.3003 | 0.6154 | 0.6058 | 0.734
rpl13 | 106 | 510 | 1.3568 | 0.3175 | 0.3756 | 0.6117 | 0.4466 | 0.782
rpl14 | 106 | 420 | 1.4332 | 0.0965 | 0.2240 | 0.6602 | 0.5825 | 0.519
rpl16 | 107 | 456 | 1.2215 | 0.1072 | 0.2272 | 0.7212 | 0.6635 | 0.603
rpl18 | 104 | 600 | 1.4277 | 0.3966 | 0.3883 | 0.6634 | 0.6535 | 0.513
rpl19 | 104 | 423 | 1.3357 | 0.2855 | 0.3497 | 0.6238 | 0.5545 | 0.74
rpl2 | 107 | 930 | 1.4303 | 0.1804 | 0.2727 | 0.5385 | 0.4327 | 0.639
rpl20 | 106 | 390 | 1.3059 | 0.2256 | 0.3129 | 0.7087 | 0.5340 | 0.728
rpl21 | 105 | 408 | 1.1942 | 0.3687 | 0.3654 | 0.4902 | 0.4804 | 0.657
rpl22 | 105 | 396 | 1.3288 | 0.3570 | 0.3486 | 0.5686 | 0.5196 | 0.778
rpl23 | 106 | 363 | 1.3538 | 0.3243 | 0.3733 | 0.5922 | 0.5437 | 0.771
rpl24 | 103 | 390 | 1.3458 | 0.3952 | 0.3958 | 0.5900 | 0.6200 | 0.821
rpl27 | 106 | 321 | 1.2798 | 0.2207 | 0.3095 | 0.7476 | 0.7184 | 0.614
rpl29 | 96 | 237 | 1.1212 | 0.4775 | 0.4379 | 0.6667 | 0.6989 | 0.895
rpl3 | 107 | 792 | 1.3776 | 0.2815 | 0.3512 | 0.5673 | 0.5385 | 0.654
rpl4 | 107 | 786 | 1.2675 | 0.3623 | 0.3997 | 0.5385 | 0.4712 | 0.752
rpl5 | 106 | 570 | 1.5432 | 0.2114 | 0.2870 | 0.6214 | 0.5728 | 0.707
rpl6 | 106 | 570 | 1.3400 | 0.3024 | 0.3483 | 0.5728 | 0.5049 | 0.781
rpl9 | 102 | 498 | 1.1587 | 0.4910 | 0.4401 | 0.4545 | 0.5051 | 0.863
rpoA | 107 | 1008 | 1.6104 | 0.1847 | 0.2613 | 0.3654 | 0.3462 | 0.694
rpoB | 106 | 3558 | 1.3122 | 0.1230 | 0.2529 | 0.2039 | 0.1553 | 0.673
rpoC1 | 104 | 2058 | 1.3105 | 0.1414 | 0.2468 | 0.2376 | 0.2079 | 0.571
rpoC2 | 105 | 4332 | 1.1906 | 0.3034 | 0.3523 | 0.1961 | 0.1765 | 0.684
rps1 | 105 | 963 | 1.3992 | 0.3659 | 0.4071 | 0.3627 | 0.3039 | 0.793
rps11 | 107 | 393 | 1.2619 | 0.1013 | 0.2377 | 0.6731 | 0.5865 | 0.659
rps12 | 105 | 429 | 1.2257 | 0.0625 | 0.2124 | 0.7549 | 0.5784 | 0.485
rps13 | 107 | 414 | 1.4171 | 0.2384 | 0.2989 | 0.5577 | 0.4423 | 0.705
rps14 | 101 | 351 | 1.4483 | 0.2700 | 0.3100 | 0.7449 | 0.6429 | 0.678
rps16 | 106 | 297 | 1.2939 | 0.3170 | 0.3465 | 0.6311 | 0.6117 | 0.704
rps17 | 97 | 279 | 1.3109 | 0.2626 | 0.3122 | 0.5851 | 0.5426 | 0.72
rps18 | 100 | 270 | 1.5122 | 0.1551 | 0.3092 | 0.7526 | 0.6701 | 0.652
rps2 | 107 | 741 | 1.2811 | 0.1515 | 0.2562 | 0.5962 | 0.5192 | 0.7
rps20 | 101 | 366 | 1.1160 | 0.4337 | 0.4129 | 0.5204 | 0.4592 | 0.751


rps3 | 106 | 744 | 1.2586 | 0.2273 | 0.3071 | 0.4563 | 0.4272 | 0.703
rps4 | 107 | 624 | 1.4482 | 0.1744 | 0.2803 | 0.4423 | 0.4519 | 0.744
rps5 | 106 | 567 | 1.2655 | 0.1383 | 0.2722 | 0.5340 | 0.5340 | 0.667
rps6 | 104 | 375 | 1.3202 | 0.2595 | 0.3301 | 0.5050 | 0.4554 | 0.803
rps7 | 107 | 471 | 1.2802 | 0.1480 | 0.2543 | 0.6346 | 0.4808 | 0.718
rps8 | 106 | 408 | 1.5636 | 0.1466 | 0.2768 | 0.6990 | 0.6117 | 0.748
rps9 | 107 | 447 | 1.2046 | 0.1315 | 0.2770 | 0.7212 | 0.5385 | 0.7
secA | 107 | 2847 | 1.4288 | 0.2370 | 0.3387 | 0.2404 | 0.2404 | 0.777
secY | 107 | 1317 | 1.4360 | 0.2377 | 0.3366 | 0.2500 | 0.2596 | 0.789
syfB | 101 | 2595 | 1.2869 | 0.5999 | 0.4891 | 0.2347 | 0.2041 | 0.819
thiG | 106 | 996 | 1.3408 | 0.2484 | 0.3262 | 0.4854 | 0.4175 | 0.644
trpA | 106 | 1068 | 1.1659 | 0.3668 | 0.3872 | 0.4369 | 0.3495 | 0.684
trpG | 102 | 624 | 1.4748 | 0.3196 | 0.3511 | 0.4747 | 0.4141 | 0.723
trxA | 99 | 468 | 1.2096 | 0.1935 | 0.2956 | 0.6458 | 0.5938 | 0.667
tsf | 107 | 765 | 1.2550 | 0.3707 | 0.3197 | 0.4712 | 0.4135 | 0.694
tufA | 107 | 1278 | 1.3021 | 0.0772 | 0.2046 | 0.6538 | 0.5000 | 0.487
ycf3 | 107 | 531 | 1.5500 | 0.0846 | 0.2242 | 0.5673 | 0.4038 | 0.618
ycf39 | 107 | 1017 | 1.2685 | 0.3472 | 0.3534 | 0.2692 | 0.3077 | 0.794
ycf4 | 105 | 627 | 1.3671 | 0.2799 | 0.3407 | 0.4706 | 0.5000 | 0.73
ycf45 | 96 | 1803 | 1.2756 | 0.0956 | 0.3327 | 0.3011 | 0.2473 | 0.748
ycf54 | 96 | 393 | 1.2794 | 0.3124 | 0.3733 | 0.6559 | 0.5591 | 0.718
ycf65 | 98 | 465 | 1.3496 | 0.1521 | 0.2492 | 0.7579 | 0.6211 | 0.443


Table S3: Taxa used in the PCR experiments, their collection information, and NCBI GenBank accessions.

Taxon | Class/Subclass | Collection Information | Accession (rpoC1) | Accession (rbcL) | Accession (psbA)
Galdieria partita Sentsova | Cyanidiophyceae/- | Strain THAL043, DaYouKen, Yangmingshan National Park, Taiwan | MN538998 | MN540180 | MN540181
Galdieria maxima Sentsova | Cyanidiophyceae/- | Strain THAL066, GenZiPing, Yangmingshan National Park, Taiwan | MN538999 | MN431657 | MN431657
Porphyridium cruentum (Gray) Nägeli | Porphyridiaceae/- | Strain UTEX161, wet shaded tuff, Basel, Switzerland | MN539000 | MN539012 | MN539009
Compsopogon caeruleus (Balbis ex Agardh) Montagne | Compsopogonophyceae/- | Voucher THU.368, "GouLaoBan" Pet Shop, Section 1, Jieshou Road, Bade District, Taoyuan, Taiwan | MN539001 | MH835676 | -
Bangia fuscopurpurea (Dillwyn) Lyngbye | Bangiophyceae/- | Voucher YASP116, Yongxin, Taoyuan Algal Reef, Taoyuan, Taiwan | MN539002 | MN539013 | MN539010
Hildenbrandia sp. | Florideophyceae/Hildenbrandiophycidae | Voucher BYSP051, Baiyu, Taoyuan Algal Reef, Taoyuan, Taiwan | MN539003 | MN539014 | -
Kumanoa sp. | Florideophyceae/Nemaliophycidae | Voucher THU.571, A spring at Kenting National Park, Hengchun, Taiwan | MN539004 | MH835528 | -
Sporolithon sp. | Florideophyceae/Corallinophycidae | Voucher DTG1SP025, Datan, Taoyuan Algal Reef, Taoyuan, Taiwan | MN539005 | - | MN539011
Peyssonnelia sp. | Florideophyceae/Rhodymeniophycidae | Voucher YASP034, Yongxin, Taoyuan Algal Reef, Taoyuan, Taiwan | MN539006 | MN539015 | -
Caloglossa ogasawaraensis Okamura | Florideophyceae/Rhodymeniophycidae | Voucher THU.423, ChuanZiTou Bridge, Dahu Road, Yuanshan Township, Yilan, Taiwan | MN539007 | MH835647 | -
Champia sp. | Florideophyceae/Rhodymeniophycidae | Datan, Taoyuan Algal Reef, Taoyuan, Taiwan | MN539008 | MN539016 | -


Table S4: rpoC1 primers.

Designation | Sequence | Direction
F1 | 5'-GAAAGAAYWTTRCCWAATGG-3' | Forward
F2 | 5'-GAYTGGGARTGTCAYTGTGG-3' | Forward
F3 | 5'-ACTCATGTTTGGTAYYTAAAAGG-3' | Forward
F4 | 5'-GGATGRTWTTTTYWGTWATWCC-3' | Forward
F5 | 5'-GATGGWGGWMGWTTTGCWACWGC-3' | Forward
F6 | 5'-GATATTATWGAAGGWAARCARGG-3' | Forward
R1 | 5'-GCWGTWGCAAAWCKWCCWCCATC-3' | Reverse
R2 | 5'-CCATTATCCATTARWGMATCWACWGC-3' | Reverse
R3 | 5'-CCYTGYTTWCCTTCWATAATATC-3' | Reverse
R4 | 5'-GCCATTTGATCWCCATCRAAATC-3' | Reverse
R5 | 5'-CCATRTCTTGRCTWGGCAT-3' | Reverse
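The primers in Table S4 are degenerate, i.e., they contain IUPAC ambiguity codes such as W (A or T), R (A or G), Y (C or T), M (A or C), and K (G or T). The short sketch below, which is illustrative only and not part of the original methods, shows how the degeneracy (the number of distinct sequences a primer represents) and the reverse complement of such primers can be computed. The function names are hypothetical. Applied to the table, it shows that R1 as listed is the reverse complement of F5, and R3 that of F6.

```python
# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each IUPAC code (e.g., M = A/C pairs with K = T/G).
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}


def reverse_complement(primer):
    """Reverse complement of a primer written 5' to 3' with IUPAC codes."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(primer.upper()))


def degeneracy(primer):
    """Number of distinct non-degenerate sequences the primer represents."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC_BASES[base])
    return count


# Primer sequences copied from Table S4.
F1 = "GAAAGAAYWTTRCCWAATGG"
F5 = "GATGGWGGWMGWTTTGCWACWGC"
F6 = "GATATTATWGAAGGWAARCARGG"
R1 = "GCWGTWGCAAAWCKWCCWCCATC"
R3 = "CCYTGYTTWCCTTCWATAATATC"
```

For example, `degeneracy(F1)` counts the four two-fold ambiguous positions in F1, and `reverse_complement(F5) == R1` confirms that the two primers target the same site on opposite strands.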
