Organelle Phylogeny Incongruence in Begonia L. (Begoniaceae)

Daniel Fuller

August 2014

Thesis submitted in partial fulfilment of the requirements for the MSc in the Biodiversity and Taxonomy of Plants


ABSTRACT

The increased availability of DNA sequences has revolutionised the field of molecular phylogenetics. However, more data also means more opportunity for statistically significant incongruence to affect multigene phylogenies. Incongruence can be caused by hybridisation, incomplete lineage sorting, gene duplication and extinction, horizontal gene transfer, or systematic errors in tree-building methodology. A previous pilot phylogenetic study of all three genomes in Begonia L. showed incongruence among all of them.

The aim of this study is to (i) construct a mitochondrial DNA phylogeny representative of the range of Begonia phylogenetic diversity, (ii) compare the mitochondrial DNA phylogeny with a chloroplast DNA phylogeny, (iii) identify well-supported incongruencies between the two, and (iv) identify potential causes for the incongruence.

Using Bayesian inference, phylogenies were reconstructed from Begonia mitochondrial DNA, chloroplast DNA, and combined datasets. Comparisons of the phylogenies revealed incongruences across the entire range of Begonia. Eight cases of incongruence are hypothesised to result from the uncoupling of organelle inheritance during hybridisation, an event that is rare in angiosperms. Most notable is the presence of two chloroplast clades but only one mitochondrial clade in the Neotropics, representing incongruence between the organelle genomes of 650 species. Other incongruences are hypothesised to result from incomplete lineage sorting or tree-building error.

ACKNOWLEDGEMENTS

First and foremost I would like to thank my supervisors. Mark Hughes, without your guidance, positive attitude, and unconditional support I never would have made it through the last three months. Of course, those drinks didn’t hurt either. One day I will pay you back. To my co-supervisor, Peter Moonlight, saying thanks doesn’t seem like enough. How you managed to find time to assist me while doing fieldwork in Peru will never cease to amaze me. This thesis wouldn’t be half of what it is without support from both of you.

Producing a phylogeny requires a certain amount of lab work. Without the assistance of Michelle Hollingsworth, Laura Forrest, and Ruth Hollands I never would have managed. The living collections at RBGE and Glasgow Botanic Gardens were an invaluable source of fresh material. While I may not have met you personally, I’d like to extend a thank you to all those who have a hand in maintaining the collections. On the same note, I’d like to thank Lakmini Kumarage for being so generous with her Begonia extractions, without which the phylogeny would be considerably smaller.

Luke Mitchell, you were a never-ending source of banter and Vanilla Ice. I’m not really sure if that’s a good thing, but thanks anyway. Honestly, without your support and willingness to listen I’m not sure how I would have made it through. P.S. I double-checked and it is definitely “Beachmont Avenue”.

They say to leave the best for last. Well, to my parents I extend the greatest of thanks. Never have you failed to encourage me to follow my dreams. That is truly the greatest gift of all.


TABLE OF CONTENTS

Abstract

Acknowledgements

List of Figures

List of Tables

1 Introduction to Incongruence
1.1 A brief introduction to the problem
1.2 Defining incongruence
1.2.1 Species trees and gene trees
1.2.2 Gene tree heterogeneity
1.3 Causes of incongruence
1.3.1 Hybridisation
1.3.2 Incomplete lineage sorting
1.3.3 Gene duplication and extinction
1.3.4 Horizontal gene transfer
1.3.5 Tree building errors
1.4 Testing for incongruence
1.4.1 Incongruence length difference
1.4.2 Parsimony-based tests
1.4.3 Likelihood-based tests
1.4.4 Topological tests
1.5 Working with incongruent datasets

2 Introduction to Begonia L.
2.1 Classification
2.2 Morphology
2.3 Pollination, reproduction, and dispersal
2.4 Organelle genome inheritance
2.5 Hybridisation
2.6 Habitat
2.7 Biogeography and phylogenetics

3 Materials and Methods
3.1 Taxon and gene region sampling
3.2 Laboratory methods
3.2.1 DNA extraction
3.2.2 PCR amplification
3.2.3 Visualisation and sequencing
3.3 Sequence alignment
3.4 Phylogenetic analysis
3.4.1 Tree building – organelle genomes
3.4.2 Assessing incongruence and informative characters
3.4.3 Tree building – methods of combining organelle genome data
3.4.4 Maximum likelihood and maximum parsimony support

4 Results
4.1 Statistical tests
4.2 Single genome phylogenies
4.2.1 mtDNA phylogenetic tree
4.2.2 cpDNA phylogenetic tree
4.2.3 Incongruences between genome phylogenies
4.3 Combined analyses
4.3.1 Congruent combined phylogeny
4.3.2 Reduced mtDNA + cpDNA phylogeny
4.3.3 Reduced cpDNA + mtDNA phylogeny
4.3.4 Duplicated phylogeny

5 Discussion
5.1 African and Malagasy incongruence
5.2 Asian incongruence
5.3 Neotropical incongruence
5.4 Further work
5.5 Conclusions

References

Appendix 1


LIST OF FIGURES

2.1: Begonia sutherlandii Hook. f.

4.1: mtDNA Bayesian majority rule consensus phylogram

4.2: cpDNA Bayesian majority rule consensus phylogram

4.3: Congruent combined Bayesian majority rule consensus phylogram

4.4: Reduced mtDNA + cpDNA Bayesian majority rule consensus phylogram

4.5: Reduced cpDNA + mtDNA Bayesian majority rule consensus phylogram

4.6: Duplicated Bayesian majority rule consensus phylogram

5.1: Phylogram from Chung et al. (2014)

5.2: Phylogram from Thomas (2010)


LIST OF TABLES

1.1: Summary of incongruence tests

2.1: Recent Begonia phylogenies

3.1: Accession information for taxa sequenced in this study

3.2: DNA fragments amplified and primers used

3.3: Summary of datasets and resulting phylogenies


1 INTRODUCTION TO INCONGRUENCE

1.1 A BRIEF INTRODUCTION TO THE PROBLEM

The sought-after Tree of Life is a dichotomous branching diagram upon which all of the events of biological evolution can be mapped. Phylogenetics seeks to infer and map these relationships from morphological and molecular data. The increased availability of DNA sequences and, more recently, entire genomes has revolutionised this field of study. Depending on their mode of inheritance, the three genomes in a plant cell have the potential to record different evolutionary patterns (Pagel, 1999; Savolainen & Chase, 2003). The result is a set of incongruent species and gene trees from which researchers are trying to infer the Tree of Life.

1.2 DEFINING INCONGRUENCE

1.2.1 SPECIES TREES AND GENE TREES

A species tree is a graphic display of speciation, representing the branching pattern followed by a group of lineages as they diverged from a common ancestor (Maddison, 1997). Phylogenies can be used to determine conservation priorities, uncover historical biogeography, map phylogenetic diversity, study the evolution of morphological characters, and inform the taxonomic placement of species, orders or families within the tree of life (Brower et al., 1996; Doyle, 1992; Nichols, 2001; Than & Nakhleh, 2009). Species trees are inferred from gene trees, which represent the relationships among copies of a gene. Every locus in the genomes of the species in a species tree has its own gene tree (Maddison, 1997).

Species trees were previously often inferred from a single gene tree, on the assumption that this tree was congruent with the species tree for that group of species. This approach was widely used for many years because of the high cost of DNA sequencing. However, as costs have dropped and technology has improved, multigene datasets and even whole genomes have become increasingly available (Degnan & Rosenberg, 2006, 2009; Than & Nakhleh, 2009). A species tree that contains short internal branches, even just one, is more susceptible to incongruence with its gene trees, because gene lineages have little time to sort along a short branch. This is especially true when the short branches are deep in the species tree. Large trees in which short branches occur only near the terminal taxa tend to show only minor incongruences, because these branches represent relatively recent divergence events (Degnan & Rosenberg, 2006; Kubatko & Degnan, 2007). However, short branches become more of a problem as the number of taxa being analysed increases and monophyletic groups grow. Incongruence is therefore most problematic in groups that are extremely species rich and in those that have experienced multiple episodes of rapid divergence over a relatively short period (Degnan & Rosenberg, 2006). Reconstructing the “true” species tree requires that all potential causes of incongruence be considered during the reconciliation process (Than & Nakhleh, 2009).
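The effect of short internal branches can be made concrete for the simplest case. Under the multispecies coalescent (a standard model, not one introduced by the sources above), a single neutral locus sampled once from each species of a rooted three-species tree ((A,B),C) matches the species tree with probability 1 − (2/3)e^(−T), where T is the internal branch length in coalescent units (generations divided by the number of gene copies in the ancestral population, 2Ne for a diploid nuclear locus). The Python sketch below simply evaluates this formula, assuming no hybridisation, gene duplication or gene flow.

```python
import math


def p_concordant(t):
    """Probability that one gene tree matches the rooted species tree
    ((A,B),C) when the internal branch is t coalescent units long
    (multispecies coalescent, one gene copy sampled per species)."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # The A and B lineages fail to coalesce on the internal branch with
    # probability exp(-t); in that case each of the three possible pairs is
    # equally likely to coalesce first, and only one pairing is concordant.
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


for t in (0.01, 0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"internal branch = {t:4} coalescent units: P(concordant) = {p_concordant(t):.3f}")
```

Below about one coalescent unit a large share of loci are expected to disagree with the species tree (at T = 0.1, fewer than 40% match it), which is why rapid successive divergences produce incongruent gene trees.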

1.2.2 GENE TREE HETEROGENEITY

A gene tree and a species tree will not necessarily have the same topology. Genes, much like species, have a phylogenetic history, but although a gene resides within a species, the two histories do not always coincide. Analyses using multiple genes, or multiple copies of a single gene, have revealed incongruences spread across gene trees representing the same group of species (Degnan & Rosenberg, 2006, 2009; Maddison, 1997; Than & Nakhleh, 2009). Gene trees can record different histories for many reasons, including hybridisation, incomplete lineage sorting, and gene duplication and extinction (Doyle, 1992; Page & Charleston, 1997, 1998). For a gene tree to match a species tree, the genes sampled from a monophyletic group of species must share the same history. However, as the number of genes sequenced per species increases, the probability that all gene trees share the same topology decreases (Knowles, 2009; Maddison, 1997). Because each gene can have its own history, and therefore its own gene tree, incongruent datasets must be treated with care, as inferences drawn from them can be misleading and inaccurate (Doyle, 1992; Page & Charleston, 1998).
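The decline described above is easy to quantify if loci are assumed to sort independently: when each locus matches the species tree with probability p, all k loci do so with probability p^k. The per-locus value of 0.9 below is hypothetical. Because organelle genomes are largely non-recombining and inherited as a unit, all genes within one organelle genome are generally treated as sharing a single history, so each organelle genome counts as one locus in this calculation.

```python
def p_all_concordant(p_single, n_loci):
    """Probability that every one of n_loci independently sorting loci
    matches the species tree, if each does so with probability p_single.
    Assumes independence among loci (e.g. unlinked nuclear genes)."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    return p_single ** n_loci


# Hypothetical per-locus concordance of 0.9, chosen only for illustration.
for n in (1, 5, 10, 50):
    print(f"{n:>2} loci: P(all concordant) = {p_all_concordant(0.9, n):.3f}")
```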

1.3 CAUSES OF INCONGRUENCE

While this section focuses on the molecular aspects of incongruence, it is worth noting that morphological characters can also persist as ancestral polymorphisms and cross species boundaries through introgressive hybridisation. In both cases, the species tree and the morphological tree would be incongruent in much the same way as gene trees are (Doyle, 1992).


1.3.1 HYBRIDISATION

Hybridisation resembles horizontal gene transfer in that genes can physically move from one species to another; the difference is that hybridisation involves sexual reproduction and takes place over many generations. Hybridisation-induced incongruence is generally caused by introgression, the recurrent backcrossing of a hybrid with one parent species (Brower et al., 1996; Degnan & Rosenberg, 2009; Doyle & Gaut, 2000). Repeated backcrossing of successive generations with one of the original parents dilutes the contribution of the other parent until the hybrid lineage is almost identical to the recurrent parent. However, because mitochondrial and plastid genomes are effectively haploid and uniparentally inherited, the hybrid lineage may end up with the nuclear genome of one parent but the organelle genome of the other. In this scenario, a mitochondrial phylogeny would have a different topology from a nuclear or morphological phylogeny (Doyle, 1992). Each tree correctly represents the history of its own genome, but interpreting any one of them as the species tree needs to be justified. Introgressive hybridisation of this kind acts as an evolutionary filter, allowing certain genes or genomes to spread between the species involved while preventing the spread of others. In theory, beneficial genes would pass through while deleterious genes are filtered out (Martinsen et al., 2001).
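The dilution of one parent's nuclear genome under recurrent backcrossing, and the retention of the other parent's organelles, can be tracked with simple expected values. The sketch below assumes a hypothetical lineage founded by species A as seed parent and species B as pollen parent, then backcrossed every generation (always as the seed parent) to B, with strictly maternal organelle inheritance and no selection or linkage effects; these are simplifying assumptions, not observations from Begonia.

```python
def backcross_lineage(n_backcrosses):
    """Return (expected nuclear fraction from species A, organelle source)
    for a hypothetical lineage founded by A (seed parent) x B (pollen parent)
    and then backcrossed n times, always as the seed parent, to B pollen.
    Assumes strict maternal organelle inheritance and no selection."""
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be non-negative")
    # The F1 carries half of its nuclear genome from A; each backcross to B
    # halves the expected remaining A contribution.
    nuclear_from_a = 0.5 ** (n_backcrosses + 1)
    # Organelles pass only through the seed parent, so the A organelle genome
    # is retained in every generation of this crossing scheme.
    organelle_source = "A"
    return nuclear_from_a, organelle_source


for n in range(6):
    frac_a, organelle = backcross_lineage(n)
    label = "F1" if n == 0 else f"BC{n}"
    print(f"{label}: nuclear genome {1 - frac_a:.1%} species B, organelles from species {organelle}")
```

After five backcrosses the lineage is expected to be about 98% species B in its nuclear genome while still carrying the organelles of species A, so a nuclear tree would place it with B and an organelle tree with A.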

1.3.2 INCOMPLETE LINEAGE SORTING

When an ancestral polymorphism is retained through one or more speciation events it can cause tree incongruence. This incongruence, known as incomplete lineage sorting (ILS), arises when the common ancestor of two or more species carries two alleles at a single locus. Both alleles can be passed through multiple speciation events before one of them eventually becomes fixed in each descendant species. Once this has happened, two gene trees, or a gene tree and a species tree, may no longer be congruent if sister species have fixed different alleles (Brower et al., 1996; Doyle & Gaut, 2000; Maddison, 1997). For diploid nuclear genes this polymorphism can be present both within individuals and at the population level; because organelle genomes are effectively haploid, they are polymorphic at the population level only. ILS describes the process forward in time, tracking the polymorphism from the common ancestor to daughter species that each carry a single allele. It may be easier to consider the process in reverse, a view known as deep coalescence. In this view, two sister species carry incongruent alleles. Each allele is traced back through time until the two lineages merge in a single ancestral copy; this point is the coalescence. When the coalescence lies deeper in the past than the speciation event that separated the two species, the gene tree can differ from the species tree. Small populations with short generation times are less prone to incongruence through ILS because strong genetic drift sorts lineages quickly (Degnan & Rosenberg, 2009; Maddison, 1997).
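The time-reversed view can be simulated directly. The sketch below assumes the multispecies coalescent for one gene copy sampled from each of three species with the tree ((A,B),C), neutral evolution and no gene flow, with time measured in coalescent units. The A and B lineages either coalesce within their shared ancestral population (the internal branch), or both reach the root population, where any pair may coalesce first; only the first outcome, or one in three of the deep-coalescence outcomes, yields a gene tree that matches the species tree.

```python
import random


def simulate_gene_tree(t_internal, rng):
    """Trace one gene copy from each of A, B and C back in time on the species
    tree ((A,B),C) and return the pair of species whose copies coalesce first.
    Time is in coalescent units, so any two lineages coalesce at rate 1."""
    # Internal branch: only the A and B lineages share this ancestral
    # population, and their waiting time to coalescence is exponential(1).
    if rng.expovariate(1.0) < t_internal:
        return ("A", "B")  # lineages sorted before the deeper speciation event
    # Deep coalescence: all three lineages enter the root population, where
    # each of the three pairs is equally likely to coalesce first.
    return rng.choice([("A", "B"), ("A", "C"), ("B", "C")])


def concordance_rate(t_internal, replicates=100_000, seed=42):
    """Proportion of simulated gene trees that match the species tree."""
    rng = random.Random(seed)
    matches = sum(
        simulate_gene_tree(t_internal, rng) == ("A", "B") for _ in range(replicates)
    )
    return matches / replicates


for t in (0.0, 0.1, 0.5, 1.0, 2.0):
    print(f"internal branch = {t}: simulated concordance = {concordance_rate(t):.3f}")
```

The simulated proportion should approach one-third as the internal branch shrinks to zero, the point at which every gene tree topology is equally likely.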

1.3.3 GENE DUPLICATION AND EXTINCTION

Gene duplication events result in two or more copies of a gene in the genome. Unlike the alleles involved in lineage sorting, these copies occupy different loci and therefore evolve separately. As a result, some copies may become extinct if they lose functionality, while others may acquire a new function. Depending on which copy is sampled, or which copy goes extinct in each lineage, the gene tree may be incongruent with the species tree (Degnan & Rosenberg, 2009; Doyle, 1992; Maddison, 1997; Page & Charleston, 1998).
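A small example shows how sampling different gene copies produces an incongruent gene tree. Assume, hypothetically, that a gene duplicated before the divergence of species A, B and C (species tree ((A,B),C)), producing copies alpha and beta in each species. If B has lost its alpha copy and its beta copy is sampled instead, pruning the gene family tree down to the sampled copies groups A with C. The nested-tuple representation and all names are illustrative only.

```python
def prune(tree, keep):
    """Prune a rooted tree written as nested tuples down to the leaves in
    `keep`, collapsing nodes that are left with a single child."""
    if isinstance(tree, str):  # a leaf
        return tree if tree in keep else None
    children = [c for c in (prune(child, keep) for child in tree) if c is not None]
    if not children:
        return None
    if len(children) == 1:
        return children[0]
    return tuple(children)


# Hypothetical gene family: the duplication predates the A, B, C divergence,
# so each copy's subtree mirrors the species tree ((A,B),C).
gene_family = (
    (("A_alpha", "B_alpha"), "C_alpha"),
    (("A_beta", "B_beta"), "C_beta"),
)

# B has lost its alpha copy, so the beta copy is sampled for B instead.
sampled = {"A_alpha", "B_beta", "C_alpha"}
print(prune(gene_family, sampled))
```

The pruned tree pairs A_alpha with C_alpha, contradicting the species tree even though no hybridisation or lineage sorting is involved.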

1.3.4 HORIZONTAL GENE TRANSFER

Species trees depict a unidirectional pattern of descent known as vertical gene transfer, in which genes move from parent to offspring. This is not, however, the only way for DNA to pass from one individual to another. Horizontal gene transfer (HGT) is the movement of DNA from one genome to another, either between individuals or between genomes within one cell (Brower et al., 1996; Gao et al., 2014). Plastids and mitochondria within eukaryotic cells have experienced considerable transfer of genetic material between host cells and endosymbionts (Talianova & Janousek, 2011). Amborella trichopoda Baill. has recently been found to carry large amounts of foreign DNA from several plant donors in its mitochondrial genome, though not in its chloroplast genome (Bergthorsson et al., 2004; Goremykin et al., 2009; Talianova & Janousek, 2011). According to recent studies, mitochondria are more active in HGT than other genomes (Archibald & Richards, 2010; Talianova & Janousek, 2011). HGT can transfer entire genomes, but it more commonly involves smaller regions of DNA. Transfer can occur through direct plant-to-plant contact, as in epiphytic or parasitic associations, or through vectors such as viruses, bacteria and fungi (Bock, 2010; Brower et al., 1996; Gao et al., 2014; Maddison, 1997; Talianova & Janousek, 2011). Maddison (1997) theorised that more distantly related species would display less HGT; however, research by Bergthorsson et al. (2003, 2004) suggests otherwise. Depending on its prevalence, HGT can have various effects on the incongruence between trees. Trees that include genes affected by HGT may not reflect true species phylogenies; where HGT is common, such trees may be more accurate indicators of ecological traits than of species divergence (Galtier & Daubin, 2008).

1.3.5 TREE BUILDING ERRORS

Not all instances of incongruence have biological origins. Studies have revealed statistically significant incongruence caused by the choice of phylogenetic reconstruction method (Brower et al., 1996; Som, 2014). Rokas et al. (2003) and Jeffroy et al. (2006) found significant incongruence amongst single gene trees when different tree-building methods, such as maximum likelihood, maximum parsimony and Bayesian inference, were used to analyse the same dataset. Evolutionary models can also be a source of incongruence in large datasets. When an inappropriate model is used, the non-phylogenetic signal in the data can overwhelm the phylogenetic signal, resulting in miscalculated branch lengths. Models are simplified generalisations of sequence evolution, which is a far more complex process. More complex models may prevent these incongruences, but they also increase computation time and require more parameters to be estimated, which may in turn introduce further errors and more opportunities for incongruence (Philippe et al., 2011; Som, 2014). These potential causes of incongruence must be accounted for when inferring a species tree from gene trees.
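One simple illustration of how model choice changes branch-length estimates is the correction for multiple substitutions. An uncorrected proportion of differing sites (the p-distance) ignores repeated changes at the same site and increasingly underestimates divergence; the Jukes–Cantor (1969) model corrects for this with d = −(3/4) ln(1 − 4p/3). The sketch below compares the two. These are textbook models chosen for brevity, not the models used in this study, and Jukes–Cantor is itself often too simple for real data.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of aligned sites that differ (no correction for multiple hits)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jc69_distance(p):
    """Jukes-Cantor corrected distance (expected substitutions per site)
    from an observed proportion of differing sites p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75); larger values are saturated")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for p in (0.05, 0.2, 0.4, 0.6):
    print(f"p-distance = {p:.2f}  JC69 distance = {jc69_distance(p):.3f}")
```

The gap between the two grows with divergence, so an overly simple model distorts long branches far more than short ones.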

1.4 TESTING FOR INCONGRUENCE

1.4.1 INCONGRUENCE LENGTH DIFFERENCE

The incongruence length difference (ILD) test (Farris et al., 1995) is one of the most frequently used methods for detecting phylogenetic incongruence. Its null hypothesis is that all partitions of the data share the same evolutionary history, and it tests this through a randomisation procedure designed to tell the researcher whether partitioned data should be analysed independently or can be concatenated. The statistic is the number of extra steps required by the combined data beyond the summed lengths of the most parsimonious trees for each partition analysed alone; its observed value is compared with values obtained from random repartitions of the same characters. The test was designed as an overall measurement of the incongruence between two datasets, based on parsimony analyses (Planet, 2006).
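The logic of the ILD can be shown with a deliberately tiny example. The sketch below assumes four taxa, so that all three unrooted topologies can be searched exhaustively; it scores characters with Fitch parsimony, computes the ILD statistic as the length of the most parsimonious tree for the combined data minus the summed lengths for each partition alone, and estimates a P-value by randomly reassigning characters to partitions of the original sizes. Real analyses use heuristic searches over many taxa, and the two partitions here are hypothetical.

```python
import random

# The three possible unrooted topologies for four taxa (indexed 0-3), each
# written as the two pairs of taxa separated by the single internal branch.
QUARTETS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def fitch_join(x, y):
    """One Fitch step: combine two child state sets, returning (set, cost)."""
    common = x & y
    return (common, 0) if common else (x | y, 1)


def site_length(column, topology):
    """Fitch parsimony length of one site (a 4-character string) on a quartet."""
    (a, b), (c, d) = topology
    left, cost_left = fitch_join({column[a]}, {column[b]})
    right, cost_right = fitch_join({column[c]}, {column[d]})
    _, cost_root = fitch_join(left, right)
    return cost_left + cost_right + cost_root


def tree_length(columns):
    """Length of the most parsimonious tree, found by trying all three quartets."""
    return min(sum(site_length(col, t) for col in columns) for t in QUARTETS)


def ild_statistic(part_a, part_b):
    """Extra steps needed by the combined data beyond the separate analyses."""
    return tree_length(part_a + part_b) - tree_length(part_a) - tree_length(part_b)


def ild_test(part_a, part_b, replicates=999, seed=1):
    """Return the observed ILD statistic and a permutation P-value."""
    rng = random.Random(seed)
    observed = ild_statistic(list(part_a), list(part_b))
    pooled = list(part_a) + list(part_b)
    at_least_as_large = 0
    for _ in range(replicates):
        # Reassign sites to partitions at random, keeping the original sizes.
        rng.shuffle(pooled)
        resampled = ild_statistic(pooled[:len(part_a)], pooled[len(part_a):])
        if resampled >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (replicates + 1)


# Hypothetical partitions: every "mt" site groups taxa 0+1, every "cp" site
# groups taxa 0+2, so the two partitions support conflicting topologies.
mt_sites = ["AACC"] * 6
cp_sites = ["ACAC"] * 6
print(ild_test(mt_sites, cp_sites))
```

Because every site in one partition supports one grouping and every site in the other supports a conflicting one, the combined data need six extra steps, and very few random repartitions reach that value.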

When more than two datasets or partitions are present the ILD test will have to be modified. This can be accomplished by either testing individual partitions against a combination of all remaining partitions or sequentially testing all partitions against each other. A major drawback to both options is the time involved as datasets grow and the number of partitions increases (Planet & Sarkar, 2005; Planet, 2006). With this in mind, Planet and Sarkar (2005) introduced matrices of ILD comparisons (mILD). mILD is not a test for incongruence on its own, rather it is an application that will perform all ILD pairwise comparisons on large datasets. The mILD will then provide the option to either systematically remove taxa until the dataset is no longer incongruent or to test for and combine only the congruent partitions.

Originally, the ILD was accepted as providing accurate assessments of incongruence, however, recent research suggests that in certain situations the results should be completely rejected (Barker & Lutzoni, 2002; Cunningham, 1997a, b; Darlu & Lecointre, 2002; Yoder et al., 2001). Most notably, the ILD performed poorly when the partitions had significant differences in size and number of informative sites, level of noise, or evolutionary rates (Planet, 2006; Thornton & DeSalle, 2000). Despite these disagreements, the ILD has been well tested and its shortcomings are well known, which allows a cautious researcher to pick out the potentially inaccurate results and proceed appropriately. It has been suggested that the ILD be used as a starting point to identify potential incongruence upon which further statistical tests can be conducted (Hipp et al., 2004; Planet, 2006).

Planet (2006) notes two particular failures of the ILD. The first occurs when one highly supported instance of incongruence inflates the apparent support for other, insignificant incongruences. The second occurs when unsupported incongruences from different sources, scattered randomly across the tree, accumulate during analysis. The result is a false signal of incongruence even though none of the individual conflicts is supported. The ILD will also report a dataset as incongruent if only one or a few taxa display significant incongruence. Because the ILD test indicates only whether incongruence is present or absent, and does not identify the problematic taxa, it offers no basis for further analysis (Thornton & DeSalle, 2000).

1.4.1.1 Local incongruence length difference

Thornton and DeSalle (2000) developed the local incongruence length difference test (LILD) to help identify the incongruent taxa in a dataset. The LILD calculates the number of steps necessary for a node present in a combined phylogeny to appear in a phylogeny of one of the partitioned datasets, providing a statistical comparison of the conflict between one dataset and a clade in another dataset using the most parsimonious trees (Thornton & DeSalle, 2000). Even though the LILD was designed to overcome some of the shortcomings of the ILD, it has issues of its own. In particular, as the number of tested nodes increases, the LILD may show statistically significant incongruence through random chance alone. To correct for this, a multiple-comparison correction can be applied to the P-values (Planet, 2006).
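The scale of the multiple-comparison problem, and one common remedy, can be sketched as follows. The Bonferroni correction shown is only an example of such a correction, not necessarily the one recommended by Planet (2006); the node P-values are hypothetical, and `familywise_error` assumes the tests are independent.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P-values: each raw P-value is multiplied by the
    number of tests and capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def familywise_error(alpha, n_tests):
    """Chance of at least one false positive among n independent tests, each
    run at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Hypothetical LILD P-values for five nodes (illustrative only).
raw = [0.004, 0.012, 0.03, 0.2, 0.6]
adjusted = bonferroni(raw)
significant = [i for i, p in enumerate(adjusted) if p < 0.05]
print("adjusted:", adjusted)
print("nodes still significant:", significant)
print(f"chance of a false positive with 20 nodes at 0.05: {familywise_error(0.05, 20):.2f}")
```

Without correction, three of the five hypothetical nodes would appear significant at 0.05; after correction only one does.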

1.4.1.2 Incongruence length difference using the BIONJ algorithm

The ILD and LILD were both designed around parsimony. Planet (2006) notes that, as long as the criteria assumed for a parsimony analysis are maintained, the ILD should be adaptable to other types of analysis. Zelwer and Daubin (2004) applied the principles of the ILD to the BIONJ algorithm (Gascuel, 1997), an improved version of neighbour-joining, in an attempt to increase accuracy. The main difference between the ILD and ILD_BIONJ is that trees are constructed and analysed using distance-based methods instead of parsimony. This change resulted in similar rates of false inferences of incongruence but lower rates of false inferences of congruence (Planet, 2006; Zelwer & Daubin, 2004).

1.4.2 PARSIMONY-BASED TESTS

The ILD and its variants are not the only parsimony-based tests for incongruence, but they remain the most used and most tested. Other parsimony-based tests can be applied to a partitioned dataset to analyse separately each partition’s support for the same tree topologies (Cunningham, 1997b; Planet, 2006). However, these are not a direct measure of whether to combine the datasets, but rather a measure of the support a dataset provides for a specific topology. Because the trees are supposed to be specified a priori, interpretation of the results can be biased towards those topologies even if a more congruent topology exists. On the other hand, if the trees are not specified a priori, the tests may be overly prone to false rejections of congruence (Planet, 2006).
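The Winning Sites Test (Prager & Wilson, 1988; Table 1.1) is one example of a parsimony-based test of support for two topologies. In its simplest form it is a sign test: each character is scored on two trees specified a priori, characters requiring fewer steps on one tree count as wins for that tree, ties are discarded, and the split of wins is compared with a 50:50 binomial expectation. The sketch below implements that sign-test form; the per-site lengths are hypothetical.

```python
from math import comb


def winning_sites_test(lengths_t1, lengths_t2):
    """Two-sided exact sign test on per-site parsimony lengths of two trees
    specified a priori. Returns (sites favouring tree 1, sites favouring
    tree 2, P-value). Sites with equal lengths are uninformative and ignored."""
    if len(lengths_t1) != len(lengths_t2):
        raise ValueError("both trees must be scored on the same sites")
    wins_t1 = sum(a < b for a, b in zip(lengths_t1, lengths_t2))
    wins_t2 = sum(b < a for a, b in zip(lengths_t1, lengths_t2))
    n = wins_t1 + wins_t2
    if n == 0:
        return wins_t1, wins_t2, 1.0
    # Under the null hypothesis each informative site is equally likely to
    # favour either tree, so the number of wins is Binomial(n, 0.5).
    k = min(wins_t1, wins_t2)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins_t1, wins_t2, min(1.0, 2 * lower_tail)


# Hypothetical per-site lengths for 17 characters of one partition on two trees.
tree1 = [1] * 10 + [2] * 2 + [1] * 5
tree2 = [2] * 10 + [1] * 2 + [1] * 5
print(winning_sites_test(tree1, tree2))
```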


1.4.3 LIKELIHOOD-BASED TESTS

Likelihood-based tests, like parsimony-based tests, measure the fit of the dataset to differing tree topologies. While some likelihood tests are designed for partitioned data, others must be modified in the same way as parsimony-based tests, by separately measuring each partition’s support for the same tree topology. The fit of the data is assessed through either parametric or non-parametric bootstrapping. Some tests, such as the Kishino-Hasegawa (KH) test, can also be implemented using sitewise procedures (Planet, 2006). Most of these tests are time-consuming and therefore rarely used, although some time-saving procedures have been developed. When properly used and interpreted, the tests are effective (Goldman et al., 2000; Planet, 2006).
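A sitewise version of the KH test can be sketched from per-site log-likelihoods of one alignment under two trees. The statistic is the difference in total log-likelihood; its variance is estimated from the variation among sitewise differences, and a normal approximation gives a two-sided P-value. The sketch assumes both trees were chosen a priori (Goldman et al. (2000) showed that the test is invalid when one of them is the maximum likelihood tree for the same data), and the per-site values are hypothetical rather than output from any particular program.

```python
from math import sqrt
from statistics import NormalDist, mean


def kh_test(site_lnl_1, site_lnl_2):
    """Sitewise Kishino-Hasegawa test for two trees chosen a priori.
    Takes per-site log-likelihoods of the same alignment under each tree and
    returns (difference in total log-likelihood, z score, two-sided P)."""
    if len(site_lnl_1) != len(site_lnl_2) or len(site_lnl_1) < 2:
        raise ValueError("need matching per-site values for at least two sites")
    diffs = [a - b for a, b in zip(site_lnl_1, site_lnl_2)]
    n = len(diffs)
    delta = sum(diffs)
    d_bar = mean(diffs)
    # Variance of the summed difference, estimated from the sitewise differences.
    var_delta = n / (n - 1) * sum((d - d_bar) ** 2 for d in diffs)
    if var_delta == 0:
        raise ValueError("sitewise differences are constant; variance is zero")
    z = delta / sqrt(var_delta)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return delta, z, p_value


# Hypothetical per-site log-likelihoods for six sites under two trees.
tree1_lnl = [-2.0, -3.0, -2.5, -4.0, -1.5, -3.5]
tree2_lnl = [-2.5, -3.0, -3.0, -4.5, -1.5, -3.0]
print(kh_test(tree1_lnl, tree2_lnl))
```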

1.4.4 TOPOLOGICAL TESTS

Measurements of topological congruence can be divided into two types. Both types only consider branching patterns and ignore the underlying data. It has been suggested that these tests are most useful as a first step to test for incongruence before exploring complex evolutionary scenarios (Planet, 2006; de Vienne et al., 2007). The first type, consensus-based measurements, summarizes incongruence by comparing the number of nodes in a strictly bifurcating tree to those in the consensus tree obtained from analyses. A consensus tree, depending on the type, will only display nodes present in a proportion of all analysed trees (Planet, 2006). The second type bases congruence on a tree distance measurement such as the symmetric difference. The symmetric difference is defined by breaking up the tree into different groups and computing the number of groupings in one tree that are not in the other. This type of test can be done on partitioned data by comparing the symmetric difference, or another measure of similarity, of trees produced by analysing the two partitions separately (Planet, 2006; Rodrigo et al., 1993).
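The symmetric difference between two rooted trees can be computed by listing the clades (sets of descendant taxa) in each tree and counting those present in only one of them. This is the rooted form of the measure often called the Robinson–Foulds distance; for unrooted trees the same idea is applied to bipartitions. The nested-tuple trees below are hypothetical.

```python
def clades(tree):
    """Collect every clade (as a frozenset of leaf names) in a rooted tree
    written as nested tuples, e.g. (("A", "B"), "C")."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def symmetric_difference(tree_a, tree_b):
    """Number of clades found in one tree but not in the other."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical mitochondrial and chloroplast trees for the same four taxa.
mt_tree = ((("A", "B"), "C"), "D")
cp_tree = ((("A", "C"), "B"), "D")
print(symmetric_difference(mt_tree, cp_tree))
```

Here each tree contains one clade the other lacks ({A, B} versus {A, C}), giving a symmetric difference of 2; identical trees give 0.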

1.5 WORKING WITH INCONGRUENT DATASETS

More methods have been developed to deal with incongruent datasets than there are causes of incongruence. Programs such as *BEAST, BEST and CONCATERPILLAR have been developed to accommodate incongruence and still find the optimal tree (Heled & Drummond, 2010; Leigh et al., 2008; Liu, 2008). mILD, as mentioned above, works directly with the dataset, performing multiple ILD tests to produce a dataset without significant incongruence, either by showing which partitions are congruent or by showing which taxa are congruent across all partitions (Planet & Sarkar, 2005). Some methods call for increased taxon sampling, some for increased gene sampling, and others for both; the idea is to drown out the incongruence by increasing the phylogenetic signal (Hedtke et al., 2006; Jeffroy et al., 2006; Philippe et al., 2011). A related sampling option is to collect multiple samples from each taxon, which seeks to illuminate the cause of incongruence rather than overcome it (Degnan & Rosenberg, 2006). Others have proposed methods of data combination, ranging from simple concatenation (Gadagkar et al., 2005; Gontcharov et al., 2004; Kubatko & Degnan, 2007) to specific supermatrix methods that remove and/or duplicate data (Leigh et al., 2011; de Villiers et al., 2013; Pirie et al., 2009; Pirie et al., 2008; Lecointre & Deleporte, 2005). Like the sampling methods, these strive either to reveal the causes of incongruence or to overcome it. Yet another approach advocates building separate trees for separate partitions and comparing them to sort out the incongruences (Gadagkar et al., 2005; Tsutsui et al., 2009). Despite the variety of methods and programs available, the most appropriate approach will depend on the data and on the type and strength of the incongruence present.

Table 1.1: Summary of incongruence tests by type.
Character congruence: parsimony-based tests (name: reference)
Incongruence Length Difference (ILD): Farris et al. (1995)
Local ILD (LILD): Thornton & DeSalle (2000)
ILD_BIONJ: Zelwer & Daubin (2004)
Matrices of ILD Comparisons (mILD): Planet & Sarkar (2005)
Topology-Dependent Cladistic Permutation Tail Probability (T-PTP): Faith (1991)
Templeton Test: Templeton (1983)
Winning Sites Test: Prager & Wilson (1988)

Character congruence: likelihood-based tests (name: reference)
Kishino-Hasegawa (KH): Kishino & Hasegawa (1989); Hasegawa & Kishino (1989)
Resampling Estimated Log-Likelihood (RELL): Kishino & Hasegawa (1989)
Shimodaira–Hasegawa (SH): Shimodaira & Hasegawa (1989); Shimodaira (1998)
Swofford–Olsen–Waddell–Hillis (SOWH): Goldman et al. (2000)
Likelihood Ratio Test (LRT): Huelsenbeck & Bull (1996)
Approximately Unbiased (AU): Shimodaira (2002)

Topological congruence: consensus-based measures (name: reference)
Consensus Fork Index (CFI): Colless (1980)
Topological ILD (TILD): Wheeler (1999)
Rohlf Consensus Index: Rohlf (1982)

Topological congruence: tree distances (name: reference)
Quartets Distance (QD): Estabrook et al. (1985)
Nearest Neighbor Interchange Distance (NNID): Waterman (1978)
Path-Length-Distance metric (PLD): Penny et al. (1993)
Rodrigo Tests: Rodrigo et al. (1993)
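The data-combination strategies mentioned in Section 1.5 can be sketched for two organelle alignments. The function below concatenates a mitochondrial and a chloroplast alignment, padding missing data with '?', and, in the spirit of the duplication approaches cited there, optionally splits a conflicting taxon into two terminals, each scored for only one genome. This is an illustrative sketch with hypothetical taxa and sequences, not the procedure of any cited study or of this thesis.

```python
def combine(mt, cp, duplicate=()):
    """Concatenate two alignments given as dicts of taxon -> aligned sequence.

    Taxa missing from one alignment are padded with '?'. Taxa listed in
    `duplicate` are split into two terminals, one scored only for the
    mitochondrial data and one only for the chloroplast data, so that their
    conflicting placements are not forced into a single terminal.
    """
    mt_len = len(next(iter(mt.values())))
    cp_len = len(next(iter(cp.values())))
    matrix = {}
    for taxon in sorted(set(mt) | set(cp)):
        mt_seq = mt.get(taxon, "?" * mt_len)
        cp_seq = cp.get(taxon, "?" * cp_len)
        if taxon in duplicate:
            matrix[taxon + "_mt"] = mt_seq + "?" * cp_len
            matrix[taxon + "_cp"] = "?" * mt_len + cp_seq
        else:
            matrix[taxon] = mt_seq + cp_seq
    return matrix


# Hypothetical alignments; "sp3" is treated as the conflicting taxon.
mt = {"sp1": "ACGT", "sp2": "ACGA", "sp3": "TCGA"}
cp = {"sp1": "GGTTA", "sp2": "GGTCA", "sp3": "GATCA"}
for name, seq in combine(mt, cp, duplicate={"sp3"}).items():
    print(name, seq)
```

Calling `combine(mt, cp)` without `duplicate` gives plain concatenation; listing a taxon in `duplicate` lets each genome place that taxon independently within a single analysis.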

2 INTRODUCTION TO BEGONIA L.

2.1 CLASSIFICATION

The order Cucurbitales Berchtold & J. Presl currently contains around 2600 species in at least 7 families. The core Cucurbitales, comprising Begoniaceae C. Agardh, Cucurbitaceae Jussieu, Datiscaceae Dumortier and Tetramelaceae Airy Shaw, have long been recognised as a closely related group. Recent molecular studies have added Anisophylleaceae Ridley, Coriariaceae A. DC., Corynocarpaceae Engler and Apodanthaceae Takhtajan to the Cucurbitales, although support for Apodanthaceae is low and its inclusion is questionable. Species numbers are unevenly distributed across the order: the majority belong to Begoniaceae (about 1750) and Cucurbitaceae (about 980), while the remaining families each contain 2 to 40 species (Dewitte et al., 2011; Kubitzki, 2011; Schaefer & Renner, 2011; Steven, 2001; Zhang et al., 2006).

Begoniaceae is currently divided into two genera, the monotypic Hillebrandia Oliv. and the species-rich Begonia L. Hillebrandia is endemic to the Hawaiian Archipelago, whereas Begonia is almost pantropical, being absent only from Australia and some Pacific islands, with a single species extending into temperate Asia (Clement et al., 2004; Dewitte et al., 2011; Forrest et al., 2005; de Wilde, 2011). Begonia can be further divided into 66 sections, each limited to a single continent with a few exceptions, notably Tetraphila A. DC. (Dewitte et al., 2011; Doorenbos et al., 1998; Forrest & Hollingsworth, 2003).

2.2 MORPHOLOGY

Begonia is a genus of perennial or occasionally annual herbs to small shrubs growing in terrestrial to epiphytic habitats. The majority are succulent to herbaceous evergreens, but some tuberous species have dormant periods. Plants generally have two-ranked, asymmetric leaves that are often patterned and adapted to low, scattered light, with palmate or pinnate venation and well-developed teeth. Plants are monoecious, rarely dioecious, with unisexual, actinomorphic flowers. Carpellate flowers have a bifid style with twisted stigmas. Staminate and carpellate flowers can have different numbers of tepals, and the stamens and stigmas are bright yellow. Inflorescences are axillary and cymose, and can be dichasial or monochasial. The fruit is winged or horned, dry or sometimes fleshy, and either loculicidally dehiscent or indehiscent. Many minute seeds of fairly uniform size and structure are produced across the genus, and all species produce seeds with a ring of elongated collar cells located below the micropylar-hilar layer (Dewitte et al., 2011; Steven, 2001; de Wilde, 2011).

Figure 2.1: Begonia sutherlandii Hook. f. in cultivation at Royal Botanic Garden, Edinburgh.

2.3 POLLINATION, REPRODUCTION, AND DISPERSAL

Most Begonia species have features that suggest insect pollination, probably by bees. It has been suggested that pollination by deceit is used to “trick” insects into visiting rewardless female flowers that mimic the male flowers in appearance. In some species with red, tubular flowers, pollination by hummingbirds and nectarivorous sunbirds has been documented (de Wilde, 2011; Wyatt & Sazima, 2011). A few species show pollination by gravity, where male flowers are positioned above female flowers and both are fertile at the same time. Wind has also been considered as a potential pollen vector, especially in some epiphytic species whose male flowers continue to disperse pollen after anthesis (de Wilde, 2011).

Begonia species are known to be self-compatible (Wyatt & Sazima, 2011). Studies have shown high inbreeding rates in Neotropical species (Agren & Schemske, 1993) and high outcrossing rates in Socotran species (Hughes et al., 2003). Most species limit self-fertilisation through temporal separation of anthesis between flowers on the same inflorescence, although there is still considerable overlap among inflorescences on the same plant. Most experimental self-pollinations have resulted in reduced fitness of both seed and offspring (Agren & Schemske, 1993). Begonias also reproduce asexually through rhizomes or vegetative clones (Twyford et al., 2014).

The differences in fruit morphology and seed structure reflect different dispersal adaptations. Seed and fruit types show adaptations for passive dispersal by gravity, wind, ants, rain wash and rain-ballist mechanisms. It has also been suggested that some Begonia are adapted for animal dispersal, either through endozoochory (based on seed characters) or epizoochory (based on fruit wings). One case of animal dispersal has been documented, in which ants were observed transporting seeds into the substrate next to a potted Begonia in a greenhouse (Matolweni et al., 2000; de Wilde, 2002, 2011). Despite this wide range of dispersal mechanisms, the seeds are often poorly dispersed, possibly because of the moist, sheltered conditions on the forest floor where Begonias prefer to grow. Poor dispersal is evident both in the clumps of seedlings growing below adult plants and in the lack of gene flow between populations (Dewitte et al., 2011; Hughes & Hollingsworth, 2008; Matolweni et al., 2000; Thomas et al., 2012; Twyford et al., 2013).

2.4 ORGANELLE GENOME INHERITANCE

Few studies have examined organelle inheritance in Begonia. While most angiosperms primarily display maternal inheritance of organelles, there are documented cases of paternal and biparental inheritance. Cucumis sativus L. (Cucurbitaceae, placed in the Cucurbitales along with Begoniaceae) inherits its two organelle genomes differently: the mitochondrial genome paternally and the chloroplast genome maternally (Calderon et al., 2012; Havey et al., 1998). Only two studies have addressed organelle inheritance in Begonia: one showed maternal inheritance of the chloroplast genome, and the other showed maternal inheritance of both organelle genomes (Drinkwater, 2014; Peng & Chiang, 2000).

2.5 HYBRIDISATION

Hybridisation is well documented in Begonia, with over 10 000 hybrid combinations recorded. Both interspecific and intersectional hybrids are regularly produced in cultivation. Natural hybrids are less common but not unknown (Hughes & Hollingsworth, 2008; de Wilde, 2011); at least three natural hybrids have been found in Taiwan. Experimental and natural hybrids show varying levels of reproductive fitness (Dewitte et al., 2011; Peng & Ku, 2009), and experimental crosses have shown only weak reproductive barriers between many Begonia species. This promiscuity between species, together with the relatively low gene flow between populations, is a potential explanation for the high diversity of the genus (Dewitte et al., 2009, 2011; Hughes & Hollingsworth, 2008).

2.6 HABITAT

Begonia species generally occur as very localised endemics in moist forest habitats, where they have adapted to low light levels. Some, such as the seasonally adapted African species, have wider ranges, which has been attributed to a shift in habitat preference from moist areas to more arid areas and seasonally dry shrublands (Brennan et al., 2012; Goodall-Copestake et al., 2010). While some species grow on level ground, many prefer slopes or creek beds, growing in loose soil, on decaying wood or on moss-covered boulders. Most species grow near water, often in crevices and cracks wetted by the spray of waterfalls. A few species are epiphytic (Hughes et al., 2003; McLellan, 2000; de Wilde, 2011).

2.7 BIOGEOGRAPHY AND PHYLOGENETICS

Molecular dating has shown that the earliest-diverging Begonia clades are African. Further studies have suggested that independent dispersal events out of Africa gave rise to the highly diverse Neotropical and Asian clades. It is hypothesised that adaptation to seasonal drought assisted successful colonisation via long-distance dispersal across the Atlantic (Dewitte et al., 2011; Goodall-Copestake et al., 2010). Dispersal to Asia may have occurred via the Arabian Peninsula, after which species dispersed to India and into continental Asia before multiple dispersals into Malesia (Dewitte et al., 2011; Thomas, 2010). The disparity between the large Neotropical and Asian clades and the relatively small African grade is possibly due to extinction in Africa and rapid post-dispersal diversification (Forrest et al., 2005; Thomas, 2010). There also appears to be a major Malagasy radiation, likely resulting from a single dispersal event from Africa into Madagascar (Plana et al., 2004).

The high diversity of Begonia and its separate radiations out of Africa have created many opportunities for parallel evolution, making it an ideal group for phylogenetic, biogeographic and evolutionary studies (Table 2.1) (Brennan et al., 2012; Dewitte et al., 2011). Thomas (2010) and Thomas et al. (2011) found well-supported incongruence in the positions of several species between ITS and cpDNA gene trees, which they attributed to potential hybridisation, while noting that more information is needed to establish the cause. Similarly, Goodall-Copestake et al. (2010) found incongruence between mtDNA and cpDNA gene trees: the mtDNA data show a single Neotropical clade, whereas the cpDNA data show two Neotropical clades.

The aim of this study is to (i) construct an mtDNA phylogeny representative of the range of Begonia phylogenetic diversity, (ii) compare the mtDNA phylogeny with a sample-matched cpDNA phylogeny, (iii) identify well-supported incongruences between the two, and (iv) identify potential causes for the incongruence.

Table 2.1. Summary of recently published phylogenetic studies of Begonia.

Publication | Genome(s) Used | Begonia Sampling | Title
Forrest & Hollingsworth (2003) | Nuclear | 36, Worldwide | A recircumscription of Begonia based on nuclear ribosomal sequences
Plana (2003) | Chloroplast | 81, Worldwide | Phylogenetic Relationships of the Afro-Malagasy Members of the Large Genus Begonia
Plana et al. (2004) | Nuclear, Chloroplast | 74, Worldwide | Pleistocene and pre-Pleistocene Begonia Speciation in Africa
Clement et al. (2004) | Nuclear, Chloroplast | 28, Worldwide | Phylogenetic Position and Biogeography of Hillebrandia sandwicensis (Begoniaceae): A Rare Hawaiian Relict
Forrest et al. (2005) | Nuclear | 63, Worldwide | A Phylogeny of Begonia Using Nuclear Ribosomal Sequence Data and Morphological Characters
Tebbitt et al. (2006) | Nuclear | 52, Asian and African | Phylogenetic Relationships of Asian Begonia, with an Emphasis on the Evolution of Rain-Ballist and Animal Dispersal Mechanisms in Sections Platycentrum, Sphenanthera and Leprosae
Zhang et al. (2006) | Nuclear, Chloroplast, Mitochondrial | 2, African and Neotropical | Phylogeny of the Cucurbitales Based on DNA Sequences of Nine Loci from Three Genomes: Implications for Morphological and Sexual System Evolution
Goodall-Copestake et al. (2009) | Nuclear | 21, Worldwide | The Origin of a Mega-Diverse Genus: Dating Begonia (Begoniaceae) Using Alternative Datasets, Calibrations and Relaxed Clock Methods
Goodall-Copestake et al. (2010) | Chloroplast, Mitochondrial | 30, Worldwide | The Early Evolution of the Mega-Diverse Genus Begonia (Begoniaceae) Inferred from Organelle DNA Phylogenies
Thomas et al. (2011) | Chloroplast | 84, Worldwide | A Non-Coding Plastid DNA Phylogeny of Asian Begonia (Begoniaceae): Evidence for Morphological Homoplasy and Sectional Polyphyly
Rajbhandary et al. (2011) | Nuclear | >100, Asian and African | Asian Begonia: Out of Africa via the Himalayas?
Thomas et al. (2012) | Chloroplast | >100, Worldwide | West to East Dispersal and Subsequent Rapid Diversification of the Mega-Diverse Genus Begonia (Begoniaceae) in the Malesian Archipelago
Rubite et al. (2013) | Nuclear | 32, Asian | Recircumscription of Begonia sect. Baryandra (Begoniaceae): Evidence from Molecular Data
Chung et al. (2014) | Nuclear, Chloroplast | 94, Asian | Phylogenetic Analyses of Begonia Sect. Coelocentrum and Allied Limestone Species of China Shed Light on the Evolution of Sino-Vietnamese Karst Flora

3 MATERIALS AND METHODS

3.1 TAXON AND GENE REGION SAMPLING

A total of 69 taxa were analysed, of which 38 were newly sequenced for this study. Taxa were added to the mitochondrial data matrix to give good coverage of the clades present in chloroplast phylogenies of Begonia (Moonlight, 2013; Thomas et al., 2011), to represent the entire distribution of Begonia, and to increase the sampling of the framework mitochondrial phylogeny produced by Goodall-Copestake et al. (2010). The mitochondrial dataset consisted of the matR, nad1b/c, nad7 intron and rps14-cob regions (Goodall-Copestake et al., 2010). A matching dataset of the chloroplast ndhA intron, ndhF-rpl32 and rpl32-trnL regions was assembled from sequences published by Thomas et al. (2011) and Moonlight (2013) and sequences generated de novo for this study. Table 3.1 lists accession information for the sequences generated for this study; a complete list of taxa is given in Appendix 1.

3.2 LABORATORY METHODS

3.2.1 DNA EXTRACTION

Total genomic DNA was extracted from fresh or silica-dried material using Qiagen DNeasy Plant Mini Kits (Qiagen, Germantown, MD, USA) following the manufacturer's protocol. All newly extracted samples were deposited in the Edinburgh DNA Bank; EDNA numbers are listed in Table 3.1.

3.2.2 PCR AMPLIFICATION

3.2.2.1 Mitochondrial

Mitochondrial regions were amplified in 20μL reactions using primers from Goodall-Copestake et al. (2010). Each reaction contained 12.375μL ddH2O, 2μL 10× NH4 reaction buffer, 2μL dNTPs (2mM), 1μL MgCl2 (50mM), 0.75μL each of the forward and reverse primers (10μM), 0.125μL Biotaq DNA polymerase (Bioline, London, UK) and 1μL genomic DNA. PCR conditions consisted of initial template denaturation at 95°C for 3 minutes, followed by 35 cycles of denaturation at 94°C for 30 seconds, primer annealing at 56°C (nad7 intron and rps14-cob) or 60°C (matR and nad1b/c) for 1 minute and primer extension at 72°C for 3 minutes, with a final extension at 72°C for 5 minutes (Goodall-Copestake et al., 2010). All primers are listed in Table 3.2. If a region failed to amplify, the reaction was repeated with 4μL of TBTpar added (Samarakoon et al., 2013). Regions that still failed to amplify were amplified using the “slow and cold” method: template denaturation at 80°C for 5 minutes, followed by 30 cycles of denaturation at 95°C for 1 minute, primer annealing at 50°C for 1 minute, a ramp of 0.3°C/s to 65°C and primer extension at 65°C for 4 minutes, with a final extension at 65°C for 5 minutes (Shaw et al., 2007).

Table 3.1: Accession information for taxa sequenced in this study.

Taxon | Accession Number | Source Material | EDNA Number
B. aff. heydei | 20131992 | Royal Botanic Garden, Edinburgh | EDNA14-0036182
B. albo-coccinea | - | Academia Sinica | EDNA14-0036065
B. annobonensis | 20140265 | Royal Botanic Garden, Edinburgh | EDNA14-0035690
B. antongilensis | 20132234 | Royal Botanic Garden, Edinburgh | EDNA14-0035458
B. baccata | 20140261 | Royal Botanic Garden, Edinburgh | EDNA14-0035689
B. bissei | 20021711 | Royal Botanic Garden, Edinburgh | EDNA14-0036183
B. bracteata | 20101652 | Royal Botanic Garden, Edinburgh | EDNA14-0036128
B. carolineifolia | 20042077A | Royal Botanic Garden, Edinburgh | EDNA14-0036076
B. conchifolia | 20042082A | Royal Botanic Garden, Edinburgh | EDNA14-0036077
B. cordifolia | - | Lakmini Kumarage | EDNA13-0032556
B. coursii | 20132227 | Royal Botanic Garden, Edinburgh | EDNA14-0035694
B. crispula | - | Academia Sinica | EDNA13-0033442
B. dietrichiana | 021 075 71 | Glasgow Botanic Garden | EDNA14-0036144
B. fagifolia | - | Royal Botanic Garden, Edinburgh | EDNA13-0033096
B. fuscisetosa | 20120800 | Royal Botanic Garden, Edinburgh | EDNA14-0035456
B. grandis | 19521036 | Royal Botanic Garden, Edinburgh | EDNA08-03023
B. heracleifolia | 20100403A | Royal Botanic Garden, Edinburgh | EDNA14-0035695
B. hoehneana | 20131494A | Royal Botanic Garden, Edinburgh | EDNA14-0035461
B. johnstonii | 20131209 | Royal Botanic Garden, Edinburgh | EDNA14-0035691
B. laruei | 20080581 | Royal Botanic Garden, Edinburgh | EDNA14-0036124
B. loranthoides | 20101698 | Royal Botanic Garden, Edinburgh | EDNA13-0033383
B. luxurians | 19685494 | Royal Botanic Garden, Edinburgh | EDNA14-0036147
B. lyallii | 20132223 | Royal Botanic Garden, Edinburgh | EDNA14-0035459
B. macduffieana | 002 004 86 | Glasgow Botanic Garden | EDNA13-0033083
B. meyeri-johannis | 20131229 | Royal Botanic Garden, Edinburgh | EDNA14-0035692
B. microsperma | 014 078 98 | Glasgow Botanic Garden | EDNA14-0036145
B. odorata | 20082086A | Royal Botanic Garden, Edinburgh | EDNA14-0036078
B. palawanensis | - | Academia Sinica | EDNA14-0036066
B. pringlei | 018 002 83 | Glasgow Botanic Garden | EDNA13-0033082
B. prismatocarpa | 002 121 78 | Glasgow Botanic Garden | EDNA14-0036146
B. pseudodryadis | 20130406 | Royal Botanic Garden, Edinburgh | EDNA14-0035696
B. schmidtiana | 20080890 | Royal Botanic Garden, Edinburgh | EDNA13-0033095
B. sp. | 20132230 | Royal Botanic Garden, Edinburgh | EDNA14-0035455
B. stictopoda | 20070789 | Royal Botanic Garden, Edinburgh | EDNA14-0036123
B. sutherlandii | - | Academia Sinica | EDNA14-0036067
B. thelmae | 20131424 | Royal Botanic Garden, Edinburgh | EDNA14-0035457
B. thwaitesii | - | Lakmini Kumarage | EDNA13-0032557
Hillebrandia sandwicensis | - | Edutg Monn | EDNA13-0034233

Table 3.2: DNA fragments amplified and primers used.

Genome | Region | Primer Name | Primer Sequence (5' to 3') | Direction | Use | Reference
Mitochondrial | matR | matR5F | GTT TTC ACA CCA TCG ACC GAC ATC G | Forward | PCR and Sequencing | Anderberg et al. (2002)
Mitochondrial | matR | matR3R | GGC GGC ACC TGT AGT AGG ACA GAG GA | Reverse | PCR and Sequencing | Anderberg et al. (2002)
Mitochondrial | matR | matR_Beg1 | TAT GGA TAC GGT GCT TTA CC | Reverse | Sequencing | Goodall-Copestake et al. (2010)
Mitochondrial | nad1 | nad1BCF | GCA TTA CGA TCT GCA GCT CA | Forward | PCR and Sequencing | Demesure et al. (1995)
Mitochondrial | nad1 | nad1BCR | GGA GCT CGA TTA GTT TCT GC | Reverse | PCR and Sequencing | Demesure et al. (1995)
Mitochondrial | nad7 | nad7/1 | ACC TCA ACA TCC TGC TGC TC | Forward | PCR | Dumolin-Lapegue et al. (1997)
Mitochondrial | nad7 | nad7/3r | TGT TCT TGG GCC ATC ATA GA | Reverse | PCR | Dumolin-Lapegue et al. (1997)
Mitochondrial | nad7 | nad7/2_Beg1 | CAC CTG TGA TCT CCT TCT AC | Reverse | Sequencing | Goodall-Copestake et al. (2010)
Mitochondrial | nad7 | nad7/2 | GCT TTA CCT TAT TCT GAT CG | Forward | Sequencing | Dumolin-Lapegue et al. (1997)
Mitochondrial | rps14-cob | rps-cobF | GTG TGG AGG ATA TAG GTT GT | Forward | PCR and Sequencing | Demesure et al. (1995)
Mitochondrial | rps14-cob | rps-cobBegR | TCC TTT CAA GTA TGC TCC C | Reverse | PCR and Sequencing | Goodall-Copestake et al. (2010)
Mitochondrial | rps14-cob | rps-cob_Beg2 | TCC TTT CTA AAG CTG CGC | Reverse | Sequencing | Goodall-Copestake et al. (2010)
Chloroplast | ndhA | ndhAx1 | GCY CAA TCW ATT AGT TAT GAA ATA CC | Forward | PCR and Sequencing | Shaw et al. (2007)
Chloroplast | ndhA | ndhAx2 | GGT TGA CGC CAM ARA TTC CA | Reverse | PCR and Sequencing | Shaw et al. (2007)
Chloroplast | ndhF-rpl32 | ndhF | GAA AGG TAT KAT CCA YGM ATA TT | Forward | PCR and Sequencing | Shaw et al. (2007)
Chloroplast | ndhF-rpl32 | rpl32-R | CCA ATA TCC CTT YYT TTT CCA A | Reverse | PCR and Sequencing | Shaw et al. (2007)
Chloroplast | ndhF-rpl32 | Beg1-F | TGG ATG TGA AAG ACA TAT TTT GCT | Forward | PCR and Sequencing | Thomas et al. (2011)
Chloroplast | ndhF-rpl32 | Beg2-R | TTT GAA AAG GGT CAG TTA ATA ACA A | Reverse | PCR and Sequencing | Thomas et al. (2011)
Chloroplast | rpl32-trnL | rpl32-F | CTG CTT CCT AAG AGC AGC GT | Forward | PCR and Sequencing | Shaw et al. (2007)
Chloroplast | rpl32-trnL | trnL | CAG TTC CAA AAA AAC GTA CTT C | Reverse | PCR and Sequencing | Shaw et al. (2007)

3.2.2.2 Chloroplast

Chloroplast regions were amplified in 25μL reactions. For the rpl32-trnL region, each reaction contained 15μL ddH2O, 5μL 10× Phusion reaction buffer, 2.5μL dNTPs (2mM), 0.2μL Phusion polymerase (Thermo Fisher Scientific, Waltham, MA, USA), 0.75μL forward and reverse primers (10μM) and 1μL genomic DNA. The ndhA intron and ndhF-rpl32 reactions contained 10.6μL ddH2O, 5μL TBTpar, 2.5μL 10× NH4 reaction buffer, 2.5μL dNTPs (2mM), 1.25μL MgCl2 (50mM), 1μL each of the forward and reverse primers (10μM), 0.15μL Biotaq (Bioline) and 1μL genomic DNA. PCR conditions consisted of initial template denaturation at 96°C for 5 minutes, followed by 35 cycles of denaturation at 98°C for 30 seconds, primer annealing at 50°C for 30 seconds and primer extension at 72°C for 30 seconds, with a final extension at 72°C for 5 minutes (Moonlight, 2013; Thomas et al., 2011). All primers are listed in Table 3.2. Regions that failed to amplify were amplified in a second reaction with the same reagents using the “slow and cold” method (Shaw et al., 2007).
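As a quick check on reaction recipes such as those above, the per-reaction volumes can be summed and scaled to a master mix. The sketch below encodes the 20μL mitochondrial recipe; the `master_mix` helper, the one extra reaction of pipetting headroom and the choice to add template DNA to each tube separately are illustrative assumptions, not part of the protocol used in this study.

```python
# Per-reaction volumes (uL) for the 20 uL mitochondrial PCR described above.
MITO_PCR_UL = {
    "ddH2O": 12.375,
    "10x NH4 reaction buffer": 2.0,
    "dNTPs (2 mM)": 2.0,
    "MgCl2 (50 mM)": 1.0,
    "forward primer (10 uM)": 0.75,
    "reverse primer (10 uM)": 0.75,
    "Biotaq DNA polymerase": 0.125,
    "genomic DNA": 1.0,
}


def master_mix(per_reaction, n_samples, extra_reactions=1,
               added_separately=("genomic DNA",)):
    """Scale a per-reaction recipe (uL) to a master mix for n_samples.

    Hypothetical helper: template DNA is left out because it is pipetted into
    each tube individually, and extra_reactions adds pipetting headroom.
    """
    n = n_samples + extra_reactions
    return {
        reagent: round(volume * n, 3)
        for reagent, volume in per_reaction.items()
        if reagent not in added_separately
    }


if __name__ == "__main__":
    print("Per-reaction total (uL):", sum(MITO_PCR_UL.values()))  # 20.0
    for reagent, volume in master_mix(MITO_PCR_UL, n_samples=10).items():
        print(f"{reagent:26s}{volume:9.3f} uL")
```

Summing the components first is a cheap way to catch transcription errors in a recipe before scaling it up.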

3.2.3 VISUALISATION AND SEQUENCING

Following all amplification reactions, product quality and yield were assessed by electrophoresis on a 1% agarose gel stained with SYBR Safe (Invitrogen/Life Technologies). PCR products were purified using ExoSAP-IT (Affymetrix, High Wycombe, UK).

Sequencing reactions were carried out in 10μL volumes containing 6.68μL ddH2O, 2μL 5× BigDye buffer, 0.5μL BigDye (Applied Biosystems/Life Technologies), 0.32μL primer (10μM) and 1μL of purified PCR product. Sequencing PCR conditions consisted of 25 cycles of template denaturation at 95°C for 30 seconds, primer annealing at 50°C for 20 seconds and primer extension at 60°C for 4 minutes. All primers used for sequencing are listed in Table 3.2. Samples were sent to GenePool at the University of Edinburgh for sequencing.

3.3 SEQUENCE ALIGNMENT

Newly generated sequences were trimmed and assembled in Geneious Pro v7.1.6 (Biomatters). Edited sequences were aligned in BioEdit v7.2.5 (Hall, 1999) using the ClustalW algorithm (Sievers et al., 2011) with default settings. This dataset was then combined with previously produced matrices and aligned manually in BioEdit. FastGap v1.2 (Borchsenius, 2009) was used to code gaps in the mitochondrial dataset automatically. The resulting gap matrix was filtered to exclude gaps of dubious homology, such as gaps of variable length and gaps in regions of single-nucleotide repeats.
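Gap coding turns each indel in the alignment into a presence/absence character. The sketch below is a minimal, hypothetical implementation of simple indel coding, which we understand to be the scheme FastGap applies: every gap with a distinct start and end becomes one character, scored 1 where a taxon has exactly that gap, ? where the gap lies entirely inside a longer gap in that taxon, and 0 otherwise. It treats terminal gaps like internal ones and does not apply the homology filtering described above; both are simplifications.

```python
def gap_spans(seq):
    """Return (start, end) pairs, 0-based with end exclusive, for each run of '-'."""
    spans, start = [], None
    for i, ch in enumerate(seq):
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(seq)))
    return spans


def simple_indel_coding(alignment):
    """Code gaps as binary characters.

    alignment: dict mapping taxon -> aligned sequence.
    Returns (characters, matrix): the sorted list of distinct gap spans and a
    dict mapping taxon -> string of '1', '0' or '?' (one symbol per span).
    """
    spans = {taxon: gap_spans(seq) for taxon, seq in alignment.items()}
    characters = sorted({span for own in spans.values() for span in own})
    matrix = {}
    for taxon, own in spans.items():
        row = []
        for start, end in characters:
            if (start, end) in own:
                row.append("1")  # taxon has exactly this gap
            elif any(s <= start and end <= e for s, e in own):
                row.append("?")  # gap lies inside a longer gap in this taxon
            else:
                row.append("0")  # bases present, or a different overlapping gap
        matrix[taxon] = "".join(row)
    return characters, matrix


if __name__ == "__main__":
    demo = {"t1": "AC--GT", "t2": "A---GT", "t3": "ACTTGT"}
    chars, coded = simple_indel_coding(demo)
    print(chars)  # [(1, 4), (2, 4)]
    print(coded)  # {'t1': '01', 't2': '1?', 't3': '00'}
```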

3.4 PHYLOGENETIC ANALYSIS

3.4.1 TREE BUILDING – ORGANELLE GENOMES

Phylogenetic analyses were carried out on the Cipres Bioportal (Miller et al., 2010). Bayesian Inference analyses were run in MrBayes v3.2.2 (Huelsenbeck & Ronquist, 2001; Ronquist et al., 2012). The mitochondrial dataset (mtDNA) was divided into two partitions, one consisting of the four mtDNA regions and the other of binary indel data; the chloroplast dataset (cpDNA) was analysed as a single partition. The best-fit model of molecular evolution for the mtDNA and cpDNA datasets was selected using jModelTest v2.1.5 (Darriba et al., 2012; Guindon & Gascuel, 2003) under the Akaike Information Criterion; GTR+G+I was identified as the best-fitting model for both datasets. Each dataset was analysed with two independent runs, each of four Markov chains (one cold and three heated), for 10⁶ generations, with trees sampled every 5 000 generations; the analysis was carried out as five separate replicates. Tracer v1.6 (Rambaut & Drummond, 2013) was used to check that the separate runs had converged. Trees from all five replicates were combined into one majority rule consensus tree, with the burnin set at 2 500 000 generations. The trees were visualised using FigTree v1.4.2 (Rambaut, 2014).
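Settings like these are normally supplied to MrBayes as a command block. The Python sketch below generates one for a two-partition mtDNA analysis, with GTR+I+G for the sequence partition and the binary indel characters treated as restriction-site data. The partition sizes, the use of the restriction datatype for gaps and the 25% relative burnin are assumptions for illustration, not values taken from this study's input files.

```python
def format_line(n_sites, n_indels):
    """DATA-block format line declaring nucleotides followed by binary indels."""
    return (f"format datatype=mixed(DNA:1-{n_sites},"
            f"Restriction:{n_sites + 1}-{n_sites + n_indels}) gap=- missing=?;")


def mrbayes_block(n_sites, n_indels, ngen=1_000_000, samplefreq=5_000,
                  burninfrac=0.25):
    """Return a MrBayes 3.2 command block for a two-partition analysis."""
    return "\n".join([
        "begin mrbayes;",
        f"  charset seqs = 1-{n_sites};",
        f"  charset indels = {n_sites + 1}-{n_sites + n_indels};",
        "  partition by_type = 2: seqs, indels;",
        "  set partition=by_type;",
        "  lset applyto=(1) nst=6 rates=invgamma;",  # GTR+I+G for sequence data
        "  lset applyto=(2) coding=variable;",       # only variable gap characters
        "  prset applyto=(all) ratepr=variable;",    # separate rate per partition
        f"  mcmc ngen={ngen} samplefreq={samplefreq} nruns=2 nchains=4;",
        f"  sumt relburnin=yes burninfrac={burninfrac};",
        "end;",
    ])


if __name__ == "__main__":
    # Hypothetical partition sizes, for illustration only.
    print(format_line(3000, 100))
    print(mrbayes_block(3000, 100))
```

Generating the block from a few parameters keeps the separate replicates and combined-dataset runs consistent with one another.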

3.4.2 ASSESSING INCONGRUENCE AND INFORMATIVE CHARACTERS

An incongruence length difference (ILD) test was performed on the complete dataset using the partition homogeneity test implemented in PAUP* v4.0b10 (Swofford, 2003). Before combination, individual gene trees were assessed for within-genome incongruence by analysing each region separately and visually comparing tree topologies. The two organelle trees resulting from the concatenated datasets were then assessed visually for incongruence by comparing clades with posterior probability support of 0.95 or above. Incongruence was inspected manually because the scripts for the LILD and mILD tests were unavailable, and developing them was beyond the scope of the project. Taxa were manually assessed and removed from the dataset until a standard ILD test found no significant incongruence. After the visual comparisons, terminal taxa were marked for further analysis using the data combination approaches defined below. As a final test of congruence, ILD tests were run on all combined datasets. Prior to combination, the number of parsimony-informative characters in each dataset was assessed using PAUP*.
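The logic of the ILD (partition homogeneity) test can be illustrated on a toy example. The sketch below assumes four taxa, so the most parsimonious tree can be found exhaustively among the three possible unrooted topologies rather than by the heuristic searches PAUP* uses on real data. The ILD statistic is the length of the combined tree minus the summed lengths of the separate partition trees; the P value is the proportion of random repartitions of the characters (counting the original partition) whose summed lengths are no greater than the original sum. The data in the example are hypothetical.

```python
import random

# The three unrooted topologies for four taxa (indices 0-3): ab|cd, ac|bd, ad|bc.
QUARTETS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def fitch_site_length(states, topology):
    """Fitch parsimony steps for one site on an unrooted four-taxon tree."""
    steps = 0
    cherry_sets = []
    for i, j in topology:
        a, b = {states[i]}, {states[j]}
        if a & b:
            cherry_sets.append(a & b)
        else:
            cherry_sets.append(a | b)
            steps += 1
    if not (cherry_sets[0] & cherry_sets[1]):
        steps += 1
    return steps


def best_length(columns):
    """Length of the most parsimonious tree, found by exhaustive search."""
    return min(sum(fitch_site_length(c, t) for c in columns) for t in QUARTETS)


def ild_test(part_a, part_b, replicates=999, seed=1):
    """Return (ILD statistic, P value) for two character partitions."""
    rng = random.Random(seed)
    combined = list(part_a) + list(part_b)
    observed = best_length(part_a) + best_length(part_b)
    ild = best_length(combined) - observed
    hits = 1  # the original partition counts as one replicate
    for _ in range(replicates):
        rng.shuffle(combined)  # random repartition, sizes preserved
        resampled = (best_length(combined[:len(part_a)])
                     + best_length(combined[len(part_a):]))
        if resampled <= observed:
            hits += 1
    return ild, hits / (replicates + 1)


if __name__ == "__main__":
    # Hypothetical data: partition A supports ab|cd, partition B supports ac|bd.
    part_a = [("A", "A", "G", "G")] * 10
    part_b = [("C", "T", "C", "T")] * 10
    ild, p = ild_test(part_a, part_b)
    print(f"ILD = {ild}, P = {p:.3f}")
```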

3.4.3 TREE BUILDING – METHODS OF COMBINING ORGANELLE GENOME DATA

The method chosen to combine the incongruent datasets was adopted from Pirie et al. (2008, 2009) and de Villiers et al. (2013). First, all conflicting taxa were removed from the mtDNA and cpDNA datasets before combining them (congruent combined). Analyses were then run on combined datasets in which the sequence data for the conflicting taxa were removed from one dataset and coded as missing before combining, in an attempt to estimate the best-supported whole-genus phylogeny. In the first, the data for the conflicting taxa were removed from the mtDNA dataset (reduced mtDNA + cpDNA); in the second, they were removed from the cpDNA dataset (reduced cpDNA + mtDNA). Finally, the 18 conflicting taxa were duplicated in a combined alignment, with one terminal representing the cpDNA (mtDNA coded as missing) and the other representing the mtDNA (cpDNA coded as missing) (duplicated). The duplicated method was adopted to represent all taxa, and the separate evolutionary histories of the incongruent genomes, in a single tree. In combined datasets where mtDNA was coded as missing, B. thomeana C. DC., B. aspleniifolia Hook. f. ex A. DC., B. prismatocarpa Hook. and B. piurensis Hook. f. were removed because no cpDNA data were available for them. Combined datasets were partitioned into mtDNA and cpDNA regions with the settings described above. Bayesian Inference analyses of the duplicated dataset used the parameters described above; the other combined datasets were run in the same way, except that only one run of 10⁶ generations was performed, with trees sampled every 1 000 generations. Trees were visualised using FigTree. Data combinations and resulting phylogenies are summarised in Table 3.3.
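The four data treatments described above can be expressed as one small function over aligned sequences. In this hypothetical sketch, alignments are Python dictionaries mapping taxon names to sequences, ? marks missing data, and rows left entirely missing are dropped, mirroring the removal of taxa that lacked cpDNA data. The mode names and the `_cpDNA`/`_mtDNA` suffixes are illustrative choices.

```python
MODES = {"congruent", "reduced_mt", "reduced_cp", "duplicated"}


def combine(mt, cp, conflicting, mode):
    """Concatenate mtDNA and cpDNA alignments (dicts of taxon -> sequence).

    congruent:  conflicting taxa are left out
    reduced_mt: conflicting taxa keep cpDNA; their mtDNA is coded missing
    reduced_cp: conflicting taxa keep mtDNA; their cpDNA is coded missing
    duplicated: each conflicting taxon becomes two terminals, one per genome
    Rows that end up entirely missing ('?') are dropped.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    mt_missing = "?" * len(next(iter(mt.values())))
    cp_missing = "?" * len(next(iter(cp.values())))
    rows = {}
    for taxon in sorted(set(mt) | set(cp)):
        m = mt.get(taxon, mt_missing)
        c = cp.get(taxon, cp_missing)
        if taxon not in conflicting:
            rows[taxon] = m + c
        elif mode == "reduced_mt":
            rows[taxon] = mt_missing + c
        elif mode == "reduced_cp":
            rows[taxon] = m + cp_missing
        elif mode == "duplicated":
            rows[taxon + "_cpDNA"] = mt_missing + c
            rows[taxon + "_mtDNA"] = m + cp_missing
        # mode == "congruent": conflicting taxa are simply not added
    return {name: seq for name, seq in rows.items() if set(seq) != {"?"}}


if __name__ == "__main__":
    mt = {"a": "AC", "b": "AG", "c": "TT"}
    cp = {"a": "GGG", "b": "GGA", "c": "CCC"}
    print(combine(mt, cp, {"b"}, "duplicated"))
```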


3.4.4 MAXIMUM LIKELIHOOD AND MAXIMUM PARSIMONY SUPPORT

Maximum likelihood analyses of the duplicated dataset were run in GARLI 2.01 (Zwickl, 2006) on the Cipres Bioportal. All default settings were accepted for the partitioned dataset under the GTR+G+I model. Three replicate runs of 50 bootstrap replicates each were completed, and a 50% majority rule consensus tree was used to map support values onto the nodes of the Bayesian Inference tree. Maximum parsimony bootstrap support for the duplicated dataset was estimated in PAUP* with 10 000 bootstrap replicates. Nodes with more than 50% support were mapped onto the Bayesian Inference tree.
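Both the Bayesian consensus trees and the bootstrap summaries rest on the same idea: count how often each clade appears across a sample of trees and keep those above a threshold. The sketch below assumes rooted Newick strings with plain leaf names (no branch lengths or support labels) and trees rooted consistently, for example on the same outgroup; it returns clade frequencies rather than drawing the consensus tree itself. The example trees are hypothetical.

```python
from collections import Counter


def clades(newick):
    """Clades (frozensets of leaf names) in a rooted Newick string such as
    '((a,b),c);'. Branch lengths and internal labels are not supported."""
    stack, leaves, found, name = [], [], set(), ""
    for ch in newick.strip().rstrip(";"):
        if ch == "(":
            stack.append(len(leaves))  # index where this clade's leaves start
        elif ch in ",)":
            if name:
                leaves.append(name)
                name = ""
            if ch == ")":
                found.add(frozenset(leaves[stack.pop():]))
        elif not ch.isspace():
            name += ch
    return found


def majority_rule(newicks, threshold=0.5):
    """Return {clade: frequency} for clades found in more than `threshold`
    of the trees (the full taxon set is included as a trivial clade)."""
    counts = Counter()
    for tree in newicks:
        counts.update(clades(tree))
    n = len(newicks)
    return {clade: k / n for clade, k in counts.items() if k / n > threshold}


if __name__ == "__main__":
    trees = ["(((a,b),c),d);", "(((a,b),d),c);", "(((a,b),c),d);"]
    for clade, freq in majority_rule(trees).items():
        print(sorted(clade), round(freq, 2))
```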

Table 3.3: Summary of Treatments of Datasets and Resulting Phylogenies.

Phylogeny Name | Dataset | Treatment of Incongruent Taxa | Figure
mtDNA | mtDNA | None present | 4.1
cpDNA | cpDNA | None present | 4.2
Congruent Combined | mtDNA, cpDNA | Excluded | 4.3
Reduced mtDNA + cpDNA | (mtDNA), cpDNA | mtDNA coded as missing; entire cpDNA | 4.4
Reduced cpDNA + mtDNA | (cpDNA), mtDNA | cpDNA coded as missing; entire mtDNA | 4.5
Duplicated | mtDNA, cpDNA | Duplicated with (I) cpDNA coded as missing and (II) mtDNA coded as missing | 4.6

4 RESULTS

4.1 STATISTICAL TESTS

The ILD test comparing the two complete organelle datasets gave a P value of 0.01, indicating statistically significant incongruence. After the incongruent taxa were removed (congruent combined dataset), the P value was 0.07, indicating no significant incongruence. The reduced mtDNA + cpDNA and reduced cpDNA + mtDNA datasets gave P values of 0.54 and 0.63 respectively, and the duplicated dataset gave a P value of 0.90; none indicates significant incongruence. The cpDNA dataset contained 711 parsimony-informative characters and the mtDNA dataset 463.
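A character is parsimony-informative when at least two different states each occur in at least two taxa. The sketch below counts such columns in an alignment. Treating only A, C, G, T, 0 and 1 as states (gaps, ? and ambiguity codes are ignored) is a simplifying assumption that may not match PAUP*'s handling of ambiguous characters exactly; the example alignment is hypothetical.

```python
from collections import Counter


def is_parsimony_informative(column, states="ACGT01"):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(ch for ch in column.upper() if ch in states)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_informative(alignment):
    """Count parsimony-informative columns in a dict of taxon -> aligned sequence."""
    columns = zip(*alignment.values())
    return sum(is_parsimony_informative("".join(col)) for col in columns)


if __name__ == "__main__":
    demo = {"t1": "AAAC", "t2": "AACC", "t3": "AGAT", "t4": "AGCG"}
    print(count_informative(demo))  # 2
```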

4.2 SINGLE GENOME PHYLOGENIES

4.2.1 MTDNA PHYLOGENETIC TREE

In the majority rule consensus tree from the Bayesian analysis of the mtDNA alignment (Fig. 4.1), the African species are paraphyletic, with single, strongly supported Asian and Neotropical clades nested within them. The Socotran tuberous species B. socotrana Hook. f. is sister to the Asian clade with weak support, while the African semi-annual species B. annobonensis A. DC. and B. johnstonii Oliv. ex Hook. f., together with the tuberous B. sutherlandii Hook. f. and B. dregei Otto & Dietr., form a grade sister to the Neotropical clade.

4.2.2 CPDNA PHYLOGENETIC TREE

The African species are paraphyletic, with a single nested Asian clade and two Neotropical clades (Fig. 4.2). The Socotran tuberous B. socotrana is nested within the Asian clade, which is unresolved at its base, with no clear immediate sister group. The Neotropical species are divided into two strongly supported clades, each with an African sister group. Neotropical Clade 1 (NC1) contains B. macduffieana L.B. Smith & Schubert, B. petasitifolia Brade, B. dietrichiana Irmscher, B. luxurians Scheidw., B. thelmae L.B. Smith & Wasshausen, B. fagifolia Fisch. ex Otto & Dietr., B. hoehneana Irmscher and B. ulmifolia Willd., with the African semi-annual B. annobonensis and B. johnstonii as its sister clade. Neotropical Clade 2 (NC2) contains B. boliviensis A. DC., B. aff. heydei C. DC. ex Donn. Sm., B. conchifolia A. Dietr., B. heracleifolia Cham. & Schlecht., B. nelumbiifolia Cham. & Schlecht., B. carolineifolia Regel, B. pringlei S. Wats., B. crispula Brade, B. radicans Vell., B. herbacea Vell., B. schmidtiana Regel, B. foliosa Humb., Bonpl. & Kunth, B. bissei J. Sierra Calzado and B. odorata Willd., with the African tuberous B. dregei and B. sutherlandii as its sister clade.

Figure 4.1: mtDNA Bayesian majority rule consensus phylogram. Numbers at the nodes represent Bayesian Posterior Probabilities. The asterisk (*) represents taxa treated as incongruent between the mtDNA and cpDNA datasets. The hash symbol (#) represents Begonia taxa missing from the cpDNA dataset. African Begonia are red, Asian Begonia are purple, Begonia of cpDNA Neotropical Clade 1 (NC1) are green, Begonia of cpDNA Neotropical Clade 2 (NC2) are blue. The scale bar represents the expected substitutions per site.

Figure 4.2: cpDNA Bayesian majority rule consensus phylogram. Numbers at the nodes represent Bayesian Posterior Probabilities. The asterisk (*) represents taxa treated as incongruent between the mtDNA and cpDNA datasets. African Begonia are red, Asian Begonia are purple, Begonia of cpDNA Neotropical Clade 1 (NC1) are green, Begonia of cpDNA Neotropical Clade 2 (NC2) are blue. The scale bar represents the expected substitutions per site.

Figure 4.3: Congruent combined Bayesian majority rule consensus phylogram. Numbers at the nodes represent Bayesian Posterior Probabilities. African Begonia are red, Asian Begonia are purple, Begonia of cpDNA Neotropical Clade 2 (NC2) are blue. Neotropical Clade 1 (NC1) and other incongruent taxa have been removed from this analysis. The scale bar represents the expected substitutions per site.

Figure 4.4: Reduced mtDNA + cpDNA Bayesian majority rule consensus phylogram. Numbers at the nodes represent Bayesian Posterior Probabilities. African Begonia are red, Asian Begonia are purple, Begonia of cpDNA Neotropical Clade 1 (NC1) are green, Begonia of cpDNA Neotropical Clade 2 (NC2) are blue. The scale bar represents the expected substitutions per site.

Figure 4.5: Reduced cpDNA + mtDNA Bayesian majority rule consensus phylogram. Numbers at the nodes represent Bayesian Posterior Probabilities. African Begonia are red, Asian Begonia are purple, Begonia of cpDNA Neotropical Clade 1 (NC1) are green, Begonia of cpDNA Neotropical Clade 2 (NC2) are blue. The scale bar represents the expected substitutions per site.

Figure 4.6: Duplicated Bayesian majority rule consensus phylogram. Numbers at the nodes represent Bayesian Posterior Probabilities, Maximum Likelihood (ML) and Maximum Parsimony (MP) support respectively. ML and MP support is only shown for nodes above 50%. African Begonia are red, Asian Begonia are purple, Begonia of cpDNA Neotropical Clade 1 (NC1) are green, Begonia of cpDNA Neotropical Clade 2 (NC2) are blue. The scale bar represents the expected substitutions per site. Red dots represent mtDNA sequences, green dots represent cpDNA sequences.

4.2.3 INCONGRUENCES BETWEEN GENOME PHYLOGENIES

All incongruences listed below involve conflicting positions supported by a posterior probability of 0.95 or above in the genome phylogenies. cpDNA NC1 and its semi-annual African sister clade were treated as incongruent because NC1 was nested within cpDNA NC2 in the mtDNA tree. B. macduffieana was nested in a clade with the NC2 species B. radicans, B. herbacea and B. aff. heydei in the mtDNA phylogeny but was part of NC1 in the cpDNA phylogeny; both placements were highly supported.

In the Asian clade, B. bracteata Jack was in a clade with B. stictopoda (Miq.) A. DC., B. kingiana Irmscher, B. chloroneura P. Wilkie & Sands and B. palawanensis Merr. in the mtDNA tree but moved to a clade with B. symsanguinea L.L. Forrest & Hollingsw., B. fuscisetosa Sands and B. laruei M. Hughes in the cpDNA phylogeny. B. amphioxus M.J.S. Sands was deeply nested in a clade with B. laruei, B. symsanguinea and B. fuscisetosa in the mtDNA phylogeny but was unresolved in the cpDNA phylogeny. B. morsei Irmscher was strongly supported in a clade with B. palmata D. Don and B. robusta Blume in the mtDNA phylogeny but was unresolved in the cpDNA phylogeny. B. albo-coccinea Hook., B. floccifera Bedd. and B. malabarica Lamk non A. DC. were nested in a clade with the other Indian endemics B. dipetala R. Grah., B. cordifolia (Wight) Thwaites and B. thwaitesii Hook. in the mtDNA phylogeny but were sister to B. grandis Dryander, B. palmata and B. robusta in the cpDNA phylogeny.

In both phylogenies, B. poculifera Hook. f., B. loranthoides Hook. f., B. baccata Hook. f. and B. polygonoides Hook. f. formed a clade; however, B. poculifera was sister to B. loranthoides in the mtDNA phylogeny but to B. baccata in the cpDNA phylogeny. B. goudotii A. DC. was an early-diverging member of the Malagasy clade, with B. antongilensis Humbert ex Bosser & Keraudren-Aymonin, B. bogneri Ziesenh., B. coursii Humbert ex Keraudren, B. lyallii A. DC. and B. sp., in the mtDNA phylogeny, but a highly nested member of the same clade in the cpDNA phylogeny. B. microsperma Warb. was nested in a clade with B. thomeana, B. aspleniifolia and B. prismatocarpa in the mtDNA phylogeny but is unresolved among the early-branching African lineages in the cpDNA phylogeny, presumably because the cpDNA sequences of B. thomeana, B. aspleniifolia and B. prismatocarpa are missing. The incongruent taxa are marked with an asterisk in Figures 4.1 (mtDNA) and 4.2 (cpDNA).

The mtDNA of B. macduffieana was originally sequenced for Goodall-Copestake et al. (2010) and its cpDNA for Moonlight (2013). Because of the incongruences found, and because two separate accessions had been used, the mtDNA regions were resequenced from the genomic DNA used by Moonlight (2013) to ensure that both genomes came from the same species.
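The criterion used above, conflicting placements each supported by a posterior probability of at least 0.95, amounts to finding pairs of well-supported clades that cannot coexist in one tree; two clades are compatible if they are disjoint or one contains the other. The sketch below assumes both trees have been pruned to the same taxa and rooted identically, with clades given as sets of taxon names mapped to posterior probabilities. The taxon labels and probabilities in the example are hypothetical.

```python
def compatible(clade_a, clade_b):
    """Two clades can coexist in one tree if they are disjoint or nested."""
    return clade_a.isdisjoint(clade_b) or clade_a <= clade_b or clade_b <= clade_a


def hard_conflicts(tree_a, tree_b, threshold=0.95):
    """Pairs of well-supported clades (one from each tree) that conflict.

    tree_a, tree_b: dicts mapping frozenset of taxon names -> posterior probability.
    """
    strong_a = [c for c, pp in tree_a.items() if pp >= threshold]
    strong_b = [c for c, pp in tree_b.items() if pp >= threshold]
    return [(a, b) for a in strong_a for b in strong_b if not compatible(a, b)]


if __name__ == "__main__":
    mt = {frozenset({"x", "y"}): 0.99, frozenset({"x", "y", "z"}): 1.0}
    cp = {frozenset({"x", "z"}): 0.97, frozenset({"w", "y"}): 0.50}
    for a, b in hard_conflicts(mt, cp):
        print(sorted(a), "conflicts with", sorted(b))  # ['x', 'y'] conflicts with ['x', 'z']
```

In the example, the weakly supported cpDNA clade {w, y} also conflicts with the mtDNA clades but is ignored because it falls below the 0.95 threshold.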

4.3 COMBINED ANALYSES

4.3.1 CONGRUENT COMBINED PHYLOGENY

Once the incongruent taxa were removed, the congruent combined phylogeny (Fig. 4.3) continued to show the African species as paraphyletic, with the tuberous B. socotrana nested in the Asian clade and the tuberous B. dregei and B. sutherlandii sister to a single Neotropical clade. Species that were unsupported or unresolved in one genome phylogeny but strongly supported in the other retained their strongly supported positions.

4.3.2 REDUCED MTDNA + CPDNA PHYLOGENY

This phylogeny maintained the cpDNA topology described above, with a few minor changes (Fig. 4.4). The clade containing B. baccata, B. poculifera, B. loranthoides and B. polygonoides moved from an unresolved position among the other African species to a weakly supported position as sister to the Asian and Neotropical clades. Within NC2, B. crispula moved from a weakly supported clade containing B. radicans, B. herbacea, B. schmidtiana, B. foliosa, B. bissei and B. odorata in the cpDNA phylogeny to a strongly supported early-branching position. Within the Asian clade, B. chloroneura, B. palawanensis, B. kingiana and B. stictopoda, which were unresolved in the cpDNA phylogeny, formed a strongly supported clade. B. amphioxus, also unresolved in the cpDNA phylogeny, was here weakly supported as sister to a clade containing B. bracteata, B. symsanguinea, B. fuscisetosa, B. laruei, B. chloroneura, B. palawanensis, B. kingiana and B. stictopoda.

4.3.3 REDUCED CPDNA + MTDNA PHYLOGENY

The topology of the reduced cpDNA + mtDNA phylogeny (Fig. 4.5) was very similar to that of the mtDNA phylogeny presented above. Support was lost for the divergence of the early African clades from the Asian and Neotropical clades. Within the Asian clade, B. grandis went from an unsupported sister grade with B. socotrana to a strongly supported clade with B. morsei, B. palmata, and B. robusta. This newly formed clade became a highly supported sister to the rest of the Asian species, with B. socotrana nested among them. B. crispula moved from an unresolved early divergent position in the Neotropical clade to an unresolved nested position next to a clade containing B. radicans, B. herbacea, and B. macduffieana. B. aff. heydei moved from a weakly supported early branching position in the B. radicans, B. herbacea, and B. macduffieana clade to an unsupported position as sister to a clade containing B. conchifolia, B. heracleifolia, B. nelumbiifolia, B. carolineifolia, and B. pringlei.

4.3.4 DUPLICATED PHYLOGENY

This phylogeny maintains the African, Asian, and Neotropical clades seen in the separate genome phylogenies (Fig. 4.5). Within the African grade, B. poculifera appears twice, once for its cpDNA and once for its mtDNA, in a clade with B. baccata, B. loranthoides, and B. polygonoides. B. goudotii appears in an early diverging position (mtDNA) and a highly nested position (cpDNA) in the Malagasy clade. NC1 is present both as sister to (cpDNA) and nested within (mtDNA) the remainder of the Neotropical species. The B. annobonensis and B. johnstonii cpDNA clade remains sister to the cpDNA NC1 clade, while their mtDNA keeps its position in a grade with B. sutherlandii and B. dregei sister to the Neotropical clade. B. macduffieana mtDNA retains its highly nested position with B. radicans and B. herbacea, in a strongly supported clade sister to B. schmidtiana, B. foliosa, B. bissei, and B. odorata. B. aff. heydei retains its cpDNA position as sister to a clade containing B. conchifolia, B. heracleifolia, B. nelumbiifolia, B. carolineifolia, and B. pringlei. Within the Asian clade, the mtDNA of B. malabarica, B. albo-coccinea, and B. floccifera remains in a clade with the other Indian endemics, while their cpDNA is unresolved on a branch with B. morsei mtDNA, B. grandis, B. palmata, and B. robusta. The cpDNA of B. morsei and B. amphioxus is resolved, but without support, in the same position as in the cpDNA-only phylogeny. B. amphioxus mtDNA and B. bracteata cpDNA both fall in the same clade as B. fuscisetosa, B. laruei, and B. symsanguinea. B. bracteata mtDNA keeps its position in the clade with B. kingiana, B. chloroneura, and B. palawanensis, as in the mtDNA-only phylogeny.

5 DISCUSSION

Incongruence between datasets can be caused by sampling error, systematic error, or genuine differences in evolutionary history. To avoid sampling error, samples with a suspect history (e.g. B. macduffieana) were resampled and resequenced. Systematic errors were minimised by using the same methods of analysis across all data combinations tested. With these steps taken, it is likely that the incongruences presented above reflect different evolutionary histories of the mitochondrial and chloroplast genomes. HGT would be expected mostly in woody plants, where there is potential for grafting, or in parasitic plants; neither applies to Begonia. The most likely causes of the incongruence presented in Section 4.2.3 are therefore hybridisation or ILS. Because the two genomes differ in their number of parsimony informative characters, the combined dataset trees may be biased towards the cpDNA topology. The distribution data referenced below were inferred from GBIF (2013), Doorenbos et al. (1998), and Hughes & Pullan (2007).
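The bias argument rests on how many parsimony informative characters each partition contributes to a combined matrix. As an illustration only (a minimal sketch, not the procedure used in this study), the function below counts parsimony informative sites under the usual definition: a column in which at least two character states each occur in at least two sequences. It assumes the sequences are already aligned and treats gaps, `?` and `N` as missing; the function name and the toy alignment are hypothetical.

```python
from collections import Counter


def parsimony_informative_sites(alignment, ignore="-?N"):
    """Count parsimony informative columns in an aligned set of sequences.

    A column counts if at least two different character states each occur
    in at least two sequences. Characters listed in `ignore` (gaps, missing
    data, ambiguous N) are dropped before counting.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    informative = 0
    for i in range(length):
        states = Counter(s[i].upper() for s in seqs if s[i].upper() not in ignore)
        # Number of states shared by two or more sequences in this column.
        if sum(1 for n in states.values() if n >= 2) >= 2:
            informative += 1
    return informative


# Toy alignment for illustration only (not data from this study).
toy_alignment = {
    "taxon_1": "ACGTA",
    "taxon_2": "ACGTT",
    "taxon_3": "AGGAA",
    "taxon_4": "AGGAA",
}
print(parsimony_informative_sites(toy_alignment))  # 2 (columns 2 and 4)
```

Column 5 of the toy alignment is variable but not informative, because its T occurs in only one sequence. Run separately on the mtDNA and cpDNA matrices, a count of this kind indicates which partition is likely to carry more weight in a concatenated analysis.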

5.1 AFRICAN AND MALAGASY INCONGRUENCE

B. goudotii is a member of section Quadrilobaria A. DC. and is endemic to Madagascar. In the duplicated tree B. goudotii appears as both an early branching (mtDNA) and a highly nested (cpDNA) member of the Malagasy clade. The early branching mtDNA position is strongly supported, whereas the cpDNA position is largely unsupported, leaving it effectively unresolved along with B. sp. and a clade comprising B. coursii and B. lyallii. B. goudotii is a tuberous, acaulescent species, while B. lyallii, B. coursii, and B. sp. are all rhizomatous with erect stems. These morphological differences suggest that B. goudotii and its cpDNA sister, B. sp., arose from separate lineages, which argues against ILS. The morphology also supports B. goudotii as an early branching sister to the remaining Malagasy species. It is therefore likely that a hybridisation event early in the evolution of the caulescent clade resulted in B. goudotii capturing the chloroplast genome of the B. sp. lineage.

The placement of B. poculifera in the single-genome mtDNA phylogeny suggests a sister relationship with B. loranthoides, while the duplicated phylogeny groups both the mtDNA and cpDNA of B. poculifera with B. baccata, a São Tomé endemic. The cpDNA relationship with B. baccata is also present in the single-genome cpDNA tree. B. poculifera has a wide range in the coastal areas of Gabon, Cameroon, and Equatorial Guinea, extending to the islands of Riaba and Príncipe but not as far as São Tomé. B. loranthoides is also distributed through Gabon and out into the islands of the Gulf of Guinea. Determining the cause of incongruence in this case is complicated. The varied phylogenetic position of B. poculifera mtDNA, together with the arbitrary choice of which species (B. poculifera or B. loranthoides) to treat as incongruent at that node, makes tree building error a distinct possibility. However, hybridisation between B. poculifera and B. loranthoides could have resulted in B. poculifera capturing the mitochondrial genome of B. loranthoides. Doorenbos et al. (1998) suggest that sections Squamibegonia Warb. and Baccabegonia Reitsma, to which B. poculifera and B. baccata respectively belong, are closely related. As an island endemic, B. baccata may have reached São Tomé through a single dispersal event and therefore lost the organelle polymorphism present in the closely related section Squamibegonia. Without further sampling and analysis, it is not possible to determine which cause of incongruence is most probable.

5.2 ASIAN INCONGRUENCE

B. morsei, a member of section Coelocentrum Irmscher, shares a mitochondrial lineage with B. palmata and B. robusta. Its chloroplast genome is unresolved in a clade with B. pseudodryadis (also a member of Coelocentrum), B. amphioxus, and three other small clades of Malesian Begonia. B. morsei is endemic to Guangxi Province in China, overlapping with B. grandis and B. palmata, which have wider distributions in China and Asia respectively; B. pseudodryadis is endemic to Yunnan Province in China. Section Platycentrum (Klotzsch) A. DC., to which B. palmata belongs, has a wide Asian distribution ranging from India through the Himalayas, Indo-China, and China to Taiwan and Malesia. Based on sectional delimitation and morphology, the cpDNA appears to represent the species tree most convincingly, suggesting mitochondrial capture via hybridisation from B. palmata or an ally.

The mtDNA of B. amphioxus is highly nested with B. laruei and B. fuscisetosa, all members of section Petermannia (Klotzsch) A. DC., while its cpDNA appears closest to B. morsei and B. pseudodryadis, both members of Coelocentrum. Geographically, B. amphioxus and B. fuscisetosa are endemic to Borneo. However, section Petermannia is widespread throughout Malesia, with a few species found in China. In a phylogeny based on nuclear genome data (Fig. 5.1), B. amphioxus is highly nested in a clade of Sunda Shelf Petermannia species that is sister to another widespread Petermannia clade, within which B. fuscisetosa is highly nested (Chung et al., 2014). Similarly, Thomas (2010) found B. laruei and B. amphioxus in two different section Petermannia clades separated by a third clade (Fig. 5.2). Since section Petermannia is so widespread and diverse, B. amphioxus could represent a diverging lineage of section Petermannia that hybridised with a clade containing B. fuscisetosa and captured its mitochondrial genome.

Figure 5.1: Phylogram from Chung et al. (2014). “Best-scoring maximum likelihood phylogram. Clade support values (LB: likelihood bootstrap/PB: parsimony bootstrap/PP: posterior probability) larger than 50% are indicated at each node. Dashed branches indicate LB, PB, and PP all smaller than 50%/0.5 while thick branches denote those present in the strict consensus tree of MP analysis and PP ≥0.95. Species in bold denote limestone species distributed in the Sino-Vietnamese limestone karsts (Gu et al., 2007). Arrows point to clades and sections discussed in the text. Taxon names are followed by sectional placement and distribution (in parentheses). The rhombus (◇) and star (☆) signs denote calibration points for molecular dating. Sectional classification and clade names in Thomas et al. (2011a) are marked to the right to allow easier cross-study comparison. Sources of sectional placement for each species are cited in Additional file 1. Sectional abbreviations: ALI: Alicida, AUG: Augustia, BAR: Baryandra, COL: Coelocentrum, DIP: Diploclinium, HAA: Haagea, LEP: Leprosae, PAR: Parvibegonia, PET: Petermannia, PLA: Platycentrum, REI: Reichenheimia, RID: Ridleyella, SPH: Sphenanthera, SYM: Symbegonia, UA: unassigned.”

Figure 5.2: Phylogram from Thomas (2010). “Bayesian majority rule consensus tree (cpDNA data: ndhA intron, ndhF-rpl32, rpl32-trnL; 3 data partitions; 115 taxa). Bayesian posterior probability (PP) support values > 0.5 are indicated next to the nodes, and PPs of corresponding clades of an analysis additionally including 282 indel codes are mapped on the tree: PP (analysis without indel codes)/PP (analysis with additional indel code partition). Broken lines indicate branches which lead to nodes with a PP < 0.9. The scale bar indicates substitutions per site. Sectional placement of taxa is indicated by the following abbreviations: AUG: Augustia, BRA: Bracteibegonia, COE: Coelocentrum, DIP: Diploclinium, HAA: Haagea, MEZ: Mezierea, PAR: Parvibegonia, PEL: Peltaugustia, PET: Petermannia, PLA: Platycentrum, REI: Reichenheimia, RID: Ridleyella, SPH: Sphenanthera, SQA: Squamibegonia, SYM: Symbegonia.”

Begonia section Bracteibegonia A. DC. has been proposed as closely related to section Petermannia (Doorenbos et al., 1998). Symbegonia (Warb.) L. L. Forrest & P. M. Hollingsworth, while retained as its own section, is nested within Petermannia (Forrest & Hollingsworth, 2003; Dewitte et al., 2008). These sectional relationships are important when considering the incongruence between the B. bracteata mtDNA and cpDNA lineages. B. bracteata cpDNA is the early branching member of a clade containing both Petermannia and Symbegonia species, a position that makes taxonomic and morphological sense, as most other species in this group are caulescent forest floor herbs. The mtDNA lineage is sister to that of B. kingiana, a rhizomatous lithophyte with succulent leaves that is endemic to Peninsular Malaysia. This appears to represent a case of mitochondrial capture by the B. bracteata lineage at some point during its evolution on the Sunda Shelf.

B. floccifera, B. albo-coccinea, and B. malabarica from India and Sri Lanka are monophyletic in both genome phylogenies. In the mtDNA phylogeny they are sister to other endemic Indian and Sri Lankan species, while in the cpDNA phylogeny they are sister to a clade containing the widespread species B. grandis and B. palmata, as well as the B. morsei mtDNA and the highly nested B. robusta from Java. Thomas (2010) proposes dispersal from African/Socotran lineages through India and into China. Given this pattern of distribution, the most logical explanation for the incongruence would be hybridisation between a common ancestor of the three Indian species and the common ancestor of the Chinese species, with the former capturing the chloroplast genome.

Further incongruence is present within the B. floccifera, B. albo-coccinea, and B. malabarica clade. The cpDNA lineage shows a (B. floccifera (B. albo-coccinea (B. malabarica))) relationship, while the mtDNA shows a (B. malabarica (B. albo-coccinea (B. floccifera))) relationship. The exchange of positions between B. floccifera and B. malabarica could be the result of ILS if organelle polymorphism was present in the diverging species. However, hybridisation is also a distinct possibility, as the species' geographic ranges overlap. Determining which genome reflects the true species phylogeny, and what caused the incongruence, is not possible with the current sampling.

5.3 NEOTROPICAL INCONGRUENCE

The cpDNA lineages in the Neotropics fall into two clades, NC1 and NC2, as noted above. With the exception of B. macduffieana (discussed below), these clades are retained in the mtDNA tree but are combined into a single inclusive clade. The presence of two cpDNA lineages in the Neotropics suggests two separate dispersal events from Africa: NC1 from an ancestor shared with B. johnstonii and B. annobonensis (CA1), and NC2 from an ancestor shared with B. sutherlandii and B. dregei (CA2). On arrival in the Neotropics, and before extensive speciation, these common ancestors are hypothesised to have hybridised. The hybridisation event resulted in CA2 capturing the mitochondria of CA1, leaving the Neotropics with two cpDNA clades (one from each common ancestor) and one mtDNA clade (descended from CA1) prior to geographic radiation and speciation.

Within the highly nested species of NC2 there is further topological incongruence. As in the Indian and Sri Lankan clade discussed above, B. thelmae and B. fagifolia essentially switch places between the two lineages in the duplicated phylogeny. There is also incongruence between the duplicated phylogeny and the single-genome cpDNA phylogeny. In both the duplicated and the single-genome mtDNA phylogenies this highly nested clade is resolved as (B. fagifolia (B. hoehneana (B. thelmae (B. ulmifolia)))), with high support at all nodes. The single-genome cpDNA phylogeny shows a (B. thelmae (B. ulmifolia (B. fagifolia (B. hoehneana)))) relationship, with high support at all nodes except the node uniting B. fagifolia and B. hoehneana. The cpDNA in the duplicated phylogeny shows a (B. thelmae (B. hoehneana (B. fagifolia (B. ulmifolia)))) relationship, with high support at all nodes except the node uniting B. fagifolia and B. ulmifolia. These varied relationships could be the result of ILS.
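The competing topologies above can be compared more formally by listing the clades each one implies. The sketch below is a hedged illustration, not an analysis performed in this study: it assumes the nested bracket notation describes rooted, fully pectinate ("ladder") trees, ignores support values (so the weakly supported nodes uniting B. fagifolia with B. hoehneana or with B. ulmifolia are treated as resolved), and counts clades found in one tree but not the other, a rooted Robinson-Foulds-style distance. The helper names are hypothetical.

```python
from itertools import combinations


def ladder_clusters(ladder):
    """Non-trivial clades of a rooted, fully pectinate ("ladder") tree.

    `ladder` lists taxa from earliest branching to most deeply nested, so
    (A (B (C (D)))) becomes ["A", "B", "C", "D"]. Each deeper bracket is a
    clade; the full taxon set and single taxa are omitted because every
    tree on these taxa contains them.
    """
    return {frozenset(ladder[i:]) for i in range(1, len(ladder) - 1)}


def cluster_distance(tree_a, tree_b):
    """Clades present in one tree but not the other (support ignored)."""
    return len(ladder_clusters(tree_a) ^ ladder_clusters(tree_b))


# Topologies of the nested NC2 clade, transcribed from the text above.
nc2 = {
    "mtDNA": ["B. fagifolia", "B. hoehneana", "B. thelmae", "B. ulmifolia"],
    "cpDNA single": ["B. thelmae", "B. ulmifolia", "B. fagifolia", "B. hoehneana"],
    "cpDNA duplicated": ["B. thelmae", "B. hoehneana", "B. fagifolia", "B. ulmifolia"],
}

for (name_a, a), (name_b, b) in combinations(nc2.items(), 2):
    shared = [sorted(c) for c in ladder_clusters(a) & ladder_clusters(b)]
    print(f"{name_a} vs {name_b}: {cluster_distance(a, b)} differing clades; shared: {shared}")
```

Under these assumptions the mtDNA topology shares no clade with either cpDNA topology (4 differing clades in each comparison), whereas the two cpDNA topologies share the clade {B. fagifolia, B. hoehneana, B. ulmifolia} and differ by 2. The same helpers give a distance of 2 for the B. floccifera, B. albo-coccinea, and B. malabarica triplet, the maximum possible for three taxa.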

B. macduffieana falls in NC1 in the cpDNA phylogeny. While the other members of NC1 remain monophyletic within the single mtDNA clade, B. macduffieana moves to a highly nested position with B. radicans and B. herbacea, members of cpDNA NC2. Doorenbos et al. (1998) suggest that section Gaerdtia (Klotzsch) A. DC. is closely related to section Pritzelia (Klotzsch) A. DC., noting that crosses between the two sections often produce fertile F1 hybrids, which suggests a relatively recent divergence between them. In the cpDNA phylogeny B. macduffieana is an early branching sister to the remainder of the clade, which includes a few Pritzelia species, a single Scheidweileria (Klotzsch) A. DC., Wageneria (Klotzsch) A. DC. and Donaldia (Klotzsch) A. DC., as well as two species not placed to section. Doorenbos et al. (1998) also note that Pritzelia has been considered closely related to these sections. On this basis it is logical that the organelles of B. macduffieana would share an evolutionary history with NC1, making the mtDNA position suspect. B. macduffieana is known only from collections in Pará, Brazil, and B. radicans and B. herbacea are restricted to the south-eastern edge of Brazil. However, other species in section Gaerdtia have distributions that overlap with B. radicans, B. herbacea, and other closely related species, making capture of a mitochondrial genome via hybridisation a plausible hypothesis.

5.4 FURTHER WORK

Distinguishing between ILS and hybridisation will require more extensive taxon and genome sampling. In addition, although incongruent taxa were chosen for exclusion as objectively as possible, in some cases there may be alternative choices as to which taxa are causing the incongruence in a given clade. These choices may have introduced some tree building error into the combined analyses, as in the case of B. poculifera. A more objective analysis, in which each taxon is duplicated one at a time and its two copies tested for monophyly, would help clarify which taxa are truly incongruent. However, this was not feasible within the time constraints of this study.
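The one-taxon-at-a-time procedure proposed above could be scripted along the following lines. This is a minimal sketch under stated assumptions, not a tested pipeline: the tree-building step is a hypothetical callback (`build_duplicated_tree`) standing in for a full combined analysis, the two copies of a taxon are assumed to be labelled with `_mt` and `_cp` suffixes, and trees are treated as rooted. Only the monophyly test is implemented, by checking whether the two copies form one of the tree's clusters.

```python
import re


def parse_clusters(newick):
    """Leaf sets (clusters) of every internal node of a rooted Newick tree.

    Branch lengths and internal node labels (e.g. support values) are ignored.
    """
    # Tokens: brackets, commas, or a run of label / branch-length text.
    tokens = re.findall(r"[(),]|[^(),;\s][^(),;]*", newick.strip().rstrip(";"))
    clusters = set()
    stack = [[]]  # leaves gathered inside each currently open bracket
    prev = None
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            leaves = stack.pop()
            clusters.add(frozenset(leaves))
            stack[-1].extend(leaves)
        elif tok != "," and prev != ")":
            # A leaf label, possibly followed by ":branch_length".
            stack[-1].append(tok.split(":")[0].strip())
        # A token straight after ")" is a node label or length and is skipped.
        prev = tok
    return clusters


def copies_monophyletic(newick, taxon, suffixes=("_mt", "_cp")):
    """True if the mtDNA and cpDNA copies of `taxon` form a clade."""
    return frozenset(taxon + s for s in suffixes) in parse_clusters(newick)


def screen_taxa(taxa, build_duplicated_tree):
    """Duplicate one taxon at a time and test its two copies for monophyly.

    `build_duplicated_tree(taxon)` is a hypothetical callback that reruns the
    combined analysis with only `taxon` split into an mtDNA-only copy and a
    cpDNA-only copy, returning the resulting tree as a Newick string.
    """
    return {t: copies_monophyletic(build_duplicated_tree(t), t) for t in taxa}


# Toy trees for illustration only; they are not results from this study.
toy_trees = {
    "A": "((A_mt,A_cp),(B,(C,D)));",
    "C": "((A,C_cp),(B,(C_mt,D)));",
}
print(screen_taxa(["A", "C"], toy_trees.get))  # {'A': True, 'C': False}
```

A taxon returning False would be a candidate for exclusion from the combined analyses. Because each call to the callback stands for a complete phylogenetic run, the screen costs one full analysis per taxon.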

The timing and exact nature of the hybridisation event that created two cpDNA lineages and one mtDNA lineage in the Neotropics are likely to remain open to debate. A better approximation of when and where it happened would require extensive sampling of the Neotropical species and, at a minimum, the semi-annual African species, covering both mitochondrial and chloroplast lineages. Further sampling would also help clarify the relationships between section Gaerdtia, to which B. macduffieana belongs, and other Neotropical sections, as well as the placement of B. crispula outside the other section Pritzelia species.

Including nuclear regions would also be prudent. In work by Goodall-Copestake (2005), the nuclear data supported a single Neotropical clade; however, that work sampled only one nuclear region. Another option would be to use next generation sequencing to sample many independent loci and estimate the most plausible species tree with a coalescent approach. Even so, this would not clarify every case of incongruence. Sampling the same region from multiple individuals of each taxon would allow statistical tests that help distinguish ILS from hybridisation.

Finally, in the absence of further nuclear genomic data, additional morphological data could help clarify the relationships between species and sections, especially in cases of ILS. Once scored, these characters could be coded into a matrix for cladistic analysis and used to help resolve nodes in the phylogenetic trees. Mapping morphological character states onto a well-supported tree could also reveal evolutionary relationships that are otherwise hard to detect. However, given the available specimens and the time constraints, a thorough morphological study was beyond the scope of this project.

5.5 CONCLUSIONS

The sources of incongruence between phylogenies range from sampling error to systematic error to different evolutionary histories. Even when the first two have been minimised, the causes cannot always be identified with certainty. Eight of the incongruences noted above are hypothesised to be the result of genome capture through hybridisation. The two within-clade cases of incongruence could be the result of ILS, although the within-clade incongruence in NC2 could also be due to tree building error. Similarly, the varied placement of B. poculifera could represent a second case of tree building error. While most cases of incongruence presented here were attributed to hybridisation, further taxon and genome data would clarify the extent of this and how much of the nuclear genome has been exchanged. Each case of incongruence was considered separately and, where applicable, on a clade-by-clade basis. This approach appears most appropriate, as no single explanation fits all cases.

With the available data, it would appear as if both mitochondria and chloroplasts are readily exchanged between species. Previous research has found maternal inheritance of both chloroplasts and mitochondria in Begonia. However, in the closely related genus Cucumis L. there is evidence of paternal mitochondrial inheritance. The amount of incongruence attributed to genome capture in this small sample of such a large genus would require the strictly maternal inheritance of organelles to be uncoupled on a regular basis. Uncoupling allows organelles from different parents to be combined in the offspring, resulting in incongruent genome phylogenies.

The uncoupling of organelle inheritance, while not unheard of, is a rare event in angiosperms. In studies of the widespread genus Quercus L., which is known to have maternally transmitted chloroplasts and mitochondria, no significant cases of paternal leakage of either organelle were found (Dumolin-Lapègue et al., 1998). Belahbib et al. (2001) reported co-transmission of the mitochondrial and chloroplast genomes between two distantly related oak species but found no uncoupling of the genomes, further supporting reports of strictly maternal organelle inheritance in oaks. Congruent organelle genome histories have also been reported for Silene vulgaris and Beta vulgaris ssp. maritima L. (Desplanque et al., 2000; Olson & McCauley, 2000). In this respect, Begonia may differ from other angiosperms in its ability to uncouple organelle inheritance, although there are no other family-level angiosperm mitochondrial phylogenies for comparison. At higher taxonomic levels, two studies have found organelle incongruence. Among the Rosidae, the different phylogenetic placement of the COM clade was hypothesised to be the result of ancient hybridisation and genome capture (Sun et al., n.d.). In a plastid and mitochondrial study of Ericales Bercht. & J. Presl, the only incongruence found was the placement of Impatiens L.: a two-gene mitochondrial phylogeny placed Impatiens within the Marcgraviaceae Bercht. & J. Presl instead of the Balsaminaceae, where it currently sits taxonomically and where the plastid phylogeny placed it (Anderberg et al., 2002).

Begonia species are known to have weak reproductive barriers, resulting in regular hybridisation. Even though only a few examples of natural hybridisation are known, hybrid speciation may be common and one source of the extreme diversity in this genus. Begonia species also tend to be localised, often micro-endemic, and to occur in relatively small populations. Owing to this population structure and poor seed dispersal, gene flow between populations is limited. In such isolated populations, the time taken to reach species monophyly is increased because ancestral polymorphisms are retained in different populations.

A selective sweep on the mitochondria of hybridising species may account for the single Neotropical mtDNA clade. The sweep of an organelle across a large number of species may indicate the acceptance of an organelle better suited to the new habitat, thereby promoting adaptation and speciation. The Neotropical species may be a likely candidate for a selective sweep because of limited genetic variability after the initial founder event. If a fitter genome becomes available and is captured, it is likely to spread throughout the group (Muir & Filatov, 2007; Palmé et al., 2003; Percy et al., 2014).

The geography, morphology, and molecular data support the ideas presented here, but those ideas are based on the information available at this time. Further phylogenomic analysis would be welcome to shed more light on the importance of hybridisation and organelle exchange in angiosperm evolution.


REFERENCES

Agren, J. & Schemske, D. W. (1993). Outcrossing rate and inbreeding depression in two annual monoecious herbs, Begonia hirsuta and B. semiovata. Evolution 47, 125–135.

Anderberg, A. A., Rydin, C. & Källersjö, M. (2002). Phylogenetic relationships in the order Ericales s.l.: analyses of molecular data from five genes from the plastid and mitochondrial genomes. Am J Bot 89, 677–87.

Archibald, J. M. & Richards, T. A. (2010). Gene transfer: anything goes in plant mitochondria. BMC Biol 8, doi:10.1186/1741-7007-8-147.

Barker, F. K. & Lutzoni, F. M. (2002). The utility of the incongruence length difference test. Syst Biol 51, 625–37.

Belahbib, N., Pemonge, M. H., Ouassou, A., Sbay, H., Kremer, A. & Petit, R. J. (2001). Frequent cytoplasmic exchanges between oak species that are not closely related: Quercus suber and Q. ilex in Morocco. Mol Ecol 10, 2003–12.

Bergthorsson, U., Adams, K. L., Thomason, B. & Palmer, J. D. (2003). Widespread horizontal transfer of mitochondrial genes in flowering plants. Nature 424, 197–201.

Bergthorsson, U., Richardson, A. O., Young, G. J., Goertzen, L. R. & Palmer, J. D. (2004). Massive horizontal transfer of mitochondrial genes from diverse land plant donors to the basal angiosperm Amborella. Proc Natl Acad Sci 101, 17747–52.

Bock, R. (2010). The give-and-take of DNA: horizontal gene transfer in plants. Trends Plant Sci 15, 11–22.

Borchsenius, F. (2009). FastGap 1.2. Department of Biosciences, Aarhus University, Denmark.

Brennan, A. C., Bridgett, S., Shaukat Ali, M., Harrison, N., Matthews, A., Pellicer, J., Twyford, A. D. & Kidner, C. A. (2012). Genomic resources for evolutionary studies in the large, diverse, tropical genus Begonia. Trop Plant Biol 5, 261–276.

Brower, A. V. Z., DeSalle, R. & Vogler, A. (1996). Gene trees, species trees, and systematics: a cladistic perspective. Annu Rev Ecol Syst 27, 423–450.

Calderon, C. I., Yandell, B. S. & Havey, M. J. (2012). Genetic mapping of paternal sorting of mitochondria in cucumber. Theor Appl Genet 125, 11–18.

Chung, K.-F., Leong, W.-C., Rubite, R., Repin, R., Kiew, R., Liu, Y. & Peng, C.-I. (2014). Phylogenetic analyses of Begonia sect. Coelocentrum and allied limestone species of China shed light on the evolution of Sino-Vietnamese karst flora. Bot Stud 55, doi:10.1186/1999-3110-55-1.

Clement, W. L., Tebbitt, M. C., Forrest, L. L., Blair, J. E., Brouillet, L., Eriksson, T. & Swensen, S. M. (2004). Phylogenetic position and biogeography of Hillebrandia sandwicensis (Begoniaceae): a rare Hawaiian relict. Am J Bot 91, 905–917.

Colless, D. H. (1980). Congruence between morphometric and allozyme data for Menidia species: a reappraisal. Syst Zool 29, 288–299.

Cunningham, C. W. (1997a). Is congruence between data partitions a reliable predictor of phylogenetic accuracy? Empirically testing an iterative procedure for choosing among phylogenetic methods. Syst Biol 46, 464–478.

Cunningham, C. W. (1997b). Can three incongruence tests predict when data should be combined? Mol Biol Evol 14, 733–740.

Darlu, P. & Lecointre, G. (2002). When does the incongruence length difference test fail? Mol Biol Evol 19, 432–7.

Darriba, D., Taboada, G. L., Doallo, R. & Posada, D. (2012). jModelTest 2: more models, new heuristics and parallel computing. Nat Methods 9, 772.

Degnan, J. H. & Rosenberg, N. A. (2006). Discordance of species trees with their most likely gene trees. PLoS Genet 2, e68, doi:10.1371/journal.pgen.0020068.

Degnan, J. H. & Rosenberg, N. A. (2009). Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol Evol 24, 332–40.

Demesure, B., Sodzi, N. & Petit, R. J. (1995). A set of universal primers for amplification of polymorphic non-coding regions of mitochondrial and chloroplast DNA in plants. Mol Ecol 4, 129–31.

Desplanque, B., Viard, F., Bernard, J., Forcioli, D., Saumitou-Laprade, P., Cuguen, J. & Van Dijk, H. (2000). The linkage disequilibrium between chloroplast DNA and mitochondrial DNA haplotypes in Beta vulgaris ssp. maritima (L.): the usefulness of both genomes for population genetic studies. Mol Ecol 9, 141–54.

Dewitte, A., Twyford, A. D., Thomas, D. C., Kidner, C. A. & Van Huylenbroeck, J. (2011). The origin of diversity in Begonia: genome dynamism, population processes and phylogenetic patterns. In The Dynamical Processes of Biodiversity – Case Studies of Evolution and Spatial Distribution, pp. 27–52. Edited by O. Grillo. InTech.

Dewitte, A., Leus, L., Eeckhaut, T., Vanstechelman, I., Van Huylenbroeck, J. & Van Bockstaele, E. (2009). Genome size variation in Begonia. Genome 52, 829–838.

Doorenbos, J., Sosef, M. S. M. & de Wilde, J. J. F. E. (1998). The sections of Begonia. Leiden: Wageningen Agricultural University.

Doyle, J. J. & Gaut, B. S. (2000). Evolution of genes and taxa: a primer. Plant Mol Biol 42, 1–23.

Doyle, J. J. (1992). Gene trees and species trees: molecular systematics as one-character taxonomy. Syst Bot 17, 144–163.

Drinkwater, E. (2014). Is mitochondrial and chloroplast inheritance in Begonia maternal, paternal or bi-parental? BSc Thesis, University of Edinburgh.

Dumolin-Lapègue, S., Pemonge, M. H. & Petit, R. J. (1997). An enlarged set of consensus primers for the study of organelle DNA in plants. Mol Ecol 6, 393–7.

Dumolin-Lapègue, S., Pemonge, M. H. & Petit, R. J. (1998). Association between chloroplast and mitochondrial lineages in oaks. Mol Biol Evol 15, 1321–31.

Estabrook, G. F., McMorris, F. R. & Meacham, C. A. (1985). Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units. Syst Zool 34, 193–200.

Faith, D. P. (1991). Cladistic permutation tests for monophyly and nonmonophyly. Syst Zool 40, 366–375.

Farris, J. S., Källersjö, M., Kluge, A. G. & Bult, C. (1995). Testing significance of incongruence. Cladistics 10, 315–319.

Forrest, L. L. & Hollingsworth, P. M. (2003). A recircumscription of Begonia based on nuclear ribosomal sequences. Plant Syst Evol 241, 193–211.

Forrest, L. L., Hughes, M. & Hollingsworth, P. M. (2005). A phylogeny of Begonia using nuclear ribosomal sequence data and morphological characters. Syst Bot 30, 671–682.

Gadagkar, S. R., Rosenberg, M. S. & Kumar, S. (2005). Inferring species phylogenies from multiple genes: concatenated sequence tree versus consensus gene tree. J Exp Zool 304, 64–74.

Galtier, N. & Daubin, V. (2008). Dealing with incongruence in phylogenomic analyses. Philos Trans R Soc Lond B Biol Sci 363, 4023–9.

Gao, C., Ren, X., Mason, A. S., Liu, H., Xiao, M., Li, J. & Fu, D. (2014). Horizontal gene transfer in plants. Funct Integr Genomics 14, 23–9.

Gascuel, O. (1997). BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol Biol Evol 14, 685–95.

GBIF. (2013). The Global Biodiversity Information Facility: GBIF Backbone Taxonomy.

Goldman, N., Anderson, J. P. & Rodrigo, A. G. (2000). Likelihood-based tests of topologies in phylogenetics. Syst Biol 49, 652–70.

Gontcharov, A. A., Marin, B. & Melkonian, M. (2004). Are combined analyses better than single gene phylogenies? A case study using SSU rDNA and rbcL sequence comparisons in the Zygnematophyceae (Streptophyta). Mol Biol Evol 21, 612–24.

Goodall-Copestake, W. P. (2005). Framework Phylogenies of the Begoniaceae. PhD Thesis, University of Glasgow.

Goodall-Copestake, W. P., Harris, D. J. & Hollingsworth, P. M. (2009). The origin of a mega-diverse genus: dating Begonia (Begoniaceae) using alternative datasets, calibrations and relaxed clock methods. Bot J Linn Soc 159, 363–380.

Goodall-Copestake, W. P., Pérez-Espona, S., Harris, D. J. & Hollingsworth, P. M. (2010). The early evolution of the mega-diverse genus Begonia (Begoniaceae) inferred from organelle DNA phylogenies. Biol J Linn Soc 101, 243–250.

Goremykin, V. V., Salamini, F., Velasco, R. & Viola, R. (2009). Mitochondrial DNA of Vitis vinifera and the issue of rampant horizontal gene transfer. Mol Biol Evol 26, 99–110.

Guindon, S. & Gascuel, O. (2003). A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52, 696–704.

Hall, T. A. (1999). BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41, 95–98.

Hasegawa, M. & Kishino, H. (1989). Heterogeneity of tempo and mode of mitochondrial DNA evolution among mammalian orders. Japanese J Genet 64, 243–258.

Havey, M. J., McCreight, J. D., Rhodes, B. & Taurick, G. (1998). Differential transmission of the Cucumis organellar genomes. Theor Appl Genet 97, 122–128.

Hedtke, S. M., Townsend, T. M. & Hillis, D. M. (2006). Resolution of phylogenetic conflict in large data sets by increased taxon sampling. Syst Biol 55, 522–9.

Heled, J. & Drummond, A. J. (2010). Bayesian inference of species trees from multilocus data. Mol Biol Evol 27, 570–80.

Hipp, A., Hall, J. & Sytsma, K. (2004). Congruence versus phylogenetic accuracy: revisiting the incongruence length difference test. Syst Biol 53, 81–89.

Huelsenbeck, J. P. & Ronquist, F. (2001). MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–5.

Huelsenbeck, J. P. & Bull, J. J. (1996). A likelihood ratio test to detect conflicting phylogenetic signal. Syst Biol 45, 92–98.

Hughes, M. & Hollingsworth, P. M. (2008). Population genetic divergence corresponds with species-level biodiversity patterns in the large genus Begonia. Mol Ecol 17, 2643–51.

Hughes, M., Hollingsworth, P. M. & Miller, A. G. (2003). Population genetic structure in the endemic Begonia of the Socotra archipelago. Biol Conserv 113, 277–284.

Hughes, M. & Pullan, M. (2007). Southeast Asian Begonia Database.

Jeffroy, O., Brinkmann, H., Delsuc, F. & Philippe, H. (2006). Phylogenomics: the beginning of incongruence? Trends Genet 22, 225–31.

Kishino, H. & Hasegawa, M. (1989). Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol 29, 170–9.

Knowles, L. L. (2009). Estimating species trees: methods of phylogenetic analysis when there is incongruence across genes. Syst Biol 58, 463–7.

Kubatko, L. S. & Degnan, J. H. (2007). Inconsistency of phylogenetic estimates from concatenated data under coalescence. Syst Biol 56, 17–24.

Kubitzki, K. (2011). Introduction to Cucurbitales. In Fam Genera Vasc Plants, pp. 4–6. Edited by K. Kubitzki. Springer Berlin Heidelberg.

Lecointre, G. & Deleporte, P. (2005). Total evidence requires exclusion of phylogenetically misleading data. Zool Scr 34, 101–117.

Leigh, J. W., Susko, E., Baumgartner, M. & Roger, A. J. (2008). Testing congruence in phylogenomic analysis. Syst Biol 57, 104–15.

Leigh, J. W., Schliep, K., Lopez, P. & Bapteste, E. (2011). Let them fall where they may: congruence analysis in massive phylogenetically messy data sets. Mol Biol Evol 28, 2773–85.

Liu, L. (2008). BEST: Bayesian estimation of species trees under the coalescent model. Bioinformatics 24, 2542–3.

Maddison, W. P. (1997). Gene trees in species trees. Syst Biol 46, 523–536.

Martinsen, G. D., Whitham, T. G., Turek, R. J. & Keim, P. (2001). Hybrid populations selectively filter gene introgression between species. Evolution 55, 1325–35.

Matolweni, L. O., Balkwill, K. & McLellan, T. (2000). Genetic diversity and gene flow in the morphologically variable, rare endemics Begonia dregei and Begonia homonyma (Begoniaceae). Am J Bot 87, 431–439.

McLellan, T. (2000). Geographic variation and plasticity of leaf shape and size in Begonia dregei and B. homonyma (Begoniaceae). Bot J Linn Soc 132, 79–95.

Miller, M. A., Pfeiffer, W. & Schwartz, T. (2010). Creating the CIPRES science gateway for inference of large phylogenetic trees. Proc Gatew Comput Environ Work 1–8. IEEE.

Moonlight, P. (2013). The biogeography of neotropical Begonia L.: correlation between mountain evolution and range evolution in an Andean-centered group. MSc Thesis, University of Edinburgh.

Muir, G. & Filatov, D. (2007). A selective sweep in the chloroplast DNA of dioecious Silene (section Elisanthe). Genetics 177, 1239–47.

Nichols, R. (2001). Gene trees and species trees are not the same. Trends Ecol Evol 16, 358–364.

Olson, M. S. & McCauley, D. E. (2000). Linkage disequilibrium and phylogenetic congruence between chloroplast and mitochondrial haplotypes in Silene vulgaris. Proc Biol Sci 267, 1801–8.

Page, R. D. & Charleston, M. A. (1997). From gene to organismal phylogeny: reconciled trees and the gene tree/species tree problem. Mol Phylogenet Evol 7, 231–40.

Page, R. D. & Charleston, M. A. (1998). Trees within trees: phylogeny and historical associations. Trends Ecol Evol 13, 356–9.

Pagel, M. (1999). Inferring the historical patterns of biological evolution. Nature 401, 877–84.

Palmé, A. E., Semerikov, V. & Lascoux, M. (2003). Absence of geographical structure of chloroplast DNA variation in sallow, Salix caprea L. Heredity (Edinb) 91, 465–74.

Peng, C.-I. & Chiang, T.-Y. (2000). Molecular confirmation of unidirectional hybridization in Begonia × taipeiensis Peng (Begoniaceae) from Taiwan. Ann Missouri Bot Gard 87, 273–285.

Peng, C. & Ku, S. (2009). Begonia × chungii (Begoniaceae), a new natural hybrid in Taiwan. Bot Stud 50, 241–250.

Penny, D., Watson, E. E. & Steel, M. A. (1993). Trees from languages and genes are very similar. Syst Biol 42, 382.

Percy, D. M., Argus, G. W., Cronk, Q. C., Fazekas, A. J., Kesanakurti, P. R., Burgess, K. S., Husband, B. C., Newmaster, S. G., Barrett, S. C. H. & Graham, S. W. (2014). Understanding the spectacular failure of DNA barcoding in willows (Salix): Does this result from a trans-specific selective sweep? Mol Ecol doi:10.1111/mec.12837.

Philippe, H., Brinkmann, H., Lavrov, D. V., Littlewood, D. T. J., Manuel, M., Wörheide, G. & Baurain, D. (2011). Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biol 9, e1000602 doi:10.1371/journal.pbio.1000602.

Pirie, M. D., Humphreys, A. M., Galley, C., Barker, N. P., Verboom, G. A., Orlovich, D., Draffin, S. J., Lloyd, K., Baeza, C. M. & other authors. (2008). A novel supermatrix approach improves resolution of phylogenetic relationships in a comprehensive sample of danthonioid grasses. Mol Phylogenet Evol 48, 1106–19.

Pirie, M. D., Humphreys, A. M., Barker, N. P. & Linder, H. P. (2009). Reticulation, data combination, and inferring evolutionary history: an example from Danthonioideae (Poaceae). Syst Biol 58, 612–28.

Plana, V. (2003). Phylogenetic relationships of the Afro-Malagasy members of the large genus Begonia inferred from trnL intron sequences. Syst Bot 28, 693–704.

Plana, V., Gascoigne, A., Forrest, L. L., Harris, D. & Pennington, R. T. (2004). Pleistocene and pre-Pleistocene Begonia speciation in Africa. Mol Phylogenet Evol 31, 449–61.

Planet, P. J. (2006). Tree disagreement: measuring and testing incongruence in phylogenies. J Biomed Inform 39, 86–102.

Planet, P. J. & Sarkar, I. N. (2005). mILD: a tool for constructing and analyzing matrices of pairwise phylogenetic character incongruence tests. Bioinformatics 21, 4423–4.

Prager, E. M. & Wilson, A. C. (1988). Ancient origin of lactalbumin from lysozyme: analysis of DNA and amino acid sequences. J Mol Evol 27, 326–35.

Rajbhandary, S., Hughes, M., Phutthai, T., Thomas, D. C. & Shrestha, K. K. (2011). Asian Begonia: out of Africa via the Himalayas? Gard Bull Singapore 63, 277–286.

Rambaut, A. & Drummond, A. J. (2013). Tracer v1.6.

Rambaut, A. (2014). FigTree.

Rodrigo, A. G., Kelly-Borges, M., Bergquist, P. R. & Bergquist, P. L. (1993). A randomisation test of the null hypothesis that two cladograms are sample estimates of a parametric phylogenetic tree. New Zeal J Bot 31, 257–268.

Rohlf, F. J. (1982). Consensus indices for comparing classifications. Math Biosci 59, 131–144.

Rokas, A., Williams, B. L., King, N. & Carroll, S. B. (2003). Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425, 798–804.

Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D. L., Darling, A., Höhna, S., Larget, B., Liu, L., Suchard, M. A. & Huelsenbeck, J. P. (2012). MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol 61, 539–42.

Rubite, R. R., Hughes, M., Alejandro, G. J. & Peng, C.-I. (2013). Recircumscription of Begonia sect. Baryandra (Begoniaceae): evidence from molecular data. Bot Stud 54, 38.

Samarakoon, T., Wang, S. Y. & Alford, M. H. (2013). Enhancing PCR amplification of DNA from recalcitrant plant specimens using a Trehalose-based additive. Appl Plant Sci 1, doi:10.3732/apps.1200236.

Savolainen, V. & Chase, M. W. (2003). A decade of progress in plant molecular phylogenetics. Trends Genet 19, 717–24.

Schaefer, H. & Renner, S. S. (2011). Phylogenetic relationships in the order Cucurbitales and a new classification of the gourd family (Cucurbitaceae). Taxon 60, 122–138.

Shaw, J., Lickey, E. B., Schilling, E. E. & Small, R. L. (2007). Comparison of whole chloroplast genome sequences to choose noncoding regions for phylogenetic studies in angiosperms: the tortoise and the hare III. Am J Bot 94, 275–288.

Shimodaira, H. (1998). An application of multiple comparison techniques to model selection. Ann Inst Stat Math 50, 1–13.

Shimodaira, H. (2002). An approximately unbiased test of phylogenetic tree selection. Syst Biol 51, 492–508.

Shimodaira, H. & Hasegawa, M. (1989). Comparisons of log-likelihoods with applications to phylogenetic inference. Mol Biol Evol 16, 1114–1116.

Sievers, F., Wilm, A., Dineen, D., Gibson, T. J., Karplus, K., Li, W., Lopez, R., McWilliam, H., Remmert, M. & other authors. (2011). Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7, 539.

Som, A. (2014). Causes, consequences and solutions of phylogenetic incongruence. Brief Bioinform doi:10.1093/bib/bbu015.

Stevens, P. F. (2001). Angiosperm Phylogeny Website. Version 12.

Sun, M., Soltis, D. E., Soltis, P. S., Zhu, X., Burleigh, J. G. & Chen, Z. (n.d.). Deep phylogenetic incongruence in the angiosperm Rosidae clade. http://mobot-biodiversity-jc.weebly.com/spring-201

Swofford, D. L. (2003). PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods). Sunderland, Massachusetts: Sinauer Associates.

Talianova, M. & Janousek, B. (2011). What can we learn from tobacco and other Solanaceae about horizontal DNA transfer? Am J Bot 98, 1231–42.

Tebbitt, M. C., Forrest, L. L., Santoriello, A., Clement, W. L. & Swensen, S. M. (2006). Phylogenetic relationships of Asian Begonia, with an emphasis on the evolution of rain-ballist and animal dispersal mechanisms in sections Platycentrum, Sphenanthera and Leprosae. Syst Bot 31, 327–336.

Templeton, A. R. (1983). Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37, 221–244.

Than, C. & Nakhleh, L. (2009). Species tree inference by minimizing deep coalescences. PLoS Comput Biol 5, e1000501 doi:10.1371/journal.pcbi.1000501.

Thomas, D. C., Hughes, M., Phutthai, T., Rajbhandary, S., Rubite, R., Ardi, W. H. & Richardson, J. E. (2011). A non-coding plastid DNA phylogeny of Asian Begonia (Begoniaceae): evidence for morphological homoplasy and sectional polyphyly. Mol Phylogenet Evol 60, 428–44.

Thomas, D. C., Hughes, M., Phutthai, T., Ardi, W. H., Rajbhandary, S., Rubite, R., Twyford, A. D. & Richardson, J. E. (2012). West to east dispersal and subsequent rapid diversification of the mega-diverse genus Begonia (Begoniaceae) in the Malesian archipelago. J Biogeogr 39, 98–113.

Thomas, D. C. (2010). Phylogenetics and historical biogeography of Southeast Asian Begonia L. (Begoniaceae). PhD Thesis, University of Glasgow.

Thornton, J. W. & DeSalle, R. (2000). A new method to localize and test the significance of incongruence: detecting domain shuffling in the nuclear receptor superfamily. Syst Biol 49, 183–201.

Tsutsui, K., Suwa, A., Sawada, K., Kato, T., Ohsawa, T. A. & Watano, Y. (2009). Incongruence among mitochondrial, chloroplast and nuclear gene trees in Pinus subgenus Strobus (Pinaceae). J Plant Res 122, 509–21.

Twyford, A., Kidner, C. & Ennos, R. (2014). Genetic differentiation and species cohesion in two widespread Central American Begonia species. Heredity (Edinb) 112, 382–90.

Twyford, A. D., Kidner, C. A., Harrison, N. & Ennos, R. A. (2013). Population history and seed dispersal in widespread Central American Begonia species (Begoniaceae) inferred from plastome-derived microsatellite markers. Bot J Linn Soc 171, 260–276.

De Vienne, D. M., Giraud, T. & Martin, O. C. (2007). A congruence index for testing topological similarity between trees. Bioinformatics 23, 3119–24.

De Villiers, M. J., Pirie, M. D., Hughes, M., Möller, M., Edwards, T. J. & Bellstedt, D. U. (2013). An approach to identify putative hybrids in the “coalescent stochasticity zone”, as exemplified in the African plant genus Streptocarpus (Gesneriaceae). New Phytol 198, 284–300.

Waterman, M. S. (1978). On the similarity of dendrograms. J Theor Biol 73, 789–800.

Wheeler, W. (1999). Measuring topological congruence by extending character techniques. Cladistics 15, 131–135.

De Wilde, J. J. F. E. (2002). Begonia section Tetraphilia A. DC.: a taxonomic revision. In Stud Begoniaceae VII, pp. 5–258. Edited by J. J. F. E. de Wilde. Leiden: Backhuys Publishers.

De Wilde, J. J. F. E. (2011). Begoniaceae. In Fam Genera Vasc Plants, pp. 56–71. Edited by K. Kubitzki. Springer Berlin Heidelberg.

Wyatt, G. & Sazima, M. (2011). Pollination and reproductive biology of thirteen species of Begonia in the Serra do Mar State Park, São Paulo, Brazil. J Pollinat Ecol 6, 95–107.

Yoder, A. D., Irwin, J. A. & Payseur, B. A. (2001). Failure of the ILD to determine data combinability for slow loris phylogeny. Syst Biol 50, 408–24.

Zelwer, M. & Daubin, V. (2004). Detecting phylogenetic incongruence using BIONJ: an improvement of the ILD test. Mol Phylogenet Evol 33, 687–693.

Zhang, L.-B., Simmons, M. P., Kocyan, A. & Renner, S. S. (2006). Phylogeny of the Cucurbitales based on DNA sequences of nine loci from three genomes: implications for morphological and sexual system evolution. Mol Phylogenet Evol 39, 305–22.

Zwickl, D. J. (2006). Genetic Algorithm Approaches for the Phylogenetic Analysis of Large Biological Sequence Datasets Under the Maximum Likelihood Criterion. PhD Thesis, The University of Texas at Austin.

APPENDIX 1

Appendix 1: All taxa included in this study. Abbreviations: DT = Daniel Thomas (Thomas, 2010); MH = Mark Hughes; LK = Lakmini Kumarage; PM = Peter Moonlight (Moonlight, 2013); WGC = William Goodall-Copestake (Goodall-Copestake, 2005).

| Taxon | Section | Distribution | cpDNA sequence source | mtDNA sequence source |
|---|---|---|---|---|
| B. aff. heydei C.DC. ex Donn.Sm. | Casparya | Neotropics | DT | This Study |
| B. albo-coccinea Hook. | Reichenheimia I | Asia | MH | This Study |
| B. amphioxus M.J.S.Sands | Petermannia | Asia | DT | WGC |
| B. annobonensis A. DC. | Sexalaria | Africa | This Study | This Study |
| B. antongilensis Humbert ex Bosser & Keraudren-Aymonin | Erminea | Africa | This Study | This Study |
| B. aspleniifolia Hook. f. ex A. DC. | Filicibegonia | Africa | - | WGC |
| B. baccata Hook. f. | Baccabegonia | Africa | This Study | This Study |
| B. bissei J. Sierra Calzado | Begonia | Neotropics | This Study | WGC |
| B. bogneri Ziesenh. | Erminea | Africa | DT | WGC |
| B. boliviensis A. DC. | Barya | Neotropics | DT | WGC |
| B. bracteata Jack | Bracteibegonia | Asia | DT | This Study |
| B. carolineifolia Regel | Gireoudia | Neotropics | PM | This Study |
| B. chloroneura P. Wilkie & Sands | Diploclinium I | Asia | DT | WGC |
| B. conchifolia A. Dietr. | Gireoudia | Neotropics | PM | This Study |
| B. cordifolia (Wight) Thwaites | Diploclinium I | Asia | LK | This Study |
| B. coursii Humbert ex Keraudren | Nerviplacentaria | Africa | This Study | This Study |
| B. crispula Brade | Pritzelia | Neotropics | PM | This Study |
| B. dietrichiana Irmscher | Pritzelia | Neotropics | PM | This Study |
| B. dipetala R. Grah. | Haagea | Asia | DT | WGC |
| B. dregei Otto & Dietr. | Augustia | Africa | DT | WGC |
| B. fagifolia Fisch. ex Otto & Dietr. | Wageneria | Neotropics | PM | This Study |
| B. floccifera Bedd. | Reichenheimia I | Asia | DT | WGC |
| B. foliosa Humb., Bonpl., & Kunth | Lepsia | Neotropics | DT | WGC |
| B. fuscisetosa Sands | Petermannia | Asia | This Study | This Study |
| B. goudotii A. DC. | Quadrilobaria | Africa | DT | WGC |
| B. grandis Dryander | Diploclinium II | Asia | DT | This Study |
| B. heracleifolia Cham. & Schlecht. | Gireoudia | Neotropics | This Study | This Study |
| B. herbacea Vell. | Trachelocarpus | Neotropics | DT | WGC |
| B. hoehneana Irmscher | Unassigned | Neotropics | This Study | This Study |
| B. johnstonii Oliv. ex Hook. f. | Rostrobegonia | Africa | - | This Study |
| B. johnstonii Oliv. ex Hook. f. | Rostrobegonia | Africa | PM | WGC |
| B. kingiana Irmscher | Ridleyella | Asia | DT | WGC |
| B. laruei M. Hughes | Petermannia | Asia | DT | This Study |
| B. loranthoides Hook. f. | Tetraphilia | Africa | PM | This Study |
| B. luxurians Scheidw. | Scheidweileria | Neotropics | PM | This Study |
| B. lyallii A. DC. | Nerviplacentaria | Africa | This Study | This Study |
| B. macduffieana L.B. Smith & Schubert | Gaerdtia | Neotropics | - | This Study |
| B. macduffieana L.B. Smith & Schubert | Gaerdtia | Neotropics | PM | WGC |
| B. malabarica Lamk non A. DC. | Unassigned | Asia | DT | WGC |
| B. meyeri-johannis Engl. | Mezierea | Africa | This Study | This Study |
| B. microsperma Warb. | Loasibegonia | Africa | PM | This Study |
| B. morsei Irmscher | Coelocentrum | Asia | DT | WGC |
| B. nelumbiifolia Cham. & Schlecht. | Gireoudia | Neotropics | DT | WGC |
| B. odorata Willd. | Begonia | Neotropics | PM | This Study |
| B. oxyloba Welw. ex Hook. f. | Mezierea | Africa | DT | WGC |
| B. palawanensis Merr. | Petermannia | Asia | MH | This Study |
| B. palmata D. Don | Platycentrum | Asia | DT | WGC |
| B. petasitifolia Brade | Pritzelia | Neotropics | PM | WGC |
| B. piurensis L.B. Smith & Schubert | Knesebeckia | Neotropics | - | WGC |
| B. poculifera Hook. f. | Squamibegonia | Africa | DT | WGC |
| B. polygonoides Hook. f. | Tetraphilia | Africa | DT | WGC |
| B. pringlei S. Wats. | Gireoudia | Neotropics | This Study/DT | This Study |
| B. prismatocarpa Hook. | Loasibegonia | Africa | - | This Study |
| B. pseudodryadis C.Y. Wu | Platycentrum | Asia | This Study | This Study |
| B. radicans Vell. | Solananthera | Neotropics | DT | WGC |
| B. robusta Blume | Sphenanthera | Asia | DT | WGC |
| B. schmidtiana Regel | Begonia | Neotropics | PM | This Study |
| B. socotrana Hook. f. | Peltaugustia | Africa | DT | WGC |
| B. sp. | - | Africa | This Study | This Study |
| B. stictopoda (Miq.) A. DC. | Reichenheimia I | Asia | This Study | This Study |
| B. sutherlandii Hook. f. | Augustia | Africa | DT | This Study |
| B. symsanguinea L. L. Forrest & Hollingsw. | Symbegonia | Asia | DT | WGC |
| B. thelmae L.B. Smith & Wasshausen | Unassigned | Neotropics | This Study | This Study |
| B. thomeana C. DC. | Cristasemen | Africa | - | WGC |
| B. thwaitesii Hook. | Reichenheimia III | Asia | LK | This Study |
| B. ulmifolia Willd. | Donaldia | Neotropics | DT | WGC |
| Coriaria sarmentosa G. Forst. | - | New Zealand | - | WGC |
| Datisca cannabina L. | - | Himalaya | - | WGC |
| Gynostemma pentaphyllum (Thunb.) Makino | - | Japan | - | WGC |
| Hillebrandia sandwicensis Oliv. | - | Hawaii | This Study/DT | WGC |
| Tetrameles nudiflora R. Br. | - | India | - | WGC |