Species Delimitation in Silene acaulis (L.) L. (Caryophyllaceae)
Species delimitation in Silene acaulis (L.) L. (Caryophyllaceae) based on multi-locus DNA sequence data

Patrik Cangren

Degree project for Master of Science (Two Years) in Biodiversity and systematics
Degree course in Biodiversity and systematics (BIO707), 60 hec
Autumn and Spring 2015-16
Department of Biological and Environmental Sciences
University of Gothenburg

Examiner: Bernard Pfeil
Department of Biological and Environmental Sciences
University of Gothenburg

Supervisor: Bengt Oxelman
Department of Biological and Environmental Sciences
University of Gothenburg

Cover photo by Jörg Hempel, published under Creative Commons License.

Abstract

Species delimitation has long been seen as an arbitrary endeavour and has historically been separated from phylogenetics, which aims to infer the evolutionary history of species. This separation is problematic, since neither species boundaries nor evolutionary histories can be inferred without knowledge of the other. Since species are the basis for many biological research problems, erroneous delimitations can have a great impact on scientific accuracy. In Silene acaulis, a widespread, perennial, alpine cushion plant with an almost circumpolar distribution across the northern hemisphere, a large number of subspecies have been described. There is little consensus or knowledge regarding the validity of these names, and their application also varies between continents. Using recently developed methods for automated species delimitation based on Bayesian inference and the multi-species coalescent, this study aims to infer the evolutionary history and genetic subdivision of Silene acaulis. The data comprise DNA sequences captured using 142 probes through hybrid capture and Illumina sequencing from 86 populations of Silene acaulis and two closely related taxa whose relationship to Silene acaulis is unclear.
Of the 142 probes, 90 were processed during the study, resulting in 57 informative alignments with complete sequences. Of these, a large proportion displayed signs of paralogy, and the final STACEY analysis included 8 genes. The results point towards a complicated genetic history with gene duplications or introgression. There was no support for any genetic differentiation between the previously described subspecies, but the results indicate the presence of several geographically restricted populations with high internal similarity and little external gene flow. I also present an estimate of the extent of paralogy within Silene acaulis and an alternative solution for phasing that circumvents a previously unknown and highly problematic error in the commonly used software package samtools.

Table of contents

Introduction
    General introduction to taxonomy and systematics
    Target capture and the multi species coalescent
    Gene duplications
    Silene acaulis: current knowledge, history and distribution.
    Aims of this thesis
Material and methods
    Materials used
        Sequence capture data set
        Transcriptome data set
    DNA preparation and next generation sequencing
        Sequence capture data set
    Data preparation
        Sequence capture dataset
        Transcriptome dataset
    Data exploration
        Sequence capture dataset
    Data analysis
        Sequence capture dataset
    Estimation of phylogeny and species delimitation
        Sequence capture dataset
        Transcriptome dataset
Results
    Data preparation:
        Sequence capture data set
    Data exploration
        Sequence capture dataset
    Analyses
        Sequence capture dataset
        Transcriptome dataset
Discussion
    Target capture sequencing results and possible missing genes
    Low read depth and catch-n-de novo approach
    Problem in the allele phasing software BCFtools
    Summarizing and visualizing mapping parameters
    Alignment "finishing"
    Calculating and plotting pairwise distance against read depth
    Unmapped reads
    Paralogy issues
    SNAPP analysis
    Species delimitation and phylogeny
Acknowledgements
References
Supplemental material

Introduction

General introduction to taxonomy and systematics

Nearly 300 years ago, Linnaeus began his enormous project of classifying all living organisms into groups, and he is considered by many to be the father of the science of taxonomy. Some of the ideas he formalized still remain, such as the binomial nomenclature system and a hierarchical classification system with formal ranks, but in other respects much has changed. Linnaeus classified organisms into categories based on shared morphological features and initially saw them as independently created and unchangeable (Linnaeus, 1758). This concept was gradually abandoned by the scientific community after Darwin's publication of ´On the Origin