<<

Examensarbete vid Institutionen för geovetenskaper Degree Project at the Department of Earth Sciences ISSN 1650-6553 Nr 471

Estimating the Early of Using an Integrated Approach Combining Genomics and En uppskattning av armfotingarnas tidiga evolution – med hjälp av genomik och

Chloé Robert

INSTITUTIONEN FÖR

GEOVETENSKAPER

DEPARTMENT OF EARTH SCIENCES

Examensarbete vid Institutionen för geovetenskaper Degree Project at the Department of Earth Sciences ISSN 1650-6553 Nr 471

Estimating the Early Using an Integrated Approach Combining Genomics and Fossils En uppskattning av armfotingarnas tidiga evolution – med hjälp av genomik och fossil

Chloé Robert

ISSN 1650-6553

Copyright © Chloé Robert Published at Department of Earth Sciences, Uppsala University (www.geo.uu.se), Uppsala, 2019

Abstract

Estimating the Early Evolution of Brachiopods Using an Integrated Approach Combining Genomics and Fossils Chloé Robert

The Brachiopoda, a major group of the , experienced a rapid early evolutionary diversification during the well-known explosion and subsequently dominated the Palaeozoic benthos with its diversity and abundance. Even though the phylogeny of the Lophotrochozoa is still hotly debated, it is now known that the Brachiopoda are a monophyletic grouping. However, the early evolutionary rates for the Brachiopoda have never been studied in the framework of a study combining molecular data and fossil time calibration points. In order to investigate the expected higher evolutionary rates of the at its origin, we conducted phylogenetic studies combining different methodologies and datasets. This work has at its foundation Maximum Likelihood and Bayesian analyses of 18S and 28S rRNA datasets followed by analyses of phylogenomic sequences. All material was obtained from previously available sequences and from sequencing of genetic material from specimens from a concerted worldwide collection effort. While the analyses of the phylogenomic dataset produced a robust phylogeny of the Brachiopoda with good support, both the results of the novel rRNA and phylogenomic dating analyses provided limited insights into the early rates of evolution of the Brachiopoda from a newly assembled dataset, demonstrating some limitations in calibration dating using the software package BEAST2. Future studies implementing fossil calibration, possibly incorporating morphological data, should be attempted to elucidate the early rates of evolution of Brachiopoda and the effect of the in this .

Keywords: Brachiopoda, evolutionary history, phylogeny, Maximum likelihood, Bayesian inference, fossil calibrations

Degree Project E1 in Earth Science, 1GV025, 30 credits Supervisors: Aodhán Butler and Graham Budd Department of Earth Sciences, Uppsala University, Villavägen 16, SE-752 36 Uppsala (www.geo.uu.se)

ISSN 1650-6553, Examensarbete vid Institutionen för geovetenskaper, No. 471, 2019

The whole document will be available at www.diva-portal.org

Populärvetenskaplig sammanfattning

En uppskattning av armfotingarnas tidiga evolution – med hjälp av genomik och fossil Chloé Robert

Det är ofta antaget att evolution (förändringar i arvsmassan hos en grupp organismer) sker i en konstant hastighet men i slutändan ändå osäkert om så är fallet. Stora grupper av organismer har ofta associerats med en högre evolutionär hastighet, speciellt nära deras uppkomst, vilket ökar sannolikheten för överlevnad. Armfotingar (Brachiopoda) är marina ryggradslösa djur med skal som tidigare var allmänt spridd, idag är istället musslor () betydligt mer spridda. Armfotingar har funnits och utvecklats under flera miljoner år med ursprung under tidigt kambrium. Genom år av forskning och många fossil har vi fått mer information om utseendet hos utdöda organismer vilket har bidragit till att antalet fossila arter som vi känner till har ökat tusenfalt. Under den senaste tiden har det också skett innovationer inom molekylära tekniker som gjort det möjligt att applicera dessa kunskaper även på utdöda arter. Dessa molekylära tekniker har nyligen hjälpt till att bestämma några av släktskapsförhållandena inom armfotingar som tidigare ansetts vara väldigt svåra att lösa. Det finns fortfarande vissa släktskapsförhållanden inom armfotingar som inte är kända och man vet ännu inte hur fort de utvecklades. Genom att undersöka just evolutionens hastighet kan man börja förstå gruppens tidiga framgång under Kambrium och Ordovicium samt minskningen som följde. Syftet med den här studien var att beräkna evolutionshastigheten hos armfotingar med särskild fokus på den tidiga diversifieringen av gruppen. För att undersöka detta använde vi oss av molekylära data för att analysera släktskapsförhållandena inom armfotingar. Dessutom använde vi fossil för att datera stora händelser i armfotingarnas evolutionära historia. Med hjälp av statistiska analyser kunde vi beräkna evolutionshastighet och släktskapsförhållandena inom gruppen. Vi kom fram till att armfotingar härstammar från en gemensam förfader. Dateringen kring när detta skedde blev inte fastställd då det beräknades ske miljoner år före det äldsta djurfossilet. Det kommer behövas mer forskning för att ta reda på om armfotingar hade en högre evolutionär hastighet i tidigt skede.

Keywords Brachiopoda, evolutionhistoria, fylogeni, Maximum likelihood-metoden, Bayesiansk statistik, fossilkalibering

Examensarbete E1 i geovetenskaper, 1GV025, 30 hp Handledare: Aodhán Butler och Graham Budd Institutionen för geovetenskaper, Uppsala universitet, Villavägen 16, 752 36 Uppsala (www.geo.uu.se)

ISSN 1650-6553, Examensarbete vid Institutionen för geovetenskaper, Nr 471, 2019

Hela publikationen finns tillgänglig på www.diva-portal.org

Table of Contents 1 Introduction ...... 1 2 Aims ...... 2 3 Background ...... 3 3.1 Brachiopods ...... 3 3.1.1 Classification ...... 3 3.1.2. Description ...... 3 3.1.3 ...... 4 3.2 Phylogenetic analyses ...... 5 3.2.1 Maximum Likelihood ...... 6 3.2.2 Bayesian inference ...... 6 3.2.3 Model selection ...... 6 3.3 Molecular clock...... 7 3.4 Push of the past ...... 7 4 Material and methods ...... 8 4.1 rRNA pilot study ...... 9 4.1.1 Alignment and model choice ...... 9 4.1.2 Bayesian analyses ...... 13 4.1.3 RAxML analysis ...... 15 4.2 Phylogenomic analyses ...... 15 4.2.1 Dataset and model choice ...... 15 4.2.2 RAxML analysis ...... 17 4.2.3 Bayesian inference for phylogenetic reconstruction ...... 17 4.2.4 Bayesian inference for molecular dating ...... 17 5 Results ...... 18 5.1 rRNA datasets ...... 18 5.1.1 Topology ...... 18 5.1.2 rRNA molecular dating ...... 22 5.2 Phylogenomic datasets ...... 25 5.2.1 PhyloBayes topology ...... 25 5.2.2 RAxML LG and Dayhoff ...... 26 5.2.3 Molecular dating ...... 27 6 Discussion ...... 29 6.1 Limitations of rRNA datasets in ...... 29 6.2 BEAST2: a problematic dating methodology ...... 30 6.3 Future studies ...... 31 7 Conclusion ...... 32 8 Acknowledgments ...... 32 8 References ...... 33 APPENDIX ...... 39

1 Introduction

The first appearance of metazoans and the subsequent diversification of all phyla during the Cambrian is one of the key events in the history of on Earth. The multitude of processes and drivers for this diversification (e.g. Budd & Jensen, 2000) as well as other effects (e.g. Push of the Past (Nee et al., 1994; Budd & Mann, 2018) that may skew our view of these events are a central research topic in and Palaeontology. The was this unprecedented event that witnessed the appearance and diversification of the animal Phyla (Conway Morris, 1989). Outside the very specific conditions and found in Cambrian Lagerstätten (Chengjiang, Sirius Passet, Burgess Shale) a widespread of chiefly biomneralizing organisms with the capacity to produce protective structures such as shells, dominate the fossil record. These biomineralizers came to be overrepresented in the fossil diversity up to the present, something likely caused by the high fossilization potential of biomineralized structures (calcitic or apatitic). While biomineralizers remain dominant to this day, their clade abundances and diversification patterns varied greatly from the Cambrian to the Present evidenced by, for example, the relative marginalization of brachiopods today by bivalves (Gould and Calloway, 1980). Patterns of diversification through time have been of considerable interest to evolutionary theory. Changes in diversification patterns can be easily identified in major radiation events (Budd & Jensen 2000) or after mass (Erwin 1993, Hull et al., 2011). These events are responsible for faunal/floral turnovers, major drops in the diversity of several groups with a concurrent increase in others and a striking shift in fossil morphologies. Predominant among these events, the Cambrian Explosion and the Great Biodiversification Event have attracted researchers interested in the origin of life and its subsequent rapid diversification and occupation of most major environmental niches. Rates of evolution in ancestral lineages can be inferred using molecular and phenotypic data, for instance, these were analysed in previous analyses focusing on (Lee et al., 2013), shedding light on the surprisingly faster rates of evolution of this clade during the Cambrian. Although a great wealth of information has accumulated over the past decades for the Brachiopoda, similar analyses for this phylum are lacking. Dominant during the Palaeozoic, around 15000 extinct described, Brachiopoda has a diverse fossil record (Foote 1999), with very few extant representatives but with a broad geographic distribution. Metazoan groups with a wealth of fossil occurrences, diversity and stratigraphic congruence are prime candidates

1

for the study of the early diversification of animal groups as well as the effect of the Push of the Past in Evolutionary patterns (Budd & Mann, 2018). In this project, I investigated whether an early higher rate of diversification of the Brachiopoda close to the putative Cambrian emergence is responsible for the apparent success of this phylum in accordance to the hypothesis put forward by the Push of the Past, a concept which will be explained in a following section. Instead, exceptional features of the Brachiopod , such as valves enclosing their body, are singularly responsible for their rather successful course through time. This question was approached by using genetic data and fossil calibrations, an attempt to explore the early rates of evolution of the brachiopods.

2 Aims

This project was designed to study the rate of evolution for the Phylum Brachiopoda since its first appearance and through time. I aim to identify patterns in the rate of evolution and to clarify, if present, the effects of the Push of the Past in the observed diversification/evolutionary patterns. To ascertain the accuracy of the results, this work has as its foundation analyses of sequences of genetic material obtained through wide, global collection of specimens and consequent laboratory work to extract genetic material that is then sequenced and analysed. To supplement our analyses, readily available sequences published online at GenBank were also used (Benson et al., 2008). The most widely utilised statistical methodologies to approach such questions as mentioned above are, Maximum Likelihood and Bayesian analyses. These methodologies were used to produce a well-supported phylogeny of brachiopods and produce an accompanying, robust, . Molecular clocks being notoriously difficult to calibrate (e.g. Wilke et al., 2009) and to give accurate results about the timing of the divergence of pose a significant problem for this analysis. In order to fine-tune and calibrate our results, I made extensive use of the most up to date fossil occurrence data for the Brachiopoda to calibrate key events in their evolutionary history such as their first unequivocal appearance.

2

3 Background

3.1 Brachiopods

3.1.1 Classification

There are nearly 5,000 brachiopod genera described and classified morphologically into 26 orders and eight classes (Carlson 2016 citing (Kaesler & Selden 1997–2007)). This Phylum has a very rich fossil record: 400 extant species are described, contrasting with the 15000 extinct species known (Brusca et al., 2016). Brachiopods are divided into three distinct subphyla: Linguliformea, Craniformea and (Kaesler & Selden 1997–2007). These are comprised of superfamilies, of which some contain only extinct taxa. The 25 extant species belonging to Linguliformea are divided in two superfamilies: and . Almost as many extant species are classified within the subphylum Craniformea but all belong to the same superfamily: Cranioidea. Rhynchonelliformea, the last subphylum, contains the greatest diversity of all, with more than 350 species. The extant species of this superfamily are divided in three orders: , and Thecideida (Kaesler & Selden 1997–2007).

3.1.2. Description

The Brachiopoda constitute a Phylum of the Animal . The name is influenced by the morphological features of the group and originates from the Greek words brachium (meaning arm) and poda (feet). They are commonly known as lamp shells, referring to the shape similarities with oil lamps. A pair of dorsoventrally oriented valves protect the animal’s body and organs. These valves are typically unequal in size and are connected to each other, either through the function of muscles or through a hinge system with a tooth-in-socket morphology. Extant Brachiopods are marine and benthic Metazoa, mostly found on the continental shelf. They can be found attached to the benthos, at the lower levels of the water column with their specialised pedicle, an outgrowth of the body wall with the function of providing attachment. There are two types of pedicles, either developing from an outgrowth of the body wall or separating dorsal and ventral body walls (Ekman, 1896). Some brachiopods do not attach themselves to hard substrata, as is the case for , which are infaunal burrowers. Other brachiopod species can also be found cemented to a hard substratum. Brachiopods

3

possess a , a ciliated and tentacular structure surrounding the mouth (Carlson, 2016). This, combined with the epistome, a muscular structure bringing food to the mouth, enables efficient suspension feeding. The lophophore tentacles’ cilia draw water in the mantle cavity by generating a unidirectional current, making suspension feeding possible. Most brachiopods are gonochoristic , reproducing by external fertilization. They have a free-swimming juvenile stage, also called planktotrophic stage, but there are quite different ontogenetic stages for each subphyla (Brusca et al., 2016). It is thought that brachiopods originated at least 530Ma years ago (e.g. Aldanotreta sunnaginensis Pelman, 1977). Brachiopods were among the most abundant Metazoa and suspension feeding organisms during the Palaeozoic, their diversity increased in the Ordovician through the (Valentine & Jablonski, 1983). They experienced an event at - at the end of which they never recovered their previous diversity and steadily declined. This decline is probably exacerbated by competition with mussels (Thayer, 1985).

3.1.3 Lophophorata

Brachiopods are , more precisely belonging to the Lophophorata. For several centuries from their discovery in the 16th century, brachiopods were grouped with molluscs and were only in the late 19th century recognized as lophophorates and grouped with other Lophophorata, under the name Tentaculata (Hatschek, 1988). Traditionally, the lophophorates have been seen as a monophyletic grouping consisting of the (6000 extant species), Phoronida (11 species) and Brachiopoda (Brusca et al., 2016). These phyla are all sessile, organisms generally have a U-shaped gut and a relatively simple reproductive system (Brusca et al., 2016). But these organisms were first grouped together because of their unique feeding structure: the lophophore. Traditionally the Lophophorata were first considered as (Hennig, 1979 and Schram, 1991), because they share various ontogenetic similarity with this group, such as a secondary development of the mouth. Molecular data showed that they were actually genetically closer to Protostomia, and specifically belonging to the (Halanych et al., 1995). The internal phylogenetic relationships of the Spiralia remain a subject of heated debate and a great amount of uncertainty remains (Fig. 1) (e.g. Eernisse et al., 1992; Hausdorf et al.,

4

Figure 1 Cladogram depicting the phylogenetic relationships among Protostomia. Tip names written in purple are Spiralian phyla. Phyla belonging in Lophophorata are indicated by a crown. Nodes with several branches represent soft polytomies. From Brusca et al., 2016.

2007; Laumer et al., 2015;). The phylogeny of the Lophophorata unsurprisingly is also unresolved (e.g. Hejnol et al., 2009; Nesnidal et al., 2013; Dunn et al., 2014). Phoronida is often found as the sister group of Brachiopoda but is also found as genetically closer to Bryozoa.

3.2 Phylogenetic analyses

Evolutionary histories can be visualised with phylogenetic trees, providing information on the last shared common ancestry between two taxa. Recreating patterns of evolution has been the research focus of many works since the 19th century, starting with Larmarck’s Scala Naturae ordering organisms from simple to more complex (1809), followed by Lyell’s theory of common ancestry (1832) and Darwin’s evolutionary theory depicting evolutionary relationships in the form of a branching tree (1837). Since that first depiction of evolution as a tree of interconnected clades and organisms, there have been great developments in the visualisation, methodology and accuracy of the models used to describe the divergence of organismal groups. Today’s reconstruction methods of phylogenetic trees can achieve precise estimates of evolutionary history by using mathematical modelling of character evolution, such as Maximum Likelihood (ML) and Bayesian inference. These methods use a substitution model

5

that specifies which characters can evolve between states based on the data of interest, for example give information for each transversion and transition when DNA data is used and inform about the rates of these evolutionary changes.

3.2.1 Maximum Likelihood

Maximum Likelihood (ML) analyses aim at finding the tree that has the highest probability of giving rise to the observed data. The process of finding this tree starts with computing a single tree, calculating the likelihood for each site, for example calculation the likelihood for each nucleotide when DNA is the material of the study and then calculating the overall likelihood of the tree. Then, by modifying the parameters of this tree, its likelihood is maximized, and the same steps are repeated for each possible tree. The highest likelihood tree is found by comparing the likelihood of each tree (Baum & Smith, 2013).

3.2.2 Bayesian inference

Contrary to Maximum Likelihood analysis, Bayesian inference assesses the posterior probability of the tree, which is the probability that the tree is true given the data, our models of evolution and our prior beliefs:

With Pr(H|D), the probability of the hypothesis (H) given the data (D), also known as posterior probability, Pr(D|H) the probability of the data given the hypothesis (also known as the likelihood). Pr(H) and Pr(D) both refer to priors probability. With this mathematical model, the starting information (Prior) is determinant, hence the importance to choose these with caution (Baum & Smith, 2013).

3.2.3 Model selection

Before starting a phylogenetic analysis, the substitution model fitting the data best has to be identified. This will specify the way the characters can evolve between their different states (Baum & Smith, 2013). Several substitution models exist, firstly with a model with the least parameters, such as the Jukes-Cantor model (1969), which assumes equal frequencies, rates and substitution for all DNA bases, to more complex models such as the general time-reversible model (GTR) (Travaré, 1986). The GTR model assigns different substitution rates for each

6

transition and transversion as well as variable base frequencies to the characters. The best fitting model is chosen with the help of statistical tests, such as the corrected Akaike Information Criterion (AICc, Akaike 1974) or the Bayesian Information Criterion (BIC, Schwarz, 1978). The model obtaining the smallest value in these tests are set to fit the data better.

3.3 Molecular clock

Obtaining accurate evolutionary timescales and correctly dating past events is fundamental to the reconstruction of evolutionary histories. Pauling and Zuckerkandl (1963) observed from studies on mammals, constant rates of base changes of amino-acids through time, the sole principle of molecular clock. Based on the assumption that the number of changes in the genome of two organisms is proportional to the time elapsed since they last shared a common ancestor, the concept of molecular clock encompasses methods and models that assess variations of rates of genetic evolution across the tree of life (Lee and Ho, 2016). Thanks to different molecular clock models, one can accommodate for diversifications of rate of evolution among and lineages in order to replicate the real evolutionary rate, which will be useful to calculate evolutionary time. In Maximum Likelihood and Bayesian inference, a molecular clock can be used to compute phylograms in which branch lengths are proportional to past evolutionary changes, shedding light on information that might complement the information gathered from the fossil record. However, prior knowledge is fundamental to compute such trees, for example in the form of calibration points, that enable dating evolutionary events such as the emergence of a particular clade. Hence the existing fossil record is useful and fossil data are essential to calibrate and date the phylogenetic tree. Calibrating trees consists of using the age the older fossil known for a monophyletic group as the minimum age for that clade, or even assigning minimum and maximum age constraints for a specific event. The principles outlined above were used in the present study.

3.4 Push of the past

Phylogenetic models aim at reconstructing relationships among selected organisms, highlighting heterogeneous performance of different clades in a macroevolutionary context. These variations can partially be explained by differences in their early evolutionary rates. The

7

effect of the push of the past (POTPa) is a concept introduced in the 1990s, which suggests an “apparently higher rate of at the beginning of the growth of the actual phylogeny” (Nee et al., 1994). This effect would be one of the causes of the great diversification and survivability of today’s extant clades (Budd & Mann, 2018). These clades would have benefited from a high diversification rate at their origin that consequently increased their probability of survival. Clades that do not benefit from such greater early diversification rates are thus much more prone to extinction. This implies that the monophyletic groups that had the “good fortune” to have a distinctly greater early diversity also had much more time to diversify and survive to the present. Hence, when the effects of the POTPa are not recognized in phylogenetic analyses, if one considers a rather homogeneous rate of evolution throughout the tree, the age of the root will be set excessively early, affecting the accuracy of the evolutionary history depicted in the phylogenetic tree, resulting in erroneous observations. Rates of evolution in ancestral lineages can be reconstructed using molecular and phenotypic data, for instance these were used in a previous analysis focusing on arthropods (Lee et al., 2013), where higher rates of evolution were observed at the origin of the Arthropoda. However, no similar study has taken place to address the diversification and survival of the Brachiopoda to date.

4 Material and methods

The methodology for this study can be divided into three sections. The primary step is the acquisition of the material for the study. This section includes everything from the collection of animals to the extraction of genetic material. The second part of the data collection focuses on retrieving the genes of interest to our study and grouping them in corresponding alignment files. The third and final step is the actual analysis of the data and using fossil data to calibrate the molecular clock. The first and second sections were carried out before the time frame of the present thesis and thanks to the collaboration of other scientists, hence will only be described in the Appendix. Two analyses were conducted using two distinct datasets. First, the method was tested in a pilot study (proof of principle study) using ribosomal RNA (rRNA) sequences and the second analyses took advantage of a phylogenomic dataset assembled prior this study.

8

4.1 rRNA pilot study

In order to familiarize with the methodology and the software, ribosomal RNA (rRNA) sequences were used for a pilot study. For the pilot study 18S rRNA and 28S rRNA sequences were used. The genetic material used in this first study consisted of newly sequenced material and GenBank material. Accession numbers of these sequences are available in Table 1. Separate analyses were launched for 18S, 28S and a concatenated dataset grouping both 18S and 28S.

4.1.1 Alignment and model choice

Sequences were realigned using the Q-INS-i algorithm in MAFFT (Katoh & Standley, 2013), that considers the secondary structure of RNA. With 152 taxa, belonging to height different phyla (Table 2), the final 18S rRNA alignment contained 1952 sites, while the 28S rRNA dataset that had 140 taxa and 1450 sites. In order to concatenate two datasets, they must contain the same taxa, therefore, the final concatenated dataset included 76 taxa and 3234 sites (1884 from the 18S partition and 1350 from the 28S partition). Before setting the parameters for the Bayesian analysis, PAUP* (Swofford, 2003) was used to find the model fitting the data the best, based on a distance matrix and a neighbour joining tree created from the alignment:

#open the alignment in nexus format exe [filename].nexus #create a distance matrix and save it in a text file savedist format=tabText triangle=both diagonal file=[filename] #create a neighbour joining tree nj bionj=yes showtree=yes #find the best model automodel ShowParamEst=yes

Results are presented in Table 2.

9

Table 1 Species names, authorship and GenBank accession numbers of taxa included in this rRNA study. If an asterisk (*) follows a GenBank numbers it indicates that this sequence was used in the concatenated dataset. If the GenBank accession numbers are followed by a superscripted B (B) or a superscripted R (R), it denotes that the sequence was only used respectively in the BEAST2 or RAxML analysis.

Species 18S 28S Autorship LINGULIFORMEA Discinoidea Discinisca stella JN415657* Dall, 1871 Discinisca tenuis AY842020*; AF202444 Dall, 1871 Discina striata U08333* JQ414037* (Schumacher, 1817) Discinisca tenuis U08327 (Sowerby, 1847) Pelagodiscus atlanticus JQ414034; JQ414035; (King, 1868) JQ414036 Linguloidea Glottidia palmeri AF201744* JN618994* Dall, 1871 Glottidia pyramidata U12647 (Stimpson, 1860) adamsi U08329 Dall, 1873 Lingula anatina AB747095; U08331; Lamarck, 1801 X81631* AB747096B; AH001678; Davidson, 1880 LC334155 Lingula rostrum AB855774 (Shaw, 1797) Lingula sp. Not published yet* Not published yet*, AY839250* Brugière, 1791

CRANIFORMEA Cranioidea Neoancistrocrania sp. HQ852066* HQ852082*; HQ852083* Laurin, 1992 Neoancistrocrania norfolki AY842019; HQ852055*; HQ852075*; JX575602* Laurin, 1992 HQ852057*; HQ852058* Neocrania anomala DQ279934; U08328 (O. F. Müller, 1776) Neocrania huttoni U08334 (Thomson, 1916) HQ837667* HQ852076* (O. F. Müller, 1776) Novocrania anomala HQ852068 (O. F. Müller, 1776) Novocrania huttoni U08334 (Thomson, 1916) Novocrania lecointei JX645243*; JX645254* Lee & Brunton, 2001 Novocrania pourtalesi HQ837658* Dall, 1871 Novocrania sp. HQ852060*; HQ852065*; HQ852070*; HQ852072*; Lee & Brunton, 2001 HQ837653*; HQ852061*; HQ852073; HQ852077; HQ852064; Not published HQ852079; HQ852080; yet* HQ852084*; HQ852086; HQ852087; HQ852088; HQ852089: HQ918260*; JN673266*; JN673267*; JN985738*; Not published yet*

RHYNCHONELLIFORMEA Thecideida Thecideoidea Lacazella caribbeanensis KC577459; KC577460; Cooper, 1977 KC577461; KC577462 Minutella sp. KC577466B Hoffmann & Lüter, 2010 Ospreyella palauensis KC577456 Logan, 2008 Ospreyella depressa KC577455 Lüter, 2003 Ospreyella sp. KC577451; KC577452; Lüter & Wörheide, 2003 KC577453; KC577454 Pajaudina atlantica KC577457; KC577458 Logan, 1988 Thecidellina japonica KC577465B (Hayasaka, 1838) Thecidellina williamsi KC577464B Lüter & Logan, 2008 Thecidellina sp. KC577463B Thomson, 1915

Terebratulida Cancellothyroidea Cancellothyris hedleyi AF025929* HQ918285* (Finlay, 1927) Cancellothyris sp. HQ918285 Thomson, 1926 Chlidonophora sp. HQ822064 Dall, 1903 Chlidonophora sp. AF025930* HQ918275* Dall, 1903 Eucalathis murrayi JN632510 (Davidson, 1878) Terebratulina retusa GU125759*; U08324* MF619926; MF619927; (Linnaeus, 1758) MF619928; MF619930; MF619932; MF619933; MF619934; MF619935; MF619936*; MF619937*

10

Species 18S 28S Autorship Terebratulina septentrionalis HQ918277; MF619938; (Couthouy, 1838) MF619940; MF619941; MF619944; MF619949 Terebratulina unguicula HQ918278.1 (Carpenter, 1864) Dyscolioidea Abyssothyris wyvillei HQ918286 (Davidson, 1878) Abyssothyris sp. AF025928* HQ918274* Thomson, 1927 Dyscolia wyvillei HQ918276* HQ918276* (Davidson, 1878) Dyscolia sp. AF025931 Fischer & Oehlert, 1890 Xenobrochus africanus HQ918283 (Cooper, 1973) Gwynioidea capsula AF025940* HQ918267* (Jeffreys, 1859) Kingenidae Ecnomiosa sp. HQ852096 Cooper, 1977 Fallax neocaledonensis AF025939* HQ918265* Laurin, 1997 Kraussinoidea Kraussina mercatori HQ852100 Helmcke, 1939 Megerlia truncata U08321 (Linnaeus, 1767) Megerlina sp. AF025943* HQ918269* Deslongchamps, 1884 Pumilus antiquatus HQ852097 HQ918268 Atkins, 1958 Laqueoidea Coptothyris adamsi KF038315 GQ275348 (Davidson, 1871) Laqueus californicus U08323; Not published Not published yet*; Not (Koch, 1848) yetB*; Not published yet*; published yet*; Not published Not published yet*; yet* Terebratalia coreanica HQ852098 (Adams & Reeve, 1850) Terebratalia transversa AF025945; JF509725; (Sowerby, 1846) FJ196115; U12650 Megathyridoidea cordata AF119078 (Risso, 1826) Megathiris detruncata HQ852093 (Gmelin, 1791) Platidioidea Amphithyris sp. HQ852094* HQ918270* Thomson, 1918 Terebratelloidea Anakinetica cumingi HQ852095 (Davidson, 1852) Aneboconcha obscura GQ261959 Cooper, 1973 Calloria inconspicua AF025938* DQ871292* (Sowerby, 1846) Calloria variegata DQ871293 Cooper & Doherty, 1933 Campages sp. HQ918259 Hedley, 1905 septigera JF273494 (Lovén, 1846) Dallina sp. GQ265554; GQ261960 Beecher, 1893 Gyrothyris mawsoni AF025941* DQ871294* Thomson, 1918 Gyrothyris williamsi DQ871295 Bitner, Cohen, Long, Richer de Forges & Saito, 2008 Magellania venosa GQ265555; GQ265556; (Dixon, 1789) DQ871297 Neothyris lenticularis DQ871298 (Deshayes, 1839) Neothyris parva AF025944* DQ871299* Cooper, 1982 Nipponithyris sp. KF313140 Yabe & Hatai, 1934 Terebratella dorsata GQ261961; GQ265558; (Gmelin, 1791) GQ265559 Terebratella sanguinea DQ871291* DQ871300; GQ265896* (Leach, 1814) Terebratuloidea Dallithyris murrayi HQ918280 Muir-Wood, 1959 Gryphus vitreus AF025932* HQ918272* (Born, 1778) Kanakythyris pachyrhynchos HQ852090* HQ918281* Laurin, 1997 Liothyrella neozelanica HQ822070; U08332* GQ261962* Thomson, 1918 Liothyrella uva U08330* GQ261963* (Broderip, 1833) Magellania fragilis AF202110* DQ871296* Smith, 1907 Stenosarina crosnieri AF025934 (Cooper, 1983) Stenosarina globosa HQ918273, HQ918282 Lurin, 1997 Terebratulidae sp. HQ918284 Gray, 1840 Zeillerioidea Macandrevia cranium AF025942* HQ918266* (O. F. Müller, 1776) Macandrevia sp. HQ918271 King, 1859

Rhynchonellida Dimerelloidea Cryptopora gnomon HQ852102B Jeffreys, 1869 Hemithiridoidea Hemithiris psittacea HPU08322* GQ265895* (Gmelin, 1791) Notosaria nigricans AY839243 (Sowerby, 1846) Norelloidea Frieleia halli GQ261964 Dall, 1895 Manithyris rossi GQ261965 Foster, 1974

11

Species 18S 28S Autorship Neorhynchia sp. AF025937 Thomson, 1915 Parasphenarina cavernicola HQ852101* JQ663539* Motchuro-Dekova, Saito & Endo, 2002 Tethyrhynchia mediterranea HQ852105B* HQ918264* Logan in Logan & Zibrowius, 1994 Pugnacoidea Acanthobasiliola doederleini HQ852099 (Davidson, 1886) Basiliola beecheri HQ852103B* HQ918262* (Dall, 1895) Basiliola lucida HQ852104B* HQ918263* (Gould, 1862) Eohemithiris grayi AF025936* AY839242* (Woodward, 1855) Brachiopod sp. AF025935

PHORONIDA Actinotrocha harmeri KC161253; KC161254 Zimmer, 1964 Actinotrocha sp. KC161255 Müller, 1846 Phoronis architecta AF025946* EU334109* AF119079; AF202111; EU334110; EU334111*; Haswell, 1883 EU334122*; EU334123*; EU334112* KT030908; KT030909; KT030910 Phoronis embryolabi KY643695* Temereva et Chichvarkhin, 2016 Phoronis emigi AB621913 Hirose, Fukiage, Katoh & Kajihara, 2014 AF202112; JF509726; Wright, 1856 KT030907; U08325 Phoronis ijimai AB752304* KY643697*; KY643698* Oka, 1897 EU334125*; KJ193748 EU334114* Selys-Longchamps, 1903 EU334126*; GU125758B EU334115* Wright, 1856 EU334127* EU334116* (Schneider, 1862) U36271 Cori, 1889 Phoronis sp. AB106269*; KT030901*; Wright, 1856 KT030903*; KT030904*; KT030906* Phoronis vancouverensis AY210450; FJ196118 Pixell, 1912 EU334129* EU334118* Hilton, 1930 Phoronopsis harmeri AF123308*; EU334130* EU334119*; EU334120 Pixell, 1912

ANNELIDA Urechis caupo AF119076R AF342804 Fisher & MacGinitie, 1928 Phascolosoma granulatum X79874R AF519279R Wesenberg-Lund, 1959 Drawida sp. HQ728964B Michaelsen, 1900

MOLLUSCA Abra alba AM774533* KC429505* (W. Wood, 1802) Astarte undata KPD68176* KP068126* Gould, 1841 Chaetoderma sp. MG595728B* AY145397B* Lovén, 1844 Dentalium octangulatum AY145372B* AY145403B* Donovan, 1804 Koreamya arcuata AB907557B; AB907558B; (A. Adams, 1856) AB907560B Koreamya sp. AB907561B Lützen, Hong & Yamashita, 2009

NEMERTEA Antarctonemertes riesgoae KF935322R KF935378 Taboada et al., 2013 Micrura fasciolata HQ856846B Ehrenberg, 1828 Ovicides paralithodis AB704416 Kajihara & Kuris, 2013 Tubulanus tamias LCO42092R AB854622B Kajihara, Kakui, Yamasaki & Hiruta, 2015

PLATHYELMINTES Discocelis tigrina U70078B* MF398475B* (Blanchard, 1847) Geocentrophora wagini AJ012509B* AY157156B* Timoshkin, 1984 Neoheterocotyle rhinobatidis AF348361B (Young, 1967) Stenostomum leucops FJ384804B* AY157151B* (Duges, 1828)

BRYOZOA Favosipora rosea FJ409605B* FJ409581B* Gordon & Taylor, 2001 Fredericella sultana JN680926B* JN681029B* Blumenbach, 1779 FJ409576B (Leidy, 1851) Plumatella emarginata KC461939B* KC462025B* Allmann, 1844

ROTIFERA bulla DQ297698B* AY829083B* (Gosse, 1851) sp. DQ297710B* DQ297748B* Ehrenberg, 1830 Sinantherina socialis AY210451B* AY210471B* (Linnaeus, 1758)

12

Table 2 Best fitting model according to PAUP*. The best model are found thanks to BIC and AICc criterions score. The best models found for each dataset considers a proportion of invariant sites and gamma distributed rate variation among sites. p.i. stands for proportion of invariant sites. These data are used to parameter the BEAST2 analyses.

lset Dataset Best model rates shape p. i. rclass rmatrix base frequencies nst 18S TRN + I + G gamma 0.49 0.26 6 abaaea 1 1.96 1 1 4.61 0.25 0.22 0.29

28S GTR + I + G gamma 0.58 0.05 6 abcdef 0.77 1.86 1.36 0.71 3.28 0.18 0.28 0.33 18S TRN + I + G gamma 0.5 0.3 6 abaaea 1 1.91 0.87 0.87 4.27 0.26 0.29 0.27 reduced 28S GTR + I + G gamma 0.69 0.2 6 abcdef 0.79 1.93 1.44 0.75 3.66 0.18 0.27 0.33 reduced

4.1.2 Bayesian analyses

The BEAST2 package (Bouckaert et al., 2014), v.2.4.8, was used for the Bayesian inference. First, BEAUti was used to create xml files processed by BEAST2 and containing the alignments (nexus format) and priors, as well as all the parameters of the MCMC chain for the analysis. This fundamental and determinant part of a Bayesian analysis requires parameters specific to the selected dataset. In total, three different xml files were created: one containing 18S rRNA alignment, the second containing 28S rRNA sequences and the third containing a concatenation of both alignments. Site model and substitution model parameters were chosen using the results of the PAUP* analysis (Table 2). The Tamurei-Nei model (TrN, also known as TN93) fits best the 18S rRNA partition and sets equal transversion rates but variable transition rates (Tamura & Nei, 1993). A model integrating different nucleotide frequencies and different rates for every change called the general time-reversible (GTR) model has been chosen for the 28S rRNA dataset. Also, a rate of variation across sites (Yang, 1994) and a proportion of invariable sites (Shoemaker & Fitch, 1989) are combined with the models, respectively indicated by +G and +I (Table 2). The relaxed clock lognormal model (Drummond et al., 2006) was used for all datasets, as well as fossil calibrations used as priors are listed in Table 3. Long MCMC chain lengths of 100 millions generations were used, with trees stored every 20 000. Analyses were either run directly on computer memory (smaller ones) or on clusters, such as Rakham, a high performance computer cluster at Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) or on CIPRES (Miller et al., 2010), the

13

Cyberinfrastructure for Phylogenetic Research. To launch analyses on UPPMAX, bash scripts are created following this structure:

#!/bin/bash #SBATCH -A [project-name] #SBATCH -p [partition-used] -n [number-of-nodes] #SBATCH -t D-HH:MM:SS #SBATCH --mail-user=[email] #SBATCH --mail-type=ALL #SBATCH -J [job name] #SBATCH -o [output file name] #SBATCH -e [error file name]

module load [ necessary modules for the analysis] java -jar [jar file of the program and its directory] - overwrite [input file and its directory]

The three xml files were processed by BEAST2, and gave as output information about the run, as a .log file and information about all the trees explored during the analysis, as a .trees file. Tracer v1.7.1 (Rambaut et al., 2018) was used to summarize parameter estimates and check for convergence of the run thanks to the .log file. Then, in order to summarize the results of the Bayesian analysis contained in the .trees file, maximum clade credibility trees were built using TreeAnnotator. Different burn-in values were chosen for each analysis, 40 million states for 18S, 10 million states for 28S and 40 million states for the combine analysis. Finally, FigTree v.1.4.3 (Rambaut, 2017) was used to visualize the final phylogenetic tree.

Table 3 Fossil calibrations used in this study. For brachiopods, the age refers to the minimum age of the crown groups, based on the oldest fossil known (Kaesler & Selden, 1997–2007; Peters & McClennen, 2016). Sometimes a maximum age value is included.

Crown Age Geological period Species Specimen Reference (MYA) number

Brachiopoda 530 Early Cambrian Aldanotreta sunnaginensis, Pelman, 1977 Pelman, 1977 Rhynchonelliformea 460.9 Middle Ordovician Dorytreta ovata, Cooper 1956 117192 Cooper, 1956 Thecideida 232 Late Triassic Thecospira tirolensis Terebratulida 419.2 Boundary / Brachyzyga absoleta 64/12904 Craniformea 467.5 Middle Ordovician Pseudocrania petropolitana, 588-42 Pander, 1830 Pander 1830 Linguliformea 242 Middle Triassic Glottidia tenuissima Discinoidea 471.8 Early Ordovician Schizotreta sp. 1, Hansen and TSGF17017 Hansen & Holmer, 2011 Holmer, 2011 Annelida 467.5 – Middle Ordovician Xanioprion viivei, Hints & GIT 424–19 Parry et al., 636.1 /Late Ediacaran Nõlvak, 2006 2014; Benton et al., 2015 Arthropoda 515 - Cambrian/ Late Yicaris dianensis, Zhang et YKLP Zhang et al., 636.1 Ediacaran al., 2007 10840 2007; Benton et a.l, 2015 532 - Early Cambrian/ Late Aldanella yanjiahensis, Chen TU YXII02- Chen, 1984; 636.1 Ediacaran 1984 02 Benton et al., 2015

14

4.1.3 RAxML analysis

Similarly to the Bayesian inference, three Maximum Likelihood analyses were launched, and support values for the phylogenetic trees were calculated using bootstrap, a method that asseses the confidence in a statistical analysis by producing random, pseudoreplicate datasets and analysing each of them to see how often a specific result is obtained (Baum & Smith, 2013). A RAxML (Stamatakis, 2014) command combining a message passing interface (MPI) and threads for parallelization was used in CIPRES: raxmlHPC- -T 4 -n [result file] -s [input file] -m [GTRCAT] -p 12345 -f a -N 1000 -x 12345 --asc-corr lewis

This command set 1000 rapid bootstrap replicates, with 12345 informing on the random seed value guaranteeing compatibility between runs. The standard ascertainment bias correction Lewis is chosen. The final best ML tree was drawn in FigTree with bootstrap values plotted.

4.2 Phylogenomic analyses

For these analyses, our own new transcriptome data (n=13) were clustered with published (n=5) and ortholog genes from Kocot et al. (2016) across Lophotrochozoa, known as the LB106 subset (n=106). The latter are protein coding metabolic genes, which were corrected for heterogeneous evolutionary rates and were analysed as individual partitions. The full phylogenomic matrix consisted of 103 taxa with 75% matrix occupancy, and 22 000 amino acid sites.

4.2.1 Dataset and model choice

The final dataset used for the study was reduced, which considerably decreased the computation time of the analysis. Preferably, taxa with a poor fossil record were removed from the analysis. Sequences used for the phylogenomic analyses are summarized in the Table 4. The best fitting evolutionary model for this dataset, LG, was determined using PartitionFinder (Lanfear et al., 2017). This empirical model for amino acid substitution and gamma model of rate heterogeneity accounts for variability of evolutionary rates across sites.

15

Table 4 Species names in alphabetical order, authorship, GenBank accession numbers and references of taxa included in this phylogenomic study.

Species name Phyla Accession Number(s) / Autorship References Version / URL / Annelida JGI v1.0 Blake, Grassle and Kocot et al., 2016 Eckelbarger, 2009 Crassostrea gigas Mollusca v9 protein models (Thunberg, 1793) Kocot et al., 2016 pulex Arthropoda JGI filtered gene Leydig, 1860 Kocot et al., 2016 models v1.1 Drosophila melanogaster Arthropoda Inparanoid 7.0 Meigen, 1830 Kocot et al., 2016 processed sequences Glottidia pyramidata Brachiopoda SRX731468 (Stimpson, 1860) Kocot et al., 2016 Hemithiris psittacea Brachiopoda SRX731469 (Gmelin, 1791) Kocot et al., 2016 Kraussina rubra Brachiopoda PRJNA289348 (Pallas, 1760) Laumer et al., (BioProject) 2015 Laqueus erythraeus Kocot Brachiopoda Dall, 1920 Lineus lacteus SRX565174- (Rathke, 1843) Kocot et al., 2016 SRX565175 Lineus ruber SRX565182- (Müller, 1774) Kocot et al., 2016 SRX565183 Lingula anatina Brachiopoda Larmarck, 1801 Luo et al., 2015 Lingula Gert Brachiopoda Bruguière. 1791 Liothyrella uva Brachiopoda (Broderip, 1833) Magellania venosa Brachiopoda (Dixon, 1789) Mytilus edulis Mollusca SRX565221- Linnaeus, 1758 Kocot et al., 2016 SRX565224 Novocrania anomala Brachiopoda SRX731472 (O.F. Müller, 1776) Kocot et al., 2016 Octopus vulgaris Mollusca http://datadryad.org/h Cuvier, 1797 Kocot et al., 2016 andle/10255/dryad.34

644 Phoronis architecta Phoronida Andrews, 1890 Phoronis australis Phoronida Haswell, 1883 Phoronis vancouverensis Phoronida SRX731474 Pixell, 1912 Kocot et al., 2016 Phoronopsis harmeri Phoronida Pixell, 1912 Pomatoceros lamarckii Annelida SRR516531 (Quatrefages, 1866) Kocot et al., 2016 Priapulus caudatus Cephalorhynch SRX731477 Lamarck, 1816 Kocot et al., 2016 a Terebratalia transversa Brachiopoda (Sowerby, 1846) Tubulanus polymorphus Nemertea SRX732127 Renier, 1804 Kocot et al., 2016 Halanych Tubulanus polymorphus Nemertea SRX534879- Renier, 1804 Kocot et al., 2016 Struck SRX534881 X1 BC Laqueus Brachiopoda Not published yet Davidson, 1887 This study vancouveriensis X1 SF Hemithiris psittacea Brachiopoda Not published yet (Gmelin, 1791) This study X2 Liothyrella sp S Ork Brachiopoda Not published yet Thomson, 1916 This study X2 TN Terebratulina sp Brachiopoda Not published yet dOrbigny, 1847 This study X3 Pictothyris picta Japan Brachiopoda Not published yet (Dillwyn, 1817) This study X3 Terebratulina Brachiopoda Not published yet (Couthouy, 1838) This study septentrionalis X4 GM Mollusc Mollusca Not published yet This study X4 lingula reevi Hawaii Brachiopoda Not published yet Davidson, 1880 This study X4 Pla1 Platidia sp Japan Brachiopoda Not published yet Costa, 1852 This study X5 Laqueus blanfordi Japan Brachiopoda Not published yet (Dunker, 1882) This study X5 P2 Pictothyris picta Brachiopoda Not published yet Thomson, 1927 This study X6 PA3 Pajaudina atlantica Brachiopoda Not published yet Logan, 1988 This study

16

4.2.2 RAxML analysis

Maximum Likelihood analysis of complete and reduced sets were undertaken, using the -CAT-LG ProtGamma model and substitution amino matrix. To find a single best tree, 350 rapid bootstrap replicates and subsequent thorough ML were done. Similarly to the rRNA datasets, a RAxML version combining a message passing interface and threads for parallelization was used: raxmlHPC-HYBRID_8.2.12_comet -s infile -n result -q partition -k -f a -N 350 -p 12345 -x 12345 -m PROTCATLG

Alternatively to the LG empirical model, another ML analysis was carried out using Dayhoff recoding (Dayhoff, 1978), which consists of separating the amino acids in six groups with distinct biochemical properties. This recoding process is used to reduce the effects of substitution saturation and compositional heterogeneity (Hrdy et al., 2004). The highest likelihood trees and their corresponding bootstrap values were drawn in FigTree.

4.2.3 Bayesian inference for phylogenetic reconstruction

The Bayesian analysis was performed using the CAT-GTR model (Lartillot & Philippe, 2004) implemented in PhyloBayes (Lartillot & Philippe, 2004 and 2006; Lartillot et al., 2007). This model is particularly adapted for amino acid alignments as it accounts for site-specific features of protein evolution. The PhyloBayes analysis were run on Stanford’s cluster Sherlock, using a bash script: mpirun -n [number of processes running in parallel] pb_mpi -d [alignment in PHYLIP format] -cat -gtr [chain name]

Convergence of the chain was controlled using bpcomp and tracecomp programs implemented in PhyloBayes. The readpb program produced output summarizing files, such as a majority-rule posterior consensus tree.

4.2.4 Bayesian inference for molecular dating

Molecular dating in PhyloBayes requires a fixed tree topology. From this tree topology and by specifying the outgroups to root the tree, information related to calibration can be added to a new analysis to obtain the divergence time. The computation time for these two analyses

17

(topology and molecular dating) in PhyloBayes was longer than the time frame of this thesis. For this reason, the molecular dating analysis was run using the BEAST2 package, utilizing the majority-rule posterior consensus tree from PhyloBayes as starting tree. This calibration analysis was run with the reduced dataset containing the transcriptomes of 38 taxa, the same sequences used for the Bayesian inference for phylogenetic reconstruction. The gamma site model was used, with substitution rate, shape and proportion of invariant estimated. The Dayhoff substitution model was used as it was the most similar model option implemented in BEAST2 to the CAT-GTR model. The relaxed clock lognormal model was set and the clock rate was estimated. The birth- model was used, and fossil calibrations were added as originate, hence, giving information on the divergence time of this clade from his sister clade. An MCMC chain of 100 million generation was set, with trees saved every 15000 generations.

5 Results

5.1 rRNA datasets 5.1.1 Topology

The three maximum clade credibility trees have been rooted using the phyla most genetically distant from brachiopods in our dataset, Rotifera. The Sinantherina socialis was set as the outgroup. Analysing the 18S rRNA and the concatenated dataset, all the phyla are resolved as monophyletic. However, within the Brachiopoda, Linguliformea is only monophyletic in the tree resulting from the Bayesian analysis of the 18S rRNA dataset (Fig. 2). The RAxML tree has a very low support at its basal nodes, such as a bootstrap support (BS) of only 23 for the node splitting Brachiopoda from Urechis caupo, a Polychaeta. The 28S rRNA tree depicts a different topology, resolving Brachiopoda as polyphyletic (Fig. 3). Contrary to the rRNA dataset, the support value are high all over the tree. The polyphyletic of Brachiopoda is fully supported by the bootstrap replicates. The concatenated dataset depicts the same evolutionary history as 28S (Fig. 4). However, the polyphyly of Brachiopoda is poorly supported by the bootstrap replicates. Interestingly, in all the RAxML analyses, Linguliformea are depicted as polyphyletic.

18

Figure 2 Cladogram depicting phylogenetic relationships among Brachiopoda and relatives based on 18S rRNA. Originated from RAxML analyses. Support values indicated for each node are the result of Maximum Likelihood bootstrap replicates. Branches leading to Brachiopod taxa are highlighted, in dark green for Craniformea, in lighter green for Linguliformea and in blue for Rhychonelliformea. LB added to the taxa name indicates a long branch leading to the tip

19

Figure 3 Cladogram depicting phylogenetic relationships among Brachiopoda and relatives based on 28S rRNA. Topology originated from RAxML analyses. Support values indicated for each node are the result of Maximum Likelihood bootstrap replicates. Branches leading to Brachiopod taxa are highlighted, in dark green for Craniformea, in lighter green for Linguliformea and in blue for Rhychonelliformea.

20

Figure 4 Cladogram depicting phylogenetic relationships among Brachiopoda and relatives based on a concatenated dataset of 18S rRNA and 28S rRNA. Topology originated from RAxML analyses. Support values indicated for each node are the result of Maximum Likelihood bootstrap replicates. Branches leading to Brachiopod taxa are highlighted, in dark green for Craniformea, in lighter green for Linguliformea and in blue for Rhychonelliformea. LB added to the taxa name indicates a long branch leading to the tip.

21

5.1.2 rRNA molecular dating

The three maximum clade credibility trees depict different but overall very early emergence dates for all clades (Figs. 5-7). The root age is older than 1.4 billion years old for the three datasets (18S=1487.59 Mya, 28S=1532.95 Mya, 18S+28S=1539.48 Mya). The emergence date for the Brachiopoda is found around 850 Mya (18S=813.52 Mya, 28S=886.65 Mya, 18S+28S=875.03f Mya). The results for the dating of the different orders fluctuate more, as the emergence of the Rhynchonelliformea has been estimated to be between 549.3 Mya (28S) and 718.53 Mya (18S). This 150 Mya difference is also found for Craniformea with 500.98 Mya found for 18S to 567.6 found for the concatenated dataset and 661.53 found for 28S.

Figure 5 Dated maximum clade credibility trees depicting phylogenetic relationships among Brachiopoda and relatives based on a concatenated dataset of 18S rRNA and 28S rRNA. Topology and node ages originated from BEAST2 analyses. Node ages are indicated in million years. Branches leading to Brachiopod taxa are highlighted, in dark green for Craniformea, in lighter green for Linguliformea and in blue for Rhychonelliformea

22

Figure 6 Dated maximum clade credibility trees depicting phylogenetic relationships among Brachiopoda and relatives based on18S rRNA. Topology and node ages originated from BEAST2 analyses. Node ages are indicated in million years. Branches leading to Brachiopod taxa are highlighted, in dark green for Craniformea, in lighter green for Linguliformea and in blue for Rhychonelliformea.

23

Figure 7 Dated maximum clade credibility trees depicting phylogenetic relationships among Brachiopoda and relatives based on 28S rRNA. Topology and node ages originated from BEAST2 analyses. Node ages are indicated in million years. Branches leading to Brachiopod taxa are highlighted, in dark green for Craniformea, in lighter green for Linguliformea and in blue for Rhychonelliformea.

24

5.2 Phylogenomic datasets

5.2.1 PhyloBayes topology

This tree has been rooted using the scalidophoran Priapulus caudatus as the outgroup (Fig. 8). All the Phyla are resolved as monophyletic, with relatively high support. Nevertheless, the bootstrap value for the monophyly of Brachiopoda is 0.7. Within Brachiopoda, Linguliformea and Rhynchonelliformea are monophyletic. However, the two taxa comprising the data subset for the order Craniformea are located at different parts of the tree. Novocrania anomalais resolved as a sister taxon to Linguliformea, with a bootstrap support of 0.9. The other Craniformea (X2_BC_Novocrania_sp) forms a clade with Kraussina rubra, a Rhynchonlliformea, with a high support value of almost of 1. The branch of the taxon X2_BC_Novocrania_sp is excessively long.The orders in Rhynchonelliformea are not monophyletic: Terebratulida form a paraphyletic group, as Pajaudina atlantica, a Thecideida is found among the Terebratulida.

Figure 8 Phylogenetic relationships among Brachiopoda and relatives based on a phylogenomic dataset. Topology, branch lengths and support values originated from PhyloBayes analyses. Support for each node are indicated as posterior probabilities.

25

5.2.2 RAxML LG and Dayhoff

The highest likelihood trees of the RAxML LG (Fig. 9) and RAxML Dayhoff (Fig. 10) have the same topology and identical support values, which are overall very high. Moreover, the topology of the tree for the Brachiopoda clade is fully supported by the bootstrap values. As for the PhyloBayes analysis, the two trees have been rooted with Priapulus caudatus, however the topology of the outgroups is different in the RAxML trees because Nemertea is resolved as the sister group to Mollusca. This node has a quite low support value of 55. For the RAxML analysis, the taxa X2_BC_Novocrania_sp was removed. The monophyly of Linguliformea and Craniformea is fully supported by these two Maximum Likelihood trees. Interestingly, a long branch is connected to Kraussina rubra, a Rhynchonelliformea. These same taxa formed a clade with the problematic Craniformea, X2_BC_Novocrania_sp.

Figure 9 Phylogenetic relationships among Brachiopoda and relatives based on a phylogenomic dataset. Topology, branch lengths and support values originated from RAxML analyses, using the LG model. Support for each node are indicated as Maximum Likelihood bootstrap values. Branches leading to Brachiopod taxa are highlighted, in dark green for Craniformea, in lighter green for Linguliformea and in blue for Rhychonelliformea.

26

Figure 10 Phylogenetic relationships among Brachiopoda and relatives based on a phylogenomic dataset. Topology, branch lengths and support values originated from RAxML analyses, using Dayhoff recoding. Support for each node are indicated as Maximum Likelihood bootstrap values. Branches leading to Brachiopod taxa are highlighted, in dark green for Craniformea, in lighter green for Linguliformea and in blue for Rhychonelliformea.

5.2.3 Molecular dating

The tree depicts very early divergence dates for the brachiopods and the other Lophotrochozoan phyla (Fig.11). The age of the root is calculated at more than 1.1 billion years ago (1103.48 Mya). The divergence date of Brachiopoda and their sister group Phoronida is estimated at 612.97 Mya, and the age of 591.77 Mya is calculated for Brachiopoda. The clade comprising the orders Linguliformea and Craniformea is 518.69 Ma old. Linguliformea is dated at 225.94 million years, twice younger as the Rhynchonelliformea (450.75). The node bars indicating the support for the age values calculated are very large, around 100 million years for the nodes indicating the divergence of Phoronida and Brachiopoda as well as for the Brachiopoda. The error interval gets larger towards deeper nodes of the tree, reaching more than 200 million years at the divergence event between Annelida and the other phyla. The

27

younger nodes do not have however, greater support. For example, the Liothyrella uva and Liothyrella sp. event is set at 70.86 Mya but the error bars indicate 95% confidence interval of 150 Ma.

Figure 11 Dated maximum clade credibility trees depicting phylogenetic relationships among Brachiopoda and relatives based on a phylogenomic dataset. Topology, branch lengths and node ages originated from BEAST2 analyses. Node ages are indicated in million years, and 95% confidence intervals are drawn for each nodes as blue bars.Branches leading to Brachiopod taxa are highlighted, in dark green for Craniformea, in lighter green for Linguliformea and in blue for Rhychonelliformea.

.

28

6 Discussion

6.1 Limitations of rRNA datasets in Brachiopod phylogenetics

In phylogenetic studies, rRNA datasets are a powerful tool for analysing the evolutionary relationships of the Metazoa, such as identifying the phylogenetic position of particular clades (e.g. Halanych et al.,1995). In our study, using rRNA datasets, various topologies (monophyly or polyphyly) for the Brachiopoda have been obtained through Maximum Likelihood and Bayesian analyses. The analyses of the 18S dataset (Fig. 2 and Fig. 6) resolved the Brachiopoda as monophyletic. Within the Brachiopoda the suborder Linguliformea is polyphyletic, while the other two suborders Craniformea and Rhynchonelliformea are monophyletic. However, due to the low support values calculated for the majority of the internal nodes of this tree (Fig. 2), this topology should not be considered valid. When using 28S rRNA and the concatenated dataset (18S rRNA and 28S rRNA), depicting Brachiopoda as polyphyletic, with a low bootstrap value for the concatenated dataset but fully supported topology in the 28S analysis (Fig. 3 and Fig. 4). In previous studies, 18S and 28S rRNA have also been used to investigate Brachiopoda and Lophotrochozoan relationships (e.g. Cohen, 2000; Cohen and Weydmann, 2005; Passamaneck et al., 2006), where similar topologies were found and deemed inaccurate. The combined dataset (18S and 28S) resolved brachiopods as polyphyletic. A first work resolved Brachiopoda and Phoronida as a monophyletic group (Cohen and Weydmann, 2005), while another found this Phylum nested between Phoronida, Mollusca and Annelida (Passamaneck et al., 2006). Nevertheless, these results were due to a lack of phylogenetic signal and later disputed (e.g. Sperling et al.,2011; Kocot et al., 2017). However, the Brachiopoda were resolved as monophyletic when systematic error is considered, and a sufficient phylogenetic signal is ensured. Systematic errors due to, for example, the heterogeneity of nucleotide composition and rate variation across lineages (Som, 2014), can be corrected by using appropriate models, increasing or changing the taxa sampling and removing fast evolving genes. Moreover, several indices have been developed to test and quantify the phylogenetic signal (Münkemüller et al., 2012). Thus, the results obtained in this study are consistent with previous observations from rRNA analyses. Our results corroborate the lack of phylogenetic signal in ribosomal gene data, making them inappropriate for phylogenetic analyses. In addition, unusually long branch lengths have been recovered in our phylogenetic trees for some taxa (especially for the outgroups Platyhelmintes, Annelida, Bryozoa and Mollusca (Fig. 2 and Fig. 4)). This introduces

29

the problem of the phylogenetic tree building artefacts known as long-branch attraction (LBA, Hendy and Penny, 1989). This pattern occurs when higher evolutionary rates occur in few branches, grouping the descendants of these branches in a clade, even if these are unrelated (Baum & Smith, 2013). Such LBA artefacts impact the topology of the tree and have consequence on the depicted evolutionary history. Nevertheless, solutions exist to suppress LBA, such as removing problematic taxa or even adding closely related taxa to break long branches. Also make sure to use fitting evolutionary models before analysis and carefully selecting genetic markers prior to the sequencing ensure minimization of the LBA effects.

6.2 BEAST2: a problematic dating methodology

Using the phylogenomic dataset, a well-supported topology (Figs. 8, 9 and 10) and a dated phylogenetic tree have been generated (Fig. 11). However, unrealistic dating values were obtained for the basal nodes of the tree, if we consider the extensive fossil record of the Brachiopoda. Dating the molecular phylogeny of the Brachiopoda using the software package BEAST2 has shown its limitations in this work. At the current state of the study, a dated tree has been produced, but we still cannot accurately study the early rate of evolution of the Brachiopoda using rRNA and phylogenomic data with the methodologies developed for BEAST2. The results obtained for all datasets, including single rRNA datasets, concatenated S18 and S28 rRNA and phylogenomic material, are in disagreement with the extensive fossil record of the Brachiopoda and the early Metazoa and cannot be trusted. Values obtained reach dates of over 1 billion years for the divergence node between the and the other taxa included in the phylogenomic study. The fossil record can provide useful insight for the validity of the results obtained by studies focused on molecular data (phylogenomics, rRNA). For instance, the controversial oldest protostome Kimberella (Fedonkin et al., 1997) has been dated between 550.25 and 636.1 Ma but has been used as a calibration point for previous studies (e.g. Benton et al., 2015). When setting calibration points for the Brachiopoda through their fossil record at 530 Ma, the posterior dating result estimates the age of the clade at 591 Ma. As for all the fossil calibrations, the posterior ages were always estimated as relatively older to the parameters set prior to the analysis. It seems here that BEAST2 always overestimated the ages, indicating large error bars but a discrepancy of 50 Ma is indication that the methodology can potentially be useful with further analysis and appropriate parameters.

30

I attempted to replicate the study from Lee et al. (2013), without success, as this analysis was launched in a previous version of BEAST that is no longer available today (Drummond et al., 2012). Being able to replicate this previous work would have been useful to fully comprehend the process of molecular dating in BEAST2. It is very likely that our results are impacted by incorrectly set priors, and more specifically time priors. The software used for molecular dating is quite complex to adjust and get accurate results from. The validity of the results also depends on very specific priors for each dataset. Even though parameters set for the study were chosen because of their consistency, the analyses might have been wrongly calibrated. It is surprising that posterior dating values vary that much to the fossil calibration priors set for the study (Table 3). The likelihood that issues with the prior parameters are what is producing perturbations in our models is further reinforced by the fact that significant time and computational power was required to obtain a convergence of the run i.e. for the posterior parameters of the analysis to reach stable values. A considerably high amount of MCMC (Markov chain Monte Carlo) runs were chosen (100 millions) as well as high burn-in values (up to 40%, i.e.40 millions) were set to compute the maximum clade credibility tree. Several studies put emphasis on the impact of uncertainties of prior calibrations to the validity of the posterior probabilities obtained (e.g. Warnock et al., 2014; Guindon, 2018). Fossil calibration is a delicate task that requires full control of the parameters and extra care for the calibration of the priors.

6.3 Future studies

The phylogenomic dataset analysis resulted in well supported trees, especially after removing one problematic Craniformea causing LBA artifacts (e.g. X2_BC_Novocrania_sp, containing a great proportion of missing data). Using a reduced LB106 dataset added resolution to the final phylogenetic tree. The full dataset was composed with several low occupancy taxa, adding uncertainties to the final study result. However, future studies should include several Discinoidea taxa, to finally include specimen samples from every brachiopod order. We will be able to study more in detail the relationships among this phylum and will be able to add fossil calibrations in shallower nodes. With the current methodology, it was not possible without further investigation to discard or corroborate the hypothesis of fast evolutionary rates in early Brachiopod evolution. Nevertheless, this hypothesis will be later tested using other software. For example, MCMCtree

31

(dos Reis and Yang, 2019), also using molecular clock models combined with soft fossil constraints to obtain species divergence time could be used to compare results with BEAST2. It would also be very interesting to investigate the result of a PhyloBayes molecular dating analysis, as this software uses other substitution models and requires setting fewer parameters and constraints. Additionally, morphological data should be taken into consideration and further utilised in dating and calibrating phylogenomic studies (e.g. tip dating). These could be used to infer the validity of phylogenetic trees. And lastly, morphological evolution can later be mapped on the molecular tree, as a total evidence type study (Eernisse et al., 2013).

7 Conclusion

For the time being, this study has encountered some limitations of BEAST2 to investigate brachiopod evolutionary history. As correct estimations of early rates of evolution are only possible by means of correctly dated trees, further study will be required to clarify, as has been shown for the Arthropoda (Lee et al., 2013), the effect of the Push of the Past in the evolution of the Brachiopod phylum. While accurate dating methodologies still need to be determined, promising, well-supported results for a Brachiopod phylogeny have been obtained in this work. The results obtained reinforce the value of using phylogenomic data in phylogenetic analyses.

8 Acknowledgments

I would like to thank Aodhàn Butler and Graham Budd for this opportunity. You have always been available when I needed help and advice and I really appreciated that you always were so positive. Thanks for trusting me with this project. I also would especially like to thank Anneli, Bobby, Diem, Gianni, Jenni, Katerina and Raquel for everything they did for me during this thesis. Your presence, advice and friendship have helped me to chill and feel more confident. I would also like to thank many people from systbio. Thank you Iker and Martin for your help with bioinformatics, thank you Mikael and Petra for the fun and thank you Nahid for caring so much. I am also grateful to my opponents Eliott Capel and Michael Streng, who came up with interesting comments on my work. Moerover, fewer results would have been presented without the Stanford University and the Stanford Research Computing Center which provided computational resources and support that contributed to these research results. Enfin, merci à ma famille pour leur soutien indéfectible.

32

8 References

Akaike, H. (1974). A new look at the statistical model identification. In: Selected Papers of Hirotugu Akaike. Springer, New York, NY, pp. 215-222.

Baum, D. A., & Smith, S. D. (2013). Tree thinking: an introduction to phylogenetic biology. Greenwood Village (CO): Roberts.

Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Sayers, E. W. (2008). GenBank. Nucleic acids research, vol. 37, pp. D26-D31. DOI: 10.1093/nar/gkn723.

Benton, M. J., Donoghue, P. C. J., Asher, R. A., Friedman, M., Near, T. J. & Vinther, J. (2015). Constraints on the timescale of animal evolutionary history. Palaeontologia Electronica, 18.1.1FC. DOI : 10.26879/424.

Bouckaert, R., Heled, J., Kühnert, D., Vaughan, T., Wu, C-H., Xie, D., Suchard, M. A., Rambaut, A., & Drummond, A. J. (2014). BEAST 2: A Software Platform for Bayesian Evolutionary Analysis. PLoS Computational Biology, vol. 10. DOI:10.1371/journal.pcbi.1003537.

Brusca., R. C., Moore, W., & Shuster, S. M. (2016). . 3rd ed. Sunderland.

Budd, G. E., & Jensen, S. (2000). A critical reappraisal of the fossil record of the bilaterian phyla. Biological Reviews, vol. 75, pp. 253-295. DOI: 10.1017/S000632310000548X.

Budd, G. E., & Mann, R. P. (2018). History is written by the victors: The effect of the push of the past on the fossil record. Evolution, vol. 72, pp. 2276-2291. DOI:10.1111/evo.13593.

Carlson, S. J. (2016). The evolution of Brachiopoda. Annual Review of Earth and Planetary Sciences, vol. 44, pp. 409-438. DOI: 10.1146/annurev-earth-060115-012348.

Chen, P. (1984). Discovery of lower Cambrian small shelly fossils from Jijiapo, Yichang, West Hubei and its significance (in Chinese). Professional Papers of Stratigraphy and Palaeontology, vol. 13, pp. 49-64.

Cohen, B. L. (2000). Monophyly of brachiopods and : reconciliation of molecular evidence with Linnaean classification (the subphylum Phoroniformea nov.). Proceedings of the Royal Society B: Biological Sciences, vol. 267, pp. 225–231. DOI: 10.1098/rspb.2000.0991.

Cohen, B. L. & Weydmann A. (2005). Molecular evidence that phoronids are a subtaxon of brachiopods (Brachiopoda: Phoronata) and that of metazoan phyla began long before the early Cambrian. Organisms Diversity & Evolutio., vol. 5, pp. 253–273. DOI: 10.1016/j.ode.2004.12.002.

Cooper, G. A. (1956). Chazyan and related brachiopods. Smithsonian Miscellaneous Collections, vol. 127, pp. 1-1024.

Darwin, C. (1987). 's notebooks, 1836-1844: geology, , metaphysical enquiries. Ed. Paul H. Barrett et al. Ithaca, NY, Cornell University Press.

33

Dayhoff, N. O. (1978) in Dayhoff, N.O. (ed.) Atlas of protein sequences and structure, In Vol 5, suppl. 3. National Biomedical Research Foundation, Washington DC, pp. 345-352. dos Reis, M. & Yang, Z. (2019). Bayesian molecular clock dating using genome- scale datasets. In: Anisimova M (ed.) Evolutionary Genomics: Statistical and Computational Methods, Springer (in press).

Drummond, A. J., Ho, S. Y., Phillips, M. J. & Rambaut, A. (2006). Relaxed phylogenetics and dating with confidence. PLoS Biology, vol. 4 (5). DOI:10.1371/journal.pbio.0040088.

Drummond, A. J., Suchard, M. A., Xie, D., and Rambaut, A. (2012). Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution, vol. 29, 1969–1973.

Dunn, C. W., Giribet, G., Edgecombe, G. D. & Hejnol, A. (2014). Animal phylogeny and its evolutionary implications. Annual review of ecology, evolution, and , vol. 45, pp. 371-395. DOI: 10.1146/annurev-ecolsys-120213-091627.

Eddy, S. R. & the HMMER development team. (2018). HMMer Software. Version 3.2.1. Available: http://hmmer.org.

Eernisse, D. J., & Kluge, A. G. (1993). Taxonomic congruence versus total evidence, and amniote phylogeny inferred from fossils, molecules, and morphology. Molecular Biology and Evolution, vol. 10, pp. 1170-1195. DOI: 10.1093/oxfordjournals.molbev.a040071.

Ekman, T. (1896). Beiträge zur kenntnis des stieles der Brachiopoden. Zeitschrift für wissenschaftliche Zoologie, vol. 6, pp. 169-249.

Erwin, D. H. (1993). The great crisis: life and death in the Permian. Columbia University Press.

Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A., Rhind, N., di Palma, F., Birren, B. W., Nusbaum, C., Lindblad-Toh, K., Friedman, N. & Regev, A. (2011). Full-length transcriptome assembly from RNA-seq data without a reference genome. Nature Biotechnology, vol. 29, pp. 644-652. DOI: 10.1038/nbt.1883.

Fedonkin, M. A. & Waggoner, B. M. (1997). The Late fossil Kimberella is a mollusk-like bilaterian organism. Nature, vol. 338, pp. 868-871.

Foote, M. & Sepkoski, J. J. (1999). Absolute measures of the completeness of the fossil record. Nature, vol. 398, pp. 415-417.

Grabherr, M. G., Haas, B .J., Yassour, M., Levin, J. Z., Thompson, D. A., Amit, I., Adiconis, X., Eernisse, D. J., Albert, J. S. & Anderson, F. E. (1992). Annelida and Arthropoda are not sister taxa: a phylogenetic analysis of Spiralian metazoan morphology. Systematic Biology, vol. 41, pp. 305-330. DOI: 10.2307/2992569

Giribet, G. (2008). Assembling the lophotrochozoan (= spiralian) tree of life. Philosophical Transactions of the Royal Society B: Biological Sciences, vol. 363, pp. 1513-1522. DOI: 10.1098/rstb.2007.2241.

34

Gould, S., & Calloway, C. (1980). Clams and Brachiopods-Ships that Pass in the Night. Paleobiology, vol. 6, pp. 383-396. DOI: 10.1017/S0094837300003572.

Guindon, S. (2018). Accounting for calibration uncertainty: Bayesian molecular dating as a “doubly intractable” problem. Systematic Biology, pp. 651-661. DOI: 10.1093/sysbio/syy003

Halanych, K. M., Bacheller, J. D., Aguinaldo, A. M., Liva, S. M., Hillis, D. M. & Lake, J. A. (1995). Evidence from 18S ribosomal DNA that the Lophophorates are protostome animals. Science, vol. 267, pp. 1641-1643. DOI: 10.1126/science.7886451.

Hansen, H. and Holmer, L. E. (2011). and biostratigraphy of Ordovician brachiopods from northeastern Ny Friesland, Spitsbergen. Zootaxa, vol. 3076, pp. 1-122.

Hausdorf, B., Helmkampf, M., Meyer, A., Witek, A., Herlyn, H., Bruchhaus, I., Hankeln, T., Struck, T. H. & Lieb, B. (2007). Spiralian phylogenomics supports the resurrection of Bryozoa comprising Ectoprocta and . Molecular Biology and Evolution, vol. 24 , pp. 2723- 2729. DOI: 10.1093/molbev/msm214.

Hejnol, A., Obst, M., Stamatakis, A., Ott, M., Rouse, G. W., Edgecombe, G. D., Martinez, P., Baguñà, J., Bailly, X., Jondelius, U., Wiens, M., Müller, W. E., Seaver, E., Wheeler, W. C., Martindale, M. Q., Giribet, G. & Dunn, C. W. (2009). Assessing the root of bilaterian animals with scalable phylogenomic methods. Proceedings of the Royal Society B: Biological Sciences, vol. 276, pp. 4261-4270. DOI: 10.1098/rspb.2009.0896.

Hendy, M. D. & Penny, D. (1989). A framework for the quantitative study of evolutionary trees. Systematic , vol. 38, pp. 297-309. DOI: 10.2307/2992396.

Hrdy, I., Hirt, R. P., Dolezal, P., Bardonová, L., Foster, P. G., Tachezy, J. & Embley, T. M. (2004). Trichomonas hydrogenosomes contain the NADH dehydrogenase module of mitochondrial complex I. Nature, vol. 432, pp. 618-621.

Hull, P. M., Norris, R. D., Bralower, T. J., & Schueth, J. D. (2011). A role for chance in marine recovery from the end- extinction. Nature Geoscience, vol. 4, pp. 856–860.

Jukes, T. H. & Cantor, C. R. (1969). Evolution of Protein Molecules. New York: Academic Press. pp. 21–132.

Kaesler, R. L., Selden, P. A., eds 1997-2007. Treatise on Paleontology, Part H (Revised): Brachiopoda, Vols 1-6. Boulder, CO/Lawrence, KS: Geol. Soc. Am./Univ. Kansas.

Katoh, K. & Standley, D. M. (2013). MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molecular Biology and Evolution, vol. 30, pp. 772-780. DOI: 10.1093/molbev/mst010.

Kocot, K. M., Struck, T. H., Merkel, J., Waits, D. S., Todt, C., Brannock, P. M., Weese, D. A., Cannon J. T., Moroz, L. L., Lieb, B. & Halanych, K. M. (2017). Phylogenomics of Lophotrochozoa with consideration of systematic error. Systematic Biology, vol. 66, pp. 256- 282. DOI: 10.1093/sysbio/syw079.

35

Lamarck, J. B. P. A. (1809). Philosophical zoology: An exposition with regard to the natural history of animals. Chicago, Chicago, EEUU.

Lanfear, R., Frandsen, P. B., Wright, A. M., Senfeld, T. & Calcott B. (2017). PartitionFinder 2: New Methods for Selecting Partitioned Models of Evolution for Molecular and Morphological Phylogenetic Analyses. Molecular Biology and Evolution, vol. 34, pp. 772-773. DOI: 10.1093/molbev/msw260.

Lartillot, N. & Philippe, H. (2004). A bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Molecular Biology and Evolution. vol. 21, pp. 1095-1109. DOI:10.1093/molbev/msh112.

Lartillot, N. & Philippe, H. (2006). Computing Bayes factors using thermodynamic integration. Systematic Biology, vol. 55, pp. 195-307. DOI: 10.1080/10635150500433722.

Lartillot, N., Brinkmann, H. & Philippe, H. (2007). Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evolutionary Biology, vol. 7. DOI: 10.1186/1471-2148-7-S1-S4.

Laumer, C. E., Bekkouche, N., Kerbl, A., Goetz, F., Neves, R. C., Sørensen, M. V., Kristensen, R.M., Hejnol, A., Dunn, C.W., Giribet, G. & Worsaae, K. (2015). Spiralian phylogeny informs the evolution of microscopic lineages. Current Biology, vol. 25, pp. 2000-2006. DOI: 10.1016/j.cub.2017.11.026

Lee, M. S. Y., Soubrier J. & Edgecombe, G. D. (2013). Rates of Phenotypic and Genomic Evolution during the Cambrian Explosion. Current Biology, vol. 23, pp. 1889-1895. DOI: 10.1016/j.cub.2013.07.055.

Lee, M. S. & Ho, S. Y. (2016). Molecular clocks. Current Biology, vol. 26, pp. 399-402.

Lyell, C. (1832). The Principles of Geology (Vol. 3). London: Murray.

Miller, M. A., Pfeiffer, W., & Schwartz, T. (2010). Creating the CIPRES Science Gateway for inference of large phylogenetic trees. In 2010 Gateway Computing Environments workshop (GCE), pp. 1-8. DOI: 10.1109/GCE.2010.5676129.

Morris, S. C. (1989). Burgess Shale faunas and the Cambrian explosion. Science, vol. 24, pp. 339-346. DOI: 10.1126/science.246.4928.339.

Münkemüller, T., Lavergne, S., Bzeznik, B., Dray, S., Jombart, T., Schiffers, K., & Thuiller, W. (2012). How to measure and test phylogenetic signal. Methods in Ecology and Evolution, vol. 3, pp. 743-756. DOI: 10.1111/j.2041-210X.2012.00196.

Nee, S., May, R. M., & Harvey, P. H. (1994). The reconstructed evolutionary process. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, vol. 344, pp. 305-311.

Nesnidal, M. P., Helmkampf, M., Meyer, A., Witek, A., Bruchhaus, I., Ebersberger, I., Hankeln, T., Lieb, B., Struck, T. H. & Hausdorf, B. (2013). New phylogenomic data support the

36

monophyly of Lophophorata and an Ectoproct- clade and indicate that Polyzoa and Kryptrochozoa are caused by systematic bias. BMC evolutionary biology, vol. 13 (253).

Pander, C. H. (1830). Beitrage zur Geognosie des Russischen Reiches. St. Petersburg.

Pauling, L., Zuckerkandl, E., & Henriksen, T. (1963). Chemical paleogenetics. Acta Chemica Scandinavica, vol. 17, pp. 9-16.

Parry, L., Tanner, A. & Vinther, J. (2014). The origin of . Palaeontology, vol. 57, pp. 1091-1103. DOI: 10.1111/pala.12129.

Passamaneck, Y. J. & Halanych, K. M. 2006. Lophotrochozoan phylogeny assessed with LSU and SSU data: evidence of lophophorate polyphyly. Molecular Phylogenetics and Evolution, vol. 40, pp. 20–28. DOI: 10.1016/j.ympev.2006.02.001.

Pelman, Y. L. (1977). Early and Middle Cambrian inarticulate brachiopods of the Siberian Platform. Trudy Instituta Geologii i Geofiziki, Sibirskoe Otdelenie, vol. 316, pp. 1-168.

Peters, S. E. & McClennen, M. (2016). The Paleobiology Database application programming interface. Paleobiology, vol. 42, pp. 1-7. DOI: 10.1017/pab.2015.39.

Rambaut, A. (2017). FigTree-version 1.4. 3, a graphical viewer of phylogenetic trees.

Rambaut, A., Drummond, A. J., Xie, D., Baele, G., & Suchard, M. A. (2018). Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Systematic biology, vol. 67, pp. 901-904. DOI: 10.1093/sysbio/syy032.

Schram, F. (1991). Cladistic analysis of metazoan phyla and athe placement of fossil problematica. In: Simonetta, A. M. & Conway-Morris, S. (ed.) The Early Evolution of Metazoa and the Significance of Problematic Taxa. UK: Cambridge University Press, pp. 35-46

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, vol. 6, pp. 461-464.

Shoemaker, J. S. & Fitch, W. M. (1989). Evidence from nuclear sequences that invariable sites should be considered when sequence divergence is calculated. Molecular Biology and Evolution, vol. 6, pp. 270–289. DOI: 10.1093/oxfordjournals.molbev.a040550.

Som, A. (2014). Causes, consequences and solutions of phylogenetic incongruence. Briefings in Bioinformatics, vol. 16, pp. 536-548. DOI: 10.1093/bib/bbu015.

Sperling, E. A., Pisani, D., & Peterson, K. J. (2011). Molecular paleobiological insights into the origin of the Brachiopoda. Evolution & development, vol. 13, pp. 290-303. DOI: 10.1111/j.1525-142X.2011.00480.

Stamatakis, A. (2014). RAxML Version 8: A tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies. Bioinformatics, vol. 30, pp. 1312-1313. DOI: 10.1093/bioinformatics/btu033.

37

Swofford, D. L. (2003). PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts.

Tamura, K. & Nei, M. (1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution, vol. 10, pp. 512-526. DOI: 10.1093/oxfordjournals.molbev.a040023.

Tavaré, S. (1986). Some probabilistic and statistical problems in the analysis of DNA. Lectures on Mathematics in the Life Sciences, vol. 17, pp. 57–86.

Thayer, C. W. (1985). Brachiopods versus mussels: competition, , and palatability. Science, vol. 228, pp. 1527-1528. DOI: 10.1126/science.228.4707.1527.

Valentine, J. W., & Jablonski, D. (1983). Larval and patterns of brachiopod diversity in space and time. Evolution, vol. 37, pp. 1052-1061. DOI: 10.1111/j.1558- 5646.1983.tb05632.x.

Warnock, R. C., Parham, J. F., Joyce, W. G., Lyson, T. R., & Donoghue, P. C. (2015). Calibration uncertainty in molecular dating analyses: there is no substitute for the prior evaluation of time priors. Proceedings of the Royal Society B: Biological Sciences, vol. 282, pp. DOI: 10.1098/rspb.2014.1013.

Waterhouse, R. M., Seppey, M., Simão, F.A., Manni, M., Ioannidis, P., Guennadi, K., Kriventseva, E. V. & Evgeny M. Zdobnov, E. M. (2017). BUSCO applications from quality assessments to gene prediction and phylogenomics. Molecular Biology and Evolution, vol. 35, pp. 543-548. DOI: 10.1093/molbev/msx319.

Wilke, T., Schultheiß, R., & Albrecht, C. (2009). As time goes by: a simple fool's guide to molecular clock approaches in invertebrates. American Malacological Bulletin, vol. 27, pp. 25- 46. DOI: 10.4003/006.027.0203

Yang, Z. (1994). Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. Journal of , vol. 39, pp. 306-314.

Zhang, X.G., Siveter, D.J., Waloszek, D. & Maas, A. (2007). An epipodite-bearing crown- group from the Lower Cambrian. Nature, vol. 449, pp. 595-598. DOI: 10.1038/nature06138

38

APPENDIX: Methodological background

Collection, RNA extraction and sequencing.

After world-wide collection of specimens, whole brachiopods and tissues were preserved and protected in RNAlater at -20°C or -80°C, before being processed with an Invitrogen RNAqueous Micro Total RNA isolation kit. RNA was isolated and separated using silica centrifuge spin columns and purified using LiCl precipitation with a DNAse digestion step. The quality of the resulting RNA extract (quantification of RNA extraction amount and purity) was controlled using a NanoDrop spectrophotometer. Finally, cDNA libraries were created from raw mRNA by polyA selection and sequencing of 5-6 taxa was done using Illumina HiSeq 2500 lane with 50-60 million read depth coverage each.

Assembly of transcriptomes and phylogenomic matrix

As few reference genomes are available for Lingulida, transcriptomes were created using de- novo transcriptome assembly protocols. Illumina Hiseq 2500 paired short reads were assembled using Trinity assembly software (Grabherr et al., 2011) at Stanford’s Sherlock cluster and at Ludwig-Maximilians-Universität München (LMU). Data quality checks and standard sequences statistics, conversion of sequenced nucleic acids to amino acid were carried out using BUSCO gene set (Waterhouse et al., 2017). An ortholog gene set from a pre-existing multi- gene alignment (Kocot et al. 2016) was used to search for orthologs in new brachiopods sequences. Using a custom python script (https://github.com/wrf/supermatrix), Hidden Markov Model searches (HMMs) were generated using HMMer software (Eddy et al., 2018) for each gene partition in the phylogenomic matrix. Using a sequence similarity threshold, transcriptomes were successively analysed for matches. Then, trimmed matching sequences were aligned using MAFFT (Katoh & Standley, 2013), and overhanging aligned sections were trimmed to preserve the alignment length. The resulting single matrix and partition file output were saved in FASTA format and were then ready for phylogenetic analysis.

39

Examensarbete vid Institutionen för geovetenskaper ISSN 1650-6553