How Corals Got Bones: Comparative Genomics Reveals the Evolution of Coral Calcification

Dissertation by

Xin Wang

In Partial Fulfillment of the Requirements

For the Degree of

Doctor of Philosophy

King Abdullah University of Science and Technology

Thuwal, Kingdom of Saudi Arabia

September, 2018

EXAMINATION COMMITTEE PAGE

The dissertation of Xin Wang is approved by the examination committee.

Committee Chairperson: Manuel Aranda

Committee Members: Takashi Gojobori, Christian R. Voolstra, Frederic Marin, Didier Zoccola.

© September, 2018

Xin Wang

All Rights Reserved

ABSTRACT

How corals got bones

Comparative Genomics Reveals the Evolution of Coral Calcification

Xin Wang

Scleractinian corals represent the foundation of one of the most diverse and productive ecosystems on Earth, coral reefs. Corals not only constitute the trophic basis of these ecosystems but also provide essential habitat and shelter for a wide variety of marine species, many of which are commercially relevant. They also provide other important ecosystem services such as food provision, shoreline protection, and opportunities for ecotourism. Despite the ecological importance of corals, very little is known about how their soft-bodied ancestor evolved the ability to form a calcified skeleton and became the ecosystem builders they are today.

Corallimorpharia are closely related to reef-building corals but lack the ability to form calcified skeletons. Here we assembled and annotated draft genomes of two corallimorpharians, Amplexidiscus fenestrafer and Discosoma sp., and provide an online interface to facilitate the use of these resources. The two genomes not only fill the current evolutionary gap in genomic resources for the subclass Hexacorallia but also provide important resources for comparative genomic studies aimed at understanding the evolution of coral-specific traits. Our broad phylogenomic approach using whole-genome data, including phylogenetic analyses of nuclear-encoded genes as well as genome-wide presence/absence information and synteny conservation from six hexacorallian species, provides robust evidence that corallimorpharians are a monophyletic sister group of scleractinians, thereby rejecting the “naked coral” hypothesis.

As the closest non-calcifying relatives of scleractinian corals, corallimorpharians appear to be the best candidates for understanding the evolutionary origin of coral calcification. Molecular divergence analysis of scleractinian and corallimorpharian genes suggests that the soft-bodied ancestor of corals evolved the ability to calcify within approximately 80 million years after the divergence of these two orders. To uncover the molecular basis of coral skeletal formation and growth, we integrated genomic, transcriptomic, and skeletal proteomic data and show that gene and domain duplications have been the main evolutionary mechanisms underlying the evolution of calcification in scleractinian corals.

ACKNOWLEDGEMENTS

I would like to thank my supervisor, Dr. Manuel Aranda, for his help, guidance, and support over these years. He has set an excellent example as an easy-going friend, a knowledgeable instructor, and a rigorous scientist. I am lucky to be his Ph.D. student and have felt extremely comfortable working with him. Since I enrolled to pursue my Ph.D. at KAUST, his support has never faded. I still remember the days he spent revising my two manuscripts. Although my speaking and writing were unsatisfactory and sometimes even drove him crazy, he showed the kindest patience and helped me as much as he could.

As a bioinformaticist, I sometimes did not concentrate my analyses entirely on biological problems, and it was he who always gave me many suggestions and generously explained how I should proceed to complete a task efficiently. Without his help, I could not have accomplished my Ph.D. work.

I would also like to express my appreciation to my Ph.D. committee members, Dr. Takashi Gojobori, Dr. Christian R. Voolstra, Dr. Didier Zoccola, and Dr. Frederic Marin, for their comments and advice. Special thanks go to our postdoc, Yi Jin Liew. He is a talented researcher, and we collaborated on and co-authored many publications. I learned a lot from him, not only about bioinformatics but also about focusing analyses on biological concepts. I would also like to thank my friends and colleagues Dr. Jit Ern Chen, Dr. Yong Li, and Guoxin Cui, who taught me many biological concepts, and all my lab mates for creating an excellent environment. Many thanks to my friends Yujing Ouyang and Jian Cao for helping me revise my thesis.

Finally, my heartfelt gratitude is extended to my parents for their encouragement and to my friends for their patience and support.

TABLE OF CONTENTS

How Corals Got Bones

Comparative Genomics Reveals the Evolution of Coral Calcification

EXAMINATION COMMITTEE PAGE

ABSTRACT

ACKNOWLEDGEMENTS

TABLE OF CONTENTS

LIST OF ABBREVIATIONS

LIST OF ILLUSTRATIONS

LIST OF TABLES

Chapter 1 Introduction

1.1 Two hypothetical models of biomineralization

1.1.1 The biologically induced model

1.1.2 The biologically controlled model

1.2 The major processes involved in biologically controlled biomineralization

1.2.1 The ions – chemical reaction with four molecules

1.2.2 Ion delivery -- Para-cellular transport, trans-cellular transport, and particle attachment

1.2.3 The organic matrix

1.3 “Naked corals” – corallimorpharians

1.3.1 Corallimorpharia

1.3.2 “Naked coral” Hypothesis

1.4 Objectives and Contributions

Chapter 2 Draft genomes of two corallimorpharians, Amplexidiscus fenestrafer and Discosoma sp.

2.1 Abstract

2.2 Introduction

2.3 Final Results

2.3.1 Validation of assembled genomes

2.3.2 Functional annotation of predicted gene models

2.3.3 Higher similarity between Corallimorpharia and Scleractinia

2.3.4 Rapid expansion of repeat sequences in Discosoma sp.

2.4 Methodology

2.4.1 DNA, RNA extraction and sequencing

2.4.2 Sequence filtering and genome assembly

2.4.3 Identification and classification of repeats and transposable elements

2.4.4 Gene prediction

2.4.5 Gene functional annotation

2.4.6 Validation of assemblies and gene models

2.5 Discussion

2.6 Author’s contributions

2.7 Data availability

Chapter 3 Phylogenomic analyses using whole genome data from six Hexacorallia genomes reject “naked coral” hypothesis

3.1 Abstract

3.2 Introduction

3.3 Final Results

3.3.1 Nuclear encoded genes support scleractinian monophyly

3.3.2 Sequence independent analyses support scleractinian monophyly

3.4 Methodology

3.4.1 Orthologs/Paralogs group identification

3.4.2 Nuclear single-copy ortholog phylogenies

3.4.3 Presence/Absence analysis

3.4.4 Synteny-based phylogeny

3.5 Discussion

3.6 Author’s contributions

Chapter 4 The evolution of coral calcification

4.1 Abstract

4.2 Introduction

4.3 Results

4.3.1 Short divergence time of coral calcification

4.3.2 The evolution of plasma membrane calcium ATPase

4.3.3 The evolution of bicarbonate transporter

4.3.4 The evolution of carbonic anhydrase

4.3.5 Identification of core sets of skeleton organic matrix proteins (SOMPs)

4.3.6 Coral acid-rich proteins

4.3.7 Crystal lattice associated proteins

4.4 Methodology

4.4.1 Identification of scleractinian species-specific and expanding gene families

4.4.2 OM extraction for S. pistillata

4.4.3 Transcriptome analysis

4.4.4 Identification of ion transporters

4.4.5 Identification of core SOMPs

4.4.6 The evolution of ion-involved genes and SOMPs

4.4.7 Immunolocalization

4.5 Discussion

4.5.1 The co-option of genes as evolutionary mechanism of coral calcification

4.5.2 Domain shuffling might not have been important for the evolution of coral calcification

4.6 Author’s contributions

Chapter 5 Conclusion

5.1 Valuable corallimorpharian resources

5.2 The phylogenetic relationship between corallimorpharians and scleractinians

5.3 The cooption of coral calcification

APPENDICES

PUBLICATIONS

BIBLIOGRAPHY

LIST OF ABBREVIATIONS

OA Ocean acidification

ECM Extracellular matrix

NMR Nuclear magnetic resonance

COC Centers of calcification

BIM Biologically induced mineralization

BCM Biologically controlled mineralization

DIC Dissolved inorganic carbon (CO2, CO₃²⁻ or HCO₃⁻)

OM Organic matrix

NGS Next-generation sequencing

NCBI National Center for Biotechnology Information

EVM EvidenceModeler

Mya Million years ago

pH Potential of hydrogen

ML Maximum likelihood

BI Bayesian inference

Nt Nucleotide

aa Amino acid

PIGGs Pairs of incompatible groups of genomes

Ks Synonymous substitution per synonymous site

PMCA Plasma membrane calcium ATPase

TPM Transcripts per million

SLC Solute carrier protein

AEnd Aboral endoderm

CEct Calicoblastic Ectoderm

Co Coelenteron

Sk Skeleton

CA Carbonic anhydrase

SOMPs Skeletal organic matrix proteins

CARP Coral acid-rich protein

DE Asp and Glu

NIDO Nidogen

AMOP Adhesion-associated domain

VWD Von Willebrand factor D

VWA Von Willebrand factor type A

FA58C Extracellular F5/8 type C

GO Gene Ontology

KEGG Kyoto Encyclopaedia of Genes and Genomes

LIST OF ILLUSTRATIONS

Figure 1. Two corallimorpharians, Amplexidiscus fenestrafer and Discosoma sp.

Figure 2. Venn diagram representing the shared ortholog groups across five hexacorallian genomes. There is extensive conservation of ortholog groups (4,952) across all five genomes.

Figure 3. Chord diagram tallying which non-corallimorpharian species has the best-matching homolog to every corallimorpharian gene based on BLAST. The different colours correspond to the different species as per the inner ring. Numbers on the inner ring depict the number of genes, and the numbers on the outer ring indicate their percentage in the respective species.

Figure 4. Phylogenetic relationships according to the “naked coral” hypothesis. (SL) marks the putative evolutionary origin of Corallimorpharia from a complex coral ancestor through “skeleton loss”.

Figure 5. Phylogenetic analysis of single copy genes. A) Phylogenetic analysis based on amino acid sequences of single copy genes; B) Phylogenetic analysis based on nucleotide sequences of coding single copy proteins. Values on the nodes correspond to branch support from RAxML, MrBayes, and PhyloBayes. Node support values depicted here were identical across all evolutionary models used.

Figure 6. Phylogenetic analyses based on gene presence/absence. The node supports are AU (Approximately Unbiased) p-values and BP (Bootstrap Probability) values. The distance was calculated using binary methods and hierarchical cluster analysis with 1000 bootstraps using the average Hcluster method.

Figure 7. Phylogenetic analyses based on synteny conservation. The unrooted tree (represented as rooted, as in Figures 5 and 6, for clarity) represents the topology obtained by PhyChro for Delta 2-5. The values on the branches correspond to the total number of PIGGs (pairs of incompatible groups of genomes) contradicting the branch / the total number of PIGGs supporting the branch obtained for Delta 2-5.

Figure 8. A) The “classical” sequence-based phylogenetic approach. This approach determines phylogenetic relationships based on changes in amino acid (see Figure 5A) and/or nucleotide sequence (see Figure 5B). Fewer changes generally indicate a closer relationship between species. B) The presence/absence approach determines phylogenetic relationships based on the presence or absence of genes in the genome (see Figure 6). This approach assumes that more closely related species share more common genes (blue=present, red=absent). C) The synteny-based analysis determines phylogenetic relationships based on the conservation of gene order in the genome (see Figure 7). The farther species are from one another, the lower the conservation of gene order due to gene duplications, losses and genome rearrangements.

Figure 9. The main proteins and processes involved in coral calcification (from Zoccola et al., 2016). Calcium is transported from the calicoblastic cells to the calcification site by the Ca²⁺-ATPase pump in exchange for protons. Bicarbonate is transferred into the sub-calicoblastic space via a bicarbonate anion transporter (BAT). Carbonic anhydrase also catalyzes the conversion of CO2 into HCO₃⁻. The formation of the aragonite skeleton is controlled by the skeletal organic matrix.

Figure 10. Comparative genomics across seven published cnidarian genomes. A) Phylogeny of seven cnidarian genome sequences. The values on nodes indicate the estimated divergence time. B) Density of pair-wise genetic divergence calculated with synonymous substitution rate (Ks) among all possible orthologous genes across two scleractinians and two corallimorpharians. C) GO enrichment of scleractinian-specific duplicated proteins.

Figure 11. Plasma membrane calcium ATPase (PMCA). A) Phylogeny of PMCA across hexacorallian genomes. B) Synteny of PMCA. C) Expression of PMCA across homologs of PMCA in six genomes. D) Immunolocalization of PMCA1, PMCA2 and PMCA3 in A. fenestrafer, Discosoma sp., S. pistillata and A. viridis. AEnd = Aboral Endoderm; CEct = Calicoblastic Ectoderm; Co = Coelenteron; Sk = Skeleton. Nuclei are labeled in blue color. Red color represents autofluorescence of different PMCA.

Figure 12. Solute carrier 4 (SLC4). A) Phylogeny of SLC4 across six hexacorallians. B) SLC4β, SLC4γ and their adjacent proteins across six hexacorallian genomes. C) Immunolocalization labeled by SLC4β (left two), SLC4γ (right two, published by Zoccola et al., 2015) in S. pistillata. AEnd = Aboral Endoderm; CEct = Calicoblastic Ectoderm; Co = Coelenteron; Sk = Skeleton.

Figure 13. Carbonic anhydrases (CAs). A) Phylogeny of CAs across six hexacorallians. Orange labels represent the extracellular CAs, and blue labels represent intracellular CAs. B) Expression of CAs across homologs in Scleractinia. Orange labels represent the extracellular CAs, and blue labels represent intracellular CAs. C) Expression of CAs in different developing stages of A. digitifera.

Figure 14. Comparison of the proteins identified in four skeletal proteomic studies.

Figure 15. Coral acid-rich proteins (CARPs). A) Phylogenetic analysis of CARP4. B) The abundance of aspartic and glutamic acid residues across the homologs of CARP4s.

LIST OF TABLES

Table 1. Genome and annotation statistics for A. fenestrafer and Discosoma sp.

Table 2. Functional annotation of gene models from five cnidarian species.

Chapter 1 Introduction

Corals are the foundation species of one of the most biologically diverse and productive ecosystems on Earth. The aragonite skeletons produced by reef-building corals form extensive structures that support more species than any other marine habitat. Yielding 10¹² kg of calcium carbonate (CaCO3) per year and covering an area of about 284,300 km² (Milliman and Droxler, 1996), their hard crystalline skeletons shelter adjacent coastlines from surges and tropical storms. Nearly half a billion people live within 100 kilometers of coral reefs (NOAA coral reef) and benefit from their productivity and protection. Coral skeletons are also good bio-implants for bone surgery (Demers et al., 2002). Unfortunately, over the past decades, coral reefs have suffered from climate change and intensifying human activities. To better understand their potential for adaptation under future climate change scenarios (Adkins et al., 2003), it is fundamental to unravel the evolutionary history of coral calcification, which is key to resolving how the soft-bodied ancestor of corals became the ecosystem builder it is today.

Coral calcification, or biomineralization in general, is a global biological and geochemical process that forms external skeletons by transforming dissolved inorganic carbon and calcium into calcium carbonate (Bertucci et al., 2013). Numerous studies have investigated the molecular aspects of coral calcification, concentrating particularly on initial nucleation, subsequent CaCO3 crystal deposition, and the highly ordered spatial framework (Falini et al., 2015). Coral skeletogenesis builds the essential 3D scaffolding structures, which are made primarily of layer upon layer of calcium carbonate. Once coral skeletons form, they not only establish the major components of reef architecture but also play an indirect physiological role in the scattering and absorption of light that fuels symbiosis (Enríquez et al., 2005). Studying the process of coral calcification could also help us to better understand how corals responded to past environments and provide significant insights into the potential consequences of rapid ongoing climate change. The details of these biomineralization processes are gradually becoming better understood, but gaps in knowledge remain, including the question of how the soft coral tissue transfers soluble ions to the calcifying fluid and builds the hard extracellular calcareous 3D framework.

1.1 Two hypothetical models of biomineralization

The study of coral calcification began more than a hundred years ago and involves various fields such as structural biology, biochemistry, mineralogy, paleontology, and civil engineering (Goreau and Bowen, 1955). It started with the pioneering work of Dana on Porites lobata (Dana, 1846) and with studies on coral morphology during the nineteenth century (Ogilvie-Grant, 1897; Owen, 1857). Coral calcification has since been extensively investigated using environmental records and species (Barnes and Lough, 1996; Well et al., 1956). In the early studies, scientists mainly used coral morphology to name species rather than to investigate the mechanisms of coral calcification. While the complete mechanisms and pathways of calcification remain enigmatic in numerous aspects, there are two main hypotheses addressing the molecular process underlying stony coral calcification. They propose that the formation of diverse minerals is mediated either through “biologically induced mineralization (BIM)”, a physicochemically dominated model (Lowenstam, 1981), or through “biologically controlled mineralization (BCM)”, a model of mineralization mediated by the skeletal organic matrix (SOM) (Lowenstam, 1981).

1.1.1 The biologically induced model

The first hypothesis (BIM) suggests that the organism adjusts its local microenvironment to induce the physicochemical conditions, such as physical interactions and equilibrium saturation, required for the precipitation of mineral particles. These particles then nucleate and grow extracellularly as a result of the metabolic activity of the organism and subsequent interactions with ions and compounds in the surrounding physicochemical environment (Frankel and Bazylinski, 2003). According to this hypothesis, no specialized cellular or molecular mechanism is necessary to induce mineralization, and CaCO3 is usually precipitated outside the calicoblastic cells, with precipitation largely determined by the concentrations of ions and dissolved inorganic carbon (Hughes et al., 2003).

This process is especially significant in prokaryotes, fungi and some algae (7). The hypothesis was initially proposed by Kawaguti and Sakumoto to directly correlate the presence of light and zooxanthellae with the calcification process (Kawaguti, 1948), and was further defined as the physiological model of coral calcification by Goreau (Goreau, 1959). Many early studies subsequently supported this model. For example, Barnes stated that crystal growth alone determines the morphology of coral skeletons (Barnes, 1970), while Constantz claimed that the morphology and arrangement of coral skeletons are entirely predictable from inorganic kinetic factors of crystal growth, akin to inorganically precipitated marine cements (Constantz, 1986). Indeed, coral reefs are considered to be the most sensitive ecosystem with regard to changes in the environment (Hughes et al., 2003). The morphology and growth of aragonite fibers and their organization into bundles are completely predictable from factors controlling physicochemical crystal growth (Given and Wilkinson, 1985). Environmental factors such as light, temperature, pCO2, salinity, and turbidity all influence the calcification rate. Under this model, the calcification process depends largely on environmental conditions: the pH and the concentrations of calcium and carbonate ions at the calcification site provide a metastable state that determines the nucleation of the mineral phase.

1.1.2 The biologically controlled model

The BCM hypothesis is currently the favored model to explain the mechanism of coral calcification. It was first defined by Mann (Mann, 1983), who proposed that calcification is mainly controlled by the secreted skeletal organic matrix (SOM). This process is ubiquitous in calcifying eukaryotes and occurs mainly extracellularly, integrating structural establishment, chemical accumulation, and spatial arrangement. Corals possess an interface between their calicoblastic cells and skeleton, called the extracellular calcifying medium (ECM), into which calicoblastic cells secrete the compounds that constitute the OM and determine the orientation of the crystallographic axes and the crystal components. In other words, the biologically controlled mineralization process affects calcification at all levels, from crystal composition and structure to the morphological development of crystalline materials (crystal nucleation and growth).

Lately, there is growing evidence that the ability of corals to calcify is biologically controlled. For example, Stylophora (Mass et al., 2017) and Porites (McCulloch et al., 2017) corals maintained relatively stable levels of aragonite saturation even when they were placed into seawater that was sufficiently acidic to dissolve CaCO3. As support for the first hypothesis has weakened, the second has gained ground. Georgiou and coworkers (Georgiou et al., 2015) showed that corals living in highly dynamic environments exert strong physiological control on the carbonate chemistry of their calcifying fluid within the ECM. The combination of ultrahigh-resolution three-dimensional imaging and two-dimensional solid-state nuclear magnetic resonance (NMR) spectroscopy revealed that mineral deposition in Stylophora is indeed biologically driven (Von Euw et al., 2017). A growing number of studies aiming to understand the calcification mechanism in scleractinian corals have shifted to analyzing the biological control exerted by highly regulated OM components. Despite this progress, the second hypothesis is still insufficiently understood, as it requires complex intra- and extracellular mechanisms to explain the deposition of CaCO3.

1.2 The major processes involved in biologically controlled biomineralization

The biologically controlled model is based on two processes: the maintenance of ion supply and the catalysis of the calcification reaction by the OM. To better understand the evolutionary mechanisms of scleractinian coral calcification, we must unravel the evolutionary history of the proteins involved in these processes.

1.2.1 The ions – chemical reaction with four molecules

The result of coral calcification is the formation of a mineral skeleton made of calcium carbonate (aragonite). Coral calcification requires at least two fundamental ions (calcium ions and dissolved inorganic carbon ions) to form CaCO3 (reaction 1). However, the ratio of CO₃²⁻/HCO₃⁻ at the physiological pH of the calcifying fluid (between 7.5 and 9) is extremely low (5×10⁻⁴ to 5×10⁻²) (Ichikawa, 2007). Carbonate ions are therefore generally also produced from bicarbonate ions or through CO2 hydration (reactions 2 and 3 below). According to these reactions, coral calcification is an equilibrium process among calcium ions, bicarbonate ions, CO2, and hydrogen ions. Under certain conditions, these reactions can maintain a stable concentration of calcium carbonate at the site of coral calcification.

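(1) $\mathrm{Ca^{2+} + CO_3^{2-} \rightleftharpoons CaCO_3}$

(2) $\mathrm{Ca^{2+} + HCO_3^{-} \rightleftharpoons CaCO_3 + H^{+}}$

(3) $\mathrm{Ca^{2+} + H_2O + CO_2 \rightleftharpoons CaCO_3 + 2\,H^{+}}$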
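The pH dependence of the carbonate-to-bicarbonate ratio mentioned above can be illustrated with the Henderson-Hasselbalch relation for the second dissociation of carbonic acid. The following is a minimal sketch, not the calculation used in the cited work: the function name is hypothetical, and the default pK2 of 10.33 is an assumed textbook value for dilute water at 25 °C. The effective constant in seawater or calcifying fluid is lower, so absolute values will differ from those quoted in the text.

```python
def carbonate_to_bicarbonate_ratio(ph: float, pk2: float = 10.33) -> float:
    """Return [CO3^2-]/[HCO3^-] from the Henderson-Hasselbalch relation.

    pk2 is an assumed, illustrative constant (dilute water, 25 degrees C);
    it is not a value taken from the dissertation.
    """
    return 10.0 ** (ph - pk2)


if __name__ == "__main__":
    # Scan the pH range quoted for the calcifying fluid.
    for ph in (7.5, 8.0, 8.5, 9.0):
        ratio = carbonate_to_bicarbonate_ratio(ph)
        print(f"pH {ph:.1f}: CO3^2-/HCO3^- = {ratio:.2e}")
```

Whatever constant is assumed, each unit increase in pH raises the ratio tenfold, which is why removing protons from the calcifying fluid favors the formation of carbonate.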

Most scleractinian corals consist of multiple polyps that are linked together by the coenosarc and the calcium carbonate skeleton. The skeleton is not directly exposed to the surrounding water but is covered by tissue, i.e., the polyps and the coenosarc. Therefore, the growth of coral skeletons is determined not by the concentration of ions in the surrounding seawater but by their concentration in the calcifying fluid in the extracellular calcifying medium, which is situated between the calicoblastic ectoderm and the pre-existing skeleton. Determining the ion concentrations and pH of the calcifying fluid is the primary step toward understanding the mechanisms of coral calcification. There is considerable direct and indirect evidence that the calcium concentration and the pH in the calcifying fluid are higher than in the external seawater (McCulloch et al., 2017). Al-Horani et al. (Al-Horani et al., 2003) reported that the calcium concentration under the calicoblastic layer in Galaxea fascicularis was about 0.6 mM higher than that of the surrounding seawater. Recently, electric potentials of CO₃²⁻ and H⁺ have been directly measured inside the calcifying fluid of three scleractinian species, further supporting the finding of higher CO₃²⁻ concentrations and pH (Cai et al., 2016). Allison et al. estimated the concentrations of bicarbonate and carbonate ions in the dissolved inorganic carbon (DIC, including CO2, CO₃²⁻ or HCO₃⁻) system of the ECM (Allison et al., 2014), and Raybaud et al. measured the chemical reactions in the coral calcifying medium (Raybaud et al., 2017); both studies indicate higher concentrations of dissolved inorganic carbon in the ECM than in seawater. Using NMR, Von Euw et al. further demonstrated the existence of at least two different inorganic carbonate species (CO₃²⁻ or HCO₃⁻) in the interfacial regions (Von Euw et al., 2017).
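Elevated calcium and carbonate concentrations matter because they raise the aragonite saturation state of the calcifying fluid, commonly written as Ω = [Ca²⁺][CO₃²⁻]/Ksp. The sketch below is only an illustration under stated assumptions: the function and constant names are hypothetical, the solubility product and the seawater concentrations are approximate values for typical warm surface seawater rather than data from the studies cited above, and the 0.6 mM calcium elevation reported by Al-Horani et al. is treated as roughly 0.6 mmol/kg.

```python
def aragonite_saturation(ca: float, co3: float, ksp: float = 6.5e-7) -> float:
    """Saturation state Omega = [Ca2+][CO3^2-] / Ksp.

    Concentrations are in mol/kg. The default ksp (mol^2/kg^2) is an
    assumed, approximate value for aragonite in warm seawater.
    """
    return ca * co3 / ksp


# Illustrative inputs (assumptions, not measurements from the text).
SEAWATER_CA = 10.3e-3   # mol/kg, typical seawater calcium
SEAWATER_CO3 = 200e-6   # mol/kg, typical surface-seawater carbonate
CA_ELEVATION = 0.6e-3   # mol/kg, approximating the reported ~0.6 mM elevation

omega_seawater = aragonite_saturation(SEAWATER_CA, SEAWATER_CO3)
omega_fluid = aragonite_saturation(SEAWATER_CA + CA_ELEVATION, SEAWATER_CO3)

print(f"Omega (seawater):          {omega_seawater:.2f}")
print(f"Omega (elevated calcium):  {omega_fluid:.2f}")
```

Because Ω scales linearly with each concentration, the calcium elevation alone raises Ω by the same proportion as the calcium increase; any elevation in carbonate concentration would multiply on top of that.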

Ocean acidification (OA), the decline in seawater pH due to the uptake of CO2, is a major threat to calcifying organisms. Recent studies have investigated many responses to ocean acidification in laboratory and field experiments, such as morphological changes, the rate of calcification, and the pH at the calcifying site. Direct measurements of pH with fluorescent probes have consistently revealed that corals maintain a higher pH in the calcifying fluid than in the surrounding seawater (Holcomb et al., 2014; McCulloch et al., 2012; Venn et al., 2011; Venn et al., 2013). Comeau et al. showed that the pH in the sub-calicoblastic space is not the only driver of the response to OA in sub-tropical corals (Comeau et al., 2017). They also showed that corals with low rates of calcification would be less affected by changes in pCO2. McCulloch et al. investigated coral calcification under changing conditions and revealed that seawater pH and/or temperature, light, nutrients, and water movement are not conducive to producing the interactive effects between pH and DIC in the calcifying fluid (McCulloch et al., 2017). Because the reactions underlying the calcification process (reactions 2 and 3) release protons, corals have to export large amounts of protons from the calcifying fluid to maintain a high-pH mineralizing microenvironment (Venn et al., 2013).
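The proton load implied by reactions (2) and (3) is simple stoichiometry: one H⁺ is released per CaCO3 formed from bicarbonate and two per CaCO3 formed from CO2. The sketch below makes that arithmetic explicit. The function name and the `fraction_from_co2` mixing parameter are hypothetical, introduced only for illustration; the text does not state how the carbon supply is split between the two sources.

```python
def protons_released(caco3_mol: float, fraction_from_co2: float) -> float:
    """Moles of H+ released when caco3_mol of CaCO3 precipitates.

    Reaction (2), bicarbonate source: 1 H+ per CaCO3.
    Reaction (3), CO2 source:         2 H+ per CaCO3.
    fraction_from_co2 is a hypothetical mixing parameter in [0, 1].
    """
    if not 0.0 <= fraction_from_co2 <= 1.0:
        raise ValueError("fraction_from_co2 must lie between 0 and 1")
    from_bicarbonate = caco3_mol * (1.0 - fraction_from_co2) * 1
    from_co2 = caco3_mol * fraction_from_co2 * 2
    return from_bicarbonate + from_co2


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 1.0):
        h = protons_released(1.0, fraction)
        print(f"CO2 fraction {fraction:.1f}: {h:.1f} mol H+ per mol CaCO3")
```

Under either pathway, at least one proton must be removed for every CaCO3 deposited, which is consistent with the need for active proton export described above.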

1.2.2 Ion delivery -- Para-cellular transport, trans-cellular transport, and particle attachment

It has been widely suggested that seawater is the main source of ions for coral calcification. To reach a higher concentration in the calcifying fluid, ions must cross several tissue layers. According to the two classical models, the transport of ions from seawater across the calicodermis to the sub-calicoblastic ECM can be either active or passive. Para-cellular transport is driven passively by diffusion and seawater flow, whereas trans-cellular transport is active (or facilitated). Several previous studies suggested that trans-cellular transport might be the only mechanism of ion transport (Allemand et al., 2004; Hohn and Merico, 2012). However, experiments using the membrane-impermeable dye calcein showed that seawater entered the calcifying site directly without passing through the cytoplasm (Cuif et al., 2010; Tambutté et al., 2012). In this case, seawater acts as a diffusive or advective flux and exchanges all dissolved ions between itself and the calcifying sites. Measurements of the relative contributions of trans-cellular and para-cellular ions indicate that a combination of both models is the most likely mechanism for coral calcification, in which trans-cellular transport accounts for the majority of calcium and para-cellular transport mediates passive diffusion of calcium (Tambutté et al., 2012).

Recently, these two classical models were challenged by direct spectromicroscopic evidence. These new findings imply that coral aragonite crystals are formed not ion by ion from solution but by the attachment of amorphous CaCO3 particles from the tissue. The two classical models nevertheless remain relevant because, even with this particle attachment mechanism, carbonate and bicarbonate ions still need to be provided by ion transporters and injected into vesicles to supply and stabilize the amorphous precursors of coral calcification (Mass et al., 2017).

1.2.3 The organic matrix

Although the formation of precursors of calcification and further fibrous growth could be explained by the saturation of calcium and bicarbonate ions, the regulation of ion transport is not the factor that determines skeleton morphology. In corals, three matrices are involved in spatial organization: the organic ECM, which facilitates cell-cell and cell-substrate adhesion; the skeletal OM, which facilitates controlled deposition of the CaCO3 skeleton; and the CaCO3 skeleton itself, which provides the structural support for the 3D organization of coral colonies (Helman et al., 2008). The OM stabilizes and assembles amorphous CaCO3 into skeletal structures, dictating the arrangement of different types of skeletal growth (Falini et al., 2015; Lowenstam and Weiner, 1989).

The chemical composition of the OM is crucial to understanding coral calcification. The fully developed OM contains the molecules in both the skeleton and the interface between the tissue and the skeleton (Falini et al., 2015). Currently, the majority of knowledge about OM molecules comes from the soluble OM of the coral skeleton. Using X-ray analysis, Wainwright reported the first biochemical component of the OM, chitin (a polysaccharide), and estimated that the OM represents only 0.01-1% of the skeletal weight (Wainwright, 1963). Subsequently, numerous biochemical analyses of the skeletons of several corals showed that OM proteins are characteristically rich in acidic amino acids, especially aspartic acid (Young, 1971). The major components of the OM in various corals were identified as phospholipids (Isa and Okazaki, 1987) and saccharides (Adamiano et al., 2014; Bilan and Usov, 2001; Cuif et al., 2003). A high content of sulfur-containing proteoglycans was also discovered in the soluble OM of three scleractinian species (Cuif et al., 2003).

The high abundance of acidic amino acids in OM proteins suggested a role in binding calcium for accumulation and delivery during calcification (Puverel et al., 2005), whereas aspartic acid–rich proteins have been suggested to alter the thermodynamic equilibrium of the growth surface (Teng et al., 1998). These calcium-binding proteins, extracted from skeletal phospholipids, are proposed to interact with carbonate to precipitate CaCO3 and to serve as templates for CaCO3 nucleation (Watanabe et al., 2003). Saccharides, in combination with acidic proteins, are key elements in the control of biomineral growth (Isa and Okazaki, 1987). The secretion of proteoglycans is predicted to be associated with amorphous CaCO3 and crystallization, where fibers and centers of calcification (COC) are produced (Cuif et al., 2008). Galaxin proteins are potentially involved in forming disulfide bonds that provide a structural framework for the matrix (Reyes-Bermudez et al., 2009). Other roles of the OM are also possible. For example, OM proteins include calcium-binding proteins that are directly involved in the control of crystal formation (Clapham, 2007; Zhou et al., 2013), as well as the enzyme carbonic anhydrase, which directly catalyzes the conversion between CO2 hydrate and CO3^2- (Moya et al., 2008b).

Previous biochemical studies mainly focused on the OM extracted after demineralization of the powdered skeleton, but these extraction methods might cause the loss of certain proteins during OM preparation. As an advance, a new proteomics approach has been developed that integrates both transcriptomes and genomes. The proteomic analysis of the Seriatopora sp. skeleton identified 36 coral skeletal organic matrix proteins (SOMPs), including adhesion proteins, actin, tubulin, collagens, etc. (Drake et al., 2013). The combination of proteomic and transcriptomic analyses in Acropora millepora also identified 36 SOMPs, which are thought to regulate crystal deposition; this study further suggested that co-option and domain shuffling might be general evolutionary mechanisms of coral calcification (Ramos-Silva et al., 2013b). Further analysis of RNA-seq data from 20 stony corals also revealed some novel gene families involved in biomineralization (Bhattacharya et al., 2016). Skeletal proteome and expression analyses identified 30 SOMPs in different developmental stages of Acropora digitifera, suggesting a stepwise evolution of coral skeleton formation (Takeuchi et al., 2016).

Calcification is thought to have arisen independently on multiple occasions during evolution. Some proteins involved in calcification have conserved homologs in other cnidarian lineages, while others have evolved to be unique to the scleractinian lineage. Our approach to understanding the evolutionary history of coral calcification is to use comparative genomics to explore the different aspects of coral biomineralization. Similar comparisons using sea anemones, such as Nematostella and Aiptasia, have already indicated that most genes involved in skeleton deposition, such as the genes involved in ion transport, have homologs in anemones and are therefore not restricted to Scleractinia, which potentially provides new perspectives on the evolution of calcification. However, our understanding of coral calcification is still impeded by the large evolutionary gap (>500 Mya) between corals and sea anemones. Corallimorpharians, in contrast to anemones, are thought to be the closest living non-calcifying relatives of scleractinians, so their genomes are expected to be more informative for unraveling the evolutionary origin of coral calcification genes.

1.3 “Naked corals” – corallimorpharians

1.3.1 Corallimorpharia

Corallimorpharians, also known as “false” corals, are closely related to reef-building corals of the order Scleractinia but lack the ability to form CaCO3 skeletons. They are widely distributed, ranging from the tropical Indo-Pacific and East Africa to polar areas, but preferentially live in shallow waters with weak currents (Leen et al., 2013). Their vibrant appearance, together with their relative ease of maintenance in salt-water aquarium systems, is the primary reason for their popularity among reef aquarium enthusiasts (Murray and Watson, 2014).

Amplexidiscus fenestrafer, one of the more common Corallimorpharia species found in the aquarium trade, was first described as the giant cup mushroom in 1980 by Hamner and Fautin (Hamner and Fautin, 1980). Morphologically, A. fenestrafer, like anemones, lacks a calcified skeleton but possesses an oral disc on top of its body or upper surface, similar to stony corals. At the edges of its oral disc, A. fenestrafer has bare areas that appear fenestrated, hence the species name (Animal-World Reference: Marine and Reef). Notably, corresponding to its oral regions and flexible disc, A. fenestrafer is the largest species of Corallimorpharia and can grow to a diameter of 45 cm or more (Animal-world). A. fenestrafer specimens are quite adaptable to extreme light and water turbulence, but usually dwell in the shallow waters of the tropical Western Indo-Pacific Ocean. They cannot actually move around and always adhere to substrates such as dead corals or live rock, but can also be found in sandy areas (Animal-world). Similar to most corallimorpharians, they obtain the majority of their nutrients from symbiotic zooxanthellae. They also actively capture prey, such as small and flesh with their mushroom tentacles, bead-like verrucas and spirocysts by closing up around their prey and subsequently forming into short-lobed tentacles (Encyclopedia of life).

Discosoma sp. (synonym Actinodiscus), also belonging to the order Corallimorpharia, were first discovered by Ruppel and Leuckart in 1828 (Gewin, 2002). They were previously known as mushroom corals, mushrooms, mushroom polyps, and disc anemones (Gosse, 1860). In terms of morphology, Discosoma sp. possesses a surface that is covered by small bumps, warts or tentacles, and a mouth that protrudes in the middle (Gosse, 1860). Discosoma sp. are widely distributed on nearly all reefs from the tropical Indo-Pacific and East Africa to the Central Pacific Ocean, living in habitats that range from shallow, low-current waters to mid-depth ocean (Tullock, 2008). However, the most favorable environment for Discosoma is one with a moderate level of light and gentle water movement. When exposed to intense light, they turn brown, gradually fade, and die. Because they prefer shaded reef areas, most Discosoma are found on dead corals, on rocks, or between coral heads (Tullock, 2008).

Here, we sequenced the genomes of Amplexidiscus fenestrafer and Discosoma sp., two members of the order Corallimorpharia that represent the closest non-calcifying relatives of reef-building corals. These genomes constitute important genomic resources for future comparative studies that contrast the individual orders within Hexacorallia, thus providing a better insight into the distinct evolutionary histories and innovations of these organisms.

1.3.2 “Naked coral” Hypothesis

Phylogenetic analyses reconstruct evolutionary relationships between taxa based on genetic similarities. Numerous genomic resources, such as mitochondrial, plastid and nuclear genomes, or coding amino acid and nucleotide sequences that evolve under different selective pressures, are commonly used to determine phylogenetic relationships and to reconstruct phylogenetic trees.

Corallimorpharia are marine cnidarians that belong to the subclass Hexacorallia and are therefore closely related to Scleractinia (Parker, 1982). As corallimorpharians share many morphological and life history traits with corals, but do not possess the calcareous skeletons that provide the basis for the bio-construction of reef ecosystems, they have been classified as “naked corals”. The highly similar morphology of corallimorpharian and scleractinian polyps casts doubt on whether having a skeleton is an essential attribute of Scleractinia (Stanley and Fautin, 2001a). Like stony corals, corallimorpharians possess small tentacle bumps that are laden with stinging nematocysts. However, similar to the majority of sea anemones, corallimorpharians possess a pedal disc that adheres to the substrate and an oral disc on top. Despite these morphological similarities and differences, the taxonomic relationships within Hexacorallia, especially between Corallimorpharia and Scleractinia, are still controversial (Gladfelter, 1988). This evolutionary relationship is the key to understanding the evolution of coral calcification and the potential ability of corals to respond to current threats imposed by climate change and human activities. It is widely acknowledged that Scleractinia can be subdivided into two major clades, “Complex” and “Robust” Scleractinia (Chen and Yu, 2000; Le Goff-Vitry et al., 2004; Ramos-Silva et al., 2013a; Romano and Cairns, 2000; Romano and Palumbi, 1996; Stolarski et al., 2011).

However, two different scenarios have been proposed to explain the relationship between Corallimorpharia and Scleractinia. The “naked coral” hypothesis was first proposed by Stanley and Fautin (Stanley and Fautin, 2001a) based on the theory that corals have lost and redeveloped skeletons repeatedly during the middle Triassic. A study of 17 hexacorallian whole mitochondrial genomes by Medina et al. (Medina et al., 2006) strongly supported the “naked coral” hypothesis. Their findings indicated that Corallimorpharia were likely derived from a complex scleractinian ancestor that lost the ability to form an aragonite skeleton due to the high CO2 levels present during the Cretaceous period. Despite the strong support for this hypothesis provided by mitochondrial genome-based phylogenetic analyses, other studies using different markers and techniques reported contradicting results (Aranda et al., 2016; Fukami et al., 2008).

Recently, various studies (Kayal et al., 2013; Kitahara et al., 2014; Lin et al., 2016; Lin et al., 2014; Stolarski et al., 2011) have addressed this discrepancy in detail by analyzing mitochondrial nucleotide- and amino acid-based alignments using different evolutionary models, showing that certain models allowed the recovery of Scleractinia as a monophyletic group even when using mitochondrial markers. Although these newer studies kept challenging the idea of a potential complex coral origin of Corallimorpharia, the dearth of genome sequences did not allow rigorous testing of the hypothesis using non-sequence-based phylogenomic approaches. To overcome these limitations, we used a multi-pronged approach including phylogenetic analyses of nuclear-encoded genes, as well as genome-wide presence/absence information and synteny conservation of genomes from seven hexacorallian species. They are the actiniarians Nematostella vectensis (Putnam et al., 2007) and Aiptasia pallida (Baumgarten et al., 2015), the corallimorpharians Amplexidiscus fenestrafer and Discosoma sp. (Wang et al., 2017b), as well as the complex scleractinian Acropora digitifera (Shinzato et al., 2011) and the robust scleractinian Stylophora pistillata (Voolstra et al., 2017), with Hydra magnipapillata (Chapman et al., 2010) as the outgroup.

1.4 Objectives and Contributions

My Ph.D. project aims to unravel the evolutionary history of the hexacorallian orders to further elucidate the evolutionary history of coral calcification.

To achieve these aims, the objectives of the project are set out as follows:

• To sequence, assemble, validate and annotate draft genomes of the Corallimorpharia species A. fenestrafer and Discosoma sp., providing valuable resources for the hexacorallian research community (Chapter 2).

• To resolve the evolutionary relationship between reef-building corals and corallimorpharians by employing phylogenetic approaches based on single-copy orthologs, gene presence/absence as well as synteny information. The two-pronged approach of using sequence-based and sequence-independent analyses will allow us to test if scleractinian corals are monophyletic and if Corallimorpharia are indeed their closest, non-calcifying living relatives (Chapter 3).

• To unravel the evolutionary origin of molecules controlling calcium and bicarbonate ion supply for calcification in corals. Analysis of conserved calcification genes, their domain architecture and their phylogenetic relationships across the hexacorallian genomes will assist in resolving how these proteins evolved to support coral calcification. Further, expression data and immunolocalization will be used to identify the potential enrichment and location of these molecules in Scleractinia (Chapter 4).

• To identify a core set of organic matrix molecules and determine how these proteins evolved to enable the soft-bodied ancestor of corals to form a skeleton. To determine the core set of SOMPs that were likely present in the ancestor of contemporary reef-building coral, we sequenced the proteomic constituents of the S. pistillata skeletal organic matrix (SOM) and integrated this data with three previous proteomic analyses from other coral species. To trace the origin of genes involved in scleractinian coral calcification, we analyzed the phylogenetic topologies of the core set of SOMPs. To understand the evolutionary innovations of ancestor genes, we also compared the amino acid compositions and domain architectures of the orthologs of core SOMPs (Chapter 4).

Chapter 2 Draft genomes of two corallimorpharians, Amplexidiscus fenestrafer and Discosoma sp.

Running title: Draft genomes of two Corallimorpharia

Xin Wang1, Yi Jin Liew1, Yong Li1, Didier Zoccola2, Sylvie Tambutte2, Manuel Aranda1,*

1King Abdullah University of Science and Technology (KAUST), Red Sea Research Center (RSRC), Biological and Environmental Sciences & Engineering Division (BESE), Thuwal, 23955-6900, Saudi Arabia 2Centre Scientifique de Monaco, 8 quai Antoine Ier, Monaco, 98000, Monaco

This manuscript has been published with Molecular Ecology Resources (Published 13 April 2017)


2.1 Abstract

Corallimorpharia are the closest non-calcifying relatives of reef-building corals of the order Scleractinia. Aside from their popularity in aquaria, their evolutionary position between sea anemones and hard corals makes them the best candidates for understanding the order of evolutionary events in Hexacorallia, and in calcifying corals in particular.

Here we have sequenced and assembled draft genomes of two Corallimorpharia, A. fenestrafer and Discosoma sp. The draft genomes encompass 370 Mbp and 445 Mbp respectively and encode 21,372 and 23,199 genes. To facilitate future studies using these resources, we provide annotations for the predicted gene models at two levels: at the gene level, by annotating gene models with the function of the best-matching homolog and with GO terms when available; and at the protein domain level, where gene function can be better verified through the conservation of the sequence and order of protein domains.

Further, we provide an online platform (http://corallimorpharia.reefgenomics.org), which includes a BLAST interface as well as a genome browser to facilitate the use of these resources. We believe that these two genomes are important resources for future studies on hexacorallian systematics and the evolutionary basis of their specific traits such as the symbiotic relationship with dinoflagellates of the genus Symbiodinium or the evolution of calcification in reef-building corals.

2.2 Introduction

Next generation high-throughput DNA sequencing technologies (NGS) have become very popular over the past decade, as these revolutionary innovations opened up the possibility of tackling bigger biological questions, such as the complexities of genome architectures (Koboldt et al., 2013). Since the complete sequencing of the human genome in 2003, more than 14,000 genomes have been published and uploaded to the US National Center for Biotechnology Information (NCBI) genome repositories (Goodwin et al., 2016). Because it is highly efficient, rapid and cheap, high-throughput DNA sequencing is being applied by researchers to an increasingly diverse range of biological problems (Shendure and Ji, 2008). Assembled genomes are fundamental to computational biology, as successful assemblies provide the resources to study gene content, regulatory regions, or evolutionary relationships.

For the scientific community, corallimorpharians are especially interesting due to their close phylogenetic relationship to corals. While the availability of the genomes of the sea anemones A. pallida (Baumgarten et al., 2015) and N. vectensis (Putnam et al., 2007) and the coral A. digitifera (Shinzato et al., 2011) provided valuable insights into the genomic basis of hexacorallian traits, our understanding of the evolution of more specific traits, such as calcification in reef-building corals, is still hampered by the large evolutionary gap (> 500 Mya) in the genomic resources available. Here, we sequenced the genomes of Amplexidiscus fenestrafer and Discosoma sp. (Figure 1), two members of the order Corallimorpharia that represent the closest non-calcifying relatives of reef-building corals. These genomes constitute valuable genomic resources for future comparative studies that contrast the individual orders within Hexacorallia, thus providing a better insight into the distinct evolutionary histories and innovations of these organisms.

Figure 1. Two corallimorpharians, Amplexidiscus fenestrafer and Discosoma sp.

2.3 Final Results

2.3.1 Validation of assembled genomes

We generated a total of 124 Gb (334x) and 142 Gb (319x) of sequence data for A. fenestrafer and Discosoma sp. respectively using the Illumina HiSeq2000 platform (Table S1). After quality trimming, filtering and digital normalization, we retained approximately 81.16x and 149.96x coverage for the final assemblies respectively, where a larger proportion of mate pair reads were filtered for A. fenestrafer than Discosoma sp. (Table S1). The assembled genome sizes for A. fenestrafer (370 Mbp) and Discosoma sp. (445 Mbp) were in close agreement with the k-mer based estimates of 350 Mbp and 428 Mbp respectively (Table 1, Table S2). After gap-filling with GapCloser, the A. fenestrafer genome had contig and scaffold N50s of 20 Kbp and 510 Kbp respectively; for Discosoma sp., contig and scaffold N50s were 19 Kbp and 770 Kbp respectively (Table 1).
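The contig and scaffold N50 values reported above follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The following is a minimal sketch of that calculation, not the assessment pipeline used in this chapter; the function name `n50` and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to at least
        # half of the total assembly length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


# Hypothetical toy scaffold lengths (bp); these are NOT values from the
# A. fenestrafer or Discosoma sp. assemblies.
toy_scaffolds = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_scaffolds)
```

Applied to the real contig or scaffold lengths of an assembly, the same function would give values comparable to those listed in Table 1, provided that the same set of sequences is used.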

To validate our assembled genomes, one library of paired-end data for each corallimorpharian was back-mapped to the assembly. 76.49% (A. fenestrafer) and 86.28% (Discosoma sp.) of the Illumina paired-end reads could be mapped concordantly. For both genomes, ≥ 99% of the sequence had coverage of at least 2x (Table S3). To assess the extent of contig-level duplication we performed a BLASTN search of all contigs against themselves for both species. Comparison to other available anthozoan genomes showed fewer duplicated contigs for both corallimorpharian genomes. Further, we found less duplication in A. fenestrafer than in Discosoma sp., which possibly contributes to the observed differences in estimated and assembled genome sizes (Figure S1).

Assessment of genome completeness using CEGMA confirmed that 85.48% and 82.66% of core genes were either completely or partially present in A. fenestrafer and Discosoma sp. respectively, which is close to the completeness of the available genomes of the coral A. digitifera and the sea anemone A. pallida (Table S4).

2.3.2 Functional annotation of predicted gene models

Data from ab initio gene prediction (AUGUSTUS, SNAP, GlimmerHMM), protein-based homology (Spaln), and transcript evidence (Trinity, PASA) were integrated with EvidenceModeler (EVM) to produce consensus gene sets for both organisms (Table S5). For A. fenestrafer, 21,372 high confidence gene models were predicted with a gene density of 5.78 genes per 100 Kbp and an average of 6.5 exons per gene; for Discosoma sp., 23,199 high confidence gene models were predicted with a gene density of 5.22 genes per 100 Kbp and an average of 6 exons per gene (Table 1). Annotation of these gene models against Swiss-Prot, TrEMBL and nr resulted in predicted gene functions being assigned to 84.6% and 83.5% of genes in A. fenestrafer and Discosoma sp. respectively, which is similar to the annotation rates in the other three hexacorallian genomes (Table 2).
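As a quick consistency check, the gene densities quoted above can be recomputed from the gene counts and the assembled genome sizes. The sketch below assumes the rounded genome sizes given in the text (370 Mbp and 445 Mbp), so the results agree with the reported values only to within rounding; the function name `genes_per_100kbp` is illustrative.

```python
def genes_per_100kbp(gene_count, genome_size_bp):
    """Gene density expressed as genes per 100 Kbp of assembled sequence."""
    return gene_count / (genome_size_bp / 100_000)


# Gene counts are taken from the text; genome sizes are the ROUNDED
# assembly sizes from the text, so small deviations are expected.
afen_density = genes_per_100kbp(21_372, 370e6)  # A. fenestrafer
disc_density = genes_per_100kbp(23_199, 445e6)  # Discosoma sp.

print(f"A. fenestrafer: {afen_density:.2f} genes per 100 Kbp")
print(f"Discosoma sp.:  {disc_density:.2f} genes per 100 Kbp")
```

Both values land within 0.01 of the densities reported in Table 1; the residual difference for Discosoma sp. is what one would expect from using a genome size rounded to the nearest Mbp.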

To assess the accuracy of our predicted gene models, we performed a comparative assessment of the gene models from all five hexacorallian species (A. fenestrafer, Discosoma sp., A. digitifera, N. vectensis and A. pallida) using several metrics: gene length (Figure S2A), exon length (Figure S2B), exon count (Figure S2C) and GC content of coding regions (Figure S2D). As expected from the smaller evolutionary distance, both corallimorpharians produced distributions that are more similar to each other than to any of the other three non-corallimorpharians.

To further evaluate the quality of our gene models we identified their closest homolog in the Swiss-Prot database via BLASTP and assessed the proportion of the gene model that is covered by its putative homolog. Comparison of the coverage results against the previously published Hexacorallia gene models showed very similar gene coverage distributions, with the exception of N. vectensis, as its gene models are present in the Swiss-Prot database (Figure S3).

Analysis of known protein domains encoded in the genomes using the Pfam annotation identified 16,045 and 16,966 protein domains in A. fenestrafer and Discosoma sp. (Table 2), and the proportions of annotated gene models (75.2% and 83.5% respectively) are consistent with the other three cnidarians (Table 2). The analysis of domain abundances highlighted WD domain, G-beta repeat and Ankyrin domains as the most abundant domains, similar to the other hexacorallian genomes (Figure S4). Further domains showing significant enrichment in Corallimorpharia were hemopexin, zona pellucida-like domain, DEAD/DEAH box helicase domain and putative glycoside hydrolase xylanase, among others (Figure S5).

2.3.3 Higher similarity between Corallimorpharia and Scleractinia

While Corallimorpharia are commonly accepted to be the sister group of Scleractinia, it remains unclear whether, from the corallimorpharian perspective, scleractinians are genetically closer to them than actiniarians. Using the gene models encoded in the five hexacorallian genomes available, we investigated the overlap of genes between both corallimorpharians and the other non-corallimorpharians. We measured conservation at the ortholog group level (Figure 2, Table S6) as well as at the sequence level using BLAST (Figure 3). For both measures, the overlap between both corallimorpharians and A. digitifera was more extensive than the overlap with either of the two actiniarians (Figure 2, 3). These findings likely reflect the smaller evolutionary distance between corals and corallimorpharians, since gene gains or losses as well as sequence divergence are expected to increase proportionally with evolutionary distance.
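The sequence-level comparison described above amounts to asking, for every corallimorpharian gene, which non-corallimorpharian species contributes its best BLAST hit, and then counting hits per species. The following is a minimal sketch of such a tally. It assumes that "best" means highest bit score and that the species can be read from a prefix of the subject identifier; both are illustrative assumptions, as are all identifiers and scores in the toy data.

```python
from collections import Counter


def best_hit_species(blast_rows, species_of):
    """Count, per species, how many queries have their best hit in it.

    blast_rows: iterable of (query_id, subject_id, bit_score) tuples.
    species_of: function mapping a subject_id to a species label.
    """
    best = {}
    for query, subject, bit_score in blast_rows:
        # Keep only the highest-scoring subject for each query
        # (assumption: the best match is defined by bit score).
        if query not in best or bit_score > best[query][1]:
            best[query] = (subject, bit_score)
    return Counter(species_of(subject) for subject, _ in best.values())


# Hypothetical hits: the identifiers and scores are invented for illustration.
rows = [
    ("afen_g1", "adig_001", 250.0),
    ("afen_g1", "nvec_042", 180.0),
    ("afen_g2", "apal_007", 90.0),
    ("afen_g2", "adig_113", 120.0),
    ("afen_g3", "nvec_300", 75.0),
]
tally = best_hit_species(rows, lambda subject: subject.split("_")[0])
```

In this toy example two of the three query genes have their best match in the coral (`adig`) and one in `nvec`. A chord diagram such as Figure 3 visualizes exactly this kind of per-species count.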

2.3.4 Rapid expansion of repeat sequences in Discosoma sp.

Homology and structure based searches via Dfam and Repbase identified 666,894 and 1,000,975 repeat elements for A. fenestrafer and Discosoma sp., which constituted 113.5 Mbp and 167.9 Mbp (30.7% and 37.8%, respectively) of sequences in the assembled A. fenestrafer and Discosoma sp. genomes (Table 1). Despite the recent evolutionary split, the two genomes harboured different proportions of repeat element categories (Table S7). The predominant types of transposable elements in the two corallimorpharian genomes were DNA/PIF-Harbinger, LINE/L2, and LINE/Penelope (Table S8). Interestingly, we observed that the Discosoma sp. genome appears to contain a higher proportion of DNA transposons (6.97%) and SINEs (2.5%) than A. fenestrafer (5.13% and 2.04% respectively, Table S7). The prevalence of repeat sequences in Discosoma sp. potentially contributes to the genome size differences observed in comparison to A. fenestrafer: the additional 54.4 Mbp of repeat sequences in the former is close to the 74.9 Mbp size disparity that separates these two corallimorpharian genomes.
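The argument that repeat expansion accounts for much of the genome size difference rests on simple arithmetic, which can be made explicit. The sketch below uses only figures quoted in the text; the genome sizes are the rounded 370 Mbp and 445 Mbp values, so the recomputed repeat fractions may differ from the reported percentages in the last digit.

```python
# Repeat content and genome sizes in Mbp, as quoted in the text.
afen_repeats, disc_repeats = 113.5, 167.9
afen_genome, disc_genome = 370.0, 445.0   # rounded assembly sizes
size_disparity = 74.9                     # disparity as stated in the text

# Fraction of each assembly annotated as repeats.
afen_repeat_pct = afen_repeats / afen_genome * 100
disc_repeat_pct = disc_repeats / disc_genome * 100

# Extra repeat sequence in Discosoma sp. and the share of the genome
# size disparity that it could account for.
extra_repeats = disc_repeats - afen_repeats
share_of_disparity = extra_repeats / size_disparity * 100

print(f"Repeat fraction: {afen_repeat_pct:.1f}% vs {disc_repeat_pct:.1f}%")
print(f"Extra repeats in Discosoma sp.: {extra_repeats:.1f} Mbp")
print(f"Share of size disparity: {share_of_disparity:.0f}%")
```

Under these figures, the additional repeat sequence corresponds to roughly three quarters of the size difference between the two assemblies.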

This is further corroborated by self-alignments of the A. fenestrafer and Discosoma sp. genomes with MUMmer, which indicate far more partial duplications (with contiguous length of > 3 Kbp) in Discosoma sp. than in A. fenestrafer: when tallied, these regions totaled 5,281 Kbp in the former versus 928 Kbp in the latter (Figure S6A, 6B). A. fenestrafer and Discosoma sp. contained one and seven matches respectively that were > 10 Kbp, and these regions tend to be annotated as repeat regions. For example, the largest duplication in Discosoma sp. (12,967 bp) showed homology to known retrotransposons, and contained the LTR/Gypsy repeat element.

Figure 2. Venn diagram representing the shared ortholog groups across five hexacorallian genomes (Discosoma sp., A. fenestrafer, A. digitifera, A. pallida and N. vectensis). There is extensive conservation of ortholog groups (4,952) across all five genomes.

Figure 3. Chord diagram tallying which non-corallimorpharian species has the best-matching homolog to every corallimorpharian gene based on BLAST. The different colours correspond to the different species as per the inner ring. Numbers on the inner ring depict the number of genes, and the numbers on the outer ring indicate their percentage in the respective species.

2.4 Methodology

2.4.1 DNA, RNA extraction and sequencing

Samples of A. fenestrafer and Discosoma sp. were maintained at the Centre Scientifique de Monaco in aquaria supplied with flowing seawater from the Mediterranean Sea (exchange rate 2% h⁻¹) at a salinity of 38.2 ppt, pH 8.1 ± 0.1 under an irradiance of 300 µmol photons m⁻² s⁻¹ at 25 ± 0.5 °C. Individuals were fed 3 times a week with both frozen krill and live Artemia salina nauplii. DNA for sequencing libraries was extracted from A. fenestrafer and Discosoma sp. tissue samples using a nuclei isolation approach to minimize contamination with symbiont DNA. Briefly, cells were harvested using a Water Pick in 50 ml of 0.2 M EDTA solution refrigerated at 4 °C. Extracts were passed sequentially through a 100 µm and a 40 µm cell strainer (Falcon®) to eliminate most of the Symbiodinium. Then extracts were centrifuged at 2,000g for 10 min at 4 °C. The supernatant was discarded and the resulting pellets were homogenized in lysis buffer (G2) of the Qiagen Genomic DNA isolation kit (Qiagen, Hilden, Germany). The DNA was extracted following the manufacturer's instructions using genomic-tip 100/G. DNA concentration was determined by O.D. with an Epoch Microplate Spectrophotometer (BioTek, Winooski, VT, USA). Contamination with Symbiodinium DNA was assessed via PCR targeting the multicopy gene RuBisCO (Genbank accession number AY996050). Total RNA extraction was performed as described previously for corals (Moya et al., 2008a). Briefly, total tissue from individuals maintained at the above described culture conditions was snap frozen in liquid nitrogen and ground into powder in a cryogrinder (Freezer/Mill 6770, Spex Sample Prep®). Total RNA was subsequently extracted with Trizol® Reagent (Invitrogen) and quantified on a Bioanalyzer 2100 (Agilent, Santa Clara, CA). Strand specific sequencing libraries were generated using the NEBNext Ultra Directional RNA Library Prep Kit for Illumina (NEB, Ipswich, MA, USA).

Sequencing libraries were prepared using the Illumina TruSeq DNA kits for paired-end or mate-pair libraries respectively according to the manufacturer’s instructions. A total of 2 paired-end and 4 mate-pair libraries were generated for each species and sequenced on the Illumina HiSeq 2000 platform at the KAUST Bioscience Core Facility. All data were uploaded to NCBI and are available under Bioproject IDs: PRJNA354436 (A. fenestrafer) and PRJNA354492 (Discosoma sp.).

2.4.2 Sequence filtering and genome assembly

A total of 1,199 and 1,411 million raw reads were sequenced for A. fenestrafer and Discosoma sp. respectively. These included 621 and 815 million paired-end as well as 578 and 596 million mate-pair reads respectively. All reads were processed to remove low quality reads (< Q30 and < 30 bp), adapter sequences and duplicated reads using Trimmomatic v0.32 (Bolger et al., 2014). Mate-pair reads were additionally evaluated by mapping against the assembled genomes using Bowtie2 v2.1.0 (Langmead and Salzberg, 2012). Potential contamination from Symbiodinium sources was removed by mapping all paired-end reads to the S. minutum (Shoguchi et al., 2013) and S. microadriaticum (Aranda et al., 2016) genomes via Bowtie2 using default parameters. Mate-pair reads were not filtered for contamination due to low mapping rates to both Symbiodinium genomes (< 1%).

Genome sizes for A. fenestrafer and Discosoma sp. were estimated using Jellyfish (Marçais and Kingsford, 2011) and KmerFreq_AR (Luo et al., 2012). For both genomes, the frequency distributions of the k-mers revealed the presence of many k-mers with low coverage. To reduce this considerable sequence heterogeneity, a three-pass digital normalization approach was applied, as described in Khmer v1.4 (Crusoe et al., 2015). Briefly, the paired-end libraries were initially normalized to a k-mer coverage of 20 (-C 20, -x 4e9), followed by removal of low abundance k-mers and a final normalization to a coverage of 10 (-C 10, -x 4e9). Construction and scaffolding of contigs were carried out using ALLPATHS-LG (Butler et al., 2008) with HAPLOIDIFY and estimated genome sizes of 400 Mbp. Ambiguous and erroneous bases were identified through mapping of filtered pre-normalized paired-end reads against the assembled genome using Bowtie2 and subsequently corrected using SAMtools v1.2 (Li et al., 2009) and a custom Perl script. Finally, gaps were filled within the genome scaffolds with GapCloser v1.12 (Luo et al., 2012) using stringent parameters (-l 125, -p 31) and all trimmed, filtered pre-normalized paired-end and mate-pair reads. Assembly statistics were assessed as previously described (Bradnam et al., 2013). In order to identify and remove scaffolds originating from bacterial or viral contaminants, BLASTN searches (e-value 10^-5) were carried out against two bacterial databases (NCBI complete and draft bacterial genomes) and a viral database (NCBI complete viral genomes) (Pruitt et al., 2006). Scaffolds with a high degree of contaminating sequence (coverage > 50%, bitscore > 1000 and e-value < 10^-20) were subsequently removed from the assemblies.
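The contaminant-removal criteria above can be illustrated with a short sketch. The text does not specify how coverage was computed, so the version below makes an explicit assumption: coverage is the fraction of a scaffold covered by the union of hits that individually pass the bit score and e-value thresholds. The function names and the toy hits are hypothetical and do not reproduce the scripts used for the assemblies.

```python
def covered_fraction(intervals, scaffold_length):
    """Fraction of a scaffold covered by the union of 1-based, inclusive
    (start, end) alignment intervals."""
    covered = 0
    current_end = 0
    for start, end in sorted(intervals):
        # Skip the part of this interval that was already counted.
        start = max(start, current_end + 1)
        if end >= start:
            covered += end - start + 1
            current_end = end
    return covered / scaffold_length


def is_contaminant(hits, scaffold_length,
                   min_cov=0.5, min_bits=1000.0, max_evalue=1e-20):
    """Flag a scaffold using the thresholds quoted in the text.

    hits: iterable of (start, end, bit_score, evalue) on the scaffold.
    Assumption: only hits passing both score thresholds count towards coverage.
    """
    strong = [(start, end) for start, end, bits, evalue in hits
              if bits > min_bits and evalue < max_evalue]
    return covered_fraction(strong, scaffold_length) > min_cov


# Hypothetical BLASTN hits against a bacterial database for a 1,000 bp scaffold.
toy_hits = [
    (1, 600, 1500.0, 1e-50),
    (550, 700, 1200.0, 1e-30),
    (800, 900, 200.0, 1e-5),   # too weak, ignored
]
flagged = is_contaminant(toy_hits, scaffold_length=1000)
```

In the toy case the two strong hits overlap and jointly cover positions 1 to 700, that is 70% of the scaffold, so it would be flagged for removal. A scaffold with only the weak third hit would be retained.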

2.4.3 Identification and classification of repeats and transposable elements

A combination of homology and ab initio-based methods was used to identify interspersed repeats and low complexity DNA sequences. To identify repeat boundaries and construct species-specific consensus models, de novo identification was carried out with RMBlast as implemented in RepeatModeler v1.0.8 (http://www.repeatmasker.org/RepeatModeler.html). Subsequently, RepeatScout, TRF and Recon were used to identify de novo repeats, which produced a consensus library file. RepeatMasker was then run using this high-quality custom repeat library.

Subsequently, RepBase v20.02 (Bao et al., 2015) and Dfam v1.4 (Wheeler et al., 2013) were used to identify and classify different categories of repetitive elements. A hierarchical system was utilized to classify de novo repeat elements into five different categories: DNA transposon, LTR, SINEs, LINEs, and simple repeats. Based on the different categories of repeat elements, coverage distributions and repeat divergences for each species were calculated. A Perl script that extracted results from RepeatMasker was written to calculate the percentage of nucleotide divergence of each transposable element to the consensus sequence from the respective genomes.

2.4.4 Gene prediction

After repeat masking, a combination of ab initio, homology-based and expression-based prediction strategies was used to produce the final set of gene models. To generate draft transcriptomes for our gene prediction pipeline, high-quality RNA from both organisms was extracted to produce cDNA sequencing libraries. These libraries (insert size of 180 bp) were sequenced on the Illumina HiSeq2000 platform with a paired-end read length of 101 bp. A total of 52,772,546 and 37,876,678 paired-end reads were sequenced for A. fenestrafer and Discosoma sp., respectively, of which 52,771,850 and 37,875,788 passed filtering and trimming at Q20. Filtered RNA-seq reads were used for de novo assemblies of the A. fenestrafer and Discosoma sp. reference transcriptomes using Trinity v2.02 (Evans et al., 2012), and PASA (Haas et al., 2003) was used to align the newly assembled transcripts to the respective genomes. Full-length transcripts, defined as transcripts containing start and stop codons, 5’ and 3’ untranslated regions (UTRs) and a minimum of two exons (4,539 A. fenestrafer and 3,739 Discosoma sp. transcripts passed all criteria), were identified and used as the training set for three ab initio gene predictors: Augustus v3.0.3 (Stanke et al., 2006), GlimmerHMM v3.02 (Kelley et al., 2012) and SNAP v2013-02-16 (Korf, 2004). To refine exon-intron boundaries via Spaln v2.1.4 (Gotoh, 2008), gene models were also aligned against protein sequences from UniProt and from other cnidarians, i.e. N. vectensis (Putnam et al., 2007), A. pallida (Baumgarten et al., 2015), A. digitifera (Shinzato et al., 2011) and H. magnipapillata (Chapman et al., 2010).

The final consensus gene models were generated using EVM (Haas et al., 2008), which integrated the predictions of the various methods. Gene models with more than two stop codons or < 20 bp in length were subsequently removed. tRNAs were predicted via tRNAscan-SE v1.3.1 (Lowe and Eddy, 1997), while rRNAs were identified by aligning rRNA sequences from the Rfam v12.0 database (Nawrocki et al., 2014) against the genomes using BLASTN with cutoffs of e-value < 10^-5, identity > 85% and match length > 50 bp.
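As an illustration of the rRNA cutoffs above, the following sketch filters BLASTN hits in the standard 12-column tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). It is not the script used in this study; the function name `filter_rrna_hits` is hypothetical, and the use of tabular output is an assumption.

```python
def filter_rrna_hits(lines, max_evalue=1e-5, min_identity=85.0, min_length=50):
    """Keep hits with e-value < 1e-5, identity > 85% and length > 50 bp."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        pident = float(fields[2])    # percent identity
        length = int(fields[3])      # alignment length in bp
        evalue = float(fields[10])   # expectation value
        if evalue < max_evalue and pident > min_identity and length > min_length:
            # Record (rRNA query, genome scaffold, identity, length, e-value).
            kept.append((fields[0], fields[1], pident, length, evalue))
    return kept
```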

2.4.5 Gene functional annotation

To infer putative functions of protein coding genes, gene models were subjected to successive BLASTP searches against the SwissProt, TrEMBL and nr databases as previously described (Liew et al., 2014). Domain annotations were derived from different databases including Pfam, PRINTS, PROSITE, ProDom, and SMART using InterProScan (Zdobnov and Apweiler, 2001). Gene Ontology (GO) annotations were obtained from BLASTP results against SwissProt and TrEMBL. Additional functional information was derived via pathway analysis based on homology to characterized pathways in the Kyoto Encyclopaedia of Genes and Genomes (KEGG) (Moriya et al., 2007). Detailed annotation files can be retrieved from http://corallimorpharia.reefgenomics.org. To ensure the consistency of the annotations, the same annotation pipeline was run on the protein coding gene sets of the other anthozoans used in the subsequent comparative analysis. To assess potential protein domain expansions or contractions in the corallimorpharian lineage, Fisher’s exact tests (p < 0.001) were conducted on the relative Pfam domain counts, comparing the in-group (A. fenestrafer and Discosoma sp.) with the outgroup (all other cnidarians in the analysis). The resulting p-values were corrected using the false discovery rate (FDR) (Benjamini and Hochberg, 1995). To visualize the significantly enriched Pfam domains, z-scores were calculated and plotted in a heatmap using gplots (in R) (Warnes, 2016).
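The domain enrichment test can be illustrated with a small, dependency-free Python sketch of a two-sided Fisher's exact test followed by Benjamini-Hochberg correction. It is an illustration only, not the code used in this study. The layout of the 2x2 table (counts of one Pfam domain versus all other domains, in-group versus outgroup) is an assumption, as are the function names.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: a, b = in-group counts of the domain and of all other
    domains; c, d = the same counts for the outgroup.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at most as likely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Return FDR-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A domain would then be reported as expanded or contracted when its adjusted p-value falls below the chosen threshold.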

2.4.6 Validation of assemblies and gene models

The completeness and accuracy of our assembled genomes were assessed by mapping all filtered, normalized paired-end libraries to the final genome assemblies using Bowtie v2.1.0 (Li and Durbin, 2009) with default parameters. Coverage and mapping rates were calculated using SAMtools and downstream Perl scripts. As assemblies of highly heterogeneous organisms have been shown to be prone to duplicate contig artifacts arising from unresolved bubbles in the de Bruijn graphs (Kajitani et al., 2014), within-species BLASTN analyses of the assembled contigs were carried out to assess the contribution of such erroneously duplicated contigs to our assemblies. Briefly, best matches between two unique contigs were identified using BLASTN. The resulting matches were tallied for each contig to produce a plot of the bitscore ratio of the second best hit to the best hit of each contig, in order to estimate the proportion of putative artificially duplicated contigs. The general completeness of our assemblies was assessed using CEGMA (Parra et al., 2007). To estimate the overall accuracy of the genome annotations, genic properties, e.g. mean GC%, gene length, exon length, and number of exons, were calculated via Perl and R scripts and contrasted with the N. vectensis, A. pallida, A. digitifera, and H. magnipapillata genome annotations.
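The bitscore-ratio check for duplicated contigs can be sketched as follows. This is a hypothetical illustration rather than the script used here. It assumes that the all-vs-all BLASTN output includes each contig's hit to itself, so that the best hit is usually the self-hit and the second best hit is the most similar other contig; a ratio close to 1 would then flag a putative duplicate.

```python
def second_to_best_ratios(hits):
    """hits: iterable of (query, subject, bitscore) from an all-vs-all search.

    Returns {query: second-best bitscore / best bitscore} for every query
    with at least two distinct subjects.
    """
    best_per_subject = {}
    for query, subject, bits in hits:
        per_query = best_per_subject.setdefault(query, {})
        # Keep only the highest-scoring alignment for each query/subject pair.
        per_query[subject] = max(bits, per_query.get(subject, 0.0))
    ratios = {}
    for query, subjects in best_per_subject.items():
        scores = sorted(subjects.values(), reverse=True)
        if len(scores) >= 2:
            ratios[query] = scores[1] / scores[0]
    return ratios
```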

2.5 Discussion

Corallimorpharians, also termed “false corals”, are very popular in the aquarium hobby due to their appealing colors and easy maintenance. More importantly, they are the closest non-calcifying relatives of reef-building corals. Their phylogenetic position between sea anemones and stony corals bridges a large evolutionary gap of > 500 million years that separates Actiniaria (sea anemones) from Scleractinia (stony corals). The genomes of the sea anemones Nematostella vectensis and Aiptasia pallida as well as the stony coral Acropora digitifera have been invaluable for the study of reef-building corals. However, the large evolutionary distance between the actiniarian and scleractinian orders has hampered further progress in the study of more specific traits, such as the independent evolution of symbiosis in the different cnidarian lineages or the ability of reef-building corals to calcify, two processes for which a deeper understanding is becoming increasingly important in light of climate change and ocean acidification.

Here, we assembled and annotated two draft genomes of corallimorpharians, A. fenestrafer and Discosoma sp. The assemblies encompass ~370 Mbp and ~445 Mbp, respectively, and encode 21,372 and 23,199 genes. Our stringent validation of these genomic resources shows completeness and quality similar to the genomes available for Aiptasia pallida and Acropora digitifera, further corroborating their suitability for comparative studies. We find that the corallimorpharian genomes are slightly larger than the published genomes of Actiniaria but smaller than those of Scleractinia, and that both corallimorpharian assemblies have a higher scaffold N50 than previously published hexacorallian genomes. Interestingly, genome content analyses indicate that the difference in genome size between the two closely related corallimorpharian genomes is likely caused by a higher proportion of transposable elements and genome duplications in Discosoma sp. than in A. fenestrafer. Cross-comparisons based on shared orthologous groups and the best-matching homologue of every corallimorpharian gene indicate high conservation across these genomes, as well as a higher similarity of Corallimorpharia to Scleractinia than to Actiniaria. Moreover, to facilitate the use of these resources we provide download links on our website, http://corallimorpharia.reefgenomics.org, in addition to a BLAST interface and a genome browser for exploring our data.

Given the alarming need and heightened awareness due to the most recent global bleaching event, we believe that our work is of timely significance to the field. We expect these genomic resources to be of great value for future comparative genomic studies analysing the evolution of Hexacorallia species and their specific traits, such as the symbiotic relationship with dinoflagellates of the genus Symbiodinium or the evolution of calcification in reef-building corals.

2.6 Author’s contributions

M.A. conceived and designed the study. W.X. assembled and analysed the genomes with help from Y.J.L., Y.L. and M.A. S.T. and D.Z. provided materials. W.X., Y.J.L. and M.A. wrote the manuscript with input from all authors.

2.7 Data availability

Raw sequence libraries produced for the assembly of the genomes, including whole genome and RNA-seq data, are accessible under BioProject IDs PRJNA354436 (A. fenestrafer) and PRJNA354492 (Discosoma sp.). The sequenced genomes, transcriptomes, annotations and Perl scripts are downloadable at http://corallimorpharia.reefgenomics.org.


Table 1. Genome and annotation statistics for A. fenestrafer and Discosoma sp.

                                      A. fenestrafer    Discosoma sp.
Genome
  Estimated genome size (Mbp)         350               428
  Number of scaffolds                 5,631             6,449
  Scaffold length (bp)                370,076,467       444,930,386
  Scaffold N50 (bp)                   510,298           769,877
  Number of contigs                   30,993            39,613
  Contig length (bp)                  305,666,947       364,304,111
  Contig N50 (bp)                     20,058            18,746
  GC content (%)                      39.30             39.97
Gene
  Number of gene models               21,372            23,199
  Average gene length (bp)            7,194             6,907
  Gene density (genes per 100 Kbp)    5.78              5.22
  Number of exons                     139,123           138,376
  Average exon length (bp)            218               226
  Number of introns                   117,751           115,177
  Average intron length (bp)          1,047             1,119
  Overall GC content (%)              36.70             37.09
Repeat
  Total repeat length (Mbp)           113.5             167.9
  Repeat proportion (%)               30.7              37.8
Noncoding
  Total tRNAs                         2,936             4,775
  Total rRNAs                         752               1,066

Table 2. Functional annotation of gene models from five cnidarian species.

                     A. fenestrafer   Discosoma sp.   A. digitifera   A. pallida   N. vectensis
Total gene number    21,372           23,199          23,668          29,269       27,273
SwissProt            12,867           13,408          16,025          18,876       20,125
TrEMBL               3,684            4,233           2,519           4,117        3,496
nr                   1,534            1,677           1,428           2,052        3,652
No hit               3,279            3,826           3,551           4,224        0
Pfam                 16,045           16,966          15,102          19,499       18,171
PANTHER              13,276           13,888          15,813          19,582       20,001
GO                   16,552           17,642          18,546          22,997       23,621
KO                   7,356            7,592           9,310           10,619       11,402


Chapter 3 Phylogenomic analyses using whole genome data from six Hexacorallia genomes reject “naked coral” hypothesis

Xin Wang1, Guénola Drillon2, Taewoo Ryu3, Christian R. Voolstra1, Manuel Aranda1

1 Red Sea Research Center, Division of Biological and Environmental Science and Engineering, King Abdullah University of Science and Technology (KAUST), Saudi Arabia.

2 Laboratoire de Biologie Computationnelle et Quantitative, Sorbonne Universités, UPMC-Univ P6, CNRS, IBPS, UMR 7238, 4 Place Jussieu, Paris, 75005, France

3 APEC Climate Center, Busan 48058, South Korea

This manuscript has been published in Genome Biology and Evolution (accepted 25 September 2017, published 25 September 2017).

3.1 Abstract

Reef-building corals are the engineers of coral reefs, one of the most diverse and productive ecosystems on earth. However, our understanding of how reef-building corals evolved the ability to calcify and became the ecosystem builders they are today is hampered by unresolved relationships within their subclass Hexacorallia. Some phylogenetic analyses suggested that Corallimorpharia might have evolved from a skeleton-forming coral ancestor that lost the ability to calcify in response to increased ocean acidification, thus implying an evolutionary exit mechanism for corals to escape extinction through aragonite skeleton dissolution during periods of increased ocean acidification. Here we employed a phylogenomic approach using whole genome data from six hexacorallian species to resolve the evolutionary relationship between reef-building corals and their non-calcifying relatives. Phylogenetic analysis based on 1,421 single-copy orthologs, as well as gene presence/absence and synteny information, converged on the same finding, showing strong support for scleractinian monophyly and a corallimorpharian sister clade. Our broad phylogenomic approach using sequence-based and sequence-independent analyses provides unambiguous evidence for the monophyly of scleractinian corals and rejects the hypothesis that corallimorpharians are descendants of a complex coral ancestor.

3.2 Introduction

Scleractinian corals form the large calcium carbonate structures that constitute the foundation of the coral reef ecosystem. Their evolutionary history traces back to the early Triassic around 245 Ma ago, a time of high diversification within this order when multiple coral clades, including the two extant “Complex” and “Robust” clades, appear in the fossil record for the first time (Park et al., 2012a). However, in contrast to the fossil evidence, molecular analyses of the evolutionary history of Scleractinia remain inconclusive, with some extant deep-water families suggesting that the evolutionary roots of this order potentially date as far back as ~425 Ma ago. The precise phylogenetic position of scleractinian corals within the Hexacorallia has also remained elusive due to contradicting phylogenies derived from analyses using different molecular markers and evolutionary models. Yet, understanding the evolutionary history of these organisms is imperative if we aim to understand their resilience in light of climate change.

Of special interest is the phylogenetic relationship of scleractinian corals to the order Corallimorpharia (Kitahara et al., 2014). According to TimeTree, the calcifying Scleractinia diverged from the Corallimorpharia approximately 334 Mya (Medina et al., 2006; Park et al., 2012a; Simpson et al., 2011b; Stolarski et al., 2011) and from the Actiniaria around 569.4 Mya (Park et al., 2012a; Schwentner and Bosch, 2015). The divergence between A. digitifera and S. pistillata is estimated at 250.1 Mya (Park et al., 2012a; Simpson et al., 2011b). Based on phylogenetic analyses of complete mitochondrial genomes, it was proposed that Corallimorpharia evolved approximately 110–132 Ma ago from a complex coral ancestor that lost its calcium carbonate skeleton in response to the increased oceanic CO2 prevalent during this time period. This finding provided strong support for the so-called “naked coral” hypothesis, which was first coined by Stanley and Fautin (2001b) and proposes that corals have lost and re-evolved skeletons repeatedly during the middle Triassic (Figure 4). The importance of this hypothesis lies in its implications as a potential mechanism for corals to escape extinction from aragonite skeletal dissolution during periods of increased CO2 levels such as those projected by future climate change scenarios.

[Figure 4: tree diagram with the tips Anthomedusae, Actiniaria, Scleractinia (Robust), Scleractinia (Complex) and Corallimorpharia; SL marks the "naked coral" branch leading to Corallimorpharia.]

Figure 4. Phylogenetic relationships according to the “naked coral” hypothesis. (SL) marks the putative evolutionary origin of Corallimorpharia from a complex coral ancestor through “skeleton loss”.


3.3 Final Results

3.3.1 Nuclear encoded genes support scleractinian monophyly

Phylogenetic analyses of nuclear encoded genes were performed on a set of single-copy orthologs both on the amino acid (aa) and nucleotide (nt) level. To this end, we first identified a suitable set of single copy orthologs (1421) (Table S9) using OrthoMCL (Li et al., 2003) and the genome encoded protein sets of N. vectensis, A. pallida, A. fenestrafer, Discosoma sp., A. digitifera and S. pistillata as well as the hydrozoan Hydra magnipapillata which served as an outgroup. Phylogenetic analyses of aa and nt were performed using Maximum Likelihood (ML) as well as Bayesian Inference (BI) based methods in combination with different evolutionary models.

For the aa-based analysis, we concatenated aligned sequences from 1,021 selected single-copy orthologs that passed the filtering process, providing a supermatrix with 179,381 aa positions for phylogenetic reconstruction. ML trees were constructed with RAxML using the LG+I+G+F model (Figure S7A), as determined by ProtTest (Darriba et al., 2011). BI analyses were performed with MrBayes using the LG+I+G+F model (Figure S7B) as well as with PhyloBayes using the CAT-LG (Figure S8A), CAT-GTR (Figure S8B), and CAT-Poisson models (Figure S8C). The topologies of the ML and BI based trees were identical and consistently showed maximum support for all nodes, independent of the substitution model used (Figure 5A). All trees recovered Scleractinia as a monophyletic group with Corallimorpharia as its sister group. We also generated independent trees for each of the orthologous genes and obtained a consensus tree (majority rule) (Figure S9). Interestingly, 96% of the single-ortholog trees supported the monophyly of Corallimorpharia, but only 62% of the trees supported the monophyly of Scleractinia.
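The proportion of single-gene trees that support the monophyly of a group can be tallied as in the sketch below. This is a minimal illustration rather than the procedure used in this study. It assumes that each gene tree has been rooted on the outgroup and is represented as nested tuples of taxon names; the function names are hypothetical.

```python
def leaf_sets(tree):
    """Yield the frozenset of leaf names below every node of a nested-tuple tree.

    The last set yielded for any (sub)tree is its complete leaf set.
    """
    if isinstance(tree, str):
        yield frozenset([tree])
        return
    all_leaves = frozenset()
    for child in tree:
        child_sets = list(leaf_sets(child))
        yield from child_sets
        all_leaves |= child_sets[-1]
    yield all_leaves


def is_monophyletic(tree, clade):
    """True if some node of the rooted tree contains exactly the given taxa."""
    return frozenset(clade) in set(leaf_sets(tree))


def support_fraction(trees, clade):
    """Fraction of rooted gene trees in which the clade is monophyletic."""
    return sum(is_monophyletic(t, clade) for t in trees) / len(trees)
```

For example, `support_fraction(gene_trees, {"S. pistillata", "A. digitifera"})` would give the share of trees supporting scleractinian monophyly for a list `gene_trees` of such rooted trees.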

For the nt-based analysis, we concatenated 1,255 selected single-copy orthologs that passed filtering, providing a total of 668,245 positions. Similar to the aa analysis, we used ML and BI based methods and tested different evolutionary models in order to account for potential biases due to long-branch effects. Based on initial model tests using jModelTest2, we selected GTR+I+G as the best model for further ML analyses with RAxML (Figure S10A) and BI analyses with MrBayes (Figure S10B). PhyloBayes was also used to infer phylogenetic trees using the CAT-Poisson (Figure S11A) and CAT-GTR models (Figure S11B). All nucleotide-based analyses (Figure 5B) recovered the same topology as the aa-based trees, with maximum bootstrap and posterior probability support for all nodes independent of the method or model used.

[Figure 5: panels A and B show the same topology: (((Stylophora pistillata, Acropora digitifera), (Discosoma sp., Amplexidiscus fenestrafer)), (Aiptasia pallida, Nematostella vectensis)), with Hydra magnipapillata as outgroup; all nodes are labelled 100/1/1.]

Figure 5. Phylogenetic analysis of single copy genes. A) Phylogenetic analyses based on amino acid sequences of single copy genes; B) Phylogenetic analysis based on nucleotide sequences of coding single copy proteins. Values on the nodes correspond to branch support from RAxML, MrBayes, and PhyloBayes. Node support values depicted here were identical across all evolutionary models used.

3.3.2 Sequence independent analyses support scleractinian monophyly

The use of sequence data to resolve phylogenetic relationships can produce controversial results when analyzing deep evolutionary splits, due to biases such as long-branch attraction or the choice of evolutionary model, even in the presence of whole genome information (Philippe et al., 2011). To address this problem, we also performed sequence-independent phylogenetic analyses using presence/absence information of ortholog groups as well as synteny conservation.

Based on our ortholog/paralog analysis, we identified 21,718 ortholog/paralog groups that were used to generate a distance matrix in R using the binary method. Based on this matrix we reconstructed putative phylogenetic relationships through hierarchical clustering of the calculated distances across the six Hexacorallia genomes and our outgroup H. magnipapillata. To provide further statistical support for the nodes, we calculated the approximately unbiased probability and bootstrap probability using pvclust and 1,000 bootstraps. The recovered topology was identical to that of the phylogenetic analyses using nuclear encoded single-copy orthologs and showed very high statistical support for all branches (Figure 6). A tree supporting scleractinian monophyly was also recovered with strong statistical support when the presence/absence data were analysed with the binary F81-like model implemented in MrBayes, after applying an ascertainment bias correction by removing genes present in fewer than two species (Figure S12). We further extended our phylogenetic analyses to the number of paralogs identified in the different ortholog/paralog groups. This analysis recovered the same topology as the gene presence/absence tree when distances were calculated using the “canberra” method and clustered using “average” Hcluster and 100 bootstraps (Figure S13).

[Figure 6: cluster dendrogram with AU/BP values (%) for S. pistillata, A. digitifera, Discosoma sp., A. fenestrafer, A. pallida, N. vectensis and H. magnipapillata; vertical axis: height.]

Figure 6. Phylogenetic analyses based on gene presence/absence. Node supports are AU (Approximately Unbiased) p-values and BP (Bootstrap Probability) values. Distances were calculated using the binary method, followed by hierarchical cluster analysis with 1000 bootstraps using the average Hcluster method.

In order to derive further sequence-independent evidence, we analyzed synteny conservation across the six genomes using the phylogenetic reconstruction tool PhyChro. Due to the phylogenetic distance and the dearth of conserved synteny groups in the H. magnipapillata genome, we omitted the outgroup from this analysis. Briefly, we first identified synteny blocks using SynChro (Drillon et al., 2014) with different block stringency parameters (Delta 1-6). In the following step, we analyzed the different outputs using PhyChro to identify "incompatible" block adjacencies between all genome pairs and from them deduce “pairs of incompatible groups of genomes” (PIGGs). A common tree was found for Delta 2-5 (Delta 1 and 6 yielded only partial reconstructions), showing strong support for Scleractinia as a monophyletic group. The scleractinian group (S. pistillata and A. digitifera) was, as expected, the weakest group in this analysis, and it was the last to be identified, after the Actiniaria and Corallimorpharia groups, in the recursive tree reconstruction (Figure 7).

[Figure 7: tree with the tips Acropora digitifera, Stylophora pistillata, Amplexidiscus fenestrafer, Discosoma sp., Aiptasia pallida, Nematostella vectensis and Hydra magnipapillata; branch labels 1/5 (Scleractinia), 1/75 (Corallimorpharia) and 0/28 (Actiniaria).]

Figure 7. Phylogenetic analyses based on synteny conservation. The unrooted tree (represented as rooted as in Figures 5 and 6 for more clarity) represents the topology obtained by PhyChro for Delta 2-5. The values on the branches correspond to the total number of PIGGs (pairs of incompatible groups of genomes) contradicting the branch / the total number of PIGGs supporting the branch obtained for Delta 2-5.

3.4 Methodology

3.4.1 Orthologs/Paralogs group identification

In order to prepare the gene models for the subsequent comparative analyses, we discarded alternative isoforms and retained only the longest one. Further, we removed proteins with fewer than ten amino acids or low sequence quality (more than 20% stop codons or more than 20% non-standard amino acids). This resulted in 26,908, 29,253, 21,327, 23,144, 23,232, 27,385 and 32,338 genes for N. vectensis, A. pallida, A. fenestrafer, Discosoma sp., A. digitifera, S. pistillata, and H. magnipapillata, respectively. After preparing the final gene and corresponding protein sets, we ran OrthoMCL v2.0.9 (Li et al., 2003) using an e-value cut-off of 10^-5 to create groups of orthologs and paralogs across all seven genomes, which were subsequently assigned to the latest OrthoMCL-DB v4 (http://orthomcl.cbil.upenn.edu) (Chen et al., 2006) for further validation. We retained only the single-copy orthologs with a common Pfam domain or BLAST annotation, which resulted in a set of 1,421 groups for the subsequent phylogenetic analyses.
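The isoform selection and quality filter described above can be sketched as follows. This is an illustrative reading of the criteria, not the script used in this study. It assumes that stop codons appear as `*` in the protein sequences, that any character outside the 20 standard amino acids counts as non-standard, and that the function names are hypothetical.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def passes_quality(seq, min_len=10, max_stop_frac=0.2, max_nonstd_frac=0.2):
    """Apply the length, stop-codon and non-standard-residue criteria."""
    seq = seq.upper()
    if len(seq) < min_len:
        return False
    stops = seq.count("*")
    nonstd = sum(1 for aa in seq if aa != "*" and aa not in STANDARD_AA)
    return (stops / len(seq) <= max_stop_frac
            and nonstd / len(seq) <= max_nonstd_frac)


def select_proteins(proteins):
    """proteins: iterable of (gene_id, isoform_id, sequence).

    Keeps the longest isoform per gene, then drops low-quality proteins.
    """
    longest = {}
    for gene, isoform, seq in proteins:
        if gene not in longest or len(seq) > len(longest[gene][1]):
            longest[gene] = (isoform, seq)
    return {g: v for g, v in longest.items() if passes_quality(v[1])}
```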

3.4.2 Nuclear single-copy ortholog phylogenies

We performed separate phylogenetic analyses on both the amino acid (aa) and the nucleotide (nt) sequences of the selected single-copy genes. For the aa-based analyses, we first aligned the sequences with MUSCLE v3.8.31 (Edgar, 2004) using default settings. All alignments were optimized manually and trimmed with Gblocks (Talavera and Castresana, 2007) to remove poorly aligned regions. We then concatenated the remaining 1,021 alignments, generating a supermatrix of 179,381 aa positions. For the subsequent phylogenetic analyses we first determined the most appropriate substitution model using ProtTest v2.4 (Darriba et al., 2011). We then performed analyses using both Maximum Likelihood (ML) and Bayesian Inference (BI) based methods. For ML inference, we ran the PTHREADS version of RAxML v8.1.22 (Stamatakis, 2014) using the best estimated model, LG+I+G+F, with 1000 bootstraps. For the BI phylogenies, we used the parallel (MPI) version of MrBayes v3.2.6 (Ronquist et al., 2012) to construct a phylogenetic tree using the model LG+I+G+F. In addition, we ran PhyloBayes-MPI (Lartillot et al., 2013) to infer phylogenetic trees using a range of different models, including the CAT, CAT-GTR, CAT-Poisson and GTR models, which have been shown to be less sensitive to saturation and long-branch attraction artifacts (Lartillot et al., 2007). The phylogenetic analysis of the nucleotide sequences followed the same overall approach as for the proteins, except that we determined the most appropriate evolutionary model using jModelTest v2.1.10 (Posada, 2008). The ML and BI analyses were performed using RAxML and MrBayes with GTRGAMMAI and GTR+I+G, respectively. We also ran PhyloBayes with the CAT-Poisson and CAT-GTR models. All BI-based analyses were checked for convergence. The resulting trees were visualized and analysed using FigTree v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree).
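The concatenation of trimmed per-gene alignments into a supermatrix can be illustrated with the following sketch. It is a generic example and not the code used in this study. It assumes that each alignment is available as a dictionary mapping taxon names to aligned sequences of equal length; padding absent taxa with gaps is included only for generality, since single-copy orthologs are present in every species.

```python
def concatenate_alignments(alignments, taxa):
    """Concatenate per-gene alignments into one supermatrix per taxon.

    alignments: list of dicts {taxon: aligned sequence}.
    taxa: ordered list of taxon names to include in the supermatrix.
    """
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon absent from this alignment is padded with gap characters.
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}
```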

3.4.3 Presence/Absence analysis

Homologs, including orthologs and paralogs, were identified using OrthoMCL as described above. Based on a presence/absence matrix of the 21,718 ortholog/paralog groups, we calculated the distances between species using the binary (also known as asymmetric binary) method in R and conducted a hierarchical cluster analysis using the average Hcluster method. Node supports were calculated as Approximately Unbiased p-values (AU) and Bootstrap Probabilities (BP) using the R package Pvclust (Suzuki and Shimodaira, 2006) and 1000 bootstraps. To further provide a probabilistic analysis of the presence/absence information, we ran MrBayes using the binary restriction-site model and a discrete gamma distribution with four site rate categories (rates=gamma) after applying an ascertainment bias correction by removing genes present in fewer than two species (Pisani et al., 2015). In addition, we reconstructed a phylogenetic tree using the number of paralogs identified for each species in the previously defined ortholog/paralog groups. Briefly, we calculated the distances based on the number of paralogs using the “canberra” method implemented in R, since the binary method used for the presence/absence data is not suitable for count data. The results were then clustered using the “mcquitty” method and node supports were calculated using the R package Pvclust as described above.
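The two distance measures and the average-linkage clustering can be illustrated in plain Python. This sketch is not the R code used in this study: it omits bootstrap support entirely, the function names are hypothetical, and the Canberra distance here simply skips terms where both counts are zero, without reproducing any further handling of such terms by R's `dist`.

```python
from itertools import combinations


def binary_distance(x, y):
    """Share of groups present in only one species among groups present in either."""
    either = sum(1 for a, b in zip(x, y) if a or b)
    differ = sum(1 for a, b in zip(x, y) if bool(a) != bool(b))
    return differ / either if either else 0.0


def canberra_distance(x, y):
    """Sum of |a - b| / |a + b| over count pairs; 0/0 terms are skipped."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a + b)
        if denom:
            total += abs(a - b) / denom
    return total


def average_linkage(profiles, distance):
    """Cluster {name: vector} by average linkage; returns nested tuples."""
    leaf_dist = {frozenset(pair): distance(profiles[pair[0]], profiles[pair[1]])
                 for pair in combinations(profiles, 2)}
    clusters = [(name, [name]) for name in profiles]  # (subtree, leaf names)

    def avg(i, j):
        # Mean distance over all leaf pairs between clusters i and j.
        li, lj = clusters[i][1], clusters[j][1]
        total = sum(leaf_dist[frozenset((a, b))] for a in li for b in lj)
        return total / (len(li) * len(lj))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg(*ij))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]
```

Passing `binary_distance` treats each group as present or absent, whereas `canberra_distance` uses the paralog counts themselves.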

3.4.4 Synteny-based phylogeny

Synteny conservation analyses were performed using the programs SynChro (Drillon et al., 2014) and PhyChro from the CHROnicle package (Drillon, 2013). In a first step we identified synteny blocks with SynChro using multiple “Delta” (block stringency parameter) values ranging from 1 to 6. Subsequently we analysed the SynChro outputs using PhyChro in order to reconstruct phylogenetic relationships based on chromosomal rearrangement signals. Briefly, PhyChro looks for "incompatible" block adjacencies between two genomes (such as, for instance, the adjacency of the blocks A and B in a genome G1, and the adjacency of the blocks A and C in G2). Once these "incompatible" block adjacencies or "breakpoints" are identified between G1 and G2, PhyChro checks whether any of the other genomes G3..Gn share the adjacency (A,B) or the adjacency (A,C). If (A,B) is shared by G1 and at least one other genome, and (A,C) by G2 and at least one other genome, we obtain what we call a "PIGG": a pair of incompatible groups of genomes. Based on the parsimony principle, PhyChro tries to reconstruct a phylogenetic tree in which each PIGG can be associated with a unique rearrangement that can be assigned to the branch(es) separating the two groups of genomes. PhyChro can be obtained upon request from [email protected] .

With extreme Delta values (Delta=1 and Delta=6), block synteny information was too scarce to resolve all groups. The other Delta values resulted in fully resolved trees with a common topology, which was also compatible with the partial trees reconstructed for the extreme Delta values. The resulting tree was parsimonious, with only one PIGG out of the 263 PIGGs identified across the six analyses that could not be associated with a unique path of branches, i.e. with a unique rearrangement. To incorporate an additional measure of branch support, we further provide the number of PIGGs rejecting a branch as well as the number of PIGGs supporting this branch. A PIGG is said to reject a branch when this branch implies at least two rearrangements to “explain” the PIGG, and a PIGG is said to support a branch when the rearrangement with which it is associated must have occurred on this specific branch. Figure 7 shows the total number of PIGGs rejecting/supporting each branch obtained with Delta 2-5.
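The PIGG definition can be made concrete with a toy example. The sketch below is a strongly simplified illustration and not PhyChro's implementation. It assumes that each genome is given as a mapping from a block extremity (e.g. "A.t" for the tail of block A) to the extremity adjacent to it, and it only enumerates distinct pairs of incompatible groups without reconstructing a tree. All names are hypothetical.

```python
from itertools import combinations


def find_piggs(genomes):
    """genomes: {genome name: {block extremity: neighbouring extremity}}.

    Returns a set of PIGGs, each a frozenset of two frozensets of genome names.
    """
    piggs = set()
    for g1, g2 in combinations(genomes, 2):
        for end, n1 in genomes[g1].items():
            n2 = genomes[g2].get(end)
            if n2 is None or n2 == n1:
                continue  # extremity absent from g2, or the adjacency is shared
            # Breakpoint: the same extremity has different neighbours in g1 and g2.
            group1 = frozenset(g for g, adj in genomes.items()
                               if adj.get(end) == n1)
            group2 = frozenset(g for g, adj in genomes.items()
                               if adj.get(end) == n2)
            # Each adjacency must be shared by at least one additional genome.
            if len(group1) >= 2 and len(group2) >= 2:
                piggs.add(frozenset((group1, group2)))
    return piggs
```

In this toy model, a breakpoint whose two adjacencies are each found in only one genome carries no grouping information and is therefore not reported.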

[Figure 8: three panels (A, B, C), each listing S. pistillata, A. digitifera, Discosoma sp., A. fenestrafer, A. pallida, N. vectensis and H. magnipapillata; panel B additionally shows columns Gene1 to Gene7.]

Figure 8. A) The “Classical” sequence based phylogenetic approach. This approach determines phylogenetic relationships based on changes in amino acid (see Figure 5A) and/or nucleotide sequence (see Figure 5B). Fewer changes generally indicate closer relationship between species. B) The presence/absence approach determines phylogenetic relationship based on the presence or absence of genes in the genome (see Figure 6). This approach assumes that closer related species share more common genes (blue=present, red=absent). C) The synteny-based analysis determines phylogenetic relationship based on the conservation of gene order in the genome (see Figure 7). The farther species are from one another the lower the conservation of gene order due to gene duplications, losses and genome rearrangements.

3.5 Discussion

Resolving the origin of scleractinian corals and their relationship to other Hexacorallia is essential for our understanding of coral evolution and of the potential ability of corals to respond to current threats imposed by climate change. The aim of this study was to provide an unambiguous answer to the ongoing debate about the phylogenetic relationship of scleractinian corals and their closest relatives, the Corallimorpharia, and to resolve whether corals can potentially adapt to increased ocean acidification through skeletal loss as proposed by the “naked coral” hypothesis. To provide an explicit answer, we conducted the first phylogenomic analysis based on whole genome data from six hexacorallian genomes, including the anemones N. vectensis and A. pallida, the corallimorpharians A. fenestrafer and Discosoma sp., and the reef-building corals A. digitifera and S. pistillata. We applied sequence-based and sequence-independent approaches (including presence/absence data and synteny conservation), thus providing three independent methods to test the hypothesis.

Our phylogenetic analyses, based on a stringently selected subset of high-confidence nuclear-encoded genes that provided the longest alignment used to date, validated the phylogenetic relationship between Scleractinia and Corallimorpharia (Kitahara et al., 2014; Lin et al., 2016). The sequence-based reconstructions using nucleotide and amino acid data, independent of method or model, consistently recovered Scleractinia as a monophyletic group with Corallimorpharia as a close sister clade, with maximum support for each node. In spite of the limited number of species used in our phylogenetic analyses, our results are in agreement with recent studies based on 14 hexacorallian transcriptomes, including three Corallimorpharia species (Lin et al., 2016).

The use of presence/absence and synteny information has previously been employed to provide additional support for sequence-based phylogenies in other organisms (Pisani et al., 2015; Ryan et al., 2013; Tian et al., 2012), but represents a novelty in the field of coral research due to the dearth of genomic data, although this is expected to change (Voolstra et al., 2015). Similar to sequence-based phylogenetic reconstruction methods, these analyses are not entirely free of biases and potential errors (Pisani et al., 2015). However, the use of two sequence-independent approaches and two genomes for each of the orders analysed provides additional confidence and support for our results.

While all phylogenetic reconstructions are subject to biases and errors, our sequence-based and sequence-independent findings clearly show that corallimorpharians did not evolve from a complex coral ancestor but rather from a common ancestor of complex and robust corals. Consequently, Corallimorpharia do not qualify as proof of the “naked coral” hypothesis, and it remains to be proven whether scleractinian corals can indeed lose and regain the ability to calcify over geological timescales. Experimental approaches on selected coral species have shown that reef-building corals can indeed lose the ability to calcify under extreme conditions (Fine and Tchernov, 2007) and recover; however, it remains to be shown whether these responses, observed in short-term experiments, allow coral species to survive for evolutionarily relevant periods.

Although corallimorpharians do not appear to be the paragon of coral survival in light of increasing ocean acidification, they still represent the closest living, non-calcifying relatives known to date, making them the best candidates for future studies aiming at understanding the evolution of corals and their traits, in particular calcification. Given the alarming need and heightened awareness due to the most recent global bleaching event, we believe that our work is of outstanding and timely significance to the field as well as to the general public, as it settles a long-standing debate about the potential evolutionary mechanisms for corals to respond to the increasing threats from ocean acidification imposed by global change. As such, it helps to accelerate research towards climate change mitigation and guards against settling on false hopes.

3.6 Author’s contributions

M.A. designed and conceived the study. M.A. and C.R.V. contributed reagents/materials/analysis tools. X.W. analysed the sequence-based and gene presence/absence parts; G.D. and X.W. analysed the synteny-based phylogenetic part. X.W. and M.A. wrote the manuscript with input from all authors.

Chapter 4 The evolution of coral calcification

Xin Wang1, Didier Zoccola2, Yi Jin Liew1, Sylvie Tambutte2, Manuel Aranda1,*

1King Abdullah University of Science and Technology (KAUST), Red Sea Research Center (RSRC), Biological and Environmental Sciences & Engineering Division (BESE), Thuwal, 23955-6900, Saudi Arabia 2Centre Scientifique de Monaco, 8 quai Antoine Ier, Monaco, 98000, Monaco

This manuscript is prepared for submission to Nature Communications.

4.1 Abstract

Coral calcification is a biologically controlled process that is orchestrated by the organism through the supply of calcium and bicarbonate ions and the secretion of the skeletal organic matrix (SOM). Although the process of coral calcification has been extensively investigated, the key molecules and their evolutionary origin and history are still poorly understood. Our phylogenetic analysis across six hexacorallians indicates that Scleractinia acquired the ability to calcify from soft-bodied ancestors within a relatively short evolutionary time. Comparison of diverse coral skeletal proteomic data shows that only a few skeletal organic matrix proteins (SOMPs) were already present in the SOM of the last common ancestor of robust and complex corals, suggesting that only a few proteins were required to control the deposition of calcium carbonate. In-depth analyses of these putative ancestral SOMPs also indicate that corals did not require extensive evolutionary changes, such as de novo innovation or horizontal gene transfer, to evolve the capability of calcifying. Instead, our findings, integrating genomic, transcriptomic, and proteomic data, strongly support gene duplication and co-option of existing proteins as the fundamental evolutionary mechanisms of coral calcification, followed by a series of small, gradual changes that optimized protein sequences for the newly acquired role.

4.2 Introduction

Coral calcification, i.e. the deposition of a calcium carbonate skeleton, is a biological process that is mainly controlled by two components: a sustained ion supply that adjusts the composition of the calcifying fluid to promote aragonite crystal growth, and the catalysis of this process by skeletal organic matrix proteins that control crystal formation and structural arrangement (Falini et al., 2015).

The result of coral calcification is the formation of hard crystalline shell or skeletal structures made of calcium carbonate (Lowenstam and Weiner, 1989). Therefore, the main prerequisite for coral calcification is the abundant and persistent supply of Ca²⁺ and CO₃²⁻ to the site of calcification (Figure 9). It was previously found that calcium ions cannot be stored in the calcifying fluid and must therefore be continuously transported from the external seawater through the epithelial layers to the calicoblastic cells (Gagnon et al., 2012). Corals must therefore produce numerous molecules that control the transport of calcium and other ions from the seawater to the calcifying fluid. Calcium ions can be transported by several proteins, such as Ca-ATPases, which exchange two calcium ions for four protons across the cell membrane (Ip et al., 1991; Zoccola et al., 2004; Zoccola et al., 1999). The second essential component of coral calcification is dissolved inorganic carbon. The provision of HCO₃⁻ is supported by specific transporters: two distinct membrane protein families, solute carrier 4 (SLC4) and solute carrier 26 (SLC26), are responsible for the transport of bicarbonate to the calcification site (Zoccola et al., 2015a). In addition, enzymes belonging to the group of carbonic anhydrases catalyze the hydration of metabolic CO₂ into HCO₃⁻ in the calcifying fluid (Moya et al., 2008b) (Figure 9). The supersaturation of calcium and bicarbonate ions in the calcifying fluid plays an important role in crystal establishment and growth but not in determining coral colony morphology. Instead, proteins secreted from the calicoblastic cells contribute to organic matrix nucleation and stabilization, the connection between calicoblastic cells and the skeleton, the arrangement of skeletal growth, and morphological differentiation. Previous proteomic analyses of coral skeletons by Drake et al. (2013) and Ramos-Silva et al. (2013b) also identified several extracellular matrix-like proteins, including LaminG, CUB-domain and EP-like proteins, as well as several transmembrane proteins, including cadherin-like, neurexin, EGF domain, Zona pellucida domain, and Mucin4-like proteins. The majority of these SOMPs have been intensively studied with regard to their roles in the biological control of skeleton formation, calcium biochemistry pathways, and crystal growth; however, their evolutionary history and recruitment to the calcification process remain unknown.

Figure 9. The main proteins and processes involved in coral calcification (from Zoccola et al., 2016). Calcium is transported from the calicoblastic cells to the calcification site by the Ca²⁺-ATPase pump in exchange for protons. Bicarbonate is transferred into the sub-calicoblastic space via a bicarbonate anion transporter (BAT). Bicarbonate ions are also produced by carbonic anhydrase, which catalyzes the conversion of CO₂ into HCO₃⁻. The formation of the aragonite skeleton is controlled by the skeletal organic matrix.

Although our understanding of the processes that enable corals to build these immense structures has significantly improved over the last decades, we still know little about the evolutionary changes that allowed the soft-bodied, anemone-like ancestor of corals to become the ecosystem builders they are today. Studies suggest that their order, Scleractinia, might have originated more than 400 Mya. However, the earliest evidence of hermatypic corals, i.e. reef-building corals, only appears in the fossil record around 245 Mya (Jin et al., 2000; Veron, 1995), shortly after the Permian-Triassic extinction event that killed >90% of all marine species (Erwin, 1994). Analyses of morphological and molecular markers indicate that calcification evolved multiple times independently in different cnidarian lineages (Miglietta et al., 2010), and some hypotheses even propose that scleractinian corals might have lost and regained the ability to calcify multiple times during their evolution (Wang et al., 2017a). These findings suggest that the ability to calcify might not have required extensive genetic adaptations or the evolution of many specialized proteins. What, then, is required, and how did the non-calcifying ancestor of reef-building corals evolve this ability? To further understand how corals acquired their ability to calcify, we used a comparative genomic approach based on two evolutionarily divergent scleractinians (A. digitifera (Shinzato et al., 2011) and S. pistillata (Voolstra et al., 2017)), their closest non-calcifying relatives, the Corallimorpharia (A. fenestrafer and Discosoma sp. (Wang et al., 2017b)), and two sea anemones (N. vectensis (Putnam et al., 2007) and A. pallida (Baumgarten et al., 2015)). Compared to sea anemones, corallimorpharians are the best candidates from which to derive the evolutionary history of scleractinian calcification, as they are the closest living non-calcifying relatives of corals. We specifically traced the origin and evolutionary history of important calcification genes and protein constituents of the skeletal organic matrix (SOM) to identify the evolutionary innovations that turned the ancestor of corals into the founding species of the iconic coral-reef ecosystem.

4.3 Results

4.3.1 Short divergence time of coral calcification

Our phylogenetic analysis showed that Corallimorpharia did not evolve from complex corals but evolved independently of Scleractinia after the divergence from Actiniaria (Figure 10A), indicating that Corallimorpharia are better candidates than Actiniaria to explore the evolution of coral calcification. The distribution of synonymous substitutions per synonymous site (Ks) for the orthologs identified across all six hexacorallian genomes showed that Scleractinia diverged from Corallimorpharia more recently than from Actiniaria, with the split dating back to ~334 Mya (Figure 10A, 10B). Fossil records showed that robust and complex corals diverged around the early Triassic, 245 Mya (Park et al., 2012b; Simpson et al., 2011a), a time of high diversification within this order (Figure 10A, 10B) (Park et al., 2012a). Therefore, the evolution of scleractinian calcification can be placed within a time window of approximately 80 million years between the divergence of Scleractinia from Corallimorpharia ~334 Mya and the divergence of the two major coral clades ~245 Mya.

To determine de novo gene emergence in corals, we first identified 472 scleractinian-specific gene families that were present in both scleractinians but absent from the non-calcifying hexacorallians, and investigated their gene ontology (GO) enrichment to determine the underlying processes. The GO enrichment analysis of these coral-specific gene families (Figure S14) showed that the majority of these proteins are not likely involved in coral calcification but rather in basic biological processes, such as immune response and neural development, thereby potentially contributing to the genetic diversification and specialization of corals. These findings also implied that gene innovation, i.e. the appearance of new gene families, might not have been an important factor for the evolution of coral calcification. This prompted us to consider that the evolution of coral calcification might instead have been driven by gene duplication and/or co-option of existing genes. Therefore, we identified 478 scleractinian lineage-expanded gene families that showed significantly higher numbers of paralogs in both scleractinians compared to the non-calcifying hexacorallians, and also performed a GO enrichment analysis to check for potential functional enrichment in these gene families. Surprisingly, these proteins showed significant enrichment of processes involved in biological adhesion, cell adhesion, and cell-cell junctions (Figure 10C). These processes play critical roles in coral calcification, suggesting that scleractinian lineage-specific expansions might have been an important mechanism for the evolution of coral calcification. Further, enrichment analysis of Pfam domains encoded across the six hexacorallian genomes illustrated that some extracellular domains (Figure S15), such as collagen, fibronectin, F5/8 type C and fibrillar collagen domains, and transmembrane domains (Figure S15), including calcium-binding EGF domains, CUB domains, and LDL receptors, had also experienced significant expansions in both scleractinian clades. These domains were previously identified in skeletal organic matrix proteins and are thought to play important roles in crystal formation and arrangement, suggesting that domain expansions might also have played a role in the evolution of coral calcification.
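The enrichment logic used for GO terms and Pfam domains can be sketched as an over-representation test. The text does not state which statistical test was applied, so the one-sided hypergeometric test below is an assumption (a common choice for this kind of analysis), and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation p-value, P(X >= k).

    N: genes in the background set
    K: background genes carrying the annotation (e.g. a GO term)
    n: genes in the set of interest (e.g. expanded gene families)
    k: genes in the set of interest carrying the annotation
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts are inconsistent")
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical toy numbers: 10 background genes, 4 annotated,
# 3 genes of interest, 2 of which are annotated.
p_toy = hypergeom_enrichment_p(k=2, n=3, K=4, N=10)
```

In a real analysis the resulting p-values would additionally need correction for multiple testing across all terms tested.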

4.3.2 The evolution of plasma membrane calcium ATPase

Calcium ions need to be continuously transported from external seawater through the epithelial layers to the calicoblastic cells and the calcifying fluid during calcification (Gagnon et al., 2012). Calcium ion transport to the calcifying fluid is thought to be performed by calcium ATPases that exchange two calcium ions for four protons across the membrane (Ip et al., 1991; Zoccola et al., 2004; Zoccola et al., 1999). Searching the six genomes for plasma membrane calcium-transporting ATPases (PMCAs), we identified two hexacorallian-specific gene duplications (PMCA1-3). Interestingly, all six genomes encoded at least two of these copies in tandem (Figure 11A, 11B). In corals, the third copy (PMCA3) was located adjacently, whereas it was encoded at a different genomic locus in the two anemones and was completely absent from the Corallimorpharia genomes (Figure 11C). Immunolocalization analyses using cross-hybridizing antibodies showed that PMCA1 and PMCA2 are localized ubiquitously in both Corallimorpharia and Actiniaria (Figure 11D), consistent with their ubiquitous expression (Figure 11C). However, PMCA1 and PMCA2 displayed strong localization to the calicoblastic ectoderm in the coral S. pistillata (Figure 11D). Here, PMCA1 showed the most robust staining, in line with being the most highly expressed homolog in both corals (Figure 11C). In contrast, we found PMCA3 to be ubiquitously expressed in Actiniaria while showing localization to small spots of high intensity in the coral tissues (Figure 11D). These findings agree with the generally lower expression of this ortholog (Figure 11C). It should be further noted that no antibody staining was observed in Corallimorpharia (Figure 11C), in accordance with the absence of this gene in this lineage (Figure 11A, 11B).

4.3.3 The evolution of bicarbonate transporter

The second essential component of coral calcification is the provisioning of bicarbonate, which is transported from calicoblastic cells by members of two distinct membrane transporter families, the solute carrier 4 (SLC4) and solute carrier 26 (SLC26) families (Zoccola et al., 2015a). Previous studies have identified eight potential bicarbonate transporters in corals, of which five (SLC4α - γ) belong to the SLC4 family and three to the SLC26 family (the SO₄²⁻ transporters, the Cl⁻/HCO₃⁻ exchangers, and the selective Cl⁻ channels) (Zoccola et al., 2015a). A phylogenetic analysis of these genes identified a duplication of the SLC4β gene, termed SLC4γ, in both coral genomes but not in the genomes of the sea anemones Nematostella or Aiptasia. Synteny analysis of all six genomes in our study showed overall high conservation of the genomic locus surrounding SLC4β (Figure 12B). Immunolocalization of SLC4β, encoded by the ancestral gene, revealed ubiquitous expression in all coral tissues but stronger localization to the calicoblastic ectoderm (Figure 12B). Immunohistochemical localization of the SLC4γ protein showed specific expression in the calicoblastic ectoderm of the coral S. pistillata. Our results suggest that this coral-specific gene duplication likely represents an adaptation to calcification (Figure 12A, 12B, 12C) (Bhattacharya et al., 2016; Zoccola et al., 2015a).

4.3.4 The evolution of carbonic anhydrase

Numerous studies have also shown that carbonic anhydrases (CAs) catalyze the hydration of metabolic CO₂ into HCO₃⁻ (Hopkinson et al., 2015; Moya et al., 2008b). Some of these CAs are secreted directly into the calicoblastic fluid and catalyze this reaction at the site of calcification (Hopkinson et al., 2015). Comparison of the CA repertoire across the six genomes showed frequent lineage-specific expansions, with unique gene duplications being evident across all genomes (Figure S16). However, both scleractinian genomes consistently showed a higher number of duplications of both secreted and membrane-bound CAs (Lin et al., 2017) when compared to corallimorpharians or actiniarians (Figure 13A). Lineage-specific duplicates of cytoplasmic CAs in corals showed signatures of extracellular localization, suggesting subcellular relocalization of the respective proteins (Figure 13A). Interestingly, the extracellular CAs (CA12/CA14) were only found in corals and experienced further lineage-specific duplications after the divergence of robust and complex corals (Figure 13A). Furthermore, we observed that these extracellular CAs appear to be more highly expressed than intracellular CAs in adult corals (Figure 12B) and also showed higher expression in calcifying stages of A. digitifera than in non-calcifying stages (Figure 13C).

4.3.5 Identification of core sets of skeleton organic matrix proteins (SOMPs)

To determine a core set of SOMPs that were likely present in the ancestor of contemporary reef-building corals, we sequenced the protein constituents of the SOM from S. pistillata (65 proteins) (Table S10) by LC/MS/MS analysis and compared them to proteomic data from three previous studies: Drake et al. (36 SOMPs in S. pistillata), Ramos-Silva et al. (36 SOMPs in Acropora millepora) and Takeuchi et al. (30 SOMPs in Acropora digitifera). Reciprocal BLASTP analyses between these skeletal proteomes identified seven proteins that were commonly identified in the skeletons of these robust and complex corals (Figure 14, Table S11, S12). These seven shared SOMPs were subsequently defined as the core set of SOMPs that were likely present in the ancestor of both complex and robust corals. Despite differences in annotation, most of these genes shared similar domain architectures and thus likely have similar functions in Scleractinia. These proteins are coral acid-rich proteins, Mucin-4-like proteins, Protocadherin fat 4, CUB domain-containing protein, MAM and LDL-receptor class A domain-containing protein, Contactin-associated protein and Collagen alpha-6 (VI) chain (Figure 14). To account for possible biases between the different proteomic studies, we also considered SOMPs that were identified in only three of the studies. These proteins included Vitellogenin-like, REJ domain-containing proteins, uncharacterized skeletal organic matrix protein 8 (USOMP8) and Galaxin. This core set of SOMPs together accounted for 71.73% of the total spectral counts in the S. pistillata SOM (Table S12). In order to unravel the evolutionary origins of these proteins, we further used BLASTP searches against the two Corallimorpharia and sea anemone genomes. We identified clear homologous proteins for each of these SOMPs in the genomes of both Corallimorpharia and Actiniaria, suggesting that the ancestors of these proteins predated the divergence of corals and their non-calcifying relatives. To further identify the specific evolutionary mechanisms underlying coral calcification, we then determined the evolutionary history of these SOMPs.
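The reciprocal BLASTP comparison between skeletal proteomes can be sketched as a reciprocal-best-hit search. This is a minimal illustration assuming hits have already been parsed into (query, subject, bitscore) tuples; the protein identifiers and scores are hypothetical, and any thresholds used in the actual analysis are not reproduced here.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bitscore) tuples,
    e.g. parsed from tabular BLASTP output.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hit tables between two skeletal proteomes.
spis_vs_adig = [
    ("spis_1", "adig_7", 300.0),
    ("spis_1", "adig_9", 120.0),
    ("spis_2", "adig_9", 80.0),
]
adig_vs_spis = [
    ("adig_7", "spis_1", 290.0),
    ("adig_9", "spis_1", 110.0),
]
shared = reciprocal_best_hits(spis_vs_adig, adig_vs_spis)
```

Here `spis_2` is not retained because its best hit, `adig_9`, points back to a different protein.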

4.3.6 Coral acid-rich proteins

Acidic proteins are critical components of calcification, as aspartic acid and glutamic acid residues can interact with calcium ions through their negative charge at neutral pH (Addadi and Weiner, 1985; Constantz and Weiner, 1988; Mitterer and Kriausakul, 1989; Puverel et al., 2005). All four proteomic data sets consistently identified the coral acid-rich protein 4 (CARP4) as the most abundant protein in the SOM (24% of SOM), while the other members of this family, alpha integrin-like protein (CARP4γ) and CARP5, showed lower (1.92% and 1.66% of SOM) but still high abundance (Takeuchi et al., 2016) (Table S12). Previous studies showed that distant homologs of these proteins were found in the sea anemones Nematostella and Anthopleura, but these studies could not resolve whether the observed diversification of these acidic proteins was indeed a coral-specific innovation or was already present in the closest non-calcifying relatives (Bhattacharya et al., 2016). Our six-genome comparison showed that the last common ancestor of robust and complex corals experienced two scleractinian-specific duplications of the ancestral CARP4 gene after the divergence from Corallimorpharia ~334 Mya, while both N. vectensis and A. pallida independently evolved three species-specific duplications (Figure 15A). Comparison of the coral-specific duplicates to the single homolog present in Corallimorpharia revealed significant extensions of acid-rich (Asp and Glu (D, E)) stretches in the coral homologs, suggesting evolutionary adaptation of these novel proteins to the calcification process (Figure 15B). Interestingly, we found that the overall acidity of the coral orthologs generally correlated with their relative abundance in the SOM, i.e. the most acidic ortholog, CARP4 in S. pistillata, consistently showed the highest protein abundance in all four studies, while the less acidic orthologs CARP4γ and CARP5 were significantly less abundant (Table S12). These results showed that a proto-coral acidic protein was already present in the ancestor of corallimorpharians and scleractinians, and subsequently underwent scleractinian-specific gene duplications as well as changes in acidic domain composition to adapt to scleractinian coral calcification.
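The notion of "overall acidity" and of acid-rich stretches can be made concrete with a small sketch. How acidity was quantified is not stated in the text, so the two measures below (fraction of Asp/Glu residues and length of the longest uninterrupted D/E run) are assumptions chosen for illustration, and the example sequence is invented.

```python
ACIDIC = "DE"  # aspartic acid (D) and glutamic acid (E)


def acidic_fraction(seq):
    """Fraction of residues that are Asp or Glu (stop symbols ignored)."""
    seq = seq.upper().replace("*", "")
    if not seq:
        raise ValueError("empty sequence")
    return sum(seq.count(aa) for aa in ACIDIC) / len(seq)


def longest_acidic_run(seq):
    """Length of the longest uninterrupted stretch of D/E residues."""
    longest = current = 0
    for aa in seq.upper():
        current = current + 1 if aa in ACIDIC else 0
        longest = max(longest, current)
    return longest


# Invented toy sequence, not a real CARP.
toy = "MKDDEEDDEEGSAK"
toy_fraction = acidic_fraction(toy)
toy_run = longest_acidic_run(toy)
```

Comparing such values between coral duplicates and their single corallimorpharian homolog would be one simple way to express the extension of acidic stretches numerically.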

4.3.7 Crystal lattice associated proteins

It has been proposed that coral acidic proteins can induce and stabilize synthetic amorphous calcium carbonate in vitro (Mass et al., 2017; Wolf et al., 2011). However, coral calcification is also controlled by skeletal transmembrane proteins that facilitate cell-cell and cell-substrate adhesion or provide the structural support for the 3D organization of coral colonies, and by extracellular proteins that regulate the deposition of the calcium carbonate skeleton (Helman et al., 2008).

Our broad comparison of the secreted transmembrane proteins found in the SOMP fraction, such as Mucin4 (Figure S17), Protocadherin (Figure S18), Neurexin (Figure S19), Zona pellucida domain proteins (Figure S20) and REJ domain-containing protein (Figure S21), consistently showed high conservation and similar domain architectures across all six species. For example, all mucin4-like proteins contained the same domain architecture in the C-terminus (nidogen (NIDO), adhesion-associated domain (AMOP), von Willebrand factor D (VWD) and EGF-like domains) (Figure S17), which mediates protein-protein adhesion and anchoring between the cell surface and the extracellular matrix (126). Protocadherins are composed of two laminin G domains, one cytoplasmic cadherin domain, and multiple extracellular cadherin domains (Figure S18); the two laminin G domains, which act as Ca²⁺-mediated receptors, are thought to link the extracellular and cytoplasmic sides (Hohenester et al., 1999), and cadherins are glycoproteins involved in Ca²⁺-mediated cell-cell adhesion (Takeichi, 1988). Another laminin G domain-containing protein, Neurexin, is thought to connect calicoblastic cells to the extracellular matrix. Neurexin in S. pistillata showed a domain architecture similar to that of its actiniarian homologs, whereas the A. digitifera ortholog showed higher similarity to the corallimorpharian homologs but further experienced multiple gene duplications during its specialization (Figure S19).

The Zona pellucida domain proteins, which are thought to be responsible for intra- and inter-molecular disulfide bridges and polymerization of proteins, also exhibited high conservation and identical domain architecture (Figure S20) (Boja et al., 2003). The REJ domain-containing protein, which has been shown to function in ion transport homeostasis (Moy et al., 1996) and to be required for cleavage at the GPS (Qian et al., 2002), also yielded a highly conserved gene tree in agreement with the species phylogeny (Figure S21). The REJ domain-containing protein consists of a series of domains, and its overall domain distribution is very similar to that of actiniarians but slightly different from that of both corallimorpharians (Figure S21). Despite their conservation in Hexacorallia, these transmembrane proteins still appear to have experienced slight adaptations during or before their co-option to the calcification process, as evidenced by the expansion of highly repetitive or tandem domains, such as Thrombospondin type 1 (TSP1) in the N-terminus of Mucin4-like proteins (Figure S17) and the Laminin G domain in Neurexin (Figure S19).

Among the SOMPs, there are also three other conserved proteins that not only perform transmembrane functions but have also acquired extracellular signatures. SOM proteomic data consistently showed high spectral counts for collagen alpha-6 (VI) (COL6A6) homologs (14.81%), which are proposed to tether aragonite crystals (Nudelman et al., 2010). Hexacorallian phylogenetic analysis of COL6A6 revealed a scleractinian-specific gene duplication (Figure S22). Protein domain analyses highlighted multiple transmembrane domain insertions and deletions, whereby coral COL6A6 homologs showed different patterns of insertions and deletions of cell-cell and cell-substrate adhesion domains such as TSP1 and von Willebrand factor type A (VWA) (Figure S22). The unique extracellular F5/8 type C (FA58C) domain, inserted at both termini of the scleractinian homologs, is the major protein domain in many blood coagulation factors (Kane and Davie, 1986). This protein, with its VWA, FA58C and TSP1 domains, is therefore assumed to modulate the coagulation of amorphous calcium carbonate and the tethering of aragonite crystals. Another protein, containing a Fibronectin II domain, was assumed to be the main component of the acid-insoluble and acid-soluble organic matrix of the aragonitic skeleton (Ramos-Silva et al., 2013b). This protein showed high conservation and a transmembrane domain architecture similar to that of the MAM and LDL-receptor proteins, with these domains exhibiting distinctive duplications in corals (Figure S23). Besides the extension of the MAM and LDL-receptor domains, scleractinian genes also uniquely contained a Fibronectin II domain that was completely lacking in their corallimorpharian counterparts (Figure S23). This unique insertion of the Fibronectin II domain might help in the specific arrangement of coral aragonite through the connection of collagens. The Vitellogenin-like protein showed high conservation, with a similar domain architecture of Lipoprotein amino-terminal region and VWD domains (Figure S24). Here, we noticed the presence of a protein kinase domain in A. digitifera (Ramos-Silva et al., 2013b; Takeuchi et al., 2016), but the fact that we failed to identify it in S. pistillata suggested that the protein kinase domain in the Vitellogenin-like protein is not essential for scleractinian calcification (Figure S24).

Additionally, we identified uncharacterized SOMP-8 (USOMP8) and cysteine-rich Galaxin in three SOMP studies. USOMP8, which transmits signals within cells to affect cell adhesion, showed high conservation and identical domain architecture (Figure S25). Galaxin was originally suggested to be a coral-specific secretory protein (Bhattacharya et al., 2016; Ramos-Silva et al., 2013b; Takeuchi et al., 2016; Watanabe et al., 2003). However, our broad comparative analysis of galaxin-like proteins showed that they not only occur in non-calcifying taxa, such as corallimorpharians, but have also evolved quite divergently across species (Figure S26).

4.4 Methodology

4.4.1 Identification of scleractinian species-specific and expanding gene families

The complete genomes and gene models of A. digitifera, S. pistillata, A. fenestrafer, Discosoma sp., N. vectensis and A. pallida are available at http://reefgenomics.org (Liew et al., 2016). In order to prepare the gene models for the subsequent comparative analyses, we discarded alternative isoforms and only retained the longest one per locus. Further, we removed proteins with fewer than ten amino acids or low sequence quality (more than 20% stop codons or more than 20% non-standard amino acids). After preparing the final gene and corresponding protein sets, we ran OrthoMCL (Li et al., 2003) using an e-value cut-off of 10⁻⁵ to create groups of orthologs and paralogs across all seven genomes (H. magnipapillata included as outgroup), which were subsequently assigned to the latest OrthoMCL-DB v4 (http://orthomcl.cbil.upenn.edu) (Chen et al., 2006) for further validation. Protein sequences from the final orthologous groups were aligned using MUSCLE (Edgar, 2004) with default settings and trimmed using TrimAl with the automated1 option (Capella-Gutiérrez et al., 2009). To build the phylogenetic trees, we determined the best model with ProtTest (Darriba et al., 2011) and performed phylogenetic analysis using RAxML (Stamatakis, 2014), with branch support estimated using 1000 bootstrap replicates. The final trees were visualized and modified with FigTree.
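The gene model preparation described at the start of this section (longest isoform per locus, minimum length, and sequence quality thresholds) could be sketched as follows. This is an illustrative reimplementation, not the scripts used in this work; the input layout of (locus, isoform, protein) tuples and all identifiers are assumptions.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def passes_quality(protein, min_len=10, max_frac=0.20):
    """Apply the length and sequence-quality thresholds given in the text."""
    if len(protein) < min_len:
        return False
    stops = protein.count("*")
    nonstandard = sum(
        1 for aa in protein if aa != "*" and aa not in STANDARD_AA
    )
    return (
        stops / len(protein) <= max_frac
        and nonstandard / len(protein) <= max_frac
    )


def select_gene_models(isoforms):
    """Keep the longest isoform per locus, then drop low-quality proteins.

    `isoforms` is an iterable of (locus_id, isoform_id, protein_sequence)
    tuples with upper-case sequences; this layout is hypothetical.
    """
    longest = {}
    for locus, isoform, protein in isoforms:
        if locus not in longest or len(protein) > len(longest[locus][1]):
            longest[locus] = (isoform, protein)
    return {
        locus: (isoform, protein)
        for locus, (isoform, protein) in longest.items()
        if passes_quality(protein)
    }


# Hypothetical input: g1 has two isoforms, g2 is too short, g3 is mostly 'X'.
example = [
    ("g1", "g1.t1", "MKT" * 5),
    ("g1", "g1.t2", "MKT" * 10),
    ("g2", "g2.t1", "MKTAA"),
    ("g3", "g3.t1", "MXXXXXXXXXKT"),
]
kept = select_gene_models(example)
```

Selecting the longest isoform before filtering means a locus is dropped entirely if its longest isoform fails the quality check; whether shorter isoforms should then be reconsidered is a design choice the text does not specify.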

Corresponding coding and amino acid sequences were retrieved for the orthologs mentioned above. The protein sequences within each group were aligned using MUSCLE with default parameters, and nucleotide alignments were generated from the corresponding protein alignments using Perl scripts. The synonymous substitution rate (Ks) was then calculated using the codeml program from PAML v4.8 (Yang, 2007). To estimate divergence, the Ks distributions of pair-wise orthologs were visualized using ggplot2 (Wickham, 2016).
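Generating a nucleotide alignment from a protein alignment amounts to threading each coding sequence onto its gapped protein row, codon by codon. The sketch below illustrates the idea in Python; it is not the Perl script used here, the function name is hypothetical, and it assumes that the coding sequence starts in frame with the first aligned residue.

```python
def thread_codons(aligned_protein, cds):
    """Map an ungapped coding sequence onto a gapped protein alignment row.

    Each residue consumes one codon from the CDS; each gap ('-') becomes
    a three-nucleotide gap. A trailing stop codon in the CDS is ignored.
    """
    n_residues = sum(1 for c in aligned_protein if c != "-")
    if len(cds) < 3 * n_residues:
        raise ValueError("CDS is shorter than the protein it should encode")
    codons = []
    pos = 0
    for c in aligned_protein:
        if c == "-":
            codons.append("---")
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)
```

Because gaps are inserted only in multiples of three, the resulting alignment preserves the reading frame, which codon-based programs such as codeml require.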

To further understand gene family expansion and contraction in Scleractinia in comparison to non-calcifying species, we used the ortholog/paralog groups to analyze changes in gene family size. The gene family sizes were subsequently converted into a normalized matrix based on their z-score profiles. The function of each family was determined from its Pfam domain composition and BLAST results. The species-specific and expanded gene families in Scleractinia were obtained from this group matrix. The corresponding genes of the specific and expanded gene families were analyzed for functional enrichment based on their GO assignments and Pfam domain composition.
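The z-score normalization of gene family sizes can be illustrated with the following sketch. It assumes that z-scores are computed per gene family across species and that a family counts as expanded when every target species exceeds a threshold; the threshold of 1.0, the function names and the example identifiers are hypothetical and are not taken from this work.

```python
from statistics import mean, pstdev


def zscore_rows(counts):
    """counts: family id -> list of gene counts, one entry per species.
    Returns per-family z-scores; families without variation get zeros."""
    z = {}
    for fam, row in counts.items():
        mu = mean(row)
        sd = pstdev(row)
        z[fam] = [0.0] * len(row) if sd == 0 else [(x - mu) / sd for x in row]
    return z


def expanded_families(counts, species, targets, threshold=1.0):
    """Families whose z-score exceeds the (assumed) threshold in all targets."""
    z = zscore_rows(counts)
    idx = [species.index(s) for s in targets]
    return sorted(f for f, row in z.items() if all(row[i] > threshold for i in idx))
```

For example, a family with six and five copies in the two corals and one copy in each of the four other species would be reported as expanded in Scleractinia under these assumptions.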

4.4.2 OM extraction for S. pistillata

Organic matrix proteins were extracted as previously published (Tambutté et al., 2015). Briefly, coral branches were treated with sodium hypochlorite to prepare skeletons, which were then cryo-ground into powder. The powder was incubated with sodium hypochlorite to remove potential contaminants such as endoliths. The powder was then demineralized in EDTA and the resulting solution was filtered through Sep-Pak Plus C18 cartridges (Waters, 5 kDa). Protein content was determined using the bicinchoninic acid assay kit (BC Protein Assay, Interchim). The standard curve was established with bovine serum albumin and the absorbance was measured with a microplate reader (Epoch™, BioTek, US) at 562 nm. LC/MS/MS analysis of the S. pistillata extract detected 65 SOMPs (Table S10), of which only 48 were strongly supported, with spectrum counts in at least two samples.

4.4.3 Transcriptome analysis

Low-quality ends of sequenced reads from six hexacorallians, including A. digitifera (PRJDB3244) (Reyes-Bermudez et al., 2016), S. pistillata (PRJNA415215) (Voolstra et al., 2017), A. fenestrafer (PRJNA354436) (Wang et al., 2017b), Discosoma sp. (PRJNA354492) (Wang et al., 2017b), N. vectensis (PRJNA189768) (Helm et al., 2013) and A. pallida (PRJNA386175) (Baumgarten et al., 2015), were trimmed using Trimmomatic (Bolger et al., 2014) and quality-checked using FastQC v0.11.3 (Andrews, 2010). We then assembled each transcriptome from the filtered RNA-seq data using Trinity v2.02. The reads remaining after filtering were also assigned to their corresponding gene models using Kallisto v0.42.2 (Weijers et al., 2012) to validate the existence of genes and to quantify gene expression levels. In addition, we used the same procedure to quantify gene expression based on the raw reads obtained from different embryonic stages and adult tissue of A. digitifera (Reyes-Bermudez et al., 2016). The heat map and clustered matrix were created using R with Bioconductor and pheatmap (Kolde, 2015).
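Kallisto reports expression as transcripts per million (TPM). For readers unfamiliar with the unit, the sketch below shows how TPM is defined from estimated read counts and effective transcript lengths, and how values can be placed on the log10(TPM + 0.01) scale used in the expression panels of this chapter. It only illustrates the definition and is not part of the pipeline; the function names and example values are hypothetical.

```python
import math


def tpm(counts, eff_lengths):
    """counts and eff_lengths: transcript id -> estimated count / effective length.
    TPM_i = (count_i / length_i) / sum_j(count_j / length_j) * 1e6."""
    rates = {t: counts[t] / eff_lengths[t] for t in counts}
    total = sum(rates.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    return {t: r / total * 1e6 for t, r in rates.items()}


def log_tpm(values, pseudocount=0.01):
    """log10 transform with a small pseudocount so that TPM = 0 stays finite."""
    return {t: math.log10(v + pseudocount) for t, v in values.items()}
```

With the pseudocount of 0.01, a gene without any assigned reads is plotted at -2 rather than being undefined.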

4.4.4 Identification of ion transporters

Based on previously published data, we established a list of putative ion transporters involved in the supply of calcium and bicarbonate for coral calcification, drawing on analogies with transporters previously described in mammals, including PMCA, SLC4, CAs, the L-type calcium channel family and Calreticulin. Corresponding sequences from each species were selected from BLASTP searches against the known candidates and further validated against the ortholog groups. To assess their expression levels, we determined transcripts per million (TPM) counts for each homologous protein, as well as the TPM of these proteins at different developmental stages of A. digitifera.

4.4.5 Identification of core SOMPs

Our proteomic analysis of the S. pistillata skeletal organic matrix using LC/MS/MS detected 65 SOMPs. Of these, 48 proteins were strongly supported, with spectrum counts in at least two of the four samples. Homology analyses were performed by comparing the identified S. pistillata proteomic sequences against integrated proteomic data from three previous studies: Drake et al. (36 in Seriatopora sp.), Ramos-Silva et al. (36 in A. millepora) and Takeuchi et al. (30 in A. digitifera). We performed BLASTP searches using default parameters and the OrthoMCL pipeline to determine orthologous/paralogous gene families across these proteomic studies. Using this approach, we identified 7 core SOMPs that were commonly identified in the skeletons of all four coral species.
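Once every skeletal protein has been assigned to an ortholog group, identifying the core SOMPs reduces to intersecting the sets of groups observed in each study. The sketch below illustrates this step; the function name, protein identifiers and group labels are hypothetical placeholders, not the identifiers used in the tables of this thesis.

```python
def core_groups(group_of, study_proteins):
    """group_of: protein id -> ortholog group id.
    study_proteins: study name -> iterable of protein ids found in that study.
    Returns the ortholog groups represented in every study."""
    per_study = []
    for proteins in study_proteins.values():
        # Proteins without a group assignment are skipped.
        per_study.append({group_of[p] for p in proteins if p in group_of})
    return set.intersection(*per_study) if per_study else set()


# Hypothetical toy input: only group OG1 is present in all four studies.
example_groups = {
    "spis_1": "OG1", "spis_2": "OG2",
    "amil_1": "OG1", "amil_2": "OG3",
    "adig_1": "OG1",
    "seri_1": "OG1", "seri_2": "OG2",
}
example_studies = {
    "this_study": ["spis_1", "spis_2"],
    "ramos_silva": ["amil_1", "amil_2"],
    "takeuchi": ["adig_1"],
    "drake": ["seri_1", "seri_2"],
}
```

Working at the level of ortholog groups rather than individual sequences means that paralogs detected in different species still count towards the same core family.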

4.4.6 The evolution of ion-involved genes and SOMPs

Homology analysis of ion transporters and SOMPs was performed with local BLASTP searches against the predicted coding genes of A. digitifera, S. pistillata, A. fenestrafer, Discosoma sp., N. vectensis and A. pallida (e-value 10^-5). The best matches of each SOMP were also manually compared for their domain architectures and screened within the corresponding genomes to determine their synteny. To further determine their expression patterns, we compared the TPMs of these aligned proteins and their TPMs at different developmental stages of A. digitifera. To confirm their existence in each genome, we also aligned SOMPs against the corresponding transcripts using TBLASTN with default parameters. Phylogenies of these proteins were reconstructed following the pipeline described above.
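Selecting the best match per query from BLASTP results can be sketched as below. The example assumes BLAST+ tabular output (-outfmt 6) with the default twelve columns, in which the e-value and bit score are the eleventh and twelfth fields, and it picks the hit with the highest bit score among those passing the e-value cut-off. The function name and the identifiers in any input are hypothetical.

```python
def best_hits(lines, max_evalue=1e-5):
    """Parse BLAST tabular lines (default -outfmt 6 column order assumed)
    and return query id -> (subject id, e-value, bit score) for the
    highest-scoring hit with e-value <= max_evalue."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bits = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bits > best[query][2]:
            best[query] = (subject, evalue, bits)
    return best
```

Queries without any hit below the cut-off are simply absent from the result, which makes missing homologs easy to list for the subsequent TBLASTN check.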

To obtain domain annotations, we ran InterProScan (Jones et al., 2014) against various databases, including Pfam, ProDom, PRINTS and SMART. Gene Ontology (GO) terms were obtained from the BLASTP results. We also used Phobius (Jones et al., 2014) to determine the potential location of each gene product. Additional functional information on pathways was derived from KEGG (Kanehisa, 2008). For potentially interesting genes, phylogenetic trees were built by the same method.

Because some gene models are inaccurate, we also selected the proteins in the trees that disagreed with the expected phylogeny or domain architecture. We then searched the closest orthologous protein against the de novo transcriptomes or the corresponding genomes using TBLASTN, and supplemented or modified some proteins accordingly, such as the extracellular carbonic anhydrases in Discosoma sp. and A. fenestrafer, acidic proteins in A. digitifera, Mucin4-like in S. pistillata, Protocadherin in A. digitifera, collagen alpha-6 (VI) in A. digitifera, and all fibronectin II proteins.

4.4.7 Immunolocalization

Polyclonal antibodies against PMCA1-3 were produced in rabbit by Eurogentec. Antibodies were raised against the following peptides: CLTGESDLVKKGPDRD and CLIRDSSGKVSQKKFD for PMCA1, CREKFGKNFMPLEPPR and CDRLMNYKPYGRHKPL for PMCA2, and CYKKQEGKPKDSGQGF and CTVTPAAEEYSMTTGN for PMCA3. Apexes of colonies were prepared for immunolocalization as described previously (Bertucci et al., 2011). Briefly, samples were fixed in 3% paraformaldehyde in S22 buffer at 4°C overnight and then decalcified using EDTA in Ca-free S22 at 4°C. They were then dehydrated in an ethanol series and embedded in Paraplast. Cross-sections (6 µm thick) were cut and mounted on silane-coated glass slides. Deparaffinized tissue sections were then incubated for 1 h in saturating medium at room temperature (RT). The samples were then incubated with the anti-PMCA antibodies or the pre-immune serum as primary antibodies. After rinsing in saturating medium, all samples were incubated with biotinylated anti-rabbit antibodies (Sigma) as secondary antibodies and stained with streptavidin-Alexa Fluor 568 (Molecular Probes, Invitrogen). Nuclei were stained with 0.002% DAPI (4′,6-diamidino-2-phenylindole, Sigma). Samples were embedded in Pro-Long antifade solution (Molecular Probes, Invitrogen) and observed with a confocal laser-scanning microscope (Leica SP5) equipped with UV and visible laser lines.

4.5 Discussion

The evolutionary history of Scleractinia can be traced back to the early Triassic, around 245 Mya (Park et al., 2012b; Simpson et al., 2011a). However, molecular divergence analysis between corallimorpharians and scleractinians suggests that they diverged around 334 Mya (Figure 10A, 10B). Therefore, it can be concluded that the ability of corals to calcify evolved during the roughly 90 million years between the divergence of Scleractinia from the non-calcifying Corallimorpharia (334 Mya) and the earliest evidence for reef-building corals in the fossil record (245 Mya). To our surprise, we could not find evidence for novel coral gene families that might be involved in the process of coral calcification, suggesting that de novo innovation through the evolution of novel gene families or the recruitment of novel genes through horizontal gene transfer has not been a main mechanism in the evolution of coral calcification (Figure S14). It was proposed that the evolution of coral calcification was driven by the co-option of existing genes (Lin et al., 2017; Ramos-Silva et al., 2013a; Takeuchi et al., 2016), which is strongly supported by our broader genomic comparison including the closest non-calcifying species. The GO enrichment of genes expanded in scleractinians (Figure 10C), the Pfam domain enrichment in Scleractinia (Figure S15), and the conserved orthologs and ubiquitous expression of most SOMPs in non-calcifying cnidarians (Table S13) strongly suggest that the majority of the putative calcification proteins derive from genes recruited from non-calcifying cnidarian ancestors and from coral-specific gene duplications.

4.5.1 The co-option of genes as evolutionary mechanism of coral calcification

The continuous supply of calcium and carbonate ions to the sub-calicoblastic space is the most critical factor for the growth of coral skeletons (McCulloch et al., 2017). This supply is biologically controlled by the Ca2+-ATPase pump (Ip et al., 1991; Zoccola et al., 2004; Zoccola et al., 1999), bicarbonate anion transporters (Furla et al., 2000; Zoccola et al., 2015b) and carbonic anhydrases (CAs) (Bertucci et al., 2013; Bertucci et al., 2011; Moya et al., 2008b; Zoccola et al., 2015b). In line with this, we found that corals evolved specific adaptations of their molecular transporters to satisfy the high demand for calcium and carbonate in the calcifying fluid. These adaptations included the co-option of existing genes, such as the Ca2+-transporting ATPases PMCA1-3, as well as the neo-functionalization of coral-specific gene duplications, such as the coral-specific bicarbonate transporter SLC4γ. This is strongly supported by the calicoblastic ectoderm-specific expression of the respective coral proteins, which suggests that these proteins have been recruited to the calcification process to facilitate the transport of calcium and bicarbonate to the site of calcification.

While efficient transport of Ca2+ and CaCO3 maintains a high saturation of ions in the calcifying fluid, the skeletal organic matrix (SOM) provides the scaffold that directs crystal morphology, aggregation and polymorphism (Falini et al., 2015; Falini et al., 2013; Goffredo et al., 2011; Reggi et al., 2014). The SOM proteins, secreted from the calicoblastic cells, either support nucleation and stabilization or provide a connection between the calicoblastic cells and the skeleton, thereby directing skeleton growth and morphological differentiation (Falini et al., 2015). Therefore, the high diversity of SOMPs identified across the different coral species suggests not only an important role in general aspects of coral calcification, such as cell-cell and cell-substrate adhesion, deposition of a calcium carbonate skeleton and the common feature of arranged crystals, but also more specific functions in determining or contributing to the distinctive coral morphologies (Drake et al., 2013; Mass et al., 2013; Ramos-Silva et al., 2013a). However, the conserved SOMPs between robust corals and complex corals identified in this study provide the opportunity to trace the evolutionary origin and history of coral calcification. The fact that these coral-specific homologues also constitute the most abundant proteins in the SOM of both robust and complex corals is another strong indication of their essential roles in calcification (Table S12).

Coral acid-rich proteins (CARPs) are the most abundant and important SOMPs in coral calcification (Mass et al., 2013; Von Euw et al., 2017) and show scleractinian-specific duplications and subsequent evolutionary adaptations. These duplicated acidic proteins have further undergone extensive divergence in the scleractinian lineage through the expansion of aspartic and glutamic acid stretches, leading to a unique species-specific set of acidic paralogs for coral calcification. Their ability to catalyze the precipitation and stabilization of amorphous calcium carbonate in vitro (Addadi and Weiner, 1985; Mass et al., 2013; Puverel et al., 2005; Von Euw et al., 2017) makes it plausible that they constitute one of the central evolutionary innovations that enabled the ancestor of contemporary corals to form a calcium carbonate skeleton. Our data revealed that scleractinian calcification required only a small number of transmembrane proteins, including Mucin4, Protocadherin, Neurexin, Zona pellucida domain protein, and REJ domain-containing protein. These proteins can mediate adhesion among calicoblastic cells, newly formed skeleton, and the skeleton-blanket matrix (Falini et al., 2015; Helman et al., 2008; Ramos-Silva et al., 2013b; Takeuchi et al., 2016). Their high conservation indicates that these transmembrane proteins did not experience gene duplications but only a small number of domain duplications. Extracellular matrix proteins are also abundant among the SOMPs and important for controlling the deposition of the mineral phase (Falini et al., 2015; Helman et al., 2008; Ramos-Silva et al., 2013b; Takeuchi et al., 2016). These proteins are likewise derived from proteins present in non-calcifying taxa, and some, such as the vitellogenin-like protein, simply retained their ancestral functions. However, other extracellular matrix proteins, such as COL6A6 and Fibronectin II domain-containing proteins, appear to have acquired their calcifying functions through novel domain insertion and subsequently experienced gene duplications, likely as adaptations to species-specific traits.
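The expansion of acidic stretches in CARPs discussed above can be quantified by two simple sequence statistics: the proportion of aspartic (D) and glutamic (E) acid residues, as plotted for the CARP4 homologs, and the length of the longest uninterrupted acidic run. The sketch below shows one way to compute both. The function names are hypothetical and the second statistic is an illustrative addition, not a measure reported in this work.

```python
import re


def acidic_fraction(seq):
    """Proportion of aspartic (D) and glutamic (E) acid residues in a protein."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("D") + seq.count("E")) / len(seq)


def longest_acidic_stretch(seq):
    """Length of the longest uninterrupted run of D/E residues."""
    runs = re.findall(r"[DE]+", seq.upper())
    return max((len(run) for run in runs), default=0)
```

Comparing these values between coral paralogs and their homologs in non-calcifying species would make a lineage-specific gain of acidic residues directly visible.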

4.5.2 Domain shuffling might not have been important for the evolution of coral calcification

It was proposed that domain shuffling might be a general evolutionary mechanism of coral calcification (Ramos-Silva et al., 2013b; Takeuchi et al., 2016). Scanning critical SOMP domains across the hexacorallian genomes (Table S14) showed that the number of these domains in corals is significantly higher than in corallimorpharians but similar to that in actiniarians, indicating that domain shuffling might have played an important role in coral evolution. However, comparisons of relative Pfam domain ratios with other non-calcifying hexacorallians showed significant enrichment of calcification-associated domains in both corals (Drake et al., 2013; Ramos-Silva et al., 2013a). This suggests that domain duplications might also have contributed to the evolution of coral calcification. Our broad evolutionary analysis of known calcification genes involved in ion supply and of shared SOMPs showed that ancient ancestral genes have been co-opted into coral calcification with domain compositions that are highly conserved relative to their homologs in soft-bodied ancestors. With highly conserved and similar domain architectures, these proteins experienced only slight, gradual domain changes, potentially as adaptations to their role in scleractinian calcification. Even with multiple duplications of transmembrane domains, a particularly striking feature is their preponderance of ancient, highly repetitive, low-complexity domains, such as TSP, MAM, LDL-receptor, or EGF-like domains, similar to the rapidly evolving secretomes of sea shells (Kocot et al., 2016). However, the low abundance of these transmembrane proteins measured in the skeletal organic matrix also indicates a less essential role in coral calcification. These findings therefore indicate that domain shuffling is unlikely to have been the most crucial factor in the evolution of coral calcification.

Our divergence time analysis showed that corals began to evolve their ability to calcify at some time between 334 Mya and 245 Mya. During this period, ancestral corals experienced scleractinian-specific duplication events of a small number of genes. These include genes that participate in the provisioning of bicarbonate (SLC4γ and CAs) as well as coral-specific duplications of coral acid-rich proteins with subsequent expansion of acidic amino acid stretches. The specific localization of the Ca-ATPases PMCA1 and PMCA2 to the calicoblastic ectoderm in the coral S. pistillata further suggests that the co-option and neo-localization of these proteins might have contributed to the evolution of calcification. Most transmembrane and extracellular proteins in both scleractinians showed high conservation with non-calcifying hexacorallians. Nevertheless, their co-occurrence in the skeletal organic matrix of different corals suggests that these proteins retained their ancestral functions and were subsequently co-opted to coral calcification. This new adaptation to calcification happened with or without associated changes in domain or amino acid sequence, where the changes included domain duplications and novel domain insertions.

4.6 Author’s contributions

M.A. conceived and designed the study. W.X. performed the comparative genomic analyses with help from Y.J.L. and M.A. S.T. and D.Z. provided materials and proteomic data for S. pistillata and performed the immunolocalization of PMCAs.


Figure 10. Comparative genomics across seven published cnidarian genomes. A) Phylogeny of seven cnidarian genome sequences. The values on nodes indicate the estimated divergence time. B) Density of pair-wise genetic divergence calculated with synonymous substitution rate (Ks) among all possible orthologous genes across two scleractinians and two corallimorpharians. C) GO enrichment of scleractinian specific duplicated proteins.

Figure 11. Plasma membrane calcium ATPase (PMCA). A) Phylogeny of PMCA across hexacorallian genomes. B) Synteny of PMCA. C) Expression of PMCA homologs across six genomes, shown as Log (TPM + 0.01). D) Immunolocalization of PMCA1, PMCA2 and PMCA3 in A. fenestrafer, Discosoma sp., S. pistillata and A. viridis. AEnd = Aboral Endoderm; CEct = Calicoblastic Ectoderm; Co = Coelenteron; Sk = Skeleton. Nuclei are labeled in blue. Red represents autofluorescence of the different PMCAs.

Figure 12. Solute carrier 4 (SLC4). A) Phylogeny of SLC4 across six hexacorallians. B) SLC4β, SLC4γ and their adjacent proteins (PDE12, CACNB2, RPN8 and GOSR2) across six hexacorallian genomes. C) Immunolocalization of SLC4β (left two panels) and SLC4γ (right two panels, published by Zoccola et al., 2015) in S. pistillata. AEnd = Aboral Endoderm; CEct = Calicoblastic Ectoderm; Co = Coelenteron; Sk = Skeleton.

Figure 13. Carbonic anhydrases (CAs). A) Phylogeny of CAs across six hexacorallians. Orange labels represent extracellular CAs and blue labels represent intracellular CAs. B) Expression of CA homologs in Scleractinia, shown as log10(TPM + 0.01). Orange labels represent extracellular CAs and blue labels represent intracellular CAs. C) Expression of CAs at different developmental stages of A. digitifera.

Figure 14. Comparison of the proteins identified in four skeletal proteomic studies: Drake et al. (Seriatopora sp.), Takeuchi et al. (A. digitifera), Ramos-Silva et al. (A. millepora) and this study (S. pistillata). Proteins labeled in the figure: CARPs, Mucin-4, Protocadherin, Neurexin, ZP domain protein, Collagen alpha-6(VI) and Fibronectin II.

Figure 15. Coral acid-rich proteins (CARPs). A) Phylogenetic analysis of CARP4. B) The abundance of aspartic and glutamic acid residues (shown as aspartic and glutamic acid proportion) across the homologs of CARP4 in S. pistillata, A. digitifera, A. fenestrafer, Discosoma sp., A. pallida and N. vectensis.

Chapter 5 Conclusion

5.1 Valuable corallimorpharian resources

Here we sequenced and assembled draft genomes for the two corallimorpharians Amplexidiscus fenestrafer and Discosoma sp. The assemblies encompass ~370 Mb and ~445 Mb, respectively, and encode 21,372 and 23,199 genes. We provide gene predictions, protein and domain annotations as well as comparisons of various genomic features. Additionally, we provide more complex analyses, such as quantitative and qualitative repeat content analyses as well as ortholog groupings across all sequenced cnidarian genomes, which we expect to be highly helpful for the coral reef community in conducting more specific studies.

Our findings confirm that corallimorpharians are the closest living non-calcifying relatives of reef-building scleractinians; these genomes therefore close the current evolutionary gap in hexacorallian genomic resources. We believe that these resources will be of great value for future comparative studies, especially for the study of coral-specific traits such as calcification.

5.2 The phylogenetic relationship between corallimorpharians and scleractinians

While genomic studies showed the expected closer relationship between Scleractinia and Corallimorpharia in comparison to Actiniaria, their precise phylogenetic relationship has been elusive because different molecular markers and evolutionary models yield conflicting topologies. Based on phylogenetic analyses using various mitochondrial markers, it was proposed that Corallimorpharia might have diverged from a complex coral ancestor that lost the ability to calcify in response to increasing ocean acidification during the Cretaceous period, a scenario that was termed the “naked coral” hypothesis.

However, our phylogenetic reconstructions based on amino acid/nucleotide alignments, gene presence/absence, and synteny information consistently show that corallimorpharians did not evolve from a complex coral ancestor but rather share a common ancestor with robust and complex corals. Therefore, our findings strongly support “scleractinian monophyly” and reject the “naked coral” hypothesis, which settles a long-standing debate about a potential evolutionary mechanism by which corals could evade extinction through skeleton dissolution in response to increasing ocean acidification.

Although Corallimorpharia consequently do not qualify as candidates to study future responses of corals to climate change-induced ocean acidification, their position as the closest living non-calcifying relatives of corals makes them the best candidates to study the evolution of coral-specific traits, such as calcification.

5.3 The cooption of coral calcification

Our comparative genomic approach using genomes from non-calcifying hexacorallians and calcifying scleractinians revealed that corals did not require extensive evolutionary changes, such as de novo innovations or horizontal gene transfer, to evolve the ability to calcify. Instead, our results suggest that comparably few genetic innovations were necessary. These changes include gene duplications in the CARP gene family in corals, whose members have been shown to promote nucleation in vitro. It could therefore be speculated that secretion of the coral-specific CARP homologs CARP4 and CARP5 provided initial calcification activity, which was further supported by gene duplications and neo-localization of genes involved in ion supply to the calicoblastic ectoderm, such as SLC4γ and the plasma membrane calcium ATPases PMCA1 and PMCA2, as well as a diversification of CAs. Other proteins found in the SOM appear to have retained their ancestral functions and might therefore have been co-opted to the calcification process without the need for extensive adaptations. We propose that this basic machinery likely promoted the initial calcification process and allowed the continuous recruitment of further proteins supporting the lineage-specific evolution of morphological skeletal characteristics.

Calcification in scleractinian corals depends on multiple processes that are orchestrated by the organism through the supply of calcium and bicarbonate ions and the secretion of the skeletal organic matrix. Here, we elucidated the evolutionary origin and history of known calcification genes that are involved in these processes and conserved across scleractinians. However, some candidate genes that have previously been implicated in calcification are not yet firmly supported, owing to the lack of functional tools such as gene knock-outs and genetic transformation. For example, some proteins that are involved in the supply of Ca2+, such as calcium ion channels, calcium ion pumps, calcium ion exchangers, voltage-dependent Ca2+ channels, transient receptor cation channels, the SLC8 family and calcium release-activated calcium channel proteins (ORAI), might also be involved in controlling the concentration of calcium ions in the calcifying fluid. The evolutionary history of these proteins and their actual functions during coral calcification are still largely unknown. Therefore, substantial experimental evidence and corresponding evolutionary analyses are still needed to reveal all genes contributing to these mechanisms.

Here we concentrated our analyses on the most ancestral genes that enabled the soft-bodied ancestor of scleractinian corals to form a calcium carbonate skeleton, neglecting genes that evolved or were co-opted to the calcification process at later time points, probably to control the skeleton morphologies underlying species-specific growth forms. The future challenge will be to apply the information generated from various corals to elucidate the processes underlying the complexity of coral skeleton morphology. Our genomic sequences, transcriptomes, and proteomic data will further facilitate the understanding of the evolutionary trajectories of phenotypic responses and their adaptive contributions.

APPENDICES

Appendix A Supplementary Tables

Table S1. Corallimorpharian genome sequencing statistics.

Table S2. k-mer genome size estimation.

Table S3. Mapping statistics of paired-end sequencing libraries.

Table S4. CEGMA evaluation of genome completeness.

Table S5. Gene prediction statistics.

Table S6. Single copy orthologs with Pfam and BLAST annotations.

Table S7. Repeat content and classification in Corallimorpharia.

Table S8. Repeat element categories.

Table S9. List of single copy orthologs across 6 Hexacorallia species and outgroup Hydra magnipapillata as determined by OrthoMCL.

Table S10. Proteomic data displayed in S. pistillata.

Table S11. The ortholog groups of proteomic data in four studies.

Table S12. Overlapping genes in four studies and their corresponding spectral counts.

Table S13. Best match proteins across other Cnidarians with SOMPs from Acropora millepora.

Table S14. Comparison of the main domains from Acropora millepora SOMPs with other Cnidarian species.

Appendix B Supplementary Figures

Figure S1. Estimate of artificial contig duplications across different genomes using BLASTN. Bitscore ratios of the second best hit vs. the first best hit are plotted indicating similarity between contigs in each genome.

Figure S2. Comparison of gene structure across the five cnidarian genomes. A) Gene length density distribution. B) Exon length distribution. C) Exon per gene distribution. D) GC content of coding sequences.

Figure S3. Coverage of SwissProt gene homologs.

Figure S4. Ratios of the top 30 most abundant Pfam domains (depicted as domain ratios * 100).

Figure S5. Enrichment of Pfam domains in Corallimorpharia in comparison to A. digitifera, A. pallida and N. vectensis (z-scores of significantly enriched domains, p < 0.001).

Figure S6. Large duplication (> 3 kb) density in A) A. fenestrafer and B) Discosoma sp.

Figure S7. Phylogenies inferred from concatenated amino acid sequences under LG+I+G+F using A) RAxML and B) MrBayes. Nodes show bootstrap values in (A) and posterior probabilities in (B).

Figure S8. BI phylogenies inferred from concatenated amino acid sequences using A) CAT-LG, B) CAT-GTR and C) CAT-Poisson models in Phylobayes. Nodes show posterior probabilities.

Figure S9. 50% majority rule consensus tree based on the 1,021 best ML single-gene trees calculated by RAxML (-J MR). Nodes show bootstrap values.

Figure S10. Phylogenies inferred from concatenated nucleotide sequences under GTR+I+G using A) RAxML and B) MrBayes. Nodes show bootstrap values in (A) and posterior probabilities in (B).

Figure S11. Bayesian phylogenies inferred from concatenated nucleotide sequences under A) CAT-Poisson and B) CAT-GTR models in Phylobayes. Nodes show posterior probabilities.

Figure S12. Bayesian phylogenies inferred from gene presence/absence data using the binary F81-like model in MrBayes. Genes present in fewer than two species were removed and nodes show posterior probabilities.

Figure S13. Phylogenetic analysis based on the number of paralogs per species and ortholog/paralog group. Node supports are AU (Approximately Unbiased) p-values and BP (Bootstrap Probability) values. Distances were calculated using the “canberra” method and clustered using the “average” hierarchical clustering method with 100 bootstraps.

Figure S14. GO enrichment of coral species-specific proteins.

Figure S15. Pfam domain enrichment across six hexacorallians.

Figure S16. Phylogeny of all carbonic anhydrases (those labeled in red occur in SOMPs).

Figure S17. Phylogenetic analysis of Mucin4-like protein (the protein shown in red was found in all four SOMs). The right panel shows the domain distribution for Mucin4-like orthologous proteins.

Figure S18. Phylogenetic analysis of Protocadherin (the protein shown in red was found in all four SOMs). The right panel shows the domain distribution for Protocadherin orthologous proteins.

Figure S19. Phylogenetic analysis of contactin-associated protein (Neurexin) (the protein shown in red was found in all four SOMs, those in blue in some of them), together with the domain distribution for Neurexin orthologous proteins. The figure shows the transmembrane domain duplications (Laminin G) in two corals.

Figure S20. Phylogenetic analysis of Zona pellucida domain proteins. The right panel shows the domain distribution for each relevant protein. The transmembrane domain distribution is highly conserved.

Figure S21. Phylogenetic analysis of REJ domain-containing protein. The right panel shows the domain distribution for each relevant protein. The transmembrane domain distribution is highly conserved between Actiniaria and Scleractinia. Genes shown in blue are present in three SOMPs.

Figure S22. Phylogenetic analysis of collagen alpha-6 (VI). The right panel shows the domain distribution for each relevant protein. The figure shows the transmembrane domain duplications (VWA loss) and the extracellular domain insertion of F5/8 type C.

Figure S23. Phylogenetic analysis of fibronectin II (MAM and LDL-receptor class A domain-containing protein) and the corresponding protein domain distribution. Fibronectin shows transmembrane domain duplications of MAM and LDL, as well as an extracellular domain insertion of Fibronectin II, in two corals.

Figure S24. Phylogenetic analysis of Vitellogenin-like protein and corresponding protein domain distribution. The blue color represents those genes present in three SOMPs.

Figure S25. Phylogenetic analysis of USOMP8 and corresponding protein domain distribution. The blue color represents those genes present in three SOMPs.

Figure S26. Amino acid alignment of Galaxin across six genomes. Top three amino acids are consistently found in three SOMPs.

PUBLICATIONS

First-author:

Wang X, Liew YJ, Li Y, Zoccola D, Tambutte S, Aranda M. Draft genomes of the corallimorpharians Amplexidiscus fenestrafer and Discosoma sp. Molecular Ecology Resources (2017), 17 (6): e187-e195.

Wang X, Drillon G, Ryu T, Voolstra CR, Aranda M. Non-sequence-based analyses of six hexacorallian genomes reject the “naked coral” hypothesis. Genome Biology and Evolution (2017), 9 (10): 2626-2634.

Wang X, Zoccola D, Liew YJ, Tambutte S, Aranda M. How corals got bones: the co-option of calcification in reef-building corals. (Ready to submit)

Wang X, Liew YJ, Howells EJ, Meyer, Aranda M. Transposable elements driving rapid adaptation to thermal stress in Platygyra daedalea genome. (Ready to submit)

Wang X, Aranda M. Reference-based assemblies of hexacorallian mitochondrial genomes. (Ready to submit)

Co-author:

Dibattista JD, Wang X, Saenz-Agudelo P, Piatek MJ, Aranda M, Berumen M. Draft genome of an iconic Red Sea reef fish, the blacktail butterflyfish (Chaetodon austriacus): current status and its characteristics. Molecular Ecology Resources (2018), 18 (2): 347-355.

Dibattista JD, Saenz-Agudelo P, Piatek MJ, Wang X, Aranda M, Berumen ML. Using a butterflyfish genome as a general tool for RAD-Seq studies in specialized reef fish. Molecular Ecology Resources (2017), 17 (6): 1330-1341.

Chen JE1, Cui GX1, Wang X, Liew YJ, Aranda M. Recent expansion of heat-activated retrotransposons in the coral symbiont Symbiodinium microadriaticum. ISME Journal (2018), 12 (2): 639-643.

Liew YJ1, Howells EJ1, Wang X, Michell C, Burt JA, Idaghdour Y, Aranda M. Intergenerational epigenetic inheritance in reef-building corals. bioRxiv (2018). (Nature Climate Change, under review)

Involved Research:

Hybrid assembly and annotation of the brain coral Platygyra daedalea using PacBio sequencing.

The draft genome of red coral.

Draft genome of Symbiodinium thermophilum.

The genome of Symbiodinium microadriaticum clade 2.

The genome of Sponge.

BIBLIOGRAPHY

Adamiano, A., Goffredo, S., Dubinsky, Z., Levy, O., Fermani, S., Fabbri, D., and Falini, G. (2014). Analytical pyrolysis-based study on intra-skeletal organic matrices from Mediterranean corals. Analytical and Bioanalytical Chemistry 406, 6021-6033.

Addadi, L., and Weiner, S. (1985). Interactions between acidic proteins and crystals: stereochemical requirements in biomineralization. Proceedings of the National Academy of Sciences 82, 4110-4114.

Adkins, J., Boyle, E., Curry, W., and Lutringer, A. (2003). Stable isotopes in deep-sea corals and a new mechanism for “vital effects”. Geochimica et Cosmochimica Acta 67, 1129-1143.

Al-Horani, F., Al-Moghrabi, S., and De Beer, D. (2003). The mechanism of calcification and its relation to photosynthesis and respiration in the scleractinian coral Galaxea fascicularis. Marine Biology 142, 419-426.

Allemand, D., Ferrier-Pagès, C., Furla, P., Houlbrèque, F., Puverel, S., Reynaud, S., Tambutté, É., Tambutté, S., and Zoccola, D. (2004). Biomineralisation in reef-building corals: from molecular mechanisms to environmental control. Comptes Rendus Palevol 3, 453-467.

Allison, N., Cohen, I., Finch, A.A., Erez, J., and Tudhope, A.W. (2014). Corals concentrate dissolved inorganic carbon to facilitate calcification. Nature Communications 5, 5741.

Andrews, S. (2010). FastQC: a quality control tool for high throughput sequence data.

Aranda, M., Li, Y., Liew, Y., Baumgarten, S., Simakov, O., Wilson, M., Piel, J., Ashoor, H., Bougouffa, S., and Bajic, V.B. (2016). Genomes of coral dinoflagellate symbionts highlight evolutionary adaptations conducive to a symbiotic lifestyle. Scientific Reports 6, 39734.

Bao, W., Kojima, K.K., and Kohany, O. (2015). Repbase Update, a database of repetitive elements in eukaryotic genomes. Mobile DNA 6, 11.

Barnes, D., and Lough, J. (1996). Coral skeletons: storage and recovery of environmental information. Global Change Biology 2, 569-582.

Barnes, D.J. (1970). Coral skeletons: an explanation of their growth and structure. Science 170, 1305-1308.

Baumgarten, S., Simakov, O., Esherick, L.Y., Liew, Y.J., Lehnert, E.M., Michell, C.T., Li, Y., Hambleton, E.A., Guse, A., Oates, M.E., et al. (2015). The genome of Aiptasia, a sea anemone model for coral symbiosis. Proceedings of the National Academy of Sciences 112, 11893-11898.

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological), 289-300.

Bertucci, A., Moya, A., Tambutté, S., Allemand, D., Supuran, C.T., and Zoccola, D. (2013). Carbonic anhydrases in anthozoan corals—A review. Bioorganic & Medicinal Chemistry 21, 1437-1450.

Bertucci, A., Tambutté, S., Supuran, C.T., Allemand, D., and Zoccola, D. (2011). A new coral carbonic anhydrase in Stylophora pistillata. Marine Biotechnology 13, 992-1002.

Bhattacharya, D., Agrawal, S., Aranda, M., Baumgarten, S., Belcaid, M., Drake, J.L., Erwin, D., Foret, S., Gates, R.D., and Gruber, D.F. (2016). Comparative genomics explains the evolutionary success of reef-forming corals. Elife 5, e13288.

Bilan, M., and Usov, A. (2001). Polysaccharides of calcareous algae and their effect on the calcification process. Russian Journal of Bioorganic Chemistry 27, 2-16.

Boja, E.S., Hoodbhoy, T., Fales, H.M., and Dean, J. (2003). Structural characterization of native mouse zona pellucida proteins using mass spectrometry. Journal of Biological Chemistry 278, 34189-34202.

Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120.

Bradnam, K.R., Fass, J.N., Alexandrov, A., Baranay, P., Bechner, M., Birol, I., Boisvert, S., Chapman, J.A., Chapuis, G., and Chikhi, R. (2013). Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. GigaScience 2, 1.

Butler, J., MacCallum, I., Kleber, M., Shlyakhter, I.A., Belmonte, M.K., Lander, E.S., Nusbaum, C., and Jaffe, D.B. (2008). ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Research 18, 810-820.

Cai, W.-J., Ma, Y., Hopkinson, B.M., Grottoli, A.G., Warner, M.E., Ding, Q., Hu, X., Yuan, X., Schoepf, V., and Xu, H. (2016). Microelectrode characterization of coral daytime interior pH and carbonate chemistry. Nature Communications 7.

Capella-Gutiérrez, S., Silla-Martínez, J.M., and Gabaldón, T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972-1973.

Chapman, J.A., Kirkness, E.F., Simakov, O., Hampson, S.E., Mitros, T., Weinmaier, T., Rattei, T., Balasubramanian, P.G., Borman, J., Busam, D., et al. (2010). The dynamic genome of Hydra. Nature 464, 592-596.

Chen, C.A., and Yu, J.-K. (2000). Universal primers for amplification of mitochondrial small subunit ribosomal RNA-encoding gene in scleractinian corals. Marine Biotechnology 2, 146-153.

Chen, F., Mackey, A.J., Stoeckert, C.J., and Roos, D.S. (2006). OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Research 34, D363-D368.

Clapham, D.E. (2007). Calcium signaling. Cell 131, 1047-1058.

Comeau, S., Cornwall, C., and McCulloch, M. (2017). Decoupling between the response of coral calcifying fluid pH and calcification to ocean acidification. Scientific Reports 7, 7573.

Constantz, B., and Weiner, S. (1988). Acidic macromolecules associated with the mineral phase of scleractinian coral skeletons. Journal of Experimental Zoology 248, 253-258.

Constantz, B.R. (1986). Coral skeleton construction: a physiochemically dominated process. Palaios, 152-157.

Crusoe, M.R., Alameldin, H.F., Awad, S., Boucher, E., Caldwell, A., Cartwright, R., Charbonneau, A., Constantinides, B., Edvenson, G., Fay, S., et al. (2015). The khmer software package: enabling efficient nucleotide sequence analysis. F1000Res 4, 900.

Cuif, J., Dauphin, Y., Farre, B., Nehrke, G., Nouet, J., and Salomé, M. (2008). Distribution of sulphated polysaccharides within calcareous biominerals suggests a widely shared two-step crystallization process for the microstructural growth units. Mineralogical Magazine 72, 233-237.

Cuif, J.-P., Dauphin, Y., Doucet, J., Salome, M., and Susini, J. (2003). XANES mapping of organic sulfate in three scleractinian coral skeletons. Geochimica et Cosmochimica Acta 67, 75-83.

Cuif, J.-P., Dauphin, Y., and Sorauf, J.E. (2010). Biominerals and fossils through time (Cambridge University Press).

Dana, J.D. (1846). United States Exploring Expedition: During the Years 1838, 1839, 1840, 1842 Under the Command of Charles Wilkes, USN. Ethnography and philology, Vol 6 (Lea and Blanchard).

Darriba, D., Taboada, G.L., Doallo, R., and Posada, D. (2011). ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 27, 1164-1165.

Demers, C., Hamdy, C.R., Corsi, K., Chellat, F., Tabrizian, M., and Yahia, L.H. (2002). Natural coral exoskeleton as a bone graft substitute: a review. Bio-medical Materials and Engineering 12, 15-35.

Drake, J.L., Mass, T., Haramaty, L., Zelzion, E., Bhattacharya, D., and Falkowski, P.G. (2013). Proteomic analysis of skeletal organic matrix from the stony coral Stylophora pistillata. Proceedings of the National Academy of Sciences 110, 3788-3793.

Drillon, G. (2013). Analyse combinatoire des réarrangements chromosomiques et reconstruction des génomes ancestraux chez les eucaryotes (Paris 6).

Drillon, G., Carbone, A., and Fischer, G. (2014). SynChro: a fast and easy tool to reconstruct and visualize synteny blocks along eukaryotic chromosomes. PLoS One 9, e92621.

Edgar, R.C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32, 1792-1797.

Enríquez, S., Méndez, E.R., and Prieto, R.I. (2005). Multiple scattering on coral skeletons enhances light absorption by symbiotic algae. Limnology and Oceanography 50, 1025-1032.

Erwin, D.H. (1994). The Permo–Triassic extinction. Nature 367, 231-236.

Evans, V.C., Barker, G., Heesom, K.J., Fan, J., Bessant, C., and Matthews, D.A. (2012). De novo derivation of proteomes from transcriptomes for transcript and protein identification. Nature Methods 9, 1207-1211.

Falini, G., Fermani, S., and Goffredo, S. (2015). Coral biomineralization: A focus on intra-skeletal organic matrix and calcification. Paper presented at: Seminars in Cell & Developmental Biology (Elsevier).

Falini, G., Reggi, M., Fermani, S., Sparla, F., Goffredo, S., Dubinsky, Z., Levi, O., Dauphin, Y., and Cuif, J.-P. (2013). Control of aragonite deposition in colonial corals by intra-skeletal macromolecules. Journal of Structural Biology 183, 226-238.

Fine, M., and Tchernov, D. (2007). Scleractinian coral species survive and recover from decalcification. Science 315, 1811.

Frankel, R.B., and Bazylinski, D.A. (2003). Biologically induced mineralization by bacteria. Reviews in Mineralogy and Geochemistry 54, 95-114.

Fukami, H., Chen, C.A., Budd, A.F., Collins, A., Wallace, C., Chuang, Y.-Y., Chen, C., Dai, C.-F., Iwao, K., and Sheppard, C. (2008). Mitochondrial and nuclear genes suggest that stony corals are monophyletic but most families of stony corals are not (Order Scleractinia, Class Anthozoa, Phylum Cnidaria). PLoS One 3, e3222.

Furla, P., Galgani, I., Durand, I., and Allemand, D. (2000). Sources and mechanisms of inorganic carbon transport for coral calcification and photosynthesis. Journal of Experimental Biology 203, 3445-3457.

Gagnon, A.C., Adkins, J.F., and Erez, J. (2012). Seawater transport during coral biomineralization. Earth and Planetary Science Letters 329, 150-161.

Georgiou, L., Falter, J., Trotter, J., Kline, D.I., Holcomb, M., Dove, S.G., Hoegh-Guldberg, O., and McCulloch, M. (2015). pH homeostasis during coral calcification in a free ocean CO2 enrichment (FOCE) experiment, Heron Island reef flat, Great Barrier Reef. Proceedings of the National Academy of Sciences 112, 13219-13224.

Gewin, V. (2002). Taxonomy: All living things, online. Nature 418, 362-363.

Given, R.K., and Wilkinson, B.H. (1985). Kinetic control of morphology, composition, and mineralogy of abiotic sedimentary carbonates. Journal of Sedimentary Research 55.

Gladfelter, W. (1988). Tropical marine organisms and communities (Argus).

Goffredo, S., Vergni, P., Reggi, M., Caroselli, E., Sparla, F., Levy, O., Dubinsky, Z., and Falini, G. (2011). The skeletal organic matrix from Mediterranean coral Balanophyllia europaea influences calcium carbonate precipitation. PLoS One 6, e22338.

Goodwin, S., McPherson, J.D., and McCombie, W.R. (2016). Coming of age: ten years of next-generation sequencing technologies. Nature Reviews Genetics 17, 333-351.

Goreau, T., and Bowen, V. (1955). Calcium Uptake by a Coral. Science (New York, NY) 122, 1188-1189.

Goreau, T.F. (1959). The physiology of skeleton formation in corals. I. A method for measuring the rate of calcium deposition by corals under different conditions. The Biological Bulletin 116, 59-75.

Gosse, P.H. (1860). A history of the British sea-anemones and corals (Van Voorst).

Gotoh, O. (2008). A space-efficient and accurate method for mapping and aligning cDNA sequences onto genomic sequence. Nucleic Acids Research 36, 2630-2638.

Haas, B.J., Delcher, A.L., Mount, S.M., Wortman, J.R., Smith Jr, R.K., Hannick, L.I., Maiti, R., Ronning, C.M., Rusch, D.B., and Town, C.D. (2003). Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Research 31, 5654-5666.

Haas, B.J., Salzberg, S.L., Zhu, W., Pertea, M., Allen, J.E., Orvis, J., White, O., Buell, C.R., and Wortman, J.R. (2008). Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biology 9, R7.

Hamner, W.M., and Fautin, D.G. (1980). Tropical corallimorpharia (Coelenterata: Anthozoa): feeding by envelopment.

Helm, R.R., Siebert, S., Tulin, S., Smith, J., and Dunn, C.W. (2013). Characterization of differential transcript abundance through time during Nematostella vectensis development. BMC Genomics 14, 266.

Helman, Y., Natale, F., Sherrell, R.M., LaVigne, M., Starovoytov, V., Gorbunov, M.Y., and Falkowski, P.G. (2008). Extracellular matrix production and calcium carbonate precipitation by coral cells in vitro. Proceedings of the National Academy of Sciences 105, 54-58.

Hohenester, E., Tisi, D., Talts, J.F., and Timpl, R. (1999). The Crystal structure of a laminin G–like module reveals the molecular basis of α-dystroglycan binding to laminins, perlecan, and agrin. Molecular Cell 4, 783-792.

Hohn, S., and Merico, A. (2012). Modelling coral polyp calcification in relation to ocean acidification. Biogeosciences 9, 4441.

Holcomb, M., Venn, A., Tambutte, E., Tambutte, S., Allemand, D., Trotter, J., and McCulloch, M. (2014). Coral calcifying fluid pH dictates response to ocean acidification. Scientific Reports 4, 5207.

Hopkinson, B.M., Tansik, A.L., and Fitt, W.K. (2015). Internal carbonic anhydrase activity in the tissue of scleractinian corals is sufficient to support proposed roles in photosynthesis and calcification. Journal of Experimental Biology 218, 2039-2048.

Hughes, T.P., Baird, A.H., Bellwood, D.R., Card, M., Connolly, S.R., Folke, C., Grosberg, R., Hoegh-Guldberg, O., Jackson, J., and Kleypas, J. (2003). Climate change, human impacts, and the resilience of coral reefs. Science 301, 929-933.

Ichikawa, K. (2007). Buffering dissociation/formation reaction of biogenic calcium carbonate. Chemistry-A European Journal 13, 10176-10181.

Ip, Y., Lim, A., and Lim, R. (1991). Some properties of calcium-activated adenosine triphosphatase from the hermatypic coral Galaxea fascicularis. Marine Biology 111, 191-197.

Isa, Y., and Okazaki, M. (1987). Some observations on the Ca2+-binding phospholipid from scleractinian coral skeletons. Comparative Biochemistry and Physiology Part B: Comparative Biochemistry 87, 507-512.

Jin, Y., Wang, Y., Wang, W., Shang, Q., Cao, C., and Erwin, D. (2000). Pattern of marine mass extinction near the Permian-Triassic boundary in South China. Science 289, 432-436.

Jones, P., Binns, D., Chang, H.Y., Fraser, M., Li, W., McAnulla, C., McWilliam, H., Maslen, J., Mitchell, A., Nuka, G., et al. (2014). InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236-1240.

Kajitani, R., Toshimoto, K., Noguchi, H., Toyoda, A., Ogura, Y., Okuno, M., Yabana, M., Harada, M., Nagayasu, E., and Maruyama, H. (2014). Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Research 24, 1384-1395.

Kane, W.H., and Davie, E.W. (1986). Cloning of a cDNA coding for human factor V, a blood coagulation factor homologous to factor VIII and ceruloplasmin. Proceedings of the National Academy of Sciences 83, 6800-6804.

Kanehisa, M. (2008). The KEGG database. Paper presented at: ‘In Silico’ Simulation of Biological Processes: Novartis Foundation Symposium 247 (Wiley Online Library).

Kawaguti, S. (1948). The effect of light on calcium deposition in corals. Bull Oceanogr Inst Taiwan 4, 65-70.

Kayal, E., Roure, B., Philippe, H., Collins, A.G., and Lavrov, D.V. (2013). Cnidarian phylogenetic relationships as revealed by mitogenomics. BMC Evolutionary Biology 13, 1.

Kelley, D.R., Liu, B., Delcher, A.L., Pop, M., and Salzberg, S.L. (2012). Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering. Nucleic Acids Research 40, e9.

Kitahara, M.V., Lin, M.F., Foret, S., Huttley, G., Miller, D.J., and Chen, C.A. (2014). The "naked coral" hypothesis revisited--evidence for and against scleractinian monophyly. PLoS One 9, e94774.

Koboldt, D.C., Steinberg, K.M., Larson, D.E., Wilson, R.K., and Mardis, E.R. (2013). The next-generation sequencing revolution and its impact on genomics. Cell 155, 27-38.

Kocot, K.M., Aguilera, F., McDougall, C., Jackson, D.J., and Degnan, B.M. (2016). Sea shell diversity and rapidly evolving secretomes: insights into the evolution of biomineralization. Frontiers in Zoology 13, 23.

Kolde, R. (2015). pheatmap: Pretty heatmaps [Software].

Korf, I. (2004). Gene finding in novel genomes. BMC Bioinformatics 5, 1.

Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357.

Lartillot, N., Brinkmann, H., and Philippe, H. (2007). Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evolutionary Biology 7, S4.

Lartillot, N., Rodrigue, N., Stubbs, D., and Richer, J. (2013). PhyloBayes MPI: phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment. Systematic Biology 62, 611-615.

Le Goff-Vitry, M., Rogers, A., and Baglow, D. (2004). A deep-sea slant on the molecular phylogeny of the Scleractinia. Molecular Phylogenetics and Evolution 30, 167-177.

Leen, V., Vanhoorne, B., Decock, W., Trias-Verbeek, A., Dekeyzer, S., Colpaert, S., and Hernandez, F. (2013). World Register of Marine Species. Book of.

Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754-1760.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. (2009). The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078-2079.

Li, L., Stoeckert, C.J., Jr., and Roos, D.S. (2003). OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Research 13, 2178-2189.

Liew, Y.J., Aranda, M., Carr, A., Baumgarten, S., Zoccola, D., Tambutté, S., Allemand, D., Micklem, G., and Voolstra, C.R. (2014). Identification of microRNAs in the coral Stylophora pistillata. PLoS One 9, e91101.

Liew, Y.J., Aranda, M., and Voolstra, C.R. (2016). Reefgenomics.Org - a repository for marine genomics data. Database 2016.

Lin, M.-F., Moya, A., Ying, H., Chen, C.A., Cooke, I., Ball, E.E., Forêt, S., and Miller, D.J. (2017). Analyses of Corallimorpharian Transcriptomes Provide New Perspectives on the Evolution of Calcification in the Scleractinia (Corals). Genome Biology and Evolution 9, 150-160.

Lin, M.F., Chou, W.H., Kitahara, M.V., Chen, C.L., Miller, D.J., and Foret, S. (2016). Corallimorpharians are not "naked corals": insights into relationships between Scleractinia and Corallimorpharia from phylogenomic analyses. PeerJ 4, e2463.

Lin, M.F., Kitahara, M.V., Luo, H., Tracey, D., Geller, J., Fukami, H., Miller, D.J., and Chen, C.A. (2014). Mitochondrial genome rearrangements in the scleractinia/corallimorpharia complex: implications for coral phylogeny. Genome Biology and Evolution 6, 1086-1095.

Lowe, T.M., and Eddy, S.R. (1997). tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Research 25, 955-964.

Lowenstam, H.A. (1981). Minerals formed by organisms. Science 211, 1126-1131.

Lowenstam, H.A., and Weiner, S. (1989). On biomineralization (Oxford University Press on Demand).

Luo, R., Liu, B., Xie, Y., Li, Z., Huang, W., Yuan, J., He, G., Chen, Y., Pan, Q., and Liu, Y. (2012). SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 1, 1-6.

Mann, S. (1983). Mineralization in biological systems. Inorganic Elements in Biochemistry, 125-174.

Marçais, G., and Kingsford, C. (2011). A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764-770.

Mass, T., Drake, J.L., Haramaty, L., Kim, J.D., Zelzion, E., Bhattacharya, D., and Falkowski, P.G. (2013). Cloning and characterization of four novel coral acid-rich proteins that precipitate carbonates in vitro. Current Biology 23, 1126-1131.

Mass, T., Giuffre, A.J., Sun, C.-Y., Stifler, C.A., Frazier, M.J., Neder, M., Tamura, N., Stan, C.V., Marcus, M.A., and Gilbert, P.U. (2017). Amorphous calcium carbonate particles form coral skeletons. Proceedings of the National Academy of Sciences, 201707890.

McCulloch, M., Falter, J., Trotter, J., and Montagna, P. (2012). Coral resilience to ocean acidification and global warming through pH up-regulation. Nature Climate Change 2, 623.

McCulloch, M.T., D’Olivo, J.P., Falter, J., Holcomb, M., and Trotter, J.A. (2017). Coral calcification in a changing World and the interactive dynamics of pH and DIC upregulation. Nature Communications 8.

Medina, M., Collins, A.G., Takaoka, T.L., Kuehl, J.V., and Boore, J.L. (2006). Naked corals: skeleton loss in Scleractinia. Proceedings of the National Academy of Sciences 103, 9096-9100.

Miglietta, M.P., McNally, L., and Cunningham, C.W. (2010). Evolution of calcium-carbonate skeletons in the Hydractiniidae (Oxford University Press).

Milliman, J., and Droxler, A. (1996). Neritic and pelagic carbonate sedimentation in the marine environment: ignorance is not bliss. Geologische Rundschau 85, 496-504.

Mitterer, R.M., and Kriausakul, N. (1989). Calculation of amino acid racemization ages based on apparent parabolic kinetics. Quaternary Science Reviews 8, 353-357.

Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A.C., and Kanehisa, M. (2007). KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Research 35, W182-W185.

Moy, G., Mendoza, L., Schulz, J., Swanson, W., Glabe, C., and Vacquier, V. (1996). The sea urchin sperm receptor for egg jelly is a modular protein with extensive homology to the human polycystic kidney disease protein, PKD1. The Journal of Cell Biology 133, 809-817.

Moya, A., Tambutté, S., Béranger, G., Gaume, B., Scimeca, J.-C., Allemand, D., and Zoccola, D. (2008a). Cloning and use of a coral 36B4 gene to study the differential expression of coral genes between light and dark conditions. Marine Biotechnology 10, 653-663.

Moya, A., Tambutté, S., Bertucci, A., Tambutté, E., Lotto, S., Vullo, D., Supuran, C.T., Allemand, D., and Zoccola, D. (2008b). Carbonic anhydrase in the scleractinian coral Stylophora pistillata: characterization, localization, and role in biomineralization. Journal of Biological Chemistry 283, 25475-25484.

Murray, J.M., and Watson, G.J. (2014). A critical assessment of marine aquarist biodiversity data and commercial aquaculture: identifying gaps in culture initiatives to inform local fisheries managers. PloS One 9, e105982.

Nawrocki, E.P., Burge, S.W., Bateman, A., Daub, J., Eberhardt, R.Y., Eddy, S.R., Floden, E.W., Gardner, P.P., Jones, T.A., and Tate, J. (2014). Rfam 12.0: updates to the RNA families database. Nucleic Acids Research, gku1063.

Nudelman, F., Pieterse, K., George, A., Bomans, P.H., Friedrich, H., Brylka, L.J., Hilbers, P.A., and Sommerdijk, N.A. (2010). The role of collagen in bone apatite formation in the presence of hydroxyapatite nucleation inhibitors. Nature Materials 9, 1004-1009.

Ogilvie-Grant, W.R. (1897). A handbook to the game-birds (WH Allen & Company, limited.).

Owen, P. (1857). On the Characters, Principles of Division, and Primary Groups of the Class Mammalia. Zoological Journal of the Linnean Society 2, 1-37.

Park, E., Hwang, D.-S., Lee, J.-S., Song, J.-I., Seo, T.-K., and Won, Y.-J. (2012a). Estimation of divergence times in cnidarian evolution based on mitochondrial protein-coding genes and the fossil record. Molecular Phylogenetics and Evolution 62, 329-345.

Park, E., Hwang, D.S., Lee, J.S., Song, J.I., Seo, T.K., and Won, Y.J. (2012b). Estimation of divergence times in cnidarian evolution based on mitochondrial protein-coding genes and the fossil record. Molecular Phylogenetics and Evolution 62, 329-345.

Parker, S.P. (1982). Synopsis and classification of living organisms.

Parra, G., Bradnam, K., and Korf, I. (2007). CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061-1067.

Philippe, H., Brinkmann, H., Lavrov, D.V., Littlewood, D.T.J., Manuel, M., Wörheide, G., and Baurain, D. (2011). Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biology 9, e1000602.

Pisani, D., Pett, W., Dohrmann, M., Feuda, R., Rota-Stabelli, O., Philippe, H., Lartillot, N., and Worheide, G. (2015). Genomic data do not support comb jellies as the sister group to all other animals. Proceedings of the National Academy of Sciences 112, 15402-15407.

Posada, D. (2008). jModelTest: phylogenetic model averaging. Molecular Biology and Evolution 25, 1253-1256.

Pruitt, K.D., Tatusova, T., and Maglott, D.R. (2006). NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Research 35, D61-D65.

Putnam, N.H., Srivastava, M., Hellsten, U., Dirks, B., Chapman, J., Salamov, A., Terry, A., Shapiro, H., Lindquist, E., Kapitonov, V.V., et al. (2007). Sea Anemone Genome Reveals Ancestral Eumetazoan Gene Repertoire and Genomic Organization. Science 317, 86-94.

Puverel, S., Tambutté, E., Pereira-Mouries, L., Zoccola, D., Allemand, D., and Tambutté, S. (2005). Soluble organic matrix of two Scleractinian corals: partial and comparative analysis. Comparative Biochemistry and Physiology Part B: Biochemistry and Molecular Biology 141, 480-487.

Qian, F., Boletta, A., Bhunia, A.K., Xu, H., Liu, L., Ahrabi, A.K., Watnick, T.J., Zhou, F., and Germino, G.G. (2002). Cleavage of polycystin-1 requires the receptor for egg jelly domain and is disrupted by human autosomal-dominant polycystic kidney disease 1-associated mutations. Proceedings of the National Academy of Sciences 99, 16981-16986.

Ramos-Silva, P., Kaandorp, J., Huisman, L., Marie, B., Zanella-Cleon, I., Guichard, N., Miller, D.J., and Marin, F. (2013a). The skeletal proteome of the coral Acropora millepora: the evolution of calcification by co-option and domain shuffling. Mol Biol Evol 30, 2099-2112.

Ramos-Silva, P., Kaandorp, J., Huisman, L., Marie, B., Zanella-Cléon, I., Guichard, N., Miller, D.J., and Marin, F. (2013b). The skeletal proteome of the coral Acropora millepora: the evolution of calcification by co-option and domain shuffling. Molecular Biology and Evolution 30, 2099-2112.

Raybaud, V., Tambutté, S., Ferrier-Pagès, C., Reynaud, S., Venn, A.A., Tambutté, É., Nival, P., and Allemand, D. (2017). Computing the carbonate chemistry of the coral calcifying medium and its response to ocean acidification. Journal of Theoretical Biology 424, 26-36.

Reggi, M., Fermani, S., Landi, V., Sparla, F., Caroselli, E., Gizzi, F., Dubinsky, Z., Levy, O., Cuif, J.-P., and Dauphin, Y. (2014). Biomineralization in Mediterranean corals: the role of the intraskeletal organic matrix. Crystal Growth & Design 14, 4310-4320.

Reyes-Bermudez, A., Lin, Z., Hayward, D.C., Miller, D.J., and Ball, E.E. (2009). Differential expression of three galaxin-related genes during settlement and metamorphosis in the scleractinian coral Acropora millepora. BMC Evolutionary Biology 9, 178.

Reyes-Bermudez, A., Villar-Briones, A., Ramirez-Portilla, C., Hidaka, M., and Mikheyev, A.S. (2016). Developmental progression in the coral Acropora digitifera is controlled by differential expression of distinct regulatory gene networks. Genome Biology and Evolution 8, 851-870.

Romano, S., and Cairns, S.D. (2000). Molecular phylogenetic hypotheses for the evolution of scleractinian corals. Bulletin of Marine Science 67, 1043-1068.

Romano, S.L., and Palumbi, S.R. (1996). Evolution of scleractinian corals inferred from molecular systematics. Science 271, 640-642.

Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D.L., Darling, A., Hohna, S., Larget, B., Liu, L., Suchard, M.A., and Huelsenbeck, J.P. (2012). MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic Biology 61, 539-542.

Ryan, J.F., Pang, K., Schnitzler, C.E., Nguyen, A.D., Moreland, R.T., Simmons, D.K., Koch, B.J., Francis, W.R., Havlak, P., Program, N.C.S., et al. (2013). The genome of the ctenophore Mnemiopsis leidyi and its implications for cell type evolution. Science 342, 1242592.

Schwentner, M., and Bosch, T.C. (2015). Revisiting the age, evolutionary history and species level diversity of the genus Hydra (Cnidaria: Hydrozoa). Molecular Phylogenetics and Evolution 91, 41-55.

Shendure, J., and Ji, H. (2008). Next-generation DNA sequencing. Nature Biotechnology 26, 1135-1145.

Shinzato, C., Shoguchi, E., Kawashima, T., Hamada, M., Hisata, K., Tanaka, M., Fujie, M., Fujiwara, M., Koyanagi, R., Ikuta, T., et al. (2011). Using the Acropora digitifera genome to understand coral responses to environmental change. Nature 476, 320-323.

Shoguchi, E., Shinzato, C., Kawashima, T., Gyoja, F., Mungpakdee, S., Koyanagi, R., Takeuchi, T., Hisata, K., Tanaka, M., and Fujiwara, M. (2013). Draft assembly of the Symbiodinium minutum nuclear genome reveals dinoflagellate gene structure. Current Biology 23, 1399-1408.

Simpson, C., Kiessling, W., Mewis, H., Baron-Szabo, R.C., and Muller, J. (2011a). Evolutionary diversification of reef corals: a comparison of the molecular and fossil records. Evolution 65, 3274-3284.

Simpson, C., Kiessling, W., Mewis, H., Baron‐Szabo, R.C., and Müller, J. (2011b). Evolutionary diversification of reef corals: a comparison of the molecular and fossil records. Evolution 65, 3274-3284.

Stamatakis, A. (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312-1313.

Stanke, M., Keller, O., Gunduz, I., Hayes, A., Waack, S., and Morgenstern, B. (2006). AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Research 34, W435-439.

Stanley, G.D., and Fautin, D.G. (2001a). The origins of modern corals. Science 291, 1913-1914.

Stanley, G.D., Jr., and Fautin, D.G. (2001b). Paleontology and evolution. The origins of modern corals. Science 291, 1913-1914.

Stolarski, J., Kitahara, M.V., Miller, D.J., Cairns, S.D., Mazur, M., and Meibom, A. (2011). The ancient evolutionary origins of Scleractinia revealed by azooxanthellate corals. BMC Evolutionary Biology 11, 316.

Suzuki, R., and Shimodaira, H. (2006). Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics 22, 1540-1542.

Takeichi, M. (1988). The cadherins: cell-cell adhesion molecules controlling animal morphogenesis. Development 102, 639-655.

Takeuchi, T., Yamada, L., Shinzato, C., Sawada, H., and Satoh, N. (2016). Stepwise evolution of coral biomineralization revealed with genome-wide proteomics and transcriptomics. PLoS One 11, e0156424.

Talavera, G., and Castresana, J. (2007). Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Systematic Biology 56, 564-577.

Tambutté, E., Tambutté, S., Segonds, N., Zoccola, D., Venn, A., Erez, J., and Allemand, D. (2012). Calcein labelling and electrophysiology: insights on coral tissue permeability and calcification. Proceedings of the Royal Society B 279, 19-27.

Tambutté, E., Venn, A., Holcomb, M., Segonds, N., Techer, N., Zoccola, D., Allemand, D., and Tambutté, S. (2015). Morphological plasticity of the coral skeleton under CO2-driven seawater acidification. Nature Communications 6, 7368.

Teng, H.H., Dove, P.M., Orme, C.A., and De Yoreo, J.J. (1998). Thermodynamics of calcite growth: baseline for understanding biomineral formation. Science 282, 724-727.

Tian, C.F., Zhou, Y.J., Zhang, Y.M., Li, Q.Q., Zhang, Y.Z., Li, D.F., Wang, S., Wang, J., Gilbert, L.B., Li, Y.R., et al. (2012). Comparative genomics of rhizobia nodulating soybean suggests extensive recruitment of lineage-specific genes in adaptations. Proceedings of the National Academy of Sciences 109, 8629-8634.

Tullock, J.H. (2008). Your first marine aquarium: everything about setting up a marine aquarium, including conditioning, maintenance, selecting fish and , and more (Barron's Educational Series).

Venn, A., Tambutté, E., Holcomb, M., Allemand, D., and Tambutté, S. (2011). Live tissue imaging shows reef corals elevate pH under their calcifying tissue relative to seawater. PLoS One 6, e20013.

Venn, A.A., Tambutté, E., Holcomb, M., Laurent, J., Allemand, D., and Tambutté, S. (2013). Impact of seawater acidification on pH at the tissue–skeleton interface and calcification in reef corals. Proceedings of the National Academy of Sciences 110, 1634-1639.

Veron, J.E.N. (1995). Corals in space and time: the biogeography and evolution of the Scleractinia (Cornell University Press).

Von Euw, S., Zhang, Q., Manichev, V., Murali, N., Gross, J., Feldman, L.C., Gustafsson, T., Flach, C., Mendelsohn, R., and Falkowski, P.G. (2017). Biological control of aragonite formation in stony corals. Science 356, 933-938.

Voolstra, C.R., Li, Y., Liew, Y.J., Baumgarten, S., Zoccola, D., Flot, J.-F., Tambutté, S., Allemand, D., and Aranda, M. (2017). Comparative analysis of the genomes of Stylophora pistillata and Acropora digitifera provides evidence for extensive differences between species of corals. Scientific Reports 7, 17583.

Voolstra, C.R., Miller, D.J., Ragan, M.A., Hoffmann, A., Hoegh-Guldberg, O., Bourne, D., Ball, E., Ying, H., Foret, S., and Takahashi, S. (2015). The ReFuGe 2020 Consortium—using “omics” approaches to explore the adaptability and resilience of coral holobionts to environmental change. Frontiers in Marine Science 2, 68.

Wainwright, S.A. (1963). Skeletal organization in the coral, Pocillopora damicornis. Journal of Cell Science 3, 169-183.

Wang, X., Drillon, G., Ryu, T., Voolstra, C.R., and Aranda, M. (2017a). Genome-Based Analyses of Six Hexacorallian Species Reject the “Naked Coral” Hypothesis. Genome Biology and Evolution 9, 2626-2634.

Wang, X., Liew, Y.J., Li, Y., Zoccola, D., Tambutte, S., and Aranda, M. (2017b). Draft genomes of the corallimorpharians Amplexidiscus fenestrafer and Discosoma sp. Molecular Ecology Resources.

Warnes, M.G.R. (2016). Package ‘gplots’.

Watanabe, T., Fukuda, I., China, K., and Isa, Y. (2003). Molecular analyses of protein components of the organic matrix in the exoskeleton of two scleractinian coral species. Comparative Biochemistry and Physiology Part B: Biochemistry and Molecular Biology 136, 767-774.

Weijers, S., De Jonge, J., Van Zanten, O., Benedetti, L., Langeveld, J., Menkveld, H., and Van Nieuwenhuijzen, A. (2012). KALLISTO: cost effective and integrated optimization of the urban wastewater system Eindhoven. Water Practice and Technology 7, wpt2012036.

Well, M.H., MacLean, L.D., Visscher, M.B., and Spink, W.W. (1956). Studies on the circulatory changes in the dog produced by endotoxin from gram-negative microorganisms. The Journal of Clinical Investigation 35, 1191-1198.

Wheeler, T.J., Clements, J., Eddy, S.R., Hubley, R., Jones, T.A., Jurka, J., Smit, A.F., and Finn, R.D. (2013). Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Research 41, D70-82.

Wickham, H. (2016). ggplot2: elegant graphics for data analysis (Springer).

Wolf, S.E., Leiterer, J., Pipich, V., Barrea, R., Emmerling, F., and Tremel, W. (2011). Strong stabilization of amorphous calcium carbonate emulsion by ovalbumin: gaining insight into the mechanism of ‘polymer-induced liquid precursor’ processes. Journal of the American Chemical Society 133, 12642-12649.

Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24, 1586-1591.

Young, S.D. (1971). Organic material from scleractinian coral skeletons—I. Variation in composition between several species. Comparative Biochemistry and Physiology Part B: Comparative Biochemistry 40, 113-120.

Zdobnov, E.M., and Apweiler, R. (2001). InterProScan–an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17, 847-848.

Zhou, Y., Xue, S., and Yang, J.J. (2013). Calciomics: integrative studies of Ca2+-binding proteins and their interactomes in biological systems. Metallomics 5, 29-42.

Zoccola, D., Ganot, P., Bertucci, A., Caminiti-Segonds, N., Techer, N., Voolstra, C.R., Aranda, M., Tambutté, E., Allemand, D., and Casey, J.R. (2015a). Bicarbonate transporters in corals point towards a key step in the evolution of cnidarian calcification. Scientific Reports 5.

Zoccola, D., Innocenti, A., Bertucci, A., Tambutté, E., Supuran, C.T., and Tambutté, S. (2016). Coral carbonic anhydrases: regulation by ocean acidification. Marine Drugs 14, 109.

Zoccola, D., Tambutté, E., Kulhanek, E., Puverel, S., Scimeca, J.-C., Allemand, D., and Tambutté, S. (2004). Molecular cloning and localization of a PMCA P-type calcium ATPase from the coral Stylophora pistillata. Biochimica et Biophysica Acta (BBA)-Biomembranes 1663, 117-126.

Zoccola, D., Tambutté, E., Sénégas-Balas, F., Michiels, J.-F., Failla, J.-P., Jaubert, J., and Allemand, D. (1999). Cloning of a calcium channel α1 subunit from the reef-building coral, Stylophora pistillata. Gene 227, 157-167.