MOLECULAR INSIGHTS INTO THE EVOLUTION OF NOVEL GENES

Joachim Maximilian Surm

Bachelor of Applied Science/Bachelor of Business
Bachelor of Biomedical Science (Honours 1A)

Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

School of Biomedical Sciences
Faculty of Health
Queensland University of Technology
2019

Keywords

Acrorhagi, Actinioidea, Actinia tenebrosa, cnidocyte, convergent evolution, de novo, desaturase, development, diversifying selection, ecology, elongase, Elovl, episodic diversifying selection, evolution, Fad, fatty acid, gene duplication, gene expression, genome, long-chain polyunsaturated fatty acids, lineage-specific, nematocyst, nematocyte, novel cell, ontogeny, phylogenetic, proteome, purifying selection, spatiotemporal expression, toxin, transcriptome, venom.


Abstract

Understanding the mechanisms that underpin the formation of novel genes is a key research area in molecular evolution. Novel genes can originate through multiple mechanisms (e.g. duplication, horizontal gene transfer and de novo gene formation) and have been associated with phenotypic variation including evolution of venoms, the adaptive immune system, as well as various disease states. Cnidarians represent an excellent model for investigation and characterisation of novel genes as they possess a high number of genes that lack homology with other lineages (10-20 % of the gene set) as well as having genomes more similar to than other model species (e.g. Drosophila melanogaster and Caenorhabditis elegans). Moreover, the last common ancestor of cnidarians was venomous, with all extant cnidarian species delivering their venom using a novel, and defining cell – the cnidocyte. These venoms constitute toxin peptides that are often encoded by novel genes. Here I have generated genomic resources (genome, transcriptome) in combination with functional genomics (RNA-seq, proteomics, and fatty acidomics) to elucidate the evolution of novel genes in sea anemone genomes using a comparative phylogenetic approach. Based on the results of the three studies, gene duplication dominates the evolution of novel genes in cnidarians, specifically those that occur within a lineage. This is observed in the first study for both the gene families involved in the biosynthesis of long-chain polyunsaturated fatty acids (LC-PUFAs, ≥ C20), as well as in the second study for multiple toxin families. Finally, the first complete Actinioidea genome confirmed a large gain of novel genes in this lineage, while novel genes in Cnidaria, collectively, appear to undergo an increased frequency of gene duplication events. Broadly, I demonstrate that cnidarian gene families are evolving under a regime of pervasive purifying selection, with some evidence of episodic diversifying selection. 
These results are consistent with other studies, which report similar observations in cnidarians. Taken together, my data indicate that phylogeny constrains much of the evolution of gene complement and gene families in cnidarians, suggesting phylogenetic inertia plays an important role. This is significant, with genes encoding toxins previously reported to be strongly associated with a shift in ecological niche. However, dynamic patterns of spatiotemporal gene expression of toxin genes are observed to meet the functional and ecological requirements of cnidarians. My results further demonstrate that Actinioidea-specific genes have pronounced signatures of gene expression and protein localisation to Actinioidea-specific morphological structures, acrorhagi. This suggests that novel morphological structures evolve in concert with novel genes, and that novel innovations are reliant on a process of gene regulation. Overall, I provide insights into the expression, phylogenetic and molecular evolutionary histories of cnidarian genes and gene families, which have important implications for understanding novel genes and novel innovations.


A Note Regarding Format

This dissertation is a thesis by publication. It contains three articles that have either been published or are under blind peer review at refereed journals. The logical flow of the thesis is maintained by introducing these articles where they fit most appropriately into the thesis structure. All articles have been reformatted using the APA referencing style and reconfigured to Word to provide consistent formatting throughout the thesis. Moreover, tables and figures have been numbered continuously throughout the thesis for consistency.


Table of Contents

Keywords
Abstract
Table of Contents
List of Figures
List of Tables
List of Supplementary Figures
List of Supplementary Tables
List of Abbreviations
List of Publications
Acknowledgements
Chapter 1: Introduction
1.1 Mechanisms of novel gene formation
1.1.1 Gene duplication
1.1.2 Gene fusion and fission
1.1.3 Exon shuffling
1.1.4 Horizontal gene transfer
1.1.5 de novo gene formation
1.2 Frequency, distribution and outcomes of novel genes
1.2.1 Chimeric genes
1.2.2 Orphan genes
1.3 Cnidarians
1.4 Actinia tenebrosa as a candidate to study the evolution of novel genes
1.5 Aims of the Project
1.6 Research plan
1.6.1 Aim 1/Study 1: Insights into the phylogenetic and molecular evolutionary histories of Fad and Elovl gene families in Actiniaria
1.6.2 Aim 2/Study 2: A process of convergent amplification and tissue-specific expression dominates the evolution of toxin and toxin-like genes in sea anemones
1.6.3 Aim 3/Study 3: The draft genome of Actinia tenebrosa reveals innovations in Actinioidea
Chapter 2: Insights into the phylogenetic and molecular evolutionary histories of Fad and Elovl gene families in Actiniaria
Abstract
2.1 Introduction
2.2 Materials and Methods
2.2.1 Identification of candidate genes
2.2.2 Phylogenetic analyses
2.2.3 Selection analyses
2.2.4 Fatty acid analysis
2.3 Results
2.3.1 Identification of candidate genes
2.3.2 Comparative and phylogenetic analyses of Fad and Elovl gene families
2.3.3 Selection analysis of the Fad and Elovl gene families
2.3.4 Fatty acid analysis
2.4 Discussion
Chapter 3: A process of convergent amplification and tissue-specific expression dominates the evolution of toxin and toxin-like genes in sea anemones
Abstract
3.1 Introduction
3.2 Materials and methods
3.2.1 Identification of TTL genes
3.2.2 Comparative genomic and phylogenetic analyses
3.2.3 Mass Spectrometry
3.3 Results
3.3.1 Comparative analysis of TTL genes across Metazoa
3.3.2 Comparative analysis of TTL genes across Actiniaria
3.3.3 TTLs show marked differences in expression and distribution across tissue types
3.3.4 TTLs show marked differences in expression across ontogenetic stages
3.4 Discussion
3.4.1 Comparative analysis of TTL genes
3.4.2 Expression differences of TTLs and the production of multiple venoms
Chapter 4: The draft genome of Actinia tenebrosa reveals insights into the evolution of venom innovations
Abstract
4.1 Introduction
4.2 Methods
4.2.1 Genome assembly of Actinia tenebrosa
4.2.2 Annotation
4.2.3 Gene family evolution
4.3 Results
4.3.1 Genome assembly
4.3.2 Functional annotation of predicted gene models
4.3.3 Gene family evolution
4.4 Discussion
Chapter 5: General discussion
5.1 Lineage-specific duplications
5.2 Novel genes
5.3 Novel structures
5.4 Limitations and future directions
Chapter 6: Conclusion
Bibliography
Appendices


List of Figures

Figure 1.1. Phylogeny and physiology of a cnidarian. A) Phylogeny of Metazoa. B) Anthozoan cnidarian Actinia tenebrosa, with oral-aboral axis highlighted. Photograph credit: Jonathon Muller.

Figure 2.1. Maximum Likelihood tree with midpoint root depicting relationships among Fad protein sequences and branches transformed as a cladogram. Bootstrap values after 1,000 iterations are shown next to nodes, values under 70 % not reported. Three distinct clades are named Clades A, B, and C. The branches of functionally characterised Fad proteins are highlighted by the colour of their respective clade. These functionally characterised sequences were retrieved from the Swiss-Prot database and named according to their UniProt accession with the species name abbreviated. Candidate sequences identified in this study have their full species names.

Figure 2.2. Maximum Likelihood tree with midpoint root depicting relationships among Elovl protein sequences and branches transformed as a cladogram. Bootstrap values after 1,000 iterations are shown next to nodes, values under 70 % not reported. Twelve distinct clades are annotated based on the functionally characterised proteins found within them. The branches of functionally characterised proteins are highlighted by the colour of their respective clade. These functionally characterised sequences were retrieved from the Swiss-Prot database and named according to their UniProt accession with the species name abbreviated. Candidate sequences identified in this study have their full species names.

Figure 2.3. Episodic diversifying selection of Elovl gene family in actiniarians. (a) Maximum Likelihood tree of nucleotide sequences with midpoint root depicting relationships among Elovl genes in actiniarians. Foreground branches tested are numbered and coloured. Foreground branches corresponding to their respective clades in Figure 3 are annotated accordingly. (b) A plot of posterior probability of codons with dN/dS > 1 against amino acid residue positions. Significantly detected codons under diversifying selection (dN/dS > 1) with posterior probability ≥ 0.95 (Bayes Empirical Bayes analysis) are coloured to correspond to their respective foreground branches. The horizontal line represents the line of significance with posterior probability ≥ 0.95.

Figure 2.4. Plot of average fatty acid profile from whole-organism (n=3) of anemone and prawn. (a) Bar plot of average concentration of FAME given in mmol/kg for both anemone and prawn, with error bars showing standard deviation. (b) Bar plot of average concentration of FAME given in % of total FAME for both anemone and prawn, with error bars showing standard deviation.

Figure 3.1. Distribution and expansion of toxin and toxin-like genes across Metazoa. A) Metazoan phylogeny showing distribution and expansion of TTL genes in representative genomes, including Ctenophore, Porifera and Placozoa (CPP), Cnidaria, , , and Deuterostomia (opaque bars represent number of different gene families; coloured bars represent copy number). Abbreviations Ve and Hs refer to species that are considered venomous or hematophagous specialists (specialised venomous subtype), respectively (Fry et al., 2009). B) Venn diagram showing the overlap of toxin gene families across major metazoan groupings. See Supplementary Table 7 for the full list of TTL gene copy numbers.

Figure 3.2. Comparative analysis of TTL within Cnidaria. A) Venn diagram of the distribution of TTL gene families within cnidarians. B) Heat map of the distribution and copy number of TTL gene families within cnidarians.

Figure 3.3. Comparative analysis and molecular evolution of TTL within Actiniaria. A) Maximum Likelihood protein tree generated to determine actiniarian phylogeny, all bootstrap support > 95 %. TTL gene family gains (green) and losses (red) are represented above and below branches, respectively. Bubble plot of the distribution and copy number of TTL gene families within actiniarians and TTL gene families with dN/dS > 1 highlighted with a black circle, dN/dS = 1 highlighted with a grey circle, and dN/dS < 1 highlighted with a white circle, above respective gene family. B) A plot of site-specific dN/dS values against amino acid residue positions for TTL gene families within actiniarians.

Figure 3.4. Toxin expression profile across tissue types and ontogeny in Actinia tenebrosa. A) Heat map of differentially expressed TTL, Z-scaled FPKM values, for morphological structure: acrorhagi, mesenteric filaments and tentacle. B) Plot of the subclusters of differentially expressed TTL transcripts. C) Bar plot of the respective subclusters showing copy-number variation of differentially expressed TTLs across tissue types. D) Heat map of differentially expressed TTL, Z-scaled FPKM values, for ontogeny: 1, 3, 6, and 9 mm size classes. E) Plot of the subclusters of differentially expressed TTL transcripts. F) Bar plot of the respective subclusters showing copy-number variation of differentially expressed TTLs across tissue types.

Figure 3.5. Mass spectrometry imaging (MSI) positive mode spectra acquired from cross-sectioned . A) Histological image of the section that was used for MSI experiments (stained with PAS). Tagged regions of interest (ROI) were selected based on biological functions and associated cnidae profile. ROI 01 is related to actinopharynx, column and mesenterial filaments regions; ROI 02 is the acrorhagi; ROI 03 and 04 are regions related to tentacles. B) Slide sprayed with matrix CHCA. C) MSI of the average mass related to a peptide widely distributed with higher concentration in the tentacle region. D) MSI of the average mass related to a peptide with a distribution restricted to acrorhagi. E) Projection of the MSI linear positive mode spectra of ROIs and overall spectra.

Figure 4.1. Comparative analysis of gene families within Cnidaria. Maximum Likelihood protein tree generated to determine cnidarian phylogeny, with all bootstrap support equal to 100 %. TTL gene family gains (green) and losses (red) are represented above and below branches, respectively. ADIG = Acropora digitifera, AFEN = Amplexidiscus fenestrafer, ATEN = Actinia tenebrosa, DSPP = Discosoma sp., EPAL = Exaiptasia pallida, HVUG = Hydra vulgaris, NVEC = Nematostella vectensis.

Figure 4.2. Comparative analysis of gene families among actiniarians. Venn diagram highlighting orthologous genes between actiniarian genomes. ATEN = Actinia tenebrosa, EPAL = Exaiptasia pallida, NVEC = Nematostella vectensis.

Figure 4.3. Protein enrichment across Cnidaria. Heat map of Pfam domains enriched in Actinia tenebrosa. Abundance of Pfam domains in cnidarians is log2 transformed and median centred. ADIG = Acropora digitifera, AFEN = Amplexidiscus fenestrafer, ATEN = Actinia tenebrosa, DSPP = Discosoma sp., EPAL = Exaiptasia pallida, HVUG = Hydra vulgaris, NVEC = Nematostella vectensis.

Figure 4.4. Microsynteny of sea anemone type 3 (BDS-LIKE) potassium channel toxin (KTx) copies in Actinia tenebrosa. A) Intron-exon structure of sea anemone type 3 (BDS-LIKE) potassium channel toxin. Exon 1 coloured red, exon 2 coloured blue, and intron coloured grey. Arrows depict strand directionality and the scale bar represents 100 nucleotides. B) Protein alignment of sea anemone type 3 (BDS-LIKE) potassium channel toxin with protein sequences annotated with signal peptide (yellow) and Pfam domain (purple). Coloured box surrounding sequences corresponds to exon number as per A.


List of Tables

Table 2.1. Fad and Elovl gene copy numbers in cnidarian taxa with sequenced genomes.

Table 2.2. Fad and Elovl gene copy numbers in actiniarian transcriptome assemblies.

Table 2.3. Detecting pervasive diversifying selection using site models implemented in CODEML for the Fad and Elovl gene families from actiniarian transcriptome assemblies.

Table 2.4. Detecting lineages under episodic diversifying selection with branch models implemented in CODEML for Fad and Elovl gene families from actiniarian transcriptome assemblies.

Table 4.1. Functional annotation of gene models from seven cnidarian genomes.

Table 4.2. Expansion of shared and species-specific gene families in cnidarians.


List of Supplementary Figures

Supplementary Figure 1. Maximum Likelihood tree of nucleotide sequences with midpoint root depicting relationships among A) Fad genes and B) Elovl genes. Branches are coloured and numbered according to the foreground branches used for testing for episodic diversifying selection.

Supplementary Figure 2. Principal component analysis (PCA) of the counts matrix for four sea anemone DGE experiments. A) PCA of counts median centred and log2 transformed for morphological structure: acrorhagi, mesenteric filaments and tentacle in Actinia tenebrosa, and across ontogeny (B): 1, 3, 6, and 9 mm size classes. C) PCA of counts median centred and log2 transformed for morphological structure: nematosomes, mesenteric filaments and tentacles in Nematostella vectensis, and across development (D): gastrula, planula and adult.

Supplementary Figure 3. Principal component analysis of the distribution and copy number of TTL genes of superfamily (A) and species (B) median centred and log2 transformed.

Supplementary Figure 4. Transcript expression profile across tissue types and ontogenetic stages in Actinia tenebrosa. A) Heat map of differentially expressed (DE) transcripts, median centred and log2 transformed FPKM values, for morphological structure: acrorhagi (a), mesenteric filaments (m), and tentacle (t). B) Heat map of differentially expressed (DE) transcripts, median centred and log2 transformed FPKM values, for ontogenetic stages: 1, 3, 6, and 9 mm.

Supplementary Figure 5. Transcript expression profile across tissue types and development in Nematostella vectensis. A) Heat map of differentially expressed (DE) transcripts, median centred and log2 transformed FPKM values, for morphological structure: nematosomes, mesenteric filaments, and tentacles (t). B) Heat map of DE transcripts, median centred and log2 transformed FPKM values, for development stages: gastrula, planula and adult.

Supplementary Figure 6. Transcript expression profile of toxins across tissue types and development in Nematostella vectensis. A) Heat map of differentially expressed (DE) TTL transcripts, z-scale transformed FPKM values, for morphological structure: nematosomes, mesenteric filaments, and tentacles (t). B) Heat map of DE TTL transcripts, z-scale transformed FPKM values, for development stages: gastrula, planula and adult.

Supplementary Figure 7. Gene order in Actinia tenebrosa mitochondrial DNA (20,691 bp). Figure produced in Geneious.

Supplementary Figure 8. Maximum Likelihood tree of sea anemone type 3 (BDS-LIKE) potassium channel toxin (KTx) in actiniarians. ANEVI = Anemonia viridis, ANTMC = Antheopsis maculata, ACTTE = Actinia tenebrosa, BUNGR = Bunodosoma granuliferum, and ANTEL = Anthopleura elegantissima.


List of Supplementary Tables

Supplementary Table 1. Transcriptome assembly statistics. Assembly statistics from the Trinity de novo assembly of A. tenebrosa ecotypes (red, brown, blue and green).

Supplementary Table 2. Transcriptome assembly statistics. Assembly statistics from the Trinity de novo assembly of A. buddemeieri, A. veratra, C. polypus, N. annamensis and Telmatactis sp.

Supplementary Table 3. Detecting pervasive purifying and diversifying selection using FUBAR (Murrell et al., 2013) within the HyPhy (Pond et al., 2005) package at posterior probability of ≥ 0.95.

Supplementary Table 4. Detecting codons under episodic diversifying selection with branch-site models implemented in CODEML for the Fad and Elovl gene families from actiniarian transcriptome assemblies. Significance at ≤ 0.05 and ≤ 0.01 following Bonferroni's correction are highlighted as * and **, respectively. Codons under episodic diversifying selection detected at ≥ 0.95 significance are indicated, with those at ≥ 0.99 significance in parentheses, using the Bayes Empirical Bayes analysis. NS refers to not significant.

Supplementary Table 5. Codons encoding amino acids under episodic diversifying selection from the branch-site models implemented in CODEML for Elovl gene families from actiniarian transcriptome assemblies and a significance at ≥ 0.95 using the Bayes Empirical Bayes analysis. The number refers to the consensus position and the letter refers to the amino acid of TR115686_c0_g1_i1_m.966674_Anthopleura_buddemeieri.

Supplementary Table 6. Average fatty acid profile from whole-organism (n=3) of anemone and prawn. The concentration of FAME (given in mmol/kg and % of total FAME).

Supplementary Table 7. TTL gene family distribution across Metazoa. TTL genes identified using BLAST analysis (e value < 1e-05) against the Swiss-Prot database.

Supplementary Table 8. TTL gene family distribution across actiniarian transcriptomes. TTL genes identified using BLAST analysis (e value < 1e-05) against the Swiss-Prot database.

Supplementary Table 9. Transcriptome assembly statistics. Assembly statistics from the Trinity de novo assembly of A. tenebrosa ecotypes (red, brown, blue and green).

Supplementary Table 10. Transcriptome assembly statistics. Assembly statistics from the Trinity de novo assembly of Anthopleura buddemeieri, Aulactinia veratra, Calliactis polypus, N. annamensis and Telmatactis sp.

Supplementary Table 11. Transcriptome assembly statistics. Assembly statistics from the Trinity de novo assembly of Megalactis griffithsi, Anemonia sulcata, Stichodactyla haddoni, Anthopleura dowii, Aiptasia diaphana, Edwardsiella carnea, and E. pallida.

Supplementary Table 12. Transcriptome assembly statistics. Assembly statistics from the Trinity de novo assembly of RNA-seq reference transcriptomes of Actinia tenebrosa across morphological structures (acrorhagi, mesenteric filaments and tentacles) and ontogeny (1, 3, 6 and 9 mm).

Supplementary Table 13. TTL gene validation. Primers used for Sanger sequence validation of TTL genes identified in actiniarian transcriptomes.

Supplementary Table 14. Validation of TTL genes. Sanger sequencing alignments with TTL transcripts from Trinity de novo transcriptome assembly of actiniarian species.

Supplementary Table 15. Selective pressures of TTL genes using phylogenetic analysis by Maximum Likelihood (PAML). Selection analysis determined using CODEML program within PAML of TTL genes found in actiniarian transcriptomes.

Supplementary Table 16. Gene set enrichment analysis (GSEA). REVIGO output from GSEA of differentially expressed transcripts across morphological structures in Actinia tenebrosa.

Supplementary Table 17. TTL gene family distribution in RNA-seq reference transcriptomes of Actinia tenebrosa and Nematostella vectensis across morphological structures and ontogeny. TTL genes identified using BLAST analysis (e value < 1e-05) against the Swiss-Prot database.

Supplementary Table 18. Tissue-specific TTL gene validation in Actinia tenebrosa. ANOVA statistical analysis from qPCR data of TTL genes across tissues (acrorhagi, mesenteric filaments and tentacles).

Supplementary Table 19. Illumina raw reads metrics used to generate the draft genome of Actinia tenebrosa.

Supplementary Table 20. Assembly metrics for the draft genome of Actinia tenebrosa.

Supplementary Table 21. Comparative genome metrics across Cnidaria.

Supplementary Table 22. Repeats breakdown masked in the Actinia tenebrosa genome.

Supplementary Table 23. Gene set enrichment analysis of gene ontologies from genes unique to Actinia tenebrosa within actiniarians.


List of Abbreviations

Fad          Fatty acyl desaturase
Elovl        Elongation of very long chain fatty acid protein
PUFA         Polyunsaturated fatty acid
LC-PUFA      Long-chain polyunsaturated fatty acid
TTL          Toxin and toxin-like
CPP          Ctenophora, Porifera, Placozoa
HGT          Horizontal gene transfer
LCA          Last common ancestor
GSEA         Gene Set Enrichment Analysis
qPCR         Quantitative PCR
IMS          Imaging mass spectrometry
ORF          Open reading frame
CDS          Coding sequence
MALDI-TOF    Matrix Assisted Laser Desorption/Ionization-Time of flight
SCRiP        Small Cysteine-Rich Proteins
NaTx         Sodium channel inhibitory toxin
KTx          Potassium channel toxin
RNA-seq      RNA sequencing
FAME         Fatty acid methyl esters


List of Publications

Surm, J. M., Toledo, T. M., Prentis, P. J., & Pavasovic, A. (2018). Insights into the phylogenetic and molecular evolutionary histories of Fad and Elovl gene families in Actiniaria. Ecology and Evolution, 8(11), 5323–5335. doi.org/10.1002/ece3.4044

Surm, J. M., Smith, H. L., Madio, B., Undheim, E. A. B., King, G. F., Hamilton, B. R., … Prentis, P. J. (2019). A process of convergent amplification and tissue-specific expression dominate the evolution of toxin and toxin-like genes in sea anemones. Molecular Ecology, 0. doi.org/10.1111/mec.15084

Surm, J. M., Papanicolaou, A., Stewart, Z. K., Prentis, P. J., & Pavasovic, A. (Accepted). The draft genome of Actinia tenebrosa reveals innovations in Actinioidea. Ecology and Evolution.

The following is a list of publications that are not related to the work performed in this PhD thesis but published during the PhD candidature:

Smith, H. L., Pavasovic, A., Surm, J. M., Phillips, M. J., & Prentis, P. J. (2018). Evidence for a Large Expansion and Subfunctionalization of Globin Genes in Sea Anemones. Genome Biology and Evolution, 10(8), 1892–1901. doi.org/10.1093/gbe/evy128

van der Burg, C. A., Prentis, P. J., Surm, J. M., & Pavasovic, A. (2016). Insights into the innate immunome of actiniarians using a comparative genomic approach. BMC Genomics, 17, 850. doi.org/10.1186/s12864-016-3204-2

Surm, J. M., Prentis, P. J., & Pavasovic, A. (2015). Comparative Analysis and Distribution of Omega-3 lcPUFA Biosynthesis Genes in Marine Molluscs. PLoS ONE, 10(8), e0136301. doi.org/10.1371/journal.pone.0136301

The following is a list of conferences presented at during the PhD candidature:


Surm, J. M., Smith, H. L., Madio, B., Undheim, E. A. B., King, G. F., Hamilton, B. R., van der Burg, C. A., Prentis, P. J., Pavasovic, A. (2018) Convergent amplification and functional specialisation of toxin and toxin-like genes in Cnidarians. Gordon Research Conference on Venom Evolution, Function and Biomedical Applications, 5-10th August, Mount Snow, USA

Surm, J.M., Prentis, P.J. and Pavasovic, A. (2016) Expression patterns of cnidarian toxins reveal dynamic gene family evolution and regulation. SMBE Conference, 3-7th July, Gold Coast, Australia

Surm, J.M., Harris, J.M., Prentis, P.J. and Pavasovic, A. (2016) Evolution of Novel Genes in Cnidaria. Lorne Genome Conference, 14-17th February, Lorne, Australia

Statement of Original Authorship

The work contained in this thesis has not been previously submitted to meet requirements for an award at this or any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made.

Signature: QUT Verified Signature

Date:


Acknowledgements

I would like to express my most sincere appreciation to Dr Ana Pavasovic and Dr Peter Prentis. Your expert advice, guidance, and encouragement during this project have been instrumental. I would like to offer my special thanks to Ana, who is largely the reason for my interest in research to begin with. I also cannot thank Pete enough; your jokes alone made my PhD life bearable, although I truly hate to admit this, and I only hope you never read this (sorry Ana). You have both been such influential mentors ever since we first grabbed coffee, and I am so sincerely grateful to have joined your lab.

To Associate Professor Jonathan Harris and Professor Louise Hafner, thank you for your continual support and words of encouragement, especially Professor Louise Hafner for filling in as principal supervisor late in my candidature.

To my fellow colleagues, thank you so much for making each day brighter. Our shared laughs were as abundant as the coffee drunk. In particular, I would like to thank Hayden Smith, Jessica O’Callaghan, Mathilde Klein, Tal Cooper, and Tarik Toledo. I would also like to thank Dr Shorash Amin for his patience when first training me. In addition, your strength, courage, and positive attitude during your candidature were truly inspiring.

To my family, who have been so amazing: I am not quite sure why I have been so lucky to have you, but I am thankful to you all. To my sisters, thank you for being so reliable, and forgiving me for being so not. Both you and your partners have been like a second and third set of parents; your practical advice has never steered me wrong. To my grandparents, thank you for always helping support me, without which I would be a very skinny man indeed. Finally, to my parents, I simply could not have done this without you and I am forever grateful for all of your love and encouragement.

To my wonderful partner Chloé van der Burg, thank you for your constant support and companionship. I am a better person because of you.

Finally, these last few years have been a truly wonderful experience because of so many people and I am so grateful to everyone in my life who has supported me on this journey.


Chapter 1: Introduction

The emergence of new genes provides the essential raw material for evolutionary innovation (Ding, Zhou, & Wang, 2012). As such, novel gene discovery is essential for advancements in therapeutic and commercial applications, such as the development of new anti-microbial peptides to combat the antibiotic resistance epidemic, as well as for extending our understanding of fundamental biological processes including gene expression regulation (Almeida Da Silva & Palomino, 2011; Capra, Pollard, & Singh, 2010; Ponce, Martinsen, Vicente, & Hartl, 2012). Genes can originate through multiple mechanisms such as duplication, horizontal gene transfer, exon shuffling, gene fusion/fission, and de novo gene formation (Cai, Zhao, Jiang, & Wang, 2008). Genes generated through these processes may have multiple functional outcomes; some processes generate duplicate copies of a gene that are identical to the ancestral sequence, while others create genes with novel sequences (novel genes). These novel genes have been associated with phenotypic variation including evolution of the adaptive immune system as well as various disease states, such as chronic myeloid leukaemia (Bosch, 2014; Flajnik & Kasahara, 2010; Najmabadi et al., 2011; Rowley, 1973). A number of studies have investigated the origin and evolution of duplicated genes; however, research on the evolution of novel genes remains in its infancy, and consequently a deeper understanding of novel gene formation is needed (Capra et al., 2010).

1.1 MECHANISMS OF NOVEL GENE FORMATION

1.1.1 Gene duplication

Gene duplication is the primary mechanism for the evolution of new genes (Innan & Kondrashov, 2010; Lynch & Conery, 2003; Ohno, 1970, 1972). This can occur through a variety of processes including doubling of the entire nuclear genome complement (whole genome duplication), unequal crossing-over within or among chromatids (segmental or single gene duplication), and rarely the re-integration of reverse-transcribed mRNA molecules (single gene duplication by retroposition; Innan & Kondrashov, 2010). In 1970, Susumu Ohno recognised that gene duplication plays


a major role in adaptive evolution, because it is the most common process that generates genes with new or novel function. In his seminal book, Ohno theorized that gene duplication can lead to new gene function responsible for evolutionary innovation; most notably, he proposed what is now known as the 2R hypothesis, which suggests that two rounds of whole genome duplication preceded the origin of vertebrates. The fate of duplicated genes depends on the function of the gene, the type of mutations that occur within the coding or regulatory sequence, and the selective pressures they experience (Innan & Kondrashov, 2010; Lynch & Conery, 2003). Most duplicated genes will become pseudogenized through an accumulation of deleterious mutations that lead to a loss of function (Ohno, 1972). Alternatively, the duplicated copies may be retained within the genome, leading to increased copy number and the potential for higher expression (dosage repetition). Dosage repetition has been identified to play a role in various diseases, such as cancer (Stankiewicz & Lupski, 2010). Specifically, in triple-negative breast cancer, high copy number of the epidermal growth factor receptor results in overexpression of this receptor and often leads to poor clinical outcomes for patients (Park et al., 2014). One of the rarest outcomes of gene duplication is currently thought to be neofunctionalisation, a process where one of the duplicated copies takes on a new function through accumulation of non-deleterious mutations. For example, a duplication event of an Old World primate opsin gene has led to the evolution of trichromatic colour vision in this lineage, compared to dichromatic vision in New World primates (Nathans, Thomas, & Hogness, 1986). This is one of many examples that illustrate the importance of gene duplication and neofunctionalisation in the diversification of proteins with similar function (de Sousa-Pereira et al., 2014; Moleirinho et al., 2011; Zhang, 2013). 
The evolution of the globin gene family provides a classic example of the role that subfunctionalisation has in evolutionary innovation. Subfunctionalisation is an additional evolutionary trajectory following gene duplication, in which each paralog retains a specialised subset of the original ancestral function. Specifically, globin proteins (e.g., haemoglobin (Hb), myoglobin (Mb), and cytoglobin (Cygb)) have evolved specialised functions in oxygen transport, metabolism and signalling pathways (Hardies, Edgell, & Hutchison, 1984; Storz, Opazo, & Hoffmann, 2013). These proteins trace their origins to two separate whole-genome duplication events in vertebrates. The retention of the Hb and Mb genes in the


ancestor of jawed vertebrates allowed a separation of function between the oxygen-carrier function in Hb and the oxygen-storage function in Mb (Gaudry, Storz, Butts, Campbell, & Hoffmann, 2014; Hoffmann, Opazo, & Storz, 2010). Subsequent gene duplication events in the Hb gene lineage gave rise to the α- and β-globin genes (Storz et al., 2013), which form a heterotetramer (2α:2β). Further rounds of duplication and divergence have resulted in a diverse family of α- and β-like globin genes, allowing functionally distinct Hb heterotetramers to form (Brittain, 2002). Specifically, the genes of the β-globin cluster of placental mammals have diversified to become developmentally regulated and are arranged in their temporal order of expression. These include ε-globin (HBE), γ-globin (HBG), and η-globin, in which HBE and HBG are expressed in embryonic and foetal cells respectively, while η-globin is a pseudogene (Goodman, Koop, Czelusniak, & Weiss, 1984; Hardies et al., 1984; Hardison, 2012; Opazo, Hoffmann, & Storz, 2008). These examples in the β-globin gene cluster highlight that gene duplication can lead to a diverse range of outcomes.

The biosynthesis of long-chain polyunsaturated fatty acids (LC-PUFAs, ≥ C20 molecule) is a conserved eukaryotic pathway that relies on the enzymes encoded by two gene families (Castro, Tocher, & Monroig, 2016; Cook & McMaster, 2002; Sperling, Zähringer, & Heinz, 1998; Sprecher, 2000), the fatty acyl desaturase (Fad) and elongation of very long-chain fatty acid (Elovl) families. This pathway converts PUFAs (e.g., C18 molecule) to LC-PUFAs through a multistep process, which continually inserts double bonds at different positions (desaturates) and provides additional carbons (elongates) to the pre-existing fatty acid carbon chain (Jakobsson, Westerberg, & Jacobsson, 2006; Leonard, Pereira, Sprecher, & Huang, 2004; Sprecher, 2000). The Fad genes encode the desaturase enzymes and the Elovl genes encode the elongase enzymes; in both cases, the pathway requires the enzymes to be multi-functional. In vertebrates, the canonical LC-PUFA biosynthesis pathway relies on multiple desaturase enzymes that have either Δ5 or Δ6 activity, as well as two elongase enzymes (Elovl5 elongates C18 and C20 PUFA and Elovl2 elongates C20 and C22 PUFA; Castro et al., 2012, 2016; Jakobsson et al., 2006). However, multiple iterations and deviations of the canonical pathway have been observed across animal lineages (Hastings et al., 2001; Monroig et al., 2017; Monroig, Li, & Tocher, 2011a; Monroig et al., 2016). Gene duplication events have played a major role in the evolution of Fad and Elovl genes, having significant impact on their distribution and copy number (Carmona-Antoñanzas et al., 2011; Castro et al., 2012; Fonseca-Madrigal et al., 2014; Kabeya et


al., 2017; Li et al., 2017, 2010; Mohd-Yusof et al., 2010; Monroig et al., 2012a, 2012b; Morais et al., 2009; Surm et al., 2015). Specifically, neofunctionalisation has been essential in generating multiple Fad gene copies encoding functionally different desaturase enzymes. This occurred before the divergence of jawed vertebrates, resulting in the Fads1 and Fads2 genes with Δ5 and Δ6 activities, respectively (Castro et al., 2012, 2016). Alternatively, the genome duplication observed in vertebrates provided the raw genetic material to allow for the diversification of elongase enzymes, generating Elovl2 and 5, compared to other which have a single elongase enzyme referred to as Elovl2/5 (Castro et al., 2016; Monroig et al., 2016). This diversification of elongase enzymes is suggestive of subfunctionalisation, with the new duplicates taking on the separate role of the ancestral sequence (Hittinger & Carroll, 2007). Furthermore, duplication events restricted to a lineage have played a major role in the distribution and diversification of the Fad gene family in molluscs (Li et al., 2013; Monroig et al., 2012b; Surm et al., 2015). This is evident with functional characterisation of a lineage-specific duplication in Haliotis discus hannai resulting in two desaturase enzymes, both with Δ5 function, highlighting a dosage repetition effect following gene duplication (Li et al., 2013). It is therefore evident that gene duplication and has played a major role in evolution of the biosynthesis of LC-PUFAs, resulting in multiple evolutionary trajectories, including neofunctionalisation, subfunctionalisation and dosage repetition. To date, however, metazoan research investigating the distribution and evolution of these gene families has been restricted largely to , limiting our ability to infer their trajectories which relies on the ancestral complement of these genes to be reconstructed. 
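The desaturation and elongation steps described above can be made concrete with a small sketch. The Python below is illustrative only: the `FattyAcid` class and the function names are hypothetical, each elongation step is assumed to add two carbons, and the example route (Δ6 desaturation, elongation, then Δ5 desaturation of a C18 PUFA with three double bonds) is a commonly described route used here as an assumption, not a pathway characterised in the studies cited.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FattyAcid:
    """Fatty acid in shorthand notation: carbon count and number of double bonds."""
    carbons: int
    double_bonds: int

    def __str__(self) -> str:
        return f"C{self.carbons}:{self.double_bonds}"


def desaturate(fa: FattyAcid) -> FattyAcid:
    """Desaturase (Fad) step: insert one double bond; chain length is unchanged."""
    return FattyAcid(fa.carbons, fa.double_bonds + 1)


def elongate(fa: FattyAcid) -> FattyAcid:
    """Elongase (Elovl) step: add two carbons (assumed); double bonds unchanged."""
    return FattyAcid(fa.carbons + 2, fa.double_bonds)


def run_pathway(substrate, steps):
    """Apply each enzymatic step in order and return the substrate plus all products."""
    intermediates = [substrate]
    for step in steps:
        intermediates.append(step(intermediates[-1]))
    return intermediates


# Assumed example route: Δ6 desaturation, elongation, Δ5 desaturation.
route = [desaturate, elongate, desaturate]
products = run_pathway(FattyAcid(18, 3), route)
print(" -> ".join(str(fa) for fa in products))
```

Under these assumptions, a C18 PUFA with three double bonds ends as a C20 LC-PUFA with five. The sketch tracks only chain length and degree of unsaturation; it does not model the position of each double bond or enzyme substrate specificity.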
The rate of gene duplication is considered to be extremely high and similar to the rate of mutation per nucleotide, with the frequency of gene duplication found to be ∼0.001-0.01 per gene per million years in vertebrates (Kaessmann, 2010; Lynch & Conery, 2003). High rates of gene duplication are not specific to vertebrates, however, and have also been observed in other organisms such as yeast and flies (0.002 and 0.0012 duplications per gene per million years, respectively; Demuth & Hahn, 2009). The other processes of novel gene generation are thought to be far more infrequent, and research on their contribution to genome evolution is scant.
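To give a sense of scale for the rates quoted above, the following sketch multiplies a per-gene duplication rate by a gene count and a time span to obtain an expected number of duplication events. The rates are those cited in the text; the gene count of 20,000, the 10 million year window, and the function name are illustrative assumptions, and the calculation treats genes as independent with a constant rate.

```python
def expected_duplications(rate_per_gene_per_myr: float, n_genes: int, myr: float) -> float:
    """Expected duplication events = rate (per gene per Myr) x genes x time (Myr)."""
    return rate_per_gene_per_myr * n_genes * myr


# Rates as quoted in the text (duplications per gene per million years).
RATES = {
    "vertebrates (lower bound)": 0.001,
    "vertebrates (upper bound)": 0.01,
    "yeast": 0.002,
    "flies": 0.0012,
}

# Hypothetical values: the same gene count is used for every lineage
# so that only the rate differs between rows.
N_GENES = 20_000
WINDOW_MYR = 10

for lineage, rate in RATES.items():
    events = expected_duplications(rate, N_GENES, WINDOW_MYR)
    print(f"{lineage}: ~{events:.0f} expected duplications")
```

These figures count duplication events only; as discussed above, most duplicates are expected to be lost through pseudogenization, so the number retained would be far smaller.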


1.1.2 Gene fusion and fission

Gene fusion can occur when at least two different genes are combined to form a new chimeric sequence (Long, Betrán, Thornton, & Wang, 2003). This can occur through the deletion or mutation of the stop codon and the transcription termination signal in the upstream gene (Gaudry et al., 2014), inter-chromosomal translocations or intra-chromosomal rearrangements (Frenkel-Morgenstern et al., 2012). Gene fusion events have the potential to result in novel genes that can have unique phenotypic consequences, as seen in diseases such as cancer (Gaudry et al., 2014; Mitelman, Johansson, & Mertens, 2007). Studies have shown that gene fusion occurs in all malignancies, encoding chimeric proteins that account for 20 % of human cancer morbidity and specifically play important roles in the initial stages of tumorigenesis (Mitelman et al., 2007). A well-known example of gene fusion and its implications in cancer was first identified in 1973 (Rowley, 1973), where a chromosomal translocation was identified to play an important role in chronic myeloid leukaemia (CML). This chromosomal translocation results in the fusion gene BCR-ABL1, which encodes a constitutively active tyrosine kinase implicated in the pathogenesis of CML (Ben-Neriah, Daley, Mes-Masson, Witte, & Baltimore, 1986). Recently, over 10,000 gene fusions have been associated with cancer, including the deletion of ~500 kb in chromosome band 10q25, which fuses VTI1A and TCF7L2 and is found in colorectal carcinomas (Mertens, Johansson, Fioretos, & Mitelman, 2015). Conversely, a single gene can split into two separate genes, a process called gene fission. In a survey of young genes in Drosophila mauritiana, a gene family with unknown function, called monkey king, was found to have originated through gene fission (Wang, Yu, & Long, 2004). Following a gene duplication event, this gene family evolved into fission genes that separately encode protein domains from a multidomain ancestor. 
The ancestral gene is hypothesised to have contained two domains: a zinc finger domain and a poly(A)-binding domain. Another example, while not relating to genomic DNA, involves the fission of a mitochondrial chromosome into smaller minicircles (fragmented coding components), as seen in the human body louse Pediculus humanus (Cameron, Yoshizawa, Mizukoshi, Whiting, & Johnson, 2011). Despite these few examples, the underlying mechanism by which fission occurs remains unclear, although it is likely to play an important role in driving the evolution of new gene function (Kaessmann, 2010; Long et al., 2003).


1.1.3 Exon shuffling

Exon or domain shuffling involves the recombination, exclusion or duplication of exons (domains) within a gene, leading to the evolution of novel gene architecture (Long et al., 2003). The molecular mechanisms that cause exon shuffling are chromosomal rearrangements, retrotransposition and transcription slippage (Ding et al., 2012). A study by Kawashima et al. (2009) identified that exon shuffling events have occurred during the evolution of , resulting in ∼1000 new domain pairs in the vertebrate clade, including ∼100 that were shared by seven different vertebrate species examined. The study further identified the importance of exon shuffling in the evolution of novel proteins such as aggrecan, the most abundant non-collagenous protein in cartilage. Exon shuffling in aggrecan has resulted in binding of hyaluronic acid, which provides tensile strength and allows cartilage to absorb shock and resist compression in joints (Patthy, 2003). Another example of domain shuffling is the evolution of the shematrin gene family, responsible for formation of the shell matrix in pearl oysters. In this instance, the shuffling of repetitive, low complexity domains and motifs was likely the result of mispairing during replication, and it highlights the association between shuffling events and morphological changes (McDougall, Aguilera, & Degnan, 2013).

1.1.4 Horizontal gene transfer

In prokaryotes, it has long been established that genes are often transferred between organisms (Polz, Alm, & Hanage, 2013). This transfer of genes from a non-ancestral organism is called horizontal gene transfer (HGT) or lateral gene transfer (Ding et al., 2012). Recent estimates have suggested that around 81 % of prokaryotic genes have been involved in HGT, highlighting the important role HGT plays in prokaryote evolution (Dagan, Artzy-Randrup, & Martin, 2008). A study by Crisp et al. (2015) comparing currently available, sequenced metazoan genomes identified that HGT is not prokaryote specific, but is also present in complex eukaryotes at levels much greater than previously thought. It appears that HGT has also contributed to the evolution of many organisms, with the potential for ongoing lineage-specific retention resulting in adaptations which drive evolutionary innovation (Crisp, Boschetti, Perry,


Tunnacliffe, & Micklem, 2015). This study by Crisp et al. (2015) highlighted that tens to hundreds of foreign genes are present in all the metazoans that were surveyed, including humans. One of the better-known examples of HGT was identified between Wolbachia pipientis, a parasitic intracellular endosymbiont, and its hosts, which include a wide range of arthropods and filarial nematodes (Stouthamer et al., 1999). In fact, an entire Wolbachia genome (∼1.4 Mb) was transferred to the Drosophila ananassae nuclear genome, of which at least 28 genes are transcribed in the host species (Hotopp et al., 2007). In addition, the recent draft genome of a tardigrade (Hypsibius dujardini) identified that approximately one-sixth of its genes have been acquired through HGT, a proportion nearly double that of any other metazoan (Boothby et al., 2015). The study further proposed that organisms that survive in extremely stressful environments might be more susceptible to acquisition of foreign genes. Interestingly, however, an alternative genome assembly for the same tardigrade species found that a smaller percentage of genes were acquired through HGT (Koutsovoulos et al., 2016). The authors suggest that the tardigrade genome published by Boothby et al. (2015) was in fact contaminated with bacteria. Although the contribution of HGT to gene evolution is likely to be significant, it should be viewed with caution, as contamination can lead to spurious identification of genes acquired through HGT. Alternatively, a relatively new hypothesis suggests that extensive gene loss may explain the patchy distribution of genes previously thought to be acquired through HGT (Inoue, Sato, Sinclair, Tsukamoto, & Nishida, 2015). Therefore, investigation into lesser studied, early-diverging species is crucial in contributing to our understanding of HGT as a driver of evolutionary innovation.

1.1.5 de novo gene formation

Our understanding of de novo genes has emerged only recently with the development of modern sequencing technology and advanced bioinformatics. De novo genes are now understood to originate from ancestrally non-coding DNA sequences, an idea that was previously dismissed as being highly unlikely (Jacob, 1977; Schlötterer, 2015). We now recognise that de novo genes are commonly found in the genomes of most species and share a set of features. For example, de novo genes frequently have short open reading frames (Neme & Tautz, 2014; Palmieri, Kosiol, & Schlötterer, 2014), are


commonly associated with repetitive DNA (Palmieri et al., 2014; Toll-Riera et al., 2009), have fewer introns compared to old genes (Carvunis et al., 2012; Toll-Riera et al., 2009), are expressed at lower levels (Zhao, Saelao, Jones, & Begun, 2014), are often expressed in a tissue-specific manner (Zhao et al., 2014) and show codon usage bias (Neme et al., 2014; Palmieri et al., 2014). Another key feature is the observed bias in chromosomal localisation. While significant, this is largely limited to studies in Drosophila, where newly formed de novo genes are underrepresented on the X chromosome (reviewed in Schlötterer, 2015). Recent research has also highlighted that newly formed de novo genes have a high turnover rate: they form frequently but also have high extinction rates (Palmieri et al., 2014; Zhao et al., 2014). Deriving the function of de novo formed genes is difficult, but gene expression studies have been able to provide some important insights. For example, putative de novo genes demonstrated higher gene expression in response to abiotic and biotic stressors in Arabidopsis thaliana than young genes with a different evolutionary origin (Donoghue, Keshavaiah, Swamidatta, & Spillane, 2011). Similarly, during biotic and abiotic stress in Daphnia magna, more than half of the differentially expressed genes were putative de novo genes (Colbourne et al., 2011). Additionally, in Saccharomyces

cerevisiae it was identified that de novo genes, ORFs0+ and ORFs1–4, experienced differential expression during starvation and were preferentially located close to four transcription factors related to stress and mating (Carvunis et al., 2012). De novo genes have also been shown to serve an important role during developmental processes. Liu et al. (2014) reported differential regulation of de novo genes during the development of D. melanogaster embryos and identified that they are overabundant during certain stages of development. The developmental stages enriched with de novo genes were the embryo, larvae, pupae and four days post eclosion (Liu, Li, Irwin, Zhang, & Wu, 2014). Although novel ORFs are generated via de novo gene formation, these are generally lost rapidly (Schmitz, Ullrich, & Bornberg-Bauer, 2018). This preliminary research has contributed significantly to our understanding of de novo genes, but further functional characterisation is required across a broader range of species.


1.2 FREQUENCY, DISTRIBUTION AND OUTCOMES OF NOVEL GENES

1.2.1 Chimeric genes

Chimeric genes often encode proteins with new functions due to significant domain architecture rearrangement. A systematic examination of new genes in D. melanogaster revealed a high frequency (30 %) of chimeric genes (Ding et al., 2012). A number of different mechanisms underlie the formation of chimeric genes, such as exon shuffling and retrotransposition (Kaessmann, 2010). A study by Yang et al. (2008) identified 17 chimeric genes formed through chromosomal rearrangements in D. melanogaster, highlighting that multiple mechanisms can result in chimeric gene formation. In addition, Rogers and Hartl (2012) identified 14 chimeric proteins in D. melanogaster, with many showing temporal and spatial expression patterns. Chimeric genes have also been shown to be lineage-specific, as seen in Arabidopsis thaliana, where 54 chimeric genes were identified with no sequence similarity to genes in other species (Donoghue et al., 2011). The cyclotide/albumin-1 gene family, found in the legume family Fabaceae, is an example of genes that encode chimeric proteins. These proteins are cyclic cystine-knot peptides, which play an important role in pathogen defence (Gilding et al., 2015). Another example of a lineage-specific chimeric gene is jingwei in Drosophila yakuba. This gene emerged from the insertion of a duplicated alcohol dehydrogenase gene into another duplicated gene called yande, co-opting three exons of yande to form the new chimeric gene jingwei (Wang, Zhang, Alvarez, Llopart, & Long, 2000). Overall, chimeric genes have been shown to play important roles in evolutionary innovation, many of which provide lineage-specific novelties, but much more research is required to determine the frequency and distribution of chimeric genes outside of these few model taxa.

1.2.2 Orphan genes

Genes that lack homology to genes in other taxonomic groups, or that have a limited phylogenetic distribution are known as orphan genes or lineage-specific genes (Wissler, Gadau, Simola, Helmkampf, & Bornberg-Bauer, 2013). Although orphan genes may represent a considerable portion of the genome in some organisms, their evolutionary origin is poorly understood. It is currently hypothesised that they largely


arise from de novo gene formation, duplication and rearrangement processes followed by fast divergence, as well as exaptation from transposable elements (Neme et al., 2014; Tautz & Domazet-Lošo, 2011). Overall, these processes appear to provide a continuous source of raw material for the evolution of new gene functions, which can become essential for lineage-specific innovations (Palmieri et al., 2014). The following paragraphs will highlight some examples that provide evidence for the role that orphan genes play in evolutionary innovation. The regenerative ability of salamanders is an example of a lineage-specific adaptation that has allowed this tetrapod clade to regenerate their limbs following injury (Fröbisch & Shubin, 2011). The capacity for limb regeneration in salamanders has in part been attributed to a lineage-specific gene, Prod1, which is a member of the Three Finger Protein family (Brockes & Gates, 2014; da Silva et al., 2002). An additional salamander novelty is that their limb development differs from that of other tetrapods: the formation and ossification of the zeugopodial and autopodial elements are anteriorly dominated (preaxial dominance; Kumar, Gates, Czarkwiani, & Brockes, 2015). Prod1 plays an additional role in the development of preaxial dominance in larval salamanders, where it is expressed during early outgrowth of the limb bud (Kumar et al., 2015). These findings provide insights into the role that orphan genes play in lineage-specific novelties relating to limb development and regeneration. It has been suggested that orphan genes also play a role in biotic and abiotic interactions such as defence, stress response, and signalling (Arendsee, Li, & Wurtele, 2014). Many nematodes exhibit phenotypic plasticity and undergo an alternative developmental stage (dauer larvae) when exposed to stressful conditions (Sommer & Ogawa, 2011). 
An orphan gene called dauerless was recognised to inhibit dauer development in Pristionchus pacificus (Mayer, Rödelsperger, Witte, Riebesell, & Sommer, 2015). The inhibition of dauer development only occurs in strains containing multiple dauerless copies, highlighting a role for copy number variation of orphan genes in reduced dauer formation and response to abiotic stress. An orphan gene in the sedentary endoparasitic nematode Meloidogyne incognita has been identified as playing a potential role in the intimate relationship between this nematode and its host plants. The orphan gene, MAP-1 (Meloidogyne avirulence protein-1), encodes a secreted protein that is thought to be involved with both loosening of the cell wall and mitotic parthenogenesis (Tomalova, Iachia, Mulet, & Castagnone-Sereno, 2012). Aphids are another taxonomic group that have a close relationship with


plants. Studies investigating aphid salivary proteins identified almost half of these proteins to be aphid-specific (Elzinga & Jander, 2013). These aphid-specific proteins are thought to have evolved as effectors to inhibit plant defence and promote their unique phloem feeding style. These examples identify the role orphan genes have in biotic and abiotic interactions. In addition to biotic and abiotic interactions, orphan genes can also play a role in lineage-specific morphological structures. Aphids have evolved novel cells, called bacteriocytes, that uniquely differentiate to harbour beneficial endosymbiotic bacteria (Shigenobu & Stern, 2013). The interrogation of bacteriocytes revealed that a class of orphan genes encoding small proteins with signal peptides is over-represented, suggesting that these small secreted peptides play an important role in encouraging the retention of beneficial bacteria. Comparative genomics has revealed that the completed draft genome of Octopus bimaculoides contains hundreds of cephalopod- and octopus-specific genes (Albertin et al., 2015). These novel genes exhibited tissue-specific expression patterns in octopus-specific morphological structures, such as their chromatophore-laden skin, suckers on tentacles and highly developed nervous system. Novel genes result not only in lineage-specific structures, but sometimes in convergent molecular evolution linked to phenotypic convergence (Foote et al., 2015). This is illustrated by octopus-specific genes that may play important roles in contributing to the evolution of neural complexity, which has independently evolved in Cephalopoda and other bilaterian species (Albertin et al., 2015). Overall, these examples demonstrate that the tissue-specific expression of lineage-specific genes is important in the generation of morphological novelty. In addition to the generation of morphological novelty, orphan genes can influence specific developmental differences. 
Developmental differences in tentacle formation correlate with the presence and expression of a novel gene encoding a small secreted protein in Hydra vulgaris and H. oligactis (Khalturin et al., 2008). Transforming H. vulgaris with the orphan gene induced changes in tentacle morphology that mirror the phenotypic differences observed between species. These results suggest that orphan genes may be involved in the generation of lineage-specific developmental differences. Studies interrogating sequenced genomes have suggested that the proportion of orphan genes in some species is as high as 10-20 % (Khalturin, Hemmrich, Fraune,


Augustin, & Bosch, 2009). This is in contrast to a number of studies restricted to mammals and ecdysozoan species, which have revealed that orphan genes account for approximately 5 % and 7 % of the total gene set for primates and Caenorhabditis elegans, respectively (Khalturin et al., 2009). Sequencing of the starlet sea anemone (Nematostella vectensis) genome, however, revealed that approximately 15 % of its total gene set are orphan genes (Putnam et al., 2007). Overall, this demonstrates that the number of novel genes varies across taxonomic groups and that cnidarian species may make a good model to examine the distribution of orphan genes due to the high frequency of orphans in this taxonomic group. Current whole genome sequencing and advanced bioinformatic analysis of species sampled across the animal phylogeny are providing important insights into the key features underlying animal genome evolution (Dunn & Ryan, 2015). In particular, we now better understand the overlap of ancestral gene sets in cnidarians and bilaterians, which has changed our view of the evolution of metazoan genomes, gene family origins and the evolution of novel genes (Technau & Schwaiger, 2015). Sequencing of multiple cnidarian genomes has also revealed many similarities with vertebrate genomes, including an overlap in a substantial proportion of their gene repertoire as well as highly conserved genome synteny (Putnam et al., 2007). In contrast, other bilaterian model species used in genetic research, such as D. melanogaster and C. elegans, have undergone large gene loss events and have less genome similarity to vertebrates than cnidarians do (Technau et al., 2005). Epigenetic modifications of the genome are also absent in some model ecdysozoan species but are shared among cnidarian and vertebrate species (Technau et al., 2015). 
In addition to these genomic similarities, cnidarian genomes contain a significant proportion of novel genes and sequences when compared to vertebrate genomes (Chapman et al., 2010). Consequently, cnidarians present an excellent model to study novel gene evolution by investigating the frequency and distribution of orphan genes within this unique taxonomic group.

1.3 CNIDARIANS

Phylum Cnidaria (corals, sea pens, sea anemones, jellyfish, myxozoans and hydroids) is a diverse group of animals consisting of approximately 13,500 species (Chang et al., 2015; Lom & Dyková, 2013; Technau et al., 2015). The majority of


cnidarian species live in salt water, with the exception of freshwater hydroid species (Jankowski, Collins, & Campbell, 2007). Although cnidarians lack complex morphology (no organ systems), they require tissue differentiation and cellular structures for processes such as food capture and aggressive encounters with other cnidarians (Jouiaei et al., 2015a). Phylum Cnidaria is sister to superphylum Bilateria, with an estimated divergence time of 600-700 million years ago (Figure 1.1A), and the modern cnidarian classes had evolved by the Cambrian (Cartwright et al., 2007). This is a strategic phylogenetic position, providing insights into bilaterian evolution such as understanding the evolution of body axes and the role of Hox genes. The cnidarian oral–aboral axis is the main body axis; it extends from the oral, mouth-bearing end with its tentacles to the closed aboral end (the foot or physa; Technau & Genikhovich, 2018). Anthozoans, such as sea anemones and corals, also have an additional body axis that is orthogonal to the oral–aboral body axis, called the directive axis (Figure 1.1B). A study by He et al. (2018) used short hairpin RNA (shRNA)-mediated knockdown and CRISPR-Cas9 mutagenesis to demonstrate that a Hox-Gbx network, comprising Anthox1a, Anthox6a, Anthox8, and Gastrulation brain homeobox (Gbx; a Hox-linked subfamily gene), controls radial segmentation of the endoderm during development in N. vectensis. During this development, the endoderm undergoes segmentation into eight sectors along the directive axis, generating internal anatomical subdivisions that further correlate with the positioning of mesenteries and the patterning of tentacle primordia. Anthox1a, Anthox8, Anthox6a, and Gbx exhibited sharp expression territories, defining segment boundaries, similar to those observed in bilaterians such as mice and flies. 
Given that the orally expressed cnidarian Anthox6 belongs to the anterior group of Hox genes in bilaterians, and the aborally expressed Anthox1 was interpreted as a derived posterior Hox gene, it has been proposed that the cnidarian oral–aboral axis corresponds to the bilaterian anterior–posterior axis (Finnerty, Pang, Burton, Paulson, & Martindale, 2004). These data, taken together with the shared genomic similarities of cnidarians and vertebrates, allow for an improved understanding of the evolution of spatiotemporal gene expression and gene regulation in metazoans.


Figure 1.1. Phylogeny and physiology of a cnidarian. A) Phylogeny of Metazoa. B) Anthozoan cnidarian Actinia tenebrosa, with the oral-aboral axis highlighted. Photograph credit: Jonathon Muller.

Members of phylum Cnidaria are characterised by the presence of specialised cells called cnidae (Jouiaei et al., 2015a). This novel cell type contains complex structural elements that are capable of explosive discharge through triggering of the cnidocil (Fautin, 2009; Özbek, 2010). Once discharged, they deliver a complex cocktail of toxins used in prey capture and defence, as well as in deterring and repelling predators and competitors (Orts et al., 2013). The cnidae are secreted from the Golgi apparatus and undergo further structural modifications in the extracellular matrix, from where they migrate to various locations in the cnidarian body (Fautin, 2009; Özbek, 2010). Three major types of cnidae are currently described: nematocysts, spirocysts and ptychocysts (Jouiaei et al., 2015a). Nematocysts, the best-studied class of cnidae, penetrate and inject venom into the target organism upon firing (Tardent, 1995). They are found in all species of cnidarians, varying extensively in morphology and function (David et al., 2008). Nematocysts are the primary weapon for capturing prey, repelling predators, and intra- and interspecies spatial competition (Shick, 1991). Molecular research has revealed a high frequency of novel genes within these phylum-specific organelles. For example, in H. magnipapillata, 41 of 50 proteins expressed in the nematocyst were found to be cnidarian-specific (Khalturin et al., 2009). These cnidarian-specific genes

14 Chapter 1: Introduction

include a gene family called minicollagens that make up the matrix of the nematocyst wall (David et al., 2008). In the nematocyst, minicollagens co-localise with cnidoin, a novel elastic protein involved in kinetic energy storage and release during the rapid firing of the nematocyst (Beckmann et al., 2015). Although cnidoin is a novel protein, it appears to have sequence similarity to spider silk proteins, highlighting potential evidence of molecular convergent evolution. Analysis of the distribution of novel genes identified in nematocysts has revealed that a high proportion of novel proteins are toxins (the proteinaceous component of venom). Interestingly, when the proteome of the nematocyst was investigated across three cnidarian lineages (Anthozoa, Scyphozoa, and Hydrozoa), only six of the hundreds of proteins were shared among the three lineages (Rachamim et al., 2015). This was specifically the case for the toxin and toxin-like proteins: the two medusozoan (Scyphozoa and Hydrozoa) species had a greater number of similar proteins, comprised mainly of enzymes, whereas the anthozoan was markedly unique, with most proteins associated with neurotoxins.

Cnidarians have a broad spectrum of toxin and toxin-like gene families, which have been characterised to include enzymes, pore forming toxins, neurotoxins and small cysteine-rich peptides, examples of which are reviewed below. Enzyme toxins are found across all cnidarian taxa, with phospholipases and metalloproteases among the most common. In fact, phospholipase A2 has a ubiquitous distribution across venomous animal lineages such as reptiles, insects, arachnids, molluscs and cnidarians (Jouiaei et al., 2015a).
Phospholipase A2 is an enzyme that hydrolyses the acyl bond of glycerophospholipids to produce lysophospholipids and fatty acids, including arachidonic acid, and has toxic functions that include defence, immobilization and digestion (Nevalainen et al., 2004). Within cnidarians, phospholipase A2 activity has been detected in multiple tissues, including tentacles and mesenteric filaments (Nevalainen et al., 2004; Talvinen & Nevalainen, 2002). In Exaiptasia pallida, a phospholipase A2 fraction recovered from nematocysts in mesenteric filaments showed haemolytic activity against rat red blood cells (Hessinger & Lenhoff, 1976). Peptidases such as metalloproteases have been well characterised as venom components in arachnid and reptilian species, where they typically degrade the extracellular matrix, inducing haemorrhage and necrosis by preventing blood clot formation (Fox & Serrano, 2005; Fry et al., 2009; Undheim et al., 2014a). A peptidase from the astacin family was found to be expressed in both
gland cells and stinging cells in N. vectensis (Moran et al., 2012a). Pore forming toxins are a different toxin gene family with a ubiquitous distribution across cnidarians. These toxins penetrate the cell membrane, allowing small molecules and solutes to move into and out of the cell, leading to osmotic imbalance and eventually cell lysis (Parker & Feil, 2005). These toxins can be grouped into two categories according to the protein secondary structure that is used to penetrate the membrane: α pore forming toxins, which are rich in α-helices, and β pore forming toxins, which are rich in β-sheets and form β-barrel pores (Frazão, Vasconcelos, & Antunes, 2012; Jouiaei et al., 2015b).

Neurotoxins (voltage-gated ion channel toxins) are typically small peptides with a wide distribution in cnidarians. These toxins are commonly found in actiniarians, a largely sessile group of cnidarians that rely on neurotoxins to immobilize prey and defend against potential predators (Frazão et al., 2012; Jouiaei et al., 2015a, 2015b). A general mode of action for these peptides is the disruption of action potentials through inhibition of sodium or potassium ion channels (Catterall et al., 2007; Smith & Blumenthal, 2007). Disruption of ion channels then causes the cell to become hyperactive and release neurotransmitters at synapses, causing violent convulsion followed by paralysis (Orts et al., 2013). A new class of neurotoxins has been isolated from Acropora millepora and characterised as a small cysteine-rich peptide (SCRiP), which, when injected into zebrafish larvae, resulted in severe paralysis (Jouiaei et al., 2015a). Toxin peptides encoded by lineage-specific genes seem to be common in cnidarians (Arendsee et al., 2014). Specifically, out of 20 toxins isolated from N. vectensis, six are species-specific while eight appear to be cnidarian-specific (Moran et al., 2012b).
In the scyphozoan jellyfish species Aurelia aurita, a 40 amino acid toxin-like peptide (aurelin) has no known homologs outside of this species (Shenkarev et al., 2012). Interestingly, while aurelin is characterized as a toxin-like cysteine-rich secretory protein (CRISP) and is structurally similar to many animal toxins, like many other CRISPs it is restricted to a single species or lineage (Sperstad et al., 2011). Toxin and toxin-like genes encode small peptides (commonly < 80 amino acids), have strong cell-specific expression patterns and show evidence of high turnover rates among lineages (Edger et al., 2015). This highlights that toxin genes have three of the hallmarks of de novo formed novel genes, and consequently it is plausible to
hypothesise that cnidarians possess a range of novel and previously uncharacterised toxin and toxin-like genes, many of which are likely to be lineage-specific. Despite this, a comprehensive analysis of the entire complement of toxin and toxin-like genes in cnidarians is currently lacking. This knowledge, however, is crucial to gain a better understanding of how novel genes evolve and their frequency and distribution across lineages.

1.4 ACTINIA TENEBROSA AS A CANDIDATE TO STUDY THE EVOLUTION OF NOVEL GENES

Actinia tenebrosa is a cnidarian species from order Actiniaria within class Anthozoa (Figure 1.1B). This sea anemone species is native to Australia, New Zealand, the Kermadec Islands and the sub-Antarctic islands of Australia and New Zealand (Ottaway, 1979). It is a dominant species in the mid to lower intertidal zones across its distribution and is extremely abundant on rocky outcrops and boulder beaches (Ayre, 1982). This species is highly similar to the northern hemisphere species, Actinia equina, in terms of morphology and the limited genetic data currently available (Farquhar, 1898). The morphology and anatomy of A. tenebrosa are consistent with other anthozoans, with well-characterised features such as internal mesenteric filaments used to digest prey (Shick, 1991) and six rings of tentacles used to capture prey. In addition to these structures, A. tenebrosa possesses a set of highly modified tentacles (acrorhagi). These specialised structures are used during agonistic encounters over ownership of space on intertidal substrates (Minagawa, Sugiyama, Ishida, Nagashima, & Shiomi, 2008). Tentacles, mesenteric filaments and acrorhagi all display a high density of nematocysts in their tissue (Ottaway, 1978). Two small peptide toxins (Acrorhagin 1 and Acrorhagin 2) have been isolated from the acrorhagi of a closely related species, A. equina. Both of these toxins are completely novel, sharing limited sequence similarity with each other as well as with proteins from other species (Honma et al., 2005). Previous studies have identified that a significant proportion of novel genes are associated with lineage-specific morphological traits; however, such studies have yet to investigate the role of novel genes in acrorhagi.

It has previously been established that lineage-specific genes can play an important role during the conserved process of development. However, limited research has evaluated whether the expression of lineage-specific genes changes over an ontogenetic time course. Actinia tenebrosa is a viviparous species which has the capacity to reproduce sexually, but it uses asexual reproduction as its dominant mode of producing offspring (Black & Johnson, 1979). This preference for asexual reproduction, together with the sessile nature of A. tenebrosa, results in parent animals that are frequently surrounded by a colony of individual clones of varying sizes, both in their natural habitat and in laboratory settings (Ayre, 1982; Ottaway & Kirby, 1975; Sherman, Peucker, & Ayre, 2007). These differing size classes represent a range of ontogenetic stages in A. tenebrosa and provide an excellent study system in which to examine whether novel genes are differentially expressed across an ontogenetic time course.

1.5 AIMS OF THE PROJECT

This project aims to utilise functional genomic and bioinformatic approaches to address fundamental questions about the origin and evolution of novel genes. Specifically, this study will investigate the origin and evolution of the diverse and rapidly evolving toxin and toxin-like genes and gene families across multiple cnidarian taxa. For this objective, toxin genes will be used as a case study for novel genes due to the wide distribution and abundance of novel toxins among cnidarians. In addition, a broader spectrum of novel genes will be investigated for potential association with novel morphological structures, ontogenetic development and phenotypic variation. Currently, very few studies outside of Drosophila have conducted a systematic review of the frequency and distribution of novel genes. The project presented here will address this knowledge gap by systematically analysing the distribution and expression of novel genes in phylum Cnidaria, using multiple lines of evidence. Ultimately, these data will be used to form an overview of the frequency and distribution of novel genes, with the broader aim of improving our understanding of the molecular underpinnings that drive the evolution of novel genes across other lineages. Outcomes of this research will have far-reaching implications that span multiple fields in biomedical research by contributing to our understanding of gene evolution.

1.6 RESEARCH PLAN

This project will consist of three discrete but interrelated aims to elucidate the origin and evolution of novel genes and their potential association with novel morphological structures. Each aim will translate into a single study and publication. Together, these three studies will focus on the molecular evolution, spatiotemporal gene expression, and the contribution to the total gene complement of lineage-specific genes and genes that have undergone lineage-specific duplications. The details of each aim and associated justification are given below:

1.6.1 Aim 1/Study 1: Insights into the phylogenetic and molecular evolutionary histories of Fad and Elovl gene families in Actiniaria

In this study I have selected to focus on the fatty acyl desaturase (Fad) and elongation of very long-chain fatty acid (Elovl) gene families, which encode enzymes that are essential in the biosynthesis of long-chain polyunsaturated fatty acids (LC-PUFAs, ≥ C20). Both gene families are found across the tree of life and have undergone repeated rounds of gene duplication, some of which have been lineage-specific. These gene families were chosen as a model to investigate gene duplication in cnidarians and its biological impact, by concurrently investigating the fatty acid profile of A. tenebrosa. Furthermore, metazoan research investigating the distribution and evolution of these gene families has been restricted largely to Bilateria. Our understanding of the distribution and evolution of these gene families in cnidarians is scant, which highlights an essential knowledge gap that needs to be explored. This aim will also investigate whether lineage-specific duplications evolve under pronounced signatures of positive selection.

1.6.2 Aim 2/Study 2: A process of convergent amplification and tissue-specific expression dominates the evolution of toxin and toxin-like genes in sea anemones

In this study I have selected to focus on toxin and toxin-like (TTL) genes due to key characteristics that make them a suitable model for the study of novel genes. These include the ubiquitous distribution of toxins among cnidarians, a high level of diversity including many novel toxins in specific cnidarian lineages, and a varied rate of evolution among different classes of toxin genes. Acrorhagi are a specialised attack structure for intraspecific combat found in one family of sea anemones. This structure represents an excellent opportunity to investigate the association between lineage-specific genes and lineage-specific structures. Two independent RNA-seq experiments will be performed to investigate the roles of TTL genes in morphological novelty and ontogenetic development. This aim will investigate whether newly evolved genes experience divergent selection pressures compared with older, more widely distributed genes. I will also investigate whether novel genes have spatiotemporal expression patterns in newly evolved structures.

1.6.3 Aim 3/Study 3: The draft genome of Actinia tenebrosa reveals innovations in Actinioidea

A reference genome will be sequenced, assembled and annotated for Actinia tenebrosa in the third study of this project. This will be the first Actinioidea draft genome; it will provide information on the genomic context of novel genes and help determine the origin of these newly evolved genes. Comparative genomic analysis will be performed using other cnidarian draft genomes to provide insights into the evolution of genes and gene families among Cnidaria. This aim will investigate whether an Actinioidea genome has a high proportion of novel genes, many of which may be related to venom and its delivery.

Chapter 2: Insights into the phylogenetic and molecular evolutionary histories of Fad and Elovl gene families in Actiniaria

Joachim M. Surm1, 2*, Tarik M. Toledo1, 2, Peter J. Prentis3, 4 and Ana Pavasovic1

1School of Biomedical Sciences, Faculty of Health, Queensland University of Technology

2Institute of Health and Biomedical Innovation, Queensland University of Technology

3School of Earth, Environmental and Biological Sciences, Science and Engineering Faculty, Queensland University of Technology

4Institute for Future Environments, Queensland University of Technology

*Correspondence: [email protected];

Surm, J. M., Toledo, T. M., Prentis, P. J., & Pavasovic, A. (2018). Insights into the phylogenetic and molecular evolutionary histories of Fad and Elovl gene families in Actiniaria. Ecology and Evolution, 8(11), 5323–5335. doi.org/10.1002/ece3.4044

Statement of Contribution of Co-Authors for Thesis by Published Paper

The following is the format for the required declaration provided at the start of any thesis chapter which includes a co-authored publication.

The authors listed below have certified that:
1. they meet the criteria for authorship in that they have participated in the conception, execution, or interpretation, of at least that part of the publication in their field of expertise;
2. they take public responsibility for their part of the publication, except for the responsible author who accepts overall responsibility for the publication;
3. there are no other authors of the publication according to these criteria;
4. potential conflicts of interest have been disclosed to (a) granting bodies, (b) the editor or publisher of journals or other publications, and (c) the head of the responsible academic unit, and
5. they agree to the use of the publication in the student’s thesis and its publication on the QUT ePrints database consistent with any limitations set by publisher requirements.

In the case of this chapter: Publication title and date of publication or status: Insights into the phylogenetic and molecular evolutionary histories of Fad and Elovl gene families in Actiniaria

Contributor | Statement of contribution
Joachim M. Surm | Wrote and edited the manuscript, conceived and designed project, performed phylogenetic and selection analysis, read and approved the final manuscript.
Tarik M. Toledo | Wrote and edited the manuscript, performed fatty acid analysis, read and approved the final manuscript.
Peter J. Prentis | Wrote and edited the manuscript, conceived and designed project, read and approved the final manuscript.
Ana Pavasovic | Wrote and edited the manuscript, conceived and designed project, read and approved the final manuscript.

Principal Supervisor Confirmation I have sighted email or other correspondence from all Co-authors confirming their certifying authorship. (If the Co-authors are not able to sign the form please forward their email or other correspondence confirming the certifying authorship to the RSC).

Name Signature Date Professor Louise Hafner

ABSTRACT

The biosynthesis of long-chain polyunsaturated fatty acids (LC-PUFAs, ≥ C20) is reliant on the action of desaturase and elongase enzymes, which are encoded by the fatty acyl desaturase (Fad) and elongation of very long-chain fatty acid (Elovl) gene families, respectively. In Metazoa, research investigating the distribution and evolution of these gene families has been restricted largely to Bilateria. Here, we provide insights into the phylogenetic and molecular evolutionary histories of the Fad and Elovl gene families in Cnidaria, the sister phylum to Bilateria. Four model cnidarian genomes and six actiniarian transcriptomes were interrogated. Analysis of the fatty acid composition of a candidate cnidarian species, Actinia tenebrosa, was performed to determine the baseline profile of this species. Phylogenetic analysis revealed lineage-specific gene duplication in actiniarians for both the Fad and Elovl gene families. Two distinct cnidarian Fad clades clustered with functionally characterized Δ5 and Δ6 proteins from fungal and plant species, respectively. In contrast, only a single cnidarian Elovl clade clustered with functionally characterized Elovl proteins (Elovl4), while two additional clades were identified, one actiniarian-specific (Novel ElovlA) and the other cnidarian-specific (Novel ElovlB). In actiniarians, selection analyses revealed pervasive purifying selection acting on both gene families. However, codons in the Elovl gene family show patterns of nucleotide variation consistent with the action of episodic diversifying selection following gene duplication events. Significantly, these codons may encode amino acid residues that are functionally important for Elovl proteins to target and elongate different precursor fatty acids. In A. tenebrosa, the fatty acid analysis revealed an absence of LC-PUFAs > C20, which implies that the Elovl enzymes are not actively contributing to the elongation of these LC-PUFAs.
Overall, this study has revealed that actiniarians possess Fad and Elovl genes required for the biosynthesis of some LC-PUFAs, and that these genes appear to be distinct from bilaterians.

2.1 INTRODUCTION

The long-chain polyunsaturated fatty acid (LC-PUFA; ≥ C20) biosynthetic pathway converts PUFAs (e.g., C18 molecules) to LC-PUFAs. Omega-3 and omega-6 LC-PUFAs, such as eicosapentaenoic acid (EPA; 20:5n-3) and arachidonic acid (ARA; 20:4n-6), are converted from the PUFA precursors α-linolenic acid (ALA; 18:3n-3) and linoleic acid (LA; 18:2n-6), respectively. This pathway relies on the action of desaturase and elongase enzymes (Sprecher, 2000). The fatty acyl desaturase (Fad) gene family encodes desaturase enzymes, which insert double bonds at different positions of PUFAs. The coordination of multiple functionally different desaturase enzymes is often required to desaturate PUFAs and LC-PUFAs. Desaturase enzymes are required to have a combination of Δ5 and/or Δ6 activity; however, alternative pathways also exist which utilise desaturase enzymes with Δ8 activity (Cook et al., 2002; Monroig et al., 2011a; Sprecher, 2000). Genes that encode elongase enzymes are from the elongation of very long-chain fatty acid (Elovl) gene family. In mammals, seven members of the Elovl gene family have been identified, with different genes encoding elongase enzymes that differ in their affinity for precursor fatty acids. Specifically, elongase enzymes encoded by Elovl1, 3, 6, and 7 are involved in the elongation of saturated fatty acids (SFAs) and monounsaturated fatty acids (MUFAs), whereas Elovl2, 4, and 5 encode enzymes involved in the elongation of PUFAs to LC-PUFAs (Jakobsson et al., 2006; Leonard et al., 2004; Tamura et al., 2009). Despite this research, the distribution and evolution of genes that encode enzymes responsible for the desaturation and elongation of PUFAs remain largely unresolved in many metazoan taxa.
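The shorthand notation used above (e.g., 20:5n-3 for EPA) encodes chain length, number of double bonds and omega family. As an illustration only, the following minimal Python sketch parses that notation and applies the ≥ C20 definition of LC-PUFAs given in the text. The function and class names are hypothetical, and the rule that two or more double bonds define a PUFA is the standard definition rather than something stated explicitly in this chapter.

```python
import re
from typing import NamedTuple, Optional


class FattyAcid(NamedTuple):
    carbons: int            # chain length, e.g. 20 in "20:5n-3"
    double_bonds: int       # number of double bonds, e.g. 5
    omega: Optional[int]    # omega family (3 or 6), None if not given


# Shorthand "carbons:double_bonds" with an optional "n-x" omega suffix.
_SHORTHAND = re.compile(r"^(\d+):(\d+)(?:n-(\d+))?$")


def parse_fatty_acid(shorthand: str) -> FattyAcid:
    """Parse shorthand such as '18:3n-3' or '21:0' (hypothetical helper)."""
    match = _SHORTHAND.match(shorthand.strip())
    if match is None:
        raise ValueError(f"unrecognised fatty acid shorthand: {shorthand!r}")
    carbons, bonds, omega = match.groups()
    return FattyAcid(int(carbons), int(bonds), int(omega) if omega else None)


def classify(fa: FattyAcid) -> str:
    """Classify as SFA, MUFA, PUFA or LC-PUFA (LC-PUFA means >= C20)."""
    if fa.double_bonds == 0:
        return "SFA"
    if fa.double_bonds == 1:
        return "MUFA"
    return "LC-PUFA" if fa.carbons >= 20 else "PUFA"


if __name__ == "__main__":
    for name, code in [("ALA", "18:3n-3"), ("LA", "18:2n-6"),
                       ("EPA", "20:5n-3"), ("ARA", "20:4n-6")]:
        print(name, code, classify(parse_fatty_acid(code)))
```

Under these definitions the precursors ALA and LA are classified as PUFAs, and their products EPA and ARA as LC-PUFAs.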
Whole genome and single gene duplication events have played a major role in the distribution and copy number of Fad and Elovl genes (Carmona-Antoñanzas et al., 2011; Castro et al., 2012; Fonseca-Madrigal et al., 2014; Kabeya et al., 2017; Li et al., 2017, 2010; Mohd-Yusof et al., 2010; Monroig et al., 2017, 2012a, 2016, 2012b; Monroig, Tocher, & Navarro, 2013; Monroig, Webb, Ibarra-Castro, Holt, & Tocher, 2011b; Monroig et al., 2010; Morais et al., 2009; Surm et al., 2015). Gene duplication events in mammals have resulted in multiple gene copies encoding desaturase (Fads1, 2, and 3) enzymes, whereas in other vertebrates (such as Danio rerio) only a single desaturase, with Δ5 and Δ6 activity, is present (Hastings et al., 2001). Similarly, whole genome duplication events have resulted in the diversification of elongase enzymes

observed in vertebrates (Elovl2 and 5) not present in other chordates, which contain a single elongase enzyme referred to as Elovl2/5 (Castro et al., 2016; Monroig et al., 2016). Gene duplication events of Fad and Elovl gene families have also occurred in a lineage-specific manner across other bilaterian taxa, such as molluscs (Monroig et al., 2012b; Surm et al., 2015). Despite these key observations, the distribution and evolution of Fad and Elovl genes remain unresolved in early diverging metazoan phyla such as Cnidaria. Furthermore, due to limited molecular studies investigating the Fad and Elovl gene families in Cnidaria, their phylogenetic and molecular evolutionary histories remain unresolved. Studies investigating the fatty acid profiles of early diverging metazoan taxa have focused on cnidarians that rely on an interaction with symbionts, such as Symbiodinium (Garrett, Schmeitzel, Klein, Hwang, & Schwarz, 2013; Harland, Fixter, Davies, & Anderson, 1991, 1992; Papina, Meziane, & van Woesik, 2003). From this body of work, there is strong evidence to suggest that the symbionts transfer essential LC-PUFAs to the host. This was evident in the fatty acid profiles of sea anemones treated to remove symbionts, which revealed the presence of the LC-PUFAs ARA and EPA but lacked LC-PUFAs > C20 such as docosapentaenoic acid (DPA; 22:5n-3) and docosahexaenoic acid (DHA; 22:6n-3) (Garrett et al., 2013; Harland et al., 1991, 1992; Papina et al., 2003). The fatty acid profile of early diverging metazoan species that lack a symbiotic relationship, however, remains unclear, and further research investigating the ability of these organisms to elongate and desaturate PUFAs to LC-PUFAs is required. Using a comparative genomic approach, this study examined the distribution and copy number of Fad and Elovl genes from four cnidarian genomes (Hydra vulgaris, Acropora digitifera, Nematostella vectensis, and Exaiptasia pallida).
A further fine-scale comparative transcriptomic analysis was also undertaken within order Actiniaria to identify specific candidate genes in this group. Phylogenetic and selection analyses of these data were also performed to elucidate the molecular evolution of the Fad and Elovl gene families in cnidarians. The fatty acid profile of a candidate cnidarian species, Actinia tenebrosa (Figure 1.2), which lacks a symbiotic relationship with Symbiodinium (Black et al., 1979; Muller, Fine, & Ritchie, 2016; Ottaway, 1978), was investigated using fatty acid analysis to address our lack of understanding of the baseline levels of fatty acids in these organisms. Finally, we
examined if the fatty acid composition data were concordant with the Fad and Elovl enzymes found in this species.

2.2 MATERIALS AND METHODS

2.2.1 Identification of candidate genes Fad and Elovl candidate genes were identified by interrogating predicted protein sets from a range of species in phylum Cnidaria. The specific species interrogated were H. vulgaris, A. digitifera, N. vectensis and E. pallida (Cnidaria). Furthermore, these genes involved in the synthesis of omega-3 LC-PUFAs were investigated in six candidate actiniarian species with sequenced transcriptomes. These species include A. tenebrosa (four ecotypes: blue, brown, green and red), Anthopleura buddemeieri, Aulactinia veratra, Calliactis polypus, Telmatactis sp. and Nemanthus annamensis from the NCBI Bioproject: PRJNA313244 (van der Burg, Prentis, Surm, & Pavasovic, 2016). All transcriptomic data were generated from either whole organism or multiple tissue types. Raw reads were retrieved from the Sequence Read Archive (SRA) and converted to FASTQ files. The Trinity software package (v2.0.6) was used to assemble the data after Trimmomatic quality filtering (Bolger, Lohse, & Usadel, 2014; Grabherr et al., 2011). CEGMA was performed to validate the quality and completeness of the transcriptomes (Parra, Bradnam, Ning, Keane, & Korf, 2009). BUSCO (v3) was also performed using a metazoan specific dataset (Simão, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015). TransDecoder version 2.0.1 was used to identify open reading frames (ORF) encoding for proteins to produce a predicted proteome (Haas et al., 2013). CD-HIT (v4.6.4) was then performed to cluster 100% identical proteins for each individual proteome to remove redundancy (Fu, Niu, Zhu, Wu, & Li, 2012). Protein sequences generated from both genomic and transcriptomic datasets were then used to identify candidate genes. BLASTP (e value < 1e-05) was performed using the nonredundant translated ORFs as queries against the Swiss-Prot database. 
Potential Fad and Elovl candidates were identified as those with a top BLAST hit to a functionally characterised protein from the Swiss-Prot database (e value < 1e-05). Functionally characterised Fad and Elovl proteins were identified in the Swiss-Prot database by the presence of the essential Pfam domains. For Fads, this required the Pfam domains Cyt-b5 (PF00173) and FA_desaturase (PF00487); for Elovls, this required ELO (PF01151). The respective candidates and functionally characterised Fad and Elovl proteins were aligned using MUSCLE in MEGA 7 (Kumar, Stecher, & Tamura, 2016). Sequences were retained only if they contained essential structural characteristics. These included an N-terminal cytochrome b5-like binding domain (cyt-b5; PF00173),
three histidine boxes (HXXXH, HXXXHH, and QXXHH) located in a fatty acid desaturase domain (FA_desaturase; PF00487), and a haem binding motif (HPGG) in Fads. Furthermore, functionally characterised sphingolipid desaturases that also contain these essential structural characteristics were removed from the alignments. The structural characteristics of Elovls included a diagnostic histidine box motif (HXXHH) and a Pfam ELO domain (PF01151). These structural characteristics are essential for desaturation and elongation, and therefore transcripts not containing these domains were not considered. Candidate genes from actiniarian transcriptomes were checked for symbiont contamination using PSyTranS (https://github.com/sylvainforet/psytrans). The symbiont proteomes from Symbiodinium microadriaticum, Symbiodinium kawagutii and Symbiodinium minutum were used as a training dataset to identify potential contamination, while the host proteome used for training was N. vectensis.
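The motif-based filtering described above can be illustrated with a short Python sketch. It is not the procedure used in this study, in which sequences were inspected in MUSCLE alignments; it only shows how the listed motifs translate into pattern searches. The function name and toy sequences are hypothetical, and the assumption that the Fad motifs occur in the order HPGG, HXXXH, HXXXHH, QXXHH from the N- to the C-terminus is made here for illustration. Pfam domain membership would need a separate domain search and is not checked.

```python
import re

# Motifs as listed in the text; "X" (any residue) becomes "." in the regex.
# Assumption: motifs are searched in this order along the protein.
MOTIFS = {
    "Fad": ["HPGG", "H.{3}H", "H.{3}HH", "Q.{2}HH"],
    "Elovl": ["H.{2}HH"],
}


def has_required_motifs(protein: str, family: str) -> bool:
    """Return True if every motif for the family occurs, in order and
    without overlap, in the protein sequence (hypothetical helper)."""
    position = 0
    for pattern in MOTIFS[family]:
        match = re.compile(pattern).search(protein, position)
        if match is None:
            return False
        position = match.end()  # the next motif must start after this one
    return True


# Toy sequences invented for illustration; they are not real proteins.
TOY_FAD = "MAAHPGGKKKHDAGHLLLLHFQQHHAAAQIEHHLF"
TOY_FAD_TRUNCATED = "MAAHPGGKKKHDAGHLLLLHFQQHHAAA"  # lacks the QXXHH box
TOY_ELOVL = "MKLLHVFHHSSI"

if __name__ == "__main__":
    print(has_required_motifs(TOY_FAD, "Fad"))
    print(has_required_motifs(TOY_FAD_TRUNCATED, "Fad"))
    print(has_required_motifs(TOY_ELOVL, "Elovl"))
```

Searching each motif from the end of the previous match prevents a single HXXXHH stretch from being counted as both the first and second histidine box.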

2.2.2 Phylogenetic analyses

The refined list of full-length translated ORFs was used for phylogenetic analyses to determine the distribution of Fad and Elovl proteins within and across Metazoa. Protein sequences were aligned using MUSCLE in MEGA 7 (Kumar, Stecher, & Tamura, 2016), followed by manual curation to remove sequences that lacked conserved residues and motifs. Protein alignments were imported into IQ-TREE (v1.4.2) (Nguyen, Schmidt, von Haeseler, & Minh, 2015) to determine the best-fit model of protein evolution. Using the Bayesian information criterion, an LG+I+G4 model was selected for both Fad and Elovl as the best-fit model of protein evolution. Phylogenetic trees were generated in IQ-TREE (v1.4.2) (Nguyen et al., 2015) from the alignments using 1,000 ultrafast bootstrap iterations. The Fad tree was visualised using Figtree (v1.4.3) (http://tree.bio.ed.ac.uk/software/figtree/) and the Elovl tree was visualised using Interactive Tree Of Life (v3) (Letunic & Bork, 2016).
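Model selection by the Bayesian information criterion can be sketched as follows. This is an illustration of the criterion itself, BIC = k ln(n) − 2 ln(L), where k is the number of free parameters, n the number of alignment sites and L the maximised likelihood; it does not reproduce IQ-TREE's implementation. The log-likelihoods, parameter counts and alignment length below are hypothetical values chosen for the example, and parameters shared by all candidate models (such as branch lengths) are omitted because they add the same constant to every score.

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion; lower values indicate a better fit."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(candidates: Dict[str, Tuple[float, int]], n_sites: int) -> str:
    """Return the name of the candidate model with the lowest BIC.

    candidates maps a model name to (log-likelihood, free parameters).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_sites))


# Hypothetical values for illustration only, not results from this study.
CANDIDATES = {
    "LG": (-10500.0, 0),        # no among-site rate heterogeneity
    "LG+G4": (-10320.0, 1),     # gamma shape parameter
    "LG+I+G4": (-10300.0, 2),   # gamma shape + proportion of invariant sites
}

if __name__ == "__main__":
    for name, (lnl, k) in CANDIDATES.items():
        print(name, round(bic(lnl, k, n_sites=400), 2))
    print("selected:", best_model(CANDIDATES, n_sites=400))
```

With these invented numbers the extra parameters of LG+I+G4 are justified because the gain in log-likelihood outweighs the ln(n) penalty per parameter.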

2.2.3 Selection analyses

Sequences that encode full-length protein sequences for both Fad and Elovl proteins generated from actiniarian transcriptomes were investigated to detect the action of pervasive diversifying selection. These codon sequences for the respective Fad and Elovl gene families were aligned using MUSCLE within MEGA 7 (Kumar et al., 2016). Codon alignments were imported into IQ-TREE (v1.4.2) (Nguyen et al., 2015) to determine the best-fit substitution model (GTR+I+G4), and Maximum Likelihood phylogenetic trees were generated from the alignments using 1,000 ultrafast bootstrap iterations. Using these alignments and phylogenetic trees as inputs, the rates of selection were determined using Maximum Likelihood models in the program CODEML in PAML (v4.8) (Yang, 2007), following the protocol of Fang et al. (2009) and using the codon frequency model F3X4. To accurately determine significance, a Bonferroni correction was applied to account for the repeated testing of multiple branches (Fletcher & Yang, 2010; Hunt et al., 2011), where Fad had n = 2 branches and Elovl had n = 4 branches, and the adjusted p-value threshold was 0.05/n. To detect pervasive purifying and diversifying selection, Fast Unconstrained Bayesian AppRoximation (FUBAR) (Murrell et al., 2013) from the HyPhy package (Pond, Frost, & Muse, 2005) was used.
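The Bonferroni adjustment described above can be worked through in a few lines of Python. The thresholds follow directly from the text (0.05/n with n = 2 for Fad and n = 4 for Elovl). The likelihood ratio test shown alongside is an assumption added for illustration: comparing nested CODEML models against a chi-square distribution with one degree of freedom is a common convention, but the exact test statistics used in this study are not stated in this section, and the log-likelihood values are hypothetical.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def lrt_p_value(lnl_null: float, lnl_alt: float) -> float:
    """P-value of a likelihood ratio test with 1 degree of freedom.

    Assumption for illustration: the statistic 2 * (lnL_alt - lnL_null)
    follows a chi-square distribution with df = 1, whose survival
    function is erfc(sqrt(x / 2)).
    """
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return math.erfc(math.sqrt(statistic / 2.0))


# Branch counts taken from the text.
BRANCHES_TESTED = {"Fad": 2, "Elovl": 4}

if __name__ == "__main__":
    # Hypothetical log-likelihoods, not results from this study.
    p = lrt_p_value(lnl_null=-1003.0, lnl_alt=-1000.0)
    for family, n in BRANCHES_TESTED.items():
        threshold = bonferroni_threshold(0.05, n)
        print(family, threshold, p < threshold)
```

The example shows why the correction matters: the same hypothetical p-value (about 0.014) would pass the Fad threshold of 0.025 but not the stricter Elovl threshold of 0.0125.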

2.2.4 Fatty acid analysis Fatty acid analysis was performed to investigate the baseline fatty acid levels in a candidate cnidarian (A. tenebrosa) lacking symbionts. Individuals were placed in isolated holding tank containing only artificial sea water and not fed for three days. The fatty acid levels of banana prawn, the primary feed provided to A. tenebrosa, were also investigated to provide a comparison with the fatty acid profile of A. tenebrosa prior to starvation. Analysis was performed using three different individuals, with three technical replicates for each individual. An analysis without internal standard was performed to validate the absence of the internal standard (21:0) in the samples. Lipid extraction was performed using a modification of the method of Matyash et al. (Matyash, Liebisch, Kurzchalia, Shevchenko, & Schwudke, 2008). In brief, 10 mg of the sample was homogenized in liquid nitrogen (LN2), immediately an aliquot of 300 µL methanol (cold) containing 0.01 % butylated hydroxytoluene and internal standard (21:0, heneicosanoic acid, 1 mmol/L) (Chem Service INC, West Chester, PA, USA) was added and mixed by vortexing. A 1,000 µL aliquot of methyl-tert butyl ether was added and samples were rotated at room temperature for 1 hr. A total of 250 µL of 0.15 mol/Lammonium acetate was added to induce phase separation. Tubes were centrifuged at 2,000 rcf for 5 min to complete phase separation and 50 µL of the upper organic layer was removed to a new 2 mL glass vial and stored at -20 °C until analysis, following a similar method utilized by Tran et al. (Tran et al., 2014). For the

30 Chapter 2: Insights into the phylogenetic and molecular evolutionary histories of Fad and Elovl gene families in Actiniaria

preparation of fatty acid methyl esters (FAMEs), 10 µL of derivatizing reagent (trimethylsulfonium hydroxide solution, ~0.25 mol/L) (Sigma-Aldrich, Castle Hill, NSW, Australia) was added. The solution was mixed for 30 s and allowed to react for 30 min, following the method proposed by Gómez-Brandón et al. (Gómez-Brandón, Lores, & Domínguez, 2010). The FAME extracts were analysed using a gas chromatograph coupled to a mass spectrometer (GCMS-TQ8040) (Shimadzu, Kyoto, Japan) with RTX-2330 capillary columns (Restek, Bellefonte, PA, USA; 60 m x 0.25 mm, film thickness 0.20 µm) and electron ionisation set at 70 eV. Conditions for the analysis of FAMEs were as follows: carrier gas, He at 2.6 mL/min; split ratio, 22:1; injection volume, 1 µL; injector temperature, 220 °C; thermal gradient, 150-170 °C at 10 °C/min, then 170-200 °C at 2 °C/min, then 200-211 °C at 1.3 °C/min, with the final temperature held for 5 min. The ion source of the mass spectrometer was set at 250 °C. The data were acquired in Q3 scan mode from m/z 50 to 650. For data collection, the MS spectra were recorded from 4 to 30.5 min. All data were processed using GCMS Postrun Analysis software (Shimadzu, Kyoto, Japan). FAME identification was based on an internal spectral library, and a series of FAME standards (20-component FAME mix) (Restek, Bellefonte, PA, USA) was used to identify retention times (Rt) of specific m/z profiles associated with known FAs. The data processing included smoothing, peak detection, integration, peak alignment, normalization, and identification. Extraction and solvent blanks were included in the analysis to allow exclusion of ions detected at lipid masses that result from extraction chemical or solvent impurities. Quantification was achieved by comparison of the peak area of individual lipids to that of the internal standard.
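As a consistency check on the GC conditions above, the following minimal sketch adds up the duration of the oven programme and illustrates the internal-standard arithmetic behind the quantification step. It is illustrative only: the function names are hypothetical, a unit response factor relative to 21:0 is assumed, and the figure of 300 nmol of internal standard per sample assumes that the stated 1 mmol/L refers to the concentration in the 300 µL of methanol added. The peak areas are invented for the example.

```python
def oven_runtime_min(ramps, final_hold_min):
    """Total run time (min) for a list of (start_C, end_C, rate_C_per_min) ramps."""
    ramp_time = sum((end - start) / rate for start, end, rate in ramps)
    return ramp_time + final_hold_min


def fame_mmol_per_kg(peak_area, istd_area, istd_nmol, sample_mg):
    """Analyte content estimated relative to the internal standard.

    Assumes a response factor of 1 (an assumption, not stated in the text).
    nmol per mg is numerically equal to mmol per kg.
    """
    return (peak_area / istd_area) * istd_nmol / sample_mg


# Oven programme as described in the text.
RAMPS = [(150, 170, 10.0), (170, 200, 2.0), (200, 211, 1.3)]
total_min = oven_runtime_min(RAMPS, final_hold_min=5.0)
print(f"Oven programme: {total_min:.2f} min")

# Hypothetical peak areas; 300 uL x 1 mmol/L = 300 nmol of 21:0 (assumed).
example = fame_mmol_per_kg(peak_area=5.0e5, istd_area=1.0e6,
                           istd_nmol=300.0, sample_mg=10.0)
print(f"Example analyte: {example:.1f} mmol/kg")
```

The programme sums to roughly 30.5 min, which agrees with the acquisition window of 4 to 30.5 min reported above.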

2.3 RESULTS

2.3.1 Identification of candidate genes

The distribution and copy number of Fad and Elovl genes found in cnidarian species with sequenced genomes are shown in Table 2.1. Within phylum Cnidaria, Fad gene copy number ranges from zero to three. Genes encoding full-length Elovl proteins were identified in all taxa, with copy number ranging from one to four. More Fad and Elovl genes were observed in the actiniarian E. pallida than in any of the other cnidarian species.


Table 2.1. Fad and Elovl gene copy numbers in cnidarian taxa with sequenced genomes

Organism                  Fad   Elovl
Hydra vulgaris            1     1
Acropora digitifera       0     3
Nematostella vectensis    1     1
Exaiptasia pallida        3     4

The distribution of Fad and Elovl genes was further investigated in the transcriptome assemblies of six actiniarian taxa from two superfamilies: Actinioidea (A. tenebrosa, A. buddemeieri, and A. veratra) and Metridioidea (C. polypus, Telmatactis sp., and N. annamensis). The N50 (minimum contig length needed to cover 50 % of the cumulative sum of contig lengths) is > 800 bp for all transcriptomes, and completeness is > 90 % for both CEGMA and BUSCO, with the exception of Telmatactis sp., which has CEGMA and BUSCO completeness of 77 % and 83.4 %, respectively (Supplementary Tables 1 and 2). Fad and Elovl gene copies were identified in all transcriptomes (Table 2.2). Three of the four A. tenebrosa individuals encode two full-length Fad proteins, whereas the brown individual encodes a single full-length Fad protein. Two full-length Fad proteins are also encoded by N. annamensis and A. buddemeieri, whereas C. polypus, Telmatactis sp., and A. veratra encode a single full-length Fad protein. Multiple gene copies encoding full-length Elovl proteins were also observed in all actiniarian species. All A. tenebrosa individuals encode five full-length Elovl proteins, with the exception of the brown individual, which encodes four. In the remaining Actinioidea species, A. buddemeieri and A. veratra encode four and five full-length Elovl proteins, respectively. Metridioidea transcriptomes for C. polypus, Telmatactis sp., and N. annamensis encode five, two, and four full-length Elovl proteins, respectively.
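The N50 statistic used above can be computed directly from a list of contig lengths. The sketch below is a minimal illustration of the definition given in the text; the function name and the toy contig lengths are hypothetical and are not taken from the assemblies analysed here.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    # Walk from the longest contig down until half the total length is covered.
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (bp), for illustration only.
toy_assembly = [100, 200, 300, 400, 500]
print(n50(toy_assembly))
```

For the toy assembly the total length is 1,500 bp, and the two longest contigs (500 and 400 bp) already cover more than half of it, so the N50 is 400 bp.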


Table 2.2. Fad and Elovl gene copy numbers in actiniarian transcriptome assemblies

Superfamily     Organism                     Fad   Elovl
Actinioidea     Actinia tenebrosa (blue)     2     5
Actinioidea     A. tenebrosa (brown)         1     4
Actinioidea     A. tenebrosa (green)         2     5
Actinioidea     A. tenebrosa (red)           2     5
Actinioidea     Anthopleura buddemeieri      2     4
Actinioidea     Aulactinia veratra           1     5
Metridioidea    Calliactis polypus           1     5
Metridioidea    Telmatactis sp.              1     2
Metridioidea    Nemanthus annamensis         2     4

2.3.2 Comparative and phylogenetic analyses of Fad and Elovl gene families

Using a phylogenetic framework, we investigated the distribution of Fad genes across Metazoa (Figure 2.1). A Maximum Likelihood tree revealed three distinct clades, which we name A, B, and C. Clades A and B were sister to each other, with clade C the most divergent. All bilaterian Fad proteins are found in clade B. Sequences within clades A and C are from phylum Cnidaria as well as non-metazoan taxa, such as fungi and plant species. Specifically, functionally characterised plant Fad proteins (green branches) are found in clade A, while functionally characterised fungal and amoebozoan Fad proteins (blue) are present in clade C.


Figure 2.1. Maximum Likelihood tree with midpoint root depicting relationships among Fad protein sequences, with branches transformed as a cladogram. Bootstrap values after 1,000 iterations are shown next to nodes; values under 70 % are not reported. Three distinct clades are named Clades A, B, and C. The branches of functionally characterised Fad proteins are highlighted by the colour of their respective clade. These functionally characterised sequences were retrieved from the Swiss-Prot database and named according to their UniProt accession with the species name abbreviated. Candidate sequences identified in this study have their full species names.


A Maximum Likelihood phylogeny of the Elovl gene family produced two clades (A and B), both of which contained multiple subclades (Figure 2.2). Broadly, in clade A, four distinct subclades clustered together: non-metazoan eukaryote Elovl, Elovl3, Elovl6, and Elovl 3/6-like. The non-metazoan eukaryote Elovl subclade includes sequences from Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Dictyostelium discoideum. The Elovl3 subclade contains only mammalian taxa and the Elovl6 subclade contains only taxa from phylum Chordata. The Elovl 3/6-like subclade contains functionally characterised Elovl sequences from Caenorhabditis elegans (sp_P49191_ELO3_CAEEL and sp_Q03574_ELO4_CAEEL). A second broad cluster of eight subclades (clade B) can be observed in Figure 2.2 and was annotated as Elovl1, Elovl7, Elovl1/7-like, Elovl4, Elovl5, Elovl2, a subclade that includes sequences from actiniarian taxa that did not cluster with any functionally characterised sequences (Novel ElovlA), and a subclade that includes sequences from cnidarian taxa that did not cluster with any functionally characterised sequences (Novel ElovlB). The Novel ElovlA subclade was sister to all other subclades in clade B, contained no functionally characterised sequences and consisted only of actiniarian taxa. The Elovl1 and Elovl7 subclades were sister to each other and contain sequences from phylum Chordata. The Elovl1/7-like subclade contains functionally characterised Elovl proteins from Aedes aegypti as well as Drosophila melanogaster (Chertemps et al., 2007; Ribeiro et al., 2007). The Novel ElovlB subclade consisted only of protein sequences from phylum Cnidaria and contained no functionally characterised Elovl proteins. This subclade is sister to Elovl1, Elovl7 and Elovl1/7-like. Subclades Elovl2 and Elovl5 were sister to each other, and only sequences from Chordata are found in these two subclades. In the Elovl4 subclade, functionally characterised Elovl4 protein sequences from phylum Chordata and sequences from phylum Cnidaria clustered together.


Figure 2.2. Maximum Likelihood tree with midpoint root depicting relationships among Elovl protein sequences and branches transformed as a cladogram. Bootstrap values after 1,000 iterations are shown next to nodes, values under 70 % not reported. Twelve distinct clades are annotated based on the functionally characterised proteins found within them. The branches of functionally characterised proteins are highlighted by the colour of their respective clade. These functionally characterised sequences were retrieved from the Swiss-prot database and named according to their Uniprot accession with the species name abbreviated. Candidate sequences identified in this study have their full species names.


2.3.3 Selection analysis of the Fad and Elovl gene families

To investigate the selection pressures acting on the Fad and Elovl gene families, multiple analyses were performed. Results from the site-model selection analysis reveal that both gene families show patterns of nucleotide variation consistent with the action of pervasive purifying selection (Table 2.3). The weighted average dN/dS ratio is far below 1 under every model for both gene families (Fad: 0.081 to 0.114; Elovl: 0.080 to 0.088). Using a chi-square significance test, the null models M1 and M7 could not be rejected against models M2 and M8, respectively. The null model M0, however, could be rejected against the M3 model, and therefore the assumption that all codons show the same patterns of nucleotide variation could be rejected. To further examine whether specific codons within the gene families are under pervasive purifying selection or pervasive diversifying selection, FUBAR was used within the HyPhy package (Supplementary Table 3). These results confirmed that no codons are under diversifying selection in either gene family; however, codons under pervasive purifying selection were identified. In the Fad and Elovl gene families, 282 of 419 codons and 197 of 223 codons, respectively, show patterns of nucleotide variation consistent with the action of purifying selection. This indicates that pervasive purifying selection is acting on the majority of the codons in both the Fad and Elovl gene families.
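The chi-square comparisons of nested site models described above are likelihood ratio tests, which can be sketched from the log-likelihoods reported in Table 2.3. The following is a minimal illustration rather than the analysis pipeline itself: the function names are hypothetical, and the degrees of freedom (4 for M0 vs M3, 2 for M1 vs M2 and for M7 vs M8) are the values conventionally used for these CODEML comparisons and are assumed here, since they are not stated in the text.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail chi-square probability for df = 1 or any even df."""
    if stat <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!
        half = stat / 2
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    raise ValueError("only df = 1 or even df are supported in this sketch")


def lrt(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for a pair of nested models."""
    stat = 2 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Log-likelihoods from Table 2.3; df values are conventional ones (assumed).
SITE_TESTS = {
    "Fad":   {"M0": -8130.15, "M1": -8103.01, "M2": -8103.01,
              "M3": -8033.15, "M7": -8041.72, "M8": -8041.61},
    "Elovl": {"M0": -9462.38, "M1": -9452.13, "M2": -9452.13,
              "M3": -9279.86, "M7": -9281.49, "M8": -9281.17},
}
COMPARISONS = [("M0", "M3", 4), ("M1", "M2", 2), ("M7", "M8", 2)]

for family, lnl in SITE_TESTS.items():
    for null, alt, df in COMPARISONS:
        stat, p = lrt(lnl[null], lnl[alt], df)
        verdict = "reject null" if p < 0.05 else "cannot reject null"
        print(f"{family} {null} vs {alt}: 2dlnL = {stat:.2f}, p = {p:.3g} ({verdict})")
```

Under these assumptions, only the M0 vs M3 comparison rejects the null model in each gene family, in line with the results reported above.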


Table 2.3. Detecting pervasive diversifying selection using site models implemented in CODEML for the Fad and Elovl gene families from actiniarian transcriptome assemblies.

Gene family   Model            Likelihood   dN/dS    Parameters   Diversifying selected codons
Fad           M0 (one ratio)   -8130.15     0.0814   17           -
Fad           M1 (neutral)     -8103.01     0.1142   18           -
Fad           M2 (selection)   -8103.01     0.1142   24           NS
Fad           M3 (discrete)    -8033.15     0.0935   21           -
Fad           M7 (beta)        -8041.72     0.0932   18           -
Fad           M8 (beta & ω)    -8041.61     0.0939   20           NS
Elovl         M0 (one ratio)   -9462.38     0.0795   45           -
Elovl         M1 (neutral)     -9452.13     0.0876   46           -
Elovl         M2 (selection)   -9452.13     0.0876   48           NS
Elovl         M3 (discrete)    -9279.86     0.0856   49           -
Elovl         M7 (beta)        -9281.49     0.0845   46           -
Elovl         M8 (beta & ω)    -9281.17     0.0850   48           NS

NS, not significant.

Duplication events have played a major role in the expansion of both gene families in cnidarians, in particular the Elovl gene family, which has undergone repeated rounds of duplication. Episodic diversifying selection following duplication events was tested in both the Fad and Elovl gene families. Maximum Likelihood trees were constructed using the coding sequence (CDS) for both gene families (Supplementary Figure 1). In these trees, two subclades are observed in the Fad gene family and four subclades are observed in the Elovl gene family. The null hypothesis could not be rejected for any subclade in either gene family, with the exception of branch 4 in the Elovl gene family, which had a dN/dS ratio of 0.003 (Table 2.4).


Table 2.4. Detecting lineages under episodic diversifying selection with branch models implemented in CODEML for Fad and Elovl gene families from actiniarian transcriptome assemblies.

Gene family   Branch   H0 Likelihood   H1 Likelihood   p-value       dN/dS
Fad           1        -8130.15        -8128.74        9.31e-02 NS   NS
Fad           2        -8130.15        -8129.28        1.89e-01 NS   NS
Elovl         1        -9462.38        -9462.37        9.26e-01 NS   NS
Elovl         2        -9462.38        -9461.54        1.97e-01 NS   NS
Elovl         3        -9462.38        -9462.01        3.95e-01 NS   NS
Elovl         4        -9462.38        -9458.16        3.66e-03 *    0.003

Values significant at ≤ 0.05 following Bonferroni correction are marked with *. NS, not significant.
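The branch-model tests in Table 2.4 can be approximately reproduced from the tabulated log-likelihoods. The sketch below is illustrative only: the function names are hypothetical, and it assumes that each alternative model adds a single parameter, so that twice the log-likelihood difference is compared against a chi-square distribution with one degree of freedom. The Bonferroni threshold of 0.05/n follows the description in the methods. Because the tabulated log-likelihoods are rounded to two decimal places, recomputed p-values can differ slightly from those in the table.

```python
import math

ALPHA = 0.05

# (H0 lnL, H1 lnL) per foreground branch, taken from Table 2.4.
BRANCH_TESTS = {
    "Fad": {1: (-8130.15, -8128.74), 2: (-8130.15, -8129.28)},
    "Elovl": {1: (-9462.38, -9462.37), 2: (-9462.38, -9461.54),
              3: (-9462.38, -9462.01), 4: (-9462.38, -9458.16)},
}


def branch_lrt_p(lnl_h0, lnl_h1):
    """P-value of the LRT assuming one extra parameter (chi-square, df = 1)."""
    stat = max(0.0, 2 * (lnl_h1 - lnl_h0))
    return math.erfc(math.sqrt(stat / 2))


def significant_branches(tests, alpha=ALPHA):
    """Return {branch: p} for branches passing the Bonferroni threshold alpha / n."""
    threshold = alpha / len(tests)
    p_values = {b: branch_lrt_p(h0, h1) for b, (h0, h1) in tests.items()}
    return {b: p for b, p in p_values.items() if p <= threshold}


for family, tests in BRANCH_TESTS.items():
    hits = significant_branches(tests)
    print(family, f"threshold = {ALPHA / len(tests):.4f}", hits)
```

With these inputs the thresholds are 0.025 for Fad (n = 2) and 0.0125 for Elovl (n = 4), and only Elovl branch 4 passes, consistent with Table 2.4.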

Finally, the branch-site model was implemented to test for codons under episodic diversifying selection following gene duplication (Figure 2.3). The same foreground branches as previously described were tested (Supplementary Figure 1). P-values for the foreground branches were assessed following Bonferroni correction. The foreground branches that remained significant were then used to identify codons with a dN/dS ratio > 1 and a posterior probability ≥ 0.95 using Bayes Empirical Bayes (BEB) analysis. The null hypothesis could not be rejected for any of the foreground branches in the Fad gene family, and therefore no codons appear to be under episodic diversifying selection. Conversely, the null hypothesis could be rejected for all foreground branches tested in the Elovl gene family. Furthermore, BEB analysis identified multiple codons under episodic diversifying selection following duplication events in the Elovl gene family. These codons are shown in Figure 2.3 and in Supplementary Tables 4 and 5. Branch 4, which includes sequences that clustered with functionally characterised Elovl4 proteins (Ohno et al., 2010), had 15 codons under episodic diversifying selection with a posterior probability ≥ 0.95. The remaining branches had between 11 and 14 codons under episodic diversifying selection with a posterior probability ≥ 0.95.


Figure 2.3. Episodic diversifying selection of the Elovl gene family in actiniarians. (a) Maximum Likelihood tree of nucleotide sequences with midpoint root depicting relationships among Elovl genes in actiniarians. Foreground branches tested are numbered and coloured. Foreground branches corresponding to their respective clades in Figure 2.2 are annotated accordingly. (b) A plot of posterior probability of codons with dN/dS > 1 against amino acid residue positions. Codons detected as significantly under diversifying selection (dN/dS > 1) with posterior probability ≥ 0.95 (Bayes Empirical Bayes analysis) are coloured to correspond to their respective foreground branches. The horizontal line marks the significance threshold of posterior probability ≥ 0.95.


2.3.4 Fatty acid analysis

Fatty acid analysis was performed for three biological replicates in triplicate to determine the fatty acid profile of an anemone (A. tenebrosa) and its feed, banana prawn. Fatty acid methyl esters (FAMEs) observed in the whole organism of A. tenebrosa and prawn are shown in Figure 2.4 (Supplementary Table 6). Of the FAMEs analysed in A. tenebrosa, SFAs are dominant (65.75 % of total FAMEs), followed by PUFAs (15.29 % of total FAMEs) and MUFAs (10.37 % of total FAMEs). SFAs are also dominant in prawn (59.98 % of total FAMEs), followed by MUFAs (24.13 % of total FAMEs) and PUFAs (15.89 % of total FAMEs). In both A. tenebrosa and prawn, the SFAs 16:0 and 18:0 are the most abundant components of the total FAME profile (A. tenebrosa: 21.17 % and 18.34 %, respectively; prawn: 29.87 % and 16.42 %, respectively). Four MUFAs are found in A. tenebrosa, and five are found in prawn. In both A. tenebrosa and prawn, the methyl ester 18:1n-9 is found in high concentration, at 4.32 % and 10.27 % of total FAMEs, respectively. Multiple PUFA methyl esters are identified in both A. tenebrosa and prawn. In A. tenebrosa, LA (18:2n-6), ALA (18:3n-3), ARA (20:4n-6), and EPA (20:5n-3) are present; however, DHA (22:6n-3) is absent. The PUFA methyl esters present in prawn include LA, ARA, EPA and DHA. Among the PUFAs, ALA is the most abundant methyl ester in A. tenebrosa, corresponding to 4.22 % of total FAMEs, and ARA is the most abundant in prawn, corresponding to 6.09 % of total FAMEs.
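The SFA, MUFA and PUFA groupings used above follow directly from the shorthand notation, in which the first number gives the carbon chain length and the second the number of double bonds. The sketch below illustrates that mapping for the fatty acids named in the text. The function name is hypothetical, and the long-chain flag uses the ≥ C20 definition of LC-PUFAs adopted in this thesis.

```python
import re

# Shorthand such as "20:4n-6": carbons, double bonds, optional omega position.
_SHORTHAND = re.compile(r"^(\d+):(\d+)(?:n-(\d+))?$")


def classify_fatty_acid(shorthand):
    """Return (class, is_long_chain_pufa) for a fatty acid written as C:D[n-x]."""
    match = _SHORTHAND.match(shorthand.strip())
    if match is None:
        raise ValueError(f"unrecognised fatty acid shorthand: {shorthand!r}")
    carbons, double_bonds = int(match.group(1)), int(match.group(2))
    if double_bonds == 0:
        fa_class = "SFA"
    elif double_bonds == 1:
        fa_class = "MUFA"
    else:
        fa_class = "PUFA"
    # LC-PUFA is taken here as a PUFA with at least 20 carbons.
    is_long_chain = fa_class == "PUFA" and carbons >= 20
    return fa_class, is_long_chain


# Fatty acids named in the text.
for name in ["16:0", "18:0", "18:1n-9", "18:2n-6", "18:3n-3",
             "20:4n-6", "20:5n-3", "22:6n-3"]:
    print(name, classify_fatty_acid(name))
```

Under this scheme LA and ALA are PUFAs but not LC-PUFAs, whereas ARA, EPA and DHA are all flagged as LC-PUFAs.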


Figure 2.4. Average fatty acid profile from whole organisms (n = 3) of anemone and prawn. (a) Bar plot of the average concentration of each FAME, given in mmol/kg, for both anemone and prawn; error bars show standard deviation. (b) Bar plot of the average concentration of each FAME, given as % of total FAME, for both anemone and prawn; error bars show standard deviation.


2.4 DISCUSSION

Research investigating the Fad and Elovl gene families that desaturate and elongate LC-PUFAs has largely been restricted to bilaterian taxa (Carmona-Antoñanzas et al., 2011; Castro et al., 2012; Fonseca-Madrigal et al., 2014; Kabeya et al., 2017; Li et al., 2017, 2010; Mohd-Yusof et al., 2010; Monroig et al., 2017, 2012a, 2016, 2012b, 2013, 2011b, 2010; Morais et al., 2009; Surm et al., 2015). We have investigated the phylogenetic and molecular evolutionary histories of the Fad and Elovl gene families in phylum Cnidaria to provide insights into the evolution and distribution of these gene families in Metazoa outside of bilaterian taxa. We found multiple copies of both Fad and Elovl genes in cnidarian species, and most of these gene copies had no true ortholog in bilaterian taxa. An expansion of both Fad and Elovl genes can be observed in actiniarians compared with other cnidarian species. This expansion is the result of lineage-specific gene duplications in both gene families. This was evident in both the transcriptomic and genomic data, with the exception of N. vectensis. Variation in Fad and Elovl gene copy number was also observed within a species: the brown ecotype of A. tenebrosa had one fewer Fad and one fewer Elovl gene than the other three ecotypes. This may be a genuine case of copy number variation, or it could be an artefact of low expression of these genes in this ecotype (Surm et al., 2015).

Actiniarian Fad proteins clustered with functionally characterised Δ6 and Δ5 desaturases in clades A and C, respectively (Figure 2.1). In clade A, a functionally characterised Δ6 Fad protein with the ability to desaturate PUFAs, from the plant species Borago officinalis, is present (Sayanova et al., 1997). In clade C, a functionally characterised Δ5 Fad from the oleaginous fungus Mortierella alpina was found (Michaelson, Lazarus, Griffiths, Napier, & Stobart, 1998). The fatty acid profile of M. alpina has been found to have high levels of ARA, and also EPA when conditions are optimal, but lacks > C20 PUFAs such as DHA (Knutzon et al., 1998; Michaelson et al., 1998). Furthermore, M. alpina has been shown to encode an additional Δ6 desaturase enzyme; however, this sequence is not present in the Swiss-Prot database and therefore was not included in this analysis (Knutzon et al., 1998; Michaelson et al., 1998).

Several sphingolipid desaturases share the same structural characteristics as Fads. These structural characteristics include an N-terminal cytochrome b5-like


binding domain (cyt-b5; PF00173), which contains a haem-binding motif (HPGG), and a fatty acid desaturase domain (FA_desaturase; PF00487), which contains three histidine boxes (HXXXH, HXXXHH, and QXXHH). To date, these sphingolipid desaturases have been identified in plants, such as Arabidopsis thaliana and Borago officinalis, and fungi, such as Candida albicans and Kluyveromyces lactis (Libisch, Michaelson, Lewis, Shewry, & Napier, 2000; Oura & Kajiwara, 2008; Sperling, Libisch, Zähringer, Napier, & Heinz, 2001; Sperling et al., 1998; Takakuwa, Kinoshita, Oda, & Ohnishi, 2002). Previous phylogenetic studies (Feng et al., 2017; Gostinčar, Turk, & Gunde-Cimerman, 2010; Meesapyodsuk & Qiu, 2012) have shown that the paralogs of the genes encoding Fads and sphingolipid desaturases cluster together, as opposed to their respective orthologs. Sphingolipid desaturases that share the same structural characteristics as Fads have yet to be identified in metazoan taxa. Indeed, if the cnidarian sequences identified here are sphingolipid desaturases, this would be the first report of a metazoan sphingolipid desaturase with the same structural characteristics as Fads and would provide insights into the evolution of Fads and sphingolipid desaturases.

The fatty acid analysis in A. tenebrosa revealed a fatty acid profile similar to that of M. alpina, with the presence of EPA and ARA and an absence of DHA. Although A. tenebrosa was starved prior to fatty acid analysis, it is likely that some fatty acids from the diet are incorporated into its lipid profile. This is evident in the similar lipid profiles of A. tenebrosa and prawn; however, any such dietary inflation of FAME concentrations does not account for the lack of DHA in the fatty acid profile of A. tenebrosa.

In mammals, the Elovl gene family has been comprehensively investigated, revealing repeated rounds of gene duplication that resulted in seven members: Elovl1-7 (Jakobsson et al., 2006; Leonard et al., 2004). Of these seven genes, only the proteins encoded by Elovl2, 4, and 5 play a role in the elongation of PUFAs to LC-PUFAs, with Elovl1, 3, 6, and 7 having roles in elongating other types of fatty acids, such as SFAs and MUFAs. Overall, few orthologous PUFA elongases (Elovl2, 4, and 5) were identified in cnidarians relative to bilaterian taxa. Only bilaterian Elovl4 proteins were found in a clade with actiniarian Elovl proteins. This suggests cnidarians, including actiniarians, lack the diversity of elongases required to biosynthesise LC-PUFAs. Although chordates are considered inefficient at biosynthesising LC-PUFAs, their fatty acid profiles contain LC-PUFAs > C20 (e.g., DHA) (Sprague, Dick, & Tocher, 2016). The presence of LC-PUFAs > C20 in bilaterian taxa and their absence in A.


tenebrosa may be explained by a diversification of the Elovl gene family in some bilaterians, resulting in Elovl2, 4, and 5. This diversification of PUFA elongases is not present in cnidarians, with only Elovl4 identified in most species investigated. It should be noted that the Elovl4 protein has been shown to elongate LC-PUFAs > C20 (Castro et al., 2016; Monroig et al., 2016, 2013), and therefore functional characterisation of the Elovl4 protein in cnidarians is required.

In A. tenebrosa, our fatty acid analysis revealed a high proportion of SFAs compared with MUFAs and PUFAs. The higher levels of SFAs, especially the presence of 20:0, 22:0, and 24:0, suggest a capacity of A. tenebrosa to elongate SFAs from 16:0 and 18:0. In bilaterians, the elongases capable of this SFA elongation are Elovl1, 3, 6 and 7. Furthermore, non-metazoan eukaryote taxa and early diverging metazoans have somewhat similar fatty acid profiles, with the presence of C20 LC-PUFAs (ARA and EPA) and an absence of LC-PUFAs > C20. The elongase capabilities of the sequences found within both novel subclades (Novel ElovlA and Novel ElovlB) are unknown, as no functionally characterised sequences clustered with these subclades. Currently the Novel ElovlA and Novel ElovlB subclades appear to be actiniarian-specific and cnidarian-specific, respectively. However, including more taxa from other phyla, such as Porifera, Ctenophora and Placozoa, would be essential in discerning whether the genes from these subclades are lineage-specific. The lack of DPA and DHA in the fatty acid profile of A. tenebrosa, however, suggests that the proteins encoded by genes that cluster within the Novel ElovlA and Novel ElovlB subclades are unlikely to function as elongases capable of elongating LC-PUFAs > C20.

Selection analyses of actiniarian Fad and Elovl gene families revealed significant evolutionary constraint in their CDS, despite repeated rounds of duplication. The dN/dS ratio under the one-ratio model was < 0.1 for both gene families (Table 2.3), indicating patterns of nucleotide variation consistent with the action of purifying selection. The combination of CODEML and FUBAR identified no codons under pervasive diversifying selection; however, FUBAR revealed that the majority of codons are under pervasive purifying selection. A total of 67 % (282/419) and 88 % (197/223) of all codons are under pervasive purifying selection for the Fad and Elovl gene families, respectively. While both gene families were found to be under pervasive purifying selection, some codons were observed to be under episodic diversifying selection in specific clades.


The branch-site model implemented in CODEML identified no evidence of episodic diversifying selection for the Fad gene family, whereas strong evidence of episodic diversifying selection was observed following each duplication event for the Elovl gene family. Codons in the Elovl gene family that were identified to be under episodic diversifying selection may be responsible for targeting different fatty acids, such as SFAs, MUFAs and PUFAs. Codons were identified to be under episodic diversifying selection on all four branches tested (Figure 2.3). From Figure 2.2, the Elovl genes that clustered within the Novel ElovlA and Novel ElovlB subclades appear to be actiniarian-specific and cnidarian-specific, respectively. In particular, 15 codons were identified to be under episodic diversifying selection on branch 4 of Figure 2.3, which corresponds to actiniarian Elovl genes orthologous to bilaterian Elovl4 (Figure 2.2). As Elovl4 proteins are responsible for elongating PUFAs, these codons may have a role in the targeting and elongation of PUFAs. A study of the genus Drosophila revealed that the Fad gene family in this group is under the influence of pervasive purifying selection but also episodic diversifying selection at specific codons (Fang et al., 2009). That study also found that the majority of the codons under episodic diversifying selection occurred in clades produced by duplication events. The authors suggest that the amino acid residues under positive selection may be responsible for altered substrate selectivity.

To date, research investigating the ability of metazoan taxa to biosynthesise LC-PUFAs has been restricted to bilaterian taxa. Here we provide the first comprehensive analysis of Fad and Elovl gene families across phylum Cnidaria. Our analysis revealed that lineage-specific gene duplication has played a major role in the distribution and diversification of both the Fad and Elovl gene families in actiniarians. Investigation of their molecular evolutionary histories revealed pervasive purifying selection for both gene families in actiniarians. However, in the Elovl gene family, codons were identified to be under episodic diversifying selection following gene duplication. The amino acids encoded by these codons may be functionally important for targeting and elongating different fatty acids, such as SFAs, MUFAs and PUFAs. The fatty acid composition data imply that Elovl enzymes found in A. tenebrosa are not actively contributing to FAs longer than 20 carbons, but this interpretation must be viewed with caution until it is confirmed by functional validation. Overall, this study has revealed that actiniarian


species possess the Fad and Elovl genes required for the biosynthesis of some LC-PUFAs, and these genes appear to share greater similarity with those of non-metazoan eukaryotes.
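The conserved motifs discussed above for Fads and structurally similar sphingolipid desaturases (the HPGG haem-binding motif and the HXXXH, HXXXHH and QXXHH histidine boxes) can be located in a candidate protein with a simple pattern scan. The sketch below is a minimal illustration only: the function name is hypothetical, the toy sequence is invented and is not a real desaturase, and a motif scan of this kind does not replace the domain-level annotation (PF00173, PF00487) referred to in the text.

```python
import re

# Motifs named in the text, with X written as "any residue".
MOTIFS = {
    "HPGG": r"HPGG",
    "HXXXH": r"H.{3}H",
    "HXXXHH": r"H.{3}HH",
    "QXXHH": r"Q.{2}HH",
}


def find_motifs(protein):
    """Return {motif: [0-based start positions]}, allowing overlapping hits."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead keeps overlapping matches that a plain search would skip.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [m.start() for m in regex.finditer(protein.upper())]
    return hits


# Hypothetical toy sequence, not a real desaturase.
toy = "MKHPGGAAAHDSGHLLHFQAHHTTQIEHHK"
print(find_motifs(toy))
```

Because HXXXHH contains HXXXH, any hit for the longer box is also reported for the shorter one; a fuller implementation would need to resolve such overlaps and check the order of motifs along the sequence.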

Acknowledgments: The authors would like to thank the Evolutionary and Physiological Genomics Lab (ePGL), in particular Chloé A. van der Burg, for their continual help and support. The authors would also like to thank the QUT Marine group for their help and advice in caring for the animals. The authors would like to acknowledge the QUT Molecular Genetics Research Facility for the use of their facilities. The data reported in this article were generated at the Central Analytical Research Facility operated by the Institute for Future Environments. Computational resources and services used in this work were provided by the High Performance Computing and Research Support Group, Queensland University of Technology, Brisbane, Australia. We acknowledge the Brazilian National Council for Scientific and Technological Development (CNPq-Brazil) for providing a scholarship to Tarik M. Toledo to study at QUT and contribute to this project.

Author contributions: AP, JS, TT and PP conceived and designed the project. JS performed phylogenetic and selection analyses. TT performed fatty acid analysis. AP, JS, TT and PP wrote and edited the manuscript. All authors read and approved the final manuscript.


Chapter 3: A process of convergent amplification and tissue-specific expression dominates the evolution of toxin and toxin-like genes in sea anemones

Joachim M. Surm1,2*, Hayden L. Smith3,4, Bruno Madio5, Eivind A. B. Undheim6, Glenn F. King5, Brett R. Hamilton6,7, Chloé A. van der Burg1,2, Ana Pavasovic1, and Peter J. Prentis3,4

1School of Biomedical Sciences, Faculty of Health, Queensland University of Technology
2Institute of Health and Biomedical Innovation, Queensland University of Technology
3School of Earth, Environmental and Biological Sciences, Science and Engineering Faculty, Queensland University of Technology
4Institute for Future Environments, Queensland University of Technology
5Institute for Molecular Bioscience, University of Queensland
6Centre for Advanced Imaging, University of Queensland
7Centre for Microscopy and Microanalysis, University of Queensland
*Correspondence: [email protected];

Surm, J. M., Smith, H. L., Madio, B., Undheim, E. A. B., King, G. F., Hamilton, B. R., … Prentis, P. J. (2019). A process of convergent amplification and tissue-specific expression dominates the evolution of toxin and toxin-like genes in sea anemones. Molecular Ecology, 0. https://doi.org/10.1111/mec.15084


Statement of Contribution of Co-Authors for Thesis by Published Paper

The following is the format for the required declaration provided at the start of any thesis chapter which includes a co-authored publication.

The authors listed below have certified that:
1. they meet the criteria for authorship in that they have participated in the conception, execution, or interpretation, of at least that part of the publication in their field of expertise;
2. they take public responsibility for their part of the publication, except for the responsible author who accepts overall responsibility for the publication;
3. there are no other authors of the publication according to these criteria;
4. potential conflicts of interest have been disclosed to (a) granting bodies, (b) the editor or publisher of journals or other publications, and (c) the head of the responsible academic unit; and
5. they agree to the use of the publication in the student’s thesis and its publication on the QUT ePrints database consistent with any limitations set by publisher requirements.

In the case of this chapter: Publication title and date of publication or status: A process of convergent amplification and tissue-restricted expression dominate the evolution of toxin and toxin-like genes in sea anemones

Contributor: Statement of contribution

Joachim M. Surm: Wrote and edited the manuscript, conceived experimental design, performed selection and phylogenetic analysis, sequence validation, collected organism samples, read and approved the final manuscript.

Hayden L. Smith: Wrote and edited the manuscript, performed sequence validation, collected organism samples, read and approved the final manuscript.

Bruno Madio: Wrote and edited the manuscript, performed mass spectrometry, venom extraction, HPLC, MALDI-TOF, LC-MS/MS, IMS, read and approved the final manuscript.

Eivind A. B. Undheim: Wrote and edited the manuscript, performed mass spectrometry, venom extraction, HPLC, MALDI-TOF, LC-MS/MS, IMS, read and approved the final manuscript.

Glenn F. King: Wrote and edited the manuscript, performed mass spectrometry, venom extraction, HPLC, MALDI-TOF, LC-MS/MS, IMS, read and approved the final manuscript.

Brett R. Hamilton: Wrote and edited the manuscript, performed mass spectrometry, venom extraction, HPLC, MALDI-TOF, LC-MS/MS, IMS, read and approved the final manuscript.

Chloé A. van der Burg: Wrote and edited the manuscript, collected organism samples, read and approved the final manuscript.

Ana Pavasovic: Wrote and edited the manuscript, conceived experimental design, collected organism samples, read and approved the final manuscript.

Peter J. Prentis: Wrote and edited the manuscript, conceived experimental design, performed selection and phylogenetic analysis, collected organism samples, read and approved the final manuscript.

Principal Supervisor Confirmation I have sighted email or other correspondence from all Co-authors confirming their certifying authorship. (If the Co-authors are not able to sign the form please forward their email or other correspondence confirming the certifying authorship to the RSC).

Name Signature Date Professor Louise Hafner


ABSTRACT

Members of phylum Cnidaria are an ancient group of venomous animals and rely on a number of specialised tissues to produce toxins in order to fulfil a range of ecological roles including prey capture, defence against predators, digestion, and aggressive encounters. However, few comprehensive analyses of the evolution and expression of toxin genes currently exist for cnidarian species. In this study, we use genomic and transcriptomic sequencing data to examine gene copy number variation and selective pressure on toxin gene families in phylum Cnidaria. Additionally, we use quantitative RNA-seq and mass spectrometry imaging to understand expression patterns and tissue localisation of toxin production in sea anemones. Using genomic data, we demonstrate that the first large-scale expansion and diversification of known toxin genes occurs in phylum Cnidaria, a process we also observe in other venomous lineages, which we refer to as convergent amplification. Our analyses of selective pressure on sea anemone toxin gene families reveal that purifying selection is the dominant mode of evolution for these genes and that phylogenetic inertia is an important determinant of toxin gene complement in this group. The gene expression and tissue localisation data revealed that specific genes and proteins from toxin gene families show strong patterns of tissue and developmental-phase specificity in sea anemones. Overall, convergent amplification and phylogenetic inertia have strongly influenced the distribution and evolution of the toxin complement observed in sea anemones, while the production of venoms with different compositions across tissues is related to the functional and ecological roles undertaken by each tissue type.


3.1 INTRODUCTION

Venomous animals rely on their toxins for a range of ecological processes, including prey capture, defence against predators, and intra- and interspecific aggression (Casewell, Wüster, Vonk, Harrison, & Fry, 2013; Fry et al., 2009). Toxins are primarily gene-encoded peptides and proteins that evolved from ancestral “housekeeping” molecules that perform functions unrelated to venom production in the body (Casewell et al., 2013). Venomous taxa have evolved multiple times during metazoan evolution, and the genes that encode peptide and protein toxins are often considered to evolve rapidly under positive Darwinian selection, enhanced by a genetic redundancy generated through gene duplication events (Casewell et al., 2013; Fry et al., 2009; Sunagar & Moran, 2015). New evidence suggests that the evolution of toxin and toxin-like (TTL) genes is dominated by purifying selection in ancient venomous lineages such as cnidarians, coleoids, and arthropods (Jouiaei et al., 2015a; Pineda et al., 2014; Ruder et al., 2013; Sunagar et al., 2015, 2013; Undheim et al., 2014a, 2014b). This observation, however, does not account for gene age within these taxa. Consequently, this calls for a comprehensive analysis of selective pressures on widespread gene families (i.e., those shared in venomous lineages across a broad taxonomic distribution) versus those that are lineage-specific (i.e., gene families restricted to a particular phylum or order) to better understand venom evolution in ancient lineages. Importantly, a lack of positive selection on TTL genes indicates that other evolutionary processes may play a key role in venom evolution in ancient taxa. Cnidarians are the oldest venomous metazoan lineage (Erwin et al., 2011; Menon, McIlroy, & Brasier, 2013; Park et al., 2012) and they are defined by their envenomation system, which consists of specialised cells called cnidocytes (Fautin, 2009; Fautin & Mariscal, 1991; Kass-Simon & Scappaticci, 2002). 
Cnidocytes are distributed throughout the body, but they vary in density and morphology across tissues (Beckmann & Özbek, 2012; David et al., 2008; Fautin, 2009; Fautin et al., 1991; Özbek, 2010). This envenomation system is unique, and it allows cnidarians to produce toxins across multiple tissues (Macrander, Broe, & Daly, 2016a), whereas most venomous lineages show restricted expression of toxin genes within one or more isolated glands (Dutertre et al., 2014; Fingerhut et al., 2018; Gao et al., 2018; Modica, Lombardo, Franchini, & Oliverio, 2015; Undheim et al., 2015; Walker et al., 2018). In


sea anemones, three tissue types have become highly specialised for different ecological roles associated with venom delivery: acrorhagi are inflatable aggressive organs used in intraspecific aggressive encounters, tentacles are used in prey capture and defence, and mesenteric filaments are multifunctional morphological structures used principally in digestion and killing of prey (Fautin et al., 1991; Kass-Simon et al., 2002; Macrander, Brugler, & Daly, 2015; Prentis, Pavasovic, & Norton, 2018). While venoms in sea anemones are by far the most well studied among cnidarians, toxin gene expression and protein localisation patterns across these functionally distinct tissues remain largely unexplored. Significantly, as evidence indicates that changes in TTL gene expression may generate different venom profiles (Amazonas et al., 2018), we hypothesise that in sea anemones, TTL gene expression varies among tissue types and correlates with their distinct ecological functions. In this paper, we comprehensively surveyed the TTL gene complement in both venomous and non-venomous taxa using comparative genomics, which revealed that the total TTL gene repertoire has expanded in the majority of known venomous lineages investigated. We refer to this process as convergent amplification and it is the result of convergent recruitment (Fry et al., 2009) followed by an increase in copy number of toxin-encoding genes. Our results indicate that the first evidence of convergent amplification is observed in phylum Cnidaria, the oldest extant venomous lineage. As with all cnidarians, sea anemones (actiniarians) share a common venomous ancestor, and their TTL genes are the best studied among all cnidarian groups. Consequently, we performed a fine-scale comparative analysis on actiniarian transcriptomes, to systematically investigate the toxin gene complement and selective forces acting on widespread and lineage-specific TTL gene families in this group. 
Finally, functional genomic analyses were performed on a candidate sea anemone, Actinia tenebrosa, using quantitative RNA-seq and mass spectrometry imaging (MSI), to investigate whether functionally distinct tissue types generate different venom profiles consistent with their ecological functions.


3.2 MATERIALS AND METHODS

3.2.1 Identification of TTL genes

The first aim of this study was to comprehensively investigate the distribution, copy number and evolution of TTL genes and gene families across metazoan taxa. In particular, we wanted to examine whether the first large expansion of TTL genes and gene families occurred in phylum Cnidaria, as it is the oldest extant venomous lineage. To achieve these aims we first identified candidate genes in gene sets and predicted protein sets from the sequenced genomes of representative taxa from the following phyla and taxonomic groupings: Ctenophora, Porifera, Placozoa, Cnidaria, Ecdysozoa, Lophotrochozoa, and Deuterostomia. As only a few genomes currently exist for Ctenophora, Porifera and Placozoa, we combined them into an artificial group called CPP. A list of the genomes from the specific species used in this study can be found in Supplementary Table 7, and it includes four species from CPP, five species from Cnidaria, 24 species from Ecdysozoa, eight species from Lophotrochozoa, and nine species from Deuterostomia. BLASTP was performed to identify TTL candidate genes in all predicted proteomes from transcriptomes and genomes against the manually curated Swiss-Prot database (accessed 18/04/18) (e value < 1e-05). Significant queries with top BLAST annotations from proteins in the Tox-Prot database (Jungo & Bairoch, 2005) were considered candidate TTL genes. The presence of a signal peptide in these candidate TTL proteins was examined using SignalP (Petersen, Brunak, Heijne, & Nielsen, 2011). We grouped candidate TTLs with a signal peptide into protein families using their top BLAST hit. This hit description included the sequence similarity with other proteins (family and domains) as described in the Swiss-Prot knowledgebase. In order to determine the distribution and expansion of TTL gene families, we compared the number of different protein families and copy number in each species. 
This was based on the number of predicted proteins that received significant BLAST hits against the Tox-Prot database. These data were used to determine the extent of shared and lineage-specific TTL genes across the broad metazoan groupings listed above. Gene families found in multiple metazoan groups were considered shared, while those found in a single metazoan group were classified as lineage-specific. To further examine the distribution of lineage-specific TTL gene families within previously


defined metazoan groups, we reported copy number variation across the representative species. Phylum Cnidaria was selected to compare the frequency of lineage-specific TTL gene families in a phylum as it shares a venomous common ancestor and is the oldest extant venomous group. The distribution and expansion of TTL gene families within this phylum were compared using the number of predicted proteins that received significant BLAST hits against the Tox-Prot database. The genomes investigated included one medusozoan (the hydrozoan Hydra vulgaris) and four anthozoans. The four anthozoans consisted of three scleractinians (Acropora digitifera, Stylophora pistillata, and Orbicella faveolata) and one actiniarian (Exaiptasia pallida). Although candidates identified in this study had a top BLAST hit to a Tox-Prot database sequence, it is unlikely that all TTLs identified are functional toxins (Madio, Undheim, & King, 2017; von Reumont, Undheim, Jauss, & Jenner, 2017). Some TTL gene families, such as sea anemone 8 toxin, have not been functionally validated as toxins. Sea anemone 8 toxin has been observed in multiple toxin studies and we have included this putative TTL protein to remain consistent with previous literature (Macrander et al., 2016a; Madio et al., 2017; Oliveira, Fuentes-Silva, & King, 2012).
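The shared versus lineage-specific classification described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function name `classify_families`, the input format (one group/family pair per candidate TTL protein) and the toy family labels are all hypothetical.

```python
from collections import defaultdict


def classify_families(hits):
    """Classify TTL gene families as shared or lineage-specific.

    `hits` is an iterable of (metazoan_group, family) pairs, one per
    candidate TTL protein (an assumed input format, for illustration).
    """
    groups_per_family = defaultdict(set)
    for group, family in hits:
        groups_per_family[family].add(group)
    # Seen in more than one metazoan group: shared; otherwise lineage-specific.
    return {
        family: "shared" if len(groups) > 1 else "lineage-specific"
        for family, groups in groups_per_family.items()
    }


# Toy input with made-up family labels.
toy_hits = [
    ("Cnidaria", "family_A"),
    ("Ecdysozoa", "family_A"),
    ("Cnidaria", "family_B"),
    ("Cnidaria", "family_B"),
]
labels = classify_families(toy_hits)
```

Copy number per species could be tallied in the same pass by counting pairs rather than collecting them into sets.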

3.2.2 Comparative genomic and phylogenetic analyses

Comparative analysis of TTL gene families in Actiniarians

We undertook a fine-scale analysis of TTL genes in actiniarian species to better understand whether phylogenetic inertia and/or ecological similarity among species influenced the distribution of TTL gene families. Actiniarians were selected as they have multiple well-described and validated TTL gene families (Honma & Shiomi, 2006; Jouiaei et al., 2015a; Norton, 1991, 2009; Prentis et al., 2018; Shiomi, 2009), whose distribution and evolution are not well understood (Daly, 2016) (see Supplementary Table 8.1 for full list of gene families). Fourteen transcriptomes were used in this analysis from three different superfamilies (Baumgarten et al., 2015; Dnyansagar et al., 2018; Macrander et al., 2016a; Madio et al., 2017; Schwaiger et al., 2014; Sorek et al., 2018; van der Burg et al., 2016), including Actinioidea, Metridioidea, and Edwardsioidea (Rodríguez et al., 2014). Raw reads were retrieved from the sequence read archive and converted to FASTQ files. The Trinity software package version 2.0.6 was used to assemble the majority of the transcriptomes, with

58 Chapter 3: A process of convergent amplification and tissue-specific expression dominates the evolution of toxin and toxin-like genes in sea anemones

Trinity 2.2.0 used to assemble Aiptasia diaphana, Edwardsiella carnea, Nematostella vectensis, and E. pallida (see Supplementary Tables 9, 10, 11, and 12 for transcriptome assembly statistics); reads were quality filtered with Trimmomatic before assembly (Bolger et al., 2014; Grabherr et al., 2011). BUSCO was used to validate the quality and completeness of the transcriptomes (Simão et al., 2015; Waterhouse et al., 2018). Downstream RNA-seq analysis was performed using software leveraged in the Trinity package version 2.2.0 (Haas et al., 2013). Individual reads were mapped back to reference transcriptome assemblies independently for each species using Bowtie2 and abundance was estimated using RSEM (Li & Dewey, 2011). Normalised abundance estimates were calculated as fragments per kilobase of transcript per million mapped reads (FPKM). Transcripts with FPKM values of zero were removed as assembly artefacts. ORFfinder was used to identify open reading frames encoding proteins > 25 amino acid residues in length to produce a predicted proteome for the 14 transcriptomes (Haas et al., 2013). CD-HIT was then used to cluster 100% identical proteins for each individual proteome to remove redundancy (Fu et al., 2012). TTL candidate genes were identified as above. In order to determine the distribution and expansion of TTL gene families in actiniarians, we compared 39 different protein families and copy number in each species. Principal component analysis (PCA) was performed on a log2-transformed, median-centred matrix of TTL gene copy number to cluster species. To investigate whether genome-wide expansions of gene families could have confounded our results, we examined a second gene family and genome-wide patterns of gene duplication in sea anemones. The second gene family investigated was Green Fluorescent Proteins (GFP). 
Candidate genes from this family were identified by performing BLASTP searches against a custom protein database generated from functionally characterised proteins in the Swiss-Prot database that contain a GFP Pfam domain (PF01353) (e value < 1e-05). Sea anemone queries that received a significant hit were then manually examined to ensure they contained a GFP Pfam domain using HMMER 3.1b2 against the Pfam database (e value < 1e-05). To further examine genome-wide duplication across the taxa examined, we used BUSCO on the predicted proteome of each species to determine the amount of duplication in complete single-copy orthologs present in the transcriptome of each sea anemone species. To evaluate whether gene families showed a taxonomically restricted distribution, we constructed a species tree for the 14 transcriptomes investigated. Single-copy orthologous genes from the 14 actiniarian transcriptomes were identified using


OrthoMCL (Li, Stoeckert, & Roos, 2003) (e value < 1e-05, inflation 1.5). From this, 1,004 single-copy orthologous genes were aligned using MAFFT. Aligned orthologs were concatenated with the final alignment consisting of approximately 269,033 amino acids. The concatenated protein alignment was imported into IQ-TREE (Nguyen et al., 2014) to determine the best-fit model of protein evolution. The JTT model with Gamma rate heterogeneity, invariable sites and empirical codon frequencies was selected, and a Maximum Likelihood tree generated using 1,000 bootstrap iterations (Stamatakis, 2014). To investigate the gain and loss of TTL gene families in actiniarians, we used the DOLLOP program from the PHYLIP package version 3.696 (Felsenstein, 1989) (http://evolution.genetics.washington.edu/phylip.html). The species tree and a presence/absence matrix of TTL gene families, previously constructed, were imported into the DOLLOP program. The most parsimonious evolutionary scenario for the gain and loss of TTL gene families was estimated using Dollo’s parsimony law, which assumes genes arise once on the evolutionary tree and can be lost independently in different evolutionary lineages (Farris, 1977).
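The logic of Dollo parsimony for a single gene family can be illustrated with a short sketch. This is not the DOLLOP implementation, which handles many characters at once along with its own tree search and input formats; it only shows the principle on a fixed rooted tree. The tree is written as nested tuples, and the function names and taxon labels (`sp1` to `sp4`) are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def dollo(tree, present):
    """Return (gain_clade, loss_clades) for one presence/absence character.

    Under Dollo parsimony the character is gained once, on the branch leading
    to the most recent common ancestor (MRCA) of all taxa that carry it, and
    is lost once for every maximal subclade below that MRCA that lacks it.
    """
    present = set(present)
    if not present:
        return None, []
    # Descend from the root to the MRCA of the taxa carrying the character.
    node = tree
    while not isinstance(node, str):
        holder = [c for c in node if present <= leaves(c)]
        if not holder:
            break
        node = holder[0]
    losses = []

    def collect(n):
        tips = leaves(n)
        if not (tips & present):
            losses.append(frozenset(tips))  # whole subclade lacks it: one loss
        elif not isinstance(n, str):
            for child in n:
                collect(child)

    collect(node)
    return frozenset(leaves(node)), losses


# Toy rooted tree and presence set (hypothetical taxa).
toy_tree = ((("sp1", "sp2"), "sp3"), "sp4")
gain, losses = dollo(toy_tree, {"sp1", "sp3"})
```

In the toy case the family is gained in the ancestor of `sp1`, `sp2` and `sp3` and lost once, in `sp2`; `sp4` never had it, so no loss is counted there.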

Selection analyses

To determine whether different selective pressures have acted on gene families shared across order Actiniaria compared to those restricted to a single actiniarian family, we used the gene family distribution and sequence data generated in section 2.2.1. Specifically, lineage-specific and widely distributed TTL gene families were tested for evidence of nucleotide variation consistent with the action of positive or negative selection, by analysing the ratio of non-synonymous to synonymous substitution rates (dN/dS) using CODEML within the PAML package version 4.8 (Yang, 2007). Protein sequences from all TTL gene families were aligned using MAFFT version 7 (Katoh & Standley, 2013). The protein alignments were back-translated using Pal2Nal (Suyama, Torrents, & Bork, 2006) to generate codon alignments. Only gene families with alignments that contained at least three sequences were used for downstream selection analyses. Codon alignments were imported into IQ-TREE (Nguyen et al., 2015) to determine the best nucleotide substitution model and to generate Maximum Likelihood phylogenetic trees. Maximum Likelihood models implemented in the CODEML package of PAML (Yang, 2007) were used to assess whether specific TTL


gene families were under positive selection. This analysis was performed as described in the study by Jouiaei et al. (2015) on 29 candidate TTL gene families.
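Tests for positive selection with CODEML site models are typically evaluated with a likelihood ratio test (LRT) between a null model and an alternative model that allows a class of sites with dN/dS > 1. The sketch below assumes a comparison in which the alternative has two extra free parameters (as in the widely used M7 versus M8 pair); the specific models compared in this chapter are not restated here, and the log-likelihood values are placeholders rather than results.

```python
import math


def lrt_df2(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by two parameters.

    Returns (statistic, p_value). For a chi-square distribution with two
    degrees of freedom the survival function is exactly exp(-x / 2), so no
    statistics library is needed.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, math.exp(-stat / 2.0)


# Placeholder log-likelihoods (hypothetical values, not from this study).
stat, p_value = lrt_df2(lnl_null=-1234.5, lnl_alt=-1230.0)
significant = p_value < 0.05
```

A gene family would be flagged as showing evidence of positive selection only when the alternative model fits significantly better; for other degrees of freedom a general chi-square routine would be needed.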

Sanger sequencing of TTL genes in actiniarians

Sanger sequencing was performed to validate multiple lineage-specific TTL gene families. Thirteen TTL genes identified in the transcriptomes for five species (A. tenebrosa, Anthopleura buddemeieri, Aulactinia veratra, Telmatactis sp., and Nemanthus annamensis) were validated (Supplementary Table 13). Primer3 software (Untergasser et al., 2012) was used to design primers from transcripts generated from the transcriptomes (Supplementary Table 14). cDNA was synthesised using the SensiFAST™ cDNA synthesis kit (Bioline). Toxin genes were amplified using the MyFi™ DNA polymerase mix (Bioline). Amplified fragments were purified using the ISOLATE II PCR and Gel kit (Bioline). Amplified fragments were sequenced using BigDye® Terminator version 3.1 (Thermo Fisher). Sequences were then cleaned using an ethanol/EDTA protocol (Surm et al., 2015). Sequence chromatograms were visualised in Geneious version 9.1.3 (Kearse et al., 2012) and aligned to the transcript from which the primers were designed, and the percentage similarity was calculated.
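As a rough illustration of the final step, percentage similarity between a Sanger read and its reference transcript can be computed from a pairwise alignment as below. The exact calculation performed by Geneious is not described in the text, so the convention used here (ignoring columns that contain a gap) is an assumption, and the sequences are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical positions between two aligned sequences.

    Columns where either sequence has a gap are ignored (one common
    convention; others count gaps as mismatches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Made-up aligned fragments: nine comparable columns, eight of them identical.
identity = percent_identity("ATGC-TAGGA", "ATGCATAGCA")
```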

Tissue-specific and ontogenetic expression patterns of TTL genes

To investigate whether TTL genes showed tissue-specific expression patterns across functionally distinct tissue types, we undertook an RNA-seq experiment using tentacles, acrorhagi, and mesenteric filaments in A. tenebrosa. We also performed a separate analysis with the model sea anemone N. vectensis using tentacles, nematosomes, and mesenteric filaments to determine if similar patterns occurred in another species. Actinia tenebrosa was selected as multiple TTL proteins have been functionally characterised in this species and it is closely related to Actinia equina, which has also long been used for the discovery of toxin proteins. In fact, many of the toxin genes identified and validated in sea anemone species have been discovered in species from the genus Actinia. Furthermore, A. tenebrosa possesses functional acrorhagi, a novel envenomation structure used in intraspecific combat that is unique to certain species from the Actinioidea superfamily. All individuals used for the tissue-specific and ontogenetic expression patterns of TTL genes were housed in holding tanks under standard aquarium conditions for one week following field collection (van der Burg et


al., 2016). The three tissue types considered to contain the highest density of nematocysts were isolated from nine individuals (three replicate pools of three individuals for each tissue type). Total RNA was extracted as described (Prentis & Pavasovic, 2014), with minor modifications. These modifications included homogenisation of tissue in a Tissuelyser II (Qiagen) using a stainless-steel ball bearing (Qiagen) in Trizol®. All samples were assessed for integrity, quality and concentration using a Bioanalyzer 2100 (Agilent) and a Qubit™ Fluorometer (ThermoFisher). Sequencing libraries were prepared using the Illumina TruSeq® Stranded mRNA Library Preparation Kit for 75 bp paired-end chemistry on an Illumina NextSeq 500. Strand-specific raw reads from all nine libraries were quality checked (Q > 20, N < 1%) and trimmed using Trimmomatic (Bolger et al., 2014). Trinity version 2.0.6 was used to assemble trimmed reads using default settings (Grabherr et al., 2011). The assembled transcriptome was assessed for completeness using BUSCO. Strand-specific raw reads from all nine libraries were mapped to reference transcriptomes to generate FPKM values. The edgeR package (Robinson, McCarthy, & Smyth, 2010) was used to perform differential gene expression analysis among the libraries after a TMM normalisation step to account for differences in total RNA abundance across the samples. Transcripts were considered differentially expressed for a given false discovery rate (FDR) value of < 1e-03 and a fold-change of 4. Using the Trinity pipeline, heat maps were generated in R and used to visualise differentially expressed transcripts. Differentially expressed transcripts with similar expression patterns were further partitioned into subclusters by cutting the dendrogram at 50% of its height. Quality control analyses were performed on samples and replicates to check whether any discrepancies or batch effects were present. 
Pearson correlation was used to test for correlation in gene expression among replicates. The relationship among the sample replicates was also explored using PCA to ensure no batch effect was present (Supplementary Figure 2). ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder) was used to identify open reading frames in the forward strand encoding proteins > 25 amino acid residues in length to produce a predicted proteome. The predicted proteome was used as a query against the Swiss-Prot database. Protein sequences with a significant hit were used to map GO terms using UniProt idmapping. Following annotation, gene ontology (GO) enrichment analysis was performed using GOseq (Young, Wakefield, Smyth, & Oshlack, 2010) to determine if specific GO terms were


over- or underrepresented in the differentially expressed transcripts and subclusters. GO terms were considered significantly enriched or depleted at FDR < 0.05. Following GO enrichment analysis, significantly enriched GO terms were visualised using REVIGO with SimRel semantic similarities (Supek, Bošnjak, Škunca, & Šmuc, 2011). TTL candidate genes were identified as previously described. To evaluate whether gene expression patterns of TTL genes vary over the life history of actiniarians, different ontogenetic stages of A. tenebrosa were investigated. This included four size classes of juveniles (animal pedal disc diameter of 1, 3, 6, and 9 mm), which consisted of pools of three individuals (Angeli, Zara, Turra, & Gorman, 2016; Larson, 2017). Genetic variability may contribute to variation in expression as individuals were not genetically identical. A single RNA-seq library was generated for each size class. Total RNA was extracted as above and sequencing libraries were prepared using the Illumina TruSeq® Stranded mRNA Library Preparation Kit for 75 bp single-end chemistry on an Illumina NextSeq 500. Assembly, annotation, and downstream analysis were performed as previously described. Species-specific differences in tissue and developmental TTL expression were investigated by performing additional differential expression analysis in the model species, N. vectensis (Bioproject PRJEB13676 (Babonis, Martindale, & Ryan, 2016), PRJNA200689 (Schwaiger et al., 2014)). Assembly, annotation, and downstream analysis were performed as previously described; however, strand-specific flags were not included.
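The two thresholds used to call differentially expressed transcripts (FDR < 1e-03 and a four-fold change) translate into a simple filter on edgeR-style output. The sketch below is illustrative only: the function name and the rows are hypothetical, and treating the fold-change cut-off as inclusive and two-sided is an assumption.

```python
import math


def is_differentially_expressed(log2_fc, fdr, min_fold=4.0, max_fdr=1e-3):
    """Apply both thresholds: FDR below max_fdr and a change of at least
    min_fold in either direction (four-fold corresponds to |log2FC| >= 2)."""
    return fdr < max_fdr and abs(log2_fc) >= math.log2(min_fold)


# Hypothetical rows: (transcript id, log2 fold-change, FDR).
rows = [
    ("t1", 3.1, 1e-6),   # passes both thresholds
    ("t2", -2.4, 5e-4),  # passes: down-regulated more than four-fold
    ("t3", 1.2, 1e-8),   # fails: fold-change too small
    ("t4", 5.0, 0.02),   # fails: FDR too high
]
selected = [tid for tid, lfc, fdr in rows if is_differentially_expressed(lfc, fdr)]
```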

qPCR of TTL genes across tissue types

Quantitative PCR (qPCR) was performed to validate the differential expression of candidate toxins in A. tenebrosa for 10 sequences across tissue types. Total RNA previously used for each RNA-seq library was used to synthesise cDNA using a SensiFAST™ cDNA synthesis kit (Bioline). Primers were designed from the reference transcriptome to amplify regions within candidate TTL transcripts. The FastStart Essential DNA Green Master kit (Roche) was used for qPCR and run on the LightCycler® 96 System (Roche) to measure specific fluorescence at each cycle and quantify the initial levels of mRNA for each gene in each tissue. All qPCR analyses comprised three technical replicates, with three biological replicates, to validate the expression of TTL genes across tissue types. Negative controls (no cDNA) were also performed for each


gene in each sample, and the 18S gene was used as a housekeeping control gene (Reitzel & Tarrant, 2009; Tarrant, Reitzel, Kwok, & Jenny, 2014). Relative quantification analysis was performed using the analysis function of the LightCycler® 96 software using the ΔΔCT method. This method uses the reference gene as a basis for comparing levels of target sequences to levels of reference sequences, and the final result is expressed as a relative ratio. Significance of results was assessed through ANOVA testing and differences were considered significant at a P value < 0.05.
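The ΔΔCT calculation can be written out explicitly. The sketch below assumes close to 100% amplification efficiency (a doubling of product per cycle) and a designated calibrator sample, for example one tissue against which the others are compared; the choice of calibrator and the CT values shown are hypothetical and are not taken from this study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative expression ratio by the delta-delta-Ct method.

    Each target Ct is first normalised to the reference gene (here 18S),
    then the sample is compared with the calibrator. Assumes a doubling
    of product per cycle.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical mean Ct values: the target crosses threshold two cycles
# earlier (relative to 18S) in the sample than in the calibrator.
ratio = relative_expression(22.0, 12.0, 25.0, 13.0)
```

With these placeholder values the target is four times more abundant in the sample than in the calibrator.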

3.2.3 Mass Spectrometry

Venom extraction

Venom was obtained from A. tenebrosa specimens by electrical stimulation (Malpezzi, de Freitas, Muramoto, & Kamiya, 1993) after a starvation period of at least 48 h, and it was then fractionated using reversed-phase HPLC (RP-HPLC) as described previously (Madio et al., 2018, 2017).

MALDI-TOF

Lyophilized RP-HPLC fractions were dissolved in 0.1% (v/v) TFA/water and 0.5 µl was spotted onto a matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) plate with 0.5 µl α-cyano-4-hydroxycinnamic acid (CHCA) as the matrix (10 mg/ml in 60% acetonitrile (ACN)). Spots were analysed using a TOF/TOF 5800 System (AB SCIEX) in linear positive ion mode.

LC-MS/MS

To identify proteins present in the milked venom, we used a bottom-up proteomics approach to analyse the digested RP-HPLC fractions. Reduction and alkylation of cysteine residues in venom proteins and peptides was performed as reported previously (Hale, Butler, Gelfanova, You, & Knierman, 2004). Reduced/alkylated venom was incubated overnight at 37 °C in 10 µl of 40 ng/µl proteomics-grade trypsin (Sigma) in 40 mM NH4HCO3, pH 8. The digested reduced/alkylated samples were then resuspended in a final concentration of 1% formic acid (FA) and centrifuged for 15 min at 12,000 g prior to LC-MS/MS. For


analysis of RP-HPLC fractions, tryptic peptides were fractionated on an Agilent

Zorbax stable-bond C18 column (2.1 mm × 100 mm, 1.8 µm particle size, 300 Å pore size) using a flow rate of 180 µl/min and a gradient of 1–40% solvent B (90% ACN, 0.1% FA) in 0.1% FA over 15 min on a Shimadzu Nexera UHPLC coupled with an AB SCIEX 5600 mass spectrometer equipped with a Turbo V ion source heated to 500 °C. MS/MS spectra were acquired at a rate of 20 scans/s, with accumulation time of 0.25 ms, resulting in a cycle time of 2.3 s, and optimised for high resolution. Precursor ions with m/z of 300–1,800 m/z, a charge of +2 to +5, and an intensity of at least 120 counts/s were selected, with a unit mass precursor ion inclusion window of ± 0.7 Da, and excluding isotopes within ±2 Da for MS/MS. The crude venom digest was analysed as above except using a gradient of 1–40% solvent B in 0.1% FA over 60 min. Mass spectra were searched against predicted coding sequences (CDSs) from the assembled transcriptome using ProteinPilot v4.5 (AB SCIEX). Searches were run as thorough identification searches, specifying tryptic digestion and the alkylation reagent as appropriate. Biological modifications and amino acid substitutions were allowed in order to maximize the identification of protein sequences from the transcriptome despite the inherent variability of toxins, potential isoform mismatch with the transcriptomic data, and to account for experimental artefacts leading to chemical modifications. We used a stringent detected protein threshold score of 1% FDR as calculated by decoy searches.
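The idea behind a decoy-based FDR threshold can be sketched as follows. This is a generic target-decoy illustration under simple assumptions (FDR estimated as decoy hits divided by target hits above a score cutoff), not the algorithm implemented in ProteinPilot; the function name and the scores are hypothetical.

```python
def score_threshold_at_fdr(matches, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR stays at or below max_fdr.

    matches: list of (score, is_decoy) tuples from a combined target/decoy search.
    Returns None if no cutoff satisfies the FDR limit.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = 0
    decoys = 0
    threshold = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among all hits scoring at least `score`
        if targets > 0 and decoys / targets <= max_fdr:
            threshold = score
    return threshold


# Hypothetical search results: (score, is_decoy)
hits = [(9, False), (8, False), (7, False), (6, False),
        (5, True), (4, False), (3, True), (2, True)]
print(score_threshold_at_fdr(hits, max_fdr=0.01))  # 6
```

With these hypothetical scores, only the four top-ranked target hits are retained at a 1% limit, because admitting the first decoy hit would raise the estimated FDR to 25%.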

Mass spectrometry imaging (MSI)

To localise specific toxin peptides to morphological structures we used MSI. Mass spectrometry imaging was guided by published protocols (Caprioli, Farmer, & Gile, 1997) but with sample preparation optimised as recently described (Madio et al., 2018; Mitchell et al., 2017; Undheim et al., 2014b). Briefly, specimens of A. tenebrosa were left in 50% RCL2/ethanol at room temperature overnight, then dehydrated sequentially using 50%, 60%, 70%, 90%, 95% and 100% ethanol (3 × 15 min at each concentration), cleared in xylene for 30 min, and embedded in paraffin wax. A whole embedded animal was sectioned transversally at 7 µm thickness. Sections were de-paraffinized by careful washing with xylene, and optically imaged prior to applying CHCA (7 mg/ml in 50% ACN, 0.2% TFA) using a Bruker ImagePrep automated matrix sprayer. FlexControl 3.3 (Bruker) was used to operate an UltraFlex III TOF-TOF mass spectrometer (Bruker) in linear positive mode, with m/z range set to 1,000–20,000. A small laser size was chosen to achieve a spatial resolution of 50 µm, and matrix ion suppression was enabled up to 980 m/z. Individual MSI experiments were performed using FlexImaging 4.0 (Bruker). FlexImaging was used to establish the geometry and location of the section on the slide based upon the optical image, choose the spatial resolution, and call upon FlexControl to acquire individual spectra, accumulating 200 shots per raster point. FlexImaging was subsequently used to visualise the data in 2D ion-intensity maps, producing an averaged spectrum based upon the normalised individual spectra collected during the experiment. Spectra were assigned to regions using probabilistic semantic analyses, as incorporated in ClinProTools 3.0 (Bruker) and SCiLS Lab (SCiLS). The number of groups specified for the analyses was derived from an Akaike information criterion calculation as incorporated in ClinProTools.

66 Chapter 3: A process of convergent amplification and tissue-specific expression dominates the evolution of toxin and toxin-like genes in sea anemones

3.3 RESULTS

3.3.1 Comparative analysis of TTL genes across Metazoa

Venomous lineages have evolved independently numerous times across the metazoan tree of life (Casewell et al., 2013; Fry et al., 2009; Pisani et al., 2015). Our analysis captures the expansion and diversification of TTL genes in currently available genomes of representative taxa across both venomous and non-venomous lineages (Figure 3.1A). Comparative genomic analysis across Metazoa reveals multiple TTL genes are present in all lineages investigated, including those known to be non-venomous. These non-venomous lineages, however, have fewer TTL genes compared to venomous lineages (Figure 3.1A).


Figure 3.1. Distribution and expansion of toxin and toxin-like genes across Metazoa. A) Metazoan phylogeny showing distribution and expansion of TTL genes in representative genomes, including Ctenophore, Porifera and Placozoa (CPP), Cnidaria, Ecdysozoa, Lophotrochozoa, and Deuterostomia (opaque bars represent number of different gene families; coloured bars represent copy number). Abbreviations Ve and Hs refer to species that are considered venomous or hematophagous specialists (specialised venomous subtype), respectively (Fry et al., 2009). B) Venn diagram showing the overlap of toxin gene families across major metazoan groupings. See Supplementary Table 7 for the full list of TTL gene copy numbers.


The process of convergent amplification, in which TTL genes and gene families are both expanded, is observed in the majority of known venomous lineages. Overall, CPP taxa have a lower number of known TTL genes (7–8) and TTL gene families (4–7), which is not surprising given their lack of venomous representatives. The first large expansion of TTL genes is observed in the cnidarian E. pallida, which has the fifth largest number of TTL genes (86; cnidarian range 31–86) representing 14 gene families (cnidarian range 10–15). This expansion of TTL genes is concordant with evidence that phylum Cnidaria is the earliest known venomous lineage (Jouiaei et al., 2015a). We observed a process of convergent amplification of TTL genes in multiple species in Ecdysozoa. The largest expansion occurs in the Arizona bark scorpion Centruroides sculpturatus with 258 TTL genes (Ecdysozoa range 4–258) and 24 TTL gene families (arthropod range 3–24). In Lophotrochozoa, multiple venomous lineages occur; however, some of the more exhaustively investigated lineages, such as cone snails, are absent from our analysis due to a lack of available whole genome data. Based on available data, we report a range of 7–84 TTL genes found in 4–15 TTL gene families in Lophotrochozoa. In Deuterostomia, venomous lineages have evolved multiple times, but a convergent amplification of TTL genes is restricted to the pit viper Protobothrops mucrosquamatus (88 TTLs across 25 gene families). TTL gene copy number is highly variable in this group, ranging from 3 to 88 genes found across 2–25 TTL gene families. The pit viper P. mucrosquamatus, King cobra Ophiophagus hannah, platypus Ornithorhynchus anatinus, and crown-of-thorns starfish Acanthaster planci are all venomous, but show divergent patterns of TTL gene copy number and distribution, with O. anatinus having only three TTL genes in two known toxin gene families.
This indicates that copy number variation can be extensive among independently evolved venomous lineages.

To understand the distribution of shared and lineage-specific TTL genes, we undertook a comparative analysis of all characterised metazoan toxins. A total of 69 gene families are reported across Metazoa, but only six are common to all broad metazoan groupings (Figure 3.1B). These include phospholipase (PLA2), venom Kunitz-type, DNase II, type-B carboxylesterase/lipase, true venom lectin, and snaclec. Much of the TTL gene family diversity observed across Metazoa is lineage-specific (43%; 29 of 68 gene families) and associated with independent origins of venomous taxa (Casewell et al., 2013; Fry et al., 2009) (Figure 3.1B). Cnidaria (5), Ecdysozoa (17), Deuterostomia (5) and Lophotrochozoa (2) all have lineage-specific TTL gene families, while taxa from the CPP grouping have none (Figure 3.1B). Significantly, few lineage-specific TTL gene families show patterns of expansion, with the exceptions of Cnidaria small cysteine-rich protein (SCRiP) (12 copies found in A. digitifera), and long (4 C-C) scorpion toxin (59 copies found in C. sculpturatus). These findings indicate that although lineage-specific gene families contribute a significant proportion of the diversity and complexity of the TTL gene complement, their impact on the total TTL gene number in most venomous species is smaller than that of convergently recruited TTL gene families. Taken together, venomous lineages appear to evolve in a consistent manner, relying on the convergent recruitment of shared gene families followed by gene duplication, as well as a smaller component driven by the evolution of new toxin families that lack homologs in other venomous species.

We investigated the evolution of lineage-specific TTL gene families within Cnidaria, a phylum with a venomous common ancestor. A total of five TTL gene families (PLA2, multicopper oxidase, peptidase M12A, actinoporin, and snaclec) are shared by all cnidarian taxa (Figure 3.2A). Lineage-specific TTL gene families are found in all cnidarian species except A. digitifera: three TTL gene families in E. pallida (sea anemone 8, AB hydrolase, and sea anemone structural class 9a), two TTL gene families in H. vulgaris (CRISP and DNase II), three TTL gene families in S. pistillata (conopeptide P-like, insulin, and phospholipase B-like), and two TTL gene families in O. faveolata (latrotoxin-like and venom metalloproteinase (M12B)) (Figure 3.2B). The majority of these lineage-specific TTL gene families are in fact common to other venomous lineages, with only two TTL gene families (sea anemone 8 toxin and sea anemone structural class 9a TTL gene families) restricted to phylum Cnidaria.
The most expanded gene family in all cnidarian taxa was PLA2, with the exception of SCRiP in A. digitifera. The most expanded lineage-specific TTL gene family is sea anemone 8 toxin in E. pallida. The comparison of TTL copy number and gene families across Cnidaria is consistent with a process of convergent amplification, with limited evidence of lineage-specific TTL gene families contributing to this process.


Figure 3.2. Comparative analysis of TTL genes within Cnidaria. A) Venn diagram of the distribution of TTL gene families within cnidarians. B) Heat map of the distribution and copy number of TTL gene families within cnidarians.


3.3.2 Comparative analysis of TTL genes across Actiniaria

In total, 39 TTL gene families are found across the 14 transcriptomes. Ancestral reconstruction analysis suggests 17 TTL gene families are present in the last common actiniarian ancestor (LCAA) (Figure 3.3A), three of which can be found in all actiniarians (venom Kunitz-type, PLA2, and sea anemone 8 toxin). Of the 17 TTL gene families found in the LCAA, sea anemone sodium channel inhibitory toxin (NaTx), sea anemone 8 toxin, sea anemone type 1 potassium channel toxin (KTx), and sea anemone type 5 KTx are all restricted to Actiniaria (Figure 3.3A). A gain of five TTL gene families and a loss of a single TTL gene family occurs in the Actinioidea superfamily following divergence from Metridioidea. The Metridioidea superfamily experienced only a gene family loss following the split from Actinioidea. Species-specific TTL gene family losses occur in all species, while species-specific gains are limited to A. diaphana, Calliactis polypus, and Stichodactyla haddoni (five gains). These TTL gene families gained at the species level, however, are not true species-specific gains as they are found in other venomous lineages and sea anemone species. Sanger sequencing validated 12 lineage-specific TTL genes in sea anemones with greater than 98.5% similarity to the transcript they were designed from (Supplementary Tables 14 and 15).


Figure 3.3. Comparative analysis and molecular evolution of TTL within Actiniaria. A) Maximum Likelihood protein tree generated to determine actiniarian phylogeny, all bootstrap support > 95 %. TTL gene family gains (green) and losses (red) are represented above and below branches, respectively. Bubble plot of the distribution and copy number of TTL gene families within actiniarians and TTL gene families with dN/dS > 1 highlighted with a black circle, dN/dS = 1 highlighted with a grey circle, and dN/dS < 1 highlighted with a white circle, above respective gene family. B) A plot of site-specific dN/dS values against amino acid residue positions for TTL gene families within actiniarians.


Species from Actinioidea have an increased mean copy number of TTL genes, compared to species from Metridioidea and Edwardsioidea (Figure 3.3A). Anthopleura buddemeieri and A. tenebrosa have the highest copy number of TTL genes, with 99 and 98 copies, respectively. Additionally, Anemonia sulcata, A. veratra, S. haddoni, Anthopleura dowii, and Megalactis griffithsi have 79, 74, 66, 60, and 42 copies, respectively. Average copy number in the Metridioidea superfamily is lower, with the highest copy number in A. diaphana (62), followed by N. annamensis (49), Telmatactis sp. (47), C. polypus (46), and E. pallida (40). In Edwardsioidea, E. carnea and N. vectensis have 50 and 45 copies, respectively. Principal component analysis revealed that the distribution and copy number of TTL genes clustered the species based on superfamily (Supplementary Figure 3). Copy number variation is also observed within species from Metridioidea. Specifically, A. diaphana and E. pallida have recently been synonymized (Grajales & Rodríguez, 2014), yet show variation in their TTL copy number. These samples were collected from different geographical locations, and differences in TTL copy number and diversity may be due to population-level genetic differences. Variation in copy number could also be related to different degrees of transcriptome completeness, with A. diaphana having a more complete transcriptome compared to E. pallida (Supplementary Table 11). This pattern was not observed among the four transcriptomes for A. tenebrosa, however, with the blue ecotype having both the highest BUSCO score and the fewest copies of TTL genes. Furthermore, the number of individuals does not appear to affect TTL copy number variation. This is evident with the red ecotype transcriptome (n=2) showing similar TTL copy number to the brown (n=1) and green (n=1) ecotype transcriptomes.
Allelic variability has the potential to inflate the TTL gene copy number observed in transcriptomes. While this has been minimised as much as possible, these artefacts may still affect the TTL copy number variation reported. Evidence of genome-wide expansions that could bias the reported TTL copy numbers was also explored. Variations in the copy number of non-toxin genes, in similar datasets to those investigated here, have been previously reported (Smith, Pavasovic, Surm, Phillips, & Prentis, 2018; Surm, Toledo, Prentis, & Pavasovic, 2018), but the expansions in these gene families are not consistent with the expansions we observe in TTL gene families. Results from copy number analysis of the GFP gene family also showed copy variation different from that observed in TTL gene families (Supplementary Table 8.2). Furthermore, we observed no patterns of genome-wide expansion in single-copy orthologs in any of our transcriptomes (Supplementary Table 8.2). Taken together, these results do not support a role for genome-wide expansion in gene families confounding our copy number analysis of TTL genes in order Actiniaria.

A systematic approach was used to investigate whether lineage-specific TTLs show divergent selective pressures in comparison to widely distributed TTL genes. Two TTL gene families display patterns of nucleotide variation (dN/dS ratio (ω)) consistent with positive selection (Figure 3.3A). These TTL gene families are true venom lectin (ω = 5.3753) and ficolin lectin (ω = 3.3567). Seven TTL gene families have codons under positive selection, six of which have multiple sites under positive selection (Figure 3.3B). Both the sea anemone type 3 (BDS-LIKE) KTx and ficolin lectin families have eight sites under positive selection, while huwentoxin-1 has four sites. Sea anemone 8 toxin, SCRiP, actinoporin, and CREC families all have two sites, while snaclec has one site evolving under positive selection. No difference was observed between the evolutionary pressures on lineage-specific and widespread TTL gene families, with the majority of genes and sites under purifying selection (Supplementary Table 15).
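The interpretation of gene-wide ω values can be illustrated with a short sketch. It assumes the conventional reading of dN/dS (ω > 1 positive selection, ω ≈ 1 neutral evolution, ω < 1 purifying selection); the function name and the tolerance around 1 are hypothetical conveniences, and in practice the significance of ω is assessed with likelihood-based tests rather than a fixed cutoff.

```python
def classify_omega(omega, tolerance=0.05):
    """Classify a dN/dS ratio; `tolerance` is an illustrative band around 1, not a statistical test."""
    if omega > 1 + tolerance:
        return "positive selection"
    if omega < 1 - tolerance:
        return "purifying selection"
    return "neutral evolution"


# Gene-wide omega values reported in the text for two TTL gene families
for family, omega in [("true venom lectin", 5.3753), ("ficolin lectin", 3.3567)]:
    print(family, classify_omega(omega))
```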

3.3.3 TTLs show marked differences in expression and distribution across tissue types

Patterns of TTL gene expression and GO enrichment analysis across tissue types in A. tenebrosa are consistent with the functional specialisation of acrorhagi, mesenteric filaments, and tentacles (see Supplementary Table 12 for transcriptome assembly statistics). In total, 24,453 transcripts are differentially expressed across tissue types (Supplementary Figure 4A). Expression patterns are more similar in acrorhagi and tentacle, with mesenteric filaments having the most divergent expression profile. Enriched GO terms related to the functionality of specific tissues included digestion (GO:0007586), polysaccharide catabolic process (GO:0000272) and cellular defense response (GO:0006968) in mesenteric filaments; metalloendopeptidase activity (GO:0004222) and hemolysis in other organism involved in symbiotic interaction (GO:0052331) in acrorhagi; and ion channel inhibitor activity (GO:0008200) and voltage-gated potassium channel activity (GO:0005249) in tentacles (Supplementary Table 16). The nematocyst (GO:0042151; cnidarian toxin delivery system), response to stimulus (GO:0050896), and toxin activity (GO:0090729) GO terms are also significantly enriched in all comparisons among tissue types. The overrepresentation of the nematocyst and toxin activity GO terms indicates that the process of envenomation and TTL genes are an important component of the differentially expressed transcripts across these three tissues.

In total, 114 TTL transcripts are found in the three tissue types, with 113, 111, and 111 TTL transcripts expressed with an FPKM > 0 in acrorhagi, tentacles, and mesenteric filaments, respectively (Supplementary Table 17). Approximately 68% (78/114) of TTL transcripts are differentially expressed across the three tissue types (Figure 3.4A). As observed in the total dataset, the expression profiles of TTL transcripts in tentacles and acrorhagi are more similar compared to mesenteric filaments. Differentially expressed TTL transcripts are divided into five subclusters, representing transcripts upregulated in tentacles (subcluster 1), transcripts upregulated in acrorhagi (subcluster 2), transcripts upregulated in acrorhagi and tentacles (subcluster 3), a cluster of transcripts upregulated in mesenteric filaments (subcluster 4), and a cluster of transcripts massively upregulated in acrorhagi (subcluster 5) (Figure 3.4B). The different subclusters consist of 13, 18, 7, 24, and 16 transcripts for subclusters 1–5, respectively.


Figure 3.4. Toxin expression profile across tissue types and ontogeny in Actinia tenebrosa. A) Heat map of differentially expressed TTL, Z-scaled FPKM values, for morphological structure: acrorhagi, mesenteric filaments and tentacle. B) Plot of the subclusters of differentially expressed TTL transcripts. C) Bar plot of the respective subclusters showing copy-number variation of differentially expressed TTLs across tissue types. D) Heat map of differentially expressed TTL, Z-scaled FPKM values, for ontogeny: 1, 3, 6, and 9 mm size classes. E) Plot of the subclusters of differentially expressed TTL transcripts. F) Bar plot of the respective subclusters showing copy-number variation of differentially expressed TTLs across tissue types.
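For readers unfamiliar with Z-scaling, the sketch below shows how the FPKM values of a single transcript can be scaled across samples before plotting a heat map. It is a minimal illustration that assumes scaling by the sample standard deviation with no prior transformation; the function name and FPKM values are hypothetical, and the plotting software used for the figure may apply additional steps such as log transformation.

```python
from statistics import mean, stdev


def z_scale(values):
    """Centre one transcript's expression values on their mean and scale by the sample SD."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # A transcript with identical expression in every sample carries no contrast
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical FPKM values for one transcript in acrorhagi, mesenteric filaments, tentacle
print(z_scale([0.0, 5.0, 10.0]))  # [-1.0, 0.0, 1.0]
```

Scaling each transcript separately places transcripts with very different absolute expression levels on a common colour scale, so the heat map reflects relative differences among tissues or size classes.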


All lineage-specific TTL gene families found in the reference transcriptome show differential expression across the three tissue types. Expression patterns of multiple lineage-specific TTL gene families are restricted exclusively to acrorhagi (Figure 3.4C). Copies of acrorhagin 1 and 2, SCRiP, and sea anemone NaTx are upregulated only in acrorhagi. Sea anemone type 3 (BDS-LIKE) KTx and sea anemone 8 toxin are upregulated in acrorhagi and tentacles. Sea anemone type 5 KTx is upregulated only in tentacles. Different members of the sea anemone type 1 KTx family are upregulated in multiple tissues. Widely distributed and expanded toxin families are also found to have members differentially expressed across multiple tissues, including venom Kunitz-type and PLA2. In addition, some widely distributed and expanded toxin families also showed patterns of restricted expression, with natterin upregulated exclusively in mesenteric filaments, true venom lectin upregulated exclusively in tentacle, and snaclec and ficolin lectin upregulated exclusively in acrorhagi. Quantitative PCR was performed to validate tissue-specific TTL differential expression analysis (Supplementary Table 18). The housekeeping control gene (18S) shows little variation across tissue types. The 10 genes showed statistically significant differential expression in concordance with tissue-specific RNA-seq results. Similarly, distinct patterns of TTL gene expression are observed across multiple tissue types in N. vectensis (Supplementary Figure 5A). In total, 45 TTL genes are identified in the transcriptome of N. vectensis across three tissue types: mesenteric filaments, tentacles and nematosomes (Supplementary Figure 6A). Of the 45 TTL genes, 18 are significantly differentially expressed, with tentacles and nematosomes showing greater similarity to each other than to mesenteric filaments, which have the most divergent expression pattern (Supplementary Table 17).
The 18 differentially expressed TTL genes can be divided into four subclusters, representing two and eight transcripts upregulated in mesenteric filaments (subcluster 1 and 3, respectively), three transcripts upregulated in tentacle (subcluster 2), and five transcripts upregulated in nematosomes (subcluster 4).

In order to examine whether the differences observed in gene expression translated into proteomic differences in venom profiles among tissues, we examined cross-sections of A. tenebrosa by MALDI MSI. Supporting the differential expression of TTL genes across tissue types, PCA by probabilistic semantic analyses revealed distinct mass profiles correlating with tissue types (Figure 3.5A and B). Notably, regions such as tentacles, oral disc, mesenteric filaments, and gonads were all recovered as distinct groups, as was the general non-pedal disc epiderm. A peptide with low sequence similarity to the sea anemone structural class 9a family is found in the venom of A. tenebrosa. This peptide, although widely distributed, has a higher concentration in the tentacle region (Figure 3.5C). Further examination of the regions corresponding to the acrorhagi revealed several unique masses compared to the rest of the body. However, none of these masses were similar to those of acrorhagin 1 or 2. Instead, the most intense acrorhagi-associated peak corresponded to a sea anemone type 3 (BDS-LIKE) KTx, albeit with a significant extension in loop one of the β-defensin scaffold (Figure 3.5D and E).

3.3.4 TTLs show marked differences in expression across ontogenetic stages

We observed ontogenetic differences in expression of TTL gene families across four ontogenetic stages, namely 1, 3, 6, and 9 mm (see Supplementary Table 13 for transcriptome assembly statistics). In total, 2,227 transcripts are differentially expressed across the different ontogenetic stages (Supplementary Figure 4B). The nematocyst GO term is enriched in the 1 mm ontogenetic stage, indicating that envenomation and TTL transcripts contribute to a component of the differentially expressed genes. In total, 103 TTL transcripts are identified in the four ontogenetic stages, with 103, 103, 103 and 100 TTLs expressed with an FPKM > 0 in the 1, 3, 6, and 9 mm stages, respectively (Supplementary Table 17). Only 13 TTL transcripts are differentially expressed across ontogeny (Figure 3.4D). TTL expression profiles are most similar in the 3 and 6 mm stages, while the largest stage (9 mm) has the most divergent profile. Differentially expressed TTL transcripts are divided into four subclusters, representing transcripts upregulated in the 1 and 3 mm stages (subcluster 1), transcripts upregulated in the 1 mm stage (subcluster 2), transcripts upregulated in the 9 mm stage (subcluster 3), and transcripts upregulated in the 3 and 6 mm stages (subcluster 4) (Figure 3.4E). The different subclusters are made up of three, four, three, and three transcripts for subclusters 1–4, respectively (Figure 3.4F). This indicates that the expression of TTL genes is weakly influenced by ontogeny, where different TTLs are expressed in juveniles of different sizes. Limited differences in the expression patterns of TTL genes are observed across the developmental stages (gastrula, planula, and adult) in N. vectensis (Supplementary Figure 5B and 6B). Of the 63 TTL transcripts identified in N. vectensis, only five are differentially expressed, all of which are upregulated in the adult developmental stage (Supplementary Table 17).


Figure 3.5. Mass spectrometry imaging (MSI) positive mode spectra acquired from cross-sectioned animal. A) Histological image of the section that was used for MSI experiments (stained with PAS). Tagged regions of interest (ROI) were selected based on biological functions and associated cnidae profile. ROI 01 is related to actinopharynx, column and mesenterial filaments regions; ROI 02 is the acrorhagi; ROI 03 and 04 are regions related to tentacles. B) Slide sprayed with matrix CHCA. C) MSI of the average mass related to a peptide widely distributed with higher concentration in the tentacle region. D) MSI of the average mass related to a peptide with a distribution restricted to acrorhagi. E) Projection of the MSI linear positive mode spectra of ROIs and overall spectra.


3.4 DISCUSSION

3.4.1 Comparative analysis of TTL genes

Previous comparative genomic studies of venom evolution have been restricted to specific TTL gene families within taxa (Sunagar et al., 2015), or have had limited taxonomic representation (Casewell, Huttley, & Wüster, 2012; Casewell et al., 2013; Fry et al., 2009). Here we present a large scale comparative analysis of TTL gene family distribution across multiple metazoan genomes and show that a process of convergent amplification of TTL genes and gene families occurs in most venomous lineages (Figure 3.1A). This process of convergent amplification is explained by both gene redundancy, through increased copy number within a given gene family, and increased diversity of TTL gene families. Genetic redundancy is thought to increase the abundance of specific proteins acting against a limited number of molecular targets (Barve & Wagner, 2013; Jackson et al., 2016; Kafri, Springer, & Pilpel, 2009; Moran et al., 2008; Morgenstern & King, 2013; Nicosia et al., 2013; Wang, Yap, Chua, & Khoo, 2008), while the evolution of new gene families leads to the production of proteins with increased molecular target diversity (Casewell et al., 2013; Fry et al., 2009). This supports the idea that venomous animals are characterised by a high proportion of lineage-specific TTL genes (Casewell et al., 2013; Fry et al., 2009, 2003; Habermann, 1972; Olivera et al., 2012; Terlau & Olivera, 2004). This is evident in our results with lineage-specific genes contributing to almost half of the total TTL gene families identified across Metazoa. While lineage-specific TTL gene families are common in venomous species, our analysis indicates that they contribute less to the process of convergent amplification as they have lower copy number compared to widespread TTL gene families. The first convergent amplification of TTL gene families was found in phylum Cnidaria, the oldest extant venomous lineage. 
In line with previous studies, our analysis found multiple TTL gene families shared across all cnidarian species examined (Jaimes-Becerra et al., 2017; Rachamim et al., 2015). These shared TTL gene families have undergone significant and repeated gene duplication events, accounting for the majority of the TTL gene complement identified. While we observe lineage-specific variation of TTL gene complements within cnidarians, this is likely a consequence of gene loss or convergent recruitment, with a limited role of de novo gene formation. Such a pattern supports the hypothesis of convergent recruitment where the repurposing of pre-existing gene families into toxins plays a dominant role in the evolution of venomous lineages. In addition, our data support the notion that the repeated duplication and evolution of gene families plays an important role in the evolution of venom in cnidarians (Jaimes-Becerra et al., 2017; Jouiaei et al., 2015a; Moran et al., 2008; Rachamim et al., 2015).

Currently, there is no consensus about whether selective constraints act similarly on both lineage-specific and widespread TTL genes in ancient venomous taxa (Jouiaei et al., 2015a; Pineda et al., 2014; Ruder et al., 2013; Sunagar et al., 2013; Undheim et al., 2014a, 2014b). In actiniarians, we demonstrate that lineage-specific TTL genes evolve in a similar way to widespread TTL genes. In fact, purifying selection plays the dominant role in the evolution of both lineage-specific and widespread TTL gene families in this group (Figure 3.3). This extends the findings of Sunagar and Moran (2015) that TTL genes in ancient animal lineages are more likely to be under purifying selection. Their two-speed mode of evolution hypothesis suggests that toxin-encoding genes in younger venomous lineages evolve under positive selection, conferring an advantage following a shift in ecological niche. The evolution of venom composition in a number of lineages is thought to be dominated by ecological factors, such as prey type availability (Amazonas et al., 2018; Dowell et al., 2018; Gibbs & Mackessy, 2009; Mackessy, Sixberry, Heyborne, & Fritts, 2006; Sunagar, Morgenstern, Reitzel, & Moran, 2016). In our analysis of sea anemone TTL genes we found that the toxin gene complement is consistent with the relatedness of species: closely related species share a greater number of TTL genes than species that share an ecological niche. This suggests that phylogenetic inertia is potentially an important mechanism in the evolution and distribution of TTL genes in sea anemones.
Studies investigating the sequence variation of TTL gene families in cnidarians consistently report a similar pattern (Jouiaei et al., 2015a; Macrander et al., 2016a, 2015; Macrander & Daly, 2016b). This is in contrast to what has been observed in other venomous lineages, such as snakes, which show significant differences in their sequence variation and in TTL gene complement within and across lineages (Amazonas et al., 2018; Chippaux, Williams, & White, 1991; Dowell et al., 2018, 2016; Wooldridge et al., 2001). Many genes encoding toxins in snakes show strong evidence of positive selection, where non-synonymous mutations are thought to confer advantages to the venomous organism in their ecological niche (Gibbs et al., 2009; Gibbs, Sanz, Sovic, & Calvete, 2013). Given the strong evidence of positive selection acting on many genes encoding toxins in snakes (Sunagar et al., 2015), a dominant


role for ecology driving venom composition is possible. In contrast, most TTL genes in actiniarians display patterns of nucleotide variation consistent with the action of purifying selection and the toxin gene complement is related to phylogeny. This may mean that other molecular mechanisms, such as gene regulation that drives changes in gene expression, are important determinants of venom composition in these species. Interestingly, ecological factors, such as temperature, have been shown to impact expression of toxin genes in some sea anemones (O’Hara, Caldwell, & Bythell, 2018). While TTL gene families are under strong selective constraint in actiniarians, some members are highly expressed in a tissue-specific pattern, highlighting an alternative mechanism for a venomous lineage to generate different venom profiles to meet the organism’s ecological requirements (Ames & Macrander, 2016; Fingerhut et al., 2018; Gao et al., 2018; Hu, Bandyopadhyay, Olivera, & Yandell, 2012; Macrander et al., 2016; Modica et al., 2015; Walker et al., 2018).

3.4.2 Expression differences of TTLs and the production of multiple venoms

In sea anemones, venom peptides are restricted primarily to gland cells and nematocytes, a stinging cell type that contains the envenomation machinery (nematocyst) and is widely distributed throughout the cnidarian body (Fautin, 2009; Fautin et al., 1991; Kass-Simon et al., 2002; Moran et al., 2012a). Importantly, nematocysts show significant heterogeneity in their density and morphology across tissue types (Basulto et al., 2006; Ewer & Fox, 1947; Fautin, 2009; Fautin & Mariscal, 1991). Our results demonstrate that venom gene expression and protein localisation are consistent with changes in nematocyte populations across the three tissue types that use venoms for different functions. This data is consistent with recent observations that some venomous animals produce functionally distinct venoms used in predation and defence from two discrete venom glands or even regions of the same venom gland (Dutertre et al., 2014; Fingerhut et al., 2018; Gao et al., 2018; Hu et al., 2012; Modica et al., 2015; Walker et al., 2018). Our data indicates that sea anemone species probably produce at least three distinct venoms across the three tissue types we analysed. This is consistent with previous studies that have revealed dynamic spatiotemporal gene expression of toxins across both developmental stages and the whole body of actiniarians, with differences observable at the level of single cells. Taken together, these results support the evidence of the compartmentalisation of toxins to cells located within and


among different tissue types in cnidarians. We hypothesise that differences in regulatory variation drive the observed expression changes that underlie functionally distinct venom profiles among cells within an organism. In venomous taxa, novel morphological (venom gland) and genetic innovations (toxin genes) co-evolve to meet the ecological requirements of an organism (Dutertre et al., 2014; Undheim et al., 2015; Walker et al., 2018). In sea anemones, the functional and ecological roles of the morphological structures used for envenomation modulate the gene expression of toxins. This is evident with enzymatic toxins that have a role in protein, lipid, and carbohydrate metabolism, which are expressed in the mesenteric filaments, a morphological structure used for digestion and envenomation. Previous evidence supports this, with expression of PLA2 localised to the nematocytes in the mesenteric filaments, suggesting its role in both digestion and envenomation (Fautin et al., 1991; Schlesinger, Zlotkin, Kramarsky-Winter, & Loya, 2009). Additionally, we found that neurotoxins are principally expressed in the tentacles and acrorhagi of A. tenebrosa. This pattern of neurotoxin expression corresponds well with function as tentacles are used in prey capture and defence. In support of our data, Ate1a, a potassium channel neurotoxin which paralyses potential prey species, was found to be localised to nematocytes in the tentacles of A. tenebrosa (Madio et al., 2018). Previous studies have also identified toxins expressed in the acrorhagi of actiniarians. However, few studies have characterised the function of venoms restricted to this novel morphological structure (Honma et al., 2005). Since acrorhagi are solely used in intraspecific aggression, we hypothesise that the venom cocktail distinct to acrorhagi is specialised to envenomate other sea anemones (Honma et al., 2005). 
This is plausible given evidence of venom cocktails being prey-specific (Barlow, Pook, Harrison, & Wüster, 2009; Gibbs et al., 2009). Furthermore, the acrorhagi, which are an Actinioidea-specific morphological structure, show evidence of tissue-specific expression of Actinioidea-specific TTLs, specifically acrorhagin 1 and 2 and sea anemone type 3 (BDS-LIKE) KTx. We also provide evidence of the protein localisation of sea anemone type 3 (BDS-LIKE) KTx to the acrorhagi. This evidence supports the idea that even within a lineage that shares a common venomous ancestor, novel morphological and genetic innovations co-evolve to meet the ecological requirements of the organism. Whether differences in the combination of toxins in venom are associated with gene expression changes, or divergent selection regimes acting on changes in protein


sequence, is fundamental to understanding venom evolution (McLysaght & Hurst, 2016). The prevailing paradigm describes the evolution of venom based on divergent selective pressure on new protein variants generated through convergent amplification of TTL gene families, and driven by ecological pressures (Casewell et al., 2013; Fry et al., 2009; Sunagar et al., 2015). Contrary to this expectation, we find that convergent amplification works in concert with changes in gene expression levels to generate multiple distinct venom profiles within a single organism. Specifically, in sea anemones phylogeny is correlated with the toxin gene complement, and ecological factors help drive changes in the expression of toxin genes to produce functionally distinct venom profiles. Compartmentalisation of toxins across tissue types allows sea anemones to produce venoms with distinct biochemical and pharmacological properties that support the functional roles of the tissue types they are restricted to. Indeed, current evidence supports an additional level of complexity, with venoms localised to discrete cells, or populations of cells, within tissue types in sea anemones. Currently, it remains unresolved whether the localisation of functionally distinct venoms to discrete cells is a unique adaptive trait of sea anemones, with further evidence required to determine if this is shared across Cnidaria, or has independently evolved in other venomous lineages. Multiple lines of evidence indicate that the expression of TTL genes can be restricted spatiotemporally to produce functionally different venom cocktails to meet the ecological and life history requirements of the organism.

Acknowledgments: The authors would like to thank QUT Marine group for their help and advice caring for the animals. Computational resources and services used in this work were provided by the High Performance Computing and Research Support Group, Queensland University of Technology, Brisbane, Australia. Some of the data reported in this paper were obtained at the Central Analytical Research Facility operated by the Institute for Future Environments (QUT). This work was supported by QUT’s PhD Enabling Award, the Brazilian Government (Science Without Borders PhD scholarship to BM), Australian Research Council (DECRA Fellowship DE160101142 to EABU, ARC Linkage Grant LP140100832 to BRH and GFK), and National Health & Medical Research Council (Principal Research Fellowship APP1044414 to GFK).


Author Contributions: J.M.S., H.L.S., C.A.V.D.B., P.J.P. and A.P. collected organism samples. J.M.S., H.L.S. and C.A.V.D.B. assembled and annotated transcriptomes. Selection and phylogenetic analyses were performed by J.M.S. and P.J.P. Sequence validation was performed by H.L.S. and J.M.S. Mass spectrometry, venom extraction, HPLC, MALDI, LC-MS/MS and MSI were performed by B.M., E.A.B.U., G.F.K. and B.R.H. All authors read, edited and approved the final manuscript.

Data accessibility: Tissue-specific and ontogenetic RNA-seq data are available at the NCBI sequence read archive under the accession numbers SUB2040667 and SUB2043941, respectively. A description and overview of the project are available under the BioProject accession number PRJNA350366. A description of the TTL genes validated using Sanger sequencing can be found in Supplementary Table 14, and the list of primers used in Supplementary Table 13. Briefly, the GenBank accession numbers of these validated TTL genes are KY176759, KY176760, KY176761, KY176762, KY176763, KY176764, KY176765, KY176766, KY176768, KY176769, KY176770, and KY176771.


Chapter 4: The draft genome of Actinia tenebrosa reveals insights into the evolution of venom innovations

Joachim M. Surm1,2*, Zachary K. Stewart3,4, Alexie Papanicolaou5, Ana Pavasovic1, and Peter J. Prentis3,4

1 School of Biomedical Sciences, Faculty of Health, Queensland University of Technology
2 Institute of Health and Biomedical Innovation, Queensland University of Technology
3 School of Earth, Environmental and Biological Sciences, Science and Engineering Faculty, Queensland University of Technology
4 Institute for Future Environments, Queensland University of Technology
5 Hawkesbury Institute for the Environment, Sydney, NSW, Australia
* Correspondence: [email protected]


Statement of Contribution of Co-Authors for Thesis by Published Paper

The following is the format for the required declaration provided at the start of any thesis chapter which includes a co-authored publication.

The authors listed below have certified that:
1. they meet the criteria for authorship in that they have participated in the conception, execution, or interpretation, of at least that part of the publication in their field of expertise;
2. they take public responsibility for their part of the publication, except for the responsible author who accepts overall responsibility for the publication;
3. there are no other authors of the publication according to these criteria;
4. potential conflicts of interest have been disclosed to (a) granting bodies, (b) the editor or publisher of journals or other publications, and (c) the head of the responsible academic unit, and
5. they agree to the use of the publication in the student’s thesis and its publication on the QUT ePrints database consistent with any limitations set by publisher requirements.

In the case of this chapter: Publication title and date of publication or status: The draft genome of Actinia tenebrosa reveals insights into the evolution of venom innovations

Contributor: Statement of contribution

Joachim M. Surm: Wrote and edited the manuscript, conceived experimental design, performed DNA extraction, comparative genomics and phylogenetic analysis, collected organism samples, read and approved the final manuscript.

Zachary K. Stewart: Wrote and edited the manuscript, performed genome annotation, read and approved the final manuscript.

Alexie Papanicolaou: Wrote and edited the manuscript, performed genome assembly, read and approved the final manuscript.

Peter J. Prentis: Wrote and edited the manuscript, conceived experimental design, collected organism samples, read and approved the final manuscript.

Ana Pavasovic: Wrote and edited the manuscript, conceived experimental design, collected organism samples, read and approved the final manuscript.

Principal Supervisor Confirmation I have sighted email or other correspondence from all Co-authors confirming their certifying authorship. (If the Co-authors are not able to sign the form please forward their email or other correspondence confirming the certifying authorship to the RSC).

Name Signature Date Professor Louise Hafner


ABSTRACT

Sea anemones have a wide array of toxic compounds (toxin peptides found in their venom) which can be of use to the pharmaceutical industry. To date, the majority of studies characterising toxins in sea anemones have been restricted to species from the Actinioidea superfamily. No draft genomes are currently available for this superfamily, however, highlighting our limited understanding of the toxin-coding genes in this important group. Here we have sequenced, assembled and annotated the first Actinioidean draft genome for Actinia tenebrosa. The genome is estimated to be approximately 255 megabases, with 31,556 protein-coding genes. Quality (scaffold and contig N50) and completeness (BUSCO) metrics revealed that this draft genome matches the quality and completeness of other model cnidarian genomes, including Nematostella, Hydra, and Acropora. Phylogenomic analyses revealed strong conservation of the cnidarian and hexacorallian core-gene set. We further found that species-specific genes and gene families dominate the evolution of cnidarian genomes, undergoing significant expansion events compared to shared gene families. Enrichment analysis performed for both gene ontologies and protein domains revealed that toxin genes contribute a significant proportion of the species-specific genes and gene families in A. tenebrosa. The results make clear that the draft genome of A. tenebrosa will provide insight into the evolution of toxins, species-specific genes, and the cnidarian and hexacorallian core-gene set, and will provide an important resource for novel biological compounds.


4.1 INTRODUCTION

Cnidarian venom consists of a diverse array of peptides that have distinct biochemical and pharmacological properties (Jouiaei et al., 2015a, 2015b). These toxins are used for a variety of different roles, consistent with nematocyst morphology and function (Beckmann & Özbek, 2012; Fautin, 2009; Fautin & Mariscal, 1991; Kass-Simon & Scappaticci, Jr., 2002; Özbek, 2010). Multiple toxin types have been pharmacologically characterised in cnidarians, including neurotoxins, pore forming toxins, and enzymatic toxins (Casewell et al., 2013; Daly, 2016; Jouiaei et al., 2015a, 2015b; Prentis et al., 2018). Consistent with other venomous lineages, cnidarian venoms are a rich source of novel biological compounds, often being encoded by genes that lack homology to sequences outside Cnidaria (Moran et al., 2012b; Sebé-Pedrós et al., 2018b; Sunagar et al., 2018). Recent studies have revealed that a high frequency of cnidarian-specific genes are enriched within the cnidocyte (Sebé-Pedrós et al., 2018b; Sunagar et al., 2018). Many of these cnidarian-specific genes expressed in the cnidocytes encode toxin peptides (Columbus-Shenkar et al., 2018; Sebé-Pedrós et al., 2018b). This highlights that cnidarians possess both morphological and biochemical novelties, and that the evolution of these innovations may be related. This is consistent with studies showing that acrorhagin 1 and 2 are toxin-coding genes localised to the acrorhagi, a morphological structure used for envenomation that is unique to sea anemones from Actinioidea (Honma et al., 2005; Macrander et al., 2015). Indeed, understanding the evolution of venom and its delivery in cnidarians can provide broad insights into the innovation of morphological and biochemical novelties. While the majority of cnidarian toxin research has focussed on sea anemones from the Actinioidea superfamily (Prentis et al., 2018), no sequenced genomes for members of this superfamily currently exist. 
This lack of genomic resources available for Actinioidea limits our collective ability to understand the phylogenetic and molecular evolutionary histories of toxin-coding genes within this superfamily. Such a resource would provide an excellent model to investigate the evolution of novel morphological and cellular structures, and their relationship with novel genes. Actinia tenebrosa is a sea anemone from the superfamily Actinioidea. This species is similar in morphology to the northern hemisphere species, Actinia equina (Farquhar, 1898; Sherman et al., 2007; Watts et al., 2000), both of which have been


used as model organisms for the investigation of sea anemone toxins (Honma et al., 2005; Maček & Lebez, 1988; Minagawa et al., 2008; Moran et al., 2008; Norton, Maček, Reid, & Simpson, 1992; O’Hara et al., 2018; Prentis et al., 2018; Watts, Allcock, Lynch, & Thorpe, 2000). Here, we have sequenced and assembled the first Actinioidea draft genome, that of A. tenebrosa. This is an essential resource due to its key phylogenetic placement, providing a representative genome for all three major superfamilies in Actiniaria (Actinioidea, Metridioidea, and Edwardsioidea). We provide insights into the evolution of lineage-specific genes in cnidarians, specifically revealing that these novel genes undergo increased rates of expansion compared to gene families that have a wider distribution. Moreover, genetic innovations restricted to Actinioidea are found to be enriched for functions related to venom and its delivery.

4.2 METHODS

4.2.1 Genome assembly of Actinia tenebrosa

Sample preparation, sequencing and assembly

Samples of Actinia tenebrosa were collected from the intertidal zone at Coolum (QLD, Australia). Tissue from a single individual was used to extract high-quality gDNA using the E.Z.N.A. Mollusc DNA Kit (Omega Bio-Tek) (Stefanik, Wolenski, Friedman, Gilmore, & Finnerty, 2013). Extracted gDNA was used to construct four paired-end (PE) libraries with multiple insert sizes (170, 500, 2000, and 5000 bp), which were sequenced on the Illumina HiSeq 2500 platform with a read length of 100 bp (NCBI BioProject PRJNA505921). Sequencing resulted in over 150 million PE reads per library, with over 96 % being high-quality (Q > 30; ambiguous bases (N) < 1 %). Contiguous sequences were generated and scaffolded using a manual operation of ALLPATHS-LG (Butler et al., 2008) with a focus on removing redundant sequences. The presence of the complete mitochondrial genome of A. tenebrosa in the draft genome was investigated. Assembled contigs were queried using BLASTN against a database which consisted of the complete mitochondrial genome of A. equina. Contigs receiving a significant hit (e value 1e-05) were imported into Geneious 9.1.6 and aligned using a global alignment with free end gaps and 100 % identity. This resolved a single sequence of 20,691 bp, which was aligned to the complete mitochondrial genome of A. equina using eight iterations of MUSCLE. Gene order and annotation of the


mitochondrial genome of A. tenebrosa was performed as per Wilding and Weedall (2019).

4.2.2 Annotation

Repeat library generation

Homology and ab initio-based methods were used to identify repeat regions and low-complexity DNA sequences. Miniature Inverted-repeat Transposable Elements (MITEs) were predicted with MITE-HUNTER v.11-2011 (Han & Wessler, 2010) and detectMITE v.20170425 (Ye, Ji, & Liang, 2016). MITE predictions were clustered using CD-HIT v.4.6.4 (Fu et al., 2012) with the parameters “cd-hit-est -c 0.8 -s 0.8 -aL 0.99 -n 5” (the same parameters used by detectMITE). Prediction of long terminal repeat retrotransposons (LTR-RTs) was performed using LTRharvest (GT 1.5.10; Ellinghaus, Kurtz, & Willhoeft, 2008) and LTR_FINDER v.1.06 (Xu & Wang, 2007) and these results were combined using LTR_retriever commit 9b1d08d (Ou & Jiang, 2018) to identify canonical and non-canonical (i.e., non-TGCA motif) LTR-RTs. MITE and LTR-RT libraries were concatenated, and the genome sequence was masked using RepeatMasker open-4.0.7 (Smit et al., 2013) with settings ‘-e ncbi -nolow -no_is -norna’. De novo repeat prediction was performed using RepeatModeler open-1.0.10 (Smit et al., 2008) with the masked genome as input. All repeat models were curated to remove models putatively part of protein-coding genes. Any models confidently annotated by LTR_retriever or RepeatModeler (i.e., not classified as “Unknown”) were removed from consideration as they are not likely to be part of protein-coding genes. Open reading frames from the remaining repeat models were extracted and examined using HMMER 3.1b2 (Eddy, 2011) to identify models that only contained domains associated with transposable elements. For this purpose we collated a list of transposon-associated domains which primarily consisted of domains identified by Piriyapongsa et al. (2007) with additional Pfam (Finn et al., 2014) and NCBI CDD (Marchler-Bauer et al., 2015) domains included on the basis of manual inspection of domain prediction results for putative transposable elements. 
Repeat models that contained a TE-associated domain prediction were removed from consideration and assumed to be true-positives. A custom database of known genes was created to enable BLAST comparison of remaining repeat models


and subsequent removal of false predictions from protein-coding genes. The database includes the UniProtKB/Swiss-Prot proteins as well as the gene models of Nematostella vectensis (v.2.0) (Putnam et al., 2007; Schwaiger et al., 2014), Exaiptasia pallida (v.1.1) (Baumgarten et al., 2015), Acropora digitifera (v.0.9) (Shinzato et al., 2011), and Hydra vulgaris (Chapman et al., 2010). This database had probable transposons removed via the same process detailed above using HMMER 3.1b2 and domain organisation. Any remaining repeat models were removed from the initial custom repeat library (CRL) if they had significant BLASTX hits (e value < 1e-02) when queried against the gene model database. The final curated CRL was used to soft-mask the A. tenebrosa genome using RepeatMasker (-e ncbi -s -nolow -no_is -norna -xsmall) for later gene prediction. Scripts were produced to automate this process, and are available from https://github.com/zkstewart/Genome_analysis_scripts/tree/master/repeat_pipeline_scripts.
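The final filtering step described above (dropping repeat models that have a significant BLASTX hit against the gene model database) can be illustrated with a minimal sketch. It assumes the BLASTX results are in the standard 12-column tabular format (query ID in column 1, e value in column 11); the function names and the example records are hypothetical and are not taken from the published pipeline scripts.

```python
import csv
import io


def models_hitting_genes(blast_tabular, evalue_cutoff=1e-2):
    """Return IDs of repeat models with at least one hit below the e value cutoff.

    Assumes 12-column BLAST tabular output: column 1 is the query ID and
    column 11 is the e value.
    """
    flagged = set()
    for row in csv.reader(blast_tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[10]) < evalue_cutoff:
            flagged.add(row[0])
    return flagged


def curate_library(model_ids, flagged):
    """Keep only the repeat models that were not flagged as gene-like."""
    return [model for model in model_ids if model not in flagged]


# Hypothetical BLASTX records: rm_1 has a strong hit, rm_2 only a weak one.
example_hits = io.StringIO(
    "rm_1\tgeneA\t85.0\t120\t18\t0\t1\t360\t1\t120\t1e-30\t150\n"
    "rm_2\tgeneB\t30.0\t40\t28\t0\t1\t120\t5\t44\t0.5\t25\n"
)
flagged = models_hitting_genes(example_hits)
kept = curate_library(["rm_1", "rm_2", "rm_3"], flagged)
print(kept)
```

In this toy input only rm_1 falls below the 1e-02 threshold, so it is the only model removed from the library.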

Gene model prediction and annotation

Following the masking of repeat regions, gene models were predicted using ab initio methods guided by transcriptional evidence from RNA-seq reads. These reads included those from the Red and Brown ecotypes obtained from NCBI (Bioproject PRJNA313244; van der Burg et al., 2016). Raw reads were quality trimmed using Trimmomatic (Bolger et al., 2014) with parameters used by the Trinity de novo assembler (Haas et al., 2013; MacManes, 2014). Trimmed sequences were aligned against the genome using STAR 2.5 (commit e48567c) (Dobin et al., 2013) using the 2-pass procedure for the de novo identification of transcription splice sites. The SAM file produced by STAR was converted to BAM format and sorted using samtools v.1.5 (Li et al., 2009). Gene models were predicted by BRAKER1 v1.11 (Hoff, Lange, Lomsadze, Borodovsky, & Stanke, 2016) using the soft-masked genome assembly and the STAR alignment file as inputs. The completeness of the protein-coding genes was then assessed using BUSCO (Simão et al., 2015; Waterhouse et al., 2018). Gene models were annotated by querying models against the Uniclust90 database (Mirdita et al., 2017) using MMseqs2 with an e value < 1e-05 (Steinegger & Söding, 2017). Gene Ontology (GO) terms associated with the representative UniProtKB sequence for each Uniclust90 hit were attributed to the A. tenebrosa gene


model using the idmapping_selected.tab file provided by UniProtKB (file dated 26/10/17). Protein domain predictions were performed by HMMER 3.1b2 using a custom domain database, which included NCBI’s CDD in addition to CATH (S35 v.4.1.0) (Dawson et al., 2017) and SUPERFAMILY (1.75) (Gough, Karplus, Hughey, & Chothia, 2001), and tabulated using scripts available from https://github.com/zkstewart/Genome_analysis_scripts/tree/master/annotation_table.

4.2.3 Gene family evolution

Using translated gene models from Nematostella vectensis, Exaiptasia pallida, Acropora digitifera, Amplexidiscus fenestrafer, Discosoma sp., and Hydra vulgaris, an “all-against-all” BLASTP analysis (e value < 10e-5) was performed. ORTHOMCL version 2.0.9 (Li, Stoeckert, & Roos, 2003) was used, with default parameters, to assign proteins into orthologous gene groups. Phylogenetic analyses were performed using single-copy orthologs (SCO) for each species. A total of 1,314 SCO were identified and aligned using clustal-omega (Sievers et al., 2011). The alignments were concatenated and the best evolutionary protein model (JTT+F+I+G4) was determined. Finally, a maximum-likelihood tree with 1,000 ultrafast bootstrap replicates was generated using IQ-TREE (Nguyen et al., 2015). Following the generation of a cnidarian species tree, the gain and loss of gene families across Cnidaria was inferred using the DOLLOP program from the PHYLIP package version 3.696 (Felsenstein, 1989) (http://evolution.genetics.washington.edu/phylip.html). The species tree and a presence/absence matrix of gene families were imported into the DOLLOP program. The most parsimonious evolutionary scenario for the gain and loss of gene families was estimated using Dollo’s parsimony law, which assumes genes arise once on the evolutionary tree and can be lost independently in different evolutionary lineages (Farris, 1977). The predicted proteomes from cnidarian species with sequenced genomes were used to investigate the evolution of protein domains. Protein domains were predicted using HMMER 3.1b2 against the Pfam database (e value < 1e-05); the best hit was retained and overlapping domains were removed. A Fisher’s exact test was performed to determine Pfam enrichment, with a p-value threshold of 0.05. Finally, we investigated the proportion of shared and unique gene families in actiniarian species. A BLASTP analysis (e value < 1e-05) was performed with


OrthoVenn (Wang et al., 2015) using gene models from A. tenebrosa, N. vectensis and E. pallida to determine the number of shared and unique gene families in each species. The microsynteny of sea anemone type 3 (BDS-LIKE) potassium channel toxin (KTx) was investigated in A. tenebrosa. BLASTP was performed to identify sea anemone type 3 (BDS-LIKE) KTx candidate genes in the predicted proteome of A. tenebrosa against the manually curated Swiss-Prot database (accessed 18/04/18) (e value < 1e-05). Significant queries with top BLAST annotations from sea anemone type 3 (BDS-LIKE) KTx in the Tox-Prot database (Jungo et al., 2005) were considered candidate proteins. Candidate proteins were imported into Geneious 9.1.6 and the presence of a signal peptide was examined using SignalP (Petersen et al., 2011). The presence of conserved Pfam domains was confirmed using InterProScan (Jones et al., 2014). The presence of a conserved cysteine framework was confirmed by aligning candidates, using MUSCLE in Geneious 9.1.6, to full-length functionally characterised sea anemone type 3 (BDS-LIKE) KTx in the Tox-Prot database. Aligned sea anemone type 3 (BDS-LIKE) KTx sequences were used for phylogenetic analysis to determine their distribution among Actiniaria. The protein alignment was imported into IQ-TREE (v1.4.2) (Nguyen et al., 2015) to determine the best-fit model of protein evolution. Using the Bayesian information criterion, a PMB+G4 model was selected as the best-fit model of protein evolution. A maximum-likelihood tree was generated from the alignment using 1,000 ultrafast bootstrap iterations, and visualised using Figtree (v1.4.3) (http://tree.bio.ed.ac.uk/software/figtree/).
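The Dollo parsimony assumption used in this section (a gene family arises once and may then be lost independently in several lineages) can be illustrated with a minimal sketch. This is not a reimplementation of DOLLOP: it assumes a fixed rooted tree written as nested tuples, places the single gain at the most recent common ancestor of all species carrying the family, and counts a loss for every clade below that point in which the family is entirely absent. The five species labels and the tree topology are hypothetical.

```python
def leaves(node):
    """Return the set of leaf names under a node (leaf = str, internal node = tuple)."""
    if isinstance(node, str):
        return {node}
    names = set()
    for child in node:
        names |= leaves(child)
    return names


def dollo_events(tree, present):
    """Infer one gain and the implied losses for a single gene family.

    Returns (gain_clade, loss_clades), where each clade is a frozenset of
    leaf names. Returns (None, []) if no species carries the family.
    """
    present = set(present) & leaves(tree)
    if not present:
        return None, []

    # Descend to the smallest clade that still contains every carrier species.
    node = tree
    while not isinstance(node, str):
        containing = [child for child in node if present <= leaves(child)]
        if not containing:
            break
        node = containing[0]
    gain = frozenset(leaves(node))

    # Every maximal clade below the gain point with no carriers is one loss.
    losses = []

    def walk(n):
        tips = leaves(n)
        if not (tips & present):
            losses.append(frozenset(tips))
        elif not isinstance(n, str):
            for child in n:
                walk(child)

    walk(node)
    return gain, losses


# Hypothetical rooted tree of five species.
tree = ((("A", "B"), "C"), ("D", "E"))
gain, losses = dollo_events(tree, {"A", "C"})
print(sorted(gain), [sorted(clade) for clade in losses])
```

For a family present in species A and C only, the sketch places the gain on the ancestor of the clade containing A, B and C, and infers a single loss on the branch leading to B.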

4.3 RESULTS

4.3.1 Genome assembly

Using a whole-genome shotgun strategy, we sequenced and assembled the genome of Actinia tenebrosa. A total of 1.2 billion paired-end reads, with a length of 100 bp, were sequenced across four different insert size libraries (170, 500, 2000, and 5000 bp) (Supplementary Table 19). Raw reads were used to assemble the A. tenebrosa genome using ALLPATHS-LG. The genome size of A. tenebrosa is estimated to be ~255 Mbp (Supplementary Table 20). The draft genome assembled is of similar quality to other cnidarian genomes, particularly in terms of completeness (see Supplementary Table 21). Although the assembly resulted in scaffold and contig N50 values lower than those of


other cnidarian genomes, the predicted genome completeness using metazoan Augustus gene models is among the highest (89.6 %) for cnidarian genomes, with only N. vectensis having a more complete assembly (91.6 %). The assembly contains ~19 % repetitive DNA sequences, which is similar to reported values for other cnidarians (Supplementary Table 22).
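For readers unfamiliar with the N50 statistic referred to above, the following minimal sketch shows how it is conventionally computed from a list of contig or scaffold lengths: the sequences are sorted from longest to shortest, and N50 is the length of the sequence at which the running total first reaches half of the assembly size. The function name and the example lengths are illustrative only and are not values from the A. tenebrosa assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that takes the running total to >= 50 %.
        if running * 2 >= total:
            return length
    return 0


# Illustrative lengths only: total = 290, so half of the assembly is 145.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

With these example lengths the two longest sequences (80 and 70) together exceed half of the total, so the N50 is 70.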

4.3.2 Functional annotation of predicted gene models

The ab initio gene model prediction identified 31,556 protein-coding genes in A. tenebrosa. All gene models were validated, receiving significant BLAST hits against multiple A. tenebrosa transcriptomes. Our ab initio gene model prediction was highly complete compared to other cnidarian genomes, increasing the previous BUSCO score to 94.6 % (Table 4.1). Only E. pallida gene models were more complete (94.7 %). Of the 31,556 protein-coding genes, 19,022 and 25,478 returned a significant BLAST hit (e value 1e-05) against the Swiss-Prot and TREMBL databases, respectively. This highlights that ~80 % of the predicted proteome shares sequence similarity to known protein sequences, with ~20 % having no similarity to other proteins. In contrast, only 6.56 % of the E. pallida predicted proteome returned no hits to known proteins at this stringency. However, other cnidarian genomes returned similar levels of novelty, with Discosoma sp. having 16.17 % of proteins returning no hits. The annotation of protein domains revealed that 19,056 (~60 %) gene models contain identifiable Pfam domains. This is less than other sea anemone genomes, with 78.64 % and 68.35 % of E. pallida and N. vectensis gene models having a protein domain, respectively. Additionally, both corallimorpharian genomes had fewer than 60 % of gene models encoding proteins with known protein domains. Taken together, these results highlight that the draft genome of A. tenebrosa is mostly complete, yet a significant proportion of its genes are unique.

Table 4.1. Functional annotation of gene models from seven cnidarian genomes

Annotation metrics            ADIG    AFEN    ATEN    DSPP    EPAL    HVUG            NVEC
BUSCO (%)                     80.5    72.8    94.6    68.6    94.7    91.5            93.8
Protein-coding genes        33,878  21,372  31,556  23,199  26,087  21,990          24,780
SP annotation               24,094  12,959  19,022  13,562  20,515  15,923          18,974
SP annotation (%)            71.12   60.64   60.28   58.46   78.64   72.41           76.57
No SP annotation (%)         28.88   39.36   39.72   41.54   21.36   27.59           23.43
TREMBL annotation           30,116  18,106  25,478  19,447  24,376  19,992  24,780/*20,869
TREMBL annotation (%)        88.90   84.72   80.74   83.83   93.44   90.91    100.00/84.21
No TREMBL annotation (%)     11.10   15.28   19.26   16.17    6.56    9.09         0/15.78
Pfam                        24,000  12,686  19,056  13,283  20,514  15,665          16,938
Pfam annotated (%)           70.84   59.36   60.39   57.26   78.64   71.24           68.35
No Pfam annotated (%)        29.16   40.64   39.61   42.74   21.36   28.76           31.65
Total Pfam found            52,242  27,154  42,834  27,355  45,944  28,984          30,605
Pfam per gene                 1.54    1.27    1.36    1.18    1.76    1.32            1.24

* As the predicted proteome of N. vectensis is incorporated into the TREMBL protein database, a subset of the TREMBL database with N. vectensis predicted proteins removed was used instead.
SP = Swiss-Prot. ADIG = Acropora digitifera, AFEN = Amplexidiscus fenestrafer, ATEN = Actinia tenebrosa, DSPP = Discosoma sp., EPAL = Exaiptasia pallida, HVUG = Hydra vulgaris, NVEC = Nematostella vectensis.

Our assembly also resolved the complete mitochondrial genome of A. tenebrosa (GenBank accession MK291977), which is 20,691 bp long (Supplementary Figure 8). The mitochondrial genome of A. tenebrosa was aligned to the recently completed A. equina mitochondrial genome (Wilding et al., 2019), revealing identical gene order and protein-coding sequence similarity. Nucleotide differences between the mitochondrial genomes of A. tenebrosa and A. equina included a thymine insertion in the intergenic region between genes ND6 and CYTB in A. tenebrosa, a transversion SNP in the large RNA subunit, and a transition SNP in the intergenic region between the COIII and COI genes.

4.3.3 Gene family evolution

Cnidarian gene turnover was investigated through manual curation and phylogenomic characterisation of seven cnidarian species. OrthoMCL identified 1,314 single-copy orthologs (SCO), and from these we built a cnidarian species tree for all seven genomes (Figure 4.1). The position of A. tenebrosa in this species tree is consistent with previously published species trees (Daly et al., 2017; Rodríguez et al., 2014; Wang et al., 2017). We found that 7,373 gene families were shared among all cnidarian taxa investigated, and that 7,026 gene families were gained in the Anthozoa following their divergence from the lineage represented by H. vulgaris. In the actiniarian lineage (which includes A. tenebrosa, E. pallida, and N. vectensis), 1,389 and 185 gene families were gained and lost, respectively. Examination of the genome of A. tenebrosa found that 947 gene families (3,963 genes) were gained in this species following divergence from the other sea anemone taxa investigated. In all cnidarians, species-specific gene families have undergone a greater expansion than gene families shared among cnidarians (Table 4.2). This is most apparent in A. tenebrosa and H. vulgaris, with species-specific gene families having a mean copy number of 4.18 and 4.99 genes, respectively. Additional novelty is observed in the 6,705 (21.25 %) singletons (species-specific genes not in gene families) found in the gene models of A. tenebrosa. These results suggest significant gene family conservation across cnidarians, particularly in anthozoans, but with species-specific genes contributing a significant proportion of the genome.

Figure 4.1. Comparative analysis of gene families within Cnidaria. Maximum Likelihood protein tree generated to determine cnidarian phylogeny, with all bootstrap support equal to 100 %. TTL gene family gains (green) and losses (red) are represented above and below branches, respectively. ADIG = Acropora digitifera, AFEN = Amplexidiscus fenestrafer, ATEN = Actinia tenebrosa, DSPP = Discosoma sp., EPAL = Exaiptasia pallida, HVUG = Hydra vulgaris, NVEC = Nematostella vectensis.

Table 4.2. Expansion of shared and species-specific gene families in cnidarians

Metric                                  ADIG    AFEN    ATEN    DSPP    EPAL    HVUG    NVEC
Total genes                           33,878  21,372  31,556  23,199  26,087  21,990  24,780
Singletons                             4,053   5,261   6,705   5,752   2,590   2,800   5,492
Singletons (%)                         11.96   24.62   21.25   24.79    9.93   12.73   22.16

Total gene families                   14,285  13,279  15,576  13,306  14,501   8,666  13,323
Total genes in gene families          29,825  16,111  24,851  17,447  23,497  19,190  19,288
Expansion                               2.09    1.21    1.60    1.31    1.62    2.21    1.45

Species-specific gene families         1,210     279     947     496     602   1,293   1,037
Species-specific gene families (%)      8.47    2.10    6.08    3.73    4.15   14.92    7.78
Species-specific genes                 4,238     659   3,963   1,232   1,830   6,451   3,447
Expansion                               3.50    2.36    4.18    2.48    3.04    4.99    3.32

Shared gene families                  13,075  13,000  14,629  12,810  13,899   7,373  12,286
Shared gene families (%)               91.53   97.90   93.92   96.27   95.85   85.08   92.22
Shared genes                          25,587  15,452  20,888  16,215  21,667  12,739  15,841
Expansion                               1.96    1.19    1.43    1.27    1.56    1.73    1.29

ADIG = Acropora digitifera, AFEN = Amplexidiscus fenestrafer, ATEN = Actinia tenebrosa, DSPP = Discosoma sp., EPAL = Exaiptasia pallida, HVUG = Hydra vulgaris, NVEC = Nematostella vectensis.
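The "Expansion" rows in Table 4.2 appear to correspond to the mean number of genes per gene family (genes divided by families). The Python sketch below reproduces the reported values for two species under that assumption; the function name `expansion` and the dictionary layout are hypothetical, and only the counts come from the table.

```python
def expansion(genes: int, families: int) -> float:
    """Mean number of genes per gene family, rounded to two decimals."""
    return round(genes / families, 2)


# (genes, families) pairs for A. tenebrosa and H. vulgaris, taken from Table 4.2.
table_4_2 = {
    "ATEN": {"species-specific": (3_963, 947), "shared": (20_888, 14_629)},
    "HVUG": {"species-specific": (6_451, 1_293), "shared": (12_739, 7_373)},
}

for species, groups in table_4_2.items():
    for group, (genes, families) in groups.items():
        value = expansion(genes, families)
        print(f"{species} {group}: {value:.2f} genes per family")
```

Under this reading, the contrast drawn in the text is simply that the species-specific ratio exceeds the shared ratio in every species.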

A closer examination of gene families within Actiniaria revealed 10,260 orthologs shared across the three actiniarian genomes investigated (Figure 4.2). These 10,260 actiniarian orthologs, however, do not exhibit any species-specific GO term enrichment. Five GO terms, including nematocyst (GO:0042151; Supplementary Table 23), were over-represented in the predicted protein sequences from the 1,208 genes unique to A. tenebrosa. This highlights that a significant proportion of the genes unique to A. tenebrosa have roles related to envenomation. Therefore, although all actiniarians are venomous, we observe the first expansion of lineage-specific genes related to venom delivery.

Figure 4.2. Comparative analysis of gene families among Actiniarians. Venn diagram highlighting orthologous genes between Actiniarian genomes. ATEN = Actinia tenebrosa, EPAL = Exaiptasia pallida, NVEC = Nematostella vectensis.

To better understand the evolution of protein domains across cnidarian genomes, we also investigated Pfam domain enrichment. Using Fisher's exact test, 25 Pfam domains were found to be significantly enriched in A. tenebrosa in comparison to other cnidarian genomes (Figure 4.3). Enrichment of ShK and Defensin_4 domains underpinned much of the expansion of toxin-related genes in A. tenebrosa. Both ShK and Defensin_4 domains are associated with potassium channel-blocking toxins in sea anemones, specifically sea anemone type 1 potassium channel toxin (KTx) and type 3 (BDS-LIKE) KTx, respectively (Castañeda et al., 1995).
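To illustrate how a domain enrichment test of this kind works, the sketch below implements a one-sided Fisher's exact test on a 2 x 2 table using only the Python standard library. It is a minimal sketch under stated assumptions: the text does not specify whether a one- or two-sided test or a multiple-testing correction was applied, the function name `fisher_exact_greater` is hypothetical, and the counts in the usage example are invented for illustration and are not data from this study.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test p-value for enrichment.

    Contingency table layout:
                        focal genome   background genomes
        domain X             a                 c
        other domains        b                 d

    Returns the hypergeometric probability of observing a count of
    domain X in the focal genome that is greater than or equal to a.
    """
    domain_total = a + c   # all occurrences of domain X
    focal_total = a + b    # all domain occurrences in the focal genome
    n = a + b + c + d      # all domain occurrences
    upper = min(domain_total, focal_total)
    favourable = sum(
        comb(domain_total, k) * comb(n - domain_total, focal_total - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(n, focal_total)


# Hypothetical counts, for illustration only (not taken from this study):
# 12 copies of a domain among 100 domains in the focal genome versus
# 10 copies among 400 domains in the background genomes.
p_value = fisher_exact_greater(12, 88, 10, 390)
print(f"one-sided p-value: {p_value:.3g}")
```

In practice one such test would be run per Pfam domain, so some correction for multiple comparisons would usually be applied to the resulting p-values.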

Figure 4.3. Protein domain enrichment across Cnidaria. Heat map of Pfam domains enriched in Actinia tenebrosa. Abundance of Pfam domains in cnidarians is log2-transformed and median-centred. ADIG = Acropora digitifera, AFEN = Amplexidiscus fenestrafer, ATEN = Actinia tenebrosa, DSPP = Discosoma sp., EPAL = Exaiptasia pallida, HVUG = Hydra vulgaris, NVEC = Nematostella vectensis.

We identified multiple copies of sea anemone type 3 (BDS-LIKE) KTx in the genome of A. tenebrosa. Of these, only four received a significant BLAST hit (e-value cutoff 1e-05) to functionally characterised sea anemone type 3 (BDS-LIKE) KTx and also contained a signal peptide, a Defensin_4 domain and the conserved cysteine framework. All four genes are found on different scaffolds and contain two exons and a single intron (Figure 4.4 A). The intron makes up the majority of the gene sequence, accounting for up to 86 % of its length. The exons of two copies are encoded on the reverse strand. In all copies the first, smaller exon encodes the signal peptide and the second, larger exon encodes the Defensin_4 domain (Figure 4.4 B). Phylogenetic analysis revealed that copies 1 (CDS ID: g5504.t1) and 2 (CDS ID: g15111.t1) clustered together, as did copies 3 (CDS ID: g19671.t1) and 4 (CDS ID: g28928.t1) (Supplementary Figure 8). All four copies clustered together in a broad clade that also contains sequences from Anemonia viridis and Antheopsis maculata. In general, sequences from the same species appear to cluster together.

Figure 4.4. Microsynteny of sea anemone type 3 (BDS-LIKE) potassium channel toxin (KTx) copies in Actinia tenebrosa. A) Intron-exon structure of sea anemone type 3 (BDS-LIKE) potassium channel toxin. Exon 1 is coloured red, exon 2 blue, and the intron grey. Arrows depict strand directionality and the scale bar represents 100 nucleotides. B) Protein alignment of sea anemone type 3 (BDS-LIKE) potassium channel toxin, with protein sequences annotated with the signal peptide (yellow) and Pfam domain (purple). The coloured box surrounding the sequences corresponds to the exon number as per A.

4.4 DISCUSSION

In this manuscript, we present a draft genome assembly of A. tenebrosa. This assembly is the first from any species of the superfamily Actinioidea. Overall, the assembly was of similar quality (scaffold and contig N50) and completeness (BUSCO) to currently published anthozoan genomes (Baumgarten et al., 2015; Chapman et al., 2010; Putnam et al., 2007; Shinzato et al., 2011; Wang et al., 2017), verifying its suitability for comparative genomic studies. Insights into the evolution of gene families across Cnidaria revealed significant conservation among anthozoan species, with many gene families gained in the last common ancestor of either Cnidaria or Anthozoa. Notably, all anthozoans used in this study are from Hexacorallia, highlighting a high conservation of gene families shared among this subclass. This is consistent with previous studies suggesting that this shared gene set plays an important role in the evolution of traits essential to Hexacorallia taxa, including symbiosis with dinoflagellates, stress response, and delivery of venom (Baumgarten et al., 2015; Rachamim et al., 2015; Wang et al., 2017). Furthermore, our investigation of sequenced cnidarian genomes is consistent with other studies, revealing a high proportion of lineage-specific genes not found in other groups (Baumgarten et al., 2015; Putnam et al., 2007). These results indicate that sea anemones are a useful model system to study the evolution of novel genes.

The origin of new genes is considered to be an important source of evolutionary novelty, providing the substrate upon which natural selection can act. New genes may be formed through multiple processes, ranging from gene duplication through exon shuffling to de novo gene formation (Kaessmann, 2010; McLysaght et al., 2016; Tautz et al., 2011). These processes either produce copies of a gene that are identical to the ancestral sequence or generate genes with novel sequences that are restricted to specific lineages (Capra et al., 2010). Our analysis revealed that lineage-specific gene families undergo increased rates of gene duplication compared to gene families shared among cnidarian species. This suggests that, following the formation of new genes in cnidarian taxa, repeated duplication events occur. However, it also suggests that few new genes arise through de novo gene evolution in cnidarians, as genes generated through this mechanism have been reported to undergo limited gene duplication (Schlötterer, 2015).

We propose that the major contributor to the evolution of new genes in cnidarians is a process of gene duplication and divergence. A similar mechanism is observed for the antifreeze proteins (AFPs), which show a limited distribution, being restricted to Lycodichthys dearborni (Deng, Cheng, Ye, He, & Chen, 2010). These AFPs evolved from sialic acid synthase genes, which are found in the genomes of closely related fish. More than 30 duplicated copies have accumulated point mutations that contribute to ice binding, together with a secretion signal at the amino terminus that directs extracellular secretion (Andersson, Jerlström-Hultqvist, & Näsvall, 2015; Deng et al., 2010). Another example of massive duplication of lineage-specific genes is neurotoxins in scorpions. Fifty-one neurotoxin genes are arranged across 17 scaffold clusters, many of which have undergone tandem duplication of the same family. Moreover, these massively duplicated neurotoxins lack homologs in species other than scorpions (Cao et al., 2013). Significant expansions of neurotoxins are also observed in A. tenebrosa. This was evident from the significant expansion of Pfam domains (ShK and Defensin_4) associated with neurotoxins that modulate potassium ion channels. The Defensin_4 domain is associated with the sea anemone type 3 (BDS-LIKE) potassium channel toxin family, and both the gene family and protein domain are restricted to Actinioidea (Diochot, Schweitz, Béress, & Lazdunski, 1998).

Sea anemones, and in particular species from the genus Actinia, are an important group used to understand the evolution of venoms. Our results highlight this potential, as gene families encoding peptide toxins are enriched in A. tenebrosa relative to other sea anemone species. For example, genes involved in venom production (toxin peptides) or delivery (cnidocyte) are associated with the nematocyst GO term, which is significantly over-represented in the gene families restricted to A. tenebrosa.
This GO term was not enriched for toxin genes restricted to N. vectensis or E. pallida. This result may be a consequence of ascertainment bias, however, as the majority of toxins characterised in actiniarians have been identified in the superfamily Actinioidea (206 of the 236 cnidarian toxins; Prentis et al., 2018). Another confounding factor is that the analysis does not include another Actinioidea species, so the high proportion of species-specific genes may in fact include genes that were gained in Actinioidea. This is evident for acrorhagin 1 and 2, which were found as species-specific genes in our analysis but were identified in the previous chapter in multiple closely related species from the superfamily Actinioidea (Macrander et al., 2015). It is therefore likely that genes identified as species-specific are in fact restricted to a specific lineage. Greater resolution of bona fide species-specific gene families will be possible as additional cnidarian genomes are sequenced.

The A. tenebrosa genome is the most gene-dense among cnidarian genomes, with only E. pallida having a smaller genome and only A. digitifera having more protein-coding genes. However, flow cytometry revealed that the genome size of Actinia equina, at ~520 Mb, is much larger than predicted here (Adachi, Miyake, Kuramochi, Mizusawa, & Okumura, 2017). Given the significant similarity between A. equina and A. tenebrosa, it is likely that they have similar genome sizes. The relatively complete BUSCO score for the gene models suggests that much of the gene set has been captured; the discrepancy observed between genome sizes may therefore be the result of repeat regions that have not been captured.

The A. tenebrosa genome also contained a higher proportion of lineage-specific genes than other cnidarian genomes. Previous studies have identified this pattern in species from the superfamily Actinioidea, particularly for genes that encode toxin peptides. Recent studies have shown that there is relatively little overlap of toxin genes among cnidarian species and that a high proportion are restricted to specific lineages (Rachamim et al., 2015). Many lineage-specific toxins from A. tenebrosa have expression restricted to acrorhagi, a novel structure used for envenomation (Surm et al., 2019). These data suggest the hypothesis that novel genes are expressed in novel morphological structures. Evidence in support of this hypothesis in other cnidarian species is equivocal. For example, although Nematostella-specific genes comprise a significant proportion of the genes expressed in the nematosomes, a novel structure found only in this genus, many of these genes were also expressed in tissues common to all sea anemone species (Babonis et al., 2016).
In venomous animals, biochemical and morphological innovations result in phenotypic adaptations, such as toxin peptides and an envenomation system. Although the cnidarian envenomation system is largely conserved across this phylum, this analysis revealed that the duplicated gene families enriched in A. tenebrosa include many nematocyte-related proteins, such as toxin peptides. Therefore, we propose that the genome sequence of A. tenebrosa will aid future research to improve our understanding of Actinioidean innovations involved in venom production and its delivery.

Acknowledgments: The authors would like to thank the Evolutionary and Physiological Genomics Lab (ePGL), in particular Chloé A. van der Burg, for their continual help and support. The authors would also like to thank QUT Marine group for their help and advice caring for the animals. The authors would like to acknowledge QUT Molecular Genetics Research Facility for the use of their facilities and the Hawkesbury Institute for the Environment for computational resources. The data reported in this paper were generated at the Central Analytical Research Facility operated by the Institute for Future Environments. Computational resources and services used in this work were provided by the High Performance Computing and Research Support Group, Queensland University of Technology, Brisbane, Australia. We declare there are no conflicts of interest.

Author contributions: All authors conceived and designed the project. JMS, PJP, and AnP collected organism samples. JMS performed DNA extraction. AlP assembled the genome, and ZKS annotated the genome. JMS performed comparative genomics and phylogenetic analysis. JMS led the drafting of the manuscript with contributions from all authors. All authors read and approved the final version.

Data accessibility: A description and overview of the project are available under the BioProject accession number PRJNA505921. A description of the complete mitochondrial genome is available through GenBank accession number MK291977.

Chapter 5: General discussion

This project aimed to combine functional genomic and bioinformatic approaches to understand the origin and evolution of novel genes in phylum Cnidaria. Genes can originate through multiple mechanisms such as duplication, horizontal gene transfer, exon shuffling, gene fusion/fission, and de novo gene formation. As new genes arise, there is the potential for new innovations, including both morphological and biochemical novelties. The previous research chapters have identified different mechanisms that generated novel genes, as well as the different selection pressures acting on them. These combined results help to increase our understanding of the origin and evolution of novel genes. In this chapter, I summarise the results from the three independent studies, compare them with previous studies on other species, and discuss how these results help to better understand novel gene evolution and its consequences.

In the first study, I investigated the Fad and Elovl gene families. Research into these gene families, which desaturate and elongate LC-PUFAs, has largely been restricted to bilaterian taxa (Carmona-Antoñanzas et al., 2011; Castro et al., 2012; Fonseca-Madrigal et al., 2014; Kabeya et al., 2017; Li et al., 2017, 2010; Mohd-Yusof et al., 2010; Monroig et al., 2017, 2012a, 2016, 2012b, 2013, 2011b, 2010; Morais et al., 2009; Surm et al., 2015). Here I provide insights into the phylogenetic and molecular evolutionary histories of the Fad and Elovl gene families in Cnidaria. My phylogenetic analysis revealed lineage-specific gene duplication in actiniarians for both the Fad and Elovl gene families. Furthermore, two additional Elovl clades were identified, one actiniarian-specific (Novel ElovlA) and the other cnidarian-specific (Novel ElovlB), neither of which clustered with any functionally characterised Elovl proteins. The selection analysis revealed pervasive purifying selection to be acting on both gene families.
However, following repeated gene duplication events in the Elovl gene family, some codons show patterns of nucleotide variation consistent with the action of episodic diversifying selection. The adaptive evolution of these codons may underlie the ability of Elovl enzymes to target and elongate different precursor fatty acids. Overall, this study has revealed that actiniarians possess the Fad and Elovl genes required for the biosynthesis of some LC-PUFAs, and that these genes appear to be distinct from those of bilaterians.

Previous comparative genomic studies of venom evolution have been restricted to specific TTL gene families within taxa (Sunagar et al., 2015), or have had limited taxonomic representation (Casewell et al., 2012, 2013; Fry et al., 2009). In the second study, I examined gene copy number, patterns of molecular evolution, as well as expression patterns and protein localisation across toxin gene families in phylum Cnidaria. I demonstrate that toxin gene copy number varies widely within this phylum, and that different toxin gene families are expanded in different orders. In sea anemones, multiple toxin gene families have undergone a process of convergent amplification in species from different superfamilies, and lineage-specific toxin genes comprise a large proportion of the toxin profile. My results indicate that lineage-specific toxin genes evolve in a similar manner to more ancestral toxin genes, but that specific genes within toxin gene families show strong patterns of tissue-restricted expression. These gene expression data are consistent with the whole-body distribution of the venom delivery system in cnidarians.

Prior to this work, no draft genomes were available for members of the superfamily Actinioidea. This is significant as the majority of work characterising toxins in sea anemones has been restricted to this superfamily, and it highlights a lack of the genomic resources required for comparative studies of toxin evolution. Here I have sequenced, assembled and annotated the first draft genome from Actinioidea. Quality metrics revealed that this draft genome is of similar quality and completeness to other cnidarian genomes. Comparative genomic analysis revealed that a large proportion of gene families are shared among cnidarians, specifically revealing 7,026 gene families gained in the last common hexacorallian ancestor.
Analysis of gene family evolution across Cnidaria revealed multiple protein domains enriched in A. tenebrosa, including those known to be associated with neurotoxins (ShK and Defensin_4 domains). Furthermore, gene families restricted to A. tenebrosa are enriched for the nematocyst GO term, highlighting that lineage-specific differences in A. tenebrosa are related to venom and its delivery. My results also revealed that gene families restricted to particular cnidarian lineages undergo increased duplication rates compared to gene families that are shared. Overall, this genome constitutes an important genomic resource for future comparative studies and will provide insights into the evolution of venoms and of sea anemones in general.

These three studies have revealed insights into the evolution of genes and gene families in cnidarians. Lineage-specific genes appear to play a major role in contributing to the diversity of the gene families in cnidarian genomes, but contribute less to the total gene complement. Concomitantly, lineage-specific duplications play a major role in the evolution of cnidarian genomes. My analyses also revealed that purifying selection dominates the evolution of both lineage-specific and widespread gene families in Cnidaria, with some evidence for episodic diversifying selection acting on codons in paralogs that have undergone multiple duplication events. While gene families are under strong selective constraint in cnidarians, some members of gene families are highly expressed in a tissue-specific pattern. This tissue-specific expression coincides with the functional roles of different morphological structures in sea anemones, allowing gene expression to meet the ecological and life history requirements of these organisms.

5.1 LINEAGE-SPECIFIC DUPLICATIONS

Gene duplication has consistently been shown to be the major contributing mechanism in the evolution of new genes. The canonical model of gene evolution by duplication was proposed by Ohno (Ohno, 1970) and, in the post-genomic era, has been shown to occur in gene families across all domains of life. Following gene duplication, multiple evolutionary outcomes can occur, including neofunctionalisation, subfunctionalisation or pseudogenisation. The most common outcome is an accumulation of deleterious mutations, resulting in a loss of function. Although comparatively rare, beneficial mutations can also accumulate, which can result in a new gene with a new function. Duplication, mutation and subsequent positive selection have the ability to generate gene products with novel functions. For example, in primates, eosinophil cationic protein and eosinophil-derived neurotoxin are paralogs of the ribonuclease gene family and have divergent functions, which is the result of positive selection driving nonsynonymous nucleotide substitutions to accumulate (Zhang, Rosenberg, & Nei, 1998). Similar patterns have been observed in many other gene families after lineage-specific duplications (e.g. lungfish myoglobins), which occur when a duplication event is unique to a given lineage. Lineage-specific duplicates can provide an important source of genes with a novel function. Indeed, in mammals, a high proportion (~10 %) of lineage-specific duplicates show patterns of nucleotide variation consistent with the action of diversifying selection (Han, Demuth, McGrath, Casola, & Hahn, 2009). Examples of human gene families evolving under positive selection include genes involved in neurotransmission (RIMBP3B, RIMBP3C), as well as genes involved in the immune response and the response to inflammation or stress (Han et al., 2009). Such lineage-specific duplications appear to be a consistent driver of the evolution of new genes in Cnidaria as well, and were observed in all studies in this thesis.

Lineage-specific duplications dominate cnidarian genome evolution. This is evident from the results presented in all data chapters, as well as the patterns observed in other studies investigating gene family evolution in cnidarians (Shpirer, Diamant, Cartwright, & Huchon, 2018; Smith et al., 2018; van der Burg et al., 2016). An excellent example is a recent study by Sunagar et al. (2018), which found a lineage-specific duplication of the proto-oncogenic transcription factors c-Jun and c-Fos in Hexacorallia. The c-Jun and c-Fos transcription factors dimerize into a complex essential to various stress responses (Elran et al., 2014; Hess, Angel, & Schorpp-Kistner, 2004; Meng & Xia, 2011), but the duplicated hexacorallian c-Jun and c-Fos are essential in nematogenesis, affecting cnidocyte morphology and density in sea anemones. This lineage-specific duplication has resulted in the neofunctionalisation of these transcription factors. Similar patterns were observed in an anthozoan-specific expansion of Toll/interleukin-1 receptor (TIR) proteins (van der Burg et al., 2016) and in myxozoan copies of nematocyst-specific protein 2 (Shpirer et al., 2018).
In cnidarians, neofunctionalisation driven by positive selection is possible following lineage-specific duplications; however, pervasive purifying selection has been shown to dominate the evolution of these and other gene families. Potential lineage-specific duplications of TTL genes can occur even within a species and, in cases where purifying selection is strong, can result in extreme conservation among members of a multigene toxin family. This pattern has been observed for some neurotoxin gene families in other sea anemone species (Madio et al., 2018; Moran et al., 2008). In these examples, concerted evolution has been used to explain this pattern, where genes within a family evolve as a single unit. I believe the Birth-and-Death Model of Evolution and strong purifying selection, however, may better explain sequence conservation in many of the multigene toxin families I investigated (Nei & Rooney, 2005). Under this model, new genes are generated by gene duplication, with some duplicates being retained, whereas others are pseudogenised through deleterious mutations. The Birth-and-Death Model of Evolution and strong purifying selection better explain the lower level of protein homogeneity I observed across duplicates in my studies and in many other recent studies in cnidarians.

The evolutionary innovations that follow gene duplication are not always reliant on dramatic divergence of protein function, but in some cases may result from changes in gene expression. Subfunctionalisation can explain these changes in expression restricted to different tissues or developmental stages, providing a new function from the same protein. In cnidarians, lineage-specific gene duplicates exhibit spatiotemporal gene expression patterns, which is a key characteristic of subfunctionalisation. This is evident in the expression of toxins within multiple tissues used in envenomation, such as acrorhagi, mesenteric filaments and tentacles. Similar patterns have also been observed for toxins across the complex life cycle of N. vectensis (Columbus-Shenkar et al., 2018). Furthermore, lineage-specific expansions of globin genes were identified in phylum Cnidaria, which were the result of repeated rounds of gene duplication (Smith et al., 2018). In actiniarians, this expansion was followed by subfunctionalisation that has resulted in tissue- and developmental stage-specific expression patterns, with predicted structural and functional protein variation following one of these duplication events. Consequently, from my results and supporting literature, lineage-specific duplication, purifying selection and subfunctionalisation seem to dominate the evolution of cnidarian gene families and play an important role in lineage-specific innovations within this group.
In fact, my data support the notion that the repeated duplication and evolution of gene families play an important role in the evolution of venom in cnidarians (Jaimes-Becerra et al., 2017; Jouiaei et al., 2015a; Moran et al., 2008; Rachamim et al., 2015).

5.2 NOVEL GENES

Lineage-specific genes can play a major role in the evolution of genomes, being implicated in the generation of novel traits by providing additional genetic diversity. Following the divergence from Metridioidea, sea anemones from Actinioidea underwent novel genetic innovations which also coincided with morphological and biochemical innovations. Specifically, my analysis of the genome of A. tenebrosa revealed a higher proportion of lineage-specific genes compared to other cnidarian genomes. Furthermore, putative lineage-specific gene families among cnidarians have undergone increased expansions compared to those that are shared. This pattern indicates that both new gene formation and subsequent gene duplication events occur at a high frequency in cnidarian species. Recently, it has been shown in mammals that novel ORFs generated through de novo gene formation, or from non-coding sequence, are pervasive but are rapidly lost (Schmitz et al., 2018). Comparatively, novel ORFs generated from the divergence of coding sequences have an increased likelihood of being retained (Schmitz et al., 2018). Taken together with my results, I suggest that, given the long evolutionary histories separating cnidarian lineages, the novel genes identified are more likely to have arisen from coding sequences. Such genes would be more likely to be retained than those generated through de novo gene formation. These retained novel genes can then undergo repeated rounds of duplication. This is consistent with studies of de novo gene formation, which indicate that de novo genes undergo limited duplication events (McLysaght et al., 2016; Schlötterer, 2015). An example of this can be observed in genes encoding sea anemone toxins. Specifically, the sea anemone type 3 (BDS-LIKE) KTx is a gene family unique to Actinioidea that is reported to have evolved from the sea anemone NaTx family through a process of gene duplication and positive selection (Jouiaei et al., 2015a). These two toxin families show limited sequence similarity beyond their conserved cysteine framework, and the type 3 KTx family is likely a unique evolutionary innovation in the Actinioidea lineage.
Multiple lines of evidence support the notion that, over long time-scales, high loss rates diminish the number of surviving de novo genes, and that duplication remains the dominant process underlying new gene formation (McLysaght et al., 2016; Schlötterer, 2015; Schmitz et al., 2018; Zhao et al., 2014). Long evolutionary histories and patterns of purifying selection have constrained the trajectories of cnidarian gene families to be consistent with phylogeny. This is evident from the toxin gene copy number, and gene family diversity and distribution. This is in contrast to that observed in other venomous lineages, in which the evolution of toxin genes and gene families is dominated by ecological factors that shape their biochemical and pharmacological adaptations (Gibbs et al., 2009; Mackessy et al., 2006). My results suggest that phylogeny, and not ecology, is the most influential driver of the TTL gene complement in Actiniaria. A similar hypothesis is suggested by Sunagar and Moran (2015) to explain the divergence in the selection pressure observed between evolutionarily younger venomous lineages (snakes, cone snails; nucleotide variation consistent with the action of positive selection) and more ancient venomous lineages (cnidarians, spiders, scorpions; nucleotide variation consistent with the action of negative selection). Specifically, the authors hypothesise that the difference in selection regimes results from the toxin peptides of younger lineages undergoing adaptive evolution as a consequence of a shift in ecological niche. I further build on this concept, suggesting that both sequence variation and the total toxin gene complement in these older lineages show evidence of being constrained by phylogeny, which is suggestive of phylogenetic inertia or phylogenetic signal (these terms are inconsistently defined; Blomberg & Garland, 2002). Broadly, here I define this as the pattern observed when closely related species are more similar than distantly related species. This pattern is viewed by some as the alternative, or null, hypothesis against which to test adaptation by natural selection for the presence (or absence) of a character in a taxon, as traits are a product both of their evolutionary history and of natural selection in the recent and current environment (Blomberg & Garland, 2002). For example, the early evolution of venom in sea anemones, or even in cnidarians more broadly, may constrain later venom evolution in these taxa. Alternatively, if stabilising selection is acting, adaptation by natural selection may be involved in the maintenance of the trait (Griffiths, 1996). This distribution of the toxin gene complement also shows a pattern consistent with the evolution of novel morphological structures in sea anemones.
Acrorhagi are highly specialised structures used in intraspecific aggression, found only in a few lineages of the superfamily Actinioidea (Fautin et al., 1991; Honma et al., 2005; Minagawa et al., 2008; Ottaway, 1978). Recently it was suggested that acrorhagi are plesiomorphic in Actinioidea, with lineage-specific losses explaining the presence and absence of this morphological structure across the superfamily (Daly et al., 2017). Similarly, acontia are another novel morphological structure used for envenomation, found in species from Metridioidea. Acontia have also undergone lineage-specific losses, for example being lost in Nemanthus annamensis (Rodríguez et al., 2014). It remains unclear, however, whether the patchy distribution of these morphological structures is the result of changes in gene regulation, gene loss events, or both in concert. Gene loss has been a major influence in the evolution of myxozoan genomes. Species from Myxozoa, a divergent cnidarian class, represent an extreme evolutionary transition from a free-living cnidarian to a microscopic parasite (Barlow et al., 2009). This diverse group of parasitic invertebrates is hypothesised to comprise highly reduced cnidarians, a hypothesis supported by the presence of nematocyst-like polar capsules (Chang et al., 2015; Evans, Lindner, Raikova, Collins, & Cartwright, 2008; Shpirer et al., 2018). Myxozoan genomes also reflect significant reduction compared to other cnidarians, and include one of the smallest reported animal genomes (Chang et al., 2015). Significant losses are also reported for genes related to development, cell differentiation, cell–cell communication, and cell signalling (Chang et al., 2015). Chang et al. (2015) conclude that the reductions in the myxozoan body plan, genome size and gene content are related. This highlights that gene loss in cnidarians can be associated with a shift in ecological niche. Innovations through gene loss have been reported across broad taxonomic groups, including in venomous lineages (Albalat & Cañestro, 2016). Examples include the loss of globin genes in Antarctic fish, opsin genes in blind cavefish, and olfactory genes in Old World monkeys (Cocca et al., 1995; Gilad, Wiebe, Przeworski, Lancet, & Pääbo, 2004; Yang et al., 2016). Interestingly, gene loss is observed to also contribute to evolutionary novelties. Evidence of gene loss, or pseudogenisation, being adaptive is reliant on the previously functional gene becoming detrimental to an organism. Through positive selection acting on nucleotide variation, the detrimental gene will accumulate mutations that disrupt its function.
Such a process is suggestive of adaptive pseudogenisation. An example is the rcsA pseudogene in Yersinia pestis: replacing it with its functional version represses the formation of biofilms and thus reduces the transmission rate of this bacterium (Sun, Hinnebusch, & Darby, 2008). Recently, gene loss has been identified to play an adaptive role in venom evolution. Dowell et al. (2016) found that species-specific differences in rattlesnake toxins are due to gene loss events, not gene duplications. Alternatively, when pseudogenisation is deleterious, the action of purifying selection prevents this process. The evidence of pervasive purifying selection from my data and from supporting studies suggests that TTL genes are functional (Jouiaei et al., 2015a; Macrander et al., 2016b; Sunagar et al., 2015). However, the high turnover rate of TTL genes also suggests that pseudogenisation is potentially not deleterious. Although patterns of purifying selection dominate the evolution of sea anemone gene families, dynamic changes in gene expression provide an alternative mechanism that can confer an advantage in meeting the ecological requirements of an organism.

5.3 NOVEL STRUCTURES

In venomous taxa, novel morphological (venom gland) and genetic innovations (lineage-specific toxin genes) co-evolve to meet the ecological requirements of an organism. Genes encoding toxin peptides are often massively upregulated in the venom gland (Casewell et al., 2013). Furthermore, compartmentalisation of toxin expression within and among morphological structures, such as a single or multiple venom glands, has been observed across multiple venomous lineages (Dutertre et al., 2014; Macrander et al., 2016a; Morgenstern et al., 2013; Undheim et al., 2015; Walker et al., 2018). However, few studies have observed the localisation of venoms to specific cells within venom glands. Undheim et al. (2015) demonstrated that in centipedes, greater venom gland complexity is associated with a greater diversity in the toxin gene complement. In fact, the centipede venom gland is suggested to be a composite of semiautonomous subglands, with evidence suggesting compartmentalisation of peptide toxins to individual subgland cells. In sea anemones, venom peptides are restricted primarily to specialised cells, such as gland cells and nematocytes (Fautin, 2009; Fautin et al., 1991; Kass-Simon et al., 2002; Moran et al., 2012a). My results demonstrate that toxin gene expression and protein localisation in sea anemones are consistent with changes in nematocyte populations across the three morphological structures (acrorhagi, mesenteric filaments, and tentacles) that use venoms for different functions. This is consistent with previous studies that have revealed dynamic spatiotemporal expression of toxin-coding genes across both development and the body plan of actiniarians, with differences observable at a single-cell level (Columbus-Shenkar et al., 2018; Macrander et al., 2016a; Moran et al., 2012b; Nicosia et al., 2013; Sebé-Pedrós et al., 2018b; Sunagar et al., 2018).
Lineage-specific TTL gene families restricted to the Actinioidea superfamily, such as acrorhagin 1 and 2, are found to be upregulated in acrorhagi. Additionally, I have found multiple copies of the neurotoxin sea anemone type 3 (BDS-LIKE) potassium channel toxin gene to be both upregulated and localised to the acrorhagi. Specifically, I hypothesise that the venom cocktails distinct to acrorhagi are specialised to envenomate anemones (Honma et al., 2005). This is plausible given evidence from other venomous lineages that venom cocktails can be specific to a given prey (Barlow et al., 2009; Gibbs et al., 2009). However, acrorhagin 1 and 2 have been shown to be lethal to crabs, even though they have massively upregulated gene expression and are the dominant proteins in acrorhagi. Conservation of molecular targets such as ion channels may help explain this non prey-specific activity (Casewell et al., 2013). Together, the evidence indicates an important relationship between novel genes and novel structures. These results also support the compartmentalisation of toxins to cells located within and among different tissue types in cnidarians. This cellular localisation of venoms may also be a trait shared by other venomous lineages, which can result in functionally and biochemically distinct venom profiles. These results further support that lineage-specific toxin genes can have expression restricted to lineage-specific structures. However, not all members of lineage-specific gene families show expression restricted to the acrorhagi or other novel structures. A similar observation is reported by Babonis et al. (2016) in N. vectensis, who show that although some novel genes are restricted to novel structures, this is not universal for all novel genes. Additionally, a N. vectensis single-cell atlas revealed that cnidocyte-specific expression can be shared across morphological structures, such as tentacles, mesenteric filaments, and column (Sebé-Pedrós et al., 2018b).
An example of this is the cnidocyte-specific expression of the transcription factor paxA, which correlates with both the epidermis and tentacle tips. This analysis also revealed specific gene age distributions, specifically the enrichment of cnidarian-specific genes in cnidocytes. This pattern remains unclear, however, as cell populations corresponding to cnidocytes are also enriched in genes that are conserved across eukaryotes. Another study in N. vectensis revealed that novel genes have cnidocyte-specific expression, with nine of the ten novel genes selected showing significantly upregulated expression in cnidocytes (Sunagar et al., 2018). The pattern of cell-specific expression of lineage-specific genes is not restricted to cnidarians and is also observed in other lineages. Single-cell RNA-seq in metazoan species (Cnidaria, Ctenophora, Placozoa, and Porifera) has revealed novel genes showing strong patterns of tissue-specific expression, often in phylum-specific cells (Sebé-Pedrós et al., 2018a, 2018b; Sunagar et al., 2018). In contrast, conserved genes found across different phyla show little cell type specificity (Sebé-Pedrós et al., 2018a, 2018b). For example, choanocyte gene expression is dominated by sponge-specific genes, whereas archaeocytes and sperm cells are enriched with genes of older phylogenetic origin (Sebé-Pedrós et al., 2018a). In the ctenophore, digestive cells are enriched in genes shared among animals and unicellular organisms, while multiple uncharacterised cell types are enriched for ctenophore-specific genes. Similar expression patterns are also found for lineage-specific structures in placozoans. These datasets highlight that many lineage-specific cell types, such as nematocytes, have greater expression of lineage-specific genes compared to shared cell types. The transcriptional regulatory network controlling this cell-type-specific expression of taxonomically restricted genes is not yet characterised but may contain transcription factors or cis-regulatory elements.

5.4 LIMITATIONS AND FUTURE DIRECTIONS

The aim of this study was to investigate the evolution of novel genes, using cnidarians as a model group. Large-scale comparative analyses, such as that undertaken here, inherently have various biases. This is due to genome sequencing efforts focusing on species that are medically relevant, experimentally tractable, and easy to sequence (del Campo et al., 2014). This is especially apparent for the genomes currently available for cnidarian species. While additional medusozoan genomes have become available, the majority of those available are from Anthozoa. Left unaddressed, these biases can strongly impact comparative analyses, specifically those aimed at reconstructing the genes shared in the last common ancestor (Lewis & Dunn, 2018). Furthermore, the difficulties associated with reconstructing shared genes in turn affect our ability to identify genes that are unique, resulting in the false-positive discovery of novel genes that may in fact be more widely distributed than observed. For future studies to mitigate this, greater sampling and improved genomes are essential. Furthermore, greater sampling of sea anemone genomes would be of great value, specifically to resolve the TTL gene copy number within actiniarians, as comparative studies are currently reliant on transcriptome data. Improving genome assemblies is essential, as low-quality draft genomes can affect our ability to reconstruct ancestral genes and further bias the characterisation of species-specific genes. An essential component would be the improved contiguity of genomes achieved through long-read sequencing, such as PacBio SMRT and Oxford Nanopore sequencing. Such long-read sequencing has yet to be utilised for sea anemone genomes. Contiguity can be further improved by scaffolding entire chromosomes. These chromosome-scale reference assemblies can be achieved using linked reads or optical map data. Additionally, Hi-C and related chromatin crosslinking protocols are able to create very long-range mate pair-like data that have a remarkable capability for phasing and scaffolding; when combined with a high-quality draft assembly, nearly entire eukaryotic chromosomes can be resolved. The use of transcriptomes to resolve copy number variation is inherently difficult, as candidate genes may be missed as a result of low gene expression. It would also be beneficial, while increasing the breadth of genomes, to include finer-scale sampling. Ideally, sequencing multiple genomes from the genus Actinia would be insightful in determining the molecular evolution of genes and gene families on a finer scale. It would also provide evidence of true species-specific genes and insights into the mechanisms of their venom evolution. The improved granularity of the presence and absence of TTLs in Actinioidea, however, would reinforce the current ascertainment bias. Future functional characterisation of TTL proteins identified in sea anemone species outside of Actinioidea is therefore essential. This could use the framework of Madio et al. (2017) to identify novel TTL candidates, followed by their functional characterisation (Madio et al., 2018). Ideally this would also be replicated in other cnidarians. The limited characterisation of toxins from sea anemones, however, has already led to clinical success, such as the potassium channel blocker from Stichodactyla helianthus known as ShK. Specifically, an analogue of ShK, ShK-186 (known as dalazatide), has successfully completed Phase 1 clinical trials and is about to enter Phase 2 trials for the treatment of autoimmune diseases. Therefore, the systematic characterisation of toxin peptides from a broad sampling of sea anemones also has important implications for the discovery of potential pharmaceuticals and insecticides.

One area of research that is currently largely unexplored is the effect of sea anemone-specific toxin peptides on other sea anemones. This is of interest regarding the venom profile identified in the acrorhagi. As stated throughout this thesis, in A. tenebrosa, acrorhagi are used solely in agonistic encounters with non-clonal conspecifics. It is hypothesised that this venom profile would therefore be specialised in its ability to target sea anemone biology, for example through neurotoxins that are optimised to target the ion channels of sea anemones.

Chapter 6: Conclusion

Understanding how genes originate and evolve is central to our understanding of biological processes and has fundamental implications for the ecological and life history requirements of an organism. Overall, my results and previously published data indicate that lineage-specific genes, or genes that have undergone lineage-specific duplications, show pronounced signatures of novel cell-specific expression or expression in morphological structures found only in these lineages (Sebé-Pedrós et al., 2018a, 2018b; Sunagar et al., 2018). This highlights that novel morphological structures may evolve in concert with the emergence and expression of novel genes. Such a pattern indicates that gene regulation can promote the evolution of novelty in cnidarians. Indeed, cnidarian genomics provides insights into novel genes and their role in the evolution of biochemical and morphological innovations.

Bibliography

Adachi, K., Miyake, H., Kuramochi, T., Mizusawa, K., & Okumura, S. (2017).

Genome size distribution in phylum Cnidaria. Fisheries Science, 83(1), 107–

112. doi.org/10.1007/s12562-016-1050-4

Albalat, R., & Cañestro, C. (2016). Evolution by gene loss. Nature Reviews Genetics,

17(7), 379–391. doi.org/10.1038/nrg.2016.39

Albertin, C. B., Simakov, O., Mitros, T., Wang, Z. Y., Pungor, J. R., Edsinger-

Gonzales, E., … Rokhsar, D. S. (2015). The octopus genome and the

evolution of cephalopod neural and morphological novelties. Nature,

524(7564), 220–224. doi.org/10.1038/nature14668

Almeida Da Silva, P. E., & Palomino, J. C. (2011). Molecular basis and mechanisms

of drug resistance in Mycobacterium tuberculosis: classical and new drugs.

The Journal of Antimicrobial Chemotherapy, 66(7), 1417–1430.

doi.org/10.1093/jac/dkr173

Amazonas, D. R., Portes-Junior, J. A., Nishiyama-Jr, M. Y., Nicolau, C. A.,

Chalkidis, H. M., Mourão, R. H. V., … Moura-da-Silva, A. M. (2018).

Molecular mechanisms underlying intraspecific variation in snake venom.

Journal of Proteomics, 181, 60–72. doi.org/10.1016/j.jprot.2018.03.032

Ames, C. L., & Macrander, J. (2016). Evidence for an alternative mechanism of

toxin production in the box jellyfish Alatina alata. Integrative and

Comparative Biology, 56(5), 973–988. doi.org/10.1093/icb/icw113

Andersson, D. I., Jerlström-Hultqvist, J., & Näsvall, J. (2015). Evolution of new

functions de novo and from preexisting genes. Cold Spring Harbor

Perspectives in Biology, 7, a017996. doi.org/10.1101/cshperspect.a017996

Angeli, A., Zara, F. J., Turra, A., & Gorman, D. (2016). Towards a standard measure

of sea anemone size: assessing the accuracy and precision of morphological

measures for cantilever-like animals. Marine Ecology, 37(5), 1019–1026.

doi.org/10.1111/maec.12315

Arendsee, Z. W., Li, L., & Wurtele, E. S. (2014). Coming of age: orphan genes in

plants. Trends in Plant Science, 19(11), 698–708.

doi.org/10.1016/j.tplants.2014.07.003

Ayre, D. J. (1982). Inter-genotype aggression in the solitary sea anemone Actinia

tenebrosa. Marine Biology, 68(2), 199–205. doi.org/10.1007/BF00397607

Babonis, L. S., Martindale, M. Q., & Ryan, J. F. (2016). Do novel genes drive

morphological novelty? An investigation of the nematosomes in the sea

anemone Nematostella vectensis. BMC Evolutionary Biology, 16, 114.

doi.org/10.1186/s12862-016-0683-3

Barlow, A., Pook, C. E., Harrison, R. A., & Wüster, W. (2009). Coevolution of diet

and prey-specific venom activity supports the role of selection in snake

venom evolution. Proceedings of the Royal Society of London B: Biological

Sciences, 276(1666), 2443–2449. doi.org/10.1098/rspb.2009.0048

Barve, A., & Wagner, A. (2013). A latent capacity for evolutionary innovation

through exaptation in metabolic systems. Nature, 500(7461), 203–206.

doi.org/10.1038/nature12301

Basulto, A., Pérez, V. M., Noa, Y., Varela, C., Otero, A. J., & Pico, M. C. (2006).

Immunohistochemical targeting of sea anemone cytolysins on tentacles,

mesenteric filaments and isolated nematocysts of Stichodactyla helianthus.

Journal of Experimental Zoology Part A: Comparative Experimental Biology,

305(3), 253–258. doi.org/10.1002/jez.a.256

Baumgarten, S., Simakov, O., Esherick, L. Y., Liew, Y. J., Lehnert, E. M., Michell,

C. T., … Voolstra, C. R. (2015). The genome of Aiptasia, a sea anemone

model for coral symbiosis. Proceedings of the National Academy of Sciences,

112(38), 11893–11898. doi.org/10.1073/pnas.1513318112

Beckmann, A., & Özbek, S. (2012). The nematocyst: a molecular map of the

cnidarian stinging organelle. International Journal of Developmental Biology,

56(6-8), 577–582. doi.org/10.1387/ijdb.113472ab

Beckmann, A., Xiao, S., Müller, J. P., Mercadante, D., Nüchter, T., Kröger, N., …

Özbek, S. (2015). A fast recoiling silk-like elastomer facilitates nanosecond

nematocyst discharge. BMC Biology, 13, 3. doi.org/10.1186/s12915-014-0113-1

Ben-Neriah, Y., Daley, G. Q., Mes-Masson, A. M., Witte, O. N., & Baltimore, D.

(1986). The chronic myelogenous leukemia-specific P210 protein is the

product of the bcr/abl hybrid gene. Science, 233(4760), 212–214.

Black, R., & Johnson, M. S. (1979). Asexual viviparity and population genetics of

Actinia tenebrosa. Marine Biology, 53(1), 27–31.

doi.org/10.1007/BF00386526

Blomberg, S. P., & Garland, T. (2002). Tempo and mode in evolution: phylogenetic

inertia, adaptation and comparative methods. Journal of Evolutionary

Biology, 15(6), 899–910. doi.org/10.1046/j.1420-9101.2002.00472.x

Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: a flexible trimmer for

Illumina sequence data. Bioinformatics, 30(15), 2114–2120.

doi.org/10.1093/bioinformatics/btu170

Boothby, T. C., Tenlen, J. R., Smith, F. W., Wang, J. R., Patanella, K. A., Nishimura,

E. O., … Goldstein, B. (2015). Evidence for extensive horizontal gene

transfer from the draft genome of a tardigrade. Proceedings of the National

Academy of Sciences, 112(52), 15976–15981.

doi.org/10.1073/pnas.1510461112

Bosch, T. C. G. (2014). Rethinking the role of immunity: lessons from Hydra.

Trends in Immunology, 35(10), 495–502. doi.org/10.1016/j.it.2014.07.008

Brittain, T. (2002). Molecular aspects of embryonic hemoglobin function. Molecular

Aspects of Medicine, 23(4), 293–342. doi.org/10.1016/S0098-2997(02)00004-3

Brockes, J. P., & Gates, P. B. (2014). Mechanisms underlying vertebrate limb

regeneration: lessons from the salamander. Biochemical Society Transactions,

42(3), 625–630. doi.org/10.1042/BST20140002

Butler, J., MacCallum, I., Kleber, M., Shlyakhter, I. A., Belmonte, M. K., Lander, E.

S., … Jaffe, D. B. (2008). ALLPATHS: de novo assembly of whole-genome

shotgun microreads. Genome Research, 18(5), 810–820.

doi.org/10.1101/gr.7337908

Cai, J., Zhao, R., Jiang, H., & Wang, W. (2008). De novo origination of a new

protein-coding gene in Saccharomyces cerevisiae. Genetics, 179(1), 487–496.

doi.org/10.1534/genetics.107.084491

Cameron, S. L., Yoshizawa, K., Mizukoshi, A., Whiting, M. F., & Johnson, K. P.

(2011). Mitochondrial genome deletions and minicircles are common in lice

(Insecta: Phthiraptera). BMC Genomics, 12, 394. doi.org/10.1186/1471-2164-12-394

Cao, Z., Yu, Y., Wu, Y., Hao, P., Di, Z., He, Y., … Li, W. (2013). The genome of

Mesobuthus martensii reveals a unique adaptation model of arthropods.

Nature Communications, 4, 2602. doi.org/10.1038/ncomms3602

Capra, J. A., Pollard, K. S., & Singh, M. (2010). Novel genes exhibit distinct patterns

of function acquisition and network integration. Genome Biology, 11(12),

R127. doi.org/10.1186/gb-2010-11-12-r127

Caprioli, R. M., Farmer, T. B., & Gile, J. (1997). Molecular imaging of biological

samples: localization of peptides and proteins using MALDI-TOF MS.

Analytical Chemistry, 69(23), 4751–4760. doi.org/10.1021/ac970888i

Carmona-Antoñanzas, G., Monroig, Ó., Dick, J. R., Davie, A., & Tocher, D. R.

(2011). Biosynthesis of very long-chain fatty acids (C > 24) in Atlantic

salmon: cloning, functional characterisation, and tissue distribution of an

Elovl4 elongase. Comparative Biochemistry and Physiology Part B:

Biochemistry and Molecular Biology, 159(2), 122–129.

doi.org/10.1016/j.cbpb.2011.02.007

Cartwright, P., Halgedahl, S. L., Hendricks, J. R., Jarrard, R. D., Marques, A. C.,

Collins, A. G., & Lieberman, B. S. (2007). Exceptionally preserved

jellyfishes from the middle Cambrian. PLOS ONE, 2(10), e1121.

doi.org/10.1371/journal.pone.0001121

Carvunis, A.-R., Rolland, T., Wapinski, I., Calderwood, M. A., Yildirim, M. A.,

Simonis, N., … Vidal, M. (2012). Proto-genes and de novo gene birth.

Nature, 487(7407), 370–374. doi.org/10.1038/nature11184

Casewell, N. R., Huttley, G. A., & Wüster, W. (2012). Dynamic evolution of venom

proteins in squamate reptiles. Nature Communications, 3, 1066.

doi.org/10.1038/ncomms2065

Casewell, N. R., Wüster, W., Vonk, F. J., Harrison, R. A., & Fry, B. G. (2013).

Complex cocktails: the evolutionary novelty of venoms. Trends in Ecology &

Evolution, 28(4), 219–229. doi.org/10.1016/j.tree.2012.10.020

Castañeda, O., Sotolongo, V., Amor, A. M., Stöcklin, R., Anderson, A. J., Harvey,

A. L., … Karlsson, E. (1995). Characterization of a potassium channel toxin

from the Caribbean sea anemone Stichodactyla helianthus. Toxicon, 33(5),

603–613. doi.org/10.1016/0041-0101(95)00013-C

Castro, L. F. C., Monroig, Ó., Leaver, M. J., Wilson, J., Cunha, I., & Tocher, D. R.

(2012). Functional desaturase Fads1 (Δ5) and Fads2 (Δ6) orthologues

evolved before the origin of jawed vertebrates. PLOS ONE, 7(2), e31950.

doi.org/10.1371/journal.pone.0031950

Castro, L. F. C., Tocher, D. R., & Monroig, Ó. (2016). Long-chain polyunsaturated

fatty acid biosynthesis in chordates: insights into the evolution of Fads and

Elovl gene repertoire. Progress in Lipid Research, 62, 25–40.

doi.org/10.1016/j.plipres.2016.01.001

Catterall, W. A., Cestèle, S., Yarov-Yarovoy, V., Yu, F. H., Konoki, K., & Scheuer,

T. (2007). Voltage-gated ion channels and gating modifier toxins. Toxicon,

49(2), 124–141. doi.org/10.1016/j.toxicon.2006.09.022

Chang, E. S., Neuhof, M., Rubinstein, N. D., Diamant, A., Philippe, H., Huchon, D.,

& Cartwright, P. (2015). Genomic insights into the evolutionary origin of

Myxozoa within Cnidaria. Proceedings of the National Academy of Sciences,

112(48), 14912–14917. doi.org/10.1073/pnas.1511468112

Chapman, J. A., Kirkness, E. F., Simakov, O., Hampson, S. E., Mitros, T.,

Weinmaier, T., … Steele, R. E. (2010). The dynamic genome of Hydra.

Nature, 464(7288), 592–596. doi.org/10.1038/nature08830

Chertemps, T., Duportets, L., Labeur, C., Ueda, R., Takahashi, K., Saigo, K., &

Wicker-Thomas, C. (2007). A female-biased expressed elongase involved in

long-chain hydrocarbon biosynthesis and courtship behavior in Drosophila

melanogaster. Proceedings of the National Academy of Sciences, 104(11),

4273–4278. doi.org/10.1073/pnas.0608142104

Chippaux, J. P., Williams, V., & White, J. (1991). Snake venom variability: methods

of study, results and interpretation. Toxicon, 29(11), 1279–1303.

Cocca, E., Ratnayake-Lecamwasam, M., Parker, S. K., Camardella, L., Ciaramella,

M., Prisco, G. di, & Detrich, H. W. (1995). Genomic remnants of alpha-

globin genes in the hemoglobinless antarctic icefishes. Proceedings of the

National Academy of Sciences, 92(6), 1817–1821.

doi.org/10.1073/pnas.92.6.1817

Colbourne, J. K., Pfrender, M. E., Gilbert, D., Thomas, W. K., Tucker, A., Oakley,

T. H., … Boore, J. L. (2011). The Ecoresponsive genome of Daphnia pulex.

Science, 331(6017), 555–561. doi.org/10.1126/science.1197761

Columbus-Shenkar, Y. Y., Sachkova, M. Y., Macrander, J., Fridrich, A.,

Modepalli, V., Reitzel, A. M., … Moran, Y. (2018). Dynamics of venom

composition across a complex life cycle. eLife, 7, e35014.

doi.org/10.7554/eLife.35014

Cook, H. W., & McMaster, C. R. (2002). Chapter 7 Fatty acid desaturation and chain

elongation in eukaryotes. New Comprehensive Biochemistry, 36, 181–204.

doi.org/10.1016/S0167-7306(02)36009-5

Crisp, A., Boschetti, C., Perry, M., Tunnacliffe, A., & Micklem, G. (2015).

Expression of multiple horizontally acquired genes is a hallmark of both

vertebrate and invertebrate genomes. Genome Biology, 16(1), 50.

doi.org/10.1186/s13059-015-0607-3

da Silva, S. M., Gates, P. B., & Brockes, J. P. (2002). The newt ortholog of CD59 is

implicated in proximodistal identity during amphibian limb regeneration.

Developmental Cell, 3(4), 547–555. doi.org/10.1016/S1534-5807(02)00288-5

Dagan, T., Artzy-Randrup, Y., & Martin, W. (2008). Modular networks and

cumulative impact of lateral transfer in prokaryote genome evolution.

Proceedings of the National Academy of Sciences, 105(29), 10039–10044.

doi.org/10.1073/pnas.0800679105

Daly, M. (2016). Functional and genetic diversity of toxins in sea anemones. In P.

Gopalakrishnakone & A. Malhotra (Eds.), Evolution of Venomous Animals

and Their Toxins (pp. 1–18). Dordrecht: Springer Netherlands.

doi.org/10.1007/978-94-007-6727-0_17-1

Daly, M., Crowley, L. M., Larson, P., Rodríguez, E., Saucier, E. H., & Fautin, D. G.

(2017). Anthopleura and the phylogeny of Actinioidea (Cnidaria: Anthozoa:

Actiniaria). Organisms Diversity & Evolution, 17(3), 545–564.

doi.org/10.1007/s13127-017-0326-6

David, C. N., Özbek, S., Adamczyk, P., Meier, S., Pauly, B., Chapman, J., …

Holstein, T. W. (2008). Evolution of complex structures: minicollagens shape

the cnidarian nematocyst. Trends in Genetics, 24(9), 431–438.

doi.org/10.1016/j.tig.2008.07.001

Dawson, N. L., Lewis, T. E., Das, S., Lees, J. G., Lee, D., Ashford, P., … Sillitoe, I.

(2017). CATH: an expanded resource to predict protein function through

structure and sequence. Nucleic Acids Research, 45, D289–D295.

doi.org/10.1093/nar/gkw1098

de Sousa-Pereira, P., Abrantes, J., Pinheiro, A., Colaco, B., Vitorino, R., & Esteves,

P. J. (2014). Evolution of C, D and S-type cystatins in mammals: an

extensive gene duplication in primates. PLOS ONE, 9(10), e109050.

doi.org/10.1371/journal.pone.0109050

del Campo, J., Sieracki, M. E., Molestina, R., Keeling, P., Massana, R., & Ruiz-

Trillo, I. (2014). The others: our biased perspective of eukaryotic genomes.

Trends in Ecology & Evolution, 29(5), 252–259.

doi.org/10.1016/j.tree.2014.03.006

Demuth, J. P., & Hahn, M. W. (2009). The life and death of gene families.

BioEssays, 31(1), 29–39. doi.org/10.1002/bies.080085

Deng, C., Cheng, C.-H. C., Ye, H., He, X., & Chen, L. (2010). Evolution of an

antifreeze protein by neofunctionalization under escape from adaptive

conflict. Proceedings of the National Academy of Sciences, 107(50), 21593–

21598. doi.org/10.1073/pnas.1007883107

Ding, Y., Zhou, Q., & Wang, W. (2012). Origins of new genes and evolution of their

novel functions. Annual Review of Ecology, Evolution, and Systematics,

43(1), 345–363. doi.org/10.1146/annurev-ecolsys-110411-160513

Diochot, S., Schweitz, H., Béress, L., & Lazdunski, M. (1998). Sea anemone

peptides with a specific blocking activity against the fast inactivating

potassium channel Kv3.4. Journal of Biological Chemistry, 273(12), 6744–

6749. doi.org/10.1074/jbc.273.12.6744

Dnyansagar, R., Zimmermann, B., Moran, Y., Praher, D., Sundberg, P., Møller, L.

F., & Technau, U. (2018). Dispersal and speciation: the cross Atlantic

relationship of two parasitic cnidarians. Molecular Phylogenetics and

Evolution, 126, 346–355. doi.org/10.1016/j.ympev.2018.04.035

Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., …

Gingeras, T. R. (2013). STAR: ultrafast universal RNA-seq aligner.

Bioinformatics, 29(1), 15–21. doi.org/10.1093/bioinformatics/bts635

Donoghue, M. T., Keshavaiah, C., Swamidatta, S. H., & Spillane, C. (2011).

Evolutionary origins of Brassicaceae specific genes in Arabidopsis thaliana.

BMC Evolutionary Biology, 11, 47. doi.org/10.1186/1471-2148-11-47

Dowell, N. L., Giorgianni, M. W., Griffin, S., Kassner, V. A., Selegue, J. E.,

Sanchez, E. E., & Carroll, S. B. (2018). Extremely divergent haplotypes in

two toxin gene complexes encode alternative venom types within rattlesnake

species. Current Biology, 28(7), 1016–1026.e4.

doi.org/10.1016/j.cub.2018.02.031

Dowell, N. L., Giorgianni, M. W., Kassner, V. A., Selegue, J. E., Sanchez, E. E., &

Carroll, S. B. (2016). The deep origin and recent loss of venom toxin genes in

rattlesnakes. Current Biology, 26(18), 2434–2445.

doi.org/10.1016/j.cub.2016.07.038

Dunn, C. W., & Ryan, J. F. (2015). The evolution of animal genomes. Current

Opinion in Genetics & Development, 35, 25–32.

doi.org/10.1016/j.gde.2015.08.006

Dutertre, S., Jin, A.-H., Vetter, I., Hamilton, B., Sunagar, K., Lavergne, V., …

Lewis, R. J. (2014). Evolution of separate predation- and defence-evoked

venoms in carnivorous cone snails. Nature Communications, 5, 3521.

doi.org/10.1038/ncomms4521

Eddy, S. R. (2011). Accelerated profile HMM searches. PLOS Computational

Biology, 7(10), e1002195. doi.org/10.1371/journal.pcbi.1002195

Edger, P. P., Heidel-Fischer, H. M., Bekaert, M., Rota, J., Glöckner, G., Platts, A. E.,

… Wheat, C. W. (2015). The butterfly plant arms-race escalated by gene and

genome duplications. Proceedings of the National Academy of Sciences,

112(27), 8362–8366. doi.org/10.1073/pnas.1503926112

Ellinghaus, D., Kurtz, S., & Willhoeft, U. (2008). LTRharvest, an efficient and

flexible software for de novo detection of LTR retrotransposons. BMC

Bioinformatics, 9, 18. doi.org/10.1186/1471-2105-9-18

Elran, R., Raam, M., Kraus, R., Brekhman, V., Sher, N., Plaschkes, I., … Lotan, T.

(2014). Early and late response of Nematostella vectensis transcriptome to

heavy metals. Molecular Ecology, 23(19), 4722–4736.

doi.org/10.1111/mec.12891

Elzinga, D. A., & Jander, G. (2013). The role of protein effectors in plant–aphid

interactions. Current Opinion in Plant Biology, 16(4), 451–456.

doi.org/10.1016/j.pbi.2013.06.018

Erwin, D. H., Laflamme, M., Tweedt, S. M., Sperling, E. A., Pisani, D., & Peterson,

K. J. (2011). The Cambrian conundrum: early divergence and later ecological

success in the early history of animals. Science, 334(6059), 1091–1097.

doi.org/10.1126/science.1206375

Evans, N. M., Lindner, A., Raikova, E. V., Collins, A. G., & Cartwright, P. (2008).

Phylogenetic placement of the enigmatic parasite, Polypodium hydriforme,

within the phylum Cnidaria. BMC Evolutionary Biology, 8, 139.

doi.org/10.1186/1471-2148-8-139

Ewer, R. F., & Fox, H. M. (1947). On the functions and mode of action of the

nematocysts of Hydra. Proceedings of the Zoological Society of London,

117(2–3), 365–376. doi.org/10.1111/j.1096-3642.1947.tb00524.x

Fang, S., Ting, C.-T., Lee, C.-R., Chu, K.-H., Wang, C.-C., & Tsaur, S.-C. (2009).

Molecular evolution and functional diversification of fatty acid desaturases

after recurrent gene duplication in Drosophila. Molecular Biology and

Evolution, 26(7), 1447–1456. doi.org/10.1093/molbev/msp057

Farquhar, H. (1898). Preliminary account of some New-Zealand Actiniaria. Journal

of the Linnean Society of London, Zoology, 26(171), 527–536.

doi.org/10.1111/j.1096-3642.1898.tb00409.x

Farris, J. S. (1977). Phylogenetic analysis under Dollo’s Law. Systematic Biology,

26(1), 77–88. doi.org/10.1093/sysbio/26.1.77

Fautin, D. G. (2009). Structural diversity, systematics, and evolution of cnidae.

Toxicon, 54(8), 1054–1064. doi.org/10.1016/j.toxicon.2009.02.024

Fautin, D. G., & Mariscal, R. N. (1991). Cnidaria: Anthozoa. In F. Harrison & J.

Westfall (Eds.) (Vol. 2, pp. 267–358). New York: Wiley-Liss.

Felsenstein, J. (1989). PHYLIP - Phylogeny Inference Package (Version 3.2).

Cladistics, 5(2), 163–166. doi.org/10.1111/j.1096-0031.1989.tb00562.x

Feng, J., Dong, Y., Liu, W., He, Q., Daud, M. K., Chen, J., & Zhu, S. (2017).

Genome-wide identification of membrane-bound fatty acid desaturase genes

in Gossypium hirsutum and their expressions during abiotic stress. Scientific

Reports, 7, 45711. doi.org/10.1038/srep45711

Fingerhut, L. C. H. W., Strugnell, J. M., Faou, P., Labiaga, Á. R., Zhang, J., &

Cooke, I. R. (2018). Shotgun proteomics analysis of saliva and salivary gland

tissue from the common octopus Octopus vulgaris. Journal of Proteome

Research, 17(11), 3866–3876. doi.org/10.1021/acs.jproteome.8b00525

Finn, R. D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R. Y., Eddy, S. R., …

Punta, M. (2014). Pfam: the protein families database. Nucleic Acids

Research, 42, D222–D230. doi.org/10.1093/nar/gkt1223

Finnerty, J. R., Pang, K., Burton, P., Paulson, D., & Martindale, M. Q. (2004).

Origins of bilateral symmetry: Hox and Dpp expression in a sea anemone.

Science, 304(5675), 1335–1337. doi.org/10.1126/science.1091946

Flajnik, M. F., & Kasahara, M. (2010). Origin and evolution of the adaptive immune

system: genetic events and selective pressures. Nature Reviews Genetics,

11(1), 47–59. doi.org/10.1038/nrg2703

Fletcher, W., & Yang, Z. (2010). The effect of insertions, deletions, and alignment

errors on the branch-site test of positive selection. Molecular Biology and

Evolution, 27(10), 2257–2267. doi.org/10.1093/molbev/msq115

Fonseca-Madrigal, J., Navarro, J. C., Hontoria, F., Tocher, D. R., Martínez-Palacios,

C. A., & Monroig, Ó. (2014). Diversification of substrate specificities in

teleostei Fads2: characterization of Δ4 and Δ6Δ5 desaturases of Chirostoma

estor. Journal of Lipid Research, 55(7), 1408–1419.

doi.org/10.1194/jlr.M049791

Foote, A. D., Liu, Y., Thomas, G. W. C., Vinař, T., Alföldi, J., Deng, J., … Gibbs, R.

A. (2015). Convergent evolution of the genomes of marine mammals. Nature

Genetics, 47(3), 272–275. doi.org/10.1038/ng.3198

Fox, J. W., & Serrano, S. M. T. (2005). Structural considerations of the snake venom

metalloproteinases, key members of the M12 reprolysin family of

metalloproteinases. Toxicon, 45(8), 969–985.

doi.org/10.1016/j.toxicon.2005.02.012

Frazão, B., Vasconcelos, V., & Antunes, A. (2012). Sea anemone (Cnidaria,

Anthozoa, Actiniaria) toxins: an overview. Marine Drugs, 10(8), 1812–1851.

doi.org/10.3390/md10081812

Frenkel-Morgenstern, M., Lacroix, V., Ezkurdia, I., Levin, Y., Gabashvili, A.,

Prilusky, J., … Valencia, A. (2012). Chimeras taking shape: potential

functions of proteins encoded by chimeric RNA transcripts. Genome

Research, 22(7), 1231–1242. doi.org/10.1101/gr.130062.111

Fröbisch, N. B., & Shubin, N. H. (2011). Salamander limb development: integrating

genes, morphology, and fossils. Developmental Dynamics, 240(5), 1087–

1099. doi.org/10.1002/dvdy.22629

Fry, B. G., Roelants, K., Champagne, D. E., Scheib, H., Tyndall, J. D. A., King, G.

F., … Vega, R. C. R. de la. (2009). The toxicogenomic multiverse:

convergent recruitment of proteins into animal venoms. Annual Review of

Genomics and Human Genetics, 10(1), 483–511.

doi.org/10.1146/annurev.genom.9.081307.164356

Fry, B. G., Wüster, W., Kini, R. M., Brusic, V., Khan, A., Venkataraman, D., &

Rooney, A. P. (2003). Molecular evolution and phylogeny of elapid snake

venom three-finger toxins. Journal of Molecular Evolution, 57(1), 110–129.

doi.org/10.1007/s00239-003-2461-2

Fu, L., Niu, B., Zhu, Z., Wu, S., & Li, W. (2012). CD-HIT: accelerated for clustering

the next-generation sequencing data. Bioinformatics, 28(23), 3150–3152.

doi.org/10.1093/bioinformatics/bts565

Gao, B., Peng, C., Zhu, Y., Sun, Y., Zhao, T., Huang, Y., & Shi, Q. (2018). High

throughput identification of novel conotoxins from the vermivorous oak cone

snail (Conus quercinus) by transcriptome sequencing. International Journal

of Molecular Sciences, 19(12), 3901. doi.org/10.3390/ijms19123901

Garrett, T. A., Schmeitzel, J. L., Klein, J. A., Hwang, J. J., & Schwarz, J. A. (2013).

Comparative lipid profiling of the cnidarian Aiptasia pallida and its

dinoflagellate symbiont. PLOS ONE, 8(3), e57975.

doi.org/10.1371/journal.pone.0057975

Gaudry, M. J., Storz, J. F., Butts, G. T., Campbell, K. L., & Hoffmann, F. G. (2014).

Repeated evolution of chimeric fusion genes in the β-globin gene family of

laurasiatherian mammals. Genome Biology and Evolution, 6(5), 1219–1233.

doi.org/10.1093/gbe/evu097

Gibbs, H. L., & Mackessy, S. P. (2009). Functional basis of a molecular adaptation:

prey-specific toxic effects of venom from Sistrurus rattlesnakes. Toxicon,

53(6), 672–679. doi.org/10.1016/j.toxicon.2009.01.034

Gibbs, H. L., Sanz, L., Sovic, M. G., & Calvete, J. J. (2013). Phylogeny-based

comparative analysis of venom proteome variation in a clade of rattlesnakes

(Sistrurus sp.). PLOS ONE, 8(6), e67220.

doi.org/10.1371/journal.pone.0067220

Gilad, Y., Wiebe, V., Przeworski, M., Lancet, D., & Pääbo, S. (2004). Loss of

olfactory receptor genes coincides with the acquisition of full trichromatic

vision in primates. PLOS Biology, 2(1), E5.

doi.org/10.1371/journal.pbio.0020005

Gilding, E. K., Jackson, M. A., Poth, A. G., Henriques, S. T., Prentis, P. J.,

Mahatmanto, T., & Craik, D. J. (2016). Gene coevolution and regulation lock

cyclic plant defence peptides to their targets. New Phytologist, 210(2), 717–

730. doi.org/10.1111/nph.13789

Gómez-Brandón, M., Lores, M., & Domínguez, J. (2010). A new combination of

extraction and derivatization methods that reduces the complexity and

preparation time in determining phospholipid fatty acids in solid

environmental samples. Bioresource Technology, 101(4), 1348–1354.

doi.org/10.1016/j.biortech.2009.09.047

Goodman, M., Koop, B. F., Czelusniak, J., & Weiss, M. L. (1984). The η-globin

gene. Its long evolutionary history in the β-globin gene family of mammals.

Journal of Molecular Biology, 180(4), 803–823.

Gostinčar, C., Turk, M., & Gunde-Cimerman, N. (2010). The evolution of fatty acid

desaturases and cytochrome b5 in eukaryotes. Journal of Membrane Biology,

233(1–3), 63–72. doi.org/10.1007/s00232-010-9225-x

Gough, J., Karplus, K., Hughey, R., & Chothia, C. (2001). Assignment of homology

to genome sequences using a library of hidden Markov models that represent

all proteins of known structure. Journal of Molecular Biology, 313(4), 903–

919. doi.org/10.1006/jmbi.2001.5080

Grabherr, M. G., Haas, B. J., Yassour, M., Levin, J. Z., Thompson, D. A., Amit, I.,

… Regev, A. (2011). Full-length transcriptome assembly from RNA-Seq data

without a reference genome. Nature Biotechnology, 29(7), 644–652.

doi.org/10.1038/nbt.1883

Grajales, A., & Rodríguez, E. (2014). Morphological revision of the genus Aiptasia

and the family Aiptasiidae (Cnidaria, Actiniaria, Metridioidea). Zootaxa,

3826(1), 55–100. doi.org/10.11646/zootaxa.3826.1.2

Griffiths, P. E. (1996). The historical turn in the study of adaptation. The British

Journal for the Philosophy of Science, 47(4), 511–532.

doi.org/10.1093/bjps/47.4.511

Haas, B. J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P. D., Bowden, J.,

… Regev, A. (2013). De novo transcript sequence reconstruction from RNA-

seq using the Trinity platform for reference generation and analysis. Nature

Protocols, 8(8), 1494–1512. doi.org/10.1038/nprot.2013.084

Habermann, E. (1972). Bee and wasp venoms. Science, 177(4046), 314–322.

doi.org/10.1126/science.177.4046.314

Hale, J. E., Butler, J. P., Gelfanova, V., You, J.-S., & Knierman, M. D. (2004). A

simplified procedure for the reduction and alkylation of cysteine residues in

proteins prior to proteolytic digestion and mass spectral analysis. Analytical

Biochemistry, 333(1), 174–181. doi.org/10.1016/j.ab.2004.04.013

Han, M. V., Demuth, J. P., McGrath, C. L., Casola, C., & Hahn, M. W. (2009).

Adaptive evolution of young gene duplicates in mammals. Genome Research,

19(5), 859–867. doi.org/10.1101/gr.085951.108

Han, Y., & Wessler, S. R. (2010). MITE-Hunter: a program for discovering

miniature inverted-repeat transposable elements from genomic sequences.

Nucleic Acids Research, 38(22), e199. doi.org/10.1093/nar/gkq862

Hardies, S. C., Edgell, M. H., & Hutchison, C. A. (1984). Evolution of the

mammalian β-globin gene cluster. Journal of Biological Chemistry, 259(6),

3748–3756.

Hardison, R. C. (2012). Evolution of hemoglobin and its genes. Cold Spring Harbor

Perspectives in Medicine, 2(12), a011627.

doi.org/10.1101/cshperspect.a011627

Harland, A. D., Fixter, L. M., Davies, P. S., & Anderson, R. A. (1991). Distribution

of lipids between the zooxanthellae and animal compartment in the symbiotic

sea anemone Anemonia viridis: wax esters, triglycerides and fatty acids.

Marine Biology, 110(1), 13–19. doi.org/10.1007/BF01313087

Harland, A. D., Fixter, L. M., Davies, P. S., & Anderson, R. A. (1992). Effect of

light on the total lipid content and storage lipids of the symbiotic sea

anemone Anemonia viridis. Marine Biology, 112(2), 253–258.

doi.org/10.1007/BF00702469

Hastings, N., Agaba, M., Tocher, D. R., Leaver, M. J., Dick, J. R., Sargent, J. R., &

Teale, A. J. (2001). A vertebrate fatty acid desaturase with Δ5 and Δ6

activities. Proceedings of the National Academy of Sciences, 98(25), 14304–

14309. doi.org/10.1073/pnas.251516598

He, S., Viso, F. del, Chen, C.-Y., Ikmi, A., Kroesen, A. E., & Gibson, M. C. (2018).

An axial Hox code controls tissue segmentation and body patterning in

Nematostella vectensis. Science, 361(6409), 1377–1380.

doi.org/10.1126/science.aar8384

Hess, J., Angel, P., & Schorpp-Kistner, M. (2004). AP-1 subunits: quarrel and

harmony among siblings. Journal of Cell Science, 117(25), 5965–5973.

doi.org/10.1242/jcs.01589

Hessinger, D. A., & Lenhoff, H. M. (1976). Mechanism of hemolysis induced by

nematocyst venom: roles of phospholipase A and direct lytic factor. Archives

of Biochemistry and Biophysics, 173(2), 603–613.

doi.org/10.1016/0003-9861(76)90297-6

Hittinger, C. T., & Carroll, S. B. (2007). Gene duplication and the adaptive evolution

of a classic genetic switch. Nature, 449(7163), 677–681.

doi.org/10.1038/nature06151

Hoff, K. J., Lange, S., Lomsadze, A., Borodovsky, M., & Stanke, M. (2016).

BRAKER1: unsupervised RNA-seq-based genome annotation with

GeneMark-ET and AUGUSTUS. Bioinformatics, 32(5), 767–769.

doi.org/10.1093/bioinformatics/btv661

Hoffmann, F. G., Opazo, J. C., & Storz, J. F. (2010). Gene cooption and convergent

evolution of oxygen transport hemoglobins in jawed and jawless vertebrates.

Proceedings of the National Academy of Sciences, 107(32), 14274–14279.

doi.org/10.1073/pnas.1006756107

Honma, T., Minagawa, S., Nagai, H., Ishida, M., Nagashima, Y., & Shiomi, K.

(2005). Novel peptide toxins from acrorhagi, aggressive organs of the sea

anemone Actinia equina. Toxicon, 46(7), 768–774.

doi.org/10.1016/j.toxicon.2005.08.003

Honma, T., & Shiomi, K. (2006). Peptide toxins in sea anemones: structural and

functional aspects. Marine Biotechnology, 8(1), 1–10.

doi.org/10.1007/s10126-005-5093-2

Hu, H., Bandyopadhyay, P. K., Olivera, B. M., & Yandell, M. (2012). Elucidation of

the molecular envenomation strategy of the cone snail Conus geographus

through transcriptome sequencing of its venom duct. BMC Genomics, 13(1),

284. doi.org/10.1186/1471-2164-13-284

Hunt, B. G., Ometto, L., Wurm, Y., Shoemaker, D., Yi, S. V., Keller, L., &

Goodisman, M. A. D. (2011). Relaxed selection is a precursor to the

evolution of phenotypic plasticity. Proceedings of the National Academy of

Sciences, 108(38), 15936–15941. doi.org/10.1073/pnas.1104825108

Innan, H., & Kondrashov, F. (2010). The evolution of gene duplications: classifying

and distinguishing between models. Nature Reviews Genetics, 11(2), 97–108.

doi.org/10.1038/nrg2689

Inoue, J., Sato, Y., Sinclair, R., Tsukamoto, K., & Nishida, M. (2015). Rapid genome

reshaping by multiple-gene loss after whole-genome duplication in teleost

fish suggested by mathematical modelling. Proceedings of the National

Academy of Sciences, 112(48), 14918–14923.

doi.org/10.1073/pnas.1507669112

Jackson, T. N. W., Koludarov, I., Ali, S. A., Dobson, J., Zdenek, C. N., Dashevsky,

D., … Fry, B. G. (2016). Rapid radiations and the race to redundancy: an

investigation of the evolution of Australian elapid snake venoms. Toxins,

8(11), 309. doi.org/10.3390/toxins8110309

Jaimes-Becerra, A., Chung, R., Morandini, A. C., Weston, A. J., Padilla, G., Gacesa,

R., … Marques, A. C. (2017). Comparative proteomics reveals recruitment

patterns of some protein families in the venoms of Cnidaria. Toxicon, 137,

19–26. doi.org/10.1016/j.toxicon.2017.07.012

Jakobsson, A., Westerberg, R., & Jacobsson, A. (2006). Fatty acid elongases in

mammals: their regulation and roles in metabolism. Progress in Lipid

Research, 45(3), 237–249. doi.org/10.1016/j.plipres.2006.01.004

Jankowski, T., Collins, A. G., & Campbell, R. (2007). Global diversity of inland

water cnidarians. Hydrobiologia, 595(1), 35–40.

doi.org/10.1007/s10750-007-9001-9

Jones, P., Binns, D., Chang, H.-Y., Fraser, M., Li, W., McAnulla, C., … Hunter, S.

(2014). InterProScan 5: genome-scale protein function classification.

Bioinformatics, 30(9), 1236–1240. doi.org/10.1093/bioinformatics/btu031

Jouiaei, M., Sunagar, K., Federman Gross, A., Scheib, H., Alewood, P. F., Moran,

Y., & Fry, B. G. (2015a). Evolution of an ancient venom: recognition of a

novel family of cnidarian toxins and the common evolutionary origin of

sodium and potassium neurotoxins in sea anemone. Molecular Biology and

Evolution, 32(6), 1598–1610. doi.org/10.1093/molbev/msv050

Jouiaei, M., Yanagihara, A. A., Madio, B., Nevalainen, T. J., Alewood, P. F., & Fry,

B. G. (2015b). Ancient venom systems: a review on Cnidaria toxins. Toxins,

7(6), 2251–2271. doi.org/10.3390/toxins7062251

Jungo, F., & Bairoch, A. (2005). Tox-Prot, the toxin protein annotation program of

the Swiss-Prot protein knowledgebase. Toxicon, 45(3), 293–301.

doi.org/10.1016/j.toxicon.2004.10.018

Kabeya, N., Sanz-Jorquera, A., Carboni, S., Davie, A., Oboh, A., & Monroig, Ó.

(2017). Biosynthesis of polyunsaturated fatty acids in sea urchins: molecular

and functional characterisation of three fatty acyl desaturases from

Paracentrotus lividus (Lamarck 1816). PLOS ONE, 12(1), e0169374.

doi.org/10.1371/journal.pone.0169374

Kaessmann, H. (2010). Origins, evolution, and phenotypic impact of new genes.

Genome Research, 20(10), 1313–1326. doi.org/10.1101/gr.101386.109

Kafri, R., Springer, M., & Pilpel, Y. (2009). Genetic redundancy: new tricks for old

genes. Cell, 136(3), 389–392. doi.org/10.1016/j.cell.2009.01.027

Kass-Simon, G., & Scappaticci, Jr., A. A. (2002). The behavioral and developmental

physiology of nematocysts. Canadian Journal of Zoology, 80(10), 1772–

1794. doi.org/10.1139/z02-135

Katoh, K., & Standley, D. M. (2013). MAFFT multiple sequence alignment software

version 7: improvements in performance and usability. Molecular Biology

and Evolution, 30(4), 772–780. doi.org/10.1093/molbev/mst010

Kawashima, T., Kawashima, S., Tanaka, C., Murai, M., Yoneda, M., Putnam, N. H.,

… Wada, H. (2009). Domain shuffling and the evolution of vertebrates.

Genome Research, 19(8), 1393–1403. doi.org/10.1101/gr.087072.108

Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., …

Drummond, A. (2012). Geneious basic: an integrated and extendable desktop

software platform for the organization and analysis of sequence data.

Bioinformatics, 28(12), 1647–1649. doi.org/10.1093/bioinformatics/bts199

Khalturin, K., Anton-Erxleben, F., Sassmann, S., Wittlieb, J., Hemmrich, G., &

Bosch, T. C. G. (2008). A novel gene family controls species-specific

morphological traits in Hydra. PLOS Biology, 6(11), e278.

doi.org/10.1371/journal.pbio.0060278

Khalturin, K., Hemmrich, G., Fraune, S., Augustin, R., & Bosch, T. C. G. (2009).

More than just orphans: are taxonomically-restricted genes important in

evolution? Trends in Genetics, 25(9), 404–413.

doi.org/10.1016/j.tig.2009.07.006

Knutzon, D. S., Thurmond, J. M., Huang, Y.-S., Chaudhary, S., Bobik, E. G., Chan,

G. M., … Mukerji, P. (1998). Identification of Δ5-desaturase from

Mortierella alpina by heterologous expression in bakers’ yeast and canola.

Journal of Biological Chemistry, 273(45), 29360–29366.

doi.org/10.1074/jbc.273.45.29360

Koutsovoulos, G., Kumar, S., Laetsch, D. R., Stevens, L., Daub, J., Conlon, C., …

Blaxter, M. (2016). No evidence for extensive horizontal gene transfer in the

genome of the tardigrade Hypsibius dujardini. Proceedings of the National

Academy of Sciences, 113(18), 5053–5058.

doi.org/10.1073/pnas.1600338113

Kumar, A., Gates, P. B., Czarkwiani, A., & Brockes, J. P. (2015). An orphan gene is

necessary for preaxial digit formation during salamander limb development.

Nature Communications, 6, 8684. doi.org/10.1038/ncomms9684

Kumar, S., Stecher, G., & Tamura, K. (2016). MEGA7: molecular evolutionary

genetics analysis version 7.0 for bigger datasets. Molecular Biology and

Evolution, 33(7), 1870–1874. doi.org/10.1093/molbev/msw054

Larson, P. (2017). Brooding sea anemones (Cnidaria: Anthozoa: Actiniaria):

paragons of diversity in mode, morphology, and maternity. Invertebrate

Biology, 136(1), 92–112. doi.org/10.1111/ivb.12159

Leonard, A. E., Pereira, S. L., Sprecher, H., & Huang, Y.-S. (2004). Elongation of

long-chain fatty acids. Progress in Lipid Research, 43(1), 36–54.

doi.org/10.1016/S0163-7827(03)00040-7

Letunic, I., & Bork, P. (2016). Interactive tree of life (iTOL) v3: an online tool for

the display and annotation of phylogenetic and other trees. Nucleic Acids

Research, 44, W242–W245. doi.org/10.1093/nar/gkw290

Lewis, Z. R., & Dunn, C. W. (2018). We are not so special. eLife, 7, e38726.

doi.org/10.7554/eLife.38726

Li, B., & Dewey, C. N. (2011). RSEM: accurate transcript quantification from RNA-

Seq data with or without a reference genome. BMC Bioinformatics, 12, 323.

doi.org/10.1186/1471-2105-12-323

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., … Durbin, R.

(2009). The sequence alignment/map format and SAMtools. Bioinformatics,

25(16), 2078–2079. doi.org/10.1093/bioinformatics/btp352

Li, L., Stoeckert, C. J., & Roos, D. S. (2003). OrthoMCL: identification of ortholog

groups for eukaryotic genomes. Genome Research, 13(9), 2178–2189.

doi.org/10.1101/gr.1224503

Li, M., Mai, K., He, G., Ai, Q., Zhang, W., Xu, W., … Zhou, H. (2013).

Characterization of two Δ5 fatty acyl desaturases in abalone (Haliotis discus

hannai Ino). Aquaculture, 416–417, 48–56.

doi.org/10.1016/j.aquaculture.2013.08.030

Li, S., Monroig, Ó., Wang, T., Yuan, Y., Carlos Navarro, J., Hontoria, F., … Ai, Q.

(2017). Functional characterization and differential nutritional regulation of

putative Elovl5 and Elovl4 elongases in large yellow croaker (Larimichthys

crocea). Scientific Reports, 7, 2303. doi.org/10.1038/s41598-017-02646-8

Li, Y., Monroig, Ó., Zhang, L., Wang, S., Zheng, X., Dick, J. R., … Tocher, D. R.

(2010). Vertebrate fatty acyl desaturase with Δ4 activity. Proceedings of the

National Academy of Sciences, 107(39), 16840–16845.

doi.org/10.1073/pnas.1008429107

Libisch, B., Michaelson, L. V., Lewis, M. J., Shewry, P. R., & Napier, J. A. (2000).

Chimeras of Δ6-fatty acid and Δ8-sphingolipid desaturases. Biochemical

and Biophysical Research Communications, 279(3), 779–785.

doi.org/10.1006/bbrc.2000.4023

Liu, H.-Q., Li, Y., Irwin, D. M., Zhang, Y.-P., & Wu, D.-D. (2014). Integrative

analysis of young genes, positively selected genes and lncRNAs in the

development of Drosophila melanogaster. BMC Evolutionary Biology, 14.

doi.org/10.1186/s12862-014-0241-9

Lom, J., & Dyková, I. (2013). Myxozoan genera: definition and notes on taxonomy,

life-cycle terminology and pathogenic species. Folia Parasitologica, 53(1),

1–36. doi.org/10.14411/fp.2006.001

Long, M., Betrán, E., Thornton, K., & Wang, W. (2003). The origin of new genes:

glimpses from the young and old. Nature Reviews Genetics, 4(11), 865–875.

doi.org/10.1038/nrg1204

Lynch, M., & Conery, J. S. (2003). The evolutionary demography of duplicate genes.

Journal of Structural and Functional Genomics, 3(1–4), 35–44.

doi.org/10.1023/A:1022696612931

Maček, P., & Lebez, D. (1988). Isolation and characterization of three lethal and

hemolytic toxins from the sea anemone Actinia equina L. Toxicon, 26(5),

441–451. doi.org/10.1016/0041-0101(88)90183-3

Mackessy, S. P., Sixberry, N. M., Heyborne, W. H., & Fritts, T. (2006). Venom of

the brown treesnake, Boiga irregularis: ontogenetic shifts and taxa-specific

toxicity. Toxicon, 47(5), 537–548. doi.org/10.1016/j.toxicon.2006.01.007

MacManes, M. D. (2014). On the optimal trimming of high-throughput mRNA

sequence data. Frontiers in Genetics, 5, 13.

doi.org/10.3389/fgene.2014.00013

Macrander, J., Broe, M., & Daly, M. (2016). Tissue-specific venom composition and

differential gene expression in sea anemones. Genome Biology and Evolution,

8(8), 2358–2375. doi.org/10.1093/gbe/evw155

Macrander, J., Brugler, M. R., & Daly, M. (2015). A RNA-seq approach to identify

putative toxins from acrorhagi in aggressive and non-aggressive Anthopleura

elegantissima polyps. BMC Genomics, 16, 221.

doi.org/10.1186/s12864-015-1417-4

Macrander, J., & Daly, M. (2016). Evolution of the cytolytic pore-forming proteins

(Actinoporins) in sea anemones. Toxins, 8(12), 368.

doi.org/10.3390/toxins8120368

Madio, B., Peigneur, S., Chin, Y. K. Y., Hamilton, B. R., Henriques, S. T., Smith, J.

J., … Undheim, E. A. B. (2018). PHAB toxins: a unique family of predatory

sea anemone toxins evolving via intra-gene concerted evolution defines a

new peptide fold. Cellular and Molecular Life Sciences, 75(24), 4511–4524.

doi.org/10.1007/s00018-018-2897-6

Madio, B., Undheim, E. A. B., & King, G. F. (2017). Revisiting venom of the sea

anemone Stichodactyla haddoni: omics techniques reveal the complete toxin

arsenal of a well-studied sea anemone genus. Journal of Proteomics, 166, 83–

92. doi.org/10.1016/j.jprot.2017.07.007

Malpezzi, E. L., de Freitas, J. C., Muramoto, K., & Kamiya, H. (1993).

Characterization of peptides in sea anemone venom collected by a novel

procedure. Toxicon, 31(7), 853–864.

Marchler-Bauer, A., Derbyshire, M. K., Gonzales, N. R., Lu, S., Chitsaz, F., Geer, L.

Y., … Bryant, S. H. (2015). CDD: NCBI’s conserved domain database.

Nucleic Acids Research, 43, D222–D226. doi.org/10.1093/nar/gku1221

Matyash, V., Liebisch, G., Kurzchalia, T. V., Shevchenko, A., & Schwudke, D.

(2008). Lipid extraction by methyl-tert-butyl ether for high-throughput

lipidomics. Journal of Lipid Research, 49(5), 1137–1146.

doi.org/10.1194/jlr.D700041-JLR200

Mayer, M. G., Rödelsperger, C., Witte, H., Riebesell, M., & Sommer, R. J. (2015).

The orphan gene dauerless regulates dauer development and intraspecific

competition in nematodes by copy number variation. PLOS Genetics, 11(6),

e1005146. doi.org/10.1371/journal.pgen.1005146

McDougall, C., Aguilera, F., & Degnan, B. M. (2013). Rapid evolution of pearl

oyster shell matrix proteins with repetitive, low-complexity domains. Journal

of the Royal Society Interface, 10(82), 20130041.

doi.org/10.1098/rsif.2013.0041

McLysaght, A., & Hurst, L. D. (2016). Open questions in the study of de novo genes:

what, how and why. Nature Reviews Genetics, 17(9), 567–578.

doi.org/10.1038/nrg.2016.78

Meesapyodsuk, D., & Qiu, X. (2012). The front-end desaturase: structure, function,

evolution and biotechnological use. Lipids, 47(3), 227–237.

doi.org/10.1007/s11745-011-3617-2

Meng, Q., & Xia, Y. (2011). c-Jun, at the crossroad of the signaling network. Protein

& Cell, 2(11), 889–898. doi.org/10.1007/s13238-011-1113-3

Menon, L. R., McIlroy, D., & Brasier, M. D. (2013). Evidence for Cnidaria-like

behavior in ca. 560 Ma Ediacaran Aspidella. Geology, 41(8), 895–898.

doi.org/10.1130/G34424.1

Mertens, F., Johansson, B., Fioretos, T., & Mitelman, F. (2015). The emerging

complexity of gene fusions in cancer. Nature Reviews Cancer, 15(6), 371–

381. doi.org/10.1038/nrc3947

Michaelson, L. V., Lazarus, C. M., Griffiths, G., Napier, J. A., & Stobart, A. K.

(1998). Isolation of a Δ5-fatty acid desaturase gene from Mortierella alpina.

Journal of Biological Chemistry, 273(30), 19055–19059.

doi.org/10.1074/jbc.273.30.19055

Minagawa, S., Sugiyama, M., Ishida, M., Nagashima, Y., & Shiomi, K. (2008).

Kunitz-type protease inhibitors from acrorhagi of three species of sea

anemones. Comparative Biochemistry and Physiology Part B: Biochemistry

and Molecular Biology, 150(2), 240–245.

doi.org/10.1016/j.cbpb.2008.03.010

Mirdita, M., von den Driesch, L., Galiez, C., Martin, M. J., Söding, J., & Steinegger,

M. (2017). Uniclust databases of clustered and deeply annotated protein

sequences and alignments. Nucleic Acids Research, 45, D170–D176.

doi.org/10.1093/nar/gkw1081

Mitchell, M. L., Hamilton, B. R., Madio, B., Morales, R. A. V., Tonkin-Hill, G. Q.,

Papenfuss, A. T., … Norton, R. S. (2017). The use of imaging mass

spectrometry to study peptide toxin distribution in Australian sea anemones.

Australian Journal of Chemistry, 70(11), 1235–1237.

doi.org/10.1071/CH17228

Mitelman, F., Johansson, B., & Mertens, F. (2007). The impact of translocations and

gene fusions on cancer causation. Nature Reviews Cancer, 7(4), 233–245.

doi.org/10.1038/nrc2091

Modica, M. V., Lombardo, F., Franchini, P., & Oliverio, M. (2015). The venomous

cocktail of the vampire snail Colubraria reticulata (Mollusca, Gastropoda).

BMC Genomics, 16(1), 441. doi.org/10.1186/s12864-015-1648-4

Mohd-Yusof, N. Y., Monroig, Ó., Mohd-Adnan, A., Wan, K.-L., & Tocher, D. R.

(2010). Investigation of highly unsaturated fatty acid metabolism in the Asian

sea bass, Lates calcarifer. Fish Physiology and Biochemistry, 36(4), 827–

843. doi.org/10.1007/s10695-010-9409-4

Moleirinho, A., Carneiro, J., Matthiesen, R., Silva, R. M., Amorim, A., & Azevedo,

L. (2011). Gains, losses and changes of function after gene duplication: study

of the metallothionein family. PLOS ONE, 6(4), e18487.

doi.org/10.1371/journal.pone.0018487

Monroig, Ó., de Llanos, R., Varó, I., Hontoria, F., Tocher, D. R., Puig, S., &

Navarro, J. C. (2017). Biosynthesis of polyunsaturated fatty acids in Octopus

vulgaris: molecular cloning and functional characterisation of a stearoyl-CoA

desaturase and an elongation of very long-chain fatty acid 4 protein. Marine

Drugs, 15(3), 82. doi.org/10.3390/md15030082

Monroig, Ó., Guinot, D., Hontoria, F., Tocher, D. R., & Navarro, J. C. (2012a).

Biosynthesis of essential fatty acids in Octopus vulgaris (Cuvier, 1797):

molecular cloning, functional characterisation and tissue distribution of a

fatty acyl elongase. Aquaculture, 360–361, 45–53.

doi.org/10.1016/j.aquaculture.2012.07.016

Monroig, Ó., Li, Y., & Tocher, D. R. (2011a). Delta-8 desaturation activity varies

among fatty acyl desaturases of teleost fish: high activity in delta-6

desaturases of marine species. Comparative Biochemistry and Physiology

Part B: Biochemistry and Molecular Biology, 159(4), 206–213.

doi.org/10.1016/j.cbpb.2011.04.007

Monroig, Ó., Lopes-Marques, M., Navarro, J. C., Hontoria, F., Ruivo, R., Santos, M.

M., … Castro, L. F. C. (2016). Evolutionary functional elaboration of the

Elovl2/5 gene family in chordates. Scientific Reports, 6, 20510.

doi.org/10.1038/srep20510

Monroig, Ó., Navarro, J. C., Dick, J. R., Alemany, F., & Tocher, D. R. (2012b).

Identification of a Δ5-like fatty acyl desaturase from the cephalopod Octopus

vulgaris (Cuvier 1797) involved in the biosynthesis of essential fatty acids.

Marine Biotechnology, 14(4), 411–422. doi.org/10.1007/s10126-011-9423-2

Monroig, Ó., Tocher, D. R., & Navarro, J. C. (2013). Biosynthesis of

polyunsaturated fatty acids in marine invertebrates: recent advances in

molecular mechanisms. Marine Drugs, 11(10), 3998–4018.

doi.org/10.3390/md11103998

Monroig, Ó., Webb, K., Ibarra-Castro, L., Holt, G. J., & Tocher, D. R. (2011b).

Biosynthesis of long-chain polyunsaturated fatty acids in marine fish:

characterization of an Elovl4-like elongase from cobia Rachycentron

canadum and activation of the pathway during early life stages. Aquaculture,

312(1–4), 145–153. doi.org/10.1016/j.aquaculture.2010.12.024

Monroig, Ó., Zheng, X., Morais, S., Leaver, M. J., Taggart, J. B., & Tocher, D. R.

(2010). Multiple genes for functional Δ6 fatty acyl desaturases (Fad) in

Atlantic salmon (Salmo salar L.): gene and cDNA characterization,

functional expression, tissue distribution and nutritional regulation.

Biochimica et Biophysica Acta (BBA) - Molecular and Cell Biology of Lipids,

1801(9), 1072–1081. doi.org/10.1016/j.bbalip.2010.04.007

Morais, S., Monroig, Ó., Zheng, X., Leaver, M. J., & Tocher, D. R. (2009). Highly

unsaturated fatty acid synthesis in Atlantic salmon: characterization of

Elovl5- and Elovl2-like elongases. Marine Biotechnology, 11(5), 627–639.

doi.org/10.1007/s10126-009-9179-0

Moran, Y., Genikhovich, G., Gordon, D., Wienkoop, S., Zenkert, C., Özbek, S., …

Gurevitz, M. (2012a). Neurotoxin localization to ectodermal gland cells

uncovers an alternative mechanism of venom delivery in sea anemones.

Proceedings of the Royal Society B: Biological Sciences, 279(1732), 1351–

1358. doi.org/10.1098/rspb.2011.1731

Moran, Y., Praher, D., Schlesinger, A., Ayalon, A., Tal, Y., & Technau, U. (2012b).

Analysis of soluble protein contents from the nematocysts of a model sea

anemone sheds light on venom evolution. Marine Biotechnology, 15(3), 329–

339. doi.org/10.1007/s10126-012-9491-y

Moran, Y., Weinberger, H., Sullivan, J. C., Reitzel, A. M., Finnerty, J. R., &

Gurevitz, M. (2008). Concerted evolution of sea anemone neurotoxin genes is

revealed through analysis of the Nematostella vectensis genome. Molecular

Biology and Evolution, 25(4), 737–747. doi.org/10.1093/molbev/msn021

Morgenstern, D., & King, G. F. (2013). The venom optimization hypothesis

revisited. Toxicon, 63, 120–128. doi.org/10.1016/j.toxicon.2012.11.022

Muller, E. M., Fine, M., & Ritchie, K. B. (2016). The stable microbiome of inter and

sub-tidal anemone species under increasing pCO2. Scientific Reports, 6,

37387. doi.org/10.1038/srep37387

Murrell, B., Moola, S., Mabona, A., Weighill, T., Sheward, D., Pond, K., …

Scheffler, K. (2013). FUBAR: a fast, unconstrained bayesian approximation

for inferring selection. Molecular Biology and Evolution, 30(5), 1196–1205.

doi.org/10.1093/molbev/mst030

Najmabadi, H., Hu, H., Garshasbi, M., Zemojtel, T., Abedini, S. S., Chen, W., …

Ropers, H. H. (2011). Deep sequencing reveals 50 novel genes for recessive

cognitive disorders. Nature, 478(7367), 57–63. doi.org/10.1038/nature10423

Nathans, J., Thomas, D., & Hogness, D. S. (1986). Molecular genetics of human

color vision: the genes encoding blue, green, and red pigments. Science,

232(4747), 193–202. doi.org/10.1126/science.2937147

Nei, M., & Rooney, A. P. (2005). Concerted and birth-and-death evolution of

multigene families. Annual Review of Genetics, 39, 121–152.

doi.org/10.1146/annurev.genet.39.073003.112240

Neme, R., & Tautz, D. (2014). Evolution: dynamics of de novo gene emergence.

Current Biology, 24(6), R238–R240. doi.org/10.1016/j.cub.2014.02.016

Nevalainen, T. J., Peuravuori, H. J., Quinn, R. J., Llewellyn, L. E., Benzie, J. A. H.,

Fenner, P. J., & Winkel, K. D. (2004). Phospholipase A2 in Cnidaria.

Comparative Biochemistry and Physiology Part B: Biochemistry and

Molecular Biology, 139(4), 731–735. doi.org/10.1016/j.cbpc.2004.09.006

Nguyen, L.-T., Schmidt, H. A., von Haeseler, A., & Minh, B. Q. (2015). IQ-TREE: a

fast and effective stochastic algorithm for estimating maximum-likelihood

phylogenies. Molecular Biology and Evolution, 32(1), 268–274.

doi.org/10.1093/molbev/msu300

Nicosia, A., Maggio, T., Mazzola, S., & Cuttitta, A. (2013). Evidence of accelerated

evolution and ectodermal-specific expression of presumptive BDS toxin cDNAs

from Anemonia viridis.

Marine Drugs, 11(11), 4213–4231. doi.org/10.3390/md11114213

Norton, R. S. (1991). Structure and structure-function relationships of sea anemone

proteins that interact with the sodium channel. Toxicon, 29(9), 1051–1084.

doi.org/10.1016/0041-0101(91)90205-6

Norton, R. S. (2009). Structures of sea anemone toxins. Toxicon, 54(8), 1075–1088.

doi.org/10.1016/j.toxicon.2009.02.035

Norton, R. S., Maček, P., Reid, G. E., & Simpson, R. J. (1992). Relationship between

the cytolysins tenebrosin-C from Actinia tenebrosa and equinatoxin II from

Actinia equina. Toxicon, 30(1), 13–23. doi.org/10.1016/0041-0101(92)90497-

S

O’Hara, E. P., Caldwell, G. S., & Bythell, J. (2018). Equistatin and equinatoxin gene

expression is influenced by environmental temperature in the sea anemone

Actinia equina. Toxicon, 153, 12–16. doi.org/10.1016/j.toxicon.2018.08.004

Ohno, S. (1970). Evolution by Gene Duplication. Springer Science & Business

Media.

Ohno, S. (1972). So much “junk” DNA in our genome. Brookhaven Symposia in

Biology, 23, 366–370.

Ohno, Y., Suto, S., Yamanaka, M., Mizutani, Y., Mitsutake, S., Igarashi, Y., …

Kihara, A. (2010). ELOVL1 production of C24 acyl-CoAs is linked to C24

sphingolipid synthesis. Proceedings of the National Academy of Sciences,

107(43), 18439–18444. doi.org/10.1073/pnas.1005572107

Oliveira, J. S., Fuentes-Silva, D., & King, G. F. (2012). Development of a rational

nomenclature for naming peptide and protein toxins from sea anemones.

Toxicon, 60(4), 539–550. doi.org/10.1016/j.toxicon.2012.05.020

Olivera, B. M., Watkins, M., Bandyopadhyay, P., Imperial, J. S., de la Cotera, E. P.

H., Aguilar, M. B., … Lluisma, A. (2012). Adaptive radiation of venomous

marine snail lineages and the accelerated evolution of venom peptide genes.

Annals of the New York Academy of Sciences, 1267, 61–70.

doi.org/10.1111/j.1749-6632.2012.06603.x

Opazo, J. C., Hoffmann, F. G., & Storz, J. F. (2008). Genomic evidence for

independent origins of β-like globin genes in monotremes and therian

mammals. Proceedings of the National Academy of Sciences, 105(5), 1590–

1595. doi.org/10.1073/pnas.0710531105

Orts, D. J. B., Peigneur, S., Madio, B., Cassoli, J. S., Montandon, G. G., Pimenta, A.

M. C., … Tytgat, J. (2013). Biochemical and electrophysiological

characterization of two sea anemone type 1 potassium toxins from a

geographically distant population of Bunodosoma caissarum. Marine Drugs,

11(3), 655–679. doi.org/10.3390/md11030655

Ottaway, J. R. (1978). Population ecology of the intertidal anemone Actinia

tenebrosa I. Pedal locomotion and intraspecific aggression. Marine and

Freshwater Research, 29(6), 787–802.

Ottaway, J. R. (1979). Population ecology of the intertidal anemone Actinia

tenebrosa II. Geographical distribution, synonymy, reproductive cycle and

fecundity. Australian Journal of Zoology, 27, 273–290.

Ottaway, J. R., & Kirby, G. C. (1975). Genetic relationships between brooding and

brooded Actinia tenebrosa. Nature, 255(5505), 221–223.

doi.org/10.1038/255221a0

Ou, S., & Jiang, N. (2018). LTR_retriever: a highly accurate and sensitive program

for identification of long terminal repeat retrotransposons. Plant Physiology,

176(2), 1410–1422. doi.org/10.1104/pp.17.01310

Oura, T., & Kajiwara, S. (2008). Disruption of the sphingolipid Δ8-desaturase gene

causes a delay in morphological changes in Candida albicans. Microbiology,

154(12), 3795–3803. doi.org/10.1099/mic.0.2008/018788-0

Özbek, S. (2010). The cnidarian nematocyst: a miniature extracellular matrix within

a secretory vesicle. Protoplasma, 248(4), 635–640. doi.org/10.1007/s00709-

010-0219-4

Palmieri, N., Kosiol, C., & Schlötterer, C. (2014). The life cycle of Drosophila

orphan genes. eLife, 3. doi.org/10.7554/eLife.01311

Papina, M., Meziane, T., & van Woesik, R. (2003). Symbiotic zooxanthellae provide

the host-coral Montipora digitata with polyunsaturated fatty acids.

Comparative Biochemistry and Physiology Part B: Biochemistry and

Molecular Biology, 135(3), 533–537. doi.org/10.1016/S1096-4959(03)00118-

0

Park, E., Hwang, D.-S., Lee, J.-S., Song, J.-I., Seo, T.-K., & Won, Y.-J. (2012).

Estimation of divergence times in cnidarian evolution based on mitochondrial

protein-coding genes and the fossil record. Molecular Phylogenetics and

Evolution, 62(1), 329–345. doi.org/10.1016/j.ympev.2011.10.008

Park, H. S., Jang, M. H., Kim, E. J., Kim, H. J., Lee, H. J., Kim, Y. J., … Park, S. Y.

(2014). High EGFR gene copy number predicts poor outcome in triple-

negative breast cancer. Modern Pathology, 27(9), 1212–1222.

doi.org/10.1038/modpathol.2013.251

Parker, M. W., & Feil, S. C. (2005). Pore-forming protein toxins: from structure to

function. Progress in Biophysics and Molecular Biology, 88(1), 91–142.

doi.org/10.1016/j.pbiomolbio.2004.01.009

Parra, G., Bradnam, K., Ning, Z., Keane, T., & Korf, I. (2009). Assessing the gene

space in draft genomes. Nucleic Acids Research, 37(1), 289–297.

doi.org/10.1093/nar/gkn916

Patthy, L. (2003). Modular assembly of genes and the evolution of new functions.

Genetica, 118(2–3), 217–231. doi.org/10.1023/A:1024182432483

Petersen, T. N., Brunak, S., von Heijne, G., & Nielsen, H. (2011). SignalP 4.0:

discriminating signal peptides from transmembrane regions. Nature Methods,

8(10), 785–786. doi.org/10.1038/nmeth.1701

Pineda, S. S., Sollod, B. L., Wilson, D., Darling, A., Sunagar, K., Undheim, E. A. B.,

… King, G. F. (2014). Diversification of a single ancestral gene into a

successful toxin superfamily in highly venomous Australian funnel-web

spiders. BMC Genomics, 15, 177. doi.org/10.1186/1471-2164-15-177

Piriyapongsa, J., Rutledge, M. T., Patel, S., Borodovsky, M., & Jordan, I. K. (2007).

Evaluating the protein coding potential of exonized transposable element

sequences. Biology Direct, 2, 31. doi.org/10.1186/1745-6150-2-31

Pisani, D., Pett, W., Dohrmann, M., Feuda, R., Rota-Stabelli, O., Philippe, H., …

Wörheide, G. (2015). Genomic data do not support comb jellies as the sister

group to all other animals. Proceedings of the National Academy of Sciences,

112(50), 15402–15407. doi.org/10.1073/pnas.1518127112

Polz, M. F., Alm, E. J., & Hanage, W. P. (2013). Horizontal gene transfer and the

evolution of bacterial and archaeal population structure. Trends in Genetics,

29(3), 170–175. doi.org/10.1016/j.tig.2012.12.006

Ponce, R., Martinsen, L., Vicente, L. M., & Hartl, D. L. (2012). Novel genes from

formation to function. International Journal of Evolutionary Biology, 2012,

821645. doi.org/10.1155/2012/821645

Pond, S. L. K., Frost, S. D. W., & Muse, S. V. (2005). HyPhy: hypothesis testing

using phylogenies. Bioinformatics, 21(5), 676–679.

doi.org/10.1093/bioinformatics/bti079

Prentis, P. J., & Pavasovic, A. (2014). The Anadara trapezia transcriptome: a

resource for molluscan physiological genomics. Marine Genomics, 18, 113–

115. doi.org/10.1016/j.margen.2014.08.004

Prentis, P. J., Pavasovic, A., & Norton, R. S. (2018). Sea anemones: quiet achievers

in the field of peptide toxins. Toxins, 10(1), 36.

doi.org/10.3390/toxins10010036

Putnam, N. H., Srivastava, M., Hellsten, U., Dirks, B., Chapman, J., Salamov, A., …

Rokhsar, D. S. (2007). Sea anemone genome reveals ancestral eumetazoan

gene repertoire and genomic organization. Science, 317(5834), 86–94.

doi.org/10.1126/science.1139158

Rachamim, T., Morgenstern, D., Aharonovich, D., Brekhman, V., Lotan, T., & Sher,

D. (2015). The dynamically evolving nematocyst content of an anthozoan, a

scyphozoan, and a hydrozoan. Molecular Biology and Evolution, 32(3), 740–

753. doi.org/10.1093/molbev/msu335

Reitzel, A. M., & Tarrant, A. M. (2009). Nuclear receptor complement of the

cnidarian Nematostella vectensis: phylogenetic relationships and

developmental expression patterns. BMC Evolutionary Biology, 9(1), 230.

doi.org/10.1186/1471-2148-9-230

Ribeiro, J. M., Arcà, B., Lombardo, F., Calvo, E., Phan, V. M., Chandra, P. K., &

Wikel, S. K. (2007). An annotated catalogue of salivary gland transcripts in

the adult female mosquito, Ædes ægypti. BMC Genomics, 8, 6.

doi.org/10.1186/1471-2164-8-6

Robinson, M. D., McCarthy, D. J., & Smyth, G. K. (2010). edgeR: a Bioconductor

package for differential expression analysis of digital gene expression data.

Bioinformatics, 26(1), 139–140. doi.org/10.1093/bioinformatics/btp616

Rodríguez, E., Barbeitos, M. S., Brugler, M. R., Crowley, L. M., Grajales, A.,

Gusmão, L., … Daly, M. (2014). Hidden among sea anemones: the first

comprehensive phylogenetic reconstruction of the order Actiniaria (Cnidaria,

Anthozoa, Hexacorallia) reveals a novel group of hexacorals. PLOS ONE,

9(5), e96998. doi.org/10.1371/journal.pone.0096998

Rogers, R. L., & Hartl, D. L. (2012). Chimeric genes as a source of rapid evolution

in Drosophila melanogaster. Molecular Biology and Evolution, 29(2), 517–

529. doi.org/10.1093/molbev/msr184

Rowley, J. D. (1973). Letter: a new consistent chromosomal abnormality in chronic

myelogenous leukaemia identified by quinacrine fluorescence and Giemsa

staining. Nature, 243(5405), 290–293. doi.org/10.1038/243290a0

Ruder, T., Sunagar, K., Undheim, E. A. B., Ali, S. A., Wai, T.-C., Low, D. H. W., …

Fry, B. G. (2013). Molecular phylogeny and evolution of the proteins

encoded by coleoid (cuttlefish, octopus, and squid) posterior venom glands.

Journal of Molecular Evolution, 76(4), 192–204. doi.org/10.1007/s00239-

013-9552-5

Sayanova, O., Smith, M. A., Lapinskas, P., Stobart, A. K., Dobson, G., Christie, W.

W., … Napier, J. A. (1997). Expression of a borage desaturase cDNA

containing an N-terminal cytochrome b5 domain results in the accumulation

of high levels of Δ6-desaturated fatty acids in transgenic tobacco.

Proceedings of the National Academy of Sciences, 94(8), 4211–4216.

doi.org/10.1073/pnas.94.8.4211

Schlesinger, A., Zlotkin, E., Kramarsky-Winter, E., & Loya, Y. (2009). Cnidarian

internal stinging mechanism. Proceedings of the Royal Society B: Biological

Sciences, 276(1659), 1063–1067. doi.org/10.1098/rspb.2008.1586

Schlötterer, C. (2015). Genes from scratch – the evolutionary fate of de novo genes.

Trends in Genetics, 31(4), 215–219. doi.org/10.1016/j.tig.2015.02.007

Schmitz, J. F., Ullrich, K. K., & Bornberg-Bauer, E. (2018). Incipient de novo genes

can evolve from frozen accidents that escaped rapid transcript turnover.

Nature Ecology & Evolution, 2(10), 1626–1632. doi.org/10.1038/s41559-

018-0639-7

Schwaiger, M., Schönauer, A., Rendeiro, A. F., Pribitzer, C., Schauer, A., Gilles, A.

F., … Technau, U. (2014). Evolutionary conservation of the eumetazoan gene

regulatory landscape. Genome Research, 24(4), 639–650.

doi.org/10.1101/gr.162529.113

Sebé-Pedrós, A., Chomsky, E., Pang, K., Lara-Astiaso, D., Gaiti, F., Mukamel, Z.,

… Tanay, A. (2018a). Early metazoan cell type diversity and the evolution of

multicellular gene regulation. Nature Ecology & Evolution, 2(7), 1176–1188.

doi.org/10.1038/s41559-018-0575-6

Sebé-Pedrós, A., Saudemont, B., Chomsky, E., Plessier, F., Mailhé, M.-P., Renno, J.,

… Marlow, H. (2018b). Cnidarian cell type diversity and regulation revealed

by whole-organism single-cell RNA-seq. Cell, 173(6), 1520–1534.

doi.org/10.1016/j.cell.2018.05.019

Shenkarev, Z. O., Panteleev, P. V., Balandin, S. V., Gizatullina, A. K., Altukhov, D.

A., Finkina, E. I., … Ovchinnikova, T. V. (2012). Recombinant expression

and solution structure of antimicrobial peptide aurelin from jellyfish Aurelia

aurita. Biochemical and Biophysical Research Communications, 429(1–2),

63–69. doi.org/10.1016/j.bbrc.2012.10.092

Sherman, C. D. H., Peucker, A. J., & Ayre, D. J. (2007). Do reproductive tactics vary

with habitat heterogeneity in the intertidal sea anemone Actinia tenebrosa?

Journal of Experimental Marine Biology and Ecology, 340(2), 259–267.

doi.org/10.1016/j.jembe.2006.09.016

Shick, J. M. (1991). A Functional Biology of Sea Anemones, 1st edn. London:

Chapman & Hall.

Shigenobu, S., & Stern, D. L. (2013). Aphids evolved novel secreted proteins for

symbiosis with bacterial endosymbiont. Proceedings of the Royal Society B:

Biological Sciences, 280(1750), 20121952.

doi.org/10.1098/rspb.2012.1952

Shinzato, C., Shoguchi, E., Kawashima, T., Hamada, M., Hisata, K., Tanaka, M., …

Satoh, N. (2011). Using the Acropora digitifera genome to understand coral

responses to environmental change. Nature, 476(7360), 320–323.

doi.org/10.1038/nature10249

Shiomi, K. (2009). Novel peptide toxins recently isolated from sea anemones.

Toxicon, 54(8), 1112–1118. doi.org/10.1016/j.toxicon.2009.02.031

Shpirer, E., Diamant, A., Cartwright, P., & Huchon, D. (2018). A genome wide

survey reveals multiple nematocyst-specific genes in Myxozoa. BMC

Evolutionary Biology, 18(1), 138. doi.org/10.1186/s12862-018-1253-7

Sievers, F., Wilm, A., Dineen, D., Gibson, T. J., Karplus, K., Li, W., … Higgins, D.

G. (2011). Fast, scalable generation of high-quality protein multiple sequence

alignments using Clustal Omega. Molecular Systems Biology, 7, 539.

doi.org/10.1038/msb.2011.75

Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V., & Zdobnov, E.

M. (2015). BUSCO: assessing genome assembly and annotation

completeness with single-copy orthologs. Bioinformatics, 31(19), 3210–3212.

doi.org/10.1093/bioinformatics/btv351

Smith, H. L., Pavasovic, A., Surm, J. M., Phillips, M. J., & Prentis, P. J. (2018).

Evidence for a large expansion and subfunctionalization of globin genes in

sea anemones. Genome Biology and Evolution, 10(8), 1892–1901.

doi.org/10.1093/gbe/evy128

Smith, J. J., & Blumenthal, K. M. (2007). Site-3 sea anemone toxins: molecular

probes of gating mechanisms in voltage-dependent sodium channels.

Toxicon, 49(2), 159–170. doi.org/10.1016/j.toxicon.2006.09.020

Sommer, R. J., & Ogawa, A. (2011). Hormone signalling and phenotypic plasticity in

nematode development and evolution. Current Biology, 21(18), R758–R766.

doi.org/10.1016/j.cub.2011.06.034

Sorek, M., Schnytzer, Y., Ben-Asher, H. W., Caspi, V. C., Chen, C.-S., Miller, D. J.,

& Levy, O. (2018). Setting the pace: host rhythmic behaviour and gene

expression patterns in the facultatively symbiotic cnidarian Aiptasia are

determined largely by Symbiodinium. Microbiome, 6(1), 83.

doi.org/10.1186/s40168-018-0465-9

Sperling, P., Libisch, B., Zähringer, U., Napier, J. A., & Heinz, E. (2001). Functional

identification of a Δ8-sphingolipid desaturase from Borago officinalis.

Archives of Biochemistry and Biophysics, 388(2), 293–298.

doi.org/10.1006/abbi.2001.2308

Sperling, P., Zähringer, U., & Heinz, E. (1998). A sphingolipid desaturase from

higher plants: identification of a new cytochrome b5 fusion protein. Journal of

Biological Chemistry, 273(44), 28590–28596.

doi.org/10.1074/jbc.273.44.28590

Sperstad, S. V., Haug, T., Blencke, H.-M., Styrvold, O. B., Li, C., & Stensvåg, K.

(2011). Antimicrobial peptides from marine invertebrates: challenges and

perspectives in marine antimicrobial peptide discovery. Biotechnology

Advances, 29(5), 519–530. doi.org/10.1016/j.biotechadv.2011.05.021

Sprague, M., Dick, J. R., & Tocher, D. R. (2016). Impact of sustainable feeds on

omega-3 long-chain fatty acid levels in farmed Atlantic salmon, 2006–2015.

Scientific Reports, 6, 21892. doi.org/10.1038/srep21892

Sprecher, H. (2000). Metabolism of highly unsaturated n-3 and n-6 fatty acids.

Biochimica et Biophysica Acta (BBA) - Molecular and Cell Biology of Lipids,

1486, 219–231. doi.org/10.1016/S1388-1981(00)00077-9

Stamatakis, A. (2014). RAxML version 8: a tool for phylogenetic analysis and post-

analysis of large phylogenies. Bioinformatics, 30(9), 1312–1313.

doi.org/10.1093/bioinformatics/btu033

Stankiewicz, P., & Lupski, J. R. (2010). Structural variation in the human genome

and its role in disease. Annual Review of Medicine, 61(1), 437–455.

doi.org/10.1146/annurev-med-100708-204735

Stefanik, D. J., Wolenski, F. S., Friedman, L. E., Gilmore, T. D., & Finnerty, J. R.

(2013). Isolation of DNA, RNA and protein from the starlet sea anemone

Nematostella vectensis. Nature Protocols, 8(5), 892–899.

doi.org/10.1038/nprot.2012.151

Steinegger, M., & Söding, J. (2017). MMseqs2 enables sensitive protein sequence

searching for the analysis of massive data sets. Nature Biotechnology, 35,

1026–1028. doi.org/10.1038/nbt.3988

Storz, J. F., Opazo, J. C., & Hoffmann, F. G. (2013). Gene duplication, genome

duplication, and the functional diversification of vertebrate globins.

Molecular Phylogenetics and Evolution, 66(2), 469–478.

doi.org/10.1016/j.ympev.2012.07.013

Sun, Y.-C., Hinnebusch, B. J., & Darby, C. (2008). Experimental evidence for

negative selection in the evolution of a Yersinia pestis pseudogene.

Proceedings of the National Academy of Sciences, 105(23), 8097–8101.

doi.org/10.1073/pnas.0803525105

Sunagar, K., Columbus-Shenkar, Y. Y., Fridrich, A., Gutkovich, N., Aharoni, R., &

Moran, Y. (2018). Cell type-specific expression profiling unravels the

development and evolution of stinging cells in sea anemone. BMC Biology,

16(1), 108. doi.org/10.1186/s12915-018-0578-4

Sunagar, K., & Moran, Y. (2015). The rise and fall of an evolutionary innovation:

contrasting strategies of venom evolution in ancient and young animals.

PLOS Genetics, 11(10), e1005596. doi.org/10.1371/journal.pgen.1005596

Sunagar, K., Morgenstern, D., Reitzel, A. M., & Moran, Y. (2016). Ecological

venomics: how genomics, transcriptomics and proteomics can shed new light

on the ecology and evolution of venom. Journal of Proteomics, 135, 62–72.

doi.org/10.1016/j.jprot.2015.09.015

Sunagar, K., Undheim, E. A. B., Chan, A. H. C., Koludarov, I., Muñoz-Gómez, S.

A., Antunes, A., & Fry, B. G. (2013). Evolution stings: the origin and

diversification of scorpion toxin peptide scaffolds. Toxins, 5(12), 2456–2487.

doi.org/10.3390/toxins5122456

Supek, F., Bošnjak, M., Škunca, N., & Šmuc, T. (2011). Revigo summarizes and

visualizes long lists of gene ontology terms. PLOS ONE, 6(7), e21800.

doi.org/10.1371/journal.pone.0021800

Surm, J. M., Prentis, P. J., & Pavasovic, A. (2015). Comparative analysis and

distribution of omega-3 lcPUFA biosynthesis genes in marine molluscs.

PLOS ONE, 10(8), e0136301. doi.org/10.1371/journal.pone.0136301

Surm, J. M., Smith, H. L., Madio, B., Undheim, E. A. B., King, G. F., Hamilton, B.

R., … Prentis, P. J. (2019). A process of convergent amplification and tissue-

specific expression dominates the evolution of toxin and toxin-like genes in

sea anemones. Molecular Ecology. Advance online publication. doi.org/10.1111/mec.15084

Surm, J. M., Toledo, T. M., Prentis, P. J., & Pavasovic, A. (2018). Insights into the

phylogenetic and molecular evolutionary histories of Fad and Elovl gene

families in Actiniaria. Ecology and Evolution, 8(11), 5323–5335.

doi.org/10.1002/ece3.4044

Suyama, M., Torrents, D., & Bork, P. (2006). PAL2NAL: robust conversion of

protein sequence alignments into the corresponding codon alignments.

Nucleic Acids Research, 34, W609–W612. doi.org/10.1093/nar/gkl315

Takakuwa, N., Kinoshita, M., Oda, Y., & Ohnishi, M. (2002). Isolation and

characterization of the genes encoding Δ8 sphingolipid desaturase from

Saccharomyces kluyveri and Kluyveromyces lactis. Current Microbiology,

45(6), 459–461. doi.org/10.1007/s00284-002-3860-0

Talvinen, K. A., & Nevalainen, T. J. (2002). Cloning of a novel phospholipase A2

from the cnidarian Adamsia carciniopados. Comparative Biochemistry and

Physiology Part B: Biochemistry and Molecular Biology, 132(3), 571–578.

doi.org/10.1016/S1096-4959(02)00073-8

Tamura, K., Makino, A., Hullin-Matsuda, F., Kobayashi, T., Furihata, M., Chung, S.,

… Nakagawa, H. (2009). Novel lipogenic enzyme ELOVL7 is involved in

prostate cancer growth through saturated long-chain fatty acid metabolism.

Cancer Research, 69(20), 8133–8140. doi.org/10.1158/0008-5472.CAN-09-

0775

Tardent, P. (1995). The cnidarian cnidocyte, a hightech cellular weaponry.

BioEssays, 17(4), 351–362. doi.org/10.1002/bies.950170411

Tarrant, A. M., Reitzel, A. M., Kwok, C. K., & Jenny, M. J. (2014). Activation of the

cnidarian oxidative stress response by ultraviolet radiation, polycyclic

aromatic hydrocarbons and crude oil. The Journal of Experimental Biology,

217(9), 1444–1453. doi.org/10.1242/jeb.093690

Tautz, D., & Domazet-Lošo, T. (2011). The evolutionary origin of orphan genes.

Nature Reviews Genetics, 12(10), 692–702. doi.org/10.1038/nrg3053

Technau, U., & Genikhovich, G. (2018). Evolution: directives from sea anemone

Hox genes. Current Biology, 28(22), R1303–R1305.

doi.org/10.1016/j.cub.2018.09.040

Technau, U., Rudd, S., Maxwell, P., Gordon, P. M. K., Saina, M., Grasso, L. C., …

Miller, D. J. (2005). Maintenance of ancestral complexity and non-metazoan

genes in two basal cnidarians. Trends in Genetics, 21(12), 633–639.

doi.org/10.1016/j.tig.2005.09.007

Technau, U., & Schwaiger, M. (2015). Recent advances in genomics and

transcriptomics of cnidarians. Marine Genomics, 24, Part 2, 131–138.

doi.org/10.1016/j.margen.2015.09.007

Terlau, H., & Olivera, B. M. (2004). Conus venoms: a rich source of novel ion

channel-targeted peptides. Physiological Reviews, 84(1), 41–68.

doi.org/10.1152/physrev.00020.2003

Toll-Riera, M., Bosch, N., Bellora, N., Castelo, R., Armengol, L., Estivill, X., &

Albà, M. M. (2009). Origin of primate orphan genes: a comparative genomics

approach. Molecular Biology and Evolution, 26(3), 603–612.

doi.org/10.1093/molbev/msn281

Tomalova, I., Iachia, C., Mulet, K., & Castagnone-Sereno, P. (2012). The map-1

gene family in root-knot nematodes, Meloidogyne spp.: a set of

taxonomically restricted genes specific to clonal species. PLOS ONE, 7(6),

e38656. doi.org/10.1371/journal.pone.0038656

Tran, P. N., Brown, S. H. J., Mitchell, T. W., Matuschewski, K., McMillan, P. J.,

Kirk, K., … Maier, A. G. (2014). A female gametocyte-specific ABC

transporter plays a role in lipid metabolism in the malaria parasite. Nature

Communications, 5, 4773. doi.org/10.1038/ncomms5773

Undheim, E. A. B., Hamilton, B. R., Kurniawan, N. D., Bowlay, G., Cribb, B. W.,

Merritt, D. J., … Venter, D. J. (2015). Production and packaging of a

biological arsenal: evolution of centipede venoms under morphological

constraint. Proceedings of the National Academy of Sciences, 112(13), 4026–

4031. doi.org/10.1073/pnas.1424068112

Undheim, E. A. B., Jones, A., Clauser, K. R., Holland, J. W., Pineda, S. S., King, G.

F., & Fry, B. G. (2014a). Clawing through evolution: toxin diversification

and convergence in the ancient lineage Chilopoda (Centipedes). Molecular

Biology and Evolution, 31(8), 2124–2148. doi.org/10.1093/molbev/msu162

Undheim, E. A. B., Sunagar, K., Hamilton, B. R., Jones, A., Venter, D. J., Fry, B. G.,

& King, G. F. (2014b). Multifunctional warheads: diversification of the toxin

arsenal of centipedes via novel multidomain transcripts. Journal of

Proteomics, 102, 1–10. doi.org/10.1016/j.jprot.2014.02.024

Untergasser, A., Cutcutache, I., Koressaar, T., Ye, J., Faircloth, B. C., Remm, M., &

Rozen, S. G. (2012). Primer3-new capabilities and interfaces. Nucleic Acids

Research, 40(15), e115. doi.org/10.1093/nar/gks596

van der Burg, C. A., Prentis, P. J., Surm, J. M., & Pavasovic, A. (2016). Insights into

the innate immunome of actiniarians using a comparative genomic approach.

BMC Genomics, 17, 850. doi.org/10.1186/s12864-016-3204-2

von Reumont, B. M., Undheim, E. A. B., Jauss, R.-T., & Jenner, R. A. (2017).

Venomics of remipede crustaceans reveals novel peptide diversity and

illuminates the venom’s biological role. Toxins, 9(8), 234.

doi.org/10.3390/toxins9080234

Walker, A. A., Mayhew, M. L., Jin, J., Herzig, V., Undheim, E. A. B., Sombke, A.,

… King, G. F. (2018). The assassin bug Pristhesancus plagipennis produces

two distinct venoms in separate gland lumens. Nature Communications, 9(1),

755. doi.org/10.1038/s41467-018-03091-5

Wang, W., Yu, H., & Long, M. (2004). Duplication-degeneration as a mechanism of

gene fission and the origin of new genes in Drosophila species. Nature

Genetics, 36(5), 523–527. doi.org/10.1038/ng1338

Wang, W., Zhang, J., Alvarez, C., Llopart, A., & Long, M. (2000). The origin of the

jingwei gene and the complex modular structure of its parental gene, yellow

emperor, in Drosophila melanogaster. Molecular Biology and Evolution,

17(9), 1294–1301. doi.org/10.1093/oxfordjournals.molbev.a026413

Wang, X., Liew, Y. J., Li, Y., Zoccola, D., Tambutte, S., & Aranda, M. (2017). Draft

genomes of the corallimorpharians Amplexidiscus fenestrafer and Discosoma

sp. Molecular Ecology Resources, 17(6), e187–e195. doi.org/10.1111/1755-0998.12680

Wang, Y., Coleman-Derr, D., Chen, G., & Gu, Y. Q. (2015). OrthoVenn: a web

server for genome wide comparison and annotation of orthologous clusters

across multiple species. Nucleic Acids Research, 43, W78–W84.

doi.org/10.1093/nar/gkv487

Wang, Y., Yap, L. L., Chua, K. L., & Khoo, H. E. (2008). A multigene family of

Heteractis magnificalysins (HMgs). Toxicon, 51(8), 1374–1382.

doi.org/10.1016/j.toxicon.2008.03.005

Waterhouse, R. M., Seppey, M., Simão, F. A., Manni, M., Ioannidis, P.,

Klioutchnikov, G., … Zdobnov, E. M. (2018). BUSCO applications from

quality assessments to gene prediction and phylogenomics. Molecular

Biology and Evolution, 35(3), 543–548. doi.org/10.1093/molbev/msx319

Watts, P. C., Allcock, A. L., Lynch, S. M., & Thorpe, J. P. (2000). An analysis of the

nematocysts of the beadlet anemone Actinia equina and the green sea

anemone Actinia prasina. Journal of the Marine Biological Association of the

UK, 80(4), 719–724. doi.org/10.1017/s002531540000254x

Wilding, C. S., & Weedall, G. D. (2019). Morphotypes of the common beadlet

anemone Actinia equina (L.) are genetically distinct. Journal of Experimental

Marine Biology and Ecology, 510, 81–85.

doi.org/10.1016/j.jembe.2018.10.001

Wissler, L., Gadau, J., Simola, D. F., Helmkampf, M., & Bornberg-Bauer, E. (2013).

Mechanisms and dynamics of orphan gene emergence in insect genomes.

Genome Biology and Evolution, 5(2), 439–455. doi.org/10.1093/gbe/evt009

Wooldridge, B. J., Pineda, G., Banuelas-Ornelas, J. J., Dagda, R. K., Gasanov, S. E.,

Rael, E. D., & Lieb, C. S. (2001). Mojave rattlesnakes (Crotalus scutulatus

scutulatus) lacking the acidic subunit DNA sequence lack Mojave toxin in

their venom. Comparative Biochemistry and Physiology Part B: Biochemistry

and Molecular Biology, 130(2), 169–179. doi.org/10.1016/S1096-4959(01)00422-5

Xu, Z., & Wang, H. (2007). LTR_FINDER: an efficient tool for the prediction of

full-length LTR retrotransposons. Nucleic Acids Research, 35, W265–W268.

doi.org/10.1093/nar/gkm286

Yang, J., Chen, X., Bai, J., Fang, D., Qiu, Y., Jiang, W., … Shi, Q. (2016). The

Sinocyclocheilus cavefish genome provides insights into cave adaptation.

BMC Biology, 14(1), 1. doi.org/10.1186/s12915-015-0223-4

Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood. Molecular

Biology and Evolution, 24(8), 1586–1591. doi.org/10.1093/molbev/msm088

Ye, C., Ji, G., & Liang, C. (2016). detectMITE: a novel approach to detect miniature

inverted repeat transposable elements in genomes. Scientific Reports, 6,

19688. doi.org/10.1038/srep19688

Young, M. D., Wakefield, M. J., Smyth, G. K., & Oshlack, A. (2010). Gene ontology

analysis for RNA-seq: accounting for selection bias. Genome Biology, 11(2),

R14. doi.org/10.1186/gb-2010-11-2-r14

Zhang, J., Rosenberg, H. F., & Nei, M. (1998). Positive Darwinian selection after

gene duplication in primate ribonuclease genes. Proceedings of the National

Academy of Sciences, 95(7), 3708–3713. doi.org/10.1073/pnas.95.7.3708

Zhang, Q. (2013). The role of mRNA-based duplication in the evolution of the

primate genome. FEBS Letters, 587(21), 3500–3507.

doi.org/10.1016/j.febslet.2013.08.042

Zhao, L., Saelao, P., Jones, C. D., & Begun, D. J. (2014). Origin and spread of de

novo genes in Drosophila melanogaster populations. Science, 343(6172),

769–772. doi.org/10.1126/science.1248286

Appendices

Appendix A: Supplementary Figures

Supplementary Figure 1. Maximum Likelihood tree of nucleotide sequences with midpoint root depicting relationships among A) Fad genes and B) Elovl genes. Branches are coloured and numbered according to the foreground branches used for testing for episodic diversifying selection.

Supplementary Figure 2. Principal component analysis (PCA) of the counts matrix for four sea anemone DGE experiments. PCA of counts, median centred and log2 transformed, for (A) morphological structure (acrorhagi, mesenteric filaments and tentacle) and (B) ontogeny (1, 3, 6, and 9 mm size classes) in Actinia tenebrosa, and for (C) morphological structure (nematosomes, mesenteric filaments and tentacles) and (D) development (gastrula, planula and adult) in Nematostella vectensis.

Supplementary Figure 3. Principal component analysis of the distribution and copy number of TTL genes by superfamily (A) and species (B), median centred and log2 transformed.

Supplementary Figure 4. Transcript expression profile across tissue types and ontogenetic stages in Actinia tenebrosa. A) Heat map of differentially expressed (DE) transcripts, median centred and log2 transformed FPKM values, for morphological structure: acrorhagi (a), mesenteric filaments (m), and tentacle (t). B) Heat map of differentially expressed (DE) transcripts, median centred and log2 transformed FPKM values, for ontogenetic stages: 1, 3, 6, and 9 mm.

Supplementary Figure 5. Transcript expression profile across tissue types and development in Nematostella vectensis. A) Heat map of differentially expressed (DE) transcripts, median centred and log2 transformed FPKM values, for morphological structure: nematosomes, mesenteric filaments, and tentacles (t). B) Heat map of DE transcripts, median centred and log2 transformed FPKM values, for development stages: gastrula, planula and adult.

Supplementary Figure 6. Transcript expression profile of toxins across tissue types and development in Nematostella vectensis. A) Heat map of differentially expressed (DE) TTL transcripts, z-scale transformed FPKM values, for morphological structure: nematosomes, mesenteric filaments, and tentacles (t). B) Heat map of DE TTL transcripts, z-scale transformed FPKM values, for development stages: gastrula, planula and adult.

Supplementary Figure 7. Gene order in Actinia tenebrosa mitochondrial DNA (20,691 bp). Figure produced in Geneious.

Supplementary Figure 8. Maximum Likelihood tree of sea anemone type 3 (BDS-like) potassium channel toxins (KTx) in actiniarians. ANEVI = Anemonia viridis, ANTMC = Antheopsis maculata, ACTTE = Actinia tenebrosa, BUNGR = Bunodosoma granuliferum, and ANTEL = Anthopleura elegantissima.

Appendix B: Supplementary Tables

Supplementary Table 1. Transcriptome assembly statistics. Assembly statistics from the Trinity de novo assembly of the A. tenebrosa ecotypes (red, brown, blue and green).

| Assembly metrics (A. tenebrosa ecotype) | Red | Brown | Blue | Green |
|---|---|---|---|---|
| Number of reads | 152,136,760 | 201,995,450 | 175,687,690 | 179,309,262 |
| Total assembled base pairs | 170,903,166 | 179,276,582 | 113,760,033 | 149,624,586 |
| Number of transcripts | 241,041 | 243,560 | 116,930 | 199,049 |
| N10 | 3,875 | 4,609 | 5,192 | 4,517 |
| N30 | 1,992 | 2,345 | 2,889 | 2,325 |
| N50 | 1,084 | 1,237 | 1,804 | 1,264 |
| Average contig length | 709.02 | 736.07 | 972.89 | 751.70 |
| CEGMA (Full length %) | 92.74 | 96.37 | 97.98 | 97.58 |
| BUSCO (% Complete BUSCOs) | 96.3 | 96.9 | 97.3 | 96.9 |

Supplementary Table 2. Transcriptome assembly statistics. Assembly statistics from the Trinity de novo assembly of A. buddemeieri, A. veratra, C. polypus, N. annamensis and Telmatactis sp.

| Assembly metrics (species) | A. buddemeieri | A. veratra | C. polypus | N. annamensis | Telmatactis sp. |
|---|---|---|---|---|---|
| Number of reads | 51,620,970 | 69,513,111 | 209,875,116 | 205,911,634 | 79,893,721 |
| Total assembled base pairs | 115,667,107 | 175,217,930 | 183,142,054 | 110,971,901 | 114,935,169 |
| Number of transcripts | 332,206 | 264,252 | 225,965 | 116,120 | 190,166 |
| N10 | 2,997 | 3,836 | 5,025 | 5,354 | 3,220 |
| N30 | 1,552 | 1,928 | 2,708 | 2,933 | 1,551 |
| N50 | 851 | 993 | 1,520 | 1,824 | 806 |
| Average contig length | 622.43 | 663.07 | 810.49 | 955.67 | 604.39 |
| CEGMA (Full length %) | 91.13 | 96.37 | 97.98 | 97.58 | 77.02 |
| BUSCO (% Complete BUSCOs) | 95 | 96.9 | 96.7 | 97.3 | 83.4 |
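The N10, N30 and N50 values reported in Supplementary Tables 1 and 2 are contig-length statistics. As a minimal sketch of how such an Nx value is commonly defined (this is not the Trinity script used to produce the tables), the function below sorts contigs from longest to shortest and returns the length of the contig at which the running total first reaches x % of all assembled bases. The function name `nx` and the toy contig lengths are hypothetical and only serve as an illustration.

```python
def nx(lengths, x):
    """Return the Nx statistic for a list of contig lengths.

    Nx is the length of the shortest contig in the smallest set of longest
    contigs that together hold at least x % of all assembled bases.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    if not 0 < x <= 100:
        raise ValueError("x must be in the interval (0, 100]")
    threshold = sum(lengths) * x / 100
    running = 0
    # Walk from the longest contig down until the threshold is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Hypothetical toy assembly (55 bases in total), not data from the thesis.
toy_lengths = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
n10, n30, n50 = (nx(toy_lengths, x) for x in (10, 30, 50))
print(n10, n30, n50)  # prints: 10 9 7
```

Because the threshold rises with x, N10 ≥ N30 ≥ N50 for any assembly, which is the pattern seen in every column of both tables.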

Supplementary Table 3. Detecting pervasive purifying and diversifying selection using FUBAR (Murrell et al., 2013) within the HyPhy package (Pond et al., 2005) at a posterior probability of ≥ 0.95.

| Gene family | Number of codons with dN/dS < 1 | Number of codons with dN/dS > 1 |
|---|---|---|
| Fad | 282 | 0 |
| Elovl | 197 | 0 |

Supplementary Table 4. Detecting codons under episodic diversifying selection with branch-site models implemented in CODEML for the Fad and Elovl gene families from actiniarian transcriptome assemblies. Significance at ≤ 0.05 and ≤ 0.01 following Bonferroni's correction is highlighted as * and **, respectively. The number of codons under episodic diversifying selection detected at a posterior probability ≥ 0.95 in the Bayes Empirical Bayes analysis is indicated, with the number detected at ≥ 0.99 in parentheses. NS refers to not significant.

| Gene family | Branch | H0 likelihood | H1 likelihood | P-value | Diversifying selected codons |
|---|---|---|---|---|---|
| Fad | 1 | -8082.83 | -8081.41 | 9.16e-02 (NS) | NS |
| Fad | 2 | -8082.83 | -8081.42 | 9.34e-02 (NS) | NS |
| Elovl | 1 | -9441.42 | -9435.93 | 9.22e-04** | 14 (5) |
| Elovl | 2 | -9442.85 | -9435.43 | 1.17e-04** | 11 (4) |
| Elovl | 3 | -9443.91 | -9438.58 | 1.10e-03** | 15 (6) |
| Elovl | 4 | -9441.46 | -9434.17 | 1.34e-04** | 11 (3) |
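The P-values in Supplementary Table 4 can be approximately recovered from the H0 and H1 log-likelihoods. The sketch below assumes a likelihood-ratio test in which twice the log-likelihood difference is compared against a chi-square distribution with one degree of freedom; the table does not state the degrees of freedom, so this is an assumption. The Bonferroni step multiplies each P-value by the number of tests, here taken to be the six branches listed in the table, which is also an assumption. The names `lrt_pvalue`, `bonferroni` and `BRANCH_TESTS` are hypothetical.

```python
import math

# (gene family, branch, lnL under H0, lnL under H1), from Supplementary Table 4.
BRANCH_TESTS = [
    ("Fad", 1, -8082.83, -8081.41),
    ("Fad", 2, -8082.83, -8081.42),
    ("Elovl", 1, -9441.42, -9435.93),
    ("Elovl", 2, -9442.85, -9435.43),
    ("Elovl", 3, -9443.91, -9438.58),
    ("Elovl", 4, -9441.46, -9434.17),
]


def lrt_pvalue(lnl_null, lnl_alt):
    """P-value of a likelihood-ratio test, assuming chi-square with 1 df."""
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # For one degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(statistic / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Assumed number of tests: one per branch in the table.
adjusted = {
    (family, branch): bonferroni(lrt_pvalue(h0, h1), len(BRANCH_TESTS))
    for family, branch, h0, h1 in BRANCH_TESTS
}

for family, branch, h0, h1 in BRANCH_TESTS:
    raw = lrt_pvalue(h0, h1)
    print(f"{family} branch {branch}: P = {raw:.2e}, "
          f"adjusted = {adjusted[(family, branch)]:.2e}")
```

Small differences from the tabulated P-values are expected because the log-likelihoods are rounded to two decimal places. Under these assumptions all four Elovl branches remain below 0.01 after correction and both Fad branches remain non-significant, in line with the flags in the table.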

Supplementary Table 5. Codons encoding amino acids under episodic diversifying selection, identified with the branch-site models implemented in CODEML for the Elovl gene family from actiniarian transcriptome assemblies at ≥ 0.95 using the Bayes Empirical Bayes analysis. The number refers to the consensus position and the letter to the amino acid in TR115686_c0_g1_i1_m.966674_Anthopleura_buddemeieri.

| Gene family | Branch | Codons under diversifying selection |
|---|---|---|
| Elovl | 1 | 19K, 51I, 68P, 77A, 78L, 100P, 101Q, 110P, 142T, 165L, 211A, 227Q, 229V, 258S |
| Elovl | 2 | 11A, 15M, 19K, 79T, 105D, 114Y, 160A, 167G, 211A, 218T, 227Q |
| Elovl | 3 | 22V, 44A, 66R, 67K, 74F, 91L, 117H, 120Y, 121V, 167G, 191Y, 230G, 242F, 245Y, 264A |
| Elovl | 4 | 13E, 28T, 56W, 63L, 69M, 80S, 81L, 89M, 157M, 183S, 263N |

Supplementary Table 6. Average whole-organism fatty acid profile (n=3) of anemone and prawn. FAME concentrations are given in mmol/kg and as % of total FAME, with the standard deviation (S. D.) in parentheses.

| Fatty acid | Anemone mmol/kg (S. D.) | Anemone % of FAME (S. D.) | Prawn mmol/kg (S. D.) | Prawn % of FAME (S. D.) |
|---|---|---|---|---|
| Tridecylic acid (C13:0) | 0 (0) | 0 (0) | 0.31 (0.16) | 2.07 (1.08) |
| Myristic acid (14:0) | 0.35 (0.02) | 2.1 (0.23) | 0.59 (0.16) | 3.96 (1.08) |
| Pentadecylic acid (C15:0) | 0 (0) | 0 (0) | 0.3 (0.04) | 1.99 (0.27) |
| Palmitic acid (16:0) | 3.45 (0.63) | 21.17 (3.64) | 4.47 (0.46) | 29.87 (3.05) |
| Margaric acid (C17:0) | 0 (0) | 0 (0) | 0.48 (0.02) | 3.21 (0.16) |
| Stearic acid (18:0) | 2.94 (0.44) | 18.34 (2.86) | 2.46 (0.34) | 16.42 (2.27) |
| Arachidic acid (20:0) | 1.24 (0.14) | 7.89 (0.99) | 0.21 (0.01) | 1.41 (0.05) |
| Behenic acid (22:0) | 1.33 (0.2) | 8.53 (1.36) | 0.16 (0.02) | 1.05 (0.16) |
| Lignoceric acid (24:0) | 1.19 (0.22) | 7.71 (1.28) | 0 (0) | 0 (0) |
| ∑SFA | 10.5 | 65.74 | 8.98 | 59.98 |
| Palmitoleic acid (16:1n-7) | 0.17 (0.08) | 1.15 (0.36) | 1.24 (0.1) | 8.3 (0.64) |
| Heptadecenoic acid (C17:1n-7) | 0 (0) | 0 (0) | 0.22 (0.06) | 1.44 (0.43) |
| Oleic acid (18:1n-9) | 0.7 (0.2) | 4.32 (0.96) | 1.54 (0.22) | 10.27 (1.49) |
| Vaccenic acid (18:1n-7) | 0.43 (0.07) | 2.46 (0.07) | 0.62 (0.14) | 4.12 (0.92) |
| Erucic acid (22:1n-9) | 0.36 (0.03) | 2.44 (0.32) | 0 (0) | 0 (0) |
| ∑MUFA | 1.66 | 10.37 | 3.62 | 24.13 |
| Linoleic acid (18:2n-6) | 0.58 (0.32) | 3.35 (0.95) | 0.17 (0.05) | 1.16 (0.32) |
| α-Linolenic acid (18:3n-3) | 0.71 (0.4) | 4.22 (1.34) | 0 (0) | 0 (0) |
| Eicosatrienoic acid (20:3n-3) | 0.4 (0.03) | 2.57 (0.33) | 0 (0) | 0 (0) |
| Arachidonic acid (20:4n-6) | 0.31 (0.01) | 2.06 (0.21) | 0.91 (0.2) | 6.09 (1.34) |
| Eicosapentaenoic acid (20:5n-3) | 0.47 (0.05) | 3.09 (0.5) | 0.78 (0.05) | 5.19 (0.31) |
| Docosahexaenoic acid (C22:6n-3) | 0 (0) | 0 (0) | 0.52 (0.23) | 3.45 (1.51) |
| ∑PUFA | 2.47 | 15.29 | 2.38 | 15.89 |
| ∑n-6 FA | 0.89 | 5.41 | 1.08 | 7.25 |
| ∑n-3 FA | 1.58 | 9.88 | 1.3 | 8.64 |
| n-6/n-3 | 0.56 | 0.55 | 0.83 | 0.84 |
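The summary rows of Supplementary Table 6 (∑n-6 FA, ∑n-3 FA and n-6/n-3) follow directly from the individual PUFA concentrations. The short sketch below recomputes them for the mmol/kg columns as a consistency check. The mean values are transcribed from the table, while the data structure and the function name `series_summary` are hypothetical.

```python
# (fatty acid, series, anemone mmol/kg, prawn mmol/kg) from Supplementary Table 6.
PUFA = [
    ("Linoleic acid (18:2n-6)", "n-6", 0.58, 0.17),
    ("alpha-Linolenic acid (18:3n-3)", "n-3", 0.71, 0.00),
    ("Eicosatrienoic acid (20:3n-3)", "n-3", 0.40, 0.00),
    ("Arachidonic acid (20:4n-6)", "n-6", 0.31, 0.91),
    ("Eicosapentaenoic acid (20:5n-3)", "n-3", 0.47, 0.78),
    ("Docosahexaenoic acid (C22:6n-3)", "n-3", 0.00, 0.52),
]


def series_summary(rows, column):
    """Return (sum n-6, sum n-3, n-6/n-3 ratio) for one organism.

    `column` is the index of the concentration in each row
    (2 = anemone, 3 = prawn); results are rounded to two decimals.
    """
    n6 = sum(row[column] for row in rows if row[1] == "n-6")
    n3 = sum(row[column] for row in rows if row[1] == "n-3")
    return round(n6, 2), round(n3, 2), round(n6 / n3, 2)


anemone = series_summary(PUFA, 2)
prawn = series_summary(PUFA, 3)
print("anemone:", anemone)  # prints: anemone: (0.89, 1.58, 0.56)
print("prawn:", prawn)      # prints: prawn: (1.08, 1.3, 0.83)
```

Both organisms reproduce the tabulated sums and ratios, so the summary rows are consistent with the individual fatty acid means in the mmol/kg columns.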

Supplementary Table 7. TTL gene family distribution across Metazoa. TTL genes were identified using BLAST analysis (e-value < 1e-05) against the Swiss-Prot database.

| Gene family | Pleurobrachia bachei | Mnemiopsis leidyi | Trichoplax adhaerens | Amphimedon queenslandica |
|---|---|---|---|---|
| peptidase M12A family | 0 | 0 | 0 | 1 |
| phospholipase A2 family | 0 | 0 | 1 | 0 |
| actinoporin family | 0 | 0 | 0 | 0 |
| venom complement C3 homolog family | 0 | 0 | 0 | 0 |
| multicopper oxidase family | 0 | 0 | 1 | 0 |
| arthropod phospholipase D family | 1 | 0 | 0 | 0 |
| venom Kunitz-type family | 0 | 0 | 0 | 2 |
| DNase II family | 1 | 1 | 0 | 0 |
| cystatin family | 1 | 1 | 0 | 0 |
| SNTX/VTX toxin family | 1 | 4 | 0 | 0 |
| true venom lectin family | 1 | 0 | 1 | 0 |
| C-terminal natriuretic peptide family | 0 | 1 | 0 | 0 |
| natterin family | 1 | 0 | 0 | 0 |
| 5'-nucleotidase family | 1 | 0 | 0 | 0 |
| ficolin lectin family | 0 | 0 | 0 | 3 |
| long chain scorpion toxin family | 0 | 0 | 0 | 0 |
| insulin family | 0 | 0 | 0 | 0 |
| type-B carboxylesterase/lipase family | 0 | 0 | 3 | 0 |
| snaclec family | 0 | 0 | 1 | 0 |
| CRISP family | 0 | 0 | 0 | 0 |
| venom metalloproteinase (M12B) family | 0 | 0 | 0 | 0 |
| peptidase S1 family | 0 | 0 | 0 | 0 |
| Cnidaria small cysteine-rich protein (SCRiP) family | 0 | 0 | 0 | 0 |
| histidine acid phosphatase family | 0 | 0 | 0 | 0 |
| long (4 C-C) scorpion toxin superfamily | 0 | 0 | 0 | 0 |
| PDGF/VEGF growth factor family | 0 | 0 | 0 | 0 |
| glycoprotein hormones subunit alpha family | 0 | 0 | 0 | 0 |
| spider wap-2 family | 0 | 0 | 0 | 0 |
| glycosyl hydrolase 56 family | 0 | 0 | 0 | 0 |
| snake waprin family | 0 | 0 | 0 | 0 |
| AVIT (prokineticin) family | 0 | 0 | 0 | 0 |
| peptidase M13 family | 0 | 0 | 0 | 0 |
| N-terminal section bradykinin-potentiating peptide family | 0 | 0 | 0 | 0 |
| C-terminal venom metalloproteinase (M12B) family | 0 | 0 | 0 | 0 |
| CREC family | 0 | 0 | 0 | 0 |
| AB hydrolase superfamily | 0 | 0 | 0 | 0 |
| long (3 C-C) scorpion toxin superfamily | 0 | 0 | 0 | 0 |
| ant venom allergen 2/4 family | 0 | 0 | 0 | 0 |
| scorpion La1-like peptide family | 0 | 0 | 0 | 0 |
| snake three-finger toxin family | 0 | 0 | 0 | 0 |
| peptidase S10 family | 0 | 0 | 0 | 0 |
| conopeptide P-like superfamily | 0 | 0 | 0 | 0 |
| glycoprotein hormones subunit beta family | 0 | 0 | 0 | 0 |
| cathelicidin family | 0 | 0 | 0 | 0 |
| short scorpion toxin superfamily | 0 | 0 | 0 | 0 |
| peptidase S9B family | 0 | 0 | 0 | 0 |
| protease inhibitor I19 family | 0 | 0 | 0 | 0 |
| jellyfish toxin family | 0 | 0 | 0 | 0 |
| phospholipase B-like family | 0 | 0 | 0 | 0 |
| teretoxin N (TN) superfamily | 0 | 0 | 0 | 0 |
| flavin monoamine oxidase family | 0 | 0 | 0 | 0 |
| sea anemone 8 toxin family | 0 | 0 | 0 | 0 |
| nucleotide pyrophosphatase/phosphodiesterase family | 0 | 0 | 0 | 0 |
| non-disulfide-bridged peptide (NDBP) superfamily | 0 | 0 | 0 | 0 |
| latrotoxin superfamily | 0 | 0 | 0 | 0 |
| glycosyl hydrolase 37 family | 0 | 0 | 0 | 1 |
| melittin family | 0 | 0 | 0 | 0 |
| U12-lycotoxin family | 0 | 0 | 0 | 0 |
| NPY family | 0 | 0 | 0 | 0 |
| NGF-beta family | 0 | 0 | 0 | 0 |
| secapin family | 0 | 0 | 0 | 0 |
| MIT-like AcTx family | 0 | 0 | 0 | 0 |
| glutaminyl-peptide cyclotransferase family | 0 | 0 | 0 | 0 |
| ohanin/vespryn family | 0 | 0 | 0 | 0 |
| conotoxin L superfamily | 0 | 0 | 0 | 0 |
| sea anemone structural class 9a family | 0 | 0 | 0 | 0 |
| spider wap-1 family | 0 | 0 | 0 | 0 |
| ARMT1 family | 0 | 0 | 0 | 0 |
| Unclassed | 0 | 1 | 0 | 0 |
| Total | 7 | 8 | 7 | 7 |

| Gene family | Exaiptasia pallida | Orbicella faveolata | Acropora digitifera | Stylophora pistillata | Hydra vulgaris |
|---|---|---|---|---|---|
| peptidase M12A family | 10 | 3 | 1 | 5 | 3 |
| phospholipase A2 family | 20 | 13 | 8 | 13 | 9 |
| actinoporin family | 3 | 1 | 2 | 1 | 3 |
| venom complement C3 homolog family | 0 | 0 | 0 | 0 | 0 |
| multicopper oxidase family | 10 | 4 | 3 | 6 | 2 |
| arthropod phospholipase D family | 0 | 0 | 0 | 0 | 0 |
| venom Kunitz-type family | 4 | 3 | 3 | 1 | 0 |
| DNase II family | 0 | 0 | 0 | 0 | 1 |
| cystatin family | 0 | 2 | 0 | 1 | 2 |
| SNTX/VTX toxin family | 0 | 0 | 0 | 0 | 0 |
| true venom lectin family | 9 | 2 | 0 | 0 | 0 |
| C-terminal natriuretic peptide family | 0 | 0 | 0 | 0 | 0 |
| natterin family | 0 | 0 | 0 | 0 | 0 |
| 5'-nucleotidase family | 0 | 0 | 0 | 0 | 0 |
| ficolin lectin family | 2 | 5 | 2 | 5 | 0 |
| long chain scorpion toxin family | 0 | 0 | 0 | 0 | 0 |
| insulin family | 0 | 0 | 0 | 1 | 0 |
| type-B carboxylesterase/lipase family | 5 | 1 | 0 | 1 | 0 |
| snaclec family | 1 | 1 | 1 | 2 | 2 |
| CRISP family | 0 | 0 | 0 | 0 | 2 |
| venom metalloproteinase (M12B) family | 0 | 3 | 0 | 0 | 0 |
| peptidase S1 family | 0 | 0 | 0 | 0 | 0 |
| Cnidaria small cysteine-rich protein (SCRiP) family | 0 | 8 | 12 | 1 | 0 |
| histidine acid phosphatase family | 0 | 0 | 0 | 0 | 0 |
| long (4 C-C) scorpion toxin superfamily | 0 | 0 | 0 | 0 | 0 |
| PDGF/VEGF growth factor family | 0 | 0 | 0 | 0 | 0 |
| glycoprotein hormones subunit alpha family | 0 | 0 | 0 | 0 | 0 |
| spider wap-2 family | 0 | 0 | 0 | 0 | 0 |
| glycosyl hydrolase 56 family | 0 | 0 | 0 | 0 | 0 |
| snake waprin family | 0 | 0 | 0 | 0 | 0 |
| AVIT (prokineticin) family | 0 | 0 | 0 | 0 | 0 |
| peptidase M13 family | 0 | 0 | 0 | 0 | 0 |
| N-terminal section bradykinin-potentiating peptide family | 0 | 0 | 0 | 0 | 0 |
| C-terminal venom metalloproteinase (M12B) family | 0 | 0 | 0 | 0 | 0 |
| CREC family | 0 | 0 | 1 | 1 | 0 |
| AB hydrolase superfamily | 1 | 0 | 0 | 0 | 0 |
| long (3 C-C) scorpion toxin superfamily | 0 | 0 | 0 | 0 | 0 |
| ant venom allergen 2/4 family | 0 | 0 | 0 | 0 | 0 |
| scorpion La1-like peptide family | 0 | 0 | 0 | 0 | 0 |
| snake three-finger toxin family | 0 | 0 | 0 | 0 | 0 |
| peptidase S10 family | 0 | 0 | 0 | 0 | 0 |
| conopeptide P-like superfamily | 0 | 0 | 0 | 3 | 0 |
| glycoprotein hormones subunit beta family | 0 | 0 | 0 | 0 | 0 |
| cathelicidin family | 0 | 0 | 0 | 0 | 0 |
| short scorpion toxin superfamily | 0 | 0 | 0 | 0 | 0 |
| peptidase S9B family | 0 | 0 | 0 | 0 | 0 |
| protease inhibitor I19 family | 0 | 0 | 0 | 0 | 0 |
| jellyfish toxin family | 2 | 0 | 1 | 0 | 4 |
| phospholipase B-like family | 0 | 0 | 0 | 1 | 0 |
| teretoxin N (TN) superfamily | 0 | 0 | 0 | 0 | 0 |
| flavin monoamine oxidase family | 0 | 0 | 0 | 0 | 0 |
| sea anemone 8 toxin family | 4 | 0 | 0 | 0 | 0 |
| nucleotide pyrophosphatase/phosphodiesterase family | 0 | 0 | 0 | 0 | 0 |
| non-disulfide-bridged peptide (NDBP) superfamily | 0 | 0 | 0 | 0 | 0 |
| latrotoxin superfamily | 0 | 1 | 0 | 0 | 0 |
| glycosyl hydrolase 37 family | 0 | 0 | 0 | 0 | 0 |
| melittin family | 0 | 0 | 0 | 0 | 0 |
| U12-lycotoxin family | 0 | 0 | 0 | 0 | 0 |
| NPY family | 0 | 0 | 0 | 0 | 0 |
| NGF-beta family | 0 | 0 | 0 | 0 | 0 |
| secapin family | 0 | 0 | 0 | 0 | 0 |
| MIT-like AcTx family | 0 | 0 | 0 | 0 | 0 |
| glutaminyl-peptide cyclotransferase family | 0 | 0 | 0 | 0 | 0 |
| ohanin/vespryn family | 0 | 0 | 0 | 0 | 0 |
| conotoxin L superfamily | 0 | 0 | 0 | 0 | 0 |
| sea anemone structural class 9a family | 1 | 0 | 0 | 0 | 0 |
| spider wap-1 family | 0 | 0 | 0 | 0 | 0 |
| ARMT1 family | 0 | 0 | 0 | 0 | 0 |
| Unclassed | 14 | 7 | 8 | 5 | 3 |
| Total | 86 | 54 | 42 | 47 | 31 |

| Gene family | Danaus plexippus plexippus | Centruroides sculpturatus | Solenopsis invicta | Stegodyphus mimosarum | Nasonia vitripennis |
|---|---|---|---|---|---|
| peptidase M12A family | 20 | 0 | 26 | 0 | 0 |
| phospholipase A2 family | 6 | 7 | 8 | 2 | 8 |
| actinoporin family | 0 | 0 | 0 | 0 | 0 |
| venom complement C3 homolog family | 0 | 0 | 0 | 0 | 0 |
| multicopper oxidase family | 0 | 0 | 0 | 0 | 0 |
| arthropod phospholipase D family | 3 | 0 | 3 | 0 | 0 |
| venom Kunitz-type family | 11 | 1 | 0 | 1 | 0 |
| DNase II family | 0 | 0 | 0 | 0 | 1 |
| cystatin family | 0 | 0 | 2 | 1 | 0 |
| SNTX/VTX toxin family | 0 | 0 | 0 | 0 | 0 |
| true venom lectin family | 2 | 1 | 0 | 0 | 3 |
| C-terminal natriuretic peptide family | 0 | 0 | 0 | 0 | 0 |
| natterin family | 0 | 0 | 0 | 1 | 0 |
| 5'-nucleotidase family | 0 | 0 | 0 | 0 | 0 |
| ficolin lectin family | 0 | 0 | 0 | 0 | 0 |
| long chain scorpion toxin family | 2 | 0 | 0 | 0 | 0 |
| insulin family | 0 | 0 | 0 | 0 | 0 |
| type-B carboxylesterase/lipase family | 8 | 3 | 4 | 5 | 17 |
| snaclec family | 1 | 0 | 0 | 0 | 0 |
| CRISP family | 7 | 2 | 5 | 3 | 9 |
| venom metalloproteinase (M12B) family | 29 | 0 | 1 | 0 | 0 |
| peptidase S1 family | 1 | 7 | 1 | 4 | 9 |
| Cnidaria small cysteine-rich protein (SCRiP) family | 0 | 0 | 0 | 0 | 0 |
| histidine acid phosphatase family | 0 | 4 | 0 | 1 | 29 |
| long (4 C-C) scorpion toxin superfamily | 59 | 0 | 0 | 0 | 0 |
| PDGF/VEGF growth factor family | 2 | 0 | 0 | 0 | 0 |
| glycoprotein hormones subunit alpha family | 2 | 0 | 1 | 1 | 0 |
| spider wap-2 family | 0 | 0 | 2 | 0 | 0 |
| glycosyl hydrolase 56 family | 2 | 1 | 0 | 2 | 1 |
| snake waprin family | 0 | 1 | 0 | 0 | 0 |
| AVIT (prokineticin) family | 4 | 0 | 4 | 0 | 0 |
| peptidase M13 family | 3 | 2 | 0 | 0 | 0 |
| N-terminal section bradykinin-potentiating peptide family | 0 | 0 | 0 | 0 | 0 |
| C-terminal venom metalloproteinase (M12B) family | 0 | 10 | 0 | 0 | 18 |
| CREC family | 0 | 0 | 1 | 1 | 1 |
| AB hydrolase superfamily | 0 | 4 | 0 | 1 | 0 |
| long (3 C-C) scorpion toxin superfamily | 6 | 0 | 0 | 0 | 0 |
| ant venom allergen 2/4 family | 0 | 6 | 0 | 0 | 0 |
| scorpion La1-like peptide family | 4 | 0 | 0 | 0 | 0 |
| snake three-finger toxin family | 0 | 0 | 0 | 0 | 0 |
| peptidase S10 family | 0 | 1 | 0 | 1 | 4 |
| conopeptide P-like superfamily | 0 | 0 | 0 | 0 | 1 |
| glycoprotein hormones subunit beta family | 2 | 0 | 0 | 0 | 0 |
| cathelicidin family | 0 | 0 | 0 | 0 | 0 |
| short scorpion toxin superfamily | 2 | 0 | 0 | 0 | 0 |
| peptidase S9B family | 0 | 1 | 0 | 2 | 6 |
| protease inhibitor I19 family | 0 | 0 | 0 | 0 | 4 |
| jellyfish toxin family | 0 | 0 | 0 | 0 | 0 |
| phospholipase B-like family | 0 | 0 | 0 | 0 | 0 |
| teretoxin N (TN) superfamily | 0 | 0 | 0 | 0 | 0 |
| flavin monoamine oxidase family | 0 | 0 | 0 | 0 | 0 |
| sea anemone 8 toxin family | 0 | 0 | 0 | 0 | 0 |
| nucleotide pyrophosphatase/phosphodiesterase family | 0 | 0 | 0 | 0 | 0 |
| non-disulfide-bridged peptide (NDBP) superfamily | 13 | 0 | 0 | 0 | 0 |
| latrotoxin superfamily | 0 | 0 | 0 | 0 | 0 |
| glycosyl hydrolase 37 family | 0 | 1 | 0 | 0 | 1 |
| melittin family | 0 | 0 | 0 | 0 | 0 |
| U12-lycotoxin family | 0 | 0 | 2 | 0 | 0 |
| NPY family | 0 | 0 | 1 | 0 | 0 |
| NGF-beta family | 0 | 0 | 0 | 0 | 0 |
| secapin family | 0 | 0 | 0 | 0 | 0 |
| MIT-like AcTx family | 1 | 0 | 0 | 0 | 0 |
| glutaminyl-peptide cyclotransferase family | 0 | 0 | 0 | 0 | 0 |
| ohanin/vespryn family | 0 | 0 | 0 | 0 | 0 |
| conotoxin L superfamily | 0 | 0 | 0 | 0 | 0 |
| sea anemone structural class 9a family | 0 | 0 | 0 | 0 | 0 |
| spider wap-1 family | 0 | 0 | 0 | 0 | 0 |
| ARMT1 family | 0 | 0 | 0 | 0 | 0 |
| Unclassed | 68 | 4 | 21 | 2 | 8 |
| Total | 258 | 56 | 82 | 28 | 120 |

| Gene family | Apis mellifera | Caenorhabditis elegans | Hypsibius dujardini | Daphnia pulex | Oryctes borbonicus |
|---|---|---|---|---|---|
| peptidase M12A family | 0 | 0 | 2 | 0 | 0 |
| phospholipase A2 family | 9 | 2 | 1 | 5 | 2 |
| actinoporin family | 0 | 0 | 0 | 0 | 0 |
| venom complement C3 homolog family | 0 | 0 | 0 | 0 | 0 |
| multicopper oxidase family | 0 | 0 | 0 | 0 | 0 |
| arthropod phospholipase D family | 0 | 0 | 0 | 3 | 0 |
| venom Kunitz-type family | 0 | 3 | 0 | 2 | 0 |
| DNase II family | 0 | 0 | 0 | 0 | 0 |
| cystatin family | 0 | 0 | 0 | 0 | 0 |
| SNTX/VTX toxin family | 0 | 0 | 0 | 0 | 0 |
| true venom lectin family | 0 | 18 | 1 | 1 | 0 |
| C-terminal natriuretic peptide family | 0 | 0 | 0 | 0 | 0 |
| natterin family | 0 | 0 | 0 | 0 | 0 |
| 5'-nucleotidase family | 0 | 0 | 0 | 0 | 0 |
| ficolin lectin family | 0 | 0 | 0 | 0 | 0 |
| long chain scorpion toxin family | 0 | 0 | 0 | 0 | 0 |
| insulin family | 0 | 0 | 0 | 0 | 0 |
| type-B carboxylesterase/lipase family | 4 | 0 | 0 | 5 | 2 |
| snaclec family | 0 | 8 | 0 | 0 | 0 |
| CRISP family | 2 | 4 | 9 | 8 | 2 |
| venom metalloproteinase (M12B) family | 0 | 2 | 0 | 0 | 0 |
| peptidase S1 family | 1 | 2 | 0 | 3 | 0 |
| Cnidaria small cysteine-rich protein (SCRiP) family | 0 | 0 | 0 | 0 | 0 |
| histidine acid phosphatase family | 9 | 0 | 0 | 0 | 4 |
| long (4 C-C) scorpion toxin superfamily | 0 | 0 | 0 | 0 | 0 |
| PDGF/VEGF growth factor family | 0 | 0 | 0 | 0 | 0 |
| glycoprotein hormones subunit alpha family | 0 | 0 | 0 | 1 | 0 |
| spider wap-2 family | 0 | 0 | 0 | 0 | 0 |
| glycosyl hydrolase 56 family | 0 | 1 | 0 | 0 | 1 |
| snake waprin family | 2 | 0 | 2 | 0 | 1 |
| AVIT (prokineticin) family | 0 | 0 | 0 | 0 | 0 |
| peptidase M13 family | 0 | 0 | 0 | 2 | 0 |
| N-terminal section bradykinin-potentiating peptide family | 0 | 0 | 0 | 0 | 0 |
| C-terminal venom metalloproteinase (M12B) family | 11 | 0 | 0 | 0 | 0 |
| CREC family | 0 | 0 | 0 | 1 | 1 |
| AB hydrolase superfamily | 2 | 0 | 0 | 0 | 0 |
| long (3 C-C) scorpion toxin superfamily | 0 | 0 | 0 | 0 | 0 |
| ant venom allergen 2/4 family | 0 | 0 | 0 | 0 | 0 |
| scorpion La1-like peptide family | 0 | 0 | 0 | 0 | 0 |
| snake three-finger toxin family | 0 | 0 | 0 | 0 | 0 |
| peptidase S10 family | 2 | 0 | 0 | 0 | 0 |
| conopeptide P-like superfamily | 0 | 0 | 0 | 0 | 0 |
| glycoprotein hormones subunit beta family | 0 | 0 | 0 | 0 | 1 |
| cathelicidin family | 0 | 0 | 0 | 0 | 0 |
| short scorpion toxin superfamily | 0 | 0 | 0 | 0 | 0 |
| peptidase S9B family | 1 | 0 | 0 | 0 | 0 |
| protease inhibitor I19 family | 0 | 0 | 0 | 0 | 0 |
| jellyfish toxin family | 0 | 0 | 0 | 0 | 0 |
| phospholipase B-like family | 0 | 0 | 0 | 0 | 0 |
| teretoxin N (TN) superfamily | 0 | 0 | 0 | 0 | 0 |
| flavin monoamine oxidase family | 0 | 0 | 0 | 0 | 0 |
| sea anemone 8 toxin family | 0 | 0 | 0 | 0 | 0 |
| nucleotide pyrophosphatase/phosphodiesterase family | 0 | 0 | 0 | 0 | 0 |
| non-disulfide-bridged peptide (NDBP) superfamily | 0 | 0 | 0 | 0 | 0 |
| latrotoxin superfamily | 0 | 0 | 0 | 0 | 0 |
| glycosyl hydrolase 37 family | 1 | 0 | 0 | 0 | 0 |
| melittin family | 1 | 0 | 0 | 0 | 0 |
| U12-lycotoxin family | 0 | 0 | 0 | 0 | 0 |
| NPY family | 0 | 0 | 0 | 0 | 0 |
| NGF-beta family | 0 | 0 | 0 | 0 | 0 |
| secapin family | 1 | 0 | 0 | 0 | 0 |
| MIT-like AcTx family | 0 | 0 | 0 | 0 | 0 |
| glutaminyl-peptide cyclotransferase family | 0 | 0 | 1 | 0 | 0 |
| ohanin/vespryn family | 0 | 0 | 0 | 0 | 0 |
| conotoxin L superfamily | 0 | 0 | 0 | 0 | 0 |
| sea anemone structural class 9a family | 0 | 0 | 0 | 0 | 0 |
| spider wap-1 family | 0 | 0 | 0 | 0 | 0 |
| ARMT1 family | 0 | 0 | 0 | 0 | 0 |
| Unclassed | 8 | 0 | 2 | 0 | 0 |
| Total | 54 | 40 | 18 | 31 | 14 |

| Gene family | Atta cephalotes | Ramazzottius varieornatus | Melipona quadrifasciata | Trichuris trichiura | Anopheles darlingi |
|---|---|---|---|---|---|
| peptidase M12A family | 0 | 0 | 0 | 0 | 0 |
| phospholipase A2 family | 3 | 0 | 2 | 1 | 2 |
| actinoporin family | 0 | 0 | 0 | 0 | 0 |
| venom complement C3 homolog family | 0 | 0 | 0 | 0 | 0 |
| multicopper oxidase family | 0 | 0 | 0 | 0 | 0 |
| arthropod phospholipase D family | 0 | 0 | 0 | 0 | 0 |
| venom Kunitz-type family | 0 | 1 | 1 | 0 | 1 |
| DNase II family | 0 | 0 | 0 | 2 | 0 |
| cystatin family | 0 | 0 | 0 | 0 | 0 |
| SNTX/VTX toxin family | 0 | 0 | 0 | 0 | 0 |
| true venom lectin family | 0 | 0 | 0 | 0 | 0 |
| C-terminal natriuretic peptide family | 0 | 0 | 0 | 0 | 0 |
| natterin family | 0 | 0 | 0 | 0 | 0 |
| 5'-nucleotidase family | 0 | 0 | 0 | 0 | 0 |
| ficolin lectin family | 0 | 0 | 0 | 0 | 0 |
| long chain scorpion toxin family | 0 | 0 | 0 | 0 | 0 |
| insulin family | 0 | 0 | 0 | 0 | 0 |
| type-B carboxylesterase/lipase family | 1 | 0 | 2 | 0 | 5 |
| snaclec family | 0 | 0 | 0 | 0 | 1 |
| CRISP family | 1 | 4 | 0 | 1 | 13 |
| venom metalloproteinase (M12B) family | 0 | 0 | 0 | 0 | 0 |
| peptidase S1 family | 3 | 2 | 1 | 0 | 7 |
| Cnidaria small cysteine-rich protein (SCRiP) family | 0 | 0 | 0 | 0 | 0 |
| histidine acid phosphatase family | 2 | 0 | 1 | 0 | 0 |
| long (4 C-C) scorpion toxin superfamily | 0 | 0 | 0 | 0 | 0 |
| PDGF/VEGF growth factor family | 0 | 1 | 1 | 0 | 0 |
| glycoprotein hormones subunit alpha family | 0 | 0 | 0 | 0 | 1 |
| spider wap-2 family | 0 | 0 | 0 | 0 | 0 |
| glycosyl hydrolase 56 family | 1 | 0 | 0 | 0 | 0 |
| snake waprin family | 1 | 0 | 0 | 0 | 0 |
| AVIT (prokineticin) family | 0 | 0 | 0 | 0 | 0 |
| peptidase M13 family | 0 | 0 | 0 | 0 | 0 |
| N-terminal section bradykinin-potentiating peptide family | 0 | 0 | 0 | 0 | 0 |
| C-terminal venom metalloproteinase (M12B) family | 2 | 0 | 0 | 0 | 0 |
| CREC family | 0 | 0 | 0 | 0 | 0 |
| AB hydrolase superfamily | 3 | 0 | 0 | 0 | 0 |
| long (3 C-C) scorpion toxin superfamily | 0 | 0 | 0 | 0 | 0 |
| ant venom allergen 2/4 family | 0 | 0 | 0 | 0 | 0 |
| scorpion La1-like peptide family | 0 | 0 | 0 | 0 | 0 |
| snake three-finger toxin family | 0 | 0 | 0 | 0 | 0 |
| peptidase S10 family | 1 | 0 | 1 | 0 | 1 |
| conopeptide P-like superfamily | 0 | 0 | 0 | 0 | 0 |
| glycoprotein hormones subunit beta family | 0 | 0 | 0 | 0 | 1 |
| cathelicidin family | 0 | 0 | 0 | 0 | 0 |
| short scorpion toxin superfamily | 0 | 0 | 0 | 0 | 0 |
| peptidase S9B family | 1 | 0 | 0 | 0 | 1 |
| protease inhibitor I19 family | 0 | 0 | 0 | 0 | 0 |
| jellyfish toxin family | 0 | 0 | 0 | 0 | 0 |
| phospholipase B-like family | 0 | 0 | 0 | 0 | 0 |
| teretoxin N (TN) superfamily | 0 | 0 | 0 | 0 | 0 |
| flavin monoamine oxidase family | 0 | 0 | 0 | 0 | 0 |
| sea anemone 8 toxin family | 0 | 0 | 0 | 0 | 0 |
| nucleotide pyrophosphatase/phosphodiesterase family | 0 | 0 | 0 | 0 | 0 |
| non-disulfide-bridged peptide (NDBP) superfamily | 0 | 0 | 0 | 0 | 0 |
| latrotoxin superfamily | 0 | 0 | 0 | 0 | 0 |
| glycosyl hydrolase 37 family | 1 | 0 | 1 | 0 | 0 |
| melittin family | 0 | 0 | 0 | 0 | 0 |
| U12-lycotoxin family | 0 | 0 | 0 | 0 | 0 |
| NPY family | 0 | 0 | 0 | 0 | 0 |
| NGF-beta family | 0 | 1 | 0 | 0 | 0 |
| secapin family | 0 | 0 | 0 | 0 | 0 |
| MIT-like AcTx family | 0 | 0 | 0 | 0 | 0 |
| glutaminyl-peptide cyclotransferase family | 0 | 0 | 0 | 0 | 0 |
| ohanin/vespryn family | 0 | 0 | 0 | 0 | 0 |
| conotoxin L superfamily | 0 | 0 | 0 | 0 | 0 |
| sea anemone structural class 9a family | 0 | 0 | 0 | 0 | 0 |
| spider wap-1 family | 0 | 0 | 0 | 0 | 0 |
| ARMT1 family | 0 | 0 | 0 | 0 | 0 |
| Unclassed | 4 | 1 | 1 | 0 | 1 |
| Total | 24 | 10 | 11 | 4 | 34 |

| Gene family | Tribolium castaneum | Anopheles gambiae str | Athalia rosae | Aedes aegypti | Tetranychus urticae |
|---|---|---|---|---|---|
| peptidase M12A family | 0 | 0 | 0 | 2 | 4 |
| phospholipase A2 family | 10 | 2 | 5 | 8 | 4 |
| actinoporin family | 0 | 0 | 0 | 0 | 0 |
| venom complement C3 homolog family | 0 | 0 | 0 | 0 | 0 |
| multicopper oxidase family | 0 | 0 | 0 | 0 | 0 |
| arthropod phospholipase D family | 0 | 0 | 0 | 0 | 4 |
| venom Kunitz-type family | 2 | 1 | 1 | 1 | 5 |
| DNase II family | 1 | 0 | 0 | 1 | 1 |
| cystatin family | 0 | 0 | 0 | 0 | 14 |
| SNTX/VTX toxin family | 0 | 0 | 0 | 0 | 0 |
| true venom lectin family | 2 | 2 | 0 | 0 | 1 |
| C-terminal natriuretic peptide family | 0 | 0 | 0 | 0 | 0 |
| natterin family | 0 | 0 | 0 | 0 | 0 |
| 5'-nucleotidase family | 0 | 0 | 0 | 0 | 0 |
| ficolin lectin family | 0 | 0 | 0 | 0 | 0 |
| long chain scorpion toxin family | 0 | 0 | 0 | 0 | 0 |
| insulin family | 0 | 0 | 0 | 0 | 0 |
| type-B carboxylesterase/lipase family | 21 | 8 | 7 | 11 | 23 |
| snaclec family | 0 | 0 | 1 | 0 | 0 |
| CRISP family | 30 | 12 | 3 | 21 | 0 |
| venom metalloproteinase (M12B) family | 0 | 0 | 0 | 0 | 0 |
| peptidase S1 family | 5 | 8 | 5 | 9 | 1 |
| Cnidaria small cysteine-rich protein (SCRiP) family | 0 | 0 | 0 | 0 | 0 |
| histidine acid phosphatase family | 8 | 1 | 10 | 0 | 0 |
| long (4 C-C) scorpion toxin superfamily | 0 | 0 | 0 | 0 | 0 |
| PDGF/VEGF growth factor family | 0 | 0 | 0 | 0 | 6 |
| glycoprotein hormones subunit alpha family | 1 | 1 | 1 | 1 | 1 |
| spider wap-2 family | 0 | 0 | 0 | 0 | 0 |
| glycosyl hydrolase 56 family | 1 | 0 | 0 | 0 | 0 |
| snake waprin family | 0 | 0 | 2 | 0 | 0 |
| AVIT (prokineticin) family | 0 | 0 | 0 | 0 | 0 |
| peptidase M13 family | 0 | 0 | 0 | 0 | 4 |
| N-terminal section bradykinin-potentiating peptide family | 0 | 0 | 0 | 0 | 0 |
| C-terminal venom metalloproteinase (M12B) family | 3 | 0 | 11 | 6 | 1 |
| CREC family | 1 | 0 | 1 | 2 | 0 |
| AB hydrolase superfamily | 2 | 1 | 1 | 6 | 2 |
| long (3 C-C) scorpion toxin superfamily | 0 | 0 | 0 | 0 | 0 |
| ant venom allergen 2/4 family | 0 | 0 | 0 | 0 | 0 |
| scorpion La1-like peptide family | 0 | 0 | 0 | 0 | 0 |
| snake three-finger toxin family | 0 | 0 | 0 | 0 | 0 |
| peptidase S10 family | 1 | 1 | 1 | 1 | 0 |
| conopeptide P-like superfamily | 0 | 0 | 0 | 0 | 0 |
| glycoprotein hormones subunit beta family | 2 | 1 | 0 | 1 | 1 |
| cathelicidin family | 0 | 0 | 0 | 0 | 0 |
| short scorpion toxin superfamily | 0 | 0 | 0 | 0 | 0 |
| peptidase S9B family | 1 | 1 | 1 | 2 | 0 |
| protease inhibitor I19 family | 0 | 0 | 0 | 0 | 0 |
| jellyfish toxin family | 0 | 0 | 0 | 0 | 0 |
| phospholipase B-like family | 0 | 0 | 0 | 0 | 0 |
| teretoxin N (TN) superfamily | 0 | 0 | 0 | 0 | 0 |
| flavin monoamine oxidase family | 0 | 0 | 0 | 0 | 0 |
| sea anemone 8 toxin family | 0 | 0 | 0 | 0 | 0 |
| nucleotide pyrophosphatase/phosphodiesterase family | 0 | 0 | 0 | 0 | 0 |
| non-disulfide-bridged peptide (NDBP) superfamily | 0 | 0 | 0 | 0 | 0 |
| latrotoxin superfamily | 0 | 0 | 0 | 0 | 0 |
| glycosyl hydrolase 37 family | 0 | 0 | 1 | 0 | 0 |
| melittin family | 0 | 0 | 0 | 0 | 0 |
| U12-lycotoxin family | 0 | 0 | 0 | 0 | 0 |
| NPY family | 0 | 0 | 0 | 0 | 0 |
| NGF-beta family | 0 | 0 | 0 | 0 | 0 |
| secapin family | 0 | 0 | 0 | 0 | 0 |
| MIT-like AcTx family | 0 | 0 | 0 | 0 | 0 |
| glutaminyl-peptide cyclotransferase family | 0 | 0 | 0 | 0 | 1 |
| ohanin/vespryn family | 0 | 0 | 0 | 0 | 0 |
| conotoxin L superfamily | 0 | 0 | 0 | 0 | 0 |
| sea anemone structural class 9a family | 0 | 0 | 0 | 0 | 0 |
| spider wap-1 family | 0 | 0 | 0 | 0 | 0 |
| ARMT1 family | 0 | 0 | 0 | 0 | 0 |
| Unclassed | 7 | 7 | 6 | 6 | 5 |
| Total | 98 | 46 | 57 | 78 | 78 |

| Gene family | Ixodes scapularis | Nephila clavipes | Anopheles sinensis | Clunio marinus |
|---|---|---|---|---|
| peptidase M12A family | 0 | 5 | 0 | 0 |
| phospholipase A2 family | 3 | 3 | 1 | 4 |
| actinoporin family | 0 | 0 | 0 | 0 |
| venom complement C3 homolog family | 0 | 0 | 0 | 0 |
| multicopper oxidase family | 0 | 0 | 0 | 0 |
| arthropod phospholipase D family | 1 | 3 | 0 | 0 |
| venom Kunitz-type family | 4 | 2 | 0 | 4 |
| DNase II family | 0 | 0 | 0 | 1 |
| cystatin family | 0 | 1 | 0 | 0 |
| SNTX/VTX toxin family | 0 | 1 | 0 | 0 |
| true venom lectin family | 0 | 0 | 0 | 0 |
| C-terminal natriuretic peptide family | 0 | 0 | 0 | 0 |
| natterin family | 0 | 0 | 0 | 0 |
| 5'-nucleotidase family | 3 | 0 | 0 | 0 |
| ficolin lectin family | 0 | 0 | 0 | 0 |
| long chain scorpion toxin family | 0 | 0 | 0 | 0 |
| insulin family | 0 | 0 | 0 | 0 |
| type-B carboxylesterase/lipase family | 21 | 7 | 4 | 4 |
| snaclec family | 0 | 0 | 0 | 0 |
| CRISP family | 4 | 2 | 9 | 4 |
| venom metalloproteinase (M12B) family | 4 | 0 | 0 | 0 |
| peptidase S1 family | 0 | 0 | 4 | 3 |
| Cnidaria small cysteine-rich protein (SCRiP) family | 0 | 0 | 0 | 0 |
| histidine acid phosphatase family | 0 | 0 | 0 | 0 |
| long (4 C-C) scorpion toxin superfamily | 0 | 0 | 0 | 0 |
| PDGF/VEGF growth factor family | 1 | 0 | 0 | 0 |
| glycoprotein hormones subunit alpha family | 0 | 0 | 1 | 0 |
| spider wap-2 family | 0 | 0 | 0 | 0 |
| glycosyl hydrolase 56 family | 0 | 0 | 0 | 0 |
| snake waprin family | 0 | 0 | 0 | 0 |
| AVIT (prokineticin) family | 0 | 1 | 0 | 0 |
| peptidase M13 family | 1 | 0 | 0 | 0 |
| N-terminal section bradykinin-potentiating peptide family | 0 | 0 | 0 | 0 |
| C-terminal venom metalloproteinase (M12B) family | 0 | 0 | 1 | 0 |
| CREC family | 0 | 0 | 1 | 0 |
| AB hydrolase superfamily | 0 | 0 | 1 | 1 |
| long (3 C-C) scorpion toxin superfamily | 0 | 0 | 0 | 0 |
| ant venom allergen 2/4 family | 0 | 0 | 0 | 0 |
| scorpion La1-like peptide family | 1 | 0 | 0 | 0 |
| snake three-finger toxin family | 0 | 0 | 0 | 0 |
| peptidase S10 family | 3 | 0 | 1 | 1 |
| conopeptide P-like superfamily | 0 | 0 | 0 | 0 |
| glycoprotein hormones subunit beta family | 0 | 0 | 1 | 1 |
| cathelicidin family | 0 | 0 | 0 | 0 |
| short scorpion toxin superfamily | 0 | 0 | 0 | 0 |
| peptidase S9B family | 0 | 0 | 1 | 1 |
| protease inhibitor I19 family | 0 | 0 | 0 | 0 |
| jellyfish toxin family | 0 | 0 | 0 | 0 |
| phospholipase B-like family | 0 | 0 | 0 | 0 |
| teretoxin N (TN) superfamily | 0 | 0 | 0 | 0 |
| flavin monoamine oxidase family | 0 | 0 | 0 | 0 |
| sea anemone 8 toxin family | 0 | 0 | 0 | 0 |
| nucleotide pyrophosphatase/phosphodiesterase family | 0 | 0 | 0 | 0 |
| non-disulfide-bridged peptide (NDBP) superfamily | 0 | 0 | 0 | 0 |
| latrotoxin superfamily | 0 | 0 | 0 | 0 |
| glycosyl hydrolase 37 family | 0 | 0 | 0 | 0 |
| melittin family | 0 | 0 | 0 | 0 |
| U12-lycotoxin family | 0 | 0 | 0 | 0 |
| NPY family | 0 | 0 | 0 | 0 |
| NGF-beta family | 0 | 0 | 0 | 0 |
| secapin family | 0 | 0 | 0 | 0 |
| MIT-like AcTx family | 0 | 0 | 0 | 0 |
| glutaminyl-peptide cyclotransferase family | 0 | 0 | 0 | 0 |
| ohanin/vespryn family | 0 | 0 | 0 | 0 |
| conotoxin L superfamily | 0 | 0 | 0 | 0 |
| sea anemone structural class 9a family | 0 | 0 | 0 | 0 |
| spider wap-1 family | 0 | 1 | 0 | 0 |
| ARMT1 family | 0 | 0 | 0 | 0 |
| Unclassed | 2 | 7 | 1 | 3 |
| Total | 48 | 33 | 26 | 27 |

| Gene family | Capitella teleta | Octopus bimaculoides | Lottia gigantea | Crassostrea gigas | Biomphalaria glabrata |
|---|---|---|---|---|---|
| peptidase M12A family | 0 | 0 | 0 | 0 | 0 |
| phospholipase A2 family | 1 | 7 | 2 | 10 | 8 |
| actinoporin family | 0 | 0 | 0 | 0 | 0 |
| venom complement C3 homolog family | 0 | 0 | 0 | 0 | 0 |
| multicopper oxidase family | 0 | 0 | 0 | 1 | 0 |
| arthropod phospholipase D family | 0 | 0 | 0 | 0 | 0 |
| venom Kunitz-type family | 0 | 1 | 0 | 8 | 3 |
| DNase II family | 3 | 1 | 2 | 11 | 0 |
| cystatin family | 0 | 0 | 0 | 0 | 0 |
| SNTX/VTX toxin family | 0 | 0 | 0 | 0 | 0 |
| true venom lectin family | 3 | 0 | 3 | 10 | 1 |
| C-terminal natriuretic peptide family | 0 | 0 | 0 | 0 | 0 |
| natterin family | 0 | 0 | 0 | 0 | 0 |
| 5'-nucleotidase family | 0 | 0 | 0 | 0 | 0 |
| ficolin lectin family | 2 | 0 | 0 | 17 | 1 |
| long chain scorpion toxin family | 0 | 0 | 0 | 0 | 0 |
| insulin family | 1 | 1 | 1 | 3 | 0 |
| type-B carboxylesterase/lipase family | 0 | 0 | 1 | 1 | 0 |
| snaclec family | 6 | 0 | 2 | 2 | 2 |
| CRISP family | 0 | 2 | 2 | 15 | 1 |
| venom metalloproteinase (M12B) family | 0 | 0 | 0 | 1 | 1 |
| peptidase S1 family | 0 | 1 | 0 | 0 | 0 |
| Cnidaria small cysteine-rich protein (SCRiP) family | 0 | 0 | 0 | 0 | 0 |
| histidine acid phosphatase family | 0 | 0 | 0 | 0 | 0 |
| long (4 C-C) scorpion toxin superfamily | 0 | 0 | 0 | 0 | 0 |
| PDGF/VEGF growth factor family | 0 | 0 | 0 | 0 | 0 |
| glycoprotein hormones subunit alpha family | 0 | 0 | 0 | 1 | 0 |
| spider wap-2 family | 0 | 0 | 0 | 0 | 0 |
| glycosyl hydrolase 56 family | 1 | 0 | 0 | 0 | 1 |
| snake waprin family | 0 | 0 | 1 | 1 | 0 |
| AVIT (prokineticin) family | 0 | 0 | 0 | 0 | 0 |
| peptidase M13 family | 0 | 0 | 0 | 0 | 0 |
| N-terminal section bradykinin-potentiating peptide family | 0 | 0 | 0 | 0 | 0 |
| C-terminal venom metalloproteinase (M12B) family | 0 | 0 | 0 | 0 | 0 |
| CREC family | 1 | 0 | 1 | 0 | 0 |
| AB hydrolase superfamily | 0 | 0 | 0 | 0 | 0 |
| long (3 C-C) scorpion toxin superfamily | 0 | 0 | 0 | 0 | 0 |
| ant venom allergen 2/4 family | 0 | 0 | 0 | 0 | 0 |
| scorpion La1-like peptide family | 0 | 0 | 0 | 0 | 0 |
| snake three-finger toxin family | 0 | 0 | 0 | 0 | 0 |
| peptidase S10 family | 0 | 0 | 0 | 0 | 1 |
| conopeptide P-like superfamily | 0 | 0 | 0 | 0 | 4 |
| glycoprotein hormones subunit beta family | 2 | 0 | 0 | 2 | 1 |
| cathelicidin family | 0 | 0 | 0 | 0 | 0 |
| short scorpion toxin superfamily | 0 | 0 | 0 | 0 | 0 |
| peptidase S9B family | 0 | 0 | 0 | 0 | 0 |
| protease inhibitor I19 family | 0 | 0 | 0 | 0 | 0 |
| jellyfish toxin family | 0 | 0 | 0 | 0 | 0 |
| phospholipase B-like family | 0 | 0 | 1 | 0 | 0 |
| teretoxin N (TN) superfamily | 0 | 0 | 1 | 0 | 0 |
| flavin monoamine oxidase family | 0 | 0 | 0 | 0 | 0 |
| sea anemone 8 toxin family | 0 | 0 | 0 | 0 | 0 |
| nucleotide pyrophosphatase/phosphodiesterase family | 0 | 0 | 0 | 0 | 0 |
| non-disulfide-bridged peptide (NDBP) superfamily | 0 | 0 | 0 | 0 | 0 |
| latrotoxin superfamily | 0 | 0 | 0 | 0 | 0 |
| glycosyl hydrolase 37 family | 0 | 0 | 0 | 0 | 0 |
| melittin family | 0 | 0 | 0 | 0 | 0 |
| U12-lycotoxin family | 0 | 0 | 0 | 0 | 0 |
| NPY family | 0 | 0 | 0 | 0 | 0 |
| NGF-beta family | 0 | 0 | 0 | 0 | 0 |
| secapin family | 0 | 0 | 0 | 0 | 0 |
| MIT-like AcTx family | 0 | 0 | 0 | 0 | 0 |
| glutaminyl-peptide cyclotransferase family | 0 | 0 | 0 | 0 | 0 |
| ohanin/vespryn family | 0 | 0 | 0 | 0 | 0 |
| conotoxin L superfamily | 0 | 0 | 0 | 0 | 0 |
| sea anemone structural class 9a family | 0 | 0 | 0 | 0 | 0 |
| spider wap-1 family | 0 | 0 | 0 | 0 | 0 |
| ARMT1 family | 0 | 0 | 0 | 0 | 0 |
| Unclassed | 1 | 0 | 4 | 1 | 10 |
| Total | 21 | 13 | 21 | 84 | 34 |

Gene family | Mizuhopecten yessoensis | Aplysia californica | Helobdella robusta
peptidase M12A family | 0 | 1 | 0
phospholipase A2 family | 10 | 2 | 0
actinoporin family | 1 | 0 | 0
venom complement C3 homolog family | 0 | 1 | 0
multicopper oxidase family | 0 | 0 | 0
arthropod phospholipase D family | 0 | 0 | 0
venom Kunitz-type family | 3 | 8 | 0
DNase II family | 2 | 1 | 1
cystatin family | 0 | 0 | 0
SNTX/VTX toxin family | 0 | 0 | 0
true venom lectin family | 5 | 5 | 4
C-terminal natriuretic peptide family | 0 | 0 | 0
natterin family | 0 | 0 | 0
5'-nucleotidase family | 0 | 0 | 0
ficolin lectin family | 1 | 0 | 0
long chain scorpion toxin family | 0 | 0 | 0
insulin family | 0 | 0 | 0
type-B carboxylesterase/lipase family | 3 | 5 | 0
snaclec family | 7 | 2 | 0
CRISP family | 8 | 2 | 0
venom metalloproteinase (M12B) family | 0 | 0 | 1
peptidase S1 family | 0 | 0 | 0
Cnidaria small cysteine-rich protein (SCRiP) family | 0 | 0 | 0
histidine acid phosphatase family | 0 | 0 | 0
long (4 C-C) scorpion toxin superfamily | 0 | 0 | 0
PDGF/VEGF growth factor family | 0 | 2 | 0
glycoprotein hormones subunit alpha family | 0 | 0 | 0
spider wap-2 family | 0 | 0 | 0
glycosyl hydrolase 56 family | 0 | 1 | 0
snake waprin family | 0 | 0 | 0
AVIT (prokineticin) family | 1 | 0 | 0
peptidase M13 family | 0 | 2 | 0
N-terminal section bradykinin-potentiating peptide family | 0 | 0 | 0
C-terminal venom metalloproteinase (M12B) family | 0 | 0 | 0
CREC family | 0 | 5 | 0
AB hydrolase superfamily | 0 | 0 | 0
long (3 C-C) scorpion toxin superfamily | 0 | 0 | 0
ant venom allergen 2/4 family | 0 | 0 | 0
scorpion La1-like peptide family | 0 | 0 | 0
snake three-finger toxin family | 0 | 0 | 0
peptidase S10 family | 0 | 0 | 0
conopeptide P-like superfamily | 0 | 0 | 0
glycoprotein hormones subunit beta family | 0 | 0 | 0
cathelicidin family | 0 | 0 | 0
short scorpion toxin superfamily | 0 | 0 | 0
peptidase S9B family | 0 | 0 | 0
protease inhibitor I19 family | 0 | 0 | 0
jellyfish toxin family | 0 | 0 | 0
phospholipase B-like family | 0 | 0 | 0
teretoxin N (TN) superfamily | 0 | 0 | 0
flavin monoamine oxidase family | 0 | 0 | 0
sea anemone 8 toxin family | 0 | 0 | 0
nucleotide pyrophosphatase/phosphodiesterase family | 0 | 0 | 0
non-disulfide-bridged peptide (NDBP) superfamily | 0 | 0 | 0
latrotoxin superfamily | 0 | 0 | 0
glycosyl hydrolase 37 family | 0 | 0 | 0
melittin family | 0 | 0 | 0
U12-lycotoxin family | 0 | 0 | 0
NPY family | 0 | 0 | 0
NGF-beta family | 0 | 1 | 0
secapin family | 0 | 0 | 0
MIT-like AcTx family | 0 | 0 | 0
glutaminyl-peptide cyclotransferase family | 0 | 0 | 0
ohanin/vespryn family | 0 | 0 | 0
conotoxin L superfamily | 1 | 0 | 0
sea anemone structural class 9a family | 0 | 0 | 0
spider wap-1 family | 0 | 0 | 0
ARMT1 family | 0 | 0 | 0
Unclassed | 2 | 3 | 1
Total | 44 | 41 | 7

Gene family | Protobothrops mucrosquamatus | Xenopus tropicalis | Acanthaster planci | Gallus gallus | Danio rerio
peptidase M12A family | 0 | 0 | 0 | 0 | 0
phospholipase A2 family | 7 | 1 | 14 | 12 | 1
actinoporin family | 0 | 0 | 0 | 0 | 0
venom complement C3 homolog family | 1 | 3 | 1 | 1 | 1
multicopper oxidase family | 0 | 0 | 5 | 0 | 0
arthropod phospholipase D family | 0 | 0 | 0 | 0 | 0
venom Kunitz-type family | 9 | 3 | 1 | 0 | 1
DNase II family | 0 | 0 | 6 | 0 | 0
cystatin family | 4 | 1 | 1 | 0 | 0
SNTX/VTX toxin family | 0 | 0 | 0 | 0 | 12
true venom lectin family | 8 | 0 | 1 | 0 | 11
C-terminal natriuretic peptide family | 0 | 0 | 0 | 0 | 0
natterin family | 1 | 0 | 0 | 2 | 2
5'-nucleotidase family | 0 | 0 | 0 | 0 | 0
ficolin lectin family | 3 | 0 | 0 | 0 | 0
long chain scorpion toxin family | 0 | 0 | 0 | 0 | 0
insulin family | 0 | 0 | 0 | 0 | 0
type-B carboxylesterase/lipase family | 2 | 2 | 3 | 0 | 0
snaclec family | 8 | 0 | 10 | 1 | 0
CRISP family | 0 | 6 | 0 | 1 | 2
venom metalloproteinase (M12B) family | 6 | 3 | 2 | 0 | 1
peptidase S1 family | 9 | 0 | 0 | 0 | 0
Cnidaria small cysteine-rich protein (SCRiP) family | 0 | 0 | 0 | 0 | 0
histidine acid phosphatase family | 0 | 0 | 0 | 0 | 0
long (4 C-C) scorpion toxin superfamily | 0 | 0 | 0 | 0 | 0
PDGF/VEGF growth factor family | 4 | 5 | 0 | 0 | 0
glycoprotein hormones subunit alpha family | 0 | 0 | 1 | 0 | 0
spider wap-2 family | 0 | 0 | 0 | 0 | 0
glycosyl hydrolase 56 family | 2 | 0 | 0 | 1 | 1
snake waprin family | 3 | 0 | 0 | 0 | 0
AVIT (prokineticin) family | 0 | 1 | 0 | 1 | 1
peptidase M13 family | 0 | 0 | 0 | 0 | 0
N-terminal section bradykinin-potentiating peptide family | 1 | 0 | 0 | 0 | 0
C-terminal venom metalloproteinase (M12B) family | 0 | 0 | 0 | 0 | 0
CREC family | 1 | 0 | 1 | 0 | 0
AB hydrolase superfamily | 2 | 0 | 0 | 1 | 1
long (3 C-C) scorpion toxin superfamily | 0 | 0 | 0 | 0 | 0
ant venom allergen 2/4 family | 0 | 0 | 0 | 0 | 0
scorpion La1-like peptide family | 0 | 0 | 0 | 0 | 0
snake three-finger toxin family | 4 | 0 | 0 | 0 | 0
peptidase S10 family | 0 | 0 | 0 | 0 | 0
conopeptide P-like superfamily | 0 | 0 | 0 | 0 | 0
glycoprotein hormones subunit beta family | 0 | 0 | 0 | 0 | 0
cathelicidin family | 4 | 2 | 0 | 0 | 0
short scorpion toxin superfamily | 0 | 0 | 0 | 0 | 0
peptidase S9B family | 0 | 0 | 0 | 0 | 0
protease inhibitor I19 family | 0 | 0 | 0 | 0 | 0
jellyfish toxin family | 0 | 0 | 0 | 0 | 0
phospholipase B-like family | 0 | 0 | 0 | 0 | 1
teretoxin N (TN) superfamily | 0 | 0 | 0 | 0 | 0
flavin monoamine oxidase family | 3 | 1 | 0 | 1 | 0
sea anemone 8 toxin family | 0 | 0 | 0 | 0 | 0
nucleotide pyrophosphatase/phosphodiesterase family | 2 | 0 | 0 | 0 | 0
non-disulfide-bridged peptide (NDBP) superfamily | 0 | 0 | 0 | 0 | 0
latrotoxin superfamily | 0 | 0 | 0 | 0 | 0
glycosyl hydrolase 37 family | 0 | 0 | 0 | 0 | 0
melittin family | 0 | 0 | 0 | 0 | 0
U12-lycotoxin family | 0 | 0 | 0 | 0 | 0
NPY family | 1 | 1 | 0 | 0 | 0
NGF-beta family | 1 | 0 | 0 | 0 | 0
secapin family | 0 | 0 | 0 | 0 | 0
MIT-like AcTx family | 0 | 0 | 0 | 0 | 0
glutaminyl-peptide cyclotransferase family | 1 | 0 | 0 | 0 | 1
ohanin/vespryn family | 0 | 0 | 0 | 0 | 0
conotoxin L superfamily | 0 | 0 | 0 | 0 | 0
sea anemone structural class 9a family | 0 | 0 | 0 | 0 | 0
spider wap-1 family | 0 | 0 | 0 | 0 | 0
ARMT1 family | 0 | 0 | 0 | 0 | 0
Unclassed | 1 | 0 | 3 | 0 | 1
Total | 88 | 29 | 49 | 21 | 37

Gene family | Strongylocentrotus purpuratus | Ophiophagus hannah | Ornithorhynchus anatinus | Mus musculus
peptidase M12A family | 0 | 0 | 0 | 0
phospholipase A2 family | 9 | 0 | 0 | 0
actinoporin family | 0 | 0 | 0 | 0
venom complement C3 homolog family | 0 | 0 | 0 | 0
multicopper oxidase family | 0 | 1 | 0 | 0
arthropod phospholipase D family | 0 | 0 | 0 | 0
venom Kunitz-type family | 0 | 3 | 0 | 0
DNase II family | 2 | 0 | 0 | 0
cystatin family | 0 | 0 | 0 | 0
SNTX/VTX toxin family | 0 | 0 | 0 | 0
true venom lectin family | 5 | 3 | 0 | 0
C-terminal natriuretic peptide family | 0 | 0 | 2 | 0
natterin family | 0 | 1 | 0 | 0
5'-nucleotidase family | 0 | 0 | 0 | 0
ficolin lectin family | 1 | 0 | 0 | 0
long chain scorpion toxin family | 0 | 0 | 0 | 0
insulin family | 0 | 0 | 0 | 0
type-B carboxylesterase/lipase family | 12 | 1 | 0 | 0
snaclec family | 0 | 0 | 0 | 0
CRISP family | 0 | 1 | 0 | 0
venom metalloproteinase (M12B) family | 0 | 1 | 0 | 0
peptidase S1 family | 0 | 1 | 0 | 0
Cnidaria small cysteine-rich protein (SCRiP) family | 0 | 0 | 0 | 0
histidine acid phosphatase family | 0 | 0 | 0 | 0
long (4 C-C) scorpion toxin superfamily | 0 | 0 | 0 | 0
PDGF/VEGF growth factor family | 0 | 0 | 0 | 0
glycoprotein hormones subunit alpha family | 5 | 0 | 0 | 0
spider wap-2 family | 0 | 0 | 0 | 0
glycosyl hydrolase 56 family | 0 | 0 | 0 | 1
snake waprin family | 0 | 1 | 0 | 0
AVIT (prokineticin) family | 0 | 0 | 0 | 0
peptidase M13 family | 0 | 0 | 0 | 0
N-terminal section bradykinin-potentiating peptide family | 0 | 0 | 0 | 0
C-terminal venom metalloproteinase (M12B) family | 0 | 0 | 0 | 0
CREC family | 0 | 0 | 0 | 0
AB hydrolase superfamily | 0 | 0 | 0 | 0
long (3 C-C) scorpion toxin superfamily | 0 | 0 | 0 | 0
ant venom allergen 2/4 family | 0 | 0 | 0 | 0
scorpion La1-like peptide family | 0 | 0 | 0 | 0
snake three-finger toxin family | 0 | 4 | 0 | 0
peptidase S10 family | 0 | 0 | 0 | 0
conopeptide P-like superfamily | 0 | 0 | 0 | 0
glycoprotein hormones subunit beta family | 0 | 0 | 0 | 0
cathelicidin family | 0 | 0 | 1 | 0
short scorpion toxin superfamily | 0 | 0 | 0 | 0
peptidase S9B family | 0 | 0 | 0 | 0
protease inhibitor I19 family | 0 | 0 | 0 | 0
jellyfish toxin family | 0 | 0 | 0 | 0
phospholipase B-like family | 0 | 0 | 0 | 0
teretoxin N (TN) superfamily | 0 | 0 | 0 | 0
flavin monoamine oxidase family | 0 | 1 | 0 | 3
sea anemone 8 toxin family | 0 | 0 | 0 | 0
nucleotide pyrophosphatase/phosphodiesterase family | 0 | 0 | 0 | 0
non-disulfide-bridged peptide (NDBP) superfamily | 0 | 0 | 0 | 0
latrotoxin superfamily | 0 | 0 | 0 | 0
glycosyl hydrolase 37 family | 0 | 0 | 0 | 0
melittin family | 0 | 0 | 0 | 0
U12-lycotoxin family | 0 | 0 | 0 | 0
NPY family | 0 | 1 | 0 | 0
NGF-beta family | 0 | 2 | 0 | 0
secapin family | 0 | 0 | 0 | 0
MIT-like AcTx family | 0 | 0 | 0 | 0
glutaminyl-peptide cyclotransferase family | 0 | 0 | 0 | 0
ohanin/vespryn family | 0 | 0 | 0 | 0
conotoxin L superfamily | 0 | 0 | 0 | 0
sea anemone structural class 9a family | 0 | 0 | 0 | 0
spider wap-1 family | 0 | 0 | 0 | 0
ARMT1 family | 0 | 0 | 0 | 0
Unclassed | 4 | 0 | 0 | 0
Total | 38 | 21 | 3 | 4

Supplementary Table 8. TTL gene family distribution across actiniarian transcriptomes. TTL genes were identified using BLAST (E-value < 1e-05) against the Swiss-Prot database.

TTL gene family | Actinia tenebrosa (brown) | Anemonia sulcata | Aulactinia veratra | Stichodactyla haddoni | Anthopleura buddemeieri
phospholipase A2 family | 10 | 9 | 12 | 6 | 14
sea anemone type 3 (BDS-LIKE) potassium channel toxin family | 18 | 26 | 3 | 13 | 23
venom Kunitz-type family | 11 | 13 | 8 | 5 | 8
sea anemone 8 toxin family | 5 | 5 | 7 | 7 | 13
sea anemone type 1 potassium channel toxin family | 8 | 4 | 9 | 6 | 7
peptidase M12A family | 4 | 1 | 5 | 1 | 2
Unclassed | 2 | 0 | 5 | 2 | 2
multicopper oxidase family | 0 | 5 | 2 | 0 | 3
snaclec family | 3 | 4 | 4 | 1 | 1
sea anemone sodium channel inhibitory toxin family | 5 | 4 | 1 | 4 | 5
type-B carboxylesterase/lipase family | 2 | 5 | 4 | 0 | 2
actinoporin family | 1 | 4 | 1 | 3 | 5
ficolin lectin family | 0 | 2 | 1 | 0 | 1
true venom lectin family | 2 | 1 | 2 | 0 | 0
sea anemone type 5 potassium channel toxin family | 1 | 3 | 0 | 1 | 1
Cnidaria small cysteine-rich protein (SCRiP) family | 2 | 1 | 1 | 1 | 3
conopeptide P-like superfamily | 1 | 0 | 2 | 0 | 1
natterin family | 0 | 2 | 1 | 1 | 3
Acrorhagin1 | 1 | 3 | 0 | 1 | 2
venom complement C3 homolog family | 0 | 2 | 1 | 0 | 0
jellyfish toxin family | 1 | 1 | 0 | 0 | 0
sea anemone structural class 9a family | 0 | 0 | 3 | 0 | 0
AB hydrolase superfamily | 1 | 0 | 1 | 1 | 0
huwentoxin-1 family | 0 | 0 | 0 | 5 | 0
peptidase M13 family | 0 | 0 | 0 | 0 | 1
phospholipase B-like family | 0 | 0 | 0 | 0 | 0
CRISP family | 0 | 1 | 0 | 0 | 0
cystatin family | 0 | 1 | 0 | 0 | 2
sea anemone short toxin (NaTx type III) family | 0 | 0 | 0 | 3 | 0
CREC family | 0 | 0 | 0 | 1 | 0
peptidase S1 family | 1 | 0 | 0 | 0 | 0
5'-nucleotidase family | 0 | 0 | 0 | 0 | 0
venom metalloproteinase (M12B) family | 0 | 0 | 0 | 0 | 0
magi-1 superfamily | 0 | 0 | 0 | 2 | 0
Acrorhagin2 | 0 | 1 | 1 | 0 | 0
flavin monoamine oxidase family | 0 | 0 | 0 | 0 | 0
AVIT (prokineticin) family | 0 | 0 | 0 | 0 | 0
psalmotoxin-1 family | 0 | 0 | 0 | 1 | 0
huwentoxin-2 family | 0 | 0 | 0 | 1 | 0
nucleotide pyrophosphatase/phosphodiesterase family | 0 | 0 | 0 | 0 | 0
Total | 79 | 98 | 74 | 66 | 99

TTL gene family | Anthopleura dowii | Megalactis griffithsi
phospholipase A2 family | 6 | 5
sea anemone type 3 (BDS-LIKE) potassium channel toxin family | 11 | 14
venom Kunitz-type family | 9 | 5
sea anemone 8 toxin family | 6 | 2
sea anemone type 1 potassium channel toxin family | 9 | 1
peptidase M12A family | 1 | 0
Unclassed | 2 | 0
multicopper oxidase family | 1 | 0
snaclec family | 0 | 5
sea anemone sodium channel inhibitory toxin family | 2 | 0
type-B carboxylesterase/lipase family | 2 | 0
actinoporin family | 0 | 1
ficolin lectin family | 3 | 0
true venom lectin family | 0 | 0
sea anemone type 5 potassium channel toxin family | 2 | 0
Cnidaria small cysteine-rich protein (SCRiP) family | 1 | 2
conopeptide P-like superfamily | 0 | 1
natterin family | 0 | 1
Acrorhagin1 | 1 | 0
venom complement C3 homolog family | 1 | 0
jellyfish toxin family | 0 | 0
sea anemone structural class 9a family | 0 | 1
AB hydrolase superfamily | 1 | 0
huwentoxin-1 family | 0 | 0
peptidase M13 family | 0 | 0
phospholipase B-like family | 0 | 3
CRISP family | 0 | 0
cystatin family | 0 | 0
sea anemone short toxin (NaTx type III) family | 0 | 0
CREC family | 1 | 1
peptidase S1 family | 0 | 0
5'-nucleotidase family | 1 | 0
venom metalloproteinase (M12B) family | 0 | 0
magi-1 superfamily | 0 | 0
Acrorhagin2 | 0 | 0
flavin monoamine oxidase family | 0 | 0
AVIT (prokineticin) family | 0 | 0
psalmotoxin-1 family | 0 | 0
huwentoxin-2 family | 0 | 0
nucleotide pyrophosphatase/phosphodiesterase family | 0 | 0
Total | 60 | 42

TTL gene family | Exaiptasia pallida | Aiptasia diaphana | Calliactis polypus | Nemanthus annamensis | Telmatactis sp.
phospholipase A2 family | 9 | 12 | 9 | 11 | 11
sea anemone type 3 (BDS-LIKE) potassium channel toxin family | 0 | 0 | 0 | 0 | 0
venom Kunitz-type family | 3 | 5 | 11 | 8 | 5
sea anemone 8 toxin family | 4 | 8 | 6 | 12 | 5
sea anemone type 1 potassium channel toxin family | 0 | 4 | 0 | 0 | 0
peptidase M12A family | 10 | 4 | 2 | 3 | 2
Unclassed | 8 | 2 | 2 | 1 | 3
multicopper oxidase family | 7 | 2 | 1 | 4 | 5
snaclec family | 4 | 0 | 4 | 1 | 2
sea anemone sodium channel inhibitory toxin family | 2 | 1 | 3 | 0 | 1
type-B carboxylesterase/lipase family | 4 | 0 | 3 | 0 | 0
actinoporin family | 2 | 0 | 0 | 0 | 1
ficolin lectin family | 1 | 1 | 1 | 1 | 0
true venom lectin family | 2 | 2 | 1 | 1 | 2
sea anemone type 5 potassium channel toxin family | 0 | 2 | 0 | 1 | 0
Cnidaria small cysteine-rich protein (SCRiP) family | 0 | 0 | 0 | 0 | 0
conopeptide P-like superfamily | 1 | 0 | 0 | 0 | 1
natterin family | 0 | 0 | 0 | 0 | 0
Acrorhagin1 | 0 | 0 | 0 | 0 | 0
venom complement C3 homolog family | 1 | 0 | 1 | 0 | 0
jellyfish toxin family | 3 | 0 | 1 | 0 | 0
sea anemone structural class 9a family | 0 | 0 | 2 | 0 | 0
AB hydrolase superfamily | 0 | 0 | 1 | 0 | 0
huwentoxin-1 family | 0 | 0 | 0 | 0 | 0
peptidase M13 family | 0 | 0 | 1 | 2 | 1
phospholipase B-like family | 0 | 0 | 0 | 1 | 1
CRISP family | 0 | 1 | 0 | 0 | 0
cystatin family | 0 | 0 | 0 | 0 | 0
sea anemone short toxin (NaTx type III) family | 0 | 0 | 0 | 0 | 0
CREC family | 0 | 0 | 0 | 0 | 0
peptidase S1 family | 0 | 0 | 0 | 0 | 0
5'-nucleotidase family | 0 | 1 | 0 | 0 | 0
venom metalloproteinase (M12B) family | 0 | 0 | 0 | 1 | 0
magi-1 superfamily | 0 | 0 | 0 | 0 | 0
Acrorhagin2 | 0 | 0 | 0 | 0 | 0
flavin monoamine oxidase family | 1 | 0 | 0 | 0 | 0
AVIT (prokineticin) family | 0 | 1 | 0 | 0 | 0
psalmotoxin-1 family | 0 | 0 | 0 | 0 | 0
huwentoxin-2 family | 0 | 0 | 0 | 0 | 0
nucleotide pyrophosphatase/phosphodiesterase family | 0 | 0 | 0 | 0 | 0
Total | 62 | 46 | 49 | 47 | 40

TTL gene family | Edwardsiella carnea | Nematostella vectensis
phospholipase A2 family | 13 | 12
sea anemone type 3 (BDS-LIKE) potassium channel toxin family | 0 | 0
venom Kunitz-type family | 7 | 4
sea anemone 8 toxin family | 3 | 4
sea anemone type 1 potassium channel toxin family | 1 | 0
peptidase M12A family | 4 | 7
Unclassed | 7 | 6
multicopper oxidase family | 3 | 2
snaclec family | 3 | 0
sea anemone sodium channel inhibitory toxin family | 0 | 4
type-B carboxylesterase/lipase family | 1 | 1
actinoporin family | 0 | 0
ficolin lectin family | 1 | 2
true venom lectin family | 0 | 0
sea anemone type 5 potassium channel toxin family | 0 | 1
Cnidaria small cysteine-rich protein (SCRiP) family | 0 | 0
conopeptide P-like superfamily | 3 | 1
natterin family | 0 | 0
Acrorhagin1 | 0 | 0
venom complement C3 homolog family | 0 | 0
jellyfish toxin family | 0 | 0
sea anemone structural class 9a family | 0 | 0
AB hydrolase superfamily | 0 | 1
huwentoxin-1 family | 0 | 0
peptidase M13 family | 0 | 0
phospholipase B-like family | 0 | 0
CRISP family | 2 | 0
cystatin family | 0 | 0
sea anemone short toxin (NaTx type III) family | 0 | 0
CREC family | 0 | 0
peptidase S1 family | 1 | 0
5'-nucleotidase family | 0 | 0
venom metalloproteinase (M12B) family | 1 | 0
magi-1 superfamily | 0 | 0
Acrorhagin2 | 0 | 0
flavin monoamine oxidase family | 0 | 0
AVIT (prokineticin) family | 0 | 0
psalmotoxin-1 family | 0 | 0
huwentoxin-2 family | 0 | 0
nucleotide pyrophosphatase/phosphodiesterase family | 0 | 0
Total | 50 | 45

TTL gene family | Actinia tenebrosa (red) | Actinia tenebrosa (green) | Actinia tenebrosa (blue)
phospholipase A2 family | 8 | 9 | 8
sea anemone type 3 (BDS-LIKE) potassium channel toxin family | 21 | 20 | 15
venom Kunitz-type family | 15 | 12 | 12
sea anemone 8 toxin family | 6 | 6 | 5
sea anemone type 1 potassium channel toxin family | 5 | 4 | 4
peptidase M12A family | 3 | 1 | 2
Unclassed | 0 | 0 | 0
multicopper oxidase family | 3 | 7 | 7
snaclec family | 2 | 5 | 3
sea anemone sodium channel inhibitory toxin family | 4 | 4 | 2
type-B carboxylesterase/lipase family | 3 | 4 | 3
actinoporin family | 4 | 6 | 2
ficolin lectin family | 3 | 1 | 1
true venom lectin family | 1 | 2 | 1
sea anemone type 5 potassium channel toxin family | 3 | 2 | 3
Cnidaria small cysteine-rich protein (SCRiP) family | 1 | 1 | 1
conopeptide P-like superfamily | 0 | 0 | 0
natterin family | 2 | 2 | 2
Acrorhagin1 | 2 | 3 | 3
venom complement C3 homolog family | 2 | 0 | 1
jellyfish toxin family | 0 | 1 | 0
sea anemone structural class 9a family | 0 | 0 | 0
AB hydrolase superfamily | 0 | 0 | 0
huwentoxin-1 family | 0 | 0 | 0
peptidase M13 family | 0 | 0 | 0
phospholipase B-like family | 0 | 0 | 0
CRISP family | 0 | 0 | 0
cystatin family | 1 | 1 | 1
sea anemone short toxin (NaTx type III) family | 0 | 0 | 0
CREC family | 0 | 0 | 0
peptidase S1 family | 0 | 0 | 0
5'-nucleotidase family | 0 | 0 | 0
venom metalloproteinase (M12B) family | 0 | 2 | 0
magi-1 superfamily | 0 | 0 | 0
Acrorhagin2 | 1 | 1 | 1
flavin monoamine oxidase family | 0 | 0 | 0
AVIT (prokineticin) family | 0 | 0 | 0
psalmotoxin-1 family | 0 | 0 | 0
huwentoxin-2 family | 0 | 0 | 0
nucleotide pyrophosphatase/phosphodiesterase family | 0 | 0 | 0
Total | 90 | 94 | 77

Supplementary Table 9. Transcriptome assembly statistics. Assembly statistics from the Trinity de novo assembly of A. tenebrosa ecotypes (red, brown, blue and green).

Assembly statistics (A. tenebrosa ecotype) | Red | Brown | Blue | Green
Number of reads | 152,136,760 | 201,995,450 | 175,687,690 | 179,309,262
Number of transcripts | 241,041 | 243,560 | 116,930 | 199,049
N10 | 3,875 | 4,609 | 5,192 | 4,517
N30 | 1,992 | 2,345 | 2,889 | 2,325
N50 | 1,084 | 1,237 | 1,804 | 1,264
Average contig length | 709.02 | 736.07 | 972.89 | 751.70
BUSCO (% Complete BUSCOs) | 96.3 | 96.9 | 97.3 | 96.9
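
The N10, N30 and N50 rows above are length-weighted summary statistics. As a minimal sketch of the conventional definition (not necessarily the exact script used to produce this table), Nx is the contig length L such that contigs of length L or longer contain at least x % of all assembled bases. The function name `nx` and the toy contig lengths are illustrative assumptions and are not data from this thesis.

```python
def nx(lengths, x=50):
    """Return the Nx statistic for a list of contig lengths.

    Nx is the length of the contig at which the running total of
    lengths, taken from longest to shortest, first reaches x% of
    the total assembly size.
    """
    if not lengths:
        raise ValueError("no contig lengths supplied")
    threshold = sum(lengths) * x / 100
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Hypothetical toy assembly (not thesis data): 8 contigs, 32 bases in total.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print("N10:", nx(toy_lengths, 10))
print("N50:", nx(toy_lengths, 50))
print("N90:", nx(toy_lengths, 90))
```

Because the statistic is weighted by length, an assembly with many short transcripts can have a low average contig length and still a comparatively high N50.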

Supplementary Table 10. Transcriptome assembly statistics. Assembly statistics from the Trinity de novo assembly of Anthopleura buddemeieri, Aulactinia veratra, Calliactis polypus, Nemanthus annamensis and Telmatactis sp.

Assembly statistics | A. buddemeieri | A. veratra | C. polypus | N. annamensis | Telmatactis sp.
Number of reads | 51,620,970 | 69,513,111 | 209,875,116 | 205,911,634 | 79,893,721
Number of transcripts | 332,206 | 264,252 | 225,965 | 116,120 | 190,166
N10 | 2,997 | 3,836 | 5,025 | 5,354 | 3,220
N30 | 1,552 | 1,928 | 2,708 | 2,933 | 1,551
N50 | 851 | 993 | 1,520 | 1,824 | 806
Average contig length | 622.43 | 663.07 | 810.49 | 955.67 | 604.39
BUSCO (% Complete BUSCOs) | 95 | 96.9 | 96.7 | 97.3 | 83.4

Supplementary Table 11. Transcriptome assembly statistics. Assembly statistics from the Trinity de novo assembly of Megalactis griffithsi, Anemonia sulcata, Stichodactyla haddoni, Anthopleura dowii, Aiptasia diaphana, Edwardsiella carnea and Exaiptasia pallida.

Assembly statistics | A. diaphana | E. carnea | E. pallida | A. sulcata | M. griffithsi | S. haddoni | A. dowii
NCBI Bioproject | PRJNA317913 | PRJNA397639 | PRJNA261862 | PRJNA279590 | PRJNA280517 | PRJNA381121 | PRJNA329297
Number of reads | 198,322,994 | 94,701,666 | 35,210,718 | 169,653,911 | 111,517,610 | 14,580,416 | 41,592,719
Number of transcripts | 220,076 | 201,545 | 74,937 | 166,884 | 473,912 | 150,121 | 60,336
N10 | 4,811 | 3,821 | 3,513 | 4,574 | 2,851 | 2,350 | 3,152
N30 | 2,255 | 1,800 | 1,873 | 2,052 | 1,400 | 1,243 | 1,812
N50 | 1,198 | 866 | 1,135 | 1,106 | 748 | 766 | 1,066
Average contig length | 715.48 | 601.33 | 717.58 | 712.64 | 544.98 | 582.84 | 678.44
BUSCO (% Complete BUSCOs) | 96.2 | 97.5 | 86.7 | 94.0 | 92.0 | 81.4 | 83.2

Supplementary Table 12. Transcriptome assembly statistics. Assembly statistics from the Trinity de novo assembly of RNA-seq reference transcriptomes of Actinia tenebrosa across morphological structures (acrorhagi, mesenteric filaments and tentacles) and ontogeny (1, 3, 6 and 9 mm).

Assembly statistics | A. tenebrosa | A. tenebrosa | N. vectensis | N. vectensis
RNA-seq | Morphological-novelty | Ontogeny | Morphological-novelty | Development
NCBI Bioproject | PRJNA350366 | PRJNA350366 | PRJEB13676 | PRJNA200689
Number of reads | 818,138,236 | 285,473,862 | 364,726,242 | 473,946,460
Number of transcripts | 221,845 | 149,582 | 100,423 | 137,989
N10 | 4,450 | 4,412 | 3,601 | 4,904
N30 | 2,267 | 2,328 | 2,134 | 2,507
N50 | 1,256 | 1,332 | 1,339 | 1,379
Average contig length | 775.14 | 786.64 | 790.71 | 791.47
BUSCO (% Complete BUSCOs) | 97.8 | 97.2 | 92.5 | 96.2

Supplementary Table 13. TTL gene validation. Primers used for Sanger sequence validation of TTL genes identified in actiniarian transcriptomes.

Primer name | Primer sequence (forward) | Primer sequence (reverse) | Product size (bp)
Acacro1div | TGAGTCTGTGATTCTTGACTTCTTG | AGACCAAGAGTAGAGCGGGA | 252
Acacro1a | TGGTAAGCAGCGTCCTTTGA | GTTTGCACACCGACGAACAA | 219
AtTR12503 | AAGTGCAGGCTACATTACACTA | CGGACTCAAGGTCGAAACAC | 228
AtTR31583 | ACTACACAGTTCGTTTCACGTT | GTTAAGACAGCGGCCGAGAT | 312
AbTR110002 | AATGGCTGTACAAGCAAGGT | GTCCAAACGTCCAAGCATCA | 258
AvTR48111 | AGCCCATTGGTTTAAAGCGT | GCACACTCGAAGCGAAACTG | 216
AvTR96049 | CTGGAAATTGGAAATGCGACCA | TGCAGAGTTCAACTTACACGC | 255
NaTR2030 | TTCATGCCGGTCGTTGATTG | TCGAGTTATCTGACGCGGTG | 258
NaTR11115 | TGTCCAGTGCTGTAGAATGA | CCTCAACCATGTTCTAAGCCA | 171
NaTR24513 | ATATGCTTCACGCCAGACAA | CAAATTCAACAGCCATCCACC | 351
NaTR25370 | AGCCTCAAGACAATTTCGCA | TCAGTGGGAGGGAAGTTGAT | 252
TsTR8763 | CCCTAATCGCGGTACAAACA | CTGAACTAAGTCACCTCAGCCA | 255

Supplementary Table 14. Validation of TTL genes. Sanger sequencing alignments with TTL transcripts from Trinity de novo transcriptome assembly of actiniarian species.

Species | Primer name | NCBI accession | Gene family | Similarity (%)
Actinia tenebrosa | Acacro1div | KY176770 | Acrorhagin 1 | 100
Actinia tenebrosa | Acacro1a | KY176769 | Acrorhagin 1a | 100
Actinia tenebrosa | AtTR12503 | KY176768 | Sea anemone type 3 (BDS-LIKE) potassium channel toxin | 100
Actinia tenebrosa | AtTR31583 | KY176766 | Cnidaria small cysteine-rich protein (SCRiP) | 99.7
Anthopleura buddemeieri | AbTR110002 | KY176771 | Sea anemone type 1 potassium channel toxin | 100
Aulactinia veratra | AvTR96049 | KY176764 | Sea anemone sodium channel inhibitory toxin | 100
Aulactinia veratra | AvTR48111 | KY176765 | Venom Kunitz-type | 100
Nemanthus annamensis | NaTR2030 | KY176763 | Sea anemone 8 toxin | 100
Nemanthus annamensis | NaTR11115 | KY176762 | Venom Kunitz-type | 100
Nemanthus annamensis | NaTR24513 | KY176761 | Sea anemone structural class 9a | 97.9
Nemanthus annamensis | NaTR25370 | KY176760 | Sea anemone sodium channel inhibitory toxin | 100
Telmatactis sp. | TsTR8763 | KY176759 | Sea anemone type 5 potassium channel toxin | 98.8

Supplementary Table 15. Selective pressures of TTL genes using Phylogenetic Analysis by Maximum Likelihood (PAML). Selection analysis of TTL genes found in actiniarian transcriptomes, performed using the CODEML program within PAML.

Model | lnL | m0-m3 | m1-m2 | m7-m8 | M3 | M2 | M8 | dN/dS | ts/tv | Sites

Acrorhagin 1
0 | -1261.645474 | - | - | - | - | - | - | 0.3315 | 1.55891 | -
1 | -1211.920222 | - | - | - | - | - | - | 0.6309 | 1.77478 | -
2 | -1208.759764 | - | 6.320916 | - | - | 0.042406 | - | 0.9943 | 1.90522 | 0
3 | -1203.941558 | 115.4078 | - | - | 0 | - | - | 0.5851 | 1.64775 | -
7 | -1206.335795 | - | - | - | - | - | - | 0.4606 | 1.59477 | -
8 | -1204.242356 | - | - | 4.186878 | - | - | 0.123263 | 0.6751 | 1.68899 | 0

multicopper oxidase family
0 | -9245.507647 | - | - | - | - | - | - | 0.1971 | 1.301 | -
1 | -9092.070917 | - | - | - | - | - | - | 0.4142 | 1.43717 | -
2 | -9092.070917 | - | 0 | - | - | 1 | - | 0.4142 | 1.43717 | 0
3 | -8983.368892 | 524.2775 | - | - | 0 | - | - | 0.2325 | 1.34399 | -
7 | -8976.0182 | - | - | - | - | - | - | 0.2322 | 1.34475 | -
8 | -8973.171305 | - | - | 5.69379 | - | - | 0.058024 | 0.238 | 1.35428 | 0

natterin family
0 | -3502.327257 | - | - | - | - | - | - | 0.31789 | 1.69385 | -
1 | -3403.127476 | - | - | - | - | - | - | 0.4254 | 1.82349 | -
2 | -3403.127485 | - | -1.8E-05 | - | - | 1 | - | 0.4255 | 1.82349 | 0
3 | -3402.916217 | 198.8221 | - | - | 0 | - | - | 0.4042 | 1.80459 | -
7 | -3399.870589 | - | - | - | - | - | - | 0.3636 | 1.76482 | -
8 | -3395.383252 | - | - | 8.974674 | - | - | 0.011251 | 0.4303 | 1.83503 | 0

peptidase M12A family
0 | -11428.87377 | - | - | - | - | - | - | 0.0851 | 1.51557 | -
1 | -11369.73786 | - | - | - | - | - | - | 0.1462 | 1.59318 | -
2 | -11369.73786 | - | 0 | - | - | 1 | - | 0.1462 | 1.59318 | 0

peptidase M12A family (continued)
3 | -11057.385 | 742.9775 | - | - | 0 | - | - | 0.0981 | 1.53808 | -
7 | -11049.78664 | - | - | - | - | - | - | 0.1022 | 1.54202 | -
8 | -11048.39546 | - | - | 2.782348 | - | - | 0.248783 | 0.1031 | 1.5508 | 0

peptidase M13 family
0 | -2980.419318 | - | - | - | - | - | - | 0.12184 | 1.87888 | -
1 | -2958.931261 | - | - | - | - | - | - | 0.2823 | 2.04341 | -
2 | -2958.931261 | - | 0 | - | - | 1 | - | 0.2823 | 2.04341 | 0
3 | -2949.745319 | 61.348 | - | - | 1.51E-12 | - | - | 0.2472 | 2.05133 | -
7 | -2950.982194 | - | - | - | - | - | - | 0.1782 | 2.04495 | -
8 | -2950.608174 | - | - | 0.74804 | - | - | 0.687963 | 0.2374 | 2.05836 | 0

phospholipase A2 family
0 | -18330.06649 | - | - | - | - | - | - | 0.14934 | 1.42974 | -
1 | -17994.06917 | - | - | - | - | - | - | 0.4831 | 1.82123 | -
2 | -18031.07823 | - | -74.0181 | - | - | 1 | - | 0.6714 | 1.91487 | 1 NS
3 | -17479.94601 | 1700.241 | - | - | 0 | - | - | 0.1939 | 1.57654 | -
7 | -17423.20473 | - | - | - | - | - | - | 0.1886 | 1.60704 | -
8 | -17423.20576 | - | - | -0.00206 | - | - | 1 | 0.1886 | 1.60704 | 0

phospholipase B-like family
0 | -1626.058122 | - | - | - | - | - | - | 0.01035 | 1.36113 | -
1 | -1602.2908 | - | - | - | - | - | - | 0.3697 | 1.25537 | -
2 | -1602.2908 | - | 0 | - | - | 1 | - | 0.3697 | 1.25537 | 0
3 | -1597.973163 | 56.16992 | - | - | 1.85E-11 | - | - | 0.071 | 1.45783 | -
7 | -1601.215139 | - | - | - | - | - | - | 0.0212 | 1.42253 | -
8 | -1598.438759 | - | - | 5.55276 | - | - | 0.062263 | 0.3461 | 1.32112 | 0

sea anemone 8 toxin family
0 | -8520.170541 | - | - | - | - | - | - | 0.27702 | 1.53254 | -
1 | -8171.312561 | - | - | - | - | - | - | 0.4189 | 1.6469 | -
2 | -8148.406833 | - | 45.81146 | - | - | 1.13E-10 | - | 0.6424 | 1.81932 | 3
3 | -8034.875568 | 970.5899 | - | - | 0 | - | - | 0.3613 | 1.56692 | -
7 | -8009.391454 | - | - | - | - | - | - | 0.3554 | 1.53913 | -
8 | -7991.16408 | - | - | 36.45475 | - | - | 1.21E-08 | 0.4129 | 1.61333 | 2

sea anemone short toxin (NaTx type III) family
0 | -396.821599 | - | - | - | - | - | - | 0.6819 | 1.87968 | -
1 | -394.799392 | - | - | - | - | - | - | 0.4786 | 1.6701 | -
2 | -393.533523 | - | 2.531738 | - | - | 0.281994 | - | 0.9303 | 1.84381 | 0
3 | -393.533523 | 6.576152 | - | - | 0.160055 | - | - | 0.9303 | 1.84381 | -
7 | -394.808089 | - | - | - | - | - | - | 0.5 | 1.69053 | -
8 | -393.533568 | - | - | 2.549042 | - | - | 0.279565 | 0.9304 | 1.84364 | 0

sea anemone sodium channel inhibitory toxin family
0 | -2833.091341 | - | - | - | - | - | - | 0.38468 | 1.75349 | -
1 | -2755.290494 | - | - | - | - | - | - | 0.7302 | 1.79468 | -
2 | -2752.05356 | - | 6.473868 | - | - | 0.039284 | - | 0.8375 | 1.88998 | 1
3 | -2725.831878 | 214.5189 | - | - | 0 | - | - | 0.4667 | 1.64767 | -
7 | -2726.601527 | - | - | - | - | - | - | 0.4613 | 1.64991 | -
8 | -2725.675049 | - | - | 1.852956 | - | - | 0.395946 | 0.5201 | 1.72066 | 0

sea anemone type 1 potassium channel toxin family
0 | -3247.589501 | - | - | - | - | - | - | 0.29874 | 1.43304 | -
1 | -3126.413109 | - | - | - | - | - | - | 0.7556 | 1.84272 | -
2 | -3134.60313 | - | -16.38 | - | - | 1 | - | 0.9668 | 1.86361 | 3 NS
3 | -3077.663678 | 339.8516 | - | - | 0 | - | - | 0.3874 | 1.53476 | -
7 | -3075.712766 | - | - | - | - | - | - | 0.381 | 1.54529 | -
8 | -3075.547229 | - | - | 0.331074 | - | - | 0.847439 | 0.3972 | 1.55777 | 0

sea anemone type 3 (BDS-LIKE) potassium channel toxin family
0 | -7210.5691 | - | - | - | - | - | - | 0.44016 | 1.39703 | -
1 | -6656.382488 | - | - | - | - | - | - | 0.643 | 1.39919 | -
2 | -6576.703442 | - | 159.3581 | - | - | 0 | - | 1.1379 | 1.61227 | 9
3 | -6543.708686 | 1333.721 | - | - | 0 | - | - | 0.649 | 1.32289 | -
7 | -6558.941209 | - | - | - | - | - | - | 0.1239 | 1.20219 | -
8 | -6489.442742 | - | - | 138.9969 | - | - | 0 | 0.6587 | 1.369 | 8

sea anemone type 5 potassium channel toxin family
0 | -1315.640053 | - | - | - | - | - | - | 0.15135 | 1.24827 | -
1 | -1314.581439 | - | - | - | - | - | - | 0.21 | 1.30913 | -
2 | -1314.581439 | - | 0 | - | - | 1 | - | 0.21 | 1.30913 | 0

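sea anemone type 5 potassium channel toxin family (continued)
3 | -1300.759633 | 29.76084 | - | - | 5.47E-06 | - | - | 0.1718 | 1.35515 | -
7 | -1302.078761 | - | - | - | - | - | - | 0.1756 | 1.37197 | -
8 | -1302.079089 | - | - | -0.00066 | - | - | 1 | 0.1757 | 1.37197 | 0

snaclec family
0 | -3846.618618 | - | - | - | - | - | - | 0.23437 | 1.19614 | -
1 | -3792.349097 | - | - | - | - | - | - | 0.5428 | 1.37862 | -
2 | -3792.349097 | - | 0 | - | - | 1 | - | 0.5428 | 1.37862 | 1 NS
3 | -3740.929304 | 211.3786 | - | - | 0 | - | - | 0.2812 | 1.20107 | -
7 | -3740.4189 | - | - | - | - | - | - | 0.2912 | 1.20396 | -
8 | -3732.725034 | - | - | 15.38773 | - | - | 0.000456 | 0.9803 | 1.22478 | 1

true venom lectin family
0 | -2100.372971 | - | - | - | - | - | - | 0.2416 | 1.60009 | -
1 | -2094.15444 | - | - | - | - | - | - | 0.3598 | 1.71657 | -
2 | -2094.15444 | - | 0 | - | - | 1 | - | 0.3598 | 1.71657 | 0
3 | -2083.363476 | 34.01899 | - | - | 7.39E-07 | - | - | 0.3295 | 1.77061 | -
7 | -2088.11331 | - | - | - | - | - | - | 0.3107 | 1.7308 | -
8 | -2083.708899 | - | - | 8.808822 | - | - | 0.012223 | 5.3753 | 1.78797 | 0

type-B carboxylesterase/lipase family
0 | -18088.38431 | - | - | - | - | - | - | 0.17243 | 1.51179 | -
1 | -17943.51094 | - | - | - | - | - | - | 0.36 | 1.75893 | -
2 | -17943.51094 | - | 0 | - | - | 1 | - | 0.36 | 1.75893 | 0
3 | -17739.12018 | 698.5283 | - | - | 0 | - | - | 0.1965 | 1.60945 | -
7 | -17737.11698 | - | - | - | - | - | - | 0.2026 | 1.64381 | -
8 | -17734.86386 | - | - | 4.506224 | - | - | 0.105072 | 0.2126 | 1.65576 | 0

venom complement C3 homolog family
0 | -18957.05833 | - | - | - | - | - | - | 0.1201 | 1.64932 | -
1 | -18786.99225 | - | - | - | - | - | - | 0.2345 | 1.87904 | -
2 | -18786.99225 | - | 0 | - | - | 1 | - | 0.2345 | 1.87904 | 0
3 | -18682.10314 | 549.9104 | - | - | 0 | - | - | 0.1498 | 1.78955 | -
7 | -18685.29754 | - | - | - | - | - | - | 0.1463 | 1.78811 | -
8 | -18682.17162 | - | - | 6.251852 | - | - | 0.043896 | 0.1895 | 1.80801 | 0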

venom Kunitz-type family
0 | -10404.02413 | - | - | - | - | - | - | 0.1912 | 1.40923 | -
1 | -10026.70758 | - | - | - | - | - | - | 0.6278 | 1.80942 | -
2 | -10026.70758 | - | 0 | - | - | 1 | - | 0.6278 | 1.80942 | 0
3 | -9753.602287 | 1300.844 | - | - | 0 | - | - | 0.2426 | 1.48727 | -
7 | -9769.520634 | - | - | - | - | - | - | 0.2286 | 1.49912 | -
8 | -9769.521226 | - | - | -0.00118 | - | - | 1 | 0.2286 | 1.49912 | 0

Cnidaria small cysteine-rich protein (SCRiP) family
0 | -1392.53087 | - | - | - | - | - | - | 0.28069 | 1.31199 | -
1 | -1332.477811 | - | - | - | - | - | - | 0.5328 | 1.35214 | -
2 | -1328.765322 | - | 7.424978 | - | - | 0.024417 | - | 1.1447 | 1.4545 | 1
3 | -1318.667905 | 147.7259 | - | - | 0 | - | - | 0.4908 | 1.31344 | -
7 | -1322.359489 | - | - | - | - | - | - | 0.3648 | 1.2068 | -
8 | -1318.90392 | - | - | 6.911138 | - | - | 0.031569 | 0.7766 | 1.27255 | 2

conopeptide P-like superfamily
0 | -1857.035137 | - | - | - | - | - | - | 0.1917 | 1.28996 | -
1 | -1825.065193 | - | - | - | - | - | - | 0.7139 | 1.53995 | -
2 | -1825.065193 | - | 0 | - | - | 1 | - | 0.7139 | 1.53995 | 0
3 | -1803.519626 | 107.031 | - | - | 0 | - | - | 0.238 | 1.24284 | -
7 | -1805.65851 | - | - | - | - | - | - | 0.2897 | 1.21455 | -
8 | -1805.658548 | - | - | -7.6E-05 | - | - | 1 | 0.2897 | 1.21455 | 0

AB hydrolase superfamily
0 | -5711.629038 | - | - | - | - | - | - | 0.1256 | 1.6782 | -
1 | -5594.521117 | - | - | - | - | - | - | 0.2512 | 1.98066 | -
2 | -5594.521117 | - | 0 | - | - | 1 | - | 0.2512 | 1.98066 | 0
3 | -5571.321226 | 280.6156 | - | - | 0 | - | - | 0.1708 | 1.8656 | -
7 | -5573.028837 | - | - | - | - | - | - | 0.1644 | 1.85093 | -
8 | -5571.856655 | - | - | 2.344364 | - | - | 0.30969 | 0.1769 | 1.87055 | 0

actinoporin family
0 | -6234.549272 | - | - | - | - | - | - | 0.3344 | 1.8261 | -
1 | -6117.893683 | - | - | - | - | - | - | 0.5631 | 2.05527 | -
2 | -6117.893683 | - | 0 | - | - | 1 | - | 0.5631 | 2.05527 | 1 NS

actinoporin family (continued)
3 | -6086.511618 | 296.0753 | - | - | 0 | - | - | 0.4244 | 1.98443 | -
7 | -6087.038635 | - | - | - | - | - | - | 0.4005 | 1.95477 | -
8 | -6081.542354 | - | - | 10.99256 | - | - | 0.004102 | 0.4933 | 2.00911 | 2

CREC family
0 | -2215.158811 | - | - | - | - | - | - | 0.16015 | 0.98532 | -
1 | -2194.178025 | - | - | - | - | - | - | 0.7664 | 1.84218 | -
2 | -2194.178025 | - | 0 | - | - | 1 | - | 0.7664 | 1.84218 | 0
3 | -2190.14444 | 50.02874 | - | - | 3.56E-10 | - | - | 0.2506 | 1.34082 | -
7 | -2191.819913 | - | - | - | - | - | - | 0.2557 | 1.32488 | -
8 | -2189.583256 | - | - | 4.473314 | - | - | 0.106815 | 0.3763 | 1.25989 | 2 NS

CRISP family
0 | -2222.120136 | - | - | - | - | - | - | 0.03144 | 1.58571 | -
1 | -2203.708679 | - | - | - | - | - | - | 0.2088 | 1.58005 | -
2 | -2203.708679 | - | 0 | - | - | 1 | - | 0.2088 | 1.58008 | 0
3 | -2192.479218 | 59.28184 | - | - | 4.11E-12 | - | - | 0.2864 | 1.78585 | -
7 | -2196.205152 | - | - | - | - | - | - | 0.0434 | 1.9123 | -
8 | -2192.889426 | - | - | 6.631452 | - | - | 0.036308 | 0.2438 | 1.88604 | 0

cystatin family
0 | -996.727316 | - | - | - | - | - | - | 0.57461 | 2.04625 | -
1 | -987.644894 | - | - | - | - | - | - | 0.5186 | 1.99435 | -
2 | -985.447093 | - | 4.395602 | - | - | 0.111047 | - | 0.9122 | 2.3036 | 0
3 | -985.447093 | 22.56045 | - | - | 0.000155 | - | - | 0.9122 | 2.3036 | -
7 | -987.679207 | - | - | - | - | - | - | 0.5 | 1.98011 | -
8 | -985.447609 | - | - | 4.463196 | - | - | 0.107357 | 0.9123 | 2.30367 | 0

ficolin lectin family
0 | -3878.26466 | - | - | - | - | - | - | 0.1389 | 1.34931 | -
1 | -3837.309161 | - | - | - | - | - | - | 0.2819 | 1.57864 | -
2 | -3837.309161 | - | 0 | - | - | 1 | - | 0.2819 | 1.57864 | 0
3 | -3809.241005 | 138.0473 | - | - | 0 | - | - | 3.6901 | 1.51349 | -
7 | -3835.232625 | - | - | - | - | - | - | 0.2007 | 1.45694 | -
8 | -3809.868805 | - | - | 50.72764 | - | - | 9.65E-12 | 3.3567 | 1.55067 | 8

huwentoxin-1 family
0 | -857.358654 | - | - | - | - | - | - | 0.54942 | 1.16848 | -
1 | -847.005492 | - | - | - | - | - | - | 0.4815 | 1.18799 | -
2 | -843.861878 | - | 6.287228 | - | - | 0.043127 | - | 1.0258 | 1.24451 | 1
3 | -843.287233 | 28.14284 | - | - | 1.17E-05 | - | - | 1.0411 | 1.24126 | -
7 | -849.521547 | - | - | - | - | - | - | 0.5744 | 1.13382 | -
8 | -843.573273 | - | - | 11.89655 | - | - | 0.00261 | 1.0384 | 1.26487 | 4

jellyfish toxin family
0 | -6576.734089 | - | - | - | - | - | - | 0.02913 | 1.51725 | -
1 | -6541.411331 | - | - | - | - | - | - | 0.3314 | 1.84621 | -
2 | -6541.411331 | - | 0 | - | - | 1 | - | 0.3314 | 1.84621 | 0
3 | -6540.134955 | 73.19827 | - | - | 4.77E-15 | - | - | 0.1832 | 1.87963 | -
7 | -6546.84453 | - | - | - | - | - | - | 0.0396 | 1.67228 | -
8 | -6540.785589 | - | - | 12.11788 | - | - | 0.002337 | 0.8034 | 1.82571 | 0
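
The m0-m3, m1-m2 and m7-m8 columns of Supplementary Table 15 are likelihood ratio test statistics, 2 × (lnL of the alternative model − lnL of the null model), and the M3, M2 and M8 columns are the corresponding p-values. The sketch below reproduces two of the tabulated values. It assumes a chi-square reference distribution with 2 degrees of freedom for the M1 vs M2 and M7 vs M8 comparisons and 4 for M0 vs M3; these values are not stated in the table but are consistent with its p-values. The function names are illustrative, and the chi-square tail probability uses the closed form that holds for even degrees of freedom, so no external library is needed.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable for even df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    if x <= 0:
        # A non-positive statistic gives no evidence against the null model.
        return 1.0
    half = x / 2
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for a pair of nested models."""
    statistic = 2 * (lnl_alt - lnl_null)
    return statistic, chi2_sf_even_df(statistic, df)


# Acrorhagin 1, M1 vs M2 (df = 2 is an assumption, see above).
stat_m1_m2, p_m1_m2 = likelihood_ratio_test(-1211.920222, -1208.759764, df=2)
print(round(stat_m1_m2, 6), round(p_m1_m2, 6))

# cystatin family, M0 vs M3 (df = 4 is an assumption, see above).
stat_m0_m3, p_m0_m3 = likelihood_ratio_test(-996.727316, -985.447093, df=4)
print(round(stat_m0_m3, 5), round(p_m0_m3, 6))
```

Negative statistics, such as the m1-m2 value for the phospholipase A2 family, are mapped to a p-value of 1 here, which matches how they are reported in the table.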

Supplementary Table 16. Gene set enrichment analysis (GSEA). REVIGO output from GSEA of differentially expressed transcripts across morphological structures in Actinia tenebrosa.

Acrorhagi

Term ID | Description | GO
GO:0007155 | cell adhesion | BP
GO:0016486 | peptide hormone processing | BP
GO:0035094 | response to nicotine | BP
GO:0050896 | response to stimulus | BP
GO:0060037 | pharyngeal system development | BP
GO:0061738 | late endosomal microautophagy | BP
GO:0007154 | cell communication | BP
GO:0008218 | bioluminescence | BP
GO:0016999 | antibiotic metabolic process | BP
GO:0014719 | skeletal muscle satellite cell activation | BP
GO:0015813 | L-glutamate transport | BP
GO:0051261 | protein depolymerization | BP
GO:0009698 | phenylpropanoid metabolic process | BP
GO:0001921 | positive regulation of receptor recycling | BP
GO:0070212 | protein poly-ADP-ribosylation | BP
GO:0043490 | malate-aspartate shuttle | BP
GO:0032510 | endosome to lysosome transport via multivesicular body sorting pathway | BP
GO:0042982 | amyloid precursor protein metabolic process | BP
GO:0051302 | regulation of cell division | BP
GO:0007017 | microtubule-based process | BP
GO:0002426 | immunoglobulin production in mucosal tissue | BP
GO:0006354 | DNA-templated transcription, elongation | BP
GO:0070383 | DNA cytosine deamination | BP
GO:0006507 | GPI anchor release | BP
GO:0006508 | proteolysis | BP
GO:0010814 | substance P catabolic process | BP
GO:0006811 | ion transport | BP
GO:0001778 | plasma membrane repair | BP
GO:0090611 | ubiquitin-independent protein catabolic process via the multivesicular body sorting pathway | BP
GO:0070857 | regulation of bile acid biosynthetic process | BP
GO:0016055 | Wnt signaling pathway | BP
GO:0009607 | response to biotic stimulus | BP
GO:0009451 | RNA modification | BP
GO:0006581 | acetylcholine catabolic process | BP
GO:0006833 | water transport | BP
GO:0006828 | manganese ion transport | BP
GO:0052331 | hemolysis in other organism involved in symbiotic interaction | BP
GO:1990697 | protein depalmitoleylation | BP
GO:0006997 | nucleus organization | BP

248 Appendices

GO:0030198 extracellular matrix organization BP GO:0030301 cholesterol transport BP GO:1903724 positive regulation of centriole elongation BP GO:0030104 water homeostasis BP GO:0070121 Kupffer's vesicle development BP GO:0021688 cerebellar molecular layer formation BP GO:0034394 protein localization to cell surface BP GO:0003014 renal system process BP GO:0030574 collagen catabolic process BP GO:0051666 actin cortical patch localization BP GO:0015695 organic cation transport BP GO:0060066 oviduct development BP GO:0031668 cellular response to extracellular stimulus BP GO:0042391 regulation of membrane potential BP GO:0060972 left/right pattern formation BP GO:0006548 histidine catabolic process BP GO:0045596 negative regulation of cell differentiation BP GO:0010634 positive regulation of epithelial cell migration BP GO:1902216 positive regulation of interleukin-4-mediated signaling pathway BP GO:0019752 carboxylic acid metabolic process BP GO:0042532 negative regulation of tyrosine phosphorylation of STAT protein BP GO:0002882 positive regulation of chronic inflammatory response to non-antigenic stimulus BP GO:2001150 positive regulation of dipeptide transmembrane transport BP GO:0015881 creatine transport BP GO:0031104 dendrite regeneration BP GO:1903542 negative regulation of exosomal secretion BP GO:0060853 Notch signaling pathway involved in arterial endothelial cell fate commitment BP GO:0010847 regulation of chromatin assembly BP GO:0043583 ear development BP GO:0050957 equilibrioception BP GO:0046931 pore complex assembly BP GO:0019827 stem cell population maintenance BP GO:0072488 ammonium transmembrane transport BP GO:0007338 single fertilization BP GO:0048630 skeletal muscle tissue growth BP GO:0002076 osteoblast development BP GO:0015812 gamma-aminobutyric acid transport BP GO:0046662 regulation of oviposition BP GO:0019228 neuronal action potential BP GO:0000920 cell separation after cytokinesis BP GO:0030048 actin filament-based movement BP GO:0014002 
astrocyte development BP GO:0015809 arginine transport BP GO:0008284 positive regulation of cell proliferation BP GO:1900746 regulation of vascular endothelial growth factor signaling pathway BP GO:0048793 pronephros development BP

Appendices 249

GO:0044267 cellular protein metabolic process BP GO:0006471 protein ADP-ribosylation BP GO:0032720 negative regulation of tumor necrosis factor production BP GO:0002315 marginal zone B cell differentiation BP GO:0010815 bradykinin catabolic process BP GO:0006813 potassium ion transport BP GO:0015696 ammonium transport BP GO:0071873 response to norepinephrine BP GO:0033993 response to lipid BP GO:0000187 activation of MAPK activity BP GO:0003417 growth plate cartilage development BP GO:0001501 skeletal system development BP GO:0039702 viral budding via host ESCRT complex BP GO:1903672 positive regulation of sprouting angiogenesis BP GO:0007601 visual perception BP GO:0051967 negative regulation of synaptic transmission, glutamatergic BP GO:0046851 negative regulation of bone remodeling BP GO:0042733 embryonic digit morphogenesis BP GO:0001955 blood vessel maturation BP GO:0006939 smooth muscle contraction BP GO:0010816 calcitonin catabolic process BP GO:0034959 endothelin maturation BP GO:0060336 negative regulation of interferon-gamma-mediated signaling pathway BP GO:0035907 dorsal aorta development BP GO:0051480 regulation of cytosolic calcium ion concentration BP GO:0019229 regulation of vasoconstriction BP GO:0034220 ion transmembrane transport BP GO:0036258 multivesicular body assembly BP GO:0032355 response to estradiol BP GO:0003146 heart jogging BP GO:0050776 regulation of immune response BP GO:0045165 cell fate commitment BP GO:0001709 cell fate determination BP GO:0014807 regulation of somitogenesis BP GO:0042698 ovulation cycle BP GO:0007041 lysosomal transport BP GO:0045807 positive regulation of endocytosis BP GO:0038166 angiotensin-activated signaling pathway BP GO:0007219 Notch signaling pathway BP GO:0019233 sensory perception of pain BP GO:0045212 neurotransmitter receptor biosynthetic process BP GO:0051965 positive regulation of synapse assembly BP GO:0035659 Wnt signaling pathway involved in wound healing, spreading of epidermal cells BP 
GO:0042447 hormone catabolic process BP GO:0070634 transepithelial ammonium transport BP GO:0045167 asymmetric protein localization involved in cell fate determination BP

250 Appendices

GO:0050890 cognition BP GO:0032223 negative regulation of synaptic transmission, cholinergic BP GO:0005576 extracellular region CC GO:0005578 proteinaceous extracellular matrix CC GO:0005581 collagen trimer CC GO:0005623 cell CC GO:0016021 integral component of membrane CC GO:0030054 cell junction CC GO:0042151 nematocyst CC GO:0044218 other organism cell membrane CC GO:0032433 filopodium tip CC GO:0009986 cell surface CC GO:0043204 perikaryon CC GO:0033093 Weibel-Palade body CC GO:0048471 perinuclear region of cytoplasm CC GO:0005887 integral component of plasma membrane CC GO:0033593 BRCA2-MAGE-D1 complex CC GO:0005594 collagen type IX trimer CC GO:0005641 nuclear envelope lumen CC GO:0016459 myosin complex CC GO:0043083 synaptic cleft CC GO:0032983 kainate selective glutamate receptor complex CC GO:0032579 apical lamina of hyaline layer CC GO:0030658 transport vesicle membrane CC GO:0005886 plasma membrane CC GO:0031302 intrinsic component of endosome membrane CC GO:0005871 kinesin complex CC GO:0009897 external side of plasma membrane CC GO:0005892 acetylcholine-gated channel complex CC GO:0045211 postsynaptic membrane CC GO:0032839 dendrite cytoplasm CC GO:0005769 early endosome CC GO:0042589 zymogen granule membrane CC GO:0031012 extracellular matrix CC GO:0003824 catalytic activity MF GO:0004222 metalloendopeptidase activity MF GO:0005201 extracellular matrix structural constituent MF GO:0005314 high-affinity glutamate transmembrane transporter activity MF GO:0005509 calcium ion binding MF GO:0008200 ion channel inhibitor activity MF GO:0090729 toxin activity MF GO:0004503 monophenol monooxygenase activity MF GO:0046975 histone methyltransferase activity (H3-K36 specific) MF GO:0004397 histidine ammonia-lyase activity MF GO:0004088 carbamoyl-phosphate synthase (glutamine-hydrolyzing) activity MF GO:1990275 preribosome binding MF

Appendices 251

GO:0001540 beta-amyloid binding MF GO:0015643 toxic substance binding MF GO:0042803 protein homodimerization activity MF GO:0008109 N-acetyllactosaminide beta-1,6-N-acetylglucosaminyltransferase activity MF GO:0043531 ADP binding MF GO:0004104 cholinesterase activity MF GO:0008810 cellulase activity MF GO:0004151 dihydroorotase activity MF GO:0070403 NAD+ binding MF GO:0008201 heparin binding MF GO:0042166 acetylcholine binding MF GO:0003777 microtubule motor activity MF GO:0017171 serine hydrolase activity MF GO:0047987 hydroperoxide dehydratase activity MF GO:0030898 actin-dependent ATPase activity MF GO:0008233 peptidase activity MF GO:0004837 tyrosine decarboxylase activity MF GO:0031013 troponin I binding MF GO:0016207 4-coumarate-CoA ligase activity MF GO:0061676 importin-alpha family protein binding MF GO:0032794 GTPase activating protein binding MF GO:0005112 Notch binding MF GO:0004070 aspartate carbamoyltransferase activity MF GO:0022848 acetylcholine-gated cation-selective channel activity MF GO:0003950 NAD+ ADP-ribosyltransferase activity MF GO:0052689 carboxylic ester hydrolase activity MF GO:0005308 creatine transmembrane transporter activity MF GO:0047275 glucosaminylgalactosylglucosylceramide beta-galactosyltransferase activity MF GO:0001640 adenylate cyclase inhibiting G-protein coupled glutamate receptor activity MF GO:0008519 ammonium transmembrane transporter activity MF GO:1990699 palmitoleyl hydrolase activity MF GO:0019900 kinase binding MF GO:0004126 cytidine deaminase activity MF GO:0001594 trace-amine receptor activity MF GO:0003990 acetylcholinesterase activity MF GO:0004068 aspartate 1-decarboxylase activity MF GO:0015172 acidic amino acid transmembrane transporter activity MF GO:0042623 ATPase activity, coupled MF GO:0003829 beta-1,3-galactosyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase activity MF GO:0004867 serine-type endopeptidase inhibitor activity MF GO:0004725 protein tyrosine phosphatase activity MF 
GO:0047225 acetylgalactosaminyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase activity MF GO:0015181 arginine transmembrane transporter activity MF GO:0015250 water channel activity MF GO:0004623 phospholipase A2 activity MF GO:0004180 carboxypeptidase activity MF

252 Appendices

GO:0060002 plus-end directed microfilament motor activity MF GO:0015279 store-operated calcium channel activity MF GO:0015271 outward rectifier potassium channel activity MF GO:0005125 cytokine activity MF GO:0005109 frizzled binding MF GO:0000146 microfilament motor activity MF GO:0004888 transmembrane signaling receptor activity MF


Mesenteric filaments

| Term ID | Description | GO |
|---|---|---|
| GO:0007155 | cell adhesion | BP |
| GO:0016486 | peptide hormone processing | BP |
| GO:0035094 | response to nicotine | BP |
| GO:0050896 | response to stimulus | BP |
| GO:0060037 | pharyngeal system development | BP |
| GO:0061738 | late endosomal microautophagy | BP |
| GO:0007154 | cell communication | BP |
| GO:0008218 | bioluminescence | BP |
| GO:0016999 | antibiotic metabolic process | BP |
| GO:0014719 | skeletal muscle satellite cell activation | BP |
| GO:0015813 | L-glutamate transport | BP |
| GO:0051261 | protein depolymerization | BP |
| GO:0009698 | phenylpropanoid metabolic process | BP |
| GO:0001921 | positive regulation of receptor recycling | BP |
| GO:0070212 | protein poly-ADP-ribosylation | BP |
| GO:0043490 | malate-aspartate shuttle | BP |
| GO:0032510 | endosome to lysosome transport via multivesicular body sorting pathway | BP |
| GO:0042982 | amyloid precursor protein metabolic process | BP |
| GO:0051302 | regulation of cell division | BP |
| GO:0007017 | microtubule-based process | BP |
| GO:0002426 | immunoglobulin production in mucosal tissue | BP |
| GO:0006354 | DNA-templated transcription, elongation | BP |
| GO:0070383 | DNA cytosine deamination | BP |
| GO:0006507 | GPI anchor release | BP |
| GO:0006508 | proteolysis | BP |
| GO:0010814 | substance P catabolic process | BP |
| GO:0006811 | ion transport | BP |
| GO:0001778 | plasma membrane repair | BP |
| GO:0090611 | ubiquitin-independent protein catabolic process via the multivesicular body sorting pathway | BP |
| GO:0070857 | regulation of bile acid biosynthetic process | BP |
| GO:0016055 | Wnt signaling pathway | BP |
| GO:0009607 | response to biotic stimulus | BP |
| GO:0009451 | RNA modification | BP |
| GO:0006581 | acetylcholine catabolic process | BP |
| GO:0006833 | water transport | BP |
| GO:0006828 | manganese ion transport | BP |
| GO:0052331 | hemolysis in other organism involved in symbiotic interaction | BP |
| GO:1990697 | protein depalmitoleylation | BP |
| GO:0006997 | nucleus organization | BP |
| GO:0030198 | extracellular matrix organization | BP |
| GO:0030301 | cholesterol transport | BP |
| GO:1903724 | positive regulation of centriole elongation | BP |
| GO:0030104 | water homeostasis | BP |
| GO:0070121 | Kupffer's vesicle development | BP |
| GO:0021688 | cerebellar molecular layer formation | BP |
| GO:0034394 | protein localization to cell surface | BP |
| GO:0003014 | renal system process | BP |
| GO:0030574 | collagen catabolic process | BP |
| GO:0051666 | actin cortical patch localization | BP |
| GO:0015695 | organic cation transport | BP |
| GO:0060066 | oviduct development | BP |
| GO:0031668 | cellular response to extracellular stimulus | BP |
| GO:0042391 | regulation of membrane potential | BP |
| GO:0060972 | left/right pattern formation | BP |
| GO:0006548 | histidine catabolic process | BP |
| GO:0045596 | negative regulation of cell differentiation | BP |
| GO:0010634 | positive regulation of epithelial cell migration | BP |
| GO:1902216 | positive regulation of interleukin-4-mediated signaling pathway | BP |
| GO:0019752 | carboxylic acid metabolic process | BP |
| GO:0042532 | negative regulation of tyrosine phosphorylation of STAT protein | BP |
| GO:0002882 | positive regulation of chronic inflammatory response to non-antigenic stimulus | BP |
| GO:2001150 | positive regulation of dipeptide transmembrane transport | BP |
| GO:0015881 | creatine transport | BP |
| GO:0031104 | dendrite regeneration | BP |
| GO:1903542 | negative regulation of exosomal secretion | BP |
| GO:0060853 | Notch signaling pathway involved in arterial endothelial cell fate commitment | BP |
| GO:0010847 | regulation of chromatin assembly | BP |
| GO:0043583 | ear development | BP |
| GO:0050957 | equilibrioception | BP |
| GO:0046931 | pore complex assembly | BP |
| GO:0019827 | stem cell population maintenance | BP |
| GO:0072488 | ammonium transmembrane transport | BP |
| GO:0007338 | single fertilization | BP |
| GO:0048630 | skeletal muscle tissue growth | BP |
| GO:0002076 | osteoblast development | BP |
| GO:0015812 | gamma-aminobutyric acid transport | BP |
| GO:0046662 | regulation of oviposition | BP |
| GO:0019228 | neuronal action potential | BP |
| GO:0000920 | cell separation after cytokinesis | BP |
| GO:0030048 | actin filament-based movement | BP |
| GO:0014002 | astrocyte development | BP |
| GO:0015809 | arginine transport | BP |
| GO:0008284 | positive regulation of cell proliferation | BP |
| GO:1900746 | regulation of vascular endothelial growth factor signaling pathway | BP |
| GO:0048793 | pronephros development | BP |
| GO:0044267 | cellular protein metabolic process | BP |
| GO:0006471 | protein ADP-ribosylation | BP |
| GO:0032720 | negative regulation of tumor necrosis factor production | BP |
| GO:0002315 | marginal zone B cell differentiation | BP |
| GO:0010815 | bradykinin catabolic process | BP |
| GO:0006813 | potassium ion transport | BP |
| GO:0015696 | ammonium transport | BP |
| GO:0071873 | response to norepinephrine | BP |
| GO:0033993 | response to lipid | BP |
| GO:0000187 | activation of MAPK activity | BP |
| GO:0003417 | growth plate cartilage development | BP |
| GO:0001501 | skeletal system development | BP |
| GO:0039702 | viral budding via host ESCRT complex | BP |
| GO:1903672 | positive regulation of sprouting angiogenesis | BP |
| GO:0007601 | visual perception | BP |
| GO:0051967 | negative regulation of synaptic transmission, glutamatergic | BP |
| GO:0046851 | negative regulation of bone remodeling | BP |
| GO:0042733 | embryonic digit morphogenesis | BP |
| GO:0001955 | blood vessel maturation | BP |
| GO:0006939 | smooth muscle contraction | BP |
| GO:0010816 | calcitonin catabolic process | BP |
| GO:0034959 | endothelin maturation | BP |
| GO:0060336 | negative regulation of interferon-gamma-mediated signaling pathway | BP |
| GO:0035907 | dorsal aorta development | BP |
| GO:0051480 | regulation of cytosolic calcium ion concentration | BP |
| GO:0019229 | regulation of vasoconstriction | BP |
| GO:0034220 | ion transmembrane transport | BP |
| GO:0036258 | multivesicular body assembly | BP |
| GO:0032355 | response to estradiol | BP |
| GO:0003146 | heart jogging | BP |
| GO:0050776 | regulation of immune response | BP |
| GO:0045165 | cell fate commitment | BP |
| GO:0001709 | cell fate determination | BP |
| GO:0014807 | regulation of somitogenesis | BP |
| GO:0042698 | ovulation cycle | BP |
| GO:0007041 | lysosomal transport | BP |
| GO:0045807 | positive regulation of endocytosis | BP |
| GO:0038166 | angiotensin-activated signaling pathway | BP |
| GO:0007219 | Notch signaling pathway | BP |
| GO:0019233 | sensory perception of pain | BP |
| GO:0045212 | neurotransmitter receptor biosynthetic process | BP |
| GO:0051965 | positive regulation of synapse assembly | BP |
| GO:0035659 | Wnt signaling pathway involved in wound healing, spreading of epidermal cells | BP |
| GO:0042447 | hormone catabolic process | BP |
| GO:0070634 | transepithelial ammonium transport | BP |
| GO:0045167 | asymmetric protein localization involved in cell fate determination | BP |
| GO:0050890 | cognition | BP |
| GO:0032223 | negative regulation of synaptic transmission, cholinergic | BP |
| GO:0005576 | extracellular region | CC |
| GO:0005578 | proteinaceous extracellular matrix | CC |
| GO:0005581 | collagen trimer | CC |
| GO:0005623 | cell | CC |
| GO:0016021 | integral component of membrane | CC |
| GO:0030054 | cell junction | CC |
| GO:0042151 | nematocyst | CC |
| GO:0044218 | other organism cell membrane | CC |
| GO:0032433 | filopodium tip | CC |
| GO:0009986 | cell surface | CC |
| GO:0043204 | perikaryon | CC |
| GO:0033093 | Weibel-Palade body | CC |
| GO:0048471 | perinuclear region of cytoplasm | CC |
| GO:0005887 | integral component of plasma membrane | CC |
| GO:0033593 | BRCA2-MAGE-D1 complex | CC |
| GO:0005594 | collagen type IX trimer | CC |
| GO:0005641 | nuclear envelope lumen | CC |
| GO:0016459 | myosin complex | CC |
| GO:0043083 | synaptic cleft | CC |
| GO:0032983 | kainate selective glutamate receptor complex | CC |
| GO:0032579 | apical lamina of hyaline layer | CC |
| GO:0030658 | transport vesicle membrane | CC |
| GO:0005886 | plasma membrane | CC |
| GO:0031302 | intrinsic component of endosome membrane | CC |
| GO:0005871 | kinesin complex | CC |
| GO:0009897 | external side of plasma membrane | CC |
| GO:0005892 | acetylcholine-gated channel complex | CC |
| GO:0045211 | postsynaptic membrane | CC |
| GO:0032839 | dendrite cytoplasm | CC |
| GO:0005769 | early endosome | CC |
| GO:0042589 | zymogen granule membrane | CC |
| GO:0031012 | extracellular matrix | CC |
| GO:0003824 | catalytic activity | MF |
| GO:0004222 | metalloendopeptidase activity | MF |
| GO:0005201 | extracellular matrix structural constituent | MF |
| GO:0005314 | high-affinity glutamate transmembrane transporter activity | MF |
| GO:0005509 | calcium ion binding | MF |
| GO:0008200 | ion channel inhibitor activity | MF |
| GO:0090729 | toxin activity | MF |
| GO:0004503 | monophenol monooxygenase activity | MF |
| GO:0046975 | histone methyltransferase activity (H3-K36 specific) | MF |
| GO:0004397 | histidine ammonia-lyase activity | MF |
| GO:0004088 | carbamoyl-phosphate synthase (glutamine-hydrolyzing) activity | MF |
| GO:1990275 | preribosome binding | MF |
| GO:0001540 | beta-amyloid binding | MF |
| GO:0015643 | toxic substance binding | MF |
| GO:0042803 | protein homodimerization activity | MF |
| GO:0008109 | N-acetyllactosaminide beta-1,6-N-acetylglucosaminyltransferase activity | MF |
| GO:0043531 | ADP binding | MF |
| GO:0004104 | cholinesterase activity | MF |
| GO:0008810 | cellulase activity | MF |
| GO:0004151 | dihydroorotase activity | MF |
| GO:0070403 | NAD+ binding | MF |
| GO:0008201 | heparin binding | MF |
| GO:0042166 | acetylcholine binding | MF |
| GO:0003777 | microtubule motor activity | MF |
| GO:0017171 | serine hydrolase activity | MF |
| GO:0047987 | hydroperoxide dehydratase activity | MF |
| GO:0030898 | actin-dependent ATPase activity | MF |
| GO:0008233 | peptidase activity | MF |
| GO:0004837 | tyrosine decarboxylase activity | MF |
| GO:0031013 | troponin I binding | MF |
| GO:0016207 | 4-coumarate-CoA ligase activity | MF |
| GO:0061676 | importin-alpha family protein binding | MF |
| GO:0032794 | GTPase activating protein binding | MF |
| GO:0005112 | Notch binding | MF |
| GO:0004070 | aspartate carbamoyltransferase activity | MF |
| GO:0022848 | acetylcholine-gated cation-selective channel activity | MF |
| GO:0003950 | NAD+ ADP-ribosyltransferase activity | MF |
| GO:0052689 | carboxylic ester hydrolase activity | MF |
| GO:0005308 | creatine transmembrane transporter activity | MF |
| GO:0047275 | glucosaminylgalactosylglucosylceramide beta-galactosyltransferase activity | MF |
| GO:0001640 | adenylate cyclase inhibiting G-protein coupled glutamate receptor activity | MF |
| GO:0008519 | ammonium transmembrane transporter activity | MF |
| GO:1990699 | palmitoleyl hydrolase activity | MF |
| GO:0019900 | kinase binding | MF |
| GO:0004126 | cytidine deaminase activity | MF |
| GO:0001594 | trace-amine receptor activity | MF |
| GO:0003990 | acetylcholinesterase activity | MF |
| GO:0004068 | aspartate 1-decarboxylase activity | MF |
| GO:0015172 | acidic amino acid transmembrane transporter activity | MF |
| GO:0042623 | ATPase activity, coupled | MF |
| GO:0003829 | beta-1,3-galactosyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase activity | MF |
| GO:0004867 | serine-type endopeptidase inhibitor activity | MF |
| GO:0004725 | protein tyrosine phosphatase activity | MF |
| GO:0047225 | acetylgalactosaminyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase activity | MF |
| GO:0015181 | arginine transmembrane transporter activity | MF |
| GO:0015250 | water channel activity | MF |
| GO:0004623 | phospholipase A2 activity | MF |
| GO:0004180 | carboxypeptidase activity | MF |
| GO:0060002 | plus-end directed microfilament motor activity | MF |
| GO:0015279 | store-operated calcium channel activity | MF |
| GO:0015271 | outward rectifier potassium channel activity | MF |
| GO:0005125 | cytokine activity | MF |
| GO:0005109 | frizzled binding | MF |
| GO:0000146 | microfilament motor activity | MF |
| GO:0004888 | transmembrane signaling receptor activity | MF |


Tentacles Term ID Description GO GO:0006813 potassium ion transport BP GO:0007155 cell adhesion BP GO:0007338 single fertilization BP GO:0007601 visual perception BP GO:0050896 response to stimulus BP GO:0050982 detection of mechanical stimulus BP GO:0051260 protein homooligomerization BP GO:1901666 positive regulation of NAD+ ADP-ribosyltransferase activity BP GO:0006104 succinyl-CoA metabolic process BP GO:0016999 antibiotic metabolic process BP GO:0070212 protein poly-ADP-ribosylation BP GO:0006103 2-oxoglutarate metabolic process BP GO:0007163 establishment or maintenance of cell polarity BP GO:0030048 actin filament-based movement BP GO:0051302 regulation of cell division BP GO:0006354 DNA-templated transcription, elongation BP GO:0007017 microtubule-based process BP GO:0046485 ether lipid metabolic process BP GO:0043490 malate-aspartate shuttle BP GO:0070383 DNA cytosine deamination BP GO:0060080 inhibitory postsynaptic potential BP GO:1990390 protein K33-linked ubiquitination BP GO:0006910 phagocytosis, recognition BP GO:0043403 skeletal muscle tissue regeneration BP GO:0044267 cellular protein metabolic process BP GO:0043627 response to estrogen BP GO:0046061 dATP catabolic process BP GO:0042532 negative regulation of tyrosine phosphorylation of STAT protein BP GO:0009607 response to biotic stimulus BP GO:0007214 gamma-aminobutyric acid signaling pathway BP GO:0097688 glutamate receptor clustering BP GO:2000646 positive regulation of receptor catabolic process BP GO:0051481 negative regulation of cytosolic calcium ion concentration BP GO:0006031 chitin biosynthetic process BP GO:0045216 cell-cell junction organization BP GO:0006811 ion transport BP GO:0071909 determination of stomach left/right asymmetry BP GO:0070121 Kupffer's vesicle development BP GO:1902216 positive regulation of interleukin-4-mediated signaling pathway BP GO:0048143 astrocyte activation BP GO:1903839 positive regulation of mRNA 3'-UTR binding BP GO:0071420 cellular response to 
histamine BP GO:0042391 regulation of membrane potential BP GO:0015810 aspartate transport BP

260 Appendices

GO:0045938 positive regulation of circadian sleep/wake cycle, sleep BP GO:0019752 carboxylic acid metabolic process BP GO:0002215 defense response to nematode BP GO:0071294 cellular response to zinc ion BP GO:0043524 negative regulation of neuron apoptotic process BP GO:0051151 negative regulation of smooth muscle cell differentiation BP GO:0016042 lipid catabolic process BP GO:0015739 sialic acid transport BP GO:0008284 positive regulation of cell proliferation BP GO:0038111 interleukin-7-mediated signaling pathway BP GO:0060993 kidney morphogenesis BP GO:0090104 pancreatic epsilon cell differentiation BP GO:0006636 unsaturated fatty acid biosynthetic process BP GO:1902476 chloride transmembrane transport BP GO:1900746 regulation of vascular endothelial growth factor signaling pathway BP GO:0060384 innervation BP GO:0071361 cellular response to ethanol BP GO:0034765 regulation of ion transmembrane transport BP GO:0070857 regulation of bile acid biosynthetic process BP GO:0040013 negative regulation of locomotion BP GO:0006895 Golgi to endosome transport BP GO:0090271 positive regulation of fibroblast growth factor production BP GO:0010847 regulation of chromatin assembly BP GO:1904177 regulation of adipose tissue development BP GO:0015812 gamma-aminobutyric acid transport BP GO:0021759 globus pallidus development BP GO:0015695 organic cation transport BP GO:1904800 negative regulation of neuron remodeling BP GO:0035455 response to interferon-alpha BP GO:0021688 cerebellar molecular layer formation BP GO:0061063 positive regulation of nematode larval development BP GO:0007205 protein kinase C-activating G-protein coupled receptor signaling pathway BP GO:0009624 response to nematode BP GO:0010087 phloem or xylem histogenesis BP GO:0072488 ammonium transmembrane transport BP GO:0045088 regulation of innate immune response BP GO:0014719 skeletal muscle satellite cell activation BP GO:0035810 positive regulation of urine volume BP GO:0009630 gravitropism BP GO:0007215 
glutamate receptor signaling pathway BP GO:0007189 adenylate cyclase-activating G-protein coupled receptor signaling pathway BP GO:0060119 inner ear receptor cell development BP GO:1902966 positive regulation of protein localization to early endosome BP GO:0007216 G-protein coupled glutamate receptor signaling pathway BP GO:0001973 adenosine receptor signaling pathway BP GO:0071467 cellular response to pH BP

Appendices 261

GO:0010067 procambium histogenesis BP GO:0001525 angiogenesis BP GO:0046931 pore complex assembly BP GO:0006471 protein ADP-ribosylation BP GO:0030322 stabilization of membrane potential BP GO:0045786 negative regulation of cell cycle BP GO:0009642 response to light intensity BP GO:0002426 immunoglobulin production in mucosal tissue BP GO:0042311 vasodilation BP GO:0014061 regulation of norepinephrine secretion BP GO:0060134 prepulse inhibition BP GO:0036514 dopaminergic neuron axon guidance BP GO:0015696 ammonium transport BP GO:0007218 neuropeptide signaling pathway BP GO:0060579 ventral spinal cord interneuron fate commitment BP GO:0031668 cellular response to extracellular stimulus BP GO:0039595 induction by virus of catabolism of host mRNA BP GO:0030324 lung development BP GO:0035864 response to potassium ion BP GO:0003017 lymph circulation BP GO:0048496 maintenance of animal organ identity BP GO:0042755 eating behavior BP GO:0048630 skeletal muscle tissue growth BP GO:0000187 activation of MAPK activity BP GO:0001746 Bolwig's organ morphogenesis BP GO:0010185 regulation of cellular defense response BP GO:0050793 regulation of developmental process BP GO:0035659 Wnt signaling pathway involved in wound healing, spreading of epidermal cells BP GO:0021812 neuronal-glial interaction involved in cerebral cortex radial glia guided migration BP GO:2001034 positive regulation of double-strand break repair via nonhomologous end joining BP GO:0002315 marginal zone B cell differentiation BP GO:2000095 regulation of Wnt signaling pathway, planar cell polarity pathway BP GO:1902742 apoptotic process involved in development BP GO:0035094 response to nicotine BP GO:0006939 smooth muscle contraction BP GO:0007271 synaptic transmission, cholinergic BP GO:0045901 positive regulation of translational elongation BP GO:0045605 negative regulation of epidermal cell differentiation BP GO:0032720 negative regulation of tumor necrosis factor production BP GO:0051444 negative 
regulation of ubiquitin-protein transferase activity BP GO:0031284 positive regulation of guanylate cyclase activity BP GO:0014807 regulation of somitogenesis BP GO:1990089 response to nerve growth factor BP GO:0071625 vocalization behavior BP GO:0036515 serotonergic neuron axon guidance BP GO:0061074 regulation of neural retina development BP

262 Appendices

GO:0009405 pathogenesis BP GO:0014029 neural crest formation BP GO:0099054 presynapse assembly BP GO:0045167 asymmetric protein localization involved in cell fate determination BP GO:0060066 oviduct development BP GO:0007219 Notch signaling pathway BP GO:0002882 positive regulation of chronic inflammatory response to non-antigenic stimulus BP GO:0006644 phospholipid metabolic process BP GO:0007158 neuron cell-cell adhesion BP GO:0042052 rhabdomere development BP GO:0001963 synaptic transmission, dopaminergic BP GO:0061314 Notch signaling involved in heart development BP GO:0001975 response to amphetamine BP GO:0060026 convergent extension BP GO:0014075 response to amine BP GO:0021510 spinal cord development BP GO:1904891 positive regulation of excitatory synapse assembly BP GO:0035725 sodium ion transmembrane transport BP GO:0060972 left/right pattern formation BP GO:0060853 Notch signaling pathway involved in arterial endothelial cell fate commitment BP GO:0001868 regulation of complement activation, lectin pathway BP GO:0016339 calcium-dependent cell-cell adhesion via plasma membrane cell adhesion molecules BP GO:0052331 hemolysis in other organism involved in symbiotic interaction BP GO:0051899 membrane depolarization BP GO:0060336 negative regulation of interferon-gamma-mediated signaling pathway BP GO:0086069 bundle of His cell to Purkinje myocyte communication BP GO:0035987 endodermal cell differentiation BP GO:0003146 heart jogging BP GO:0046636 negative regulation of alpha-beta T cell activation BP GO:0006816 calcium ion transport BP GO:0050890 cognition BP GO:0000578 embryonic axis specification BP GO:0005576 extracellular region CC GO:0005615 extracellular space CC GO:0016020 membrane CC GO:0030054 cell junction CC GO:0033010 paranodal junction CC GO:0033644 host cell membrane CC GO:0045202 synapse CC GO:0045211 postsynaptic membrane CC GO:0089717 spanning component of membrane CC GO:0034707 chloride channel complex CC GO:0031225 anchored component of 
membrane CC GO:0043204 perikaryon CC GO:0032154 cleavage furrow CC GO:0030496 midbody CC

Appendices 263

GO:0009986 cell surface CC GO:0016021 integral component of membrane CC GO:0042151 nematocyst CC GO:1902711 GABA-A receptor complex CC GO:0005871 kinesin complex CC GO:0097543 ciliary inversin compartment CC GO:0016513 core-binding factor complex CC GO:0005788 endoplasmic reticulum lumen CC GO:0030659 cytoplasmic vesicle membrane CC GO:0045252 oxoglutarate dehydrogenase complex CC GO:0031477 myosin VII complex CC GO:0005886 plasma membrane CC GO:0043259 laminin-10 complex CC GO:0098978 glutamatergic synapse CC GO:0060473 cortical granule CC GO:0031463 Cul3-RING ubiquitin ligase complex CC GO:0005577 fibrinogen complex CC GO:0098983 symmetric, GABA-ergic, inhibitory synapse CC GO:0032426 stereocilium tip CC GO:0016938 kinesin I complex CC GO:0098982 GABA-ergic synapse CC GO:0005882 intermediate filament CC GO:0005938 cell cortex CC GO:0032983 kainate selective glutamate receptor complex CC GO:0016935 glycine-gated chloride channel complex CC GO:0016591 DNA-directed RNA polymerase II, holoenzyme CC GO:0005887 integral component of plasma membrane CC GO:0016459 myosin complex CC GO:0019897 extrinsic component of plasma membrane CC GO:0099056 integral component of presynaptic membrane CC GO:0030673 axolemma CC GO:0004890 GABA-A receptor activity MF GO:0005201 extracellular matrix structural constituent MF GO:0005262 calcium channel activity MF GO:0005509 calcium ion binding MF GO:0008200 ion channel inhibitor activity MF GO:0008832 dGTPase activity MF GO:0090729 toxin activity MF GO:0004087 carbamoyl-phosphate synthase (ammonia) activity MF GO:0004837 tyrosine decarboxylase activity MF oxidoreductase activity, GO:0016715 acting on paired donors, with incorporation or reduction of molecular oxygen, MF reduced ascorbate as one donor, and incorporation of one atom of oxygen GO:0003950 NAD+ ADP-ribosyltransferase activity MF GO:0051393 alpha-actinin binding MF GO:0001540 beta-amyloid binding MF GO:0015643 toxic substance binding MF

264 Appendices

GO:0030246  carbohydrate binding  MF
GO:0043565  sequence-specific DNA binding  MF
GO:0032567  dGTP binding  MF
GO:0008429  phosphatidylethanolamine binding  MF
GO:0004102  choline O-acetyltransferase activity  MF
GO:0046975  histone methyltransferase activity (H3-K36 specific)  MF
GO:0070403  NAD+ binding  MF
GO:0003777  microtubule motor activity  MF
GO:0043531  ADP binding  MF
GO:0016814  hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in cyclic amidines  MF
GO:0042166  acetylcholine binding  MF
GO:0004591  oxoglutarate dehydrogenase (succinyl-transferring) activity  MF
GO:0016594  glycine binding  MF
GO:0008061  chitin binding  MF
GO:0019964  interferon-gamma binding  MF
GO:0097677  STAT family protein binding  MF
GO:0001849  complement component C1q binding  MF
GO:0044548  S100 protein binding  MF
GO:0001965  G-protein alpha-subunit binding  MF
GO:0008525  phosphatidylcholine transporter activity  MF
GO:0004623  phospholipase A2 activity  MF
GO:0015183  L-aspartate transmembrane transporter activity  MF
GO:0019900  kinase binding  MF
GO:0005178  integrin binding  MF
GO:0044325  ion channel binding  MF
GO:0015136  sialic acid transmembrane transporter activity  MF
GO:0052689  carboxylic ester hydrolase activity  MF
GO:0004503  monophenol monooxygenase activity  MF
GO:0001640  adenylate cyclase inhibiting G-protein coupled glutamate receptor activity  MF
GO:0004100  chitin synthase activity  MF
GO:0016793  triphosphoric monoester hydrolase activity  MF
GO:0001609  G-protein coupled adenosine receptor activity  MF
GO:0038036  sphingosine-1-phosphate receptor activity  MF
GO:0004068  aspartate 1-decarboxylase activity  MF
GO:0030976  thiamine pyrophosphate binding  MF
GO:0008519  ammonium transmembrane transporter activity  MF
GO:0015538  sialic acid:proton symporter activity  MF
GO:0019894  kinesin binding  MF
GO:0031685  adenosine receptor binding  MF
GO:0015172  acidic amino acid transmembrane transporter activity  MF
GO:0004126  cytidine deaminase activity  MF
GO:0044736  acid-sensing ion channel activity  MF
GO:0001786  phosphatidylserine binding  MF
GO:0042043  neurexin family protein binding  MF
GO:0004725  protein tyrosine phosphatase activity  MF
GO:0003829  beta-1,3-galactosyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase activity  MF

GO:0005112  Notch binding  MF
GO:0005252  open rectifier potassium channel activity  MF
GO:0047225  acetylgalactosaminyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase activity  MF
GO:0022841  potassium ion leak channel activity  MF
GO:0005250  A-type (transient outward) potassium channel activity  MF
GO:0005328  neurotransmitter:sodium symporter activity  MF
GO:0004971  AMPA glutamate receptor activity  MF
GO:0008109  N-acetyllactosaminide beta-1,6-N-acetylglucosaminyltransferase activity  MF
GO:0015271  outward rectifier potassium channel activity  MF
GO:0005251  delayed rectifier potassium channel activity  MF
GO:0004888  transmembrane signaling receptor activity  MF
GO:0008083  growth factor activity  MF
GO:0052740  1-acyl-2-lysophosphatidylserine acylhydrolase activity  MF
GO:0052739  phosphatidylserine 1-acylhydrolase activity  MF

Supplementary Table 17. TTL gene family distribution in RNA-seq reference transcriptomes of Actinia tenebrosa and Nematostella vectensis across morphological structures and ontogeny. TTL genes were identified using BLAST analysis (e-value < 1e-05) against the Swiss-Prot database.

Actinia tenebrosa

                                                               Morphological-novelty              Ontogeny
Gene family                                                           All        DGE        All        DGE
type-B carboxylesterase/lipase family                                   8          4          4          0
true venom lectin family                                                2          2          4          3
sea anemone type 3 (BDS-LIKE) potassium channel toxin family           24         16         13          4
phospholipase A2 family                                                13         13          9          2
natterin family                                                         3          1          2          0
Cnidaria small cysteine-rich protein (SCRiP) family                     1          1          1          0
sea anemone sodium channel inhibitory toxin family                      5          3          3          0
venom Kunitz-type family                                               13         11         11          0
snaclec family                                                          3          1          4          0
cystatin family                                                         6          0         11          0
sea anemone 8 toxin family                                              7          2         12          2
multicopper oxidase family                                              4          0          9          0
actinoporin family                                                      5          5          2          1
sea anemone type 5 potassium channel toxin family                       2          2          2          0
peptidase M12A family                                                   4          4          1          1
sea anemone type 1 potassium channel toxin family                       8          8          4          0
ficolin lectin family                                                   1          1          5          0
acrorhagin 1                                                            4          3          4          0
acrorhagin 2                                                            1          1          0          0
venom complement C3 homolog family                                      0          0          1          0
flavin monoamine oxidase family                                         0          0          1          0
Unclassed                                                               0          0          0          0
AB hydrolase superfamily                                                0          0          0          0
phospholipase B-like family                                             0          0          0          0
conopeptide P-like superfamily                                          0          0          0          0
venom metalloproteinase (M12B) family                                   0          0          0          0
Total                                                                 114         78        103         13

Nematostella vectensis

                                                               Morphological-novelty              Ontogeny
Gene family                                                           All        DGE        All        DGE
type-B carboxylesterase/lipase family                                   1          0          1          0
true venom lectin family                                                0          0          0          0
sea anemone type 3 (BDS-LIKE) potassium channel toxin family            0          0          0          0
phospholipase A2 family                                                15          7         17          2
natterin family                                                         0          0          0          0
Cnidaria small cysteine-rich protein (SCRiP) family                     0          0          0          0
sea anemone sodium channel inhibitory toxin family                      2          1          5          0
venom Kunitz-type family                                                3          1          4          0
snaclec family                                                          1          1          0          0
cystatin family                                                         0          0          0          0
sea anemone 8 toxin family                                              3          1          7          2
multicopper oxidase family                                              6          3          4          1
actinoporin family                                                      0          0          0          0
sea anemone type 5 potassium channel toxin family                       1          1          1          0
peptidase M12A family                                                   6          2          7          0
sea anemone type 1 potassium channel toxin family                       0          0          0          0
ficolin lectin family                                                   1          0          2          0
acrorhagin 1                                                            0          0          0          0
acrorhagin 2                                                            0          0          0          0
venom complement C3 homolog family                                      0          0          0          0
flavin monoamine oxidase family                                         0          0          0          0
Unclassed                                                               4          1         10          0
AB hydrolase superfamily                                                1          0          1          0
phospholipase B-like family                                             1          0          0          0
conopeptide P-like superfamily                                          0          0          2          0
venom metalloproteinase (M12B) family                                   0          0          1          0
Total                                                                  45         18         62          5

Supplementary Table 18. Tissue-specific TTL gene validation in Actinia tenebrosa. ANOVA statistical analysis from qPCR data of TTL genes across tissues (acrorhagi, mesenteric filaments and tentacles).

                                                                                   Average relative ΔΔCt
Transcript        Gene family                                            P-value   Acrorhagi  Mesenteric filaments  Tentacles
TR16562|c0_g1_i1  sea anemone 8 toxin                                    0.023768  0.009907   0.004637              0.021847
TR96513|c1_g1_i1  venom Kunitz-type                                      0.971189  0.002223   0.002053              0.00189
TR28247|c0_g1_i1  peptidase M12A                                         3.93e-05  7.81e-07   0.024014              1e-06
TR74744|c0_g2_i1  phospholipase A2                                       0.000722  0.000144   0.002473              1.89e-05
TR48036|c0_g1_i1  venom Kunitz-type                                      0.003721  0.000957   0.007479              0.002321
TR39026|c0_g1_i1  Acrorhagin 2                                           7.43e-14  0.048208   0.003183              0.003634
TR3935|c0_g1_i1   Acrorhagin 1a                                          2.94e-09  0.046059   0.00168               0.001394
TR40005|c0_g1_i1  phospholipase A2                                       1.54e-05  3.74e-07   1.55e-06              2.34e-07
TR89343|c0_g1_i1  sea anemone type 3 (BDS-LIKE) potassium channel toxin  3.49e-11  3.53e-05   0.017989              0.002522
TR18395|c0_g1_i1  Cnidaria small cysteine-rich protein (SCRiP)           8.06e-10  0.096192   0.005772              0.022962
TR29203|c0_g1_i1  venom Kunitz-type                                      1.47e-07  0.000916   0.007462              0.001263

Supplementary Table 19. Illumina raw reads metrics used to generate the draft genome of Actinia tenebrosa.

Insert  Read length (bp)  Clean reads  Clean bases     Q20 (%)  GC (%)
170bp   100               323,912,222  32,391,222,200  97.39    41.57
2kbp    100               313,781,050  31,378,105,000  97.20    40.27
500bp   100               324,434,076  32,443,407,600  96.15    39.22
5kbp    100               323,546,606  32,354,660,600  97.56    39.91
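
The clean-base counts in Supplementary Table 19 follow directly from the clean-read counts and the 100 bp read length. The short Python sketch below reproduces that arithmetic from the tabulated values; the variable names are illustrative only and are not taken from the thesis pipeline.

```python
# Values transcribed from Supplementary Table 19 (insert size -> clean reads).
# Variable names are illustrative assumptions, not part of the original analysis.
READ_LENGTH_BP = 100

clean_reads = {
    "170bp": 323_912_222,
    "2kbp": 313_781_050,
    "500bp": 324_434_076,
    "5kbp": 323_546_606,
}

# Clean bases per library = clean reads x read length.
clean_bases = {lib: n * READ_LENGTH_BP for lib, n in clean_reads.items()}

# Combined sequencing yield across the four libraries.
total_clean_bases = sum(clean_bases.values())

for lib, bases in clean_bases.items():
    print(f"{lib:>6}: {bases:,} bp")
print(f" total: {total_clean_bases:,} bp")
```

Each per-library product matches the "Clean bases" column of the table.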

Supplementary Table 20. Assembly metrics for the draft genome of Actinia tenebrosa.

Metrics                                 Actinia tenebrosa
BUSCO (%)                               89.6
N50 scaffold (kbp)                      159.384
N50 contigs (kbp)                       8.410
Scaffolds                               3,936
Contigs                                 33,030
Max scaffold length (Mbp)               1.083
Exon                                    200,556
Intron                                  170,169
GC (%)                                  37.20
Exon average (bp)                       211.4348
Intron average (bp)                     580.311
Total repeat length (Mbp)               40.12981
Repeat proportion (%)                   19.57
Total tRNAs                             1,002/784
Total rRNAs                             58
Genome estimate (Mbp)                   255
Assembly size (bp)                      237,650,633
Total contig size (bp)                  205,700,379
Total contig size (% of assembly size)  86.56
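
The final row of Supplementary Table 20 is a derived quantity. As a minimal check, the Python sketch below recomputes it from the assembly size and total contig size given in the table; treating the difference between the two as scaffold gap sequence is an assumption of this sketch, not a figure reported in the table.

```python
# Values transcribed from Supplementary Table 20.
assembly_size_bp = 237_650_633
total_contig_size_bp = 205_700_379

# Share of the assembly made up of contig sequence.
contig_fraction_pct = 100 * total_contig_size_bp / assembly_size_bp

# Remainder of the assembly (assumed here to be scaffold gaps).
gap_bp = assembly_size_bp - total_contig_size_bp

print(f"Contig sequence: {contig_fraction_pct:.2f} % of assembly")
print(f"Non-contig sequence: {gap_bp:,} bp")
```

The recomputed percentage rounds to the 86.56 % reported in the table.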

Supplementary Table 21. Comparative genome metrics across Cnidaria.

Annotation metrics                 ADIG   AFEN   ATEN   DSPP   EPAL   NVEC     HVUG
Genome size (Mbp)                  420    350    255    428    260    329/450  1,300
Assembly size (Mbp)                419    370    238    444    258    356      852
Total contig size (Mbp)            365    305    206    364    213    297      785
Total contig size (% of assembly)  87     82.43  86.56  81.98  82.5   83.4     92.2
Contig N50 (kbp)                   10.9   20     8.4    18.7   14.9   19.8     9.7
Scaffold N50 (kbp)                 191    510    159    769    440    472      92.5
Mean exon length (bp)              230    218    211    226    354    208      NA
Mean intron length (bp)            952    1,047  580    1,119  638    800      NA
Percent repetitive DNA             13     30.7   19.57  37.8   26     26       57
BUSCO (%)                          74.7   83.7   89.6   86.3   87.3   91.6     77

ADIG = Acropora digitifera, AFEN = Amplexidiscus fenestrafer, ATEN = Actinia tenebrosa, DSPP = Discosoma sp., EPAL = Exaiptasia pallida, HVUG = Hydra vulgaris, NVEC = Nematostella vectensis.

Supplementary Table 22. Breakdown of repeats masked in the Actinia tenebrosa genome.

Class    Masked (bp)  Masked (%)
SINE     2,657,682    1.29
RC       407,575      0.20
SINE?    26,191       0.01
Unknown  8,816,058    4.29
MITE     21,217,408   10.31
LINE     1,049,346    0.51
LTR      2,311,971    1.12
DNA      3,643,579    1.77
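
The per-class masked lengths in Supplementary Table 22 can be summed to cross-check the total repeat length reported in Supplementary Table 20 (40.12981 Mbp). The Python sketch below performs that sum using the tabulated values; the dictionary and variable names are illustrative only.

```python
# Masked bases per repeat class, transcribed from Supplementary Table 22.
masked_bp = {
    "SINE": 2_657_682,
    "RC": 407_575,
    "SINE?": 26_191,
    "Unknown": 8_816_058,
    "MITE": 21_217_408,
    "LINE": 1_049_346,
    "LTR": 2_311_971,
    "DNA": 3_643_579,
}

# Total masked sequence across all classes.
total_masked_bp = sum(masked_bp.values())

# Class contributing the most masked sequence.
largest_class = max(masked_bp, key=masked_bp.get)

print(f"Total masked: {total_masked_bp:,} bp ({total_masked_bp / 1e6:.5f} Mbp)")
print(f"Largest class: {largest_class}")
```

The summed length agrees with the total repeat length in Supplementary Table 20, and MITEs account for more than half of the masked sequence.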

Supplementary Table 23. Gene set enrichment analysis of gene ontologies from genes unique to Actinia tenebrosa within actiniarians.

Gene ontology ID  Name                                  Namespace           P-value
GO:0003964        RNA-directed DNA polymerase activity  Molecular function  8.5e-06
GO:0005044        scavenger receptor activity           Molecular function  8.4e-04
GO:0015074        DNA integration                       Biological process  0.0027
GO:0060107        annuli extracellular matrix           Cellular component  0.032
GO:0042151        nematocyst                            Cellular component  0.032
