Deep Sequencing and Annotation of the Trichoplax adhaerens mRNA Transcriptome Identifies Novel Genes and a Rich Repertoire of Neural Signaling Machinery, Providing Insight into Nervous System Evolution

by

Yuen Yan Wong

A thesis submitted in conformity with the requirements for the degree of Master of Science Department of Cell and Systems Biology University of Toronto

© Copyright by Yuen Yan Wong 2018

Deep Sequencing and Annotation of the Trichoplax adhaerens mRNA Transcriptome Identifies Novel Genes and a Rich Repertoire of Neural Signaling Machinery, Providing Insight into Nervous System Evolution

Yuen Yan Wong

Master of Science

Department of Cell and Systems Biology University of Toronto

2018

Abstract

Trichoplax adhaerens is an early-diverging animal capable of motile behavior such as feeding, chemotaxis, and phototaxis, despite lacking synaptically connected neurons and muscles. Our lab has produced a high-quality T. adhaerens transcriptome in which ~85% of the assembled genes are complete protein-coding sequences, including 2,483 newly identified genes missed by the genome sequencing effort. One objective of this research was to identify genes involved in neural signaling.

Using in silico prediction algorithms, we identified an array of neuropeptide genes, GPCRs, ion channels, and synaptic scaffolding and signaling proteins. We also discovered a previously unknown group of presynaptic Rab3-Interacting Molecule (RIM) homologues in the animal phylogeny, absent in humans but present in many invertebrates. Our work sets the stage for future studies aimed at understanding the concurrent evolution of metazoan cell types and their ability to communicate with each other through various forms of cell signaling, including electrochemical signaling in the nervous system.


Acknowledgments

I would first like to thank my supervisor, Dr. Adriano Senatore, for his mentorship and guidance throughout the past two years of my Master’s studies, and his assistance in helping me prepare my thesis. Thank you Dr. Senatore for giving me the opportunity to further explore my research interests and to be able to push myself beyond my limits. Besides my advisor, I would like to thank my committee members, Dr. Mary Cheng and Dr. Robert Ness, for their guidance and support.

I would especially like to thank Brian Novogradac. As a beginner in bioinformatics, I am grateful for all the help and technical support he provided me throughout these two years.

Next, I express my sincere gratitude to my wonderful colleagues. Thank you, Sally and Alicia. Thank you so much for always being there for me, for all the sleepless nights and blood, sweat and tears we experienced together while working toward deadlines, and for every time we celebrated our minor successes and happiness with lots of food and coffee breaks. Thank you for always being supportive and loving, and for creating all these wonderful memories with me, which I will cherish for life. Thank you, Dr. Marcia Roy, for your kind support and guidance. Thank you so much for helping me with the postsynaptic protein analysis, and in particular, thank you very much for showing me how to always love science and research! I really appreciate all of our exciting and inspiring conversations.

Last, but not least, I would like to thank my family and friends for all the moral support they have given me all along. I would especially like to deliver my enormous gratitude to my parents and my siblings, for always being extremely supportive and understanding, and for always making sure I have everything I need. I am grateful to have you all by my side.


Table of Contents

ACKNOWLEDGMENTS

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

LIST OF APPENDICES

LIST OF ABBREVIATIONS/TERMINOLOGIES

CHAPTER 1 INTRODUCTION: TRICHOPLAX ADHAERENS AS A MODEL ORGANISM FOR NEUROSCIENCE RESEARCH

1.1 WHAT IS A NERVOUS SYSTEM?

1.2 THE ANIMAL PHYLOGENY AND NERVOUS SYSTEM EVOLUTION

1.2.1 The Animal Phylogeny

1.2.2 Controversies about the Origin of the Nervous System

1.3 TRICHOPLAX ADHAERENS

1.3.1 What is it?

1.3.2 Taking a Transcriptomics Approach

CHAPTER 2 TRANSCRIPTOME ANALYSIS OF TRICHOPLAX ADHAERENS REFLECTS A DIGESTIVE EPITHELIUM WITH CELLULAR COORDINATION AND AN ARRAY OF GENES INVOLVED IN NEURAL SIGNALING

2.1 METHODS

2.1.1 RNA Isolation and Illumina Sequencing

2.1.2 Transcriptome Production

2.1.3 Transcriptome Metrics and Gene Ontology

2.1.4 Comparison against the Published Genome

2.1.5 Extended and Full T. adhaerens Predicted Proteome

2.1.6 Secretome Production

2.1.7 Regulatory/Neuro-Peptide Prediction

2.1.8 GPCRome Production and Classification Identification

2.1.9 Protein Sequence Annotations

2.1.10 Cross-Phyla Domain Analysis

2.2 RESULTS

2.2.1 Transcriptome Metrics

2.2.2

2.2.3 Comparison against Genome

2.2.4 The Secretome

2.2.5 Regulatory/Neuro-Peptides

2.2.6 Neurotransmitter Biosynthesis and Degradation Pathways

2.2.7 Receptors and the GPCRome

2.2.8 Channel Protein Homologues

2.2.9 Pre-/Post-Synaptic Scaffolding and Signaling Protein Homologues Cross-Phyla Domain Analysis

2.2.10 Cross-Phyla Protein-Protein Interaction-Related Domain Counts Analysis

2.3 DISCUSSIONS

2.3.1 A High-Quality T. adhaerens Gene Set

2.3.2 The Enriched Digestive Profile of the T. adhaerens Secretome May Provide Insight into the Origin of the Enteric Nervous System (ENS)

2.3.3 Candidate Peptidergic Signaling Peptides and Receptors in T. adhaerens


2.3.4 Potential ‘Neurotransmitter’ Signaling in the ‘Aneural’ T. adhaerens

2.3.5 T. adhaerens Expresses a Rich Repertoire of Channel Homologues

2.3.6 Synaptic Scaffolding and Signaling Protein Homologues Suggest T. adhaerens to have the Molecular Capacity for Primitive Synapse-like Functions

2.3.7 Evolutionary Expansion of Protein-Protein-Interaction-Related Domain Counts

CHAPTER 3 DISCOVERY OF A NOVEL RAB3-INTERACTING MOLECULE (RIM) TYPE PROVIDES INSIGHT INTO THE EARLY MOLECULAR MACHINERY OF NERVOUS SYSTEM EVOLUTION

3.1 A BRIEF DESCRIPTION OF THE RIM PROTEIN

3.2 METHODS

3.2.1 Phylogenetic Analyses and Annotations

3.2.2 Quantitative PCR for RIM Types I and II in Lymnaea stagnalis

3.3 RESULTS

3.3.1 Discovery of a Phylogenetically Novel Class of the Pre-synaptic Scaffolding Protein RIM

3.3.2 Differential Tissue Expression of RIM Types I and II in Lymnaea stagnalis

3.4 DISCUSSIONS

3.4.1 Novel RIM Type

3.4.2 Tissue-Specific Differential Expression of RIM Types in L. stagnalis

CHAPTER 4 GENERAL CONCLUSIONS AND FUTURE DIRECTIONS

TABLE 1 – TOP 30 GENE ONTOLOGY CATEGORIES FOR BIOLOGICAL PROCESS ENRICHED IN THE TOP 1000 HIGHLY-EXPRESSED T. ADHAERENS GENES

TABLE 2 – TOP 30 GENE ONTOLOGY CATEGORIES FOR CELLULAR COMPONENT ENRICHED IN THE TOP 1000 HIGHLY-EXPRESSED T. ADHAERENS GENES

TABLE 3 – TOP 30 GENE ONTOLOGY CATEGORIES FOR MOLECULAR FUNCTIONS ENRICHED IN THE TOP 1000 HIGHLY-EXPRESSED T. ADHAERENS GENES

TABLE 4 – T. ADHAERENS DIGESTIVE-RELATED SECRETED PROTEINS WITH TOP TPM EXPRESSION

TABLE 5 – T. ADHAERENS IMMUNE-RELATED SECRETED PROTEINS WITH TOP TPM EXPRESSION

TABLE 6 – T. ADHAERENS DEVELOPMENTAL-RELATED SECRETED PROTEINS WITH TOP TPM EXPRESSION

TABLE 7 – T. ADHAERENS CELL ADHESION MATRIX-RELATED SECRETED PROTEINS WITH TOP TPM EXPRESSION

TABLE 8 – T. ADHAERENS HOUSEKEEPING FUNCTION-RELATED SECRETED PROTEINS WITH TOP TPM EXPRESSION

TABLE 9 – NEUROTRANSMITTER BIOSYNTHESIS/DEGRADATION PATHWAYS

TABLE 10 – T. ADHAERENS RECEPTOR PROTEIN HOMOLOGUES

TABLE 11 – T. ADHAERENS CHANNEL PROTEIN HOMOLOGUES

TABLE 12 – SNARE AND SNARE-RELATED PROTEIN HOMOLOGUES

TABLE 13 – PRE-/POST-SYNAPTIC PROTEIN HOMOLOGUES

APPENDIX I – ACCESSIONS FOR NEMATOSTELLA VECTENSIS TRANSCRIPTOME ASSEMBLY

APPENDIX II – ACCESSIONS FOR MNEMIOPSIS LEIDYI TRANSCRIPTOME ASSEMBLY

APPENDIX III – ACCESSIONS FOR RIM/RPH3A PHYLOGENETIC ANALYSIS

APPENDIX IV – PRIMERS FOR QPCR OF LYMNAEA STAGNALIS RIM TYPES I AND II


List of Figures

Figure 1 Transcriptome Production Pipeline of the T. adhaerens Transcriptome

Figure 2 Transcriptome Metrics

Figure 3 Gene Ontology Categories

Figure 4 Gene Ontology Categories Enriched in the Highly-Expressed T. adhaerens Genes

Figure 5 Comparison between the T. adhaerens Transcriptome and the Published Genome

Figure 6 Secretome and Regulatory-/Neuro- Peptides

Figure 7 The T. adhaerens GPCRome

Figure 8 Presynaptic Protein Homologues

Figure 9 Cross-Phyla Domain Analysis

Figure 10 Domain Compositions of T. adhaerens Type I RIM (Tad-RIM-I), Type II RIM (Tad-RIM-II), and Rabphilin-3a (Rph3a)

Figure 11 Phylogeny of Rab3-Interacting Molecule (RIM) and Rabphilin-3a (Rph3a)

Figure 12 Differential Tissue Expression of RIM Types I and II in Lymnaea stagnalis


List of Tables

TABLE 1 – TOP 30 GENE ONTOLOGY CATEGORIES FOR BIOLOGICAL PROCESS ENRICHED IN THE TOP 1000 HIGHLY-EXPRESSED T. ADHAERENS GENES

TABLE 2 – TOP 30 GENE ONTOLOGY CATEGORIES FOR CELLULAR COMPONENT ENRICHED IN THE TOP 1000 HIGHLY-EXPRESSED T. ADHAERENS GENES

TABLE 3 – TOP 30 GENE ONTOLOGY CATEGORIES FOR MOLECULAR FUNCTIONS ENRICHED IN THE TOP 1000 HIGHLY-EXPRESSED T. ADHAERENS GENES

TABLE 4 – T. ADHAERENS DIGESTIVE-RELATED SECRETED PROTEINS WITH TOP TPM EXPRESSION

TABLE 5 – T. ADHAERENS IMMUNE-RELATED SECRETED PROTEINS WITH TOP TPM EXPRESSION

TABLE 6 – T. ADHAERENS DEVELOPMENTAL-RELATED SECRETED PROTEINS WITH TOP TPM EXPRESSION

TABLE 7 – T. ADHAERENS CELL ADHESION MATRIX-RELATED SECRETED PROTEINS WITH TOP TPM EXPRESSION

TABLE 8 – T. ADHAERENS HOUSEKEEPING FUNCTION-RELATED SECRETED PROTEINS WITH TOP TPM EXPRESSION

TABLE 9 – NEUROTRANSMITTER BIOSYNTHESIS/DEGRADATION PATHWAYS

TABLE 10 – T. ADHAERENS RECEPTOR PROTEIN HOMOLOGUES

TABLE 11 – T. ADHAERENS CHANNEL PROTEIN HOMOLOGUES

TABLE 12 – SNARE AND SNARE-RELATED PROTEIN HOMOLOGUES

TABLE 13 – PRE-/POST-SYNAPTIC PROTEIN HOMOLOGUES


List of Appendices

APPENDIX I – ACCESSIONS FOR NEMATOSTELLA VECTENSIS TRANSCRIPTOME ASSEMBLY

APPENDIX II – ACCESSIONS FOR MNEMIOPSIS LEIDYI TRANSCRIPTOME ASSEMBLY

APPENDIX III – ACCESSIONS FOR RIM/RPH3A PHYLOGENETIC ANALYSIS

APPENDIX IV – PRIMERS FOR QPCR OF L. STAGNALIS RIM TYPES I AND II


List of Abbreviations/Terminologies

Terminology Meaning/description

BLAST Basic Local Alignment Search Tool

CaMK domain Calmodulin-dependent protein kinase domain

Enteric Nervous System (ENS) Component of the autonomic nervous system that controls the digestive tract

Evg-final transcriptome Single final output transcriptome of the Evidential Gene Pipeline (see section 2.1.2)

Extended transcriptome Combination of the Evg-final transcriptome and 837 transcript sequences from the 27 independent T. adhaerens transcriptomes originally discarded during the Evidential Gene Pipeline (see section 2.1.5)

Full predicted-proteome Full set of protein sequences predicted from FullTrixGenes

Full secretome Combination of three independently predicted T. adhaerens secretomes (see section 2.1.6)

FullTrixGenes Combination of the Extended transcriptome with selected genome-predicted genes from the PrdGeneSet (see section 2.1.5)

Gene Ontology (GO) Comprehensive annotation of genes in the major categories of Cellular Component, Biological Process, and Molecular Function

GK domain Guanylate kinase domain

GPCRome Full set of GPCR sequences identified in the T. adhaerens predicted proteome

HMM Hidden Markov Model

Lym-RIM-I Lymnaea stagnalis RIM I homologue

Lym-RIM-II Lymnaea stagnalis RIM II homologue

MAGUK Membrane associated guanylate kinases

Metazoa Animals; the largest kingdom of multicellular eukaryotes

Multi-organism process (gene ontology) Processes that involve another individual of the same or another species

PrdGeneSet Abbreviation of Predicted Gene Set; the T. adhaerens gene set predicted from the published genome

Refined secretome Refined subset of the Full secretome, including proteins predicted to be secretory in at least two of the three independent secretomes (see section 2.1.6)

SH3 domain Src homology 3 domain

SNARE Soluble N-ethylmaleimide-sensitive fusion protein attachment protein receptors

Tad-RIM-I Trichoplax adhaerens RIM I homologue

Tad-RIM-II Trichoplax adhaerens RIM II homologue

TadELP Trichoplax adhaerens endomorphin-like peptide


Chapter 1 Introduction: Trichoplax adhaerens as a Model Organism for Neuroscience Research

1.1 What is a Nervous System?

The nervous system allows animals to sense and react to changes in the environment, such as the presence of food or aversive elements in their surroundings. It is often defined by the presence of its hallmark structural components, neurons and synapses, which allow for the propagation of sensory and motor information via electrical and chemical signaling (E. G. Jones, 1999). Fast electrochemical neural signaling involves a suite of genes, including sensory receptors, voltage-gated ion channels, neurotransmitter receptors, and the molecular machinery responsible for trafficking neuroligands and releasing them by exocytosis into the synaptic cleft. However, the nervous system also employs a slower, longer-lasting form of cell-cell communication that plays crucial roles in regulating fast electrochemical signals (i.e. neuromodulation) (Marder, 2012). Neuromodulation uses molecular machinery that overlaps with that of fast electrochemical neural signaling, but a key distinction is that secreted neuroligands activate non-ionotropic (i.e. metabotropic) receptors on target cells, such as G-protein coupled receptors (GPCRs), which mediate long-lasting cellular effects. Both neuromodulation and electrochemical signaling are important components of the nervous system. Yet the nervous system is canonically defined by the presence of synaptically connected neurons (E. G. Jones, 1999), which are thought to have arisen through the gradual confinement of intercellular communication, building on a pre-existing set of molecular machinery to enhance the specificity and efficiency of communication (Emes & Grant, 2012). By this classical definition, not all animals in evolutionary history had a nervous system.

1.2 The Animal Phylogeny and Nervous System Evolution

1.2.1 The Animal Phylogeny

Metazoa is the largest kingdom of multicellular eukaryotes, consisting of over 1.5 million animal species (Z. Q. Zhang, 2013) that achieve multicellularity through cell-type differentiation during development (Paps, 2018). However, the rise of animal multicellularity brought the challenge of maintaining effective communication and control among all individual cells, implying a need for effective cell-cell communication mechanisms such as electrochemical and neuromodulatory signaling in the nervous system.

Before moving on to discuss the origins of the nervous system, it is important to first describe the early-diverging animal phyla.

Ctenophora (comb jellies, which have bi-radial body symmetry) and Porifera (sponges, which lack body symmetry) are the most basal animal phyla. Over the past decade, there has been considerable debate over which of the two phyla is the most ‘basal’. Large-scale phylogenomic analyses placed Ctenophora as the sister lineage of all other metazoans (hence the most early-diverging) (Dunn et al., 2008; Hejnol et al., 2009), but those results were contested as phylogenetic artifacts (Philippe et al., 2009; Pick et al., 2010), which placed Porifera once again as the most basal animals. Yet the debate continues, as broad phylogenetic analyses with gene sequence data from the recently published genomes of two ctenophore species again support the Ctenophora-first phylogeny (Jékely, Paps, & Nielsen, 2015; Moroz et al., 2014; Ryan et al., 2013).

Next comes the phylum Placozoa (which lacks body symmetry), whose members lack a canonically defined nervous system (Smith et al., 2014). Placozoa consists of a single described species, Trichoplax adhaerens, though recent studies have proposed an additional genus with significant genetic diversity in the phylum (Eitel et al., 2017). The phylum Cnidaria (with radial body symmetry, though bi-radial symmetry has also recently been proposed (Holló, 2015)), on the other hand, consists of aquatic animals with simple nervous systems organized as nerve nets. Placozoa has been widely accepted to be basal to cnidarians, though some recent genomic studies propose that Placozoa and Cnidaria are sister lineages that together form a clade basal to bilaterians (Laumer et al., 2017), the animals with bilateral symmetry, which comprise about 99% of all animal species (Holló, 2015).

1.2.2 Controversies about the Origin of the Nervous System

Multiple theories have been proposed regarding the origin of the nervous system, in part because of the controversial placement of the early-diverging animal phyla Ctenophora and Porifera in the animal phylogeny (see above). While sponges lack neurons and synapses, and hence a nervous system, ctenophores have a nervous system, though it differs from the well-characterized nervous systems of other metazoans in many respects, such as the absence of many critical genes involved in synaptic function (e.g. neuroligin) and neuronal cell fate (e.g. neurogenin, NeuroD, and Achaete-scute) (Moroz et al., 2014).

Under the Porifera-first phylogeny, the nervous system likely evolved after the divergence of Porifera, which lacks a nervous system, but before the divergence of Ctenophora; it was then lost only in Placozoa and retained in all subsequently diverging animals (cnidarians and bilaterians) (Senatore, Raiss, & Le, 2016). If the alternative is true, and Ctenophora is the earliest-diverging phylum, the nervous system either evolved (1) once, prior to the divergence of ctenophores, and was then lost in both Porifera and Placozoa but retained in all of Cnidaria and Bilateria; or (2) twice, once in Ctenophora (Moroz et al., 2014) and again, independently, prior to the divergence of Cnidaria and Bilateria.

It is unknown when the nervous system, with canonically defined neural synaptic connections, first arose during animal evolution. Moreover, it has been suggested that the nervous system evolved several times independently, as has also been proposed for the emergence of animal multicellularity (Carr, Leadbeater, Hassan, Nelson, & Baldauf, 2008; Moroz, 2009). It is therefore perhaps wise not to restrict ourselves to a single theory of how or when the nervous system evolved, but rather to consider fundamental elements of the nervous system in the context of the simplest extant animal systems, such as T. adhaerens. T. adhaerens thus presents a unique opportunity to investigate a potential transitional state between the absence and the presence of a nervous system, in either direction.

1.3 Trichoplax adhaerens

1.3.1 What is it?

T. adhaerens is among the simplest known metazoans, bearing only six known cell types, and despite lacking synapses and hence a bona fide nervous system, it can remarkably carry out motile behaviors such as feeding, chemotaxis, and phototaxis. T. adhaerens is a disk-shaped marine animal (2-3 mm in diameter) that was first described in 1883 by the German zoologist Franz Eilhard Schulze. He named it ‘adhering’ (adhaerens) ‘hairy’ (Trich) ‘plates’ (plax) after the ciliary projections from the animal’s epithelium, hence its full name, Trichoplax adhaerens (Syed & Schierwater, 2002). After its discovery and an early interest in T. adhaerens as a model organism for research on metazoan evolution, it was subsequently forgotten by researchers. It was not until almost a century later, in the 1960s, that it was ‘rediscovered’ by the German protozoologist K. G. Grell (Syed & Schierwater, 2002). Interest in T. adhaerens research is currently undergoing a third phase of intensive growth, spurred by the genome sequencing effort that revealed its complex repertoire of genes involved in complex animal traits, including nervous system functions (Srivastava et al., 2008).

The best-characterized motile behavior of T. adhaerens is feeding, which involves tight regulation of the asynchronous ciliary beating of its ventral epithelial cells, allowing the animal to maneuver and pause in its environment (Smith, Pivovarova, & Reese, 2015; Ueda, Koya, & Maruyama, 1999). Ciliary beating ceases upon exposure to food (e.g. algae), causing the animal to pause over the food source, after which lipophil cells secrete granular content, suggested to be digestive enzymes for the external digestion of algae (Smith et al., 2015; Ueda et al., 1999). Release of the granular content is also spatially regulated and possibly relies on chemosensory mechanisms, as secretion occurs specifically in lipophil cells in proximity to algae (Smith et al., 2015). The lysed food content is then absorbed, likely through endocytosis in the ventral epithelium (Ruthmann, Behrendt, & Wahl, 1986), and the animal resumes its gliding behavior to relocate to another food source (Smith et al., 2015). Evidently, the feeding behavior of T. adhaerens is highly stereotyped and regulated, indicating that its different cell types communicate with each other and collectively interact with the outside world.

1.3.2 Taking a Transcriptomics Approach

The first sequenced genome of T. adhaerens was published in 2008 (Srivastava et al., 2008) and predicted 11,514 protein-coding genes, among which many nervous system-related genes were identified, including genes involved in neuronal excitability and synaptic transmission, neurotransmitter biosynthesis, and GPCRs. However, these genes were predicted from the genome scaffolds using ab initio methods and are largely not validated at the transcript level (Srivastava et al., 2008). Subsequently, proteomic profiling of T. adhaerens (Ringrose et al., 2013) quantified 6,516 protein sequences, including many genes implicated in animal signaling pathways.


To fully investigate whether T. adhaerens expresses key genes involved in neural signaling, a high-quality and complete gene set for the animal is needed. In this thesis, we took a transcriptomics approach to capture as many genes expressed in T. adhaerens as possible, each validated at least at the transcript level. Our efforts focus on annotating and describing genes involved in neural signaling. First, we identify candidate signaling molecules by predicting secreted proteins and enzymes involved in neurotransmitter biosynthesis and degradation pathways. Next, we turn to genes involved in fast and slow neuronal communication modalities, including electrogenic channel proteins, neuromodulation/paracrine signaling, exocytotic machinery, pre- and post-synaptic scaffolding proteins, and neuronal signaling pathways. Indeed, our work identifies a vast array of genes homologous to those attributed to neuronal functions. This suggests that the animal has a capacity for neural-like signaling to coordinate cells and motile behavior, functioning as a ‘proto-nervous system’ (Emes & Grant, 2012).


Chapter 2 Transcriptome Analysis of Trichoplax adhaerens reflects a Digestive Epithelium with Cellular Coordination and an Array of Genes involved in Neural Signaling

2.1 Methods

2.1.1 RNA Isolation and Illumina Sequencing

RNA isolation and Illumina sequencing were conducted by Dr. Adriano Senatore. Total RNA was extracted and purified from whole T. adhaerens animals cultured in the lab (Grell strain, generously provided by Dr. Leo Buss of Yale University) using the RNeasy Plus Universal Midi Kit (Qiagen) and the PicoPure RNA Isolation Kit (ThermoFisher Scientific). Fifteen to twenty animals were used for RNA extraction with the RNeasy kit, and ten were used for the PicoPure kit. Four cDNA libraries were then prepared from the RNA sets using the TruSeq Stranded Total RNA Library Preparation Kit (Illumina): one library from the RNeasy RNA set and three libraries from the PicoPure RNA set. Library preparation followed the manufacturer’s protocol, except that the cDNA fragmentation step was omitted to obtain longer fragments for later sequencing and assembly. The four cDNA libraries were then submitted to Beckman Genomics for sequencing on an Illumina HiSeq 2500 (2 × 125 bp paired reads), producing a total of 164,094,859 paired reads.

2.1.2 Transcriptome Production

Transcriptome production up to and including the Evidential Gene Pipeline (Orsini et al., 2016) was conducted by a previous undergraduate researcher in the Senatore Lab, Julia Phuong Le.

Paired-end reads were first quality-trimmed to remove low-quality ends using Trimmomatic-0.36 (Bolger, Lohse, & Usadel, 2014). Trimmed reads were then processed through our transcriptome production pipeline (Figure 1) along one of two paths. In the Normalized path, potential PCR duplicates were removed using FastUniq (Xu et al., 2012). In the Unnormalized path, trimmed reads were passed directly to the assembler programs.

Figure 1 – Transcriptome Production Pipeline of the T. adhaerens Transcriptome.
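The de-duplication step of the Normalized path can be illustrated with a minimal Python sketch. It rests on a simplifying assumption: a read pair counts as a PCR duplicate only when both mates are identical, base for base, to an earlier pair. FastUniq's actual comparison rules (Xu et al., 2012) are not reproduced here, and the function name is hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# A read pair: (forward mate sequence, reverse mate sequence)
ReadPair = Tuple[str, str]


def remove_exact_duplicate_pairs(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Yield each read pair the first time its (R1, R2) sequence combination is seen.

    Hypothetical, simplified stand-in for FastUniq: a later pair whose two mates
    both match an earlier pair exactly is treated as a PCR duplicate and dropped.
    """
    seen = set()
    for r1, r2 in pairs:
        if (r1, r2) not in seen:
            seen.add((r1, r2))
            yield r1, r2


if __name__ == "__main__":
    reads = [("ACGTAC", "TTGACA"), ("ACGTAC", "TTGACA"), ("ACGTAC", "TTGACC")]
    print(list(remove_exact_duplicate_pairs(reads)))
```

In this toy input the second pair is dropped. The third pair shares its forward mate with the first but differs in its reverse mate, so it is kept.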

For both the normalized and unnormalized read sets, genome-guided transcriptomes were assembled using Trinity v2.2.0 (Haas et al., 2013) and Cufflinks v2.2.1 (Trapnell et al., 2012). For the Trinity genome-guided transcriptomes, GSNAP v2016-05-01 (Wu & Nacu, 2010) was used to map and align sequence reads to the T. adhaerens genome prior to assembly. For the Cufflinks genome-guided transcriptomes, reads were instead aligned to the T. adhaerens genome, including across splice junctions, using TopHat v2.1.1 (Kim et al., 2013) and Bowtie v2.2.9 (Langmead & Salzberg, 2012) prior to assembly. Three additional de novo assemblers were applied: Trans-ABySS v1.5.3 (Robertson et al., 2010), Velvet v1.2.10 / Oases v0.2.09 (Schulz, Zerbino, Vingron, & Birney, 2012), and SOAPdenovo-Trans v1.03 (Xie et al., 2014). Because they require normalized input data, they were applied only to the normalized read sets, with k-mer sizes of 63, 71, 79, 87, 95, 103, and 111 bp.
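Trans-ABySS, Velvet/Oases, and SOAPdenovo-Trans all assemble transcripts from de Bruijn graphs built on k-mers, which is why a range of k values was used. The short Python sketch below counts the k-mers that a single untrimmed 125-bp read contributes at each k-mer size listed above. It is illustrative only, and the function and constant names are hypothetical. Under the commonly cited rationale for multi-k assembly, smaller k yields more overlapping k-mers per read, which helps with lowly covered transcripts. Larger k yields fewer but longer k-mers, which are better at spanning repeats.

```python
from typing import Dict, List

READ_LENGTH = 125                          # bp per mate (HiSeq 2500, 2 x 125 bp)
K_SIZES = [63, 71, 79, 87, 95, 103, 111]   # k-mer sizes used for the de novo assemblies


def kmers(seq: str, k: int) -> List[str]:
    """Return every overlapping substring of length k in seq (empty if seq is shorter than k)."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# k-mers contributed by one full-length read: READ_LENGTH - k + 1
KMERS_PER_READ: Dict[int, int] = {k: len(kmers("A" * READ_LENGTH, k)) for k in K_SIZES}

if __name__ == "__main__":
    for k, n in KMERS_PER_READ.items():
        print(f"k={k}: {n} k-mers per {READ_LENGTH}-bp read")
```

For example, k = 63 yields 63 k-mers per full-length read, while k = 111 yields only 15. Quality-trimmed reads are shorter and therefore contribute fewer.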

The 27 independent T. adhaerens transcriptomes were then pooled and processed through the Evidential Gene Pipeline (Orsini et al., 2016), which identifies and keeps the longest, best-quality transcript for each gene across all assemblies. Further redundancy was removed from the single Evidential Gene output transcriptome using CD-Hit v4.6 (W. Li & Godzik, 2006) with a sequence identity threshold of 95% at the protein level. This yielded a single non-redundant transcriptome with 17,411 unique protein-coding sequences. This transcriptome was further processed in three ways. Chimerically fused genes were separated, and potential contamination was filtered. Small genes (length <100 aa) were removed if they had no Basic Local Alignment Search Tool (BLAST) homology (e-value >1e-5) in the biological databases Swiss-Prot (Bairoch, 2000) (accessed: January 2017), RefSeq (O’Leary et al., 2016) (accessed: January 2017), or TrEMBL (Bairoch, 2000) (accessed: February 2017). This process removed 5,430 genes, resulting in the final Evidential Gene T. adhaerens (Evg-final) transcriptome of 11,981 unique protein-coding genes.
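The small-gene filter described above can be expressed as a simple rule. Proteins of at least 100 aa are always kept. Shorter proteins are kept only if they have a BLAST hit with an e-value of at most 1e-5. The Python sketch below assumes a hypothetical input structure: each gene maps to its protein length and its best e-value across the three databases, or None if it had no hit. Chimera splitting and contamination screening are not modeled.

```python
from typing import Dict, Optional, Tuple

MIN_LENGTH_AA = 100   # proteins shorter than this need BLAST support to be kept
MAX_EVALUE = 1e-5     # hits with a larger e-value count as "no homology"

# Hypothetical record: gene ID -> (protein length in aa,
#                                  best e-value across Swiss-Prot/RefSeq/TrEMBL or None)
GeneRecord = Tuple[int, Optional[float]]


def passes_small_gene_filter(length_aa: int, best_evalue: Optional[float]) -> bool:
    """Keep long proteins unconditionally; keep short ones only with a significant BLAST hit."""
    if length_aa >= MIN_LENGTH_AA:
        return True
    return best_evalue is not None and best_evalue <= MAX_EVALUE


def filter_small_genes(genes: Dict[str, GeneRecord]) -> Dict[str, GeneRecord]:
    """Return only the genes that survive the length/homology filter."""
    return {gid: rec for gid, rec in genes.items() if passes_small_gene_filter(*rec)}


if __name__ == "__main__":
    toy = {"g1": (250, None), "g2": (80, 1e-20), "g3": (80, 0.01), "g4": (60, None)}
    print(sorted(filter_small_genes(toy)))
```

In the toy example, "g3" is removed because its only hit is not significant, and "g4" is removed because it has no hit at all.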

2.1.3 Transcriptome Metrics and Gene Ontology

Transcript and predicted-protein lengths and sequence completeness were evaluated directly from the output of Evidential Gene (Orsini et al., 2016). For transcripts per million (TPM) expression data, each of the four cDNA libraries of sequence reads was aligned against the Evg-final transcriptome using RSEM v1.2.30 (B. Li & Dewey, 2011), and the mean of the four TPM values for each gene was taken. The T. adhaerens Evg-final transcriptome was aligned against the Swiss-Prot (accessed: February 2017), RefSeq (accessed: February 2017), and TrEMBL (accessed: February 2017) protein databases using NCBI BLAST v2.3.0+ at the protein level. We also wanted to assess homology between T. adhaerens and other species, rather than with T. adhaerens sequences from the genome-predicted genes. For this purpose, separate sets of BLAST homology data were obtained after removing T. adhaerens sequences from each of these databases (filtered databases). Level 2 gene ontology (GO) categories were predicted using BLAST2GO v4.1 (Götz et al., 2008) for the Evg-final transcriptome. They were also predicted for subsets of the most highly expressed genes (the top 100, 200, 500, and 1000 genes by TPM). GO category enrichment was evaluated with the BLAST2GO Enrichment Analysis feature, comparing the top 1000 TPM subset against the full Evg-final transcriptome. This used Fisher’s Exact Test (two-tailed) with a false discovery rate (FDR) filter of 0.05. For each major ontology category, the top 30 enriched GO terms were ranked and selected by p-value, and were visualized using the BLAST2GO WordCloud feature.
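The TPM values and their averaging over the four libraries can be sketched in Python as follows. RSEM estimates expected read counts and effective transcript lengths with an expectation-maximization model before reporting TPM. This sketch takes those two quantities as given and shows only the arithmetic: TPM_i = 10^6 × (c_i / l_i) / Σ_j (c_j / l_j), where c is the read count and l the effective length. The function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def tpm(counts: Dict[str, float], eff_lengths: Dict[str, float]) -> Dict[str, float]:
    """Convert per-gene read counts to transcripts per million (TPM).

    Counts are first divided by effective length (a per-base rate), then rescaled
    so that all values in one library sum to 1,000,000.
    """
    rates = {g: counts[g] / eff_lengths[g] for g in counts if eff_lengths.get(g, 0) > 0}
    total = sum(rates.values())
    if total == 0:
        return {g: 0.0 for g in rates}
    return {g: r / total * 1e6 for g, r in rates.items()}


def mean_tpm(libraries: List[Dict[str, float]]) -> Dict[str, float]:
    """Average TPM per gene across libraries (a gene missing from a library counts as 0)."""
    genes = set().union(*libraries)
    return {g: mean(lib.get(g, 0.0) for lib in libraries) for g in genes}


if __name__ == "__main__":
    lib = tpm({"a": 100, "b": 100}, {"a": 1000, "b": 500})
    print(lib)  # "b" receives twice the TPM of "a": equal counts, half the length
```

The length normalization is why TPM is used for ranking genes by expression (for example, the top 1000 TPM subset). Raw counts would favour long transcripts.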

Note that, for subsequent annotation, TPM values were taken from a second expression dataset in which the four cDNA libraries of sequence reads were aligned against the extended transcriptome (see section 2.1.5) rather than the Evg-final transcriptome.

2.1.4 Comparison against the Published Genome

To compare our transcriptome directly to the published genome, the Evg-final transcriptome was aligned against the genome-predicted gene set (PrdGeneSet) via BLAST (v2.3.0+) at the nucleotide level (filtering criteria: alignment length ≥100 bp, sequence identity ≥95%, e-value ≤1e-5). A reverse BLAST was also conducted by aligning the PrdGeneSet against the Evg-final transcriptome. The PrdGeneSet was generated by removing intron sequences in silico using gene prediction algorithms in the previous study (Srivastava et al., 2008). We therefore also compared our Evg-final transcriptome directly against the genome scaffolds from which the PrdGeneSet was predicted. Some Evg-final transcriptome genes lacked homology in the PrdGeneSet. To assess whether these genes were nevertheless present in the sequenced genome scaffolds (Srivastava et al., 2008), we aligned our Evg-final transcriptome against the genome scaffolds using GMAP v2016-05-01 (Wu & Watanabe, 2005).
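The filtering criteria above can be applied to BLAST+ tabular output with a Python sketch like the one below. It assumes the default 12 columns of `-outfmt 6` (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name and example IDs are hypothetical. Queries absent from the returned set are transcripts without a qualifying hit in the PrdGeneSet. Those are the genes whose presence in the genome scaffolds would then be checked with GMAP.

```python
from typing import Iterable, Set

# Column indices in default BLAST+ tabular output (-outfmt 6)
QSEQID, PIDENT, LENGTH, EVALUE = 0, 2, 3, 10


def queries_with_passing_hits(
    lines: Iterable[str],
    min_length: int = 100,      # alignment length >= 100 bp
    min_pident: float = 95.0,   # sequence identity >= 95%
    max_evalue: float = 1e-5,   # e-value <= 1e-5
) -> Set[str]:
    """Return IDs of query sequences with at least one hit meeting all three criteria."""
    passing = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if (
            int(fields[LENGTH]) >= min_length
            and float(fields[PIDENT]) >= min_pident
            and float(fields[EVALUE]) <= max_evalue
        ):
            passing.add(fields[QSEQID])
    return passing
```

A transcript with a long, near-identical hit passes. One whose only hit is 92% identical, or only 60 bp long, does not, even if its e-value is very small.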

2.1.5 Extended and Full T. adhaerens Predicted Proteome

Prior to further annotation of the T. adhaerens transcriptome profile, we revisited the transcriptome-genome comparison data (see section 2.2.3). We focused on the 2,260 genome-predicted genes with which our Evg-final transcriptome shared no local homology. We wanted to determine whether these 2,260 genome-predicted genes were truly missed by our transcriptome assembly effort, or whether they had been assembled but discarded during the convergence process of Evidential Gene (Orsini et al., 2016) (see section 2.1.2).

The 2,260 genome-predicted genes from the PrdGenSet were first processed with CD-HIT v4.6 (W. Li & Godzik, 2006) (nucleotide level, sequence identity threshold: 0.9) to produce a non-redundant gene set of 1,570 sequences. This gene set was then aligned, using BLASTp v2.3.0+, against a gene set predicted from our merged transcriptomes (the 27 transcriptomes prior to the Evidential Gene pipeline; see section 2.1.2). Of these, 932 genome-predicted genes (BLAST queries) found a match (e-value ≤1e-5), converging onto 837 unique transcript-predicted gene sequences from the merged transcriptomes (BLAST subjects). These 837 transcripts were added back into the transcriptome to produce an extended transcriptome (12,818 protein-coding genes). This left only 638 genome-predicted genes missed by our transcriptome. A further 27 genome-predicted genes were manually added back into our analyses, based on annotations we had previously performed on the PrdGenSet. Combining the extended transcriptome with these 638 and 27 genome-predicted genes, the full T. adhaerens gene set (FullTrixGenes) contains 13,483 protein-coding genes. Subsequent analyses were largely conducted on this full T. adhaerens transcriptome (FullTrixGenes) and on its corresponding proteome (full predicted-proteome), as predicted from FullTrixGenes by Evidential Gene.
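CD-HIT clusters sequences with a greedy incremental algorithm. Sequences are sorted by length, the longest unclustered sequence becomes a cluster representative, and each later sequence joins the first representative it matches at or above the identity threshold. The sketch below illustrates only that idea. It replaces CD-HIT's short-word filtering and alignment with Python's `difflib.SequenceMatcher`, a hypothetical stand-in that approximates identity as matched residues divided by the length of the shorter sequence. Its clusters will therefore not exactly reproduce CD-HIT output.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Approximate identity: matched residues / length of the shorter sequence."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(seqs, threshold=0.9):
    """Greedy incremental clustering of {id: sequence}; returns {representative: members}."""
    clusters = {}
    # Longest sequences first, so each representative is the longest in its cluster
    for seq_id in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        for rep in clusters:
            if identity(seqs[seq_id], seqs[rep]) >= threshold:
                clusters[rep].append(seq_id)
                break
        else:
            clusters[seq_id] = [seq_id]  # no match: start a new cluster
    return clusters
```

Keeping only the representatives (the dictionary keys) yields the non-redundant set.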

2.1.6 Secretome Production

A previous undergraduate researcher in the Senatore Lab, Julia Phuong Le, also contributed to the secretome production process.

Three sets of secreted proteins were predicted from the Evg-final transcriptome using three different programs:

- the SignalP-TargetP-tmHMM pipeline (STT pipeline) (Emanuelsson, Nielsen, Brunak, & Von Heijne, 2000; Krogh, Larsson, Von Heijne, & Sonnhammer, 2001; Petersen, Brunak, Von Heijne, & Nielsen, 2011);
- Phobius (Käll, Krogh, & Sonnhammer, 2004);
- SPOCTOPUS (Viklund, Bernsel, Skwark, & Elofsson, 2008).

In the STT pipeline, proteins were classified as secreted if they met all of the following criteria:

- a signal peptide was identified by SignalP v4.1 (Petersen et al., 2011) (D-score ≥0.45);
- they were predicted to be non-mitochondrial by TargetP v1.1b (Emanuelsson et al., 2000);
- tmHMM v2.0c (Krogh et al., 2001) (score ≥10) predicted either no transmembrane (TM) helices or a single TM helix within the first 60 amino acids.

Similarly, proteins with one signal peptide and no TM helices were classified as secreted using Phobius (Käll et al., 2004) and SPOCTOPUS (Viklund et al., 2008), respectively. The three independent secretomes were combined into a single set of secreted proteins, termed the full secretome. Sequences predicted to be secreted by at least two of the three programs/pipelines were termed the refined secretome and were used for later functional annotation.
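The decision rules above can be expressed compactly, assuming per-protein predictions have already been parsed. In this sketch, the STT rule takes a SignalP call, a TargetP location code ('M' for mitochondrial, as in TargetP 1.1 output) and a list of tmHMM helix coordinates. It treats "within the first 60 amino acids" as a helix that starts at or before residue 60. That interpretation and all function names are assumptions. The full and refined secretomes are then the union of the three predictions and their at-least-two-of-three consensus.

```python
from collections import Counter


def stt_secreted(has_signal_peptide, targetp_location, tm_helices):
    """Hypothetical STT rule: signal peptide, not mitochondrial, and either no TM helix
    or a single TM helix starting within the first 60 residues (likely the signal peptide)."""
    if not has_signal_peptide or targetp_location == "M":
        return False
    if not tm_helices:
        return True
    return len(tm_helices) == 1 and tm_helices[0][0] <= 60


def combine_secretomes(predictions, min_votes=2):
    """Return (full, refined) secretomes from {program: set of secreted protein IDs}."""
    votes = Counter(pid for ids in predictions.values() for pid in set(ids))
    full = set(votes)                                             # union of all three
    refined = {pid for pid, n in votes.items() if n >= min_votes}  # consensus
    return full, refined
```

The `min_votes` parameter makes the consensus level explicit: setting it to 3 would yield the stricter intersection of all three predictors.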

2.1.7 Regulatory/Neuro-Peptide Prediction

The Python script, modified from Nikitin's neuropeptide prediction script (Nikitin, 2015), was produced by a previous undergraduate researcher in the Senatore Lab, Julia Phuong Le.

The full T. adhaerens secretome was submitted to the NeuroPred web service (Southey, Amare, Zimmerman, Rodriguez-Zas, & Sweedler, 2006) for convertase cleavage site prediction using all four available models: Known Motif, Mollusk, Mammal, and Insect. The NeuroPred output was then processed with a Python script modified from Nikitin's neuropeptide prediction script, using the equation 1 � − � + 1 ∗ (� − 1). Here, N is the number of cleavage sites and � is the distance between the i-th and (i + 1)-th sites (Nikitin, 2015). The equation assigns each sequence a probability score based on the probability of a cleavage site occurring at each location and on the distances between consecutive sites. All sequences were then ranked by probability score and manually annotated. Neuropeptides were identified by the presence of multiple, regularly spaced cleavage sites with repetitive amino acid sequences between them.
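The following hypothetical sketch illustrates what "multiple regularly spaced cleavage sites" means in practice. It treats every dibasic pair (KR, RR, KK, RK) as a candidate convertase site and summarizes spacing regularity as the coefficient of variation of inter-site distances. This is a deliberate simplification. It is neither NeuroPred's model-based prediction nor the Nikitin (2015) scoring equation. The motif set and the regularity measure are assumptions chosen for clarity.

```python
import re
from statistics import mean, pstdev

# Assumed motif set: cleavage is placed immediately C-terminal to any dibasic pair.
DIBASIC = re.compile(r"(?=(KR|RR|KK|RK))")


def cleavage_sites(seq):
    """Return 0-based positions just after each (possibly overlapping) dibasic motif."""
    return [match.start() + 2 for match in DIBASIC.finditer(seq)]


def spacing_cv(sites):
    """Coefficient of variation of inter-site distances; lower means more regular spacing."""
    distances = [b - a for a, b in zip(sites, sites[1:])]
    if len(distances) < 2:
        return None  # too few intervals to judge regularity
    return pstdev(distances) / mean(distances)


precursor = "MKRFLRFGKRFLRFGKRFLRFGKR"  # toy sequence, not a T. adhaerens protein
sites = cleavage_sites(precursor)
print(sites, spacing_cv(sites))
```

For this toy precursor, the sites fall every seven residues and each repeated FLRFG copy between them ends in a glycine. This is the kind of repetitive, evenly spaced pattern that manual annotation looks for.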


2.1.8 GPCRome Production and Classification Identification

The production of the GPCRome was conducted by a previous undergraduate researcher in the Senatore Lab, Julia Phuong Le. GPCR class annotation was also conducted in collaboration with Julia Phuong Le.

The T. adhaerens full predicted-proteome was submitted to the GPCRHMM web service (Wistrand, Käll, & Sonnhammer, 2006) for GPCR detection based on a hidden Markov model (HMM). Results were compared with protein domain predictions made using InterProScan v5.23-62.0 (P. Jones et al., 2014) (using InterproSequences) with selected applications: Pfam, PRINTS, ProDom, PROSITEPATTERNS, PROSITEPROFILES, TIGRFAM, SMART, SUPERFAMILY, and PANTHER. In total, 713 sequences were identified as GPCRs by both GPCRHMM (Wistrand et al., 2006) and InterProScan (P. Jones et al., 2014). This data set was then refined by removing sequences with partial coding sequences, and a few genome-predicted sequences whose protein-level BLAST homology on the NCBI database was to non-GPCR proteins. The result was a final refined T. adhaerens GPCRome of 665 sequences.
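The consensus step can be sketched as a set intersection followed by removal of manually excluded sequences. The intersection is between GPCRHMM-positive IDs and IDs whose InterProScan annotations mention a GPCR. Detecting GPCR annotations by keyword, in the signature and InterPro description columns of the InterProScan TSV output (columns 6 and 13), is a hypothetical simplification of the domain review described above. The function names are illustrative.

```python
def interpro_gpcr_ids(tsv_lines, keywords=("g protein-coupled receptor", "gpcr",
                                           "7 transmembrane receptor")):
    """Collect protein IDs whose signature or InterPro description mentions a GPCR keyword."""
    ids = set()
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        # Column 6 = signature description; column 13 = InterPro description (may be absent)
        text = " ".join(fields[5:6] + fields[12:13]).lower()
        if any(keyword in text for keyword in keywords):
            ids.add(fields[0])
    return ids


def refined_gpcrome(gpcrhmm_ids, interpro_ids, excluded=()):
    """Keep IDs supported by both methods, minus manually excluded sequences
    (e.g. partial coding sequences or non-GPCR BLAST matches)."""
    return (set(gpcrhmm_ids) & set(interpro_ids)) - set(excluded)
```

Requiring agreement between an HMM-based detector and independent domain signatures trades some sensitivity for fewer false positives.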

Phylogenetic analysis was used to assign GPCR classes according to how the GPCRome sequences cluster. All sequences from the refined T. adhaerens GPCRome were first aligned using MUSCLE v3.8.3 (Edgar, 2004) and trimmed using trimAl v1.2rev59 (Capella-Gutiérrez, Silla-Martínez, & Gabaldón, 2009) with a gap threshold (gt) of 0.50. The aligned and trimmed sequences were submitted to the IQ-TREE web server v1.6.5 (Nguyen, Schmidt, Von Haeseler, & Minh, 2015) for substitution model selection (selected model: VT + F + G4) and maximum likelihood tree construction with 1,000 bootstrap replicates. InterProScan domain predictions were mapped onto the branches and clusters of the tree to identify GPCR classes. For sequences whose GPCR class was ambiguous from InterProScan (P. Jones et al., 2014), BLAST homology against the NCBI Protein database was used to infer the class.
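trimAl's gap threshold keeps only the alignment columns in which at least the given fraction of sequences carries a residue rather than a gap. The function below is a small, hypothetical re-implementation of that one rule for illustration. It is not trimAl, which offers many further trimming options. It operates on an alignment held as a dictionary of equal-length gapped strings.

```python
def trim_by_gap_threshold(alignment, gt=0.5):
    """Keep columns in which the fraction of non-gap characters is at least gt."""
    sequences = list(alignment.values())
    n_seqs, length = len(sequences), len(sequences[0])
    keep = [col for col in range(length)
            if sum(seq[col] != "-" for seq in sequences) / n_seqs >= gt]
    # Rebuild every sequence from the retained columns only
    return {name: "".join(seq[col] for col in keep) for name, seq in alignment.items()}
```

With gt = 0.5, a column survives if at least half of the sequences have a residue there. This removes gap-rich, poorly aligned regions before model selection and tree building.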

2.1.9 Protein Sequence Annotations

Sequences of each protein of interest were collected from various vertebrate and invertebrate species and reciprocally aligned with the full predicted-proteome via NCBI BLASTp or NCBI SMART BLAST (e-value ≤ 1e-5). Candidate T. adhaerens homologues of each protein of interest were then subjected to domain prediction using the InterPro web service (Finn et al., 2017). This confirmed that the domain and motif architecture of each candidate matched that of the gene of interest. Where the specific identity of a T. adhaerens gene within a protein family could not be determined, phylogenetic analysis was conducted using Molecular Evolutionary Genetics Analysis 7 (MEGA7) (Kumar, Stecher, & Tamura, 2016) to determine the relatedness of candidate genes to members of the family across species.
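One common way to formalize reciprocal alignment is the reciprocal best hit (RBH) criterion: a pair is accepted when each sequence is the other's highest-scoring hit. The sketch assumes hits have already been filtered at e-value ≤ 1e-5 and are supplied as (query, subject, bitscore) tuples. The function names and toy IDs are hypothetical. RBH is offered as an illustration, not as the exact procedure used here.

```python
def best_hits(hits):
    """Map each query to its highest-bitscore subject from (query, subject, bitscore) tuples."""
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Pairs (q, s) where s is q's best forward hit and q is s's best reverse hit."""
    fwd, rev = best_hits(forward), best_hits(reverse)
    return {(q, s) for q, s in fwd.items() if rev.get(s) == q}
```

Pairs that fail the RBH test, for example because a T. adhaerens sequence's best reverse hit is a different family member, are the cases where the phylogenetic analysis described above would be needed.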

2.1.10 Cross-Phyla Domain Analysis

The protein domains examined in the cross-phyla domain counts were chosen from those identified while annotating presynaptic protein homologues. Full proteome sequences or predicted proteomes were obtained from the NCBI RefSeq protein database (accessed February 2018) for the following species: Saccharomyces cerevisiae, Exaiptasia pallida, Acropora digitifera, Crassostrea gigas, Aplysia californica, Octopus bimaculoides, Helobdella robusta, Caenorhabditis elegans, Limulus polyphemus, Apis mellifera, Drosophila melanogaster, Strongylocentrotus purpuratus, Danio rerio, Mus musculus, Rattus norvegicus, and Homo sapiens (O'Leary et al., 2016). For the poriferans Amphimedon queenslandica and Oscarella carmela, transcriptome-associated protein models were obtained from the Amphimedon queenslandica Transcriptome Resource (University of Queensland) (Fernandez-Valverde, Calcino, & Degnan, 2015) and Compagen (Bosch Laboratory, University of Kiel) (Ereskovsky, Richter, Lavrov, Schippers, & Nichols, 2017), respectively. Genome-associated protein models for the choanoflagellates Monosiga brevicollis and Salpingoeca rosetta were obtained from the JGI Genome Portal (Project ID: 16178) (King et al., 2008) and Ensembl Genomes (Ensembl Protists accession: Proterospongia_sp_ATCC50818; project accession: PRJNA37927), respectively.

Paired-end reads for Hormiphora californiensis were obtained from the JGI portal (Project accession: PRJNA281977, run accession: SRR1992642). Two de novo transcriptomes were generated with Trinity v2.4.0 using its built-in Trimmomatic option: one with Trinity's default in silico read normalization and one without. The two transcriptomes were pooled and refined into a single transcriptome using Evidential Gene.

Nematostella vectensis single-end sequence reads were obtained from the NCBI Sequence Read Archive (SRA: SRP096950, Bio-Project SRR5183917; for accessions, see Appendix I) and quality trimmed using Trimmomatic-0.36 (Bolger et al., 2014). Three transcriptomes were assembled from these reads:

- an unnormalized de novo transcriptome, generated with Trinity v2.2.0 (Haas et al., 2013);
- a normalized de novo transcriptome, generated with Trinity v2.4.0 using its in silico read normalization;
- a genome-guided (ab initio) assembly, generated using Cufflinks v2.2.1 (Trapnell et al., 2012). For this assembly, trimmed reads were first normalized using the in silico read normalization of Trinity v2.2.0, then aligned to the N. vectensis genome (Putnam et al., 2007) using TopHat v2.1.1 (Kim et al., 2013) and Bowtie v2.2.9 (Langmead & Salzberg, 2012).

A fourth transcriptome was obtained from the online resource Figshare (Fredman, Schwaiger, Rentzsch, & Technau, 2013). All four transcriptomes were pooled and converged into a single transcriptome with the Evidential Gene pipeline.

For both H. californiensis and N. vectensis, the transcriptome-predicted protein sequences produced by Evidential Gene were used for subsequent analyses.

The 22 datasets described above were each processed for sequence redundancy removal using CD-HIT v4.6 at the protein level, with a sequence identity threshold of 0.95.

In contrast, Mnemiopsis leidyi paired-end reads were obtained from the NCBI Sequence Read Archive (SRA: SRP014828, Bio-Project: PRJNA64405; and SRA: SRP090909, Bio-Project: PRJNA344880; for accessions, see Appendix II). All reads were pooled into four sets of paired-end reads for normalization using FastUniq (Xu et al., 2012). Two transcriptomes were then assembled:

- Genome-guided: normalized reads were quality trimmed using Trimmomatic-0.36 (Bolger et al., 2014) and aligned to the M. leidyi genome (Moreland et al., 2014; Ryan et al., 2013), including across splice junctions, using TopHat v2.1.1 (Kim et al., 2013) and Bowtie v2.2.9 (Langmead & Salzberg, 2012). The alignments were used for ab initio assembly with Cufflinks v2.2.1 (Trapnell et al., 2012).
- De novo: normalized reads were trimmed and assembled with Trinity v2.2.0 and its built-in Trimmomatic option (Haas et al., 2013).

The two transcriptomes were pooled and refined into a single transcriptome using Evidential Gene (Orsini et al., 2016). This was further processed for redundancy removal using CD-HIT v4.6 (W. Li & Godzik, 2006), with a sequence identity threshold of 95% at the protein level. Predicted protein sequences shorter than 100 amino acids were removed if they had no significant BLAST hit (e-value ≤ 1e-5) in the Swiss-Prot (accessed: April 2017), RefSeq (accessed: January 2017), or TrEMBL (accessed: February 2017) databases. This yielded a single predicted proteome of 25,538 sequences.
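The length-and-homology filter above amounts to a simple rule. A protein is kept if it is at least 100 residues long, or if any database search gave it a hit at e-value ≤ 1e-5. The sketch below assumes BLAST results have been reduced to (query ID, e-value) pairs per database. The function names and toy data are hypothetical.

```python
def ids_with_significant_hits(hit_tables, max_evalue=1e-5):
    """Collect query IDs with at least one hit at or below max_evalue in any database."""
    return {query for table in hit_tables for query, evalue in table if evalue <= max_evalue}


def filter_short_unsupported(proteins, supported_ids, min_length=100):
    """Keep proteins of at least min_length residues, or shorter ones with BLAST support."""
    return {pid: seq for pid, seq in proteins.items()
            if len(seq) >= min_length or pid in supported_ids}


proteins = {"a": "M" * 50, "b": "M" * 150, "c": "M" * 80}  # toy sequences
swissprot_hits, refseq_hits = [("a", 1e-10)], [("c", 0.1)]  # c's hit is not significant
kept = filter_short_unsupported(proteins, ids_with_significant_hits([swissprot_hits, refseq_hits]))
print(sorted(kept))
```

Short open reading frames without homology are a common source of spurious gene models in de novo assemblies, which is why this sketch only applies the homology check below the length cut-off.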

Domain prediction was conducted on all 23 proteomes and on our full T. adhaerens predicted-proteome using InterproScan v.2.7-66 (P. Jones et al., 2014) with selected applications. PDZ, SRC Homology 3 (SH3), and C2 domains were identified using the SUPERFAMILY application (e-value ≤ 1e-6). FYVE zinc-finger, guanylate kinase (GK), Rho GTPase-activating protein (Rho-GAP), Unc13 homology, and calmodulin-dependent protein kinase (CaMK) domains were identified using Pfam (e-value ≤ 1e-6).
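Per-proteome domain counts can be tallied from InterProScan TSV output. The tally keeps matches from the chosen application whose score column (assumed here to hold an e-value, as it does for Pfam matches) is at or below 1e-6, then counts distinct proteins per domain. Two further assumptions of this sketch are matching domains by a keyword in the signature description, and counting proteins rather than individual domain copies. The domain labels and keywords are hypothetical.

```python
from collections import defaultdict


def count_domain_proteins(tsv_lines, wanted, max_evalue=1e-6):
    """Count distinct proteins per domain from InterProScan TSV lines.

    wanted maps a domain label to (application name, keyword in signature description).
    """
    proteins = defaultdict(set)
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        protein, analysis, description, score = fields[0], fields[3], fields[5], fields[8]
        if score == "-" or float(score) > max_evalue:
            continue  # no e-value reported, or match not significant enough
        for domain, (application, keyword) in wanted.items():
            if analysis == application and keyword.lower() in description.lower():
                proteins[domain].add(protein)  # a set, so repeated domains count once
    return {domain: len(proteins[domain]) for domain in wanted}
```

Running the same function over each of the 24 proteomes would give one row of counts per species, ready to compare across phyla.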

2.2 Results

2.2.1 Transcriptome Metrics

Both transcript (Figure 2A) and protein (Figure 2B) lengths across the transcriptome approximate a normal distribution, apart from a slight peak of shorter proteins (50-200 aa). At least 84.9% of the genes are complete coding sequences (Figure 2B).

Overall, the transcriptome found greater BLAST homology (higher bitscores) in the RefSeq and TrEMBL protein databases than in the Swiss-Prot database (Figure 2C-D). Removing T. adhaerens sequences from the protein databases greatly reduced the number of high-homology (high bitscore) BLAST hits for RefSeq and TrEMBL, but not for Swiss-Prot. This is consistent with the fact that Swiss-Prot is more curated at the functional level. RefSeq and TrEMBL are larger databases that also include data for many less-studied and early-diverging species (e.g. T. adhaerens genome-predicted genes). It also suggests that more effort is still required for T. adhaerens genetic profiling and for the validation of T. adhaerens as a model organism for scientific research. Moreover, the 1,400-3,500 transcriptome genes with no significant homology (BLAST homology cut-off: bitscore ≥ 50) to genes of other animal species in the databases could represent novel genes unique to T. adhaerens.

2.2.2 Gene Ontology

Gene ontology is a comprehensive annotation of genes in the major categories of Cellular Component, Biological Process, and Molecular Function, describing the location and function of each gene (Blake et al., 2015). Compared with the full transcriptome, the highly expressed gene sets showed particularly enhanced representation of three categories (Figure 3):

- extracellular regions (Cellular Component);
- multi-organism processes (Biological Process);
- structural molecule activity (Molecular Function).

Enrichment analysis further confirmed significant enrichment of the following (Tables 1-3 give the corresponding statistics for the top 30 terms in each major category):

- genes associated with extracellular and vesicular compartments (Figure 4B);
- structural molecule activity and various signaling-related activities (Figure 4C);
- biological processes related to protein targeting to the membrane and the endoplasmic reticulum (Figure 4A).

The top 30 Biological Process terms also include terms related to multi-organism cellular processes and interspecies interaction between organisms.

Figure 2 – Transcriptome metrics: (A) Histogram of transcript count per transcript length. (B) Histogram of protein count per protein length. (C) Number of T. adhaerens transcriptome genes that find BLASTp homology in the Swiss-Prot, RefSeq, and TrEMBL databases, before (full) and after (filtered) removal of T. adhaerens genes from the databases, segregated by bitscore range. (D) Series of bitscores when aligning the T. adhaerens transcriptome against the full and filtered databases via BLASTp, revealing the level of homology between the transcriptome and each database.

2.2.3 Comparison against Genome

One pertinent question is whether our transcriptome identified any new genes missed by the genome sequencing effort. In total, 9,498 unique transcripts from the Evg-final transcriptome had BLAST hits to 9,260 putative transcript sequences from the genome-predicted gene set (PrdGenSet) (Figure 5A). This convergence could be due to chimerically fused gene sequences in the PrdGenSet, or to fragmented genes in the transcriptome. Nevertheless, 2,483 transcriptome genes did not find a match in the PrdGenSet, suggesting that these are newly identified T. adhaerens genes. Conversely, 2,260 genome-predicted genes did not match any transcriptome gene query.

To expand on this analysis, the transcriptome sequences were mapped onto the genome with GMAP (Wu & Watanabe, 2005). In total, 11,266 transcriptome genes found a matching sequence in the T. adhaerens genome scaffolds (sequence identity ≥ 90%, trimmed coverage ≥ 90%), while 715 genes did not (Figure 5B). The average TPM expression of the 715 genes that failed to map onto the genome was similar to that of the genes that did map. This suggests that many of the unmapped genes are bona fide expressed transcripts whose genes are absent from the genome assembly due to incomplete sequencing. Of the 11,266 transcriptome genes found in the genome scaffolds, 329 were found in the PrdGenSet but did not reach our GMAP cut-off criteria (Figure 5B). Overall, 87.8% of our transcriptome found a matching sequence in the genome scaffolds, whereas only 79.3% found a match in the PrdGenSet (Figure 5B). Notably, 1,768 transcriptome genes found in the genome scaffolds were not in the PrdGenSet. These could be novel genes that were missed during gene prediction from the genomic scaffolds (Figure 5C).
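The gene categories in Figure 5 follow from simple set operations on two membership tests per transcriptome gene. The first test is whether the gene matched the PrdGenSet by BLAST; the second is whether it mapped to the genome scaffolds with GMAP. The sketch below shows that partition, with hypothetical names and toy IDs. In this scheme, the 715 unmapped genes plus the 1,768 scaffold-only genes together make up the 2,483 transcriptome genes absent from the PrdGenSet.

```python
def partition_transcripts(all_ids, in_prdgenset, on_scaffolds):
    """Split transcriptome gene IDs by PrdGenSet match and genome-scaffold mapping."""
    all_ids = set(all_ids)
    return {
        "in_prdgenset": all_ids & set(in_prdgenset),
        # On the scaffolds but never predicted as genes: candidate missed gene models
        "scaffold_only": (all_ids & set(on_scaffolds)) - set(in_prdgenset),
        # Neither predicted nor mapped: possibly absent from the genome assembly
        "unmapped": all_ids - set(in_prdgenset) - set(on_scaffolds),
    }
```

Because the three categories are disjoint and cover every ID, their sizes always sum to the size of the transcriptome, which provides a quick consistency check on reported counts.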


Figure 3 – Gene Ontology Categories (Biological Process, Cellular Component, and Molecular Function) for the Evg-final transcriptome-predicted proteome (Black, bottom x-axis) and gene subsets of top TPM expression (Blue, sequence counts for each gene subset are shown in the stacked blue bars, top x-axis). *TF = Transcription factor.


Figure 4 – Gene Ontology Categories Enriched in the Highly-Expressed T. adhaerens Genes – WordCloud comprising the top 30 ontology categories for (A) Biological Process, (B) Cellular Component, and (C) Molecular Function, significantly enriched in the top 1000 expressed gene subset of the T. adhaerens Evg-final transcriptome, level of enrichment depicted by relative font size within each major ontology category (Biological Process, Cellular Component, and Molecular Function).


Figure 5 – Comparison between the T. adhaerens Transcriptome and the Published Genome – (A) Venn diagram showing BLAST hit counts of reciprocal BLAST alignments (nucleotide level) between the T. adhaerens transcriptome and the PrdGenSet of the published genome. BLAST hit cut-off criteria: alignment length ≥ 100 bp, percent identity ≥ 95%, e-value ≤ 1e-5. (B) Venn diagram showing the overlap of gene sets for transcriptome versus genome scaffolds using GMAP (Wu & Watanabe, 2005), and transcriptome versus PrdGenSet (forward BLAST). (C) Pie charts showing the sources of all T. adhaerens genes known to date. Orange: transcriptome genes that did not find a hit in the PrdGenSet, i.e. novel T. adhaerens genes. Purple: genome-predicted genes not matched by any transcriptome gene, i.e. genes missed by the transcriptome. Pink: genes found in both the transcriptome and the PrdGenSet.


2.2.4 The Secretome

Using three different programs/pipelines, we generated three independently predicted sets of secreted proteins. Combining all three (Figure 6A) resulted in a full secretome of 1,062 proteins (average TPM: 78.00, SD: 1243). The refined secretome (the subset of proteins predicted to be secreted by at least two of the three programs/pipelines) consists of 593 proteins (average TPM: 175.34, SD: 668). A one-way ANOVA comparing the mean TPM values of the full and refined secretomes showed that the difference was not statistically significant (p = 0.058). The refined secretome was aligned against the Swiss-Prot protein database (accessed April 2017) using BLAST (e-value ≤1e-5) and ranked by TPM. The most abundant secreted proteins are peptidases and phospholipases involved in digestive and immune functions (Tables 4-5). Strikingly, several enzymes show extremely high TPM expression: trypsin-1 (TPM: 7259.67), phospholipases A2 (TPM: 4856.47) and A1 2 (TPM: 4574.36), and lysozyme g-like protein 2 (average TPM: 1483.16). Many secreted proteins involved in development (Table 6), cell adhesion (Table 7), and basic housekeeping functions (Table 8) were also identified among the most highly expressed transcripts of the refined secretome.

2.2.5 Regulatory/Neuro-Peptides

Using NeuroPred, we identified 20 neuropeptides among the secretome genes (Figure 6B). Of these, 9 were previously identified from genome-predicted proteins by Nikitin (Nikitin, 2015), while 11 are novel. We also found the granulin precursor peptide previously reported by Nikitin (Nikitin, 2015). Of the 11 novel neuropeptides, two resemble amide precursors (WWamide and DAYQamide), with a glycine residue immediately upstream of the predicted lysine cleavage site (Figure 6C). A granulin prohormone-like peptide was also found with high mRNA expression (TPM = 965.03, SD: 118.29) (Figure 6C). We also validated mRNA expression of the previously identified neuropeptide QDYPFFGN (TPM = 129.10, SD: 15.33, Accession: A0A288VIN8) (Figure 6C), which resembles the vertebrate pain signaling peptide YPFFamide (endomorphin-2). While annotating putative regulatory/neuro-peptides, we also observed many genes with long arginine repeats (up to 12 consecutive arginine residues; see IL precursor in Figure 6C). A few arginine-rich sequences with repeated motifs between their arginine-rich regions were identified as putative novel regulatory/neuro-peptides (Figure 6C, Arginine-rich regulatory-/neuro- peptides).


Figure 6 – Secretome and Regulatory-/Neuro- Peptides – (A) Venn diagram showing counts of secretory proteins predicted by the SignalP-tmHMM-TargetP pipeline, Phobius, and SPOCTOPUS. (B) RSEM (B. Li & Dewey, 2011) TPM expression of neuropeptides (NP) identified in T. adhaerens using NeuroPred. *Nikitin: NP previously predicted by Nikitin (Nikitin, 2015). *cann: canonical. (C) Protein sequence annotation of selected novel T. adhaerens NPs and the granulin-like precursor.


2.2.6 Neurotransmitter Biosynthesis and Degradation Pathways

Homologues of proteins involved in the biosynthesis and degradation of many neurotransmitters were identified in T. adhaerens. However, only the glutamate and gamma-aminobutyric acid (GABA) pathways appear to have the full set of canonical biosynthesis and degradation components (Table 9). Glutaminase is required for the synthesis of glutamate from glutamine, while glutamine synthetase uses glutamate and ammonia to produce glutamine (Hamed et al., 2018); both enzymes were identified in T. adhaerens (Table 9). Similarly, three GABA pathway enzymes are expressed in the T. adhaerens transcriptome: glutamic acid decarboxylase (GAD), required for GABA synthesis, and GABA alpha-oxoglutarate transaminase and succinic semialdehyde dehydrogenase, both required for GABA degradation. In addition, homologues of enzymes involved in the synthesis of acetylcholine and nitric oxide, choline acetyltransferase (ChAT-like) and nitric oxide synthase (NOS), were also found at the transcript level.

Consistent with the genome analysis by Srivastava et al. (2008), two enzymes involved in catecholamine synthesis are present in the T. adhaerens transcriptome (Table 9). These are DOPA (dihydroxyphenylalanine) decarboxylase (aromatic L-amino acid decarboxylase, AADC) and DBH-like monooxygenase (dopamine beta-hydroxylase, DBH). DOPA decarboxylase is common to the biosynthetic pathways of both the catecholamines (dopamine, norepinephrine, and epinephrine) and the indoleamine serotonin. It converts DOPA to dopamine in the catecholamine pathway and 5-hydroxytryptophan to 5-hydroxytryptamine (serotonin) in the other. However, the upstream enzymes that produce DOPA from tyrosine (tyrosine hydroxylase) and 5-hydroxytryptophan from tryptophan (tryptophan-5-monooxygenase) were not found in the full predicted-proteome (Table 9). Eight homologues of DBH-like monooxygenase, which produces norepinephrine from dopamine, were identified. In contrast, phenylethanolamine N-methyltransferase (PNMT), which catalyzes the synthesis of epinephrine from norepinephrine, is absent (Table 9).

The search for catechol O-methyltransferase (COMT), a degradative enzyme in catecholamine metabolism, gave mixed results. Three candidate proteins found homology with COMT domain-containing proteins. They were identified as members of the O-methyltransferase protein family, but not specifically as COMT. One homologue of flavin monoamine oxidase (MAO) was identified (Table 9); MAO catalyzes the degradation of monoamine neurotransmitters, including dopamine, norepinephrine, serotonin, and histamine. The serotonin synthetic enzyme tryptophan-5-monooxygenase was absent from the transcriptome. Yet aldehyde dehydrogenase and aldehyde reductase, which are also required for serotonin degradation, were both found (Table 9). This is not surprising, however, because aldehyde dehydrogenase and aldehyde reductase have diverse roles in detoxification and broad substrate specificity (C.-H. Chen, Ferreira, Gross, & Mochly-Rosen, 2014; Takahashi et al., 2012). Strikingly, one homologue, the aldehyde dehydrogenase 2-like protein, had extremely high TPM expression (TPM: 1,788.62, SD: 152.13). The second most highly expressed neurotransmitter biosynthetic enzyme homologue was one of the DBH-like monooxygenase homologues (TPM: 214.18, SD: 140.42).

2.2.7 Receptors and the GPCRome

Our BLAST analysis of the T. adhaerens predicted-proteome confirmed transcript expression of an adrenergic receptor homologue (ADR), an opioid receptor, and two frizzled receptors (Table 10). Many homologues of extracellular calcium-sensing receptors (CASR), metabotropic glutamate receptors (mGluR), and metabotropic GABA receptors (GABABR) were also found (not included in Table 10). However, local sequence homology alone is insufficient to confirm GPCR identity. Transmembrane (TM) domains are highly conserved, whereas classical GPCR classification systems rely heavily on ligand types and downstream signaling events (Bockaert & Pin, 1999). In addition, extracellular regions that interact with agonists are highly variable, and certain GPCRs can respond to multiple ligand types, which makes GPCR identification via sequence homology even more difficult (Bockaert & Pin, 1999).

An alternative approach is to use hidden Markov models (HMM) and phylogenetics (Fredriksson, Lagerström, Lundin, & Schiöth, 2003). With this approach, the algorithm can detect patterns in sequence homology and topology that go beyond local sequence alignments (Wistrand et al., 2006). Combining HMM predictions and protein domain analyses, we identified a total of 665 GPCRs in the full T. adhaerens predicted-proteome (Figure 7):

- 533 rhodopsin-like (Class A) GPCRs;
- 64 glutamate-like (Class C) GPCRs;
- 63 adhesion/secretin (Class B) GPCRs;
- 2 frizzled class GPCRs (the two frizzled receptors mentioned above);
- 2 cyclic-AMP-like GPCRs;
- 1 ocular albinism-like receptor.

TPM expression data are available for 655 of the GPCRome sequences (i.e. those not from the PrdGenSet). For these, TPM expression ranges from 0.065 to 125.90, with an overall average TPM of 4.62 (SD: 9.42).

Figure 7 – The T. adhaerens GPCRome – Maximum-likelihood phylogenetic tree, constructed with the VT model (+G, +F), depicting clustering of the T. adhaerens GPCRome, colour-coded by GPCR class. Bootstrap support values are indicated on branch nodes.

2.2.8 Channel Protein Homologues

Consistent with the T. adhaerens genome analysis, the following channels were identified in the full predicted-proteome (Table 11): voltage-gated sodium channels (Nav), the calcium-activated potassium channel BK, and inward-rectifying potassium channels. Additional partial Nav sequences were also identified but are not included in Table 11. Homologues of many members of the voltage-gated potassium (Kv) channel family were also expressed, including shaker, shaw, shab, and Kv10-12 (KCNH). However, the previously reported Kv9 channels appear to be absent (Table 11).

It has also been previously reported that T. adhaerens is the earliest-diverging animal to possess all three types of voltage-gated calcium channels (Cav) (Smith et al., 2017; Srivastava et al., 2008):

- Cav1 (length: 1,822 aa), aka L-type calcium channels, involved in post-synaptic nuclear signaling in neurons and muscles, and in muscle contraction;
- Cav2 (length: 2,092 aa), aka N-, P-/Q- and R-type calcium channels, responsible for the rapid conversion of electrical action potentials into exocytotic release of chemical transmitters at the nerve terminal;
- Cav3 (length: 2,035 aa), aka T-type calcium channels, involved in regulating cellular excitability.

Interestingly, we also report here a fourth voltage-gated calcium channel sequence (length: 1,566 aa) that bears only two transmembrane domains. By contrast, the canonical Cav, Nav, and NALCN pore-loop (P-loop) channels have four domains. In our preliminary phylogenetic analysis, this sequence is also distinct from the two-pore calcium and two-pore potassium channels, suggesting independent evolution of the two-domain structure.

ATP-sensitive purinoreceptors were also expressed in T. adhaerens. Adenosine triphosphate (ATP) often acts as a cotransmitter alongside many neurotransmitters, and purinoreceptors have been suggested to be involved in neuromodulation in the nervous system. Numerous ionotropic P-loop glutamate receptors (iGluR) are also expressed in T. adhaerens. In contrast, no ionotropic cys-loop receptors (GABA, acetylcholine, and glycine receptors) were found (Table 10). Some GPCRs shared sequence homology with serotonin (5-hydroxytryptamine) receptors, histamine receptors, and trace amine-associated receptors (TAAR), but the results were mixed and no specific identity could be assigned to these sequences. They could represent primitive forms of the related receptors that predate the rise of the individual receptor types in animal evolution. Five more channel sequences found homology with ionotropic glutamate receptors, but their protein domain predictions resemble both ionotropic (channel) and metabotropic (GPCR) properties. Ten acid-sensing ion channels (ASIC), one related amiloride-sensitive sodium channel (aka epithelial sodium channel, ENaC), and a single mechanosensitive Piezo channel were also found in the full predicted-proteome.

T. adhaerens also expresses many transient receptor potential (TRP) cation channels, 3 inositol 1,4,5-trisphosphate receptors (IP3R), a single ryanodine receptor (RyR), and an ORAI channel (Table 11). IP3Rs and RyR are important calcium channels that function downstream of GPCR and Ca2+ signaling, while ORAI channels are important regulators of cytosolic Ca2+ concentration and Ca2+ stores. Homologues of the sodium leak channel (NALCN), the hyperpolarization-activated cyclic nucleotide-gated (HCN) channel, and the Cl-/H+ exchanger transporter (CLC) were also identified in the full predicted-proteome. NALCN and HCN are implicated in neuromodulation and in the regulation of neuronal excitability (Qiao et al., 2013; Shi et al., 2016). CLC is implicated in membrane potential regulation in skeletal muscle, kidney, and epithelia, among other tissues (Poroca, Pelis, & Chappe, 2017).

2.2.9 Pre-/Post-Synaptic Scaffolding and Signaling Protein Homologues and Cross-Phyla Domain Analysis

T. adhaerens was found to express, at the transcript level, all members of the SNARE (soluble N-ethylmaleimide-sensitive fusion protein attachment protein receptor) complex: synaptobrevin, SNAP25, and syntaxin-1. It also expresses membrane-bound SNARE-associated proteins such as synaptophysin and synaptotagmin (Table 12, Figure 8A-B). This is consistent with the findings of the genome study by Srivastava et al. (2008). SNARE complexes mediate vesicle fusion and release in many contexts and are not specific to the nervous system, but they are essential for synaptic neurotransmission. Here we expanded our analyses to core presynaptic scaffolding proteins and identified all members of the core complex (Südhof, 2012). These include rab3-interacting molecule (RIM), mammalian homologue of the uncoordinated protein-13 (Munc13), α-liprin, and ELKS proteins (aka ERC protein 2). All show highly conserved domain organization compared with homologues in animals with canonically defined nervous systems (Figure 8A), except for one RIM homologue, Tad-RIM-II.

Figure 8 – Presynaptic Protein Homologues – (A) Schematic of proposed network for presynaptic protein interactions for vesicular tethering and fusion in the presynaptic active zone. Protein domain compositions depicted represent domain compositions identified in T. adhaerens homologues. (B) RSEM (B. Li & Dewey, 2011) TPM expression of selected presynaptic and house-keeping genes in T. adhaerens.

RIM is a particularly interesting presynaptic scaffolding protein that is implicated in tethering voltage-gated Cav2 calcium channels near synaptic vesicles at the presynaptic active zone. Notably, two RIM genes were found in T. adhaerens: a shorter RIM with the canonical domain architecture of a Zn-finger, a PDZ domain, and two C2 domains, and a longer RIM lacking the PDZ domain (Figure 8A). We further discuss the phylogenetic relationships of RIM and their implications for the phylogenetic placement of T. adhaerens in Chapter 3 of this thesis.

The postsynaptic density (PSD), on the other hand, is an electron-dense region at the postsynaptic membrane of neurons where over a thousand proteins, including receptors, scaffolding proteins, ion channels, and structural proteins, are highly organized and dynamically regulated to ensure proper synaptic function (Bayés et al., 2011; X. Chen et al., 2008; Elias & Nicoll, 2007). Notably, despite the absence of canonical synaptic structures, T. adhaerens expresses homologues of many proteins involved in PSD organization and function (Table 13).

We identified many members of the membrane-associated guanylate kinase (MAGUK) protein family, including disc-large homologues (DLG1/SAP97 and DLG5), MAGI1, membrane palmitoylated proteins (MPP2, MPP5, and MPP7), tight junction proteins (TJP), the voltage-gated calcium channel auxiliary subunit beta (CACNB), and calcium/calmodulin-dependent serine protein kinase (CASK) (Table 13). Interestingly, the master scaffolding protein SH3 and multiple ankyrin repeat domains protein (Shank) appears to be absent from the full predicted proteome. However, gephyrin, one of the master regulators of GABAergic synapses in animal nervous systems, was found to be expressed in the transcriptome. Synaptic cell adhesion proteins essential for synapse formation, such as neurexin and neuroligins, were also present in the transcriptome. Basic cytoskeletal structural proteins such as actin, actinin, and tubulin were expressed at extremely high abundance (actin, TPM: 7323.16, SD: 322.23; tubulin-alpha, TPM: 1984.01, SD: 404.47; tubulin-beta, TPM: 2587.81, SD: 328.34) (Table 13).

T. adhaerens also expresses homologues of many proteins important for downstream signaling processes, such as protein kinase A (PKA), protein kinase C (PKC), calmodulin, and Ca2+/calmodulin-dependent protein kinase II (CaMKII) (Table 13). Notably, calmodulin expression was very high (TPM: 2091.63, SD: 197.24), approximately 15-fold greater than the average for the other pre- and post-synaptic protein homologues identified in the transcriptome. Members of the mTOR signaling pathway (e.g. mTOR, raptor, and rictor) and the MAPK/ERK signaling pathway (e.g. extracellular signal-regulated kinase (ERK), mitogen-activated protein kinases (MAPK), Ras, Raf, and cAMP response element-binding protein (CREB)) were also found to be expressed in T. adhaerens (Table 13).
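The calmodulin comparison above is a ratio of TPM values. As a minimal sketch, the standard TPM definition (length-normalize each gene's expected read count, then rescale so that all values sum to one million) and a fold-over-mean comparison can be written as follows. The function names and numbers are hypothetical, and the sketch assumes expected counts and effective lengths are already known; RSEM estimates these internally, which is not attempted here.

```python
def tpm(expected_counts, effective_lengths):
    """Transcripts per million: divide each gene's expected read count by its
    effective length (bp), then rescale so all values sum to one million."""
    rates = {g: expected_counts[g] / effective_lengths[g] for g in expected_counts}
    total = sum(rates.values())
    return {g: rate / total * 1e6 for g, rate in rates.items()}


def fold_over_mean(tpm_values, target, reference_genes):
    """How many times higher the target gene's TPM is than the mean TPM of a
    reference group of genes."""
    mean_ref = sum(tpm_values[g] for g in reference_genes) / len(reference_genes)
    return tpm_values[target] / mean_ref


# Hypothetical counts and effective lengths for illustration only (not thesis data).
counts = {"CaM": 9000.0, "geneA": 400.0, "geneB": 600.0}
lengths = {"CaM": 450.0, "geneA": 2000.0, "geneB": 3000.0}
values = tpm(counts, lengths)
print(fold_over_mean(values, "CaM", ["geneA", "geneB"]))
```

Because TPM values share a common scaling factor within one sample, ratios between genes in the same sample are unaffected by that scaling, which is what makes a within-sample fold comparison like the one above meaningful.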

2.2.10 Cross-Phyla Protein-Protein Interaction-Related Domain Counts Analysis

We further investigated the abundance of protein domains known to be important either for synaptic protein-protein interactions or for protein catalytic functions across different phyla. This revealed a marked increase in domain counts in the proteomes of mammalian species, particularly for the PDZ domain, Src homology 3 (SH3) domain, guanylate kinase domain, and RhoGAP domain (Figure 9). A large expansion of PDZ domain counts was also observed in the pre-metazoan Monosiga brevicollis, the mollusc C. gigas (Pacific oyster), and the horseshoe crab L. polyphemus (Figure 9).
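Figure 9 reports raw domain counts alongside proteome sizes. A minimal sketch of how domain instances might be tallied from a per-domain hit table is shown below, together with an optional normalization by proteome size; the normalization is an illustrative addition rather than the analysis reported here, and the domain labels are placeholders whose real names depend on the annotation tool and database used.

```python
from collections import Counter

# Placeholder domain labels (assumed); real labels depend on the annotation
# tool and database used.
DOMAINS = ("PDZ", "SH3", "GuKc", "RhoGAP")


def domain_counts(hits, domains=DOMAINS):
    """Count domain instances in one proteome.

    `hits` is a list of (protein_id, domain_label) pairs, one per domain
    instance, so a protein with three PDZ domains contributes three counts."""
    counts = Counter(label for _, label in hits if label in domains)
    return {d: counts[d] for d in domains}


def per_thousand_proteins(counts, proteome_size):
    """Scale raw counts by proteome size to compare proteomes of different sizes."""
    return {d: 1000.0 * n / proteome_size for d, n in counts.items()}


# Toy proteome of 500 proteins (hypothetical).
hits = [("p1", "PDZ"), ("p1", "PDZ"), ("p1", "PDZ"), ("p2", "SH3"), ("p3", "Pkinase")]
raw = domain_counts(hits)
print(raw)
print(per_thousand_proteins(raw, proteome_size=500))
```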

2.3 Discussions

2.3.1 A High-Quality T. adhaerens Gene Set

We have generated a high-quality T. adhaerens mRNA transcriptome bearing 11,981 unique protein-coding genes. Depending on the protein database searched, 72-87% of the transcriptome found homologous sequences (bitscore ≥ 50), suggesting a high level of conservation of gene content between T. adhaerens and other animal species (Figure 2).
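The 72-87% range comes from applying the same bitscore ≥ 50 cutoff to the best hits against each database separately. A minimal sketch of that per-database calculation follows, assuming best-hit bitscores have already been extracted from a BLAST-style search; the database names, transcript IDs, and scores are hypothetical.

```python
def fraction_with_homology(best_bitscores, n_transcripts, threshold=50.0):
    """Fraction of transcripts whose best hit in one database meets the
    bitscore threshold. Transcripts without any hit are absent from the
    dict but still count in the denominator."""
    n_hits = sum(1 for score in best_bitscores.values() if score >= threshold)
    return n_hits / n_transcripts


# Hypothetical best-hit bitscores for 4 transcripts against two databases.
db_hits = {
    "db_A": {"t1": 310.2, "t2": 50.0, "t3": 49.9},
    "db_B": {"t1": 280.0, "t2": 61.5, "t3": 75.0, "t4": 120.0},
}
for db, scores in db_hits.items():
    print(db, fraction_with_homology(scores, n_transcripts=4))
```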

Gene Ontology (GO) annotation of the assembled transcripts in the Cellular Component category revealed significant enrichment of the terms 'extracellular region', 'cell junction', and 'membrane part' among the top 1,000 most highly expressed genes, suggesting a reliance of T. adhaerens on genes that process or carry extracellular information (Figure 4B). Despite the absence of a canonical nervous system, T. adhaerens also shows an enrichment of genes involved in multi-organism and interspecies communication, which could underlie potential communicatory signaling processes in the animal (Figure 4A). In addition, T. adhaerens appears to have particularly high expression of genes important for multicellular function, including various signaling receptor and protein binding activities (Figure 4C).

Figure 9 – Cross-Phyla Domain Analysis – Bar graph showing number of respective protein domains in proteomes/predicted-proteomes of various species. Proteome sizes (number of protein sequences) written on top of species names.

Comparative analyses between the transcriptome and the genome-predicted gene set (PrdGeneSet) identified 2,483 novel T. adhaerens genes in our transcriptome. Furthermore, 94.0% of our transcriptome found a matching sequence in the genome scaffolds, whereas only 79.3% found a match in the PrdGeneSet. Of the 'novel genes', 1,768 were found in the genome scaffolds but not in the PrdGeneSet; these could be genes missed during gene prediction, possibly due to gaps in the genome. Investigation of the 2,260 PrdGeneSet genes missed in our transcriptome production efforts also led to the recovery of 837 previously discarded transcript sequences, yielding our extended T. adhaerens transcriptome of 12,818 protein-coding sequences (see section 2.1.7). These genes are validated at least at the mRNA level, and they also have associated TPM expression data that allow us to evaluate their relative abundance at the whole-animal level. However, 665 genes predicted from the genome were indeed absent from our transcriptome; these could be genes expressed only in specific states or conditions of the animal, or could simply have been missed during sequencing. Because our aim was to examine whether T. adhaerens bears the full set of genetic components required for nervous system function despite not having true neurons and synapses, we sought the most complete gene set possible. By combining the 665 missed genes with our extended transcriptome, we report the most complete set of T. adhaerens genes known to date (FullTrixGenes).
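The gene-set bookkeeping described above (novel genes, missed genome predictions, recovered transcripts, and the combined FullTrixGenes set) reduces to set operations. A minimal sketch follows, assuming sequence matching between transcripts and PrdGeneSet genes has already been done upstream; the function names and IDs are hypothetical.

```python
def compare_gene_sets(transcripts, prd_genes, tx_to_prd):
    """Split genes into transcriptome-only ('novel') and genome-only ('missed').

    `tx_to_prd` maps a transcript ID to its matching PrdGeneSet ID;
    transcripts without a match are simply absent from the mapping."""
    novel = {t for t in transcripts if t not in tx_to_prd}
    missed = prd_genes - set(tx_to_prd.values())
    return novel, missed


def full_gene_set(transcripts, recovered, recovered_to_prd, missed):
    """Extended transcriptome (original + recovered transcripts) plus the
    genome-predicted genes that remain unmatched after recovery."""
    extended = transcripts | recovered
    still_missing = missed - set(recovered_to_prd.values())
    return extended | still_missing


# Toy IDs (hypothetical): t* are transcripts, g* are genome-predicted genes.
transcripts = {"t1", "t2", "t3"}
prd_genes = {"g1", "g2", "g3", "g4"}
novel, missed = compare_gene_sets(transcripts, prd_genes, {"t1": "g1", "t2": "g2"})
full = full_gene_set(transcripts, {"t4"}, {"t4": "g3"}, missed)
print(sorted(novel), sorted(missed), sorted(full))
```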

2.3.2 The Enriched Digestive Profile of the T. adhaerens Secretome May Provide Insight into the Origin of the Enteric Nervous System (ENS)

Secretory proteins are proteins secreted out of the cell. Some function in cell-cell signaling by binding target receptors (e.g. ion channels, GPCRs), while others exert direct biochemical functions, as with extracellular digestive enzymes and extracellular matrix proteins. Profiling the T. adhaerens secretome could therefore provide insight into the types of signaling pathways and metabolic processes the animal uses.

Our large-scale prediction identified a total of 1,062 secreted proteins, 593 of which were retained in the refined dataset (see section 2.2.3). Annotation of the refined secretome revealed an extremely high abundance of digestive enzymes, followed by immune-related enzymes and signaling molecules. The high abundance of digestion-related genes is not surprising, since T. adhaerens relies heavily on secreted enzymes for the external digestion of algae, though whether T. adhaerens has a fully functional immune system remains unclear.
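The criteria used for the 1,062-protein prediction and the 593-protein refined set are given in section 2.2.3 and are not reproduced here. As a generic illustration of how secretome filters are often combined, the sketch below keeps proteins with a predicted signal peptide, no transmembrane helices in the mature chain, and no ER-retention signal. The field names and the rule itself are assumptions, not the pipeline used in this thesis.

```python
from dataclasses import dataclass


@dataclass
class ProteinPrediction:
    protein_id: str
    has_signal_peptide: bool   # output of a signal-peptide predictor (assumed)
    tm_helices_mature: int     # TM helices predicted outside the signal peptide
    er_retention_motif: bool   # e.g. a C-terminal KDEL-type motif


def is_candidate_secreted(p: ProteinPrediction) -> bool:
    """Hypothetical rule: signal peptide present, no TM helices in the mature
    chain, and no ER-retention signal."""
    return p.has_signal_peptide and p.tm_helices_mature == 0 and not p.er_retention_motif


predictions = [
    ProteinPrediction("p1", True, 0, False),   # candidate secreted protein
    ProteinPrediction("p2", True, 1, False),   # likely membrane-anchored
    ProteinPrediction("p3", False, 0, False),  # no signal peptide
    ProteinPrediction("p4", True, 0, True),    # likely retained in the ER
]
secretome = [p.protein_id for p in predictions if is_candidate_secreted(p)]
print(secretome)
```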

One interesting question is how the release of these secreted proteins is regulated. Release of granular content from digestive-like lipophil cells in T. adhaerens has been reported to be spatially regulated, possibly relying on a chemosensory mechanism, as secretion occurs specifically in lipophil cells in proximity to algae (Smith et al., 2015).

Furthermore, as the animal pauses during feeding, after algal lysis, it undergoes a churning motion that appears to enhance mixing of the material during food uptake (Smith et al., 2015). T. adhaerens does not have muscles, so the mechanism underlying this contractile motion is unknown. Good candidates for mediating contractility are the muscle-like fiber cells, mesenchyme-like cells (~4.4% of the cell population) that each bear six or more tapering processes containing microfilaments and microtubules (Smith et al., 2014; Thiemann & Ruthmann, 1989). Although myofibrils have not been observed in microscopy studies of T. adhaerens (Smith et al., 2014), isolated fiber cells have been seen to exhibit twitch-like contractions and rapid withdrawal of their processes in vitro (Thiemann & Ruthmann, 1989). Taken together, the refined regulation of granular release and the churning motion resemble functions of the enteric nervous system (ENS) in other animals, the component of the autonomic nervous system that controls the digestive tract.

Insights from research on the ENS in many vertebrate and invertebrate species, extending back to the Cnidarian Hydra vulgaris, suggest that the ENS may have been the first 'brain', communicating closely with the CNS while maintaining its own separate neural network over the course of nervous system evolution (Furness & Stebbing, 2018). Perhaps the early rise of the nervous system was driven by the demand for more efficient feeding. The secretome profile we have characterized and previous behavioral research (Senatore, Reese, & Smith, 2017; Smith et al., 2015) suggest that T. adhaerens behaves like a crawling stomach, and its unique position in the animal phylogeny (after Poriferans and prior to Cnidarians) offers an opportunity to study early elements of autonomic control of the digestive system, and of nervous system evolution.

2.3.3 Candidate Peptidergic Signaling Peptides and Receptors in T. adhaerens

One fundamental property of the nervous system is the propagation of information from cell to cell through electrical or chemical mechanisms. While these forms of cellular communication predate the existence of neurons and synapses, it has been suggested that the first nervous systems relied heavily on peptidergic signaling (Moroz, 2009).

Among secretory proteins, regulatory-/neuro-peptides are small peptides that commonly serve as neural signals or hormones and are abundant across the Metazoa (Grimmelikhuijzen & Hauser, 2012; Nikitin, 2015; Veenstra, 2011). A previous bioinformatic prediction of T. adhaerens regulatory peptides by Nikitin identified nine short regulatory peptide precursors, precursors of insulin and the growth factor granulin, and putative homologues of many prohormone-processing enzymes such as carboxypeptidases and convertases (Nikitin, 2015). We not only found all of the peptides reported by Nikitin (Nikitin, 2015) in the transcriptome, but also identified 11 additional novel putative neuropeptides. One of these bears a QDYPFFGN mature peptide sequence that shares homology with the vertebrate peptide YPFFamide (endomorphin), and was thus named T. adhaerens endomorphin-like peptide (TadELP). Our collaborators at the National Institutes of Health (NIH) recently showed that TadELP localizes to the neuron-like gland cells of T. adhaerens in a granular pattern consistent with vesicular packaging (Senatore et al., 2017), validating our bioinformatic prediction of this novel regulatory-/neuro-peptide. Strikingly, direct application of synthesized QDYPFFGN peptide elicited pausing behavior at concentrations between 2 and 5 µM (Senatore et al., 2017). Peptides synthesized with the C-terminal glycine removed and the C-terminus amidated (QDYPFFamide and RQYPFFamide), mimicking the vertebrate endomorphin YPFFamide, further reduced the effective concentration to 200 nM (Senatore et al., 2017).
Notably, the vertebrate endomorphin, which is involved in nociception, was discovered via biochemical isolation, and no corresponding DNA sequence for a preproendomorphin gene has been identified in the human genome (Wolfe et al., 2007), so whether its precursor undergoes post-translational processing other than amidation is still unknown. It is therefore uncertain whether the glycine residue of TadELP could be removed endogenously to allow for amidation during post-translational processing.
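The uncertainty above rests on a common rule of thumb in neuropeptide prediction: a glycine at the C-terminal end of a peptide, just before a cleavage site, serves as the amide donor, and pairs of basic residues (KK, KR, RK, RR) mark prohormone convertase cleavage sites. The sketch below is a deliberately simplified illustration of that rule, not the prediction pipeline used in this thesis; it ignores monobasic cleavage and other motifs, and the precursor sequence is invented.

```python
import re


def candidate_amidated_peptides(precursor, min_len=3, skip_first=True):
    """Split a precursor at dibasic cleavage sites (KK, KR, RK, RR) and report
    fragments ending in glycine (the amide donor) as amidated peptides.
    The first fragment is skipped by default, assuming it holds the signal peptide."""
    fragments = re.split(r"[KR][KR]", precursor)
    if skip_first:
        fragments = fragments[1:]
    return [f[:-1] + "-NH2" for f in fragments
            if len(f) > min_len and f.endswith("G")]


# Hypothetical precursor, invented for illustration.
print(candidate_amidated_peptides("MKLLAGKRQDYPFFGKRAVPGRRSLWGK"))
```

Under this rule, a mature sequence ending in GN, as in TadELP, would not be flagged for amidation, which illustrates why its endogenous processing remains an open question.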

Application of TadELP synchronized the cessation of ciliary beating among the ventral epithelial cells that drive locomotion, causing a pausing behavior similar to that observed during feeding. Interestingly, synchronized pausing between individual T. adhaerens animals, occurring sequentially in relation to distance from the first animal to pause, can be observed in the absence of applied peptide (Senatore et al., 2017). This supports the notion that pausing behavior is regulated by secreted diffusible signals that affect not only the secreting animal but also those nearby. This was the first evidence that T. adhaerens uses regulatory-/neuro-ligands for cell-cell signaling, and it represents an important foundation for functional experiments on the molecular elements that underlie T. adhaerens locomotive feeding behavior.

Given that it lacks canonical neurons, it would not be surprising if T. adhaerens relied on ligand-based, or paracrine, signaling systems to coordinate cellular activities. Yet what would be the recipients of those signals? Our large-scale GPCR prediction identified at least 665 GPCRs, receptors that are important for cell signaling in the nervous system and throughout the animal body. Although there are over 800 GPCRs in humans (Jassal et al., 2010), this number is already more than twice the 282 GPCRs predicted in the sponge A. queenslandica (Krishnan et al., 2014), which could suggest an expansion of GPCRs in T. adhaerens, or a greater reliance on GPCR signaling compared to sponges.
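Using only the two counts given in the text, the T. adhaerens GPCR repertoire is roughly 2.4 times the size of the sponge repertoire:

$$\frac{665}{282} \approx 2.36$$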

Notably, vertebrate endomorphin has been suggested to bind mu (µ) opioid receptors (Marrone et al., 2016; Zadina, Hackler, Ge, & Kastin, 1997), and the TadELP peptide may likewise bind a homologous GPCR. We identified one opioid receptor in the transcriptome, but it shows greater sequence similarity to kappa (κ) opioid receptors than to the µ subtype. Pharmacological experiments would be required to examine this potential ligand-receptor relationship.

Acid-sensing ion channels (ASICs) and epithelial sodium channels (ENaCs), ion channels with diverse gating mechanisms, have also been suggested to have peptide-gated properties (Assmann, Kuhn, Dürrnagel, Holstein, & Gründer, 2014; Golubovic et al., 2007). These channels appear to be unique to metazoans, since they are found in neither unicellular eukaryotes nor bacteria, and they have a broad range of functions (Golubovic et al., 2007). From the full predicted proteome, we identified one ENaC and 10 ASICs. Nine of the 10 T. adhaerens ASIC sequences have also been successfully cloned in our laboratory for ongoing in vitro electrophysiological experiments (unpublished data), which further validates expression of these sequences, but their role in T. adhaerens biology is yet to be determined (further discussed in section 2.3.5).

2.3.4 Potential ‘Neurotransmitter’ Signaling in the ‘Aneural’ T. adhaerens

In the draft genome of T. adhaerens, Srivastava et al. identified many genes involved in neuronal excitability, synaptic transmission, and neurotransmitter biosynthesis (Srivastava et al., 2008). We have expanded this analysis in order to investigate in greater detail which neurotransmitters are the best candidates for signaling in the animal.

In the full predicted-proteome, glutamate and GABA were the only transmitters for which we identified complete sets of both biosynthetic and degradative biochemical pathways (Table 9).
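A pathway is called complete here when enzymes for both its biosynthetic and degradative steps are present. A minimal sketch of that check is shown below. The enzyme abbreviations are standard (GAD, glutamate decarboxylase; GABA-T, GABA transaminase; SSADH, succinic semialdehyde dehydrogenase; GLS, glutaminase; GLUL, glutamine synthetase, standing in for glutamate clearance; TH, tyrosine hydroxylase; DDC, DOPA decarboxylase; MAO, monoamine oxidase; COMT, catechol-O-methyltransferase), but the step groupings and the detection set are hypothetical simplifications, not the contents of Table 9.

```python
# Illustrative enzyme lists per step (hypothetical simplification, not Table 9).
PATHWAYS = {
    "GABA": {"synthesis": {"GAD"}, "degradation": {"GABA-T", "SSADH"}},
    "glutamate": {"synthesis": {"GLS"}, "degradation": {"GLUL"}},
    "dopamine": {"synthesis": {"TH", "DDC"}, "degradation": {"MAO", "COMT"}},
}


def complete_pathways(pathways, detected):
    """Transmitters for which every enzyme of every listed step was detected."""
    return sorted(
        transmitter
        for transmitter, steps in pathways.items()
        if all(enzymes <= detected for enzymes in steps.values())
    )


# Toy detection set (hypothetical).
detected = {"GAD", "GABA-T", "SSADH", "GLS", "GLUL", "DDC", "MAO"}
print(complete_pathways(PATHWAYS, detected))
```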

Numerous metabotropic GABA receptors (GABAB) and metabotropic glutamate receptors (mGluRs) were found, along with multiple ionotropic glutamate receptors (AMPA- and kainate-like), but no ionotropic GABA receptors (GABAA). The presence of the metabotropic receptors is unsurprising, because metabotropic GABA and glutamate receptors belong to the glutamate-like class (Class C) of GPCRs, one of the most ancient GPCR classes (Krishnan, Almén, Fredriksson, & Schiöth, 2012). Moreover, the use of GABA and glutamate, and their respective metabotropic receptors, has already been observed in sponges for the regulation of contractile motility as well as feeding behavior (Elliott & Leys, 2010; Ellwanger, Eich, & Nickel, 2007; Ramoino et al., 2011).

We were able to identify all the main GPCR classes of the GRAFS (Glutamate, Rhodopsin, Adhesion, Frizzled/Taste2, Secretin) classification scheme (Figure 7), though assignments to the Adhesion (aka Class B2) and Secretin (aka Class B1) classes gave mixed results, since secretin-class GPCRs did not evolve from Adhesion GPCRs until later in animal evolution (Nordstrom, Lagerstrom, Waller, Fredriksson, & Schioth, 2008).

In T. adhaerens, the rhodopsin class is also the most abundant, comprising 533 of the 665 receptor sequences (Figure 7). We were able to identify adrenergic and opioid receptors through manual annotation via BLAST, but other rhodopsin-class receptors gave mixed results, showing sequence homology to multiple receptor types such as serotonin, dopamine, acetylcholine, and octopamine receptors. Moreover, only some components of the signaling pathways for monoamine neurotransmitters were identified in T. adhaerens (Table 9). It is likely that these enzymes originally served other biological processes and were later recruited to neurotransmitter biosynthesis over the course of nervous system evolution.
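The rhodopsin share can be computed directly from per-receptor class assignments. In the minimal sketch below, only the rhodopsin count (533) and the total (665) come from the text; the remaining 132 receptors are lumped under a placeholder label rather than split into their actual classes.

```python
from collections import Counter


def class_shares(class_labels):
    """Fraction of receptors assigned to each GPCR class."""
    counts = Counter(class_labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Only 533 (rhodopsin) and 665 (total) come from the text; the other
# 132 receptors are lumped together under a placeholder label.
labels = ["Rhodopsin"] * 533 + ["non-rhodopsin"] * 132
shares = class_shares(labels)
print(f"{shares['Rhodopsin']:.1%}")  # prints 80.2%
```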

We also identified two homologues of nitric oxide synthase (NOS) in T. adhaerens, both of which show greater sequence similarity to the neuronal NOS isoform (nNOS) in BLAST alignments at the protein level. Nitric oxide (NO) is a gaseous, freely diffusing signaling molecule implicated in the activation and modulation of many signaling pathways in neurons (Cossenza et al., 2014; Mukhtarov, Urazaev, Nikolsky, & Vyskočil, 2000), including those involved in plasticity (Cossenza et al., 2014; Dejanovic & Schwarz, 2014; Hardingham, Dachtler, & Fox, 2013) and neuromuscular junction function (Zhu et al., 2006). NO signaling is also suggested to be involved in spine and synapse morphology, as nNOS is recruited to the synapse during these processes (Poglia, Muller, & Nikonenko, 2011), and disruption of the enzyme reduced dendritic spine and synapse density in rat hippocampal tissue culture (Nikonenko et al., 2008). The physiological role of NOS predates the nervous system: NOS has been localized to tissues suggested to have contractile behavior in the sponge E. muelleri (Elliott & Leys, 2010), and NO application has been observed to induce contraction in another sponge species, T. wilhelma (Ellwanger & Nickel, 2006). In mammals, NOS is also found in vascular smooth muscle (Buchwalow et al., 2002) and has been implicated in vasodilation (Lindsey, Carver, Prossnitz, & Chappell, 2011). However, considering that T. adhaerens does not have canonically defined muscles and synapses, the role of nNOS in the animal remains unclear.

2.3.5 T. adhaerens Expresses a Rich Repertoire of Channel Homologues

One pertinent question is whether T. adhaerens uses electrical signaling in any form of cell-cell communication. There have been no reports of endogenous electrical activity in T. adhaerens. However, multiple ion channels associated with electrical excitability were found to be expressed in the transcriptome, including voltage-gated sodium (Nav) and potassium (Kv) channels, inward-rectifying potassium (Kir) channels, and a potassium leak channel (two-pore-domain potassium channel, K2P), all of which are important for membrane potential regulation and neuronal firing in animals with nervous systems.

T. adhaerens is the earliest-diverging metazoan known to bear homologues of each class of voltage-gated calcium channel (TCav1, TCav2, and TCav3) (Smith et al., 2017; Srivastava et al., 2008). Not only do these channels share conserved domain architectures with those in animals with more elaborate nervous systems, but the T. adhaerens T-type channel homologue (TCav3) also displays hallmark biophysical properties of other Cav3 channels: when expressed in vitro, TCav3 is activated at low voltages and exhibits transient activation and inactivation (Smith et al., 2017). The N- or P/Q-type channel homologue (TCav2) (C. S. Smith; unpublished data) and TCav3 were also found to co-immunolocalize with the T. adhaerens homologue of the presynaptic protein complexin in gland cells (Smith et al., 2017).

Voltage-gated calcium channels, part of the voltage-gated ion channel superfamily, have α1 subunits that consist of four homologous transmembrane (TM) domains (I, II, III, and IV) joined by linker sequences, where each domain is composed of six transmembrane helices (S1-S6). Surprisingly, we identified a fourth channel sequence homologous to voltage-gated calcium channels. This novel channel appears to have a complete coding sequence, with a corresponding protein length of 1,566 aa, but bears only two ion channel domains as opposed to four. Our preliminary phylogenetic analyses also suggest that this channel is phylogenetically distinct from the two-pore channel (TPC) homologues identified in T. adhaerens; TPCs are composed of two-domain subunits that dimerize for channel function (Churamani, Hooper, Brailoiu, & Patel, 2012; Rietdorf et al., 2011). TPCs are voltage- and ligand-gated ion channels mainly localized to intracellular membranes and organelles, and are implicated in the regulation of cytosolic and luminal calcium concentrations (Lagostena, Festa, Pusch, & Carpaneto, 2017). Previous pharmacological and phylogenetic analyses suggest that TPCs share putative ligand-binding sites with Cav and Nav channels, and could share a common ancestor with these voltage-gated channels (Rahman et al., 2014). One theory is that Cav and Nav channels, both with four TM domains, arose through two rounds of duplication from single-domain channels (such as voltage-gated potassium channels, whose four single-domain subunits assemble into a tetramer), or from the duplication of an intermediate two-domain channel, possibly resembling the TPCs (Ishibashi, Suzuki, & Imai, 2000; Rahman et al., 2014; Strong, Chandy, & Gutman, 1993). The putative two-domain voltage-gated calcium channel in T. adhaerens could represent a missing link in the evolution of these channel types, though a more elaborate phylogenetic investigation, including more channel families and subtypes from representatives of various phyla, would be required to resolve the identity of this channel and its position in the evolution of channel types.
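The distinction drawn above between one-, two-, and four-domain channels can be screened for by counting repeats of the ion-transport domain per protein. In the minimal sketch below, the label "Ion_trans" is assumed to correspond to the Pfam ion transport family (PF00520) and may differ with other annotation tools; the protein IDs are hypothetical. Repeat counts alone cannot distinguish Cav from Nav channels or establish phylogenetic identity, which is why phylogenetic analysis is still required.

```python
from collections import Counter

# Hypothetical mapping from the number of ion-transport domain repeats to a
# broad channel architecture; Cav vs Nav cannot be told apart this way.
ARCHITECTURES = {
    1: "single-domain subunit (Kv-like, assembles as a tetramer)",
    2: "two-domain subunit (TPC-like, assembles as a dimer)",
    4: "four-domain alpha1 subunit (Cav/Nav-like)",
}


def classify_by_repeats(hits, domain_label="Ion_trans"):
    """`hits` is a list of (protein_id, domain_label) pairs from a domain search."""
    repeats = Counter(pid for pid, label in hits if label == domain_label)
    return {pid: ARCHITECTURES.get(n, f"{n} repeats (unusual)")
            for pid, n in repeats.items()}


# Toy hits (hypothetical protein IDs).
hits = ([("cav_like", "Ion_trans")] * 4 + [("novel", "Ion_trans")] * 2
        + [("kv_like", "Ion_trans"), ("kv_like", "other_domain")])
for pid, arch in classify_by_repeats(hits).items():
    print(pid, arch)
```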

Ion channels are also important for direct interactions between animals and their surrounding environment, such as the sensation of pressure and movement. In addition to the ASICs and ENaCs that respond to a wide variety of gating stimuli (see section 2.3.3), including mechanical forces (Kang et al., 2012), we also identified a homologue of the mechanosensitive channel Piezo, and the voltage-gated potassium channel Shaker (Kv1, or KCNA), which has also been implicated in mechanosensation in bilaterians. A recent finding related to sensory information processing in T. adhaerens suggests the presence of a putative gravity-sensing apparatus in crystal cells (Mayorova et al., 2018). It was proposed that these cells may act as gravity sensors through the activation of mechanosensitive channels in the cell membrane, stimulated by forces from perturbed actin filaments attached to an aragonite crystal in the cell body, which moves downward due to gravity irrespective of the orientation of the animal itself (Mayorova et al., 2018), though more experimental and localization support would be required. The authors proposed that the ability of the animal to 'sense' its orientation would allow it to perform compensatory movements and resist downward motion. The underlying mechanism remains unclear, but the identification of mechanosensitive channel homologues in the animal is a good starting point for understanding how T. adhaerens 'senses' and 'responds' to mechanical stimuli in its environment.

2.3.6 Synaptic Scaffolding and Signaling Protein Homologues Suggest T. adhaerens to have the Molecular Capacity for Primitive Synapse-like Functions

If T. adhaerens expresses many secretory peptides, and possibly uses transmitters such as glutamate and GABA for cell-cell communication, how is the release of these signaling molecules regulated and performed? It has been shown that T. adhaerens gland cells express exocytotic machinery, including SNARE proteins (syntaxin-1, synaptobrevin, and SNAP-25) as well as complexin, a regulator of the SNARE complex (Smith et al., 2014). We further identified four homologues of mammalian uncoordinated protein-18 (Munc-18/syntaxin-binding protein/STXBP), a protein involved in the regulation of exocytotic membrane fusion via disruption of the closed conformation of syntaxin-1 (Khvotchev et al., 2007). The Munc18:syntaxin-1 complex was initially thought to be specific to animals and neuronal secretion, but was later identified in the genome of the pre-metazoan choanoflagellate M. brevicollis (Burkhardt et al., 2011). Nevertheless, this protein-protein interaction is essential for proper neurotransmission at synapses (Hosono et al., 1992; Khvotchev et al., 2007; Schulze et al., 1994). Figure 8A is a schematic of the domain compositions of the identified T. adhaerens homologues of presynaptic proteins, with a proposed organization based on protein-protein interactions reported in existing models and experimental findings from synapses characterized in other animals (Emes & Grant, 2012; Südhof, 2012). It remains unknown whether these proteins interact in T. adhaerens in the same manner as their bilaterian homologues do in the synaptic complex. Yet the expression of a near-complete set of SNARE proteins, SNARE-associated proteins, and core presynaptic scaffolding complex proteins (Table 12) with highly conserved domain compositions (Figure 8A) supports the notion that both the interactome and the organization of these presynaptic-like proteins are conserved in T. adhaerens.

Four neuroligin homologues were found in the transcriptome: one neuroligin-1 and three neuroligin-4 homologues. Neuroligin is an important postsynaptic cell adhesion protein that stabilizes synapses (Bang & Owczarek, 2013). Within the PSD, neuroligins interact with other postsynaptic proteins to regulate synapse maturation (Nam & Chen, 2005), stabilization, and synaptic plasticity (Choi et al., 2011). Neuroligin has also been suggested to form trans-synaptic interactions with the presynaptic cell adhesion molecule neurexin: expression of mouse neuroligin in non-neuronal cells induced presynaptic-like morphology in contacting cells, and application of a soluble form of β-neurexin inhibited this effect (Scheiffele, Fan, Choih, Fetter, & Serafini, 2000). Interestingly, neuroligins have been reported to be expressed in neither choanoflagellates nor sponges (Riesgo, Farrar, Windsor, Giribet, & Leys, 2014), and we also failed to find neuroligin homologues in Ctenophores, both through NCBI BLAST searches and in several transcriptomes we assembled in house from previously published RNA-seq reads (see section 2.1.10). T. adhaerens is therefore likely the earliest-diverging animal reported to express neuroligin, which could have served the later evolution of true synapses from proto-synapses in metazoans. We did, however, identify neurexin homologues in both T. adhaerens and our Ctenophore (M. leidyi and H. californensis) transcriptomes (see section 2.1.10), though not in sponges or choanoflagellates through our NCBI BLAST search. Thus, despite the absence of canonical synaptic structures, T. adhaerens appears to be the earliest-diverging animal species to express homologues of both neurexin and neuroligin.
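The conclusion above follows from a simple presence/absence table. In the minimal sketch below, the entries encode only what the paragraph states (True means detected; False means not detected by the searches described, which is not proof of absence), and the function returns the taxa in which every listed gene was found.

```python
# Presence (True) / non-detection (False) as stated in the text above.
PRESENCE = {
    "choanoflagellates": {"neurexin": False, "neuroligin": False},
    "sponges": {"neurexin": False, "neuroligin": False},
    "ctenophores": {"neurexin": True, "neuroligin": False},
    "T. adhaerens": {"neurexin": True, "neuroligin": True},
}


def taxa_with_all(presence, genes):
    """Taxa in which every listed gene was detected."""
    return [taxon for taxon, found in presence.items() if all(found[g] for g in genes)]


print(taxa_with_all(PRESENCE, ["neurexin", "neuroligin"]))
print(taxa_with_all(PRESENCE, ["neurexin"]))
```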

Surprisingly, T. adhaerens does not express the master PSD scaffolding protein Shank (Sala, Vicidomini, Bigi, Mossa, & Verpelli, 2015; Uemura, Mori, & Mishina, 2004; H. Zhang, 2005). Shank is present in the premetazoan choanoflagellates, numerous sponge species, and Cnidarians (Riesgo et al., 2014; Onur Sakarya et al., 2007). The absence of Shank from both the T. adhaerens genome and transcriptome suggests a potential loss of this gene in the Placozoa. One model of nervous system evolution (see Chapter 1) suggests that T. adhaerens lost its nervous system rather than never having evolved one (Ryan & Chiodin, 2015). Whether the 'neural repertoire' of genes expressed in T. adhaerens reflects the remnants of a pre-existing nervous system or precursor elements for later nervous system evolution is a difficult question to address and remains subject to debate (Ryan & Chiodin, 2015); however, the apparent loss of the key scaffolding gene Shank is an interesting additional piece of information worth considering.

We also report the expression of a homologue of gephyrin, the master scaffolding protein of inhibitory synapses (E. Y. Kim et al., 2006; Tretter et al., 2012; Tyagarajan et al., 2011). Gephyrin is suggested to be important for anchoring ionotropic glycine and GABAA receptors at synapses (Danglot, Triller, & Bessis, 2003; E. Y. Kim et al., 2006). Yet neither glycine nor GABAA receptors were found in the T. adhaerens predicted proteome or the published genome. Perhaps gephyrin has other roles in protein scaffolding and was recruited to inhibitory synapse formation and stabilization later in nervous system evolution. Gephyrin has also been shown to be required for the synthesis of the molybdenum cofactor (Moco) in non-neuronal tissue, and in humans, deficiency of this cofactor usually results in early childhood death (J Reiss et al., 2001; Jochen Reiss & Johnson, 2003). It is unclear what the first functional role of gephyrin was in evolution; hence, its expression in T. adhaerens is not necessarily indicative of neural activity or a neuron-like cell type. However, its presence in the transcriptome suggests that T. adhaerens has the capacity to cluster synaptic scaffolds and signaling molecules for signal transduction.

We also identified multiple membrane-associated guanylate kinases (MAGUKs) in the transcriptome, including discs large homologue 1 (DLG1, also known as SAP97), which is implicated in synaptic development and plasticity (Poglia et al., 2011; Schlüter, Xu, & Malenka, 2006). In the nervous system, SAP97 is found in both the pre- and postsynaptic membranes as well as perisynaptic regions, and is associated with the binding and trafficking of many ion channels to the membrane, including ionotropic glutamate receptors (Oliva, Escobedo, Astorga, Molina, & Sierralta, 2012; Sans et al., 2001; White et al., 2016). The expression of Dlg1/SAP97 in T. adhaerens is relatively high (TPM: 166.55, SD: 4.26) compared to most other MAGUK homologues, which typically have TPM values below 100. Many components of the mTOR and MAPK/ERK signaling pathways, along with protein kinases and transcription factors, were also found in the transcriptome. Although many of these signaling pathways are also implicated in non-neural processes such as development and immunity, they are also important components of synaptic signaling (Emes & Grant, 2012; Ringrose et al., 2013). These findings suggest, at minimum, that T. adhaerens has the molecular building blocks for synapse-related activity and functioning signaling cascades despite not having canonical synapses. In addition, given the numerous cell types and molecular redundancy of bilaterian nervous systems, T. adhaerens presents a unique opportunity to study the interactions of these genes in a much simpler molecular and cellular environment.
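The expression values above are reported as transcripts per million (TPM). As a reminder of how TPM corrects for both transcript length and sequencing depth, a minimal Python sketch follows. The read counts and effective lengths are hypothetical toy values, not data from this study; in practice TPM is reported by the quantification software rather than computed by hand.

```python
def tpm(counts, eff_lengths):
    """Compute transcripts per million (TPM).

    counts: mapping of transcript ID -> number of reads assigned to it
    eff_lengths: mapping of transcript ID -> effective length in bases
    """
    # Step 1: reads per base for each transcript (corrects for length).
    rate = {t: counts[t] / eff_lengths[t] for t in counts}
    # Step 2: rescale so all rates sum to one million (corrects for depth).
    total = sum(rate.values())
    return {t: r / total * 1e6 for t, r in rate.items()}


# Hypothetical toy input: three transcripts with made-up counts and lengths.
counts = {"geneA": 500, "geneB": 1000, "geneC": 250}
eff_lengths = {"geneA": 1000, "geneB": 4000, "geneC": 500}
values = tpm(counts, eff_lengths)
for gene, value in values.items():
    print(f"{gene}: {value:.2f} TPM")
```

Here geneB has the most reads but the lowest TPM because it is the longest transcript. Because TPM values always sum to one million within a sample, a statement such as "TPM below 100" describes a transcript's share of the sample rather than a raw read count.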

2.3.7 Evolutionary Expansion of Protein-Protein-Interaction-Related Domain Counts

Scaffolding domains mediate protein-protein interactions that are important for assembling protein complexes in the cell membrane, for their association with scaffolding proteins, and for linking them to components of signaling cascades downstream of neural activity. Comparison of the M. brevicollis (choanoflagellate) genome with metazoan genomes revealed both expansion and specialization of PDZ domains, associated with the increased molecular complexity of the synapse (O. Sakarya et al., 2010). Our cross-phyla domain counts confirmed this previously reported expansion of PDZ domains, but also revealed a 2- to 3-fold reduction in domain counts in the following phyla: Ctenophora, Porifera, Placozoa (T. adhaerens), and Cnidaria. Such a reduction could be associated with the fine-tuning and rewiring of protein domain-ligand interactions as the expanding population of scaffolding domains began to participate in refined protein-protein-interaction networks (J. Kim et al., 2012; O. Sakarya et al., 2010).
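A cross-phyla comparison of this kind comes down to tallying domain hits per species and taking ratios. The Python sketch below illustrates that on hypothetical toy data. The Pfam accession-to-domain mapping is an assumption made for illustration (the actual annotation pipeline is not reproduced here), and the species labels and hits are invented rather than counts from this study.

```python
from collections import Counter, defaultdict

# Assumed Pfam accessions for some scaffolding domains of interest.
DOMAIN_NAMES = {
    "PF00595": "PDZ",
    "PF00018": "SH3",
    "PF00168": "C2",
    "PF00620": "RhoGAP",
    "PF00625": "GK",  # guanylate kinase
}


def count_domains(hits):
    """Tally domain occurrences per species.

    hits: iterable of (species, protein_id, pfam_accession) tuples, one per
    domain hit, so a protein with two PDZ domains contributes two PDZ counts.
    """
    counts = defaultdict(Counter)
    for species, _protein_id, accession in hits:
        name = DOMAIN_NAMES.get(accession)
        if name is not None:
            counts[species][name] += 1
    return counts


def fold_difference(counts, species_a, species_b, domain):
    """Ratio of domain counts in species_a relative to species_b."""
    return counts[species_a][domain] / counts[species_b][domain]


# Hypothetical toy hits for two species.
toy_hits = [
    ("species_X", "p1", "PF00595"),
    ("species_X", "p1", "PF00595"),
    ("species_X", "p2", "PF00595"),
    ("species_X", "p3", "PF00018"),
    ("species_Y", "q1", "PF00595"),
]
counts = count_domains(toy_hits)
print(counts["species_X"]["PDZ"], counts["species_Y"]["PDZ"])
print(fold_difference(counts, "species_X", "species_Y", "PDZ"))
```

Counting per hit rather than per protein means multi-domain proteins contribute more than once, so totals like these say nothing about domain copy number per gene or domain architecture.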

Exceptional expansions in PDZ domain counts were observed for the Pacific oyster C. gigas and the horseshoe crab L. polyphemus. Researchers have previously reported a massive expansion and gene duplication of immunity-related signaling molecules and of the protein kinase superfamily in C. gigas, based on its sequenced genome and RNA-seq transcriptome (Gao, Ko, Tian, Yang, & Wang, 2015; Gerdol, Venier, & Pallavicini, 2015; Kocot, Aguilera, McDougall, Jackson, & Degnan, 2016). However, gene duplications related to synaptic or scaffolding proteins have not been reported in C. gigas, so the reason for the increased domain count we observed is unknown. On the other hand, the L. polyphemus genome has been reported to contain many paralogues of genes that typically exist as single copies in other species, as a result of one or more rounds of whole genome duplication (Kenny et al., 2016). Thus, the increase in domain counts we observe may result from duplication of scaffolding protein genes. Consistent with this, we identified four RIM homologues in L. polyphemus, whereas only one or two were identified in other arthropods such as Drosophila melanogaster and Centruroides sculpturatus.


Because our analysis only surveys counts of a few protein domains of interest, we cannot evaluate the complexity of domain architecture, such as domain copy number per gene or the diversity of sequence patterns and domain subtypes. We seek only to provide a focused view of the expansion pattern of scaffolding domain counts, and we do see remarkable enrichment of PDZ, SH3, C2, Rho-GAP, guanylate kinase, and CaMK domains in the chordates, accompanying the increasing complexity of animal biology and nervous system evolution. Notably, previous proteome profiling of T. adhaerens revealed a dramatic increase in tyrosine phosphorylation, suggesting a high level of tyrosine-regulated signaling in the animal compared to mammalian species (Ringrose et al., 2013). This suggests that the increase in SH3 domains we observed in mammals relative to T. adhaerens is not directly indicative of the relative levels of phosphotyrosine signaling in the two groups, and that the roles of SH3 domains in ligand binding and protein-protein interactions should also be considered. Interestingly, the GK domain, originally catalytically active, underwent a gradual loss of catalytic activity and evolved into a protein-protein interaction module in important scaffolding proteins such as the MAGUK family (Kuhlendahl, Spangenberg, Konrad, Kim, & Garner, 1998; Te Velthuis, Admiraal, & Bagowski, 2007).


Chapter 3 Discovery of a novel Rab3-Interacting Molecule (RIM) Type Provides Insight into the Early Molecular Machinery of Nervous System Evolution

3.1 A Brief Description of the RIM Protein

The Rab3-interacting molecule (RIM) was initially identified as a Rab3 effector that binds GTP-bound Rab3 on synaptic vesicles (Wang, Okamoto, Schmitz, Hofmann, & Südhof, 1997). RIM is an important scaffolding protein at the presynaptic active zone and is critical for proper neurotransmission through its interactions with multiple binding partners, including Rab3/27 (Wang et al., 1997; Gracheva, Hadwiger, Nonet, & Richmond, 2008), Munc-13 (Betz et al., 2001), SNAP25 (Coppola et al., 2001), synaptotagmin-I (Coppola et al., 2001), and potentially presynaptic calcium channels (Kaeser et al., 2011). Notably, homologues of all of these proteins were identified in the T. adhaerens transcriptome with highly conserved domain compositions (see section 2.2.9), suggesting the possibility of a similar interaction network despite the absence of a structural synapse in T. adhaerens.

In mice and D. melanogaster, deletion or mutation of RIM reduced synaptic release and disrupted the localization of presynaptic voltage-gated calcium channels at release sites (Calakos, Schoch, Südhof, & Malenka, 2004; Graf et al., 2012; Kaeser et al., 2011). Interestingly, rescue experiments in RIM-deficient mice showed that introducing the RIM N-terminus reversed the reduction in synaptic release, while the RIM C-terminus (including the RIM PDZ domain) rescued the reduction in calcium channel localization (Kaeser et al., 2011). It has therefore been proposed that RIM tethers synaptic vesicles close to the exocytotic machinery of the presynaptic active zone and to presynaptic calcium channels. Yet whether the proposed interaction between the RIM PDZ domain and a putative PDZ-ligand motif at the C-terminus of N- or P/Q-type calcium channels (CaV2) occurs endogenously at the presynapse is still under debate, owing to the lack of experimental support for a direct protein-protein interaction between the two (Gardezi, Li, & Stanley, 2013; Wong & Stanley, 2010).


3.2 Methods

3.2.1 Phylogenetic Analyses and Annotations

RIM and rabphilin-3a (Rph3a) protein sequences were mainly obtained from the NCBI protein database or from available transcriptome or genome databases for species selected to represent each major phylum (for accessions, see Appendix III). Sequences that were too fragmented were either discarded or manually assembled with the guidance of available genome scaffolds and raw sequence read data for selected species (for accessions, see Appendix IV).

RIM and Rph3a protein sets were aligned using MUSCLE v3.8.31. The aligned RIM and Rph3a sequences were trimmed with trimAl v1.2rev59 using a gap threshold (gt) of 0.50, followed by manual trimming. Trimmed alignments for both trees were used for phylogenetic model selection (selected model: VT + F + G4) and maximum likelihood tree construction with 1000 bootstrap replicates using IQ-TREE (multicore v1.6.1). A Bayesian phylogenetic tree was constructed using MrBayes v3.2.6 (Huelsenbeck & Ronquist, 2001) (model: WAG + F + G, predicted using MEGA-CC (Kumar, Stecher, Peterson, & Tamura, 2012)). Bayesian posterior probability scores are annotated on the maximum likelihood trees at nodes where the two trees share a consensus branching pattern. The RIM Bayesian tree was run for 10,000,000 generations (ngen), with a sampling frequency (samplefreq) of 100 and a burn-in fraction (burninfrac) of 0.25.
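The steps above can be summarized as command lines. The Python sketch below only assembles argument lists and a MrBayes command block; it does not run any tool. Several details are assumptions rather than facts from the text: the file names are hypothetical, the IQ-TREE binary is assumed to be called `iqtree`, `-m MFP` assumes that IQ-TREE's ModelFinder performed the model selection, and because the text does not say whether standard (`-b`) or ultrafast (`-bb`) bootstrapping was used, that choice is left as a parameter. The MrBayes lines cover WAG + G only; the +F component is not represented.

```python
def ml_commands(fasta, prefix, bootstrap_flag="-b", threads="AUTO"):
    """Argument lists for alignment, trimming and maximum likelihood steps.

    bootstrap_flag: "-b" (standard) or "-bb" (ultrafast); which one was
    actually used is an assumption left to the caller.
    """
    if bootstrap_flag not in ("-b", "-bb"):
        raise ValueError("bootstrap_flag must be '-b' or '-bb'")
    aligned = f"{prefix}.afa"
    trimmed = f"{prefix}.trim.afa"
    return [
        # MUSCLE v3.8 syntax uses -in / -out.
        ["muscle", "-in", fasta, "-out", aligned],
        # trimAl: keep columns where at least 50% of sequences have no gap.
        ["trimal", "-in", aligned, "-out", trimmed, "-gt", "0.5"],
        # IQ-TREE 1.6: ModelFinder (MFP), tree search, 1000 bootstrap replicates.
        ["iqtree", "-s", trimmed, "-m", "MFP", bootstrap_flag, "1000", "-nt", threads],
    ]


def mrbayes_commands(ngen=10_000_000, samplefreq=100, burninfrac=0.25):
    """MrBayes command block for a fixed WAG rate matrix with gamma rates."""
    return "\n".join([
        "prset aamodelpr=fixed(wag);",
        "lset rates=gamma;",
        f"mcmc ngen={ngen} samplefreq={samplefreq} relburnin=yes burninfrac={burninfrac};",
    ])


def retained_samples(ngen=10_000_000, samplefreq=100, burninfrac=0.25):
    """Approximate number of samples per run kept after relative burn-in."""
    samples = ngen // samplefreq
    return samples - round(samples * burninfrac)


for cmd in ml_commands("rim_rph3a.fasta", "rim"):
    print(" ".join(cmd))
print(mrbayes_commands())
print(retained_samples())
```

With the stated settings, each run records about 10,000,000 / 100 = 100,000 samples, of which roughly 25,000 are discarded as burn-in, leaving roughly 75,000 per run for summarizing posterior probabilities. Each argument list could be passed to `subprocess.run(cmd, check=True)` if the tools are installed.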

3.2.2 Quantitative PCR for RIM Types I and II in Lymnaea stagnalis

Quantitative PCR for RIM in Lymnaea stagnalis was performed by Julia Gauberg, a Ph.D. student in the Senatore Lab.

Central nervous system (CNS), heart, prostate, albumen gland, and buccal mass tissues were dissected from adult L. stagnalis (shell length ≥ 2 cm) anesthetized with Listerine. For each tissue type, 7-8 animals were used, and samples were pooled to give three distinct biological replicates. Total RNA was extracted from each tissue sample using TRI Reagent (Sigma-Aldrich, USA) according to the manufacturer's instructions. Extracted RNA was treated with DNase I (Thermo Fisher Scientific, USA), and first-strand cDNA was synthesized using SuperScript™ IV Reverse Transcriptase (Thermo Fisher Scientific, USA) and Oligo(dT)20 primers.


Gene-specific primers designed for L. stagnalis RIM I (Accession: FX186940.1), RIM II (Accession: FX181400.1), and EF-1α (Accession: DQ278441.1) were used for both RT-PCR and qPCR (see Appendix I). RT-PCR was performed under the following conditions: 1 cycle of initial denaturation (95 °C, 30 sec); 30 cycles of denaturation (95 °C, 15 sec), annealing (61 °C, 15 sec), and extension (72 °C, 60 sec); and a final extension step (72 °C, 5 min). RT-PCR amplicons were visualized by 1.5% agarose gel electrophoresis.

Abundance of RIM I, RIM II, and EF-1α mRNA was examined using the SensiFAST™ SYBR No-ROX Kit (Bioline Reagents Limited, United Kingdom) under the following conditions: 1 cycle of initial denaturation (95 °C, 2 min), followed by 40 cycles of denaturation (95 °C, 5 sec) and annealing/extension (61 °C, 30 sec). A melting curve was constructed after each qPCR run to ensure that a single product was synthesized in each reaction. For all qPCR analyses, transcript abundance was normalized to EF-1α transcript abundance, after statistical analysis determined that EF-1α expression did not vary significantly (P > 0.2) across experimental conditions. Primer efficiency was confirmed to be between 90% and 110% before use.
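Primer efficiency is conventionally estimated from a standard curve of Ct against log10 template input across a dilution series, using E = 10^(-1/slope) - 1, where a slope of about -3.32 corresponds to 100% efficiency (a doubling of product each cycle). The text does not describe how efficiency was measured, so the Python sketch below is an illustration of that standard approach using hypothetical Ct values, not the procedure actually used.

```python
def standard_curve_slope(log10_input, ct_values):
    """Least-squares slope of Ct versus log10(template input)."""
    n = len(log10_input)
    mean_x = sum(log10_input) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_input, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_input)
    return sxy / sxx


def efficiency_from_slope(slope):
    """Amplification efficiency as a fraction (1.0 means 100%)."""
    return 10 ** (-1 / slope) - 1


def passes_efficiency_check(slope, low=0.90, high=1.10):
    """True if the efficiency falls inside the 90-110% acceptance window."""
    return low <= efficiency_from_slope(slope) <= high


# Hypothetical 10-fold dilution series (log10 input 0..4) with made-up Ct values.
log_input = [0, 1, 2, 3, 4]
ct = [30.1, 26.8, 23.5, 20.2, 16.9]
slope = standard_curve_slope(log_input, ct)
print(f"slope = {slope:.2f}, efficiency = {efficiency_from_slope(slope):.1%}")
print(passes_efficiency_check(slope))
```

A slope of -3.0, for example, corresponds to roughly 115% efficiency and would fail this window.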

Transcript abundance of RIM I and RIM II in L. stagnalis buccal mass, CNS, heart, and prostate was expressed relative to albumen gland abundance, which was set to 100%. The delta-delta CT method was used to compare normalized abundance between genes. All data were expressed as mean values ± SEM (n = 3). Significant differences (P < 0.05) were determined by one-way ANOVA and are indicated with lowercase letters. Statistical analyses were conducted using SigmaPlot 12.5 (Systat Software Inc., San Jose, CA, USA).
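The relative-abundance calculation described above can be sketched in Python. Under the delta-delta CT method, each sample's target Ct is first normalized to the reference gene (ΔCt = Ct of the target minus Ct of EF-1α), then compared to the calibrator tissue (ΔΔCt = sample ΔCt minus albumen ΔCt), and relative abundance is 2^(-ΔΔCt) × 100%. The Ct values below are hypothetical, and using the mean albumen ΔCt as the calibrator is an assumption about how replicates were combined.

```python
from statistics import mean, stdev


def delta_ct(ct_target, ct_reference):
    """Normalize a target Ct to the reference gene (here EF-1 alpha)."""
    return ct_target - ct_reference


def relative_abundance(replicates, calibrator_replicates):
    """Percent abundance of each replicate relative to the calibrator tissue.

    Each replicate is a (ct_target, ct_reference) pair; the calibrator's
    mean delta Ct corresponds to 100%.
    """
    calib = mean(delta_ct(t, r) for t, r in calibrator_replicates)
    return [100 * 2 ** -(delta_ct(t, r) - calib) for t, r in replicates]


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / len(values) ** 0.5


# Hypothetical Ct values for three biological replicates per tissue.
albumen = [(28.0, 20.0), (28.2, 20.1), (27.9, 19.9)]
cns = [(26.0, 20.0), (26.1, 20.2), (25.8, 19.8)]
cns_rel = relative_abundance(cns, albumen)
m, sem = mean_sem(cns_rel)
print(f"CNS: {m:.0f}% +/- {sem:.0f}% of albumen")
```

The 2^(-ΔΔCt) form assumes close to 100% amplification efficiency for both target and reference primers, which is why the 90-110% efficiency check matters. A tissue whose ΔCt is two cycles lower than the albumen calibrator comes out at 400%.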

3.3 Results

3.3.1 Discovery of a Phylogenetically Novel Class of the Pre-synaptic Scaffolding Protein RIM

Given the importance of RIM for the organization and function of neuronal presynaptic terminals, we carried out a comprehensive annotation of this gene in T. adhaerens and in representative organisms from the major animal phyla. Our T. adhaerens transcriptome annotations (see section 2.2.9) revealed two RIM homologues in the animal: one with the canonical domain composition (an N-terminal Zn-finger, a PDZ domain, and two C-terminal C2 domains), and another lacking a PDZ domain, resembling the related protein Rph3a (Figure 10). Maximum likelihood and Bayesian phylogenetic trees inferred from protein alignments revealed two clades of RIM genes distinct from Rph3a, which we term RIM type I and RIM type II (Figure 11). Notably, the RIM genes associated with synaptic scaffolding in mice (phylum: Chordata), D. melanogaster (phylum: Arthropoda), and C. elegans (phylum: Nematoda) all fall within the RIM type I clade, which is represented in nearly all animal phyla, while RIM type II genes were identified in a subset of phyla, including the bilaterian phyla Arthropoda, Mollusca, Nemertea, and Brachiopoda. Interestingly, both the representative nemertean (a bilaterian) and the early-diverging phylum Ctenophora were found to possess only a RIM type II gene and not type I, making them the only animals in our analysis that have synapses and a nervous system but lack RIM type I.

Figure 10 – Domain Compositions of T. adhaerens Type I RIM (Tad-RIM-I), Type II RIM (Tad-RIM-II), and Rabphilin-3a (Rph3a) – N-terminal Zn-finger domain, PDZ domain, C-terminal C2 domain.

RIM type I protein sequences also appear, on average, to be longer than RIM type II sequences, and both RIM types were significantly longer than Rph3a homologues, which may have implications for the spatial arrangement of protein complexes involving these proteins. Also, consistent with a recent study (Paps & Holland, 2018), we were unable to find either RIM or Rph3a genes in non-metazoan organisms, and all animal phyla were found to possess at least one RIM homologue.

The T. adhaerens RIM type I (Tad-RIM-I) is one of only two RIM sequences in our phylogenetic analysis that lack a PDZ domain, the other being an O. carmela RIM. In contrast, the ctenophore RIM type II gene lacks a canonical Zn-finger domain, which mediates an important interaction with Munc-13, a critical player in the regulated exocytosis of vesicles at nerve terminals.
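The domain-architecture comparisons above amount to checking each protein against the canonical RIM layout (an N-terminal Zn-finger, a PDZ domain, and two C-terminal C2 domains). A minimal Python sketch of that check follows. The function name and input format (a list of domain labels ordered from N- to C-terminus) are hypothetical, the example architectures are simplified illustrations loosely based on the descriptions in the text rather than annotated sequences, and domain order is not checked.

```python
CANONICAL_RIM = ["Zn-finger", "PDZ", "C2", "C2"]


def missing_rim_domains(domains):
    """Return canonical RIM domains absent from a protein's domain list.

    domains: domain labels ordered from N- to C-terminus. Each canonical
    domain is matched at most once, so a protein with a single C2 domain
    is reported as missing one C2.
    """
    remaining = list(domains)
    missing = []
    for domain in CANONICAL_RIM:
        if domain in remaining:
            remaining.remove(domain)
        else:
            missing.append(domain)
    return missing


# Simplified, illustrative architectures (not annotated sequences).
examples = {
    "canonical RIM": ["Zn-finger", "PDZ", "C2", "C2"],
    "Tad-RIM-I-like (no PDZ)": ["Zn-finger", "C2", "C2"],
    "ctenophore RIM-II-like (no Zn-finger)": ["PDZ", "C2", "C2"],
}
for label, architecture in examples.items():
    print(label, "->", missing_rim_domains(architecture) or "complete")
```

Reporting what is missing, rather than a yes/no match, keeps cases such as "lacks PDZ" (Rph3a-like) and "lacks Zn-finger" distinguishable.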

3.3.2 Differential Tissue Expression of RIM Types I and II in Lymnaea stagnalis

Quantitative PCR (qPCR) for the RIM type I and II homologues of L. stagnalis revealed enrichment of RIM type I (Lym-RIM-I) in the central nervous system (CNS) compared to the heart and buccal mass (Figure 12). RIM type II (Lym-RIM-II), on the other hand, was enriched in the prostate gland compared to the CNS, heart, albumen gland, and buccal mass. Expression in the heart relative to the albumen gland was also much greater for Lym-RIM-I than for Lym-RIM-II.


Figure 11 – Phylogeny of Rab3-Interacting Molecule (RIM) and Rabphilin-3a (Rph3a). A maximum likelihood phylogenetic tree was constructed with the VT model (+F, +G4). Bootstrap support values are indicated at branch nodes. Posterior probability (pp) value ranges from a Bayesian tree constructed with the WAG model (+F, +G) are colour-coded and indicated at branch nodes. Green bars represent protein length (amino acids). Sequence completeness is indicated by the presence or absence of start and stop codons (depicted by triangles at the beginning and end of each green bar). The domain composition of each protein is indicated on its green bar.


Figure 12 – Differential Tissue Expression of RIM Types I and II in Lymnaea stagnalis


3.4 Discussion

3.4.1 Novel RIM Type

Our analyses of RIM homologues revealed two phylogenetically distinct RIM types. Notably, all RIM proteins that have been experimentally studied, namely in mammalian species (phylum: Chordata), C. elegans (phylum: Nematoda), and D. melanogaster (phylum: Arthropoda) (Calahorro & Izquierdo, 2018; Graf et al., 2012; Kaeser et al., 2011), belong to the RIM Type I clade. RIM Type II therefore represents a novel, unstudied set of RIM proteins. That T. adhaerens is one of few species bearing RIM homologues from both clades suggests that the duplication event that gave rise to the two RIM types likely occurred early in animal evolution, prior to the divergence of T. adhaerens, and that one or the other RIM type was subsequently lost independently in numerous animal lineages, and in some cases duplicated, as occurred in vertebrates and some arthropods.

One of the most interesting observations concerns the most early-diverging phyla. As mentioned in earlier chapters, one of the most controversial debates in nervous system evolution concerns the relative positions of Ctenophora and Porifera in the animal phylogeny (Dunn et al., 2008; Jékely et al., 2015; Paps & Holland, 2018). Ctenophora and Nemertea are the only phyla whose single RIM representative is a Type II RIM, whereas the poriferan species O. carmela has two RIM homologues that are both Type I (Figure 11). If Ctenophora is indeed the most basal animal phylum, RIM Type II could represent the more ancient RIM type, which duplicated to form Type I RIMs following the divergence of ctenophores; poriferans then lost Type II RIMs, while T. adhaerens kept both. If Porifera is basal to Ctenophora, the opposite scenario may hold true: RIM Type I is ancestral, the duplication occurred following the divergence of Porifera, and ctenophores lost RIM Type I. However, there are still many possible explanations, including convergent evolution of individual RIMs causing distantly related RIM homologues to cluster into the same clade.

Deletions or mutations of the RIM type I protein in D. melanogaster, C. elegans, and mice have been shown to disrupt the calcium dependence of vesicular release at the presynaptic active zone (Gracheva et al., 2008; Graf et al., 2012; Kaeser et al., 2011), suggesting that RIM is an ancient element of the primordial synapse, dating back at least to before the divergence of these animal phyla roughly 540 million years ago. While its N-terminus binds Rab3 and Munc-13 for docking and priming of synaptic vesicles, many studies have also suggested that the RIM PDZ domain plays a significant role by interacting with a putative PDZ-ligand motif at the CaV2 C-terminus (Gardezi et al., 2013; Kaeser et al., 2011; Kiyonaka et al., 2007; Wang et al., 1997), though this interaction still requires further experimental validation (Gardezi et al., 2013; Khanna, Li, Sun, Collins, & Stanley, 2006; Wong et al., 2014). Nevertheless, RIM has recently been suggested to be a metazoan-specific innovation (Paps & Holland, 2018), further supporting its functional significance in animal biology, and possibly in nervous system evolution. Yet all of the experimentally characterized homologues are Type I RIMs, and the functional role of Type II RIMs remains unexplored (Figure 11). It is also interesting to note that all animals have at least one RIM homologue (Paps & Holland, 2018), including those that lack synapses and a nervous system, suggesting that RIM has a fundamental role unique to animal biology. The observation that many species possess multiple copies of RIM could suggest some level of functional redundancy, or possible sub-functionalization toward different biological processes in animal physiology.

3.4.2 Tissue-Specific Differential Expression of RIM Types in L. stagnalis

In T. adhaerens, Tad-RIM-II, which has a PDZ domain, shows much higher TPM expression than Tad-RIM-I, which lacks one (Figure 12); whether this difference in expression is related to the difference in domain composition remains to be determined. Yet, for animals with two RIM types that both bear the full canonical RIM domain architecture, what could be the functional differentiation between the two? Our qPCR results for RIM types I and II from the mollusc L. stagnalis suggest that RIM type I is enriched in the CNS, while RIM type II is enriched in the prostate gland. As part of the reproductive system, the prostate gland of L. stagnalis is important for the secretion of regulatory peptides into seminal fluid to ensure reproductive fitness (Koene et al., 2010). Since the two RIM types date back to animals that lack canonical neurons and synapses, we hypothesize that duplication of RIM led to sub-functionalization, with type I RIM specializing in refined synaptic vesicle release, while type II RIM adopted a more generalized secretory function that likely resembles the primitive role of the ancestral RIM homologue. Furthermore, for both RIM types, expression in the CNS is significantly higher than in the heart. This may reflect the heart's greater reliance on gap junctions for efficient ion flow during cardiac impulse propagation, compared with neuromuscular junctions and chemical synapses in the CNS (Desplantez, Dupont, Severs, & Weingart, 2007), consistent with the hypothesis that both RIM types are still involved in secretory functions.


Chapter 4 General Conclusions and Future Directions

The rise of the nervous system provided a mechanism for cells to communicate with one another in a precise and efficient manner as animal multicellularity emerged and evolved (Leys & Meech, 2006). Interestingly, despite lacking a canonically defined nervous system, T. adhaerens is capable of interacting with its environment, and expresses a substantial number of genes important for neural signaling in the well-characterized nervous systems of other animals. Our in-depth gene annotation, with its focus on various properties of neuronal function, provides insight into the animal's biology. Although no electrical activity has ever been reported in T. adhaerens cells, the rich repertoire of voltage-gated ion channels in the transcriptome strongly suggests that the animal uses some form of electrical signaling. Abundant expression of secretory peptides, secretory machinery, and GPCRs also supports the notion that the animal possesses the fundamental apparatus for peptidergic signaling.

Here, we present a high-quality transcriptome of T. adhaerens that can serve as a useful resource for thorough gene search and annotation, contributing roughly 2,500 novel T. adhaerens genes that improve the completeness of existing gene sets used for large-scale phylogenetic analyses and cell-type-specific gene annotations. From the T. adhaerens transcriptome, we were able to identify numerous secreted proteins and regulatory peptide and neuropeptide sequences not previously predicted from the genome (Nikitin, 2015; Srivastava et al., 2008). Our documentation of the T. adhaerens secretome and GPCRome could guide future deorphanization experiments (Bauknecht & Jékely, 2015), which, combined with localization studies of ligand-receptor pairs, could map out communication networks between cells or cell types in the animal. The high proportion of complete protein-coding sequences in the transcriptome also enables synthesis or in vitro expression of T. adhaerens genes of interest for experiments such as proteomic analyses of potential protein-protein interactions or interaction networks present in the animal. As previously mentioned, we report T. adhaerens to be the most early-diverging animal to express a homologue of the cell-adhesion protein neuroligin. Mouse neuroligin expressed in non-neuronal cells was observed to induce synaptic morphology in contacting cells (Scheiffele et al., 2000). Future experiments could apply this approach to the T. adhaerens neuroligin and neurexin homologues, to test whether proteins of this early-diverging animal are also capable of inducing the formation of synaptic morphology, despite the absence of canonical synaptic structures in the animal itself.


The identification of two RIM homologues in the T. adhaerens transcriptome also led to an extended phylogenetic analysis of RIM proteins across phyla, revealing a novel, unexplored RIM type. Our quantitative PCR experiment showed tissue-specific differential expression of the two RIM types in the mollusc L. stagnalis, suggesting sub-functionalization of the two RIM types. Future studies could investigate the localization of the two RIM homologues in T. adhaerens to test whether tissue- or cell-type-specific differentiation in the expression of the two RIM types arose early in animal evolution. It would be interesting to see whether the absence of a PDZ domain in Tad-RIM-I has implications for the protein's functional role, as TPM values suggest relatively low whole-animal abundance of Tad-RIM-I compared to Tad-RIM-II, which has a PDZ domain.

As mentioned earlier, RIM has recently been suggested to be one of the 25 groups of genes unique to all animals (Paps & Holland, 2018), and T. adhaerens is the most early-diverging animal to bear homologues of both RIM types (Ctenophora only has Type II RIMs, while Porifera only has Type I RIMs). The unique position of T. adhaerens among the most early-diverging animals, its lack of canonical neurons and synapses, and its expression of both types of the 'presynaptic' scaffolding protein RIM further support T. adhaerens as a key model for studying the evolution of the nervous system.

Lastly, with advances in sequencing and computational technology, a new approach of identifying cell types and tissue of origin via single-cell RNA-Seq is emerging (Achim et al., 2015; Rotem et al., 2015). Large-scale sampling of cells, combined with localization and physiological approaches, can resolve spatially and functionally distinct cell types at higher resolution (Achim et al., 2015; Fuzik et al., 2016; Rizvi et al., 2017). Future applications of more refined, high-resolution characterization of cell types, such as neuron-like cell types in 'aneural' species, may better capture transitional states of cell types from being non-neuronal to exhibiting neuronal properties, such as the capability to form structural synaptic connections, in animal and nervous system evolution.


References Achim, K., Pettit, J. B., Saraiva, L. R., Gavriouchkina, D., Larsson, T., Arendt, D., & Marioni, J. C. (2015). High-throughput spatial mapping of single-cell RNA-seq data to tissue of origin. Nature Biotechnology, 33(5), 503–509. https://doi.org/10.1038/nbt.3209 Assmann, M., Kuhn, A., Dürrnagel, S., Holstein, T. W., & Gründer, S. (2014). The comprehensive analysis of DEG/ENaC subunits in Hydra reveals a large variety of peptide- gated channels, potentially involved in neuromuscular transmission. BMC Medicine, 12(1), 1–14. https://doi.org/10.1186/s12915-014-0084-2 Bairoch, A. (2000). The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Research, 28(1), 45–48. https://doi.org/10.1093/nar/28.1.45 Bang, M. L., & Owczarek, S. (2013). A matter of balance: Role of neurexin and neuroligin at the synapse. Neurochemical Research, 38(6), 1174–1189. https://doi.org/10.1007/s11064-013- 1029-9 Bauknecht, P., & Jékely, G. (2015). Large-scale combinatorial deorphanization of platynereis neuropeptide GPCRs. Cell Reports, 12(4), 684–693. https://doi.org/10.1016/j.celrep.2015.06.052 Bayés, Á., Van De Lagemaat, L. N., Collins, M. O., Croning, M. D. R., Whittle, I. R., Choudhary, J. S., & Grant, S. G. N. (2011). Characterization of the proteome, diseases and evolution of the human postsynaptic density. Nature Neuroscience, 14(1), 19–21. https://doi.org/10.1038/nn.2719 Betz, A., Thakur, P., Junge, H. J., Ashery, U., Rhee, J. S., Scheuss, V., … Brose, N. (2001). Functional interaction of the active zone proteins Munc13-1 and RIM1 in synaptic vesicle priming. Neuron, 30(1), 183–196. https://doi.org/10.1016/S0896-6273(01)00272-0 Blake, J. A., Christie, K. R., Dolan, M. E., Drabkin, H. J., Hill, D. P., Ni, L., … Westerfeld, M. (2015). Gene ontology consortium: Going forward. Nucleic Acids Research, 43(D1), D1049–D1056. https://doi.org/10.1093/nar/gku1179 Bockaert, J., & Pin, J. P. (1999). 
Molecular tinkering of G protein-coupled receptors: an evolutionary success. The EMBO Journal, 18(7), 1723–1729. https://doi.org/10.1093/emboj/18.7.1723 Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), 2114–2120. https://doi.org/10.1093/bioinformatics/btu170 Buchwalow, I. B., Podzuweit, T., Böcker, W., Samoilova, V. E., Thomas, S., Wellner, M., … Lerch, M. M. (2002). Vascular smooth muscle and nitric oxide synthase. FASEB Journal, 16(6). https://doi.org/10.1096/fj.01-0842com Burkhardt, P., Stegmann, C. M., Cooper, B., Kloepper, T. H., Imig, C., Varoqueaux, F., … Fasshauer, D. (2011). Primordial neurosecretory apparatus identified in the choanoflagellate Monosiga brevicollis. Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.1106189108 Calahorro, F., & Izquierdo, P. G. (2018). The presynaptic machinery at the synapse of C. elegans. Invertebrate Neuroscience, 18(2), 1–13. https://doi.org/10.1007/s10158-018-0207- 5 Calakos, N., Schoch, S., Südhof, T. C., & Malenka, R. C. (2004). Multiple roles for the active zone protein RIM1alpha in late stages of neurotransmitter release. Neuron, 42(6), 889–896. https://doi.org/10.1016/j.neuron.2004.05.014 Capella-Gutiérrez, S., Silla-Martínez, J. M., & Gabaldón, T. (2009). trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics, 25(15),

55

1972–1973. https://doi.org/10.1093/bioinformatics/btp348 Carr, M., Leadbeater, B. S. C., Hassan, R., Nelson, M., & Baldauf, S. L. (2008). Molecular phylogeny of choanoflagellates, the sister group to Metazoa. Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.0801667105 Chen, C.-H., Ferreira, J. C. B., Gross, E. R., & Mochly-Rosen, D. (2014). Targeting Aldehyde Dehydrogenase 2: New Therapeutic Opportunities. Physiological Reviews, 94(1), 1–34. https://doi.org/10.1152/physrev.00017.2013 Chen, X., Winters, C., Azzam, R., Li, X., Galbraith, J. A., Leapman, R. D., & Reese, T. S. (2008). Organization of the core structure of the postsynaptic density. Proceedings of the National Academy of Sciences, 105(11), 4453–4458. https://doi.org/10.1073/pnas.0800897105 Choi, Y. B., Li, H. L., Kassabov, S. R., Jin, I., Puthanveettil, S. V., Karl, K. A., … Kandel, E. R. (2011). Neurexin-Neuroligin Transsynaptic Interaction Mediates Learning-Related Synaptic Remodeling and Long-Term Facilitation in Aplysia. Neuron, 70(3), 468–481. https://doi.org/10.1016/j.neuron.2011.03.020 Churamani, D., Hooper, R., Brailoiu, E., & Patel, S. (2012). Domain assembly of NAADP-gated two-pore channels. Biochemical Journal, 441(1), 317–323. https://doi.org/10.1042/BJ20111617 Coppola, T., Magnin-Lüthi, S., Perret-Menoud, V., Gattesco, S., Schiavo, G., & Regazzi, R. (2001). Direct Interaction of the Rab3 Effector RIM with Ca2+ Channels, SNAP-25, and Synaptotagmin. Journal of Biological Chemistry, 276(35), 32756–32762. https://doi.org/10.1074/jbc.M100929200 Cossenza, M., Socodato, R., Portugal, C. C., Domith, I. C. L., Gladulich, L. F. H., Encarnação, T. G., … Paes-de-Carvalho, R. (2014). Nitric oxide in the nervous system. Biochemical, developmental, and neurobiological aspects. Vitamins and Hormones (Vol. 96). https://doi.org/10.1016/B978-0-12-800254-4.00005-2 Danglot, L., Triller, A., & Bessis, A. (2003). 
Association of gephyrin with synaptic and extrasynaptic GABAA receptors varies during development in cultured hippocampal neurons. Molecular and Cellular Neuroscience, 23(2), 264–278. https://doi.org/10.1016/S1044-7431(03)00069-1 Dejanovic, B., & Schwarz, G. (2014). Neuronal Nitric Oxide Synthase-Dependent S- Nitrosylation of Gephyrin Regulates Gephyrin Clustering at GABAergic Synapses. Journal of Neuroscience, 34(23), 7763–7768. https://doi.org/10.1523/JNEUROSCI.0531-14.2014 Desplantez, T., Dupont, E., Severs, N. J., & Weingart, R. (2007). Gap junction channels and cardiac impulse propagation. Journal of Membrane Biology, 218(1–3), 13–28. https://doi.org/10.1007/s00232-007-9046-8 Dunn, C. W., Hejnol, A., Matus, D. Q., Pang, K., Browne, W. E., Smith, S. A., … Giribet, G. (2008). Broad phylogenomic sampling improves resolution of the animal tree of life. Nature, 452(7188), 745–749. https://doi.org/10.1038/nature06614 Edgar, R. C. (2004). MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792–1797. https://doi.org/10.1093/nar/gkh340 Eitel, M., Francis, W. R., Osigus, H.-J., Krebs, S., Vargas, S., Blum, H., … Worheide, G. (2017). A taxogenomics approach uncovers a new genus in the phylum Placozoa. BioRxiv, 0–47. https://doi.org/10.1101/202119 Elias, G. M., & Nicoll, R. A. (2007). Synaptic trafficking of glutamate receptors by MAGUK scaffolding proteins. Trends in Cell Biology, 17(7), 343–352. https://doi.org/10.1016/j.tcb.2007.07.005 Elliott, G. R. D., & Leys, S. P. (2010). Evidence for glutamate, GABA and NO in coordinating

56

behaviour in the sponge, Ephydatia muelleri (Demospongiae, Spongillidae). Journal of Experimental Biology. https://doi.org/10.1242/jeb.039859 Ellwanger, K., Eich, A., & Nickel, M. (2007). GABA and glutamate specifically induce contractions in the sponge Tethya wilhelma. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 193(1), 1–11. https://doi.org/10.1007/s00359-006-0165-y Ellwanger, K., & Nickel, M. (2006). Neuroactive substances specifically modulate rhythmic body contractions in the nerveless metazoon Tethya wilhelma (Demospongiae, Porifera). Frontiers in Zoology, 3, 7. https://doi.org/10.1186/1742-9994-3-7 Emanuelsson, O., Nielsen, H., Brunak, S., & Von Heijne, G. (2000). Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. Journal of Molecular Biology, 300(4), 1005–1016. https://doi.org/10.1006/jmbi.2000.3903 Emes, R. D., & Grant, S. G. N. (2012). Evolution of Synapse Complexity and Diversity. Annual Review of Neuroscience, 35(1), 111–131. https://doi.org/10.1146/annurev-neuro-062111- 150433 Ereskovsky, A. V., Richter, D. J., Lavrov, D. V., Schippers, K. J., & Nichols, S. A. (2017). Transcriptome sequencing and delimitation of sympatric Oscarella species (O. carmela and O. pearsei sp. nov) from California, USA. PLoS ONE, 12(9), 1–25. https://doi.org/10.1371/journal.pone.0183002 Fernandez-Valverde, S. L., Calcino, A. D., & Degnan, B. M. (2015). Deep developmental transcriptome sequencing uncovers numerous new genes and enhances gene annotation in the sponge Amphimedon queenslandica. BMC Genomics, 16(1), 1–11. https://doi.org/10.1186/s12864-015-1588-z Finn, R. D., Attwood, T. K., Babbitt, P. C., Bateman, A., Bork, P., Bridge, A. J., … Mitchell, A. L. (2017). InterPro in 2017-beyond protein family and domain annotations. Nucleic Acids Research, 45(D1), D190–D199. 
https://doi.org/10.1093/nar/gkw1107
Fredman, D., Schwaiger, M., Rentzsch, F., & Technau, U. (2013). nveGenes.vienna130208.fasta. Retrieved April 13, 2018, from https://figshare.com/articles/nveGenes_vienna130208_fasta/807694/1
Fredriksson, R., Lagerström, M. C., Lundin, L.-G., & Schiöth, H. B. (2003). The G-Protein-Coupled Receptors in the Human Genome Form Five Main Families. Phylogenetic Analysis, Paralogon Groups, and Fingerprints. Molecular Pharmacology, 63(6), 1256–1272. https://doi.org/10.1124/mol.63.6.1256
Furness, J. B., & Stebbing, M. J. (2018). The first brain: Species comparisons and evolutionary implications for the enteric and central nervous systems. Neurogastroenterology and Motility, 30(2), 1–6. https://doi.org/10.1111/nmo.13234
Fuzik, J., Zeisel, A., Mate, Z., Calvigioni, D., Yanagawa, Y., Szabo, G., … Harkany, T. (2016). Integration of electrophysiological recordings with single-cell RNA-seq data identifies neuronal subtypes. Nature Biotechnology, 34(2), 175–183. https://doi.org/10.1038/nbt.3443
Gao, D., Ko, D. C., Tian, X., Yang, G., & Wang, L. (2015). Expression divergence of duplicate genes in the protein kinase superfamily in Pacific oyster. Evolutionary Bioinformatics, 11, 57–65. https://doi.org/10.4137/EBO.S30230
Gardezi, S. R., Li, Q., & Stanley, E. F. (2013). Inter-channel scaffolding of presynaptic CaV2.2 via the C terminal PDZ ligand domain. Biology Open, 2(5), 492–498. https://doi.org/10.1242/bio.20134267
Gerdol, M., Venier, P., & Pallavicini, A. (2015). The genome of the Pacific oyster Crassostrea gigas brings new insights on the massive expansion of the C1q gene family in Bivalvia. Developmental and Comparative Immunology, 49(1), 59–71.
https://doi.org/10.1016/j.dci.2014.11.007
Golubovic, A., Kuhn, A., Williamson, M., Kalbacher, H., Holstein, T. W., Grimmelikhuijzen, C. J. P., & Gründer, S. (2007). A peptide-gated ion channel from the freshwater polyp Hydra. Journal of Biological Chemistry, 282(48), 35098–35103. https://doi.org/10.1074/jbc.M706849200
Götz, S., García-Gómez, J. M., Terol, J., Williams, T. D., Nagaraj, S. H., Nueda, M. J., … Conesa, A. (2008). High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Research, 36(10), 3420–3435. https://doi.org/10.1093/nar/gkn176
Gracheva, E. O., Hadwiger, G., Nonet, M. L., & Richmond, J. E. (2008). Direct interactions between C. elegans RAB-3 and Rim provide a mechanism to target vesicles to the presynaptic density. Neuroscience Letters, 444(2), 137–142. https://doi.org/10.1016/j.neulet.2008.08.026
Graf, E. R., Valakh, V., Wright, C. M., Wu, C., Liu, Z., Zhang, Y. Q., & DiAntonio, A. (2012). RIM Promotes Calcium Channel Accumulation at Active Zones of the Drosophila Neuromuscular Junction. Journal of Neuroscience, 32(47), 16586–16596. https://doi.org/10.1523/JNEUROSCI.0965-12.2012
Grimmelikhuijzen, C. J. P., & Hauser, F. (2012). Mini-review: The evolution of neuropeptide signaling. Regulatory Peptides, 177(Suppl.), S6–S9. https://doi.org/10.1016/j.regpep.2012.05.001
Haas, B. J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P. D., Bowden, J., … Regev, A. (2013). De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols, 8(8), 1494–1512. https://doi.org/10.1038/nprot.2013.084
Hamed, N. O., Al-Ayadhi, L., Osman, M. A., Elkhawad, A. O., Qasem, H., Al-Marshoud, M., … El-Ansary, A. (2018). Understanding the roles of glutamine synthetase, glutaminase, and glutamate decarboxylase autoantibodies in imbalanced excitatory/inhibitory neurotransmission as etiological mechanisms of autism.
Psychiatry and Clinical Neurosciences, 72(5), 362–373. https://doi.org/10.1111/pcn.12639
Hardingham, N., Dachtler, J., & Fox, K. (2013). The role of nitric oxide in pre-synaptic plasticity and homeostasis. Frontiers in Cellular Neuroscience, 7, 1–19. https://doi.org/10.3389/fncel.2013.00190
Hejnol, A., Obst, M., Stamatakis, A., Ott, M., Rouse, G. W., Edgecombe, G. D., … Dunn, C. W. (2009). Assessing the root of bilaterian animals with scalable phylogenomic methods. Proceedings of the Royal Society B: Biological Sciences, 276(1677), 4261–4270. https://doi.org/10.1098/rspb.2009.0896
Holló, G. (2015). A new paradigm for animal symmetry. Interface Focus, 5(6), 20150032. https://doi.org/10.1098/rsfs.2015.0032
Hosono, R., Hekimi, S., Kamiya, Y., Sassa, T., Murakami, S., Nishiwaki, K., … Kodaira, K.-I. (1992). The unc-18 Gene Encodes a Novel Protein Affecting the Kinetics of Acetylcholine Metabolism in the Nematode Caenorhabditis elegans. Journal of Neurochemistry, 58(4), 1517–1525. https://doi.org/10.1111/j.1471-4159.1992.tb11373.x
Huelsenbeck, J. P., & Ronquist, F. (2001). MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics, 17(8), 754–755. https://doi.org/10.1093/bioinformatics/17.8.754
Ishibashi, K., Suzuki, M., & Imai, M. (2000). Molecular cloning of a novel form (two-repeat) protein related to voltage-gated sodium and calcium channels. Biochemical and Biophysical Research Communications, 270(2), 370–376. https://doi.org/10.1006/bbrc.2000.2435
Jékely, G., Paps, J., & Nielsen, C. (2015). The phylogenetic position of ctenophores and the
origin(s) of nervous systems. EvoDevo. https://doi.org/10.1186/2041-9139-6-1
Jassal, B., Jupe, S., Caudy, M., Birney, E., Stein, L., Hermjakob, H., & D’Eustachio, P. (2010). The systematic annotation of the three main GPCR families in Reactome. Database: The Journal of Biological Databases and Curation, 2010, 1–13. https://doi.org/10.1093/database/baq018
Jones, E. G. (1999). Golgi, Cajal and the Neuron Doctrine. Journal of the History of the Neurosciences, 8(2), 170–178. https://doi.org/10.1076/jhin.8.2.170.1838
Jones, P., Binns, D., Chang, H. Y., Fraser, M., Li, W., McAnulla, C., … Hunter, S. (2014). InterProScan 5: Genome-scale protein function classification. Bioinformatics, 30(9), 1236–1240. https://doi.org/10.1093/bioinformatics/btu031
Kaeser, P. S., Deng, L., Wang, Y., Dulubova, I., Liu, X., Rizo, J., & Südhof, T. C. (2011). RIM proteins tether Ca2+ channels to presynaptic active zones via a direct PDZ-domain interaction. Cell, 144(2), 282–295. https://doi.org/10.1016/j.cell.2010.12.029
Käll, L., Krogh, A., & Sonnhammer, E. L. L. (2004). A combined transmembrane topology and signal peptide prediction method. Journal of Molecular Biology, 338(5), 1027–1036. https://doi.org/10.1016/j.jmb.2004.03.016
Kang, S., Jang, J. H., Price, M. P., Gautam, M., Benson, C. J., Gong, H., … Brennan, T. J. (2012). Simultaneous disruption of mouse ASIC1a, ASIC2 and ASIC3 genes enhances cutaneous mechanosensitivity. PLoS ONE, 7(4). https://doi.org/10.1371/journal.pone.0035225
Kenny, N. J., Chan, K. W., Nong, W., Qu, Z., Maeso, I., Yip, H. Y., … Hui, J. H. L. (2016). Ancestral whole-genome duplication in the marine chelicerate horseshoe crabs. Heredity, 116(2), 190–199. https://doi.org/10.1038/hdy.2015.89
Khanna, R., Li, Q., Sun, L., Collins, T. J., & Stanley, E. F. (2006). N type Ca2+ channels and RIM scaffold protein covary at the presynaptic transmitter release face but are components of independent protein complexes. Neuroscience, 140(4), 1201–1208.
https://doi.org/10.1016/j.neuroscience.2006.04.053
Khvotchev, M., Dulubova, I., Sun, J., Dai, H., Rizo, J., & Südhof, T. C. (2007). Dual Modes of Munc18-1/SNARE Interactions Are Coupled by Functionally Critical Binding to Syntaxin-1 N Terminus. Journal of Neuroscience, 27(45), 12147–12155. https://doi.org/10.1523/JNEUROSCI.3655-07.2007
Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., & Salzberg, S. L. (2013). TopHat2: Accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology, 14(4), R36. https://doi.org/10.1101/000851
Kim, E. Y., Schrader, N., Smolinsky, B., Bedet, C., Vannier, C., Schwarz, G., & Schindelin, H. (2006). Deciphering the structural framework of glycine receptor anchoring by gephyrin. The EMBO Journal, 25(6), 1385–1395. https://doi.org/10.1038/sj.emboj.7601029
Kim, J., Kim, I., Yang, J.-S., Shin, Y.-E., Hwang, J., Park, S., … Kim, S. (2012). Rewiring of PDZ Domain-Ligand Interaction Network Contributed to Eukaryotic Evolution. PLoS Genetics, 8(2), e1002510. https://doi.org/10.1371/journal.pgen.1002510
King, N., Westbrook, M. J., Young, S. L., Kuo, A., Abedin, M., Chapman, J., … Rokhsar, D. (2008). The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature, 451(7180), 783–788. https://doi.org/10.1038/nature06617
Kiyonaka, S., Wakamori, M., Miki, T., Uriu, Y., Nonaka, M., Bito, H., … Mori, Y. (2007). RIM1 confers sustained activity and neurotransmitter vesicle anchoring to presynaptic Ca2+ channels. Nature Neuroscience, 10(6), 691–701. https://doi.org/10.1038/nn1904
Kocot, K. M., Aguilera, F., McDougall, C., Jackson, D. J., & Degnan, B. M. (2016). Sea shell diversity and rapidly evolving secretomes: insights into the evolution of biomineralization.
Frontiers in Zoology, 13(1), 23. https://doi.org/10.1186/s12983-016-0155-z
Koene, J. M., Sloot, W., Montagne-Wajer, K., Cummins, S. F., Degnan, B. M., Smith, J. S., … ter Maat, A. (2010). Male accessory gland protein reduces egg laying in a simultaneous hermaphrodite. PLoS ONE, 5(4), 1–7. https://doi.org/10.1371/journal.pone.0010117
Krishnan, A., Almén, M. S., Fredriksson, R., & Schiöth, H. B. (2012). The origin of GPCRs: Identification of mammalian like rhodopsin, adhesion, glutamate and frizzled GPCRs in fungi. PLoS ONE, 7(1), 1–15. https://doi.org/10.1371/journal.pone.0029817
Krishnan, A., Dnyansagar, R., Almén, M. S., Williams, M. J., Fredriksson, R., Manoj, N., & Schiöth, H. B. (2014). The GPCR repertoire in the demosponge Amphimedon queenslandica: insights into the GPCR system at the early divergence of animals. BMC Evolutionary Biology, 14(1), 270. https://doi.org/10.1186/s12862-014-0270-4
Krogh, A., Larsson, B., Von Heijne, G., & Sonnhammer, E. L. L. (2001). Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. Journal of Molecular Biology, 305(3), 567–580. https://doi.org/10.1006/jmbi.2000.4315
Kuhlendahl, S., Spangenberg, O., Konrad, M., Kim, E., & Garner, C. C. (1998). Functional analysis of the guanylate kinase-like domain in the synapse-associated protein SAP97. European Journal of Biochemistry, 252(2), 305–313. https://doi.org/10.1046/j.1432-1327.1998.2520305.x
Kumar, S., Stecher, G., Peterson, D., & Tamura, K. (2012). MEGA-CC: Computing core of molecular evolutionary genetics analysis program for automated and iterative data analysis. Bioinformatics, 28(20), 2685–2686. https://doi.org/10.1093/bioinformatics/bts507
Kumar, S., Stecher, G., & Tamura, K. (2016). MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Molecular Biology and Evolution, 33(7), 1870–1874. https://doi.org/10.1093/molbev/msw054
Lagostena, L., Festa, M., Pusch, M., & Carpaneto, A. (2017).
The human two-pore channel 1 is modulated by cytosolic and luminal calcium. Scientific Reports, 7, 1–11. https://doi.org/10.1038/srep43900
Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4), 357–359. https://doi.org/10.1038/nmeth.1923
Laumer, C., Gruber-Vodicka, H., Hadfield, M. G., Pearse, V. B., Riesgo, A., Marioni, J. C., & Giribet, G. (2017). Placozoans are eumetazoans related to Cnidaria. BioRxiv, 1–29. https://doi.org/10.1101/200972
Leys, S. P., & Meech, R. W. (2006). Physiology of coordination in sponges. Canadian Journal of Zoology. https://doi.org/10.1139/z05-171
Li, B., & Dewey, C. N. (2011). RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics, 12. https://doi.org/10.1186/1471-2105-12-323
Li, W., & Godzik, A. (2006). Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics, 22(13), 1658–1659. https://doi.org/10.1093/bioinformatics/btl158
Lindsey, S. H., Carver, K. A., Prossnitz, E. R., & Chappell, M. C. (2011). Vasodilation in Response to the GPR30 Agonist G-1 is Not Different From Estradiol in the mRen2.Lewis Female Rat. Journal of Cardiovascular Pharmacology, 57(5), 598–603. https://doi.org/10.1097/FJC.0b013e3182135f1c
Luo, Y.-J., Kanda, M., Koyanagi, R., Hisata, K., Akiyama, T., Sakamoto, H., … Satoh, N. (2018). Nemertean and phoronid genomes reveal lophotrochozoan evolution and the origin of bilaterian heads. Nature Ecology & Evolution, 2(1), 141–151.
https://doi.org/10.1038/s41559-017-0389-y
Marder, E. (2012). Neuromodulation of Neuronal Circuits: Back to the Future. Neuron, 76(1), 1–11. https://doi.org/10.1016/j.neuron.2012.09.010
Marrone, G. F., Lu, Z., Rossi, G., Narayan, A., Hunkele, A., Marx, S., … Pasternak, G. W. (2016). Tetrapeptide Endomorphin Analogs Require Both Full Length and Truncated Splice Variants of the Mu Opioid Receptor Gene Oprm1 for Analgesia. ACS Chemical Neuroscience, 7(12), 1717–1727. https://doi.org/10.1021/acschemneuro.6b00240
Mayorova, T. D., Smith, C. L., Hammar, K., Winters, C. A., Pivovarova, N. B., Aronova, M. A., … Reese, T. S. (2018). Cells containing aragonite crystals mediate responses to gravity in Trichoplax adhaerens (Placozoa), an animal lacking neurons and synapses. PLoS ONE, 13(1), 1–20. https://doi.org/10.1371/journal.pone.0190905
Moreland, R., Nguyen, A.-D., Ryan, J. F., Schnitzler, C. E., Koch, B. J., Siewert, K., … Baxevanis, A. D. (2014). A customized Web portal for the genome of the ctenophore Mnemiopsis leidyi. BMC Genomics, 15(1), 316. https://doi.org/10.1186/1471-2164-15-316
Moroz, L. L. (2009). On the independent origins of complex brains and neurons. Brain, Behavior and Evolution. https://doi.org/10.1159/000258665
Moroz, L. L., Kocot, K. M., Citarella, M. R., Dosung, S., Norekian, T. P., Povolotskaya, I. S., … Kohn, A. B. (2014). The ctenophore genome and the evolutionary origins of neural systems. Nature. https://doi.org/10.1038/nature13400
Mukhtarov, M. R., Urazaev, A. K., Nikolsky, E. E., & Vyskočil, F. (2000). Effect of nitric oxide and NO synthase inhibition on nonquantal acetylcholine release in the rat diaphragm. European Journal of Neuroscience, 12(3), 980–986. https://doi.org/10.1046/j.1460-9568.2000.00992.x
Nam, C. I., & Chen, L. (2005). Postsynaptic assembly induced by neurexin-neuroligin interaction and neurotransmitter. Proceedings of the National Academy of Sciences, 102(17), 6137–6142. https://doi.org/10.1073/pnas.0502038102
Nguyen, L.
T., Schmidt, H. A., Von Haeseler, A., & Minh, B. Q. (2015). IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution, 32(1), 268–274. https://doi.org/10.1093/molbev/msu300
Nikitin, M. (2015). Bioinformatic prediction of Trichoplax adhaerens regulatory peptides. General and Comparative Endocrinology, 212, 145–155. https://doi.org/10.1016/j.ygcen.2014.03.049
Nikonenko, I., Boda, B., Steen, S., Knott, G., Welker, E., & Muller, D. (2008). PSD-95 promotes synaptogenesis and multiinnervated spine formation through nitric oxide signaling. Journal of Cell Biology, 183(6), 1115–1127. https://doi.org/10.1083/jcb.200805132
Nordström, K. J. V., Lagerström, M. C., Waller, L. M. J., Fredriksson, R., & Schiöth, H. B. (2008). The Secretin GPCRs Descended from the Family of Adhesion GPCRs. Molecular Biology and Evolution, 26(1), 71–84. https://doi.org/10.1093/molbev/msn228
O’Leary, N. A., Wright, M. W., Brister, J. R., Ciufo, S., Haddad, D., McVeigh, R., … Pruitt, K. D. (2016). Reference sequence (RefSeq) database at NCBI: Current status, taxonomic expansion, and functional annotation. Nucleic Acids Research, 44(D1), D733–D745. https://doi.org/10.1093/nar/gkv1189
Oliva, C., Escobedo, P., Astorga, C., Molina, C., & Sierralta, J. (2012). Role of the MAGUK protein family in synapse formation and function. Developmental Neurobiology, 72(1), 57–72. https://doi.org/10.1002/dneu.20949
Orsini, L., Gilbert, D., Podicheti, R., Jansen, M., & Brown, J. B. (2016). Daphnia magna transcriptome by RNA-Seq across 12 environmental stressors. Scientific Data, 3, 160030. https://doi.org/10.1038/sdata.2016.30
Paps, J. (2018). What Makes an Animal? The Molecular Quest for the Origin of the Animal Kingdom. Integrative and Comparative Biology. https://doi.org/10.1093/icb/icy036
Paps, J., & Holland, P. W. H. (2018). Reconstruction of the ancestral metazoan genome reveals an increase in genomic novelty. Nature Communications, 9(1), 1–8. https://doi.org/10.1038/s41467-018-04136-5
Petersen, T. N., Brunak, S., Von Heijne, G., & Nielsen, H. (2011). SignalP 4.0: Discriminating signal peptides from transmembrane regions. Nature Methods, 8(10), 785–786. https://doi.org/10.1038/nmeth.1701
Philippe, H., Derelle, R., Lopez, P., Pick, K., Borchiellini, C., Boury-Esnault, N., … Manuel, M. (2009). Phylogenomics Revives Traditional Views on Deep Animal Relationships. Current Biology, 19(8), 706–712. https://doi.org/10.1016/j.cub.2009.02.052
Pick, K. S., Philippe, H., Schreiber, F., Erpenbeck, D., Jackson, D. J., Wrede, P., … Wörheide, G. (2010). Improved phylogenomic taxon sampling noticeably affects nonbilaterian relationships. Molecular Biology and Evolution, 27(9), 1983–1987. https://doi.org/10.1093/molbev/msq089
Poglia, L., Muller, D., & Nikonenko, I. (2011). Ultrastructural modifications of spine and synapse morphology by SAP97. Hippocampus, 21(9), 990–998. https://doi.org/10.1002/hipo.20811
Poroca, D. R., Pelis, R. M., & Chappe, V. M. (2017). ClC channels and transporters: Structure, physiological functions, and implications in human chloride channelopathies. Frontiers in Pharmacology, 8, 1–25. https://doi.org/10.3389/fphar.2017.00151
Putnam, N. H., Srivastava, M., Hellsten, U., Dirks, B., Chapman, J., Salamov, A., … Rokhsar, D. S. (2007). Sea anemone genome reveals the gene repertoire and genomic organization of the eumetazoan ancestor. Science, 317(5834), 86–94. https://doi.org/10.1126/science.1139158
Qiao, G. F., Qian, Z., Sun, H. L., Xu, W. X., Yan, Z. Y., Liu, Y., … Fu, Y. (2013).
Remodeling of Hyperpolarization-Activated Current, Ih, in Ah-Type Visceral Ganglion Neurons Following Ovariectomy in Adult Rats. PLoS ONE, 8(8). https://doi.org/10.1371/journal.pone.0071184
Rahman, T., Cai, X., Brailoiu, G. C., Abood, M. E., Brailoiu, E., & Patel, S. (2014). Two-pore channels provide insight into the evolution of voltage-gated Ca2+ and Na+ channels. Science Signaling, 7(352), ra109. https://doi.org/10.1126/scisignal.2005450
Ramoino, P., Ledda, F. D., Ferrando, S., Gallus, L., Bianchini, P., Diaspro, A., … Manconi, R. (2011). Metabotropic γ-aminobutyric acid (GABAB) receptors modulate feeding behavior in the calcisponge Leucandra aspera. Journal of Experimental Zoology Part A: Ecological Genetics and Physiology, 315A(3), 132–140. https://doi.org/10.1002/jez.657
Reiss, J., Gross-Hardt, S., Christensen, E., Schmidt, P., Mendel, R. R., & Schwarz, G. (2001). A mutation in the gene for the neurotransmitter receptor-clustering protein gephyrin causes a novel form of molybdenum cofactor deficiency. American Journal of Human Genetics, 68(1), 208–213. https://doi.org/10.1086/316941
Reiss, J., & Johnson, J. L. (2003). Mutations in the molybdenum cofactor biosynthetic genes MOCS1, MOCS2, and GEPH. Human Mutation, 21(6), 569–576. https://doi.org/10.1002/humu.10223
Riesgo, A., Farrar, N., Windsor, P. J., Giribet, G., & Leys, S. P. (2014). The analysis of eight transcriptomes from all poriferan classes reveals surprising genetic complexity in sponges. Molecular Biology and Evolution, 31(5), 1102–1120. https://doi.org/10.1093/molbev/msu057
Rietdorf, K., Funnell, T. M., Ruas, M., Heinemann, J., Parrington, J., & Galione, A. (2011). Two-pore channels form homo- and heterodimers. Journal of Biological Chemistry,
286(43), 37058–37062. https://doi.org/10.1074/jbc.C111.289835
Ringrose, J. H., Van Den Toorn, H. W. P., Eitel, M., Post, H., Neerincx, P., Schierwater, B., … Heck, A. J. R. (2013). Deep proteome profiling of Trichoplax adhaerens reveals remarkable features at the origin of metazoan multicellularity. Nature Communications, 4, 1407–1408. https://doi.org/10.1038/ncomms2424
Rizvi, A. H., Camara, P. G., Kandror, E. K., Roberts, T. J., Schieren, I., Maniatis, T., & Rabadan, R. (2017). Single-cell topological RNA-seq analysis reveals insights into cellular differentiation and development. Nature Biotechnology, 35(6), 551–560. https://doi.org/10.1038/nbt.3854
Robertson, G., Schein, J., Chiu, R., Corbett, R., Field, M., Jackman, S. D., … Birol, I. (2010). De novo assembly and analysis of RNA-seq data. Nature Methods, 7(11), 909–912. https://doi.org/10.1038/nmeth.1517
Rotem, A., Ram, O., Shoresh, N., Sperling, R. A., Schnall-Levin, M., Zhang, H., … Weitz, D. A. (2015). High-throughput single-cell labeling (Hi-SCL) for RNA-Seq using drop-based microfluidics. PLoS ONE, 10(5), 1–14. https://doi.org/10.1371/journal.pone.0116328
Ruthmann, A., Behrendt, G., & Wahl, R. (1986). The ventral epithelium of Trichoplax adhaerens (Placozoa): Cytoskeletal structures, cell contacts and endocytosis. Zoomorphology, 106(2), 115–122. https://doi.org/10.1007/BF00312113
Ryan, J. F., & Chiodin, M. (2015). Where is my mind? How sponges and placozoans may have lost neural cell types. Philosophical Transactions of the Royal Society B: Biological Sciences. https://doi.org/10.1098/rstb.2015.0059
Ryan, J. F., Pang, K., Schnitzler, C. E., Nguyen, A.-D., Moreland, R. T., Simmons, D. K., … Baxevanis, A. D. (2013). The Genome of the Ctenophore Mnemiopsis leidyi and Its Implications for Cell Type Evolution. Science. https://doi.org/10.1126/science.1242592
Sakarya, O., Armstrong, K. A., Adamska, M., Adamski, M., Wang, I. F., Tidor, B., … Kosik, K. S. (2007).
A Post-Synaptic Scaffold at the Origin of the Animal Kingdom. PLoS ONE. https://doi.org/10.1371/journal.pone.0000506
Sakarya, O., Conaco, C., Eǧecioǧlu, Ö., Solla, S. A., Oakley, T. H., & Kosik, K. S. (2010). Evolutionary expansion and specialization of the PDZ domains. Molecular Biology and Evolution, 27(5), 1058–1069. https://doi.org/10.1093/molbev/msp311
Sala, C., Vicidomini, C., Bigi, I., Mossa, A., & Verpelli, C. (2015). Shank synaptic scaffold proteins: Keys to understanding the pathogenesis of autism and other synaptic disorders. Journal of Neurochemistry, 135(5), 849–858. https://doi.org/10.1111/jnc.13232
Sans, N., Racca, C., Petralia, R. S., Wang, Y. X., McCallum, J., & Wenthold, R. J. (2001). Synapse-associated protein 97 selectively associates with a subset of AMPA receptors early in their biosynthetic pathway. Journal of Neuroscience, 21(19), 7506–7516.
Scheiffele, P., Fan, J., Choih, J., Fetter, R., & Serafini, T. (2000). Neuroligin expressed in nonneuronal cells triggers presynaptic development in contacting axons. Cell, 101(6), 657–669. https://doi.org/10.1016/S0092-8674(00)80877-6
Schlüter, O. M., Xu, W., & Malenka, R. C. (2006). Alternative N-Terminal Domains of PSD-95 and SAP97 Govern Activity-Dependent Regulation of Synaptic AMPA Receptor Function. Neuron, 51(1), 99–111. https://doi.org/10.1016/j.neuron.2006.05.016
Schulz, M. H., Zerbino, D. R., Vingron, M., & Birney, E. (2012). Oases: Robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics, 28(8), 1086–1092. https://doi.org/10.1093/bioinformatics/bts094
Schulze, K. L., Littleton, J. T., Salzberg, A., Halachmi, N., Stern, M., Lev, Z., & Bellen, H. J. (1994). Rop, a Drosophila homolog of yeast Sec1 and vertebrate n-Sec1/Munc-18 proteins, is
a negative regulator of neurotransmitter release in vivo. Neuron, 13(5), 1099–1108. https://doi.org/10.1016/0896-6273(94)90048-5
Senatore, A., Raiss, H., & Le, P. (2016). Physiology and evolution of voltage-gated calcium channels in early diverging animal phyla: Cnidaria, placozoa, porifera and ctenophora. Frontiers in Physiology. https://doi.org/10.3389/fphys.2016.00481
Senatore, A., Reese, T. S., & Smith, C. L. (2017). Neuropeptidergic integration of behavior in Trichoplax adhaerens, an animal without synapses. The Journal of Experimental Biology, 220(18), 3381–3390. https://doi.org/10.1242/jeb.162396
Shi, Y., Abe, C., Holloway, B. B., Shu, S., Kumar, N. N., Weaver, J. L., … Bayliss, D. A. (2016). Nalcn Is a “Leak” Sodium Channel That Regulates Excitability of Brainstem Chemosensory Neurons and Breathing. Journal of Neuroscience, 36(31), 8174–8187. https://doi.org/10.1523/JNEUROSCI.1096-16.2016
Simakov, O., Kawashima, T., Marlétaz, F., Jenkins, J., Koyanagi, R., Mitros, T., … Gerhart, J. (2015). Hemichordate genomes and deuterostome origins. Nature, 527(7579), 459–465. https://doi.org/10.1038/nature16150
Smith, C. L., Abdallah, S., Wong, Y. Y., Le, P., Harracksingh, A. N., Artinian, L., … Senatore, A. (2017). Evolutionary insights into T-type Ca2+ channel structure, function, and ion selectivity from the Trichoplax adhaerens homologue. The Journal of General Physiology, 149(4), 483–510. https://doi.org/10.1085/jgp.201611683
Smith, C. L., Pivovarova, N., & Reese, T. S. (2015). Coordinated feeding behavior in Trichoplax, an animal without synapses. PLoS ONE, 10(9), 1–16. https://doi.org/10.1371/journal.pone.0136098
Smith, C. L., Varoqueaux, F., Kittelmann, M., Azzam, R. N., Cooper, B., Winters, C. A., … Reese, T. S. (2014). Novel cell types, neurosecretory cells, and body plan of the early-diverging metazoan Trichoplax adhaerens. Current Biology, 24(14), 1565–1572. https://doi.org/10.1016/j.cub.2014.05.046
Southey, B. R., Amare, A., Zimmerman, T.
A., Rodriguez-Zas, S. L., & Sweedler, J. V. (2006). NeuroPred: A tool to predict cleavage sites in neuropeptide precursors and provide the masses of the resulting peptides. Nucleic Acids Research, 34(Web Server issue), W267–W272. https://doi.org/10.1093/nar/gkl161
Srivastava, M., Begovic, E., Chapman, J., Putnam, N. H., Hellsten, U., Kawashima, T., … Rokhsar, D. S. (2008). The Trichoplax genome and the nature of placozoans. Nature, 454(7207), 955–960. https://doi.org/10.1038/nature07191
Strong, M., Chandy, K. G., & Gutman, G. A. (1993). Molecular evolution of voltage-sensitive ion channel genes: on the origins of electrical excitability. Molecular Biology and Evolution, 10(1), 221–242. https://doi.org/10.1093/oxfordjournals.molbev.a039986
Südhof, T. C. (2012). The presynaptic active zone. Neuron, 75(1), 11–25. https://doi.org/10.1016/j.neuron.2012.06.012
Syed, T., & Schierwater, B. (2002). Trichoplax adhaerens: Discovered as a missing link, forgotten as a hydrozoan, re-discovered as a key to Metazoan evolution. Vie et Milieu, 52(4), 177–187.
Takahashi, M., Miyata, S., Fujii, J., Inai, Y., Ueyama, S., Araki, M., … Kuroki, Y. (2012). In vivo role of aldehyde reductase. Biochimica et Biophysica Acta - General Subjects, 1820(11), 1787–1796. https://doi.org/10.1016/j.bbagen.2012.07.003
Te Velthuis, A. J. W., Admiraal, J. F., & Bagowski, C. P. (2007). Molecular evolution of the MAGUK family in metazoan genomes. BMC Evolutionary Biology, 7, 1–10. https://doi.org/10.1186/1471-2148-7-129
Thiemann, M., & Ruthmann, A. (1989). Microfilaments and microtubules in isolated fiber cells
of Trichoplax adhaerens (Placozoa). Zoomorphology, 109(2), 89–96. https://doi.org/10.1007/BF00312314
Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D. R., … Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols, 7(3), 562–578. https://doi.org/10.1038/nprot.2012.016
Tretter, V., Mukherjee, J., Maric, H.-M., Schindelin, H., Sieghart, W., & Moss, S. J. (2012). Gephyrin, the enigmatic organizer at GABAergic synapses. Frontiers in Cellular Neuroscience, 6, 1–16. https://doi.org/10.3389/fncel.2012.00023
Tyagarajan, S. K., Ghosh, H., Yevenes, G. E., Nikonenko, I., Ebeling, C., Schwerdel, C., … Fritschy, J.-M. (2011). Regulation of GABAergic synapse formation and plasticity by GSK3β-dependent phosphorylation of gephyrin. Proceedings of the National Academy of Sciences, 108(1), 379–384. https://doi.org/10.1073/pnas.1011824108
Ueda, T., Koya, S., & Maruyama, Y. K. (1999). Dynamic patterns in the locomotion and feeding behaviors by the placozoan Trichoplax adhaerence. BioSystems, 54(1–2), 65–70. https://doi.org/10.1016/S0303-2647(99)00066-0
Uemura, T., Mori, H., & Mishina, M. (2004). Direct interaction of GluRδ2 with Shank scaffold proteins in cerebellar Purkinje cells. Molecular and Cellular Neuroscience, 26(2), 330–341. https://doi.org/10.1016/j.mcn.2004.02.007
Veenstra, J. A. (2011). Neuropeptide evolution: Neurohormones and neuropeptides predicted from the genomes of Capitella teleta and Helobdella robusta. General and Comparative Endocrinology, 171(2), 160–175. https://doi.org/10.1016/j.ygcen.2011.01.005
Viklund, H., Bernsel, A., Skwark, M., & Elofsson, A. (2008). SPOCTOPUS: A combined predictor of signal peptides and membrane protein topology. Bioinformatics, 24(24), 2928–2929. https://doi.org/10.1093/bioinformatics/btn550
Wang, Y., Okamoto, M., Schmitz, F., Hofmann, K., & Südhof, T. C. (1997).
Rim is a putative Rab3 effector in regulating synaptic-vesicle fusion. Nature, 388(6642), 593–598. https://doi.org/10.1038/41580
White, S. L., Ortinski, P. I., Friedman, S. H., Zhang, L., Neve, R. L., Kalb, R. G., … Pierce, R. C. (2016). A Critical Role for the GluA1 Accessory Protein, SAP97, in Cocaine Seeking. Neuropsychopharmacology, 41(3), 736–750. https://doi.org/10.1038/npp.2015.199
Wistrand, M., Käll, L., & Sonnhammer, E. L. L. (2006). A general model of G protein-coupled receptor sequences and its application to detect remote homologs. Protein Science, 15(3), 509–521. https://doi.org/10.1110/ps.051745906
Wolfe, D., Hao, S., Hu, J., Srinivasan, R., Goss, J., Mata, M., … Glorioso, J. C. (2007). Engineering an endomorphin-2 gene for use in neuropathic pain therapy. Pain, 133(1–3), 29–38. https://doi.org/10.1016/j.pain.2007.02.003
Wong, F. K., Nath, A. R., Chen, R. H. C., Gardezi, S. R., Li, Q., & Stanley, E. F. (2014). Synaptic vesicle tethering and the CaV2.2 distal C-terminal. Frontiers in Cellular Neuroscience, 8, 1–12. https://doi.org/10.3389/fncel.2014.00071
Wong, F. K., & Stanley, E. F. (2010). Rab3a interacting molecule (RIM) and the tethering of pre-synaptic transmitter release site-associated CaV2.2 calcium channels. Journal of Neurochemistry, 112(2), 463–473. https://doi.org/10.1111/j.1471-4159.2009.06466.x
Wu, T. D., & Nacu, S. (2010). Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics, 26(7), 873–881. https://doi.org/10.1093/bioinformatics/btq057
Wu, T. D., & Watanabe, C. K. (2005). GMAP: A genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics, 21(9), 1859–1875. https://doi.org/10.1093/bioinformatics/bti310
Xie, Y., Wu, G., Tang, J., Luo, R., Patterson, J., Liu, S., … Wang, J. (2014). SOAPdenovo-Trans: De novo transcriptome assembly with short RNA-Seq reads. Bioinformatics, 30(12), 1660–1666. https://doi.org/10.1093/bioinformatics/btu077
Xu, H., Luo, X., Qian, J., Pang, X., Song, J., Qian, G., … Chen, S. (2012). FastUniq: A Fast De Novo Duplicates Removal Tool for Paired Short Reads. PLoS ONE, 7(12), 1–6. https://doi.org/10.1371/journal.pone.0052249
Zadina, J. E., Hackler, L., Ge, L. J., & Kastin, A. J. (1997). A potent and selective endogenous agonist for the µ-opiate receptor. Nature. https://doi.org/10.1038/386499a0
Zhang, H. (2005). Association of CaV1.3 L-Type Calcium Channels with Shank. Journal of Neuroscience, 25(5), 1037–1049. https://doi.org/10.1523/JNEUROSCI.4554-04.2005
Zhang, Z. Q. (2013). Animal biodiversity: An update of classification and diversity in 2013. Zootaxa, 3703(1), 5–11. https://doi.org/10.11646/zootaxa.3703.1.3
Zhu, X., Heunks, L. M. A., Ennen, L., Machiels, H. A., Van Der Heijden, H. F. M., & Dekhuijzen, P. N. R. (2006). Nitric oxide modulates neuromuscular transmission during hypoxia in rat diaphragm. Muscle & Nerve, 33(1), 104–112. https://doi.org/10.1002/mus.20445
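Table 1 lists, for each GO term, the number of annotated and non-annotated genes in the test set (the top 1,000 expressed genes) and in the reference set, together with a p-value and an FDR. The test is not named here, but these four count columns match the 2×2 layout of the one-sided Fisher's exact test that Blast2GO (Götz et al., 2008) reports, with the FDR presumably from a Benjamini-Hochberg correction. The Python sketch below assumes exactly that; the function names `fisher_greater` and `benjamini_hochberg` are illustrative and are not part of Blast2GO.

```python
from __future__ import annotations

import math


def fisher_greater(test_annot: int, ref_annot: int,
                   test_other: int, ref_other: int) -> float:
    """One-sided Fisher's exact test for over-representation of a GO term.

    The 2x2 table is: genes with / without the term (rows) in the test
    set / reference set (columns). Returns P(X >= test_annot), where X
    follows the hypergeometric distribution fixed by the table margins.
    """
    n_test = test_annot + test_other          # size of the test set
    n_annot = test_annot + ref_annot          # genes carrying the term
    n_total = n_test + ref_annot + ref_other  # all genes in both sets
    upper = min(n_test, n_annot)
    # Sum hypergeometric numerators for every count at least as extreme.
    numer = sum(math.comb(n_annot, k) * math.comb(n_total - n_annot, n_test - k)
                for k in range(test_annot, upper + 1))
    denom = math.comb(n_total, n_test)
    # Exact big-integer sums, then a single int/int true division.
    return numer / denom


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# First row of Table 1 (GO:0006613): 73, 8, 927, 10974.
p_first = fisher_greater(73, 8, 927, 10974)
print(f"{p_first:.2e}")
```

For the first row, `fisher_greater` gives a p-value on the order of 1e-70, in line with the 2.65E-70 reported. The FDR column cannot be recomputed from the table alone, because a Benjamini-Hochberg correction runs over every GO term tested, not only the rows shown.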

66

Table 1 – Top 30 Gene Ontology Categories for Biological Process Enriched in the Top 1000 Highly-Expressed T. adhaerens Genes

GO ID | GO Name | FDR | P-Value | Nr Test | Nr Reference | Non Annot Test | Non Annot Reference
--- | --- | --- | --- | --- | --- | --- | ---
GO:0006613 | Cotranslational protein targeting to membrane | 5.82E-67 | 2.65E-70 | 73 | 8 | 927 | 10974
GO:0045047 | Protein targeting to ER | 4.89E-65 | 2.54E-68 | 72 | 9 | 928 | 10973
GO:0006614 | SRP-dependent cotranslational protein targeting to membrane | 6.03E-65 | 3.53E-68 | 71 | 8 | 929 | 10974
GO:0072599 | Establishment of protein localization to endoplasmic reticulum | 1.10E-63 | 7.86E-67 | 73 | 12 | 927 | 10970
GO:0070972 | Protein localization to endoplasmic reticulum | 2.90E-60 | 2.26E-63 | 77 | 23 | 923 | 10959
GO:0000184 | Nuclear-transcribed mRNA catabolic process, nonsense-mediated decay | 3.93E-59 | 3.32E-62 | 74 | 20 | 926 | 10962
GO:0006612 | Protein targeting to membrane | 4.36E-52 | 4.25E-55 | 85 | 58 | 915 | 10924
GO:1901564 | Organonitrogen compound metabolic process | 1.40E-51 | 1.45E-54 | 292 | 1134 | 708 | 9848
GO:0000956 | Nuclear-transcribed mRNA catabolic process | 6.78E-46 | 7.93E-49 | 83 | 72 | 917 | 10910
GO:0006402 | mRNA catabolic process | 8.40E-45 | 1.20E-47 | 84 | 79 | 916 | 10903
GO:0006401 | RNA catabolic process | 1.41E-44 | 2.11E-47 | 87 | 89 | 913 | 10893
GO:0090150 | Establishment of protein localization to membrane | 3.72E-44 | 5.81E-47 | 106 | 157 | 894 | 10825
GO:0019083 | Viral transcription | 5.97E-43 | 9.70E-46 | 70 | 47 | 930 | 10935
GO:0016032 | Viral process | 2.45E-42 | 4.14E-45 | 121 | 230 | 879 | 10752
GO:0044419 | Interspecies interaction between organisms | 5.01E-42 | 8.79E-45 | 128 | 264 | 872 | 10718
GO:0019058 | Viral life cycle | 1.48E-41 | 2.70E-44 | 101 | 152 | 899 | 10830
GO:0044764 | Multi-organism cellular process | 3.72E-41 | 7.25E-44 | 122 | 243 | 878 | 10739
GO:0044403 | Symbiosis, encompassing mutualism through parasitism | 2.07E-40 | 4.17E-43 | 123 | 253 | 877 | 10729
GO:0006413 | Translational initiation | 2.19E-40 | 4.55E-43 | 80 | 84 | 920 | 10898
GO:0043603 | Cellular amide metabolic process | 2.42E-40 | 5.20E-43 | 155 | 415 | 845 | 10567
GO:0019439 | Aromatic compound catabolic process | 3.02E-40 | 6.67E-43 | 107 | 184 | 893 | 10798
GO:0034655 | Nucleobase-containing compound catabolic process | 3.14E-40 | 7.29E-43 | 99 | 152 | 901 | 10830
GO:0046700 | Heterocycle catabolic process | 3.14E-40 | 7.35E-43 | 105 | 176 | 895 | 10806
GO:0009056 | Catabolic process | 3.95E-40 | 9.51E-43 | 232 | 890 | 768 | 10092
GO:0019080 | Viral gene expression | 5.02E-40 | 1.24E-42 | 71 | 59 | 929 | 10923
GO:0072594 | Establishment of protein localization to organelle | 7.68E-40 | 1.95E-42 | 120 | 244 | 880 | 10738
GO:0044033 | Multi-organism metabolic process | 1.20E-39 | 3.12E-42 | 73 | 66 | 927 | 10916
GO:0006518 | Peptide metabolic process | 3.99E-39 | 1.06E-41 | 142 | 358 | 858 | 10624
GO:0051704 | Multi-organism process | 4.65E-39 | 1.27E-41 | 168 | 502 | 832 | 10480
GO:1901575 | Organic substance catabolic process | 5.90E-39 | 1.65E-41 | 221 | 835 | 779 | 10147

*Nr = Non-redundant Sequence Count, Test = Top 1000 Expressed Gene Subset, Reference = Evg-final Transcriptome, Non Annot = Non-annotated
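For each GO term, the four count columns form a 2×2 contingency table: annotated versus non-annotated, Test versus Reference. This section does not name the statistical test. The sketch below assumes a two-sided Fisher's exact test, and also assumes that the Reference counts exclude the Test subset. Two-sidedness is a reasonable guess because Table 3 includes terms that are under-represented in the Test set, such as G-protein coupled receptor activity. The function name `fisher_two_sided` is hypothetical. The FDR column would additionally need a multiple-testing correction across every GO term tested, and that step is not reproduced here.

```python
from math import comb


def fisher_two_sided(test_annot, ref_annot, test_non, ref_non):
    """Two-sided Fisher's exact test P-value for one GO term (illustrative sketch).

    Arguments follow the column order Nr Test, Nr Reference,
    Non Annot Test, Non Annot Reference. Assumes Reference excludes Test.
    """
    annotated = test_annot + ref_annot        # sequences carrying the GO term
    test_size = test_annot + test_non         # size of the Test subset
    total = annotated + test_non + ref_non    # all sequences compared
    lo = max(0, test_size - (total - annotated))
    hi = min(annotated, test_size)
    # Unnormalised hypergeometric weights for every feasible annotated count in
    # the Test subset; exact integers avoid overflow of the huge binomials.
    weights = {
        k: comb(annotated, k) * comb(total - annotated, test_size - k)
        for k in range(lo, hi + 1)
    }
    observed = weights[test_annot]
    # Two-sided: sum all outcomes that are no more probable than the observed one.
    tail = sum(w for w in weights.values() if w <= observed)
    return tail / comb(total, test_size)


if __name__ == "__main__":
    # First row of Table 1 (GO:0006613)
    print(f"{fisher_two_sided(73, 8, 927, 10974):.2e}")
```

Under these assumptions, the first row of Table 1 gives a P-value of roughly 2.6E-70, consistent with the tabulated 2.65E-70. The sketch uses exact integer arithmetic (`math.comb`) because binomial coefficients such as C(11982, 1000) are far too large to represent as floating-point numbers.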


Table 2 – Top 30 Gene Ontology Categories for Cellular Component Enriched in the Top 1000 Highly-Expressed T. adhaerens Genes

GO ID | GO Name | FDR | P-Value | Nr Test | Nr Reference | Non Annot Test | Non Annot Reference
--- | --- | --- | --- | --- | --- | --- | ---
GO:0070062 | Extracellular exosome | 3.29E-113 | 2.14E-117 | 392 | 1077 | 608 | 9905
GO:1903561 | Extracellular vesicle | 9.63E-112 | 1.88E-115 | 392 | 1095 | 608 | 9887
GO:0043230 | Extracellular organelle | 9.63E-112 | 1.88E-115 | 392 | 1095 | 608 | 9887
GO:0005576 | Extracellular region | 5.44E-92 | 1.41E-95 | 437 | 1616 | 563 | 9366
GO:0044421 | Extracellular region part | 8.54E-92 | 2.77E-95 | 419 | 1490 | 581 | 9492
GO:0031982 | Vesicle | 1.07E-89 | 4.19E-93 | 463 | 1841 | 537 | 9141
GO:0005829 | Cytosol | 2.74E-64 | 1.78E-67 | 323 | 1172 | 677 | 9810
GO:0022626 | Cytosolic ribosome | 1.47E-57 | 1.34E-60 | 67 | 12 | 933 | 10970
GO:0044445 | Cytosolic part | 8.02E-46 | 9.91E-49 | 80 | 64 | 920 | 10918
GO:0044391 | Ribosomal subunit | 8.08E-46 | 1.05E-48 | 69 | 37 | 931 | 10945
GO:0044444 | Cytoplasmic part | 3.97E-45 | 5.42E-48 | 619 | 4177 | 381 | 6805
GO:0005840 | Ribosome | 3.60E-41 | 6.78E-44 | 75 | 66 | 925 | 10916
GO:0005737 | Cytoplasm | 4.58E-35 | 1.55E-37 | 673 | 5085 | 327 | 5897
GO:0022625 | Cytosolic large ribosomal subunit | 3.54E-34 | 1.24E-36 | 40 | 7 | 960 | 10975
GO:0005925 | Focal adhesion | 5.47E-30 | 2.17E-32 | 84 | 153 | 916 | 10829
GO:0005924 | Cell-substrate adherens junction | 1.53E-29 | 6.25E-32 | 84 | 156 | 916 | 10826
GO:0030055 | Cell-substrate junction | 1.51E-28 | 6.85E-31 | 84 | 163 | 916 | 10819
GO:0015934 | Large ribosomal subunit | 2.57E-28 | 1.22E-30 | 41 | 19 | 959 | 10963
GO:0070161 | Anchoring junction | 7.86E-28 | 3.78E-30 | 94 | 215 | 906 | 10767
GO:0005912 | Adherens junction | 9.86E-28 | 4.81E-30 | 93 | 211 | 907 | 10771
GO:0032991 | Macromolecular complex | 5.27E-27 | 2.64E-29 | 359 | 2174 | 641 | 8808
GO:0043227 | Membrane-bounded organelle | 1.54E-26 | 7.92E-29 | 689 | 5570 | 311 | 5412
GO:0098796 | Membrane protein complex | 1.90E-25 | 1.01E-27 | 86 | 196 | 914 | 10786
GO:0043209 | Myelin sheath | 2.69E-25 | 1.45E-27 | 42 | 29 | 958 | 10953
GO:0022627 | Cytosolic small ribosomal subunit | 2.13E-23 | 1.24E-25 | 27 | 4 | 973 | 10978
GO:0030529 | Intracellular ribonucleoprotein complex | 2.27E-23 | 1.36E-25 | 117 | 387 | 883 | 10595
GO:1990904 | Ribonucleoprotein complex | 2.27E-23 | 1.36E-25 | 117 | 387 | 883 | 10595
GO:0044446 | Intracellular organelle part | 9.63E-23 | 6.07E-25 | 543 | 4111 | 457 | 6871
GO:0098800 | Inner mitochondrial membrane protein complex | 1.51E-20 | 1.08E-22 | 30 | 14 | 970 | 10968
GO:0044422 | Organelle part | 1.27E-19 | 9.39E-22 | 549 | 4303 | 451 | 6679

*Nr = Non-redundant Sequence Count, Test = Top 1000 Expressed Gene Subset, Reference = Evg-final Transcriptome, Non Annot = Non-annotated


Table 3 – Top 30 Gene Ontology Categories for Molecular Functions Enriched in the Top 1000 Highly-Expressed T. adhaerens Genes

GO ID | GO Name | FDR | P-Value | Nr Test | Nr Reference | Non Annot Test | Non Annot Reference
--- | --- | --- | --- | --- | --- | --- | ---
GO:0003735 | Structural constituent of ribosome | 2.99E-46 | 3.30E-49 | 70 | 38 | 930 | 10944
GO:0005198 | Structural molecule activity | 1.72E-36 | 5.36E-39 | 92 | 146 | 908 | 10836
GO:0003723 | RNA binding | 4.67E-21 | 3.16E-23 | 117 | 418 | 883 | 10564
GO:0038023 | Signaling receptor activity | 6.00E-18 | 4.91E-20 | 8 | 757 | 992 | 10225
GO:0004888 | Transmembrane signaling receptor activity | 3.81E-17 | 3.22E-19 | 7 | 708 | 993 | 10274
GO:0099600 | Transmembrane receptor activity | 1.00E-15 | 9.26E-18 | 9 | 719 | 991 | 10263
GO:0016491 | Oxidoreductase activity | 1.06E-15 | 9.85E-18 | 112 | 469 | 888 | 10513
GO:0005515 | Protein binding | 2.28E-15 | 2.19E-17 | 596 | 5006 | 404 | 5976
GO:0004872 | Receptor activity | 8.51E-15 | 8.79E-17 | 15 | 825 | 985 | 10157
GO:0060089 | Molecular transducer activity | 8.51E-15 | 8.79E-17 | 15 | 825 | 985 | 10157
GO:1901363 | Heterocyclic compound binding | 1.80E-14 | 1.88E-16 | 267 | 1751 | 733 | 9231
GO:0004871 | Signal transducer activity | 2.93E-14 | 3.10E-16 | 20 | 910 | 980 | 10072
GO:0005488 | Binding | 4.08E-14 | 4.40E-16 | 683 | 6054 | 317 | 4928
GO:0004930 | G-protein coupled receptor activity | 4.33E-14 | 4.70E-16 | 5 | 570 | 995 | 10412
GO:0043167 | Ion binding | 1.09E-13 | 1.22E-15 | 238 | 1521 | 762 | 9461
GO:0097159 | Organic cyclic compound binding | 4.13E-13 | 4.75E-15 | 270 | 1828 | 730 | 9154
GO:0051082 | Unfolded protein binding | 6.36E-13 | 7.35E-15 | 25 | 24 | 975 | 10958
GO:0019843 | rRNA binding | 6.35E-12 | 7.88E-14 | 22 | 19 | 978 | 10963
GO:0015078 | Hydrogen ion transmembrane transporter activity | 1.21E-11 | 1.53E-13 | 22 | 20 | 978 | 10962
GO:0019829 | Cation-transporting ATPase activity | 2.54E-11 | 3.33E-13 | 19 | 13 | 981 | 10969
GO:0044769 | ATPase activity, coupled to transmembrane movement of ions, rotational mechanism | 2.95E-11 | 3.91E-13 | 14 | 3 | 986 | 10979
GO:0043169 | Cation binding | 1.27E-10 | 1.74E-12 | 138 | 776 | 862 | 10206
GO:0046872 | Metal ion binding | 5.87E-10 | 8.62E-12 | 131 | 739 | 869 | 10243
GO:0003729 | mRNA binding | 1.66E-09 | 2.61E-11 | 28 | 53 | 972 | 10929
GO:0036094 | Small molecule binding | 1.78E-09 | 2.83E-11 | 143 | 851 | 857 | 10131
GO:0003676 | Nucleic acid binding | 2.38E-09 | 3.85E-11 | 160 | 995 | 840 | 9987
GO:0015077 | Monovalent inorganic cation transmembrane transporter activity | 2.96E-09 | 4.87E-11 | 35 | 88 | 965 | 10894
GO:0008233 | Peptidase activity | 3.17E-08 | 5.82E-10 | 62 | 263 | 938 | 10719
GO:0032403 | Protein complex binding | 4.10E-08 | 7.64E-10 | 57 | 231 | 943 | 10751
GO:0016787 | Hydrolase activity | 5.39E-08 | 1.04E-09 | 191 | 1321 | 809 | 9661

*Nr = Non-redundant Sequence Count, Test = Top 1000 Expressed Gene Subset, Reference = Evg-final Transcriptome, Non Annot = Non-annotated


Table 4 – T. adhaerens Digestive-related Secreted Proteins with top TPM Expression

23 17 22 22 22 22 23 20 172 20 20

------Val - E 3.57E 5.26E 1.67E 1.17E 1.78E 6.60E 2.50E 5.72E 1.21E 1.48E 2.37E

Score 99.8 99.8 99 77.4 99.8 99 99 97.8 99.8 98.6 508

Processes Digestive Digestive, Immune Digestive Unknown Digestive Digestive Digestive Digestive Digestive Digestive Digestive

type type

- - Regulationof acid bile Hydrolase, Lipase synthesis Molecular Function Serine endopeptidase Hydrolase, Lipase Hydrolase, Lipase Serine endopeptidase Hydrolase, Protease Hydrolase, Protease Amylase Hydrolase, Protease Hydrolase, Protease Hydrolase,

(Fissionyeast)

Human Acroporamillepora Schizosaccharomyces Schizosaccharomyces pombe Schistosoma mansoni Mus musculus (Bloodfluke) Species Human scrofa Sus maculataDolichovespula Mus musculus intestinalis Giardia Talaromyces stipitatus

P92131 P35030 Q26563 B3EWZ6 B8MF81 O42918 Q3TCN2 Accession P07477 P00592 P53357 P04071

like2 -

receptorclass A Homology) 1 -

- likeCP1 relatedpeptidase -

-

1 3 containing protein 2 protein containing - - BLAST - amylase 4 amylase -

Trypsin Phospholipase A2 Phospholipase A1 2 1 Kallikrein b16 CathepsinB Penicillopepsin Alpha Name ( Name Trypsin MAM and LDL domain CathepsinC Putative phospholipase B

) SD 2048.01) TPM ( 4856.47 (590.70) 2364.55 ( 7259.67 (2455.44) 4574.36 (1578.82) 3120.24 (206.70) (32.50) 86.55 (5.56) (32.66) 121.88 121.47 (20.93) 859.57 313.88 (70.81) 110.09 (6.28)

GeneID evg958433 evg43312 evg52845 evg1036213 evg1240777 evg1030094 evg1172676 evg93589 evg1106662 evg384 evg687502
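Tables 4–13 report expression as TPM, with an SD column alongside. This section names neither the quantification tool nor the number of replicates. The sketch below assumes the standard definition of transcripts per million: reads are divided by transcript length, then each sample is rescaled to sum to one million. It also assumes that SD is the sample standard deviation of a gene's TPM across replicates. The helper names `tpm` and `tpm_mean_sd` are hypothetical.

```python
from statistics import mean, stdev


def tpm(counts, lengths):
    """Transcripts per million for one sample (illustrative sketch).

    counts: reads assigned to each transcript; lengths: transcript lengths
    (or effective lengths), in the same order.
    """
    rates = [c / l for c, l in zip(counts, lengths)]  # length-normalised read rates
    scale = 1e6 / sum(rates)                          # make the sample sum to one million
    return [r * scale for r in rates]


def tpm_mean_sd(per_replicate_tpm):
    """Mean and sample standard deviation of one gene's TPM across replicates."""
    return mean(per_replicate_tpm), stdev(per_replicate_tpm)


if __name__ == "__main__":
    print(tpm([100, 200, 50], [1000, 2000, 500]))
    print(tpm_mean_sd([10.0, 12.0, 14.0]))
```

TPM values are comparable within a sample because each sample sums to the same total. They do not correct for composition differences between samples, so the TPM (SD) values in these tables are best read as relative expression levels.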


Table 5 - T. adhaerens Immune-related Secreted Proteins with top TPM Expression

17 22 19 21 06 23 23 22 19 21 18 21 24

------Val - E 5.26E 3.25E 1.20E 9.97E 9.59E 1.57E 1.01E 5.50E 1.61E 3.81E 1.09E 3.62E 3.70E

96.7 Score 77.4 99.8 99.8 94.7 97.1 95.5 99.4 99 90.9 49.3 99.8 99.4

utrophil utrophil e

n of of n N

o

i egulat Immune Immune, Housekeeping Immune,Protein developmental Apoptosis folding Processes Immune,Digestive Immune Immune,Apoptosis Immune,Digestive Immune Immune Immune Immune,Apoptosis Immune, degranulation, R autophagy;

,

type

-

activator, activator, binding GPCR Surface antigen Isomerase Enzyme Peroxidase Isomerase / Lipase Molecular Function Hydrolase, Hydrolase, Protease Hydrolase, Glycosidic Serine Endopeptidase, receptor binding activity Sphingolipid metabolism Hydrolase, Peptidase Cytokine binding GPCR

norvegicus gallus Gallus Mycoplasma hyorhinis Methanothermo bacter marburgensis Drosophila melanogaster Human Species scrofa Sus Mus musculus Rattus norvegicus Human Human Rattus Human Caenorhabditis elegans

O13035 P07602 Q49538 P23785 Q9V438 Accession P00592 O35205 P00786 Q86SG7 P01024 Q57109 Q15084 Q93408

isomerase isomerase - -

Homology)

likeprotein 2

-

BLAST (

cathepsin H cathepsin

- GranzymeK Pro Lysozymeg Granulins Prosaposin ComplementC3 Name Phospholipase A2 surfaceantigenF Variant Protein disulfide homologA6 Prosaposin Peroxiredoxin Protein disulfide A6 D2005.3 Uncharacterized protein Uncharacterized

) SD TPM ( 4856.47 (2455.44) 3566.66 (881.92) 1485.39 (77.77) 1483.16 (534.48) 1029.12 (228.31) 965.03 (19.99) (35.52) 137.63 (57.79) (236.56) 606.54 (25.50) 171.54 112.87 (9.74) 104.34 (10.47) 208.18 91.98 (4.57)

GeneID evg1226704 evg588262 evg1172592 evg958433 evg1106487 evg1173372 evg772232 evg107861 evg1224954 evg772045 evg1030832 evg11227 evg1107447


Table 6 - T. adhaerens Developmental-related Secreted Proteins with top TPM Expression

12 19 19 21 18 16 22 21 19 06 20 12

------Val - 5.66E 1.07E 5.71E 5.57E 1.61E 5.26E 6.38E 4.00E 2.98E 9.87E E 1.64E 2.37E

70.5 Score 74.7 99.8 90.9 84.7 95.1 99.8 91.7 99.8 87.8 87.8 53.5

l celll

ntal, epithelial cell cell epithelial ntal,

Developmental, epithelial cell cell epithelial Developmental, Sphingolipid differentiation, development function&Hepatocellular Developmental Developme Sphingolipid differentiation, metabolism embryos metabolism migration Cell Developmental, Developmental, reg. of BMP signaling BMP of reg. Developmental, pathway Processes cell epithelial of reg. Developmental, migration transporter lipid Developmental, reservoir nutrient activity, Transport Protein Developmental, Immune,Epithelia Developmental, Sphingolipid differentiation, metabolism in proliferation cell Developmental, Developmental, Immune, Epithelial cell cell Immune,Epithelial Developmental, Sphingolipid differentiation, metabolism

activator, activator,

binding binding binding - - - Enzymeactivator, Enzymeactivator, binding GPCR Enzyme Ca2+ GPCR binding GPCR binding GPCR Structural molecule activity Glutathione transferase Glutathione Protein MolecularFunction Signaling homodimeraization activity Storage protein Ca2+ Cadherinbinding, Ca2+ Enzymeactivator, binding GPCR

opus opus vis os taurus os Human Human Bostaurus Rattus norvegicus Xenopus laevis Xen lae Gallus gallus musculus Human Species Human Mus Human B

P02845 P07602 Q6NUJ1 Q8K5B3 Q8JFZ2 Accession Q9IBG7 Q9H4G4 P26779 Q8R4V5 P07942 Q9UM47 P26779

1 -

like -

related -

induced 2 - like1 - - -

coagulation coagulation

BLAST enic locus notch

( associated plant plant associated

- derived neuronal neuronal derived - Name Homology) Kielin/chordin protein Golgi Vitellogenin Multiple factordeficiency protein protein)survival pathogenesis stem (Neural homolog 2 cell protein 1 protein Prosaposin Prosaposin Proactivator polypeptide Oncoprotein transcript3 protein Lamininsubunit beta Neurog 3 protein homolog Glutathione S Glutathione transferase P Prosaposin

) SD 1.63) TPM ( 558.32 269.47 (12.56) 243.28 (54.86) 173.53 (10.98) (2.90) 171.54 164.18 (14.84) 156.80 (26.33) (35.52) 158.47 ( (14.77) 144.19 (20.26) 121.76 (3.33) 108.69 87.06 (6.62)

evg1106594 GeneID evg1338849 evg1030832 evg1030554 evg1172861.1 evg102425 evg1409072 evg661 evg17156 evg1330357 evg2802 evg1326682


Table 7 - T. adhaerens Cell Adhesion Matrix-related Secreted Proteins with top TPM Expression

05 08 20 06 17 16 162

------Val - E 1.04E 6.12E 2.62E 9.68E 3.27E 4.93E 3.47E

Score 48.9 54.7 99.8 48.9 91.7 84.3 459

adhesion adhesion adhesion adhesion, - - - -

Matrix/Adhesion/fibrils Cell matrixCell matrixCell Processes Extracellularmatrix Stereocilium organization Cell Matrix/Adhesion/fibrils Cell matrixCell matrixCell Cell

binding binding binding binding - - - - -

Function Actin binding Molecular Ca2+ Ca2+ Ca2+ Ca2+ / /

musculus Acanthamoeba mimivirus polyphaga gallus Gallus Species Macaca fascicularis asinina Haliotis Strongylocentrotus purpuratus asinina Haliotis Mus

P14315 P86734 Q8VIM6 Q5UQ13 Accession Q9N0C7 P49013 P86729

related -

Homology)

related proteinrelated 1 related proteinrelated 2 3 - - -

likeprotein 2 -

capping protein subunit subunit protein capping BLAST - (

actin - Name Mammalian ependymin 1 protein Ependymin Fibropellin Ependymin Stereocilin Collagen F 2 and 1 isoforms beta

) SD TPM 716.53 (234.52) (36.50) 245.15 (24.23) ( 426.40 (32.75) 272.00 131.59 (3.85) (16.71) (5.35) 91.97 119.44

GeneID evg1173237 evg207671 evg267926 evg1027559 evg1253428 evg6706 evg848790


Table 8 - T. adhaerens Housekeeping function-related Secreted Proteins with top TPM Expression

------

Val

-

21 E 6.47E 180 2.05E 16 6.31E 23 2.33E 23 23 4.26E 18 5.60E 7.96E 26 0 13 2.73E 20 13 5.70E 2.01E 24 3.84E 24 20 5.89E 11 6.64E 3.70E 20 5.07E 11 4.34E 13 7.80E 3.65E

75.5 99.4 Score 99.4 99.4 514 83.2 99.8 99.4 87 99.8 966 69.3 97.8 60.1 96.7 86.7 71.6 75.1 94

dependent activities in ER in activities dependent

progression -

tein folding tein Cell cycle progression cycle Cell Protein folding, redox homeostasis Protein folding DNA repair, DNA catabolic process progression cycle Cell Protein localizationto cell surface progression cycle Cell Processes Translation Translation Protein folding cycle Cell Protein folding, gonad development, organization cytoskeleton Protein folding, protein leukocyte polyubiquitination, migration response stress binding, RNA Ca2+ Reg. DNA biosynthesis, reg. proliferation Pro Pathogenesis DNA Repair -

type type

prolyl cis prolyl

- methylase -

type - binding, binding, binding

- -

Ca2+ Peptidyl isomerase trans ATP Chaperone, Histone methylase endopeptidase / Nuclease Histone Cysteine binding Function rRNA binding Ca2+ carboxypeptidase Ubiquitinprotein ligase Ribonuclease Histone methylase Protein disulfide activity isomerase Molecular rRNA binding chaperone Serine Histone methylase Chaperone Methyltransferase

ast)

richophyton tonsurans Dictyostelium discoideum Caenorhabditis briggsae Caenorhabditis Human lycopersicum Solanum (Tomato) griseus Cricetulus Xenopus tropicalis auratus Mesocricetus (Golden hamster) (Slime mold) Rattusnorvegicus (Chinese hamster) Species Haloarcula marismortui Archaeoglobus fulgidus Bos taurus T Bos taurus ye (Fission pombe Mus musculus Rattusnorvegicus Xenopus tropicalis Thioalkalivibrio sp. Human Schizosaccharomyces

B6V867 B2GUS6 Q61SU8 Q13356 P80196 P26186 Q5BKL9 P86220 Accession P14119 O28370 P52193 Q554J3 Q66HD0 Q1JP61 P38659 D3SGB1 Q8N371 Q9CXT6 Q9USP5

-

) (Heat -

94 - trans trans

protein - -- isomerase A4 A4 isomerase , GRP binding protein protein binding - Homology) -

protein L6 protein fic demethylase 8 DNA

-

like 2 (PPIase) 2 like

- prolyl cis prolyl 1) -

binding protein 1 protein binding BLAST specific demethylase 8 speci specific demethylase 10 specific demethylase 9 - ( - - - -

anchor transamidase anchor transamidase - Name L19e protein ribosomal 50S ribosomal 50S Calreticulin Probable prefoldin subunit 1 Carboxypeptidase S1 homolog A homolog S1 Carboxypeptidase Lysine Protein disulfide cysteine methyltransferase methyltransferase cysteine FK506 Peptidyl isomerase Intracellular ribonuclease LX calcium kDa 45 (Cab45) Prohibitin Endoplasmin kDa (94 glucose regulated protein shockprotein 90 kDa beta Deoxyribonuclease member Lysine Lysine Lysine Methylated GPI

) SD TPM ( 2158.90 (92.29) 689.83 (34.28) 271.18 (41.51) (27.84) 264.22 259.90 254.18 (13.76) 230.65 (12.81) 189.45 168.43 (24.46) 150.98 (104.57) 142.43 (8.78) (21.29) 121.68 (12.70) 97.61 (14.81) (40.03) (20.55) 258.32 (34.23) 189.58 (40.18) 171.17 (27.91) (48.22) 148.90 140.39 (27.63) 126.95

evg31409 evg73337 evg929466 evg1030249 GeneID evg584100 evg1106726 evg1173364 evg57984 evg1107053 evg1030771 evg1031595 evg1385052 evg1350548 evg1106888 evg1107054 evg349524 evg669145 evg16586 evg1030526


Table 9 – Neurotransmitter Biosynthesis/Degradation Pathways

Pathway Involved Function Protein/Gene GeneID TPM SD Choline acetyltransferase* (ChAT, CAT)/ evg1010800 4.71 0.51 Synthesis* Acetylcholine (ACh) Carnitine O-acetyltransferase Homologue evg1612704 3.12 0.70 Degradation Acetylcholinesterase (ACHE) - evg1159900 6.34 0.33 Synthesis Glutamic acid decarboxylase (GAD) evg1360443 5.56 1.18 evg5655 2.16 0.92 GABA GABA alpha-oxoglutarate transaminase evg1626234 73.31 2.60 Degradation evg13526 21.57 1.01 Succinic semialdehyde dehydrogenase evg211374 113.43 4.59 evg1110228 21.05 1.52 Synthesis Glutaminase Glutamate evg1619123 15.13 0.96 Degradation Glutamine synthetase evg45473 110.78 14.19 Catecholamine [DA] Synthesis Tyrosine hydroxylase (TH) - Aromatic L-amino acid decarboxylase Catecholamine [DA; 5-HT] Synthesis evg7208 55.16 23.34 (AADC) Catecholamine [DA; NE] Degradation Catechol O-methyltransferase (COMT) - Catecholamine [DA; NE; 5-HT; Flavin Monoamine oxidase (MAO) family Degradation evg63696 111.12 31.60 Histamine] protein evg1030843 106.21 5.87 evg106404 71.06 13.92 evg107489 214.18 140.42 Dopamine beta-hydroxylase (DBH) evg1219800 7.26 1.85 (aka DBH-like monooxygenase) Catecholamine [NE] Synthesis evg1385944 25.80 1.44 evg1456966 24.79 4.45 evg1800852 69.35 10.24 evg1367047 14.87 17.81 Phenylethanolamine N-methyltransferase - (PNMT) Octopamine Synthesis Tyramine beta-hydroxylase Same results as DBH Serotonin (5-HT) Synthesis Tryptophan 5-monooxygenase (TPH) - Aldehyde dehydrogenase 2-like (ADH or evg11367 1788.62 152.13 ALDH or NAD+) evg1031090 151.43 10.07 evg1255322 25.63 3.70 Serotonin (5-HT); Histamine Degradation Aldehyde dehydrogenase-like (ADH or ALDH evg1355432 49.30 10.71 or NAD+) evg211683 34.11 4.10 XM_002118459.1 XM_002110504.1 Serotonin (5-HT) Degradation Aldehyde reductase (AR or ALDR) evg1092684 0.60 0.28 Synthesis Histidine decarboxylase (DCHS) - Histamine Diamine oxidase - Degradation Histamine methyltransferase (HNMT) - evg1033103 14.43 5.32 Nitric oxide (NO) Synthesis Nitric oxide synthase (NOS) evg1234918 63.72 7.93


Table 10 – T. adhaerens Receptor Protein Homologues

Receptor | GeneID | TPM | SD
--- | --- | --- | ---
Adrenergic receptor (ADR) | evg17446 | 3.535 | 1.09
Cannabinoid receptor (CNR) | - | |
Dopamine receptor (DRD) | - | |
Extracellular calcium-sensing receptor (CASR) | Numerous | / | /
Frizzled (FZD) | evg1335942 | 20.6925 | 4.45
Frizzled (FZD) | evg1365435 | 2.7225 | 0.74
Histamine receptors (HRH) | - | |
Metabotropic Acetylcholine receptor (mAChR) | - | |
Metabotropic GABA receptor (GABA-B) | Numerous | / | /
Metabotropic Glutamate receptors (mGluR) | Numerous | / | /
Octopamine receptor (OCTR) | - | |
Opioid receptor (OPR) | evg14705 | 1.3875 | 0.21
Serotonin receptor (5-HTR) | - | |
Trace amine-associated receptor (TAAR) | - | |


Table 11 – T. adhaerens Channel Protein Homologues

Channel Category GeneID TPM SD evg1307556 6.08 2.65 evg1246831 6.65 0.62 evg1232682 9.92 1.88 evg1453266 25.72 1.01 evg18670 2.04 0.34 Acid-sensing ion channel (ASIC) evg1159323 7.3 0.64 evg1356197 7.15 1.19 XP_002114391.1 evg1364096 5.63 0.63 evg1320933 25.23 2.07 Amiloride-sensitive Sodium Channel (ENaC) evg1357535 91.87 6.15 Aquaporin-like evg1106674 141.70 2.41 Calcium-activated potassium channel (KCN/BK Channel) evg1034613 21.07 1.94 CatSper Channel - Calcium Channel Protein (CCH1) -

Ionotropic GABAA Receptor (GABAAR) - Glycine Receptor (GlyR) - Cyclic Nucleotide-gated Cation Channel (CNG) evg1116517 3.5 0.53 Histamine-gated Chloride Receptor (HisCl) - Ionotropic Serotonin Receptors - evg1035139 21.23 2.47 Inositol 1,4,5-trisphosphate receptor (IP3R/ITPR) evg108852 24.27 5.76 evg1036536 6.235 1.13 evg6818 20.33 1.72 evg372810 8.66 2.23 evg47083 1.19 0.28

Inward-rectifying potassium channel (Kir) evg8274 0.97 0.39 evg1534828 3.67 0.26 evg8273 1.76 0.42 evg1158468 2.41 0.65 Ionotropic glutamate receptor AMPA-like (GRIA-like) evg1356554 6.43 0.49 evg1237846 4.07 1.41 evg1036978 17.94 1.41 Ionotropic glutamate receptor AMPA-or-Kainate-like (GRIA or GRIK-like) evg1416140 13.78 5.81 evg9235 0.63 0.28 XP_002116224.1 evg1673038 14.48 0.90 Ionotropic glutamate receptor Kainate-like evg1807018 1.83 0.73 (GRIK-like) evg4931 5.87 0.98 Nicotinic Acetylcholine Receptor (nAChR) - Orai Receptor evg1009716 17.07 1.16 Otopetrin 1 (Otop1) - Piezo evg1040699 6.66 0.74 evg1623395 5.86 0.54 Purinoreceptor (P2X4) evg374767 25.21 1.03 Ryanodine Receptor (RyR) evg1037889 7.18 2.54 Sodium leak ion channel (NALCN) evg1750541 2.93 0.63 Transient receptor potential cation channel (TRP) Numerous / / Two Pore Calcium Channel 1 (TPCN1) evg1031864 70.33 3.34 Two Pore Calcium Channel 2 (TPCN2a) evg1031200 7.85 0.75 Two Pore Calcium Channel 2 (TPCN2b) evg101995 14.09 1.54 Two Pore Potassium Channel (K2P) evg591174 4.92 0.82 Voltage-gated Chloride Channel / H+/Cl- Exchange Transporter (CLC) evg1487797 16.17 0.87 evg1458552 12.48 0.53 evg1631519 29.14 4.86 evg1007835 12.85 1.51 evg1402314 18.48 0.88 evg1367027 5.21 0.42

Voltage-gated calcium channel 1 (Cav1) evg1032956 38.18 0.87

Voltage-gated calcium channel 2 (Cav2) evg1041627 5.25 0.20

Voltage-gated calcium channel 3 (Cav3) evg954676 6.86 0.52

Putative Voltage-gated calcium channel (Cav) evg211865 15.14 0.86 evg1651966 1.55 0.08 Voltage-gated potassium channel Erg (KCNH) evg11938 17.04 2.13 evg1452858 11.68 1.58 Voltage-gated potassium channel Erg/Eak/Elk (KCNH) evg1042413 3.34 0.30 Voltage-gated potassium channel KCNA subunit beta evg17630 13.99 1.29

Voltage-gated potassium channel Shaker (Kv1, KCNA) evg1228738 10.83 2.28

Voltage-gated potassium channel Shab-like (Kv2-like, KCNB-like) evg1363170 2.63 0.74

Voltage-gated potassium channel Shaw (Kv3, KCNC) evg1012413 6.34 1.28

Voltage-gated sodium channel (Nav) evg1168784 14.38 1.28


Table 12 – SNARE and SNARE-related Protein Homologues

Category Protein Name GeneID TPM SD Complexin Complexin 1/2 evg5984.6 26.57 5.07 SNAP25a evg978 157.23 19.15 SNAP25 SNAP25b evg85503 0.95 0.29 SNAP23 SNAP23 evg956449 3.09 0.47 Synapsin SYN - SEC22B-like evg1330910 1.92 0.55 VAMP 1-like 1393460 40.38 1.94 VAMP 7 evg4529 36.53 9.48 Synaptobrevin (Vesicle-associated membrane VAMP 8-like 1603474 12.63 1.84 protein, VAMP) VAMP-like protein a 1587783 26.73 3.72 VAMP-like protein b evg766417 24 7.16 YKT6 evg5978 81.21 4.34 Synaptophysin Synaptophysin-like evg1106991 128.71 5.44 Syt14/16 evg1240961 10.08 1.04 Syt15-like evg1450144 42.94 6.54 Syt7-like evg10145 36.07 0.51 Synaptotagmin Syt-like protein a evg1419244 49.05 3.78 Syt-like protein b XP_002110497.1 Syt-like protein c evg373520 9.27 1.28 Syt-like protein d evg852597 26.29 3.94 evg1246021 38.95 5.54 evg1037406 8.11 0.63 Syntaxin-1 Stx1 evg685990 29.71 2.74 evg1238652 34.27 2.74 MStxbp1 (Unc18-1)/ Stxbp2 (Unc18-2)/ evg1108634 78.90 4.75 Stxbp3 (Unc18-3)-like UNC18 Stxbp4 (Unc18-4) evg113041 16.85 0.41 (Syntaxin-Binding Protein, Stxbp) Stxbp5 evg1243997 32.94 4.16 evg16839 31.42 2.86


Table 13 – Pre-/Post-Synaptic Protein Homologues

Category/Common Protein Name GeneID TPM SD Name 14-3-3 beta/alpha or zeta/delta evg1162983 415.16 19.87 14-3-3 14-3-3 epsilon evg4013 386.78 35.45 Actin evg772067 450.65 41.79 Actin evg369522 7323.16 322.23 Actin Actin evg871968 2.04 0.94 Actin evg1109219 11.81 1.97 Centractin evg10120 123.76 12.79 Actinin Alpha-actinin evg1173984 109.21 3.27 Actin-related protein 10 evg1108456 23.51 1.45 Actin-related protein 1-like evg1534380 12.27 6.81 Actin-related protein 2 evg1339813 44.81 2.69 Actin-related protein Actin-related protein 3 evg1394057 84.6 6.52 Actin-related protein 5 evg1361431 14.59 0.6 Actin-related protein 6 evg1663692 39.04 11.95 Actin-related protein 8 evg211476 9.87 3.28 AKAP AKAP0/Pericentrin-like evg1457448 12.9 2.9 a-Liprin a-Liprin evg1684269 6.87 1.14 Ankyrin 2; Serine/threonine-protein phosphatase 6 regulatory ankyrin evg11985 21.15 1.54 repeat subunit Ankyrin Ankyrin-3 or 2 evg1032266 90.84 9.82 Ankyrin-3 or 2 evg1530888 5.05 2.16 BAIAP3 Brain-specific Angiogenesis Inhibitor Binding Protein 3 (BAIAP3) evg1089398 11.64 0.7 BAIAP4 Brain-specific Angiogenesis Inhibitor Binding Protein 3 (BAIAP3) evg1075128 11.72 2.47 Bassoon Bassoon - Bruchpilot Bruchpilot - CACNB CACNB evg1279646 20.53 0.84 Cadherin EGF LAG seven-pass G-type receptor 1 or 2; starry night evg1015472 22.68 1.89 Cadherin EGF LAG seven-pass G-type receptor 3 evg1145104 15.29 3.09 Cadherin EGF LAG seven-pass G-type receptor 3 precursor evg16867 9.79 2.66 Protocadherin Fat 1 evg108851 10.79 3.08 Cadherin Protocadherin Fat 1 evg1466180 4.98 0.69 Protocadherin Fat 1 or 2 or 3 evg1233609 9.7775 2.9 Protocadherin Fat 1; cadherin-23 evg113147 15.77 4.2 Protocadherin-16 or 23 evg1015398 11.96 1.47 Calcineurin B homologous protein 1599843 70.93 11.54 Calcineurin Calcineurin subunit B evg930848 65.89 4.42 Calcium-transporting ATPase type 2C member 1 evg1141371 12.99 2.32 Calcium-transporting Plasma membrane calcium-transporting ATPase 3 evg1031382 89.9 4.04 ATPase Sarcoplasmic/endoplasmic reticulum calcium ATPase 3 evg1031147 114.92 9.16 Calmodulin evg817116 2091.63 197.24 Calmodulin Calmodulin 2; 5; 1 1704448 6.38 5.83 Calsyntenin Calsyntenin 2 evg1003195 5.03 0.93 CaMKII Calcium/calmodulin-dependent protein kinase II evg3777 109.62 8.85 CARD Caspase recruitment domain-containing protein - CASK Calcium/calmodulin-dependent serine protein kinase (CASK) evg106358 40.67 1.74


Catenin alpha evg1227853 26.79 2.49 Catenin Catenin beta evg1245692 21.85 0.93 Catenin beta-like evg1007651 3.43 0.92 Plakophilin-4/Delta catenin 1590597 13.29 6.11 Catenin/Plakophilin Plakophilin-4/Delta catenin evg1229306 20.22 6.25 Vinculin/Alpha-catenin evg1100214 14.6 1.74 Catenin/Vinculin Vinculin/Alpha-catenin evg1069083 22.89 1.03 Citron Citron Rho-interacting kinase evg1093744 13.33 4.64 Clathrin Heavy Chain evg1345234 185.75 17.94 Clathrin Heavy Chain Clathrin Heavy Chain evg1142024 3.48 0.44 Clathrin heavy chain linker domain-containing protein evg1404702 6.3 1.3 Clathrin Light Chain Clathrin Light Chain evg208602 328.13 14.93 Rho guanine nucleotide exchange factor 9 (collybistin)/Rho guanine Collybistin-like evg1406494 9.3 1.04 nucleotide exchange factor 4 SRC substrate cortactin evg1322928 11.86 4 Cortactin SRC substrate cortactin evg62154 2.03 1.48 CREB3l cAMP-responsive element-binding protein 3-like evg1106961 83.71 7.01 CREM/CREB cAMP-responsive element modulator/binding protein evg264740 21.7 1.88 Desmoglein Desmoglein - DIP2 Disco-interacting protein 2 homolog C evg1034818 21.07 0.83 Disc large-1, DLG1/SAP97 evg1321563 166.55 4.26 Disc large Disc large-5, DLG5/P-Dlg evg1322614 19.62 5.1 Dynamin-1 evg1254226 21.65 3.95 Dynamin Dynamin-1-like evg1176219 12.84 0.3 Dynamin-like mitochondrial isoform evg1110077 14.82 1.39 Dystroglycan evg112805 38.87 0.93 Dystroglycan Dystroglycan evg1344485 3.74 0.92 EIF Eukaryotic Initiation Factor Numerous / / ELKS ELKS (aka CAST, ERC) evg1254197 10.29 1.35 Elongation factors Translation and Transcription Elongation Factors Numerous / / EPH Ephrin type A evg1034790.1 17.12 2.4 EPH Ephrin type A or B receptor evg1034606 15.87 1.59 Erbin Erbin - ERK2 (Extracellular-signal-regulated kinase 2)/MAPK1 (Mitogen- evg1033958 36.56 2.23 activated protein kinase 1) ERK5 (Extracellular-signal-regulated kinase 5)/MAPK7 (Mitogen- ERK/MAPK evg589 12.25 1.11 activated protein kinase 7) ERK7 (Extracellular-signal-regulated kinase 7)/MAPK15 (Mitogen- evg1249971 11.8 1.71 activated protein kinase 15) Tyrosine-protein kinase fyn evg1067385 2.89 0.52 FYN Tyrosine-protein kinase fyn evg18426 0.62 0.16 GAB GRB2-associated protein - Guanine nucleotide exchange factor DBS evg100 20.69 2.54 GEF Guanine nucleotide exchange factor DBS evg1037692 9.01 0.25 Gephyrin Gephyrin evg1146978 16.17 1.37 Growth factor receptor-bound protein 14 evg1116562 5.24 0.37 GRB Growth factor receptor-bound protein 2 evg1032398 45.49 0.77 GRIP Glutamate receptor-interacting protein - Homer Homer evg1234967 31.48 3.84 Heat shock 70 evg1108070 37.15 7.53 HSP70 Heat shock 70 XP_002118988.1


Heat shock 70 kDa protein 8 evg1426243 37.58 23.88 Heat shock 70 kDa protein 8 evg1106341 346.03 85.66 Heat shock protein 70 family protein 5 evg7056 287.04 120.84 Heat shock protein 70 family protein 9 evg806473 92.59 8.1 IGSF9B IGSF9B/Turtle homolog B evg1040118 7.03 0.92 Integrin Alpha Chain evg1088147 21.95 0.57 Integrin Alpha Chain evg11744 48.55 2.37 Integrin Integrin Beta Chain evg487755 28.71 0.83 Integrin Beta Chain evg11688.1 33.01 1.29 Integrin Beta Chain evg1002196 2.02 0.65 IP3KB Inositol trisphosphate 3-kinase evg1033455 44.605 2.14 IQSEC1 IQ motif and SEC7 domain-containing protein 1 evg1344245 9.44 1 Rho-GEF kalirin evg1034986 11.28 1.71 Kalirin Rho-GEF kalirin evg10906 22.48 0.72 Kinesin Kinesin Numerous Neural cell adhesion molecule L1-like protein evg1030844 163.61 15.84 Neural cell adhesion molecule L1-like protein or neuroglian or cortactin evg111551 27.89 3.18 L1CAM Neural cell adhesion molecule L1-like protein or neuroglian or cortactin evg1090418 3.53 0.9 Neural cell adhesion molecule L1-like protein or neuroglian or cortactin 1599310 2.65 0.59 LRFN Leucine-rich repeat and fibronectin type III domain-containing - LRRC7 Leucine-rich repeat-containing protein 7 evg1112168 14.31 2.65 LRRTM Leucine-rich repeat transmembrane neuronal protein - Membrane-associated guanylate kinase, WW and PDZ domain- MAGI1 evg1010495 67.06 5.17 containing protein 1 Mitogen-activated protein kinase 10 (aka MAPK JNK) evg4180 48.46 3.11 MAPK Mitogen-activated protein kinase 14A (aka MAP kinase p38 alpha) evg1291503 37.63 0.44 MDGA MAM domain-containing glycosylphosphatidylinositol anchor protein - Dual specificity mitogen-activated protein kinase kinase 2 evg11269 37.16 0.94 Dual specificity mitogen-activated protein kinase kinase 3 evg1034669 14.54 1.63 MEK Dual specificity mitogen-activated protein kinase kinase 4 evg6879 26.87 2.13 Dual specificity mitogen-activated protein kinase kinase 4 evg1246778 25.18 0.87 Dual specificity mitogen-activated protein kinase kinase 7 evg1260662 26.03 2.24 MINT MINT evg1036658 18.39 1.79 MLCK Myosin light chain kinase, smooth muscle isoform evg1331456 49.92 2.47 MAGUK p55 subfamily member 2 evg375368 849.49 18.31 MPP MAGUK p55 subfamily member 5 evg1034360 23.32 2.35 MAGUK p55 subfamily member 7 evg1239777 32.01 2.46 MTOR Serine/threonine-protein kinase mTOR evg1164613 18.69 1.46 Myosin Myosin Numerous / / Myosin-binding Myosin-binding protein - protein Myosin light chain evg1233912 240.23 3.46 Myosin light chain Myosin light chain 1501255 38.23 1.08 Na+/K+ ATPase ATPase Na+/K+ transporting subunit alpha evg1107237 159.27 17.26 Neurexin 4-like protein evg1278706 105.85 4.3 Neurexin4 evg1340280 12.01 1.25 Neurexin Neurexin-like protein a evg1236022 5.87 0.91 Neurexin-like protein b evg1678810 16.04 3.67 Neurexin-like protein c evg1032136 23.03 1.66

Nexin
  Nexin | Numerous | / | /
Neuroligin
  Neuroligin-1 | evg1010438 | 6.06 | 1.3
  Neuroligin-4 Y or X-linked or 3 | evg1253790 | 10.89 | 1.55
  Neuroligin-4, X-linked isoform or 1 or 3 | evg1247752 | 13.94 | 3.44
  Neuroligin-4, Y-linked isoform | evg2557 | 0.39 | 0.25
NF1
  Neurofibromin | evg1110936 | 11.2 | 0.43
NSF
  Vesicle-fusing ATPase | evg1257917 | 31.23 | 2.18
Piccolo
  Piccolo | -
PICK1
  PICK1 | evg847021 | 54.95 | 5.1
PKA
  cAMP-dependent protein kinase catalytic subunit alpha | evg1026942 | 158.55 | 19.11
  cAMP-dependent protein kinase type I-alpha regulatory subunit | evg211235 | 209.16 | 6.55
  cAMP-dependent protein kinase type II-alpha regulatory subunit | evg105878 | 2.41 | 0.5
  cAMP-dependent protein kinase type II-alpha regulatory subunit | evg210191 | 0 | 0
PKC
  Protein kinase C alpha type | evg1109134 | 43.94 | 3.36
  Protein kinase C epsilon type | evg1247926 | 12.85 | 0.41
  Protein kinase C iota type | evg1167251 | 7.85 | 1.26
PP2A
  Serine/threonine-protein phosphatase 2A catalytic subunit | evg1106934 | 194.95 | 20.45
  Serine/threonine-protein phosphatase 2A regulatory subunit A | evg1107225 | 127.06 | 5.95
  Serine/threonine-protein phosphatase 2A regulatory subunit A | evg592041 | 51.96 | 1.07
  Serine/threonine-protein phosphatase 2A regulatory subunit A | evg1730787 | 4.44 | 0.62
  Serine/threonine-protein phosphatase 2A regulatory subunit B | evg1623181 | 24.35 | 2.62
  Serine/threonine-protein phosphatase 2A regulatory subunit B | evg487561 | 45.25 | 1.35
PP2B
  Serine/threonine-protein phosphatase 2B catalytic subunit alpha | evg1159951 | 89.7 | 5.17
PX-RICs
  Rho GTPase-activating protein 32 | -
RACK1
  Receptor of activated protein C kinase 1 | evg1171260 | 410.77 | 17.25
Raf
  Proto-oncogene B-Raf; serine/threonine-protein kinase B-raf; BRaf1 | evg1035734 | 11.79 | 0.26
RAPTOR
  Regulator-associated protein of mTOR | evg1251821 | 14.43 | 0.38
Ras
  GTPase Kras | evg21995 | 7.01 | 3.55
RasGAP
  Ras GTPase-activating protein 2 | evg1152581 | 16.9 | 5.12
  Ras GTPase-activating protein 2 | evg1110182 | 12.56 | 1.44
  Ras GTPase-activating protein 3 | evg1241034 | 15.76 | 1.18
  Ras GTPase-activating protein-like | evg1238990 | 15.71 | 2.11
SYDE
  Rho GTPase activating protein at 100F or Rho GTPase-activating protein | evg1246317 | 20.16 | 2.66
RhoGAP
  Rho GTPase-activating protein 1 or 8 | evg1705612 | 65.5 | 1.51
  Rho GTPase-activating protein 17 | evg1235752 | 17.45 | 1.8
  Rho GTPase-activating protein 19 | evg1343525 | 10.18 | 1.11
  Rho GTPase-activating protein 20 | evg1039753 | 10.79 | 3.25
  Rho GTPase-activating protein 21 or 23 | evg1036297 | 20.8 | 0.83
  Rho GTPase-activating protein 22 | evg1173759 | 36.79 | 4.27
  Rho GTPase-activating protein 26 or 10 | evg112834 | 24.49 | 2.23
  Rho GTPase-activating protein 27 or 12 | evg1032498 | 13.99 | 1.88
  Rho GTPase-activating protein 27 or 15 | evg371974 | 7.56 | 2.18
  Rho GTPase-activating protein 45-like | evg102026 | 18.32 | 0.96
  Rho GTPase-activating protein 5 or 35; or Rho GTPase activating protein p190 | evg1141743 | 11.8 | 0.81
  Rho GTPase-activating protein 6 | evg1031835.2 | 24.25 | 1.96
  Rho GTPase-activating protein 7 | evg1229271 | 19.49 | 5.83
RhoGDI
  Rho GDP-dissociation inhibitor 1 | 1593940 | 16.26 | 1.05

  Rho GDP-dissociation inhibitor 2 | evg484053 | 342.57 | 36.65
RhoGEF
  Rho guanine nucleotide exchange factor 11 | evg1035587 | 20.61 | 1.28
  Rho guanine nucleotide exchange factor 12 or 11 | evg1038332 | 11.29 | 0.67
  Rho guanine nucleotide exchange factor 17 | evg1368288 | 8.08 | 1.86
  Rho guanine nucleotide exchange factor 7 | evg1246306 | 11.99 | 0.7
  Rho guanine nucleotide exchange factor 9 | evg1248052 | 16.42 | 5.26
  Rho guanine nucleotide exchange factor 9 or 4 | evg1406494 | 9.3 | 1.04
RhoGTPBP
  Rho-related GTP-binding protein | 1722683 | 547.78 | 40
RICTOR
  Rapamycin-insensitive companion of mTOR | evg1216110 | 8.05 | 0.4
RIM
  Rab3-Interacting Molecule (RIM), (Tad-RIM-I) | evg1642237 | 3.17 | 1.48
  Rab3-Interacting Molecule (RIM), (Tad-RIM-II) | evg1176111 | 27.85 | 1.94
RIMBP
  Rab3-Interacting Molecule-Binding Protein (RIMBP) | evg1167478 | 14.56 | 0.62
RND
  Rho-related GTP-binding protein Rho | 1722683 | 547.78 | 40
ROCK
  Rho-associated protein kinase 2 | evg1249800 | 22.71 | 1.67
RPS6KA
  Ribosomal S6 kinase alpha-5 or 4 | evg1678450 | 19.67 | 0.36
RPS6KA (also RSK)
  Ribosomal S6 kinase alpha-2 (aka RSK3) or ribosomal S6 kinase alpha-1 | evg1497205 | 14.33 | 2.98
RPS6KB
  Ribosomal S6 kinase beta-1 | evg24823 | 34.14 | 1.55
Septin
  Septin-10 | evg484336 | 58.21 | 5.46
  Septin-4 | evg1277503 | 49.72 | 5.74
  Septin-7 | evg1030740 | 53.03 | 4.23
  Septin-9 | evg1462766 | 31.17 | 1.6
Shank
  SH3 and multiple ankyrin repeat domains protein | -
Slitrk3
  SLIT and NTRK-like family, member 3 | -
Spectrin
  Spectrin alpha chain | evg1032048 | 75.29 | 12.63
  Spectrin beta chain | evg1031961 | 34.63 | 4.7
  Spectrin beta chain | evg1661785 | 2.85 | 0.19
  Spectrin beta chain | evg1032928 | 37.04 | 3.96
  Spectrin beta chain | evg12255 | 20.19 | 1.72
SRC
  Proto-oncogene tyrosine-protein kinase Src | -
SRGAP
  SLIT-ROBO Rho GTPase-activating protein 3 | evg1039428 | 9.26 | 0.6
SYD-1
  SYD-1 | evg1246317 | 20.16 | 2.66
SynGAP
  SynGAP | -
TJP/ZO-1
  Tight junction protein ZO-1 | evg1114527 | 10.64 | 0.92
Tropomyosin
  Tropomyosin 1 | evg1023916 | 97.29 | 7.22
  Tropomyosin 1 or Tropomyosin alpha-3 chain | evg1158893 | 273.8 | 10.77
Tubulin
  Tubulin-alpha | evg1030105 | 1984.01 | 404.47
  Tubulin-alpha | evg11749 | 231.48 | 28.62
  Tubulin-alpha | evg1539237 | 140.28 | 6.61
  Tubulin-alpha | evg1246478 | 6.24 | 1.27
  Tubulin-beta | 1172586 | 2587.81 | 328.34
  Tubulin-beta | evg11104 | 172.9 | 34.56
  Tubulin-delta/gamma | evg1231826 | 19.645 | 0.94
  Tubulin-delta/gamma | evg1240920 | 9.07 | 0.47
  Tubulin-epsilon | evg1159996 | 8.21 | 2.29
  Tubulin-gamma | evg692 | 11.13 | 1.08
Unc-13
  Unc-13 B | evg1655705 | 21.12 | 1.96
VELI
  VELI/Lin7 | evg945053 | 56.4 | 9.72
V-type Proton ATPase
  V-type proton ATPase 116 kDa subunit a | evg1106942 | 145.45 | 46.7

  V-type proton ATPase 116 kDa subunit a | evg852581 | 103.86 | 7.38
  V-type proton ATPase 116 kDa subunit a | evg1646691 | 65.39 | 2.38
  V-type proton ATPase catalytic subunit A | evg104749 | 372.92 | 34.41
  V-type proton ATPase proteolipid subunit | evg204052 | 671.77 | 37.14
  V-type proton ATPase proteolipid subunit | 685600 | 2070.74 | 65.74
  V-type proton ATPase subunit B | evg374592 | 422.35 | 36.39
  V-type proton ATPase subunit C | evg1385175 | 40.46 | 3.24
  V-type proton ATPase subunit D | evg2479 | 73.64 | 1.65
  V-type proton ATPase subunit E | evg1172691 | 599.85 | 36.65
  V-type proton ATPase subunit E | XP_002114781.1
  V-type proton ATPase subunit F | evg201854 | 339.4 | 66.4
  V-type proton ATPase subunit G | 1797971 | 733.61 | 62.98
  V-type proton ATPase subunit H | evg1030436 | 172.26 | 3.45
  V-type proton ATPase subunit S1 | evg769507 | 273.4 | 3.66
  V-type proton ATPase subunit D | evg1344767 | 254.07 | 5.38

Appendix I - Accessions for Nematostella vectensis Transcriptome Assembly

SRR5183917 SRR5183918 SRR5183919 SRR5183920 SRR5183921 SRR5183922 SRR5183923 SRR5183924 SRR5183925 SRR5183926 SRR5183927 SRR5183928 SRR5183929 SRR5183930

Appendix II - Accessions for Mnemiopsis leidyi Transcriptome Assembly

SRR1971491 SRR4353882 SRR4353883 SRR4353884 SRR4353885 SRR4353886 SRR4353887 SRR4353888 SRR4353889 SRR4353890 SRR4353891 SRR4353892 SRR4353893 SRR4353894 SRR4374091 SRR4374265 SRR4374273 SRR4374274 SRR4374324 SRR4374325 SRR4374356 SRR4374357 SRR4374583 SRR4374709 SRR4374710 SRR4374711 SRR4374712 SRR4374713 SRR4374714 SRR4374715 SRR4374742 SRR4374769

Appendix III - Accessions for RIM/Rph3a Phylogenetic Analysis

Species | Gene | ID
Homo sapiens | RIM | Q9UQ26
Rattus norvegicus | RIM | Q9JIS1
Mus musculus | RIM | Q9EQZ7
Rattus norvegicus | RIM | Q9JIR4
Mus musculus | RIM | Q99NE5
Homo sapiens | RIM | Q86UR5
Danio rerio | RIM | F1RCJ3
Danio rerio | RIM | A0A0R4ING7
Ptychodera flava | RIM | PFL3_pfl_40v0_9_20150316_1g5731.t1 *Obtained from the Ptychodera flava genome assembly gene model (Simakov et al., 2015)
Saccoglossus kowalevskii | RIM | sakowv30037565m *Obtained from the Saccoglossus kowalevskii genome assembly gene model (Simakov et al., 2015)
Acanthaster planci | RIM | XP_022091188.1
Strongylocentrotus purpuratus | RIM | XP_011664650.1
Lingula anatina | RIM | XP_013421607.1
Octopus bimaculoides | RIM | XP_014789722.1
Aplysia californica | RIM | XP_005106594.2
Mizuhopecten yessoensis | RIM | XP_021377513.1
Crassostrea gigas | RIM | XP_019921964.1
Echinococcus granulosus | RIM | EUB56343.1
Schistosoma haematobium | RIM | XP_012796083.1
Drosophila melanogaster | RIM | NP_001247166.2
Daphnia magna | RIM | KZS07819.1
Hyalella azteca | RIM | XP_018007323.1
Eurytemora affinis | RIM | XP_023333958.1
Eurytemora affinis | RIM | XP_023346790.1
Varroa jacobsoni | RIM | XP_022708982.1
Varroa destructor | RIM | XP_022645204.1
Parasteatoda tepidariorum | RIM | XP_021001073.1
Parasteatoda tepidariorum | RIM | XP_015920376.1
Centruroides sculpturatus | RIM | XP_023224599.1
Limulus polyphemus | RIM | XP_022257304.1
Limulus polyphemus | RIM | XP_022243248.1
Ramazzottius varieornatus | RIM | GAV01719.1
Hypsibius dujardini | RIM | OQV23161.1
Caenorhabditis elegans | RIM | NP_741831.1
Nematostella vectensis | RIM | NVE19942
Exaiptasia pallida | RIM | XP_020898842.1
Hydra vulgaris | RIM | XP_012564401.1
Trichoplax adhaerens | RIM | evg1642237
Oscarella carmela | RIM | m.21147
Oscarella carmela | RIM | m.26069
Aplysia californica | RIM | XP_012944640.1
Crassostrea gigas | RIM | XP_011422085.1
Mizuhopecten yessoensis | RIM | XP_021345230.1
Octopus bimaculoides | RIM | XP_014774175.1
Notospermus geniculatus | RIM | TRINITY_DN204404_c0_g2_i5 *Obtained from the Notospermus geniculatus transcriptome (Luo et al., 2018)
Saccoglossus kowalevskii | RIM | sakowv30000298m *Obtained from the Saccoglossus kowalevskii genome assembly gene model (Simakov et al., 2015)
Centruroides sculpturatus | RIM | XP_023227959.1
Limulus polyphemus | RIM | XP_022258855.1
Limulus polyphemus | RIM | XP_022240908.1
Lingula anatina | RIM | XP_013397099.1
Exaiptasia pallida | RIM | KXJ21887.1

Beroe ovata | RIM | Manually assembled from transcript sequences (TR51711|c1_g1_i2 and TR51711|c1_g3_i7) obtained from the Beroe ovata genome assembly gene model (unpublished), generously provided by the Ryan Lab, University of Florida
Mnemiopsis leidyi | RIM | evg198193
Hormiphora californensis | RIM | Obtained from a Trinity de novo transcriptome assembled using reads: SRR1992642
Trichoplax adhaerens | RIM | evg1176111
Mus musculus | Rph3a | P47708
Rattus norvegicus | Rph3a | P47709
Homo sapiens | Rph3a | Q9Y2J0
Danio rerio | Rph3a | E7FEI1
Danio rerio | Rph3a | A0A140LH42
Caenorhabditis elegans | Rph3a | NP_001022566.1
Octopus bimaculoides | Rph3a | XP_014779575.1
Aplysia californica | Rph3a | XP_012945559.1
Mizuhopecten yessoensis | Rph3a | XP_021345406.1
Crassostrea gigas | Rph3a | XP_011424844.1
Lingula anatina | Rph3a | XP_013406958.1
Saccoglossus kowalevskii | Rph3a | XP_006817123.1
Acanthaster planci | Rph3a | XP_022085716.1
Strongylocentrotus purpuratus | Rph3a | XP_011683137.1
Limulus polyphemus | Rph3a | XP_013782034.1
Drosophila melanogaster | Rph3a | NP_572651.1
Hyalella azteca | Rph3a | XP_018018219.1
Ramazzottius varieornatus | Rph3a | GAV03979.1
Hypsibius dujardini | Rph3a | GFGW01003946.1
Echinococcus granulosus | Rph3a | CDS16155.1
Schistosoma haematobium | Rph3a | XP_012797599.1
Hydra vulgaris | Rph3a | XP_012561655.1
Oscarella carmela | Rph3a | m_15463
Trichoplax adhaerens | Rph3a | evg1107189.2
Acropora digitifera | Rph3a | XP_015773295.1
Exaiptasia pallida | Rph3a | XP_020895735.1
Hormiphora californensis | Rph3a | Obtained from a Trinity de novo transcriptome assembled using reads: SRR1992642
Mnemiopsis leidyi | Rph3a | evg19024

Appendix IV - Primers for qPCR of Lymnaea stagnalis RIM Types I and II

Gene | Primers | Amplicon size (bp)
RIM I | F: GTGAGGAAGCAGGAAGTGGA; R: CCAGCACAATAGACCCAACC | 234
RIM II | F: CACTACCAGCCACACAAAGC; R: TGTTCCCACTCAGGATGACA | 182
EF-1α | F: TGGCAAGTCAACCACAACTG; R: TAATACCACGCTCACGCTCA | 161
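The sketch below is a quick sanity check on the primer pairs listed above. It computes the GC content of each oligo and estimates its melting temperature with the Wallace ("2 + 4") rule. The Wallace rule is our illustrative choice and is not necessarily the method used to design or validate these primers. It is only a rough approximation for oligos of about 14 to 20 nt, and it ignores salt concentration, primer concentration, and nearest-neighbour stacking. The dictionary keys (for example "RIM I F") are hypothetical labels.

```python
"""Rough primer checks (illustrative sketch): GC content and Wallace-rule Tm."""

# Primer sequences as listed in the Appendix IV table; the keys are
# hypothetical labels of the form "<gene> <F|R>".
PRIMERS = {
    "RIM I F": "GTGAGGAAGCAGGAAGTGGA",
    "RIM I R": "CCAGCACAATAGACCCAACC",
    "RIM II F": "CACTACCAGCCACACAAAGC",
    "RIM II R": "TGTTCCCACTCAGGATGACA",
    "EF-1alpha F": "TGGCAAGTCAACCACAACTG",
    "EF-1alpha R": "TAATACCACGCTCACGCTCA",
}


def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace rule: Tm = 2*(A+T) + 4*(G+C), in degrees C.

    Assumption: only a rough estimate for short oligos (~14-20 nt); it ignores
    salt, primer concentration and nearest-neighbour stacking effects.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.0f}%, "
              f"Tm ~{wallace_tm(seq)} degC")
```

When run as a script, the sketch prints one line per primer. All six primers are 20 nt long with 50 to 55% GC. Under this approximation, the two primers in each pair fall within 2 °C of each other. A nearest-neighbour calculator would give more realistic absolute values.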
