MOLECULAR PROFILING OF CELLULAR IDENTITY AND PLASTICITY IN THE NERVOUS SYSTEM OF APLYSIA CALIFORNICA

By

CALEB JAMES BOSTWICK

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2019

© 2019 Caleb James Bostwick

To my family

ACKNOWLEDGMENTS

First, I’d like to acknowledge the people who helped me bring this dissertation to fruition. My advisor Dr. Leonid Moroz helped me come up with ideas and questions and encouraged me to generate more data and perform more extensive analyses than I would have previously thought possible. Dr. Andrea Kohn helped keep the lab organized and running. Yelena Bobkova provided care, dissection, and molecular biology expertise as well as friendship and support. Tanya Moroz assisted in cDNA library construction and the ganglia plasticity experiments in addition to brightening the general lab atmosphere. Dr. Peter Williams aided me in finding ways to better explain and document my computational biology techniques. Dr. Shaun Mukherjee was a friend who made lab meetings more bearable. Dr. Gabrielle Winters helped with molecular biology in the lab, asked insightful questions, proofread my dissertation, and was a generous and supportive partner during our many years of graduate study. Dr. Emily Dabe was a colleague and friend with whom I discussed bioinformatics methods and techniques and shared laughs. I thank my university and my graduate committee (Dr. Thomas Foster, Dr. David Borchelt, and Dr. Richard Yost), as well as Dr. Jada Lewis for her support and encouragement. I also thank my colleagues who graduated from the University of Florida IDP program before me for their support and friendship. Finally, I thank my family, whose love and encouragement allowed me to become the person I am today.

TABLE OF CONTENTS

page

ACKNOWLEDGMENTS ...... 4

LIST OF TABLES ...... 8

LIST OF FIGURES ...... 10

LIST OF OBJECTS ...... 13

LIST OF ABBREVIATIONS ...... 14

ABSTRACT ...... 17

CHAPTER

1 INTRODUCTION ...... 19

Simple Models and Powerful Technologies for Understanding Neuroscience ...... 19
Aplysia californica as a Model Organism for Learning and Memory Studies ...... 20
Next-generation Single Cell RNA Sequencing (scRNA-seq) ...... 22
Biology of Identified Single Neurons ...... 24
Aplysia Learning and Memory Circuits and cAMP Signaling ...... 27
Aplysia: Model for a Reductionist Approach to Studying Learning and Memory ...... 27
Molecular Biology of Short- and Long-term Memory ...... 28

2 METHODS ...... 32

Molecular Biology for Construction of RNA-sequencing Libraries ...... 32
and Cell/Tissue Isolation ...... 32
Treatment of CNS Ganglia with 8-Br cAMP ...... 33
RNA Extraction ...... 34
cDNA Synthesis and Verification ...... 34
External RNA Spike-ins ...... 35
Sequencing Library Preparation ...... 35
Library Quality Control and Validation ...... 36
RNA-Sequencing Quality Control, Read Mapping, and Exploratory Data Analysis ...... 36
Read Summarization ...... 37
Data Visualization ...... 38
Principal Component Analysis ...... 38
Saturation Curves ...... 39
Linear Regression Curve Fitting ...... 39
Differential Expression Analysis ...... 39
Calculating Relative scRNA-seq Transcript Abundance ...... 40
Absolute scRNA-seq Transcript Abundance ...... 40
Localization of Cellular mRNA ...... 41
Molecular Cloning ...... 41
Synthesis of Antisense Probes Labelled with Digoxygenin ...... 42

3 ABSOLUTE QUANTIFICATION OF MESSENGER RNAS FROM IDENTIFIABLE SINGLE NEURONS IN APLYSIA ...... 46

Overview of Single-Cell RNA Sequencing ...... 46
Mammalian Single Cell RNA-seq Studies ...... 47
Characteristics of the Aplysia californica Genome ...... 49
Results of Absolute mRNA Quantification in Identifiable Single Neurons ...... 50
Transcriptomic Evaluation of the Characteristics of Identified Neurons ...... 52
Shallow Transcriptomic Profiling of 96 Individual Neurons, Including VC Sensory Neurons ...... 55
Insights Regarding Absolute Molecular Quantification of Messenger RNAs ...... 56
Functional Biology of Identified Neurons ...... 59
Data transformation used for differential expression analysis ...... 59
R2 vs L7 differentially expressed ion channel-related genes ...... 60
DE cell cycle and synaptic genes ...... 61
Five RNA modification/editing/methylation genes are DE between L7 and R2 ...... 63
Transcription factors and chromatin remodeling genes DE in R2 versus L7 ...... 64
Overall comparison of DE genes between R2 shared with five other neuron classes ...... 65
Functional annotation of enriched genes using DAVID ...... 66
Enriched genes found between R2 and other neurons ...... 67
Principal component analysis reveals features of cell type-specific clustering ...... 69
Discussion of Absolute Quantitation of mRNAs from Identifiable Single Neurons ...... 70
Rapidly Advancing Knowledge of Secretory Molecules from the Phylum ...... 71

4 DISCOVERY OF NOVEL MEMORY-RELATED GENES IN A CAMP-TREATED MOLLUSCAN NERVOUS SYSTEM ...... 118

Introduction ...... 118
Results of RNA-seq Reveal Canonical and Novel Gene Expression ...... 123
RNA-seq of Established cAMP-Dependent Genes ...... 123
Novel Differentially Expressed Genes in Response to 8-Br cAMP Treatment ...... 129
AG Molecular Function GO ...... 130
AG Biological Process GO ...... 132
AG Cellular Component GO ...... 133
Right Pleural Molecular Function GO ...... 134
Classification of Unique Ganglia-Specific Transcripts ...... 136
Time-course Analysis of Transcripts Influenced by 8-Br cAMP Treatment ...... 137
Unsupervised Clustering of 8-Br cAMP-Treated Ganglia ...... 139
Abdominal Ganglion 0.5-hour cAMP Treatment vs Control (FSW) ...... 140
Abdominal Ganglion 1-hour cAMP Treatment vs Control (FSW) ...... 141
Abdominal Ganglion 2-hour cAMP Treatment vs Control (FSW) ...... 141
135 Genes DE in 0.5, 1, and 2-hour 8-Br cAMP Time Points in the AG ...... 142
The 0.5, 1, and 2-hour 8-Br cAMP Treated RPlG have 60 DE Genes in Common ...... 142
Discussion of Canonical and Novel Candidate Genes ...... 143

5 INSIGHTS GAINED FROM THE STUDY OF SINGLE CELL IDENTITY AND GANGLIONIC PLASTICITY IN APLYSIA ...... 184

Aplysia Enables the Transcriptomic Profiling of Identifiable Single Neurons ...... 184
Cellular Identity of Individual Single Neurons ...... 184
Confirmation of Previous Findings and Novel Genes Related to cAMP-Induced Plasticity ...... 186
Future Directions ...... 187
Final Comments ...... 188

APPENDIX

A LIST OF SUPPLEMENTARY TABLES ...... 190

Primary Data Master Tables ...... 190
Tables of Differentially Expressed Genes in Single Identified Neurons ...... 190
Tables of Differentially Expressed Genes in Individual Ganglia of Aplysia following 8-Br cAMP treatments (for 0.5, 1, and 2 hours) ...... 192

B METHOD SCHEMATIC ...... 196

C SUPPLEMENTAL FIGURES FOR SINGLE NEURON GENE ANALYSIS ...... 197

LIST OF REFERENCES ...... 204

BIOGRAPHICAL SKETCH ...... 228

LIST OF TABLES

Table page

2-1 Symbols and definitions used for absolute RNA quantification ...... 45

3-1 AplCal3.0 Gene models and categories ...... 76

3-2 Twenty-three deeply sequenced single neurons with information regarding read counts ...... 77

3-3 The number of mRNA transcripts corresponding to 1 TPM in single neurons .... 79

3-4 Minimal lane effects present indicating accurate transcript detection and reproducible quantification ...... 81

3-5 ERCC external RNA spike-in counts detected in single neurons ...... 82

3-6 Summary of single neuron differential gene expression analysis ...... 83

3-7 Log2 fold change between R2 and L7 putative secretory molecule transcripts ... 84

3-8 The four DE transcripts shared between R2 and the other five cell types...... 86

3-9 Functionally enriched DE gene clusters between R2 and L7 neurons ...... 87

3-10 Functionally enriched DE gene clusters between R2 and LPl1 neurons ...... 89

3-11 Functionally enriched DE gene clusters between R2 and L11 neurons ...... 89

3-12 Functionally enriched DE gene clusters between R2 and lMCC neurons ...... 90

3-13 Functionally enriched DE gene clusters between R2 and rMCC neurons ...... 91

3-14 Functionally enriched DE gene clusters between L7 vs rMCC ...... 93

3-15 Functionally enriched DE gene clusters between L7 vs L11 neurons...... 96

3-16 Functionally enriched DE gene clusters between L7 vs LPl1 neurons ...... 97

3-17 Functionally enriched DE gene clusters between L7 vs lMCC ...... 99

3-18 Functionally enriched DE gene clusters between L11 vs LPl1 ...... 101

3-19 Functionally enriched DE gene clusters between L11 vs lMCC ...... 104

3-20 Functionally enriched DE gene clusters between L11 vs rMCC ...... 106

3-21 Functionally enriched DE gene clusters between LPl1 vs lMCC ...... 108

3-22 Functionally enriched DE gene clusters between LPl1 vs rMCC ...... 109

3-23 Genes varying between two groups of R2 neurons...... 110

4-1 Significantly DE transcripts in 8-Br cAMP-treated ganglia compared to controls ...... 148

4-2 Molecular function GO terms enriched in AG 0.5-hour cAMP treatment ...... 149

4-3 Molecular function GO terms enriched in AG 1-hour cAMP treatment ...... 151

4-4 Molecular function GO terms enriched in AG 2-hour cAMP treatment ...... 153

4-5 Biological process GO terms enriched in AG 0.5-hour cAMP treatment ...... 155

4-6 Biological process GO terms enriched in AG 1-hour cAMP treatment ...... 157

4-7 Biological process GO terms enriched in AG 2-hour cAMP treatment ...... 159

4-8 Cellular component GO terms enriched in AG 0.5-hour cAMP treatment ...... 160

4-9 Cellular component GO terms enriched in AG 1-hour cAMP treatment ...... 162

4-10 Cellular component GO terms enriched in AG 2-hour cAMP treatment ...... 164

4-11 Molecular function GO terms enriched in RPlG 0.5-hour cAMP treatment ...... 165

4-12 Biological process GO terms enriched in RPlG 0.5-hour cAMP treatment ...... 166

4-13 Cellular component GO terms enriched in RPlG 0.5-hour cAMP treatment ..... 167

LIST OF FIGURES

Figure page

1-1 Schematic diagram indicating the major components of the Aplysia gill withdrawal reflex ...... 30

1-2 Schematic diagram of the ganglia comprising the Aplysia central nervous system ...... 31

3-1 Saturation curves from four neurons that received ERCC spike-ins...... 111

3-2 Stacked bar chart of gene expression in single neurons ...... 112

3-3 Pairwise correlation of external spike-in abundances between sequencing lanes, indicating no obvious issues attributable to lane effects...... 113

3-4 Dose-response curve of external RNA spike-ins displays a strong linear relationship between log2 of TPM normalized expression values and log2 of the absolute number of transcript molecules ...... 114

3-5 Venn diagram comparison of the DE genes relative to R2 that the other five neuron classes (L7, LPl1, L11, rMCC, lMCC) have in common ...... 115

3-6 Principal component analysis (PCA) of single neurons. The principal component analysis can delineate between each type of neuron...... 116

3-7 Diagram indicating numbers of differentially expressed genes (green = unique transcripts, red = counting transcript isoforms) between identified deeply sequenced single neurons ...... 117

4-1 Bar chart of CREB2 mRNA expression in top panel - abdominal, bottom panel - left and right pleural ganglia ...... 168

4-2 Boxplot of C/EBP mRNA expression in top panel - abdominal, bottom panel - left and right pleural ganglia...... 169

4-3 Bar chart of tolloid/BMP-1-like (TBL-1) mRNA expression in left pleural and right pleural ganglia ...... 170

4-4 Boxplot of EGR1 mRNA expression in top panel - abdominal, bottom panel - left and right pleural ganglia...... 171

4-5 Boxplot of CaM mRNA expression in left pleural and right pleural ganglia...... 172

4-6 Boxplot of reductase-related mRNA expression in top panel - abdominal, bottom panel - left and right pleural ganglia...... 173

4-7 Boxplot of Uch mRNA expression in top panel - abdominal, bottom panel - left and right pleural ganglia...... 174

4-8 The acyclic induced subgraph of the 10 most significant molecular function GO terms as identified by the “classic” algorithm using Fisher’s exact test on the 0.5-hour abdominal ganglia DE transcripts...... 175

4-9 Bar chart of glutamate dehydrogenase mRNA expression in top panel - abdominal, bottom panel - left and right pleural ganglia...... 176

4-10 Venn diagram displaying transcripts that are commonly expressed by multiple ganglia ...... 177

4-11 Gene clusters exhibiting distinct temporal patterns of expression over the course of 8-Br cAMP treatment in the abdominal ganglion...... 178

4-12 Gene clusters exhibiting distinct temporal patterns of expression over the course of 8-Br cAMP treatment in the right pleural ganglion...... 179

4-13 Principal component analysis of 8-Br cAMP treated ganglia...... 180

4-14 Boxplot of KAT8-associated complex NSL subunit mRNA expression in top panel - abdominal, bottom panel - left and right pleural ganglia ...... 181

4-15 Significantly differentially expressed genes detected between abdominal ganglia exposed to 8-Br cAMP and FSW controls ...... 182

4-16 Significantly differentially expressed genes detected between right pleural ganglia exposed to 8-Br cAMP and FSW controls ...... 183

5-1 Model of pre- and post-synaptic neurons indicating genes affected by activation of cAMP signaling pathway in Aplysia...... 189

B-1 Workflow for Molecular Biology and Bioinformatic Analysis...... 196

C-1 Unstacked bar charts corresponding to stacked bars of Figure 3-2...... 197

C-2 Absolute number of mRNA molecules estimated in single neurons using external RNA spike ins for calibration ...... 198

C-3 Histograms of log10 transformed absolute numbers of ion channel-related transcripts for deeply sequenced neurons with RNA spike-ins ...... 199

C-4 The percentages of the absolute number of ion channel transcripts present in eight of the 23 individual neurons that received external RNA spike-ins ...... 200

C-5 Global comparison of the effects of several common data transformations on our scRNA-seq count data ...... 201

C-6 Modification of Figure 3-1. Saturation curves for single neurons L11_1, L11_2, L7_5, and R2_5 using a gene expression threshold of 10 transcripts per million (TPM) or greater...... 202

C-7 These are saturation curves for single neurons L11_1, L11_2, L7_5, and R2_5 using a gene expression threshold of 100 transcripts per million (TPM) or greater ...... 203

LIST OF OBJECTS

Object page

3-1 Primary Data Master Tables. Table M2. (Master Data Table 2) Shallow Sequencing of Single Identified Neurons with Annotated TPMs ...... 55

3-2 Tables of Differentially Expressed Genes in Single Identified Neurons. Table 3-S1 R2vL7...... 60

3-3 Tables of Differentially Expressed Genes in Single Identified Neurons. Tables 3-S1-S15...... 71

4-1 Primary Data Master Tables. Table M3. (Master Data Table 3) Individual Ganglia treated with 8-Br cAMP with Annotated TPMs...... 124

4-2 Tables of Differentially Expressed Genes in Individual Ganglia of Aplysia following 8-Br cAMP treatments (for 0.5, 1, and 2 hours). Supplemental Tables 4-S1-S6...... 129

4-3 Tables of Differentially Expressed Genes in Individual Ganglia of Aplysia following 8-Br cAMP treatments (for 0.5, 1, and 2 hours). Supplemental Table 4-S7 Ganglia specific gene list...... 137

4-4 Tables of Differentially Expressed Genes in Individual Ganglia of Aplysia following 8-Br cAMP treatments (for 0.5, 1, and 2 hours). Table 4-S1 cAMP 0.5hr AG...... 140

5-1 Primary Data Master Tables. Table M1. (Master Data Table 1) Deep Sequencing of Single Identified Neurons with Annotated TPMs ...... 185

5-2 Primary Data Master Tables. Table M2. (Master Data Table 2) Shallow Sequencing of Single Identified Neurons with Annotated TPMs...... 185

LIST OF ABBREVIATIONS

0TP Zero-time point – refers to instantly lysed Aplysia ganglia

8Br-cAMP 8-Bromoadenosine 3',5'-cyclic monophosphate

ACh Acetylcholine

AG Abdominal ganglion

ASW Artificial sea water

BG Buccal ganglion

Bp Base pair of complementary nucleotides

CaCl2 Calcium chloride

C Celsius

Cat # Catalog number

cDNA Complementary DNA

CG Cerebral ganglion

CNS Central nervous system

DE Differentially expressed

DNA Deoxyribonucleic acid

ELH Egg-laying hormone

ES Enrichment score

FDR False discovery rate aka Benjamini-Hochberg adjusted p-value1

FSW Filtered sea water

GFF General feature format. A standard text file format used for storing genomic features

HEPES N-2-hydroxyethylpiperazine-N'-2-ethanesulfonic acid

Kb Kilobase – one thousand base pairs

LPeG Left pedal ganglion

LPlG Left pleural ganglion

Mb Megabase – one million base pairs

MB Megabyte

MCC Metacerebral cell

µg Microgram

MgCl2 Magnesium chloride

mM Millimolar – one thousandth of a mole per liter

mRNA Messenger RNA

mL Milliliter

mm Millimeter

MMLV Moloney murine leukemia virus

NaCl Sodium chloride

ng Nanogram

ncRNA Non-coding RNA

nt Nucleotides

P-adj Benjamini-Hochberg adjusted p-value1 aka FDR

PCA Principal Component Analysis

PCR Polymerase chain reaction

RNA Ribonucleic acid

RNA-seq RNA sequencing

RPeG Right pedal ganglion

RPlG Right pleural ganglion

RT Reverse transcriptase. An enzyme capable of synthesizing DNA from an RNA template

scRNA-seq Single cell/neuron RNA sequencing

ssDNA Single-stranded DNA

ssRNA Single-stranded RNA

TPM Transcripts per million

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

MOLECULAR PROFILING OF CELLULAR IDENTITY AND PLASTICITY IN THE NERVOUS SYSTEM OF APLYSIA CALIFORNICA

By

Caleb James Bostwick

December 2019

Chair: Leonid L. Moroz

Major: Medical Sciences

A fundamental goal of modern neuroscience is to understand how "systems-level", top-down organismal behavior is generated from the cellular and molecular biology of the individual neurons comprising functional neural circuits. A gap in our knowledge persists, at least partly because we still lack a comprehensive understanding of the genomic profile and transcriptomic characteristics of identifiable neurons. The immense complexity of the mammalian brain makes it difficult to study the molecular components of reliably identifiable neurons due to their small size and vast number. One approach to circumvent this problem is to elucidate common mechanisms by examining numerically simpler nervous systems with larger neurons, such as those found in the marine mollusk Aplysia californica.

We report the most comprehensive transcriptomic characterization of cellular identity yet performed in individual neurons. Using a set of external RNA spike-ins, we estimate both the absolute abundance and the dynamic range of mRNA molecules detected in individual, identifiable neurons. These neurons include the cholinergic, mucus-releasing motor neuron R2, the gill withdrawal reflex effector neuron L7, the motor neuron L11, and the left and right metacerebral cells (MCCs), a pair of giant serotonin-containing interneurons involved in feeding arousal. We sequenced and quantified neuronal mRNA expression levels, revealing that Aplysia neurons express an average of 10,110 ± 2,660 transcripts per cell.

In addition to the transcriptomic profiling of individual neurons, we also analyzed neuronal plasticity within whole Aplysia ganglia treated to artificially stimulate the cAMP signaling pathway and mimic the alterations in gene expression that accompany behavioral sensitization. We confirmed the regulation of many canonical sensitization-related transcripts and discovered a host of molecules that might play novel roles in the adaptive process, including a histone acetylase as well as a deacetylase, cellular metabolic/energetics-related enzymes, and cytoskeletal components. This study serves as a framework for the transcriptomic classification and identity of neuronal cell types and highlights the plasticity and unique adaptations displayed by whole ganglia within a central nervous system.

CHAPTER 1 INTRODUCTION

Simple Models and Powerful Technologies for Understanding Neuroscience

One of the loftiest goals in neuroscience, to fully understand the workings of the human brain (one of the most complex objects in existence, if not the most complex), is tantalizingly close to being within our grasp. Of course, one might argue that the human brain itself has always been within our grasp, residing within a protective skull in each of our heads. It actively integrates and interprets the information transmitted to it by a myriad of sensory organs, driving our perceptions and generating our emotions. Using this incoming information along with that already stored in the brain (memories), it dictates the actions and behaviors that constitute the daily life of each human being on the planet.

Although neuroscience research moves us closer each day to the goal of fully understanding and characterizing the nervous systems of a multitude of organisms, history has demonstrated that some nervous systems are more amenable to study than others. Mammalian nervous systems, with their intrinsic complexity and immense numbers of neurons and supporting cells (~86 billion neurons with an approximately equal number of non-neuronal cells are estimated to be in the human brain2), continue to pose a significant challenge to neuroscientists. It is not clearly understood how individual neurons and their synaptic connections contribute to the learning process. The field of neurobiology has benefitted enormously from the study of a variety of nervous systems from both vertebrate and invertebrate model organisms. For example, Alan Hodgkin and Andrew Huxley discovered the fundamental molecular basis of the action potential using the giant axons of the squid3. Using this invertebrate mollusc, Hodgkin and Huxley were able to elucidate vital tenets of biological electrical signaling that are shared by all currently known nerve cells, including those of mammals.

Some species from another class within phylum Mollusca (Gastropoda) possess large and unambiguously identifiable neurons that can be reliably detected in each individual organism. By enabling stable intracellular electrophysiological recordings4, these large neurons facilitated research into the neural circuits underlying simple behaviors and into how the cells in these circuits communicate. Insights gained from the study of these massive, identifiable neurons included early progress in understanding the structure of neural synapses5, neurosecretion6, chemical signaling7, and the neural basis of behavior8,9. In particular, one species of gastropod has served as an invaluable model for delineating the cellular and molecular mechanisms of learning: the sea hare, Aplysia californica.

Aplysia californica as a Model Organism for Learning and Memory Studies

Organisms of the genus Aplysia have been known to humans since the time of the Ancient Greeks10. They were called “sea hares” due to their large rhinophores, which resemble the ears of a rabbit. Aplysia californica offers neuroscientists studying mechanisms of learning and memory multiple advantages not found in vertebrate model systems. In addition to the relatively small number of neurons present in Aplysia compared to mammals (~10,000 neurons10), Aplysia neurons are much larger in size and amenable to electrophysiological manipulations, microchemical analysis, and direct single cell RNA sequencing (scRNA-seq). These advantages enable the study of individual neurons comprising the neural circuits responsible for controlling specific reflexive behaviors. A key defensive reflex used to study non-associative behaviors such as habituation and sensitization in A. californica is the gill-withdrawal reflex11,12. A weak tactile stimulus applied to the siphon of the animal results in the withdrawal of the siphon and gill into a protective mantle sheath13. Upon repeated touching, the reflex response to the stimulus gradually decreases. This is known as habituation. Repeatedly subjecting the tail of the animal to noxious electrical shock before siphon stimulation increases the duration and vigor of the withdrawal reflex. This type of learning is sensitization, and both it and habituation are examples of so-called implicit memory. Implicit memory is memory for motor skills and reflexes that are not consciously recalled, in contrast to explicit memory, which is the conscious recall of factual knowledge regarding people, places, and things and is mostly associated with vertebrate brains14.

The strength of the Aplysia withdrawal reflex was discovered to be modifiable by at least two types of non-associative learning (habituation and sensitization), as well as a form of associative learning (classical conditioning), providing researchers with a readily observable measure of behavioral change15. Additionally, many of the behavioral features exhibited by this reflex are shared by mammalian learning, which suggests that learning in Aplysia and mammals may share common mechanisms14. Research has indicated that alterations in the structure and strength of synaptic connections between neurons play a crucial role in both implicit and explicit forms of memory15. Two major components of the defensive gill withdrawal reflex are the direct monosynaptic connections from siphon sensory neurons onto gill motor neurons and the polysynaptic (heterosynaptic) connections that communicate information from sensory neurons to motor neurons via a network of interneurons capable of modulating the strength and duration of the response16. In general, two broad types of memory storage have been described and are differentiated by their temporal characteristics: short-term memory (STM), lasting on a timescale of minutes to perhaps hours, and long-term memory (LTM), lasting days, weeks, or even the entire lifespan of an organism17. It is well accepted that long-term memory formation requires the synthesis of new proteins and mRNAs15; however, the specifics of the molecular mechanisms vital to this process, i.e., how particular gene cohorts and their protein products interact with each other to contribute to the formation and recollection of a memory, are still unknown. My research sought to answer the following questions: Which genes are expressed by individual neurons comprising the various components of the gill withdrawal reflex? Do those genes vary across different types of neurons (motor neurons, interneurons, etc.)? What is the essential complement of genes (ion channels, receptors, metabolic enzymes, homeostatic maintenance genes, non-coding RNA, etc.) required for a neuron to function? Is this a function of neuronal type? How do genes expressed by neurons differ from those expressed in other tissues and organ systems?

Next-generation Single Cell RNA Sequencing (scRNA-seq)

The fields of biological sciences and medicine have advanced into a cellular and molecular era where the most basic unit of life (the cell) and its constituents can be examined and manipulated. The advent of next-generation sequencing technologies and advances in molecular biology have enabled high-throughput interrogation of individual, identifiable cells. Recent improvements in RNA sequencing (RNA-seq) methodologies, bioinformatics techniques, and big data processing have facilitated the capture and quantitation of gene expression at single cell resolution, revealing extensive transcriptional information about individual cells. Single cell RNA-seq (scRNA-seq) has become the method of choice for answering key biological questions regarding cell heterogeneity and variability in rare and valuable biological samples such as early developing embryos, cancer cells, and neurons18. Furthermore, scRNA-seq can provide researchers with integral information about the characteristics of gene expression and transcriptional responses to various conditions19. This cutting-edge technique has the potential to revolutionize the fields of biological and clinical research and usher in an era of personalized genomic medicine. However, for all its potential advantages, scRNA-seq must be performed carefully and the results interpreted with appropriate vigilance. Due to the intrinsically low amount of starting material, scRNA-seq is limited by low capture efficiency and sensitivity to contaminating nucleic acids from outside the cell18. These issues are directly influenced by the methods and techniques used to isolate single cells from their biological environment. We encountered challenges when isolating individual neurons from Aplysia ganglia despite their large size and relative ease of isolation compared to mammalian neurons. While the somata of the neurons were often relatively easy to isolate via microdissection using blunted glass electrodes due to their location on the periphery of the central nervous system ganglia (see Tissue isolation in Methods), the neuronal processes emanating from the soma are entangled with those of other neurons and glial cells in a neuropil at the center of each ganglion20. It is evident from our single neuron transcriptome data that highly abundant secretory molecules and glial cell markers (such as the Aplysia glial (Ag) transcript21) are present in the transcriptomes but not actually transcribed by the individual isolated neurons. We are aware of this from the over 300 in situ hybridization mRNA localization experiments collectively performed by the Moroz laboratory group (mostly unpublished data, but see examples of genes such as sensorin22, FMRFamide23, pleurin and myoinhibitory peptide (MIP) neuropeptide precursors24, and nitric oxide synthase25).

These experiments indicate that abundant representation of certain neurosecretory and glial markers (Ag) within a single neuron transcriptome is most likely due to synaptic contamination from the cellular processes. We discovered an additional potential source of extracellular contamination: small cells may be lodged within the folds and grooves of the surface of the larger neurons and can be seen in electron micrographs (unpublished data). The total contribution of this extracellular material to the single neuron transcriptomes is unknown, but the stochastic effects can be mitigated through repeated sequencing of biological replicates and sensible data interpretation.

We performed deep single cell RNA-seq on 23 identifiable neurons and shallow coverage sequencing on an additional 96 single neurons isolated from the nervous system of Aplysia californica. We calibrated these transcriptomes with external RNA spike-ins to estimate, for the first time, the absolute abundance of endogenous RNA molecules expressed by identified neurons, detecting an average of 10,110 ± 2,660 transcripts per cell. These cells represent six classes of identified neurons, including cholinergic motor neurons controlling mucus release, motor neurons mediating the gill-withdrawal reflex, homologous serotonergic interneurons, and nociceptive sensory neurons that are an important site of synaptic plasticity related to sensitization learning26.
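
As an illustration of how spike-in calibration can convert relative expression into absolute counts, the following minimal sketch fits a straight line between log2 TPM and log2 known molecule numbers for the spike-ins and then applies that line to an endogenous transcript. It assumes an ordinary least-squares fit; the function names and all numeric values are hypothetical and are not taken from this study's data or its exact procedure.

```python
import math


def fit_log2_line(tpm_values, molecule_counts):
    """Least-squares fit of log2(molecules) = slope * log2(TPM) + intercept."""
    xs = [math.log2(t) for t in tpm_values]
    ys = [math.log2(m) for m in molecule_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_molecules(tpm, slope, intercept):
    """Predict an absolute molecule count from a TPM value using the fitted line."""
    return 2.0 ** (slope * math.log2(tpm) + intercept)


# Hypothetical spike-in measurements for one neuron (illustrative values only):
# observed TPM of each spike-in and the number of molecules added to the lysate.
spike_tpm = [1.0, 4.0, 16.0, 64.0]
spike_molecules = [8.0, 32.0, 128.0, 512.0]

slope, intercept = fit_log2_line(spike_tpm, spike_molecules)

# Estimated molecule count for a hypothetical endogenous transcript at 2 TPM.
endogenous_estimate = estimate_molecules(2.0, slope, intercept)
```

In this toy data set every spike-in has exactly eight molecules per TPM, so the fitted slope is 1 and the intercept is 3 on the log2 scale; real spike-in data would scatter around the line, and the quality of the fit should be checked before it is used for calibration.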

Biology of Identified Single Neurons

For our interneuron representatives we chose the serotonergic, homologous giant metacerebral cells (MCCs) located in the anterior of the cerebral ganglion of A. californica. Investigations into the neuronal basis of feeding mechanisms revealed that the presentation of food to A. californica induces arousal that may be mediated by the pair of bilaterally symmetric left and right MCC neurons27. Deletion of the MCCs in A. californica reduced the feeding rate of animals by about 40%27. These neurons employ serotonin (5-HT) as a neurotransmitter28,29, and homologs are found in several other molluscan species30,31. These neurons are considered homologous on the basis of multiple criteria, including their symmetrical axonal distribution, synaptic inputs and connections, electrophysiological properties, pharmacological responses, and plasticity30. The MCCs project both ipsilaterally and contralaterally into the feeding network and receive feedback from gut sensory neurons indirectly through the C1 neuron and directly from the lip as well as from the feeding network central pattern generator neurons located in the buccal ganglion30,32.

The ventrocaudal (VC) mechanoafferent sensory neurons consist of about 20 neurons that lie within a homogenous, compact cluster of approximately 200 cells ranging in size from about 40-80 µm on the ventrocaudal surface of each A. californica pleural ganglion33 (see Figure 1-1). The VC neurons innervate a large part of the body, including the tail, and are activated by pressure or electrical stimulation33. Their function as nociceptive tail mechanosensory neurons implicates the VC neurons in several behavioral and learning paradigms, including respiratory pumping and escape locomotion34,35, sensitization26, and classical conditioning36,37.

In the left caudal quarter (LCQ) on the dorsal surface of the abdominal ganglion lie two motor neurons38, L7 and L11 (see Figure 1-1). L7 projects its axon down the branchial nerve to control muscles of the gill and is a primary effector motor neuron responsible for gill withdrawal39. Siphon sensory neurons synapse directly onto L7, and long-term sensitization training produces changes in the structure and strength of these monosynaptic connections40. L11 is usually the largest cell in the LCQ and can range in diameter from 200-400 μm in animals weighing between 50-300 grams. It is pale and displays little pigmentation. L11 is spontaneously active, firing at a rate of 2-4 spikes per second, and its firing is inhibited by stimulation of the abdominal connective nerve38. Cell L11 is a motor neuron that projects axons into both the siphon and genital-pericardial nerves and out into the periphery of the animal38.

The giant R2 neuron is the largest and most heavily pigmented cell in the abdominal ganglion, located on the right dorsal side (Figure 1-1) of the ganglion. It is silent and is one of three cells in the abdominal ganglion (the others being L1 and R1) to send axons into the abdominal connectives38. R2 and its bilaterally homologous and cholinergic sister neuron (LPl1 – located in the left pleural ganglion, see Figure 1-1 lower panel) are two of the largest neurons known to exist41. In larger A. californica these neurons can grow to over 1000 µm in diameter, with at least one R2 neuron reaching a size of 1.1 millimeters42. R2 and LPl1 have overlapping and nearly symmetrical axonal projections that extend over most of the body wall of A. californica and, when traced, were found to contact subepidermal glands containing mucus43. It was confirmed that these giant neurons control mucus release from the animal body wall43. Additionally, R2 has been utilized for studies involving spike initiation44, axonal transport45, ionic components of action potentials46,47, acetylcholine metabolism28,48,49, and synaptic plasticity50. Due to their large size, the R2 and LPl1 neurons were also chosen as subjects for single neuron transcriptome microarray studies24 and for analyzing the gene expression and methylation changes that occur in individual neurons during aging42.

We chose this selection of neurons to ensure sampling from all the components (sensory, inter-, and motor neurons) of the A. californica gill withdrawal reflex as well as for their large size (except in the case of the VC neurons) and stereotypical anatomical location, which made isolation easier. We performed differential expression, functional enrichment, and unbiased classification analyses to generate comprehensive descriptions of the transcriptomic identity of these neurons. This study serves to elucidate the complement of similar and differential gene expression within individual, identified neurons as well as the heterogeneity and range of expression inherent to these cell types.

Aplysia Learning and Memory Circuits and cAMP Signaling

Following our investigation of cellular identity, we next aimed to characterize genetic mechanisms of cellular plasticity in the A. californica nervous system at the level of individual ganglia (see Figure 1-2 for a schematic diagram of the central Aplysia ganglia).

Aplysia: Model for a Reductionist Approach to Studying Learning and Memory

The cells of the A. californica central nervous system have been extensively studied for decades to gain insights into the molecular mechanisms of learning and memory14. Aplysia was selected for this task due to its large and individually identifiable neurons. These neurons were relatively easy to record from using electrophysiological techniques and this enabled researchers to delineate the neural circuits of simple reflexes that were modifiable by different forms of learning14. One type of fear-learning was a particular focus of early researchers – sensitization. It was determined that brief shocks to the tail of an Aplysia produced short-term facilitation of the defensive withdrawal reflex, while repeated training resulted in longer-term facilitation that could last for days or weeks15.


Molecular Biology of Short- and Long-term Memory

The neural circuits of the withdrawal reflex are partly constituted by direct monosynaptic connections from sensory to motor neurons, as well as by polysynaptic connections through a network of excitatory and inhibitory interneurons40. As a result of behavioral sensitization, stimulation applied to the siphon increases the degree of withdrawal of both the siphon and gill due to increased motor neuron activity. This is due to enhanced excitability in the sensory neurons (greater generation of action potentials in response to the same peripheral stimulus) and an increased excitatory postsynaptic potential (EPSP) in the motor neurons (known as short-term facilitation)51.

These effects are induced at least in part by release of the neurotransmitter serotonin (5-HT) by modulatory interneurons in the circuit; 5-HT binds to cellular receptors present on the sensory and motor neurons52-54. The molecular mechanisms of 5-HT were further explored in cell culture, where it was found that the effects of 5-HT were mediated by cyclic adenosine 3′,5′-monophosphate (cAMP) and a second-messenger-activated protein kinase, protein kinase A (PKA)55.

We sought to mimic the behavioral effects of sensitization using a soluble cellular activator of the cAMP signaling pathway, 8-bromo cAMP (8-Br cAMP), in a manner analogous to the chemical application of 5-HT in cell culture preparations. We chose 8-Br cAMP because it enabled us to activate the intracellular cAMP signaling pathways while bypassing the need to induce activation of cellular 5-HT receptors. We performed RNA-seq on isolated abdominal, cerebral, buccal, left and right pedal, and left and right pleural ganglia to determine their complements of specific transcripts both with and without chemically-induced activation of the cAMP pathway. This experiment enabled us to examine the time course of gene expression in cAMP-mediated cellular plasticity and how the expression profiles varied among the central ganglia of the nervous system in Aplysia californica.


[Figure 1-1 image. Panel A: abdominal ganglion, with labels for the bag cell clusters, the a-pl connectives, and the siphon, branchial, genital-pericardial, and gill nerves. Panel B: left pleural and pedal ganglia, with labels for the c-pl and pl-p connectives.]

Figure 1-1. Schematic diagram indicating the major components of the Aplysia gill withdrawal reflex. Panel A depicts the ventral side of the abdominal ganglion displaying the LE sensory, L7 motor, LFs motor, and L11 motor neurons and the L29 and L30 interneurons which comprise the circuit. Panel B illustrates one pair of the symmetrical pedal and pleural ganglia, which are the site of the VC sensory neurons involved in sensitization learning. a-pl = abdominal-pleural connectives, c-pl = cerebro-pleural connectives, pl-p = pedal-pleural nerve connective.


Figure 1-2. Schematic diagram of the ganglia comprising the Aplysia central nervous system. Abbreviations: AG – abdominal ganglion, BG – buccal ganglion, CG – cerebral ganglion, LPeG – left pedal ganglion, LPlG – left pleural ganglion, RPeG – right pedal ganglion, RPlG – right pleural ganglion.


CHAPTER 2 METHODS

This chapter will describe the methodology used to generate the biological data we present and how we bioinformatically quantified it.

Molecular Biology for Construction of RNA-sequencing Libraries

The following sections describe the protocols we used for animal and tissue dissections. This is followed by a description of how we performed the 8-Br cAMP treatments on Aplysia ganglia. Next we discuss the molecular biology techniques we used to create RNA-sequencing libraries.

Animals and Cell/Tissue Isolation

Aplysia californica weighing 40-200 grams were obtained from the National Resource for Aplysia at the University of Miami and maintained in running saltwater aquaria at 15-17°C with ad libitum feeding access to red macroalgae (also obtained from the National Resource for Aplysia). Animals were anesthetized by injection of 50% (volume/body weight) isotonic MgCl2 (337 mM) before dissection.

Aplysia were pinned ventral (foot) side up to a Sylgard (Dow Corning, Midland, MI) coated tray and the body cavity was opened with a longitudinal incision from anterior to posterior. The digestive tract was pinned to one side to better expose the central nervous system (CNS). The CNS was removed and rinsed in chilled artificial sea water (ASW: 460 mM NaCl, 10 mM KCl, 55 mM MgCl2, 11 mM CaCl2, 10 mM HEPES (pH 7.8)). Next, to remove the connective tissue, the CNS was transferred to a 1.5 mL Eppendorf polypropylene microcentrifuge tube (Thermo Fisher Scientific, Cat #: 13-698-791) filled with a 1% solution of Protease Type XIV (Sigma-Aldrich, Cat #: P5147). The tube containing the CNS was incubated in a water bath at 34°C for approximately 30-60 minutes (depending on the size of the animal; larger animals generally have more connective tissue surrounding the CNS) to partially digest the connective tissue of the neuronal sheath. The CNS was then pinned to a Sylgard dish in ASW and the neurons exposed by mechanical removal of the overlying sheath with fine forceps and scissors. Individual neurons were identified visually with a stereo dissecting microscope (Olympus SZX9, Olympus America Inc., Melville, NY) and gently isolated from the CNS using glass microelectrodes made from 1.2 mm thin-walled borosilicate glass capillaries (WPI, Sarasota, FL) on a P-2000 microelectrode puller (Sutter Instrument Co., Novato, CA). Isolated neurons were then transferred via micropipette to a 0.5 mL Eppendorf DNA LoBind microcentrifuge tube (Thermo Fisher Scientific, Cat #: 13-698-790) containing 5 µL of 1X First-strand buffer (375 mM KCl, 15 mM MgCl2, 250 mM Tris-HCl (pH 8.3)) from the SMART-Seq v4 Ultra Low Input RNA kit for Sequencing (Clontech Laboratories Inc., Cat #: 634891).

Treatment of CNS Ganglia with 8-Br cAMP

For the 8-Br cAMP experiments, whole ganglia were isolated from the CNS and treated to induce activation of the cAMP-dependent pathway. Ganglia still connected to each other were removed from the animals by cutting the nerves connecting them to the body. They were then transferred to a small petri dish filled with filtered sea water (FSW), and the individual ganglia were sequentially severed from the others and placed into 1.5 mL Eppendorf tubes filled with either 1 mL of FSW (control ganglia) or 200 µM 8-Br cAMP in 1 mL FSW and incubated for 30 minutes, 1 hour, or 2 hours. During the incubation, tubes were placed in a cold-water bath maintained at 15-17 °C to minimize heat stress. After incubation, ganglia were removed from the tubes and placed into new 1.5 mL Eppendorf tubes containing 350 µL RLT buffer for cell lysis. Ganglia were briefly sonicated to disrupt cellular membranes and free RNA for extraction. After lysis, tubes were frozen at -20 °C until RNA was isolated using the Qiagen RNeasy Micro Kit (Cat #: 74004) per the manufacturer’s instructions.

RNA Extraction

For individual neurons that received spike-ins (see the External RNA Spike-ins section below for more details), total RNA was first extracted from neurons using the Qiagen RNeasy Micro Kit (Cat #: 74004) per the manufacturer’s instructions. Dr. Moroz isolated all individually identified neurons used in this study. Contaminating genomic DNA was removed with the “on-column” Qiagen RNase-Free DNase Set (Cat #: 79254). Samples were eluted in 14 µL of Ambion nuclease-free water (Cat #: AM9937) and stored at -80°C. RNA quality and concentration were determined using an Agilent 2200 TapeStation System (Part #: G2964AA) and High Sensitivity R6K ScreenTape (Cat #: 5067-5369) with High Sensitivity R6K Reagents (Cat #: 5067-5370) per the manufacturer’s instructions.

cDNA Synthesis and Verification

Messenger RNA (mRNA) from single neurons and/or ganglia was converted into complementary DNA (cDNA) libraries using the SMART-Seq v4 Ultra Low Input RNA Kit for Sequencing protocol. This process is performed by a modified MMLV-RT enzyme with terminal transferase activity in conjunction with a “template-switch” primer, resulting in cDNA synthesized directly from a full-length mRNA template. All reactions were initiated with 5 µL of RNA at a concentration of 0.2 ng/µL (for a total of 1 ng of RNA per reaction). Beginning the cDNA synthesis with the same starting amount of RNA allows meaningful abundance comparisons to be made across samples. For samples with ERCC spike-in controls (samples listed in Appendix), see the following section.


After reverse transcription, the resulting cDNA was amplified by polymerase chain reaction (PCR). Library amplification was verified by running 5 µL of each sample on a 2% E-Gel (ThermoFisher Cat. #: G501802) before proceeding with the remainder of the protocol.

External RNA Spike-ins

We added a set of 92 external RNA standards to the purified endogenous cellular RNAs isolated from single neurons for quality control purposes and to derive estimates of the absolute number of RNA molecules detected in identifiable individual neurons. The spike-in control mixes were purchased from ThermoFisher (ERCC ExFold RNA Spike-In Mix, Cat # 4456739) and were designed by the External RNA Controls Consortium (ERCC) with oversight from the National Institute of Standards and Technology (NIST)56. The pre-formulated set of 92 polyadenylated transcripts range from 250-2000 nt in length, span an approximately 10^6-fold concentration range, and are designed to mimic naturally occurring eukaryotic mRNAs. We added 1 µL of a 1:40,000 dilution of ERCC stock mix 1 to each sample of RNA (concentration of 1 ng/µL) prior to library construction using the Clontech SMART-Seq v4 Ultra Low Input RNA kit (Cat # 634889, Clontech Takara Bio Inc). We chose this spike-in volume to avoid overwhelming the endogenous RNA populations of the cells (of which mRNA is typically 1-5% of the total RNA). The spike-ins were added before poly(A) selection to control for variation throughout the entire library processing workflow.

Sequencing Library Preparation

After cDNA amplification was confirmed, each sample was cleaned using a 1X volume of Agencourt AMPure XP magnetic beads (Beckman Coulter, Product # A63882) and a magnetic plate (Beckman Coulter, Product # A32782). Samples were next transferred into a Covaris Screw Cap microtube (Covaris SKU: 520096) for sonication using the M220 Focused-ultrasonicator™ (Covaris SKU: 500 5) and sheared to approximately 400 base pairs (bp) using the Covaris SonoLab 7 software.

After sonication, we proceeded with library construction using the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina ( # 7 0 / ). Briefly, libraries underwent end repair, adaptor ligation (using NEBNext Multiplex Oligos for Illumina Set E7600S for dual indexing, or E7500S or E7335S for single indexing), SPRIselect size selection (Beckman Coulter, Product # B23317) (using recommended conditions for 300-400 bp approximate insert size), amplification, and final cleanup.

Library Quality Control and Validation

Completed sequencing libraries were assayed for quality and quantity using multiple methods. First, samples were run again on a 2% E-Gel (ThermoFisher Cat. #: G501802) for visual confirmation and size determination. Samples were also characterized on the 2200 TapeStation System (G2964AA) using High Sensitivity D1K ScreenTape (5067-5363) with High Sensitivity D1K Reagents (Ladder and Sample Buffer) (5067-5364). Finally, samples were quantified using the Qubit Fluorometric Quantitation system 2.0 (Invitrogen Catalog #: Q32866) and the Qubit dsDNA BR Assay Kit (Invitrogen Catalog #: Q32850). Equimolar samples of final libraries were pooled with other multiplexed samples, and the pool was sequenced using Illumina NextSeq500 or Illumina HiSeq3000 instruments. Sequencing was performed at the University of Florida ICBR Next-Gen Sequencing Core in Gainesville, FL (http://www.biotech.ufl.edu/).

RNA-Sequencing Quality Control, Read Mapping, and Exploratory Data Analysis

RNA-sequencing (RNA-seq) datasets were first processed to remove library adapter sequences and low quality bases using the program Trimmomatic (v0.32)57. Trimmomatic quality filtering uses a sliding window approach to trim poor-quality bases: it scans each read from the 5′ end and removes the remainder of the read once the average Phred quality score within the window falls below 20 (corresponding to a base call accuracy below 99%). After trimming, reads shorter than 35 nucleotides (nt) were excluded from further analysis. Trimmomatic parameters for the library adapter removal, sliding window trimming, and removal of reads less than 35 nt in length were as follows:

ILLUMINACLIP::2:30:10; SLIDINGWINDOW:8:20;

MINLEN:35
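The sliding-window and minimum-length steps can be illustrated with a minimal Python sketch. It is a simplified stand-in for Trimmomatic's behavior, not its implementation: the function names are hypothetical, quality scores are assumed to be already decoded to integer Phred values, and the read is cut at the start of the first window whose mean quality falls below the threshold.

```python
def sliding_window_keep_length(quals, window=8, min_avg=20):
    """Return how many 5' bases to keep under a simplified sliding-window rule.

    quals: list of integer Phred scores, 5' to 3'.
    The read is cut at the start of the first window whose mean quality
    is below min_avg. Reads shorter than one window are judged as a single
    window (a simplifying assumption of this sketch).
    """
    n = len(quals)
    if n == 0:
        return 0
    w = min(window, n)
    for start in range(0, n - w + 1):
        if sum(quals[start:start + w]) / w < min_avg:
            return start
    return n


def trim_read(seq, quals, window=8, min_avg=20, min_len=35):
    """Trim a read and drop it (return None) if it ends up shorter than min_len."""
    keep = sliding_window_keep_length(quals, window, min_avg)
    if keep < min_len:
        return None
    return seq[:keep], quals[:keep]


# A 50-nt read whose last 10 bases are low quality is shortened but kept;
# a read that degrades after 30 bases falls below the 35-nt minimum.
good = trim_read("A" * 50, [30] * 40 + [10] * 10)
short = trim_read("A" * 50, [30] * 30 + [10] * 20)
```

The defaults mirror the SLIDINGWINDOW:8:20 and MINLEN:35 settings listed above; adapter clipping is not modeled.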

The remaining reads were mapped to the Aplysia californica AplCal3.0 genome assembly (Accession number: GCF_000002075.2) using the RNA-seq alignment and mapping software STAR (v2.3)58. Datasets that failed to uniquely map 60% or more of their reads to the AplCal3.0 reference genome assembly using STAR were discarded from further analysis. STAR mapping was performed using default parameters. For detailed explanations of STAR parameter options, see the STAR manual.
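The 60% unique-mapping cutoff can be applied programmatically. The sketch below assumes that the mapping summary written by STAR (commonly named Log.final.out) contains a line of the form "Uniquely mapped reads % | 72.50%"; the function names are hypothetical, and the exact report layout should be checked against the STAR version in use.

```python
def uniquely_mapped_percent(log_text):
    """Extract the uniquely mapped read percentage from a STAR summary report.

    Assumes a line such as 'Uniquely mapped reads % |  72.50%'.
    """
    for line in log_text.splitlines():
        if "Uniquely mapped reads %" in line:
            value = line.split("|", 1)[1].strip().rstrip("%")
            return float(value)
    raise ValueError("no 'Uniquely mapped reads %' line found")


def passes_mapping_filter(log_text, min_percent=60.0):
    """Keep a dataset only if at least min_percent of reads mapped uniquely."""
    return uniquely_mapped_percent(log_text) >= min_percent


# Illustrative report fragments (made-up numbers, not results from this study).
example_pass = "   Number of input reads |\t1000000\n   Uniquely mapped reads % |\t72.50%\n"
example_fail = "   Uniquely mapped reads % |\t41.03%\n"
```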

Uniquely mapping reads were summarized and quantified using the program featureCounts, which is part of the Rsubread software package (v1.22.3)59. We used the default parameters of featureCounts to summarize the reads, except as follows: isPairedEnd = TRUE; maxFragLength = 800; allowMultiOverlap = FALSE; countMultiMappingReads = FALSE

Read Summarization

The summarized reads were then provided to the software package edgeR (v3.14.0)60 in order to compute the fragments per kilobase per million mapped reads (FPKMs)61 and transcripts per million (TPMs)62. FPKMs are the paired-end sequencing equivalent of reads per kilobase per million (RPKMs)63. The rpkm function in the edgeR package was used to compute RPKMs/FPKMs. TPMs were calculated as described in Wagner and colleagues62 by multiplying the FPKM value for each gene by a scaling factor of 1,000,000 divided by the sum of the FPKM values for all genes in the transcriptome. Differential gene expression testing was performed using edgeR and DESeq264 (v1.11.23). The packages Rsubread, edgeR, and DESeq2 are available as part of the open-source Bioconductor software project65, which is based on the R statistical programming language66.
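The FPKM-to-TPM conversion described above can be written out as a short Python sketch. This is an illustration only: the analysis itself used the edgeR rpkm function in R, the function names here are hypothetical, and the library size is taken as the plain sum of counts without any normalization factors.

```python
def fpkm(counts, lengths):
    """Fragments per kilobase per million mapped fragments.

    counts:  dict gene -> fragment count
    lengths: dict gene -> gene length in nucleotides
    Library size is the simple sum of counts (an assumption of this sketch).
    """
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths[g] * total) for g in counts}


def tpm_from_fpkm(fpkm_values):
    """Scale each FPKM by 1,000,000 / (sum of all FPKMs)."""
    scale = 1e6 / sum(fpkm_values.values())
    return {g: v * scale for g, v in fpkm_values.items()}


# Toy example with three hypothetical genes.
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths = {"geneA": 1000, "geneB": 1500, "geneC": 3000}
fpkm_values = fpkm(counts, lengths)
tpm_values = tpm_from_fpkm(fpkm_values)
```

By construction the TPM values of a library sum to one million, which is what makes them comparable across libraries of different depth.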

Data Visualization

We utilized a variety of programs to create the figures and graphs depicted in this manuscript. The majority of the graphs were created using the R statistical computing language66 and the packages described in this Methods section. The primary package used for generating charts and data graphics was the ggplot2 package62,67. Additional charts were constructed using Microsoft Excel 365 (version 1906). Schematic illustrations were prepared using Adobe Illustrator CS6™.

Principal Component Analysis

We performed exploratory data analysis to visualize sample-to-sample differences among single neurons and ganglia. Using an ordination technique known as principal component analysis (PCA), we examined the similarity of the transcriptomes.

We applied a variance stabilizing transformation to the count data (using the varianceStabilizingTransformation function) and then used the plotPCA function (both from the DESeq2 package) to compute the PCA, which was plotted using the ggplot2 package. Raw count data were obtained using the Rsubread package function featureCounts59.


Single neurons were sequenced using a combination of Illumina HiSeq3000 and NextSeq500 instruments. The L7, L11, R24, R25, and R26 neurons were sequenced with the HiSeq3000 device. Principal component analyses were conducted using the plotPCA function included in the DESeq264 package, as well as the prcomp function from the R base package stats.

Saturation Curves

Saturation curves were constructed by randomly sampling increasing fractions of reads (10%, 20%, 30%, etc.) from a transcriptome and quantifying the number of genes detected in each fraction. The random sampling was performed using the program seqtk (https://github.com/lh3/seqtk). After quantification, the number of transcripts detected in each fraction was plotted against the number of reads sampled to generate the curve for that respective neuron.
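The logic of the saturation analysis can be sketched in Python. This is a simplified illustration rather than the pipeline used here: the study subsampled raw reads with seqtk and then re-quantified each subset, whereas the hypothetical function below subsamples a list of per-read gene assignments directly.

```python
import random


def saturation_curve(read_genes, fractions=None, seed=0):
    """Return (reads_sampled, genes_detected) for increasing read fractions.

    read_genes: list with one gene identifier per mapped read.
    fractions:  fractions of the library to sample (default 10%, 20%, ..., 100%).
    Each fraction is drawn independently, without replacement.
    """
    if fractions is None:
        fractions = [i / 10 for i in range(1, 11)]
    rng = random.Random(seed)
    points = []
    for fraction in fractions:
        k = round(fraction * len(read_genes))
        sample = rng.sample(read_genes, k)
        points.append((k, len(set(sample))))
    return points


# Toy library: 100 reads spread unevenly over three hypothetical genes.
toy_reads = ["g1"] * 50 + ["g2"] * 30 + ["g3"] * 20
curve = saturation_curve(toy_reads)
```

Plotting genes detected against reads sampled shows whether detection plateaus, which indicates that additional sequencing depth would recover few new genes.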

Linear Regression Curve Fitting

Linear regression curves were plotted to determine the relationship between TPMs and the absolute number of molecules within a single neuron. See also the Absolute scRNA-seq Transcript Abundance section below.

Differential Expression Analysis

We compared normalized and quality filtered gene expression profiles amongst the classes of individual neurons to determine which transcripts were significantly differentially expressed. We employed the statistical models implemented in the R packages DESeq264 and edgeR60. DESeq2 and edgeR are recognized as being among the best performing software available for the analysis of differential expression in RNA-seq experiments68. Both edgeR and DESeq2 model read counts using a negative binomial distribution (also known as the gamma-Poisson distribution) and both use an empirical Bayes strategy to shrink the estimated gene dispersion. The programs differ in the specific way each estimates the gene-wise dispersion64,69. Transcripts were defined as significantly differentially expressed if they exhibited greater than a two-fold log2 change with a Benjamini and Hochberg1 adjusted false discovery rate of less than 0.1.
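The final filtering step can be sketched in Python. The Benjamini-Hochberg adjustment below is the standard step-up procedure; the fold-change cutoff is exposed as a parameter because this sketch assumes one reading of the threshold stated above (an absolute log2 fold change greater than 1, i.e. a two-fold change). The function names are hypothetical, and in practice the adjusted values come directly from DESeq2 and edgeR.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def call_significant(log2_fold_changes, pvalues, min_abs_log2fc=1.0, max_fdr=0.1):
    """Flag transcripts passing both the fold-change and FDR cutoffs.

    min_abs_log2fc=1.0 is an assumed reading of the cutoff (two-fold change).
    """
    padj = benjamini_hochberg(pvalues)
    return [abs(lfc) > min_abs_log2fc and q < max_fdr
            for lfc, q in zip(log2_fold_changes, padj)]


# Toy values for four hypothetical transcripts.
example_padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
example_calls = call_significant([2.5, 0.5, -1.8, 3.0], [0.01, 0.04, 0.03, 0.5])
```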

Calculating Relative scRNA-seq Transcript Abundance

We first quantified RNA expression using the abundance metric transcripts per million (TPM), from Wagner et al.62 as

\[ TPM_i = \frac{X_i}{l_i} \cdot \frac{1}{\sum_{g} \frac{X_g}{l_g}} \cdot 10^6 \qquad \text{(2-1)} \]

The first term on the right-hand side of Eq (2-1) is the number of mapped reads Xi for gene i, divided by the length of the gene li. The denominator of the second term is the sum, over every gene g (g = 1, 2, …, G), of its mapped reads Xg divided by its length lg. The third term is a scaling factor that avoids dealing with fractional numbers. The goal of quantitative gene expression measurements is to directly estimate the concentration of different RNA molecules within a cell. Measuring mRNA abundance also requires information about the number of cells sampled and their volume. In most RNA-seq experiments there is little or no information available regarding these variables62. Therefore, the transcripts per million (TPM) metric was devised as a consistent measurement that is proportional to the relative molar concentration of each mRNA species.

Absolute scRNA-seq Transcript Abundance

One of the chief advantages of studying the large neurons of the Aplysia CNS is that we have information about cellular number and RNA volume. Using this information along with the relative molar concentration of each mRNA species, we were able to derive estimates of the absolute abundance of individual mRNAs in identifiable cells using calibrated external RNA standards of known abundance as follows (see Table 2-1 for descriptions of symbols used).

The absolute number of molecules for each transcript y_{i,j} is given by

\[ \log_2(TPM_{i,j}) = m \, \log_2(y_{i,j}) + b \qquad \text{(2-2)} \]

Solving Eq (2-2) for \log_2(y_{i,j}) we have

\[ \frac{\log_2(TPM_{i,j}) - b}{m} = \log_2(y_{i,j}) \qquad \text{(2-3)} \]

Rewriting Eq (2-3) in exponential form and multiplying by Avogadro’s constant (N_A), the spike-in scaling factor (C), and the volume of the RNA used to construct library j divided by the dilution factor (1 to 40,000) of the spike-ins (V),

\[ y_{i,j} = 2^{\frac{\log_2(TPM_{i,j}) - b}{m}} \, (C)(N_A)(V) \qquad \text{(2-4)} \]

Eq (2-4) was applied to each transcript i using the fitted regression model derived from the spike-ins detected in library j to calculate the absolute number of RNA molecules.
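Eqs (2-2) through (2-4) can be traced with a small Python sketch. It is illustrative only and makes several assumptions: the regression is an ordinary least-squares fit of log2(TPM) on log2(spike-in concentration), spike-in concentrations are expressed in attomoles per microliter of the stock mix, the function names are hypothetical, and Avogadro's constant is taken as 6.02214076 × 10^23 (Table 2-1 lists the approximation used in this work).

```python
import math

AVOGADRO = 6.02214076e23   # molecules per mole (N_A)
ATTOMOLE = 1e-18           # moles per attomole (C)


def fit_log_log(spike_concentrations, spike_tpms):
    """Least-squares fit of log2(TPM) = m * log2(concentration) + b (Eq 2-2).

    Only spike-ins detected with TPM > 0 are used.
    """
    pairs = [(math.log2(c), math.log2(t))
             for c, t in zip(spike_concentrations, spike_tpms)
             if c > 0 and t > 0]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def molecules_from_tpm(tpm, m, b, rna_volume_ul=1.0, dilution_factor=40000):
    """Apply Eq (2-4): invert the fit, then scale by C, N_A, and V."""
    v = rna_volume_ul / dilution_factor                # V in Table 2-1
    concentration = 2 ** ((math.log2(tpm) - b) / m)    # Eq (2-3), exponentiated
    return concentration * ATTOMOLE * AVOGADRO * v


# Synthetic spike-ins that follow log2(TPM) = 1 * log2(conc) + 3 exactly.
conc = [1.0, 4.0, 16.0, 64.0]
tpms = [8.0, 32.0, 128.0, 512.0]
m, b = fit_log_log(conc, tpms)
n_molecules = molecules_from_tpm(8.0, m, b)
```

In this toy case a transcript at 8 TPM maps back to a concentration of 1 attomole per microliter, which corresponds to roughly 15 molecules after the 1:40,000 dilution.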

Localization of Cellular mRNA

To visualize the expression of mRNA transcripts in specific cells, we used a slightly modified version of the in-situ hybridization protocol described in Jezzini et al.23

Molecular Cloning

Short (20-25 nt) oligonucleotide primer sequences for PCR were designed to flank transcribed regions (usually within the coding sequence (CDS)) in genes of interest. Primers were designed using the web-based tool Primer3 based on criteria such as an approximately 50% G/C content, a melting temperature (Tm) near 60 ºC, and a length of about 20-25 nt. Transcripts were cloned from cDNA libraries (created from A. californica CNS) using a high-fidelity TaKaRa LA Taq® DNA polymerase (Takara Cat #: RR002A) for PCR. PCR products were purified using the MinElute PCR Purification Kit (Qiagen Cat #: 28004), ligated into a vector plasmid using the TOPO™ TA Cloning™ Kit for Sequencing, and transformed into One Shot™ TOP10 Chemically Competent E. coli bacteria (ThermoFisher Cat #: K4575J10) for amplification. Plasmids were extracted and purified with the QIAprep Spin Miniprep Kit (Qiagen Cat #: 27104). Insert DNA sequences were confirmed by commercial sequencing at Macrogen Inc (USA).

Synthesis of Antisense Probes Labelled with Digoxigenin

The Roche DIG labeling Kit (Sigma Cat #: 11277073910) was used to generate digoxigenin (DIG)-labeled probes. Depending on the direction (5′-3′ or 3′-5′) of the DNA insert into the vector during molecular cloning, antisense probes were generated either by linearizing the plasmid with the NotI-HF restriction enzyme (NEB Cat #: R3189S), followed by RNA synthesis with a Roche T3 RNA polymerase (Sigma Cat #: 11031163001), or by linearizing the plasmid with the PmeI restriction enzyme (NEB Cat #: R0560S), followed by RNA synthesis with a T7 RNA polymerase (Sigma Cat #: 10881767001). DIG-labeled antisense RNA probes were cleaned up using the RNeasy MinElute Cleanup Kit (Qiagen Cat #: 74204).

In-Situ Hybridization of Antisense Probes with Cellular mRNA

After dissection, the CNS was treated with Protease Type XIV (Sigma-Aldrich, Cat #: P5147) and fixed with 4% paraformaldehyde (PFA) in phosphate-buffered saline (PBS) at 4°C for three to six hours before being desheathed with fine scissors and forceps. Once connective tissue was removed, the CNS was dehydrated by sequential incubation in 25%, 50%, and 75% methanol in 0.1% Tween 20 in PBS (PTW) solutions, followed by a 5-minute incubation in 100% methanol. The CNS was then rehydrated by following the dehydration procedure in the reversed order. The CNS was next treated with 10 µg/mL proteinase K in PTW for 1 hour at room temperature and fixed with 4% PFA in PBS for 20 minutes at 4°C before washing twice in PTW. The system was put into 0.1 M triethanolamine hydrochloride (TEA HCl), pH 8.0, for 5 minutes before 2.5 µL/mL acetic anhydride was added to the solution and incubated for 5 more minutes (two times). After three washes with PTW, the system was placed in hybridization buffer (50% formamide, 5 mM EDTA, 5X saline sodium citrate (SSC) buffer (1X SSC = 150 mM NaCl, 15 mM sodium citrate; pH 7.0), 1X Denhardt solution (0.02% ficoll, 0.02% polyvinylpyrrolidone, 0.02% bovine serum albumin (BSA)), 0.1% Tween 20, and 0.5 mg/mL yeast tRNA) at 50°C for 6-8 hours.

The antisense probe was then added and incubated at 50°C for 12 hours, followed by washes in 50% formamide/5X SSC/1% sodium dodecyl sulfate (SDS) at 60°C, then 50% formamide/2X SSC/1% SDS at 60°C, and then 0.2X SSC at 55°C two times for 30 minutes each. Systems were then transferred to PBT (0.1% Triton-X 100 and 2 mg/mL BSA in PBS) for three washes. Goat serum was added to the third PBT wash at a concentration of 10% by volume and the CNS was incubated at 4°C for 60–90 minutes with gentle rocking. Next the system was placed in 1% goat serum in PBT with a 1:1500 dilution of alkaline phosphatase-conjugated anti-DIG antibodies (Roche Cat #: 11093274910) overnight (12-16 hours) at 4°C with gentle shaking.

After overnight incubation the CNS was washed in PBT three times for 20 minutes each at 4°C, followed by three 5-minute incubations in detection buffer (100 mM NaCl, 50 mM MgCl2, 0.1% Tween 20, 1 mM levamisole, 100 mM Tris-HCl, pH 9.5) at 4°C. After this final series of washes, the CNS was incubated in 20 µL NBT/BCIP in 1 mL of detection buffer to initiate development of the colored precipitate. The system was kept on ice in the dark for 30 to 90 minutes and checked periodically to monitor the reaction. Upon completion of development, the reaction was stopped by transferring the system to 4% PFA in methanol for 60 minutes at 4°C, followed by two final 10-minute washes in 100% ethanol at 4°C.


Table 2-1. Symbols and definitions used for absolute RNA quantification

Symbol     Definition
V          Volume of RNA used to construct library in microliters (µL) divided by the spike-in dilution factor (40,000)
N_A        Avogadro’s constant ≈ 6.023 × 10^23 molecules/mol
m          Slope of log-log linear regression model
b          Intercept of log-log linear regression model
C          Attomole scaling factor for spike-in concentrations (1 × 10^-18)
y_i,j      Number of molecules of transcript i in library j with RNA spike-ins
TPM_i,j    Normalized expression value of transcript i in transcripts per million (TPM) for library j


CHAPTER 3 ABSOLUTE QUANTIFICATION OF MESSENGER RNAS FROM IDENTIFIABLE SINGLE NEURONS IN APLYSIA

Overview of Single-Cell RNA Sequencing

Single-cell RNA sequencing (scRNA-seq) is an emerging genetic technique which enables the transcriptional querying and quantification of RNA molecules present in individual cells. It has evolved from the broader field of “bulk” sequencing (RNA-seq or transcriptomics), which is the study or profiling of RNAs originating from multiple cellular sources such as organs or tissues. scRNA-seq provides unparalleled resolution in the study of cellular heterogeneity compared to traditional bulk tissue RNA-seq70. Although this has broadened our basic understanding of the cell as a functional unit71, it also presents additional challenges that must be met to ensure the accurate and unbiased quantification of RNA expression levels. Issues such as noise, technical errors, and biases can have a comparatively greater effect on scRNA-seq measurements, which necessitates the careful analysis and normalization of scRNA-seq data before making biological conclusions72.

The innovations in the fields of molecular biology and next-generation sequencing (NGS) have been paralleled by the recent rise of computational biology techniques and ideas devised specifically to enable the analysis and normalization of scRNA-seq data72-75. Many of these methods were developed to directly overcome the challenges presented by scRNA-seq datasets, including high levels of technical noise, perhaps best exemplified by the large numbers of gene dropouts, which are transcript counts of zero (or near-zero). Gene dropouts are generally considered to be due to the low amount of input RNA used to construct cDNA sequencing libraries for scRNA-seq experiments76. The RNA capture efficiency can also vary from cell to cell, which hinders accurate comparison and normalization of counts between cells70.

Mammalian Single Cell RNA-seq Studies

The majority of scRNA-seq studies performed to date on mammalian nervous systems have used rodents as their experimental organism. The mouse (Mus musculus) and its rodent cousin, the rat (Rattus norvegicus), are the primary mammalian model organisms used in neuroscience research (as well as other important biomedical fields) today and have held this distinction for over a century77. Usoskin and colleagues78 conducted RNA-seq experiments on 799 single cells from the mouse lumbar dorsal root ganglion (DRG). They generated 2.76 billion reads, of which 1.3 billion (47.1%) were properly barcoded and unambiguously assigned to reference sequence (RefSeq) gene models. They reported an average of approximately 1.14 million reads mapping to 3,574 ± 2,010 distinct genes in each individual cell. Mouse genomes (as well as human genomes) are thought to contain about 20,000 total genes79, which would indicate that the average cell in this study expressed roughly 8-28% of the genes in the mouse genome (3,574 ± 2,010 of about 20,000).

The somatosensory S1 cortex and hippocampus CA1 regions of the murine brain were also subjected to scRNA-seq analysis80. Based on 3,005 single cell transcriptomes, the authors identified 47 distinct cellular subclasses. The group reported detection of approximately 5,000 to 35,000 polyA+ RNA molecules and about 1,800 to 6,000 total genes per cell. Tasic et al.81 investigated cellular diversity in the mouse neocortex, analyzing 23,822 cells from the primary visual cortex and the anterior lateral motor cortex. They defined 133 transcriptomic types, sequenced to a median depth of 2.54 million reads and a median of 9,462 genes detected per cell. They found nearly all the GABAergic neuron types to be present in both the visual cortex and the lateral motor cortex, whereas the majority of glutamatergic neuron types were found in only one of the two areas. Recently, fetal human cortical tissue was surveyed and analyzed via scRNA-seq82. The authors collected 2,394 single cells, 2,309 of which passed quality control, and detected on average 1.1 million mapped reads and 2,654 genes per cell.

These studies have revealed molecularly distinct cell populations and have focused on the unbiased molecular classification of cellular subtypes and cell stages in heterogeneous cell populations. Although significant headway has been made, gaining a truly comprehensive understanding of the cellular basis of mammalian neural circuitry and synaptic interconnections remains a significant challenge, despite coordinated, multidisciplinary scientific efforts like the Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative and other National Institutes of Health (NIH) projects83. One of the BRAIN Initiative’s major goals is to classify the various subtypes of mammalian neurons and improve our understanding of their roles in processing and integrating information84. Historically, neurons have been catalogued based on identified morphological, electrophysiological, genetic, transcriptomic, and molecular properties, or a combination thereof. As centuries of biological study have demonstrated, it is often more insightful and ultimately fruitful to address fundamental biological questions by utilizing “simpler” and more tractable model systems. In invertebrate nervous systems, individual neurons have been uniquely identified that share similar properties from organism to organism85. In particular, gastropod molluscs have proven to be vital research subjects for the study of neuronal properties and circuitry due to their large and reliably identifiable neurons, which enabled neuroscientists in the 1950s to make stable intracellular recordings86. Subsequently, neural circuits responsible for directing and modifying simple reflex behaviors were discovered. This work eventually led to the delineation of cellular mechanisms of learning and behavioral modification in the nervous system of the sea hare Aplysia californica, for which Kandel was awarded the 2000 Nobel Prize in Physiology or Medicine15. Aplysia californica possesses several advantages for the study of the cellular basis of behavior: 1) behaviors are produced by only a small number of large and easily identifiable neurons, 2) the identified nerve cells form precise connections that are detectable electrophysiologically, and 3) these neurons are uniquely identifiable in every animal of the species, and homologous neurons can be identified even in distantly related species.

Characteristics of the Aplysia californica Genome

Given these advantages, we exploited the enormous quantity of RNA yielded by the gigantic neurons of the gastropod mollusk Aplysia californica to generate quantitative transcriptome data relative to calibrated external standards of known concentration. This allowed us to estimate the absolute quantities of RNA molecules present in the transcriptomes of identifiable, individual neurons as well as compare differentially expressed and functionally enriched transcripts across cholinergic, serotonergic, motor, and interneurons found in the A. californica central nervous system (CNS). The serotonergic interneurons we examined were the left and right giant metacerebral cells (MCCs), part of the cerebral central ring ganglion. The MCCs extrinsically modulate the central pattern generator (CPG) of the feeding network in Aplysia californica87. We also analyzed the transcriptomes of nine identified cholinergic motor neurons: six R2 and three LPl1 neurons. Finally, we investigated motor neurons known to be involved in the A. californica defensive withdrawal reflex: the L7 gill withdrawal motor neuron and L11.

The Aplysia californica genome assembly (Genbank designation: AplCal3.0; Accession Number: GCA_000002075.2) currently exists in a “scaffold-level” state of assembly. This indicates that some of the sequencing contigs (which are continuous sequences of DNA resulting from the reassembly of smaller DNA fragments (reads) obtained from DNA sequencing instruments) have been connected across gaps into scaffolds. The 4,332 scaffolds have not yet been further assembled into chromosomes.

The A. californica genome is thought to be composed of 17 haploid chromosomes and is 927,310,431 bp (~927 megabases (Mb)) in length88,89. The AplCal3.0 assembly contains 164,545 contigs with an N50 = 9,586 bp and L50 = 16,681. N50 is defined as the largest length L such that 50% of all nucleotides in the genome are contained in contigs of size at least L90. L50 is the rank of the contig (when ordered from largest to smallest) that corresponds to the N50 length. The AplCal3.0 assembly serves as the reference genome for the analyses performed in this manuscript. It describes 28,999 gene models, a term that we define as any computational prediction, mRNA sequencing, or genetic characterization that results in the description of a gene product (The Arabidopsis Information Resource (TAIR), https://www.arabidopsis.org/help/helppages/generesu.jsp, on www.arabidopsis.org, Apr 4, 2019).
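The N50 and L50 definitions given above can be made concrete with a short calculation. The following is a minimal sketch rather than the tool used to produce the AplCal3.0 statistics; the function name n50_l50 and the toy contig lengths are hypothetical and chosen only for illustration.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in base pairs."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    ordered = sorted(contig_lengths, reverse=True)  # largest contig first
    half_total = sum(ordered) / 2
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        # The first contig at which the cumulative length reaches half of
        # the assembly gives N50 (its length) and L50 (its rank).
        if running >= half_total:
            return length, rank


# Toy assembly with hypothetical contig lengths (not AplCal3.0 data).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50, toy_l50 = n50_l50(toy_contigs)
print(toy_n50, toy_l50)
```

In the toy assembly the two largest contigs together hold half of the 300 bp total, so the second contig sets both statistics.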

Results of Absolute mRNA Quantification in Identifiable Single Neurons

We based our analyses on the AplCal3.0 reference genome assembly. The AplCal3.0 gene models can be divided into four categories as shown in Table 3-1: mRNAs, non-coding RNAs (ncRNAs), tRNAs, and pseudogenes. Of the 28,999 transcript models, 7,926 (27.3%) lack annotation. These transcripts are described as uncharacterized or hypothetical (hypothetical meaning a protein is predicted to be encoded from an open reading frame, but there exists no experimental evidence of translation91). Uncharacterized transcripts display low sequence similarity/identity (homology) to other transcripts or proteins of known function and thus cannot be annotated by computational tools92, while hypothetical transcripts can be “conserved” and exist in the genomes of organisms from several phylogenetic lineages, but lack functional annotation93. The percentage of unannotated genes in the current A. californica genome release (27.3%) is slightly lower than that found in most sequenced genomes (estimated to be 30-40%)94,95. The mRNA gene models can be further subdivided into two categories. The vast majority (28,756) of model transcripts and their corresponding protein products were computationally generated by the NCBI eukaryotic genome annotation pipeline. Annotations for the remaining 243 genes were mainly derived from Genbank cDNA and expressed sequence tag (EST) data and are supported by the National Center for Biotechnology Information (NCBI) reference sequence (RefSeq) eukaryotic curation group. A total of 3,924 gene models have more than one isoform, and collectively these alternatively spliced transcripts make up 7,986 of the 28,999 total gene models (27.5%). The largest number of isoforms transcribed from a single gene locus is 42, from a glutenin-like protein (glutenins are high-molecular-weight protein components of gluten).

Not all of the 28,999 gene models were detected in the single neuron sequencing data. A total of 638 transcript models (2.2%) had no reads assigned to them in any of the 23 neurons. Conversely, this means that at least one read (that passed quality control, see Methods) was mapped to 97.8% of the gene models, generating an integer count greater than 0 for that gene. We found that the number of genes detected varied between cell types and individual neurons; these numbers are recorded in Table 3-2.

Table 3-2 displays the six groups or “cell types” of the identified neurons we surveyed. Individual neurons belonging to a group are differentiated with underscores or subscripts after their primary identifier. Table 3-2 shows the numbers of gene models that have zero reads mapped to them in each individual cell. Subtracting these numbers from the total of 28,999 gene models, we calculate that the average individual A. californica neuron expresses about 17,400 (~60%) genes (or transcripts) if we set the threshold of “expression” as at least one mapped read. However, in some neurons (for example, LPl1_2 and rMCC_3) over 24,000 (> 83%) gene models have at least one read that maps within their genomic coordinates. This number is a function of both biological variation and sequencing depth75.
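The subtraction described above is simple enough to sketch in a few lines. The example below is illustrative only: the neuron names and zero-read tallies are hypothetical values chosen to reproduce the approximate magnitudes quoted in the text, not entries from Table 3-2, and the helper names are assumptions.

```python
TOTAL_GENE_MODELS = 28_999  # AplCal3.0 gene models, as stated in the text


def detected_from_zero_count(zero_count_models, total=TOTAL_GENE_MODELS):
    """Return (number detected, percent detected) from the zero-read tally."""
    detected = total - zero_count_models
    return detected, 100.0 * detected / total


def detected_from_counts(read_counts, min_reads=1):
    """Count gene models whose integer read count meets the threshold."""
    return sum(1 for count in read_counts if count >= min_reads)


# Hypothetical zero-read tallies (not taken from Table 3-2).
example_zero_counts = {"neuron_A": 11_599, "neuron_B": 4_900}
for name, zeros in example_zero_counts.items():
    detected, percent = detected_from_zero_count(zeros)
    print(f"{name}: {detected} detected ({percent:.1f}%)")
```

Raising min_reads in detected_from_counts shows how sensitive the "expressed" tally is to the chosen threshold.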

Transcriptomic Evaluation of the Characteristics of Identified Neurons

Previous RNA-seq studies comparing mammalian tissues found that the brain (along with the kidneys and testis) expressed the most complex transcriptomes96. A majority of the genes were expressed, and mRNAs were found to possess unusually long 3’ untranslated regions (UTRs)97. It has been observed that, in general, cells of the nervous system express more genes compared to an average cell located in other parts of an organism2.

We generated saturation curves from single neurons as shown in Figure 3-1 to answer two questions. The first is a biological one: “how many genes does a single Aplysia neuron express?” We wanted to know how many of the 28,999 genes defined in the AplCal3.0 genome are expressed in individual neurons and how this number varied among the different types of neurons, as well as between the same neuron isolated from different A. californica individuals. The second question was of a technical nature: how many reads are needed to detect all the genes expressed in a single neuron? This question has important implications for future work in the field of next-generation sequencing. Knowledge of how many sequencing reads are necessary to fully characterize the transcriptome of a neuron will enable the estimation of how much sequencing is needed on a per-cell basis. It can also give us some idea of how much of a transcriptome is “missing” from sequencing projects that do not reach saturation, and it allows us to determine how many individual neurons to sequence per flow cell to capture the entire complement of their transcriptomes. As shown in Figure 3-1, the saturation curves resemble vertically shifted logarithmic functions, growing until reaching a horizontal asymptote representing the total number of genes expressed in a neuron. For example, from a single R2 cholinergic motor neuron (R2_5) we generated 10,244,420 reads and detected 11,445 unique transcripts in total. For Figure 3-1, we used an expression threshold of 1 transcript per million (TPM) or greater as the cutoff for determining whether a gene was considered “expressed” in a neuron. We found that approximately 90% of these 11,445 transcripts were detectable using only 3 million randomly sampled reads; this is indicated by the dotted lines in Figure 3-1. About 75% of the total transcripts (8,547) were detected using just 10% of the total reads (1,024,442). Figure 3-1 shows that we are sequencing individual neurons sufficiently deeply to detect all their expressed transcripts (reaching sequencing saturation). See also Supplemental Figures C-6 and C-7 for modified versions of Figure 3-1 that raise the threshold of transcripts per million (TPM) necessary for a gene to be considered “expressed” in a neuron. Supplemental Figure C-6 illustrates saturation curves using a cutoff of 10 TPM or greater and Supplemental Figure C-7 depicts saturation curves using a threshold of 100 TPM or greater for expressed genes.
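The logic behind a saturation curve, subsampling reads at increasing depths and counting the genes detected at each depth, can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used for Figure 3-1: it applies a read-count threshold (min_reads) as a stand-in for the TPM cutoff, draws nested subsamples from a single shuffle, and uses a synthetic read pool with hypothetical gene names.

```python
import random
from collections import Counter


def saturation_curve(read_gene_ids, depths, min_reads=1, seed=0):
    """Return [(depth, genes_detected)] for nested random subsamples of reads."""
    shuffled = list(read_gene_ids)
    random.Random(seed).shuffle(shuffled)  # one shuffle, so subsamples are nested
    curve = []
    for depth in sorted(depths):
        counts = Counter(shuffled[:depth])  # reads per gene at this depth
        detected = sum(1 for c in counts.values() if c >= min_reads)
        curve.append((depth, detected))
    return curve


# Synthetic read pool (hypothetical): 200 genes with strongly skewed abundances,
# each read represented by the gene it was assigned to.
toy_rng = random.Random(1)
toy_genes = [f"gene{i}" for i in range(200)]
toy_weights = [1.0 / (i + 1) for i in range(200)]
toy_reads = toy_rng.choices(toy_genes, weights=toy_weights, k=5000)

toy_curve = saturation_curve(toy_reads, [50, 500, 2500, 5000])
for depth, detected in toy_curve:
    print(depth, detected)
```

Because abundant genes are found almost immediately while rare genes require many more reads, the detected-gene count rises steeply at first and then flattens, which is the shape described for Figure 3-1.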

After determining that we reached saturation, we wanted to find which genes were expressed in some cells but not others and to determine the abundances of these transcripts. Figure 3-2 shows the number of genes detected per cell at various TPM expression cutoffs (1-5, 5-10, 10-50, 50-100, 100-500, 500-1000, and 1000+). The lowest expression category (1-5 TPM) contains the transcripts that are scarcest within the neurons, some represented by only a few mRNA molecules (see Table 3-3). The most abundant category (1000+ TPM) is representative of the most highly expressed neuronal transcripts (for example, ribosomal RNAs, cytoskeletal components, and signaling molecules). The abundance of most transcripts falls somewhere in between these two extremes. Comparing across cell types, we see that the majority of transcripts that are missing from one cell compared to another were in the low abundance category (1-5 TPM, also see Supplemental Figure C-1). This is largely due to the stochastic nature of RNA sampling, as a rare transcript may be detected in one neuron’s transcriptome but not in another (despite being a biological replicate).
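The abundance categories described above amount to assigning each TPM value to an interval. The sketch below shows one way to do this; the bin edges come from the text, but treating each bin as half-open (lower edge included, upper edge excluded) is an assumption, as are the helper names and the toy TPM values.

```python
from bisect import bisect_right
from collections import Counter

BIN_EDGES = [1, 5, 10, 50, 100, 500, 1000]  # lower edge of each category
BIN_LABELS = ["1-5", "5-10", "10-50", "50-100", "100-500", "500-1000", "1000+"]


def bin_tpm(tpm):
    """Return the abundance category for a TPM value, or None if below 1 TPM."""
    if tpm < BIN_EDGES[0]:
        return None
    return BIN_LABELS[bisect_right(BIN_EDGES, tpm) - 1]


def count_by_bin(tpm_values):
    """Tally how many genes fall into each abundance category."""
    tally = Counter(bin_tpm(t) for t in tpm_values)
    return {label: tally.get(label, 0) for label in BIN_LABELS}


# Hypothetical TPM values for a handful of genes in one neuron.
toy_tpms = [0.2, 1.0, 3.7, 5.0, 48.0, 120.0, 999.9, 1000.0, 25000.0]
print(count_by_bin(toy_tpms))
```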

Another method we used to distinguish between the stochastic technical variability inherently present in transcriptomes and relevant biological variability is to look at technical sequencing replicates. Table 3-4 displays the numbers of transcripts detected in eight neurons that exceeded three abundance levels (1, 3, and 5 TPM). Neurons L11_1, L11_2, L11_3, L7_3, L7_5, R2_4, R2_5, and R2_6 were sequenced on two separate lanes of a single HiSeq3000 flow cell to determine the technical variability in gene measurements due to lane effects. We found that the number of genes detected at three levels of abundance (1, 3, and 5 TPM) did not appreciably change. Small differences in the total numbers of genes expressed above a certain threshold could be explained by minor variation in TPM value near the chosen cutoffs. For example, if gene A has a TPM value of 2.998 in one lane and a value of 3.001 in the other lane, then gene A would be included in the gene count for the latter lane, but not the former. Another way of observing “lane effects” is to compare a scatterplot of the expression values of individual RNAs from one lane to the other. Figure 3-3 shows the near-perfect pairwise correlation of external spike-in abundances between sequencing lanes, indicating no obvious issues attributable to lane effects.
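Both lane-effect checks described above, counting genes over a cutoff in each lane and correlating expression values between lanes, can be illustrated with a toy example. Only the gene A values (2.998 and 3.001 TPM) come from the text; the other genes and values are hypothetical, and computing the Pearson correlation on log2-transformed TPM values is an assumption made for this sketch.

```python
import math


def genes_above(tpm_by_gene, cutoff):
    """Return the set of genes whose TPM meets or exceeds the cutoff."""
    return {gene for gene, tpm in tpm_by_gene.items() if tpm >= cutoff}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Same library on two lanes; geneA straddles the 3 TPM cutoff as in the text.
lane1 = {"geneA": 2.998, "geneB": 12.0, "geneC": 150.0, "geneD": 0.4}
lane2 = {"geneA": 3.001, "geneB": 11.7, "geneC": 153.0, "geneD": 0.5}

only_in_lane2 = genes_above(lane2, 3) - genes_above(lane1, 3)
gene_order = sorted(lane1)
lane_r = pearson([math.log2(lane1[g]) for g in gene_order],
                 [math.log2(lane2[g]) for g in gene_order])
print(only_in_lane2, round(lane_r, 4))
```

The threshold count differs by one gene between lanes even though the underlying values are nearly identical, which is why the correlation is the more informative check.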

Shallow Transcriptomic Profiling of 96 Individual Neurons, Including VC Sensory Neurons

In addition to the 23 individual neurons that were deeply sequenced to saturation (see Figure 3-1), we also performed shallow scRNA-seq on 96 neurons isolated from abdominal, pleural, pedal, and cerebral ganglia (see Supplemental Table M2).

Object 3-1. Primary Data Master Tables. Table M2. (Master Data Table 2) Shallow Sequencing of Single Identified Neurons with Annotated TPMs. This table contains the TPM expression values and bioinformatic annotation data of all genes from the 96 single neurons on which we performed shallow-coverage RNA-seq.

On average, we obtained 2.46 million reads per individual cell, of which 376,000 were uniquely mapped to the AplCal3.0 genome assembly. The average number of reads per cell puts this scRNA-seq experiment on par with the sequencing depth achieved by many of the mammalian scRNA-seq studies mentioned previously. However, technical difficulties during sequencing library construction were likely the cause of the overall low unique read mapping rate (~15%). While we were able to identify some individual neurons, the majority have not been previously characterized (or named) and were designated as members of their respective clusters or by their proximity to identified neurons. We were able to achieve successful sequencing of 14 individual ventrocaudal (VC) sensory neurons. These cells are thought to be nociceptors that contribute to the neural plasticity accompanying sensitization learning26. We found the most highly abundant transcript in VC neurons to be the neuropeptide sensorin. Sensorin-expressing sensory neurons from the various Aplysia central ganglia have receptive fields that span the entire surface of the body22. Other abundant transcripts were predicted to encode ribosomal components and cytoskeletal elements, similar to the other individual neurons we sequenced.

Insights Regarding Absolute Molecular Quantification of Messenger RNAs

A summary of our addition of known quantities of external RNA spike-ins to eight (of 23, 35%) different single cell libraries is displayed in Table 3-5. As can be seen from Table 3-5, the read counts mapping to spike-ins were less than 1% of the read counts ascribed to native cellular RNA, except in L7_5 (2.31%). The number of RNA molecules detected in the two L7 neurons ranges from about 325 to 850 million molecules. The numbers of RNA molecules detected in the three R2 neurons range from about 6 to 41 billion molecules. R2 is a larger neuron (and displays increased polyploidy10) compared to L7, so it appears that the larger the size of a neuron (or the number of sets of chromosomes), the greater its number of expressed mRNA molecules. Even comparing R2 neurons of approximately the same diameter (300 µm) from animals weighing approximately the same amount (150 grams), we detected a nearly two-fold difference (6 billion molecules in one R2 neuron to about 12 billion in the other) in the total number of RNA molecules detected. Similarly for L11 motor neurons, we detected ~4.5 billion molecules in L11_1 and ~10 billion in L11_2, even though the neurons were approximately the same diameter and harvested from similarly sized animals. See also Supplemental Figure C-2 for information about the total numbers of RNA molecules detected in individual neurons and for a representation of how the number of molecules detected increases as cell diameter increases.

We also determined the dynamic range or dose response over which the external RNA spike-ins were detected. The dynamic range is defined as the difference between the highest and lowest concentration of ERCC transcripts detected in the transcriptome. Figure 3-4 is a representative dose-response curve we plotted to examine the dynamic range. From the dose-response data of detected spike-ins we fit log-log regression models to interpolate the estimated absolute concentrations of endogenous RNAs within the dynamic range of detection (see Methods). Relative normalized abundances from RNA-seq libraries were calibrated to absolute molecular abundances using the fitted regression models.
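The calibration step described above can be sketched as an ordinary least-squares fit in log2-log2 space followed by interpolation. This is a minimal illustration under stated assumptions rather than the pipeline used here: it regresses log2(spiked amount) on log2(TPM), the spike-in values are hypothetical and follow an exact power law, and the conversion to molecule counts uses the Avogadro constant.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def fit_loglog(tpms, attomoles):
    """Least-squares fit of log2(attomoles) = intercept + slope * log2(TPM)."""
    xs = [math.log2(t) for t in tpms]
    ys = [math.log2(a) for a in attomoles]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tpm_to_attomoles(tpm, slope, intercept):
    """Interpolate the absolute amount (attomoles) for a given TPM value."""
    return 2 ** (intercept + slope * math.log2(tpm))


def attomoles_to_molecules(attomoles):
    """Convert attomoles (1e-18 mol) to a number of molecules."""
    return attomoles * 1e-18 * AVOGADRO


# Hypothetical spike-in measurements: observed TPM and known spiked amount.
spike_tpms = [1.0, 8.0, 64.0, 512.0]
spike_attomoles = [0.002, 0.016, 0.128, 1.024]

slope, intercept = fit_loglog(spike_tpms, spike_attomoles)
amount_at_1_tpm = tpm_to_attomoles(1.0, slope, intercept)
print(slope, amount_at_1_tpm, attomoles_to_molecules(amount_at_1_tpm))
```

Evaluating the fitted line at 1.0 TPM, as in the last lines, is one way to express the detection threshold in molecules; estimates outside the range spanned by the spike-ins would be extrapolations and should be treated with caution.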

In R2_5, external RNA species ERCC-130 is present at 30,000 attomoles/µL (1 µL of a 1:40,000 dilution was used, so 0.75 attomoles is the expected quantity present in the RNA before cDNA synthesis and amplification). ERCC-84 is present at a concentration of 29.3 attomoles/µL (we used 1 µL of a 1:40,000 dilution, see Chapter 2 - Methods, meaning that we estimate to have spiked in 7.32e-4 attomoles of this RNA species). This indicates a conservative estimate of dynamic range of over 1000-fold. ERCC-69 is present at 1.83 attomoles/µL (1 µL of a 1:40,000 dilution was used, so 4.58e-5 attomoles is the expected quantity present in the RNA before cDNA synthesis and amplification). If this estimate is used as the lower limit of detection, the dynamic range within this R2 neuron is roughly 16,000-fold (over four orders of magnitude).
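The dilution arithmetic above can be checked with a short calculation. The stock concentrations, the 1 µL volume, and the 1:40,000 dilution are taken from the text; the variable names are illustrative, and the fold ranges are simply ratios of the quoted stock concentrations.

```python
import math

DILUTION_FACTOR = 40_000  # 1:40,000 dilution, from the text
VOLUME_UL = 1.0           # microliters of diluted spike-in mix added

# Stock concentrations in attomoles per microliter, as quoted in the text.
stock_conc = {"ERCC-130": 30_000.0, "ERCC-84": 29.3, "ERCC-69": 1.83}

# Expected amount (attomoles) of each species added to the sample.
spiked_attomoles = {
    name: conc * VOLUME_UL / DILUTION_FACTOR for name, conc in stock_conc.items()
}

# Fold range between the most abundant species and each lower species.
fold_vs_84 = stock_conc["ERCC-130"] / stock_conc["ERCC-84"]
fold_vs_69 = stock_conc["ERCC-130"] / stock_conc["ERCC-69"]

for name, amount in spiked_attomoles.items():
    print(f"{name}: {amount:.3g} attomoles")
print(f"ERCC-130 / ERCC-84: {fold_vs_84:.0f}-fold")
print(f"ERCC-130 / ERCC-69: {fold_vs_69:.0f}-fold "
      f"(log10 = {math.log10(fold_vs_69):.2f})")
```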

We have used an expression value of 1.0 TPM as the minimum threshold for considering whether a gene is “expressed” or not. If a gene exceeds this threshold, we consider it to be “detectably” expressed. In other words, we consider a TPM value of 1.0 to be the lower limit of detection, and it can be regarded as a measure of sensitivity. We calculated the number of RNA molecules corresponding to a TPM value of 1.0 for individual neurons. These results are found in Table 3-3. Individual neurons displayed correlation coefficients near unity, indicating a strong linear relationship between the log2 of the TPM value and the log2 of the absolute concentration of transcripts for external RNA spike-ins.

Using our absolute molecular information, we constructed a histogram (Supplemental Figure C-3) of the number of molecules annotated as ion channel-related transcripts (corresponding to their constituent subunits). We also calculated the relative proportions of the absolute number of ion channel transcript molecules present in the eight individual neurons that received external spike-ins (Supplemental Figure C-4). Supplemental Figure C-4 shows that a higher proportion of the expressed cellular transcriptome corresponds to ion channel-related molecules in L7 neurons compared to L11 or R2 neurons.

Functional Biology of Identified Neurons

After acquiring the data and calculating the absolute numbers of cellular mRNA molecules, we examined the functional enrichment and potential biology influencing cellular identity in the single neurons.

Data transformation used for differential expression analysis

A common assumption among statistical tests and analyses is that the underlying data are homoscedastic, i.e., that the variables (in this case the genes) have similar variance98. Supplemental Figure C-5 compares the effects of several common data transformations on our scRNA-seq count data. The red line in Supplemental Figure C-5 panel A displays the variance trend for transcripts. The left portion of the line displays a distinct hump, indicating that the variance is greater for genes with low read counts compared to those with greater read counts. This means the variance is a function of the mean and the data are better described as heteroscedastic. To reduce the amount of heteroscedasticity, we employed DESeq2 to shrink the variance of genes with low read counts. This is done using the dispersion-mean trend observed for the entire dataset as a reference. Consequently, genes with low and highly variable read counts are assigned more homogeneous read count estimates so that their variance resembles the variance observed for most of the genes (which results in a more stable overall variance). DESeq2’s vst function returns values that are normalized for sequencing depth and adjusted to fit the experiment-wide trend of the variance-mean relationship (Supplemental Figure C-5, panel D).

After data transformation, we analyzed differentially expressed (DE) genes between individual neurons isolated from the CNS of A. californica. We tabulated the number of DE genes, including and excluding transcript isoforms. The results are summarized in Table 3-6. For instance, comparing gene expression between the cholinergic neuron R2 and the motor neuron L7, we identified 1,077 genes (of a potential 28,999 A. californica genes) to be differentially expressed using a Benjamini-Hochberg adjusted (for multiple comparisons) false discovery rate (FDR) of less than 0.1. The 1,077 genes included nine “pre-synaptic” genes, including members of the synaptotagmin, syntaxin-binding protein, and vesicle-associated membrane protein (VAMP) families. We also found 47 ncRNA molecules that were significantly DE between R2 and L7. We discovered two cell cycle related genes, one upregulated in R2 and the other upregulated in L7.
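The Benjamini-Hochberg adjustment referred to above can be illustrated with a compact implementation. This sketch covers only the p-value adjustment step, not the full differential expression workflow; the function name and the four p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
significant = [i for i, p in enumerate(adj_p) if p < 0.1]
print(adj_p, significant)
```

A gene is then called DE when its adjusted value falls below the chosen FDR threshold, which is 0.1 in the comparison described above.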

From Table 3-6 we can surmise that, of all cell types examined, LPl1 neurons are probably the most transcriptionally similar to R2, since this comparison yielded the fewest DE genes (455). The rMCC neurons were overall the most transcriptionally distinct from the R2 neurons, with 1,164 non-isoform DE genes, followed by the L7 motor neurons (1,077 non-isoform DE genes) and the lMCC neurons (912 non-isoform DE genes).

R2 vs L7 differentially expressed ion channel-related genes

Eleven transcripts encoding ion channel-like proteins were DE when comparing neurons R2 to L7 (see Supplemental Table 3-S1).

Object 3-2. Tables of Differentially Expressed Genes in Single Identified Neurons. Table 3-S1 R2vL7. This table contains statistics and annotations corresponding to the list of differentially expressed genes between R2 and L7 neurons.

Six of these eleven are predicted to encode subunits of ionotropic glutamate receptors (IGluRs); of these six, three are predicted to form kainate-type receptors and one an AMPA-like receptor. Four of the six putative IGluR subunit transcripts are more abundant in L7 than R2 neurons, including the AMPA receptor subunit and two of the kainate-type receptor subunit transcripts. The remaining five (of eleven) non-IGluR-like transcripts belong to several other families of membrane channels. These five transcripts were all more highly expressed in R2 neurons. One of these five transcripts is predicted to encode an anoctamin-like protein, two are predicted to encode aquaporin-like channels, and the remaining two are predicted to encode voltage-gated potassium (Kv) channels, one related to the Shaker family and the other to the Ether-a-go-go (EAG) family. The anoctamin family of proteins serves at least two major functions: as Ca2+-dependent ion channels and/or Ca2+-dependent scramblases2. Aquaporins are membrane proteins that selectively allow the passage of water or other small uncharged molecules into or out of cells, depending on the osmotic gradient3. Kv channels play important roles in the regulation of various cellular processes, including modulation of the plasma membrane potential, apoptosis, cell differentiation and growth, and release of secretory molecules such as hormones or neurotransmitters1. The EAG-like gene is represented by at least five isoforms in the A. californica genome. The anoctamin-like transcript was expressed in greater abundance than any of the other four DE genes.

DE cell cycle and synaptic genes

Two significantly differentially expressed transcripts between R2 and L7 are predicted to encode proteins that regulate the cell cycle (cyclins). A cyclin D-like transcript was expressed more abundantly in L7 neurons, while the other, a cyclin B3-like transcript, was found at higher levels in the R2 neurons. The presence of cell cycle-regulating transcripts in terminally differentiated neurons could be due to a variety of cellular needs, including DNA damage repair, epigenetic regulation, or even part of normal neuronal function.

Given that chemical transmission and cell signaling via the release of neurotransmitters from a presynaptic neuron onto a postsynaptic neuron or other effector cell (for instance, muscles or glands) is the primary function of neural cells, a natural question is how similar the expression patterns of genes encoding the exocytotic release machinery for signaling molecules are from neuron to neuron. We found nine DE genes annotated as components of the process of exocytosis. Six of these nine transcripts, including two synaptotagmin-like, two syntaxin-like, a Rab-interacting molecule-binding protein 2-like (RIM-BP2), and a complexin-like gene, are expressed at greater levels in L7 neurons compared to R2. The three remaining transcripts (one synaptotagmin-like and two vesicle-associated membrane protein-like (VAMP)) are found in higher abundance in R2 neurons compared to L7. This suggests that these neurons may use slightly different complements of proteins to coordinate and achieve secretion of endogenous cellular molecules.

We found that R2 neurons preferentially expressed (when compared to L7 neurons) a mollusk-derived growth factor (MDGF) transcript, whose product exhibits adenosine deaminase (ADA) activity and stimulates cell proliferation in vitro99. This protein was originally found in granules within the atrial gland of A. californica and named atrial gland-specific antigen (AGSA) before its homology to insect-derived growth factor (IDGF) was known. Upon the discovery of this shared sequence homology, AGSA was renamed MDGF to reflect its function as a growth factor100. ADA domain-containing proteins possess the enzymatic capability to convert the nucleoside adenosine to inosine and thus play an important role in cellular purine metabolism and regulation.

Neuroactive peptides (neuropeptides) are short proteins produced and secreted by neurons to communicate with and act on other neural substrates. They are a type of neurotransmitter, but unlike small-molecule chemical neurotransmitters (such as acetylcholine, glutamate, GABA, dopamine, serotonin (5-HT), etc.), neuropeptides (as their name implies) are composed of unbranched chains of amino acids (polypeptides). This feature means that transcriptomic studies can directly detect the presence of neuropeptides (or their precursors, known as pre-propeptides) within a cell. Small-molecule neurotransmitters cannot be directly detected via RNA-seq, although their presence can be indirectly detected by identifying the transcripts that encode enzymes required for their synthesis. As neuropeptides are used for neuronal communication and cellular neuropeptide levels are constantly depleted via secretion, transcripts encoding neuropeptides could be expected to be among the most highly abundant mRNA species produced by neurons. Indeed, we find this to be the case in single A. californica neurons. We detected 28 (35 including isoforms) DE neuropeptide/secretory product transcripts upon comparison of R2 and L7 neurons. These differentially expressed transcripts and their log2 fold change between R2 and L7 neurons are displayed in Table 3-7. A positive fold change indicates a transcript was more abundantly expressed in R2 neurons, while a negative fold change indicates higher expression in L7 neurons. As seen in Table 3-7, 23 of the 28 neurosecretory molecules are upregulated in R2 and five are more abundant in L7 neurons.

Five RNA modification/editing/methylation genes are DE between L7 and R2

Differential gene expression analysis of R2 vs L7 (see Object 3-2. Supplemental Table 3-S1) detected higher transcript levels of the RNA demethylase enzyme ALKBH5, as well as an RNA methyltransferase enzyme subunit known as methyltransferase-like 3 (METTL3). In addition to these two transcripts, five other transcripts predicted to have methyltransferase activity were DE. Three of these five are predicted to be transfer RNA (tRNA) methyltransferases.

Transcription factors and chromatin remodeling genes DE in R2 versus L7

Transcription factors (TFs) are proteins that regulate the transcription of DNA into messenger RNA by binding to specifically targeted sites in DNA. There are many known families of TFs, one of which is the basic helix-loop-helix (bHLH) family, whose members are characterized by the presence of a bHLH domain, which utilizes its basic region to bind to DNA101 (which is acidic) while the helix-loop-helix motif binds other proteins102,103. In addition to bHLH domains, other protein domains found in bHLH TFs include Orange, PAS, and leucine zipper domains104, which confer additional capabilities. Another family of TFs is characterized by basic leucine zipper (bZIP) domains, where (similarly to the bHLH domain) the basic region binds to specific DNA sequences and the leucine zipper region of the bZIP domain enables its dimerization with other proteins104,105. A third family is the C2H2 zinc finger (ZF) TFs, which are identified by a sequence of two cysteine (C) and two histidine (H) amino acid residues that coordinate a zinc ion106. In addition to these three families (as well as others), the Homeobox TF family is defined by the presence of a Homeobox domain in its constituents107,108. The homeobox domain is an integral feature of Hox genes, which are a group of related genes that coordinate patterning of the body plan during bilaterian embryogenesis109.

We discovered 36 TF-like transcripts (not including isoforms) to be DE when comparing R2 and L7 neurons (see Object 3-2. Supplemental Table 3-S1). Of the 36, 18 are predicted to encode C2H2 ZF-like TF family members, three bZIP TFs, five bHLH-like TFs, and two homeobox-like TFs. The remainder comprise three nuclear transcription factor Y subunit transcripts, one forkhead box (FOX) TF, three nuclear receptor family-like genes (one of which is the A. californica estrogen receptor), and a transcription factor TFIIH core complex subunit transcript, which showed slightly elevated levels in R2 and is predicted to have helicase and ATPase activity to create the transcription bubble110.

DE analysis revealed five transcripts involved in chromatin remodeling: 1) a barrier-to-autointegration factor-like (BAF) transcript, 2) a chromodomain helicase DNA-binding (Chd) transcript, 3) a RAD54-like gene (Rad54 is a motor protein that translocates along dsDNA in an ATP-dependent manner111), 4) a SWI/SNF-related matrix-associated actin-dependent regulator of chromatin-like (SMARC) transcript, and 5) a structural maintenance of chromosomes-like (SMC) transcript. SMC proteins represent a large family of ATPases that help organize higher-order chromatin structure and remodeling112. Furthermore, R2 neurons expressed moderately higher levels of the variant histone 3.3 compared to L7 neurons. Histone 3.3 has been shown to play a vital role in the maintenance of chromatin integrity during mammalian development113.

Overall comparison of DE genes shared between R2 and five other neuron classes

We also wanted to make cross-cell comparisons of the genes present in our DE transcript lists. Figure 3-5 depicts a Venn diagram comparing the DE transcripts (relative to R2) that the other five cell types we scrutinized (L7, LPl1, L11, rMCC, and lMCC) have in common.

Examination of the four DE transcripts relative to R2 that are shared by all five other cell types (center area in Figure 3-5) revealed them to be three predicted neuropeptides and a fourth transcript lacking a known annotation. These are listed in Table 3-8. Querying the NCBI BLAST databases with the unannotated gene sequence using the blastx algorithm returned only one hit (e-value: 2e-25): an uncharacterized protein found in the transcriptome of another gastropod mollusk (the freshwater snail Biomphalaria glabrata) that shares ~31% identity with the predicted Aplysia protein sequence.
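The center region of a Venn diagram such as Figure 3-5 corresponds to the intersection of the DE transcript sets from every comparison. The following is a minimal Python sketch of that operation. The function name `shared_by_all`, the dictionary `de_lists`, and the gene labels are hypothetical placeholders for illustration and are not the actual DE lists from this study.

```python
from functools import reduce

def shared_by_all(de_lists):
    """Return the transcripts present in every DE list (the Venn diagram center)."""
    return reduce(set.intersection, (set(genes) for genes in de_lists.values()))

# Placeholder identifiers for illustration only; these are not the real DE lists.
de_lists = {
    "R2_vs_L7":   {"geneA", "geneB", "geneC", "geneD"},
    "R2_vs_LPl1": {"geneA", "geneB", "geneE"},
    "R2_vs_L11":  {"geneA", "geneB", "geneC"},
    "R2_vs_rMCC": {"geneA", "geneB", "geneF"},
    "R2_vs_lMCC": {"geneA", "geneB", "geneF"},
}

common = shared_by_all(de_lists)
print(sorted(common))
```

Using sets makes the result independent of the order in which transcripts appear in each list, and isoform-level duplicates collapse automatically if the lists are keyed by gene identifier.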

Functional annotation of enriched genes using DAVID

For each set of DE genes, we performed a functional analysis using the web-based classification and clustering tool DAVID 6.8114. The DAVID algorithm implements an EASE score, which is a modified version of Fisher’s exact test, to determine if a particular gene group or pathway is “enriched” in a given list of annotated genes114. This resource was used to cluster genes with similar annotation categories and assign them an enrichment score (ES). Table 3-9 lists the functional annotation clusters with the three highest ES from each differentially expressed gene set using medium (the default) classification stringency. In the analysis of annotated DE genes between R2 and L7 neurons, the top cluster (ES = 0.68) consisted of four genes that encode Ras-like proteins. The Ras protein superfamily consists of small GTPase enzymes that are involved in cellular signal transduction. They mediate a variety of cellular functions including proliferation, membrane trafficking, adhesion, and cytoskeleton remodeling, among others. The gene cluster with the next highest enrichment score (ES = 0.47) consists of five genes predicted to encode neuropeptides. It includes FMRF-amide, CP2, egg-laying hormone (ELH), neuropeptide R15, and LUQIN. Neuropeptides are secreted signaling molecules that serve to influence the activity of downstream effector cells. LUQIN is a decapeptide that is derived from cleavage of the L5-76 precursor transcript and is prominently expressed by the neurons within the dorsal left upper quadrant (LUQ) of the abdominal ganglion in A. californica115,116. The 3rd cluster (ES = 0.25) consisted of glutamate and acetylcholine (ACh) receptor subunits, including GluR1, GluR2, GluR5, an ACh receptor subunit α-type acr-16-like, and an ACh receptor beta-1-like subunit. Glutamate and acetylcholine are classical neurotransmitters, which are non-peptide signaling molecules that are also secreted by neurons to send chemical messages to other cells.
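To make the EASE score and the enrichment score more concrete, the sketch below shows one common formulation in Python. It is not DAVID's own code. It assumes that the EASE score is the one-sided Fisher exact (hypergeometric) p-value recomputed after removing one annotated gene from the query list, with the background counts left unchanged, and that a cluster's ES is the mean of -log10 of its members' p-values (the geometric mean on a -log10 scale). All function names and the example counts are hypothetical.

```python
import math

def hypergeom_upper_tail(k, pop_total, pop_hits, list_total):
    """P(X >= k) when list_total genes are drawn without replacement from
    pop_total background genes, pop_hits of which carry the annotation."""
    k = max(k, 0)
    upper = min(pop_hits, list_total)
    denom = math.comb(pop_total, list_total)
    numer = sum(
        math.comb(pop_hits, i) * math.comb(pop_total - pop_hits, list_total - i)
        for i in range(k, upper + 1)
    )
    return numer / denom

def fisher_p(list_hits, list_total, pop_hits, pop_total):
    """One-sided Fisher exact p-value for over-representation."""
    return hypergeom_upper_tail(list_hits, pop_total, pop_hits, list_total)

def ease_score(list_hits, list_total, pop_hits, pop_total):
    """Assumed EASE formulation: drop one annotated gene from the list
    (list_hits - 1 of list_total - 1) and recompute the Fisher p-value."""
    return hypergeom_upper_tail(list_hits - 1, pop_total, pop_hits, list_total - 1)

def group_enrichment_score(p_values):
    """Mean of -log10(p) across the annotation terms in one cluster."""
    return sum(-math.log10(p) for p in p_values) / len(p_values)

def es_to_geometric_mean_p(es):
    """Convert an enrichment score back to a geometric-mean p-value."""
    return 10 ** (-es)

# Hypothetical counts: 3 of 300 list genes annotated, 40 of 30,000 in the background.
print(fisher_p(3, 300, 40, 30000), ease_score(3, 300, 40, 30000))
print(es_to_geometric_mean_p(0.68))
```

Under these assumptions the EASE score is never smaller than the plain Fisher p-value, which is why it is described as the more conservative of the two. The last helper shows how an ES can be read on the p-value scale: an ES of about 1.3 corresponds to a geometric-mean p-value of about 0.05.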

Enriched genes found between R2 and other neurons

Table 3-10 lists the results of the DAVID comparison of annotated DE genes between the R2 and LPl1 neurons, which yielded one cluster (ES = 2.33) of eight predicted neuropeptide precursor transcripts. They included the neuropeptide R15, enterin, CP2, ELH, R3-14 neuropeptide, L11 neuropeptide, PRQFVamide, and myomodulin-1. Myomodulin is involved in controlling some of the muscles used during feeding behavior in Aplysia117. As was the case with R2 and LPl1, we detected one DE enriched gene cluster between R2 and L11 (ES = 1.96), which consisted of six neuropeptide precursor transcripts; this is shown in Table 3-11. They are: FMRF-amide, CP2, ELH, L11 neuropeptide, LUQIN, and a myoinhibitory peptide (MIP)-related peptide precursor.

Neuropeptides play an important role not only in inter-neuronal signaling but also in the growth and development of other body structures within an organism. For example, the annelid Platynereis dumerilii undergoes larval settlement due to the secretion of a MIP neuropeptide from the animal’s brain118. Table 3-12 lists the members of two groups of enriched genes found to be differentially expressed between R2 and lMCC neurons. The first gene group (ES = 2.76) consists of ten neuropeptides. The second gene group (ES = 0.35) comprises glutamate and acetylcholine (ACh) receptor subunits, similar to the 3rd gene cluster between R2 and L7 neurons.


Table 3-13 displays the three groups of enriched genes DE between R2 and rMCC neurons. Group 1 (ES = 1.73) is composed of nine neuropeptide precursor transcripts. The eight genes of group 2 (ES = 0.93) are glutamate and ACh receptor subunits, while the ten genes that comprise group 3 (ES = 0.36) have somewhat varied annotations. The five transcripts that encode innexin, neurexin, neural cell adhesion molecule (NCAM), and cadherin-related proteins are thought to mediate processes such as cellular adhesion, the formation of gap junctions, exocytosis, and synaptic binding.

Three of the ten transcripts in group 3 encode neurotransmitter transporter or receptor proteins. The serotonin transporter usually acts to transport the neurotransmitter serotonin (5-HT) from the synaptic cleft into presynaptic neurons. Vesicular glutamate transporter (VGLUT) proteins package glutamate present in the cellular cytoplasm into synaptic vesicles for release. The 5-HT receptor 2-like (5-HT2) transcript is predicted to function as an excitatory G-protein-coupled receptor which responds to 5-HT binding.

The remaining two (of ten) group 3 genes are predicted to be enzymes: adenylate cyclase 1 and aminopeptidase. Adenylate cyclases catalyze the conversion of ATP to cAMP and inorganic pyrophosphate. Aminopeptidases, as their name suggests, catalyze the cleavage of amino acids from the N-terminus of proteins and peptides.

Nine of the ten transcripts in the “group 1” DE genes between the R2 vs lMCC and R2 vs rMCC neurons are the same. The only “extra” transcript that is part of gene group 1 in the R2 vs lMCC comparison but not in the R2 vs rMCC comparison is Myomodulin-1. Table 3-14 lists enriched genes from the comparison of the motor neuron L7 vs the right serotonergic interneuron MCC. Gene group 1 (ES = 1.14) contains furin and prohormone convertase-type predicted transcripts. The ferric uptake regulator (FUR) protein is an iron-sensing transcriptional repressor that regulates the cellular metabolism of iron119. Prohormone convertases are enzymes that catalyze the cleavage of inactive precursor proteins into biologically active products120.

Tables 3-15 through 3-22 (eight tables) contain additional information regarding the genes belonging to the functionally enriched clusters as determined by analysis with DAVID114. The single neurons being contrasted in each table for differential expression testing are listed in the table names.

Principal component analysis reveals features of cell type-specific clustering

The two-dimensional PCA plot in Figure 3-6 represents 58% of the variance in a 500-dimensional dataset. The plotPCA function uses the 500 most variable genes in the dataset to calculate the principal components (PCs). The first PC alone accounts for 43% of the observed variance. The principal component analysis can delineate between each type of neuron. There is some intermingling of the lMCC and rMCC neurons; neither forms a distinct cluster of its own type. Instead, the lMCC and rMCC neurons collected from the same animal cluster together within a larger cluster consisting solely of left and right MCC neurons. This phenomenon of individual neurons originating from the same nervous system clustering more closely together is also observed in the case of R2_5 and L7_5. The motor neurons L11 and L7 form their own distinct clusters, with individual neurons in the L7 cluster showing the most separation along the first PC, while the L11 neurons separate most along the second PC. The R2 neurons separate into two clusters, one composed of individual R2 neurons sequenced together in the same sequencing run (R2_4, R2_5, and R2_6; R2 cluster 1) and the other of two R2s (R2_1 and R2_2) sequenced together on the same run along with R2_3, which was sequenced separately. This group (R2_1, R2_2, and R2_3; R2 cluster 2) clusters more closely with the LPl1 neurons. The R2 cluster 1 neurons were converted from RNA into cDNA and subsequently into sequencing libraries in parallel, while the sequencing libraries from the R2 cluster 2 neurons were constructed in three different batches.
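The two steps described above, selecting the most variable genes and then reporting the fraction of variance captured by each PC, can be illustrated with a small dependency-free Python sketch. This is not the plotPCA implementation. The function names, the power-iteration approach, and the toy matrix are assumptions made for illustration, and the input is assumed to be already normalized and transformed expression values with samples in rows.

```python
import math
import random

def top_variable_genes(samples, n_top):
    """Indices of the n_top genes with the highest variance across samples."""
    n = len(samples)

    def variance(g):
        col = [s[g] for s in samples]
        mu = sum(col) / n
        return sum((v - mu) ** 2 for v in col) / (n - 1)

    ranked = sorted(range(len(samples[0])), key=variance, reverse=True)
    return ranked[:n_top]

def variance_explained(samples, n_components=2, n_iter=500, seed=0):
    """Fraction of total variance captured by the leading principal components,
    from power iteration (with deflation) on the sample-by-sample Gram matrix."""
    n, n_genes = len(samples), len(samples[0])
    means = [sum(s[g] for s in samples) / n for g in range(n_genes)]
    centered = [[s[g] - means[g] for g in range(n_genes)] for s in samples]
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered] for ri in centered]
    total = sum(gram[i][i] for i in range(n))
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        eig = 0.0
        for _ in range(n_iter):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm <= 1e-12 * total:  # no variance left in this direction
                eig = 0.0
                break
            v = [x / norm for x in w]
            eig = norm
        fractions.append(eig / total if total > 0 else 0.0)
        # Remove the component just found before searching for the next one.
        gram = [[gram[i][j] - eig * v[i] * v[j] for j in range(n)] for i in range(n)]
    return fractions

# Toy matrix (4 samples x 3 genes), purely illustrative.
toy = [[0, 0, 1], [1, 2, 1], [2, 4, 1], [3, 6, 1]]
keep = top_variable_genes(toy, 2)
subset = [[s[g] for g in keep] for s in toy]
print(keep, variance_explained(subset))
```

Working on the sample-by-sample Gram matrix keeps the problem small when there are far fewer samples than genes, as is the case with 23 neurons and 500 genes.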

Discussion of Absolute Quantitation of mRNAs from Identifiable Single Neurons

This study describes the comprehensive transcriptomic profiling of single, repeatedly identifiable, and functionally characterized neurons from the CNS of the gastropod mollusk A. californica – a well-established neuroscience model used extensively for the direct correlation of simple behaviors to neuron physiology. We obtained 915,448,442 reads from 23 individual neurons representing six different lineages including cholinergic, serotonergic, motor, and interneurons. These large, easily identifiable neurons are true biological replicates, the examination of which allowed us to perform a detailed assessment of the biological as well as the technical variation we uncovered. We quantified the absolute abundances of RNA species present in eight of these neurons using calibrated external spike-ins.

We fitted cell-specific log-log regression models to the relationship between the absolute concentrations of ERCC spike-in molecules and their normalized gene expression values, and applied these models to all endogenous neuronal RNAs to estimate their absolute abundances. We also determined the limits of detection in each experiment using the ERCC spike-ins.
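A log-log calibration of this kind can be sketched in a few lines of Python. The example below is illustrative only: it assumes ordinary least squares on log10-transformed values with expression as the predictor and concentration as the response, and the spike-in numbers, function names, and units are hypothetical rather than taken from this study.

```python
import math

def fit_loglog(expression, concentration):
    """Least-squares fit of log10(concentration) = intercept + slope * log10(expression).
    Spike-ins with zero expression must be excluded first (log10 is undefined)."""
    xs = [math.log10(v) for v in expression]
    ys = [math.log10(v) for v in concentration]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_concentration(expression_value, slope, intercept):
    """Estimate the absolute concentration of an endogenous transcript."""
    return 10 ** (intercept + slope * math.log10(expression_value))

# Hypothetical spike-in calibration points (expression in TPM, concentration in attomoles/uL).
spike_tpm = [1.0, 10.0, 100.0, 1000.0]
spike_conc = [2.0, 20.0, 200.0, 2000.0]

slope, intercept = fit_loglog(spike_tpm, spike_conc)
print(slope, intercept, predict_concentration(50.0, slope, intercept))
```

Fitting one model per cell, as described above, lets each neuron's calibration absorb its own library-specific capture and amplification efficiency.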

This study represents the first comprehensive profiling of the cellular transcriptomes of identified neurons. We achieved an estimated 5-6X coverage of the transcriptomes of these cells, based on saturation curves (Figure 3-1). Based on our results, achieving a sequencing depth of three million reads per neuron is enough to detect the extent of expressed genes.


In R2_5 we detected RNA species spanning the entire range of spike-in concentrations (from 1.43×10⁻² attomoles/µL to 30,000 attomoles/µL), a 10⁶-fold concentration range.

In addition to the identity and quantities of unique (non-isoform) transcripts that were differentially expressed between individual single neurons, we also recorded the transcript isoforms that were DE. The quantity of transcripts including isoforms versus the quantities of unique DE transcripts between individual cell types are depicted in Figure 3-7. See also Object 3-3. Supplemental Tables 3-S1-S15 for the lists of these genes.

Object 3-3. Tables of Differentially Expressed Genes in Single Identified Neurons. Tables 3-S1-S15.

Rapidly Advancing Knowledge of Secretory Molecules from the Phylum Mollusca

Recent improvements in next-generation sequencing, computational biology, and mass spectrometry have enhanced the discovery of novel and conserved secretory molecules and neuropeptides. Entire organismal complements of signaling molecules from a host of species (including the lophotrochozoan lineage) can now be rapidly detected and analyzed via transcriptomics121,122. This list of species has expanded to include several gastropods121,123-126121,124-127, bivalves such as Crassostrea gigas128, Pinctada fucata, and Patinopecten yessoensis128,129, and the coleoid cephalopod Sepia officinalis130. Since the discovery of the excitatory peptide FMRFamide (isolated from the clam Macrocallista nimbosa in 1977131), molluscan families have served as useful models in the study of peptidergic signaling. After the initial detection of FMRFamide peptides in clams, a homologous peptide was identified in the phylum Annelida (a sister clade to Mollusca within the superphylum Lophotrochozoa), followed by other homologues discovered in organisms of the Ecdysozoa superphylum132. Despite the example of FMRFamide homologues discovered throughout various phyla, in general, the neuropeptide complements of arthropods (Ecdysozoa) and molluscs (Lophotrochozoa) have evolved relatively differently from each other133. The mollusc Aplysia californica, with its large, easily accessible nervous system, has been an excellent model for many researchers to functionally characterize the effects of its endogenous neuropeptides on behavior. These secretory/signaling molecules have been implicated in many physiological processes, such as: 1) feeding behavior, mediated by buccalin134, adipokinetic hormone135, feeding circuit activating peptide136, enterin137, leucokinin138, PRQFVamide139, SCP140, and urotensin II141; 2) circulation, regulated by FMRFamide140 and SCP142 (in Aplysia kurodai, circulation is affected by enterin and NdWFamide143); 3) reproduction, controlled by ELH144, enticin145, seductin146, and temptin145; 4) mechanosensation and nociception (sensorin22); 5) locomotion (pedal peptide147,148 and GdFFD149,150); and 6) neural circuit activity, modulated via FMRFamide151, sensorin152,153, and other neuropeptides. The collective activities of hormones, secretory molecules, and neuropeptides greatly influence intercellular communication. Evidence suggests that the same bioactive peptides can produce inhibition or excitation of target cells, depending on the contents of the extracellular microenvironment and the availability of cell-surface receptors, which indicates the properties of malleability and plasticity within the signaling network151.

Even highly evolved forms of molluscs (for example Octopus species with enlarged and centralized nervous systems) utilize peptidergic signaling molecules and mechanisms. Vital biological processes such as reproduction are influenced by allatostatin, feeding circuit-activating peptide, PRQFVamide, and FMRFamide154,155, while vascular and circulatory functions are mediated by cardioactive peptide, cephalotocin, FMRFamide, and octopressin155,156.

We found neuropeptides to be differentially enriched across each neuron type, indicating high levels of their precursors’ transcription to be an important cellular priority.

In general, we found at least some neuropeptides to be differentially expressed between each cell type that we examined. To ascertain whether these peptide precursors were endogenous to the neurons identified by scRNA-seq or their detection was due to synaptic contamination during isolation of the neurons, we looked at previously performed in-situ hybridization (ISH) preparations to localize transcripts of interest.

Through ISH screening, we detected three peptides in R2 neurons: MIP, FMRFamide, and fulicin24. Therefore, we postulate that the high levels of other peptide precursors we discovered through our scRNA-seq of these identified neurons were most likely due to other synaptic inputs harvested during cell isolation. Differential expression testing was performed with the R2 neurons represented as a single group to maintain consistency in the analysis. Consequently, technical variability and biological variability are co-mingled in the differential gene expression analysis. Because we cannot directly separate the relative contributions of these effects, we attempted to identify which gene counts are affected to a greater degree by technical variability.

However, we observed in Figure 3-6 that the first PC represents more technical variability than the second, as the R2 neurons are more separated along the x-axis than along the y-axis. We examined the proportional contributions of the most variable genes to each of the principal components. We hypothesized that technical “batch effects” are the most likely force driving the separation of the two clusters. To test this hypothesis, we conducted an ad hoc analysis aimed at detecting, ex post facto, the transcripts influenced most greatly by technical variability. The R2 neurons were segregated into two groups: cluster 1 and cluster 2. Differential gene expression analysis was performed on these groups, resulting in 5,200 genes being called as “differentially expressed” with an FDR of less than 0.1. The analysis was then repeated iteratively, discarding differentially expressed transcripts according to an increasing log2-fold change cutoff, as shown in Table 3-23. This assumes that the transcripts displaying the largest log2 fold change differences between the R2 groups are primarily the result of technical factors (including the preparation of libraries in different batches).
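The iterative filtering step can be pictured with the short Python sketch below. It is one possible reading of the procedure rather than the exact analysis behind Table 3-23: it assumes that, at each cutoff, transcripts called DE between the two R2 clusters (FDR < 0.1) with an absolute log2 fold change at or above the cutoff are treated as batch-affected and discarded. The record layout, the function name, and the example values are hypothetical.

```python
def filter_by_log2fc(de_results, cutoffs, fdr_threshold=0.1):
    """For each cutoff, report how many significant transcripts are discarded
    (|log2FC| >= cutoff) and how many are kept (|log2FC| < cutoff)."""
    significant = [r for r in de_results if r["fdr"] < fdr_threshold]
    summary = []
    for cutoff in sorted(cutoffs):
        kept = [r for r in significant if abs(r["log2fc"]) < cutoff]
        summary.append(
            {"cutoff": cutoff, "discarded": len(significant) - len(kept), "kept": len(kept)}
        )
    return summary

# Hypothetical cluster 1 vs cluster 2 results, for illustration only.
example_results = [
    {"gene": "geneA", "log2fc": 6.2, "fdr": 0.001},
    {"gene": "geneB", "log2fc": -3.1, "fdr": 0.020},
    {"gene": "geneC", "log2fc": 1.4, "fdr": 0.050},
    {"gene": "geneD", "log2fc": 0.8, "fdr": 0.300},  # not significant, ignored
]

for row in filter_by_log2fc(example_results, cutoffs=[1, 2, 4, 8]):
    print(row)
```

Tabulating the discarded and kept counts at each cutoff makes it easy to see how sensitive downstream DE lists are to the assumed threshold.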

As shown in Figure 3-2, the majority of transcripts that dropped out from any particular transcriptome were of low abundance (TPM expression value less than 1.0, also see Supplemental Figure C-1). There are a few potential explanations for their presence in some identified neuronal transcriptomes but not others. One reason could be extracellular contamination resulting from the process of harvesting the neurons or potentially biological adhesion of smaller cells (other neurons or glial cells) to the larger “isolated” neurons. Another possibility is intrinsic transcriptional noise, or distinct cellular “states” existing within the same neuron at different time points while carrying out different biological processes. The most abundant genes were consistently found across all neurons of a cell type.

In summary, for the first time, we sequenced (to saturation) and quantified absolute RNA abundances from reliably identifiable individual neurons representing cholinergic and non-cholinergic motor neurons, and serotonergic interneurons. We detected expected trends of external RNA spike-ins and validated the presence of genes from the literature as well as from in-situ hybridization experiments. This study helped elucidate stable genetic features and transcriptomic inventories from individual identified neurons.


Table 3-1. AplCal3.0 gene models and categories

Gene models    Categories

27,579 Putative protein-coding genes (mRNAs)

1,270 Non-coding RNAs

102 Pseudogenes

48 Transfer RNAs (tRNAs)

28,999 Total number of gene models


Table 3-2. Twenty-three deeply sequenced single neurons with information regarding read counts (also see Table M1).

Neuron    Genes with no reads    Genes with no reads in all neurons of cell type    Mean ± SD of genes with no reads per cell type    Range of genes with no reads per cell type
L11_1     13,685                 9,455                                              12,892 ± 1,230                                    2,210
L11_2     13,516
L11_3     11,475
L7_1      9,096                  5,119                                              12,587 ± 2,670                                    6,616
L7_2      11,662
L7_3      11,672
L7_4      15,712
L7_5      14,794
LPl1_1    16,007                 3,715                                              10,710 ± 5,813                                    11,516
LPl1_2    4,491
LPl1_3    11,633
R2_1      11,007                 4,753                                              12,541 ± 1,548                                    3,591
R2_2      14,208
R2_3      10,637
R2_4      12,148
R2_5      14,228
R2_6      13,020
lMCC1     12,349                 5,248                                              10,831 ± 3,138                                    5,699
lMCC2     12,921
lMCC3     7,222
rMCC1     13,343                 2,892                                              10,047 ± 5,757                                    10,000
rMCC2     13,399
rMCC3     3,399


Table 3-3. The number of mRNA transcripts corresponding to 1 TPM in single neurons

Neurons    Number of RNA molecules in 1 TPM

L111 488

L73 334

Near L73 344

R61 306

R14 142

L2 75

L75 56

Near L75 32

R24 57

R15 301

R62 88

R31 44

R25 46

L112 277

R26 41

R32 214

L10 88

L113 571

L12 99

L13 483

LCQg 24


RPlg 1,615

B4 201


Table 3-4. Minimal lane effects present indicating accurate transcript detection and reproducible quantification

             L11_1     L11_3     L11_4     L7_3      L7_5      R2_4      R2_5      R2_6
T1 >1 TPM    10,601    11,050    11,163    10,859    10,905    12,284    11,396    11,358
T2 >1 TPM    10,574    10,952    11,278    10,855    10,664    12,267    11,104    11,414
T1 >3 TPM    7,201     7,616     7,892     7,654     7,647     8,864     7,876     7,821
T2 >3 TPM    7,202     7,596     7,972     7,663     7,673     8,800     7,876     7,915
T1 >5 TPM    5,799     6,077     6,281     6,403     6,311     7,276     6,283     6,245
T2 >5 TPM    5,841     6,077     6,447     6,417     6,377     7,196     6,304     6,305


Table 3-5. ERCC external RNA spike-in counts detected in single neurons

Cell                                      L11_1        L11_2         L11_3        L7_3         L7_5         R2_4          R2_5         R2_6
ERCCs detected                            52           49            56           66           61           50            54           49
% of ERCC counts                          0.56%        0.47%         0.71%        0.85%        2.32%        0.36%         0.77%        0.32%
RNA used for library (ng)                 11.43        11.97         11.16        12.00        12.35        11.84         12.00        11.88
Total number of RNA molecules detected    4.577×10⁹    1.019×10¹⁰    2.414×10⁹    8.523×10⁸    3.247×10⁸    1.150×10¹⁰    5.936×10⁹    4.121×10¹⁰


Table 3-6. Summary of single neuron differential gene expression analysis

Neurons compared    DE genes (no isoforms)    DE genes (including isoforms)

R2 vs L7 1077 1523

R2 vs LPl1 455 605

R2 vs L11 688 1051

R2 vs rMCC 1164 1672

R2 vs lMCC 912 1285

L7 vs LPl1 2084 3104

L7 vs L11 979 1440

L7 vs rMCC 2223 3310

L7 vs lMCC 2103 3086

LPl1 vs L11 2633 3971

LPl1 vs rMCC 138 190

LPl1 vs lMCC 165 223

L11 vs rMCC 2873 4330

L11 vs lMCC 2451 3698

rMCC vs lMCC 2 2


Table 3-7. Log2 fold change between R2 and L7 putative secretory molecule transcripts

DE Transcript    log2 fold change

Pedal peptide 2 42.72

Sp11 10.42

FVRF/SP4 9.67

Achatin 9.47

Theromacin 8.72

Egg-laying hormone (ELH) 8.53

SIS 7.88

Insulin 5/DIL 7.74

Clionin 7.49

Sp2 7.27

Copasetin 7.22

FIRFamide 2 7.11

CP2 6.97

Sp20 6.17

Seductin 5.13

Sp1 5.02

LUQIN 4.84

FMRFamide 4.50

Allatotropin 4.00

Sp9 2.94

Myomodulin-like 2.90


Tolloid-1 2.45

CCN-amide 2.14

Sp23 -1.59

Sp10 -2.91

Preprogonadotropin -5.53

R-15 -5.65

NdWFamide 1 -7.39


Table 3-8. The four DE transcripts shared between R2 and the other five cell types.

Gene ID    Annotation

101856185 Unannotated

101864107 Copasetin

101845589 Sp2

100533533 Egg-laying hormone


Table 3-9. Functionally enriched DE gene clusters between R2 and L7 neurons

Identifier    Value

Gene Group 1 Enrichment Score: 0.68

ENTREZ_GENE_ID Gene Name

100533230 small G-protein (Rap)

100533240 ras-like GTP-binding protein RHO

100533229 small G-protein (Ras)

100533305 Rac

Gene Group 2 Enrichment Score: 0.47

ENTREZ_GENE_ID Gene Name

101859008 neuropeptide R15

100533232 FMRF-amide

100533251 neuropeptide CP2 precursor

100533239 LUQIN

100533533 egg-laying hormone (ELH)

Gene Group 3 Enrichment Score: 0.25

ENTREZ_GENE_ID Gene Name

100533396 GluR1

106012547 acetylcholine receptor subunit α-type acr-16-like

100533314 GluR5

100533313 GluR2

101845525 acetylcholine receptor subunit β-1-like

Gene Group 4 Enrichment Score: 0.17

ENTREZ_GENE_ID Gene Name

100533453 putative neurotransmitter transporter

100533255 chemosensory receptor C

100533276 5-hydroxytryptamine receptor 2 (5-HT2)

100533308 adenylate cyclase


Table 3-10. Functionally enriched DE gene clusters between R2 and LPl1 neurons

Identifier    Value

Gene Group 1 Enrichment Score: 2.33

ENTREZ_GENE_ID Gene Name

101859008 neuropeptide R15

100533242 enterin precursor (ENPP)

100533251 neuropeptide CP2 precursor (CP2PP)

100533237 neuropeptide L11

100533533 egg-laying hormone (ELH)

100533236 R3-14 neuropeptide

100533301 PRQFVamide precursor

100533334 myomodulin

Table 3-11. Functionally enriched DE gene clusters between R2 and L11 neurons

Identifier    Value

Gene Group 1 Enrichment Score: 1.96

ENTREZ_GENE_ID Gene Name

100533232 FMRF-amide

100533251 neuropeptide CP2 precursor (CP2PP)

100533239 LUQIN

100533237 abdominal ganglion neuropeptide L11

100533402 MIP-related peptide precursor (MRP)

100533533 egg-laying hormone (ELH)


Table 3-12. Functionally enriched DE gene clusters between R2 and lMCC neurons

Identifier    Value

Gene Group 1 Enrichment Score: 2.76

ENTREZ_GENE_ID Gene Name

101859008 neuropeptide R15

100533334 myomodulin

100533242 enterin precursor (ENPP)

100533402 MIP-related peptide precursor (MRP)

100533239 LUQIN

100533237 neuropeptide L11

100533236 neuropeptide R3-14

100533533 egg-laying hormone (ELH)

100533301 PRQFVamide precursor

100533232 FMRF-amide

Gene Group 2 Enrichment Score: 0.35

100533395 GluR3

100533396 GluR4

100533314 GluR5

106012547 ACh receptor subunit α-type acr-16-like

100533249 nicotinic acetylcholine receptor α subunit

101846050 neuronal acetylcholine receptor subunit α-9-like

Table 3-13. Functionally enriched DE gene clusters between R2 and rMCC neurons

Identifier    Value

Gene Group 1 Enrichment Score: 1.73

ENTREZ_GENE_ID Gene Name

100533242 enterin precursor (ENPP)

101859008 neuropeptide R15

100533402 MIP-related peptide precursor (MRP)

100533239 LUQIN

100533237 neuropeptide L11

100533236 neuropeptide R3-14

100533533 egg-laying hormone (ELH)

100533301 PRQFVamide precursor

100533232 FMRF-amide

Gene Group 2 Enrichment Score: 0.93

100533312 GluR1

100533395 GluR3

100533396 GluR4

100533314 GluR5

100533315 GluR6

100533316 GluR7

100533249 nicotinic acetylcholine receptor α subunit

106012547 ACh receptor subunit α-type acr-16-like

Gene Group 3 Enrichment Score: 0.36

ENTREZ_GENE_ID Gene Name

100533385 aminopeptidase (APN)

100533308 Adenylate cyclase 1 (AC1)

100856796 Neurexin-1

100533404 Innexin-7

100533379 Innexin-8

100533488 cadherin-related protein (CAD1)

100533421 neural cell adhesion molecule-like (NCAM)

100533265 serotonin transporter (SERT)

101848683 vesicular glutamate transporter 5-like (VGLUT5)

100533276 5-hydroxytryptamine receptor 2 (5-HT2)

Table 3-14. Functionally enriched DE gene clusters between L7 vs rMCC

Identifier    Value

Gene Group 1 Enrichment Score: 1.14

ENTREZ_GENE_ID Gene Name

100533343 FUR protein (FUR)

100533397 furin-like prohormone convertase (afurin2)

100533362 prohormone convertase (PC2)

100533213 prohormone convertase 1 (PC1B)

Gene Group 2 Enrichment Score: 0.83

ENTREZ_GENE_ID Gene Name

100533249 nicotinic acetylcholine receptor α subunit

100533316 GluR7

100533245 non-α nicotinic acetylcholine receptor subunit

100533313 GluR2

100533244 NR2

100533312 GluR1

100533331 synaptobrevin

101857921 neuronal acetylcholine receptor subunit α-10-like

100533396 glutamate receptor 1

106012547 acetylcholine receptor subunit α-type acr-16-like

101846050 neuronal acetylcholine receptor subunit α-9-like

Gene Group 3 Enrichment Score: 0.75

ENTREZ_GENE_ID Gene Name

101859008 neuropeptide R15

100533232 FMRF-amide

100533239 LUQIN

100533402 MIP-related peptide precursor (MRP)

100533237 neuropeptide L11

100533236 R3-14 neuropeptide

100533334 myomodulin

100533301 PRQFVamide precursor

Gene Group 4 Enrichment Score: 0.59

ENTREZ_GENE_ID Gene Name

100533421 NCAM-related cell adhesion molecule

100533453 putative neurotransmitter transporter

100533255 chemosensory receptor C

100533331 synaptobrevin


100533385 aminopeptidase (APN)

100533404 pannexin 7

101848683 vesicular glutamate transporter 2-like (VGLUT2)

100533488 cadherin-related protein (cad1)

Table 3-15. Functionally enriched DE gene clusters between L7 vs L11 neurons

Identifier    Value

Gene Group 1 Enrichment Score: 0.88

ENTREZ_GENE_ID Gene Name

101859008 neuropeptide R15

100533232 FMRF-amide

100533239 LUQIN

100533237 neuropeptide L11

100533402 MIP-related peptide precursor (MRP)

Gene Group 2 Enrichment Score: 0.20

ENTREZ_GENE_ID Gene Name

100533396 GluR1

106012547 acetylcholine receptor subunit α-type acr-16-like

100533313 GluR2

101857921 neuronal acetylcholine receptor subunit α-10-like

Gene Group 3 Enrichment Score: 0.11

100533453 putative neurotransmitter transporter

100533276 5-hydroxytryptamine receptor 2 (5-HT2)

100533216 pannexin 10

101848683 vesicular glutamate transporter 2-like (VGLUT2)

Table 3-16. Functionally enriched DE gene clusters between L7 vs LPl1 neurons

Identifier    Value

Gene Group 1 Enrichment Score: 1.32

ENTREZ_GENE_ID Gene Name

100533321 NMDA-type glutamate receptor

100533249 nicotinic acetylcholine receptor α subunit

100533316 GluR7

100533314 GluR5

100533313 GluR2

100533244 NR2

100533312 GluR1

100533331 synaptobrevin

100533215 pannexin 9

101857921 neuronal acetylcholine receptor subunit α-10-like

100533396 glutamate receptor 1

106012547 acetylcholine receptor subunit α-type acr-16-like

101846050 neuronal acetylcholine receptor subunit α-9-like

Gene Group 2 Enrichment Score: 0.61

101859008 neuropeptide R15


100533232 FMRF-amide

100533242 enterin precursor (ENPP)

100533239 LUQIN

100533237 neuropeptide L11

100533236 R3-14 neuropeptide

100533301 PRQFVamide precursor

100533334 myomodulin


Table 3-17. Functionally enriched DE gene clusters between L7 vs lMCC

Identifier    Value

Gene Group 1 Enrichment Score: 0.86

ENTREZ_GENE_ID Gene Name

101859008 neuropeptide R15

100533232 FMRF-amide

100533239 LUQIN

100533402 MIP-related peptide precursor (MRP)

100533237 neuropeptide L11

100533236 R3-14 neuropeptide

100533334 myomodulin

100533301 PRQFVamide precursor

Gene Group 2 Enrichment Score: 0.84

ENTREZ_GENE_ID Gene Name

100533249 nicotinic acetylcholine receptor α subunit

100533316 GluR7

100533245 non-α nicotinic acetylcholine receptor subunit

100533313 GluR2

100533312 GluR1

101857921 neuronal acetylcholine receptor subunit α-10-like

101846927 acetylcholine receptor subunit α-like 2-like

100533396 glutamate receptor 1

106012547 acetylcholine receptor subunit α-type acr-16-like

101846050 neuronal acetylcholine receptor subunit α-9-like

Gene Group 3 Enrichment Score: 0.34

ENTREZ_GENE_ID Gene Name

100533421 NCAM-related cell adhesion molecule

100533255 chemosensory receptor C

100533265 serotonin transporter

100533385 aminopeptidase (APN)

101848683 vesicular glutamate transporter 2-like (VGLUT2)

Table 3-18. Functionally enriched DE gene clusters between L11 vs LPl1

Identifier    Value

Gene Group 1 Enrichment Score: 1.14

ENTREZ_GENE_ID Gene Name

100533343 FUR protein (FUR)

100533397 furin-like prohormone convertase (afurin2)

100533362 prohormone convertase (PC2)

100533213 prohormone convertase 1 (PC1B)

Gene Group 2 Enrichment Score: 0.86

100533242 enterin precursor (ENPP)

100533334 myomodulin

101859008 neuropeptide R15

100533402 MIP-related peptide precursor (MRP)

100533239 LUQIN

100533237 neuropeptide L11

100533443 buccalin precursor

100533236 R3-14 neuropeptide

100533301 PRQFVamide precursor

100533232 FMRF-amide

Gene Group 3 Enrichment Score: 0.54

ENTREZ_GENE_ID Gene Name

100533321 NMDA-type glutamate receptor

100533316 GluR7

100533314 GluR5

100533244 NR2

100533312 GluR1

100533331 synaptobrevin

100533215 pannexin 9

101857921 neuronal acetylcholine receptor subunit α-10-like

100533395 glutamate receptor 2

Gene Group 4 Enrichment Score: 0.35

ENTREZ_GENE_ID Gene Name

100533344 presenilin 1-1

101848683 vesicular glutamate transporter 2-like (VGLUT2)

100533404 pannexin 7

100533379 pannexin 8

100533216 pannexin 10

100533331 synaptobrevin

100533215 pannexin 9

100533421 NCAM-related cell adhesion molecule

100533488 cadherin-related protein (cad1)

100533255 chemosensory receptor C

100856796 neurexin

100533208 VAMP/synaptobrevin binding protein

100533276 5-hydroxytryptamine receptor 2 (5-HT2)

Table 3-19. Functionally enriched DE gene clusters between L11 vs lMCC

Identifier Value

Gene Group 1 Enrichment Score: 0.74

ENTREZ_GENE_ID Gene Name

100533443 buccalin precursor

101859008 neuropeptide R15

100533242 enterin precursor (ENPP)

100533251 neuropeptide CP2 precursor (CP2PP)

100533402 MIP-related peptide precursor (MRP)

100533237 neuropeptide L11

100533236 R3-14 neuropeptide

100533301 PRQFVamide precursor

Gene Group 2 Enrichment Score: 0.31

ENTREZ_GENE_ID Gene Name

100533453 putative neurotransmitter transporter

100533404 pannexin 7

100533265 serotonin transporter

100533379 pannexin 8

100533216 pannexin 10

100533421 NCAM-related cell adhesion molecule

100533488 cadherin-related protein (cad1)

100856796 neurexin

100533255 chemosensory receptor C

100533208 VAMP/synaptobrevin binding protein

100533276 5-hydroxytryptamine receptor 2 (5-HT2)

Gene Group 3 Enrichment Score: 0.14

ENTREZ_GENE_ID Gene Name

101846927 acetylcholine receptor subunit α-like 2-like

100533316 GluR7

100533249 nicotinic acetylcholine receptor α subunit

100533312 GluR1

101857921 neuronal acetylcholine receptor subunit α-10-like

Table 3-20. Functionally enriched DE gene clusters between L11 vs rMCC

Identifier Value

Gene Group 1 Enrichment Score: 0.73

ENTREZ_GENE_ID Gene Name

100533251 neuropeptide CP2 precursor (CP2PP)

100533242 enterin precursor (ENPP)

101859008 neuropeptide R15

100533402 MIP-related peptide precursor (MRP)

100533237 neuropeptide L11

100533236 R3-14 neuropeptide

100533443 buccalin precursor

100533301 PRQFVamide precursor

100533232 FMRF-amide

Gene Group 2 Enrichment Score: 0.35

ENTREZ_GENE_ID Gene Name

100533344 presenilin 1-1

100533385 aminopeptidase (APN)

100533404 pannexin 7

100533265 serotonin transporter

100533379 pannexin 8

100533216 pannexin 10

100533215 pannexin 9

100533421 NCAM-related cell adhesion molecule

100533488 cadherin-related protein (cad1)

100533255 chemosensory receptor C

100856796 neurexin

100533208 VAMP/synaptobrevin binding protein

100533276 5-hydroxytryptamine receptor 2 (5-HT2)

Gene Group 3 Enrichment Score: 0.25

ENTREZ_GENE_ID Gene Name

100533321 NMDA-type glutamate receptor

101846927 acetylcholine receptor subunit α-like 2-like

100533316 GluR7

100533249 nicotinic acetylcholine receptor α subunit

100533315 GluR6

100533215 pannexin 9

101857921 neuronal acetylcholine receptor subunit α-10-like

100533312 GluR1

Table 3-21. Functionally enriched DE gene clusters between LPl1 vs lMCC

Identifier Value

Gene Group 1 Enrichment Score: 1.41

ENTREZ_GENE_ID Gene Name

100533232 FMRF-amide

100533239 LUQIN

100533402 MIP-related peptide precursor (MRP)

100533334 myomodulin

Table 3-22. Functionally enriched DE gene clusters between LPl1 vs rMCC

Identifier Value

Gene Group 1 Enrichment Score: 1.83

ENTREZ_GENE_ID Gene Name

101859008 neuropeptide R15

100533232 FMRF-amide

100533239 LUQIN

100533402 MIP-related peptide precursor (MRP)

100533334 myomodulin

Table 3-23. Genes varying between two groups of R2 neurons (R2 group 1 vs R2 group 2)

log2 fold change Number of DE genes

0 5200

1 5141

1.5 4711

2 3979

3 2842

4 1839

5 1040

6 480

7 185

10 26

12 22

15 21

Figure 3-1. Saturation curves from four neurons that received ERCC spike-ins. The data points represent the number of genes detected (>1 TPM) when subsampling the transcriptomes at 10% intervals of the total number of reads. The lines are Loess (local regression) curves fit to the data points for each neuron. The dotted intersecting lines approximate the point where 90% of the total number of genes in a transcriptome are detected (90% saturation). This occurs at about two million reads for neurons L11_1 and L11_2 and at about three million reads for neurons L7_5 and R2_5. The red, brown, blue, and green dots and curves represent the L11_1, L11_2, L7_5, and R2_5 neurons, respectively. R2_5 expressed the most transcripts (over 11,000).
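The 90% saturation point described in this caption can be approximated numerically. The following is a minimal sketch, not the procedure used to draw the figure: it substitutes linear interpolation between subsampling points for the Loess fit, and the function name and the example values are hypothetical.

```python
def reads_at_saturation(read_depths, genes_detected, fraction=0.9):
    """Estimate the read depth at which `fraction` of the final gene count is detected.

    Assumes both lists are ordered by increasing depth and that the last
    entry corresponds to the full (100%) library.
    """
    target = fraction * genes_detected[-1]
    points = list(zip(read_depths, genes_detected))
    for (d0, g0), (d1, g1) in zip(points, points[1:]):
        # Find the first interval that brackets the target and interpolate linearly.
        if g0 <= target <= g1 and g1 > g0:
            return d0 + (target - g0) * (d1 - d0) / (g1 - g0)
    return None  # target already exceeded at the shallowest subsample


# Hypothetical subsampling results (reads, genes detected at >1 TPM)
depths = [1e6, 2e6, 3e6, 4e6, 5e6]
genes = [6000, 9000, 9600, 9900, 10000]
print(reads_at_saturation(depths, genes))
```

With these illustrative numbers the target (90% of 10,000 genes) falls exactly on the second subsampling point.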

Figure 3-2. Stacked bar chart of gene expression in single neurons. Colors and legend indicate levels of abundance. The chart compares the number of genes in several nonoverlapping categories of TPM expression (1-5, 5-10, 10-50, 50-100, 100-500, 500-1000, and 1000+) between individual neurons. Most of the differences are due to the lowly expressed genes (1-5 TPM).
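The binning behind such a chart can be sketched as follows. This is an illustrative example only: the treatment of boundary values (each bin includes its lower edge and excludes its upper edge) is an assumption, as are the function name and the sample TPM values.

```python
from bisect import bisect_right

# Lower edges of the nonoverlapping TPM categories named in the caption.
BIN_EDGES = [1, 5, 10, 50, 100, 500, 1000]
BIN_LABELS = ["1-5", "5-10", "10-50", "50-100", "100-500", "500-1000", "1000+"]


def count_genes_per_bin(tpm_values):
    """Count genes per TPM category; genes below 1 TPM are ignored."""
    counts = {label: 0 for label in BIN_LABELS}
    for tpm in tpm_values:
        if tpm < BIN_EDGES[0]:
            continue  # treated as not detected
        # Index of the last edge that is <= tpm (assumed lower-inclusive bins).
        idx = bisect_right(BIN_EDGES, tpm) - 1
        counts[BIN_LABELS[idx]] += 1
    return counts


# Hypothetical TPM values for one neuron
print(count_genes_per_bin([0.4, 1.2, 4.9, 5.0, 75, 1200, 3000]))
```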

Figure 3-3. Pairwise correlation of external spike-in abundances between sequencing lanes, indicating no obvious issues attributable to lane effects.

Figure 3-4. Dose-response curve of external RNA spike-ins, displaying a strong linear relationship between log2 of TPM-normalized expression values and log2 of the absolute number of transcript molecules. The high correlation coefficient indicates good agreement between technical replicates of the same biological library.
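The log2-log2 relationship described in this caption can be quantified with an ordinary least-squares fit and a Pearson correlation coefficient. The sketch below is illustrative and is not the analysis code used for the figure; the function name and the spike-in values are hypothetical.

```python
import math


def log2_linear_fit(molecules, tpm):
    """Fit log2(TPM) = slope * log2(molecules) + intercept and return Pearson's r."""
    x = [math.log2(m) for m in molecules]
    y = [math.log2(t) for t in tpm]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical spike-ins whose TPM is exactly proportional to input molecules
slope, intercept, r = log2_linear_fit([10, 100, 1000, 10000], [2, 20, 200, 2000])
print(slope, intercept, r)
```

A slope near 1 together with r near 1 would indicate that measured abundance scales proportionally with the number of input molecules.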

Figure 3-5. Venn diagram comparison of the genes DE relative to R2 that the other five neuron classes (L7, LPl1, L11, rMCC, lMCC) have in common. The fewest differentially expressed genes not shared with other neurons are found between LPl1 and R2 (65). The greatest number, 740 transcripts, is DE between R2 and L7. The lists of genes corresponding to each area of the diagram can be found in Object 3-2, Supplementary Table 3-S16.

Figure 3-6. Principal component analysis (PCA) of single neurons. The PCA separates each type of neuron. The first principal component (PC1, x-axis) represents 43% of the observed variance in the data set. The second principal component (PC2, y-axis) represents 16% of the observed variance in the data set. See text for more details.

Figure 3-7. Diagram indicating numbers of differentially expressed genes (green = unique transcripts, red = counting transcript isoforms) between identified deeply sequenced single neurons. See also Object 3-3. Supplemental Tables 3-S1-S15 for the lists of these genes.

CHAPTER 4 DISCOVERY OF NOVEL MEMORY-RELATED GENES IN A CAMP-TREATED MOLLUSCAN NERVOUS SYSTEM

Introduction

Remarkable progress has been made during the last century in our efforts to understand the molecular and cellular biology of memory storage. In 1894, Santiago Ramón y Cajal published his cellular connectionist approach to the nervous system, a theory positing that changes in the strength of connections between nerve cells could mediate mental processes such as experience-driven learning and memory157,158. More than 50 years later, Jerzy Konorski coined the term “synaptic plasticity” to describe the neural modifications inherent to Cajal’s theory159, and the idea was subsequently integrated by the neuropsychologist Donald Hebb into his own detailed models of learning160.

The exploration into the cellular mechanisms and genes underlying the processes of synaptic plasticity and ultimately memory required a model system in which one can study the effects of changes in the neuronal components of that system before and after (or during) the experimental manipulation which induces memory161. In the decades following Hebb’s postulations, several important breakthroughs occurred in the field due to the reductionist approach employed by many researchers investigating a variety of “simple” forms of learning using insightful model systems. These include the eye-blink response and flexion reflex of the rabbit and cat, respectively162,163, as well as a collection of invertebrate models. Among them were the gill-withdrawal reflex of Aplysia californica164, the tail-flip escape reflex of the crayfish165, olfactory discrimination learning in the fly Drosophila melanogaster166, responses to visual and rotational stimuli in the nudibranch Hermissenda crassicornis167, food-aversion learning in the garden slug Limax maximus168, and color-sugar water association in the honeybee Apis mellifera169. These studies were intended to identify the neural circuitry modified by learning and memory storage and elucidate the cellular mechanisms underlying these changes170.

The experimental tractability of these model systems enabled the first systematic investigations into the neural mechanisms of learning and memory. A major advantage of studying invertebrate nervous systems (particularly those of gastropod mollusks) is that circuit analyses can be conducted using repeatedly identifiable individual neurons with known electrical characteristics and stereotyped synaptic connections and that these circuits can be linked with observable behaviors171. A wide variety of learning paradigms have been developed in a number of different gastropod species to scrutinize both associative and non-associative forms of learning. These studies provide a basis for cross-species comparisons regarding the fundamental neural mechanisms driving memory formation derived from different types of learning (e.g., associative versus non-associative, classical versus operant conditioning)172. Gastropod mollusks have facilitated the comprehension of mechanisms of synaptic plasticity as well as other cellular (non-synaptic) mechanisms of memory formation and consolidation, such as the enhancement of cellular excitability and the epigenetic modification of DNA. It is evident that insights gleaned from research on gastropod mollusks have been and will continue to be of importance in the ongoing endeavor to decipher the neural mechanisms of memory formation.

Studies on the Aplysia gill-withdrawal reflex began with delineating the cellular biology underlying two forms of non-associative learning exhibited by the reflex: habituation and sensitization. Habituation, described behaviorally by the gradual decrease in withdrawal response to an innocuous touch, was found to be mediated by a decrease in transmitter release from the presynaptic sensory neurons onto the postsynaptic motor neurons of the circuit. Conversely, studies of sensitization, the observed increase in defensive withdrawal upon presentation of a noxious stimulus, revealed an increase in the release of the neurotransmitter glutamate by the presynaptic neurons. This was found to be caused by activation of modulatory interneurons that released serotonin (5-HT), which bound to cellular receptors present on the presynaptic sensory neurons. 5-HT binding caused levels of cyclic adenosine monophosphate (cAMP) to rise within the presynaptic sensory neurons52.

Discovery of the molecule cAMP was first reported by Earl Sutherland in 1958173 during his research into the mechanisms of hormone action174. cAMP was described as a “second” messenger that acted intracellularly in response to hormone binding (the “first” messenger) to receptors in the cellular membrane. Brunelli et al.52 also directly demonstrated that cAMP acted as the second messenger of 5-HT in the gill-withdrawal reflex by injecting cAMP into sensory neurons. As predicted, this led to an increase in transmitter release from the sensory neurons. It has been established that short-term memory (lasting only minutes to hours) involves the covalent modification of existing neural proteins, while long-lasting forms of memory (lasting days or longer) require gene transcription and new protein translation15.

The biochemical reactions resulting from learning induce a complex network of cellular changes, including modifications of existing cellular proteins and processes (associated with short-term memory) and new transcription, translation, synaptic growth, and plasticity (hallmarks of long-term memory, or LTM). It is essential to further characterize the genes involved in the induction and consolidation of LTM to better illuminate the mechanisms of memory formation.

The cAMP signaling pathway is one of the most fundamentally important and conserved signal transduction networks175,176. Pathway regulation is typically achieved by an extracellular ligand binding to its cognate membrane-bound G-protein coupled receptor (GPCR). The heterotrimeric G-protein complex dissociates, releasing a subunit that can activate (or inhibit, depending on the type of GPCR) a large family of proteins known as adenylyl cyclases (AC). ACs are enzymes that catalyze the conversion of adenosine triphosphate (ATP) into cAMP. In the present study, we treated the central nervous system of the mollusk Aplysia californica with 8-bromo cAMP (8-Br cAMP), a membrane permeable derivative of the cellular cyclic nucleotide second messenger used in physiological signaling, in lieu of performing behavioral learning paradigms on intact animals.

We treated the central ring ganglia of animals over the course of two hours, collecting whole ganglia at three time points (30 minutes, 60 minutes, and 120 minutes post-treatment) to investigate the complement of genes affected by activation of the cyclic adenosine monophosphate signaling pathway. These time points were selected because they correspond with the early stages of transcriptional activation that underlies the synaptic plasticity accompanying the formation of long-term memory (LTM). Therefore, we wanted to examine genes known to mediate long-term behavioral sensitization. The effects of 8-Br cAMP treatment were measured in triplicate for the abdominal, cerebral, and buccal ganglia using high-throughput next-generation sequencing (RNA-seq). We also quantified the effects of 8-Br cAMP treatment on left and right pleural ganglia, though we had fewer than three biological replicates (see Results). One reason we were interested in performing quantitative RNA-seq on the pleural ganglia is that they are the location of the cell bodies (somata) of the ventral-caudal (VC) nociceptor neurons33,177, a cluster of sensory neurons that innervate the tail and are thought to contribute to the neural plasticity accompanying behavioral sensitization of the tail-elicited siphon withdrawal reflex (T-SWR)26. In addition to this cluster of about 200 neurons33, the pleural ganglia also contain other neural somata, including those of inhibitory interneurons which modulate the VCs178,179, motor neurons directing mucus secretion43 and discharge of the opaline gland180, as well as several other yet uncharacterized cell types and clusters181,182.

Many studies investigating the formation and consolidation of short-term and long-term memory have used experimental paradigms that globally block either transcription or translation or both183. For example, the antibiotic anisomycin blocks general eukaryotic protein synthesis. Because these treatments produce such drastic effects on cells, it was sometimes unclear whether the effects on behavior were due to the blocking drugs themselves. The literature on the cellular locations and properties of Aplysia neurons involved in the siphon withdrawal reflex describes approximately 24 LE sensory neurons that innervate the siphon skin and 6-7 LFs motor neurons that produce contractions in the gill.

RNA-seq data analyses revealed temporal expression patterns of known cAMP signaling-related genes as well as the expression dynamics of many transcripts not previously associated with canonical cAMP signaling pathways, including several uncharacterized transcripts. Our goal was to determine which genetic components of signaling cascades were activated in A. californica nervous systems upon treatment with 8-Br cAMP. This study advances our knowledge of the genetic repertoire of the components of a vital signaling cascade and of an important model for the cellular biology of memory storage.

Results of RNA-seq Reveal Canonical and Novel Gene Expression

To confirm the activation of the cAMP-dependent signaling pathway within the Aplysia ganglia, we examined the literature for specific transcripts previously observed to be affected by either long-term sensitization behavioral training paradigms or by experiments designed to mimic this learning in Aplysia.

RNA-seq of Established cAMP-Dependent Genes

These genes include the transcription factors CREB1, CREB2, C/EBP, and EGR as well as other genes encoding Tolloid/Bone morphogenetic protein 1 (aka BMP-1, TBL-1, or TLD-1), calmodulin, Ap-Uch, and a reductase-related protein182. CREB1 is constitutively expressed in neurons rather than being induced by neuronal excitation or activity184. In A. californica sensory neurons, the cellular levels of CREB1 mRNA, as measured by reverse-transcriptase polymerase chain reaction (RT-PCR), were reported to be unaffected by 5-HT exposure both in vivo and in vitro185. However, it was also reported that CREB1 mRNA levels measured in the pleural ganglion as a whole increased 1 hour after both sensitization training and in vivo 5-HT treatment186. Surprisingly, we did not detect any reads mapping to the CREB1 gene in any FSW-incubated (control) ganglia, and only minimal expression levels were observed in 8-Br cAMP-treated ganglia (see Table M3).

Object 4-1. Primary Data Master Tables. Table M3. (Master Data Table 3) Individual Ganglia treated with 8-Br cAMP with Annotated TPMs. This table contains the TPM expression values and annotation data of all genes from the 118 ganglia on which we performed RNA-seq after plasticity testing (incubation in 8-Br cAMP or FSW for 0.5, 1, or 2 hours, or instant lysis; see also Chapter 2, Methods).

Bartsch and colleagues initially cloned a transcription factor that represses the transcriptional effects of CREB1 in A. californica, designating it ApCREB2185. They examined sensory neurons isolated from the pleural ganglion (VC neurons) using in vitro cell culture and found ApCREB2 mRNA expression was unchanged in sensory neurons upon exposure to 5-HT or other chemicals that increase cellular cAMP levels (including forskolin, 8-Br cAMP, and others)185. Subsequent studies found no significant changes in ApCREB2 mRNA levels after treating the pleural ganglia with brief pulses of 5-HT, although both observed a non-statistically significant trend of increased ApCREB2 mRNA levels in pleural ganglia that were collected and lysed immediately after 5-HT treatment and also 1 hour post-treatment187,188. In contrast, we found 8-Br cAMP treatment increased the abundance of ApCREB2 mRNA in the left and right pleural ganglia in a time-dependent manner, as seen in Figure 4-1. Here, effect sizes are given as log2 fold change (LFC) ± standard error (SE), together with the Benjamini-Hochberg adjusted p-value, or false discovery rate (p-adj or FDR). We found statistically significant differences in ApCREB2 levels in both 1-hour and 2-hour 8-Br cAMP-treated right pleural ganglia versus controls (1-hour: LFC = 1.26 ± 0.38 SE, p-adj = 0.013; 2-hour: LFC = 1.58 ± 0.41 SE, p-adj < 0.003), when comparing the 2-hour treated ganglia to the 1-hour treatment group (LFC = 1.18 ± 0.27 SE, p-adj = 0.043), and when comparing the 2-hour treated ganglia with the 30-minute incubation sample (LFC = 2.16 ± 0.36 SE, p-adj < 0.001). In the left pleural ganglia, the increase in ApCREB2 mRNA expression between 2-hour 8-Br cAMP-treated ganglia and 0.5-hour incubated ganglia approached but did not reach significance (LFC = 0.98 ± 0.29 SE, p-adj = 0.145). Considering the original results of Bartsch et al.185, that ApCREB2 mRNA levels were unchanged in sensory neurons exposed to 8-Br cAMP, we postulate our observed increased levels of ApCREB2 mRNA might be attributable to the variety of other cells present in the pleural ganglia (see Introduction for brief description).
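The Benjamini-Hochberg adjustment behind the reported p-adj values can be illustrated with a short sketch. This is a generic implementation of the procedure written for illustration, not the code of the differential expression software used in this study; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-values decrease (monotonicity step).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes
raw = [0.01, 0.04, 0.03, 0.005]
padj = benjamini_hochberg(raw)
significant = [p < 0.1 for p in padj]  # FDR < 0.1, the threshold used in this chapter
print(padj, significant)
```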

We observed dynamic expression of ApCREB2 mRNA levels in the abdominal ganglia (AG) after 8-Br cAMP treatment (Figure 4-1). We report a statistically significant upregulation (LFC = 0.76 ± 0.31 SE, p-adj = 0.090) between AG exposed to treatment for 30 minutes versus controls. None of the other comparisons we performed reached the significance threshold, although the contrast between 2-hour and 0.5-hour treated ganglia was close (LFC = 0.77 ± 0.23 SE, p-adj = 0.104). Overall, ApCREB2 mRNA expression was elevated in AG by exposure to 8-Br cAMP (see Figure 4-1).

ApC/EBP (CCAAT enhancer-binding protein) is a transcription factor activated downstream of CREB1 that binds to an eight-nucleotide DNA motif. This binding was shown to be required for induction of long-term facilitation (LTF), but not short-term facilitation, in Aplysia189. LTF is involved in long-term behavioral sensitization in the sensorimotor synaptic connections in the abdominal ganglia190. Figure 4-2 displays boxplots summarizing the normalized expression levels (TPMs) of ApC/EBP mRNA in abdominal (upper panel) and left and right pleural ganglia (lower panel). Upon exposure of the ganglia to 8-Br cAMP, we initially (after 30 minutes of treatment) observed a decrease in ApC/EBP mRNA in the abdominal (LFC = -0.74 ± 1.37) and right pleural ganglia (LFC = -2.43 ± 2.0). After 1 hour this trend was reversed, and we saw an increase in ApC/EBP mRNA levels (AG: LFC = 3.48 ± 1.65; RPlG: LFC = 0.73 ± 1.52). However, neither these increases nor the initial decreases in abundance reached statistical significance.

We found high levels of a tolloid/BMP-1-like (apTBL-1) mRNA in the right and left pleural ganglia we examined (Figure 4-3), but the differences between treated and control ganglia did not qualify as differentially expressed. We did detect a significant downregulation of TBL-1 in 0.5-hr treated abdominal ganglia (LFC = -0.92 ± 0.37, p-adj < 0.080). The abundance of this TBL-1 mRNA was highest in pleural ganglia exposed to 8-Br cAMP for at least 1 hour.

A tolloid/BMP-1-like protein (apTBL-1) was identified in pedal-pleural ganglia by Liu and colleagues after behavioral sensitization training and 5-HT treatment191. Tolloid/Bone morphogenetic protein 1 (BMP-1) mRNA levels were increased significantly in sensory neurons from isolated pleural-pedal ganglia by serotonin treatments for 1.5 hours as well as by long-term sensitization training of Aplysia191. The study’s authors hypothesized that TBL-1 might modulate the morphology of, and neurotransmitter release at, the sensorimotor synaptic connections associated with long-term sensitization.

Multiple studies of synaptic plasticity and memory processes have investigated the role of immediate early genes (IEGs) that function as inducible transcriptional regulators192,193. In particular, the Early Growth Response (EGR) gene family is known to play a critical role in early-phase long-term potentiation (LTP) in the CA1 region of the mouse hippocampus194 and is also involved in modulation of the regulatory signaling pathways in songbird memory195. ApEGR (RefSeq ID: NM_001281796.1) was previously demonstrated to be upregulated in left and right pleural ganglia by long-term sensitization training196. In the abdominal ganglia, we observed significant downregulation 30 minutes after 8-Br cAMP exposure (LFC = -2.26 ± 0.48, p-adj < 0.001), followed by robust, statistically significant upregulation at the 1-hour (LFC = 2.37 ± 0.47, p-adj < 0.002) and 2-hour time points (LFC = 3.19 ± 0.43, p-adj < 0.001) compared to the 0.5-hour treatment (see Figure 4-4 upper panel). In the right pleural ganglia, we found a decrease in ApEGR expression 30 minutes into 8-Br cAMP exposure, although it was not significant (LFC = -1.79 ± 0.74, p-adj = 0.153). After 1 hour this trend had reversed, and we saw a significant increase in expression levels in 1-hour versus 0.5-hour (LFC = 3.31 ± 0.66, p-adj < 0.009) and 2-hour versus 0.5-hour ganglia (LFC = 3.14 ± 0.56, p-adj < 0.001) in the right pleural ganglia (see Figure 4-4 lower panel).
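Because the effect sizes in this chapter are reported on the log2 scale, it can help to translate them into linear fold changes. The helper below is an illustrative sketch (the function name is hypothetical); it applies only the definition fold change = 2^LFC to LFC values quoted in the text above.

```python
def fold_change(lfc):
    """Convert a log2 fold change into a linear fold change (treated / reference)."""
    return 2.0 ** lfc


# LFC values quoted above for ApEGR in the abdominal ganglia
for label, lfc in [("0.5-hr vs control", -2.26), ("2-hr vs 0.5-hr", 3.19)]:
    fc = fold_change(lfc)
    if fc >= 1:
        print(f"{label}: LFC {lfc:+.2f} is a {fc:.1f}-fold increase")
    else:
        print(f"{label}: LFC {lfc:+.2f} is a {1 / fc:.1f}-fold decrease")
```

On this scale, an LFC of 3.19 corresponds to roughly a nine-fold increase, and an LFC of -2.26 to a decrease of nearly five-fold.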

Calmodulin (calcium-modulated protein) (RefSeq ID: NM_001204580.1) is a Ca2+-binding protein that is expressed by all eukaryotic cells. It regulates a wide variety of enzymes and mediates many physiological cellular processes197. Calmodulin mRNA was previously found to be upregulated in Aplysia pedal-pleural ganglia in response to 5-HT exposure198. We detected calmodulin as DE in right pleural ganglia treated with 8-Br cAMP for 2 hours compared to controls (LFC = 1.15 ± 0.44, p-adj = 0.058) (see Figure 4-5 lower panel).

Zwartjes and colleagues198 examined the effects of 5-HT on the mRNA levels of transcripts of interest in the pedal-pleural ganglia using in vitro translation. They identified a novel gene product (designated protein-3, RefSeq ID: NM_001204605.1) which was upregulated in pleural sensory neurons by 5-HT treatment as well as by sensitization training, and speculated that it plays a role in the generation of long-term sensitization. They also found protein-3 to display sequence homology to C. elegans and Arabidopsis thaliana dihydroflavonol 4-reductase domain-containing proteins. Herdegen et al.182 also examined the mRNA expression of this “reductase-related” transcript from pleural ganglia of sensitized Aplysia; however, in contrast to Zwartjes et al., they did not find evidence that it was upregulated. We examined expression levels of the reductase-related mRNA in abdominal and pleural ganglia and found it to be significantly upregulated at each time point we examined (see Figure 4-6) (AG 0.5-hr: LFC = 1.57 ± 0.31, p-adj < 0.001; AG 1-hr: LFC = 1.15 ± 0.30, p-adj < 0.004; AG 2-hr: LFC = 1.06 ± 0.30, p-adj = 0.010; RPlG 0.5-hr: LFC = 1.40 ± 0.47, p-adj = 0.072; RPlG 1-hr: LFC = 1.45 ± 0.35, p-adj < 0.001; RPlG 2-hr: LFC = 2.24 ± 0.45, p-adj < 0.001).

Ap-Uch (ubiquitin C-terminal hydrolase) (RefSeq ID: XM_013080847.1) is upregulated by 5-HT and required for long-term synaptic facilitation199. It enables protein degradation via the ubiquitin-proteasome pathway and was demonstrated specifically to degrade the regulatory subunit of protein kinase A (PKA), freeing the catalytic subunits to phosphorylate protein targets200-202. Mohamed and colleagues quantified levels of Ap-Uch mRNA in pleural ganglia using qRT-PCR following repeated pulses of 5-HT. They initially found a decrease in expression at the 1-hour and 2-hour time points (although not statistically significant), followed by a significant increase to over 150% of vehicle-treated controls after 5 hours187. We observed significant decreases in Ap-Uch at 0.5-hour and 2-hour exposure in AG and at 0.5 hour in RPlG, with the 1-hr RPlG condition nearly reaching significance (p-adj < 0.109) (see Figure 4-7).

Table 4-1 shows the comparisons between 8-Br cAMP-treated and FSW-incubated abdominal ganglia and right pleural ganglia in which these genes were detected as significantly differentially expressed.

Novel Differentially Expressed Genes in Response to 8-Br cAMP Treatment

We identified several genes that were not previously described in the literature to be regulated by LTS training, 5-HT treatment, or 8-Br cAMP exposure. Some of the most abundant differentially expressed transcripts were predicted to encode neuropeptide precursors. Neuropeptides are known to be involved in basic physiological processes such as vascular function (CCAP203, SCP204,205), reproduction (ELH204, seductin146,206), and feeding (buccalin134, insulin207, SCP208, and neuropeptide Y209). Multiple neuropeptide precursor transcripts were detected as differentially expressed across 8-Br cAMP treatment time points in both abdominal and right pleural ganglia (see Supplemental Tables 4-S1-S6).

Object 4-2. Tables of Differentially Expressed Genes in Individual Ganglia of Aplysia following 8-Br cAMP treatments (for 0.5, 1, and 2 hours). Supplemental Tables 4-S1-S6.

However, determining the roles these precursor transcripts play within individual cells in response to 8-Br cAMP treatment will require further assessment, as neuropeptide functions can vary spatially and temporally even within a nervous system151,203. Despite our limited ability to accurately speculate about the precise function of each differentially expressed peptide, the general changes in the expression patterns of these secreted signaling molecules suggest that the 8-Br cAMP treatment alters the complement of secretory molecules available for signaling between cells. Perhaps these specific secretory molecules are involved in establishing or maintaining new neural connections or supporting synaptogenesis210.

Functional Enrichment Analysis of AG DE Genes

For all the differentially expressed genes detected using DESeq2 (FDR < 0.1), we performed a gene ontology (GO) enrichment analysis using the topGO package (version 2.36) in R211. We tested for enriched GO terms from each of the three functional categories (molecular function, biological process, and cellular component) using two types of test statistics: the “classic” Fisher’s exact test and a Kolmogorov-Smirnov-like test performed with both “classic” and “elim” algorithms. We also examined how the significant GO terms were distributed within the GO hierarchies. Figure 4-8 shows the induced subgraph of the 10 most significant molecular function GO terms as identified by the “classic” algorithm using Fisher’s exact test.
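The idea behind a Fisher's exact test for GO term enrichment can be sketched as a one-sided hypergeometric tail probability. The example below is written in Python for illustration only and does not reproduce the topGO implementation used for the analysis; the function name, argument names, and counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(de_with_term, de_total, bg_with_term, bg_total):
    """One-sided p-value: probability of seeing at least `de_with_term` annotated
    genes when `de_total` genes are drawn at random from a background of
    `bg_total` genes, `bg_with_term` of which carry the GO term."""
    max_k = min(de_total, bg_with_term)
    denominator = comb(bg_total, de_total)
    tail = sum(
        comb(bg_with_term, k) * comb(bg_total - bg_with_term, de_total - k)
        for k in range(de_with_term, max_k + 1)
    )
    return tail / denominator


# Hypothetical counts: 12 of 794 DE genes carry a term annotated to 40 of 15000 genes
print(fisher_enrichment_p(12, 794, 40, 15000))
```

A small p-value indicates that the term is over-represented among the DE genes relative to the background. In a real analysis, such p-values would be computed for every tested term.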

AG Molecular Function GO

A summary of the ten most significant molecular function GO terms from the 30- minute treatment of 8-Br cAMP abdominal ganglia can be viewed in Table 4-2. From this, we see adenylate cyclase activity, which is known to result from increased cellular levels of cAMP. We also see an enrichment of transcripts exhibiting thiol-dependent ubiquitin-specific protease activity (GO:0004843) (see previous Ap-Uch discussion). We also find “ferroxidase acti ity” (GO:0004322), which is the oxidation/reduction (redox) reaction of ferric iron (Fe3+) to the ferrous state (Fe2+) or vice-versa from ferrous to ferric forms212. Iron is an important cellular metabolite, catalyzing enzymatic redox reactions and strongly linked to synaptic activity and plasticity213. Another transcript annotated with “ferroxidase acti ity” is soma ferritin, an iron binding and sequestration protein214.

This transcript is highly abundant and mRNA levels are significantly upregulated in abdominal and right pleural ganglia both 0.5-hr and 1-hr after 8-Br cAMP exposure.

Ferroxidase-knockout mice displayed increased brain iron concentrations along with learning and memory deficiencies215. Human neurodegenerative diseases such as Parkinson’s and Alzheimer’s diseases may be partly attributable to increased iron accumulation within the brain216 and increased generation of reactive oxygen species (ROS)217,218. Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) is another metabolic enzyme that has recently come under scrutiny for its association with Alzheimer’s disease proteins, including amyloid-beta protein precursor219. GAPDH is best known for its role in glycolysis220, but its function is reported to be impaired by oxidative modification in Alzheimer’s disease221. We found a GAPDH-like transcript to be significantly upregulated at the 0.5-hr and 2-hr time points in the abdominal ganglia, as well as at the 0.5-, 1-, and 2-hr time points in the right pleural ganglia (see below).

After 1 hour of 8-Br cAMP treatment (see Table 4-3), transcripts involved in oxidoreductase (GO:0016491) and ferroxidase activity (GO:0004322) are still enriched, along with “ferric iron binding” (GO:0008199). Additionally, we find the terms “calcium-dependent phospholipid binding” (GO:0005544), “SNARE binding”, and “structural constituent of cytoskeleton” (GO:0005200). After 2 hours of 8-Br cAMP treatment in the abdominal ganglia (see Table 4-4), we find GTP binding (GO:0005525) and GTPase activity (GO:0003924) to be among the ten most enriched terms. We find transcripts associated with the Ras/Rho superfamily to be significantly downregulated (RefSeq ID: XM_013091486.1) (LFC = -1.07 ± 0.42, p-adj < 0.099). Ras and Rho GTPases are recognized as important regulators of gene expression, cellular growth, and cytoskeletal organization222. It is also known that PKA phosphorylates RhoA, thereby inhibiting its activity223. This would be consistent with cellular PKA activation due to sustained levels of 8-Br cAMP in this study.
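The Expected and classicFisher columns reported in the GO enrichment tables (for example Table 4-2) can be illustrated with a minimal sketch of a one-sided Fisher's exact test, computed here as a hypergeometric upper tail. The background sizes below (99 GO-annotated transcripts, 25 of them significantly DE) are assumptions chosen only because they are consistent with the Expected column of Table 4-2; they are not stated in the text, and the function names are hypothetical.

```python
from math import comb


def expected_count(annotated, n_sig, n_background):
    """Number of significant transcripts expected by chance among `annotated`."""
    return annotated * n_sig / n_background


def fisher_upper_tail(annotated, significant, n_sig, n_background):
    """One-sided P(X >= significant) when `annotated` transcripts are drawn
    from a background of `n_background`, of which `n_sig` are significant."""
    total = comb(n_background, annotated)
    upper = min(annotated, n_sig)
    tail = sum(
        comb(n_sig, k) * comb(n_background - n_sig, annotated - k)
        for k in range(significant, upper + 1)
    )
    return tail / total


# ASSUMED background sizes (not given in the text; see lead-in).
N_BACKGROUND = 99
N_SIG = 25

# "structural constituent of cytoskeleton": 3 annotated, 2 significant.
exp_cytoskeleton = expected_count(3, N_SIG, N_BACKGROUND)
p_cytoskeleton = fisher_upper_tail(3, 2, N_SIG, N_BACKGROUND)
print(round(exp_cytoskeleton, 2), round(p_cytoskeleton, 3))
```

Under these assumed background sizes, the sketch gives values that round to the Expected and classicFisher entries listed for this term in Table 4-2. The classicKS and elimKS columns come from different tests and are not covered here.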


AG Biological Process GO

We also performed an enrichment analysis for the biological process GO terms, in a similar manner to that for the molecular function ontology. After 30 minutes of 8-Br cAMP treatment the top 10 biological process (BP) terms are displayed in Table 4-5.

We can see that the overarching theme of this list relates to the biosynthesis and metabolism of proteins. Forty-seven of the 794 (just under 6%) significantly DE transcripts encode ribosomal proteins, which suggests the AG is preparing to increase the number of ribosomes available to translate newly synthesized mRNAs. There is also a ~1.9 log2-fold increase in the transcript encoding the mitochondrial enzyme glutamate dehydrogenase (XM_013085387.1), which is responsible for converting glutamate into alpha-ketoglutarate. This molecule is a key intermediate in the Krebs cycle (TCA cycle), an essential energy-generating (ATP-synthesizing) pathway in all aerobic organisms. The transcript becomes slightly more upregulated after 60 minutes of treatment (approximately 2.2 log2-fold; see Figure 4-9 and Table 4-6) and is again detected as significantly DE. After 2 hours of 8-Br cAMP incubation (see Table 4-7), we find enriched terms such as “intracellular signal transduction” (GO:0035556) and “phosphate-containing compound metabolic process” (GO:0006796). These terms are annotated to the highly abundant GAPDH-like transcript mentioned earlier (LFC = 1.33 ± 0.39, p-adj < 0.014), which is discussed in more detail subsequently, and to an adenylate cyclase (AC)-encoding transcript (NM_001204606.1) that is significantly downregulated (LFC = -3.35 ± 0.62, p-adj < 0.001). Downregulation of an enzyme that produces cAMP (from ATP) makes sense considering that 8-Br cAMP is resistant to cellular degradation, which presumably leaves the cells inundated with high levels of the cAMP analog. Decreasing transcription of AC could be an effort to repress further production of cAMP.


AG Cellular Component GO

The final third of the Gene Ontology deals with cellular components (CC). Thirty minutes of 8-Br cAMP exposure of the A. californica abdominal ganglion produced the 10 cellular component terms displayed in Table 4-8. We find “microtubule cytoskeleton” (GO:0015630) and “ribonucleoprotein complex” (GO:1990904) to be among the enriched terms. After 1 hour (and still after 2 hours) of 8-Br cAMP treatment, we continue to see that “microtubule cytoskeleton” is among the

10 most significant GO terms. Tubulin and actin are ubiquitous and highly abundant proteins that polymerize to form microtubules and microfilaments, respectively. Along with intermediate filaments, microtubules and microfilaments are the primary components of the cellular cytoskeleton224. The cytoskeleton is an organized and dynamic network of interlinking proteins connecting the nucleus to the cellular membrane225. Previous experimental evidence suggests that learning and memory involve the restructuring of the neuronal cytoskeleton226. At the 1-hour time point in AG we found the term microtubule-based process (GO:0007017) in the top ten most enriched terms. Transcripts encoding α-tubulin 1 and 2 chains were significantly upregulated (~1.25 and ~1.5 log2-fold), while a β-tubulin transcript was highly expressed and mildly upregulated (~0.36 log2-fold). While these transcripts were significantly upregulated by 8-Br cAMP treatment, we also found a tau-like microtubule-associated protein encoding transcript (RefSeq ID: XM_013087465.1) to be universally significantly downregulated at all time points of 8-Br cAMP treatment in both abdominal ganglia and right pleural ganglia. Tau-like proteins are implicated in stabilizing microtubule

assembly227. This provides evidence that dynamic reorganization of the cellular cytoskeleton may indeed take place.

After 1 hour of 8-Br cAMP treatment we also see the term “supramolecular complex” (GO:0099080), associated with α- and β-tubulins and an intermediate filament transcript (RefSeq ID: NM_001204724.1). At the 1-hour time point (see Table 4-9), the term “integral component of synaptic vesicle membrane” (GO:0030285) appears and carries over into the 2-hour time point. After 2 hours of exposure (see Table 4-10), we find the terms “integral component of synaptic vesicle membrane” (GO:0030285) and “cytoplasmic vesicle” (GO:0031410). These terms are annotated to a synaptotagmin-1 transcript that is significantly upregulated (LFC = 2.22 ± 0.86, p-adj < 0.010) in AG. Synaptotagmins are major structural protein components of Ca2+-induced vesicle exocytosis and transmitter-mediated intercellular signaling228.

Right Pleural Molecular Function GO

In the preceding abdominal ganglia molecular function GO section, we noted that genetic knockout mice lacking functional ferroxidase enzymes experienced difficulties in a spatial learning and memory task215. Additionally, it was noted that these mice had lower levels of superoxide dismutase (SOD) and glutathione peroxidase (GPx) enzyme activity in their brains215. We found GPx mRNA to be significantly upregulated at all three time points of 8-Br cAMP treatment in both abdominal and right pleural ganglia

(see Table 4-11). We also detected SOD mRNA at significantly greater abundance at the 1-hr treated time point in right pleural ganglia (LFC = 1.24 ± 0.53, p-adj = 0.098). Higher expression of these transcripts likely provides the cells with increased protection against toxic reactive oxygen radicals. Furthering our consideration of upregulated “metabolic” enzymes, we found GAPDH significantly upregulated at the 0.5-hr (LFC = 1.53 ± 0.62, p-adj = 0.145), 1-hr (LFC = 1.16 ± 0.46, p-adj < 0.076), and 2-hr (LFC = 1.95 ± 0.52, p-adj = 0.003) treatment conditions. GAPDH has been found to have a myriad of functions besides its canonical role in glycolysis, including DNA/RNA binding229,230, transcriptional regulation231, kinase/phosphotransferase activity232,233, facilitation of vesicular transport234, and catalysis of microtubule formation and polymerization235,236.

Right Pleural Biological Process GO

The synapse is a highly organized neuronal structure which specializes in the release of chemical messengers that mediate cellular signaling237. In mammals, synapsins are the most numerous brain phosphoproteins present at the synapse and on synaptic vesicles238. Synapsin proteins are thought to interact with the actin cytoskeleton in synapses to modulate the tethering and release of synaptic vesicles238.

Phosphorylation of synapsins appears to affect the probability of release of synaptic vesicles, thereby regulating the efficiency of synaptic transmitter release239. After 0.5 hr of 8-Br cAMP exposure (see Table 4-12), we observed a large downregulation of a synapsin transcript (RefSeq ID: NM_001204483.1) (LFC = -6.89 ± 2.62, p-adj < 0.116), though not quite at the threshold of statistical significance. We also find a significant downregulation of the synaptic vesicular monoamine transporter (VMAT) (RefSeq ID: XM_013091313.1) (LFC = -3.52 ± 1.22, p-adj = 0.082), which transports cytosolic monoamine transmitters (including 5-HT) into presynaptic vesicles240. At 1 hour, we find the metabolic-process transcript glyceraldehyde-3-phosphate dehydrogenase-like (RefSeq ID: NM_001280826.1) to be robustly upregulated, and this upregulation continues into the 2-hour time point.

Right Pleural Cellular Component GO

Descriptions of the ten most significant cellular component GO terms from the right pleural ganglia subjected to a 30-minute incubation in 8-Br cAMP can be viewed in Table 4-13. The transcripts listed under the enriched term “extracellular region” (GO:0005576) are annotated to neuropeptides and neurosecretory molecules as well as receptors. Among the receptors is a frizzled-related protein 2 transcript (RefSeq ID: XM_013081500.1) (LFC = -1.75 ± 0.65, p-adj = 0.104), described by Moroz and colleagues as being expressed in the L7 motor neuron and being associated with synaptic plasticity24.

Calreticulin (RefSeq ID: NM_001204594.1) (LFC = 1.26 ± 0.49, p-adj < 0.063) is a significantly upregulated Ca2+-binding protein that has been demonstrated to be involved in the regulation of intrinsic neuronal excitability and LTP induction241. Among the 1- and 2-hour enriched terms is “non-membrane-bounded organelle” (GO:0043228), which includes ribosomal components, cytoskeletal components, and histone variants. A core histone (H3.3) variant (RefSeq ID: NM_001204495.1) showed a slight but statistically significant decrease (LFC = -0.83 ± 0.23, p-adj = 0.005). Histone H3 is the site of the most extensive modifications of all the histone proteins242,243; changes in its abundance may therefore have more dramatic effects upon nucleosome and higher-order chromatin structures.

Classification of Unique Ganglia-Specific Transcripts

The Venn diagram in Figure 4-10 shows the breakdown of ganglion-specific transcripts from the control (FSW) abdominal, cerebral, buccal, and right pleural ganglia (for the list of genes, see Object 4-3, Supplemental Table 4-S7).

Object 4-3. Tables of Differentially Expressed Genes in Individual Ganglia of Aplysia following 8-Br cAMP treatments (for 0.5, 1, and 2 hours). Supplemental Table 4-S7 Ganglia specific gene list. This table contains the lists of genes corresponding to all the areas of the Venn diagram in Figure 4-10. These lists are the genes that are unique to each ganglion. If more than one ganglion is listed on a sheet, each of them expresses the listed genes. For these lists, expression was defined as any TPM value greater than 0.

Overall, these four ganglia express 10,283 transcripts (out of 21,013 total genes, not counting isoforms), and 4,300 of them (41.8%) are expressed in all four ganglia. The abdominal, cerebral, and buccal ganglia share an additional 1,412 transcripts (13.7% of total expressed transcripts). The cerebral ganglion expresses the highest number of unique “ganglia-specific” genes (1,009), followed by the buccal ganglion (941) and the abdominal ganglion (533), while the right pleural ganglion expresses the fewest unique transcripts (107).
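The classification behind these counts can be sketched with set operations. The sketch below assumes the expression definition given for Supplemental Table 4-S7 (any TPM value greater than 0); the gene names, TPM values, and function names are hypothetical toy inputs, not data from this study. The last two lines simply recompute the percentages quoted above from the stated counts.

```python
def expressed(tpm_by_gene):
    """Return the set of genes with TPM > 0."""
    return {gene for gene, tpm in tpm_by_gene.items() if tpm > 0}


def ganglion_specific(expressed_sets):
    """Map each ganglion to the genes expressed there and in no other ganglion."""
    specific = {}
    for name, genes in expressed_sets.items():
        others = set().union(
            *(g for other, g in expressed_sets.items() if other != name)
        )
        specific[name] = genes - others
    return specific


# Hypothetical toy TPM values for three made-up genes.
toy_tpm = {
    "abdominal": {"geneA": 5.0, "geneB": 0.0, "geneC": 1.2},
    "cerebral": {"geneA": 3.1, "geneB": 2.0, "geneC": 0.0},
    "buccal": {"geneA": 0.4, "geneB": 0.0, "geneC": 0.0},
    "right_pleural": {"geneA": 1.0, "geneB": 0.0, "geneC": 0.0},
}

expressed_sets = {name: expressed(tpm) for name, tpm in toy_tpm.items()}
specific = ganglion_specific(expressed_sets)
shared_by_all = set.intersection(*expressed_sets.values())

# Percentages recomputed from the counts stated in the text.
pct_all_four = round(100 * 4300 / 10283, 1)
pct_three = round(100 * 1412 / 10283, 1)
print(specific, shared_by_all, pct_all_four, pct_three)
```

Because the threshold is TPM > 0, a single low-abundance read assignment is enough to count a gene as expressed; a stricter cutoff would shrink the shared set and could change which genes appear ganglion-specific.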

Time-course Analysis of Transcripts Influenced by 8-Br cAMP Treatment

We also used the R package maSigPro244,245 to analyze the 8-Br cAMP data for both abdominal and right pleural ganglia. The program uses a two-step regression strategy to find genes that are significantly differentially expressed over a time-course transcriptomic experiment. This complements the pairwise approach we undertook using DESeq2, in which each experimental condition was compared against its corresponding time-matched control incubated in FSW, and it allowed us to detect trends in gene expression over time. Over the course of the 8-Br cAMP treatment, maSigPro identified 565 genes (BH-adjusted p-value < 0.05) as DE in the abdominal ganglia. This list includes ApCREB2, which was also identified using DESeq2. Using the “mclust” algorithm246 to cluster genes with similar expression patterns, we identified five clusters within the 565 DE genes (Figure 4-11). Cluster 4 consists of genes that are highly upregulated in AG upon exposure to 8-Br cAMP at all time points examined. Of the 93 transcripts comprising this cluster, 25 encode ribosomal proteins, indicating that the biosynthesis of ribosomes (the translation machinery) is significantly upregulated. Two eukaryotic translation initiation factor subunits and two elongation factor subunits are also found among the 93 transcripts of cluster 4, further supporting the idea that the abdominal ganglion is actively supporting translation of new proteins as soon as half an hour after the cAMP-dependent signaling pathway is activated.
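The BH-adjusted p-value threshold used to call these genes can be illustrated with a minimal sketch of the Benjamini-Hochberg step-up adjustment. This is a generic implementation, not maSigPro's own code, and the p-values below are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * n / rank over this rank and all larger ranks.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
toy_pvalues = [0.001, 0.008, 0.039, 0.041, 0.6]
toy_adjusted = bh_adjust(toy_pvalues)
toy_significant = [p < 0.05 for p in toy_adjusted]
print(toy_adjusted, toy_significant)
```

In this toy input the third and fourth genes have raw p-values below 0.05 but do not survive adjustment, which is the same situation described for transcripts that show a nominal p-value yet fail the adjusted cutoff.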

Time-course analysis allowed us to detect some canonical long-term sensitization-associated genes that were not identified as significantly DE by DESeq2. One of these genes is the NR1 subunit of the A. californica N-methyl-D-aspartate (NMDA) receptor. The analysis also revealed that the glutamate receptor subunit GluR1 was differentially expressed in 8-Br cAMP-treated compared to control ganglia. NR1 is grouped in cluster 2, which consists of transcripts that increase in expression upon 8-Br cAMP treatment. Over the course of treatment NR1 shows upregulation from ~1.96 to ~2.84 log2-fold but is not distinguished as statistically significant by DESeq2. GluR1 is also a member of cluster 2 and displays increased expression at all measured time points (from ~1.3 to ~3.2 log2-fold). The Aplysia GluR1 transcript may be homologous to a subunit of the alpha-amino-3-hydroxy-5-methyl-4-isoxazole propionic acid receptor (AMPAR). Evidence suggests that phosphorylation of GluR1 in the mouse hippocampus plays a central role in long-term potentiation (LTP) and long-term depression (LTD)247.

Using maSigPro on the right pleural ganglia, we found 1,346 genes significantly DE as a result of 8-Br cAMP treatment. In an unsupervised clustering using the “mclust” algorithm, these transcripts segregated into seven clusters (Figure 4-12). Cluster 1 comprises 738 genes (54.8%) that are not detected as expressed in the control right pleural ganglia and become detectably expressed when the ganglia are exposed to 8-Br cAMP. Such genes include GluR1, a Shaw-like voltage-gated K+ channel, synaptotagmin-9, a CREB3-like transcript, zinc-finger transcription factors, and glycine transporters. The second cluster contains 118 transcripts that decrease in expression over time in the control ganglia but remain stable in the 8-Br cAMP-treated ganglia. Cluster 4 is made up of 103 genes that are consistently expressed in 8-Br cAMP-treated ganglia but increase over time in the controls. Finally, cluster 7 is composed of 26 highly abundant genes that retain stable expression levels over time in the control ganglia and display a moderate decrease in 8-Br cAMP-treated ganglia over time.

Unsupervised Clustering of 8-Br cAMP-Treated Ganglia

We performed an unsupervised analysis (principal component analysis, PCA) of ganglia incubated in 8-Br cAMP for 0.5, 1, or 2 hours. The results can be seen in Figure 4-13. Two features of this PCA are immediately recognizable: 1) the 8-Br cAMP-treated ganglia form distinct clusters, separate from each other; the only ganglia that share a cluster are the left and right pleural ganglia (represented as stars and clubs, respectively) and the left and right pedal ganglia (represented as diamonds and spades, respectively); and 2) the abdominal ganglia cluster separates the most from the other central ganglia, most visibly along the horizontal axis.

Results of Differential Expression Testing in FSW Controls

We compared gene expression profiles across the 30-minute, 1-hour, and 2-hour FSW incubations within the abdominal ganglia and within the right pleural ganglia to observe any changes in our controls. The analysis revealed no differentially expressed genes between our abdominal ganglia controls at any time point. A similar analysis in the right pleural ganglia revealed no differentially expressed genes in the 30-minute versus 1-hour comparison; however, the 2-hour FSW-incubated ganglia displayed significant downregulation of two transcripts compared to the 1-hour and 0.5-hour FSW incubations. One was an RFT1-like homolog (RefSeq ID: XM_005103822.2) and the second was an RNA-binding transcript (RefSeq ID: XM_005108114.2). In humans, point mutations in the RFT1 gene are associated with dysfunction in N-linked glycosylation248.

Abdominal Ganglion 0.5-hour cAMP Treatment vs Control (FSW)

Overall, using DESeq2, we found 794 genes to be significantly differentially expressed in abdominal ganglia between the 30-minute 8-Br cAMP incubation and the 30-minute FSW exposure. See also:

Object 4-4. Tables of Differentially Expressed Genes in Individual Ganglia of Aplysia following 8-Br cAMP treatments (for 0.5, 1, and 2 hours). Table 4-S1 cAMP 0.5hr AG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing abdominal ganglia incubated in 8-Br cAMP solution to those in FSW for 0.5 hours.

In addition to Ap-Uch, we found two other ubiquitin-related transcripts: one predicted to encode an E2 ubiquitin-conjugating protein (RefSeq ID: XM_005112198.2) and the other an E3 ubiquitin ligase (XM_005102402.2). The E2-encoding transcript was downregulated, while the E3-encoding transcript was upregulated. E3 ubiquitin ligases mediate protein-protein interactions that determine the substrate specificity of ubiquitination249. Ubiquitination can have various effects on proteins: it may target them for degradation, alter their activity or location within the cell, or modulate their interactions with other proteins. Examination of transcripts involved in ubiquitination may be particularly insightful, as the ubiquitin proteasome system plays a critical inhibitory role in constraining synaptic strength200.

Additionally, we found three other zinc-finger transcription factors (besides ApEGR) to be significantly downregulated by 8-Br cAMP treatment. The roles of these transcripts are as yet unknown.

Abdominal Ganglion 1-hour cAMP Treatment vs Control (FSW)

Comparing the 1-hr 8-Br cAMP-treated abdominal ganglia to their FSW controls, we found 305 genes to be differentially expressed using DESeq2 (see Object 4-2, Supplemental Table 4-S2). We observed a six-fold increase in the transcript encoding the serotonin transporter (SERT), a monoamine transporter responsible for reuptake of 5-HT from the synaptic cleft back into the presynaptic neuron. We found a ~6.3 log2-fold decrease in a transcript encoding a histone deacetylase (RefSeq ID: XM_005097821.2) at the 1-hour time point. This transcript was one of the most consistently and significantly downregulated in all phases of 8-Br cAMP treatment in both abdominal and right pleural ganglia. Histone deacetylases (HDACs) are a superfamily of enzymes that cleave acetyl groups from histones, thus compacting chromatin and making DNA less accessible to transcription factors250. Downregulation of HDACs would diminish the removal of histone acetyl groups, generally decondensing chromatin and enabling transcription.
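Because this section reports some changes as linear fold changes and others as log2-fold changes, a short conversion sketch may help when comparing them. The function names are hypothetical, and the only value taken from the text is the ~6.3 log2-fold decrease reported for the histone deacetylase transcript.

```python
from math import log2


def log2fc_to_ratio(log2fc):
    """Convert a log2 fold change to a linear expression ratio (treated / control)."""
    return 2.0 ** log2fc


def ratio_to_log2fc(ratio):
    """Convert a linear expression ratio (treated / control) to a log2 fold change."""
    return log2(ratio)


# A ~6.3 log2-fold decrease corresponds to a ratio well below 1;
# its reciprocal is the linear fold decrease.
hdac_ratio = log2fc_to_ratio(-6.3)
hdac_fold_decrease = 1.0 / hdac_ratio

# A linear six-fold increase expressed on the log2 scale.
six_fold_as_log2 = ratio_to_log2fc(6.0)
print(round(hdac_fold_decrease, 1), round(six_fold_as_log2, 2))
```

On this scale a 6.3 log2-fold decrease is a reduction of roughly 79-fold in linear terms, so a "six-fold" change and a "6.3 log2-fold" change differ by more than an order of magnitude.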

Abdominal Ganglion 2-hour cAMP Treatment vs Control (FSW)

Comparing the 2-hour 8-Br cAMP-treated abdominal ganglia to their 2-hour FSW-incubated counterparts, we found 346 genes to be differentially expressed using DESeq2 (see Object 4-2, Supplemental Table 4-S3). One such transcript is predicted to encode a regulatory subunit of the lysine acetyltransferase 8 (KAT8)-associated non-specific lethal (NSL) complex (RefSeq ID: XM_005107484.2). This complex associates with histone acetyltransferases (HATs) such as KAT8, which are responsible for acetylating histone H4 at lysine 16 (H4K16ac), a modification that decondenses chromatin structure in D. melanogaster as well as in mammalian species251. After two hours of 8-Br cAMP treatment, this transcript increases 3 ~ 5.8-fold in expression in abdominal ganglia (see Figure 4-14 upper panel). Sixteen of the DE genes at this time point are non-coding RNAs with unknown functions. Non-coding RNAs can exhibit a bewildering array of functions, including regulation of transcription, translation, and RNA processing, and are also found to associate with proteins in RNA-protein complexes with diverse biological functions252.

135 Genes DE in 0.5, 1, and 2-hour 8-Br cAMP Time Points in the AG

We cross-compared the lists of differentially expressed genes from the 0.5-, 1-, and 2-hour 8-Br cAMP treatments to find out which transcripts were commonly DE. The results of the comparison are depicted in Figure 4-15. One of the 135 genes DE at all time points is predicted to encode a Kv (Shal-like) potassium channel that was downregulated ~2 log2-fold at each time point compared to controls.

The 0.5, 1, and 2-hour 8-Br cAMP Treated RPlG have 60 DE Genes in Common

Analogously to our abdominal ganglia comparison, we compared the lists of differentially expressed genes from the 0.5-, 1-, and 2-hour 8-Br cAMP treatments of the right pleural ganglia. The resulting Venn diagram is found in Figure 4-16. Of the 60 transcripts that were differentially expressed in all 8-Br cAMP-treated ganglia compared to FSW controls, one is predicted to encode a galectin-4-like protein (RefSeq ID: XM_005101982.2). Galectins are proteins that bind to carbohydrates and can be involved in many cellular functions, including cell signaling, migration, inflammation, and immune responses253.

Discussion of Canonical and Novel Candidate Genes

The paucity of DE genes detected among the FSW control time points in the abdominal (no DE transcripts) and right pleural (2 DE transcripts) ganglia supports their suitability as controls for the 8-Br cAMP-treated ganglia.

We did not identify two of the canonical transcription factors of sensitization-induced long-term memory, CREB1 (RefSeq ID: XM_013084337.1) and C/EBP (RefSeq ID: NM_001204463.1), as significantly differentially expressed. As mentioned previously, CREB1 mRNA expression was scarcely detectable in almost all the ganglia we tested. It is possible, given its role as a key upstream transcriptional activator of the cAMP signaling pathway, that CREB1 mRNA expression is tightly regulated and occurs at only low or moderate levels in a select handful of neurons (even when the cAMP signaling cascade is initiated), so that sampling entire ganglia averages the overall expression down to undetectable levels. A previous quantitative PCR (qPCR) experiment measuring CREB1 expression in Aplysia pleural ganglia found increased levels of mRNA after long-term sensitization training, but the increase did not reach statistical significance182. We also did not find the C/EBP transcript to be significantly differentially expressed, although at the 1-hour time point the 8-Br cAMP-treated abdominal ganglia did display an upregulation of C/EBP mRNA (LFC = 3.48 ± 1.65, p-adj = 0.240), with a p-value < 0.035 prior to correction for multiple testing. This suggests that additional biological replicates might have produced a significant difference. Alberini and colleagues, who first cloned and characterized ApC/EBP, measured expression changes using the entire CNS189, whereas our study used individual ganglia. Cyriac et al.196 found C/EBP mRNA to be upregulated in pedal-pleural ganglia extracts 1 hour after long-term sensitization training. Herdegen and colleagues also examined isolated pleural ganglia 1 hour after long-term sensitization training using microarray and qPCR techniques and found significant increases in ApC/EBP and ApEGR182. Our pleural ganglia data were likely limited by low statistical power due to the small number of replicates (n = 1 for the 0.5-hr 8-Br cAMP right pleural ganglion; n = 2 for all right pleural ganglia FSW control conditions as well as the remaining 8-Br cAMP treatment conditions, except the 1-hour time point, where n = 3). Therefore, these data must be interpreted with caution, as transcripts that did not reach statistical significance may nonetheless be biologically regulated.
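The relationship between the reported C/EBP effect size and its uncorrected p-value can be checked with a short sketch of a two-sided Wald test. It assumes that the "±" value is the standard error of the log2 fold change and that the p-value comes from a Wald test against a standard normal distribution (the DESeq2 default); neither assumption is stated explicitly in the text, and the function name is hypothetical.

```python
from math import erfc, sqrt


def wald_p_value(lfc, lfc_se):
    """Two-sided p-value for H0: LFC = 0, using a standard normal reference."""
    z = lfc / lfc_se
    return erfc(abs(z) / sqrt(2))


# Values reported for C/EBP in abdominal ganglia at 1 hour (LFC = 3.48 ± 1.65).
cebp_p = wald_p_value(3.48, 1.65)
print(round(cebp_p, 3))
```

Under these assumptions the sketch gives an uncorrected p-value of about 0.035, in line with the value quoted above. A wide standard error relative to the effect size, as expected with few replicates, is what keeps the statistic modest despite the large fold change.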

Since its discovery and link to receptor-dependent LTP254, the NMDA receptor has been postulated as a cellular mechanism to explain key molecular events fundamental to associative learning and synaptic potentiation. The NR1 subunit was initially cloned and found to be expressed by a multitude of Aplysia neurons255. The subunit does not pass the DESeq2 adjusted p-value cutoff of less than 0.1 in any 8-Br cAMP versus control condition, but is upregulated (log2-fold change = 2.34) in abdominal ganglia treated with 8-Br cAMP for 2 hours compared to control. One potential reason this transcript does not pass the DESeq2 cutoff is its low mean expression: DESeq2 has a more difficult time distinguishing genes as DE when they are not abundantly expressed64.

One of the most consistently upregulated transcripts in both abdominal and right pleural ganglia treated with 8-Br cAMP was predicted to encode an α-tubulin 2 protein. A microarray analysis of a related species (Aplysia kurodai) found 3-fold upregulation of an α-tubulin 2 transcript 2 hours after 5-HT treatment, but concluded it was a false-positive result when no change was detected via real-time PCR256. The robust upregulation of α-tubulin 2 in A. californica suggests either that this is a species-specific difference or that α-tubulin 2 mRNA levels in A. kurodai warrant further investigation.

We sought to identify candidate genes that could serve to direct the cellular and molecular changes that reshape the neural networks controlling behavior257. The Shal potassium channel was originally discovered in the fruit fly Drosophila melanogaster. Potassium channels with similar amino acid sequences were subsequently found in several vertebrates and now collectively constitute what is known as the Kv4 family. These channels primarily contribute to the transient, voltage-dependent potassium currents in the nervous system (called A currents) as well as in the heart (the transient outward current)55. The Kv4 channel family can be further divided into three subfamilies, designated Kv4.1 (KCND1), Kv4.2 (KCND2), and Kv4.3 (KCND3).

Kv4.2 family subunits are major contributors to somatodendritic A-type K+ channels in the basal forebrain neurons and globus pallidus neurons (basal ganglia) in mammals258. They are also expressed elsewhere in the mammalian CNS and in the heart. Additionally, they are predicted to play a role in the regulation of neuronal transmission at postsynaptic loci in defined brain regions. Kv4.2 channel proteins contain several conserved sites for mitogen-activated protein kinase (MAPK) ERK phosphorylation, which implies that ERK could regulate K+ channel function by direct phosphorylation54. The downregulation of the Aplysia Shal-like K+ channel transcript could be part of a negative-feedback loop in response to constitutive 8-Br cAMP neuronal activation.

The increasing expression of ApCREB2 over time during 8-Br cAMP exposure may indicate that the cells were increasing expression of this repressive transcription factor to negatively regulate the expression of cAMP-responsive genes and shut off cAMP signaling to conserve cellular resources259. Additionally, we found calmodulin-related genes, some of which are involved in Ca2+-dependent cellular signaling cascades, downregulated in the abdominal ganglia during 8-Br cAMP treatment. This may be an effort by the neurons or glia of the abdominal ganglion to limit cellular Ca2+ levels.

In addition to the CREB1, CREB2, and C/EBP transcription factors, we identified and quantified the expression of other genes previously associated with the learning and memory response to sensitization training (or its in vitro analog, 5-HT treatment), including TBL-1, EGR, calmodulin, Ap-Uch, and a reductase-related transcript.

Furthermore, we identified several transcripts that may serve as hitherto unknown or little-explored molecular players in sensitization or cAMP-related signaling. They include chromatin remodeling and histone deacetylase encoding transcripts, cytoskeletal subunits (including actin, alpha- and beta-tubulins, and intermediate filaments), transcripts encoding synaptic proteins, ion channels, monoamine transporters, glutathione peroxidase, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), Ras/Rho signaling GTPases, multiple putative neuropeptides, iron metabolism transcripts such as soma ferritin, and a Shal-like potassium channel.

Overall, our major findings include: 1) validation of cAMP pathway activation by 8-Br cAMP treatment and its analogy to a form of behavioral learning (long-term sensitization) experienced in A. californica; 2) confirmation via high-throughput next-generation sequencing of differential regulation of genes involved in or activated by long-term sensitization training or the in vitro application of 5-HT to neurons mediating the defensive withdrawal reflex modifiable by sensitization; 3) discovery of novel genes not yet known to be involved in or regulated by cAMP-signaling pathways, including genes that have not yet been experimentally characterized (only computationally predicted); and 4) characterization of Aplysia genes that are ganglia-specific and differentially expressed over the time course of 30 minutes to 2 hours of 8-Br cAMP treatment of Aplysia ganglia.

Table 4-1. Significantly DE transcripts in 8-Br cAMP-treated ganglia compared to controls

Gene AG 0.5 AG 1 AG 2 RPlG 0.5 RPlG 1 RPlG 2

CREB1

CREB2 ✔ ✔ ✔

C/EBP

TBL-1 ✔

EGR ✔

Calmodulin ✔

reductase-related ✔ ✔ ✔ ✔ ✔ ✔

Uch ✔ ✔ ✔

Table 4-2. Molecular function GO terms enriched in AG 0.5-hour cAMP treatment

GO.ID       Term                                                                        Annotated  Significant  Expected  classicFisher  classicKS  elimKS
GO:0005198  structural molecule activity                                                8          5            2.02      0.023          0.033      0.28
GO:0003735  structural constituent of ribosome                                          4          3            1.01      0.049          0.068      0.59
GO:0016491  oxidoreductase activity                                                     5          3            1.26      0.101          0.007      0.12
GO:0003824  catalytic activity                                                          37         12           9.34      0.151          0.183      0.35
GO:0005200  structural constituent of cytoskeleton                                      3          2            0.76      0.156          0.129      0.17
GO:0004016  adenylate cyclase activity                                                  1          1            0.25      0.253          0.218      0.62
GO:0004222  metalloendopeptidase activity                                               1          1            0.25      0.253          0.304      0.99
GO:0004322  ferroxidase activity                                                        1          1            0.25      0.253          0.195      0.47
GO:0004365  glyceraldehyde-3-phosphate dehydrogenase (NAD+) (phosphorylating) activity  1          1            0.25      0.253          0.203      0.52
GO:0004843  thiol-dependent ubiquitin-specific protease activity                        1          1            0.25      0.253          0.234      0.72

Table 4-3. Molecular function GO terms enriched in AG 1-hour cAMP treatment

GO.ID       Term                                                               Annotated  Significant  Expected  classicFisher  classicKS  elimKS
GO:0005200  structural constituent of cytoskeleton                             3          2            0.49      0.069          0.15       0.64
GO:0005198  structural molecule activity                                       8          3            1.32      0.12           0.3        0.48
GO:0000149  SNARE binding                                                      1          1            0.16      0.165          0.22       0.95
GO:0004322  ferroxidase activity                                               1          1            0.16      0.165          0.23       0.99
GO:0005506  iron ion binding                                                   1          1            0.16      0.165          0.23       0.99
GO:0005543  phospholipid binding                                               1          1            0.16      0.165          0.22       0.95
GO:0005544  calcium-dependent phospholipid binding                             1          1            0.16      0.165          0.22       0.95
GO:0008199  ferric iron binding                                                1          1            0.16      0.165          0.23       0.99
GO:0016722  oxidoreductase activity, oxidizing metal ions                      1          1            0.16      0.165          0.23       0.99
GO:0016724  oxidoreductase activity, oxidizing metal ions, oxygen as acceptor  1          1            0.16      0.165          0.23       0.99

Table 4-4. Molecular function GO terms enriched in AG 2-hour cAMP treatment

GO.ID       Term                                    Annotated  Significant  Expected  classicFisher  classicKS  elimKS
GO:0005515  protein binding                         14         7            3.36      0.018          0.076      0.93
GO:0050662  coenzyme binding                        2          2            0.48      0.055          0.046      0.41
GO:0005200  structural constituent of cytoskeleton  3          2            0.72      0.141          0.154      0.32
GO:0048037  cofactor binding                        3          2            0.72      0.141          0.169      0.41
GO:0001882  nucleoside binding                      7          3            1.68      0.215          0.577      0.82
GO:0001883  purine nucleoside binding               7          3            1.68      0.215          0.577      0.82
GO:0003924  GTPase activity                         7          3            1.68      0.215          0.577      0.82
GO:0005102  signaling receptor binding              7          3            1.68      0.215          0.279      0.45
GO:0005179  hormone activity                        7          3            1.68      0.215          0.279      0.45
GO:0005525  GTP binding                             7          3            1.68      0.215          0.577      0.82

Table 4-5. Biological process GO terms enriched in AG 0.5-hour cAMP treatment

GO.ID Term Annotated Significant Expected classicFisher classicKS elimKS
GO:1901566 organonitrogen compound biosynthetic process 5 4 1.56 0.036 0.035 0.64
GO:0009058 biosynthetic process 8 5 2.49 0.068 0.071 0.55
GO:0044249 cellular biosynthetic process 8 5 2.49 0.068 0.071 0.55
GO:0044271 cellular nitrogen compound biosynthetic process 8 5 2.49 0.068 0.071 0.55
GO:1901564 organonitrogen compound metabolic process 8 5 2.49 0.068 0.071 0.55
GO:1901576 organic substance biosynthetic process 8 5 2.49 0.068 0.071 0.55
GO:0044267 cellular protein metabolic process 6 4 1.87 0.084 0.095 0.64
GO:0044237 cellular metabolic process 11 6 3.43 0.092 0.09 0.46
GO:0006412 translation 4 3 1.25 0.099 0.114 0.73
GO:0006518 peptide metabolic process 4 3 1.25 0.099 0.114 0.73

Table 4-6. Biological process GO terms enriched in AG 1-hour cAMP treatment

GO.ID Term Annotated Significant Expected classicFisher classicKS elimKS
GO:0065008 regulation of biological quality 4 3 0.96 0.049 0.106 0.99
GO:0019725 cellular homeostasis 2 2 0.48 0.064 0.091 0.84
GO:0042592 homeostatic process 2 2 0.48 0.064 0.091 0.84
GO:0007017 microtubule-based process 3 2 0.72 0.161 0.141 0.25
GO:0000041 transition metal ion transport 1 1 0.24 0.259 0.309 1
GO:0006812 cation transport 1 1 0.24 0.259 0.309 1
GO:0006826 iron ion transport 1 1 0.24 0.259 0.309 1
GO:0006873 cellular ion homeostasis 1 1 0.24 0.259 0.309 1
GO:0006875 cellular metal ion homeostasis 1 1 0.24 0.259 0.309 1
GO:0006879 cellular iron ion homeostasis 1 1 0.24 0.259 0.309 1

Table 4-7. Biological process GO terms enriched in AG 2-hour cAMP treatment

GO.ID Term Annotated Significant Expected classicFisher classicKS elimKS
GO:0009987 cellular process 43 18 14.88 0.014 0.019 1
GO:0050794 regulation of cellular process 29 13 10.04 0.073 0.117 0.96
GO:0006753 nucleoside phosphate metabolic process 2 2 0.69 0.115 0.066 0.41
GO:0006793 phosphorus metabolic process 2 2 0.69 0.115 0.066 0.41
GO:0006796 phosphate-containing compound metabolic process 2 2 0.69 0.115 0.066 0.41
GO:0009056 catabolic process 2 2 0.69 0.115 0.122 0.88
GO:0009117 nucleotide metabolic process 2 2 0.69 0.115 0.066 0.41
GO:0009165 nucleotide biosynthetic process 2 2 0.69 0.115 0.066 0.41
GO:0019637 organophosphate metabolic process 2 2 0.69 0.115 0.066 0.41
GO:0035556 intracellular signal transduction 2 2 0.69 0.115 0.169 0.41

Table 4-8. Cellular component GO terms enriched in AG 0.5-hour cAMP treatment

GO.ID Term Annotated Significant Expected classicFisher classicKS elimKS
GO:0043228 non-membrane-bounded organelle 14 7 3.16 0.019 0.039 0.47
GO:0043232 intracellular non-membrane-bounded organelle 14 7 3.16 0.019 0.039 0.47
GO:0005840 ribosome 4 3 0.9 0.04 0.061 0.67
GO:1990904 ribonucleoprotein complex 4 3 0.9 0.04 0.061 0.67
GO:0005737 cytoplasm 22 8 4.97 0.096 0.28 0.64
GO:0005874 microtubule 3 2 0.68 0.138 0.114 0.14
GO:0015630 microtubule cytoskeleton 3 2 0.68 0.138 0.114 0.14
GO:0005856 cytoskeleton 6 3 1.35 0.142 0.117 0.46
GO:0044422 organelle part 14 5 3.16 0.205 0.323 0.48
GO:0044446 intracellular organelle part 14 5 3.16 0.205 0.323 0.48

Table 4-9. Cellular component GO terms enriched in AG 1-hour cAMP treatment

GO.ID Term Annotated Significant Expected classicFisher classicKS elimKS
GO:0005874 microtubule 3 2 0.58 0.11 0.14 0.37
GO:0015630 microtubule cytoskeleton 3 2 0.58 0.11 0.14 0.37
GO:0044430 cytoskeletal part 4 2 0.77 0.18 0.26 0.37
GO:0099080 supramolecular complex 4 2 0.77 0.18 0.26 0.37
GO:0099081 supramolecular polymer 4 2 0.77 0.18 0.26 0.37
GO:0099512 supramolecular fiber 4 2 0.77 0.18 0.26 0.37
GO:0099513 polymeric cytoskeletal fiber 4 2 0.77 0.18 0.26 0.37
GO:0030285 integral component of synaptic vesicle membrane 1 1 0.19 0.21 0.24 0.96
GO:0031300 intrinsic component of organelle membrane 1 1 0.19 0.21 0.24 0.96
GO:0031301 integral component of organelle membrane 1 1 0.19 0.21 0.24 0.96

Table 4-10. Cellular component GO terms enriched in AG 2-hour cAMP treatment

GO.ID Term Annotated Significant Expected classicFisher classicKS elimKS
GO:0005874 microtubule 3 2 0.84 0.19 0.18 0.37
GO:0015630 microtubule cytoskeleton 3 2 0.84 0.19 0.18 0.37
GO:0030141 secretory granule 1 1 0.28 0.28 0.18 0.34
GO:0030285 integral component of synaptic vesicle membrane 1 1 0.28 0.28 0.34 1
GO:0031300 intrinsic component of organelle membrane 1 1 0.28 0.28 0.34 1
GO:0031301 integral component of organelle membrane 1 1 0.28 0.28 0.34 1
GO:0098563 intrinsic component of synaptic vesicle membrane 1 1 0.28 0.28 0.34 1
GO:0031410 cytoplasmic vesicle 4 2 1.12 0.31 0.63 0.82
GO:0031982 vesicle 4 2 1.12 0.31 0.63 0.82
GO:0044430 cytoskeletal part 4 2 1.12 0.31 0.35 0.37

Table 4-11. Molecular function GO terms enriched in RPlG 0.5-hour cAMP treatment

GO.ID Term Annotated Significant Expected classicFisher classicKS elimKS
GO:0005200 structural constituent of cytoskeleton 3 2 0.31 0.028 0.084 0.1
GO:0003824 catalytic activity 34 6 3.56 0.082 0.457 0.64
GO:0016491 oxidoreductase activity 5 2 0.52 0.083 0.094 0.78
GO:0004322 ferroxidase activity 1 1 0.1 0.105 0.181 0.89
GO:0004843 thiol-dependent ubiquitin-specific protease activity 1 1 0.1 0.105 0.166 0.64
GO:0005506 iron ion binding 1 1 0.1 0.105 0.181 0.89
GO:0008199 ferric iron binding 1 1 0.1 0.105 0.181 0.89
GO:0008234 cysteine-type peptidase activity 1 1 0.1 0.105 0.166 0.64
GO:0016722 oxidoreductase activity, oxidizing metal ions 1 1 0.1 0.105 0.181 0.89
GO:0016724 oxidoreductase activity, oxidizing metal ions, oxygen as acceptor 1 1 0.1 0.105 0.181 0.89

Table 4-12. Biological process GO terms enriched in RPlG 0.5-hour cAMP treatment

GO.ID Term Annotated Significant Expected classicFisher classicKS elimKS
GO:0019725 cellular homeostasis 2 2 0.4 0.037 0.062 0.96
GO:0042592 homeostatic process 2 2 0.4 0.037 0.062 0.96
GO:0007017 microtubule-based process 3 2 0.6 0.099 0.107 0.14
GO:0065008 regulation of biological quality 4 2 0.8 0.175 0.103 0.96
GO:0000041 transition metal ion transport 1 1 0.2 0.2 0.256 0.98
GO:0006511 ubiquitin-dependent protein catabolic process 1 1 0.2 0.2 0.199 0.63
GO:0006812 cation transport 1 1 0.2 0.2 0.256 0.98
GO:0006826 iron ion transport 1 1 0.2 0.2 0.256 0.98
GO:0006873 cellular ion homeostasis 1 1 0.2 0.2 0.256 0.98
GO:0006875 cellular metal ion homeostasis 1 1 0.2 0.2 0.256 0.98

Table 4-13. Cellular component GO terms enriched in RPlG 0.5-hour cAMP treatment

GO.ID Term Annotated Significant Expected classicFisher classicKS elimKS
GO:0005874 microtubule 3 2 0.48 0.064 0.093 0.1
GO:0015630 microtubule cytoskeleton 3 2 0.48 0.064 0.093 0.1
GO:0005576 extracellular region 26 7 4.12 0.064 0.036 0.59
GO:0044430 cytoskeletal part 4 2 0.63 0.116 0.18 0.1
GO:0099080 supramolecular complex 4 2 0.63 0.116 0.18 0.1
GO:0099081 supramolecular polymer 4 2 0.63 0.116 0.18 0.1
GO:0099512 supramolecular fiber 4 2 0.63 0.116 0.18 0.1
GO:0099513 polymeric cytoskeletal fiber 4 2 0.63 0.116 0.18 0.1
GO:0005856 cytoskeleton 6 2 0.95 0.24 0.351 0.1
GO:0043228 non-membrane-bounded organelle 13 2 2.06 0.656 0.711 0.1

Figure 4-1. Bar chart of CREB2 mRNA expression in top panel - abdominal, bottom panel - left and right pleural ganglia. Each bar represents one ganglion, with the names indicative of the treatment it received (control or 8-bromo cAMP), the duration of exposure, and the number of replicates. The bar heights represent the mRNA expression in transcripts per million (TPM) for CREB2 transcript. Similar colors indicate ganglia compared together to determine differentially expressed genes. In A) for example, light green bars represent CREB2 expression in control abdominal ganglia incubated in filtered sea water for 1 hour and the dark green bars indicate 200 µM 8-Br cAMP-treated abdominal ganglia incubated for 1 hour.


Figure 4-2. Boxplot of C/EBP mRNA expression in top panel - abdominal, bottom panel - left and right pleural ganglia. Each box represents all the ganglia corresponding to the treatment named below it on the horizontal axis, with the names indicative of the treatment received (control or 8-bromo cAMP) and the duration of exposure. The upper hinges (tops of the boxes) are at the 75% quantile, the lower hinges (bottoms of the boxes) are at the 25% quantile, and the horizontal line bisecting each box represents the median (50% quantile) of the transcripts per million (TPM) values of the C/EBP transcript. The upper whiskers indicate the largest observation less than or equal to the upper hinge + 1.5 * IQR, where IQR is the inter-quartile range. The lower whiskers indicate the smallest observation greater than or equal to the lower hinge - 1.5 * IQR. The red dot plotted below one of the boxes represents an “outlier” data point.
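The hinge, whisker, and outlier rules stated in this caption can be made concrete with a short sketch. It assumes linear-interpolation quantiles; the plotting software used for the figures may compute hinges slightly differently. The TPM values and the function names `quantile` and `boxplot_stats` are hypothetical and are not taken from the study data.

```python
def quantile(sorted_vals, q):
    """Quantile by linear interpolation between neighboring order statistics."""
    pos = (len(sorted_vals) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def boxplot_stats(values):
    """Return hinges, median, whisker ends, and outliers for one box."""
    vals = sorted(values)
    lower_hinge = quantile(vals, 0.25)
    median = quantile(vals, 0.50)
    upper_hinge = quantile(vals, 0.75)
    iqr = upper_hinge - lower_hinge
    upper_fence = upper_hinge + 1.5 * iqr
    lower_fence = lower_hinge - 1.5 * iqr
    return {
        "lower_hinge": lower_hinge,
        "median": median,
        "upper_hinge": upper_hinge,
        # Whiskers stop at the most extreme observations inside the fences.
        "upper_whisker": max(v for v in vals if v <= upper_fence),
        "lower_whisker": min(v for v in vals if v >= lower_fence),
        # Anything beyond the fences is drawn as an individual point.
        "outliers": [v for v in vals if v < lower_fence or v > upper_fence],
    }


# Hypothetical TPM values for the ganglia in one treatment group.
example_tpm = [2.0, 11.0, 12.0, 13.0, 14.0, 15.0]
stats = boxplot_stats(example_tpm)
print(stats)
```

In this made-up group the value 2.0 falls below the lower fence, so it would be drawn as a separate point beneath the box rather than being reached by the lower whisker.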


Figure 4-3. Bar chart of tolloid/BMP-1-like (TBL-1) mRNA expression in left pleural and right pleural ganglia. Each bar represents one ganglion, with the names indicative of the treatment it received (control or 8-bromo cAMP), the duration of exposure, and the number of replicates. The bar heights represent the mRNA expression in transcripts per million (TPM) for the TBL-1 transcript. Similar colors indicate ganglia compared together to determine differentially expressed genes. In A) for example, light green bars represent TBL-1 expression in control right pleural ganglia incubated in filtered sea water for 1 hour and the dark green bars indicate 200 µM 8-Br cAMP-treated right pleural ganglia incubated for 1 hour.


Figure 4-4. Boxplot of EGR1 mRNA expression in top panel - abdominal, bottom panel - left and right pleural ganglia. Each box represents all the ganglia corresponding to the treatment named below it on the horizontal axis, with the names indicative of the treatment received (control or 8-bromo cAMP) and the duration of exposure. The upper hinges (tops of the boxes) are at the 75% quantile, the lower hinges (bottoms of the boxes) are at the 25% quantile, and the horizontal line bisecting each box represents the median (50% quantile) of the transcripts per million (TPM) values of the EGR1 transcript. The upper whiskers indicate the largest observation less than or equal to the upper hinge + 1.5 * IQR, where IQR is the inter-quartile range. The lower whiskers indicate the smallest observation greater than or equal to the lower hinge - 1.5 * IQR.


Figure 4-5. Boxplot of CaM mRNA expression in left pleural and right pleural ganglia. Each box represents all the ganglia corresponding to the treatment named below it on the horizontal axis, with the names indicative of the treatment received (control or 8-bromo cAMP) and the duration of exposure. The upper hinges (tops of the boxes) are at the 75% quantile, the lower hinges (bottoms of the boxes) are at the 25% quantile, and the horizontal line bisecting each box represents the median (50% quantile) of the transcripts per million (TPM) values of the CaM transcript. The upper whiskers indicate the largest observation less than or equal to the upper hinge + 1.5 * IQR, where IQR is the inter-quartile range. The lower whiskers indicate the smallest observation greater than or equal to the lower hinge - 1.5 * IQR.


Figure 4-6. Boxplot of reductase-related mRNA expression in top panel - abdominal, bottom panel - left and right pleural ganglia. Each box represents all the ganglia corresponding to the treatment named below it on the horizontal axis, with the names indicative of the treatment received (control or 8-bromo cAMP) and the duration of exposure. The upper hinges (tops of the boxes) are at the 75% quantile, the lower hinges (bottoms of the boxes) are at the 25% quantile, and the horizontal line bisecting each box represents the median (50% quantile) of the transcripts per million (TPM) values of the reductase-related transcript. The upper whiskers indicate the largest observation less than or equal to the upper hinge + 1.5 * IQR, where IQR is the inter-quartile range. The lower whiskers indicate the smallest observation greater than or equal to the lower hinge - 1.5 * IQR.


Figure 4-7. Boxplot of Uch mRNA expression in top panel - abdominal, bottom panel - left and right pleural ganglia. Each box represents all the ganglia corresponding to the treatment named below it on the horizontal axis, with the names indicative of the treatment received (control or 8-bromo cAMP) and the duration of exposure. The upper hinges (tops of the boxes) are at the 75% quantile, the lower hinges (bottoms of the boxes) are at the 25% quantile, and the horizontal line bisecting each box represents the median (50% quantile) of the transcripts per million (TPM) values of the Uch transcript. The upper whiskers indicate the largest observation less than or equal to the upper hinge + 1.5 * IQR, where IQR is the inter-quartile range. The lower whiskers indicate the smallest observation greater than or equal to the lower hinge - 1.5 * IQR.


Figure 4-8. The acyclic induced subgraph of the 10 most significant molecular function GO terms as identified by the “classic” algorithm using Fisher’s exact test on the 0.5-hour abdominal ganglia DE transcripts.
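To illustrate how the Expected and classicFisher columns in the GO tables relate to the Annotated and Significant counts, the sketch below computes an expected count and a one-sided Fisher's exact (hypergeometric) p-value for a single term. The gene-universe size and the total number of significant genes are not listed in the tables, so the numbers used here are hypothetical, as are the function names `expected_count` and `fisher_enrichment_p`.

```python
from math import comb


def expected_count(annotated, n_significant, n_universe):
    """Genes expected to be significant for a term if significance were random."""
    return annotated * n_significant / n_universe


def fisher_enrichment_p(annotated, significant, n_significant, n_universe):
    """One-sided Fisher's exact test: P(X >= significant) for a hypergeometric X.

    annotated      genes in the universe annotated with the term
    significant    annotated genes that are also in the significant set
    n_significant  total significant genes in the universe
    n_universe     total genes in the universe
    """
    max_k = min(annotated, n_significant)
    tail = sum(
        comb(annotated, k) * comb(n_universe - annotated, n_significant - k)
        for k in range(significant, max_k + 1)
    )
    return tail / comb(n_universe, n_significant)


# Hypothetical term: 4 annotated genes, 3 of them significant,
# in a universe of 20 genes of which 5 are significant overall.
expected = expected_count(annotated=4, n_significant=5, n_universe=20)
p_value = fisher_enrichment_p(
    annotated=4, significant=3, n_significant=5, n_universe=20
)
print(f"expected = {expected:.2f}, p = {p_value:.4f}")
```

With so few annotated genes per term, as in many rows of the tables above, the p-value can take only a handful of distinct values, which is why several terms share identical statistics.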


Figure 4-9. Bar chart of glutamate dehydrogenase mRNA expression in top panel - abdominal, bottom panel - left and right pleural ganglia. Each bar represents one ganglion, with the names indicative of the treatment it received (control or 8-bromo cAMP), the duration of exposure, and the number of replicates. The bar heights represent the mRNA expression in transcripts per million (TPM) for glutamate dehydrogenase transcript. Similar colors indicate ganglia compared together to determine differentially expressed genes. In A) for example, light green bars represent glutamate dehydrogenase expression in control abdominal ganglia incubated in filtered sea water for 1 hour and the dark green bars indicate 200 µM 8-Br cAMP-treated abdominal ganglia incubated for 1 hour.


Figure 4-10. Venn diagram displaying transcripts that are commonly expressed by multiple ganglia (for example, all four ganglia express 4,300 transcripts indicated by center area) or unique to specific ganglia (for example, 533 unique AG-specific transcripts). AG – abdominal ganglia, CG – cerebral ganglia, BG – buccal ganglia, RPlG – right pleural ganglia. Also see Object 4-2. Supplemental Table 4-S7 Ganglia specific gene list. This table contains the lists of genes corresponding to all the areas of the Venn diagram in Figure 4-12. These lists are the genes that are unique to each ganglion. If more than one ganglion is listed on a sheet, they each express the listed genes. For these lists, expression was defined as any TPM value greater than 0.
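The region lists behind a Venn diagram of this kind can be derived with plain set operations once "expressed" is defined as TPM greater than 0. The following is a minimal sketch under that definition; the transcript IDs, TPM values, and the helper names `expressed` and `venn_regions` are hypothetical and do not reproduce the study's pipeline.

```python
from itertools import combinations


def expressed(tpm_by_transcript):
    """Transcripts counted as expressed: any TPM value greater than 0."""
    return {t for t, tpm in tpm_by_transcript.items() if tpm > 0}


def venn_regions(sets_by_name):
    """Map each combination of names to the items found in exactly those sets."""
    names = list(sets_by_name)
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            inside = set.intersection(*(sets_by_name[n] for n in combo))
            outside = set().union(
                *(sets_by_name[n] for n in names if n not in combo)
            )
            region = inside - outside
            if region:
                regions[combo] = region
    return regions


# Hypothetical TPM values for three transcripts in four ganglia.
tpm = {
    "AG": {"t1": 5.0, "t2": 0.0, "t3": 1.2},
    "CG": {"t1": 3.1, "t2": 0.4, "t3": 0.0},
    "BG": {"t1": 0.9, "t2": 0.0, "t3": 0.0},
    "RPlG": {"t1": 2.2, "t2": 0.0, "t3": 0.0},
}
expressed_sets = {name: expressed(values) for name, values in tpm.items()}
regions = venn_regions(expressed_sets)
for combo, transcripts in regions.items():
    print(" & ".join(combo), sorted(transcripts))
```

Because the threshold is TPM greater than 0, a transcript detected at very low abundance in one ganglion is still excluded from the lists specific to the other ganglia.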


Figure 4-11. Gene clusters exhibiting distinct temporal patterns of expression over the course of 8-Br cAMP treatment in the abdominal ganglion.


Figure 4-12. Gene clusters exhibiting distinct temporal patterns of expression over the course of 8-Br cAMP treatment in the right pleural ganglion.


Figure 4-13. Principal component analysis of 8-Br cAMP treated ganglia. A – abdominal, B – buccal, C – cerebral, LPe – left pedal, LPl – left pleural, RPe – right pedal, RPl – right pleural ganglia.


Figure 4-14. Boxplot of KAT8-associated complex NSL subunit mRNA expression in top panel - abdominal, bottom panel - left and right pleural ganglia. Each box represents all the ganglia corresponding to the treatment named below it on the horizontal axis, with the names indicative of the treatment received (control or 8-bromo cAMP) and the duration of exposure. The upper hinges (tops of the boxes) are at the 75% quantile, the lower hinges (bottoms of the boxes) are at the 25% quantile, and the horizontal line bisecting each box represents the median (50% quantile) of the transcripts per million (TPM) values. The upper whiskers indicate the largest observation less than or equal to the upper hinge + 1.5 * IQR, where IQR is the inter-quartile range. The lower whiskers indicate the smallest observation greater than or equal to the lower hinge - 1.5 * IQR.


Figure 4-15. Significantly differentially expressed genes detected between abdominal ganglia exposed to 8-Br cAMP and FSW controls. The different colored circles correspond to the various time points (blue – 0.5 hr, red – 1 hr, green – 2 hr). The lists of genes corresponding to each area of the diagram can be found in Object 4-2. Supplementary Table 4-S26.


Figure 4-16. Significantly differentially expressed genes detected between right pleural ganglia exposed to 8-Br cAMP and FSW controls. The different colored circles correspond to the various time points (blue – 0.5 hr, red – 1 hr, green – 2 hr). The lists of genes corresponding to each area of the diagram can be found in Object 4-2. Supplementary Table 4-S27.


CHAPTER 5 INSIGHTS GAINED FROM THE STUDY OF SINGLE CELL IDENTITY AND GANGLIONIC PLASTICITY IN APLYSIA

This study provides a framework for classifying individual, identifiable neuronal cell types by transcriptomic profiling, and for characterizing the plasticity displayed by whole ganglia when a key cellular signaling pathway (cAMP) is activated within the Aplysia californica central nervous system.

Aplysia Enables the Transcriptomic Profiling of Identifiable Single Neurons

High-throughput next-generation sequencing has greatly increased our knowledge of the molecular genetics taking place at the single-cell level84. The study of less complex organisms, as with the bacterial and yeast models used in molecular biology, can allow conserved functions to be generalized to more complicated systems. The marine invertebrate Aplysia californica offers key advantages for the study of nervous system properties: its neurons are large, identifiable, and easy to manipulate experimentally. It is only within this system that we are currently able to study the transcriptomic properties of identifiable single neurons.

Cellular Identity of Individual Single Neurons

The cellular and molecular strategies utilized by the nervous system of Aplysia are conserved in mammalian memory systems, and the lessons drawn from them can be applied to the study of both explicit and implicit forms of memory15. The individual neurons comprising the Aplysia nervous system have been studied for decades to gain valuable cellular and molecular insights. We now further advance this body of knowledge by investigating identifiable single neurons using single-cell RNA-sequencing techniques.

We completed deep-coverage scRNA-seq on 23 identifiable neurons representing six cell types from the Aplysia californica CNS. We analyzed differential expression and functional enrichment, using unbiased classification methods to generate complete profiles of the transcriptomic identity of these neurons. Utilizing external RNA spike-ins for calibration, we estimate for the first time the absolute abundance of endogenous neuronal RNA molecules expressed in identifiable single neurons. This study serves to illuminate the variability within identified neurons as well as the core transcriptomic similarities shared by these cells.
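The calibration procedure itself is not spelled out in this chapter. As a rough illustration of how spike-ins with known input amounts can convert relative TPM values into molecule estimates, the sketch below fits a straight line in log-log space to the spike-ins and applies it to an endogenous transcript. The spike-in values, the function names, and the log-log linear model are all assumptions made for illustration and should not be read as the method used in this study.

```python
from math import log10


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical spike-ins: measured TPM and the known number of molecules added.
spike_tpm = [1.0, 10.0, 100.0, 1000.0]
spike_molecules = [5.0, 50.0, 500.0, 5000.0]

slope, intercept = fit_line(
    [log10(t) for t in spike_tpm],
    [log10(m) for m in spike_molecules],
)


def estimate_molecules(tpm):
    """Estimate absolute molecules for a transcript from its TPM (TPM must be > 0)."""
    return 10 ** (intercept + slope * log10(tpm))


print(round(estimate_molecules(20.0), 2))
```

An estimate of this kind is only as reliable as the spike-in range it was fitted on, so transcripts far outside that range would need to be treated with caution.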

To summarize, for the first time, we sequenced (to saturation) and exhaustively quantified absolute RNA abundances from reliably identifiable individual neurons representing classes of cholinergic, serotonergic, motor, and interneurons (see Object 5-1. Supplemental Table M1).

Object 5-1. Primary Data Master Tables. Table M1. (Master Data Table 1) Deep Sequencing of Single Identified Neurons with Annotated TPMs. This table contains the TPM expression values and bioinformatic annotation data of all genes from the 23 single neurons on which we performed deep-coverage RNA-seq.

We also performed relatively shallow sequencing and transcriptomic characterization of 96 individual neurons, including 14 VC sensory neurons (see Object 5-1. Supplemental Table M2).

Object 5-2. Primary Data Master Tables. Table M2. (Master Data Table 2) Shallow Sequencing of Single Identified Neurons with Annotated TPMs. This table contains the TPM expression values and bioinformatic annotation data of all genes from the 96 single neurons on which we performed shallow-coverage RNA-seq.

We investigated three components of identified neural circuits (sensory, inter- and motor neurons) and quantified trends in gene abundances using external RNA spike-ins. This portion of our work should aid future investigations of expressed genetic features and transcriptomic profiles of identified single neurons.


Confirmation of Previous Findings and Novel Genes Related to cAMP-Induced Plasticity

The experimental tractability of the Aplysia CNS enabled some of the initial investigations into the neural mechanisms underlying learning and memory. The cAMP signaling pathway was found to be at the heart of the cellular mechanisms of a quantifiable type of behavioral learning – sensitization. In place of behavioral learning procedures, we exposed the central nervous system of A. californica to 8-bromo cAMP (8-Br cAMP), a membrane-permeable derivative of cAMP.

We treated the central ring ganglia with 8-Br cAMP over the course of two hours, to determine which genes were changed by activation of the cAMP signaling pathway.

We sought to identify candidate plasticity genes that could regulate the cellular and molecular changes that potentially refashion the neural networks dictating behavior257.

In addition to the well-recognized CREB1, CREB2, and C/EBP transcription factors, we identified and evaluated the expression of other genes associated with sensitization learning (or 5-HT treatment, an in vitro analog), including TBL-1, EGR, calmodulin, Ap-Uch, and a reductase-related transcript. In addition, we identified multiple transcripts that may regulate or play a role in cAMP-related signaling (see Table M3). These include chromatin remodeling and histone deacetylase encoding transcripts, cytoskeletal subunits (including actin, α- and β-tubulins, and intermediate filaments), transcripts encoding synaptic proteins, ion channels, monoamine transporters, glutathione peroxidase, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), Ras/Rho signaling GTPases, multiple neuropeptides, iron metabolism transcripts such as soma ferritin, and a Shal-like potassium channel. In summary, our primary findings in the investigation of ganglionic plasticity are: 1) validation of cAMP pathway activation by 8-Br cAMP treatment and its similarity to behavioral sensitization, 2) confirmation of differential regulation of genes described as regulated by long-term sensitization training or the in vitro application of 5-HT to neurons mediating the defensive withdrawal reflex modifiable by sensitization, 3) discovery of novel genes previously unknown to be involved in or regulated by cAMP-signaling pathways, including genes that have not yet been experimentally characterized (only computationally predicted), and 4) characterization of Aplysia genes that are ganglia-specific and differentially expressed over the time course of 30 minutes to 2 hours of 8-Br cAMP treatment of Aplysia ganglia. This study advances our knowledge of the genetic repertoire of the components of a vital signaling cascade and an important model for the cellular biology of memory storage. We have created a schematic diagram, shown in Figure 5-1 (based on a diagram from Byrne and Hawkins51), that indicates the key genes we discovered (as well as those previously known from the literature) to be affected by 8-Br cAMP-mediated activation of the cAMP signaling pathway in Aplysia. We hope this updated model of Aplysia cAMP signaling will benefit future research in this field.

Future Directions

Next we will turn our focus to two future areas of inquiry. First, we will perform in situ hybridization experiments to localize the expression of novel and differentially expressed genes from single neurons as well as 8-Br cAMP treated ganglia. This will allow us to validate our RNA-seq results and give us a “fine-grain” resolution of the expression of our transcripts of interest in individual cells. We will also perform in situ hybridization experiments using central nervous systems treated with 8-Br cAMP to verify our RNA-seq conclusions, as well as identify and clarify any expression paradoxes.


Our second area of further study will be to repeat the whole-ganglia experiment using the drug 8-Br cGMP in place of 8-Br cAMP, activating the cellular cGMP pathway as we did the cellular cAMP pathway. Activation of the cGMP pathway is analogous to aspects of behavioral classical conditioning, which has been demonstrated in A. californica. It would be insightful to examine the similarities and differences in gene expression between the ganglia when the cGMP pathway is activated.

Final Comments

Overall, we have demonstrated the use of next-generation high-throughput RNA-sequencing both to quantify the transcriptomic profiles of individual neurons and to characterize the ganglionic plasticity exhibited when a key cellular signaling pathway is invoked. We validate previous findings and broaden the set of novel genes, not yet experimentally characterized, with potential roles in a chemical analog paradigm of behavioral sensitization.


Figure 5-1. Model of pre- and post-synaptic neurons indicating genes affected by activation of cAMP signaling pathway in Aplysia.


APPENDIX A LIST OF SUPPLEMENTARY TABLES

Primary Data Master Tables

Table M1. (Master Data Table 1) Deep Sequencing of Single Identified Neurons with Annotated TPMs. This table contains the TPM expression values and bioinformatic annotation data of all genes from the 23 single neurons on which we performed deep-coverage RNA-seq.

Table M2. (Master Data Table 2) Shallow Sequencing of Single Identified Neurons with Annotated TPMs. This table contains the TPM expression values and bioinformatic annotation data of all genes from the 96 single neurons on which we performed shallow-coverage RNA-seq.

Table M3. (Master Data Table 3) Individual Ganglia treated with 8-Br cAMP with Annotated TPMs. This table contains the TPM expression values and annotation data of all genes from the 118 ganglia on which we performed RNA-seq after plasticity testing (incubation in 8-Br cAMP or FSW for 0.5, 1, or 2 hours, or instant lysis; also see Chapter 2 - Methods).

Tables of Differentially Expressed Genes in Single Identified Neurons

These 16 tables accompany Table M1 (Master Data Table 1). The Gene IDs given in Table_3-S16_R2_vs_single_neurons_DE_gene_list.xlsx can be cross-referenced with the gene names and annotation information found in Table M1. Tables 3-S1-S15 (15 tables) contain statistics derived from DESeq2 analysis of differentially expressed genes in identified single neurons. The columns containing these statistics are:

Column Header    Description
baseMean         intermediate mean of normalized counts for all samples
log2FoldChange   log2 fold change (LFC)
lfcSE            standard error estimate for the log2 fold change estimate
stat             Wald statistic (the LFC divided by its standard error)
pvalue           p-value calculated prior to Benjamini-Hochberg false discovery rate (FDR) correction
padj             adjusted p-value after FDR correction

Note that for the log2FoldChange, a positive value indicates the gene is more highly expressed in the cell listed first in the table name (for example - R2 in R2vL7). A negative value indicates the gene is more highly expressed in the cell listed second (in this case, L7).
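The relationship between the stat, pvalue, and padj columns can be sketched in a few lines. The code below computes a Wald statistic as the LFC divided by its standard error and applies the Benjamini-Hochberg step-up adjustment to a list of raw p-values. The input numbers and function names are hypothetical. DESeq2 additionally applies independent filtering before adjustment, which this sketch does not attempt to reproduce.

```python
def wald_stat(lfc, lfc_se):
    """Wald statistic: the log2 fold change divided by its standard error."""
    return lfc / lfc_se


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is
    # no larger than the adjusted values of all larger raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw_p = [0.01, 0.04, 0.03, 0.20]
padj = benjamini_hochberg(raw_p)
print([round(p, 4) for p in padj])
print(wald_stat(2.0, 0.5))
```

Genes are typically called differentially expressed when padj falls below a chosen FDR threshold, so a gene with a small raw p-value can still fail the cutoff when many genes are tested.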

Table 3-S1 R2vL7. This table contains statistics and annotations corresponding to the list of differentially expressed genes between R2 and L7 neurons.


Table 3-S2 R2vLPl1. This table contains statistics and annotations corresponding to the list of differentially expressed genes between R2 and LPl1 neurons.

Table 3-S3 R2vL11. This table contains statistics and annotations corresponding to the list of differentially expressed genes between R2 and L11 neurons.

Table 3-S4 R2vlMCC. This table contains statistics and annotations corresponding to the list of differentially expressed genes between R2 and lMCC neurons.

Table 3-S5 R2vrMCC. This table contains statistics and annotations corresponding to the list of differentially expressed genes between R2 and rMCC neurons.

Table 3-S6 L7vLPl1. This table contains statistics and annotations corresponding to the list of differentially expressed genes between L7 and LPl1 neurons.

Table 3-S7 L7vL11. This table contains statistics and annotations corresponding to the list of differentially expressed genes between L7 and L11 neurons.

Table 3-S8 L7vlMCC. This table contains statistics and annotations corresponding to the list of differentially expressed genes between L7 and lMCC neurons.

Table 3-S9 L7vrMCC. This table contains statistics and annotations corresponding to the list of differentially expressed genes between L7 and rMCC neurons.

Table 3-S10 LPl1vL11. This table contains statistics and annotations corresponding to the list of differentially expressed genes between LPl1 and L11 neurons.

Table 3-S11 LPl1vlMCC. This table contains statistics and annotations corresponding to the list of differentially expressed genes between LPl1 and lMCC neurons.

Table 3-S12 LPl1vrMCC. This table contains statistics and annotations corresponding to the list of differentially expressed genes between LPl1 and rMCC neurons.

Table 3-S13 L11vlMCC. This table contains statistics and annotations corresponding to the list of differentially expressed genes between L11 and lMCC neurons.

Table 3-S14 L11vrMCC. This table contains statistics and annotations corresponding to the list of differentially expressed genes between L11 and rMCC neurons.

Table 3-S15 lMCCvrMCC. This table contains statistics and annotations corresponding to the list of differentially expressed genes between lMCC and rMCC neurons.

Table 3-S16 R2 vs single neurons DE gene list. This table contains the lists of genes corresponding to all the areas of the Venn diagram in Figure 3-5. These lists are the significantly differentially expressed (DE) genes commonly shared by the other single neurons (L7, LPl1, L11, lMCC, and rMCC) when compared to R2. For example, the first sheet of the table lists the four genes (and their Entrez IDs) that are differentially expressed between R2 vs L7, R2 vs LPl1, R2 vs L11, R2 vs lMCC, and R2 vs rMCC. This corresponds to the common central area in the middle of Figure 3-5 that is overlapped by all cells (labelled 4 to indicate the four genes that are commonly differentially expressed).

Tables of Differentially Expressed Genes in Individual Ganglia of Aplysia following 8-Br cAMP treatments (for 0.5, 1, and 2 hours)

These 27 tables accompany the data found in Table M3 (Master Data Table 3). Table M3 provides the gene names and annotation information corresponding to the GeneIDs and RefSeq IDs given in three tables: Table 4-S7 Ganglia specific gene list.xlsx, Table 4-S26 AG cAMP DE gene lists.xlsx, and Table 4-S27 RPlG cAMP DE gene lists.xlsx. Tables 4-S1-S6 and S8-S23 (22 tables) contain statistics derived from DESeq2 analysis of differentially expressed genes in individual ganglia. The columns containing these statistics are:

Column Header     Description
baseMean          intermediate mean of normalized counts for all samples
log2FoldChange    log2 fold change (LFC)
lfcSE             standard error estimate for the log2 fold change estimate
stat              Wald statistic (the LFC divided by its standard error)
pvalue            p-value calculated prior to Benjamini-Hochberg false discovery rate (FDR) correction
padj              adjusted p-value after FDR correction

Note that for the log2FoldChange, a positive value indicates the gene is more highly expressed in the ganglia treated with 8-Br cAMP compared to ganglia incubated in FSW. A negative value indicates the inverse: the gene is more highly expressed in ganglia incubated in FSW compared to ganglia incubated in 8-Br cAMP.
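The relationship among the stat, pvalue, and padj columns can be illustrated with a minimal Python sketch. It assumes a two-sided Wald test against a standard normal distribution and a plain Benjamini-Hochberg adjustment. The function names and the example numbers are hypothetical, and the sketch omits steps that DESeq2 itself may apply (such as independent filtering), so it is not a reimplementation of the package.

```python
import math


def wald_p_value(lfc, lfc_se):
    """Return the Wald statistic (LFC / SE) and its two-sided normal p-value."""
    stat = lfc / lfc_se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical (gene, log2FoldChange, lfcSE) rows, for illustration only.
rows = [("geneA", 2.0, 0.5), ("geneB", -0.3, 0.4), ("geneC", 1.2, 0.4)]
stats_and_p = [wald_p_value(lfc, se) for _, lfc, se in rows]
padj = benjamini_hochberg([p for _, p in stats_and_p])
```

A positive stat accompanies a positive log2FoldChange, so the sign convention described in the note above carries through to the Wald statistic.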

Table 4-S1 cAMP 0.5hr AG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing abdominal ganglia incubated in 8-Br cAMP solution to those in FSW for 0.5 hours.

Table 4-S2 cAMP 1hr AG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing abdominal ganglia incubated in 8-Br cAMP solution to those in FSW for 1 hour.

Table 4-S3 cAMP 2hr AG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing abdominal ganglia incubated in 8-Br cAMP solution to those in FSW for 2 hours.

Table 4-S4 cAMP 0.5hr RPlG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing right pleural ganglia incubated in 8-Br cAMP solution to those in FSW for 0.5 hours.

Table 4-S5 cAMP 1hr RPlG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing right pleural ganglia incubated in 8-Br cAMP solution to those in FSW for 1 hour.

Table 4-S6 cAMP 2hr RPlG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing right pleural ganglia incubated in 8-Br cAMP solution to those in FSW for 2 hours.

Table 4-S7 Ganglia specific gene list. This table contains the lists of genes corresponding to all the areas of the Venn diagram in Figure 4-12. These lists are the genes that are unique to each ganglion. If more than one ganglion is listed on a sheet, they each express the listed genes. For these lists, expression was defined as any TPM value greater than 0.
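As an illustration of how such lists can be derived, the following Python sketch groups genes by the exact combination of ganglia in which their TPM exceeds the threshold. Each combination corresponds to one area of a Venn diagram. The function name, ganglion labels, and TPM values are hypothetical and are not taken from the dissertation data.

```python
from collections import defaultdict


def venn_regions(tpm_by_ganglion, threshold=0.0):
    """Group genes by the exact set of ganglia in which TPM > threshold."""
    all_genes = set()
    for tpm in tpm_by_ganglion.values():
        all_genes.update(tpm)

    regions = defaultdict(set)
    for gene in all_genes:
        expressed_in = frozenset(
            ganglion
            for ganglion, tpm in tpm_by_ganglion.items()
            if tpm.get(gene, 0.0) > threshold
        )
        # Genes not expressed anywhere belong to no Venn area.
        if expressed_in:
            regions[expressed_in].add(gene)
    return dict(regions)


# Hypothetical TPM values for two ganglia.
example = {
    "AG": {"gene1": 5.0, "gene2": 0.0, "gene3": 1.2},
    "BG": {"gene1": 0.3, "gene2": 2.0},
}
areas = venn_regions(example)
```

Keying each area by a frozenset of ganglion names mirrors the sheet layout described above: a key with one name holds genes unique to that ganglion, and a key with several names holds genes expressed in each of them.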

Table 4-S8 cAMP 0.5hr BG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing buccal ganglia incubated in 8-Br cAMP solution to those in FSW for 0.5 hours.

Table 4-S9 cAMP 1hr BG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing buccal ganglia incubated in 8-Br cAMP solution to those in FSW for 1 hour.

Table 4-S10 cAMP 2hr BG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing buccal ganglia incubated in 8-Br cAMP solution to those in FSW for 2 hours.

Table 4-S11 cAMP 0.5hr CG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing cerebral ganglia incubated in 8-Br cAMP solution to those in FSW for 0.5 hours.

Table 4-S12 cAMP 1hr CG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing cerebral ganglia incubated in 8-Br cAMP solution to those in FSW for 1 hour.

Table 4-S13 cAMP 2hr CG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing cerebral ganglia incubated in 8-Br cAMP solution to those in FSW for 2 hours.

Table 4-S14 cAMP 0.5hr LPeG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing left pedal ganglia incubated in 8-Br cAMP solution for 0.5 hours to those that were isolated from the animal and instantly placed in RNA lysis buffer.

Table 4-S15 cAMP 1hr LPeG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing left pedal ganglia incubated in 8-Br cAMP solution for 1 hour to those that were isolated from the animal and instantly placed in RNA lysis buffer.

Table 4-S16 cAMP 2hr LPeG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing left pedal ganglia incubated in 8-Br cAMP solution for 2 hours to those that were isolated from the animal and instantly placed in RNA lysis buffer.

Table 4-S17 cAMP 0.5hr RPeG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing right pedal ganglia incubated in 8-Br cAMP solution for 0.5 hours to those that were isolated from the animal and instantly placed in RNA lysis buffer.

Table 4-S18 cAMP 1hr RPeG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing right pedal ganglia incubated in 8-Br cAMP solution for 1 hour to those that were isolated from the animal and instantly placed in RNA lysis buffer.

Table 4-S19 cAMP 2hr RPeG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing right pedal ganglia incubated in 8-Br cAMP solution for 2 hours to those that were isolated from the animal and instantly placed in RNA lysis buffer.

Table 4-S20 LPeGvRPeG IL. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing left pedal ganglia that were isolated from the animal and instantly placed in RNA lysis buffer to right pedal ganglia that were isolated from the animal and instantly placed in RNA lysis buffer.

Table 4-S21 cAMP 0.5hr LPeGvRPeG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing left pedal ganglia incubated in 8-Br cAMP solution for 0.5 hours to right pedal ganglia that were incubated in 8-Br cAMP solution for 0.5 hours.

Table 4-S22 cAMP 1hr LPeGvRPeG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing left pedal ganglia incubated in 8-Br cAMP solution for 1 hour to right pedal ganglia that were incubated in 8-Br cAMP solution for 1 hour.

Table 4-S23 cAMP 2hr LPeGvRPeG. This table contains statistics and annotations corresponding to the list of differentially expressed genes when comparing left pedal ganglia incubated in 8-Br cAMP solution for 2 hours to right pedal ganglia that were incubated in 8-Br cAMP solution for 2 hours.

Table 4-S24 cAMP DE gene clusters AG. This table contains lists of genes and annotations separated by cluster (5 clusters for abdominal ganglia) that were detected as differentially expressed by maSigPro. Corresponds to genes plotted in Figure 4-15.

Table 4-S25 cAMP DE gene clusters RPlG. This table contains lists of genes and annotations separated by cluster (7 clusters for right pleural ganglia) that were detected as differentially expressed by maSigPro. Corresponds to genes plotted in Figure 4-16.

Table 4-S26 AG cAMP DE gene lists. This table contains the lists of genes corresponding to all the areas of the Venn diagram in Figure 4-8. These lists are the significantly differentially expressed (DE) genes from the 8-Br cAMP treated abdominal ganglia for the 0.5, 1, and 2-hour time points.

Table 4-S27 RPlG cAMP DE gene lists. This table contains the lists of genes corresponding to all the areas of the Venn diagram in Figure 4-9. These lists are the significantly differentially expressed (DE) genes from the 8-Br cAMP treated right pleural ganglia for the 0.5, 1, and 2-hour time points.

APPENDIX B METHOD SCHEMATIC

Figure B-1. Workflow for Molecular Biology and Bioinformatic Analysis: This schematic shows the molecular biology techniques and bioinformatic software programs used to determine differential gene expression.

APPENDIX C SUPPLEMENTAL FIGURES FOR SINGLE NEURON GENE ANALYSIS

Figure C-1. Unstacked bar charts corresponding to stacked bars of Figure 3-2. The highest amount of variability in the TPM expression categories exists in the least abundantly expressed genes. This is due to the random nature of detecting rare genes in a transcriptome.

Figure C-2. Absolute number of mRNA molecules estimated in single neurons using external RNA spike-ins for calibration. Red circles indicate L11 neurons, blue circles indicate L7 neurons, and green circles indicate R2 neurons. The y-axis is the number of molecules plotted on a log10 scale. For example, the 9 indicates 1,000,000,000 (nine zeros) RNA molecules. The blue horizontal line indicates that 0.85 billion molecules were detected in L7_3, the red line indicates that 4.5 billion molecules were detected in L11_1, and the green line represents the 41 billion RNA molecules detected in R2_6. The black circles represent cells of other sizes isolated from the Aplysia abdominal ganglion.
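A rough sketch of spike-in calibration is given below in Python. It assumes the simplest possible model, in which the ratio of spike-in molecules added to spike-in reads recovered is applied uniformly to endogenous reads. It ignores transcript length and capture efficiency, and the function name and all numbers are hypothetical rather than the procedure or values used for this figure.

```python
import math


def estimate_molecules(endogenous_reads, spike_reads, spike_molecules_added):
    """Scale endogenous reads by the molecules-per-read ratio of the spike-ins."""
    if spike_reads <= 0:
        raise ValueError("spike_reads must be positive")
    molecules_per_read = spike_molecules_added / spike_reads
    return endogenous_reads * molecules_per_read


# Hypothetical cell: 20 million endogenous reads, 1 million spike-in reads,
# and 50 million spike-in molecules added before library construction.
molecules = estimate_molecules(
    endogenous_reads=20_000_000,
    spike_reads=1_000_000,
    spike_molecules_added=50_000_000,
)
# Position of this cell on a log10 axis such as the one in the figure.
log10_molecules = math.log10(molecules)
```

With these made-up inputs the estimate is one billion molecules, which would sit at 9 on a log10 axis.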

Figure C-3. Histograms of the log10-transformed absolute numbers of ion channel-related transcripts for deeply sequenced neurons with RNA spike-ins. The eight colored bars represent neurons discussed in the text: L11 – red, L7 – blue, R2 – green. The other neurons are cells isolated from the Aplysia abdominal ganglion.

Figure C-4. The percentages of the absolute number of ion channel transcripts present in eight of the 23 individual neurons that received external RNA spike-ins. About 0.07% of L7_3 molecules are annotated as ion channels. The chart shows that a higher proportion of the expressed cellular transcriptome of L7 neurons corresponds to ion channel-related molecules compared to L11 or R2 neurons.

Figure C-5. Global comparison of the effects of several common data transformations on our scRNA-seq count data. The red line in A displays the variance trend for transcripts. The left portion of the line has a distinct hump, indicating that the variance is greater for genes with low read counts than for those with greater read counts. This means the variance is a function of the mean, and the data are better described as heteroscedastic. To reduce the amount of heteroscedasticity, we employed DESeq2 to shrink the variance of genes with low read counts. This is done using the dispersion-mean trend observed for the entire dataset as a reference. DESeq2's vst function returns values that are normalized for sequencing depth and (as displayed in D) adjusted to fit the experiment-wide trend of the variance-mean relationship.

Figure C-6. Modification of Figure 3-1. Saturation curves for single neurons L11_1, L11_2, L7_5, and R2_5 using a gene expression threshold of 10 transcripts per million (TPM) or greater. The first things to notice are the differences in the axes compared to Figure 3-1. These neurons only express about 4000-5000 transcripts at a threshold of > 10 TPMs. Additionally, the x-axis is reduced by a factor of 10 compared to Figure 3-1 (millions to hundreds of thousands of reads). This is because when we randomly sample only ~2% of the total number of reads, we already detect the transcripts present at a threshold of > 10 TPMs. The saturation curves rapidly level off and oscillate by a few hundred genes since we are taking small random samples of the total number of reads per single neuron.

Figure C-7. Saturation curves for single neurons L11_1, L11_2, L7_5, and R2_5 using a gene expression threshold of 100 transcripts per million (TPM) or greater. The first things to notice are the differences in the scale of the axes compared to Figure 3-1. These neurons only express about 800-1250 transcripts at a threshold of > 100 TPMs. Additionally, the x-axis is reduced by a factor of 10 compared to Figure 3-1 (millions to hundreds of thousands of reads). This is because when we randomly sample only ~1-2% of the total number of reads, we already detect the transcripts present at a threshold of > 100 TPMs. The saturation curves level off and oscillate by a few dozen genes since we are taking small random samples of the total number of reads per single neuron. The curves decrease after the 1% random read sample because, when we double and triple (to 2%, 3%, and so forth) the number of randomly sampled reads, the TPMs of the detected genes decrease slightly, which is sufficient (along with random sampling chance) to slightly reduce the number of genes exceeding the > 100 TPM expression threshold.
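The subsampling procedure behind these saturation curves can be sketched in Python as follows. The sketch assumes each read has already been assigned to a gene and computes TPM from read counts and gene lengths. The function names, the toy gene lengths, and the thresholds are hypothetical and are not the pipeline used in this dissertation.

```python
import random
from collections import Counter


def count_genes_above_tpm(read_genes, gene_lengths_kb, threshold):
    """Count genes whose TPM, computed from the given reads, exceeds threshold."""
    counts = Counter(read_genes)
    # Reads per kilobase for each detected gene.
    rates = {gene: n / gene_lengths_kb[gene] for gene, n in counts.items()}
    total = sum(rates.values())
    return sum(1 for rate in rates.values() if rate / total * 1e6 > threshold)


def saturation_curve(read_genes, gene_lengths_kb, fractions, threshold, seed=0):
    """Return (reads sampled, genes above threshold) for each sampling fraction."""
    rng = random.Random(seed)
    curve = []
    for fraction in fractions:
        n_reads = max(1, round(fraction * len(read_genes)))
        # Sample reads without replacement, independently for every fraction.
        sample = rng.sample(read_genes, n_reads)
        curve.append((n_reads, count_genes_above_tpm(sample, gene_lengths_kb, threshold)))
    return curve


# Toy illustration of a dip: detecting a third gene lowers every gene's TPM.
lengths = {"a": 1.0, "b": 1.0, "c": 1.0}
two_genes = count_genes_above_tpm(["a", "b"], lengths, threshold=400_000)
three_genes = count_genes_above_tpm(["a", "b", "c"], lengths, threshold=400_000)
```

The last lines illustrate why a curve can dip: TPM values always sum to one million, so detecting an additional gene lowers the share of every other gene and can push borderline genes below the threshold.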

LIST OF REFERENCES

1 Benjamini, Y. & Hochberg, Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57, 289-300 (1995).

2 Moroz, L. L. NeuroSystematics and Periodic System of Neurons: Model vs Reference Species at Single-Cell Resolution. ACS Chem Neurosci 9, 1884-1903 (2018).

3 Hodgkin, A. L. & Huxley, A. F. Currents carried by sodium and potassium ions through the membrane of the giant axon of Loligo. The Journal of Physiology 116, 449-472 (1952).

4 Katz, P. S. & Quinlan, P. D. The importance of identified neurons in gastropod molluscs to neuroscience. Current Opinion in Neurobiology 56, 1-7 (2019).

5 Gerschenfeld, H. M. Observations on the ultrastructure of synapses in some pulmonate molluscs. Zeitschrift für Zellforschung und Mikroskopische Anatomie 60, 258-275 (1963).

6 Sakharov, D. A., Borovyagin, V. L. & Zs.-Nagy, I. Light, fluorescence and electron microscopic studies on “neurosecretion” in Tritonia diomedia Bergh (Mollusca, Nudibranchia). Zeitschrift für Zellforschung und Mikroskopische Anatomie 68, 660-673 (1965).

7 Kerkut, G. A. & Cottrell, G. A. Acetylcholine and 5-hydroxytryptamine in the snail brain. Comparative Biochemistry and Physiology 8, 53-63 (1963).

8 Kandel, E. R. & Tauc, L. Prolonged increase in the efficiency of an efferent pathway of an isolated ganglion after the coupled activation of a more effective tract. J. Physiol. (Paris) 55, 271 (1963).

9 Willows, A. O. D. Behavioral Acts Elicited by Stimulation of Single, Identifiable Brain Cells. Science 157, 570 (1967).

10 Moroz, L. L. Aplysia. Curr Biol 21, R60-61 (2011).

11 Carew, T. J., Pinsker, H. M. & Kandel, E. R. Long-Term Habituation of a Defensive Withdrawal Reflex in Aplysia. Science 175, 451 (1972).

12 Pinsker, H. M., Hening, W. A., Carew, T. J. & Kandel, E. R. Long-Term Sensitization of a Defensive Withdrawal Reflex in Aplysia. Science 182, 1039 (1973).

13 Pinsker, H., Kupfermann, I., Castellucci, V. & Kandel, E. Habituation and Dishabituation of the Gill-Withdrawal Reflex in Aplysia. Science 167, 1740 (1970).

14 Hawkins, R. D., Kandel, E. R. & Bailey, C. H. Molecular Mechanisms of Memory Storage in Aplysia. The Biological Bulletin 210, 174-191 (2006).

15 Kandel, E. R. The molecular biology of memory storage: a dialogue between genes and synapses. Science (New York, N.Y.) 294, 1030-1038 (2001).

16 Cleary, L. J., Byrne, J. H. & Frost, W. N. Role of interneurons in defensive withdrawal reflexes in Aplysia. Learning & Memory 2, 133-151 (1995).

17 Cowan, N. in Progress in Brain Research Vol. 169 (eds Wayne S. Sossin, Jean-Claude Lacaille, Vincent F. Castellucci, & Sylvie Belleville) 323-338 (Elsevier, 2008).

18 Chen, G., Ning, B. & Shi, T. Single-Cell RNA-Seq Technologies and Related Computational Data Analysis. Frontiers in Genetics 10, 317 (2019).

19 Haque, A., Engel, J., Teichmann, S. A. & Lönnberg, T. A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications. Genome Medicine 9, 75 (2017).

20 Colonnier, M., Tremblay, J. P. & McLennan, H. Synaptic contacts on glial cells in the abdominal ganglion of Aplysia californica. Journal of Comparative Neurology 188, 391-400 (1979).

21 Lockhart, S. T., Levitan, I. B. & Pikielny, C. W. Ag, a novel protein secreted from Aplysia glia. J Neurobiol 29, 35-48 (1996).

22 Walters, E. T., Bodnarova, M., Billy, A. J., Dulin, M. F., Díaz-Ríos, M., Miller, M. W. & Moroz, L. L. Somatotopic organization and functional properties of mechanosensory neurons expressing sensorin-A mRNA in Aplysia californica. J Comp Neurol 471, 219-240 (2004).

23 Jezzini, S. H., Bodnarova, M. & Moroz, L. L. Two-color in-situ hybridization in the CNS of Aplysia californica. J Neurosci Methods 149, 15-25 (2005).

24 Moroz, L. L., Edwards, J. R., Puthanveettil, S. V., Kohn, A. B., Ha, T., Heyland, A., Knudsen, B., Sahni, A., Yu, F., Liu, L., Jezzini, S., Lovell, P., Iannucculli, W., Chen, M., Nguyen, T., Sheng, H., Shaw, R., Kalachikov, S., Panchin, Y. V., Farmerie, W., Russo, J. J., Ju, J. & Kandel, E. R. Neuronal transcriptome of Aplysia: neuronal compartments and circuitry. Cell 127, 1453-1467 (2006).

25 Antonov, I., Ha, T., Antonova, I., Moroz, L. L. & Hawkins, R. D. Role of Nitric Oxide in Classical Conditioning of Siphon Withdrawal in Aplysia. The Journal of Neuroscience 27, 10993 (2007).

26 Cleary, L. J., Lee, W. L. & Byrne, J. H. Cellular correlates of long-term sensitization in Aplysia. (1998).

27 Rosen, S. C., Weiss, K. R., Goldstein, R. S. & Kupfermann, I. The role of a modulatory neuron in feeding and satiation in Aplysia: effects of lesioning of the serotonergic metacerebral cells. The Journal of Neuroscience 9, 1562 (1989).

28 Eisenstadt, M., Goldman, J. E., Kandel, E. R., Koike, H., Koester, J. & Schwartz, J. H. Intrasomatic Injection of Radioactive Precursors for Studying Transmitter Synthesis in Identified Neurons of Aplysia californica. Proceedings of the National Academy of Sciences 70, 3371 (1973).

29 Weinreich, D., McCaman, M. W., McCaman, R. E. & Vaughn, J. E. Chemical, enzymatic and ultrastructural characterization of 5-hydroxytryptamine-containing neurons from the ganglia of Aplysia californica and Tritonia diomedia. Journal of Neurochemistry 20, 969-976 (1973).

30 Weiss, K. R. & Kupfermann, I. Homology of the giant serotonergic neurons (metacerebral cells) in Aplysia and pulmonate molluscs. Brain Research 117, 33-49 (1976).

31 Gillette, R. & Davis, W. J. The role of the metacerebral giant neuron in the feeding behavior of Pleurobranchaea. Journal of Comparative Physiology 116, 129-159 (1977).

32 Rosen, S. C., Weiss, K. R., Cohen, J. L. & Kupfermann, I. Interganglionic cerebral-buccal mechanoafferents of Aplysia: receptive fields and synaptic connections to different classes of neurons involved in feeding behavior. Journal of Neurophysiology 48, 271-288 (1982).

33 Walters, E. T., Byrne, J. H., Carew, T. J. & Kandel, E. R. Mechanoafferent neurons innervating tail of Aplysia. I. Response properties and synaptic connections. Journal of Neurophysiology 50, 1522-1542 (1983).

34 Carew, T. J., Walters, E. T. & Kandel, E. R. Associative learning in Aplysia: cellular correlates supporting a conditioned fear hypothesis. Science 211, 501 (1981).

35 Hening, W. A., Walters, E. T., Carew, T. J. & Kandel, E. R. Motorneuronal control of locomotion in Aplysia. Brain Research 179, 231-253 (1979).

36 Carew, T. J., Hawkins, R. D. & Kandel, E. R. Differential classical conditioning of a defensive withdrawal reflex in Aplysia californica. Science 219, 397 (1983).

37 Carew, T. J., Walters, E. T. & Kandel, E. R. Classical conditioning in a simple withdrawal reflex in Aplysia californica. The Journal of Neuroscience 1, 1426 (1981).

38 Frazier, W. T., Kandel, E. R., Kupfermann, I., Waziri, R. & Coggeshall, R. E. Morphological and functional properties of identified neurons in the abdominal ganglion of Aplysia californica. Journal of Neurophysiology 30, 1288-1351 (1967).

39 Kandel, E. R. & Schwartz, J. H. Molecular biology of learning: modulation of transmitter release. Science 218, 433 (1982).

40 Frost, W. N., Castellucci, V. F., Hawkins, R. D. & Kandel, E. R. Monosynaptic connections made by the sensory neurons of the gill- and siphon-withdrawal reflex in Aplysia participate in the storage of long-term memory for sensitization. Proceedings of the National Academy of Sciences 82, 8266 (1985).

41 Gillette, R. On the Significance of Neuronal Giantism in Gastropods. The Biological Bulletin 180, 234-240 (1991).

42 Moroz, L. & Kohn, A. Do different neurons age differently? Direct genome-wide analysis of aging in single identified cholinergic neurons. Frontiers in Aging Neuroscience 2, 6 (2010).

43 Rayport, S. G., Ambron, R. T. & Babiarz, J. Identified cholinergic neurons R2 and LPl1 control mucus release in Aplysia. (1983).

44 Tauc, L. Site of Origin and Propagation of Spike in the Giant Neuron of Aplysia. The Journal of General Physiology 45, 1077 (1962).

45 Schwartz, J. H. Axonal Transport: Components, Mechanisms, and Specificity. Annual Review of Neuroscience 2, 467-504 (1979).

46 Geduldig, D. & Junge, D. Sodium and calcium components of action potentials in Aplysia giant neurone. The Journal of Physiology 199, 347-365 (1968).

47 Horn, R. Propagating calcium spikes in an axon of Aplysia. The Journal of Physiology 281, 513-534 (1978).

48 Giller, E. & Schwartz, J. H. Choline acetyltransferase in identified neurons of abdominal ganglion of Aplysia californica. Journal of Neurophysiology 34, 93-107 (1971).

49 Koike, H., Eisenstadt, M. & Schwartz, J. H. Axonal transport of newly synthesized acetylcholine in an identified neuron of Aplysia. Brain Research 37, 152-159 (1972).

50 Kandel, E. R. & Tauc, L. Mechanism of heterosynaptic facilitation in the giant cell of the abdominal ganglion of Aplysia depilans. The Journal of Physiology 181, 28-47 (1965).

51 Byrne, J. H. & Hawkins, R. D. Nonassociative Learning in Invertebrates. Cold Spring Harbor Perspectives in Biology 7 (2015).

52 Brunelli, M., Castellucci, V. & Kandel, E. R. Synaptic facilitation and behavioral sensitization in Aplysia: possible role of serotonin and cyclic AMP. Science 194, 1178 (1976).

53 Walters, E. T., Byrne, J. H., Carew, T. J. & Kandel, E. R. Mechanoafferent neurons innervating tail of Aplysia. II. Modulation by sensitizing stimulation. Journal of Neurophysiology 50, 1543-1559 (1983).

54 Glanzman, D. L., Mackey, S. L., Hawkins, R. D., Dyke, A. M., Lloyd, P. E. & Kandel, E. R. Depletion of serotonin in the nervous system of Aplysia reduces the behavioral enhancement of gill withdrawal as well as the heterosynaptic facilitation produced by tail shock. The Journal of Neuroscience 9, 4200 (1989).

55 Byrne, J. H. & Kandel, E. R. Presynaptic facilitation revisited: state and time dependence. The Journal of Neuroscience 16, 425 (1996).

56 Jiang, L., Schlesinger, F., Davis, C. A., Zhang, Y., Li, R., Salit, M., Gingeras, T. R. & Oliver, B. Synthetic spike-in standards for RNA-seq experiments. Genome Res 21, 1543-1551 (2011).

57 Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120 (2014).

58 Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M. & Gingeras, T. R. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15-21 (2013).

59 Liao, Y., Smyth, G. K. & Shi, W. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Res 41, e108 (2013).

60 Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26 (2010).

61 Trapnell, C., Williams, B. A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M. J., Salzberg, S. L., Wold, B. J. & Pachter, L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28, 511-515 (2010).

62 Wagner, G. P., Kin, K. & Lynch, V. J. Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory Biosci 131, 281-285 (2012).

63 Mortazavi, A., Williams, B., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature methods 5, 621-628 (2008).

64 Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550 (2014).

65 Huber, W., Carey, V. J., Gentleman, R., Anders, S., Carlson, M., Carvalho, B. S., Bravo, H. C., Davis, S., Gatto, L., Girke, T., Gottardo, R., Hahne, F., Hansen, K. D., Irizarry, R. A., Lawrence, M., Love, M. I., MacDonald, J., Obenchain, V., Oles, A. K., Pages, H., Reyes, A., Shannon, P., Smyth, G. K., Tenenbaum, D., Waldron, L. & Morgan, M. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods 12, 115-121 (2015).

66 R Core Team. R: a language and environment for statistical computing. (2014).

67 Wickham, H. ggplot2: Elegant Graphics for Data Analysis. (Springer-Verlag, 2009).

68 Schurch, N. J., Schofield, P., Gierlinski, M., Cole, C., Sherstnev, A., Singh, V., Wrobel, N., Gharbi, K., Simpson, G. G., Owen-Hughes, T., Blaxter, M. & Barton, G. J. How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA 22, 839-851 (2016).

69 Anders, S., McCarthy, D. J., Chen, Y., Okoniewski, M., Smyth, G. K., Huber, W. & Robinson, M. D. Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nat Protoc 8, 1765-1786 (2013).

70 Lun, A. T. L., Calero-Nieto, F. J., Haim-Vilmovsky, L., Gottgens, B. & Marioni, J. C. Assessing the reliability of spike-in normalization for analyses of single-cell RNA sequencing data. Genome Res 27, 1795-1806 (2017).

71 AlJanahi, A. A., Danielsen, M. & Dunbar, C. E. An Introduction to the Analysis of Single-Cell RNA-Sequencing Data. Mol Ther Methods Clin Dev 10, 189-196 (2018).

72 Athanasiadou, R., Neymotin, B., Brandt, N., Wang, W., Christiaen, L., Gresham, D. & Tranchina, D. A complete statistical model for calibration of RNA-seq counts using external spike-ins and maximum likelihood theory. PLoS Comput Biol 15, e1006794 (2019).

73 Stegle, O., Teichmann, S. A. & Marioni, J. C. Computational and analytical challenges in single-cell transcriptomics. Nat Rev Genet 16, 133-145 (2015).

74 Schiffman, C., Lin, C., Shi, F., Chen, L., Sohn, L. & Huang, H. SIDEseq: A Cell Similarity Measure Defined by Shared Identified Differentially Expressed Genes for Single-Cell RNA sequencing Data. Stat Biosci 9, 200-216 (2017).

75 Bacher, R., Chu, L.-F., Leng, N., Gasch, A. P., Thomson, J. A., Stewart, R. M., Newton, M. & Kendziorski, C. SCnorm: robust normalization of single-cell RNA-seq data. Nature Methods 14, 584-586 (2017).

76 Bacher, R. & Kendziorski, C. Design and computational analysis of single-cell RNA-sequencing experiments. Genome Biol 17, 63 (2016).

77 Ellenbroek, B. & Youn, J. Rodent models in neuroscience research: is it a rat race? Dis Model Mech 9, 1079-1087 (2016).

78 Usoskin, D., Furlan, A., Islam, S., Abdo, H., Lonnerberg, P., Lou, D., Hjerling-Leffler, J., Haeggstrom, J., Kharchenko, O., Kharchenko, P. V., Linnarsson, S. & Ernfors, P. Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing. Nat Neurosci 18, 145-153 (2015).

79 Salzberg, S. L. Open questions: How many genes do we have? BMC Biol 16, 94 (2018).

80 Zeisel, A., Munoz-Manchado, A. B., Codeluppi, S., Lonnerberg, P., La Manno, G., Jureus, A., Marques, S., Munguba, H., He, L., Betsholtz, C., Rolny, C., Castelo-Branco, G., Hjerling-Leffler, J. & Linnarsson, S. Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science (New York, N.Y.) 347, 1138-1142 (2015).

81 Tasic, B., Yao, Z., Graybuck, L. T., Smith, K. A., Nguyen, T. N., Bertagnolli, D., Goldy, J., Garren, E., Economo, M. N., Viswanathan, S., Penn, O., Bakken, T., Menon, V., Miller, J., Fong, O., Hirokawa, K. E., Lathia, K., Rimorin, C., Tieu, M., Larsen, R., Casper, T., Barkan, E., Kroll, M., Parry, S., Shapovalova, N. V., Hirschstein, D., Pendergraft, J., Sullivan, H. A., Kim, T. K., Szafer, A., Dee, N., Groblewski, P., Wickersham, I., Cetin, A., Harris, J. A., Levi, B. P., Sunkin, S. M., Madisen, L., Daigle, T. L., Looger, L., Bernard, A., Phillips, J., Lein, E., Hawrylycz, M., Svoboda, K., Jones, A. R., Koch, C. & Zeng, H. Shared and distinct transcriptomic cell types across neocortical areas. Nature 563, 72-78 (2018).

82 Zhong, S., Zhang, S., Fan, X., Wu, Q., Yan, L., Dong, J., Zhang, H., Li, L., Sun, L., Pan, N., Xu, X., Tang, F., Zhang, J., Qiao, J. & Wang, X. A single-cell RNA-seq survey of the developmental landscape of the human prefrontal cortex. Nature 555, 524-528 (2018).

83 Koroshetz, W., Gordon, J., Adams, A., Beckel-Mitchener, A., Churchill, J., Farber, G., Freund, M., Gnadt, J., Hsu, N. S., Langhals, N., Lisanby, S., Liu, G., Peng, G. C. Y., Ramos, K., Steinmetz, M., Talley, E. & White, S. The State of the NIH BRAIN Initiative. J Neurosci 38, 6427-6438 (2018).

84 Ecker, J. R., Geschwind, D. H., Kriegstein, A. R., Ngai, J., Osten, P., Polioudakis, D., Regev, A., Sestan, N., Wickersham, I. R. & Zeng, H. The BRAIN Initiative Cell Census Consortium: Lessons Learned toward Generating a Comprehensive Brain Cell Atlas. Neuron 96, 542-557 (2017).

85 Croll, R. P. in Nervous Systems in Invertebrates (ed M. A. Ali) 41-59 (Springer US, 1987).

86 Tauc, L. [Studies on elementary activity of cells of the abdominal ganglion of Aplysia]. J. Physiol. (Paris) 47, 769-792 (1955).

87 Kupfermann, I. Feeding behavior in Aplysia: a simple system for the study of motivation. Behav Biol 10, 1-26 (1974).

88 Lasek, R. J. & Dower, W. J. Aplysia californica: Analysis of Nuclear DNA in Individual Nuclei of Giant Neurons. Science 172, 278 (1971).

89 Hinegardner, R. Cellular DNA content of the Mollusca. Comparative Biochemistry and Physiology Part A: Physiology 47, 447-460 (1974).

90 Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., Gage, D., Harris, K., Heaford, A., Howland, J., Kann, L., Lehoczky, J., LeVine, R., McEwan, P., McKernan, K., Meldrim, J., Mesirov, J. P., Miranda, C., Morris, W., Naylor, J., Raymond, C., Rosetti, M., Santos, R., Sheridan, A., Sougnez, C., Stange-Thomann, Y., Stojanovic, N., Subramanian, A., Wyman, D., Rogers, J., Sulston, J., Ainscough, R., Beck, S., Bentley, D., Burton, J., Clee, C., Carter, N., Coulson, A., Deadman, R., Deloukas, P., Dunham, A., Dunham, I., Durbin, R., French, L., Grafham, D., Gregory, S., Hubbard, T., Humphray, S., Hunt, A., Jones, M., Lloyd, C., McMurray, A., Matthews, L., Mercer, S., Milne, S., Mullikin, J. C., Mungall, A., Plumb, R., Ross, M., Shownkeen, R., Sims, S., Waterston, R. H., Wilson, R. K., Hillier, L. W., McPherson, J. D., Marra, M. A., Mardis, E. R., Fulton, L. A., Chinwalla, A. T., Pepin, K. H., Gish, W. R., Chissoe, S. L., Wendl, M. C., Delehaunty, K. D., Miner, T. L., Delehaunty, A., Kramer, J. B., Cook, L. L., Fulton, R. S., Johnson, D. L., Minx, P. J., Clifton, S. W., Hawkins, T., Branscomb, E., Predki, P., Richardson, P., Wenning, S., Slezak, T., Doggett, N., Cheng, J. F., Olsen, A., Lucas, S., Elkin, C., Uberbacher, E., Frazier, M., Gibbs, R. A., Muzny, D. M., Scherer, S. E., Bouck, J. B., Sodergren, E. J., Worley, K. C., Rives, C. M., Gorrell, J. H., Metzker, M. L., Naylor, S. L., Kucherlapati, R. S., Nelson, D. L., Weinstock, G. M., Sakaki, Y., Fujiyama, A., Hattori, M., Yada, T., Toyoda, A., Itoh, T., Kawagoe, C., Watanabe, H., Totoki, Y., Taylor, T., Weissenbach, J., Heilig, R., Saurin, W., Artiguenave, F., Brottier, P., Bruls, T., Pelletier, E., Robert, C., Wincker, P., Smith, D. R., Doucette-Stamm, L., Rubenfield, M., Weinstock, K., Lee, H. M., Dubois, J., Rosenthal, A., Platzer, M., Nyakatura, G., Taudien, S., Rump, A., Yang, H., Yu, J., Wang, J., Huang, G., Gu, J., Hood, L., Rowen, L., Madan, A., Qin, S., Davis, R. W., Federspiel, N. A., Abola, A. P., Proctor, M. J., Myers, R. M., Schmutz, J., Dickson, M., Grimwood, J., Cox, D. R., Olson, M. V., Kaul, R., Raymond, C., Shimizu, N., Kawasaki, K., Minoshima, S., Evans, G. A., Athanasiou, M., Schultz, R., Roe, B. A., Chen, F., Pan, H., Ramser, J., Lehrach, H., Reinhardt, R., McCombie, W. R., de la Bastide, M., Dedhia, N., Blöcker, H., Hornischer, K., Nordsiek, G., Agarwala, R., Aravind, L., Bailey, J. A., Bateman, A., Batzoglou, S., Birney, E., Bork, P., Brown, D. G., Burge, C. B., Cerutti, L., Chen, H. C., Church, D., Clamp, M., Copley, R. R., Doerks, T., Eddy, S. R., Eichler, E. E., Furey, T. S., Galagan, J., Gilbert, J. G., Harmon, C., Hayashizaki, Y., Haussler, D., Hermjakob, H., Hokamp, K., Jang, W., Johnson, L. S., Jones, T. A., Kasif, S., Kaspryzk, A., Kennedy, S., Kent, W. J., Kitts, P., Koonin, E. V., Korf, I., Kulp, D., Lancet, D., Lowe, T. M., McLysaght, A., Mikkelsen, T., Moran, J. V., Mulder, N., Pollara, V. J., Ponting, C. P., Schuler, G., Schultz, J., Slater, G., Smit, A. F., Stupka, E., Szustakowki, J., Thierry-Mieg, D., Thierry-Mieg, J., Wagner, L., Wallis, J., Wheeler, R., Williams, A., Wolf, Y. I., Wolfe, K. H., Yang, S. P., Yeh, R. F., Collins, F., Guyer, M. S., Peterson, J., Felsenfeld, A., Wetterstrand, K. A., Patrinos, A., Morgan, M. J., de Jong, P., Catanese, J. J., Osoegawa, K., Shizuya, H., Choi, S., Chen, Y. J., Szustakowki, J. & International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860-921 (2001).

91 Desler, C., Durhuus, J. A. & Rasmussen, L. J. in Functional Genomics: Methods and Protocols (eds Michael Kaufmann & Claudia Klinger) 25-38 (Springer New York, 2012).

92 Konc, J., Hodošček, M., Ogrizek, M., Trykowska Konc, J. & Janežič, D. Structure-Based Function Prediction of Uncharacterized Protein Using Binding Sites Comparison. PLoS Comput Biol 9, e1003341 (2013).

93 Ijaq, J., Chandrasekharan, M., Poddar, R., Bethi, N. & Sundararajan, V. S. Annotation and curation of uncharacterized proteins- challenges. Frontiers in Genetics 6, 119 (2015).

94 Louie, B., Tarczy-Hornoch, P., Higdon, R. & Kolker, E. Validating annotations for uncharacterized proteins in Shewanella oneidensis. OMICS 12, 211-215 (2008).

95 Bork, P. Powers and pitfalls in sequence analysis: the 70% hurdle. Genome Res 10, 398-400 (2000).

96 Blair, J. D., Hockemeyer, D., Doudna, J. A., Bateup, H. S. & Floor, S. N. Widespread Translational Remodeling during Human Neuronal Differentiation. Cell Rep 21, 2005-2016 (2017).

97 Ramsköld, D., Wang, E. T., Burge, C. B. & Sandberg, R. An Abundance of Ubiquitously Expressed Genes Revealed by Tissue Transcriptome Sequence Data. PLoS Comput Biol 5, e1000598 (2009).

98 Love, M., Anders, S., Kim, V. & Huber, W. RNA-Seq workflow: gene-level exploratory analysis and differential expression [version 2; peer review: 2 approved]. F1000Research 4 (2016).

99 Akalal, D.-B. G., Bottenstein, J. E., Lee, S.-H., Han, J.-H., Chang, D.-J., Kaang, B.-K. & Nagle, G. T. Aplysia mollusk-derived growth factor is a mitogen with adenosine deaminase activity and is expressed in the developing central nervous system. Molecular Brain Research 117, 228-236 (2003).

100 Akalal, D.-B. G. & Nagle, G. T. Mollusk-derived growth factor: cloning and developmental expression in the central nervous system and reproductive tract of Aplysia. Molecular Brain Research 91, 163-168 (2001).

101 Ferré-D'Amaré, A. R., Prendergast, G. C., Ziff, E. B. & Burley, S. K. Recognition by Max of its cognate DNA through a dimeric b/HLH/Z domain. Nature 363, 38-45 (1993).

102 Kadesch, T. Consequences of heteromeric interactions among helix-loop-helix proteins. Cell Growth Differentiation 4, 49-55 (1993).

103 Murre, C., McCaw, P. S., Vaessin, H., Caudy, M., Jan, L. Y., Jan, Y. N., Cabrera, C. V., Buskin, J. N., Hauschka, S. D., Lassar, A. B., Weintraub, H. & Baltimore, D. Interactions between heterologous helix-loop-helix proteins generate complexes that bind specifically to a common DNA sequence. Cell 58, 537-544 (1989).

104 Jones, S. An overview of the basic helix-loop-helix proteins. Genome Biol 5, 226 (2004).

105 Landschulz, W., Johnson, P. & McKnight, S. The leucine zipper: a hypothetical structure common to a new class of DNA binding proteins. Science 240, 1759-1764 (1988).

106 Pabo, C. O., Peisach, E. & Grant, R. A. Design and Selection of Novel Cys2His2 Zinc Finger Proteins. Annu Rev Biochem 70, 313-340 (2001).

107 Boncinelli, E. Homeobox genes and disease. Curr Opin Genet Dev 7, 331-337 (1997).

108 Holland, P. W. H., Booth, H. A. F. & Bruford, E. A. Classification and nomenclature of all human homeobox genes. BMC Biol 5, 47-47 (2007).

109 Mallo, M. & Alonso, C. R. The regulation of Hox gene expression during animal development. Development (Cambridge, England) 140, 3951 (2013).

110 Kokic, G., Chernev, A., Tegunov, D., Dienemann, C., Urlaub, H. & Cramer, P. Structural basis of TFIIH activation for nucleotide excision repair. Nat Commun 10, 2885 (2019).

111 Ceballos, S. J. & Heyer, W.-D. Functions of the Snf2/Swi2 family Rad54 motor protein in homologous recombination. Biochim Biophys Acta 1809, 509-523 (2011).

112 Hauk, G. & Berger, J. M. The role of ATP-dependent machines in regulating genome topology. Curr Opin Struct Biol 36, 85-96 (2016).

113 Jang, C.-W., Shibata, Y., Starmer, J., Yee, D. & Magnuson, T. Histone H3.3 maintains genome integrity during mammalian development. Genes & Development 29, 1377-1392 (2015).

114 Dennis, G., Jr., Sherman, B. T., Hosack, D. A., Yang, J., Gao, W., Lane, H. C. & Lempicki, R. A. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 4, P3 (2003).

115 Aloyz, R. S. & DesGroseillers, L. Processing of the L5-67 precursor peptide and characterization of LUQIN in the LUQ neurons of Aplysia californica. Peptides 16, 331-338 (1995).

116 Li, L., Moroz, T. P., Garden, R. W., Floyd, P. D., Weiss, K. R. & Sweedler, J. V. Mass spectrometric survey of interganglionically transported peptides in Aplysia. Peptides 19, 1425-1433 (1998).

117 Cropper, E. C., Tenenbaum, R., Kolks, M. A., Kupfermann, I. & Weiss, K. R. Myomodulin: a bioactive neuropeptide present in an identified cholinergic buccal motor neuron of Aplysia. Proceedings of the National Academy of Sciences 84, 5483 (1987).

118 Schoofs, L. & Beets, I. Neuropeptides control life-phase transitions. Proceedings of the National Academy of Sciences 110, 7973 (2013).

119 Lee, J.-W. & Helmann, J. D. Functional specialization within the Fur family of metalloregulators. Biometals 20, 485-499 (2007).

120 Chun, J. Y., Korner, J., Kreiner, T., Scheller, R. H. & Axel, R. The function and differential sorting of a family of Aplysia prohormone processing enzymes. Neuron 12, 831-844 (1994).

121 Adamson, K. J., Wang, T., Zhao, M., Bell, F., Kuballa, A. V., Storey, K. B. & Cummins, S. F. Molecular insights into land snail neuropeptides through transcriptome and comparative gene analysis. BMC Genomics (2015).

122 De Oliveira, A. L., Calcino, A. & Wanninger, A. Extensive conservation of the proneuropeptide and peptide prohormone complement in mollusks. Scientific Reports 9, 4846 (2019).

123 Robinson, S. D., Li, Q., Bandyopadhyay, P. K., Gajewiak, J., Yandell, M., Papenfuss, A. T., Purcell, A. W., Norton, R. S. & Safavi-Hemami, H. Hormone-like peptides in the venoms of marine cone snails. General and Comparative Endocrinology 244, 11-18 (2017).

124 Wang, T., Zhao, M., Liang, D., Bose, U., Kaur, S., McManus, D. P. & Cummins, S. F. Changes in the neuropeptide content of Biomphalaria ganglia nervous system following Schistosoma infection. Parasit Vectors 10, 275 (2017).

125 Benjamin, P. R. & Kemenes, I. Lymnaea neuropeptide genes. Scholarpedia 8, 11520 (2013).

126 Ahn, S.-J., Martin, R., Rao, S. & Choi, M.-Y. Neuropeptides predicted from the transcriptome analysis of the gray garden slug Deroceras reticulatum. Peptides 93, 51-65 (2017).

127 Robinson, S. D., Li, Q., Bandyopadhyay, P. K., Gajewiak, J., Yandell, M., Papenfuss, A. T., Purcell, A. W., Norton, R. S. & Safavi-Hemami, H. Hormone-like peptides in the venoms of marine cone snails. General and Comparative Endocrinology 244, 11-18 (2017).

128 Stewart, M. J., Favrel, P., Rotgans, B. A., Wang, T., Zhao, M., Sohail, M., O'Connor, W. A., Elizur, A., Henry, J. & Cummins, S. F. Neuropeptides encoded by the genomes of the Akoya pearl oyster Pinctata fucata and Pacific oyster Crassostrea gigas: a bioinformatic and peptidomic survey. BMC Genomics 15, 840 (2014).

129 Zhang, M., Wang, Y., Li, Y., Li, W., Li, R., Xie, X., Wang, S., Hu, X., Zhang, L. & Bao, Z. Identification and Characterization of Neuropeptides by Transcriptome and Proteome Analyses in a Bivalve Mollusc Patinopecten yessoensis. Frontiers in genetics 9, 197-197 (2018).

130 Zatylny-Gaudin, C., Cornet, V., Leduc, A., Zanuttini, B., Corre, E., Le Corguillé, G., Bernay, B., Garderes, J., Kraut, A., Couté, Y. & Henry, J. Neuropeptidome of the Cephalopod Sepia officinalis: Identification, Tissue Mapping, and Expression Pattern of Neuropeptides and Neurohormones during Egg Laying. Journal of Proteome Research 15, 48-67 (2016).

131 Price, D. A. & Greenberg, M. J. Structure of a molluscan cardioexcitatory neuropeptide. Science 197, 670 (1977).

132 Price, D. A. & Greenberg, M. J. The Hunting of the FaRPs: The Distribution of FMRFamide-Related Peptides. Biological Bulletin 177, 198-205 (1989).

133 Veenstra, J. A. Neurohormones and neuropeptides encoded by the genome of Lottia gigantea, with reference to other mollusks and insects. General and Comparative Endocrinology 167, 86-103 (2010).

134 Cropper, E. C., Miller, M. W., Tenenbaum, R., Kolks, M. A., Kupfermann, I. & Weiss, K. R. Structure and action of buccalin: a modulatory neuropeptide localized to an identified small cardioactive peptide-containing cholinergic motor neuron of Aplysia californica. Proceedings of the National Academy of Sciences of the United States of America 85, 6177-6181 (1988).

135 Johnson, J. I., Kavanaugh, S. I., Nguyen, C. & Tsai, P.-S. Localization and functional characterization of a novel adipokinetic hormone in the mollusk, Aplysia californica. PLoS ONE 9, e106014 (2014).

136 Sweedler, J. V., Li, L., Rubakhin, S. S., Alexeeva, V., Dembrow, N. C., Dowling, O., Jing, J., Weiss, K. R. & Vilim, F. S. Identification and Characterization of the Feeding Circuit-Activating Peptides, a Novel Neuropeptide Family of Aplysia. The Journal of Neuroscience 22, 7797-7808 (2002).

137 Furukawa, Y., Nakamaru, K., Wakayama, H., Fujisawa, Y., Minakata, H., Ohta, S., Morishita, F., Matsushima, O., Li, L., Romanova, E., Sweedler, J. V., Park, J. H., Romero, A., Cropper, E. C., Dembrow, N. C., Jing, J., Weiss, K. R. & Vilim, F. S. The enterins: a novel family of neuropeptides isolated from the enteric nervous system and CNS of Aplysia. The Journal of Neuroscience 21, 8247-8261 (2001).

138 Zhang, G., Vilim, F. S., Liu, D.-D., Romanova, E. V., Yu, K., Yuan, W.-D., Xiao, H., Hummon, A. B., Chen, T.-T., Alexeeva, V., Yin, S.-Y., Chen, S.-A., Cropper, E. C., Sweedler, J. V., Weiss, K. R. & Jing, J. Discovery of leucokinin-like neuropeptides that modulate a specific parameter of feeding motor programs in the molluscan model, Aplysia. J Biol Chem (2017).

139 Furukawa, Y., Nakamaru, K., Sasaki, K., Fujisawa, Y., Minakata, H., Ohta, S., Morishita, F., Matsushima, O., Li, L., Alexeeva, V., Ellis, T. A., Dembrow, N. C., Jing, J., Sweedler, J. V., Weiss, K. R. & Vilim, F. S. PRQFVamide, a Novel Pentapeptide Identified From the CNS and Gut of Aplysia. Journal of Neurophysiology 89, 3114-3127 (2003).

140 Sossin, W. S., Kirk, M. D. & Scheller, R. H. Peptidergic modulation of neuronal circuitry controlling feeding in Aplysia. The Journal of Neuroscience 7, 671-681 (1987).

141 Romanova, E. V., Sasaki, K., Alexeeva, V., Vilim, F. S., Jing, J., Richmond, T. A., Weiss, K. R. & Sweedler, J. V. Urotensin II in invertebrates: from structure to function in Aplysia californica. PLoS ONE 7, e48764 (2012).

142 Alevizos, A., Weiss, K. R. & Koester, J. SCP-containing R20 neurons modulate respiratory pumping in Aplysia. The Journal of Neuroscience 9, 3058 (1989).

143 Sasaki, K., Fujisawa, Y., Morishita, F., Matsushima, O. & Furukawa, Y. The enterins inhibit the contractile activity of the anterior aorta of Aplysia kurodai. J Exp Biol 205, 3525-3533 (2002).

144 Scheller, R. H., Jackson, J. F., McAllister, L. B., Schwartz, J. H., Kandel, E. R. & Axel, R. A family of genes that codes for ELH, a neuropeptide eliciting a stereotyped pattern of behavior in Aplysia. Cell 28, 707-719 (1982).

145 Eliassen, J. C., Rajpara, S. M. & Mayeri, E. Isolation and partial characterization of neuropeptides that mimic prolonged inhibition produced by bag cell neurons in Aplysia. J Neurobiol 22, 698-706 (1991).

146 Cummins, S. F., Nichols, A. E., Warso, C. J. & Nagle, G. T. Aplysia seductin is a water-borne protein pheromone that acts in concert with attractin to stimulate mate attraction. Peptides 26, 351-359 (2005).

147 Pearson, W. L. & Lloyd, P. E. Distribution and characterization of pedal peptide immunoreactivity in Aplysia. J Neurobiol 21, 883-892 (1990).

148 Hall, J. D. & Lloyd, P. E. Involvement of pedal peptide in locomotion in Aplysia: modulation of foot muscle contractions. J Neurobiol 21, 858-868 (1990).

149 Livnat, I., Tai, H.-C., Jansson, E. T., Bai, L., Romanova, E. V., Chen, T.-T., Yu, K., Chen, S.-A., Zhang, Y., Wang, Z.-Y., Liu, D.-D., Weiss, K., Jing, J. & Sweedler, J. V. A d-Amino acid-Containing neuropeptide discovery funnel. Anal Chem 88, 11868-11876 (2016).

150 Yang, C.-Y., Yu, K., Wang, Y., Chen, S.-A., Liu, D.-D., Wang, Z.-Y., Su, Y.-N., Yang, S.-Z., Chen, T.-T., Livnat, I., Vilim, F. S., Cropper, E. C., Weiss, K. R., Sweedler, J. V. & Jing, J. Aplysia Locomotion: Network and Behavioral Actions of GdFFD, a D-Amino Acid-Containing Neuropeptide. PLoS ONE 11, e0147335 (2016).

151 Belkin, K. J. & Abrams, T. W. The effect of the neuropeptide FMRFamide on Aplysia californica siphon motoneurons involves multiple ionic currents that vary seasonally. J Exp Biol 201, 2225-2234 (1998).

152 Hu, J.-Y., Wu, F. & Schacher, S. Two signaling pathways regulate the expression and secretion of a neuropeptide required for long-term facilitation in Aplysia. The Journal of Neuroscience 26, 1026-1035 (2006).

153 Cai, D., Chen, S. & Glanzman, D. L. Postsynaptic regulation of long-term facilitation in Aplysia. Curr Biol 18, 920-925 (2008).

154 Wang, Z. Y. & Ragsdale, C. W. Multiple optic gland signaling pathways implicated in octopus maternal behaviors and death. J Exp Biol 221 (2018).

155 Satake, H. Invertebrate Neuropeptides and Hormones: Basic Knowledge and Recent Advances. (Transworld Research Network, 2006).

156 Martin, R. & Voigt, K. H. The neurosecretory system of the octopus vena cava: A neurohemal organ. Experientia 43, 537-543 (1987).

157 Ramon y Cajal, S. Histología. Consideraciones generales sobre la morfología de la célula nerviosa. La Veterinaria Española 37, 257-291 (1894).

158 Ramon y Cajal, S. The Croonian lecture.—La fine structure des centres nerveux. Proceedings of the Royal Society of London 55, 444-468 (1894).

159 Konorski, J. Conditioned reflexes and neuron organization. (Cambridge University Press, 1948).

160 Hebb, D. O. The organization of behavior; a neuropsychological theory. (Wiley, 1949).

161 Kandel, E. R. & Spencer, W. A. Cellular neurophysiological approaches in the study of learning. Physiological Reviews 48, 65-134 (1968).

162 Thompson, R. F., McCormick, D. A., Lavond, D. G., Clark, G. A., Kettner, R. E. & Mauk, M. D. Initial localization of the memory trace for a basic form of associative learning. Prog Psychobiol Physiol Psychol 10, 167-196 (1983).

163 Spencer, W. A., Thompson, R. F. & Neilson, D. R. Decrement of ventral root electrotonus and intracellularly recorded PSPs produced by iterated cutaneous afferent volleys. Journal of Neurophysiology 29, 253-274 (1966).

164 Kupfermann, I. & Kandel, E. R. Neuronal Controls of a Behavioral Response Mediated by the Abdominal Ganglion of Aplysia. Science 164, 847 (1969).

165 Krasne, F. B. Excitation and Habituation of the Crayfish Escape Reflex: The Depolarizing Response in Lateral Giant Fibres of the Isolated Abdomen. Journal of Experimental Biology 50, 29 (1969).

166 Dudai, Y., Jan, Y. N., Byers, D., Quinn, W. G. & Benzer, S. dunce, a mutant of Drosophila deficient in learning. Proceedings of the National Academy of Sciences of the United States of America 73, 1684-1688 (1976).

167 Alkon, D. L. Associative Training of Hermissenda. The Journal of General Physiology 64, 70 (1974).

168 Gelperin, A. Rapid food-aversion learning by a terrestrial mollusk. Science 189, 567 (1975).

169 Menzel, R. & Erber, J. Learning and Memory in Bees. Scientific American 239, 102-110 (1978).

170 Kandel, E. R. The Biology of Memory: A Forty-Year Perspective. The Journal of Neuroscience 29, 12748 (2009).

171 Benjamin, P. R., Kemenes, G. & Staras, K. Molluscan Nervous Systems. eLS (2005).

172 Benjamin, P. R. & Kemenes, G. in Learning and Memory: A Comprehensive Reference (Second Edition) (ed John H. Byrne) 427-440 (Elsevier, 2017).

173 Sutherland, E. W. & Rall, T. W. Fractionation and Characterization of a Cyclic Adenine Ribonucleotide formed by Tissue Particles. Journal of Biological Chemistry 232, 1077-1092 (1958).

174 Sutherland, E. W. Studies on the Mechanism of Hormone Action. Science 177, 401 (1972).

175 Skalhegg, B. S. & Tasken, K. Specificity in the cAMP/PKA signaling pathway. Differential expression, regulation, and subcellular localization of subunits of PKA. Frontiers in Bioscience 5, 678-693 (2000).

176 Tanenhaus, A., Zhang, J. & Yin, J. C. P. in Novel Mechanisms of Memory (eds Karl Peter Giese & Kasia Radwanska) 119-140 (Springer International Publishing, 2016).

177 Illich, P. A. & Walters, E. T. Mechanosensory neurons innervating Aplysia siphon encode noxious stimuli and display nociceptive sensitization. (1997).

178 Mackey, S. L., Glanzman, D. L., Small, S. A., Dyke, A. M., Kandel, E. R. & Hawkins, R. D. Tail shock produces inhibition as well as sensitization of the siphon-withdrawal reflex of Aplysia: possible behavioral role for presynaptic inhibition mediated by the peptide Phe-Met-Arg-Phe-NH2. Proceedings of the National Academy of Sciences of the United States of America 84, 8730-8734 (1987).

179 Buonomano, D. V., Cleary, L. J. & Byrne, J. H. Inhibitory neuron produces heterosynaptic inhibition of the sensory-to-motor neuron synapse in Aplysia. Brain Research 577, 147-150 (1992).

180 Tritt, S. H. & Byrne, J. H. Motor controls of opaline secretion in Aplysia californica. (1980).

181 Fredman, S. M. & Jahan-Parwar, B. Intra- and interganglionic synaptic connections in the CNS of Aplysia. Brain Research Bulletin 4, 393-406 (1979).

182 Herdegen, S., Holmes, G., Cyriac, A., Calin-Jageman, I. E. & Calin-Jageman, R. J. Characterization of the rapid transcriptional response to long-term sensitization training in Aplysia californica. Neurobiology of Learning and Memory 116, 27-35 (2014).

183 Montarolo, P. G., Goelet, P., Castellucci, V. F., Morgan, J., Kandel, E. R. & Schacher, S. A critical period for macromolecular synthesis in long-term heterosynaptic facilitation in Aplysia. (1986).

184 Yap, E. L. & Greenberg, M. E. Activity-Regulated Transcription: Bridging the Gap between Neural Activity and Behavior. Neuron 100, 330-348 (2018).

185 Bartsch, D., Ghirardi, M., Skehel, P. A., Karl, K. A., Herder, S. P., Chen, M., Bailey, C. H. & Kandel, E. R. Aplysia CREB2 represses long-term facilitation: relief of repression converts transient facilitation into long-term functional and structural change. Cell 83, 979-992 (1995).

186 Bonnick, K., Bayas, K., Belchenko, D., Cyriac, A., Dove, M., Lass, J., McBride, B., Calin-Jageman, I. E. & Calin-Jageman, R. J. Transcriptional Changes following Long-Term Sensitization Training and In Vivo Serotonin Exposure in Aplysia californica. PLoS ONE 7, e47378 (2012).

187 Mohamed, H. A., Yao, W., Fioravante, D., Smolen, P. D. & Byrne, J. H. cAMP-response Elements in Aplysia creb1, creb2, and Ap-uch Promoters: Implications for feedback loops modulating long term memory. Journal of Biological Chemistry 280, 27035-27043 (2005).

188 Liu, R.-Y., Shah, S., Cleary, L. J. & Byrne, J. H. Serotonin- and training-induced dynamic regulation of CREB2 in Aplysia. Learning & Memory 18, 245-249 (2011).

189 Alberini, C. M., Ghirardi, M., Metz, R. & Kandel, E. R. C/EBP is an immediate-early gene required for the consolidation of long-term facilitation in Aplysia. Cell 76, 1099-1114 (1994).

190 Glanzman, D. L. Common Mechanisms of Synaptic Plasticity in Vertebrates and Invertebrates. Curr Biol 20, R31-R36 (2010).

191 Liu, Q.-R., Hattar, S., Endo, S., MacPhee, K., Zhang, H., Cleary, L. J., Byrne, J. H. & Eskin, A. A Developmental Gene (Tolloid/BMP-1) Is Regulated in Aplysia Neurons by Treatments that Induce Long-Term Sensitization. The Journal of Neuroscience 17, 755 (1997).

192 Abraham, W. C., Dragunow, M. & Tate, W. P. The role of immediate early genes in the stabilization of long-term potentiation. Molecular Neurobiology 5, 297 (1991).

193 Dragunow, M. A role for immediate-early transcription factors in learning and memory. Behavior Genetics 26, 293-299 (1996).

194 Poirier, R., Cheval, H., Mailhes, C., Garel, S., Charnay, P., Davis, S. & Laroche, S. Distinct functions of Egr gene family members in cognitive processes. Frontiers in Neuroscience 2, 2 (2008).

195 Moorman, S., Mello, C. V. & Bolhuis, J. J. From songs to synapses: Molecular mechanisms of birdsong memory. BioEssays 33, 377-385 (2011).

196 Cyriac, A., Holmes, G., Lass, J., Belchenko, D., Calin-Jageman, R. J. & Calin-Jageman, I. E. An Aplysia Egr homolog is rapidly and persistently regulated by long-term sensitization training. Neurobiology of Learning and Memory 102, 43-51 (2013).

197 Cheung, W. Y. Calmodulin: an overview. Federation proceedings (1982).

198 Zwartjes, R. E., West, H., Hattar, S., Ren, X., Noel, F., Nuñez-Regueiro, M., MacPhee, K., Homayouni, R., Crow, M. T., Byrne, J. H. & Eskin, A. Identification of specific mRNAs affected by treatments producing long-term facilitation in Aplysia. Learning & Memory 4, 478-495 (1998).

199 Hegde, A. N., Inokuchi, K., Pei, W., Casadio, A., Ghirardi, M., Chain, D. G., Martin, K. C., Kandel, E. R. & Schwartz, J. H. Ubiquitin C-Terminal Hydrolase Is an Immediate-Early Gene Essential for Long-Term Facilitation in Aplysia. Cell 89, 115-126 (1997).

200 Zhao, Y., Hegde, A. N. & Martin, K. C. The Ubiquitin Proteasome System Functions as an Inhibitory Constraint on Synaptic Strengthening. Curr Biol 13, 887-898 (2003).

201 Hegde, A. N., Goldberg, A. L. & Schwartz, J. H. Regulatory subunits of cAMP-dependent protein kinases are degraded after conjugation to ubiquitin: a molecular mechanism underlying long-term synaptic plasticity. Proceedings of the National Academy of Sciences 90, 7436 (1993).

202 Chain, D. G., Casadio, A., Schacher, S., Hegde, A. N., Valbrun, M., Yamamoto, N., Goldberg, A. L., Bartsch, D., Kandel, E. R. & Schwartz, J. H. Mechanisms for Generating the Autonomous cAMP-Dependent Protein Kinase Required for Long-Term Facilitation in Aplysia. Neuron 22, 147-156 (1999).

203 Möller, C., Melaun, C., Castillo, C., Díaz, M. E., Renzelman, C. M., Estrada, O., Kuch, U., Lokey, S. & Marí, F. Functional Hypervariability and Gene Diversity of Cardioactive Neuropeptides. Journal of Biological Chemistry 285, 40673-40680 (2010).

204 Scheller, R. H. & Kirk, M. D. Neuropeptides in identified Aplysia neurons: precursor structure, biosynthesis and physiological actions. Trends in Neurosciences 10, 46-52 (1987).

205 Mahon, A. C., Lloyd, P. E., Weiss, K. R., Kupfermann, I. & Scheller, R. H. The small cardioactive peptides A and B of Aplysia are derived from a common precursor molecule. Proceedings of the National Academy of Sciences 82, 3925 (1985).

206 Cummins, S. F., Nichols, A. E., Amare, A., Hummon, A. B., Sweedler, J. V. & Nagle, G. T. Characterization of Aplysia Enticin and Temptin, Two Novel Water-borne Protein Pheromones That Act in Concert with Attractin to Stimulate Mate Attraction. Journal of Biological Chemistry 279, 25614-25622 (2004).

207 Floyd, P. D., Li, L., Rubakhin, S. S., Sweedler, J. V., Horn, C. C., Kupfermann, I., Alexeeva, V. Y., Ellis, T. A., Dembrow, N. C., Weiss, K. R. & Vilim, F. S. Insulin Prohormone Processing, Distribution, and Relation to Metabolism in Aplysia californica. The Journal of Neuroscience 19, 7732 (1999).

208 Church, P. J. & Lloyd, P. E. Expression of diverse neuropeptide cotransmitters by identified motor neurons in Aplysia. The Journal of Neuroscience 11, 618-625 (1991).

209 Jing, J., Vilim, F. S., Horn, C. C., Alexeeva, V., Hatcher, N. G., Sasaki, K., Yashina, I., Zhurov, Y., Kupfermann, I., Sweedler, J. V. & Weiss, K. R. From Hunger to Satiety: Reconfiguration of a Feeding Network by Aplysia Neuropeptide Y. The Journal of Neuroscience 27, 3490 (2007).

210 McClard, C. K., Kochukov, M. Y., Herman, I., Liu, Z., Eblimit, A., Moayedi, Y., Ortiz-Guzman, J., Colchado, D., Pekarek, B., Panneerselvam, S., Mardon, G. & Arenkiel, B. R. POU6f1 Mediates Neuropeptide-Dependent Plasticity in the Adult Brain. The Journal of Neuroscience 38, 1443 (2018).

211 topGO: Enrichment Analysis for Gene Ontology (2019).

212 Bou-Abdallah, F. The iron redox and hydrolysis chemistry of the ferritins. Biochimica et Biophysica Acta (BBA) - General Subjects 1800, 719-731 (2010).

213 Codazzi, F., Pelizzoni, I., Zacchetti, D. & Grohovaz, F. Iron entry in neurons and astrocytes: a link with synaptic activity. Frontiers in Molecular Neuroscience 8, 18 (2015).

214 Munro, H. N. & Linder, M. C. Ferritin: structure, biosynthesis, and role in iron metabolism. Physiological Reviews 58, 317-396 (1978).

215 Zheng, J., Jiang, R., Chen, M., Maimaitiming, Z., Wang, J., Anderson, G. J., Vulpe, C. D., Dunaief, J. L. & Chen, H. Multi-Copper Ferroxidase–Deficient Mice Have Increased Brain Iron Concentrations and Learning and Memory Deficits. The Journal of Nutrition 148, 643-649 (2018).

216 Wang, J.-Y., Zhuang, Q.-Q., Zhu, L.-B., Zhu, H., Li, T., Li, R., Chen, S.-F., Huang, C.-P., Zhang, X. & Zhu, J.-H. Meta-analysis of brain iron levels of Parkinson’s disease patients determined by postmortem and MRI measurements. Scientific Reports 6, 36669 (2016).

217 Anderson, G. J. Mechanisms of iron loading and toxicity. American Journal of Hematology 82, 1128-1131 (2007).

218 Castellani, R. J., Moreira, P. I., Liu, G., Dobson, J., Perry, G., Smith, M. A. & Zhu, X. Iron: The Redox-active Center of Oxidative Stress in Alzheimer Disease. Neurochemical Research 32, 1640-1645 (2007).

219 El Kadmiri, N., Slassi, I., El Moutawakil, B., Nadifi, S., Tadevosyan, A., Hachem, A. & Soukri, A. Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and Alzheimer's disease. Pathologie Biologie 62, 333-336 (2014).

220 Mazzola, J. L. & Sirover, M. A. Reduction of glyceraldehyde-3-phosphate dehydrogenase activity in Alzheimer's disease and in Huntington's disease fibroblasts. Journal of Neurochemistry 76, 442-449 (2001).

221 Butterfield, D. A., Hardas, S. S. & Lange, M. L. B. Oxidatively Modified Glyceraldehyde-3-Phosphate Dehydrogenase (GAPDH) and Alzheimer's Disease: Many Pathways to Neurodegeneration. Journal of Alzheimer's Disease 20, 369-393 (2010).

222 Oishi, A., Makita, N., Sato, J. & Iiri, T. Regulation of RhoA Signaling by the cAMP-dependent Phosphorylation of RhoGDIα. Journal of Biological Chemistry 287, 38705-38715 (2012).

223 Lang, P., Gesbert, F., Delespine-Carmagnat, M., Stancou, R., Pouchelet, M. & Bertoglio, J. Protein kinase A phosphorylation of RhoA mediates the morphological and functional effects of cyclic AMP in cytotoxic lymphocytes. The EMBO Journal 15, 510-519 (1996).

224 Lodish, H. F., Berk, A., Kaiser, C. A., Krieger, M., Bretscher, A., Ploegh, H., Amon, A. & Martin, K. C. Molecular cell biology. (2016).

225 Priel, A., Tuszynski, J. A. & Woolf, N. J. Neural cytoskeleton capabilities for learning and memory. Journal of Biological Physics 36, 3 (2009).

226 Lynch, G., Rex, C. S., Chen, L. Y. & Gall, C. M. The substrates of memory: Defects, treatments, and enhancement. European Journal of Pharmacology 585, 2-13 (2008).

227 Cleveland, D. W., Hwo, S.-Y. & Kirschner, M. W. Purification of tau, a microtubule-associated protein that induces assembly of microtubules from purified tubulin. Journal of Molecular Biology 116, 207-225 (1977).

228 Bacaj, T., Wu, D., Burré, J., Malenka, R. C., Liu, X. & Südhof, T. C. Synaptotagmin-1 and -7 Are Redundantly Essential for Maintaining the Capacity of the Readily-Releasable Pool of Synaptic Vesicles. PLOS Biology 13, e1002267 (2015).

229 Sirover, M. A. New insights into an old protein: the functional diversity of mammalian glyceraldehyde-3-phosphate dehydrogenase. Biochimica et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology 1432, 159-184 (1999).

230 Sirover, M. A. Role of the glycolytic protein, glyceraldehyde-3-phosphate dehydrogenase, in normal cell function and in cell pathology. Journal of Cellular Biochemistry 66, 133-140 (1997).

231 Zheng, L., Roeder, R. G. & Luo, Y. S Phase Activation of the Histone H2B Promoter by OCA-S, a Coactivator Complex that Contains GAPDH as a Key Component. Cell 114, 255-266 (2003).

232 Engel, M., Seifert, M., Theisinger, B., Seyfert, U. & Welter, C. Glyceraldehyde-3-phosphate Dehydrogenase and Nm23-H1/Nucleoside Diphosphate Kinase A: Two old enzymes combine for the novel nm23 protein phosphotransferase function. Journal of Biological Chemistry 273, 20058-20065 (1998).

233 Duclos-Vallée, J. C., Capel, F., Mabit, H. & Petit, M. A. Phosphorylation of the hepatitis B virus core protein by glyceraldehyde-3-phosphate dehydrogenase protein kinase activity. Journal of General Virology 79, 1665-1670 (1998).

234 Bryksin, A. V. & Laktionov, P. P. Role of glyceraldehyde-3-phosphate dehydrogenase in vesicular transport from Golgi apparatus to endoplasmic reticulum. Biochemistry (Moscow) 73, 619 (2008).

235 Huitorel, P. & Pantaloni, D. Bundling of microtubules by glyceraldehyde-3-phosphate dehydrogenase and its modulation by ATP. European Journal of Biochemistry 150, 265-269 (1985).

236 Kumagai, H. & Sakai, H. A Porcine Brain Protein (35K Protein) which Bundles Microtubules and Its Identification as Glyceraldehyde 3-Phosphate Dehydrogenase. The Journal of Biochemistry 93, 1259-1269 (1983).

237 Evergren, E., Benfenati, F. & Shupliakov, O. The synapsin cycle: A view from the synaptic endocytic zone. Journal of Neuroscience Research 85, 2648-2656 (2007).

238 Clementi, F., Fesce, R., Meldolesi, J., Valtorta, F., Hilfiker, S., Pieribone, V. A., Czernik, A. J., Kao, H.-T., Augustine, G. J. & Greengard, P. Synapsins as regulators of neurotransmitter release. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 354, 269-279 (1999).

239 Greengard, P., Valtorta, F., Czernik, A. J. & Benfenati, F. Synaptic vesicle phosphoproteins and regulation of synaptic function. Science 259, 780 (1993).

240 Wimalasena, K. Vesicular monoamine transporters: Structure-function, pharmacology, and medicinal chemistry. Medicinal Research Reviews 31, 483-519 (2011).

241 Camp, A. J. & Wijesinghe, R. Calretinin: Modulator of neuronal excitability. The International Journal of Biochemistry & Cell Biology 41, 2118-2121 (2009).

242 Wang, B., Chen, J., Santiago, F. S., Janes, M., Kavurma, M. M., Chong, B. H., Pimanda, J. E. & Khachigian, L. M. Phosphorylation and Acetylation of Histone H3 and Autoregulation by Early Growth Response 1 Mediate Interleukin 1β Induction of Early Growth Response 1 Transcription. Arteriosclerosis, Thrombosis, and Vascular Biology 30, 536-545 (2010).

243 Kouzarides, T. Chromatin Modifications and Their Function. Cell 128, 693-705 (2007).

244 Conesa, A., Nueda, M. J., Ferrer, A. & Talón, M. maSigPro: a method to identify significantly differential expression profiles in time-course microarray experiments. Bioinformatics 22, 1096-1102 (2006).

245 Nueda, M. J., Tarazona, S. & Conesa, A. Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics 30, 2598-2602 (2014).

246 Fraley, C. & Raftery, A. E. Model-Based Clustering, Discriminant Analysis, and Density Estimation. Journal of the American Statistical Association 97, 611-631 (2002).

247 Lee, H.-K., Takamiya, K., He, K., Song, L. & Huganir, R. L. Specific Roles of AMPA Receptor Subunit GluR1 (GluA1) Phosphorylation Sites in Regulating Synaptic Plasticity in the CA1 Region of Hippocampus. Journal of Neurophysiology 103, 479-489 (2009).

248 Haeuptle, M. A., Pujol, F. M., Neupert, C., Winchester, B., Kastaniotis, A. J., Aebi, M. & Hennet, T. Human RFT1 Deficiency Leads to a Disorder of N-Linked Glycosylation. The American Journal of Human Genetics 82, 600-606 (2008).

249 Zheng, N. & Shabek, N. Ubiquitin Ligases: Structure, Function, and Regulation. Annual Review of Biochemistry 86, 129-157 (2017).

250 Haberland, M., Montgomery, R. L. & Olson, E. N. The many roles of histone deacetylases in development and physiology: implications for disease and therapy. Nature Reviews Genetics 10, 32 (2009).

251 Sheikh, B. N., Guhathakurta, S. & Akhtar, A. The non-specific lethal (NSL) complex at the crossroads of transcriptional control and cellular homeostasis. EMBO reports 20, e47630 (2019).

252 Cech, Thomas R. & Steitz, Joan A. The Noncoding RNA Revolution—Trashing Old Rules to Forge New Ones. Cell 157, 77-94 (2014).

253 Johannes, L., Jacob, R. & Leffler, H. Galectins at a glance. Journal of Cell Science 131, jcs208884 (2018).

254 Bliss, T. V. P. & Lømo, T. Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path. The Journal of Physiology 232, 331-356 (1973).

255 Ha, J. T. Genomic approaches to studies of a simple memory forming network in Aplysia californica. (University of Florida, 2006).

256 Lee, Y.-S., Choi, S.-L., Kim, T.-H., Lee, J.-A., Kim, H. K., Kim, H., Jang, D.-J., Lee, J. J., Lee, S., Sin, G. S., Kim, C.-B., Suzuki, Y., Sugano, S., Kubo, T., Moroz, L. L., Kandel, E. R., Bhak, J. & Kaang, B.-K. Transcriptome analysis and identification of regulators for long-term plasticity in Aplysia kurodai. Proceedings of the National Academy of Sciences 105, 18602 (2008).

257 Loebrich, S. & Nedivi, E. The Function of Activity-Regulated Genes in the Nervous System. Physiological Reviews 89, 1079-1103 (2009).

258 Tkatch, T., Baranauskas, G. & Surmeier, D. J. Kv4.2 mRNA Abundance and A-Type K+ Current Amplitude Are Linearly Related in Basal Ganglia and Basal Forebrain Neurons. The Journal of Neuroscience 20, 579 (2000).

259 Gaston, K. & Jayaraman, P. S. Transcriptional repression in eukaryotes: repressors and repression mechanisms. Cellular and Molecular Life Sciences CMLS 60, 721-741 (2003).

BIOGRAPHICAL SKETCH

Caleb Bostwick earned his Bachelor of Science degree from the University of Florida in 2010, double majoring in chemistry (graduating cum laude) and microbiology and cell sciences (graduating summa cum laude). He completed an undergraduate thesis titled “Deletions in the b Subunit of Yeast Mitochondrial F1F0 ATP Synthase” under the mentorship of Dr. Brian Cain. In 2011, Caleb entered the Interdisciplinary Program (IDP) in Biomedical Sciences at the University of Florida, where he performed his doctoral research in the neuroscience concentration with advisor Dr. Leonid Moroz. He received his Ph.D. from the University of Florida in 2019.
