DEGREE PROJECT IN MATHEMATICS, SECOND CYCLE, 30 CREDITS STOCKHOLM, SWEDEN 2020

On the Existence of Functional Clusters in the Dorsomedial Striatum

THEODOR OHLSSON

KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ENGINEERING SCIENCES


Degree Projects in Optimization and Systems Theory (30 ECTS credits)
Master's Programme in Industrial Engineering and Management
KTH Royal Institute of Technology, year 2020
Supervisor at Meletis Laboratory (KI): Emil Wärnberg
Supervisor at KTH: Xiaoming Hu
Examiner at KTH: Xiaoming Hu

TRITA-SCI-GRU 2020:301 MAT-E 2020:074

Royal Institute of Technology School of Engineering Sciences KTH SCI SE-100 44 Stockholm, Sweden URL: www.kth.se/sci

Abstract

In recent years, the understanding of the brain has progressed immensely through advanced data gathering methods that can track the activity of individual neurons. This has enabled researchers to investigate the function and dynamics of different parts of the brain in detail. Using data gathered from mice engaged in a two-alternative choice task, this thesis sought to answer whether neurons of the dorsomedial striatum are clustered with regard to their activity profiles, by using four fundamentally different mathematical approaches. This analysis could not find any reliable evidence of functional clusters in the dorsomedial striatum, but their existence cannot be excluded.

Sammanfattning

With the help of advanced methods for data collection, researchers in neuroscience have been able to follow the activity of individual neurons in real time. In this way, the function of different parts of the brain has been mapped. This work explored whether neurons in the dorsomedial striatum are clustered with respect to their function. This was done by applying four different methods to data from laboratory mice. The analysis could not find evidence that there are functional clusters in the dorsomedial striatum, but neither can it exclude that they exist.


Acknowledgments

The foundation of this thesis is the experiments done by researchers at the Dinos Meletis Lab, KI, to whom I am grateful for sharing their data with me. Any faults in this thesis are my own. Lastly, special thanks to Emil Wärnberg for his guidance and for sharing endless insights into neuroscience and mathematics alike.


Contents

1 Introduction
  1.1 Advances in Neuroscience
  1.2 Previous Research
  1.3 Research Question

2 Theoretical Background
  2.1 Introduction to Neuroscience
    2.1.1 The Neuron
    2.1.2 The Striatum
  2.2 Cluster Analysis
  2.3 Cluster Validity Indices
    2.3.1 The Gap Statistic
  2.4 Modality
    2.4.1 The Dip Test
    2.4.2 Principal Curve
    2.4.3 The Folding Test
  2.5 ePAIRS
  2.6 Community Detection
    2.6.1 Correlation Clustering

3 Methodology
  3.1 Data Gathering
  3.2 Artificial Data Sets
    3.2.1 Skew Normal
    3.2.2 Three Blobs
    3.2.3 15 Blobs
    3.2.4 Cross Distribution
  3.3 Procedure

4 Results
  4.1 Folding Test
  4.2 ePAIRS
  4.3 Principal Curve
  4.4 Gap Statistic

5 Discussion
  5.1 Interpretation of Results
  5.2 Limitations
    5.2.1 Data Gathering
    5.2.2 Uniform as the Null Distribution
  5.3 Future Research
    5.3.1 Experiments with Other Dimensions
    5.3.2 Algorithms for Categorical Mixed Selectivity
    5.3.3 Validation Techniques for Correlation Clustering

6 Conclusion

References

7 Appendix
  7.1 Principal Curve
  7.2 Gap Statistic
  7.3 ePAIRS

1 Introduction

1.1 Advances in Neuroscience

The function of the brain has long been shrouded in mystery. Indeed, Aristotle posited that its main purpose was to cool the circulating blood [1]. Since the introduction of modern medicine and science, the function of the brain as the central controlling organ which houses thoughts, planning and emotions has been well established. Whereas the function of and dynamics between different parts of the brain have historically been understood through observations of the effects of head trauma or pathologies, recent advances in technology have allowed researchers to monitor the live activity of the brain in vivo. Through the use of test animals, researchers now have the ability to record a large number of individual neurons at the same time, which has allowed the role of different parts of the brain to be studied in detail [2]. In practice this is done by simultaneously recording the behaviour of a test animal and the activity of its neurons, from which correlations can be inferred, or even by controlling the activity of individual neurons through optogenetic techniques and noting the resulting behaviour, which allows for conclusions regarding causality.

The ability to record the activity of individual neurons at high temporal resolution has made any conclusions contingent on the analysis of large troves of data. This has increasingly made neuroscientists reliant on machine learning techniques and advanced mathematical methods as part of their research. Moreover, this has in itself generated new inquiries into mathematical models of neural circuitry and even established computational neuroscience as a field of its own [3].

One such inquiry has been to establish or disprove the existence of neuronal clusters. Let an experiment have recorded N neurons while a test subject engages in d tasks. Let subsequent analysis then give each neuron a z-score signifying how active said neuron is on each task compared to the mean activity of the neurons. One is then left with N points in a d-dimensional space. It is of interest to know whether there are k distinct groups of neurons in this d-dimensional space. If there are distinct groups, this would imply that those neurons can be understood as being of different types, which could lead researchers to infer a more detailed description of the inner workings of a certain part of the brain. This concept is often referred to as non-random or categorical mixed selectivity, as opposed to random mixed selectivity, in which neurons are located in this feature space without any clear pattern.

1.2 Previous Research

There has been a longstanding effort to find clusters of neurons in many different parts of the brain, with mixed results. This has roughly manifested itself in inventing new methods for generating clusters and/or using old methods for validating those clusters. Using a mathematical analysis of the angles between data points (see section 2.5), neurons in the orbitofrontal and posterior parietal cortex have been found to be clustered and unclustered, respectively [4, 5]. However, this method has not been rigorously tested on artificial data sets, so its robustness is still unclear.

Gründemann et al. [6] grouped neurons of the amygdala on two occasions in a paper published in Science in 2019. In the first instance they employed the k-means algorithm to produce 3 clusters, but did not elaborate on the validity or separation of those 3 clusters. In the second instance they excluded some neurons that were deemed to be not active enough, i.e. neurons whose absolute score was lower than some threshold, in order to generate clusters. The practice of excluding neurons from a data set was also used by Jennings et al. [7] in the hypothalamus and by Barbera et al. [8] to form clusters in the dorsal striatum using neuronal correlation (see section 2.6). Excluding data points is problematic since, for some cutoff, it might produce a clustered data set from an unclustered one (see Figure 1).

Figure 1: A figure illustrating how the exclusion of data points may result in the impression of a clustered data set.

The approach of validating generated clusters relies on some measure of cluster fitness to find the appropriate number of clusters. Adler et al. [9] use the Silhouette coefficient (see section 2.3) to this end on striatal neurons. However, the Silhouette coefficient implicitly assumes that a data set is clustered, and cannot reliably be used to differentiate between clustered and unclustered data. Moreover, it seems that Adler et al. interpreted the results of the Silhouette incorrectly by choosing a different k than what the method proposes. Sales-Carbonell et al. [10] also used the Silhouette on striatal neurons, and concluded that there were no clusters.

In summary, there are three main pitfalls that one commonly encounters in previous inquiries into the existence of clusters in the brain. The methods:

1. have not been properly tested so their results cannot be accurately interpreted.

2. filter out data points, which makes them biased towards finding clusters.

3. implicitly or explicitly assume the existence of clusters a priori.

1.3 Research Question

This paper will examine the existence of functional clusters in the dorsomedial striatum. This is a similar, yet distinct, inquiry from that of the number of clusters in the striatum, which will not be extensively elaborated on. The question of clusters will be examined with regard to the fallacies in the research field listed above. That is, the methods used in this paper will have to be tested on artificial data sets, be applied to whole data sets, and not assume that there are any clusters to begin with.

2 Theoretical Background

2.1 Introduction to Neuroscience

2.1.1 The Neuron

The brain consists of a vast number of cells. The morphological and functional heterogeneity of these cells is great, although the primary type of cell is the neuron. These cells are specialized in transmitting information to other cells, not exclusively other neurons. Neurons can roughly be described as consisting of a cell body or soma, dendrites and an axon.

Figure 2: A schematic of the rough structure of a neuron. Input signals will be received at the dendrites and output will be transmitted through the axon. Credit: Edvin Wester

The cell membrane, which encloses the cell, is made of phospholipids and is as such largely impermeable to much of the surrounding environment. However, channels and pumps cover the cell membrane, which allow for the exchange of ions and nutrients between the neuron and its surroundings. In a resting cell, these pumps and channels maintain a higher concentration of K+ inside the cell than outside of it. Conversely, they also maintain a lower concentration of Na+ and Ca2+ inside the cell than outside. This results in a voltage over the cell membrane called the resting membrane potential. When a neuron sends a signal (known as an action potential), a chain reaction of highly coordinated in- and efflux of ions results in a depolarization of the cell membrane. This depolarization will propagate along the length of the neuron until it reaches the axon terminal, which connects to the dendrites of another neuron. This connection between two neurons is known as a synapse. Although the exact mechanisms of neural signaling depend on the receptors of the post-synaptic neuron, neurotransmitters will be released onto the post-synaptic neuron, which will either increase or decrease the probability of it firing an action potential [2].

2.1.2 The Striatum

The striatum is a centrally located part of the brain, below the cortex, which in humans consists of the nucleus accumbens, the olfactory tubercle, the caudate nucleus and the putamen. Post-mortem findings in patients diagnosed with Parkinson's disease or Huntington's disease have prompted researchers to investigate the role of the striatum through animal experiments, which have revealed that the striatum enforces smooth motor movements [11]. It has been observed that the striatum receives contextual information from the cortex, which is then processed and relayed to the relevant areas to initiate some behaviour. The three main output pathways are the direct pathway, the indirect pathway and the patch pathway. The receptor expression of neurons in the direct, indirect and patch pathways differs, which has enabled researchers to selectively study these neurons. These genotypes are denoted D1, A2A and OPRM1 for the three pathways respectively. The classic rate model describes how the direct pathway selectively excites a movement while the indirect pathway inhibits competing movements. The idea is then that, under this theory, an imbalance between the selection of wanted movements and the inhibition of unwanted movements causes the rigid and jerky movements of patients with Parkinson's disease [12]. Recent research has, however, shown that the role and dynamics of the striatum are more complex than what can be fitted to the classical rate model [11, 12, 13]. It has for instance been established that the patch pathway encodes the expected value of an action [14].

Figure 3: A coronal section of a mouse brain showing the dorsomedial striatum, taken from [15].

The striatum itself is divided into subregions that are functionally slightly different. The dorsomedial striatum, corresponding to the caudate nucleus, is thought to regulate behaviours that are contingent on the outcome of an action. That is, the dorsomedial striatum promotes behaviour that is associated with a positive value, while suppressing actions that are no longer associated with a positive value [16].

2.2 Cluster Analysis

In a data set X consisting of vectors x_i for i = 1, ..., N, a cluster is defined as a group of data points which are more similar to one another than to points of other groups [17]. In general, finding clusters is done in order to shed light on the underlying process that generates a data set, and so what should and shouldn't be considered clustered varies depending on the context.

The problem of finding clusters in a data set, known as cluster analysis, has generated much research and has resulted in a plethora of algorithms devoted to this task [18]. However, the majority of these algorithms assume that the number of clusters, k, is known a priori. This is for instance the case with k-means, hierarchical clustering and the EM algorithm [19]. For some applications this is a completely reasonable limitation, but for others the exact value of k is the point of inquiry.

In light of this, a number of algorithms that cope with the problem of an unknown k have been developed. Some notable examples are DBSCAN [20] and OPTICS [21]. These do all, however, rely on some sort of hyperparameter selection which in turn will dictate the output. Using such an algorithm will output some number k corresponding to the most appropriate choice of clusters according to the algorithm, but the user is left to question the validity of said clusters, as k will be a function of some hyperparameter space.

2.3 Cluster Validity Indices

A common method to judge and compare different clustering outputs is to use a cluster validity index (CVI). A clustering algorithm is applied to the data set and run for every reasonable k; with no intuition about the data set, the values k = 1, ..., N are tried, which at times can be infeasible for large data sets. For every run, the fitness of the clusters is evaluated using some CVI. The number of CVIs is ever growing, and the selection of which one to use is done fairly arbitrarily. A common one is the Silhouette method [22], which was used by [9, 10]. The Silhouette method does, however, only have the ability to discern between k = 2, ..., N clusters and can therefore not really be used to understand whether a data set is clustered at all. Moreover, a comparative study by Arbelaitz et al. [23] shows that the performance of many CVIs, including the Silhouette, is poor. When used to find the optimal k on data sets with a known k, the best CVIs only have an accuracy of roughly 50%.
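To make the typical use of a CVI concrete, the following is a minimal sketch using scikit-learn's KMeans and silhouette_score. The function name and the range of k are illustrative choices, not taken from any of the cited studies; note that, as discussed above, the silhouette is only defined for k ≥ 2 and therefore cannot by itself indicate that a data set is unclustered.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k_by_silhouette(X, k_max=10):
    """Return the k in 2..k_max with the highest silhouette coefficient, and all scores."""
    scores = {}
    for k in range(2, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    # k = 1 is never a candidate: the silhouette implicitly assumes clusters exist
    return max(scores, key=scores.get), scores
```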

2.3.1 The Gap Statistic

One method formulated by Tibshirani et al. [24] that elaborates on the notion of comparing cluster fitness is the Gap statistic. The Gap statistic was also proposed to be able to discern between unclustered and clustered data, in contrast to other CVIs. Given data that have been clustered using some arbitrary method, let d_{ij} denote the squared Euclidean distance between observations x_i and x_j. Then D_r is the sum of all pairwise distances in cluster r,

    D_r = \sum_{i,j \in C_r} d_{ij}    (1)

and W_k is defined as

    W_k = \sum_{r=1}^{k} \frac{1}{2 n_r} D_r    (2)

The Gap statistic then compares log(W_k) to its expectation under a null distribution. Given that the uniform distribution is the distribution most likely to generate spurious clusters, the null distribution is taken to be uniform over the same range as the original data set. M samples from the uniform distribution are generated, for which W_k^* is computed. One then gets the estimated gap as

    \mathrm{Gap}_n(k) = E_n^*\left[\log(W_k^*)\right] - \log(W_k)    (3)

From this sample of Gap_n(k), the mean and the standard error s_k can be evaluated. This is done for some appropriate range of k.

Given that Gap_n(k) and the corresponding s_k have been evaluated, there are a few ways of then finding the optimal k. What was originally proposed by Tibshirani et al. was to find the smallest k such that

    \mathrm{Gap}_n(k) \ge \mathrm{Gap}_n(k+1) - s_{k+1}    (4)
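A minimal sketch of this procedure in Python, assuming scikit-learn's KMeans as the clustering method and a uniform reference distribution over the bounding box of the data. The function names and default parameters are illustrative, not taken from the thesis implementation (which used the R function "clusGap").

```python
import numpy as np
from sklearn.cluster import KMeans

def log_wk(X, k):
    """log(W_k): pooled within-cluster dispersion after k-means with k clusters."""
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
    wk = 0.0
    for r in range(k):
        cluster = X[labels == r]
        # D_r / (2 n_r) equals the within-cluster sum of squared distances to the centroid
        wk += ((cluster - cluster.mean(axis=0)) ** 2).sum()
    return np.log(wk)

def gap_statistic(X, k_max=6, M=500, rng=np.random.default_rng(0)):
    """Return Gap_n(k) and standard errors s_k for k = 1..k_max."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        # log(W_k*) for M uniform reference data sets over the same range as X
        ref = [log_wk(rng.uniform(lo, hi, size=X.shape), k) for _ in range(M)]
        gaps.append(np.mean(ref) - log_wk(X, k))
        errs.append(np.std(ref) * np.sqrt(1 + 1 / M))
    return np.array(gaps), np.array(errs)

# Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}; k_hat == 1 suggests no clusters:
# gaps, errs = gap_statistic(X)
# k_hat = next(k + 1 for k in range(len(gaps) - 1) if gaps[k] >= gaps[k + 1] - errs[k + 1])
```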

2.4 Modality

One approach to assess the presence of clusters is to evaluate multimodality. A real distribution function F is said to be unimodal if and only if F is convex on (−∞, m] and concave on [m, ∞). Note that m does not necessarily need to be unique. One should also note that the uniform distribution is by this definition unimodal, an implication which will be important later in this section. The reasoning is that if the underlying distribution only has one mode, it can be considered to consist of only one cluster. Conversely, a multimodal underlying distribution will generate clustered data [25]. Reliable and tested methods have been developed for assessing modality, but mostly in the one-dimensional case.

2.4.1 The Dip Test

The dip test by Hartigan and Hartigan [26] evaluates the probability of unimodality in the one-dimensional case by using the dip statistic. The dip statistic is the maximum difference between the empirical distribution F and the unimodal distribution G that minimizes that difference. That is,

    D(F, G) = \sup_x | F(x) - G(x) |    (5)

The unimodal distribution G is found in the following way: let x_1, x_2, ..., x_n be the observations ordered so that x_i ≤ x_j for every i < j. Over the class \mathcal{U} of unimodal distribution functions, the dip of F is then

    \mathrm{dip}(F) = \min_{G \in \mathcal{U}} D(F, G)    (6)

To evaluate the significance of the dip, samples from a uniform distribution over the same interval as F are generated M times. The uniform distribution is chosen as it is the unimodal distribution that empirically has the largest dip. The dip is then evaluated for each sample, which can be used to create a p-value for the null hypothesis that F is unimodal. Although Hartigan and Hartigan proposed a way of expanding this method into higher dimensional spaces, none has been implemented since the method was published in 1985. Therefore, a reduction of dimensionality is required in order to use the dip test in the high-dimensional case.
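A minimal sketch of this Monte Carlo calibration in Python. The dip statistic itself is taken as a user-supplied function (the thesis implementation used the R package "diptest"), so `dip_fn` below is a placeholder rather than a specific library call.

```python
import numpy as np

def dip_pvalue(x, dip_fn, M=1000, rng=np.random.default_rng(0)):
    """Monte Carlo p-value for the null hypothesis that the 1-D sample x is unimodal.

    dip_fn is any function returning the dip statistic of a 1-D array, e.g. from an
    existing dip-test library (an assumption: such a function must be supplied).
    """
    observed = dip_fn(np.sort(x))
    # Uniform null over the same interval as the data, as in the procedure above
    null_dips = np.array([
        dip_fn(np.sort(rng.uniform(x.min(), x.max(), size=len(x))))
        for _ in range(M)
    ])
    # Fraction of uniform samples with a dip at least as large as the observed one
    return float(np.mean(null_dips >= observed))
```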

2.4.2 Principle Curve For a data set X which distribution is denoted as h,theprinciplecurve seeks to repackage the information into a one dimensional format [27]. A smooth curve is drawn through the data set in such a way as to minimize the orthogonal distances between the curve and the nearest data points. Moreover, the principle curve needs to fulfill the criteria of self-consistency. Let f()denotetheparameterizedcurve.Thentheprojectionindexf (xi)

10 is the for which xi is closest to the curve in the Euclidean sense. That is

f (xi)=inf xi f(µ) (7) µ || || Let a subset of data points have f()astheirclosestpointonthecurve.Self- consistency property then implies that f()istheaverageofthissubset,for every .Thatis, E(x (x)=)=f()(8) | f The distance between a point xi and its projection is expressed as d(xi,f)= 2 2 xi f(f (xi)) .LetthenD (h, f)=Eh[d (X, f)], which is to be minimized. || || dD2(h,f) 2 Alocalminimumisfoundwhen dt =0,butinpracticeD (h, f)is evaluated iteratively until the change is below some threshold. Set f 0()= x¯ + a,wherea is the first principal component. Then update f so that it fulfills the self-consistency criteria

j+1 f ()=E[X j (X)=](9) | f The distance function can then be computed for the current iteration

2 j j 2 D (h, f )=E [ X f ( j (X)) ](10) h || f ||

When a principal curve has been settled on, one also has the projection indices at hand. Ahmed and Walther [28] proposed to use the concept of principal curves for the inference of multimodality. The idea is to reduce the density information to one dimension. If λ_f(x) is found to follow a multimodal distribution, the data set would also be interpreted as multimodal. In practice the method is fairly straightforward: the principal curve for a data set is computed, and a test of modality is then applied to λ_f(x). In the paper by Ahmed and Walther, Silverman's bandwidth test is used, but this test requires calibration to provide accurate results [29], as well as being notably conservative compared to the dip test [30].
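A minimal sketch of how projection indices can be obtained in the spirit of Hastie and Stuetzle's iteration (equations 7-9): initialize the curve with the first principal component, then alternate between smoothing the data as a function of λ and re-projecting each point onto the discretized curve. The thesis implementation used the R package "princurve", so the function below is an illustrative simplification, not that package's algorithm.

```python
import numpy as np

def principal_curve_projections(X, n_iter=10, span=0.1):
    """Return crude projection indices lambda_f(x_i) for a principal-curve fit."""
    Xc = X - X.mean(axis=0)
    # f^0: line along the first principal component
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    lam = Xc @ vt[0]                      # projection indices on the initial line
    window = max(3, int(span * len(X)))
    kernel = np.ones(window) / window
    for _ in range(n_iter):
        order = np.argsort(lam)
        # Self-consistency step: smooth the coordinates as a function of lambda
        sorted_X = Xc[order]
        curve = np.column_stack([
            np.convolve(sorted_X[:, j], kernel, mode="same") for j in range(X.shape[1])
        ])
        # Projection step: position of the nearest point on the discretized curve
        d2 = ((Xc[:, None, :] - curve[None, :, :]) ** 2).sum(axis=2)
        lam = d2.argmin(axis=1).astype(float)
    return lam

# The returned lam can then be fed to a one-dimensional modality test such as the dip.
```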

2.4.3 The Folding Test

The folding test likewise seeks to classify modality by comparing the empirical distribution to a null distribution [31]. For any given distribution, it is folded onto itself via a pivoting point s* (see Figure 4). The intuition is then that the variance of a multimodal distribution will be reduced more by the folding than if it were a unimodal distribution. With the folding transform X ↦ ‖X − s*‖, the folding ratio is defined as

    \varphi(X) = \frac{\mathrm{Var}(\| X - s^* \|)}{\mathrm{Var}(X)}    (11)

where s* is chosen so as to reduce the variance the most,

    s^* = \arg\min_{s \in \mathbb{R}^d} \mathrm{Var}(\| X - s \|)    (12)

However, since the numerator of equation (11) is hard to minimize directly, Var(‖X − s‖²) is minimized instead. This can be expressed as

    \mathrm{Var}(\| X - s \|^2) = 4 s^T \mathrm{Var}(X) s - 4 s^T \mathrm{Cov}(X, \| X \|^2) + \mathrm{Var}(\| X \|^2)    (13)

Taking the derivative and setting it equal to zero then yields

    s^* = \frac{1}{2} \mathrm{Var}(X)^{-1} \mathrm{Cov}(X, \| X \|^2)    (14)

Figure 4: An image illustrating the folding procedure, taken from the original paper [31].

This ratio is then compared with that of the null distribution, which is set to be a uniform sphere in d dimensions. The folding ratio of this distribution is

    \tilde{\varphi}(U) = \frac{1}{(1+d)^2}    (15)

The comparison becomes

    \Phi(X) = \frac{\varphi(X)}{\tilde{\varphi}(U)}    (16)

If Φ(X) is larger than 1, then X is taken to be unimodal. However, since Φ(X) is an estimate of a random variable, a confidence bound has to be computed. By computing Φ(U_d) M times, one can create a distribution of how likely it is that a uniform distribution of the same dimension will yield a folding ratio lower than that of X.
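A numpy sketch of the folding ratio and the Monte Carlo comparison as reconstructed above. The scalar denominator (the trace of the covariance matrix) and the number of reference draws are simplifying assumptions for illustration; this is not the reference implementation of the folding test.

```python
import numpy as np

def folding_ratio(X):
    """phi(X): variance of the folded data ||X - s*|| relative to the total variance of X."""
    Xc = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(Xc, rowvar=False))        # Var(X)
    sq_norm = (Xc ** 2).sum(axis=1)                       # ||X||^2
    cross = np.array([np.cov(Xc[:, j], sq_norm)[0, 1] for j in range(X.shape[1])])
    s_star = 0.5 * np.linalg.solve(cov, cross)            # s* = 1/2 Var(X)^-1 Cov(X, ||X||^2)
    folded = np.linalg.norm(Xc - s_star, axis=1)
    # The trace is used as a scalar total variance; one common scalarization choice
    return folded.var() / np.trace(cov)

def uniform_ball(n, d, rng):
    """n points uniformly distributed in the unit d-ball (the uniform-sphere null)."""
    v = rng.normal(size=(n, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return v * rng.uniform(size=(n, 1)) ** (1.0 / d)

def folding_test(X, M=1000, rng=np.random.default_rng(0)):
    """Return an estimate of Phi(X) and how often the uniform null beats the data."""
    n, d = X.shape
    phi_x = folding_ratio(X)
    phi_null = np.array([folding_ratio(uniform_ball(n, d, rng)) for _ in range(M)])
    big_phi = phi_x / phi_null.mean()                      # Phi(X) = phi(X) / phi~(U), estimated
    p_lower = float(np.mean(phi_null < phi_x))
    return big_phi, p_lower
```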

2.5 ePAIRS

As mentioned briefly in the introduction, efforts to find clusters in a neuroscience context are not novel, and have in themselves produced methods for finding clusters. The "elliptical projection angle index of response similarity" (ePAIRS) [4] is a slight modification of the algorithm PAIRS proposed by Raposo et al. [5]. The intuition behind ePAIRS is that the angles between points of a clustered data set should be smaller than those of an unclustered data set. Furthermore, ePAIRS should in theory be able to detect "X-shaped" data sets where the data appear to be gathered along two or more axes but are still unimodal. Such a data structure would be an important finding in the neuroscience context, as it implies that neurons are indeed grouped with regard to some linear combination of functions, i.e. that the neurons express categorical mixed selectivity.

For every point, the angle to its closest cosine neighbour is computed as

    \theta_i = \min_{j \neq i} \arccos\!\left( \frac{x_i \cdot x_j}{\| x_i \| \, \| x_j \|} \right), \quad i = 1, \dots, N    (17)

The median of the θ_i is then compared to that of a null distribution. The null distribution is an elliptical Gaussian with the same mean and covariance matrix as X. N points are generated from the null distribution, from which the median of the closest angles is computed as in equation (17). This is repeated M times to generate a p-value for X being unclustered.
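A minimal sketch of this angle comparison in Python. It follows the intuition given in the text (median nearest-neighbour angle against an elliptical Gaussian null) rather than the reference implementation used in the thesis; the centering step is an assumption.

```python
import numpy as np

def median_nearest_angle(X):
    """Median over points of the smallest angle to any other point (equation 17)."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    cos = np.clip(unit @ unit.T, -1.0, 1.0)
    np.fill_diagonal(cos, -1.0)          # exclude the angle of a point with itself
    return np.median(np.arccos(cos.max(axis=1)))

def epairs_pvalue(X, M=1000, rng=np.random.default_rng(0)):
    """p-value for the null hypothesis that X is as unclustered as an elliptical Gaussian."""
    Xc = X - X.mean(axis=0)              # centering: an assumption, not stated in the thesis
    observed = median_nearest_angle(Xc)
    mean, cov = Xc.mean(axis=0), np.cov(Xc, rowvar=False)
    null = np.array([
        median_nearest_angle(rng.multivariate_normal(mean, cov, size=len(X)))
        for _ in range(M)
    ])
    # Small angles indicate clustering, so count null samples at least as extreme
    return float(np.mean(null <= observed))
```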

2.6 Community Detection

If one instead has a data set describing the activity of each point over time, a matrix describing the similarity of each point to every other point, such as a correlation matrix, can be constructed. This matrix can then be interpreted as a graph G = (V, E) where each vertex represents a point and each edge the correlation between a pair of points. The effort of finding groups in G is the practice of community detection, for which there are many algorithms [32]. Indeed, the idea of finding clusters in a neuroscience setting through community detection algorithms has been suggested before, albeit with the use of an arbitrary filtering process [33].

Figure 5: A graph in which each edge corresponds to the correlation between vertices

2.6.1 Correlation Clustering

The crux of finding clusters based on a correlation matrix was formulated as an optimization problem by Bansal et al. [34]. A partition of the graph G is sought so as to minimize the number of disagreements. A disagreement does in this context refer to a pair of points that correlate positively but are not partitioned into the same cluster, or conversely, a pair that correlates negatively but is still assigned to the same cluster. Let e(i, j) be the edge between vertices i and j, and let e ∈ E⁺ if the edge is positive and e ∈ E⁻ if it is negative. Then, if x_e is a decision variable that equals 0 if points i and j should be in the same cluster and 1 if not, the mixed integer program becomes

    \min_x \; \sum_{e \in E^+} c_e \, x_e + \sum_{e \in E^-} c_e (1 - x_e)    (18a)
    \text{subject to} \quad x_{ij} \le x_{ik} + x_{kj} \quad \forall \, i, j, k    (18b)
    \phantom{\text{subject to}} \quad x_{ij} \in \{0, 1\} \quad \forall \, i, j    (18c)

where equation (18b), the triangle inequality constraint, ensures that the cluster assignment implied by x is consistent, so that every point belongs to exactly one cluster. Analogously with the problem statement, one can also change the objective function to maximize the number of agreements. Which formulation to use depends on the situation, but both may generate the same partition. It has long been known that integer programming is NP-complete [35], and as such, much effort has been put into finding approximations [36]. There has not, to the knowledge of the author, been any development in terms of validating the optimal solution to the program through hypothesis testing.
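A small sketch of the integer program (18) for a toy correlation matrix, using the PuLP modelling library as an illustrative choice (the thesis experimented with GAMS and the NEOS server instead); the function and variable names are hypothetical.

```python
import numpy as np
import pulp

def correlation_clustering_ilp(C, threshold=0.0):
    """Solve the min-disagreement ILP (18a)-(18c) for a small correlation matrix C."""
    n = C.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    prob = pulp.LpProblem("correlation_clustering", pulp.LpMinimize)
    # x[i, j] = 1 if i and j end up in different clusters, 0 if in the same cluster
    x = pulp.LpVariable.dicts("x", pairs, cat="Binary")

    def xe(i, j):
        return x[(i, j)] if i < j else x[(j, i)]

    # Objective (18a): penalize separated positive edges and merged negative edges
    prob += pulp.lpSum(
        abs(C[i, j]) * (xe(i, j) if C[i, j] > threshold else (1 - xe(i, j)))
        for i, j in pairs
    )
    # Triangle inequalities (18b) for every triple of distinct vertices
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) == 3:
                    prob += xe(i, j) <= xe(i, k) + xe(k, j)
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {pair: int(pulp.value(x[pair])) for pair in pairs}
```

The cubic number of triangle constraints illustrates why, as noted above, even moderately sized instances become hard to compile and solve.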

3 Methodology

3.1 Data Gathering

The data was gathered by researchers at the Dinos Meletis Lab [37]. Mice were injected with the viral particles AAV5-CAG-Flex-GCaMP6s into the dorsomedial striatum. This virus will selectively infect either D1, A2A or OPRM1 neurons, which will in turn express the Cre recombinase-activated ultrasensitive fluorescent protein GCaMP6s. When a neuron fires an action potential, the influx of Ca2+ results in the protein fluorescing. A small endoscope was then surgically implanted above the injection site. Depending on the mouse, 15 to 674 neurons could be registered by the endoscope. The mice were subject to a reward experiment in which the mouse was filmed while, at the same time, the endoscope captured neural activity.

Figure 6: An image showing a mouse engaging in the experiment described in [37], next to an image taken from an endoscope in the dorsomedial striatum showing neural activity as white traces. Courtesy of Dinos Meletis Lab, KI.

The mouse initiated the experiment by sticking its nose into a middle port. The animal could then choose to enter either a port to its left or to its right, where it would be rewarded with a sucrose solution. The reward could only be found in one port at a time. After the reward, the mouse had to return to the middle port to initiate the task again. When the task was repeated there was a 5% chance of the reward switching to the opposite port, i.e. from left to right or right to left. In summary, the mouse had to find out which port was associated with a reward and then adapt its behaviour when the reward was switched to the other port. Based on this experimental setup, the mouse was said to engage in one of 12 phases of the experiment depending on whether the mouse turned left or right and whether it did so when expecting a reward, when it had received a reward, or when it had not received a reward.

One was then left with a video of a number of neurons with varying illumination depending on the behaviour of the mouse. Because the illumination lingers for some time after an action potential, the activity is deconvolved so as to reveal the action potentials themselves. For each neuron, the z-score of the deconvolved activity was computed for every phase. A shuffled data set was then constructed so that the connection between neural activity and a given phase was lost. Using this shuffled data set, the z-score was computed for every neuron for each phase. This was repeated 1000 times to create a standard error of the shuffled activity. The tuning score was then taken to be the real z-score minus the mean of the shuffled z-scores, divided by the standard error of the shuffled z-scores.
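A rough numpy sketch of this tuning-score construction, with the deconvolved activity matrix and phase labels taken as given. The shuffling strategy (permuting phase labels over time), the z-score normalization across neurons, and the use of the standard deviation over shuffles as the standard error are all interpretations of the thesis text, not the lab's actual pipeline.

```python
import numpy as np

def tuning_scores(activity, phase, n_phases=12, n_shuffles=1000, rng=np.random.default_rng(0)):
    """activity: (n_neurons, n_timepoints) deconvolved traces; phase: (n_timepoints,) labels.

    Returns an (n_neurons, n_phases) matrix of tuning scores:
    (real z-score - mean shuffled z-score) / standard error of the shuffled z-scores.
    """
    def zscores(act, ph):
        # z-score of each neuron's mean activity in each phase, relative to all neurons
        means = np.array([act[:, ph == p].mean(axis=1) for p in range(n_phases)]).T
        return (means - means.mean(axis=0)) / means.std(axis=0)

    real = zscores(activity, phase)
    shuffled = np.array([
        zscores(activity, rng.permutation(phase))   # break the activity-phase pairing
        for _ in range(n_shuffles)
    ])
    se = shuffled.std(axis=0)                        # spread of the shuffle distribution as SE
    return (real - shuffled.mean(axis=0)) / se
```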

This procedure was repeated for 19 different mice, in each of which one of the three genotypes D1, A2A and OPRM1 was recorded. All the neurons from the same genotype were grouped into one data set, resulting in three data sets of 12 dimensions with 748 rows (D1 neurons), 2683 rows (A2A neurons) and 777 rows (OPRM1 neurons) respectively. For more specifics of the experiment, the reader is directed to the original paper [37].

Because each genotype corresponds to a different pathway, they were analyzed separately. Finally, a combined data set consisting of all genotypes was also analysed in order to discern any possible difference between the pathways.

3.2 Artificial Data Sets

As there is an inherent difficulty in assessing the validity of any clustering method, a number of constructed data sets with known k were used to test them. Indeed, this is the procedure used in [23, 24, 26, 27]. This highlighted limitations, pros and cons of each method, albeit not necessarily in a conclusive manner.

3.2.1 Skew Normal

The skew normal distribution is a normal distribution with a heavy tail. Apart from the mean and variance, the skew normal depends on two further parameters, which dictate how heavy the tail is and how asymmetric the distribution is. As with the normal distribution, it only has one mode. This data set was chosen as it is not unlikely to find data of this distribution in a natural setting, such as the brain. Moreover, as it is not symmetric, it was thought to act as a challenge for clustering algorithms.

Figure 7: An artificial data set generated from a skew normal distribution.

Two skew-normal data sets with 1000 points each were generated: one in two dimensions (see Figure 7) and one in 12 dimensions, so as to be comparable with the striatum data. This was done using the Python function "skewnorm.rvs".
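A small sketch of how such a data set can be drawn with SciPy's skewnorm.rvs; the particular shape parameter is an illustrative value, since the thesis does not state the exact parameters used.

```python
import numpy as np
from scipy.stats import skewnorm

rng = np.random.default_rng(0)
# 1000 points, each coordinate drawn from a skew normal;
# a controls the skewness (a = 0 recovers the ordinary normal distribution)
skew_2d = skewnorm.rvs(a=5, loc=0, scale=1, size=(1000, 2), random_state=rng)
skew_12d = skewnorm.rvs(a=5, loc=0, scale=1, size=(1000, 12), random_state=rng)
```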

3.2.2 Three Blobs

Three groups of points were generated using the Python function "datasets.make_blobs" from the package sklearn.datasets. This data set represents a schoolbook example of three clusters, all with different variances.

18 Figure 8: An artificial data set with three clusters.

This type of data structure was generated in 2 and 12 dimensions, with 1000 points each.
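A corresponding sketch with scikit-learn's make_blobs; the cluster standard deviations are illustrative values chosen to give the clusters different variances, as described above.

```python
from sklearn.datasets import make_blobs

# Three clusters with different variances, in 2 and in 12 dimensions
blobs_2d, labels_2d = make_blobs(n_samples=1000, n_features=2, centers=3,
                                 cluster_std=[0.5, 1.0, 2.0], random_state=0)
blobs_12d, labels_12d = make_blobs(n_samples=1000, n_features=12, centers=3,
                                   cluster_std=[0.5, 1.0, 2.0], random_state=0)
```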

3.2.3 15 Blobs

A benchmark data set consisting of 5000 points in 2 dimensions, constructed by Fränti and Sieranoja [38], was included. This data set consists of 15 clusters with varying overlap and variance. It could hopefully highlight any shortcomings of the methods when the number of clusters is large.

Figure 9: The 15 blobs benchmark data set.

3.2.4 Cross Distribution

As previously discussed, some data sets may be classified as either clustered or not clustered depending on the context. This data set was generated from two normal distributions with orthogonal covariance matrices, in two dimensions only.
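A sketch of one way to generate such an X-shaped data set: two centred Gaussians whose covariance matrices are elongated along orthogonal directions. The specific variances and sample sizes are illustrative, not the values used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two elongated Gaussians with orthogonal principal axes, stacked into one "cross"
cov_a = np.array([[9.0, 0.0], [0.0, 0.25]])   # long along the x-axis
cov_b = np.array([[0.25, 0.0], [0.0, 9.0]])   # long along the y-axis
cross = np.vstack([
    rng.multivariate_normal([0, 0], cov_a, size=500),
    rng.multivariate_normal([0, 0], cov_b, size=500),
])
```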

Figure 10: An artificial data set following an X-shaped distribution.

3.3 Procedure

The four methods described above for clustering in Euclidean space (the principal curve in conjunction with the dip test, the gap statistic, ePAIRS and the folding test) were applied to the striatum data sets and the artificial data sets. Correlation clustering was not implemented, as no approach for validating its result could be found. The principal curve method was implemented using the packages "princurve" and "diptest" in R, with M = 1000. The gap statistic was similarly implemented in R using the function "clusGap". The clusters were formed using k-means for k = 1, ..., 6 with M = 500. ePAIRS was implemented in Python with M = 1000.

4 Results

Table 1 describes the conclusions of the different approaches on the mouse experiment data as well as on the constructed data sets. Subsequent subsections examine the results in more detail. As the results of the different partitions of the striatum data were highly similar, plots and results for the D1 partition are shown, while the corresponding results for the other partitions (A2A, OPRM1 and the full striatum data set) can be found in the Appendix (section 7).

Data set             Gap Statistic   Principal Curve   Folding Test   ePAIRS

2-dimensional data sets
Three blobs          Yes             Yes               No             Yes
15 blobs             Yes             Yes               No             Yes
Skew                 No              No                No             Yes
Cross                No              No                No             Yes

12-dimensional data sets
Three blobs          Yes             Yes               No             Yes
Skew                 No              No                No             Yes
Cross                No              No                No             Yes
Striatum (D1)        No              No                No             Yes
Striatum (A2A)       No              No                No             Yes
Striatum (OPRM1)     No              No                No             Yes
Striatum (All)       No              No                No             Yes

Table 1: Each method's conclusion as to whether a data set was clustered or not.

4.1 Folding Test

The folding test could not reject the null hypothesis of unimodality for any data set at a significance level of 5%. Indeed, this method acted as the opposite of ePAIRS in being overly conservative. As with ePAIRS, no conclusion as to whether the striatum is clustered or not can be drawn.

Data set             Φ(X)        Margin of Error at p = 0.05   Reject H0

2-dimensional data sets
Three blobs          0.9378601   0.08930391                    No
Skew                 1.659332    0.08930391                    No
Cross                2.71855     0.08930391                    No
15 blobs             1.139143    0.03993792                    No

12-dimensional data sets
Three blobs          1.588866    0.1480916                     No
Skew                 6.017205    0.1480916                     No
Striatum (D1)        13.87667    0.1712299                     No
Striatum (A2A)       17.09608    0.09117863                    No
Striatum (OPRM1)     11.83181    0.1680041                     No
Striatum (All)       20.91991    0.07219258                    No

Table 2: Values of Φ(X) for the different data sets, as well as the confidence bound.

4.2 ePAIRS

The ePAIRS algorithm showed that the median of the nearest-neighbour cosine angle was indeed smaller in the striatum data set than in the null distribution. This was true for the D1, A2A and OPRM1 partitions of the data set as well as for the full data set, which all had an estimated p-value of 0. It should be noted that the distributions of angles of the striatum data and its null distribution were fairly similar, as opposed to the case of the three blobs data set. This could imply that the data structures of the striatum data and its null are similar, although this conclusion is not necessarily well founded and at the very least non-trivial.

Figure 11: (a) Striatum (D1); (b) Three blobs. Top panels show the distribution of nearest angles compared to one sample from the null distribution; bottom panels show the distribution of median angles compared to the median angle of the true data set.

Moreover, since ePAIRS labeled the skew distribution as clustered when it was not, one is forced to conclude that the algorithm is prone to yielding false positive results. Accordingly, it cannot be conclusively established that the striatum data set is clustered using this method.

4.3 Principal Curve

The method of combining the principal curve with unimodality tests did not indicate that the striatum data (for any partition) was clustered. Using the dip statistic on the D1 partition resulted in a p-value of 0.3846, so the null hypothesis of unimodality cannot be rejected. The principal curve for the D1 data was not unlike that of the 12-dimensional skew data.

Figure 12: (a) Striatum (D1); (b) 12-dimensional skew data. The principal curve, with each circle representing an observation x.

The histograms of the values of λ are clearly unimodal for both the striatum data and the skew data.

Figure 13: (a) Striatum data (D1); (b) 12-dimensional skew data. Histograms of the projections show a clearly unimodal distribution.

This is contrasted by the three blobs data, for which the principal curve could by eye hint that the data was clustered. The method could accurately conclude that the three blobs data set was clustered in both 2 and 12 dimensions. Not only that, but the number of clusters can be discerned by inspecting the distribution of the projected values (see Figure 14). The cross distribution, however, was assessed as not being clustered.

Figure 14: (a) Three blobs; (b) the values of x projected onto the principal curve. The principal curve generated in the 2-dimensional plane.

Interestingly enough, the principal curve of the cross distribution was similar to those of the A2A, OPRM1 and full data sets. Whether this implies that those data sets also share an X-shape is not known.

Figure 15: (a) OPRM1; (b) Cross distribution. A comparison between the principal curves of the cross distribution and the OPRM1 data.

4.4 Gap Statistic

The Gap statistic concluded that there were no clusters in the striatum data set. At the same time, the method accurately concluded the existence of clusters in the three blobs data set, both in 2 and 12 dimensions, as well as in the 15 blobs data set, even though the method was only run for k = 1, ..., 6 when the true k = 15. This illustrates that, if there were a large number of separate clusters (k > 6) in the striatum data, this should still have been detected.

k     Gap         SE
1     1.0205357   0.007339721
2     0.9749839   0.008173128
3     0.9481887   0.006400719
4     0.9306703   0.006206643
5     0.9473116   0.005771181

Table 3: Gap_k with its respective standard error. The optimal number of clusters according to equation (4) is k = 1.

Figure 16: A plot of Gap_k with respect to k. Whiskers denote the standard error.

As with the principal curve method, the cross distribution was not identified as clustered.

5 Discussion

5.1 Interpretation of Results

Although the results showed no reliable evidence of the striatum being clustered, this does not necessarily mean that we can definitively exclude the possibility of some underlying clusters. Firstly, the 12 dimensions of the striatum data set are completely arbitrary and a result of the experiment by the Dinos Meletis Lab. In fact, these 12 dimensions are almost surely a subspace of what is really encoded in the dorsomedial striatum. Therefore, neurons in the dorsomedial striatum may be clustered in some higher or different dimension which is not captured in the current subspace (see Figure 17).

Figure 17: A clustered data set in dimensions x and y in which only the values for x are observed. This will result in the impression of an unclustered data set.

Secondly, three of the aforementioned methods output a p-value. Indeed, these algorithms were included partly because they share the same measure and can thus be directly compared. This p-value represents the likelihood of the data given a null hypothesis. A rejection of this null hypothesis as a consequence of a low p-value means that one can confirm the alternative hypothesis. However, a high p-value does not imply that the null hypothesis is true. The implication for the results in this thesis is that it cannot be confirmed that the dorsomedial striatum is unclustered just because the existence of clusters cannot be affirmed.

Thirdly, because the principal curve method and the gap statistic did not recognize the cross distribution as clustered, it could be that the striatum is configured in a similar way.

5.2 Limitations

5.2.1 Data Gathering

There are a number of inherent limitations in the data set that was used which could have had an effect on the results. Firstly, the number of neurons that were recorded for each mouse was limited. This is because, roughly speaking, the endoscopic lens could only capture one or a few layers of neurons. Indeed, the majority of neurons would not have been recorded because of the three-dimensional shape of the striatum. If more data points were gathered, they could possibly reveal the neurons as being clustered.

Secondly, the recording of the camera might have been too slow to capture temporal dynamics. The resulting variance may then have been inflated, which would have masked the presence of a multimodal distribution.

Thirdly, the method of extracting tuning scores from the raw data is somewhat arbitrary. Although intuitive, it may be that clusters are lost in this representation of the data. There could possibly be a more rigorous way of projecting the raw tunings to a d-dimensional Euclidean space.

5.2.2 Uniform as the Null Distribution

The computation of the Gap statistic, the folding test and the dip test all use the uniform as a null distribution, as it is the unimodal distribution that is most likely to generate spurious clusters or multimodal data. This, of course, sets a standard which a data set must pass in order to be considered clustered, but it may very well be that the bar is set too high. As a result, the tests may be too conservative, which more ambiguous artificial data sets could have shown.

5.3 Future Research

The null hypothesis has hitherto been that a data set is unclustered, which in the case of the striatum data set has not been rejected. The implication of this, as elaborated above, is that we cannot really prove anything. Would it then be possible, in a future effort, to construct tests where the null hypothesis instead posits that a data set is clustered? A comprehensive test of this kind is not known to the author. In theory, the reduction of dimensions through a data set's principal curve, followed by Silverman's bandwidth test, could test the significance of 1, 2, ..., N modes. However, the reliability of the principal curve method in the context of finding clusters is not yet completely understood, and further testing would be required to accurately assess its strength on different data structures.

5.3.1 Experiments with Other Dimensions

In order to fully comprehend what is encoded in the dorsomedial striatum, more experiments with a variety of different tasks will have to be made. In any of those experiments, a neural representation in some subset might be clustered, in which case the dorsomedial striatum can be said to be clustered. On the other hand, the experiments may not reveal any clusters at all. The problem then, apart from the construction of a reliable statistical test, would be that one cannot rule out the existence of a cluster in a higher or different dimensional space. One single experiment capturing the whole representational space of the dorsomedial striatum is probably not feasible, and there may therefore never be an experiment that conclusively establishes the dorsomedial striatum as unclustered, although it can be proved to be clustered.

5.3.2 Algorithms for Categorical Mixed Selectivity

All methods presented in this thesis, apart from ePAIRS, are constructed to identify separate clusters based on the notion of an underlying multimodal probability distribution. It is therefore not a surprise that these methods have all classified the cross distribution as unclustered. This is problematic since this type of structure is included in the notion of categorical mixed selectivity, which in this context is the definition that dictates what should and shouldn't be considered clustered. ePAIRS is an effort to capture this dynamic but, as demonstrated in this thesis, ePAIRS is overly generous in labeling distributions as clustered. It may be that a more direct comparison between the distributions of angles of a data set and a null distribution would provide more insight into the data structure, as opposed to only evaluating the median angle. Indeed, one might infer from the similar angle distributions of the striatum data and the skew data that they hold a similar structure, although this might not at all be the case. Instead, a better algorithm should be found for discerning those data sets that have a cross distribution from those that do not.

As was noted in the results, the principal curve of the cross distribution resembled that of the OPRM1 data. It seems rather precarious to draw any conclusions on the basis of this, since the plotted principal curve is a 2-dimensional representation of a high-dimensional space. Nonetheless, it would be interesting to further investigate whether the principal curve can be used to identify X-shaped distributions.

The question also arises of what a formal definition of categorical mixed selectivity should be. That is, how close to a square shape should a data set be allowed to come and still be considered clustered (see Figure 18)? In the case of classically separated clusters, multimodality of the underlying probability distribution is deemed a handy proxy for cluster analysis, but no corresponding definition exists for the identification of X-shaped distributions.

Figure 18: Two examples of data sets that may or may not hold as proof of categorical mixed selectivity, depending on how strict the definition is.

5.3.3 Validation Techniques for Correlation Clustering

This thesis opted out of using the correlation clustering framework because of lacking validation techniques; the result of any optimal solution would be hard to evaluate. However, it was included in the mathematical background as it is recognized as a potentially powerful tool in the inquiry into clusters, given the existence of appropriate validation measures. The use of a similarity matrix includes information about the internal dynamics between the neurons, which is not captured through the use of tuning scores for each dimension. The use of correlation clustering could therefore be more sensitive in finding clusters, although the analysis would be restricted to one session at a time and would as such have fewer data points. That being said, a large number of points can also be a hindrance to correlation clustering. When experimenting with the correlation clustering framework, a mixed integer program as well as a relaxed linear program, formulated in GAMS with 300 points, were submitted to the NEOS server [39, 40, 41]. Both were terminated because the compilation of the programs exceeded the limit of 10 GB.

6 Conclusion

This thesis has employed four different techniques to find clusters, and has evaluated the performance of these methods on artificial data sets. Of these, only ePAIRS concluded the existence of functional clusters in the dorsomedial striatum. However, ePAIRS failed to recognize unclustered data as unclustered, and so its reliability is in question. The folding test viewed all the data sets as unclustered. The principal curve method and the gap statistic were successful on the artificial data sets, apart from the cross distribution, and could not find any clusters in the striatum data. It is thus possible that the striatum data is of an X-shape, something that can be neither rejected nor proven based on these results. It can therefore be concluded that no evidence of any clusters in the striatum was found, although the existence of clusters cannot be excluded.

References

[1] Clark E, Stannard J. Aristotle on the Anatomy of the Brain. Journal of the History of Medicine. 1964;p. 132–148.

[2] Purves D. Neuroscience. 6th edition. Sinauer Associates; 2018.

[3] Mäki-Marttunen T, Kaufmann T, Elvsåshagen T, Devor A, Djurovic S, Westlye LT, et al. Biophysical Psychiatry—How Computational Neuroscience Can Help to Understand the Complex Mechanisms of Mental Disorders. Front Psychiatry. 2019;10:534.

[4] Hirokawa J, Vaughan A, Masset P, Ott T, Kepecs A. Frontal cortex neuron types categorically encode single decision variables. Nature. 2019;576:446–451.

[5] Raposo D, Kaufman MT, Churchland AK. A category-free neural population supports evolving demands during decision-making. Nature Neuroscience. 2014;17:1784–1792.

[6] Gründemann J, Bitterman Y, Lu T, Krabbe S, Grewe BF, Schnitzer MJ, et al. Amygdala ensembles encode behavioral states. Science. 2019;364(6437).

[7] Jennings JH, Ung RL, Resendez SL, Deisseroth K, et al. Visualizing Hypothalamic Network Dynamics for Appetitive and Consummatory Behaviors. Cell. 2015;160(3).

[8] Barbera G, Liang B, Zhang L, Chen R, Li Y, Lin DT. Spatially Compact Neural Clusters in the Dorsal Striatum Encode Locomotion Information. Neuron. 2016;92(1).

[9] Adler A, Katabi S, Finkes I, Israel Z, Prut Y, Bergman H. Temporal Convergence of Dynamic Cell Assemblies in the Striato-Pallidal Network. J Neurosci. 2012;32(7):2473–2484.

[10] Sales-Carbonell C, Taouali W, Khalki L. No Discrete Start/Stop Signals in the Dorsal Striatum of Mice Performing a Learned Action. Current Biology. 2018;28(19):3044–3055.

[11] Klaus A, da Silva JA, Costa RM. What, If, and When to Move: Basal Ganglia Circuits and Self-Paced Action Initiation. Annual Review of Neuroscience. 2019;42:459–483.

[12] Parker JG, Marshall JD, Ahanonu B, Wu YW, Kim TH, Grewe BF, et al. Diametric neural ensemble dynamics in parkinsonian and dyskinetic states. Nature. 2018;557:177–182.

[13] Schultz W. Reward functions of the basal ganglia. Journal of Neural Transmission. 2016;123:679–693.

[14] Samejima K, Ueda Y, Doya K, Kimura M. Representation of Action-Specific Reward Values in the Striatum. Science. 2005;310(5752):1337–1340.

[15] Sidman RL, Kosaras B, Misra B, Senft S. High Resolution Mouse Brain Atlas. Available from: http://www.hms.harvard.edu/research/brain/atlas.html.

[16] Cox J, Witten IB. Striatal circuits for reward learning and decision-making. Nature Reviews Neuroscience. 2019;20:482–494.

[17] Halkidi M, Batistakis Y, Vazirgiannis M. On Clustering Validation Techniques. Journal of Intelligent Information Systems. 2001;17:107–145.

[18] Saxena A, Prasad M, Gupta A, Bharill N. A Review of Clustering Techniques and Developments. Neurocomputing. 2017;267.

[19] Dempster AP, Laird NM, Rubin DB. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society Series B (Methodological). 1977;39(1):1–38.

[20] Ester M, Kriegel HP, Sander J, Xu X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. KDD'96. AAAI Press; 1996. p. 226–231.

[21] Ankerst M, Breunig MM, Kriegel HP, Sander J. OPTICS: Ordering Points to Identify the Clustering Structure. SIGMOD Rec. 1999;28(2).

[22] Rousseeuw PJ. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics. 1987;20:53–65.

[23] Arbelaitz O, Gurrutxaga I, Muguerza J, Pérez JM, Perona I. An extensive comparative study of cluster validity indices. Pattern Recognition. 2013;46(1):243–256.

[24] Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B. 2001;63(2):411–423.

[25] Menardi G. A Review on Modal Clustering. International Statistical Review. 2016;84(3):413–433.

[26] Hartigan JA, Hartigan PM. The Dip Test of Unimodality. The Annals of Statistics. 1985;13(1):70–84.

[27] Hastie T, Stuetzle W. Principal Curves. Journal of the American Statistical Association. 1989;84(406):502–516.

[28] Ahmed MO, Walther G. Investigating the multimodality of multivariate data with principal curves. Computational Statistics & Data Analysis. 2012;56(12):4462–4469.

[29] Hall P, York M. On the Calibration of Silverman's Test for Multimodality. Statistica Sinica. 2001;11:515–536.

[30] Xu L, Bedrick E, Hanson T, Restrepo C. A Comparison of Statistical Tools for Identifying Modality in Body Mass Distributions. Journal of Data Science. 2014;12:175–196.

[31] Siffer A, Fouque PA, Termier A, Largouët C. Are Your Data Gathered? The Folding Test of Unimodality. Available from: https://hal.archives-ouvertes.fr/hal-01951676/document.

[32] Yang Z, Algesheimer R, Tessone C. A Comparative Analysis of Community Detection Algorithms on Artificial Networks. Scientific Reports. 2016;6.

[33] Humphries M. Spike-Train Communities: Finding Groups of Similar Spike Trains. The Journal of Neuroscience. 2011;31(6):2321–2336.

[34] Bansal N, Blum A, Chawla S. Correlation Clustering. Machine Learning. 2004;56:89–113.

[35] Karp RM. Reducibility among Combinatorial Problems. In: Miller RE, Thatcher JW, Bohlinger JD, editors. Boston, MA: Springer US; 1972. p. 85–103. Available from: https://doi.org/10.1007/978-1-4684-2001-2_9.

[36] Pandove D, Goel S, Rani R. Correlation clustering methodologies and their fundamental results. Expert Systems. 2017;35.

[37] Weglage M, Wärnberg E, Lazaridis I, Tzortzi O, Meletis K. Complete representation of action space and value in all striatal pathways; 2020. Available from: https://www.biorxiv.org/content/10.1101/2020.03.29.983825v1.

[38] Fränti P, Sieranoja S. K-means properties on six clustering benchmark datasets; 2018. Available from: http://cs.uef.fi/sipu/datasets/.

[39] Gropp W, Moré JJ. Optimization Environments and the NEOS Server. In: Buhmann MD, Iserles A, editors. Approximation Theory and Optimization. Cambridge University Press; 1997. p. 167–182.

[40] Dolan ED. The NEOS Server 4.0 Administrative Guide. Mathematics and Computer Science Division, Argonne National Laboratory; 2001. ANL/MCS-TM-250.

[41] Czyzyk J, Mesnier MP, Moré JJ. The NEOS Server. IEEE Computational Science and Engineering. 1998;5(3):68–75.

7 Appendix

7.1 Principal Curve

Figure 19: The principal curve method applied to the A2A data; (a) the principal curve, (b) histogram of the projections. p-value = 0.9928.

Figure 20: The principal curve method applied to the OPRM1 data; (a) the principal curve, (b) histogram of the projections. p-value = 0.9896.

Figure 21: The principal curve method applied to the combined data set consisting of D1, A2A and OPRM1 neurons; (a) the principal curve, (b) histogram of the projections. p-value = 0.8485.

7.2 Gap Statistic

Figure 22: Plot of Gap_k with respect to k for the A2A data.

Figure 23: Plot of Gap_k with respect to k for the OPRM1 data.

Figure 24: Plot of Gap_k with respect to k for the full striatum data.

7.3 ePAIRS

Figure 25: ePAIRS on the A2A data, p-value = 0

Figure 26: ePAIRS on the OPRM1 data, p-value = 0

Figure 27: ePAIRS on the combined striatum data set, p-value = 0

