DEGREE PROJECT IN MATHEMATICS, SECOND CYCLE, 30 CREDITS STOCKHOLM, SWEDEN 2020

On the Existence of Functional Clusters in the Dorsomedial Striatum

THEODOR OHLSSON

KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ENGINEERING SCIENCES


Degree Projects in Optimization and Systems Theory (30 ECTS credits)
Master's Programme in Industrial Engineering and Management
KTH Royal Institute of Technology, year 2020
Supervisor at Meletis Laboratory (KI): Emil Wärnberg
Supervisor at KTH: Xiaoming Hu
Examiner at KTH: Xiaoming Hu

TRITA-SCI-GRU 2020:301 MAT-E 2020:074

Royal Institute of Technology School of Engineering Sciences KTH SCI SE-100 44 Stockholm, Sweden URL: www.kth.se/sci

Abstract

In recent years, the understanding of the brain has progressed immensely through advanced data gathering methods that can track the activity of individual neurons. This has enabled researchers to investigate the function and dynamics of different parts of the brain in detail. Using data gathered from mice engaged in a two-alternative choice task, this thesis sought to answer whether neurons of the dorsomedial striatum are clustered with regard to their activity profiles, by using four fundamentally different mathematical approaches. This analysis could not find any reliable evidence of functional clusters in the dorsomedial striatum, but their existence cannot be excluded.

Sammanfattning

With the help of advanced methods for data collection, researchers in neuroscience have been able to follow the activity of individual neurons in real time. In this way, the function of different parts of the brain has been mapped. This work explored whether neurons in the dorsomedial striatum are clustered with respect to their function. This was done by applying four different methods to data from laboratory mice. The analysis could not find evidence that there are functional clusters in the dorsomedial striatum, but neither can it exclude that they exist.


Acknowledgments

The foundation of this thesis is the experiments done by researchers at the Dinos Meletis Lab, KI, to whom I am grateful for sharing their data with me. Any faults in this thesis are my own. Lastly, special thanks to Emil Wärnberg for his guidance and for sharing endless insights into neuroscience and mathematics alike.


Contents

1 Introduction
  1.1 Advances in Neuroscience
  1.2 Previous Research
  1.3 Research Question

2 Theoretical Background
  2.1 Introduction to Neuroscience
    2.1.1 The Neuron
    2.1.2 The Striatum
  2.2 Cluster Analysis
  2.3 Cluster Validity Indices
    2.3.1 The Gap Statistic
  2.4 Modality
    2.4.1 The Dip Test
    2.4.2 Principal Curve
    2.4.3 The Folding Test
  2.5 ePAIRS
  2.6 Community Detection
    2.6.1 Correlation Clustering

3 Methodology
  3.1 Data Gathering
  3.2 Artificial Data Sets
    3.2.1 Skew Normal
    3.2.2 Three Blobs
    3.2.3 15 Blobs
    3.2.4 Cross Distribution
  3.3 Procedure

4 Results
  4.1 Folding Test
  4.2 ePAIRS
  4.3 Principal Curve
  4.4 Gap Statistic

5 Discussion
  5.1 Interpretation of Results
  5.2 Limitations
    5.2.1 Data Gathering
    5.2.2 Uniform as the Null Distribution
  5.3 Future Research
    5.3.1 Experiments with Other Dimensions
    5.3.2 Algorithms for Categorical Mixed Selectivity
    5.3.3 Validation Techniques for Correlation Clustering

6 Conclusion

References

7 Appendix
  7.1 Principal Curve
  7.2 Gap Statistic
  7.3 ePAIRS

1 Introduction

1.1 Advances in Neuroscience

The function of the brain has long been shrouded in mystery. Indeed, Aristotle posited that its main purpose was to cool the circulating blood [1]. Since the introduction of modern medicine and science, the function of the brain as the central controlling organ which houses thoughts, planning and emotions has been well established. Whereas the function of and dynamics between different parts of the brain have historically been understood through observations of the effects of head trauma or pathologies, recent advances in technology have allowed researchers to monitor the live activity of the brain in vivo. Through the use of test animals, researchers now have the ability to record a large number of individual neurons at the same time, which has allowed the role of different parts of the brain to be studied in detail [2]. In practice this is done by simultaneously recording the behaviour of a test animal and the activity of its neurons, from which correlations can be inferred, or even by controlling the activity of individual neurons through optogenetic techniques and noting the resulting behaviour, which allows for conclusions regarding causality.

The ability to record the activity of individual neurons at high temporal resolution has made any conclusions contingent on the analysis of large troves of data. This has increasingly made neuroscientists reliant on machine learning techniques and advanced mathematical methods as part of their research. Moreover, this has in itself generated new inquiries into mathematical models of neural circuitry and even established computational neuroscience as a field of its own [3].

One such inquiry has been to establish or disprove the existence of neuronal clusters. Let an experiment have recorded N neurons while a test subject engages in d tasks. Let subsequent analysis then give each neuron a z-score signifying how active said neuron is on each task compared to the mean activity of the neurons. One is then left with N points in a d-dimensional space. It is of interest to know whether there are k distinct groups of neurons in this d-dimensional space. If there are distinct groups, this would imply that those neurons can be understood as being of different types, which could lead researchers to infer a more detailed description of the inner workings of a certain part of the brain. This concept is often referred to as non-random or categorical mixed selectivity, as opposed to random mixed selectivity, in which neurons are located in this feature space without any clear pattern.

1.2 Previous Research

There has been a longstanding effort to find clusters of neurons in many different parts of the brain, with mixed results. This has roughly manifested itself in inventing new methods for generating clusters and/or using old methods for validating those clusters. Using a mathematical analysis of the angles between data points (see section 2.5), neurons in the orbitofrontal and posterior parietal cortex have been found to be clustered and unclustered, respectively [4, 5]. However, this method has not been rigorously tested on artificial data sets, so its robustness is still unclear.

Gründemann et al. [6] grouped neurons of the amygdala on two occasions in a paper published in Science in 2019. In the first instance they employed the k-means algorithm to produce 3 clusters, but did not elaborate on the validity or separation of those 3 clusters. In the second instance they excluded some neurons that were deemed to be not active enough, i.e. neurons whose absolute score was lower than some threshold, in order to generate clusters. The practice of excluding neurons from a data set was also used by Jennings et al. [7] in the hypothalamus and by Barbera et al. [8] to form clusters in the dorsal striatum using neuronal correlation (see section 2.6). Excluding data points is problematic since, for some cutoff, it might produce a clustered data set from an unclustered one (see Figure 1).

Figure 1: A figure illustrating how the exclusion of data points may result in the impression of a clustered data set.

The approach of validating generated clusters relies on some measure of cluster fitness to find the appropriate number of clusters. Adler et al. [9] use the Silhouette coefficient (see section 2.3) to this end on striatal neurons. However, the Silhouette coefficient implicitly assumes that a data set is clustered, and cannot reliably be used to differentiate between clustered and unclustered data. Moreover, it seems that Adler et al. interpreted the results of the Silhouette incorrectly by choosing a different k than what the method proposes. Sales-Carbonell et al. [10] also used the Silhouette on striatal neurons, and concluded that there were no clusters.

In summary, there are three main pitfalls that one commonly encounters in previous inquiries into the existence of clusters in the brain. The methods:

1. have not been properly tested so their results cannot be accurately interpreted.

2. filter out data points, which makes them biased towards finding clusters.

3. implicitly or explicitly assume the existence of clusters a priori.

1.3 Research Question

This paper will examine the existence of functional clusters in the dorsomedial striatum. This is a similar, yet distinct, inquiry from that of the number of clusters in the striatum, which will not be extensively elaborated on. The question of clusters will be examined with regard to the fallacies in the research field listed above. That is, the methods used in this paper will have to be tested on artificial data sets, be applied to whole data sets, and not assume that there are any clusters to begin with.

2 Theoretical Background

2.1 Introduction to Neuroscience

2.1.1 The Neuron

The brain consists of a vast number of cells. The morphological and functional heterogeneity of these cells is great, although the primary type of cell is the neuron. These cells are specialized in transmitting information to other cells, not exclusively other neurons. Neurons can roughly be described as consisting of a cell body or soma, dendrites and an axon.

Figure 2: A schematic of the rough structure of a neuron. Input signals will be received at the dendrites and output will be transmitted through the axon. Credit: Edvin Wester

The cell membrane, which encloses the cell, is made of phospholipids and is as such largely impermeable to much of the surrounding environment. However, channels and pumps cover the cell membrane, which allow for the exchange of ions and nutrients between the neuron and its surroundings. In a resting cell, these pumps and channels maintain a higher concentration of K+ inside the cell than outside of it. Conversely, they also maintain a lower concentration of Na+ and Ca2+ inside the cell than outside. This results in a voltage over the cell membrane called the resting membrane potential. When a neuron sends a signal (known as an action potential), a chain reaction of highly coordinated in- and efflux of ions results in a depolarization of the cell membrane. This depolarization will propagate along the length of the neuron until it reaches the axon terminal, which connects to the dendrites of another neuron. This connection between two neurons is known as a synapse. Although the exact mechanisms of neural signaling depend on the receptors of the post-synaptic neuron, neurotransmitters will be released onto the post-synaptic neuron, which will either increase or decrease the probability of it firing an action potential [2].

2.1.2 The Striatum

The striatum is a centrally located part of the brain, below the cortex, which in humans consists of the nucleus accumbens, the olfactory tubercle, the caudate nucleus and the putamen. Post-mortem findings in patients diagnosed with Parkinson's disease or Huntington's disease have prompted researchers to investigate the role of the striatum through animal experiments, which have revealed that the striatum enforces smooth motor movements [11]. It has been observed that the striatum receives contextual information from the cortex, which is then processed and relayed to the relevant areas to initiate some behaviour. The three main output pathways are the direct pathway, the indirect pathway and the patch pathway. The receptor expression of neurons in the direct, indirect and patch pathways differs, which has enabled researchers to selectively study these neurons. These genotypes are denoted D1, A2A and OPRM1 for the three pathways respectively. The classic rate model describes how the direct pathway selectively excites a movement while the indirect pathway inhibits competing movements. The idea is then that, under this theory, an imbalance between the selection of wanted movements and the inhibition of unwanted movements causes the rigid and jerky movements of patients with Parkinson's disease [12]. Recent research has, however, shown that the role and dynamics of the striatum are more complex than what can be fitted to the classical rate model [11, 12, 13]. It has for instance been established that the patch pathway encodes the expected value of an action [14].

Figure 3: A coronal section of a mouse brain showing the dorsomedial striatum, taken from [15].

The striatum itself is divided into subregions that are functionally slightly different. The dorsomedial striatum, corresponding to the caudate nucleus, is thought to regulate behaviours that are contingent on the outcome of an action. That is, the dorsomedial striatum promotes behaviour that is associated with a positive value, while suppressing actions that are no longer associated with a positive value [16].

2.2 Cluster Analysis

In a data set X consisting of vectors x_i for i = 1, ..., N, a cluster is defined as a group of data points which are more similar to one another than to points of other groups [17]. In general, finding clusters is done in order to shed light on the underlying process that generates a data set, and so what should and shouldn't be considered clustered varies depending on the context.

The problem of finding clusters in a data set, known as cluster analysis, has generated much research and has resulted in a plethora of algorithms devoted to this task [18]. However, the majority of these algorithms assume that the number of clusters, k, is known a priori. This is for instance the case with k-means, hierarchical clustering and the EM algorithm [19]. For some applications this is a completely reasonable limitation, but for others the exact value of k is the point of inquiry.

In light of this, a number of algorithms that cope with the problem of an unknown k have been developed. Some notable examples are DBSCAN [20] and OPTICS [21]. These do all, however, rely on some sort of hyperparameter selection which in turn will dictate the output. Using such an algorithm will output some number k corresponding to the most appropriate choice of clusters according to the algorithm, but the user is left to question the validity of said clusters, as k will be a function of some hyperparameter space.

2.3 Cluster Validity Indices

A common method to judge and compare different clustering outputs is to use a cluster validity index (CVI). A clustering algorithm is applied to the data set and run for every reasonable k; with no intuition about the data set, the values k = 1, ..., N are tried, which at times can be infeasible for large data sets. For every run, the fitness of the clusters is evaluated using some CVI. The number of CVIs is ever growing, and the selection of which one to use is done fairly arbitrarily. A common one is the Silhouette method [22], which was used by [9, 10]. The Silhouette method does, however, only have the ability to discern between k = 2, ..., N clusters and can therefore not really be used to understand whether a data set is clustered at all. Moreover, a comparative study by Arbelaitz et al. [23] shows that the performance of many CVIs, including the Silhouette, is poor. When used to find the optimal k on data sets with a known k, the best CVIs only have an accuracy of roughly 50%.
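To make the typical use of a CVI concrete, the following is a minimal sketch using scikit-learn's KMeans and silhouette_score. The function name and the range of k are illustrative choices, not taken from any of the cited studies; note that, as discussed above, the silhouette is only defined for k ≥ 2 and therefore cannot by itself indicate that a data set is unclustered.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k_by_silhouette(X, k_max=10):
    """Return the k in 2..k_max with the highest silhouette coefficient, and all scores."""
    scores = {}
    for k in range(2, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    # k = 1 is never a candidate: the silhouette implicitly assumes clusters exist
    return max(scores, key=scores.get), scores
```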

2.3.1 The Gap Statistic

One method formulated by Tibshirani et al. [24] that elaborates on the notion of comparing cluster fitness is the Gap statistic. The Gap statistic was also proposed to be able to discern between unclustered and clustered data, in contrast to other CVIs. Given data that have been clustered using some arbitrary method, let d_{ij} denote the squared Euclidean distance between observations x_i and x_j. Then D_r is the sum of all pairwise distances in cluster r,

    D_r = \sum_{i,j \in C_r} d_{ij}    (1)

and W_k is defined as

    W_k = \sum_{r=1}^{k} \frac{1}{2 n_r} D_r    (2)

The Gap statistic then compares log(W_k) to its expectation under a null distribution. Given that the uniform distribution is the distribution most likely to generate spurious clusters, the null distribution is taken to be uniform over the same range as the original data set. M samples from the uniform distribution are generated, for which W_k^* is computed. One then gets the estimated gap as

    \mathrm{Gap}_n(k) = E_n^*\left[\log(W_k^*)\right] - \log(W_k)    (3)

From this sample of Gap_n(k), the mean and the standard error s_k can be evaluated. This is done for some appropriate range of k.

Given that Gap_n(k) and the corresponding s_k have been evaluated, there are a few ways of then finding the optimal k. What was originally proposed by Tibshirani et al. was to find the smallest k such that

    \mathrm{Gap}_n(k) \ge \mathrm{Gap}_n(k+1) - s_{k+1}    (4)
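A minimal sketch of this procedure in Python, assuming scikit-learn's KMeans as the clustering method and a uniform reference distribution over the bounding box of the data. The function names and default parameters are illustrative, not taken from the thesis implementation (which used the R function "clusGap").

```python
import numpy as np
from sklearn.cluster import KMeans

def log_wk(X, k):
    """log(W_k): pooled within-cluster dispersion after k-means with k clusters."""
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
    wk = 0.0
    for r in range(k):
        cluster = X[labels == r]
        # D_r / (2 n_r) equals the within-cluster sum of squared distances to the centroid
        wk += ((cluster - cluster.mean(axis=0)) ** 2).sum()
    return np.log(wk)

def gap_statistic(X, k_max=6, M=500, rng=np.random.default_rng(0)):
    """Return Gap_n(k) and standard errors s_k for k = 1..k_max."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        # log(W_k*) for M uniform reference data sets over the same range as X
        ref = [log_wk(rng.uniform(lo, hi, size=X.shape), k) for _ in range(M)]
        gaps.append(np.mean(ref) - log_wk(X, k))
        errs.append(np.std(ref) * np.sqrt(1 + 1 / M))
    return np.array(gaps), np.array(errs)

# Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}; k_hat == 1 suggests no clusters:
# gaps, errs = gap_statistic(X)
# k_hat = next(k + 1 for k in range(len(gaps) - 1) if gaps[k] >= gaps[k + 1] - errs[k + 1])
```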

2.4 Modality

One approach to assess the presence of clusters is to evaluate multimodality. A real distribution function F is said to be unimodal if and only if F is convex on (−∞, m] and concave on [m, ∞). Note that m does not necessarily need to be unique. One should also note that the uniform distribution is by this definition unimodal, an implication which will be important later in this section. The reasoning is that if the underlying distribution only has one mode, it can be considered to consist of only one cluster. Conversely, a multimodal underlying distribution will generate clustered data [25]. Reliable and tested methods have been developed for assessing modality, but mostly in the one-dimensional case.

2.4.1 The Dip Test

The dip test by Hartigan and Hartigan [26] evaluates the probability of unimodality in the one-dimensional case by using the dip statistic. The dip statistic is the maximum difference between the empirical distribution F and the unimodal distribution G that minimizes that difference. That is,

    D(F, G) = \sup_x | F(x) - G(x) |    (5)

The unimodal distribution G is found in the following way: let x_1, x_2, ..., x_n be the observations ordered so that x_i ≤ x_j for every i < j. Over the class \mathcal{U} of unimodal distribution functions, the dip of F is then

    \mathrm{dip}(F) = \min_{G \in \mathcal{U}} D(F, G)    (6)

To evaluate the significance of the dip, samples from a uniform distribution over the same interval as F are generated M times. The uniform distribution is chosen as it is the unimodal distribution that empirically has the largest dip. The dip is then evaluated for each sample, which can be used to create a p-value for the null hypothesis that F is unimodal. Although Hartigan and Hartigan proposed a way of expanding this method into higher dimensional spaces, none has been implemented since the method was published in 1985. Therefore, a reduction of dimensionality is required in order to use the dip test in the high-dimensional case.
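A minimal sketch of this Monte Carlo calibration in Python. The dip statistic itself is taken as a user-supplied function (the thesis implementation used the R package "diptest"), so `dip_fn` below is a placeholder rather than a specific library call.

```python
import numpy as np

def dip_pvalue(x, dip_fn, M=1000, rng=np.random.default_rng(0)):
    """Monte Carlo p-value for the null hypothesis that the 1-D sample x is unimodal.

    dip_fn is any function returning the dip statistic of a 1-D array, e.g. from an
    existing dip-test library (an assumption: such a function must be supplied).
    """
    observed = dip_fn(np.sort(x))
    # Uniform null over the same interval as the data, as in the procedure above
    null_dips = np.array([
        dip_fn(np.sort(rng.uniform(x.min(), x.max(), size=len(x))))
        for _ in range(M)
    ])
    # Fraction of uniform samples with a dip at least as large as the observed one
    return float(np.mean(null_dips >= observed))
```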

2.4.2 Principle Curve For a data set X which distribution is denoted as h,theprinciplecurve seeks to repackage the information into a one dimensional format [27]. A smooth curve is drawn through the data set in such a way as to minimize the orthogonal distances between the curve and the nearest data points. Moreover, the principle curve needs to fulfill the criteria of self-consistency. Let f()denotetheparameterizedcurve.Thentheprojectionindexf (xi)

10 is the for which xi is closest to the curve in the Euclidean sense. That is

f (xi)=inf xi f(µ) (7) µ || || Let a subset of data points have f()astheirclosestpointonthecurve.Self- consistency property then implies that f()istheaverageofthissubset,for every .Thatis, E(x (x)=)=f()(8) | f The distance between a point xi and its projection is expressed as d(xi,f)= 2 2 xi f(f (xi)) .LetthenD (h, f)=Eh[d (X, f)], which is to be minimized. || || dD2(h,f) 2 Alocalminimumisfoundwhen dt =0,butinpracticeD (h, f)is evaluated iteratively until the change is below some threshold. Set f 0()= x¯ + a,wherea is the first principal component. Then update f so that it fulfills the self-consistency criteria

j+1 f ()=E[X j (X)=](9) | f The distance function can then be computed for the current iteration

2 j j 2 D (h, f )=E [ X f ( j (X)) ](10) h || f ||

When a principal curve has been settled on, one also has the projection indices at hand. Ahmed and Walther [28] proposed to use the concept of principal curves for the inference of multimodality. The idea is to reduce the density information to one dimension. If λ_f(x) is found to follow a multimodal distribution, the data set would also be interpreted as multimodal. In practice the method is fairly straightforward: the principal curve for a data set is computed, and a test of modality is then applied to λ_f(x). In the paper by Ahmed and Walther, Silverman's bandwidth test is used, but this test requires calibration to provide accurate results [29], as well as being notably conservative compared to the dip test [30].
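A minimal sketch of how projection indices can be obtained in the spirit of Hastie and Stuetzle's iteration (equations 7-9): initialize the curve with the first principal component, then alternate between smoothing the data as a function of λ and re-projecting each point onto the discretized curve. The thesis implementation used the R package "princurve", so the function below is an illustrative simplification, not that package's algorithm.

```python
import numpy as np

def principal_curve_projections(X, n_iter=10, span=0.1):
    """Return crude projection indices lambda_f(x_i) for a principal-curve fit."""
    Xc = X - X.mean(axis=0)
    # f^0: line along the first principal component
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    lam = Xc @ vt[0]                      # projection indices on the initial line
    window = max(3, int(span * len(X)))
    kernel = np.ones(window) / window
    for _ in range(n_iter):
        order = np.argsort(lam)
        # Self-consistency step: smooth the coordinates as a function of lambda
        sorted_X = Xc[order]
        curve = np.column_stack([
            np.convolve(sorted_X[:, j], kernel, mode="same") for j in range(X.shape[1])
        ])
        # Projection step: position of the nearest point on the discretized curve
        d2 = ((Xc[:, None, :] - curve[None, :, :]) ** 2).sum(axis=2)
        lam = d2.argmin(axis=1).astype(float)
    return lam

# The returned lam can then be fed to a one-dimensional modality test such as the dip.
```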

2.4.3 The Folding Test

The folding test likewise seeks to classify modality by comparing the empirical distribution to a null distribution [31]. For any given distribution, it is folded onto itself via a pivoting point s* (see Figure 4). The intuition is then that the variance of a multimodal distribution will be reduced more by the folding than if it were a unimodal distribution. With the folding transform X ↦ ‖X − s*‖, the folding ratio is defined as

    \varphi(X) = \frac{\mathrm{Var}(\| X - s^* \|)}{\mathrm{Var}(X)}    (11)

where s* is chosen so as to reduce the variance the most,

    s^* = \arg\min_{s \in \mathbb{R}^d} \mathrm{Var}(\| X - s \|)    (12)

However, since the numerator of equation (11) is hard to minimize directly, Var(‖X − s‖²) is minimized instead. This can be expressed as

    \mathrm{Var}(\| X - s \|^2) = 4 s^T \mathrm{Var}(X) s - 4 s^T \mathrm{Cov}(X, \| X \|^2) + \mathrm{Var}(\| X \|^2)    (13)

Taking the derivative and setting it equal to zero then yields

    s^* = \frac{1}{2} \mathrm{Var}(X)^{-1} \mathrm{Cov}(X, \| X \|^2)    (14)

Figure 4: An image illustrating the folding procedure, taken from the original paper [31].

This ratio is then compared with that of the null distribution, which is set to be a uniform sphere in d dimensions. The folding ratio of this distribution is

    \tilde{\varphi}(U) = \frac{1}{(1+d)^2}    (15)

The comparison becomes

    \Phi(X) = \frac{\varphi(X)}{\tilde{\varphi}(U)}    (16)

If Φ(X) is larger than 1, then X is taken to be unimodal. However, since Φ(X) is an estimate of a random variable, a confidence bound has to be computed. By computing Φ(U_d) M times, one can create a distribution of how likely it is that a uniform distribution of the same dimension will yield a folding ratio lower than that of X.
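A numpy sketch of the folding ratio and the Monte Carlo comparison as reconstructed above. The scalar denominator (the trace of the covariance matrix) and the number of reference draws are simplifying assumptions for illustration; this is not the reference implementation of the folding test.

```python
import numpy as np

def folding_ratio(X):
    """phi(X): variance of the folded data ||X - s*|| relative to the total variance of X."""
    Xc = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(Xc, rowvar=False))        # Var(X)
    sq_norm = (Xc ** 2).sum(axis=1)                       # ||X||^2
    cross = np.array([np.cov(Xc[:, j], sq_norm)[0, 1] for j in range(X.shape[1])])
    s_star = 0.5 * np.linalg.solve(cov, cross)            # s* = 1/2 Var(X)^-1 Cov(X, ||X||^2)
    folded = np.linalg.norm(Xc - s_star, axis=1)
    # The trace is used as a scalar total variance; one common scalarization choice
    return folded.var() / np.trace(cov)

def uniform_ball(n, d, rng):
    """n points uniformly distributed in the unit d-ball (the uniform-sphere null)."""
    v = rng.normal(size=(n, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return v * rng.uniform(size=(n, 1)) ** (1.0 / d)

def folding_test(X, M=1000, rng=np.random.default_rng(0)):
    """Return an estimate of Phi(X) and how often the uniform null beats the data."""
    n, d = X.shape
    phi_x = folding_ratio(X)
    phi_null = np.array([folding_ratio(uniform_ball(n, d, rng)) for _ in range(M)])
    big_phi = phi_x / phi_null.mean()                      # Phi(X) = phi(X) / phi~(U), estimated
    p_lower = float(np.mean(phi_null < phi_x))
    return big_phi, p_lower
```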

2.5 ePAIRS

As mentioned briefly in the introduction, efforts to find clusters in a neuroscience context are not novel, and have in themselves produced methods for finding clusters. The "elliptical projection angle index of response similarity" (ePAIRS) [4] is a slight modification of the algorithm PAIRS proposed by Raposo et al. [5]. The intuition behind ePAIRS is that the angles between points of a clustered data set should be smaller than those of an unclustered data set. Furthermore, ePAIRS should in theory be able to detect "X-shaped" data sets where the data appear to be gathered along two or more axes but are still unimodal. Such a data structure would be an important finding in the neuroscience context, as it implies that neurons are indeed grouped with regard to some linear combination of functions, i.e. that the neurons express categorical mixed selectivity.

For every point, the angle to its closest cosine neighbour is computed as

    \theta_i = \min_{j \neq i} \arccos\!\left( \frac{x_i \cdot x_j}{\| x_i \| \, \| x_j \|} \right), \quad i = 1, \dots, N    (17)

The median of the θ_i is then compared to that of a null distribution. The null distribution is an elliptical Gaussian with the same mean and covariance matrix as X. N points are generated from the null distribution, from which the median of the closest angles is computed as in equation (17). This is repeated M times to generate a p-value for X being unclustered.
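A minimal sketch of this angle comparison in Python. It follows the intuition given in the text (median nearest-neighbour angle against an elliptical Gaussian null) rather than the reference implementation used in the thesis; the centering step is an assumption.

```python
import numpy as np

def median_nearest_angle(X):
    """Median over points of the smallest angle to any other point (equation 17)."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    cos = np.clip(unit @ unit.T, -1.0, 1.0)
    np.fill_diagonal(cos, -1.0)          # exclude the angle of a point with itself
    return np.median(np.arccos(cos.max(axis=1)))

def epairs_pvalue(X, M=1000, rng=np.random.default_rng(0)):
    """p-value for the null hypothesis that X is as unclustered as an elliptical Gaussian."""
    Xc = X - X.mean(axis=0)              # centering: an assumption, not stated in the thesis
    observed = median_nearest_angle(Xc)
    mean, cov = Xc.mean(axis=0), np.cov(Xc, rowvar=False)
    null = np.array([
        median_nearest_angle(rng.multivariate_normal(mean, cov, size=len(X)))
        for _ in range(M)
    ])
    # Small angles indicate clustering, so count null samples at least as extreme
    return float(np.mean(null <= observed))
```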

2.6 Community Detection

If one instead has a data set describing the activity of each point over time, a matrix describing the similarity of each point to every other point, such as a correlation matrix, can be constructed. This matrix can then be interpreted as a graph G = (V, E) where each vertex represents a point and each edge the correlation between a pair of points. The effort of finding groups in G is the practice of community detection, for which there are many algorithms [32]. Indeed, the idea of finding clusters in a neuroscience setting through community detection algorithms has been suggested before, albeit with the use of an arbitrary filtering process [33].

Figure 5: A graph in which each edge corresponds to the correlation between vertices

2.6.1 Correlation Clustering

The crux of finding clusters based on a correlation matrix was formulated as an optimization problem by Bansal et al. [34]. A partition of the graph G is sought so as to minimize the number of disagreements. A disagreement does in this context refer to a pair of points that correlate positively but are not partitioned into the same cluster, or conversely, a pair that correlates negatively but is still assigned to the same cluster. Let e(i, j) be the edge between vertices i and j, and let e ∈ E⁺ if the edge is positive and e ∈ E⁻ if it is negative. Then, if x_e is a decision variable that equals 0 if points i and j should be in the same cluster and 1 if not, the mixed integer program becomes

    \min_x \; \sum_{e \in E^+} c_e \, x_e + \sum_{e \in E^-} c_e (1 - x_e)    (18a)
    \text{subject to} \quad x_{ij} \le x_{ik} + x_{kj} \quad \forall \, i, j, k    (18b)
    \phantom{\text{subject to}} \quad x_{ij} \in \{0, 1\} \quad \forall \, i, j    (18c)

where equation (18b), the triangle inequality constraint, ensures that the cluster assignment implied by x is consistent, so that every point belongs to exactly one cluster. Analogously with the problem statement, one can also change the objective function to maximize the number of agreements. Which formulation to use depends on the situation, but both may generate the same partition. It has long been known that integer programming is NP-complete [35], and as such, much effort has been put into finding approximations [36]. There has not, to the knowledge of the author, been any development in terms of validating the optimal solution to the program through hypothesis testing.
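A small sketch of the integer program (18) for a toy correlation matrix, using the PuLP modelling library as an illustrative choice (the thesis experimented with GAMS and the NEOS server instead); the function and variable names are hypothetical.

```python
import numpy as np
import pulp

def correlation_clustering_ilp(C, threshold=0.0):
    """Solve the min-disagreement ILP (18a)-(18c) for a small correlation matrix C."""
    n = C.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    prob = pulp.LpProblem("correlation_clustering", pulp.LpMinimize)
    # x[i, j] = 1 if i and j end up in different clusters, 0 if in the same cluster
    x = pulp.LpVariable.dicts("x", pairs, cat="Binary")

    def xe(i, j):
        return x[(i, j)] if i < j else x[(j, i)]

    # Objective (18a): penalize separated positive edges and merged negative edges
    prob += pulp.lpSum(
        abs(C[i, j]) * (xe(i, j) if C[i, j] > threshold else (1 - xe(i, j)))
        for i, j in pairs
    )
    # Triangle inequalities (18b) for every triple of distinct vertices
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) == 3:
                    prob += xe(i, j) <= xe(i, k) + xe(k, j)
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {pair: int(pulp.value(x[pair])) for pair in pairs}
```

The cubic number of triangle constraints illustrates why, as noted above, even moderately sized instances become hard to compile and solve.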

3 Methodology

3.1 Data Gathering

The data was gathered by researchers at the Dinos Meletis Lab [37]. Mice were injected with the viral particles AAV5-CAG-Flex-GCaMP6s into the dorsomedial striatum. This virus will selectively infect either D1, A2A or OPRM1 neurons, which will in turn express the Cre recombinase-activated ultrasensitive fluorescent protein GCaMP6s. When a neuron fires an action potential, the influx of Ca2+ results in the protein fluorescing. A small endoscope was then surgically implanted above the injection site. Depending on the mouse, 15 to 674 neurons could be registered by the endoscope. The mice were subject to a reward experiment in which the mouse was filmed while, at the same time, the endoscope captured neural activity.

Figure 6: An image showing a mouse engaging in the experiment described in [37], next to an image taken from an endoscope in the dorsomedial striatum showing neural activity as white traces. Courtesy of Dinos Meletis Lab, KI.

The mouse initiated the experiment by sticking its nose into a middle port. The animal could then choose to enter either a port to its left or to its right, where it would be rewarded with a sucrose solution. The reward could only be found in one port at a time. After the reward, the mouse had to return to the middle port to initiate the task again. When the task was repeated there was a 5% chance of the reward switching to the opposite port, i.e. from left to right or right to left. In summary, the mouse had to find out which port was associated with a reward and then adapt its behaviour when the reward was switched to the other port. Based on this experimental setup, the mouse was said to engage in one of 12 phases of the experiment depending on whether the mouse turned left or right and whether it did so when expecting a reward, when it had received a reward, or when it had not received a reward.

One was then left with a video of a number of neurons with varying illumination depending on the behaviour of the mouse. Because the illumination lingers for some time after an action potential, the activity is deconvolved so as to reveal the action potentials themselves. For each neuron, the z-score of the deconvolved activity was computed for every phase. A shuffled data set was then constructed so that the connection between neural activity and a given phase was lost. Using this shuffled data set, the z-score was computed for every neuron for each phase. This was repeated 1000 times to create a standard error of the shuffled activity. The tuning score was then taken to be the real z-score minus the mean of the shuffled z-scores, divided by the standard error of the shuffled z-scores.
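A rough numpy sketch of this tuning-score construction, with the deconvolved activity matrix and phase labels taken as given. The shuffling strategy (permuting phase labels over time), the z-score normalization across neurons, and the use of the standard deviation over shuffles as the standard error are all interpretations of the thesis text, not the lab's actual pipeline.

```python
import numpy as np

def tuning_scores(activity, phase, n_phases=12, n_shuffles=1000, rng=np.random.default_rng(0)):
    """activity: (n_neurons, n_timepoints) deconvolved traces; phase: (n_timepoints,) labels.

    Returns an (n_neurons, n_phases) matrix of tuning scores:
    (real z-score - mean shuffled z-score) / standard error of the shuffled z-scores.
    """
    def zscores(act, ph):
        # z-score of each neuron's mean activity in each phase, relative to all neurons
        means = np.array([act[:, ph == p].mean(axis=1) for p in range(n_phases)]).T
        return (means - means.mean(axis=0)) / means.std(axis=0)

    real = zscores(activity, phase)
    shuffled = np.array([
        zscores(activity, rng.permutation(phase))   # break the activity-phase pairing
        for _ in range(n_shuffles)
    ])
    se = shuffled.std(axis=0)                        # spread of the shuffle distribution as SE
    return (real - shuffled.mean(axis=0)) / se
```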

This procedure was repeated for 19 different mice, in each of which one of the three genotypes D1, A2A and OPRM1 was recorded. All the neurons from the same genotype were grouped into one data set, resulting in three data sets of 12 dimensions with 748 rows (D1 neurons), 2683 rows (A2A neurons) and 777 rows (OPRM1 neurons) respectively. For more specifics of the experiment, the reader is directed to the original paper [37].

Because each genotype corresponds to a different pathway, they were analyzed separately. Finally, a combined data set consisting of all genotypes was also analysed in order to discern any possible difference between the pathways.

3.2 Artificial Data Sets

As there is an inherent difficulty in assessing the validity of any clustering method, a number of constructed data sets with known k were used to test them. Indeed, this is the procedure used in [23, 24, 26, 27]. This highlighted limitations, pros and cons of each method, albeit not necessarily in a conclusive manner.

3.2.1 Skew Normal

The skew normal distribution is a normal distribution with a heavy tail. Apart from the mean and variance, the skew normal depends on two further parameters, which dictate how heavy the tail is and how asymmetric the distribution is. As with the normal distribution, it only has one mode. This data set was chosen as it is not unlikely to find data of this distribution in a natural setting, such as the brain. Moreover, as it is not symmetric, it was thought to act as a challenge for clustering algorithms.

Figure 7: An artificial data set generated from a skew normal distribution.

Two skew-normal data sets with 1000 points each were generated: one in two dimensions (see Figure 7) and one in 12 dimensions, so as to be comparable with the striatum data. This was done using the Python function "skewnorm.rvs".
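A small sketch of how such a data set can be drawn with SciPy's skewnorm.rvs; the particular shape parameter is an illustrative value, since the thesis does not state the exact parameters used.

```python
import numpy as np
from scipy.stats import skewnorm

rng = np.random.default_rng(0)
# 1000 points, each coordinate drawn from a skew normal;
# a controls the skewness (a = 0 recovers the ordinary normal distribution)
skew_2d = skewnorm.rvs(a=5, loc=0, scale=1, size=(1000, 2), random_state=rng)
skew_12d = skewnorm.rvs(a=5, loc=0, scale=1, size=(1000, 12), random_state=rng)
```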

3.2.2 Three Blobs

Three groups of points were generated using the Python function "datasets.make_blobs" from the package sklearn.datasets. This data set represents a schoolbook example of three clusters, all with different variances.

18 Figure 8: An artificial data set with three clusters.

This type of data structure was generated in 2 and 12 dimensions, with 1000 points each.
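A corresponding sketch with scikit-learn's make_blobs; the cluster standard deviations are illustrative values chosen to give the clusters different variances, as described above.

```python
from sklearn.datasets import make_blobs

# Three clusters with different variances, in 2 and in 12 dimensions
blobs_2d, labels_2d = make_blobs(n_samples=1000, n_features=2, centers=3,
                                 cluster_std=[0.5, 1.0, 2.0], random_state=0)
blobs_12d, labels_12d = make_blobs(n_samples=1000, n_features=12, centers=3,
                                   cluster_std=[0.5, 1.0, 2.0], random_state=0)
```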

3.2.3 15 Blobs

A benchmark data set consisting of 5000 points in 2 dimensions, constructed by Fränti and Sieranoja [38], was included. This data set consists of 15 clusters with varying overlap and variance. It could hopefully highlight any shortcomings of the methods when the number of clusters is large.

Figure 9: The 15 blobs benchmark data set.

3.2.4 Cross Distribution

As previously discussed, some data sets may be classified as either clustered or not clustered depending on the context. This data set was generated from two normal distributions with orthogonal covariance matrices, in two dimensions only.
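A sketch of one way to generate such an X-shaped data set: two centred Gaussians whose covariance matrices are elongated along orthogonal directions. The specific variances and sample sizes are illustrative, not the values used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two elongated Gaussians with orthogonal principal axes, stacked into one "cross"
cov_a = np.array([[9.0, 0.0], [0.0, 0.25]])   # long along the x-axis
cov_b = np.array([[0.25, 0.0], [0.0, 9.0]])   # long along the y-axis
cross = np.vstack([
    rng.multivariate_normal([0, 0], cov_a, size=500),
    rng.multivariate_normal([0, 0], cov_b, size=500),
])
```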

Figure 10: An artificial data set following an X-shaped distribution.

3.3 Procedure

The four methods described above for clustering in Euclidean space (the principal curve in conjunction with the dip test, the gap statistic, ePAIRS and the folding test) were applied to the striatum data sets and the artificial data sets. Correlation clustering was not implemented, as no approach for validating its result could be found. The principal curve method was implemented using the packages "princurve" and "diptest" in R, with M = 1000. The gap statistic was similarly implemented in R using the function "clusGap". The clusters were formed using k-means for k = 1, ..., 6 with M = 500. ePAIRS was implemented in Python with M = 1000.

4 Results

Table 1 describes the conclusions of the different approaches on the mouse experiment data as well as on the constructed data sets. Subsequent subsections examine the results in more detail. As the results of the different partitions of the striatum data were highly similar, plots and results for the D1 partition are shown, while the corresponding results for the other partitions (A2A, OPRM1 and the full striatum data set) can be found in the Appendix (section 7).

Data set             Gap Statistic   Principal Curve   Folding Test   ePAIRS

2-dimensional data sets
Three blobs          Yes             Yes               No             Yes
15 blobs             Yes             Yes               No             Yes
Skew                 No              No                No             Yes
Cross                No              No                No             Yes

12-dimensional data sets
Three blobs          Yes             Yes               No             Yes
Skew                 No              No                No             Yes
Cross                No              No                No             Yes
Striatum (D1)        No              No                No             Yes
Striatum (A2A)       No              No                No             Yes
Striatum (OPRM1)     No              No                No             Yes
Striatum (All)       No              No                No             Yes

Table 1: Each method's conclusion as to whether a data set was clustered or not.

4.1 Folding Test

The folding test could not reject the null hypothesis of unimodality for any data set at a significance level of 5%. Indeed, this method acted as the opposite of ePAIRS in being overly conservative. As with ePAIRS, no conclusion as to whether the striatum is clustered or not can be drawn.

Data set             Φ(X)        Margin of Error at p = 0.05   Reject H0

2-dimensional data sets
Three blobs          0.9378601   0.08930391                    No
Skew                 1.659332    0.08930391                    No
Cross                2.71855     0.08930391                    No
15 blobs             1.139143    0.03993792                    No

12-dimensional data sets
Three blobs          1.588866    0.1480916                     No
Skew                 6.017205    0.1480916                     No
Striatum (D1)        13.87667    0.1712299                     No
Striatum (A2A)       17.09608    0.09117863                    No
Striatum (OPRM1)     11.83181    0.1680041                     No
Striatum (All)       20.91991    0.07219258                    No

Table 2: Values of Φ(X) for the different data sets, as well as the confidence bound.

4.2 ePAIRS

The ePAIRS algorithm showed that the median of the nearest-neighbour cosine angle was indeed smaller in the striatum data set than in the null distribution. This was true for the D1, A2A and OPRM1 partitions of the data set as well as for the full data set, which all had an estimated p-value of 0. It should be noted that the distributions of angles of the striatum data and its null distribution were fairly similar, as opposed to the case of the three blobs data set. This could imply that the data structures of the striatum data and its null are similar, although this conclusion is not necessarily well founded and at the very least non-trivial.

Figure 11: (a) Striatum (D1); (b) Three blobs. Top panels show the distribution of nearest angles compared to one sample from the null distribution; bottom panels show the distribution of median angles compared to the median angle of the true data set.

Moreover, since ePAIRS labeled the skew distribution as clustered when it was not, one is forced to conclude that the algorithm is prone to yielding false positive results. Accordingly, it cannot be conclusively established that the striatum data set is clustered using this method.

4.3 Principal Curve

The method of combining the principal curve with unimodality tests did not indicate that the striatum data (for any partition) was clustered. Using the dip statistic on the D1 partition resulted in a p-value of 0.3846, so the null hypothesis of unimodality cannot be rejected. The principal curve for the D1 data was not unlike that of the 12-dimensional skew data.

Figure 12: (a) Striatum (D1); (b) 12-dimensional skew data. The principal curve, with each circle representing an observation x.

The histograms of the values of λ are clearly unimodal for both the striatum data and the skew data.

Figure 13: (a) Striatum data (D1); (b) 12-dimensional skew data. Histograms of the projections show a clearly unimodal distribution.

This is contrasted by the three blobs data, for which the principal curve could by eye hint that the data was clustered. The method could accurately conclude that the three blobs data set was clustered in both 2 and 12 dimensions. Not only that, but the number of clusters can be discerned by inspecting the distribution of the projected values (see Figure 14). The cross distribution, however, was assessed as not being clustered.

Figure 14: (a) Three blobs; (b) the values of x projected onto the principal curve. The principal curve generated in the 2-dimensional plane.

Interestingly enough, the principal curve of the cross distribution was similar to those of the A2A, OPRM1 and full data sets. Whether this implies that those data sets also share an X-shape is not known.

Figure 15: (a) OPRM1; (b) Cross distribution. A comparison between the principal curves of the cross distribution and the OPRM1 data.

4.4 Gap Statistic

The Gap statistic concluded that there were no clusters in the striatum data set. At the same time, the method accurately concluded the existence of clusters in the three blobs data set, both in 2 and 12 dimensions, as well as in the 15 blobs data set, even though the method was only run for k = 1, ..., 6 when the true k = 15. This illustrates that, if there were a large number of separate clusters (k > 6) in the striatum data, this should still have been detected.

k     Gap         SE
1     1.0205357   0.007339721
2     0.9749839   0.008173128
3     0.9481887   0.006400719
4     0.9306703   0.006206643
5     0.9473116   0.005771181

Table 3: Gap_k with its respective standard error. The optimal number of clusters according to equation (4) is k = 1.

Figure 16: A plot of Gap_k with respect to k. Whiskers denote the standard error.

As with the principal curve method, the cross distribution was not identified as clustered.

5 Discussion

5.1 Interpretation of Results

Although the results showed no reliable evidence of the striatum being clustered, this does not necessarily mean that we can definitively exclude the possibility of some underlying clusters. Firstly, the 12 dimensions of the striatum data set are completely arbitrary and a result of the experiment by the Dinos Meletis Lab. In fact, these 12 dimensions are almost surely a subspace of what is really encoded in the dorsomedial striatum. Therefore, neurons in the dorsomedial striatum may be clustered in some higher or different dimension which is not captured in the current subspace (see Figure 17).

Figure 17: A clustered data set in dimensions x and y in which only the values for x are observed. This will result in the impression of an unclustered data set.

Secondly, three of the aforementioned methods output a p-value. Indeed, these algorithms were included partly because they share the same measure and can thus be directly compared. This p-value represents the likelihood of the data given a null hypothesis. A rejection of this null hypothesis as a consequence of a low p-value means that one can confirm the alternative hypothesis. However, a high p-value does not imply that the null hypothesis is true. The implication for the results in this thesis is that it cannot be confirmed that the dorsomedial striatum is unclustered just because the existence of clusters cannot be affirmed.

Thirdly, because the principal curve method and the gap statistic did not recognize the cross distribution as clustered, it could be that the striatum is configured in a similar way.

5.2 Limitations

5.2.1 Data Gathering

There are a number of inherent limitations in the data set that was used which could have had an effect on the results. Firstly, the number of neurons that were recorded for each mouse was limited. This is because, roughly speaking, the endoscopic lens could only capture one or a few layers of neurons. Indeed, the majority of neurons would not have been recorded because of the three-dimensional shape of the striatum. If more data points were gathered, they could possibly reveal the neurons as being clustered.

Secondly, the recording of the camera might have been too slow to capture temporal dynamics. The resulting variance may then have been inflated, which would have masked the presence of a multimodal distribution.

Thirdly, the method of extracting tuning scores from the raw data is somewhat arbitrary. Although intuitive, it may be that clusters are lost in this representation of the data. There could possibly be a more rigorous way of projecting the raw tunings to a d-dimensional Euclidean space.

5.2.2 Uniform as the Null Distribution

The computation of the Gap statistic, the folding test and the dip test all use the uniform as a null distribution, as it is the unimodal distribution that is most likely to generate spurious clusters or multimodal data. This, of course, sets a standard which a data set must pass in order to be considered clustered, but it may very well be that the bar is set too high. As a result, the tests may be too conservative, which more ambiguous artificial data sets could have shown.

5.3 Future Research

The null hypothesis has hitherto been that a data set is unclustered, which in the case of the striatum data set has not been rejected. The implication of this, as elaborated above, is that we cannot really prove anything. Would it then be possible, in a future effort, to construct tests where the null hypothesis instead posits that a data set is clustered? A comprehensive test of this kind is not known to the author. In theory, the reduction of dimensions through a data set's principal curve, followed by Silverman's bandwidth test, could test the significance of 1, 2, ..., N modes. However, the reliability of the principal curve method in the context of finding clusters is not yet completely understood, and further testing would be required to accurately assess its strength on different data structures.

5.3.1 Experiments with Other Dimensions

In order to fully comprehend what is encoded in the dorsomedial striatum, more experiments with a variety of different tasks will have to be made. In any of those experiments, a neural representation in some subset might be clustered, in which case the dorsomedial striatum can be said to be clustered. On the other hand, the experiments may not reveal any clusters at all. The problem then, apart from the construction of a reliable statistical test, would be that one cannot rule out the existence of a cluster in a higher or different dimensional space. One single experiment capturing the whole representational space of the dorsomedial striatum is probably not feasible, and there may therefore never be an experiment that conclusively establishes the dorsomedial striatum as unclustered, although it can be proved to be clustered.

5.3.2 Algorithms for Categorical Mixed Selectivity

All methods presented in this thesis, apart from ePAIRS, are constructed to identify separate clusters based on the notion of an underlying multimodal probability distribution. It is therefore not a surprise that these methods have all classified the cross distribution as unclustered. This is problematic since this type of structure is included in the notion of categorical mixed selectivity, which in this context is the definition that dictates what should and shouldn't be considered clustered. ePAIRS is an effort to capture this dynamic but, as demonstrated in this thesis, ePAIRS is overly generous in labeling distributions as clustered. It may be that a more direct comparison between the distributions of angles of a data set and a null distribution would provide more insight into the data structure, as opposed to only evaluating the median angle. Indeed, one might infer from the similar angle distributions of the striatum data and the skew data that they hold a similar structure, although this might not at all be the case. Instead, a better algorithm should be found for discerning those data sets that have a cross distribution from those that do not.

As was noted in the results, the principal curve of the cross distribution resembled that of the OPRM1 data. It seems rather precarious to draw any conclusions on the basis of this, since the plotted principal curve is a 2-dimensional representation of a high-dimensional space. Nonetheless, it would be interesting to further investigate whether the principal curve can be used to identify X-shaped distributions.

The question also arises of what a formal definition of categorical mixed selectivity should be. That is, how close to a square shape should a data set be allowed to come and still be considered clustered (see Figure 18)? In the case of classically separated clusters, multimodality of the underlying probability distribution is deemed a handy proxy for cluster analysis, but no corresponding definition exists for the identification of X-shaped distributions.

Figure 18: Two examples of data sets that may or may not hold as proof of categorical mixed selectivity, depending on how strict the definition is.

5.3.3 Validation Techniques for Correlation Clustering

This thesis opted out of using the correlation clustering framework because of lacking validation techniques; the result of any optimal solution would be hard to evaluate. However, it was included in the mathematical background as it is recognized as a potentially powerful tool in the inquiry into clusters, given the existence of appropriate validation measures. The use of a similarity matrix includes information about the internal dynamics between the neurons, which is not captured through the use of tuning scores for each dimension. The use of correlation clustering could therefore be more sensitive in finding clusters, although the analysis would be restricted to one session at a time and would as such have fewer data points. That being said, a large number of points can also be a hindrance to correlation clustering. When experimenting with the correlation clustering framework, a mixed integer program as well as a relaxed linear program, formulated in GAMS with 300 points, were submitted to the NEOS server [39, 40, 41]. Both were terminated because the compilation of the programs exceeded the limit of 10 GB.

6 Conclusion

This thesis has employed four different techniques to find clusters, and has evaluated the performance of these methods on artificial data sets. Of these, only ePAIRS concluded the existence of functional clusters in the dorsomedial striatum. However, ePAIRS failed to recognize unclustered data as unclustered, and so its reliability is in question. The folding test viewed all the data sets as unclustered. The principal curve method and the gap statistic were successful on the artificial data sets, apart from the cross distribution, and could not find any clusters in the striatum data. It is thus possible that the striatum data is of an X-shape, something that can be neither rejected nor proven based on these results. It can therefore be concluded that no evidence of any clusters in the striatum was found, although the existence of clusters cannot be excluded.

References

[1] Clark E, Stannard J. Aristotle on the Anatomy of the Brain. Journal of the History of Medicine. 1964;p. 132–148.

[2] Purves D. Neuroscience. 6th edition. Sinauer Associates; 2018.

[3] Mäki-Marttunen T, Kaufmann T, Elvsåshagen T, Devor A, Djurovic S, Westlye LT, et al. Biophysical Psychiatry—How Computational Neuroscience Can Help to Understand the Complex Mechanisms of Mental Disorders. Front Psychiatry. 2019;10:534.

[4] Hirokawa J, Vaughan A, Masset P, Ott T, Kepecs A. Frontal cortex neuron types categorically encode single decision variables. Nature. 2019;576:446–451.

[5] Raposo D, Kaufman MT, Churchland AK. A category-free neural population supports evolving demands during decision-making. Nature Neuroscience. 2014;17:1784–1792.

[6] Gründemann J, Bitterman Y, Lu T, Krabbe S, Grewe BF, Schnitzer MJ, et al. Amygdala ensembles encode behavioral states. Science. 2019;364(6437).

[7] Jennings JH, Ung RL, Resendez SL, Deisseroth K, et al. Visualizing Hypothalamic Network Dynamics for Appetitive and Consummatory Behaviors. Cell. 2015;160(3).

[8] Barbera G, Liang B, Zhang L, Chen R, Li Y, Lin DT. Spatially Compact Neural Clusters in the Dorsal Striatum Encode Locomotion Information. Neuron. 2016;92(1).

[9] Adler A, Katabi S, Finkes I, Israel Z, Prut Y, Bergman H. Temporal Convergence of Dynamic Cell Assemblies in the Striato-Pallidal Network. J Neurosci. 2012;32(7):2473–2484.

[10] Sales-Carbonell C, Taouali W, Khalki L. No Discrete Start/Stop Signals in the Dorsal Striatum of Mice Performing a Learned Action. Current Biology. 2018;28(19):3044–3055.

[11] Klaus A, da Silva JA, Costa RM. What, If, and When to Move: Basal Ganglia Circuits and Self-Paced Action Initiation. Annual Review of Neuroscience. 2019;42:459–483.

[12] Parker JG, Marshall JD, Ahanonu B, Wu YW, Kim TH, Grewe BF, et al. Diametric neural ensemble dynamics in parkinsonian and dyskinetic states. Nature. 2018;557:177–182.

[13] Schultz W. Reward functions of the basal ganglia. Journal of Neural Transmission. 2016;123:679–693.

[14] Samejima K, Ueda Y, Doya K, Kimura M. Representation of Action-Specific Reward Values in the Striatum. Science. 2005;310(5752):1337–1340.

[15] Sidman RL, Kosaras B, Misra B, Senft S. High Resolution Mouse Brain Atlas. Available from: http://www.hms.harvard.edu/research/brain/atlas.html.

[16] Cox J, Witten IB. Striatal circuits for reward learning and decision-making. Nature Reviews Neuroscience. 2019;20:482–494.

[17] Halkidi M, Batistakis Y, Vazirgiannis M. On Clustering Validation Techniques. Journal of Intelligent Information Systems. 2001;17:107–145.

[18] Saxena A, Prasad M, Gupta A, Bharill N. A Review of Clustering Techniques and Developments. Neurocomputing. 2017;267.

[19] Dempster AP, Laird NM, Rubin DB. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society Series B (Methodological). 1977;39(1):1–38.

[20] Ester M, Kriegel HP, Sander J, Xu X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. KDD'96. AAAI Press; 1996. p. 226–231.

[21] Ankerst M, Breunig MM, Kriegel HP, Sander J. OPTICS: Ordering Points to Identify the Clustering Structure. SIGMOD Rec. 1999;28(2).

[22] Rousseeuw PJ. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics. 1987;20:53–65.

[23] Arbelaitz O, Gurrutxaga I, Muguerza J, Pérez JM, Perona I. An extensive comparative study of cluster validity indices. Pattern Recognition. 2013;46(1):243–256.

[24] Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B. 2001;63(2):411–423.

[25] Menardi G. A Review on Modal Clustering. International Statistical Review. 2016;84(3):413–433.

[26] Hartigan JA, Hartigan PM. The Dip Test of Unimodality. The Annals of Statistics. 1985;13(1):70–84.

[27] Hastie T, Stuetzle W. Principal Curves. Journal of the American Statistical Association. 1989;84(406):502–516.

[28] Ahmed MO, Walther G. Investigating the multimodality of multivariate data with principal curves. Computational Statistics & Data Analysis. 2012;56(12):4462–4469.

[29] Hall P, York M. On the Calibration of Silverman's Test for Multimodality. Statistica Sinica. 2001;11:515–536.

[30] Xu L, Bedrick E, Hanson T, Restrepo C. A Comparison of Statistical Tools for Identifying Modality in Body Mass Distributions. Journal of Data Science. 2014;12:175–196.

[31] Siffer A, Fouque PA, Termier A, Largouët C. Are Your Data Gathered? The Folding Test of Unimodality. Available from: https://hal.archives-ouvertes.fr/hal-01951676/document.

[32] Yang Z, Algesheimer R, Tessone C. A Comparative Analysis of Community Detection Algorithms on Artificial Networks. Scientific Reports. 2016;6.

[33] Humphries M. Spike-Train Communities: Finding Groups of Similar Spike Trains. The Journal of Neuroscience. 2011;31(6):2321–2336.

[34] Bansal N, Blum A, Chawla S. Correlation Clustering. Machine Learning. 2004;56:89–113.

[35] Karp RM. Reducibility among Combinatorial Problems. In: Miller RE, Thatcher JW, Bohlinger JD, editors. Boston, MA: Springer US; 1972. p. 85–103. Available from: https://doi.org/10.1007/978-1-4684-2001-2_9.

[36] Pandove D, Goel S, Rani R. Correlation clustering methodologies and their fundamental results. Expert Systems. 2017;35.

[37] Weglage M, Wärnberg E, Lazaridis I, Tzortzi O, Meletis K. Complete representation of action space and value in all striatal pathways; 2020. Available from: https://www.biorxiv.org/content/10.1101/2020.03.29.983825v1.

[38] Fränti P, Sieranoja S. K-means properties on six clustering benchmark datasets; 2018. Available from: http://cs.uef.fi/sipu/datasets/.

[39] Gropp W, Moré JJ. Optimization Environments and the NEOS Server. In: Buhmann MD, Iserles A, editors. Approximation Theory and Optimization. Cambridge University Press; 1997. p. 167–182.

[40] Dolan ED. The NEOS Server 4.0 Administrative Guide. Mathematics and Computer Science Division, Argonne National Laboratory; 2001. ANL/MCS-TM-250.

[41] Czyzyk J, Mesnier MP, Moré JJ. The NEOS Server. IEEE Computational Science and Engineering. 1998;5(3):68–75.

7 Appendix

7.1 Principal Curve

Figure 19: The principal curve method applied to the A2A data; (a) the principal curve, (b) histogram of the projections. p-value = 0.9928.

Figure 20: The principal curve method applied to the OPRM1 data; (a) the principal curve, (b) histogram of the projections. p-value = 0.9896.

Figure 21: The principal curve method applied to the combined data set consisting of D1, A2A and OPRM1 neurons; (a) the principal curve, (b) histogram of the projections. p-value = 0.8485.

7.2 Gap Statistic

Figure 22: Plot of Gap_k with respect to k for the A2A data.

Figure 23: Plot of Gap_k with respect to k for the OPRM1 data.

Figure 24: Plot of Gap_k with respect to k for the full striatum data.

7.3 ePAIRS

Figure 25: ePAIRS on the A2A data, p-value = 0

Figure 26: ePAIRS on the OPRM1 data, p-value = 0

Figure 27: ePAIRS on the combined striatum data set, p-value = 0

