Neural Decoding
Matthias Hennig, School of Informatics, University of Edinburgh
January 2019
Acknowledgements: Mark van Rossum, Chris Williams, and slides from Liam Paninski (Gatsby).

Decoding brain activity
- Classification: which one?
- Reconstruction: homunculus

The Homunculus
The "Flynculus" [Rieke et al., 1996]

Overview
1. Stimulus discrimination, signal detection theory
2. Maximum likelihood and MAP decoding
3. Bounds and Fisher information
4. Spike train decoding and GLMs

Why decoding?
- Understanding the neural code: given spikes, what was the stimulus?
- What aspects of the stimulus does the system encode? (capacity is limited)
- What information can be extracted from spike trains:
  - by "downstream" areas? Homunculus.
  - by the experimenter? Ideal observer analysis.
- What is the coding quality?
- Design of neural prosthetic devices.
Decoding is related to encoding, but encoding does not answer the above questions explicitly.

Decoding examples
- Hippocampal place cells: how is location encoded?
- Retinal ganglion cells: what information is sent to the brain? What is discarded?
- Motor cortex: how can we extract as much information as possible from a collection of M1 cells?

Decoding theory
- Probability of the stimulus (the prior): $P(s)$
- Probability of a measured neural response: $P(r)$
- Joint probability of stimulus and response: $P(r, s)$
- Conditional probabilities: $P(r|s)$, $P(s|r)$
- Marginal: $P(r) = \sum_s P(r|s) P(s)$
- Note: $P(r, s) = P(r|s) P(s) = P(s|r) P(r)$
- Bayes' theorem:
$$P(s|r) = \frac{P(r|s) P(s)}{P(r)}$$
Note that we need to know the stimulus statistics.

Example: discrimination between two stimuli
The test subject reports left or right motion (two-alternative forced choice, 2AFC). See [Dayan and Abbott, 2002], chapter 3.2.

MT neurons in this task [Britten et al., 1992]
- Some single neurons do as well as the animal!
- The benefit of averaging might be limited due to correlations?
- A population might still be better/faster? [Cohen and Newsome, 2009]

[Britten et al., 1992]
Assuming the rate histograms are Gaussian with equal variance $\sigma^2$, the discriminability is
$$d' = \frac{\langle r_+ \rangle - \langle r_- \rangle}{\sigma}$$

Discriminate between the response distributions $P(r|-)$ and $P(r|+)$ (directions $-$ and $+$), using a discrimination threshold $z$ on the firing rate:
- Hit rate: $\beta(z) = P(r \geq z \,|\, +)$
- False alarm rate: $\alpha(z) = P(r \geq z \,|\, -)$

stimulus | correct      | false
+        | $\beta$      | $1 - \beta$
-        | $1 - \alpha$ | $\alpha$

The probability of a correct answer is $(\beta(z) + 1 - \alpha(z))/2$, which can be used to find the optimal $z$ (see the sketch after the next slide).

ROC curves
Discriminate between the response distributions $P(r|-)$ and $P(r|+)$. The Receiver Operating Characteristic (ROC) gives graphical intuition:
- Vary the decision threshold and measure the error rates.
- A larger area under the curve means better discriminability.
- The shape relates to the underlying distributions.
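The following is a minimal numerical sketch of these signal-detection quantities, assuming equal-variance Gaussian response distributions; the rate and variance values are illustrative and not taken from Britten et al. (1992). It computes $d'$, sweeps the threshold $z$ to maximize $(\beta(z) + 1 - \alpha(z))/2$, and checks the area under the ROC curve against the closed-form erfc expression on the next slide.

```python
# Sketch: 2AFC signal detection with equal-variance Gaussian responses.
# All parameter values below are illustrative assumptions.
import numpy as np
from scipy.special import erfc
from scipy.integrate import trapezoid

r_minus, r_plus, sigma = 20.0, 28.0, 5.0  # mean rates (Hz) and common s.d.
d_prime = (r_plus - r_minus) / sigma      # discriminability d'

z = np.linspace(0.0, 50.0, 501)           # candidate decision thresholds
# beta(z) = P(r >= z | +), hit rate; alpha(z) = P(r >= z | -), false alarms
beta = 0.5 * erfc((z - r_plus) / (np.sqrt(2.0) * sigma))
alpha = 0.5 * erfc((z - r_minus) / (np.sqrt(2.0) * sigma))

p_correct = 0.5 * (beta + 1.0 - alpha)    # probability correct per threshold
z_opt = z[np.argmax(p_correct)]           # optimal threshold (midpoint here)

# Area under the ROC curve = integral of beta d(alpha); for equal-variance
# Gaussians this equals (1/2) erfc((<r_-> - <r_+>) / (2 sigma)).
auc = trapezoid(beta[::-1], alpha[::-1])  # reverse so alpha increases
auc_exact = 0.5 * erfc((r_minus - r_plus) / (2.0 * sigma))

print(f"d' = {d_prime:.2f}, optimal z = {z_opt:.1f} Hz")
print(f"AUC numeric = {auc:.4f}, analytic = {auc_exact:.4f}")
```

For these values $d' = 1.6$, and the two AUC estimates should agree closely, since the area under the ROC equals the 2AFC probability correct.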
[Britten et al., 1992]
$$P(\text{correct}) = \int_0^1 \beta \, d\alpha$$
When the responses are Gaussian:
$$P(\text{correct}) = \frac{1}{2} \, \mathrm{erfc}\left( \frac{\langle r_- \rangle - \langle r_+ \rangle}{2\sigma} \right)$$

Population decoding
[Dayan and Abbott (2001) after Theunissen and Miller (1991)]
Cricket cercal system: information about wind direction is encoded by four types of neurons,
$$\frac{f_a(s)}{r_{max}} = [\cos(s - s_a)]_+$$

Let $\vec{c}_a$ denote a unit vector in the direction of $s_a$, and $\vec{v}$ a unit vector parallel to the wind velocity:
$$\frac{f_a(s)}{r_{max}} = [\vec{v} \cdot \vec{c}_a]_+$$
Crickets are Cartesian: the four preferred directions are $45°$, $135°$, $-135°$, $-45°$. The population vector is defined as
$$\vec{v}_{pop} = \sum_{a=1}^{4} \frac{r_a}{r_{max}} \vec{c}_a$$

Vector method of decoding
[Dayan and Abbott (2001) after Salinas and Abbott (1994)]

Primary motor cortex (M1)
Certain neurons in M1 of the monkey can be described by cosine tuning functions of arm movement direction (Georgopoulos et al., 1982). Similar to the cricket cercal system, but note:
- Non-zero offset rates $r_0$:
$$\frac{f_a(s) - r_0}{r_{max}} = \vec{v} \cdot \vec{c}_a$$
- Non-orthogonal: there are many thousands of M1 neurons with arm-movement-related tuning curves.

[Dayan and Abbott (2001) after Kandel et al (1991)]

Optimal decoding
$$p(s|r) = \frac{p(r|s) p(s)}{p(r)}$$
- Maximum likelihood decoding (ML): $\hat{s} = \mathrm{argmax}_s \, p(r|s)$
- Maximum a posteriori (MAP): $\hat{s} = \mathrm{argmax}_s \, p(s) p(r|s)$
- Note that these two are equivalent if $p(s)$ is flat.
- Bayes: minimize a loss,
$$s_B = \mathrm{argmin}_{s^*} \int L(s, s^*) \, p(s|r) \, ds$$
For squared loss $L(s, s^*) = (s - s^*)^2$, the optimal $s^*$ is the posterior mean, $s_B = \int s \, p(s|r) \, ds$.

Optimal decoding for the cricket
For the cercal system, assuming independent noise,
$$p(r|s) = \prod_a p(r_a|s)$$
where each $p(r_a|s)$ is modelled as a Gaussian with measured means and variances.
- $p(s)$ is uniform (hence MAP = ML)
- ML decoding finds a peak of the likelihood
- The Bayesian method finds the posterior mean
These methods improve performance over the vector method (but not that much, due to orthogonality...).

Cricket cercal system
[Dayan and Abbott (2001) after Salinas and Abbott (1994)]

General considerations for population decoding
[Dayan and Abbott (2001)] Gaussian tuning curves.

Poisson firing model: over time $T$, count $n_a = r_a T$ spikes.
$$p(r|s) = \prod_{a=1}^{N} \frac{(f_a(s) T)^{n_a}}{n_a!} \exp(-f_a(s) T)$$
$$\log p(r|s) = \sum_{a=1}^{N} n_a \log f_a(s) + \ldots$$
The terms in $\ldots$ are independent of $s$, and we assume $\sum_a f_a(s)$ is independent of $s$ (all neurons sum to the same average firing rate).

ML decoding
$s_{ML}$ is the stimulus that maximizes $\log p(r|s)$, determined by
$$\sum_{a=1}^{N} r_a \frac{f_a'(s_{ML})}{f_a(s_{ML})} = 0$$
If all tuning curves are Gaussian, $f_a = A \exp[-(s - s_a)^2 / 2\sigma_w^2]$, then
$$s_{ML} = \frac{\sum_a r_a s_a}{\sum_a r_a}$$
which is simple and intuitive, known as the center of mass (cf. the population vector). A numerical sketch follows after the Fisher information slide below.

Accuracy of the estimator
Bias and variance of an estimator $s_{est}$:
$$b_{est}(s) = \langle s_{est} \rangle - s$$
$$\sigma_{est}^2(s) = \langle (s_{est} - \langle s_{est} \rangle)^2 \rangle$$
$$\langle (s - s_{est})^2 \rangle = b_{est}^2(s) + \sigma_{est}^2$$
Thus, for an unbiased estimator, the MSE $\langle (s - s_{est})^2 \rangle$ is given by $\sigma_{est}^2$, the variance of the estimator.

Fisher information
Fisher information is a measure of the curvature of the log likelihood near its peak:
$$I_F(s) = -\left\langle \frac{\partial^2 \log p(r|s)}{\partial s^2} \right\rangle_s = -\int dr \, p(r|s) \, \frac{\partial^2 \log p(r|s)}{\partial s^2}$$
(the average is over trials measuring $r$ while $s$ is fixed). The Cramér-Rao bound says that for any estimator [Cover and Thomas, 1991]
$$\sigma_{est}^2 \geq \frac{(1 + b_{est}'(s))^2}{I_F(s)}$$
An estimator is efficient if $\sigma_{est}^2 = (1 + b_{est}'(s))^2 / I_F(s)$. In the bias-free case, an efficient estimator has $\sigma_{est}^2 = 1 / I_F(s)$. The ML decoder is typically efficient when $N \to \infty$.
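To make the ML decoding and Cramér-Rao slides concrete, here is a minimal sketch: a population of Gaussian tuning curves with independent Poisson counts is decoded by the center-of-mass estimator, and its empirical variance is compared to $1/I_F$. All parameter values (peak rate, tuning width, counting window, neuron density) are illustrative assumptions, not taken from the slides.

```python
# Sketch: ML / center-of-mass decoding of a population with Gaussian tuning
# curves and independent Poisson counts, compared to the Cramer-Rao bound.
# All parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

A, sigma_w, T = 50.0, 1.0, 0.1          # peak rate (Hz), tuning width, window (s)
s_pref = np.linspace(-10.0, 10.0, 41)   # preferred stimuli, density rho = 2
s_true = 0.3                            # stimulus to decode

def f(s):
    """Tuning curves f_a(s) = A exp(-(s - s_a)^2 / 2 sigma_w^2)."""
    return A * np.exp(-(s - s_pref) ** 2 / (2.0 * sigma_w ** 2))

# Poisson counts n_a over many trials, then the center-of-mass estimate,
# which is ML when sum_a f_a(s) is independent of s (dense tuning)
n = rng.poisson(f(s_true) * T, size=(10000, s_pref.size))
s_ml = (n * s_pref).sum(axis=1) / n.sum(axis=1)    # sum_a r_a s_a / sum_a r_a

# Fisher information for independent Poisson neurons: I_F = T sum_a f_a'^2 / f_a
df = f(s_true) * (s_pref - s_true) / sigma_w ** 2  # analytic f_a'(s_true)
I_F = T * np.sum(df ** 2 / f(s_true))

print(f"bias      = {s_ml.mean() - s_true:+.4f}")
print(f"var(s_ML) = {s_ml.var():.4f}")
print(f"1 / I_F   = {1.0 / I_F:.4f}")
```

With dense tuning and roughly 25 expected spikes per window, the empirical variance should come out close to the Cramér-Rao bound, illustrating the near-efficiency of the ML decoder for large $N$.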
Fisher information
- In homogeneous systems, $I_F$ is independent of $s$.
- More generally, the Fisher matrix is
$$(I_F)_{ij}(s) = -\left\langle \frac{\partial^2 \log p(r|s)}{\partial s_i \partial s_j} \right\rangle_s$$
- Taylor expansion of the Kullback-Leibler divergence:
$$D_{KL}(P(s), P(s + \delta s)) \approx \frac{1}{2} \sum_{ij} \delta s_i \, \delta s_j \, (I_F)_{ij}$$
- It is not a Shannon information measure (it is not in bits), but it is related to Shannon information in special cases, e.g. [Brunel and Nadal, 1998, Yarrow et al., 2012].

Fisher information for a population
For independent Poisson spikers,
$$I_F(s) = -\left\langle \frac{\partial^2 \log p(r|s)}{\partial s^2} \right\rangle = T \sum_a \langle r_a \rangle \left( \left( \frac{f_a'(s)}{f_a(s)} \right)^2 - \frac{f_a''(s)}{f_a(s)} \right)$$
For dense, symmetric tuning curves, the second term sums to zero. Using $f_a(s) = \langle r_a \rangle$ we obtain
$$I_F(s) = T \sum_a \frac{(f_a'(s))^2}{f_a(s)}$$
For dense tuning curves $f_a(s) = A e^{-(s - s_0 + a \cdot ds)^2 / 2\sigma_w^2}$ with density $\rho = 1/ds$, the sum becomes an integral:
$$I_F = \sqrt{2\pi} \, T A \rho / \sigma_w$$

For Gaussian tuning curves
[Dayan and Abbott (2001)]
Note that a neuron's Fisher information vanishes at the peak of its tuning curve, since $f_a'(s) = 0$ there. This can be used to derive optimal tuning curves; see [Dayan and Abbott, 2002], chapter 3.3.
The discriminability is $d' = \Delta s \sqrt{I_F(s)}$ for a small stimulus difference $\Delta s$.

FI predicts human performance
[Dayan and Abbott (2001)]
- Orientation discrimination for stimuli of different sizes (different $N$).
- Solid line: the estimated minimum standard deviation at the Cramér-Rao bound.
- Triangles: the human standard deviation as a function of stimulus size (expressed in $N$).

Slope as strategy
From a paper on bat echolocation [Yovel et al., 2010].

Spike train decoding
Dayan and Abbott, §3.4. Estimate the stimulus from the spike times $t_i$, minimizing e.g. $\langle (s(t) - s_{est}(t))^2 \rangle$. First-order reconstruction:
$$s_{est}(t - \tau_0) = \sum_{t_i} K(t - t_i) - \langle r \rangle \int d\tau \, K(\tau)$$
- The second term ensures that $\langle s_{est}(t) \rangle = 0$.
- A delay $\tau_0$ can be included to make decoding easier: predict the stimulus at time $t - \tau_0$ based on spikes up to time $t$.
(A sketch of this reconstruction is given at the end of this section.)

Causal decoding
- The organism faces a causal (on-line) decoding problem.
- Prediction of the current/future stimulus requires temporal correlation of the stimulus. Example: in the head-direction system, the neural code correlates best with the future direction.
- Causality requires $K(t - t_i) = 0$ for $t \leq t_i$:
$$s_{est}(t - \tau_0) = \sum_{t_i} K(t - t_i) - \langle r \rangle \int d\tau \, K(\tau)$$
- The delay $\tau_0$ buys extra time.

Causal decoding
Delay $\tau_0 = 160$ ms (B: shifted/causal kernel, C: non-causal kernel). At time $t$, estimate $s(t - \tau_0)$:
- Spikes 1..4 contribute because the stimulus is correlated (right tail of $K$).
- Spikes 5..7 contribute because of $\tau_0$.
- Spikes 8, 9, ... have not occurred yet.
For an uncorrelated stimulus, the kernel is obtained from the STA. [Dayan and Abbott (2001)]

Decoding from a GLM
[Figure: a stimulus trace with STA-based and GLM-based decoded estimates, plotted over 200 time bins.]

H1 neuron of the fly
- The solid line is the reconstruction using an acausal filter.
- Note that the reconstruction quality will depend on the stimulus.
[Dayan and Abbott (2001) after Rieke et al (1997)]

MAP estimation
According to Bayes' theorem, $p(s|r) = p(r|s) p(s) / p(r)$, so
$$\log p(s|r) = \log p(r|s) + \log p(s) + c$$
Tractable as long as rhs.
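Finally, here is a minimal sketch of the first-order linear reconstruction referenced above. The threshold-linear Poisson encoder and all parameters are assumptions for illustration; the decoding kernel is estimated from the stimulus-spike correlation (an STA-like quantity) and is the acausal, non-shifted version, so a causal decoder would additionally introduce the delay $\tau_0$ and set $K(t - t_i) = 0$ for $t \leq t_i$.

```python
# Sketch: first-order linear stimulus reconstruction from a spike train
# (cf. Dayan & Abbott section 3.4). Encoder and parameters are assumptions.
import numpy as np

rng = np.random.default_rng(1)
dt, n_t = 0.002, 100000                 # 2 ms bins, 200 s of data
s = rng.standard_normal(n_t)            # zero-mean white-noise stimulus

# Assumed encoder: rate = baseline + gain * (s filtered with a causal kernel)
h = np.exp(-np.arange(0.0, 0.02, dt) / 0.005)
rate = np.clip(50.0 + 15000.0 * np.convolve(s, h)[:n_t] * dt, 0.0, None)
n = rng.poisson(rate * dt)              # spike counts per bin
x = n - n.mean()                        # mean subtraction plays the role of
                                        # the -<r> * integral(K) term

# Kernel from the stimulus-spike correlation: K[j] ~ <s[t] x[t-j]> / var(x),
# assuming approximately uncorrelated Poisson counts. It is nonzero only at
# acausal lags: future spikes carry information about the current stimulus.
lag = 25                                # +/- 50 ms of lags
lags = np.arange(-lag, lag + 1)
K = np.array([np.mean(s[lag:-lag] * x[lag - j:n_t - lag - j]) for j in lags])
K /= x.var()

# Reconstruction: s_est[t] = sum_j K[j] x[t - j]
s_est = np.convolve(x, K)[lag:lag + n_t]

print(f"stimulus/reconstruction correlation: {np.corrcoef(s, s_est)[0, 1]:.2f}")
```

The correlation stays modest for a white-noise stimulus at these firing rates; as the H1 slide notes, reconstruction quality depends strongly on the stimulus statistics and the number of spikes available.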