Copyright, 2008, R.E. Kass, E.N. Brown, and U. Eden REPRODUCTION OR CIRCULATION REQUIRES PERMISSION OF THE AUTHORS

Chapter 21

Point Processes

Let us return to the spike train data recorded from the supplementary eye field (SEF) shown in Figure 21.1. The raster plot displays a set of sequences of time points at which action potentials occurred, i.e., a set of spike trains. There is considerable irregularity in the spike times, both within and across trials, which may be described in terms of probability models. For sequences of event times, however, we need a richer structure than we have developed so far. The probability models we use in this context are called point processes, reflecting the localization of the events as points in time (or, sometimes, points in space) together with the notion that the probability distributions evolve across time according to a stochastic process. The simplest point processes are called homogeneous Poisson processes. As we outline in Section 21.0.3, homogeneous Poisson processes can describe highly irregular sequences of event times that have no discernible temporal structure, and they are easy to work with mathematically. When an experimental stimulus or behavior is introduced, however, time-varying characteristics of the process become important. In subsequent subsections we indicate ways that more general processes can retain some of the elegance of Poisson processes while gaining the ability to describe a wide variety of phenomena.


[Figure 21.1 here: raster plots (trial number vs. Time (ms)) and PSTHs (firing rate per second vs. Time (ms)), each shown for −200 to 600 ms, under the Spatial and Pattern conditions.]

Figure 21.1: Raster plot (TOP) and PSTH (BOTTOM) for an SEF neuron under both the external-cue or “spatial” condition (LEFT) and the complex cue or “pattern” condition (RIGHT). Each line in each raster plot contains data from a single trial, that is, a single instance in which the condition was applied. There are 15 trials for each condition. The tick marks represent spike times, i.e., times at which the neuron fired.

Spike trains are fundamental to information processing performed by the brain, and point processes form the foundation for distinguishing signal from noise in spike trains. In the remainder of this introduction we discuss several examples of spike-train data.

Before we begin, we would like to issue an important warning: point processes are not the same as continuous-valued stochastic processes, which can take on a continuum of possible values at each point in time. A point process specifies a random sequence of points or, equivalently, a binary indication at every time point as to whether or not an event occurs at that time. Many standard signal-processing techniques are designed primarily for continuous-valued data. Because event-time data are different, alternative methods of analysis are often preferable.

Example: Retinal neuron under constant light and environmental conditions Neurons in the retina typically respond to patterns of light displayed over small sections of the visual field. When retinal neurons are grown in culture and held under constant light and environmental conditions, however, they will still spontaneously fire action potentials. In a fully functioning retina, this spontaneous activity is sometimes described as background firing activity, which is modulated as a function of visual stimuli. Figure 21.2 shows the spiking activity of one such neuron firing spontaneously over a period of 30 seconds. Even though this neuron is not responding to any explicit stimuli, we can still see structure in its firing activity. Although most of the ISIs are shorter than 20 msec, some are much longer: there is a small second mode in the histogram around 60–120 milliseconds. This suggests that the neuron may experience two distinct states, one in which there are bursts of spikes (with short ISIs) and another, more quiescent state (with longer ISIs). From Figure 21.2 we may also get an impression that there are bursts of activity, with multiple spikes arriving in quick succession. 2

Example: Spiking activity of a primary motor cortical neuron The spiking activity of neurons in primate motor cortex has been shown to relate to intended motor outputs, such as limb reaching movements. Experiments in which a monkey performs a two-dimensional reach have shown velocity-dependent cosine tuning, whereby a motor cortical neuron fires most when the hand moves in a single preferred direction, the intensity drops off as a cosine function of the difference between the intended movement direction and that preferred direction, and the firing additionally increases with increasing movement speed. Figure 21.3 shows an example of the spiking activity of a neuron in primate motor cortex as a function of hand movement direction during a center-out reaching task. The neuron fires most intensely when the hand moves in a direction about 170 degrees from east. These firing patterns have also been shown to vary as a function of movement speed (Moran and Schwartz, 1999). 2
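The cosine tuning relationship just described can be written as λ(θ) = b0 + b1 cos(θ − θpref). A minimal sketch in Python: the baseline and modulation values here are invented for illustration, and only the preferred direction (about 170 degrees) comes from the example above.

```python
import numpy as np

def cosine_tuning_rate(theta, baseline=12.0, modulation=8.0,
                       preferred=np.deg2rad(170.0)):
    """Expected firing rate (spikes/s) under cosine tuning.

    baseline and modulation are invented for illustration; the preferred
    direction of about 170 degrees matches the neuron described above.
    """
    return baseline + modulation * np.cos(theta - preferred)

# Eight center-out target directions, 45 degrees apart.
directions = np.deg2rad(np.arange(0.0, 360.0, 45.0))
rates = cosine_tuning_rate(directions)
# The rate peaks at the grid direction nearest the preferred direction
# (180 degrees) and is lowest nearly opposite it.
```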

Example: Hippocampal place cell Neurons in rodent hippocampus have spatially specific firing properties, whereby the spiking intensity is highest when the animal is at a specific location in an environment, and falls off as the animal moves further away from that point. Such receptive fields are called place fields, and neurons that have such firing properties are called

Figure 21.2: Spontaneous spiking activity of a goldfish retinal neuron in culture under constant light and environmental conditions over 30 seconds. (A) Retinal ganglion cell (taken from web, may be copyrighted) (B) Histogram of interspike intervals and (C) spike train, from a retinal ganglion cell under constant conditions.

Figure 21.3: Cosine tuning in primate motor cortex. (A) Spike rasters for a center-out task with eight principal directions. (B) Spike count as a function of direction shows a sinusoidal trend. to be re-done

Figure 21.4: Movement trajectory (blue) and hippocampal spiking activity (red) of a rat during a free-foraging task in a circular environment.

place cells. Figure 21.4 shows an example of the spiking activity of one such place cell, as a rat executes a free-foraging task in a circular environment. The rat’s path through this environment is shown in blue, and the location of the animal at spike times is overlain in red. It is clear that the firing intensity is highest slightly to the southwest of the center of the environment, and decreases when the rat moves away from this point. 2

Point processes also arise in imaging. For instance, in PET imaging, a radioisotope that has been incorporated into a metabolically active molecule is introduced into the subject’s bloodstream. These molecules become concentrated in specific tissues and the radioisotopes decay, emitting positrons. These emissions represent a spatiotemporal point process because they are localized occurrences both spatially, throughout the tissue, and in time. After being emitted, the positrons interact with nearby electrons, producing a pair of photons that shoot out in opposite directions and are detected by a circular ring of photosensors. The arrival of photons at each sensor represents a temporal point process and, by characterizing the temporal interactions between arrivals at multiple sensors, it is possible to infer the original location of the positron emission. By observing and inferring the locations of many such occurrences, it is possible to construct an image of specific metabolically active tissues.

Point processes have been applied to many physical phenomena outside of neuroscience. For example, temporal point processes have been used to characterize the timing and regularity of heart beats (Barbieri and Brown, 2005); to describe geyser eruptions (Azzalini and Bowman, 1990); and to characterize and predict the locations and times of major earthquakes (Ogata, 1988).

21.0.1 A point process may be specified in terms of event times, inter-event intervals, or event counts.

If s1, s2, . . . , sn are times at which events occur within some time interval we may take xi = si − si−1, i.e., xi is the elapsed time between si−1 and si, and define x1 = s1. This gives the inter-event waiting times xi from the event times, and we could reverse the arithmetic to find the event times from a set of inter-event waiting times x1, . . . , xn using sj = x1 + x2 + · · · + xj. In discussing point processes, both of these representations are useful. In the context of spike trains, s1, s2, . . . , sn are the spike times, while x1, . . . , xn are the inter-spike intervals (ISIs). Nearly all of our discussion of event-time sequences will involve modeling of spike train behavior.
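Going back and forth between event times and waiting times is a one-line computation in practice. A small sketch (the spike times below are made up for illustration):

```python
import numpy as np

# Hypothetical spike times (ms), for illustration only.
s = np.array([12.0, 19.5, 21.0, 58.0, 63.5])

# ISIs: x_i = s_i - s_{i-1}, with x_1 = s_1.
x = np.diff(s, prepend=0.0)

# Reversing the arithmetic recovers the spike times: s_j = x_1 + ... + x_j.
s_recovered = np.cumsum(x)
```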

To represent the variability among the event times we let X1, X2, . . . be a sequence of positive random variables. Then the sequence of random variables S1, S2, . . . defined by Sj = X1 + X2 + · · · + Xj is a point process on (0, ∞). In fitting point processes to data we instead consider finite intervals of time over which the process is observed, and these are usually taken to have the form (0, T ], but for many theoretical purposes it is more convenient to assume the point process ranges across (0, ∞).

Another useful way to describe a set of event times is in terms of the counts of events observed over time intervals. The event count in a particular time interval may be considered a random variable. For theoretical purposes it is helpful to introduce a function N(t) that counts the total number of events that have occurred up to and including time t. N(t) is called the counting process representation of the point process. See Figure 21.5. If we let ∆N(t1,t2) denote the number of events observed in the interval (t1, t2], then we have ∆N(t1,t2) = N(t2) − N(t1). The count ∆N(t1,t2) is often called the increment of the point process between t1 and t2. In the case of a neural spike train, Si would represent the time of the ith spike, Xi would represent the ith inter-spike interval (ISI), and ∆N(t1,t2) would represent the spike count in the interval (t1, t2]. For event times Si and inter-event waiting times Xi

Figure 21.5: Multiple specifications for point process data: the process may be specified in terms of spike times, waiting times, counts, or discrete binary indicators.

we are dealing with mathematical objects that are already familiar, namely sequences of random variables, with the index i being a positive integer. The counting process, N(t), on the other hand, is a continuous-time stochastic process, which determines count increments that are random variables.

Keeping track of the times at which the count increases is equivalent to keeping track of increments. Furthermore, if t1 = si and t2 < si+1 then ∆N(t1,t2) = 0, but when t2 = si+1 then ∆N(t1,t2) = 1. Thus, keeping track of the times at which the count increases is equivalent to keeping track of the events themselves and, therefore, the counts provide a third way to characterize a point process.

As an example of the way we may identify the event times with the counting process, the set of times {t : N(t) < j} when the counting process is less than some value j is equivalent to the set of times {t : Sj > t} when the jth spike has not yet occurred. Both of these representations express the set of all times that precede the jth spike, but they do so differently. Specifying any one of the spike times, inter-spike intervals, or counting process fully specifies the other two and fully specifies the point process as a whole. It is often possible to simplify theoretical calculations by taking advantage of these multiple equivalent expressions for a point process.

The event times, inter-event intervals, and counting process all specify the point process in continuous time. It is often useful, both for developing intuition and for constructing data analysis methods, to take an observation interval (0, T ] and break it up into n small, evenly-spaced time bins. Let ∆t = T/n, and tk = k · ∆t, for k = 1, . . . , n. We can now consider the discrete increments ∆Nk = N(tk) − N(tk−1), which count the number of events in a single bin. If we make ∆t small enough, it becomes extremely unlikely for there to be more than one event in a single bin. The set of increments {∆Nk}, for k = 1, . . . , n, then becomes a sequence of 0s and 1s, with the 1s indicating the bins in which the events are observed (see Figure 21.5). In the case of spike trains, data are often recorded in this form, with ∆t = 1 millisecond. Such a discrete-time process is yet another way to represent a point process. It loses some information about the precise timing of events within each bin, but for sufficiently small ∆t this loss of information becomes irrelevant for practical purposes. Indeed, as we discuss below, a point process may be considered the limit of its discrete-time counterpart as ∆t → 0. The representation of a point process, approximately, in terms of a sequence of binary random variables is very convenient; it is central to much existing analytical methodology.
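The discretization just described is easy to carry out. The sketch below bins a hypothetical set of spike times at ∆t = 1 ms; note that the bins here are half-open on the right rather than the left, a convention that is immaterial at this resolution.

```python
import numpy as np

T, dt = 1000.0, 1.0                              # interval (ms), 1 ms bins
s = np.array([3.2, 40.7, 41.9, 250.0, 999.1])    # hypothetical spike times

n = int(T / dt)
edges = dt * np.arange(n + 1)
# Delta N_k counts the events in bin k; np.histogram uses bins that are
# half-open on the right, [t_{k-1}, t_k), rather than (t_{k-1}, t_k].
dN, _ = np.histogram(s, bins=edges)

# With dt small enough, each bin holds at most one event, so dN is a
# binary sequence marking the bins that contain spikes.
```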

21.0.2 Point processes can display a wide variety of history-dependent behaviors.

In many stochastic systems, past behavior influences the future. The biophysical properties of ion channels, for example, limit how fast a neuron can recover immediately following an action potential. This leads to an absolute refractory period following a spike, when the neuron is unable to fire another spike. In addition, after the absolute refractory period there is a relative refractory period during which the neuron can fire again, but requires stronger input in order to do so. This is perhaps the most basic illustration of history dependence in neural spike trains. To describe spike train variability accurately (at least for moderate to high firing rates, where the refractory period is important), the probability of a spike occurring at a given time must depend on how recently the neuron has fired in the past. A more complicated history-dependent neural behavior is bursting, which is characterized by short sequences of spikes with small interspike intervals. In addition, spike trains are sometimes oscillatory. For example, neurons in the CA1 region of rodent hippocampus tend to fire at particular phases of the EEG theta rhythm. Thus, in a variety of settings, probability models for spike trains make dependence on spiking history explicit.

On the other hand, a great simplification is achieved by ignoring history dependence and, instead, assuming the probability of spiking at a given time has no relationship with previous spiking behavior. This assumption leads to the class of Poisson processes, which are very appealing from a mathematical point of view: although they rarely furnish realistic models for spike train data, they are a pedagogical—and often practical—starting point for point processes in much the way that the Normal distribution is for continuous random variables.

21.0.3 Poisson processes are point processes for which events do not depend on the occurrence or timing of past events.

Poisson processes are characterized by lack of memory: among point processes in general, Poisson processes are those for which the occurrence of events in an interval (t, t + δ) is independent of the occurrence of events prior to time t (for all t and δ). Thus, the specification of Poisson processes does not involve event histories. This is an enormous simplification, which makes Poisson processes the easiest point processes to use for data analysis, and a good place to start a pedagogical discussion. On the other hand, neural event sequences such as spike trains usually exhibit history dependence, so that accurate descriptions of them usually require non-Poisson processes. As we shall see below (and again in Chapter 7), it is not hard to modify Poisson process models to make them more realistic.

Two kinds of Poisson processes must be distinguished. When event probabilities are invariant in time Poisson processes are called homogeneous; otherwise they are called inhomogeneous. We begin with the homogeneous case, and present two different but equivalent definitions.

Definition: A homogeneous Poisson process with intensity λ is a point process satisfying the following conditions:

1. For any interval, (t, t + ∆t), ∆N(t,t+∆t) ∼ P (µ) with µ = λ∆t.

2. For any non-overlapping intervals, (t, t + ∆t) and (s, s + ∆s), ∆N(t,t+∆t) and ∆N(s,s+∆s) are independent.

For spike trains, the first condition states that for any time interval of length ∆t, the spike count is a Poisson random variable with mean µ = λ·∆t. In particular, the mean, which is the expected number of spikes in the interval, increases in proportion to the length of the interval; the distribution of the spike count depends on the length of the interval, but not on its starting time. This homogeneous process is time-invariant, or stationary. The second condition states that the spike counts (the counting process increments) from non-overlapping intervals are independent. In other words, the distribution of the number of spikes in an interval does not depend on the spiking activity outside that interval. Another way to state this definition is to say that a homogeneous Poisson process is a point process with stationary, independent increments.

There is one technical point to check: we need to be sure that the distributions of overlapping intervals, given in Definition 1, are consistent. For example, if we consider intervals (t1, t2) and (t2, t3) we must be sure that the Poisson distributions for the counts in each of these are consistent with the Poisson distribution for the count in the interval (t1, t3). Specifically, in this case, we must know that the sum of two independent Poisson random variables with means µ = λ(t2 − t1) and µ = λ(t3 − t2) is a Poisson random variable with mean µ = λ(t3 − t1). In Appendix B we present a simple proof that shows that this, and all such resulting consistency relations, are satisfied.

It is easy to derive the inter-event waiting time distribution—in the context of spike trains, the ISI distribution—for a homogeneous Poisson process. Recalling that Xi is the length of the inter-event interval between the (i−1)st and ith event times, we have Xi > t precisely when ∆N(Si−1,Si−1+t] = 0. From Definition 1, P (∆N(Si−1,Si−1+t] = 0) = e−λt. Therefore, the CDF of Xi is FXi(t) = P (Xi ≤ t) = 1 − e−λt, which is the CDF of an Exp(λ) random variable. We may turn things around to obtain the following characterization.

Theorem: A homogeneous Poisson process with intensity λ is a point process for which the inter-event waiting times are i.i.d. Exp(λ).

Proof: See Appendix B. 2
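This theorem also gives a direct recipe for simulating a homogeneous Poisson process: draw i.i.d. Exp(λ) waiting times and accumulate them. The sketch below does this and then checks Definition 1 empirically, using counts in disjoint one-second bins (all parameter values are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
lam, T = 20.0, 200.0   # intensity (events/s) and observation interval (s)

# Draw i.i.d. Exp(lam) waiting times and accumulate them into event times.
x = rng.exponential(1.0 / lam, size=int(3 * lam * T))
s = np.cumsum(x)
s = s[s <= T]

# Definition 1 check: counts in disjoint 1 s bins should be Poisson(lam),
# so their sample mean and sample variance should both be near lam.
counts, _ = np.histogram(s, bins=np.arange(0.0, T + 1.0))
```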

Recall that the Exponential distribution is memoryless. According to this theorem, for a Poisson process, at any given moment the time at which the next event will occur does not depend on past events. Indeed, it may also be shown that Poisson processes are the only memoryless point processes (e.g., Ross, 1996).

Another way to think about homogeneous Poisson processes is that the event times are scattered “as irregularly as possible.” Here is a characterization that provides the basis for this intuition.1

Theorem: Suppose we observe N(T ) = n events from a homogeneous Poisson process on an interval (0, T ]. Then the distribution of the event times is the same as that of a sample of size n from a uniform distribution on (0, T ].

Proof: See Appendix B. 2
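This theorem can be checked by simulation. In the sketch below we generate many homogeneous Poisson realizations on (0, T ] and pool all the event times; if the theorem holds, the pooled times should behave like draws from a Uniform distribution on (0, T ], whose mean is T/2 and variance T²/12 (the parameter values are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(2)
lam, T = 5.0, 10.0

# Pool event times from many realizations of a homogeneous Poisson process.
pooled = []
for _ in range(2000):
    s = np.cumsum(rng.exponential(1.0 / lam, size=200))
    pooled.append(s[s <= T])
pooled = np.concatenate(pooled)

# If the theorem holds, the pooled times behave like Uniform(0, T] draws:
# mean near T/2 = 5 and variance near T**2/12.
```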


Figure 21.6: A sequence of MEPSC arrival times. The arrival times are highly irregular.

Example: Miniature excitatory postsynaptic currents Figure 21.6 displays arrival times of miniature excitatory postsynaptic currents (MEPSCs) recorded from neurons in neonatal mice at multiple days of development. To record these events the neurons are patch clamped at the cell body and treated so that they cannot propagate action potentials. These MEPSCs are thought to represent random activations of the dendritic arbors of the neuron at distinct spatial locations, so that the two assumptions of a Poisson process are plausible. The sequence of events in Figure 21.6

looks highly irregular, with no temporal structure. Figure 21.7 displays a histogram of the interarrival waiting times between MEPSC events. The distribution of waiting times is well captured by an exponential fit, overlain in red. This is further supported by the P-P plot, comparing the empirical CDF to that of an exponential, also shown2 in Figure 21.7. 2

1A second aspect of the “irregularity” notion is that the Exp(λ) waiting-time distribution maximizes the entropy among all distributions on (0, ∞) having mean µ = λ−1.


Figure 21.7: Histogram and P-P plot of MEPSC inter-arrival waiting times. (LEFT) Overlaid on the histogram is an Exponential pdf. (RIGHT) P-P plot falls within diagonal bands, indicating no lack of fit according to the Kolmogorov-Smirnov test (discussed in Chapter 4).
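An analysis like the one in Figure 21.7 can be sketched as follows. Since the MEPSC data themselves are not reproduced here, the sketch draws stand-in waiting times from an exponential distribution; with real data, `waits` would hold the observed inter-arrival intervals. The rate estimate is the maximum likelihood estimate 1/x̄, and the Kolmogorov-Smirnov statistic is computed against the fitted CDF.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for the MEPSC inter-arrival times (ms); with real data, `waits`
# would hold the observed intervals instead of this synthetic draw.
waits = rng.exponential(scale=300.0, size=200)

# Maximum likelihood fit: lambda-hat = 1 / sample mean.
lam_hat = 1.0 / waits.mean()

# Kolmogorov-Smirnov statistic against the fitted Exponential CDF.
w = np.sort(waits)
model_cdf = 1.0 - np.exp(-lam_hat * w)
n = len(w)
ks = max(np.max(np.arange(1, n + 1) / n - model_cdf),
         np.max(model_cdf - np.arange(0, n) / n))

# A rough 95% acceptance bound for the KS statistic is 1.36 / sqrt(n);
# values well below it indicate no evidence of lack of fit.
```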

We said earlier that an important, approximate representation of a point process is in terms of a sequence of binary random variables, each of which indicates whether an event has occurred in some small time bin of width ∆t. We may also consider the discrete version of a homogeneous Poisson process. For most physical systems, and certainly for neural spiking data, it is possible to pick a discrete time bin size, ∆t, such that at most one event (one spike) can occur in any one bin. Suppose we have an observation interval, (0, T ], and we partition it into bins of size ∆t, with ∆t sufficiently small that at most one event will occur in each bin. Let Yi be the binary random variable that indicates whether an event has occurred in the ith time bin, and let P (Yi = 1) = pi, for i = 1, . . . , n (so that there are n time bins and T = n∆t). If the Yi random variables are Bernoulli trials, i.e., they are independent and p1 = p2 = · · · = pn = p for some p, then the sequence of event times follows, approximately, a Poisson process. The approximation here comes from the

discretization of time into small bins: when we represent spike times as Bernoulli trials we would be assuming that spiking events are not usefully quantified with an accuracy more precise than ∆t; in electrophysiological experiments, typically ∆t = 1 millisecond. The intuition here is crucial: a homogeneous Poisson process is essentially a sequence of Bernoulli trials. We provide some mathematical justification for all this in Appendix B, the main idea being that as ∆t becomes sufficiently small, the probability of observing a spike in each bin is approximately p = λ∆t, which is itself a small number; in this case the Poisson distribution from Definition 1 is nearly identical to a Bernoulli distribution; thus we may say that the sequence of Bernoulli trials converges to a Poisson process as ∆t → 0.

2The small deviation of the curve from the diagonal in the lower left-hand corner of the P-P plot is probably due to inaccuracy of measurement for very short waiting times.
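The convergence just described can be seen numerically: a sequence of n Bernoulli(p) bins with p = λ∆t gives total counts that are Binomial(n, p), which for small p is close to Poisson(λT). A small sketch (the rate and bin size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
lam, T, dt = 50.0, 1.0, 0.001       # 1 ms bins over one second
n = int(T / dt)
p = lam * dt                        # per-bin spike probability

# Many trials of n independent Bernoulli(p) bins; total count per trial.
counts = rng.binomial(1, p, size=(5000, n)).sum(axis=1)

# Binomial(n, p): mean n*p = 50, variance n*p*(1-p) = 47.5, already close
# to the Poisson(50) values; the gap vanishes as dt -> 0.
```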

21.0.4 The Exponential distribution is used to describe waiting times without memory.

A random variable X is said to have an Exponential distribution with parameter λ when its pdf is

f(x) = λe−λx (21.1)

for x > 0, and is 0 for x ≤ 0. We will then say that X has an Exp(λ) distribution and we will write X ∼ Exp(λ). The cdf of an Exponential distribution is

F (x) = ∫_{0}^{x} λe−λt dt = [−e−λt]_{0}^{x} = 1 − e−λx.

Thus, when X ∼ Exp(λ) we also have

P (X > x) = e−λx. (21.2)

We defined the Exponential distribution in Equation (21.1), using it to illustrate calculations based on the pdf. The Exponential finds practical application in describing distributions of event durations.

Figure 21.8: Current recordings from individual ion channels in the presence of acetylcholine-type agonists. The records show the opening (higher current levels) and closing (lower current levels), with the timing of opening and closing being stochastic. From Colquhoun and Sakmann. TO BE RE-DONE

Example: Duration of Ion Channel Activation To investigate the functioning of ion channels, Colquhoun and Sakmann (1985) used patch-clamp methods to record currents from individual ion channels in the presence of various acetylcholine-like agonists. (Colquhoun, D. and Sakmann, B. (1985), Fast events in single-channel currents activated by acetylcholine and its analogues at the frog muscle end-plate, J. Physiology, 369: 501–557; see also Colquhoun, D. (2007) Classical Perspective: What have we learned from single ion channels? J. Physiology, 581: 425–427.) A set of their recordings is shown in Figure 21.8. One of their main objectives was to describe the opening and closing of the channels in detail, and to infer mechanistic actions from the results. Colquhoun and Sakmann found that channels open in sets

Figure 21.9: Duration of channel openings. Panel A depicts the distribution of burst durations for a particular agonist. Panel B displays the distribution of bursts for which there was only 1 opening, with an Exponential pdf overlaid. This illustrates the good fit of the Exponential distribution to the durations of ion channel openings. Panel C displays the distribution of bursts for which there were 2 apparent openings, with a Gamma pdf, with shape parameter 2, overlaid. Panel C again indicates good agreement. Panels D–F show similar results, for bursts with 3–5 openings. From Colquhoun and Sakmann. TO BE RE-DONE

of activation “bursts” in which the channel may open, then shut again and open again in rapid succession, and this may be repeated, with small gaps of elapsed time during which the ion channel is closed. A burst may thus have 1 or several openings. As displayed in Figure 21.9, Colquhoun and Sakmann examined separately the bursts having a single opening, then bursts with 2 openings, then bursts with 3, 4, and 5 openings. Panel B of Figure 21.9 indicates that, for bursts with a single opening, the opening durations follow closely an Exponential distribution. In the case of bursts with 2 openings, if each of the two opening durations were Exponentially distributed, and the two were independent, then their sum—the total opening duration—would be Gamma with shape parameter α = 2. Panel C of Figure 21.9 indicates the good agreement of the Gamma with the data. The remaining panels show similar results for the other cases. 2
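The fact used here, that the sum of two independent Exp(λ) durations is Gamma with shape parameter 2, is easy to check by simulation (the mean opening duration below is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
lam = 0.5                     # hypothetical rate: mean opening of 2 ms

# Total open duration of a burst with two independent openings.
n = 100_000
totals = rng.exponential(1.0 / lam, size=n) + rng.exponential(1.0 / lam, size=n)

# Gamma(shape=2, rate=lam) has mean 2/lam = 4 and variance 2/lam**2 = 8.
mean, var = totals.mean(), totals.var()
```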

The Exponential distribution is very special3 because of its “memoryless” property. To understand this, let X be the length of time an ion channel is open, and let us consider the probability that the channel will remain open for the next time interval of length h. For example, h might be 5 milliseconds. How do we write this? If we begin the moment the channel opens, i.e., at x = 0, the next interval of length h is (0, h) and we want P (X ≥ h). On the other hand, if we begin at time x = t, for some positive t, such as 25 milliseconds, the interval in question is (t, t + h) and we are asking for a conditional probability: if the channel is open at time t we must have X ≥ t, so we are asking for P (X ≥ t + h|X ≥ t). We say that the channel opening duration is memoryless if

P (X ≥ t + h|X ≥ t) = P (X ≥ h) (21.3)

for all t > 0 and h > 0. That is, if t = 25 milliseconds, the channel does not “remember” that it has been open for 25 milliseconds already; it still has the same probability of remaining open for the next 5 milliseconds that it had when it first opened. And this is true regardless of the time t we pick. The Exponential distributions are the only distributions that satisfy Equation (21.3).

Contrast this memorylessness with, say, a Uniform distribution on the interval [0, 10], measured in milliseconds. According to this Uniform distribution, the event (e.g., the closing of the channel) must occur within 10 milliseconds and initially every 5 millisecond interval has the same probability. In particular, the probability the event will occur in the first 5 milliseconds, i.e., in the interval [0, 5], is the same as the probability it will occur in the last 5 milliseconds, in [5, 10]. Both probabilities are equal to 1/2. However, if at time t = 5 milliseconds the event has not yet occurred then we are certain it will occur in the next 5 milliseconds, in [5, 10], i.e., this probability is 1, which is quite different from 1/2. In anthropomorphic language we might say the random variable “remembers” that no event has yet occurred, so its conditional probability is adjusted. For the Exponential distribution, the probability the event will occur in the next 5 milliseconds, given that it has not already occurred, stays the same as time progresses.

3Another reason the Exponential distribution is special is that among all distributions on (0, ∞) with mean µ = 1/λ, the Exp(λ) distribution has the maximum entropy; cf. the footnote on page ??.

Theorem If X ∼ Exp(λ) then Equation (21.3) is satisfied.

Proof: Using Equation (21.2) we have

P (X ≥ t + h|X ≥ t) = P (X ≥ t + h, X ≥ t) / P (X ≥ t)
                    = P (X ≥ t + h) / P (X ≥ t)
                    = e−λ(t+h) / e−λt
                    = e−λh
                    = P (X ≥ h). 2
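Equation (21.3) can also be verified numerically. Using the numbers from the ion-channel discussion (t = 25 ms, h = 5 ms) and an arbitrary rate:

```python
import numpy as np

rng = np.random.default_rng(6)
lam, t, h = 0.2, 25.0, 5.0     # rate per ms; t and h in ms

x = rng.exponential(1.0 / lam, size=1_000_000)

# Conditional survival P(X >= t + h | X >= t) versus unconditional P(X >= h):
cond = np.mean(x >= t + h) / np.mean(x >= t)
uncond = np.mean(x >= h)
# Both should be close to exp(-lam * h) = exp(-1).
```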

An additional characterization of the Exponential distribution is that it has a constant hazard function. The Exponential hazard function is easy to compute:

h(t)dt = P (t < X < t + dt|X > t)
       = P (t < X < t + dt) / P (X > t)
       = λe−λt dt / e−λt
       = λ · dt.

This says that the hazard h(t) = λ is constant over time. That is, given that the event has not already occurred, the probability that the event occurs in the next infinitesimal interval (t, t + dt) is the same as it would be for any other infinitesimal interval (t0, t0 + dt). Again, Exponential distributions are the only distributions that have constant hazard functions.

21.0.5 Inhomogeneous Poisson processes have time-varying intensities.

We made two assumptions in defining a homogeneous Poisson process: that the increments were (i) stationary, and (ii) independent for non-overlapping intervals. The first step in modeling a larger class of point processes is to eliminate the stationarity assumption. For spike trains, we would like to construct a class of models where the spike count distributions vary across time. In terms of the Bernoulli-trial approximation, we wish to allow the event probabilities pi to differ.

Definition: An inhomogeneous Poisson process with intensity function λ(t) is a point process satisfying the following conditions:

1. For any interval, (t1, t2), ∆N(t1,t2) ∼ P (µ) with µ = ∫_{t1}^{t2} λ(t)dt.

2. For any non-overlapping intervals, (t1, t2) and (t3, t4), ∆N(t1,t2) and ∆N(t3,t4) are independent.

The reason this process is called an inhomogeneous Poisson process is clear. It still has Poisson increments, but each increment has its own mean, determined by the value of the rate function over the interval in question. This process no longer possesses the stationary increments property, but still has independent increments. As a result, this process also still has the memoryless property, according to which the probability of spiking at any instant does not depend on occurrences or timing of past spikes. In terms of the discrete approximation using time bins of length ∆t (with ∆t sufficiently small that the probability of more than one event in any bin is negligible) we may write pi = ∫ λ(t)dt, where the integral is over the ith time bin, and we then obtain independent Bernoulli random variables Yi with P (Yi = 1) = pi. The independence among these Bernoulli random variables corresponds to

the memorylessness of the Poisson process. In this case, the values of pi may vary across time, corresponding to the inhomogeneity of the process.
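Simulating an inhomogeneous Poisson process is only slightly harder than the homogeneous case. One standard method is thinning (Lewis and Shedler, 1979): simulate a homogeneous process at a rate λmax that dominates λ(t), then keep each event at time s independently with probability λ(s)/λmax. The intensity function below is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
T, lam_max = 10.0, 30.0

def lam(t):
    # Hypothetical intensity (events/s), peaking at t = 5.
    return 30.0 * np.exp(-0.5 * ((t - 5.0) / 1.5) ** 2)

# Thinning: simulate at the dominating rate lam_max, then keep each event
# at time s with probability lam(s) / lam_max.
x = rng.exponential(1.0 / lam_max, size=int(5 * lam_max * T))
s = np.cumsum(x)
s = s[s <= T]
events = s[rng.uniform(size=s.size) < lam(s) / lam_max]

# The expected total count is the integral of lam(t) over (0, T],
# approximated here by a left Riemann sum.
grid = np.linspace(0.0, T, 10001)
expected = np.sum(lam(grid[:-1])) * (grid[1] - grid[0])
```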

For future reference, the next theorem provides an important formula.

Theorem The event time sequence S1, S2, . . . , SN(T) from a Poisson process with intensity function λ(t) on an interval (0, T] has joint pdf
\[
f_{S_1,\ldots,S_{N(T)}}(s_1, \ldots, s_n) = \exp\left\{ -\int_0^T \lambda(t)\,dt \right\} \prod_{i=1}^{n} \lambda(s_i). \tag{21.4}
\]

Lemma The pdf of the ith waiting-time distribution is

\[
f_{S_i}(s_i \mid S_{i-1} = s_{i-1}) = \lambda(s_i) \exp\left\{ -\int_{s_{i-1}}^{s_i} \lambda(t)\,dt \right\}. \tag{21.5}
\]

Proof of the lemma: Note that {Si > si | Si−1 = si−1} is equivalent to no events occurring in the interval (si−1, si]. Therefore,
\[
P(S_i > s_i \mid S_{i-1} = s_{i-1}) = P\left(\Delta N_{(s_{i-1}, s_i]} = 0\right) = \exp\left\{ -\int_{s_{i-1}}^{s_i} \lambda(t)\,dt \right\},
\]
and the ith waiting time CDF is therefore
\[
P(S_i \leq s_i \mid S_{i-1} = s_{i-1}) = 1 - \exp\left\{ -\int_{s_{i-1}}^{s_i} \lambda(t)\,dt \right\}.
\]
The derivative of the CDF,
\[
f_{S_i}(s_i \mid S_{i-1} = s_{i-1}) = \frac{d}{ds_i}\left( 1 - \exp\left\{ -\int_{s_{i-1}}^{s_i} \lambda(t)\,dt \right\} \right),
\]
gives the desired pdf. □

Proof of the theorem: We have

\[
f_{S_1,\ldots,S_{N(T)}}(s_1, \ldots, s_n) = f_{S_1}(s_1)\, f_{S_2}(s_2 \mid S_1 = s_1) \cdots f_{S_{N(T)}}(s_n \mid S_{n-1} = s_{n-1}) \cdot P\left(\Delta N_{(s_n, T]} = 0\right).
\]
The factors involving waiting-time densities are given by the lemma. The last factor is
\[
P\left(\Delta N_{(s_n, T]} = 0\right) = \exp\left\{ -\int_{s_n}^{T} \lambda(t)\,dt \right\}.
\]
Combining these gives the result. □
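Equation (21.5) also suggests a way to simulate an inhomogeneous Poisson process event by event: given Si−1 = si−1, the increment Λ(Si) − Λ(si−1) of the cumulative intensity Λ(t) = ∫₀ᵗ λ(u) du is a unit-rate Exponential. The sketch below uses a hypothetical linear intensity (chosen so that Λ inverts in closed form):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical intensity lam(t) = a + b*t, so Lam(t) = a*t + b*t^2/2
a, b, T = 5.0, 10.0, 2.0

def Lam(t):
    return a * t + 0.5 * b * t * t

def sample_path():
    """One event sequence on (0, T]: add Exp(1) draws in Lam-time, then
    invert Lam (quadratic formula for this linear-intensity case)."""
    s, events = 0.0, []
    while True:
        target = Lam(s) + rng.exponential(1.0)
        s = (-a + np.sqrt(a * a + 2.0 * b * target)) / b  # solve Lam(s) = target
        if s > T:
            return events
        events.append(s)

counts = [len(sample_path()) for _ in range(2000)]
print(round(float(np.mean(counts)), 1))  # near Lam(T) = 5*2 + 5*4 = 30
```

For intensities without an invertible Λ, the same idea works with numerical root-finding, or one can use the thinning method instead.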

The next theorem gives another interesting way to think about inhomogeneous Poisson processes. Note, first, that the length of the sequence of event times S1, S2, . . . , SN(T) depends on the random quantity N(T). Thus, to be more thorough we might write the joint pdf above in the form

\[
f_{S_1,\ldots,S_{N(T)}}(s_1, \ldots, s_n) = f_{S_1,\ldots,S_{N(T)},N(T)}(s_1, \ldots, s_n, N = n).
\]
That is, the pdf on the left-hand side is really a short-hand notation for the pdf on the right-hand side. This observation is used in the proof of the following theorem. We will write fN(n) for the pdf of N(T) and note that, for a Poisson process with intensity λ(t), N(T) ∼ P(µ) with µ = ∫₀ᵀ λ(t) dt.

Theorem Let S1, S2, . . . , SN(T) be an event sequence from a Poisson process with intensity function λ(t) on an interval (0, T]. Conditionally on N(T) = n, the sequence S1, S2, . . . , Sn has the same joint distribution as an ordered set of i.i.d. observations from a univariate distribution having pdf
\[
g(t) = \frac{\lambda(t)}{\int_0^T \lambda(u)\,du}.
\]

Proof: We write the conditional pdf as

\[
f_{S_1,\ldots,S_{N(T)}}(s_1, \ldots, s_n \mid N(T) = n) = \frac{f_{S_1,\ldots,S_{N(T)}}(s_1, \ldots, s_n)}{f_N(n)}
= \frac{e^{-\int_0^T \lambda(t)\,dt} \prod_{i=1}^{n} \lambda(s_i)}{\left( \int_0^T \lambda(t)\,dt \right)^n e^{-\int_0^T \lambda(t)\,dt} / n!}
= n! \prod_{i=1}^{n} \frac{\lambda(s_i)}{\int_0^T \lambda(t)\,dt}
= n! \prod_{i=1}^{n} g(s_i).
\]

Noting that there are n! ways to order the observations s1, . . . , sn, this completes the proof. □

The theorem says that we may consider an inhomogeneous Poisson process with intensity λ(t) to be equivalent to a two-stage process in which we (1) generate an observation N = n from a Poisson distribution with mean µ = ∫₀ᵀ λ(t) dt, and then (2) generate n i.i.d. observations from a distribution having g(t) = λ(t)/∫₀ᵀ λ(u) du as its pdf. One use of this interpretation is that it explains the sense in which the PSTH is actually a histogram. Furthermore, it motivates the application of a density estimator (e.g., a Normal kernel density estimator or Gaussian filter) as a method of smoothing the PSTH.
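The two-stage construction translates directly into a sampler. Below is a sketch for the made-up intensity λ(t) = 20t on (0, 1], for which g(t) = 2t and inverse-CDF sampling reduces to taking √U for uniform U; counts in any sub-interval then come out Poisson with the correct mean:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative intensity lam(t) = 20*t on (0, 1]: mu = integral = 10,
# and g(t) = lam(t)/mu = 2t, whose CDF t^2 inverts to sqrt(U).
def sample_two_stage():
    n = rng.poisson(10.0)                    # stage (1): N ~ P(mu)
    return np.sort(np.sqrt(rng.random(n)))   # stage (2): n i.i.d. draws from g

# Events in (0, 0.5] should be Poisson with mean = integral of lam = 2.5
counts = [int(np.sum(sample_two_stage() <= 0.5)) for _ in range(4000)]
print(round(float(np.mean(counts)), 2))  # close to 2.5
```

Stage (2) is exactly why the PSTH is a histogram: conditionally on the total count, the pooled spike times are i.i.d. draws from g, so binning them estimates g (and hence λ up to the factor µ).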

Corollary Let S1, S2, . . . , SN(T) be an event sequence from a homogeneous Poisson process with intensity λ on an interval (0, T]. Conditionally on N(T) = n, the sequence S1, S2, . . . , Sn has the same joint distribution as an ordered set of i.i.d. observations from a Uniform distribution on [0, T].

21.0.6 Renewal processes have i.i.d. inter-event waiting times.

The homogeneous Poisson process developed above assumed that the point process increments were both stationary and independent of past event history. To accommodate event probabilities that change across time, we generalized from homogeneous to inhomogeneous Poisson processes; this eliminated the stationarity assumption, but preserved the independence assumption, which entailed history independence. Systems that produce point process data, however, typically have physical mechanisms that lead to history-dependent variation among the events, which cannot be explained with Poisson models. Therefore, it is necessary to further generalize by removing the independence assumption.

The simplest kind of history-dependent behavior occurs when the probability of the ith event depends on the occurrence time of the previous event si−1, but not on any events prior to that. This implies that the ith waiting time Xi is independent of event times prior to Si−1, and is therefore independent of all waiting times Xj for j < i. Thus, the waiting time random variables are all mutually independent. In the time-homogeneous case, they also all have the same distribution. A point process with i.i.d. waiting times is called a renewal process. We already saw that homogeneous Poisson processes have i.i.d. Exponential waiting times. Therefore, renewal processes may be considered generalizations of homogeneous Poisson processes.4

A renewal model is specified by the distribution of the inter-event waiting times. Typically, this takes the form of a probability density function, fXi(xi), where xi can take values in [0, ∞). In principle we can define a renewal process using any distribution that takes on positive values, but there are some classes of probability models that are more commonly used, either because of their distributional properties or because of some physical or physiological features of the underlying process.

For example, the Gamma distribution, which generalizes the Exponential, may be used when one wants to describe interspike interval distributions using two parameters: the Gamma shape parameter gives it flexibility to capture a number of characteristics that are often observed in point process data. If this shape parameter is equal to one, then the Gamma distribution simplifies to an Exponential, which, as we have shown, is the ISI distribution of a simple Poisson process. Therefore, renewal models based on the Gamma distribution generalize simple Poisson processes, and can be used to address questions about whether the data are actually Poisson. If the shape parameter is less than one, then the density drops off faster than an exponential. This can be useful in providing a rough description of spike trains from neurons that fire in rapid bursts. If the shape parameter is greater than one, then the Gamma density function takes on the value zero at xi = 0, rises to a maximum value at some positive value of xi, and then falls back to zero. This can be useful in describing relatively regular spike trains, such as those from a neuron having oscillatory input. Thus, this very simple class of distributions with only two parameters is capable of capturing, at least roughly, some interesting types of history-dependent structure.
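The shape parameter's effect on regularity shows up in the coefficient of variation (CV) of the ISIs, which for a Gamma with shape k is 1/√k. A quick simulation sketch (the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

def isi_cv(shape, scale=1.0, n=200_000):
    """Coefficient of variation (std/mean) of Gamma-distributed ISIs."""
    x = rng.gamma(shape, scale, size=n)
    return x.std() / x.mean()

print(round(isi_cv(1.0), 2))  # shape 1: Exponential, CV near 1 (Poisson-like)
print(round(isi_cv(4.0), 2))  # shape 4: CV near 0.5, regular spiking
print(round(isi_cv(0.5), 2))  # shape 0.5: CV near 1.41, bursty ISIs
```

CV = 1 is the Poisson benchmark; estimated CVs below one suggest regular (e.g., oscillatory) firing and CVs above one suggest bursting, which is one simple diagnostic for whether spike train data are plausibly Poisson.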

For neural spiking data, a renewal model with a strong theoretical foundation is the inverse Gaussian, which also has two parameters. While the Gamma distribution is simple and flexible, it doesn't directly relate to the physiological properties of neurons. On the other hand, the inverse Gaussian distribution is motivated by the integrate-and-fire conception of neural behavior.

4 We should note that for homogeneous Poisson processes, the distributions of both the waiting times and the counting process increments are time-invariant. For renewal processes the waiting time distributions are time-invariant but the counting process is typically nonstationary, because the increment distributions depend on the time of the previous event.

21.0.7 The Inverse Gaussian distribution describes the waiting time for a threshold crossing by Brownian motion.

A random variable X is said to have an Inverse Gaussian distribution if its pdf is
\[
f(x) = \sqrt{\frac{\lambda}{2\pi x^3}} \exp\left\{ -\frac{\lambda (x - \mu)^2}{2 \mu^2 x} \right\}
\]
for x > 0. Here, E(X) = µ and V(X) = µ³/λ.
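As a numerical sanity check (a sketch, with µ and λ chosen arbitrarily), the pdf can be coded directly and the stated mean and variance verified by quadrature:

```python
import numpy as np

def ig_pdf(x, mu, lam):
    """Inverse Gaussian density: sqrt(lam/(2 pi x^3)) * exp(-lam(x-mu)^2/(2 mu^2 x))."""
    return np.sqrt(lam / (2.0 * np.pi * x ** 3)) * np.exp(
        -lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x))

mu, lam = 1.0, 2.0
x = np.linspace(1e-6, 60.0, 2_000_000)  # fine grid; the tail past 60 is negligible here
f = ig_pdf(x, mu, lam)
dx = x[1] - x[0]
total = f.sum() * dx
mean = (x * f).sum() * dx
var = ((x - mean) ** 2 * f).sum() * dx
print(round(total, 3), round(mean, 3), round(var, 3))  # near 1, mu = 1, mu^3/lam = 0.5
```

Note that the density evaluates to essentially zero for x near 0 (the exponent behaves like −λ/(2x)), which is the convexity near the origin visible in Figure 21.11.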

The Inverse Gaussian arises in conjunction with Brownian motion, where it is the distribution of the "first passage time," meaning the time it takes for the Brownian motion (with drift) to cross a boundary. (See Whitmore, G.A. and Seshadri, V. (1987) A heuristic derivation of the inverse Gaussian distribution, The American Statistician, 41: 280-281. Also, Mudholkar, G.S. and Tian, L. (2002) An entropy characterization of the inverse Gaussian distribution and related goodness-of-fit test, J. Statist. Planning and Inference, 102: 211-221.) In theoretical neurobiology the interspike interval distribution for an integrate-and-fire neuron is Inverse Gaussian when the subthreshold neuronal voltage is modeled as Brownian motion, with drift, and the "boundary" is the voltage threshold for action potential generation. The essential idea here is that excitatory and inhibitory post-synaptic potentials, EPSPs and IPSPs, are considered to arrive in a sequence of time steps of length δ, with each EPSP and IPSP contributing normalized voltages of +1 and -1, respectively, and with the probability of EPSP and IPSP being p and 1 − p, where p > 1 − p creates the upward "drift" toward positive voltages. Let Xt be the post-synaptic potential at time t, with t = 1, 2, . . ., and let Sn = X1 + X2 + · · · + Xn. The variable Sn is said to follow a random walk, and an action potential occurs when Sn exceeds a particular threshold value a. The behavior of an integrate-and-fire neuron based on such a random walk process is illustrated in Figure 21.10. The continuous-time stochastic process known as Brownian motion with drift (and thus the Inverse Gaussian distribution of the ISIs) results from taking δ → 0 and n → ∞, while also constraining the mean and variance in the form E(Sn) → m and V(Sn) → v, for some m and v.

Figure 21.10: Example of an integrate-and-fire neuron. At each time step there is either an EPSP or an IPSP, with probabilities p and 1 − p. For p > 1 − p this creates a stochastic upward "drift" of the voltage (as the inputs are summed or "integrated") until it crosses the threshold and the neuron fires. The neuron then resets to its baseline voltage. The resulting ISI distribution is approximately Inverse Gaussian.

Figure 21.11: Inverse Gaussian pdf plotted together with a Gamma(2, 1) pdf. The Inverse Gaussian (blue) has the same mean and variance as the Gamma. Note its convexity near 0.
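The random-walk description is easy to simulate directly. The sketch below uses arbitrary values p = 0.6 and threshold a = 20 (not the parameters behind Figure 21.10); for such a walk the standard first-passage result gives mean ISI a/(2p − 1):

```python
import numpy as np

rng = np.random.default_rng(5)

def isi_random_walk(p=0.6, a=20, n_trials=10_000):
    """Count steps until the +/-1 random walk (EPSP w.p. p, IPSP w.p. 1-p)
    first reaches threshold a; each count is one ISI in time-step units."""
    isis = np.empty(n_trials)
    for k in range(n_trials):
        s, n = 0, 0
        while s < a:
            s += 1 if rng.random() < p else -1
            n += 1
        isis[k] = n
    return isis

isi = isi_random_walk()
print(round(float(isi.mean()), 1))  # near a/(2p-1) = 20/0.2 = 100 steps
```

A histogram of `isi` already has the Inverse Gaussian look: zero mass at very short intervals (the walk needs at least a steps to reach the threshold), a peak, and a long right tail.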

Figure 21.11 gives an example of an Inverse Gaussian pdf, with a Gamma pdf for comparison. Note in particular that when x is near 0 the Inverse Gaussian pdf is very small. This gives it the ability to model, approximately, neuronal interspike intervals in the presence of a refractory period, i.e., a period at the beginning of the interspike interval (immediately following the previous spike) during which the neuron doesn’t fire, or has a very small probability of firing.

21.0.8 The conditional intensity function specifies the joint probability density of spike times for a general point process.

Previously, we described the structure of an inhomogeneous Poisson process in terms of an intensity function that characterized the instantaneous probability of firing a spike at each instant in time. In an analogous way, a general point process may be characterized by its conditional intensity function,

\[
\lambda(t \mid H_t) = \lim_{\Delta t \to 0} \frac{P(\Delta N_{(t, t+\Delta t]} = 1 \mid H_t)}{\Delta t}, \tag{21.6}
\]
where P(∆N(t,t+∆t] = 1 | Ht) is the conditional probability of an event in (t, t + ∆t] given the history Ht = (s1, s2, . . . , sn) of events up to time t (and N(t) = n is the number of events prior to time t). Since the probability of an event in any interval must be non-negative, so too must be the conditional intensity function. Taking ∆t to be small we may rewrite Equation (21.6) in the form
\[
P(\Delta N_{(t, t+\Delta t]} = 1 \mid H_t) \approx \lambda(t \mid H_t)\,\Delta t. \tag{21.7}
\]
Thus, the conditional intensity function expresses the instantaneous probability of an event. It serves as the fundamental building block for constructing the probability distributions needed for general point processes.5

A mathematical assumption needed for theoretical constructions is that the point process is orderly, which means that for a sufficiently small interval, the probability of more than one event occurring is negligible. Mathematically, this is stated as

\[
\lim_{\Delta t \to 0} \frac{P(\Delta N_{(t, t+\Delta t]} > 1 \mid H_t)}{\Delta t} = 0. \tag{21.8}
\]
This assumption is biophysically plausible for a point process model of a neuron because neurons have an absolute refractory period. In most situations the probability of a neuron firing more than one spike is negligibly small for ∆t < 1 millisecond.

Once we specify the conditional intensity for a point process it is not hard to write down the pdf for the sequence of event times in an observation interval (0, T]. In fact, the argument is essentially the same as in the case of the inhomogeneous Poisson process, with the conditional intensity λ(t|Ht) substituted for the intensity λ(t). The key observation is that the conditional intensity behaves essentially like a hazard function, the only distinction being the appearance of the stochastic history Ht.

5 The conditional intensity can be considered a stochastic process itself, because it can depend on spiking history, which is stochastic. A conditional intensity function that depends on history or on any other stochastic process is often called a conditional intensity process, and the resulting point process is called a doubly stochastic point process.

Lemma For an orderly point process with conditional intensity λ(t|Ht) on [0, T], the pdf of the ith waiting-time distribution, conditionally on S1 = s1, . . . , Si−1 = si−1, for t ∈ (si−1, T] is
\[
f_{S_i \mid S_1, \ldots, S_{i-1}}(s_i \mid S_1 = s_1, \ldots, S_{i-1} = s_{i-1}) = \lambda(s_i \mid H_{s_i}) \exp\left\{ -\int_{s_{i-1}}^{s_i} \lambda(t \mid H_t)\,dt \right\}. \tag{21.9}
\]

Proof: Let Xi be the waiting time for the ith event, conditionally on S1 = s1, . . . , Si−1 = si−1. For t > si−1 we have Xi ∈ (t, t + ∆t) if and only if ∆N(t,t+∆t) > 0. Furthermore, if the ith event has not yet occurred at time t we have Ht = (s1, . . . , si−1). We then have
\[
\lim_{\Delta t \to 0} \frac{P(X_i \in (t, t+\Delta t) \mid X_i > t, S_1 = s_1, \ldots, S_{i-1} = s_{i-1})}{\Delta t} = \lim_{\Delta t \to 0} \frac{P(\Delta N_{(t, t+\Delta t)} > 0 \mid H_t)}{\Delta t}
\]
and, because the point process is orderly, the right-hand side is λ(t|Ht). Just as we argued in the case of hazard functions, the numerator of the left-hand side may be written
\[
P(X_i \in (t, t+\Delta t) \mid X_i > t, H_t) = \frac{F(t+\Delta t \mid H_t) - F(t \mid H_t)}{1 - F(t \mid H_t)}
\]
where F is the CDF of the waiting-time distribution, conditionally on Ht. Passing to the limit again gives
\[
\lim_{\Delta t \to 0} \frac{P(X_i \in (t, t+\Delta t) \mid X_i > t, H_t)}{\Delta t} = \frac{f(t \mid H_t)}{1 - F(t \mid H_t)}.
\]
In other words, just as in the case of a hazard function, the conditional intensity function satisfies
\[
\lambda(t \mid H_t) = \frac{f(t \mid H_t)}{1 - F(t \mid H_t)}.
\]
Proceeding as in the case of the hazard function we then get the conditional pdf
\[
f(t \mid H_t) = \lambda(t \mid H_t)\, e^{-\int_{s_{i-1}}^{t} \lambda(u \mid H_u)\,du}
\]
as required. □

Theorem The event time sequence S1, S2, . . . , SN(T) of an orderly point process on an interval (0, T] has joint pdf
\[
f_{S_1,\ldots,S_{N(T)}}(s_1, \ldots, s_n) = \exp\left\{ -\int_0^T \lambda(t \mid H_t)\,dt \right\} \prod_{i=1}^{n} \lambda(s_i \mid H_{s_i}) \tag{21.10}
\]
where λ(t|Ht) is the conditional intensity function of the process.

Proof: The argument follows from the Lemma by the same steps as the theorem for inhomogeneous Poisson processes. □

We may also approximate a general point process by a binary process. For small ∆t, the probability of an event in an interval (t, t + ∆t] is
\[
P(\text{event in } (t, t+\Delta t] \mid H_t) \approx \lambda(t \mid H_t)\,\Delta t \tag{21.11}
\]
and the probability of no event is
\[
P(\text{no event in } (t, t+\Delta t] \mid H_t) \approx 1 - \lambda(t \mid H_t)\,\Delta t. \tag{21.12}
\]

If we consider the discrete approximation, analogous to the Poisson process case, we may define pi = ∫ λ(t|Ht) dt, where the integral is over the ith time bin. We again get Bernoulli random variables Yi with P(Yi = 1) = pi, but now these Yi random variables are dependent; e.g., we may have P(Yi = 1 | Yi−1 = 1) ≠ pi. This is somewhat more complicated than the Poisson case, but it remains relatively easy to formulate history-dependent models for these Bernoulli trials.
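Here is a discrete-time sketch of such dependent Bernoulli trials. The conditional intensity below is a made-up recovery function, λ(t|Ht) = λ0(1 − exp(−(t − s_last)/τ)), which suppresses spiking just after a spike, mimicking a relative refractory period; the resulting ISIs come out more regular than Poisson (CV < 1):

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical history-dependent intensity with a relative refractory period:
# lam(t|H_t) = lam0 * (1 - exp(-(t - s_last)/tau)), s_last = last spike time.
lam0, tau, dt, T = 50.0, 0.02, 0.001, 200.0

def simulate():
    spikes, s_last = [], -1e9   # start with no spike in the distant past
    for i in range(int(T / dt)):
        t = i * dt
        p = lam0 * (1.0 - np.exp(-(t - s_last) / tau)) * dt  # p_i ~ lam(t|H_t)*dt
        if rng.random() < p:     # dependent Bernoulli bin
            spikes.append(t)
            s_last = t
    return np.asarray(spikes)

isi = np.diff(simulate())
print(round(float(isi.std() / isi.mean()), 2))  # CV below 1: more regular than Poisson
```

Setting τ = 0 recovers the independent-bin (Poisson) simulation, so this one extra state variable, `s_last`, is all that distinguishes the dependent Bernoulli model from the Poisson case.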

21.0.9 A point process model expresses the conditional intensity as a function of time, history, and other variables.

In experimental settings, event-time data, such as spike trains, are collected to see how they differ under varying experimental conditions. The conditions may be summarized by a variable or vector x(t), often called a covariate (because it co-varies with the stochastic response). The conditional intensity then becomes a function not only of time and history, but also of the covariate, and a preferable notation then becomes λ(t|x(t), Ht). Here are two simple examples.

Example: Spiking activity of a primary motor cortical neuron (continued) As discussed earlier, primate motor cortical neurons have velocity-modulated, cosine-tuned receptive fields. Figure 21.12 shows an example of the spiking activity of a neuron in primate motor cortex as a function of hand speed and direction, using an occupancy-normalized histogram. The neuron fires most intensely when the hand moves in a direction about 1.6 radians from east, and its firing rate increases as a function of hand speed.

Figure 21.12: Motor cortical neural activity during a two-dimensional reaching task as a function of hand direction and speed. (A) Empirical visualization of motor cortical spiking activity based on a histogram. (B) Poisson model for this spiking activity. (The parameters were fit by the method of maximum likelihood, which is discussed in Chapter 3.)

Specifying a point process model for this neuron requires writing an expression for its conditional intensity in terms of the direction and speed of the hand movement. A simple form is

\[
\lambda(t) = \exp\left\{ \alpha + \beta\, |v(t + 150\,\mathrm{ms})| \cos\left( \phi(t + 150\,\mathrm{ms}) - \phi_p \right) \right\} \tag{21.13}
\]
where the covariates are v(t) and φ(t), the speed and direction of the intended hand movement, and the 150 ms lag relates the current spiking activity to movements that will occur 150 ms later. The parameters of this model are (α, β, φp), where exp α is the baseline firing rate, β is the depth of modulation, and φp is the preferred direction of the neuron. Figure 21.12 displays a fit of the model to the data. The preferred direction and degree of modulation as a function of speed are in good agreement with those observed in the histogram. □
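Equation (21.13) is straightforward to code. The parameter values below are illustrative only, not the fitted values behind Figure 21.12:

```python
import numpy as np

# Illustrative parameters for Eq. (21.13); NOT the fitted values from the text.
alpha, beta, phi_p = np.log(10.0), 0.05, 1.6   # baseline exp(alpha) = 10 spikes/s

def intensity(speed, direction):
    """lam = exp{alpha + beta*|v|*cos(phi - phi_p)}, where the lagged
    covariates v(t+150 ms), phi(t+150 ms) are supplied as speed, direction."""
    return np.exp(alpha + beta * speed * np.cos(direction - phi_p))

print(round(intensity(20.0, phi_p), 1))          # preferred direction: 10*e, about 27.2
print(round(intensity(20.0, phi_p + np.pi), 1))  # anti-preferred: 10/e, about 3.7
print(round(intensity(0.0, 0.0), 1))             # zero speed: baseline 10.0
```

The multiplicative (log-linear) form guarantees λ(t) > 0 for any parameter values, which is one reason conditional intensities are commonly modeled through an exponential.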

Note that the conditional intensity in (21.13) does not depend on the past spiking history. The assumption, therefore, is that the spike trains follow an inhomogeneous Poisson process. Alternatives that incorporate spiking history might be expected to provide a more thorough description of the many aspects of spike train variability; but for representing the effects of direction and speed the Poisson process seems to do a good job. Another situation where a Poisson process model can capture some of the dominant effects, but not the more subtle ones, is the following.

Example: Hippocampal place cell (continued) Hippocampal place cells have firing patterns that relate to an animal's location within an environment. Therefore a place field model should describe the conditional intensity as a function of the animal's location at each point in time. Figure 21.13 again shows the spiking activity