
Fast Gaussian Process Methods for Point Process Intensity Estimation

John P. Cunningham [email protected] Department of Electrical Engineering, Stanford University, Stanford, CA, USA 94305

Krishna V. Shenoy [email protected] Department of Electrical Engineering and Neurosciences Program, Stanford University, Stanford, CA, USA 94305

Maneesh Sahani [email protected] Gatsby Computational Neuroscience Unit, UCL London, WC1N 3AR, UK

Abstract

Point processes are difficult to analyze because they provide only a sparse and noisy observation of the intensity function driving the process. Gaussian Processes offer an attractive framework within which to infer underlying intensity functions. The result of this inference is a continuous function defined across time that is typically more amenable to analytical efforts. However, a naive implementation will become computationally infeasible in any problem of reasonable size, both in memory and run time requirements. We demonstrate problem specific methods for a class of renewal processes that eliminate the memory burden and reduce the solve time by orders of magnitude.

1. Introduction

Point processes with temporally or spatially varying intensity functions arise naturally in many fields of study. When the intensity function is itself a random process (often a Gaussian Process), the process is called a doubly-stochastic or Cox point process. Application domains include economics and finance (e.g. Basu & Dassios, 2002), neuroscience (e.g. Cunningham et al., 2008), ecology (e.g. Moller et al., 1998), and others.

Given observed point process data, one can use a Gaussian Process (GP) framework to infer an optimal estimate of the underlying intensity. In this paper we consider GP prior intensity functions coupled with point process observation models. The problem of intensity estimation then becomes a modification of GP regression and inherits the computational complexity inherent in GP methods (e.g. Rasmussen & Williams, 2006). The data size n will grow with the length (e.g. total time) of the point process. Naive methods will be O(n²) in memory requirements (storing Hessian matrices) and O(n³) in run time (matrix inversions and determinants). At one thousand data points (such as one second of millisecond-resolution data), a naive solution to this problem is already quite burdensome on a common workstation. At ten thousand or more, this problem is for all practical purposes intractable.

While applications of doubly-stochastic point processes are numerous, there is little work proposing solutions to the serious computational issues inherent in these methods. Thus, the development of efficient methods for intensity estimation would be of broad appeal. In this paper, we do not address the appropriateness of doubly-stochastic point-process models for particular applications, but rather we focus on the significant steps required to make such modelling computationally tractable. We build on previous work from both GP regression and large-scale optimization to create a considerably faster and less memory intensive algorithm for doubly-stochastic point-process intensity estimation.

As part of the GP intensity estimation problem we optimize model hyperparameters using a Laplace approximation to the marginal likelihood or evidence. This requires an iterative approach which divides into two major parts. First, at each iteration we must find a modal (MAP) estimate of the intensity function. Second, we must calculate the approximate model evidence and its gradients with respect to GP hyperparameters. Both aspects of this problem present computational and memory problems. We develop methods to reduce the costs of both drastically.


We show that for certain classes of renewal process observation models, MAP estimation may be framed as a tractable convex program. To ensure nonnegativity in the intensity function we use a log barrier Newton method (Boyd & Vandenberghe, 2004), which we solve efficiently by deriving decompositions of matrices with known structure. By exploiting a recursion embedded in the algorithm, we avoid many costly matrix inversions. We combine these advances with large scale optimization techniques, such as conjugate gradients (CG, as used by Gibbs & MacKay, 1997) and fast Fourier transform (FFT) matrix multiplication methods.

To evaluate the model evidence, as well as its gradients with respect to hyperparameters, we again exploit the structure imposed by the renewal process framework to find an exact but considerably less burdensome representation. We then show that a further approximation loses little in accuracy, but makes the cost of this computation insignificant.

Combining these advances, we are able to reduce a problem that is effectively computationally infeasible to a problem with minimal memory load and very fast solution time. O(n²) memory requirements are eliminated, and O(n³) computation is reduced to modestly superlinear.

2. Problem Overview

Define x ∈ ℝ^n to be the intensity function (the high dimensional signal of interest); x is indexed by input¹ time points t ∈ ℝ^n. Let the observed data y = {y_0, . . ., y_N} ∈ ℝ^{N+1} be a set of N + 1 time indices into the vector x; that is, the ith point event occurs at time y_i, and the intensity at that time is x_{y_i}. Denote all hyperparameters by θ. In general, the prior and observation models are both functions of θ. The GP framework implies a normal prior on the intensity p(x | θ) = N(µ1, Σ), where the nonzero mean µ is a sensible choice because the intensity function is constrained to be nonnegative. Thus we treat µ as a hyperparameter (µ ∈ θ). The positive definite covariance matrix Σ (also a function of θ) is defined by an appropriate kernel such as a squared exponential or Ornstein-Uhlenbeck kernel (see Rasmussen & Williams, 2006, for a discussion of GP kernels). The point-process observation model gives the likelihood p(y | x, θ). In this work, we consider renewal processes (i.e. one-dimensional point processes with independent event interarrival times), a family of point processes that has both been well-studied theoretically and applied in many domains (Daley & Vere-Jones, 2002).

¹ In this work we restrict ourselves to a single input dimension (which we call time), as it aligns with the family of renewal processes in one dimension. Some ideas here can be extended to multiple dimensions (e.g. if using a spatial Poisson process).

The GP prior is log concave in x, and the nonnegativity constraint on intensity (x ⪰ 0) is convex (constraining x to be nonnegative is equivalent to solving an unconstrained problem where the prior on the vector x is a truncated multivariate normal, but this is not the same as truncating the GP prior in the continuous, infinite dimensional function space; see Horrace, 2005). Thus, if the observation model is also log concave in x, the MAP estimate x* is unique and can be readily found using a log barrier Newton method (Boyd & Vandenberghe, 2004; Paninski, 2004). Renewal processes are simply defined by their interarrival distribution f_z(z). A common construction for a renewal process with an inhomogeneous underlying intensity is to use the intensity rescaling m(t_i | t_{i−1}) = ∫_{t_{i−1}}^{t_i} x(u) du (in practice, a discretized sum of x) (Barbieri et al., 2001; Daley & Vere-Jones, 2002). Accordingly, the density for an observation of event times y can be defined

    p(y) = ∏_{i=1}^{N} p(y_i | y_{i−1}) = ∏_{i=1}^{N} |m′(y_i | y_{i−1})| f_z(m(y_i | y_{i−1}))    (1)

by a change of variables for the interarrival distribution (Papoulis & Pillai, 2002). Since m(t) is a linear transformation of the intensity function (our variables of interest), the observation model obeys log concavity as long as the distribution primitive f_z(z) is log concave. Examples of suitable renewal processes include the inhomogeneous Poisson, gamma interval, Weibull interval, inverse Gaussian (Wald) interval, Rayleigh interval, and other processes (Papoulis & Pillai, 2002). For this paper, we choose one of these distributions and focus on its details. However, for processes of the form above, the implementation details are identical up to the forms of the actual distributions.

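As a concrete illustration of the change of variables in Eq. 1, the following sketch evaluates the log density of a set of event times under a discretized intensity. This is our own minimal example, not code from the paper; the function names and the choice of a gamma interarrival density are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import gamma

def log_density_time_rescaled(x, y, delta, log_fz):
    """Log of Eq. 1 under a discretized intensity.

    x      : (n,) nonnegative intensity values on a regular grid of width delta
    y      : (N+1,) integer event indices into x (y[0] is the reference event)
    log_fz : log of the interarrival density f_z, evaluated at the rescaled time m
    """
    logp = 0.0
    for i in range(1, len(y)):
        lo, hi = y[i - 1], y[i]
        m = np.sum(x[lo:hi]) * delta   # m(y_i | y_{i-1}), the rescaled interarrival time
        jac = x[hi]                    # |m'(y_i | y_{i-1})| = x_{y_i}; constants in delta dropped
        logp += np.log(jac) + log_fz(m)
    return logp

# example interarrival density: gamma with shape gam and unit mean in rescaled time
gam = 2.0
log_fz = lambda m: gamma.logpdf(m, a=gam, scale=1.0 / gam)
```
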
To solve the GP intensity estimation, we first find a MAP estimate x* given fixed hyperparameters θ, and then we approximate the model evidence p(y | θ) (for which we need x*) and its gradients in θ. Iterating these two steps, we can find the optimal model θ̂ (we do not integrate over hyperparameters). Finally, MAP estimation under these optimal hyperparameters θ̂ gives an optimal estimate of the underlying intensity. This iterative solution for θ̂ can be written:

    θ̂ = argmax_θ p(θ) p(y | θ)    (2)
      ≈ argmax_θ p(θ) p(y | x*, θ) p(x* | θ) (2π)^{n/2} |Λ* + Σ^{-1}|^{-1/2},

where the last term is a Laplace approximation to the intractable form of p(y | θ), x* is the mode of p(y | x)p(x) (the MAP estimate), and Λ* = −∇²_x log p(y | x, θ)|_{x=x*}. The log concavity of our problem in x supports the choice of a Laplace approximation. Each of the two major steps in this algorithm (MAP estimation and model selection) involves computational and memory challenges. We address these challenges in Sections 4 and 5.


The computational problems inherent in GP methods have been well studied, and much progress has been made in sparsification (e.g. Quinonero-Candela & Rasmussen, 2005). Unfortunately, these methods do not apply directly to point process estimation, as there are no distinct training and test sets. The reader might wonder if a coarser grid would be adequate, thereby obviating the detailed methods developed here. We have found in experiments (not shown) that the sacrifice in accuracy required to allow reasonable computational tractability is large, and thus we do not consider the coarse grid a viable option. One could also consider re-expressing the problem in terms of the integrals m(y_i | y_{i−1}) appearing in Eq. 1. While this is possible in certain cases, it requires additional approximation. Finally, we note that the Laplace approximation is often inferior to Expectation Propagation (EP) (Kuss & Rasmussen, 2005) for GP methods. While many of the same techniques used here could also be used with EP, EP requires additional approximations and computational overhead. We find in experiments (not shown) that EP yields similar accuracy to the Laplace approximation in this domain, but EP incurs increased complexity and computational load.

3. Model Construction

To demonstrate our fast method, we choose the specific observation model of an inhomogeneous gamma interval process (Barbieri et al., 2001) (with hyperparameter γ ∈ θ, γ ≥ 1). If time has been discretized with precision ∆, this can be written

    p(y | x, θ) = ∏_{i=1}^{N} [ (γ x_{y_i} ∆ / Γ(γ)) (γ ∑_{k=y_{i−1}}^{y_i−1} x_k ∆)^{γ−1} exp(−γ ∑_{k=y_{i−1}}^{y_i−1} x_k ∆) ]    (3)

(where we have ignored terms that scale with ∆). Let f(x) = −log p(y | x, θ)p(x | θ). Our MAP estimation problem is to minimize f(x) subject to the constraint x ⪰ 0 (nonnegativity). In the log barrier method, we consider the above problem as a sequence of convex problems where we seek to minimize, at increasing values of τ, the (unconstrained) objective function

    f_τ(x) = f(x) − (1/τ) ∑_{k=1}^{n} log(x_k),    (4)

which has Hessian (positive definite by our log concave construction):

    H = ∇²_x f_τ(x) = Σ^{-1} + Λ, where Λ = B + D,    (5)

with D = diag(x_{y_0}^{-2}, . . ., 0, . . ., x_{y_i}^{-2}, . . ., 0, . . ., x_{y_N}^{-2}) + (1/τ) diag(x_1^{-2}, . . ., x_n^{-2}) being positive definite and diagonal. B is block diagonal with N blocks B̂_i:

    B̂_i = b_i b_i^T, where b_i = √(γ − 1) (∑_{k=y_{i−1}}^{y_i−1} x_k)^{-1} 1.    (6)

B is thus block rank 1 (with the positive eigenvalue in each block corresponding to the eigenvector b_i). This matrix is key, as we exploit its structure to achieve improvements in computational performance.

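For concreteness, here is a small sketch of the log barrier objective in Eq. 4 for the inhomogeneous gamma interval likelihood of Eq. 3. This is our own illustration under stated assumptions: `Sigma_solve` is a stand-in for applying Σ^{-1} (dense here; never formed explicitly in the fast method), and constants in ∆ are dropped as in the text.

```python
import numpy as np
from scipy.special import gammaln

def f_tau(x, y, delta, gam, mu, Sigma_solve, tau):
    """Eq. 4: f(x) - (1/tau) * sum(log x_k), where f(x) = -log p(y | x) - log p(x).

    Sigma_solve(v) must return Sigma^{-1} v (a dense solve is fine for a small sketch).
    """
    # inhomogeneous gamma interval log likelihood (Eq. 3), constants in delta dropped
    loglik = 0.0
    for i in range(1, len(y)):
        lo, hi = y[i - 1], y[i]
        m = gam * np.sum(x[lo:hi]) * delta
        loglik += np.log(gam * x[hi]) - gammaln(gam) + (gam - 1.0) * np.log(m) - m
    # GP prior N(mu*1, Sigma), up to additive constants
    r = x - mu
    logprior = -0.5 * r @ Sigma_solve(r)
    # log barrier enforcing x > 0
    return -(loglik + logprior) - np.sum(np.log(x)) / tau
```
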
4. MAP Estimation Problem

As outlined in Section 2, we first find the MAP estimate x* for any model defined by hyperparameters θ. The log barrier method has the intensive requirements of calculating the objective Eq. 4, its gradient g (in x), and the Newton step x_nt = −H^{-1}g. Each of these calculations is O(n³) in run time and O(n²) in memory. We show an approach that alleviates these burdens.

4.1. Finding the Newton Step x_nt

First we consider the Hessian, H = Σ^{-1} + Λ, which itself contains the costly inverse Σ^{-1}. We would like to avoid this inversion of Σ entirely; we do so with the matrix inversion lemma (Sherman-Morrison-Woodbury formula):

    −H^{-1} = −(Σ^{-1} + Λ)^{-1} = −Σ + ΣR(I + R^TΣR)^{-1}R^TΣ,    (7)

where R is any valid factorization such that RR^T = Λ. This decomposition preserves symmetry in the remaining matrix inverse (required for CG) and has advantageous numerical properties. With this form, instead of calculating x_nt = −H^{-1}g directly, we need only multiply the rightmost expression in Eq. 7 with the gradient g. Doing so requires the inversion (I + R^TΣR)^{-1}v where v = R^TΣg. CG allows us to avoid directly calculating matrix inverses and instead achieve the desired inversion by iteratively multiplying (I + R^TΣR)z for different vectors z (Gibbs & MacKay, 1997).

It is common to precondition the CG method to reduce the number of iterations required for convergence. However, our experience with preconditioning (using both classic preconditioners and some of our own design) was that it actually degraded run-time performance. Preconditioners typically aim to improve the condition number of the Hessian, which indeed they do in this problem. However, the rapidity of CG convergence here is facilitated more by spectral concentration – many eigenvalues being equal or close to 1 – than by overall conditioning. Thus, we found it more effective to use CG inversion directly on (I + R^TΣR).

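The sketch below shows how the Newton step of Eq. 7 can be assembled from matrix-vector products alone, using SciPy's conjugate gradient solver on an implicit operator. It is a minimal illustration under our own naming assumptions: `R_mv`, `Rt_mv`, and `Sigma_mv` stand for the fast multiplications by R, R^T, and Σ developed in the remainder of this section, and are not functions from the paper.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def newton_step(g, R_mv, Rt_mv, Sigma_mv, n):
    """x_nt = -H^{-1} g via Eq. 7, using only matrix-vector products."""
    v = Rt_mv(Sigma_mv(g))                                            # v = R^T Sigma g
    A = LinearOperator((n, n), matvec=lambda z: z + Rt_mv(Sigma_mv(R_mv(z))))
    w, info = cg(A, v)                                                # w = (I + R^T Sigma R)^{-1} v
    assert info == 0, "CG did not converge"
    return Sigma_mv(R_mv(w) - g)                                      # (-Sigma + Sigma R (.)^{-1} R^T Sigma) g
```
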


In general, however, finding the decomposition Λ = RR^T is an O(n³) operation, which would remove any computational benefit from this approach. For log concave renewal processes, we can derive a valid decomposition in closed form and linear computation time. Since Λ is block diagonal, we consider only one block without loss of generality. Calling this block Λ̂, we know Λ̂ = bb^T + D̂, where D̂ is a diagonal block of the larger diagonal matrix D, and b is defined in Eq. 6. D̂ is positive definite, so T = D̂^{-1/2} satisfies T D̂ T = I (a similarity transform). Then, calling b̃ = Tb, we have T Λ̂ T = b̃b̃^T + I. With this form, we see that the general structure of T Λ̂ T is preserved under the desired matrix decomposition, up to scaling of the components:

    (αb̃b̃^T + I)(αb̃b̃^T + I)^T = (α²‖b̃‖² + 2α) b̃b̃^T + I,    (8)

and we want to choose α such that Eq. 8 equals b̃b̃^T + I. Using the quadratic formula to find this α, we see then that

    R̃ = ((√(1 + ‖b̃‖²) − 1) / ‖b̃‖²) b̃b̃^T + I    (9)

satisfies T Λ̂ T = R̃R̃^T. Since T is diagonal, it easily inverts to T^{-1} = D̂^{1/2}. Then:

    Λ̂ = T^{-1}R̃R̃^T T^{-1} = (T^{-1}R̃)(T^{-1}R̃)^T = R̂R̂^T.    (10)

To be explicit, we have found that

    R̂ = ((√(1 + ‖D̂^{-1/2}b‖²) − 1) / ‖D̂^{-1/2}b‖²) bb^T D̂^{-1/2} + D̂^{1/2}    (11)

is a valid decomposition R̂R̂^T = Λ̂. This decomposition can be seen as a partial rank-one (blockwise) update to a Cholesky factorization (Gill et al., 1974), in that D̂ can trivially be factorized to D̂^{1/2}. The final form is not, however, a Cholesky factorization, since R̂ is not triangular (making a triangular factor would require additional computation and the explicit representation of the Cholesky matrix).

Since all of the products needed to construct R̂ can be formed in O(m) time (where m is the size of the block), and since the larger matrix R can be formed by tiling the blocks R̂, we have a total complexity for this decomposition of O(n). We can then use CG to find the solution to (I + R^TΣR)^{-1}(R^TΣg). With this inversion calculated, we can perform the remaining forward multiplications in Eq. 7; this completes calculation of a Newton step.

In fact, we need not form the matrix R in memory. Instead, we retain each of its component elements (in Eq. 11), and reduce multiplication of a vector by R to a sequence of inner products and multiplications by diagonal matrices, all of which can be stored and calculated in O(n) time. Thus, we eliminate the need for O(n²) storage, and we perform the relevant matrix multiplications in O(n) time. Since R can be multiplied in linear time, the complexity of multiplying vectors by (I + R^TΣR) depends on multiplying vectors by the covariance matrix Σ.

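The closed-form factor of Eq. 11 is easy to verify numerically. The sketch below builds one block densely purely as a check (the fast method keeps only b, the diagonal of D̂, and the scalar coefficient, and multiplies by R implicitly); the function name and test values are our own.

```python
import numpy as np

def Rhat_block(b, dhat):
    """Closed-form factor (Eq. 11) of one block Lambda_hat = b b^T + diag(dhat)."""
    bt = b / np.sqrt(dhat)                 # b_tilde = D_hat^{-1/2} b
    s = bt @ bt                            # ||b_tilde||^2
    c = (np.sqrt(1.0 + s) - 1.0) / s
    # c * b b^T D_hat^{-1/2} + D_hat^{1/2}
    return c * np.outer(b, b) / np.sqrt(dhat)[None, :] + np.diag(np.sqrt(dhat))

# check R_hat R_hat^T = b b^T + diag(dhat) on a random block
rng = np.random.default_rng(0)
m = 5
b, dhat = rng.random(m), rng.random(m) + 0.5
Rh = Rhat_block(b, dhat)
assert np.allclose(Rh @ Rh.T, np.outer(b, b) + np.diag(dhat))
```
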
Since we have evenly spaced resolution of our data x in time indices t_i, Σ is Toeplitz. This matrix can be embedded in a larger circulant matrix, multiplication by which is simply a convolution operation of the argument vector with a row of this circulant matrix. Thus, the operation can be quickly done in O(n log n) using frequency domain multiplications (Silverman, 1982). Further, we need never represent the matrix Σ; we only store the first row of the circulant matrix. Again we have eliminated O(n²) memory needs. Other methods for fast kernel matrix multiplications include Fast Gauss Transforms (FGT) (Raykar et al., 2005) and kd-trees (Shen et al., 2006; Gray & Moore, 2003). We note that the single input dimension (time) enables this Toeplitz structure, and thus an extension to multiple dimensions should use FGT or similar. The regular structure of the data points in any dimension makes Σ multiplications very fast with such a method. Further, these methods avoid explicit representation of Σ. Here, the simple FFT approach for this one-dimensional problem significantly outperforms other (more general) methods in both speed and accuracy.

Finally, we note that the matrix (I + R^TΣR) is particularly well suited to CG. Although R^TΣR is full rank by definition, in practice its spectrum has very few large eigenvalues (typically fewer than N, the number of events). Loosely, the matrix looks like identity plus low rank. In practice, the CG method converges with high accuracy almost always in fewer than 50 steps (very often under 30). This is drastically fewer than the worst case of n steps (n of 10³ to 10⁴).

Instead of decomposing Λ = RR^T, one might have considered using the matrix inversion lemma to write (Σ^{-1} + Λ)^{-1} = Σ − ΣΛ(Λ + ΛΣΛ)^{-1}ΛΣ. Indeed this valid form enables all of the CG and fast multiplication methods previously discussed. While it may seem that this form's ease of derivation (compared to the matrix decomposition in Eq. 11) warrants its use in general, the matrix to be inverted is poorly conditioned compared to (I + R^TΣR), and thus the inversion requires more CG steps. We have found in testing that the number of CG steps can roughly double. Thus, the decomposition of Eq. 11 is computationally worthwhile.

In this section, we have constructed a fast method for calculating the Newton step that costs O(n log n) per CG step and incurs a very small number of CG steps. Also, we have avoided explicit representation of any matrix, so that memory requirements are only linear in the data size n, allowing problem sizes of potentially millions of time steps. These two factors stand in contrast to the cubic run time and quadratic storage needs of a naive method.

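The following sketch makes the Σ multiplication concrete: a symmetric Toeplitz matrix, given only its first row, is embedded in a circulant matrix of twice the size and applied with FFTs in O(n log n). This is our own illustration (the squared exponential row and the dense check are assumptions for the example), not the paper's MATLAB/C-MEX implementation.

```python
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_matvec(k, v):
    """Multiply the symmetric Toeplitz matrix with first row k by v, via a
    length-2n circulant embedding and the FFT (O(n log n) time, O(n) memory)."""
    n = len(v)
    c = np.concatenate([k, [0.0], k[-1:0:-1]])    # first column of the circulant embedding
    prod = np.fft.irfft(np.fft.rfft(c) * np.fft.rfft(np.concatenate([v, np.zeros(n)])), n=2 * n)
    return prod[:n]

# example: squared exponential kernel row on a regular time grid, checked densely
n, dt, ell, s2 = 1000, 1e-3, 0.05, 1.0
t = np.arange(n) * dt
k = s2 * np.exp(-0.5 * (t / ell) ** 2)            # first row of Sigma
v = np.random.default_rng(1).standard_normal(n)
assert np.allclose(toeplitz_matvec(k, v), toeplitz(k) @ v)
```
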

4.2. Evaluating the Gradient and Objective

Calculating the objective f_τ(x) (Eq. 4) and its gradient (both required for the log barrier method) requires finding Σ^{-1}(x − µ1). Note that the kth iterate x^{(k)} (of the log barrier method) has the form

    x^{(k)} − µ1 = x^{(k−1)} + t^{(k−1)} x_nt^{(k−1)} − µ1 = ∑_{j=1}^{k−1} t^{(j)} x_nt^{(j)} + (x^{(0)} − µ1),    (12)

where t^{(j)} and x_nt^{(j)} represent the jth iterates of the Newton step size t and the step x_nt, and x^{(0)} is the algorithm initial point. The most logical starting point x^{(0)} is µ1, in which case the rightmost term in Eq. 12 drops out. Thus, letting x^{(0)} = µ1 and using the form of x_nt = −H^{-1}g with −H^{-1} defined as in Eq. 7, we write:

    Σ^{-1}(x^{(k)} − µ1) = ∑_{j=1}^{k−1} t^{(j)} [ −g^{(j)} + R^{(j)}(I + R^{(j)T}ΣR^{(j)})^{-1}R^{(j)T}Σg^{(j)} ].    (13)

In the earlier calculation of x_nt (Section 4.1), both of the right hand side arguments in Eq. 13 have already been found. As such, we have a recurrence that obviates the inversion of Σ, with no additional memory demands (Rasmussen & Williams, 2006).

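In code, the recursion of Eq. 13 is just a running sum over Newton iterations. The sketch below assumes x^{(0)} = µ1 and reuses two quantities already produced by the Newton-step sketch above (the gradient g and the CG solution w = (I + R^TΣR)^{-1}R^TΣg); the class name is ours.

```python
import numpy as np

class SigmaInvResidual:
    """Accumulate Sigma^{-1}(x^{(k)} - mu*1) across Newton iterations (Eq. 13),
    avoiding any solve with Sigma itself."""
    def __init__(self, n):
        self.v = np.zeros(n)

    def update(self, t_step, g, R_mv, w):
        # t_step: Newton step size t^{(j)}; w: CG solution from the same iteration
        self.v += t_step * (R_mv(w) - g)
```
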
The above steps reduce a naive MAP estimation (of any log concave renewal process) that requires cubic effort and quadratic storage to an algorithm that is modestly superlinear in run time and linear in memory requirements.

5. Model Selection Problem

Having now found x* for any hyperparameters θ, the second major part of the problem is to find the negative logarithm of our approximation to the evidence p(y | θ) in Eq. 2, and its gradients with respect to θ. The approximated log evidence can be written as:

    −log p(y | θ) ≈ −log p(y | x*) + (1/2)(x* − µ1)^T Σ^{-1} (x* − µ1) + (1/2) log |I + ΣΛ*|    (14)

(ignoring constants). Each of these terms has an explicit and an implicit gradient with respect to θ, where the latter result from the dependence of the MAP estimate x* on the hyperparameters (such implicit gradients are typical for the use of the Laplace approximation in GP learning; see Rasmussen & Williams, 2006, section 5.5.1). The implicit gradients in this problem are extremely computationally burdensome to calculate (requiring the trace of matrix inversions and matrix-matrix products for each element of x). In empirical tests, we find implicit gradients to be quite small relative to the explicit gradients (often by several orders of magnitude). Ignoring these gradients is undesirable but essential to make this problem computationally feasible. Thus we consider only explicit gradients. This is a common approach for GP methods; see Rasmussen and Williams (2006).

Efficient computation of the first two terms of Eq. 14, as well as their gradients with respect to θ, can be achieved by the fast multiplication method and the recursion derived in Sec. 4. Specifically, the values of the first and second terms of Eq. 14 are calculated during the MAP estimation, so no additional memory or computation is necessary for them. The gradient of the first term is nonzero only with respect to γ and is linear in x (no matrix multiplications are required), so it can be quickly calculated with no additional memory demands. Computation of the gradient of the second term (the prior) can exploit the fact that we calculated Σ^{-1}(x* − µ1) in the final step of the MAP estimation. The gradient of this term with respect to µ is a simple inner product 1^T(Σ^{-1}(x* − µ1)); since we have already calculated the right side of this inner product, this computation is O(n) in run time and requires no additional memory. The gradient of this term with respect to a kernel hyperparameter θ_i (e.g. a lengthscale or variance) is:

    d/dθ_i [ (1/2)(x* − µ1)^T Σ^{-1}(x* − µ1) ] = −(1/2) (Σ^{-1}(x* − µ1))^T (dΣ/dθ_i) (Σ^{-1}(x* − µ1)).    (15)

Since we have Σ^{-1}(x* − µ1), this gradient only requires one matrix-vector multiplication; dΣ/dθ_i has the same Toeplitz structure as Σ and can thus be quickly multiplied. Thus, calculating the first two terms of Eq. 14 and their gradients adds no complexity to the method developed so far.

Only the term (1/2) log |I + ΣΛ*| presents difficulty. Determinants in general require O(n²) memory and O(n³) solve time using a Cholesky or PLU factorization, so we must consider the problem more carefully. We examine the eigenstructure of (I + ΣΛ*). Since we are not trying to find a MAP estimate, there is no log barrier term (i.e. let τ → ∞); thus D (from Eq. 5) is rank N only. This means that Λ* = B + D (Eq. 6) is block outer product plus sub rank diagonal, so it is also rank deficient with block rank 2. Thus, it has 2N nonzero eigenvalues (two corresponding to each of the N events, one in each block from B and one in each block from D).

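Below is a sketch of the explicit prior-term gradient in Eq. 15, reusing the cached vector Σ^{-1}(x* − µ1) and the FFT Toeplitz product sketched earlier. The function and argument names are our own; `dk_dtheta` stands for the first row of dΣ/dθ_i, which is Toeplitz just like Σ.

```python
def prior_term_grad(Sinv_r, dk_dtheta, toeplitz_matvec):
    """Eq. 15: d/dtheta_i of (1/2)(x*-mu 1)^T Sigma^{-1} (x*-mu 1),
    given Sinv_r = Sigma^{-1}(x*-mu 1) cached from the MAP estimation.
    Cost: one O(n log n) Toeplitz matrix-vector product."""
    return -0.5 * Sinv_r @ toeplitz_matvec(dk_dtheta, Sinv_r)

# e.g. for a squared exponential row k(t) = s2 * exp(-t^2 / (2 l^2)),
# the lengthscale derivative row is dk/dl = k(t) * t^2 / l^3 on the same grid.
```
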


Using the eigenvalue decomposition Λ* = USU^T, we see

    log |I + ΣΛ*| = log |I + ΣUSU^T| = log |U^T| |I + ΣUSU^T| |U| = log |I + U^TΣUS|,    (16)

since the orthogonal matrix U has determinant 1 and U^TU = I by definition. Since Λ* has rank 2N, we know that S is diagonal with zeros on the last n − 2N entries. By construction, the number of events N is much smaller than the total data size n. Since the determinant of a matrix is the product of its eigenvalues, the unit eigenvalue dimensions of I + U^TΣUS can be ignored. We define S̄ as the 2N × 2N submatrix of S that is made up of the diagonal block with nonzero diagonal entries. Further define Ū as the corresponding 2N eigenvectors. Then, since the other dimensions of U^TΣUS contribute nothing to the determinant, we have

    log |I + ΣΛ*| = log |I + U^TΣUS| = log |I + Ū^TΣŪ S̄| = log |I + Σ̄S̄|,    (17)

where I is now the 2N × 2N identity, and we have further defined Σ̄ = Ū^TΣŪ. Computationally, Σ̄ is formed by multiplying Σ with the columns of Ū. Since Λ* is block rank 2, both matrices S and U can be found in closed form (N rank 2 eigendecompositions, one decomposition per block). This calculation of Eq. 17 requires 2N matrix multiplications which each have a run time cost of O(n log n).

We can make a small approximation that simplifies this problem even further. Typically, N of these 2N eigenvalues are substantially larger than the other N. Each block of Λ* contributes two nonzero eigenvalues. The larger is due to the diagonal entry x_{y_i}^{-2} (from the matrix D) and is nearly axis aligned. The smaller eigenvalue is due to the outer product vector from block B̂_i. Examination of the denominators in the definitions of B̂_i and D in Eqs. 5 and 6 explains the difference in magnitude, since x_{y_i}² is much smaller than the square of sums denominator in B̂_i. We approximate the eigenvector as the y_i axis and approximate its eigenvalue as the corresponding value in Λ*. Then S̄ is size N × N. This savings is small, but importantly we can form Σ̄ = Ū^TΣŪ simply by picking out the N rows and columns of Σ corresponding to the event times y_i.

In this formulation, we are left with matrices of size N × N only, so we have some modest number of O(N³) operations; this approach is considerably faster and scales better than the exact method above. We have also reduced O(n²) storage to O(N²). The following section elucidates the quality of this approximation.

To calculate the gradients with respect to this log determinant term, we also use the approximation of Eq. 17. We call our approximate gradient of this term the gradient of the approximation in Eq. 17. This approximation can readily be differentiated with respect to the hyperparameters (again, typical for GP methods; see Rasmussen & Williams, 2006). Since these approximations are matrices in the event space N (not time space n), these gradients are quickly calculated with a handful of O(N³) operations and with storage of O(N²).

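A sketch of the N × N form of the log determinant term, under the further approximation described above (one nearly axis-aligned eigenvalue x_{y_i}^{-2} kept per event, so Σ̄ is just the submatrix of Σ at the event times). Names are ours; `sigma_row` is assumed to return a row of Σ without materializing the full matrix.

```python
import numpy as np

def logdet_term_approx(sigma_row, y_events, x_at_events):
    """Approximate log|I + Sigma Lambda*| via Eq. 17 with the N x N reduction.

    sigma_row   : callable i -> i-th row of Sigma (length n)
    y_events    : (N,) event-time indices y_i
    x_at_events : (N,) intensity values x_{y_i} at the events
    """
    lam = x_at_events ** -2.0                                          # approximate eigenvalues of Lambda*
    Sigma_bar = np.array([sigma_row(i)[y_events] for i in y_events])   # N x N submatrix of Sigma
    M = np.eye(len(y_events)) + Sigma_bar * lam[None, :]               # I + Sigma_bar S_bar
    sign, logdet = np.linalg.slogdet(M)
    return logdet
```
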
6. Results and Discussion

The methods developed here maintain computational accuracy while achieving massive speed-up and the elimination of memory burden. First, we have shown a fast method that achieves an accurate approximation of the MAP estimate x* in much less time than a naive method. We have made all matrix multiplications implicit, thereby eliminating the memory burden of representing full matrices. We call this piece the "MAP Estimation." Second, we found the approximate model evidence, as well as its gradients, so as to perform model selection on the hyperparameters θ. These calculations, which involved the calculation of a log determinant and its gradients (Eq. 17), were achieved with matrices of significantly reduced dimension, again removing the storage demands of the naive method. We call this piece the "log determinant approximation." These two pieces must be iterated (as described before Eq. 2) to find both the optimal model θ̂ and the optimal intensity x*. We call this iterative method (combining the two pieces above) the "full GP intensity estimation." We show here that each piece is fast and accurate, and finally that they combine to make an overall method that is considerably faster than a standard implementation, with minimal sacrifice to accuracy.

To demonstrate results, we pick six representative intensity functions, consisting of sinusoids of various amplitudes (5-100 events/second), means (15-150 events/second), frequencies (1-2 Hz), and lengths (0.5-10 seconds of millisecond resolution data, implying data sizes n of 500 to 10000). This set is by no means exhaustive, but it does indicate how this method outperforms a naive implementation in a range of scenarios. Our testing over many different intensity functions (including those in Cunningham et al., 2008) agrees with the results shown here. We simulate point process data y from these intensities, and we implement both the naive and the fast method on these process realizations. All results are given for 2006-era Linux (FC4) 64-bit workstations with 2-4GB of RAM running MATLAB (R14sp3, BLAS ATLAS 3.2.1 on AMD processors). The naive method was implemented in MATLAB. The fast method was similarly implemented in MATLAB with some use of the C-MEX interface for linear operations such as multiplication of a vector by the (implicitly represented) matrix R.


Table 1. Performance for fast and naive methods. Results averaged over 10 independent trials.

Data Set                          1        2        3        4        5        6
Data Size (n)                   500     1000     1000     2000     4000    10000
Num. Events (N)¹              20-30    30-40  140-160    55-70    55-70  140-160

MAP Estimation
  Fast Solve Time (s)          0.12     0.17     0.46     0.32      7.6     37.9
  Naive Solve Time (s)         7.04     40.5     39.5      333     3704   1 day³
  Speed Up                      58×     232×      86×    1043×     493×   2000׳
  MS Error (Fast vs. Naive)² 4.3e-4   4.2e-4   2.1e-4   5.2e-6   6.1e-6        –
  Avg. CG Iters.                6.4      5.5     16.2      8.1     29.9     49.7

Log Determinant Approximation
  Fast Solve Time (s)        6.5e-4   1.8e-3   1.9e-2   2.8e-3   2.8e-3   2.5e-2
  Naive Solve Time (s)         0.24     1.02     0.97      5.7     34.7     540³
  Speed Up                     375×     566×      52×    2058×   1.3e4×  2.2e4׳
  Avg. Acc. of Fast Approx.   99.1%    98.8%    99.8%    98.9%    99.7%        –
  Avg. Model Selection Iters.  54.3     54.6     89.1     68.1     39.4     40.7

Full GP Intensity Estimation (Iterative Model Selection and MAP Estimation)
  Fast Solve Time (s)           4.4      7.1     30.3     18.7      128      423
  Naive Solve Time (s)          443     3094     4548    2.4e4    1.5e5 1 month³
  Speed Up                     105×     451×     150×    1512×    1166×    1e4׳
  MS Error (Fast vs. Naive)²   0.10     0.03     10.8     0.01     0.01        –

¹ Entries show a range of data used.
² Squared norm of x(t) is roughly 10³ to 10⁵, so these errors are insignificant.
³ Unable to complete naive method; numbers estimated from cubic scaling.

First we demonstrate the utility of our fast MAP estimation method on problems of several different sizes and with different x. We compare the fast MAP estimation to a naive implementation, demonstrating the average mean squared (MS) error (between the fast and naive estimates) and the average solve time. These results are found in the first part of Table 1. The squared norm of x is roughly 10³ to 10⁵, so the errors shown (the difference between the naive and fast methods) are vanishingly small. Thus, the fast MAP estimation gives an extremely accurate approximation of the naive MAP estimate. For all practical purposes, the fast MAP estimation method is exact.

The naive method scales in run time as the cube of data size n, as expected. The fast method and the speed-up factor do not appear to scale linearly in the data size. Indeed, run time depends heavily on the number of CG iterations required to solve the MAP estimation. This number of CG steps depends on problem size n, number of events N, and hyperparameters such as the lengthscale of the covariance matrix. Even so, major gains are achieved.

Second, we demonstrate our model selection accuracy and speed-up (the log determinant approximation). We run the full iterative fast method with both MAP estimation and evidence model selection. At each iterate of θ, we calculate the evidence and its gradients using both the fast and naive methods. In the second section of Table 1, we show average solution times for calculating the log determinant in both naive and fast methods, and we compare their accuracy. For the sake of brevity, we demonstrate only the calculation of log |I + ΣΛ*|, not its gradients with respect to the hyperparameters. Those calculations show very similar speed-ups and are as well approximated. Thus, the log determinant is calculated to 99-100% accuracy relative to the naive method, and we have a highly accurate approximation.

Finally, the full intensity estimation problem requires iterative evidence calculations and MAP estimations, so we must also demonstrate the accuracy of the full fast method versus the full naive method. The last part of Table 1 shows this result (Full GP Intensity Estimation). We see that all data sets converge to quite similar results in both the fast and the naive methods, and the fast method enjoys significant speed-up. The MS errors shown compare the result of the fast method to the result of the naive method and are very small compared to the squared norm of x (10³ to 10⁵).

We have demonstrated a method for inferring optimal intensity estimates from an observation of renewal process data, and we have exploited problem structure to make this method computationally attractive. As an extension, we also developed this fast GP technique for multiple observations y^(i) of the same underlying x. It uses the same approach with comparable performance improvements. As such, we do not report it here.

Since we avoid all explicit representations of n × n matrices, our memory requirements are very minor for a problem of this size. The major run time improvements in Table 1 require effectively no loss of accuracy from an exact naive approach, and thus the additional technical complexity of this approach is well justified. Having fast, scalable methods for point process intensity estimation problems may mean the difference between theoretically interesting approaches and methods that become well used in practice.

Acknowledgments

This work was supported by NIH-NINDS-CRCNS-R01, the Michael Flynn SGF, NSF, Gatsby, CDRF, BWF, ONR, Sloan, and Whitaker. We thank Stephen Boyd for helpful discussions, Drew Haven for technical support, and Sandy Eisensee for administrative assistance.

References

Barbieri, R., Quirk, M., Frank, L., Wilson, M., & Brown, E. (2001). Construction and analysis of non-Poisson stimulus-response models of neural spiking activity. J Neurosci Methods, 105, 25–37.

Basu, S., & Dassios, A. (2002). A Cox process with log-normal intensity. Insurance: Mathematics and Economics, 31, 297–302.

Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge: Cambridge University Press.

Cunningham, J. P., Yu, B. M., Shenoy, K. V., & Sahani, M. (2008). Inferring neural firing rates from spike trains using Gaussian processes. In Advances in NIPS, 20.

Daley, D., & Vere-Jones, D. (2002). An introduction to the theory of point processes. New York: Springer.

Gibbs, M., & MacKay, D. (1997). Efficient implementation of Gaussian processes. Preprint.

Gill, P., Golub, G., Murray, W., & Saunders, M. (1974). Methods for modifying matrix factorizations. Mathematics of Computation, 28, 505–535.

Gray, A., & Moore, A. (2003). Nonparametric density estimation: Toward computational tractability. SIAM Int'l Conference on Data Mining.

Horrace, W. (2005). Some results on the multivariate truncated normal distribution. J Multivariate Analysis, 94, 209–221.

Kuss, M., & Rasmussen, C. (2005). Assessing approximate inference for binary Gaussian process classification. Journal of Machine Learning Res., 6, 1679–1704.

Moller, J., Syversveen, A., & Waagepetersen, R. (1998). Log Gaussian Cox processes. Scandinavian J. of Stats., 25, 451–482.

Paninski, L. (2004). Log-concavity results on Gaussian process methods for supervised and unsupervised learning. Advances in NIPS, 16.

Papoulis, A., & Pillai, S. (2002). Probability, random variables, and stochastic processes. Boston: McGraw Hill.

Quinonero-Candela, J., & Rasmussen, C. (2005). A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Res., 6, 1939–1959.

Rasmussen, C., & Williams, C. (2006). Gaussian processes for machine learning. Cambridge: MIT Press.

Raykar, V., Yang, C., Duraiswami, R., & Gumerov, N. (2005). Fast computation of sums of Gaussians in high dimensions. University of Maryland Tech. Report CS-TR-4767/UMIACS-TR-2005-69.

Shen, Y., Ng, A., & Seeger, M. (2006). Fast Gaussian process regression using kd-trees. Advances in NIPS, 18.

Silverman, B. (1982). Kernel density estimation using the fast Fourier transform. Journal of Royal Stat. Soc. Series C: Applied Stat., 33.
