arXiv:1312.6536v1 [stat.ME] 23 Dec 2013

Statistical Science
2013, Vol. 28, No. 4, 542–563
DOI: 10.1214/13-STS441
© Institute of Mathematical Statistics, 2013

Spatial and Spatio-Temporal Log-Gaussian Cox Processes: Extending the Geostatistical Paradigm

Peter J. Diggle, Paula Moraga, Barry Rowlingson and Benjamin M. Taylor

Abstract. In this paper we first describe the class of log-Gaussian Cox processes (LGCPs) as models for spatial and spatio-temporal point process data. We discuss inference, with a particular focus on the computational challenges of likelihood-based inference. We demonstrate the usefulness of the LGCP by describing four applications: estimating the intensity surface of a spatial point process; investigating spatial segregation in a multi-type process; constructing spatially continuous maps of disease risk from spatially discrete data; and real-time health surveillance. We argue that problems of this kind fit naturally into the realm of geostatistics, which traditionally is defined as the study of data on a spatially continuous process using observations at a finite number of discrete locations. We suggest that a more useful definition of geostatistics is by the class of scientific problems that it addresses, rather than by particular models or data formats.

Key words and phrases: Cox process, epidemiology, geostatistics, Gaussian process, spatial point process.

Peter J. Diggle is Distinguished University Professor, Lancaster University Medical School, Lancaster, LA1 4YG, United Kingdom and Professor, Institute of Infection and Global Health, University of Liverpool, Liverpool, L69 7BE, United Kingdom e-mail: [email protected]. Paula Moraga is Research Associate, Lancaster University Medical School, Lancaster, LA1 4YG, United Kingdom e-mail: [email protected]. Barry Rowlingson is Research Fellow, Lancaster University Medical School, Lancaster, LA1 4YG, United Kingdom e-mail: [email protected]. Benjamin M. Taylor is Lecturer, Lancaster University Medical School, Lancaster, LA1 4YG, United Kingdom e-mail: [email protected].

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2013, Vol. 28, No. 4, 542–563. This reprint differs from the original in pagination and typographic detail.

1. INTRODUCTION

Spatial statistics has been one of the most fertile areas for the development of statistical methodology during the second half of the twentieth century. A striking, if slightly contrived, illustration of the pace of this development is the contrast between the 90 pages of Bartlett (1975) and the 900 pages of Cressie (1991). Cressie's book established a widely used classification of spatial statistics into three subareas: geostatistical data, lattice data, spatial patterns (meaning point patterns). Within this classification, geostatistical data consist of observed values of some phenomenon of interest associated with a set of spatial locations $x_i : i = 1, \ldots, n$, where, in principle, each $x_i$ could have been any location $x$ within a designated spatial region $A \subset \mathbb{R}^2$. Lattice data consist of observed values associated with a fixed set of locations $x_i : i = 1, \ldots, n$, that is, the phenomenon of interest exists only at those $n$ specific locations. Finally, in a spatial pattern the data are a set of spatial locations $x_i : i = 1, \ldots, n$ presumed to have been generated as a partial realisation of a point process that is itself the object of scientific interest. Almost 20 years later, Gelfand et al. (2010) used the same classification but with a different terminology focused more on the underlying process than on the extant data: continuous spatial variation, discrete spatial variation, and spatial point processes. With this process-based terminology in place, continuous spatial variation implies a stochastic process $\{Y(x) : x \in \mathbb{R}^2\}$, discrete spatial variation implies only a finite-dimensional random variable, $Y = \{Y_i : i = 1, \ldots, n\}$, and a point pattern implies a counting measure, $\{dN(x) : x \in \mathbb{R}^2\}$.

In this paper, we argue first that the most important theoretical distinction within spatial statistics is between spatially continuous and spatially discrete stochastic processes, and second that most natural processes are spatially continuous and should be modelled accordingly. One consequence of this point of view is that in many applications, maintaining a one-to-one linkage between data formats (geostatistical, lattice, point pattern) and associated model classes (spatially continuous, spatially discrete, point process) is inappropriate. In particular, we suggest a redefinition of geostatistics as the collection of statistical models and methods whose purpose is to enable predictive inference about a spatially continuous, incompletely observed phenomenon, $S(x)$, say.

Classically, geostatistical data $Y_i : i = 1, \ldots, n$ correspond to noisy versions of $S(x_i)$. A standard geostatistical model, expressed here in hierarchical form, is that $S = \{S(x) : x \in \mathbb{R}^2\}$ is a Gaussian stochastic process, whilst conditional on $S$, the $Y_i$ are mutually independent, Normally distributed with means $S(x_i)$ and common variance $\tau^2$. A second scenario, and the focus of the current paper, is when $S$ determines the intensity, $\lambda(x)$, say, of an observed Poisson point process. An example that we will consider in detail is a log-linear specification, $\lambda(x) = \exp\{S(x)\}$, where $S$ is a Gaussian process. A third form is when the point process is reduced to observations of the numbers of points $Y_i$ in each of $n$ regions $A_i$ that form a partition (or subset) of the region of interest $A$. Hence, conditional on $S$, the $Y_i$ are mutually independent, Poisson-distributed with means

$$\mu_i = \int_{A_i} \lambda(x)\,dx. \tag{1}$$

In the remainder of the paper we show how the log-Gaussian Cox process can be used in a range of applications where $S(x)$ is incompletely observed through the lens of point pattern or aggregated count data. Sections 2 to 4 concern theoretical properties, inference and computation. Section 5 describes several applications. Section 6 discusses the extension to spatio-temporal data. Section 7 gives an outline of how this approach to modelling incompletely observed spatial phenomena extends naturally to the joint analysis of multivariate spatial data when the different data elements are observed at incommensurate spatial scales. Section 8 is a short, concluding discussion.

2. THE LOG-GAUSSIAN COX PROCESS

A (univariate, spatial) Cox process (Cox (1955)) is a point process defined by the following two postulates:

CP1: $\Lambda = \{\Lambda(x) : x \in \mathbb{R}^2\}$ is a nonnegative-valued stochastic process;

CP2: conditional on the realisation $\Lambda(x) = \lambda(x) : x \in \mathbb{R}^2$, the point process is an inhomogeneous Poisson process with intensity $\lambda(x)$.

Cox processes are natural models for point process phenomena that are environmentally driven, much less natural for phenomena driven primarily by interactions amongst the points. Examples of these two situations in an epidemiological context would be the spatial distribution of cases of a noninfectious or infectious disease, respectively. In a noninfectious disease, the observed spatial pattern of cases results from spatial variation in the exposure of susceptible individuals to a combination of observed and unobserved risk-factors. Conditional on exposure, cases occur independently. In contrast, in an infectious disease the observed pattern is at least partially the result of infectious cases transmitting the disease to nearby susceptibles. Notwithstanding this phenomenological distinction, it can be difficult, or even impossible, to distinguish empirically between processes representing stochastically independent variation in a heterogeneous environment and stochastic interactions in a homogeneous environment (Bartlett (1964)).

The moment properties of a Cox process are inherited from those of the process $\Lambda(x)$. For example, in the stationary case the intensity of the Cox process is equal to the expectation of $\Lambda(x)$ and the covariance density of the Cox process is equal to the covariance function of $\Lambda(x)$. Hence, writing $\lambda =$

$E[\Lambda(x)]$ and $C(u) = \mathrm{Cov}\{\Lambda(x), \Lambda(x - u)\}$, the reduced second moment measure or K-function (Ripley 1976, 1977) of the Cox process is

$$K(u) = \pi u^2 + 2\pi\lambda^{-2} \int_0^u C(v)\,v\,dv. \tag{2}$$

Møller, Syversveen and Waagepetersen (1998) introduced the class of log-Gaussian Cox processes (LGCPs). As the name implies, an LGCP is a Cox process with $\Lambda(x) = \exp\{S(x)\}$, where $S$ is a Gaussian process. This construction has an elegant simplicity. One of its attractive features is that the tractability of the multivariate Normal distribution carries over, to some extent, to the associated Cox process.

In the stationary case, let $\mu = E[S(x)]$ and $C(u) = \sigma^2 r(u) = \mathrm{Cov}\{S(x), S(x - u)\}$. It follows from the moment properties of the log-Normal distribution that the associated LGCP has intensity $\lambda = \exp(\mu + 0.5\sigma^2)$ and covariance density $g(u) = \lambda^2[\exp\{\sigma^2 r(u)\} - 1]$. This makes it both convenient and natural to re-parameterise the model as

$$\Lambda(x) = \exp\{\beta + S(x)\}, \tag{3}$$

where $E[S(x)] = -0.5\sigma^2$, so that $E[\exp\{S(x)\}] = 1$ and $\lambda = \exp(\beta)$. This re-parameterisation gives a clean separation between first-order (mean value) and second-order (variation about the mean) properties. Hence, for example, if we wished to model a spatially varying intensity by including one or more spatially indexed explanatory variables $z(x)$, a natural first approach would be to retain the stationarity of $S(x)$ but replace the constant intensity $\lambda$ by a regression model, $\lambda(x) = \lambda\{z(x); \beta\}$. The resulting Cox process is now an intensity-reweighted stationary point process (Baddeley, Møller and Waagepetersen, 2000), which is the analogue of a real-valued process with a spatially varying mean and a stationary residual.

The definition of a multivariate LGCP is immediate: we simply replace the scalar-valued Gaussian process $S(x)$ by a vector-valued multivariate Gaussian process, and its moment properties are equally tractable. For example, if $S(x)$ is a stationary bivariate Gaussian process with intensities $\lambda_1$ and $\lambda_2$, and cross-covariance function $C_{12}(u) = \sigma_1\sigma_2 r_{12}(u)$, the cross-covariance density of the associated Cox process is $g_{12}(u) = \lambda_1\lambda_2[\exp\{\sigma_1\sigma_2 r_{12}(u)\} - 1]$.

There is an extensive literature on parametric specifications for the covariance structure of real-valued processes $S(x)$; for a recent summary, see Gneiting and Guttorp (2010a). The theoretical requirement for a function $C(x, y)$ to be a valid covariance function is that it be positive-definite, meaning that for all positive integers $n$, any associated set of points $x_i \in \mathbb{R}^2 : i = 1, \ldots, n$, and any associated set of real numbers $a_i : i = 1, \ldots, n$,

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j C(x_i, x_j) \geq 0. \tag{4}$$

Checking that (4) holds for an arbitrary candidate $C(x, y)$ is not straightforward. In practice, we choose covariance functions from a catalogue of parametric families that are known to be valid. In the stationary case, a widely used family is the Matérn (1960) class $C(u) = \sigma^2 r(u; \phi, \kappa)$, where

$$r(u; \phi, \kappa) = \{2^{\kappa - 1}\Gamma(\kappa)\}^{-1} (u/\phi)^{\kappa} K_{\kappa}(u/\phi), \quad u \geq 0. \tag{5}$$

In (5), $\Gamma(\cdot)$ is the complete Gamma function, $K_{\kappa}(\cdot)$ is a modified Bessel function of order $\kappa$, and $\phi > 0$ and $\kappa > 0$ are parameters. The parameter $\phi$ has units of distance, whilst $\kappa$ is a dimensionless shape parameter that determines the differentiability of the corresponding Gaussian process; specifically, the process is $k$-times mean square differentiable if $\kappa > k$. This physical interpretation of $\kappa$ is useful because $\kappa$ is difficult to estimate empirically (Zhang (2004)); hence, a widely used strategy is to choose between a small set of values corresponding to different degrees of differentiability, for example, $\kappa = 0.5$, $1.5$ or $2.5$. Estimation of $\phi$ is more straightforward.

In summary, the LGCP is the natural analogue for point process data of the linear Gaussian model for real-valued geostatistical data (Diggle and Ribeiro (2007)). Like the linear Gaussian model, it lacks any mechanistic interpretation. Its principal virtue is that it provides a flexible and relatively tractable class of empirical models for describing spatially correlated phenomena. This makes it extremely useful in a range of applications where the scientific focus is on spatial prediction rather than on testing mechanistic hypotheses. Section 5 gives several examples.

3. INFERENCE FOR LOG-GAUSSIAN COX PROCESSES

In this section we distinguish between two inferential targets, namely, estimation of model parameters and prediction of the realisations of unobserved stochastic processes. Within the Bayesian paradigm, this distinction is often blurred, because parameters are treated as unobserved random variables and the formal machinery of inference is the same in both cases, consisting of the calculation of the conditional distribution of the target given the data. However, from a scientific perspective parameter estimation and prediction are fundamentally different, because the former concerns properties of the process being modelled whereas the latter concerns properties of a particular realisation of that process.

3.1 Parameter Estimation

For parameter estimation, we consider three approaches: moment-based estimation, maximum likelihood estimation, and Bayesian estimation. The first approach is typically very simple to implement and is useful for the initial exploration of candidate models. The second and third are more principled, both being likelihood-based.

3.1.1 Moment-based estimation. In the stationary case, moment-based estimation consists of minimising a measure of the discrepancy between empirical and theoretical second-moment properties. One class of such measures is a weighted least squares criterion,

$$D(\theta) = \int_0^{u_0} w(u)\{\hat{K}(u)^c - K(u; \theta)^c\}^2\,du. \tag{6}$$

In the intensity-re-weighted case, (6) can still be used after separately estimating a regression model for a spatially varying $\lambda(x)$ under the working assumption that the data are a partial realisation of an inhomogeneous Poisson process.

This method of estimation has an obviously ad hoc quality. In particular, it is difficult to give generally applicable guidance on appropriate choices for the values of $u_0$ and $c$ in (6). Because the method is intended only to give preliminary estimates, there is something to be said for simply matching $\hat{K}(u)$ and $K(u; \theta)$ by eye. The R (R Core Team (2013)) package lgcp (Taylor et al., 2013) includes an interactive graphics function to facilitate this.

3.1.2 Maximum likelihood estimation. The general form of the Cox process likelihood associated with data $X = \{x_i \in A : i = 1, \ldots, n\}$ is

$$\ell(\theta; X) = P(X \mid \theta) = \int_{\Lambda} P(X, \Lambda \mid \theta)\,d\Lambda = E_{\Lambda \mid \theta}(\ell^*(\Lambda; X)), \tag{7}$$

where

$$\ell^*(\Lambda; X) = \prod_{i=1}^{n} \Lambda(x_i) \left\{\int_A \Lambda(x)\,dx\right\}^{-n} \tag{8}$$

is the likelihood for an inhomogeneous Poisson process with intensity $\Lambda(x)$. The evaluation of (7) involves integration over the infinite-dimensional distribution of $\Lambda$. In Section 4.1 below we describe an implementation in which the continuous region of interest $A$ is approximated by a finely spaced regular lattice, hence replacing $\Lambda$ by a finite set of values $\Lambda(g_k) : k = 1, \ldots, N$, where the points $g_1, \ldots, g_N$ cover $A$. Even so, the high dimensionality of the implied integration appears to present a formidable obstacle to analytic progress. One solution, easily stated but hard to implement robustly and efficiently, is to use Monte Carlo methods.

Monte Carlo evaluation of (7) consists of approximating the expectation by an empirical average over simulated realisations of some kind. A crude Monte Carlo method would use the approximation

$$\ell_{MC}(\theta) = s^{-1} \sum_{j=1}^{s} \ell(\theta; X, \lambda^{(j)}), \tag{9}$$

where $\lambda^{(j)} = \{\lambda^{(j)}(g_k) : k = 1, \ldots, N\} : j = 1, \ldots, s$ are simulated realisations of $\Lambda$ on the set of grid-points $g_k$. In practice, this is hopelessly inefficient. A better approach is to use an ingenious method due to Geyer (1999), as follows.

Let $f(X, \Lambda; \theta)$ denote the un-normalised joint density of $X$ and $\Lambda$. Then, the associated likelihood is

$$\ell(\theta; X, \Lambda) = f(X, \Lambda; \theta)/a(\theta), \tag{10}$$

where

$$a(\theta) = \int\!\!\int f(X, \Lambda; \theta)\,d\Lambda\,dX \tag{11}$$

is the intractable normalising constant for $f(\cdot)$. It follows that

$$E_{\theta_0}[f(X, \Lambda; \theta)/f(X, \Lambda; \theta_0)] = \int\!\!\int \frac{f(X, \Lambda; \theta)}{f(X, \Lambda; \theta_0)} \times \frac{f(X, \Lambda; \theta_0)}{a(\theta_0)}\,d\Lambda\,dX = \frac{1}{a(\theta_0)} \int\!\!\int f(X, \Lambda; \theta)\,d\Lambda\,dX = a(\theta)/a(\theta_0), \tag{12}$$

where $\theta_0$ is any convenient, fixed value of $\theta$, and $E_{\theta_0}$ denotes expectation when $\theta = \theta_0$. However, the function $f(X, \Lambda; \theta)$ in (10) is also an un-normalised conditional density for $\Lambda$ given $X$. Under this second interpretation, the corresponding normalised conditional density is $f(X, \Lambda; \theta)/a(\theta \mid X)$, where

$$a(\theta \mid X) = \int f(X, \Lambda; \theta)\,d\Lambda, \tag{13}$$

and the same argument as before gives

$$E_{\theta_0}[f(X, \Lambda; \theta)/f(X, \Lambda; \theta_0) \mid X] = a(\theta \mid X)/a(\theta_0 \mid X). \tag{14}$$

It follows from (7), (10) and (13) that the likelihood for the observed data, $X$, can be written as

$$\ell(\theta; X) = \int \frac{f(X, \Lambda; \theta)}{a(\theta)}\,d\Lambda = a(\theta \mid X)/a(\theta). \tag{15}$$

Hence, the log-likelihood ratio between any two parameter values, $\theta$ and $\theta_0$, is

$$L(\theta; X) - L(\theta_0; X) = \log\{a(\theta \mid X)/a(\theta)\} - \log\{a(\theta_0 \mid X)/a(\theta_0)\} = \log\{a(\theta \mid X)/a(\theta_0 \mid X)\} - \log\{a(\theta)/a(\theta_0)\}. \tag{16}$$

Substitution from (12) and (14) gives the result that

$$L(\theta; X) - L(\theta_0; X) = \log E_{\theta_0}[r(X, \Lambda, \theta, \theta_0) \mid X] - \log E_{\theta_0}[r(X, \Lambda, \theta, \theta_0)], \tag{17}$$

where $r(X, \Lambda, \theta, \theta_0) = f(X, \Lambda; \theta)/f(X, \Lambda; \theta_0)$. For any fixed value of $\theta_0$, a Monte Carlo approximation to the log-likelihood, ignoring the constant term $L(\theta_0)$ on the left-hand side of (17), is therefore given by

$$\hat{L}(\theta) = \log\left\{s^{-1} \sum_{j=1}^{s} r(X, \lambda^{(j)}, \theta, \theta_0)\right\} - \log\left\{s^{-1} \sum_{j=1}^{s} r(X^{(j)}, \lambda^{(j)}, \theta, \theta_0)\right\}. \tag{18}$$

The result (18) provides a Monte Carlo approximation to the log-likelihood function, and therefore to the maximum likelihood estimate $\hat{\theta}$, by simulating the process only at a single value, $\theta_0$. The accuracy of the approximation depends on the number of simulations, $s$, and on how close $\theta_0$ is to $\hat{\theta}$.

Note that in the second term on the right-hand side of (18) the pairs $(X^{(j)}, \lambda^{(j)})$ are simulated joint realisations of $X$ and $\Lambda$ at $\theta = \theta_0$, whilst in the first term $X$ is held fixed at the observed data and the simulated realisations $\lambda^{(j)}$ are conditional on $X$. Conditional simulation of $\Lambda$ requires Markov chain Monte Carlo (MCMC) methods, for which careful tuning is generally needed. We discuss computational issues, including the design of a suitable MCMC algorithm, in Section 4.

3.1.3 Bayesian estimation. One way to implement Bayesian estimation would be directly to combine Monte Carlo evaluation of the likelihood with a prior for $\theta$. However, it turns out to be more efficient to incorporate Bayesian estimation and prediction into a single MCMC algorithm, as described in Section 4.

3.2 Prediction

For prediction, we consider plug-in and Bayesian prediction. Suppose, quite generally, that data $Y$ are to be used to predict a target $T$ under an assumed model with parameters $\theta$. Then, plug-in prediction consists of a series of probability statements within the conditional distribution $[T \mid Y; \hat{\theta}]$, where $\hat{\theta}$ is a point estimate of $\theta$, whereas Bayesian prediction replaces $[T \mid Y; \hat{\theta}]$ by

$$[T \mid Y] = \int [T \mid Y; \theta][\theta \mid Y]\,d\theta. \tag{19}$$

This shows that Bayesian prediction is a weighted average of plug-in predictions, with different values of $\theta$ weighted according to the Bayesian posterior for $\theta$. The Bayesian solution (19) is the more correct in that it incorporates parameter uncertainty in a way that is both natural, albeit on its own terms, and elegant.

4. COMPUTATION

Inference for LGCPs is a computationally challenging problem. Throughout this section we will use the notation and language of purely spatial processes on $\mathbb{R}^2$, but the discussion applies in more general settings including spatio-temporal LGCPs.

4.1 The Computational Grid

Although we model the latent process $S$ as a spatially continuous process, in practice, we work with a piecewise-constant equivalent to the LGCP model on a collection of cells that form a disjoint partition of the region of interest, $A$. In the limit as the number of cells tends to infinity, this process behaves like its spatially continuous counterpart. We call the collection of cells on which we represent the process the computational grid. The choice of grid reflects a balance between computational complexity and accuracy of approximation. The computational bottleneck arises through the need to invert the covariance matrix, $\Sigma$, corresponding to the variance of $S$ evaluated on the computational grid.

Typically, we shall use a computational grid of square cells. This is an example of a regular grid, by which we mean that on an extension of the grid notionally wrapped on a torus, a strictly stationary covariance function of the process on $\mathbb{R}^2$ will induce a block-circulant covariance structure on the grid (Wood and Chan (1994); Møller, Syversveen and Waagepetersen, 1998). For simplicity of presentation, we make no distinction between the extended grid and the original grid, since for extensions that at least double the width and height of the original grid, the toroidal distance between any two cells in the original observation window coincides with their Euclidean distance in $\mathbb{R}^2$. For a second-order stationary process $S$, inversion of $\Sigma$ on a regular grid is best achieved using Fourier methods (Frigo and Johnson (2011)). On irregular grids, sparse matrix methods in conjunction with an assumption of low-order Markov dependence are more efficient (Rue and Held (2005); Rue, Martino and Chopin (2009); Lindgren, Rue and Lindström, 2011). In this context, Lindgren, Rue and Lindström (2011) demonstrate a link between models assuming a Markov dependence structure and spatially continuous models whose covariance function belongs to a restricted subset of the Matérn class.

4.2 Implementing Bayesian Inference, MCMC or INLA?

We now suppose that the computational grid has been defined and the point process data $X$ have been converted to a set of counts, $Y$, on the grid cells; note that we envisage using a finely spaced grid, for which cell-counts will typically be 0 or 1. Our goal is to use the data $Y$ to make inferences about the latent process $S$ and the parameters $\beta$ and $\theta$, which, respectively, parameterise the intensity of the LGCP and the covariance structure of $S$.

In the Bayesian paradigm we treat $S$, $\beta$ and $\theta$ as random variables, assign priors to the model parameters $(\beta, \theta)$ and make inferential statements using the posterior/predictive distribution,

$$[S, \beta, \theta \mid Y] \propto [Y \mid S, \beta, \theta][S \mid \theta][\beta, \theta].$$

Two options for computation are as follows: MCMC, which generates random samples from $[S, \beta, \theta \mid Y]$, and the integrated nested Laplace approximation (INLA), which uses a mathematical approximation.

Taylor and Diggle (2013a) compare the performance of MCMC and INLA for a spatial LGCP with constant expectation $\beta$ and parameters $\theta$ treated as known values. In this restricted scenario, they found that MCMC, run for 100,000 iterations, delivered more accurate estimates of predictive probabilities than INLA. However, they acknowledged that "further research is required in order to design better MCMC algorithms that also provide inference for the parameters of the latent field".

Approximate methods such as INLA have the advantages that they produce results quickly and circumvent the need to assess the convergence and mixing properties of an MCMC algorithm. This makes INLA very convenient for quick comparisons amongst multiple candidate models, which would be a daunting task for MCMC. Against this, MCMC methods are more flexible in that extensions to standard classes of models can usually be accommodated with only a modest amount of coding effort. Also, an important consideration in some applications is that the currently available software implementation of INLA is limited to the evaluation of predictive distributions for univariate, or, at best, low-dimensional, components of the underlying model, whereas MCMC provides direct access to joint posterior/predictive distributions of nonlinear functions of the parameters and of the latent process $S$. Mixing INLA and MCMC can therefore be a good overall computational strategy. For example, Haran and Tierney (2012) use a heavy-tailed approximation similar in spirit to INLA to construct efficient MCMC proposal schemes.

4.2.1 Markov Chain Monte Carlo inference for log-Gaussian Cox processes. MCMC methods generate samples from a Markov chain whose stationary distribution is the target of interest, in our case $[S, \beta, \theta \mid Y]$. Such samples are inherently dependent but, subject to careful checking of mixing and convergence properties, their empirical distribution is an unbiased estimate of the target, and, in principle, the associated Monte Carlo error can be made arbitrarily small by using a sufficiently long run of the chain. In the current context, we follow Møller, Syversveen and Waagepetersen (1998) and Brix and Diggle (2001) in using a standardised version of $S$,

−1 denoted Γ, and transform θ to the log-scale, so that of ζ, that is, Ξopt = {−E[I(ζˆ)]} where I is the the MCMC algorithm operates on the whole of Rd, observed information. However, this matrix is mas- rather than on a restricted subset. We denote the sive, dense and intractable. In practice, we can ob- ith sample from the chain by ζ(i) and write π(ζ|Y ) tain an efficient algorithm by choosing Ξ to be an for the target distribution. approximation of Ξopt and further by changing h The aim in designing MCMC algorithms for any during the course of the algorithm using adaptive specific class of problems is to achieve faster con- MCMC (Andrieu and Thoms (2008); Roberts and vergence and better mixing than would be obtained Rosenthal (2007)). In MALA algorithms, h can be by generic off-the-shelf methods. Gilks, Richardson tuned adaptively to achieve an approximately opti- and Spiegelhalter (1995) and Gamerman and Lopes mal acceptance rate of 0.574 (Roberts and Rosenthal (2006) give overviews of the extensive literature on (2001)). this topic. We focus our discussion on the Metropolis- Since the gradient of log π with respect to θ can Hastings (MH) algorithm, which includes as a spe- be both difficult to compute and computationally cial case the popular Gibbs sampler (Metropolis costly, we instead suggest a proposal et al., 1953; Hastings (1970); Geman and Geman for the θ-component of ζ. In the examples described (1984); Spiegelhalter, Thomas and Best, 1999). In in Section 5 we used the following overall proposal: ∗ order to use the MH algorithm, we require a pro- q(ζ(i )|ζ(i−1)) posal density, q(·|ζ(i−1)). At the ith iteration of the ∗ algorithm, we sample a candidate, ζ(i ), from q(·), (i) (i∗) and set ζ = ζ with probability ∗ = N ζ(i ); (i∗) (i−1) (i∗) π(ζ |Y ) q(ζ |ζ )  min 1, ∗ ,  π(ζ(i−1)|Y ) q(ζ(i )|ζ(i−1))     (i) (i−1) (21)  otherwise set ζ = ζ . The choice of q(·) is crit- 2 2 (i−1) (i−1) h hΓ ∂ log{π(ζ |Y )} ical. 
Previous research on inferential methods for Γ + ΞΓ 2 ∂Γ spatial and spatio-temporal log-Gaussian Cox pro-  2 2  h h ∂ log{π(ζ(i−1)|Y )} , cesses has advocated the Metropolis-adjusted Lan- β(i−1) + β Ξ  2 β ∂β  gevin algorithm (MALA), which mimics a Langevin    (i−1)  diffusion on the target of interest; see Roberts and  θ  Tweedie (1996), Møller, Syversveen and Waagepeter-   sen (1998) and Brix and Diggle (2001); note also 2 hΓΞΓ 0 0 Brix and Diggle (2003) and Taylor and Diggle (2013b). 2 2  h 0 h Ξβ 0 . Alternatives to MH include Hamiltonian Monte Carlo  β  2  methods, as discussed in Girolami and Calderhead 0 0 chθΞθ    (2011).   The Metropolis-adjusted Langevin algorithm ex- −1 In (21), ΞΓ is an approximation to {−E[I(Γ)]ˆ } , ploits gradient information to identify efficient pro- 2 2 and similarly for Ξβ and Ξθ. The constants h , h posals. The algorithms in this article make use of a Γ β and h2 are the approximately optimal scalings for “pre-conditioning matrix”, Ξ (Girolami and Calder- θ Gaussian targets explored by the Gaussian random head (2011)), to define the proposal walk or MALA proposals (Roberts and Rosenthal ∗ 2 1/3 q(ζ(i )|ζ(i−1)) (2001)); these are, respectively, 1.65 / dim(Γ) , 1.652/ dim(β)1/3 and 2.382/ dim(θ), where dim is ∗ (20) = N ζ(i ); the dimension.  The acceptance rate for a random walk proposal is h2 often tuned to around 0.234, which is optimal for a ζ(i−1) + Ξ∇ log{π(ζ(i−1)|Y )}, h2Ξ , Gaussian target in the limit as the dimension of the 2  target goes to infinity. At each step in our algorithm, where h is a scaling constant. 
Ideally, $\Xi$ should be the negative inverse of the Fisher information matrix evaluated at the maximum likelihood estimate

we jointly propose new values for $(S, \beta)$ and for $\theta$ using, respectively, a MALA and a random walk component in the overall proposal, but we also seek to maintain an acceptance rate of 0.574 to achieve optimality for the MALA parts of the proposal. As a compromise, in our proposal we scale the matrix $\Xi_\theta$ by a constant factor $c$ and the proposal covariance matrix by a single adaptive $h$. In the examples described in Section 5 we used a value of $c = 0.4$, which appears to work well across a range of scenarios.

5. APPLICATIONS

5.1 Smoothing a Spatial Point Pattern

The intensity, $\lambda(x)$, of an inhomogeneous spatial point process is the unique nonnegative valued function such that the expected number of points of the process, called events, that fall within any spatial region $B$ is

$$\mu(B) = \int_B \lambda(x)\,dx. \tag{22}$$

Suppose that we wish to estimate $\lambda(x)$ from a partial realisation consisting of all of the events of the process that fall within a region $A$, hence, $X = \{x_i \in A : i = 1,\ldots,n\}$. Figure 1 shows an example in which the data are the locations of 703 hickory trees in a 19.6 acre (281.6 by 281.6 metre) square region $A$ (Gerrard (1969)), which we have re-scaled to be of dimension 100 by 100.

Fig. 1. Locations of 703 hickories in a 19.6 acre square plot, re-scaled to 100 by 100 units (Gerrard (1969)).

An intuitively reasonable class of estimators for $\lambda(x)$ is obtained by counting the number of events that lie within some fixed distance, $h$, say, of $x$ and dividing by $\pi h^2$ or, to allow for edge-effects, by the area, $B(x; h)$, of the intersection of $A$ and a circular disc with centre $x$ and radius $h$, hence,

$$\tilde{\lambda}(x; h) = B(x; h)^{-1} \sum_{i=1}^{n} I(\|x - x_i\| \le h). \tag{23}$$

This estimate is, in essence, a simple form of bivariate kernel smoothing with a uniform kernel function (Silverman (1986)). Berman and Diggle (1989) derived the mean square error of (23) as a function of $h$ under the assumption that the underlying point process is a stationary Cox process. They then showed how to estimate, and thereby approximately minimise, the mean square error without further parametric assumptions.

A different way to formalise the smoothing problem is as a prediction problem associated with the log-Gaussian Cox process, (3). In this formulation, $\Lambda(x) = \exp\{\beta + S(x)\}$, where $S(\cdot)$ is a stationary Gaussian process indexed by a parameter $\theta$ and the target for prediction is $\Lambda(x)$. The formal solution is the predictive distribution of $\Lambda(\cdot)$ given $X$. For a smooth estimate, analogous to (23), we take $\hat{\lambda}(x)$ to be a suitable summary of the predictive distribution, for example, its point-wise expectation or median. This is still a nonparametric solution, in the sense that no parametric form is specified in advance for $\hat{\lambda}(x)$. The parameterisation of the Gaussian process $S(\cdot)$ is the counterpart of the choices made in the kernel estimation approach, namely, the specification of the uniform kernel in (23) and the value of the bandwidth, $h$.

For this application, we specify that $S(\cdot)$ has mean $-0.5\sigma^2$, variance $\sigma^2$ and exponential correlation function, $r(u) = \exp(-u/\phi)$, hence, $\theta = (\sigma^2, \phi)$. We conduct Bayesian predictive inference using MCMC methods implemented in an extension of the R package lgcp (Taylor et al., 2013). For $\beta$ we chose a diffuse prior, $\beta \sim N(0, 10^6)$. For $\sigma$ and $\phi$, we chose Normal priors on the log scale: $\log\sigma \sim N(\log(1), 0.15)$ and $\log\phi \sim N(\log(10), 0.15)$. We initialised the MCMC as follows. For $\sigma$ and $\phi$, we minimised

$$\int_0^{25} \left(\hat{K}(r)^{0.25} - K(r; \sigma, \phi)^{0.25}\right)^2 dr,$$

where $K(r; \sigma, \phi)$ is the K-function of the model and $\hat{K}(r)$ is Ripley's estimate (Ripley 1976, 1977), resulting in initial values of $\sigma = 0.50$ and $\phi = 12.66$. The initial value of $\Gamma$ was set to a $256 \times 256$ matrix of zeros and $\beta$ was initialised using estimates from an overdispersed Poisson generalised linear model fitted to the cell counts, ignoring spatial correlation.
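The uniform-kernel estimator (23) can be illustrated with a short sketch. The code below is not the authors' implementation: the function names, the assumption that $A$ is the square $[0, \text{side}]^2$, and the midpoint-rule approximation to the edge-corrected area $B(x; h)$ are all choices made here for illustration.

```python
import math


def disc_area_in_square(x, y, h, side, n=400):
    """Approximate B(x; h): area of the disc of radius h centred at (x, y)
    that lies inside the square [0, side] x [0, side].

    Uses a midpoint rule on an n-by-n grid over the disc's bounding box
    (an illustrative numerical shortcut, not an exact formula).
    """
    step = 2.0 * h / n
    area = 0.0
    for i in range(n):
        u = x - h + (i + 0.5) * step
        if u < 0.0 or u > side:
            continue  # grid column lies outside the region A
        for j in range(n):
            v = y - h + (j + 0.5) * step
            inside_square = 0.0 <= v <= side
            inside_disc = (u - x) ** 2 + (v - y) ** 2 <= h * h
            if inside_square and inside_disc:
                area += step * step
    return area


def uniform_kernel_intensity(events, x, y, h, side):
    """Edge-corrected estimate in the spirit of (23): the number of events
    within distance h of (x, y), divided by the area B(x; h)."""
    count = sum(1 for (ex, ey) in events if math.hypot(ex - x, ey - y) <= h)
    return count / disc_area_in_square(x, y, h, side)
```

Near a corner of the square only about a quarter of the disc lies inside $A$, so dividing by $B(x; h)$ rather than $\pi h^2$ increases the estimate roughly fourfold there, which is the purpose of the edge correction.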

Fig. 2. Prior (continuous curve) and posterior (histogram) distributions for the parameters β, σ and φ in the LGCP model for the hickory data.

For the MCMC, we used a burn-in of 100,000 iterations followed by a further 900,000 iterations, of which we retained every 900th iteration so as to give a weakly dependent sample of size 1000. Convergence and mixing diagnostics are shown in the supplementary material [Diggle et al. (2013)]. Figure 2 compares the prior and posterior distributions of the three model parameters showing, in particular, that the data give only weak information about the correlation range parameter, $\phi$. This is well known in the classical geostatistical context where the data are measured values of $S(x)$ (see, e.g., Zhang (2004)), and is exacerbated in the point process setting.

The left plot in Figure 3 shows the pointwise 50th percentiles of the predictive distribution for the target, $\Lambda(x)$, over the observation window; this clearly identifies the pattern of the spatial variation in the intensity. The LGCP-based solution also enables us to map areas of particularly low or high intensity. The middle and right plots in Figure 3 are maps of $P\{\exp[S(x)] < 1/2\}$ and $P\{\exp[S(x)] > 2\}$. The areas in these plots where the posterior probabilities are high correspond, respectively, to areas where the density of trees is less than half and more than double the mean density.

The LGCP-based solution to the smoothing problem is arguably over-elaborate by comparison with simpler methods such as kernel smoothing. Against this, arguments in its favour are that it provides a principled rather than an ad hoc solution, probabilistic prediction rather than point prediction, and an obvious extension to smoothing in the presence of explanatory variables by specifying $\Lambda(x) = \exp\{u(x)'\beta + S(x)\}$, where $u(x)$ is a vector of spatially referenced explanatory variables.
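As a minimal sketch of how the summaries mapped in Figure 3 can be obtained from retained MCMC draws, the code below computes, for a single grid cell, the posterior median of $\Lambda(x) = \exp\{\beta + S(x)\}$ and the two tail probabilities for $\exp[S(x)]$. The function names and the list-based data layout are assumptions made here; they are not taken from the lgcp package.

```python
import math
import statistics


def retained_draws(n_iterations_after_burn_in, thin):
    """Number of draws kept when every `thin`-th iteration is retained."""
    return n_iterations_after_burn_in // thin


def summarise_cell(beta_draws, s_draws):
    """Posterior summaries for one grid cell from paired draws of beta and S(x).

    Returns the median of exp(beta + S) and the estimated probabilities
    that exp(S) is below 1/2 or above 2.
    """
    intensity = [math.exp(b + s) for b, s in zip(beta_draws, s_draws)]
    relative = [math.exp(s) for s in s_draws]
    n = len(relative)
    return {
        "median_intensity": statistics.median(intensity),
        "p_low": sum(r < 0.5 for r in relative) / n,
        "p_high": sum(r > 2.0 for r in relative) / n,
    }
```

With the run lengths quoted above, `retained_draws(900000, 900)` gives the sample size of 1000; applying `summarise_cell` to every cell of the computational grid would give the three surfaces.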

Fig. 3. Left: 50% posterior percentiles of Λ(x) = exp{β + S(x)} for the hickory data. Middle: plot of posterior P{exp[S(x)] < 1/2}. Right: plot of posterior P{exp[S(x)] > 2}. Middle and right plots also show the locations of the trees.

5.2 Spatial Segregation: Genotypic Diversity of Bovine Tuberculosis in Cornwall, UK

Our second application concerns a multivariate version of the smoothing problem described in Section 5.1. Events are now of $k$ types, hence, the data are $X = \{X_j : j = 1,\ldots,k\}$, where $X_j = \{x_{ij} \in A : i = 1,\ldots,n_j\}$ and the corresponding intensity functions are $\lambda_j(x) : j = 1,\ldots,k$. Write $\lambda(x) = \sum_{j=1}^{k} \lambda_j(x)$ for the intensity of the superposition. Under the additional assumption that the underlying process is an inhomogeneous Poisson process, conditional on the superposition, the labellings of the events are a sequence of independent multinomial trials with position-dependent multinomial probabilities,

$$p_j(x) = \lambda_j(x)/\lambda(x) = P(\text{event at location } x \text{ is of type } j), \quad j = 1,\ldots,k.$$

A basic question for any set of multivariate point process data is whether the type-specific component processes are independent. When they are not, further questions of interest are context-specific. Here, we describe an analysis of data relating to bovine tuberculosis in the county of Cornwall, UK.

Bovine tuberculosis (BTB) is a serious disease of cattle. It is endemic in parts of the UK. As part of the national control strategy, herds are regularly inspected for BTB. When disease in a herd is detected and at least one tuberculosis bacterium is successfully cultured, the genotype that is responsible for the BTB breakdown can be determined. Here, we re-visit an example from Diggle, Zheng and Durr (2005) in which the events are the locations of cattle herds in the county of Cornwall, UK, that have tested positive for BTB over the period 1989 to 2002, labelled according to their genotypes. The data, shown in Figure 4, are limited to the 873 locations with the four most common genotypes; six less common genotypes accounted for an additional 46 cases.

Fig. 4. Locations of cattle herds in Cornwall, UK, that have tested positive for bovine tuberculosis (BTB) over the period 1989 to 2002. Points are coded according to the genotype of the infecting BTB organism.

The question of primary interest in this example is whether the genotypes are randomly intermingled amongst the locations and, if not, to what extent specific genotypes are spatially segregated. This question is of interest because the former would be consistent with the major transmission mechanism being cross-infection during the county-wide movement of animals to and from markets, whereas the latter would be indicative of local pools of infection, possibly involving transmission between cattle and reservoirs of infection in local wildlife populations (Woodroffe et al., 2005; Donnelly et al. (2006)).

To model the data, we consider a multivariate log-Gaussian Cox process with

$$\Lambda_k(x) = \exp(\beta_k + S_0(x) + S_k(x)), \quad k = 1,\ldots,m. \tag{24}$$

In (24), $m = 4$ is the number of genotypes, the parameters $\beta_k$ relate to the intensities of the component processes, $S_0(x)$ is a Gaussian process common to all types of points and the $S_k(x) : k = 1,\ldots,m$ are Gaussian processes specific to each genotype. Although $S_0(x)$ is not identifiable from our data without additional assumptions, its inclusion helps the interpretation of the model, in particular, by emphasising that the component intensities $\Lambda_k(x)$ are not mutually independent processes.

In this example, we used informative priors for the model parameters: $\log\sigma \sim N(\log 1.5, 0.015)$, $\log\phi \sim N(\log 15{,}000, 0.015)$ and $\beta_k \sim N(0, 10^6)$. Because the algorithm mixes slowly, this proved to be a very challenging computational problem. For the MCMC, we used a burn-in of 100,000 iterations followed by a further 18,000,000 iterations, of which we retained every 18,000th iteration so as to give a sample of size 1000. Convergence, mixing diagnostics and plots of the prior and posterior distributions of $\sigma$ and $\phi$ are shown in the supplementary material [Diggle et al. (2013)]. These plots show that the chain appeared to have reached stationarity with low autocorrelation in the thinned output. The plots also illustrate that there is little information in the data on $\sigma$ and $\phi$.

Fig. 5. Genotype-specific probability surfaces for the Cornwall BTB data. Upper-left panel corresponds to genotype 9, upper-right to genotype 12, lower-left to genotype 15, lower-right to genotype 20.
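Under a multivariate model of the form (24), the probability that a case at location $x$ is of type $k$ is $\Lambda_k(x) / \sum_j \Lambda_j(x)$, in which the common component $S_0(x)$ cancels. The sketch below checks this numerically at a single location. The function name and arguments are illustrative assumptions, and the max-subtraction is a standard numerical device added here, not something described in the text.

```python
import math


def type_probabilities(beta, s_specific, s_common=0.0):
    """Conditional type probabilities at one location under a model of form (24).

    beta       : list of type-specific intercepts beta_k
    s_specific : list of type-specific field values S_k(x)
    s_common   : value of the shared field S_0(x); it cancels in the ratio
    """
    log_intensity = [b + s_common + s for b, s in zip(beta, s_specific)]
    shift = max(log_intensity)  # subtract the maximum for numerical stability
    weights = [math.exp(v - shift) for v in log_intensity]
    total = sum(weights)
    return [w / total for w in weights]
```

Calling the function twice with different values of `s_common` returns the same probabilities, which is why the genotype-specific probability surfaces can be predicted even though $S_0(x)$ is not identifiable.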

Within (24) the hypothesis of randomly intermingled genotypes corresponds to $S_k(x) = 0 : k = 1,\ldots,4$, for all $x$. Were it the case that farms were uniformly distributed over Cornwall, $S_0(x)$ would then represent the spatial variation in the overall risk of BTB, irrespective of genotype. Otherwise, $S_0(x)$ conflates spatial variation in overall risk with the spatial distribution of farms. For the Cornwall BTB data the evidence against randomly intermingled genotypes is overwhelming and we focus our attention on spatial variation in the probability that a case at location $x$ is of type $k$, for each of $k = 1,\ldots,4$. These conditional probabilities are

$$p_k(x) = \frac{\Lambda_k(x)}{\sum_{j=1}^{m} \Lambda_j(x)} = \frac{\exp\{\beta_k + S_k(x)\}}{\sum_{j=1}^{m} \exp\{\beta_j + S_j(x)\}}$$

and do not depend on the unidentifiable common component $S_0(x)$. Figure 5 shows point predictions of the four genotype-specific probability surfaces, defined as the conditional expectations $\hat{p}_k(x) = E[p_k(x) | X]$ for each of $k = 1,\ldots,4$.

As argued earlier, one advantage of a model-based approach to spatial smoothing is that results can be presented in ways that acknowledge the uncertainty on the point predictions. We could replace each panel of Figure 5 by a set of percentile plots, as in Figure 3. For an alternative display that focuses more directly on the core issue of spatial segregation, let $A_k(c, q)$ denote the set of locations $x$ for which $P\{p_k(x) > c | X\} > q$. As $c$ and $q$ both approach 1, each $A_k(c, q)$ shrinks towards the empty set, but more slowly in a highly segregated pattern than in a weakly segregated one. In Figure 6 we show the areas $A_k(0.8, q)$ for each of $q = 0.6, 0.7, 0.8$ and $0.9$. Genotype 9, which contributes 494 to the total of 873 cases, dominates strongly in an area to the east and less strongly in a smaller area to the west. Genotype 15 contributes 166 cases and dominates in a single, central area. Genotypes 12 and 20 each contribute a proportion of approximately 0.12 to the total, with only small pockets of dominance to the south-west.

Fig. 6. k-dominant areas for each of the four genotypes in the Cornwall data.

If infection times were known, we could perform inference via MCMC under a spatio-temporal version of the model,

$$\Lambda_k(x,t) = \exp(Z_k(x,t)\beta_k + S_0(x,t) + S_k(x,t)), \quad k = 1,\ldots,m,$$

with $\Lambda_k(x,t)$ and $S_k(x,t)$, for $k = 0,\ldots,m$, spatio-temporal versions of the purely spatial processes in (24) and $Z_k(x,t)$ a vector of spatio-temporal covariates. Unlike purely spatial models, spatio-temporal models are potentially able to investigate mechanistic hypotheses about disease transmission. For example, in the context of this example a spatio-temporal analysis could distinguish between segregated patches that are stable over time and those that grow from initially isolated cases.

5.3 Disease Atlases

Figure 7 is a typical example of the kind of map that appears in a variety of cancer atlases. This example is taken from a Spanish national disease atlas project (López-Abente et al., 2006). The map estimates the spatial variation in the relative risk of lung cancer in the Castile-La Mancha Region of Spain and some surrounding areas. It is of a type known to geographers as a choropleth map, in which the geographical region of interest, $A$, is partitioned into a set of subregions $A_i$ and each subregion is colour-coded according to the numerical value of the quantity of interest. The standard statistical methodology used to convert data on case-counts and the number of people at risk in each subregion into such a map is the following hierarchical Poisson-Gaussian Markov random field model, due to Besag, York and Mollié (1991).

Let $Y_i$ denote the number of cases in subregion $A_i$ and $E_i$ a standardised expectation computed as the expected number of cases, taking into account the demographics of the population in subregion $A_i$ but assuming that risk is otherwise spatially homogeneous. Assume that the $Y_i$ are conditionally independent and Poisson-distributed given a latent random vector $S = (S_1,\ldots,S_m)$, with conditional means $\mu_i = E_i \exp(\alpha + S_i)$. Finally, assume that $S$ is multivariate Gaussian, with its distribution specified as a Gaussian Markov random field (Rue and Held (2005)). A Markov random field is a multivariate distribution specified indirectly by its full conditionals, $[S_i | S_j : j \ne i]$. In the Besag, York and Mollié (1991) model the full conditionals take the so-called intrinsic autoregressive form,

$$S_i | S_j : j \ne i \sim N(\bar{S}_i, \tau^2/n_i), \tag{25}$$

where $\bar{S}_i = n_i^{-1} \sum_{j \sim i} S_j$ is the mean of the $S_j$ over subregions $A_j$ considered to be neighbours of $A_i$ and $n_i$ is the number of such neighbours. Typically, subregions are defined to be neighbours if they share a common boundary.
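The intrinsic autoregressive full conditional (25) is simple to compute once a neighbourhood structure is fixed. The following sketch uses a hypothetical adjacency dictionary for four subregions arranged in a line; the function name and data layout are illustrative only.

```python
def icar_full_conditional(i, s, neighbours, tau2):
    """Mean and variance of S_i given all other S_j under (25).

    s          : list of current values S_1, ..., S_m (0-indexed here)
    neighbours : dict mapping each subregion index to a list of its neighbours
    tau2       : the variance parameter tau^2
    """
    nb = neighbours[i]
    n_i = len(nb)
    mean = sum(s[j] for j in nb) / n_i  # average over neighbouring subregions
    variance = tau2 / n_i  # more neighbours give a tighter conditional
    return mean, variance


# Hypothetical example: four subregions in a row, each adjacent to the next.
line_neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

The dependence of the conditional variance on $n_i$ shows how strongly the implied model is tied to the chosen partition: a subregion with many small neighbours is smoothed more heavily than one with a single large neighbour.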

Fig. 7. Lung Cancer mortality in the Castile-La Mancha Region of Spain. Figure reproduced from page 42 of López-Abente et al. (2006) by kind permission of the authors.

An alternative approach is to model the locations of individual cancer cases as an LGCP with intensity $\Lambda(x) = d(x)R(x)$, where $d(x)$ represents population density, assumed known, and $R(x)$ denotes disease risk, $R(x) = \exp\{S(x)\}$. Conditional on $R(\cdot)$, case-counts in subregions $A_i$ are independent and Poisson-distributed with means

$$\mu_i = \int_{A_i} d(x)R(x)\,dx.$$

This approach leads to spatially smooth risk-maps whose interpretation is independent of the particular partition of $A$ into subregions $A_i$. This is an important consideration when the $A_i$ differ greatly in size and shape, as the definition of neighbours in an MRF model then becomes problematic; see, for example, Wall (2004). Fitting a spatially continuous model also has the potential to add information to an analysis of aggregated data, for example, when data on environmental risk-factors are available at high spatial resolution. A caveat is that the population density may only be available in the form of small-area population counts, implying a piece-wise constant surface $d(x)$ that can only be a convenient fiction. Note, however, that spatially continuous modelled population density maps have been constructed and are freely available; see, for example, http://sedac.ciesin.columbia.edu/data/set/gpw-v3-population-density.

For the Spanish lung cancer data, we have covariate information available at small-area level, which we incorporate by fitting the model

$$\Lambda(x) = d(x)\exp\{z(x)'\beta + S(x)\}, \tag{26}$$

treating the covariate surfaces $z(x)$ as piece-wise constant.

For Bayesian inference under the continuous model (26) we follow Li et al. (2012) by adding standard data augmentation techniques to the MCMC fitting algorithm described earlier. Recall that for computational purposes, we perform all calculations on a fine grid, treating the cell counts in each grid cell as Poisson distributed conditional on the latent process $S(\cdot)$. Provided the computational grid is fine enough, each $A_i$ can be approximated by the union of a set of grid cells, and we can use a grid-based Gibbs sampling strategy, repeatedly sampling first from $[S, \beta, \theta | N, Y_+] = [S, \beta, \theta | N]$ and then from $[N | S, \beta, \theta, Y_+]$, where $N$ are the cell counts on the computational grid, $Y_+ = \{Y_i = \sum_{x \in A_i} N(x) : i = 1,\ldots,m\}$ and $\theta$ parameterises the covariance structure of $S$. Sampling from the first of these densities can be achieved using a Metropolis-Hastings update as discussed in Section 4. The second density is a multinomial distribution and poses no difficulty.

Our priors for this example were as follows: $\log\sigma \sim N(\log 1, 0.3)$, $\log\phi \sim N(\log 3000, 0.15)$ and $\beta \sim \mathrm{MVN}(0, 10^6 I)$. For the MCMC algorithm, we used a burn-in of 100,000 iterations followed by a further 18,000,000 iterations, of which we retained every 18,000th iteration so as to give a sample of size

Table 1. Selected quantiles of the posterior distributions of standardised covariate effects for the Spanish lung cancer data

                                              Quantile
Parameter                                  0.50    0.025   0.975
Percentage illiterate                      1.13    1.03    1.24
Percentage unemployed                      0.92    0.8     1.03
Percentage farmers                         0.88    0.76    1.00
Percentage of people over 65 years old     1.2     0.96    1.51
Income index                               1.19    1.03    1.39
Average number of people per home          0.98    0.75    1.26

1000. Convergence, mixing diagnostics and plots of the prior and posterior distributions of $\sigma$ and $\phi$ are shown in the supplementary material [Diggle et al. (2013)]. As in the Cornwall BTB analysis, these plots indicated convergence to the stationary distribution and low autocorrelation in the thinned output. In the analysis reported here, we base our offset on modelled population data at 100 metre resolution obtained from the European Environment Agency; see http://www.eea.europa.eu/data-and-maps/data/population-density-disaggregated-with-corine-land-cover-2000-2. We projected this very fine population information onto our computational grid, which consisted of cells $3100 \times 3100$ metres in dimension. We used an exponential model for the covariance function of $S(\cdot)$ and estimated its parameters (posterior median and 95% credible interval) to be $\sigma = 1.57$ $(1.45, 1.71)$ and $\phi = 1294$ $(814, 1849)$ metres. Figure 8 illustrates the shape of the posterior covariance function; it can be seen from this plot that the posterior dependence between cells is over a relatively small range.

Fig. 8. Posterior covariance function.

Table 1 summarises our estimation of covariate effects. Our results show that estimated (posterior median) mortality rates were higher in areas with higher rates of illiteracy and higher income; these effects were statistically significant at the 5% level, in the sense that the Bayesian 95% credible intervals excluded zero on the log scale (equivalently, the intervals for the multiplicative effects reported in Table 1 excluded one). The remaining covariates (unemployment, percentage farmers, percentage of people over 65 and average number of people per home) had a protective effect, but only significantly so in the case of percentage farmers.

Figure 9 shows the resulting maps. The top left-hand panel shows the predicted, covariate-adjusted relative risk surface derived from the log-Gaussian Cox process model (26). This predicted relative risk surface reveals several small areas of raised risk that are not apparent in Figure 7. The top right-hand panel shows the log of the estimated variance of relative risk. To account for this variation, we produced a plot of the posterior probability that relative risk exceeds 1.1, shown in the bottom panel. This shows that higher rates of incidence appear to be mainly confined to a number of small townships, the largest of which is an area to the north of Toledo and surrounding the Illescas municipality, where there are a number of contiguous cells for which the probability exceeds 0.6.

We acknowledge that this is an illustrative example. In particular, we cannot guarantee the reliability of the estimate of population density used as an offset.

In a discussion of Markov models for spatial data, Wall (2004) investigated properties of the covariance structure implied by the simultaneous and conditional autoregressive models on an irregular lattice. She concluded that the "implied spatial correlation [between cells in these] models does not seem to follow an intuitive or practical scheme" and advised that "other ways of modelling lattice data . . . should be considered, especially when there is interest in understanding the spatial structure". Our approach is one such. Others, which we discuss in Section 7, include proposals in Best, Ickstadt and Wolpert (2000) and Kelsall and Wakefield (2002).
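To make the statement about short-range dependence concrete, the sketch below evaluates an exponential covariance function, $\sigma^2 \exp(-u/\phi)$, at the posterior medians quoted above ($\sigma = 1.57$, $\phi = 1294$ metres). The function names are illustrative, and the 0.05 cut-off used to define a practical range is a common convention adopted here as an assumption, not a quantity taken from the paper.

```python
import math


def exp_correlation(u, phi):
    """Exponential correlation r(u) = exp(-u / phi) at distance u."""
    return math.exp(-u / phi)


def exp_covariance(u, sigma, phi):
    """Covariance sigma^2 * r(u) for a stationary process with variance sigma^2."""
    return sigma ** 2 * exp_correlation(u, phi)


def practical_range(phi, level=0.05):
    """Distance at which the exponential correlation falls to `level`."""
    return -phi * math.log(level)
```

At a separation of one grid cell (3100 metres), `exp_correlation(3100.0, 1294.0)` is about 0.09, so under these parameter values the latent field in adjacent cells is only weakly correlated.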

Fig. 9. Lung Cancer mortality in the Castile-La Mancha Region of Spain. The top left panel shows covariate-adjusted relative risk. The range of values was restricted to lie between 0.5 and 1.5 to allow comparison with Figure 7. Inside the Castile-La Mancha region, cells with mean relative risk greater than 1.5 appear dark red and cells with relative risk below 0.5 appear white. The top right panel shows the log of the estimated variance of relative risk. The bottom panel shows the predictive probability that the covariate-adjusted relative risk exceeds 1.1.
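The data augmentation step used when fitting model (26) to aggregated counts relies on a standard fact: independent Poisson cell counts, conditioned on their total, follow a multinomial distribution with probabilities proportional to the cell means. The sketch below illustrates that single step for one subregion. The function name, the list of cell means and the use of Python's `random.choices` are assumptions for illustration, not the authors' code.

```python
import random


def split_region_count(total, cell_means, rng):
    """Draw grid-cell counts for one subregion given its observed total.

    total      : observed count Y_i for the subregion
    cell_means : current Poisson means of the grid cells making up A_i
    rng        : a random.Random instance, passed in for reproducibility
    """
    counts = [0] * len(cell_means)
    # Allocate each case to a cell with probability proportional to its mean.
    for cell in rng.choices(range(len(cell_means)), weights=cell_means, k=total):
        counts[cell] += 1
    return counts
```

A call such as `split_region_count(50, [1.0, 0.0, 3.0], random.Random(1))` returns three counts that sum to 50, with no cases allocated to the cell whose mean is zero.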

Our spatially continuous formulation does not entirely rescue us from the trap of the ecological fallacy (Piantadosi, Byar and Green, 1988; Greenland and Morgenstern (1990)). In a spatial context, this refers to the fact that the association between a risk-factor and a health outcome need not be, and usually is not, independent of the spatial scale on which the risk-factor and outcome variables are defined. In our example, we have to accept that treating covariate surfaces as if they were piece-wise constant is a convenient fiction. However, our methodology avoids any necessity to aggregate all covariate and outcome variables to a common set of spatial units, but rather operates at the fine resolution of the computational grid. In effect, this enables us to place a spatially continuous interpretation on any parameters relating to continuously measured components of the model, whether covariates or the latent stochastic process $S(x)$.

6. SPATIO-TEMPORAL LOG-GAUSSIAN COX PROCESSES

6.1 Models

A spatio-temporal LGCP is defined in the obvious way, as a spatio-temporal Poisson point process conditional on the realisation of a stochastic intensity function $\Lambda(x,t) = \exp\{S(x,t)\}$, where $S(\cdot)$ is a Gaussian process. Gneiting and Guttorp (2010b) review the literature on formulating models for spatio-temporal Gaussian processes. They make a useful distinction between physically motivated constructions and more empirical formulations. An example of the former is given in Brown et al. (2000), who propose models based on a physical dispersion process. In discrete time, with $\delta$ denoting the time-separation between successive realisations of the spatial field, their model takes the form

$$S(x,t) = \int h_\delta(u)\,S(x - u, t - \delta)\,du + Z_\delta(x,t), \tag{27}$$

where $h_\delta(\cdot)$ is a smoothing kernel and $Z_\delta(\cdot)$ is a noise process, in each case with parameters that depend on the value of $\delta$ in such a way as to give a consistent interpretation in the spatio-temporally continuous limit as $\delta \to 0$.

Amongst empirical spatio-temporal covariance models, a basic distinction is between separable and nonseparable models. Suppose that $S(x,t)$ is stationary, with variance $\sigma^2$ and correlation function $r(u, v) = \mathrm{Corr}\{S(x,t), S(x - u, t - v)\}$. In a separable model, $r(u, v) = r_1(u)r_2(v)$, where $r_1(\cdot)$ and $r_2(\cdot)$ are spatial and temporal correlation functions. The separability assumption is convenient, not least because any valid specification of $r_1(u)$ and $r_2(v)$ guarantees the validity of $r(u, v)$, but it is not especially natural. Parametric families of nonseparable models are discussed in Cressie and Huang (1999), Gneiting (2002), Ma (2003, 2008) and Rodrigues and Diggle (2010).

As noted by Gneiting and Guttorp (2010b), whilst spatio-temporally continuous processes are, in formal mathematical terms, simply spatially continuous processes with an extra dimension, from a scientific perspective models need to reflect the fundamentally different nature of space and time, and, in particular, time's directional quality. For this reason, in applications where data arise as a set of spatially indexed time-series, a natural way to formulate a spatio-temporal model is as a multivariate time series whose cross-covariance functions are spatially structured. For example, a spatially discrete version of (27) on a finite set of spatial locations $x_i : i = 1,\ldots,n$ and integer times $t$ would be

$$S_{it} = \sum_{j=1}^{n} h_{ij} S_{j,t-1} + Z_{it}, \tag{28}$$

where the $h_{ij}$ are functions of the corresponding locations, $x_i$ and $x_j$. For a review of models of this kind, see Gamerman (2010).

6.2 Spatio-Temporal Prediction: Real-Time Monitoring of Gastrointestinal Disease

An early implementation of spatio-temporal log-Gaussian process modelling was used in the AEGISS project (Ascertainment and Enhancement of Gastroenteric Infection Surveillance Statistics, see http://www.maths.lancs.ac.uk/~diggle/Aegiss/day.html%3fyear=2002). The overall aim of the project was to investigate how health-care data routinely collected within the UK's National Health Service (NHS) could be used to spot outbreaks of gastrointestinal disease. The project is described in detail in Diggle et al. (2003), whilst Diggle, Rowlingson and Su (2005) give details of the spatio-temporal statistical model.

As part of the government's modernisation programme for the NHS, the nonemergency NHS Direct telephone service was launched in the late 1990s, and by 2000 was serving all of England and Wales (http://www.nhsdirect.nhs.uk/About/WhatIsNHSDirect/History). Callers to this 24-hour system were questioned about their problem and advised accordingly. This process reduced calls to an "algorithm code" which was a broad classification of the problem. Basic information on the caller, including age, sex and postal code, was also recorded. Cooper and Chinemana (2004) give a more detailed description of the NHS Direct system. Mark and Shepherd (2004) analyse its impact on the demand for primary care in the UK. Cooper et al. (2003) report a retrospective analysis of 150,000 calls to NHS Direct classified as diarrhoea or vomiting, and conclude that fluctuations in the rate of such calls could be a useful proxy for monitoring the incidence of gastrointestinal illness.

In the AEGISS project, residential postal codes associated with calls classified as relating to diarrhoea or vomiting were converted to grid references using a lookup table. Postal codes at this level are referenced to 100 metre precision, which on the scale of the study area (the county of Hampshire) is effectively continuous. The data then formed a spatio-temporal point pattern.

The daily extraction of data for Hampshire and the location coding were done by the NHS at Southampton. These data were encrypted and sent by

email to Lancaster, where the emails were automatically filtered, decrypted and stored. An overnight run of the MALA algorithm described in Brix and Diggle (2001) took the latest data and produced maps of predictive probabilities for the risk exceeding multiples 2, 4 and 8 of the baseline rate.

The specification of the model, based on an exploratory analysis of the data, was a spatio-temporal LGCP with intensity

$$\Lambda(x,t) = \lambda_0(x)\mu_0(t)\exp\{S(x,t)\}.$$

The spatial baseline component, $\lambda_0(x)$, was calculated by a kernel smoothing of the first two years of case locations, whilst the temporal baseline, $\mu_0(t)$, was obtained by fitting a standard Poisson regression model to the counts over time. This regression model included an annual seasonal component, a factor representing the day-of-the-week and a trend term to represent the increasing take-up of the NHS Direct service during the life-time of the project.

The parameters of $S(x,t)$ were then estimated using moment-based methods, as in Brix and Diggle (2001), with a separable correlation structure. Uncertainty in these parameter estimates was considered to have a minimal effect on the predictive distribution of $S(x,t)$ because parameter estimates are informed by all of the data, whereas prediction of $S(x,t)$ given the model parameters benefits only from data points that lie close to $(x,t)$, that is, within the range of the spatio-temporal correlation.

Plug-in predictive inference was then performed using the MALA algorithm on each new set of data arriving overnight. Instead of storing the outputs from each of 10,000 iterations, only a count of where $S(x,t)$ exceeded a threshold that corresponded to 2, 4 or 8 times the baseline risk was retained. This range of thresholds was chosen in consultation with clinicians; a doubling of risk was considered of possible interest, whilst an eightfold increase was considered potentially serious. These exceedence counts were then converted into exceedence probabilities.

Presentation of these exceedence maps was an important aspect of the AEGISS project. At the time, there were few implementations of maps on the internet: UMN MapServer was released as open source in 1997 and the Google Maps service started in 2005. A simpler approach was used where static images of the exceedence probabilities were generated by R's graphics system. Regions where the exceedence probability was higher than 0.9 were outlined with a box and displayed in a zoomed-in version below the main graphic. Other page controls enabled the user to select the threshold value as 2, 4 or 8, and to select a day or month. A traffic light system of green, amber and red warnings dependent on the severity of exceedence threshold crossings was developed for rapid assessment of conditions on any particular day. The left-hand panel of Figure 10 shows a day where two clusters of grid cells show high predictive probability of at least a doubling of risk relative to baseline.

Fig. 10. AEGISS web page design. Left-hand panel shows the original design, right-hand panel a modern redesign.

With modern web-based technologies the user interface could be constructed as a dynamic web-mapping system that would allow the user freely to navigate the study region. Layers of information, such as cases or exceedence probability maps, can then be selected by the user as overlays. The right-hand panel of Figure 10 shows the same day as the left-hand panel, but uses the OpenLayers (http://www.openlayers.org) web-mapping toolkit to superimpose the cases and risk surface on a base map composed of data from OpenStreetMap (http://www.openstreetmap.org). This also shows the layer selector menu for further customisation.

Increases in computing power and algorithmic advances mean that longer MCMC runs can be performed overnight or on finer spatial resolutions. However, increasing ethical concerns over data use and patient confidentiality mean that finely resolved spatio-temporal data are becoming harder to obtain. Recent changes in the organisation of the NHS 24-hour telephone helpline have meant that several providers will now be responsible for regional services contributing to a new system, NHS111 (http://www.nhs.uk/111). AEGISS was originally conceived as a pilot project that could be rolled out to all of the UK, but obtaining data from all the new providers and dealing with possible systematic differences between them in order to perform a statistically rigorous analysis is now more challenging. The future of health surveillance systems may lie in the use of multivariate spatio-temporal models to combine information from multiple data streams including nontraditional proxies for health outcomes, such as nonprescription medicine sales, counts of key words and phrases used in search engine queries, and text-mining of social media sites.

7. DATA SYNTHESIS: INTEGRATED ANALYSIS OF EXPOSURE AND HEALTH OUTCOME DATA AT MULTIPLE SPATIAL SCALES

The ubiquitous problem of dealing with exposure and health outcome data recorded at disparate spatial scales is known to geographers as the "modifiable areal unit problem." See, for example, the reviews by Gotway and Young (2002) and Dark and Bram (2007). In the statistical literature, a more common term is "spatial misalignment." See, for example, Gelfand (2010). Several authors have considered special cases of this problem in an epidemiological setting. Mugglin, Carlin and Gelfand (2000) deal with data in the form of disease counts on a partition of the region of interest, $A$, into a discrete set of subregions, $A_i$, together with covariate information on a different partition, $B_j$, say. Their solution is based on creating a single, finer partition that includes all nonzero intersections $A_i \cap B_j$. Best, Ickstadt and Wolpert (2000) also consider count data on a discrete partition of $A$, but assume that covariate information on a risk factor of interest is available throughout $A$. They consider count data to be derived from an underlying Cox process whose intensity varies in a spatially continuous manner through the combination of a covariate effect and a latent stochastic process modelled as a kernel-smoothed gamma random field. They then derive the distribution of the observed counts by spatial integration over the $A_i$. Kelsall and Wakefield (2002) take a similar approach, but using a log-Gaussian latent stochastic process rather than a gamma random field. The technical and computational issues that arise when handling spatial integrals of stochastic processes can be simplified by using low-rank models, such as the class of Gaussian predictive process models proposed by Banerjee et al. (2008) and further developed by Finley et al. (2009). Gelfand (2012) gives a useful summary of this and related work.

All of these approaches can be subsumed within a single modelling framework for multiple exposures and disease risk by considering these as a set of spatially continuous processes, irrespective of the spatial resolution at which data elements are recorded. For example, a model for the spatial association between disease risk, $R(x)$, and $p$ exposures $T_k(x) : k = 1,\ldots,p$ can be obtained by treating individual case-locations as an LGCP with intensity

$$R(x) = \exp\left\{\alpha + \sum_{k=1}^{p} \beta_k T_k(x) + S(x)\right\}, \tag{29}$$

where $S(x)$ denotes stochastic variation in risk that is not captured by the $p$ covariate processes $T_k(x)$. The inferential algorithms associated with model (29) would then depend on the structure of the available data.

Suppose, for example, that health outcome data are available in the form of area-level counts, $Y_i : i = 1,\ldots,n$, in subregions $A_i$, whilst exposure data are obtained as collections of unbiased estimates, $U_{ik}$, of the $T_k(x)$ at corresponding locations $x_{ik} : i = 1,\ldots,m_k$.
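When outcome data are area-level counts, the quantity that links model (29) to the data is the integral of $R(x)$ over each subregion $A_i$, which on a computational grid becomes a sum over cells. The sketch below shows this aggregation under simplifying assumptions made here for illustration: equal-area grid cells, each assigned to exactly one subregion, with exposures and the latent field treated as constant within a cell. All names are hypothetical.

```python
import math


def risk(alpha, betas, exposures, s):
    """Evaluate R(x) = exp{alpha + sum_k beta_k T_k(x) + S(x)} for one cell."""
    linear = alpha + sum(b * t for b, t in zip(betas, exposures))
    return math.exp(linear + s)


def area_level_means(cells, cell_area, alpha, betas):
    """Approximate the integral of R(x) over each subregion by a grid sum.

    cells : list of dicts with keys 'region' (subregion label),
            'T' (list of exposure values) and 'S' (latent field value)
    """
    means = {}
    for cell in cells:
        r = risk(alpha, betas, cell["T"], cell["S"])
        means[cell["region"]] = means.get(cell["region"], 0.0) + r * cell_area
    return means
```

The same aggregation would be repeated at every MCMC iteration, since the Poisson means change whenever the latent field or the exposure surfaces are updated.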

Suppose further that the $U_{ik}$ are conditionally independent, with $U_{ik} \mid T_k(\cdot) \sim N(T_k(x_{ik}), \tau_k^2)$, the processes $T_k(\cdot)$ are jointly Gaussian and the process $S(\cdot)$ is also Gaussian and independent of the $T_k(\cdot)$. A possible inferential goal is to evaluate the predictive distribution of the risk surface $R(\cdot)$ given the data $Y_i : i = 1,\ldots,n$ and $U_{ik} : i = 1,\ldots,m_k;\ k = 1,\ldots,p$. In an obvious shorthand, and temporarily ignoring the issue of parameter estimation, the required predictive distribution is $[S, T \mid U, Y]$. The joint distribution of $S$, $T$, $U$ and $Y$ factorises as

$$[S, T, U, Y] = [S][T][U \mid T][Y \mid S, T], \tag{30}$$

where $[S]$ and $[T]$ are multivariate Gaussian densities, $[U \mid T]$ is a product of univariate Gaussian densities, and $[Y \mid S, T]$ is a product of Poisson probability distributions with means

$$\mu_i = \int_{A_i} R(x)\,dx.$$

Sampling from the required predictive distributions can then proceed using a suitable MCMC algorithm. For Bayesian parameter estimation, we would augment (30) by a suitable joint prior for the model parameters before designing the MCMC algorithm.

A specific example of data synthesis concerns an ongoing leptospirosis cohort study in a poor community within the city of Salvador, Brazil. Leptospirosis is considered to be the most widespread of the zoonotic diseases. This is due to the large number of people worldwide, but especially in poor communities, who live in close proximity to wild and domestic mammals that serve as reservoirs of infection and shed the agent in their urine. The major mode of transmission is contact with contaminated water or soil (Levett, 2001; Bharti et al., 2003; McBride et al., 2005). In the majority of cases infection leads to an asymptomatic or mild, self-limiting febrile illness. However, severe cases can lead to potentially fatal acute renal failure and pulmonary haemorrhage syndrome. Leptospirosis is traditionally associated with rural-based subsistence farming communities, but rapid urbanization and widening social inequality have led to the dramatic growth of urban slums, where the lack of basic sanitation favours rat-borne transmission (Ko et al., 1999; Johnson et al., 2004).

The goals of the cohort study are to investigate the combined effects of social and physical environmental factors on disease risk, and to map the unexplained spatio-temporal variation in incidence. In the study, approximately 1700 subjects $i = 1,\ldots,n$ at residential locations $x_i$ provide blood samples on recruitment and at subsequent times $t_{ij}$ approximately 6, 12, 18 and 24 months later. At each post-recruitment visit, sero-conversion is defined as a change from zero to positive, or at least a fourfold increase in concentration. The resulting data consist of binary responses, $Y_{ij} = 0/1 : j = 1, 2, 3, 4$ (sero-conversion no/yes), together with a mix of time-constant and time-varying risk-factors, $r_{ij}$.

A conventional analysis might treat the data from each subject as a time-sequence of binary responses with associated explanatory variables. Widely used methods for data of this kind include generalised estimating equations (Liang and Zeger, 1986) and generalised linear mixed models (Breslow and Clayton, 1993). An analysis more in keeping with the philosophy of the current paper would proceed as follows.

Let $a_i$ and $b_i(t)$ denote time-constant and time-varying explanatory variables associated with subject $i$, and $t_{ij}$ the times at which blood samples are taken, setting $t_{i0} = 0$ for all $i$. Note that explanatory variables can be of two distinct kinds: characteristics of an individual subject, for example, their age; and characteristics of a subject's place of residence, for example, its proximity to an open sewer. In principle, the latter can be indexed by a spatially continuous location, hence, $a_i = A(x_i)$ and $b_i(t) = B(x_i, t)$. A response $Y_{ij} = 1$ indicates that at least one infection event has occurred in the time-interval $(t_{i,j-1}, t_{ij})$. A model for each subject's risk of infection then requires the specification of a set of person-specific hazard functions, $\Lambda_i(t)$. A model that allows for unmeasured risk factors would be a set of LGCPs, one for each subject, with respective stochastic intensities,

$$\Lambda_i(t) = \exp\{a_i'\alpha + b_i(t_{ij})'\beta + U_i + S(x_i, t)\}, \tag{31}$$

where the $U_i$ are mutually independent $N(0, \nu^2)$ and $S(x, t)$ is a spatio-temporally continuous Gaussian process. It follows that

$$P\{Y_{ij} = 1 \mid \Lambda_i(\cdot)\} = 1 - \exp\Biggl\{-\int_{t_{i,j-1}}^{t_{ij}} \Lambda_i(u)\,du\Biggr\}. \tag{32}$$

In practice, values of $a(x)$ and $b(x, t)$ may only be observed incompletely, either at a finite number of locations or as small-area averages. For notational convenience, we consider only a single, incompletely observed spatio-temporal covariate whose measured values, $b_k : k = 1,\ldots,m$, we model as

$$b_k = B(x_k, t_k) + Z_k, \tag{33}$$

where $B(x, t)$ is a spatio-temporal Gaussian process and the $Z_k$ are mutually independent $N(0, \tau^2)$ measurement errors. Then, (31) becomes

$$\Lambda_i(t) = \exp\{B(x_i, t_{ij})'\beta + U_i + S(x_i, t)\}. \tag{34}$$

Inference for the model defined by (32), (33) and (34), based on data $\{y_{ij} : j = 1,\ldots,4;\ i = 1,\ldots,n\}$ and $b = \{b_k : k = 1,\ldots,m\}$, would require further development of MCMC algorithms of the kind described in Section 4.

8. DISCUSSION

In this paper we have argued that the LGCP provides a useful class of models, not only for point process data but also for any problem involving prediction of an incompletely observed spatial or spatio-temporal process, irrespective of data format. Developments in statistical computation have made the combination of likelihood-based, classical or Bayesian parameter estimation and probabilistic prediction feasible for relatively large data sets, including real-time updating of spatio-temporal predictions.

In each of our applications, the focus has been on prediction of the spatial or spatio-temporal variation in a response surface, rather than on estimation of model parameters. In problems of this kind, where parameters are not of direct interest but rather are a means to an end, Bayesian prediction in conjunction with diffuse priors is an attractive strategy, as its predictions naturally accommodate the effect of parameter uncertainty. Model-based predictions are essentially nonparametric smoothers, but embedded within a probabilistic framework. This encourages the user to present results in a way that emphasises, rather than hides, their inherent imprecision. In many public health settings, identifying where and when a particular phenomenon, such as disease incidence, is likely to have exceeded an agreed intervention threshold is more useful than quoting either a point estimate and its standard error or the statistical significance of departure from a benchmark.

The log-linear formulation is convenient because of the tractable moment properties of the log-Gaussian distribution. It also gives the model a natural interpretation as a multiplicative decomposition of the overall intensity into deterministic and stochastic components. However, it can lead to very highly skewed marginal distributions, with large patches of near-zero intensity interspersed with sharp peaks. Within the Monte Carlo inferential framework, there is no reason why other, less severe transformations from $\mathbb{R}$ to $\mathbb{R}^{+}$ should not be used.

Two areas of current methodological research are the formulation of models and methods for principled analysis of multiple data streams that include data of variable quality from nontraditional sources, and the further development of robust computational algorithms that can deliver reliable inferences for problems of ever-increasing complexity.

Our general approach reflects a continuing trend in applied statistics since the 1980s. The explosion in the development of computationally intensive methods and associated complex stochastic models has encouraged a move away from a methods-based classification of the statistics discipline and towards a multidisciplinary, problem-based focus in which statistical method (singular) is thoroughly embedded within scientific method.

ACKNOWLEDGEMENTS

We thank the Department of Environmental and Cancer Epidemiology in the National Center for Epidemiology (Spain) for providing aggregated data from the Castile-La Mancha region and for permission to use the Spanish lung cancer data.

The leptospirosis study described in Section 7 is funded by a USA National Science Foundation grant, with Principal Investigator Professor Albert Ko (Yale University School of Public Health). This work was supported by the UK Medical Research Council (Grant number G0902153).

SUPPLEMENTARY MATERIAL

Supplementary materials for "Spatial and spatio-temporal log-Gaussian Cox processes: Extending the geostatistical paradigm" (DOI: 10.1214/13-STS441SUPP; .pdf). This material contains mixing, convergence and inferential diagnostics for all of the examples in the main article and is also available from http://www.lancs.ac.uk/staff/taylorb1/statsciappendix.pdf.

REFERENCES

Andrieu, C. and Thoms, J. (2008). A tutorial on adaptive MCMC. Stat. Comput. 18 343–373. MR2461882
Baddeley, A. J., Møller, J. and Waagepetersen, R. (2000). Non- and semi-parametric estimation of interaction in inhomogeneous point patterns. Stat. Neerl. 54 329–350. MR1804002

Banerjee, S., Gelfand, A. E., Finley, A. O. and Sang, H. (2008). Gaussian predictive process models for large spatial data sets. J. R. Stat. Soc. Ser. B Stat. Methodol. 70 825–848. MR2523906
Bartlett, M. S. (1964). The spectral analysis of two-dimensional point processes. Biometrika 51 299–311. MR0175254
Bartlett, M. S. (1975). The Statistical Analysis of Spatial Pattern. Chapman & Hall, London. MR0402886
Berman, M. and Diggle, P. (1989). Estimating weighted integrals of the second-order intensity of a spatial point process. J. R. Stat. Soc. Ser. B Stat. Methodol. 51 81–92. MR0984995
Besag, J., York, J. and Mollié, A. (1991). Bayesian image restoration, with two applications in spatial statistics. Ann. Inst. Statist. Math. 43 1–59. MR1105822
Best, N. G., Ickstadt, K. and Wolpert, R. L. (2000). Spatial Poisson regression for health and exposure data measured at disparate resolutions. J. Amer. Statist. Assoc. 95 1076–1088. MR1821716
Bharti, A. R., Nally, J. E., Ricaldi, J. N., Matthias, M. A., Diaz, M. M., Lovett, M. A., Levett, P. N., Gilman, R. H., Willig, M. R., Gotuzzo, E. and Vinetz, J. M. (2003). Leptospirosis: A zoonotic disease of global importance. Lancet. Infect. Dis. 3 757–771.
Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. J. Amer. Statist. Assoc. 88 9–25.
Brix, A. and Diggle, P. J. (2001). Spatiotemporal prediction for log-Gaussian Cox processes. J. R. Stat. Soc. Ser. B Stat. Methodol. 63 823–841. MR1872069
Brix, A. and Diggle, P. J. (2003). Corrigendum: Spatiotemporal prediction for log-Gaussian Cox processes. J. R. Stat. Soc. Ser. B Stat. Methodol. 65 946.
Brown, P. E., Kåresen, K. F., Roberts, G. O. and Tonellato, S. (2000). Blur-generated non-separable space–time models. J. R. Stat. Soc. Ser. B Stat. Methodol. 62 847–860. MR1796297
Cooper, D. and Chinemana, F. (2004). NHS direct derived data: An exciting new opportunity or an epidemiological headache? J. Public Health (Oxf.) 26 158–160.
Cooper, D. L., Smith, G. E., O'Brien, S. J., Hollyoak, V. A. and Baker, M. (2003). What can analysis of calls to NHS direct tell us about the epidemiology of gastrointestinal infections in the community? J. Infect. 46 101–105.
Cox, D. R. (1955). Some statistical methods connected with series of events. J. R. Stat. Soc. Ser. B Stat. Methodol. 17 129–157; discussion, 157–164. MR0092301
Cressie, N. A. C. (1991). Statistics for Spatial Data. Wiley, New York. MR1127423
Cressie, N. and Huang, H.-C. (1999). Classes of nonseparable, spatio-temporal stationary covariance functions. J. Amer. Statist. Assoc. 94 1330–1340. MR1731494
Dark, S. J. and Bram, D. (2007). The modifiable areal unit problem (MAUP) in physical geography. Progress in Physical Geography 31 471–479.
Diggle, P. J. and Ribeiro, P. J. Jr. (2007). Model-Based Geostatistics. Springer, New York. MR2293378
Diggle, P., Rowlingson, B. and Su, T.-l. (2005). Point process methodology for on-line spatio-temporal disease surveillance. Environmetrics 16 423–434. MR2147534
Diggle, P. J., Zheng, P. and Durr, P. (2005). Nonparametric estimation of spatial segregation in a multivariate point process. Applied Statistics 54 645–658.
Diggle, P. J., Knorr-Held, L., Rowlingson, B., Su, T., Hawtin, P. and Bryant, T. (2003). Towards on-line spatial surveillance. In Monitoring the Health of Populations: Statistical Methods for Public Health Surveillance (R. Brookmeyer and D. Stroup, eds.). Oxford Univ. Press, Oxford.
Diggle, P. J., Moraga, P., Rowlingson, B. and Taylor, B. M. (2013). Supplement to "Spatial and spatio-temporal log-Gaussian Cox processes: Extending the geostatistical paradigm." DOI:10.1214/13-STS441SUPP.
Donnelly, C. A., Woodroffe, R., Cox, D. R., Bourne, F. J., Cheesman, C. L., Clifton-Hadley, R. S., Wei, G., Gettinby, G., Gilks, P., Jenkins, H., Johnston, W. T., Le Fevre, A. M., McInery, J. P. and Morrison, W. I. (2006). Positive and negative effects of widespread badger culling on tuberculosis in cattle. Nature 485 843–846.
Finley, A. O., Sang, H., Banerjee, S. and Gelfand, A. E. (2009). Improving the performance of predictive process modeling for large datasets. Comput. Statist. Data Anal. 53 2873–2884. MR2667597
Frigo, M. and Johnson, S. G. (2011). FFTW fastest Fourier transform in the west. Available at http://www.fftw.org/.
Gamerman, D. (2010). Dynamic spatial models including spatial time series. In Handbook of Spatial Statistics (A. E. Gelfand, P. J. Diggle, M. Fuentes and P. Guttorp, eds.) 437–448. CRC Press, Boca Raton, FL. MR2730959
Gamerman, D. and Lopes, H. F. (2006). Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, 2nd ed. Chapman & Hall/CRC, Boca Raton, FL. MR2260716
Gelfand, A. E. (2010). Misaligned spatial data: The change of support problem. In Handbook of Spatial Statistics (A. E. Gelfand, P. J. Diggle, M. Fuentes and P. Guttorp, eds.) 517–539. CRC Press, Boca Raton, FL. MR2730964
Gelfand, A. E. (2012). Hierarchical modelling for spatial data problems. Spatial Statistics 1 30–39.
Gelfand, A. E., Diggle, P. J., Fuentes, M. and Guttorp, P., eds. (2010). Handbook of Spatial Statistics. CRC Press, Boca Raton, FL. MR2761512
Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern. Anal. Mach. Intell. 6 721–741.
Gerrard, D. J. (1969). Competition Quotient: A New Measure of the Competition Affecting Individual Forest Trees. Research Bulletin 20. Agricultural Experiment Station, Michigan State Univ., East Lansing, MI.
Geyer, C. (1999). Likelihood inference for spatial point processes: Likelihood and computation. In Stochastic Geometry (Toulouse, 1996), (O. E. Barndorff-Nielsen, W. S. Kendall and M. N. M. van Lieshout, eds.). Monogr. Statist. Appl. Probab. 80 79–140. Chapman & Hall/CRC, Boca Raton, FL. MR1673118
Gilks, W., Richardson, S. and Spiegelhalter, D. (1995). Markov Chain Monte Carlo in Practice. Chapman & Hall, London.
Girolami, M. and Calderhead, B. (2011). Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. R. Stat. Soc. Ser. B Stat. Methodol. 73 123–214. MR2814492
Gneiting, T. (2002). Nonseparable, stationary covariance functions for space–time data. J. Amer. Statist. Assoc. 97 590–600. MR1941475
Gneiting, T. and Guttorp, P. (2010a). Continuous parameter stochastic process theory. In Handbook of Spatial Statistics (A. E. Gelfand, P. J. Diggle, M. Fuentes and P. Guttorp, eds.) 17–28. CRC Press, Boca Raton, FL. MR2730952
Gneiting, T. and Guttorp, P. (2010b). Continuous parameter spatio-temporal processes. In Handbook of Spatial Statistics (A. E. Gelfand, P. J. Diggle, M. Fuentes and P. Guttorp, eds.) 427–436. CRC Press, Boca Raton, FL. MR2730958
Gotway, C. A. and Young, L. J. (2002). Combining incompatible spatial data. J. Amer. Statist. Assoc. 97 632–648. MR1951636
Greenland, S. and Morgenstern, H. (1990). Ecological bias, confounding and effect modification. International Journal of Epidemiology 18 269–274.
Haran, M. and Tierney, L. (2012). On automating Markov chain Monte Carlo for a class of spatial models. Available at http://arxiv.org/abs/1205.0499.
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 97–109.
Johnson, M. A., Smith, H., Joseph, P., Gilman, R. H., Bautista, C. T., Campos, K. J., Cespedes, M., Klatsky, P., Vidal, C., Terry, H., Calderon, M. M., Coral, C., Cabrera, L., Parmar, P. S. and Vinetz, J. M. (2004). Environmental exposure and leptospirosis, Peru. Emerging Infectious Diseases 10 1016–1022.
Kelsall, J. and Wakefield, J. (2002). Modeling spatial variation in disease risk: A geostatistical approach. J. Amer. Statist. Assoc. 97 692–701. MR1941405
Ko, A. I., Reis, M. G., Dourado, C. M. R., Johnson, W. D. Jr. and Riley, L. W. (1999). Urban epidemic of severe leptospirosis in Brazil. Salvador Leptospirosis Study Group. Lancet 354 820–825.
Levett, P. N. (2001). Leptospirosis. Clinical Microbiology Reviews 14 296–326.
Li, Y., Brown, P. E., Gesink, D. C. and Rue, H. (2012). Log Gaussian Cox processes and spatially aggregated disease incidence data. Stat. Methods Med. Res. 21 479–507.
Liang, K. Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73 13–22. MR0836430
Lindgren, F., Rue, H. and Lindström, J. (2011). An explicit link between Gaussian fields and Gaussian Markov random fields: The stochastic partial differential equation approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 73 423–498. MR2853727
López-Abente, G., Ramis, R., Pollán, M., Aragonés, N., Pérez-Gómez, B., Gómez-Barroso, D., Carrasco, J. M., Lope, V., García-Pérez, J., Boldo, E. and García-Mendizábal, M. J. (2006). ATLAS municipal de mortalidad por cáncer en España, 1989–1998. Instituto de Salud Carlos III, Madrid.
Ma, C. (2003). Families of spatio-temporal stationary covariance models. J. Statist. Plann. Inference 116 489–501. MR2000096
Ma, C. (2008). Recent developments on the construction of spatio-temporal covariance models. Stoch. Environ. Res. Risk Assess. 22 39–47. MR2418410
Mark, A. L. and Shepherd, D. H. (2004). NHS Direct: Managing demand for primary care? International Journal of Health Planning and Management 19 79–91.
Matérn, B. (1960). Spatial Variation. Meddelanden från Statens Skogsforskningsinstitut, Stockholm. Band 49, number 5. MR0169346
McBride, A. J., Athanazio, D. A., Reis, M. G. and Ko, A. I. (2005). Leptospirosis. Current Opinions in Infectious Diseases 18 376–386.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics 21 1087–1092.
Møller, J., Syversveen, A. R. and Waagepetersen, R. P. (1998). Log Gaussian Cox processes. Scand. J. Stat. 25 451–482. MR1650019
Mugglin, A. S., Carlin, B. P. and Gelfand, A. E. (2000). Fully model-based approaches for spatially misaligned data. J. Amer. Statist. Assoc. 95 877–887.
Piantadosi, S., Byar, D. P. and Green, S. B. (1988). The ecological fallacy. Am. J. Epidemiol. 127 893–904.
R Core Team. (2013). R: A Language and Environment for Statistical Computing. Vienna, Austria.
Ripley, B. D. (1976). The second-order analysis of stationary point processes. J. Appl. Probab. 13 255–266. MR0402918
Ripley, B. D. (1977). Modelling spatial patterns. J. R. Stat. Soc. Ser. B Stat. Methodol. 39 172–212. MR0488279
Roberts, G. O. and Rosenthal, J. S. (2001). Optimal scaling for various Metropolis–Hastings algorithms. Statist. Sci. 16 351–367. MR1888450
Roberts, G. O. and Rosenthal, J. S. (2007). Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. J. Appl. Probab. 44 458–475. MR2340211
Roberts, G. O. and Tweedie, R. L. (1996). Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2 341–363. MR1440273
Rodrigues, A. and Diggle, P. J. (2010). A class of convolution-based models for spatio-temporal processes with non-separable covariance structure. Scand. J. Stat. 37 553–567. MR2779636
Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications. Monographs on Statistics and Applied Probability 104. Chapman & Hall/CRC, Boca Raton, FL. MR2130347
Rue, H., Martino, S. and Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. R. Stat. Soc. Ser. B Stat. Methodol. 71 319–392. MR2649602
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman & Hall, London. MR0848134
Spiegelhalter, D. J., Thomas, A. and Best, N. G. (1999). WinBUGS Version 1.2 User Manual.
Taylor, B. M. and Diggle, P. J. (2013a). INLA or MCMC? A tutorial and comparative evaluation for spatial prediction in log-Gaussian Cox processes. J. Stat. Comput. Simul. To appear. Preprint available at http://arxiv.org/abs/1202.1738.
Taylor, B. M. and Diggle, P. J. (2013b). Corrigendum: Spatiotemporal prediction for log-Gaussian Cox processes. J. R. Stat. Soc. Ser. B Stat. Methodol. 75 601–602. MR3065481
Taylor, B. M., Davies, T. M., Rowlingson, B. S. and Diggle, P. J. (2013). lgcp: Inference with spatial and spatio-temporal log-Gaussian Cox processes in R. Journal of Statistical Software 52 Issue 4.
Wall, M. M. (2004). A close look at the spatial structure implied by the CAR and SAR models. J. Statist. Plann. Inference 121 311–324. MR2038824
Wood, A. T. A. and Chan, G. (1994). Simulation of stationary Gaussian processes in $[0, 1]^d$. J. Comput. Graph. Statist. 3 409–432. MR1323050
Woodroffe, R., Donnelly, C. A., Johnston, W. T., Bourne, F. J., Cheesman, C. L., Clifton-Hadley, R. S., Cox, D. R., Gettinby, G., Hewinson, R. G., Le Fevre, A. M., McInery, J. P. and Morrison, W. I. (2005). Spatial association of Mycobacterium bovis infection in cattle and badgers Meles meles. Journal of Applied Ecology 42 852–862.
Zhang, H. (2004). Inconsistent estimation and asymptotically equal interpolations in model-based geostatistics. J. Amer. Statist. Assoc. 99 250–261. MR2054303