
Anne Randi Syversveen

Spatial Models for Reservoir Characterization

Dr.Ing. Thesis

Department of Mathematical Sciences
Norwegian University of Science and Technology
1997

Preface

This thesis is submitted in partial fulfillment of the requirements for the degree "Doktor Ingeniør" (Dr.Ing.). The work is financed by the Research Council of Norway under the Propetro program. My research has been carried out while I was a member of the group at the Department of Mathematical Sciences, a period I have greatly enjoyed. I thank my supervisor Henning Omre for all his support and guidance. I am grateful to my coauthors, Håvard Rue, Jesper Møller and Rasmus Waagepetersen, for the good collaboration we had. I thank Jesper Møller for inviting me to Århus in the autumn of 1995 and for his hospitality during my stay. I also thank Håkon Tjelmeland for many enlightening discussions. Finally, I thank Knut-Andreas for his encouragement, love and care, but also for technical support.

Trondheim, November 1997

Anne Randi Syversveen

Thesis outline

The thesis consists of the following articles:

I Conditioning of marked point processes within a Bayesian framework. (With Henning Omre) In Scandinavian Journal of Statistics, Volume 24, Number 3, 1997.

II Marked point models for facies units conditioned on well data. (With Henning Omre) In Geostatistics Wollongong '96, Volume 1.

III An approximate fully Bayesian estimation of the parameters in a reservoir characterization model.

IV Bayesian object recognition with Baddeley's delta loss. (With Håvard Rue) To appear in Advances in Applied Probability (SGSA), March 1998.

V Log Gaussian Cox processes. (With Jesper Møller and Rasmus Waagepetersen) To appear in Scandinavian Journal of Statistics.

Appendix: A method for approximate fully Bayesian analysis of parameters. (With Håvard Rue) In Communications in Statistics - Simulation and Computation, Volume 26, Issue 3, 1997.

Papers I-III are on stochastic modeling of geology in petroleum reservoirs, and they are the main part of the thesis. It is natural to read paper I before papers II and III, although they can be read independently of each other. Paper IV is on object recognition in image analysis, and in paper V a special type of spatial Cox process is discussed. Common to all papers is the use of spatial point processes; see the classical books Stoyan et al. (1995), Daley and Vere-Jones (1988) and Diggle (1983). We start by describing how stochastic modeling is used in reservoir characterization. After that, a short presentation of each paper is given.

Background on Reservoir Characterization

In order to forecast future oil production from a reservoir, the flow properties must be modeled. Flow modeling is usually done by solving a set of partial differential equations numerically in a flow simulator, see for example Aziz and Settari (1979) and Lake (1989). Parameters in these equations are reservoir characteristics such as permeability, porosity and initial saturation. These characteristics must be known everywhere in the reservoir

in order to solve the equations, which in turn requires a description of the geology in the reservoir, for example the amount and distribution of different rock types or faults. However, there are very few observations of the geological properties of the reservoir. Observations can be collected from wells, but usually the number of wells is quite limited due to high drilling costs. Other sources of information about reservoir characteristics are seismic data and production history, but the amount of observations from the reservoir is generally rather limited. On the other hand, extensive geological experience and knowledge are usually available. For example, geologists may have been studying outcrops which are believed to be analogs to the petroleum reservoir, and they have knowledge about how geological processes develop.

Because the information about the geology is limited, a stochastic model is used to characterize the geologic properties of the reservoir. A realization from the stochastic model is used as input to the flow simulator, and the oil production associated with the actual realization from the stochastic model is obtained. By repeating this several times, the uncertainty in oil production associated with the stochastic model is investigated.

In order to utilize all available information about the reservoir, geological knowledge as well as observations should be included in the stochastic model of the geology. This is best achieved by a Bayesian framework, where expert knowledge guides the model parameterization and choice of prior distribution. Observations from the reservoir under study are incorporated through likelihood functions. Realizations from the posterior distribution are used as input to the flow simulator. In this thesis, only well observations are considered. Examples of models where seismic data and data from production history are included are found in Hegstad and Omre (1997), Eide et al. (1997) and Tjelmeland and Omre (1997).
The models described in the first two of these references describe reservoir characteristics such as permeability and porosity, while the last mentioned paper describes a model for the spatial distribution of rocks in the reservoir. Modeling of rock distribution is also a theme for this thesis, as mentioned below.

The spatial distribution of rock types in a petroleum reservoir can be described in two different ways, either as a mosaic phenomenon or as an event phenomenon; see Hjort and Omre (1994), Haldorsen and Damsleth (1990) and Ripley (1992). In the first case, different rock types are randomly packed without a background, for example varying facies types in a shallow marine environment. Markov random fields are often used to model this, see Tjelmeland and Besag (1996). In the second case, objects of one or more facies types are randomly distributed in a background of one rock type, for example shale units in a sand matrix. Marked point models may be used to model this phenomenon. The central geoscience reference in this field is Haldorsen and Lake (1984), who described a simple marked point model for shales in a background of sand. This thesis focuses on modeling of event phenomena by marked point models, and the model described is an extension of the model discussed by Haldorsen and Lake (1984). Mosaic phenomena are not discussed.

Marked point models are also used to model fractures or faults in the reservoir, see for example Munthe et al. (1994) and Wen and Sinding-Larsen (1997). Faults are usually

clustered, and marked point models with spatial attraction between points will be natural models.
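The workflow described above (sample a realization of the geology, run the flow simulator on it, repeat, and study the spread of the outcomes) can be sketched in a few lines; `sample_geology` and `flow_simulator` below are deliberately simplistic hypothetical stand-ins, not models from this thesis:

```python
import random
import statistics

def sample_geology(rng):
    """Draw one realization from a toy stochastic geology model:
    a shale volume fraction and a mean permeability (invented priors)."""
    shale_fraction = min(max(rng.gauss(0.3, 0.05), 0.0), 1.0)
    permeability = rng.lognormvariate(5.0, 0.5)  # illustrative only
    return shale_fraction, permeability

def flow_simulator(shale_fraction, permeability):
    """Stand-in for a numerical flow simulator: here production simply
    decreases with shale fraction and increases with permeability."""
    return permeability * (1.0 - shale_fraction)

rng = random.Random(42)
production = sorted(flow_simulator(*sample_geology(rng)) for _ in range(1000))
mean_prod = statistics.mean(production)
p10, p90 = production[100], production[900]  # rough 10% and 90% quantiles
```

Repeating the loop many times turns the uncertainty about the geology into an empirical distribution over forecast production.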

Summary

The first three papers are on the reservoir characterization problem. In paper I a marked point model is defined for objects against a background in a two-dimensional vertical cross section of the reservoir. The model extends the simple model defined by Haldorsen and Lake (1984) in several ways. The model handles conditioning on observations from more than one well for each object, it contains interaction between objects, and the objects have the correct distribution when penetrated by wells. The Haldorsen model did not have any of these properties. Haldorsen and Lake (1984) modeled each object as a rectangle, while we model the objects as rectangles with Gaussian random fields superimposed on top and bottom. This makes it possible to condition on observations from more than one well. Our model has pairwise repulsion between objects and a stochastic number of objects. Chessa (1995) pointed out that penetrated objects tend to be larger than non-penetrated ones, and the simulation algorithm takes care of this.

The model is developed in a Bayesian setting. As mentioned before, few observations of the reservoir are available. Therefore it is natural to use Bayesian models to incorporate geological (prior) knowledge. The observations available are top and bottom of objects in a limited number of wells. Moreover, it is assumed to be known whether observations from different wells come from the same object or not. The Metropolis-Hastings algorithm is used to simulate from the posterior distribution. Because the well observations are assumed to be exact, the proposal kernels must be chosen carefully. In image analysis, proposals are often drawn from the prior distribution. This will not work well here, since the acceptance rate will be zero because of the structure of the likelihood function. However, we are able to calculate marginal posteriors for the objects, and proposals are drawn from these distributions.
The model and the simulation algorithm are demonstrated on an example with simulated data. The ideas regarding conditioning on well observations are implemented in the commercial software STORM developed by Smedvig Technologies a/s. Lia et al. (1996) give a description of the stochastic model used in STORM. They make a simple extension to three dimensions. Three types of objects are used, and one type is rectangular boxes with (two-dimensional) Gaussian fields on top and bottom. None of the object types are able to handle conditioning on all types of well configurations. For large objects and many wells, the conditioning might be impossible to fulfill. This is a possible topic for further research. A solution might involve Gaussian fields added to the sides of the objects too, not only top and bottom. This can make the objects flexible enough to fulfill the conditioning.
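The role of the proposal distribution can be illustrated with a one-dimensional independence Metropolis-Hastings sampler: proposals drawn from a vague prior are rejected far more often than proposals drawn from a good approximation to the marginal posterior. This is a generic sketch with invented densities; in paper I the situation is more extreme, since the exact well observations make the acceptance rate for prior proposals exactly zero:

```python
import random
import math

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def independence_mh(target, proposal_pdf, proposal_draw, n, rng):
    """Independence Metropolis-Hastings: propose from a fixed density q and
    accept with probability min(1, target(y) q(x) / (target(x) q(y))).
    Returns the empirical acceptance rate."""
    x = proposal_draw()
    accepted = 0
    for _ in range(n):
        y = proposal_draw()
        ratio = (target(y) * proposal_pdf(x)) / (target(x) * proposal_pdf(y))
        if rng.random() < min(1.0, ratio):
            x = y
            accepted += 1
    return accepted / n

rng = random.Random(0)
target = lambda x: normal_pdf(x, 2.0, 0.5)    # stand-in posterior
prior_q = lambda x: normal_pdf(x, 0.0, 3.0)   # vague prior as proposal
post_q = lambda x: normal_pdf(x, 2.0, 0.6)    # approximate marginal posterior

rate_prior = independence_mh(target, prior_q, lambda: rng.gauss(0.0, 3.0), 5000, rng)
rate_post = independence_mh(target, post_q, lambda: rng.gauss(2.0, 0.6), 5000, rng)
```

The closer the proposal density tracks the target, the closer the acceptance rate gets to one, which is the motivation for drawing proposals from the marginal posteriors.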

Paper II is written for a geostatistical audience, and presents almost the same model as paper I. However, one important extension is made: It is no longer assumed to be known

whether observations from different wells come from the same object or not. Furthermore, a real data example with observations taken from an outcrop is also presented.

Paper III is about parameter estimation in the model described in paper I. Two parameters are estimated, one related to the intensity of objects and one related to the interaction between objects. There are several methods for parameter estimation in pairwise interaction point processes, see for example the review paper by Diggle et al. (1994). In order to use these methods, a point pattern must be completely observed. In the reservoir characterization problem only well observations are available, so these methods do not work. However, prior information about parameter values may be available, and this suggests a Bayesian treatment of the parameter estimation problem. Then one can use for example the posterior mean or mode as estimator. A fully Bayesian analysis of the parameters can be done by Monte Carlo methods, but it turns out to be very time consuming. Instead we propose a faster, approximate method for doing fully Bayesian analysis of the parameters in cases where the number of observations is quite limited. The method relies on the Monte Carlo maximum likelihood method described by Geyer and Thompson (1992). We show examples with both simulated and real data.
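The Monte Carlo maximum likelihood idea of Geyer and Thompson (1992), estimating the intractable log-likelihood ratio l(theta) - l(theta_0) from samples drawn under a reference parameter theta_0, can be sketched on a toy exponential family f_theta(x) proportional to C(10, x) exp(theta*x), whose theta_0 = 0 member is simply Binomial(10, 1/2). The observed statistic x_obs = 7 is invented; the exact MLE solves E_theta[X] = 7, i.e. theta = log(7/3), roughly 0.85:

```python
import random
import math

def sample_ref(rng):
    """Draw from the reference model with theta0 = 0: since
    f_theta(x) is proportional to C(10, x) exp(theta*x), the
    theta0 = 0 member is Binomial(10, 1/2)."""
    return sum(rng.random() < 0.5 for _ in range(10))

def mc_loglik_ratio(theta, x_obs, samples):
    """Monte Carlo estimate of l(theta) - l(0) in the style of Geyer and
    Thompson (1992): the intractable ratio of normalizing constants is
    estimated by an importance-sampling average over reference samples."""
    m = len(samples)
    log_const_ratio = math.log(sum(math.exp(theta * x) for x in samples) / m)
    return theta * x_obs - log_const_ratio

rng = random.Random(1)
samples = [sample_ref(rng) for _ in range(20000)]
x_obs = 7  # invented observed sufficient statistic

# grid search for the Monte Carlo MLE on theta in [-2, 2]
grid = [-2.0 + 0.05 * i for i in range(81)]
theta_hat = max(grid, key=lambda th: mc_loglik_ratio(th, x_obs, samples))
```

The same trick applies when the samples must themselves be produced by Markov chain Monte Carlo, which is the situation for the interaction parameters of paper III.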

In paper IV we consider another situation where objects against a background are modeled, namely object recognition in images. The work is motivated by the problem of recognizing cells in confocal microscopy images. A Bayesian framework is adopted, and we use a marked point model for objects against a background, as first suggested by Baddeley and Van Lieshout (1993). Since the cells are nearly circular, each object is modeled as a deformed circle. The deformation is done in the same way as in paper I, by adding a Gaussian process to the boundary of the circles. Stoyan and Stoyan (1994) describe a similar model, called the Tooth contour model. We use a very simple pairwise interaction function ensuring no overlapping objects. We assume that the image is observed pixel by pixel with independent Gaussian noise in each pixel. This means that the observations are of an entirely different type than in the reservoir characterization problem. In reservoir characterization, we have few, but exact observations, while in image analysis there are lots of observations with noise.

The Metropolis-Hastings algorithm is used to simulate from the model. The usual transitions we do when we simulate from a marked point model are adding, deleting and moving/changing objects. These transitions are sufficient for moving between two arbitrary configurations. In practice, however, we need two additional transitions, splitting and merging objects, when we have images with little noise. With our model, it is complicated to calculate acceptance probabilities for the split and merge transitions, and it is difficult to code. Rue and Hurn (1997) have solved the problem in a more elegant way by using resolution varying templates.

In image analysis, the problem is usually to estimate the true image, in contrast to the reservoir characterization problem, where we wanted realizations from the posterior model.
However, realizations from the posterior model are interesting also in this case if the analytical expression for the posterior model is unknown. Realizations from the posterior

model can of course be used to obtain an estimate of the true image. In the Bayesian framework the concept of loss functions is used to create estimators. The optimal Bayes estimator is obtained by minimizing the posterior expectation of some given loss function. The maximum a posteriori (MAP) estimator, corresponding to the zero-one loss function, is commonly used as estimator. However, this estimator does not work well in all situations. For example, we experienced that the number of objects may be wrongly estimated in images with much noise. Frigessi and Rue (1997) proposed to use Baddeley's delta metric as loss function in Bayesian image restoration and classification, resulting in the delta estimator. See also Rue (1996) for a description of the delta metric. We use the delta estimator in the object recognition problem. Examples with artificial images show that the delta estimator gives better results than the MAP estimator, both with respect to the shape and number of objects. The intuitive explanation of this is that the MAP estimate depends only on the mode of the posterior distribution, and not on how much probability this mode contains. The delta estimator takes into account where and how the probability mass in the posterior distribution is located. Rue and Hurn (1997) have extended the delta estimator to object identification. They show how properties such as object type or grey level can be estimated by the delta estimator.
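The contrast between the MAP estimator and a loss-based Bayes estimator is visible already for a posterior over the number of objects. The absolute-error loss below is only a stand-in for Baddeley's delta loss, and the posterior probabilities are invented for illustration:

```python
# invented posterior over the number of objects in an image
posterior = {0: 0.30, 1: 0.25, 2: 0.25, 3: 0.20}

# zero-one loss -> MAP estimator: simply the posterior mode
map_est = max(posterior, key=posterior.get)

def expected_loss(n_hat, loss):
    """Posterior expectation of the loss for the candidate estimate n_hat."""
    return sum(p * loss(n_hat, n) for n, p in posterior.items())

# a count-sensitive loss (stand-in for Baddeley's delta): |n_hat - n|
l1 = lambda a, b: abs(a - b)
bayes_est = min(posterior, key=lambda n_hat: expected_loss(n_hat, l1))
```

The mode (0 objects) and the minimizer of the expected loss (1 object) differ because the loss-based estimator sees where the remaining probability mass sits, not only where the single highest-probability configuration is.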

All point process models used in papers I-IV have pairwise repulsion between objects. In paper V we discuss a model for clustering, namely Cox processes, or doubly stochastic Poisson processes. Cox processes are formed as inhomogeneous Poisson processes with stochastic intensity function, see for example Stoyan et al. (1995). Paper V suggests a special form of Cox processes, called log-Gaussian Cox processes. In this case, the logarithm of the intensity function is a Gaussian process. The class of log-Gaussian Cox processes provides flexible models for clustering. The distribution of the log-Gaussian Cox process is completely characterized by the intensity and the pair correlation function of the Cox process, because these are in a one-to-one correspondence with the mean and covariance function of the Gaussian process. This fact makes parameter estimation quite simple. The log-Gaussian Cox process is also extended to the multivariate case, so that point patterns consisting of different types of points can be modeled. Various types of positive and negative correlation structures between types of points can be modeled.

A simulation algorithm for log-Gaussian Cox processes is described. Parameter estimation is done based on the pair correlation function. The log-Gaussian Cox model is fitted to two different data sets where the data are locations of trees in a forest. In the first data set, only one tree type is present. The chosen model, with parameters estimated from the data, shows a good fit with respect to several characteristics. The other data set contains two different tree types. In this case we are not able to find a model that gives a satisfactory fit to the data. The log-Gaussian Cox process is not able to model all kinds of clustered point patterns. Finally, the paper describes how to predict the unobserved log-Gaussian intensity process under a given model. An empirical Bayes approach is used.
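A minimal simulation sketch of a log-Gaussian Cox process on a one-dimensional grid: the log-intensity is a stationary Gaussian AR(1) series, which has exponential correlation, and the count in each grid cell is Poisson given the intensity. All parameter values are illustrative:

```python
import random
import math

def poisson(lam, rng):
    """Knuth's Poisson sampler (adequate for small means)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_lgcp(n_cells, mu, sigma, rho, rng):
    """1-D log-Gaussian Cox process on a grid: the log-intensity is a
    stationary AR(1) Gaussian series with marginal N(0, sigma^2) and
    lag-1 correlation rho; cell counts are Poisson(exp(mu + Z_k))."""
    z = rng.gauss(0.0, sigma)
    counts = []
    for _ in range(n_cells):
        counts.append(poisson(math.exp(mu + z), rng))
        z = rho * z + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, sigma)
    return counts

rng = random.Random(7)
counts = simulate_lgcp(500, mu=0.0, sigma=1.0, rho=0.8, rng=rng)
mean_c = sum(counts) / len(counts)
var_c = sum((c - mean_c) ** 2 for c in counts) / (len(counts) - 1)
```

The counts come out overdispersed (variance exceeding the mean), which is exactly the clustering behavior a Cox process is designed to produce.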

References

Aziz, K. and A. Settari (1979). Petroleum Reservoir Simulation. Elsevier, London and New York.

Baddeley, A. and M. Van Lieshout (1993). Stochastic geometry models in high-level vision. In K. Mardia and G. Kanji (Eds.), Statistics and Images, Volume 20, Chapter 11, pp. 235-256. Abingdon: Carfax Publishing.

Chessa, A. (1995). Conditional Simulation of Spatial Stochastic Models for Reservoir Heterogeneity. Ph.D. thesis, Delft University of Technology.

Daley, D. and D. Vere-Jones (1988). An Introduction to the Theory of Point Processes. Springer-Verlag.

Diggle, P. (1983). Statistical Analysis of Spatial Point Patterns. Academic Press.

Diggle, P., T. Fiksel, P. Grabarnik, Y. Ogata, D. Stoyan, and M. Tanemura (1994). On parameter estimation for pairwise interaction point processes. Int. Statist. Rev. 62(1), 99-117.

Eide, A., H. Omre, and B. Ursin (1997). Stochastic facies modeling conditioned on seismic data. In E. Baafi and N. Schofield (Eds.), Geostatistics Wollongong '96, Volume 1, pp. 442-453. Kluwer Academic Publishers.

Frigessi, A. and H. Rue (1997). Bayesian image classification with Baddeley's delta loss. Journal of Computational and Graphical Statistics 6(1), 55-73.

Geyer, C. and E. Thompson (1992). Constrained Monte Carlo maximum likelihood for dependent data. J. R. Stat. Soc. B 54(3), 657-699.

Haldorsen, H. and E. Damsleth (1990). Stochastic modeling. J. of Petroleum Technology, 404-412.

Haldorsen, H. and L. Lake (1984). A new approach to shale management in field-scale models. Society of Petroleum Engineers Journal, 447-457.

Hegstad, B. and H. Omre (1997). Uncertainty assessment in history matching and forecasting. In E. Baafi and N. Schofield (Eds.), Geostatistics Wollongong '96, Volume 1, pp. 585-596. Kluwer Academic Publishers.

Hjort, N. and H. Omre (1994). Topics in spatial statistics. Scand. J. of Stat. 21(4), 289-343.

Lake, L. (1989). Enhanced Oil Recovery. Prentice Hall, Englewood Cliffs, NJ.

Lia, O., H. Tjelmeland, and L. Kjellesvik (1997). Modeling of facies architecture by marked point models. In E. Baafi and N. Schofield (Eds.), Geostatistics Wollongong '96, Volume 1. Kluwer Academic Publishers.

Munthe, K., L. Holden, and P. Mostad (1994). Modelling sub-seismic fault patterns using a marked point process. In Proceedings from 4th European Conference on the Mathematics of Oil Recovery, Røros, Norway.

Ripley, B. (1992). Stochastic models for the distribution of rock types in petroleum reservoirs. In A. Walden and P. Guttorp (Eds.), Statistics in the Environmental and Earth Sciences. Edward Arnold.

Rue, H. (1996). An introduction to Baddeley's delta metric. Preprint Statistics 4, Norwegian University of Science and Technology, Trondheim, Norway.

Rue, H. and M. Hurn (1997). Bayesian object identification. Preprint Statistics 6, Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway.

Stoyan, D., W. Kendall, and J. Mecke (1995). Stochastic Geometry and Its Applications, 2nd ed. Wiley.

Stoyan, D. and H. Stoyan (1994). Fractals, Random Shapes and Point Fields. Wiley.

Tjelmeland, H. and J. Besag (1996). Markov random fields with higher order interactions. Preprint Statistics 1, Norwegian University of Science and Technology, Trondheim, Norway.

Tjelmeland, H. and H. Omre (1997). A complex sand-shale facies model conditioned on observations from wells, seismics and production. In E. Baafi and N. Schofield (Eds.), Geostatistics Wollongong '96, Volume 1. Kluwer Academic Publishers.

Wen, R. and R. Sinding-Larsen (1997). Stochastic modeling and simulation of small faults by marked point processes and kriging. In E. Baafi and N. Schofield (Eds.), Geostatistics Wollongong '96, Volume 1, pp. 398-414. Kluwer Academic Publishers.

I Conditioning of Marked Point Processes within a Bayesian Framework

Conditioning of Marked Point Processes within a Bayesian Framework

Anne Randi Syversveen and Henning Omre Department of Mathematical Sciences Norwegian University of Science and Technology

Abstract

Shale units with low permeability create barriers to fluid flow in a sandstone reservoir. A spatial stochastic model for the location of shale units in a reservoir is defined. The model is based on a marked point process formulation, where the marks are parameterized by random functions for the shape of a shale unit. This extends the traditional formulation in the sense that conditioning on the actual observations of the shale units is allowed in an arbitrary number of wells penetrating the reservoir. The marked point process for the shale units includes spatial interaction of units and allows a random number of units to be present. The model is defined in a Bayesian setting with prior pdfs assigned to size-shape parameters of shale units. The observations of shales in wells are associated with a likelihood function. The posterior pdf of the marked point process can only partially be developed analytically; the final solution must be determined by using the Metropolis-Hastings algorithm. An example is presented, demonstrating the consequences of increasing the number of wells in which observations are made.

Keywords: Geostatistics, Bayesian statistics, Metropolis-Hastings algorithm, Reservoir evaluation.

1 Introduction

The primary objective of the evaluation of petroleum reservoirs is to forecast production of oil and gas. This involves modeling of fluid flow in porous media. In reservoir evaluation, the reservoir characteristics constituting the medium will not be completely known. The sources of information for determining these characteristics are usually observations in wells, seismic data and previous production history. The production forecasts are usually obtained by firstly generating a reservoir description and then simulating fluid flow through

this description. In order to obtain unbiased forecasts the variability in the reservoir characteristics has to be reproduced. Hence one objective is to generate samples from a stochastic reservoir description model which later on will be used as input to a simulator for fluid flow. By repeating this procedure, the uncertainty in the production forecasts can be assessed by sampling. The uncertainty associated with the reservoir characteristics contributes significantly to the uncertainty in the forecasts of production, see Omre et al. (1991) and Lia et al. (1996).

In the petroleum related literature, stochastic modeling and simulation of reservoir characteristics have been important topics in the last decade, see Haldorsen and Damsleth (1990) and Hjort and Omre (1994). In the trend-setting paper of Haldorsen and Lake (1984) the problem of modeling and simulating shale units in a vertical cross section of a sand matrix was considered. A model of marked point type without spatial interaction, known as the Boolean model, see Stoyan et al. (1987), was used. The model fails to handle cases where any shale unit is penetrated by more than one well, and there appears to be an error in the simulation procedure, since the fact that the penetrated shales tend to be larger than the non-penetrated ones was not accounted for, see Chessa (1995). Moreover, important aspects like spatial interaction between shales and a stochastic model for the number of shale units were not discussed.

The multi-penetration problem in Haldorsen and Lake (1984) is caused by the use of a marked point process with a low-dimensional mark space. Each shale unit in the cross section is modeled by a rectangle. The heights of some shale units can be observed without error in the wells. Let the heights observed be denoted by o. Assume that o contains more than one height observation for at least one shale unit, and that these height observations are different.
Under the simple model used in Haldorsen and Lake (1984) it is obvious that the pdf for the observations is f(o) = 0. The problem is that the parameterization of the model is not sufficiently flexible to reproduce the observations. This is a problem frequently experienced in spatial statistics.

In this article, the model of Haldorsen and Lake (1984) is extended to account for conditioning on shale observations in an arbitrary number of wells, and the simulation procedure is such that the length distribution is correct for both observed and unobserved shales. Moreover, the model contains spatial interaction, and the number of shale units is stochastic. An exact model and a simulation procedure for this are defined.
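The size bias noted by Chessa (1995) is easy to reproduce by simulation: wells preferentially hit long shales, so the penetrated shales follow a length-biased distribution. The uniform distributions below are illustrative assumptions, not the model of this paper:

```python
import random

rng = random.Random(3)
domain = 1000.0
wells = [rng.uniform(0.0, domain) for _ in range(5)]

# shales as (center, length) pairs; distributions are illustrative
shales = [(rng.uniform(0.0, domain), rng.uniform(5.0, 50.0)) for _ in range(2000)]

def penetrated(center, length):
    """A vertical well at w hits the shale iff w lies in its horizontal extent."""
    return any(center - length / 2 <= w <= center + length / 2 for w in wells)

pen_lengths = [l for c, l in shales if penetrated(c, l)]
all_mean = sum(l for _, l in shales) / len(shales)
pen_mean = sum(pen_lengths) / len(pen_lengths)
```

With lengths uniform on [5, 50] the length-biased mean is E[L^2]/E[L], about 33.6, clearly above the population mean 27.5, and `pen_mean` reflects this. Ignoring the bias, as the original Haldorsen-Lake simulation procedure effectively did, makes the simulated penetrated shales systematically too small.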

2 Model definition

The model presented is two-dimensional, but the idea can be extended to higher dimensions. The spatial coordinates are (x_1, x_2), see Figure 1, and each shale will be modeled as a rectangle superimposed by a deformation of the top and bottom, denoted by

U = (C, S, Z(·)).  (1)

Here C = (C^1, C^2) denotes the position of the center point; S = (L, H) is the length and height of the rectangle; and Z(·) = (Z_B(·), Z_T(·)) is a pair of random functions representing top

and bottom position. The height of the rectangle is the expected height of the shale, while the length of the shale equals the length of the rectangle. In order to simplify notation, let


Figure 1: Model for a single shale.

the parameter vector describing the rectangles be denoted Θ = (C, S) = (C^1, C^2, L, H). One can now write U = (Θ, Z(·)). The bounded reservoir domain V will contain many shale units, and the number of shales N is defined stochastically. The sedimentary processes creating shale units in a sand matrix are such that one would expect that the units interact by repulsing each other. The model for N shales can be written U = (U_1, ..., U_N), and the distribution of U will be developed in a Bayesian setting.

2.1 Prior model

The prior model for U = (U_1, ..., U_N) is defined by the pdf

f(θ, n, z(·)) = f(θ, n) f(z(·) | θ, n).  (2)

This pdf represents the family of all finite-dimensional densities for U. The pdf defining the geometry of shale units is determined by integrating over θ and summing over n. The prior model for the rectangles Θ = (Θ_1, ..., Θ_N) and the number of shales N is firstly described, and thereafter the prior model for top and bottom position, Z(·) = (Z_1(·), ..., Z_N(·)), is presented.

The prior pdf for Θ and N, entailing only pairwise interaction between shales and Θ being unordered, is of the general form

f(θ, n) = const × exp{ Σ_{i=1}^{n} b(θ_i) + Σ_{j=1}^{n} Σ_{i<j} d(θ_i, θ_j) + nα }.  (3)

Separating the horizontal and the vertical parameters, this takes the form

f(θ, n) = const × exp{ Σ_{i=1}^{n} (b_1(c_i^1, l_i) + b_2(c_i^2, h_i)) + Σ_{j=1}^{n} Σ_{i<j} d(c_i, c_j) + nα }.  (4)

The pair of random functions Z_i(·) | Θ_i = θ_i = (Z_{iB}(·), Z_{iT}(·)) | θ_i is modeled as a bi-Gaussian random function with

E{ Z_{iB}(x) } = c_i^2 + h_i/2,  E{ Z_{iT}(x) } = c_i^2 − h_i/2,  (5)

Cov{ Z_{iB}(x), Z_{iB}(x') } = Cov{ Z_{iT}(x), Z_{iT}(x') } = σ² ρ(x − x'),  Cov{ Z_{iB}(x), Z_{iT}(x') } = τ ρ(x − x'),  (6)

where σ² is the variance, τ is the covariance between top and bottom, and ρ(·) is the spatial correlation function. In order to ensure positive definiteness of the covariance matrix, we must have |τ| < σ². Note that the expected top and bottom are symmetrical around C². Assuming that the stochastic fields Z_i(·) | Θ_i = θ_i in different shales are conditionally independent, the following pdf is obtained for a reservoir containing N shales:

f(θ, n, z(·)) = f(θ, n) ∏_{i=1}^{n} f(z_i(·) | θ_i).  (7)
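A sketch of simulating the prior top and bottom fields for a single shale by Cholesky factorization of their joint covariance matrix. The exponential correlation function and all numerical values are illustrative assumptions; depths are taken to increase downwards, so the bottom lies h/2 below the center point and the top h/2 above it:

```python
import random
import math

def cholesky(a):
    """Plain Cholesky factorization of a symmetric positive definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / l[j][j]
    return l

def shale_fields(xs, c2, h, sigma2, tau, corr_range, rng):
    """Simulate bottom/top depth fields with Cov(same field) = sigma2*rho,
    Cov(bottom, top) = tau*rho, rho(d) = exp(-|d|/corr_range).
    Positive definiteness requires |tau| < sigma2."""
    n = len(xs)
    rho = lambda d: math.exp(-abs(d) / corr_range)
    cov = [[(sigma2 if (i < n) == (j < n) else tau) * rho(xs[i % n] - xs[j % n])
            for j in range(2 * n)] for i in range(2 * n)]
    l = cholesky(cov)
    eps = [rng.gauss(0.0, 1.0) for _ in range(2 * n)]
    z = [sum(l[i][k] * eps[k] for k in range(i + 1)) for i in range(2 * n)]
    bottom = [c2 + h / 2 + z[i] for i in range(n)]    # deeper edge
    top = [c2 - h / 2 + z[n + i] for i in range(n)]   # shallower edge
    return bottom, top

rng = random.Random(11)
xs = [2.0 * i for i in range(10)]  # horizontal locations along the shale
bottom, top = shale_fields(xs, c2=100.0, h=4.0, sigma2=1.0, tau=0.5,
                           corr_range=10.0, rng=rng)
```

A positive cross-covariance tau makes top and bottom deform together, so the local thickness varies less than the two edges individually.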

2.2 Observations

The observations consist of exact observations of intersections of shales along the transects of m vertical wells. Well to well correlations through individual shales are also identified. Observations associated with shale number i consist of the indices of the m_i wells with locations w_{1i}, ..., w_{m_i i} penetrating the shale, and the observed depths to top and bottom of the shale, (z_{iB}(w_{1i}), z_{iT}(w_{1i})), ..., (z_{iB}(w_{m_i i}), z_{iT}(w_{m_i i})). This is illustrated in Figure 1 for the case of two penetrating wells. The number of penetrated shales is ν. In addition, the wells not intersecting a shale give information about the length of that shale. In the case of a shale not being penetrated by any well, this is in fact the only information available. Denote the observations described above by o.

It is assumed that no overlap between shales is observed in the wells. Moreover, it is assumed that there is no conflicting information. For instance, if two wells are interpreted to penetrate the same shale, then all wells between them must penetrate the same shale.

The observations carry considerable information about the expected height of the shales, H. Based on z_{iB}(w_{ji}) − z_{iT}(w_{ji}), i = 1, ..., ν, j = 1, ..., m_i, and certain reasonable assumptions about independence between location and height, assessments of the prior parameters μ_H and σ²_H can be made along the lines of empirical Bayes, see Casella (1985). The assessments of the prior parameter values are approximate, but there are reasons to believe that this gives a better prior guess than geological experience alone can provide.
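The empirical Bayes assessment of mu_H and sigma^2_H can be sketched as a method-of-moments computation on the observed thicknesses z_B − z_T. Each thickness observation is the expected height plus a field deviation of variance 2(sigma^2 − tau); the data values and parameter values below are invented:

```python
import statistics

def assess_height_prior(thicknesses, sigma2, tau):
    """Method-of-moments (empirical Bayes style) assessment of the prior
    parameters (mu_H, sigma2_H): each observed thickness is modeled as
    H + noise, the field contributing noise variance 2*(sigma2 - tau)."""
    mu_h = statistics.mean(thicknesses)
    noise_var = 2.0 * (sigma2 - tau)
    sigma2_h = max(statistics.variance(thicknesses) - noise_var, 0.0)
    return mu_h, sigma2_h

# invented thickness observations z_B - z_T from a handful of wells
thicknesses = [4.8, 5.6, 3.9, 5.1, 4.4, 6.0, 4.7]
mu_h, sigma2_h = assess_height_prior(thicknesses, sigma2=0.25, tau=0.1)
```

Subtracting the known field contribution before assigning the remaining spread to the prior on H is what keeps the prior from double-counting the observation noise.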

2.3 The likelihood function

The observations of shale units are without error, hence the likelihood function can be expressed by

f(o | θ, n, z(·)) = const × ∏_{i=1}^{ν} I(o_i | θ_i, z_i(·)) = const × I(o | θ) ∏_{i=1}^{ν} I(o_i | z_i(·)),  (8)

where I(o_i | θ_i, z_i(·)) is an indicator function taking the value one whenever the conditioning on shale number i is fulfilled, and zero otherwise; I(o | θ) is one when conditioning with

respect to length of shales is fulfilled and zero otherwise; and I(o_i | z_i(·)) is one when the height observations of shale number i are reproduced and zero otherwise. Note that o_i is the subset of the observations o which is relevant for shale number i.

The posterior model is defined by the prior, equation (7), and the likelihood function, equation (8). However, in order to be able to simulate (θ, n) and z separately, we want the posterior pdf on the form of equation (14), and we look at the likelihood function for (Θ, N),

f(o | θ, n) = const × ∏_{i=1}^{ν} f(o_i | θ_i).  (9)

Here f(o_i | θ_i) is the likelihood for the observations related to shale number i given the model parameters θ_i. From expression (9) it is obvious that the observations for different shales are considered to be conditionally independent. From the assumption that the horizontal and vertical dimensions in each shale are independent in the prior, the likelihood function becomes

f(o | θ, n) = const × ∏_{i=1}^{ν} f(o_i | c_i^2, h_i) ∏_{j=1}^{n} I(o_j | c_j^1, l_j).  (10)

Here f(o_i | c_i^2, h_i) is the likelihood related to the height observations in the wells penetrating shale number i, and I(o_j | c_j^1, l_j) is an indicator function taking value one whenever the conditioning in the horizontal direction is fulfilled for shale number j and zero otherwise. Note that f(o_i | c_i^2, h_i) will be a multi-Gaussian pdf since it is directly related to f(z_i(·) | c_i^2, h_i) due to the observations being without error.

2.4 Posterior model

Since the horizontal and vertical parameters are separable in expressions (4) and (10), the posterior pdf for the model parameters can be written as

f(θ, n | o) = const × f(θ, n) f(o | θ, n)
= const × exp{ Σ_{i=1}^{n} (b_1(c_i^1, l_i) + b_2(c_i^2, h_i)) + Σ_{j=1}^{n} Σ_{i<j} d(c_i, c_j) + nα } f(o | θ, n)
= const × exp{ Σ_{i=1}^{n} (b_1(c_i^1, l_i | o) + b_2(c_i^2, h_i | o)) + Σ_{j=1}^{n} Σ_{i<j} d(c_i, c_j) + nα }  (11)

for n ≥ ν. For a number of shales less than the observed number, n < ν, the likelihood function is zero, giving that f(θ, n | o) equals zero.

The prior term exp{b_2(c_i^2, h_i)} has been assumed to be bi-Gaussian and the corresponding likelihood is multivariate Gaussian, hence the posterior, const × exp{b_2(c_i^2, h_i | o)}, is bi-Gaussian. According to Omre and Halvorsen (1989), the mean and covariance are

E{ (C_i^2, H_i)^T | O = o } = μ + (FΣ)^T (K + FΣF^T)^{−1} (Z − Fμ)  (12)

and

Cov{ (C_i^2, H_i)^T | O = o } = Σ − (FΣ)^T (K + FΣF^T)^{−1} (FΣ),  (13)

where μ = (μ_{C²}, μ_H)^T contains the prior means;

Σ = [ σ²_{C²}  σ_{C²,H} ; σ_{C²,H}  σ²_H ]

is the prior covariance matrix; Z = (z_{iB}(w_{1i}), ..., z_{iB}(w_{m_i i}), z_{iT}(w_{1i}), ..., z_{iT}(w_{m_i i}))^T is a vector of length 2m_i containing the observations;

F^T = [ 1 ... 1  1 ... 1 ; 1/2 ... 1/2  −1/2 ... −1/2 ],

so that F is a 2m_i × 2 matrix; and

K = [ K_1  K_2 ; K_2  K_1 ]

is a 2m_i × 2m_i matrix with (K_1)_{kj} = σ² ρ(w_{ki} − w_{ji}) and (K_2)_{kj} = τ ρ(w_{ki} − w_{ji}).

It can be shown that if C² and H have independent prior distributions, that is, σ_{C²,H} equals zero, the posterior covariance matrix is a diagonal matrix, which means that C² and H are independent also in the posterior distribution. For shales not penetrated by any well, the posterior distribution is identical to the prior distribution.

Consider secondly the term exp{b_1(c_j^1, l_j | o)}. From expression (10), one has that it is equal to exp{b_1(c_j^1, l_j)} I(o_j | c_j^1, l_j). Denote the posterior pdf according to equations (7) and (8):

f(θ, n, z(·) | o) = f(θ, n | o) ∏_{i=1}^{n} f(z_i(·) | θ_i, o).  (14)

The conditional pdf of the geometry of an arbitrary shale given the model parameters and the observations, f(z_i(·) | θ_i, o), represents a bi-Gaussian random function with given parameters and conditioned on a set of observations. This corresponds to the classical geostatistical model, which is straightforward to sample from, see Journel and Huijbregts (1978). Consequently, it is the posterior pdf of the model parameters, f(θ, n | o), that creates problems, since the normalizing constant cannot be evaluated analytically. However, sampling from the posterior pdf can be done by the Metropolis-Hastings algorithm, see Metropolis et al. (1953) and Hastings (1970).
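For a single penetrating well, equations (12) and (13) reduce to a small Gaussian conditioning computation that can be written out directly. The prior values and the observations below are invented, and the sign convention in F (depths, with the bottom deeper than the top) is an assumption of this sketch; note that with sigma_{C2,H} = 0 the posterior covariance matrix comes out diagonal, as stated above:

```python
def mm(a, b):
    """Matrix product for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def t(a):
    """Matrix transpose."""
    return [list(col) for col in zip(*a)]

def inv2(a):
    """Inverse of a 2x2 matrix."""
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def condition_c2_h(mu, Sigma, sigma2, tau, z_obs):
    """Posterior mean and covariance of (C2, H) given exact bottom/top
    depths (z_B, z_T) in a single well, following eqs. (12)-(13)."""
    F = [[1.0, 0.5], [1.0, -0.5]]       # E[z_B] = C2 + H/2, E[z_T] = C2 - H/2
    K = [[sigma2, tau], [tau, sigma2]]  # field covariance at the well, rho(0) = 1
    FS = mm(F, Sigma)
    FSFt = mm(F, mm(Sigma, t(F)))
    M = inv2([[K[i][j] + FSFt[i][j] for j in range(2)] for i in range(2)])
    Fmu = [mu[0] + 0.5 * mu[1], mu[0] - 0.5 * mu[1]]
    r = [[z_obs[0] - Fmu[0]], [z_obs[1] - Fmu[1]]]
    upd = mm(t(FS), mm(M, r))
    mean = [mu[0] + upd[0][0], mu[1] + upd[1][0]]
    P = mm(t(FS), mm(M, FS))
    cov = [[Sigma[i][j] - P[i][j] for j in range(2)] for i in range(2)]
    return mean, cov

mu = [100.0, 4.0]                  # prior means of C2 (depth) and H
Sigma = [[4.0, 0.0], [0.0, 1.0]]   # independent priors: sigma_{C2,H} = 0
mean, cov = condition_c2_h(mu, Sigma, sigma2=0.25, tau=0.1,
                           z_obs=[103.0, 98.0])  # invented observations
```

With the observed thickness 5 and midpoint 100.5, the posterior means of C2 and H are pulled from their prior values 100 and 4 toward these observation-implied values, without reaching them, since the field deviations absorb part of the discrepancy.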

3 Sampling procedure

The sampling is done in two steps, and the procedure is as follows: First find a realization of the model parameters Θ|O using a continuous version of the Metropolis-Hastings algorithm. Then use this realization to generate a realization of Z_i(·)|θ_i, o_i for each shale, using geostatistical simulation. In practice Z_i(·)|θ_i, o_i is represented by values on a grid.

The objective of the first step is to sample from the distribution given by equation (11). A continuous version of the Metropolis-Hastings algorithm is used, see Metropolis et al. (1953), Hastings (1970), and Geyer and Møller (1994). In the Metropolis-Hastings algorithm, the distribution to sample from is the limiting distribution π of a Markov chain with irreducible and aperiodic transition matrix P = {p_ij}. This matrix is decomposed into another irreducible and aperiodic transition matrix Q and an acceptance matrix a such that p_ij = a_ij q_ij. In order to obtain fast convergence of the algorithm, it is important to choose the transition probability matrix Q in a suitable way. In the continuous case, the transition matrices are replaced by transition kernels.

The transition kernel Q chosen for this problem consists of three types of transitions: with probability 1 − 2β, alter one randomly chosen penetrated shale; with probability β, add a new non-penetrated shale; and with probability β, delete a randomly chosen non-penetrated shale, where β < 0.5 is a parameter in the algorithm. Due to the conditioning, it is impossible to delete or add a penetrated shale. In the case of deleting a shale, one of the non-penetrated shales is chosen uniformly. The transition kernel Q is constructed by drawing potentially new shales from a marked Poisson process with mark density close to the marginal distribution of a single shale. The interaction is corrected for in the acceptance probability.

First introduce some new notation. Let exp{b1(c_i^1, l_i)} = f(c_i^1, l_i), exp{b2(c_i^2, h_i)} = f(c_i^2, h_i), const × exp{b1(c_i^1, l_i|o)} = f(c_i^1, l_i|o), and const × exp{b2(c_i^2, h_i|o)} = f(c_i^2, h_i|o).
The posterior pdf f(c_i^2, h_i|o) for the vertical location C^2 and height H of an individual shale was determined analytically in equations (12) and (13), and potentially new values for c_i^2 and h_i are drawn from this density. In the case of non-penetrated shales, the posterior equals the prior.

Now consider sampling the horizontal location C^1 and length L of an individual shale with no spatial interaction. The posterior density f(c_i^1, l_i|o) is proportional to f(c_i^1, l_i) f(o|c_i^1, l_i). The variables C^1 and L for the potential new shale could be drawn from this distribution by rejection sampling. However, this can be very time-consuming if the number of wells is large, so instead the following procedure is used. For unobserved shales, one draws from the prior f(c_i^1, l_i). For observed shales, the interval for which f(c_i^1|o) is nonzero is determined, and c_i^1 is drawn from a uniform distribution over this interval, denoted by f°(c_i^1|o). The length l_i is drawn from the prior conditioned on c_i^1, f(l_i|c_i^1). With this procedure, the acceptance rate becomes larger than when sampling from the prior distribution, since drawing center-points that are unacceptable due to the conditioning is avoided.

The acceptance probabilities will now be derived for each of the three transition types. For the case of altering a randomly chosen penetrated shale k, the proposed new shale is

drawn from the density

g(θ'_k) = f(c_k^2', h'_k|o) f°(c_k^1'|o) f(l'_k|c_k^1').   (15)
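A hedged sketch of how the support of the uniform proposal f°(c^1|o) might be computed (our own simplification, for a fixed proposed length l): the shale interval [c^1 − l/2, c^1 + l/2] must contain every well that penetrates the shale and exclude every other well.

```python
def feasible_centers(pen, other, l):
    """Open intervals of centers c1 for which the shale [c1-l/2, c1+l/2]
    covers all penetrating wells `pen` and no well in `other`."""
    lo, hi = max(pen) - l / 2.0, min(pen) + l / 2.0
    segments = [(lo, hi)] if lo < hi else []
    for w in other:
        a, b = w - l / 2.0, w + l / 2.0   # centers that would swallow well w
        segments = [s for seg in segments
                    for s in ((seg[0], min(seg[1], a)), (max(seg[0], b), seg[1]))
                    if s[0] < s[1]]
    return segments

# Two penetrating wells at 40 and 45, a non-penetrated well at 60, l = 20:
print(feasible_centers([40.0, 45.0], [60.0], 20.0))   # [(35.0, 50.0)]
```

Drawing c^1 uniformly over these segments avoids proposals that the indicator likelihood would reject outright.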

According to the most common choice of acceptance probability in the Metropolis-Hastings algorithm, we get the acceptance probability

a_ij = min{ 1, [ f(c_k^1') f°(c_k^1|o) ] / [ f(c_k^1) f°(c_k^1'|o) ] × exp( Σ_{l≠k} ( d(c'_k, c_l) − d(c_k, c_l) ) ) f(o_k|c_k^1', l'_k) },   (16)

where f(i) and f(j) are the densities defined in expression (11) for the old and the potentially new state respectively, and f(c_k^1) is f(c_k^1, l_k) integrated over l_k.

For the case of adding a non-penetrated shale, the proposed new shale is drawn from the density

g(θ_{n+1}) = f(c_{n+1}^2, h_{n+1}) f(c_{n+1}^1, l_{n+1}).   (17)

According to the algorithm, the acceptance probability becomes

a_ij = min{ 1, exp( Σ_{i=1}^n d(c_{n+1}, c_i) + a ) f(o_{n+1}|c_{n+1}^1, l_{n+1}) / (n + 1 − ν) }.   (18)

For the case of deleting a shale k uniformly chosen among the n − ν non-penetrated shales, the acceptance probability is

a_ij = min{ 1, (n − ν) / exp( Σ_{l≠k} d(c_k, c_l) + a ) }   (19)

if n > ν, and zero otherwise. Irreducibility of P is satisfied since all states obviously can be reached from an arbitrary state. It is easy to check that P is aperiodic, since states with period one exist.

The Metropolis-Hastings algorithm for sampling from the posterior pdf f(θ, n|o) of the model parameters is: Initiate by drawing the ν penetrated shales from the posterior pdf in expression (11) with n = ν and without interaction. The initial state must have positive posterior pdf value according to expression (11). If the interaction is very exclusive, several attempts may be required to obtain an initial state with positive posterior pdf value. Iterate a preassigned number of times:

• With probability 1 − 2β: Select an arbitrary penetrated shale k. Generate new parameters for shale k from the density given by expression (15). Replace θ_k by θ'_k with probability a_ij given by expression (16).

• With probability β: Generate new parameters for shale n + 1 from the density given by expression (17). Add the new shale with probability a_ij given by expression (18).

• With probability β: Select an arbitrary non-penetrated shale k. Delete the shale with probability a_ij given by expression (19).
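The three-move sampler above can be sketched as follows. This is our own toy reduction, not the thesis code: each shale is collapsed to a one-dimensional center point on [0, 100], the marginal terms b1 and b2 are taken uniform, and a hard-core repulsion (in the spirit of the interaction used in the simulation example) is the only remaining term, so only the interaction and intensity enter the acceptance ratios of types (16), (18) and (19).

```python
import numpy as np

rng = np.random.default_rng(1)

A, BETA, NU = 3.5, 0.2, 3   # intensity a, move probability beta, nu penetrated shales

def d(x, y):
    """Pairwise repulsion with a hard core at distance 2 (our toy choice)."""
    r = abs(x - y)
    if r < 2.0:
        return -np.inf
    return np.log(r / 30.0) if r <= 30.0 else 0.0

def pair_sum(centers, k):
    """Interaction between shale k and all other shales."""
    return sum(d(centers[k], c) for i, c in enumerate(centers) if i != k)

def step(centers):
    """One transition; the first NU entries are the penetrated shales."""
    u, n = rng.random(), len(centers)
    if u < 1.0 - 2.0 * BETA:                   # alter a penetrated shale
        k = int(rng.integers(NU))
        new = centers.copy()
        new[k] = rng.uniform(0.0, 100.0)
        log_r = pair_sum(new, k) - pair_sum(centers, k)
    elif u < 1.0 - BETA:                       # birth, acceptance as in (18)
        new = centers + [rng.uniform(0.0, 100.0)]
        log_r = pair_sum(new, n) + A - np.log(n + 1 - NU)
    else:                                      # death, acceptance as in (19)
        if n == NU:
            return centers
        k = NU + int(rng.integers(n - NU))
        log_r = np.log(n - NU) - pair_sum(centers, k) - A
        new = centers[:k] + centers[k + 1:]
    # -Exp(1) is distributed as log(U); accept if log U < log acceptance ratio.
    return new if -rng.exponential() < log_r else centers

state = [10.0, 50.0, 90.0]                     # centers fixed by the wells
for _ in range(2000):
    state = step(state)
```

Because the hard core assigns −∞ to any overlapping proposal, every accepted state keeps all pairs of centers at least 2.0 apart, and the penetrated shales are never added or deleted.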

The number of iterations must be sufficiently large for the Markov chain to converge to equilibrium. This is hard to verify, but certain statistics can indicate whether equilibrium is reached. One such statistic is the number of shales versus the number of iterations; at equilibrium, the number of shales will stabilize.

Step two utilizes standard procedures for sampling from conditional Gaussian random functions. Given a set of model parameters (θ, n|o), realizations of (Z_i(·)|Θ_i = θ_i, O_i = o_i), i = 1, ..., n, can be generated by conditional simulation as defined in geostatistics, see Journel and Huijbregts (1978). In practice, this provides a discrete representation of the realization of the conditional model on a grid. The observations are reproduced up to the discretization of this representation. For shales not penetrated by wells, the top and bottom surfaces will of course be simulated from the prior distribution given by expressions (5) and (6).

The two-step sampling procedure described above provides samples from the posterior pdf for the geometry of the shales in the reservoir, given the observations in wells.
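Step two can be illustrated with the classical residual construction for conditional Gaussian simulation (Journel and Huijbregts, 1978). The one-dimensional sketch below is ours, with illustrative parameter values: an unconditional realization is simulated jointly on the grid and at the wells, and a simple-kriging correction of the residuals makes the realization honor the well data.

```python
import numpy as np

rng = np.random.default_rng(2)

def cov(x, y, sigma2=0.48, scale=0.05):
    """Exponential covariance; parameter values are illustrative."""
    return sigma2 * np.exp(-scale * np.abs(x[:, None] - y[None, :]))

grid = np.linspace(0.0, 100.0, 201)     # discrete representation
wells = np.array([20.0, 50.0, 80.0])    # observation locations
z_obs = np.array([0.3, -0.1, 0.2])      # observed (mean-corrected) depths

# 1. Unconditional realization, simulated jointly on grid and wells.
pts = np.concatenate([grid, wells])
C = cov(pts, pts) + 1e-8 * np.eye(len(pts))   # jitter for the duplicates
z_unc = np.linalg.cholesky(C) @ rng.standard_normal(len(pts))
z_grid, z_well = z_unc[:len(grid)], z_unc[len(grid):]

# 2. Simple-kriging correction of the residuals honors the wells.
K = cov(wells, wells)
w = np.linalg.solve(K, z_obs - z_well)
z_cond = z_grid + cov(grid, wells) @ w
```

At the well locations the conditional realization reproduces the data, up to the discretization of the grid; away from the wells it keeps the prescribed covariance structure.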

4 Simulation example

A simple simulation example is performed on a reservoir of size 100 × 18. One realization from the prior distribution is generated, and this is regarded as the “true” reservoir, see Figure 2. A varying number of wells is assumed to penetrate this reservoir, and realizations from the posterior distribution conditioned on the actual observations are generated. The prior model for the model parameters (Θ, N) is defined in expressions (3) and (4). The parameters defining b(θ_i) are set to μ_{C^2} = 10.0, σ²_{C^2} = 9.0, μ_H = 1.0, σ_H = 0.5 and σ_{C^2,H} = 0; moreover, the pdf for shale length is a positive-truncated Gaussian pdf with μ_L = 22.0. The interaction function is defined as

d(x_i, x_j) =
    −∞                      if |x_i − x_j| < 2.0
    ln( |x_i − x_j| / 30.0 )  if 2.0 ≤ |x_i − x_j| ≤ 30.0     (20)
    0                       if |x_i − x_j| > 30.0

The shale intensity is defined by a = 3.5, corresponding to an expected number of shales in the domain of approximately 5. In expression (6), σ² = 0.48, τ = 0.2, and the correlation function is defined to be ρ(x) = exp(−0.05|x|). It is natural to specify the parameters such that the probability of crossings of the top and bottom of the shales is negligible. If it happens in practice, one rejects that realization and hence uses a truncated model. In the Metropolis-Hastings algorithm the parameter β is set to 0.2, and 4000 iterations

are performed. There are good reasons to believe that this is sufficient for obtaining convergence, and this will be documented later on.

Based on the prior model, the “true” reservoir is generated, see Figure 2. The algorithm is demonstrated on three cases conditioned on three, seven and 19 wells respectively. For each case two independent realizations are generated, see Figure 3. The empirical Bayes approach is used in assessing the prior for expected shale heights, resulting in the priors μ_H = 0.82, σ²_H = 0.26; μ_H = 1.00, σ²_H = 0.91; and μ_H = 0.97, σ²_H = 0.56 respectively for the three cases. The mean, μ_H, seems to be reliably determined even for a low number of wells, while the corresponding variance, σ²_H, is less stable.

In all the realizations in Figure 3, the observations of shale units in wells are reproduced. A small number of wells gives larger room for variability, as can be seen from the upper row. A large number of wells ties the realizations to be almost identical, as in the lower row.

The sampling is based on 4000 iterations in the Metropolis-Hastings algorithm. In Figure 4 the number of shale units versus iteration number is displayed for two runs with different initial states for the case with three wells. One initial state contains three shale units, which is the lowest number providing a positive posterior pdf value. The other initial state contains 15 shale units. At equilibrium the distribution of the number of shales has stabilized, hence there are reasons to believe that 4000 iterations are sufficient for obtaining convergence. The computer requirements on a medium-size workstation for generating one realization conditioned on 19 wells are approximately 80 cpu-seconds.

5 Conclusion

The stochastic model for shale units presented in the trend-setting paper Haldorsen and Lake (1984) is extended to account for conditioning on shale observations in an arbitrary number of wells, and the sampling procedure is corrected such that the length distribution of shales is reproduced. Moreover, spatial interaction and varying intensity are introduced. Commercial software modules based on the methodology have already been developed. The modules operate in three dimensions, and several shortcuts relative to an exact solution have been made. The insight gained from developing an exact solution in two dimensions has been highly beneficial, however.

One could imagine solving the problem in an image analysis setting by defining a discrete lattice over the domain. A traditional Markov random field (MRF) prior could be enforced and the well observations could be assigned to the lattice nodes. This would have several shortcomings, however. Firstly, resolution is lost by working on a lattice. Secondly, the general knowledge of shale shape can hardly be captured in a prior of MRF type. Thirdly, the convergence rates of an MRF model on a lattice are expected to be dramatically slower than for the marked point model, since the dimensionality of the former is much higher. Hence, a marked point model appears to be well suited to the problem at hand.

Further work is pursued in order to weaken the assumptions concerning knowledge of well-to-well correlations, and to estimate more of the parameters in the model.

6 Acknowledgments

The first author was funded by a PhD grant from the Norwegian Research Council.

References

Baddeley, A. and Møller, J. (1989) "Nearest-Neighbour Markov Point Processes and Random Sets." International Statistical Review, 57, No. 2, pp. 89-121.
Casella, G. (1985) "An Introduction to Empirical Bayes Data Analysis." The American Statistician, 39, No. 2, pp. 83-87.
Chessa, A.G. (1995) "Conditional Simulation of Spatial Stochastic Models for Reservoir Heterogeneity." PhD thesis, Delft University Press.
Geyer, C.J., Møller, J. (1994) "Simulation Procedures and Likelihood Inference for Spatial Point Processes." Scand. J. Statist., 21, No. 4, pp. 359-373.
Haldorsen, H.H., Damsleth, E. (1990) "Stochastic Modelling." Journal of Petroleum Technology, April 1990, pp. 404-412.
Haldorsen, H.H., Lake, L.W. (1984) "A New Approach to Shale Management in Field-Scale Models." SPEJ, Aug. 1984, pp. 447-457.
Hastings, W.K. (1970) "Monte Carlo Sampling Methods Using Markov Chains and Their Applications." Biometrika, 57, No. 1, pp. 97-109.
Hjort, N.L., Omre, H. (1994) "Topics in Spatial Statistics." Scand. J. Statist., 21, No. 4, pp. 289-343.
Journel, A.G., Huijbregts, Ch.J. (1978) Mining Geostatistics. Academic Press, London.
Lia, O., Omre, H., Tjelmeland, H., Holden, L., Egeland, T. (1996) "Uncertainties in Reservoir Production Forecasts." Accepted for publication in AAPG Bulletin.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E. (1953) "Equation of State Calculation by Fast Computing Machines." Journal of Chemical Physics, 21, pp. 1087-1092.
Omre, H., Halvorsen, K.B. (1989) "The Bayesian Bridge Between Simple and Universal Kriging." Mathematical Geology, 21, No. 7, pp. 767-786.
Omre, H., Tjelmeland, H., Qi, Y., Hinderaker, L. (1991) "Assessment of Uncertainty in the Production Characteristics of a Sand Stone Reservoir." In Reservoir Characterization III, eds. B. Linville, T.E. Burchfield and T.C. Wesson. Penwell Books, Tulsa, pp. 556-604.
Ripley, B.D. (1987) Stochastic Simulation. John Wiley & Sons, Inc., New York.
Stoyan, D., Kendall, W.S., Mecke, J. (1987) Stochastic Geometry and Its Applications. Akademie-Verlag, Berlin.


Figure 3: Realizations from the posterior distribution conditioned on different numbers of wells.


Figure 4: The number of shales plotted against the number of iterations.

II Marked Point Models for Facies Units Conditioned on Well Data

MARKED POINT MODELS FOR FACIES UNITS CONDITIONED ON WELL DATA

ANNE RANDI SYVERSVEEN AND HENNING OMRE
Department of Mathematical Sciences
Norwegian University of Science and Technology
N-7034 Trondheim, Norway

Abstract. A marked point model for modeling of facies units conditioned on observations from an arbitrary number of wells is presented. The model includes spatial interaction between units, and it does not require specification of well-to-well contacts. A simulation algorithm for the model is derived and demonstrated on real-life data taken from an outcrop. Realizations from the model are compared to the outcrop.

1. Introduction

The paper by Haldorsen and Lake (1984) on stochastic modeling of shale units in a sand matrix is considered to be trend-setting for stochastic reservoir characterization. The model in the paper had several shortcomings, however. Since each shale was modeled by a rectangle, shale units penetrated by more than one well could not be represented. Neither did the model include spatial interaction between shales. Moreover, the simulation algorithm seems not to be exactly correct, in the sense that shales penetrated by wells tend to be too short. In Syversveen and Omre (1997) we described a model that handles these shortcomings. Here we present an extension of this model, which no longer needs information about whether observations belong to the same facies unit or not, as was assumed in Syversveen and Omre (1997). In the current paper the term shale unit according to Haldorsen and Lake (1984) will be used. The model may be used for modeling any type of facies units in a background facies. In the example, sand units in a non-permeable background are modeled.

2. Model definition

Each shale is modeled as a rectangle with center position C = (C^1, C^2), length and height S = (L, H), and a deformation on top and bottom represented by a pair of random functions Z(·) = (Z_B(·), Z_T(·)), and we write

U = (C, S, Z(·)).   (1)

The height of the rectangle is the expected height of the shale, while the length of the shale equals the length of the rectangle, see Figure 1.


Figure 1. Model for a single shale.

The reservoir contains an unknown number of shales N, which normally interact by repulsing each other. The model for a number of shale units in the bounded reservoir V will be developed in a Bayesian setting.

2.1. PRIOR MODEL

We let θ_i = (c_i^1, c_i^2, l_i, h_i) denote the rectangle associated with shale number i, and z_i(·) denote its top and bottom positions. The joint density for the N shales can be written

f(θ, n, z(·)) = f(θ, n) f(z(·)|θ, n),   (2)

where θ = (θ_1, ..., θ_n) and z(·) = (z_1(·), ..., z_n(·)). The first term f(θ, n) is a marked point field with pairwise interaction, see Stoyan and Stoyan (1994), of the form

f(θ, n) = const × exp{ Σ_{i=1}^n ( b(c_i^1, l_i) + b(c_i^2, h_i) ) + Σ_{i<j} d(c_i, c_j) + na }.   (3)

Here b(c_i^1, l_i) + b(c_i^2, h_i) are marginal characteristics of the shales, d(c_i, c_j) is a pairwise interaction function between center-points of shales, and the last term is related to the intensity of shales. The term exp(b(c_i^2, h_i)) is defined to be a truncated bi-Gaussian pdf in order to be able to calculate parts of the posterior density analytically. The truncation is such that C^2 only takes values within the reservoir, and H is positive. The mean value of C^2 is chosen to be the center point of the reservoir in the x_2 direction, and the variance will normally be chosen to be large, in order to approximate a uniform distribution over the reservoir. The mean and variance for H might be determined from the observations by an empirical Bayes approach, see Syversveen and Omre (1997). The term exp(b(c_i^1, l_i)) can be chosen according to geological experience. The interaction function d(c_i, c_j) will usually be defined such that shales repulse each other. The intensity parameter a might be estimated from the data by Monte Carlo methods, see Geyer and Thompson (1992) and Syversveen (1996).

The second term f(z(·)|θ, n) in equation (2) is related to the top and bottom positions of the shales. The function Z_i(·) will, conditioned on Θ_i = θ_i, be a second order stationary Gaussian random function with

E( Z_i(x_1) | Θ_i = θ_i ) = ( c_i^2 + h_i/2, c_i^2 − h_i/2 )^T   (4)

Cov( Z_i(x_1'), Z_i(x_1'') | Θ_i = θ_i ) = Σ(x_1', x_1'')   (5)

for x_1', x_1'' ∈ [c_i^1 ± l_i/2]. The matrix Σ(x_1', x_1'') is given by

Σ(x_1', x_1'') = [ σ²ρ(x_1' − x_1'')   τρ(x_1' − x_1'') ]
                 [ τρ(x_1' − x_1'')   σ²ρ(x_1' − x_1'') ],

where σ² is the variance, τ is the covariance between top and bottom, and ρ(·) is the spatial correlation function. In order to ensure positive definiteness of the covariance matrix, we must have |τ| < σ². Note that the expected top and bottom are symmetrical around C^2. The correlation function, the variance and the covariance are chosen according to geological experience. Assuming that the stochastic fields Z_i(·)|Θ_i = θ_i in different shales are conditionally independent, the following probability density is obtained for a reservoir containing N shales:

f(θ, n, z(·)) = f(θ, n) Π_{i=1}^n f(z_i(·)|θ_i).   (6)
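A minimal sketch (our notation, illustrative parameter values) of assembling the 2 × 2 blocks of equation (5) into the joint covariance of bottom and top values at several positions; the check illustrates why |τ| < σ² is needed.

```python
import numpy as np

def sigma_matrix(x, sigma2=0.38, tau=0.05, rho=lambda d: np.exp(-0.03 * d**2)):
    """Joint covariance of (Z_B(x_1),...,Z_B(x_k), Z_T(x_1),...,Z_T(x_k)),
    built from the 2x2 blocks of equation (5); parameters are illustrative."""
    R = rho(x[:, None] - x[None, :])
    return np.block([[sigma2 * R, tau * R],
                     [tau * R, sigma2 * R]])

x = np.linspace(0.0, 20.0, 6)
S = sigma_matrix(x)
# The block structure gives eigenvalues (sigma2 +/- tau) * eig(R), so the
# matrix is positive definite here precisely because |tau| < sigma2.
assert np.linalg.eigvalsh(S).min() > 0.0
```

With τ approaching σ², the (σ² − τ) eigenvalues collapse toward zero and the joint top/bottom model degenerates, which is the constraint stated in the text.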

2.2. OBSERVATIONS AND THE LIKELIHOOD FUNCTION

The observations consist of the dimension of the rectangular reservoir and exact observations of the reservoir along the transects of m wells. Observations of depths to top and bottom of shales are contained in the vector o. The locations of the wells are denoted w_i, i = 1, ..., m. It is assumed that no overlap between shales is observed in the wells, and that it is unknown whether observations belong to the same shale or not. Hence, the likelihood may be written as an indicator function, taking the value one when the conditioning is fulfilled and zero otherwise.

With this form of the likelihood function, it is impossible to do analytical calculations on the posterior distribution, since well-to-well contacts are unknown. We will therefore introduce a new stochastic variable T, which is implicitly defined by Θ and tells us which observations belong to the same shale. We call T the configuration of shales. To explain better what T is, we look at the three realizations in Figure 2, which have different configurations. For example, the three observations belonging to the same object in the “true” formation appear as one, two, or three objects in the realizations from the posterior. Having introduced T, the likelihood function can be written

f(o|θ, n, z(·)) = const × Π_{i=1}^n f(o_i|θ_i, z_i(·), t),   (7)

where f(o_i|θ_i, z_i(·), t) equals one when the conditioning in shale number i is fulfilled and zero otherwise. The reason why t is present on the right-hand side of equation (7) is that knowledge of one shale unit only is not sufficient for identifying the configuration t.

The posterior model is defined by the prior, equation (2), and the likelihood, equation (7). However, we want to simulate (θ, n) and z separately.

Therefore we will define the posterior pdf in the form of equation (9), and we look at the likelihood function for (Θ, N),

f(o|θ, n) = const × Π_{i=1}^n f(o_i|θ_i, t)
          = const × Π_{i=1}^{ν_t} f(o_i|c_i^2, h_i, t) Π_{j=1}^n I(o_j|c_j^1, l_j, t).   (8)

Here f(o_i|c_i^2, h_i, t) is the likelihood related to the height observations in wells penetrating shale number i under configuration t, and I(o_j|c_j^1, l_j, t) is an indicator function ensuring that the lateral conditioning is honored. The number ν_t is the number of penetrated shales within configuration t. Since the observations are without error, f(o_i|c_i^2, h_i, t) is directly related to f(z_i(·)|c_i^2, h_i) and is therefore a multi-Gaussian pdf. The second equality follows from the fact that the horizontal and vertical dimensions in each shale are independent in the prior distribution. In order to be able to calculate the posterior model analytically, it is important that the likelihood function can be factorized in conditionally independent parts, as we will see in the following sections.

2.3. POSTERIOR MODEL

Analogous to equation (6), the posterior model for (Θ, N, Z(·)) can be written as

f(θ, n, z(·)|o) = f(θ, n|o) Π_{i=1}^n f(z_i(·)|θ_i, o).   (9)

The first term is derived as follows:

f(θ, n|o) = const × f(θ, n) f(o|θ, n)
          = const × exp{ Σ_{i=1}^n ( b(c_i^1, l_i|o, t) + b(c_i^2, h_i|o, t) ) + Σ_{i<j} d(c_i, c_j) + na } I(o|t, n).   (10)

3. Simulation from the model

The simulations are done in two steps. First the parameters θ and the number of shales n are sampled from the density defined by equation (10) using the Metropolis-Hastings algorithm, see Metropolis et al. (1953) and Hastings (1970). Then the conditional Gaussian random fields Z_i(·)|θ_i, o are sampled using conditional geostatistical simulation, see Journel and Huijbregts (1978).

The transition kernel in the Metropolis-Hastings algorithm consists of three different types of transitions: adding an unobserved shale, deleting an arbitrary unobserved shale, and changing observed shales, including a change of configuration. When adding a new shale, the proposed new shale is drawn from the marginal prior distribution of the corresponding homogeneous marked point process. When changing observed shales, a configuration is drawn uniformly over the possible configurations. Given this configuration, each shale influenced by the change in configuration is drawn from a marginal distribution conditioned on the data, as described in Syversveen and Omre (1997). Drawing the shales from the conditional marginal distribution is crucial for the convergence of the algorithm.

The acceptance probabilities in the Metropolis-Hastings algorithm are defined by the posterior pdf to be sampled from and these transition kernels, and they are computed according to Syversveen and Omre (1997). The convergence of the algorithm does not pose critical problems since the model is defined in a relatively low-dimensional space, see Syversveen and Omre (1997) for further discussion.
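The uniform draw over configurations can be illustrated as follows. This is our own hedged sketch: assuming one observation per well at a given level and that a shale unit must span consecutive wells, the configurations correspond to partitions of the wells into runs, of which there are 2^(m−1) for m wells.

```python
from itertools import product

def configurations(m):
    """All partitions of wells 0..m-1 into runs of consecutive wells.
    Our simplification: a laterally connected unit spans adjacent wells."""
    configs = []
    for cuts in product([False, True], repeat=m - 1):
        runs, start = [], 0
        for i, cut in enumerate(cuts, start=1):
            if cut:                        # a cut between wells i-1 and i
                runs.append(tuple(range(start, i)))
                start = i
        runs.append(tuple(range(start, m)))
        configs.append(runs)
    return configs

# Three observations can appear as one, two, or three separate units,
# matching the realizations discussed around Figure 2: 4 configurations.
print(configurations(3))
```

The changing-shale move draws one of these groupings uniformly and then resimulates every shale affected by the change.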

4. Simulation Example

In this example we model a part of an outcrop taken from the Menefee formation in the San Juan basin, Rocky Mountains, see Safari (1995), shown in Figure 2a. The outcrop is from a low to moderate sand-gross delta plain environment, and shows a cross section of meandering channel sandstones embedded in lower to upper delta plain mudstone, siltstone and coal. Hence the objects are sand and the background is non-permeable shale. This is regarded as our “true” formation, and three artificial wells are added, shown as vertical lines in Figure 2a. Taking the observations from the wells into account, we generate realizations from the posterior model and compare them with the truth. Two laterally extensive objects are shown in a darker color than the rest. These are not modeled, because they have a geologic interpretation which is different from that of the objects in the model.

The size of the formation is rescaled and taken to be 100 × 16 units, where one unit is about 11 meters. We have chosen the following prior distributions for the parameters Θ = (C^1, C^2, L, H), see equation (3). We let exp(b(c^1, l)) = exp(b(c^1) + b(l)) and exp(b(c^2, h)) = exp(b(c^2) + b(h)); that is, C^1, C^2, L and H are independent. As mentioned in Section 2.1, exp(b(c^2, h)) is a truncated bi-Gaussian pdf. The mean and variance of C^2 are set to μ_{C^2} = 8.0 and σ²_{C^2} = 12² respectively. With these parameter choices the truncated Gaussian distribution approximates a uniform distribution over [0, 16]. The mean and variance of H are estimated by empirical Bayes calculations, and we get μ_H = 0.63 and σ²_H = 0.02. We let exp(b(c^1)) be a uniform distribution over the length of the reservoir and exp(b(l)) be a mixture of two Gaussian distributions truncated at zero with parameters μ_{L1} = 25, σ²_{L1} = 15², and μ_{L2} = 4. The interaction function is specified with range 15.0, which gives a weak repulsion between objects. For the Gaussian fields on top and bottom of the objects we use the correlation function ρ(x) = exp(−0.03x²), variance σ² = 0.38 and covariance τ = 0.05 between top and bottom of objects.

The intensity parameter a is set to 7.0, which gives a reasonably good fit to the number of objects. In order to get a better model fit, this parameter could be estimated from the data, for example by Monte Carlo maximum likelihood, see Geyer and Thompson (1992) and Syversveen (1996). In Figure 2 the “true” formation is shown together with three realizations from our model. Observe that the conditioning is fulfilled in all wells. We see that the sand bodies observed in more than one well are represented as one or more sand bodies in the realizations from the model.

5. Closing Remarks

The current paper contains an extension of the model presented in Syversveen and Omre (1997). In these papers, the traditional Haldorsen and Lake model is extended to cover facies units with spatial interaction and multi-well conditioning. The current model is demonstrated on real data from an outcrop and appears to reproduce the real geologic features reasonably well.


Figure 2. The “true” formation together with three different realizations from the model.

6. Acknowledgements

The research is funded by a PhD grant from The Research Council of Norway. We are grateful to the Safari project for giving us permission to use data from the Menefee formation.

References

C.J. Geyer and E.A. Thompson. Constrained Monte Carlo maximum likelihood for dependent data. J. R. Stat. Soc. B, 54(3):657-699, 1992.
H.H. Haldorsen and L.W. Lake. A new approach to shale management in field-scale models. Society of Petroleum Engineers Journal, 447-457, 1984.
W.K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97-109, 1970.
A.G. Journel and Ch.J. Huijbregts. Mining Geostatistics. Academic Press, 1978.
N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A. Teller, and E. Teller. Equation of state calculation by fast computing machines. Journal of Chemical Physics, 21:1087-1092, 1953.
Safari - Sedimentary Architecture of Field Analogues for Reservoir Information. Norsk Hydro, Statoil, Saga Petroleum, Norwegian Petroleum Directorate, 1995.
D. Stoyan and H. Stoyan. Fractals, Random Shapes and Point Fields. John Wiley & Sons, 1994.
A.R. Syversveen. An approximate fully Bayesian estimation of the parameters in a reservoir characterization model. In preparation, 1996.
A.R. Syversveen and H. Omre. Conditioning of marked point processes within a Bayesian framework. To appear in Scandinavian Journal of Statistics, 1997.

III An Approximate Fully Bayesian Estimation of the Parameters in a Reservoir Characterization Model

An Approximate Fully Bayesian Estimation of the Parameters in a Reservoir Characterization Model

Anne Randi Syversveen
Department of Mathematical Sciences
Norwegian University of Science and Technology

Abstract

In order to investigate flow properties in oil reservoirs, stochastic modeling of geology is important. Stochastic models contain parameters which must be estimated based on observations and geological knowledge. What is special about the reservoir characterization problem is that the data are very scarce, and therefore give little information about the parameters. Modern Markov chain Monte Carlo methods can be used to do fully Bayesian analysis with respect to the parameters. However, these methods are very time-consuming for the reservoir characterization model considered here. Therefore we propose a simple, approximate method for fully Bayesian estimation of the parameters. The method is illustrated on simulated and real data.

1 Introduction

One problem in the evaluation of oil reservoirs is to characterize the geology in order to investigate the flow properties. An example is the location of shales in a background of sand, where shales create barriers for fluid flow, while sandstone allows fluid to flow easily. Sources of information about reservoir characteristics can be well observations, seismic data, and previous production history. In the following we consider only well observations. Usually, the number of wells is very limited, due to the high drilling costs. Therefore, data will be scarce. Figure 1 shows an outcrop taken from the Safari (1995) database, which is believed to be an analog to a non-observable oil reservoir. The goal is to create a model for the objects against the background. In this example, the objects consist of permeable sandstone, while the background is non-permeable. Haldorsen and Lake (1984) described a simple marked point model for shales in a vertical cross section of a sand matrix. The model did not contain spatial interaction, and it was not possible to condition on observations from more than one well for each shale.

Figure 1: Outcrop from the San Juan basin, Rocky Mountains, showing a cross section of meandering channel sandstone embedded in lower to upper delta plain mudstone, siltstone and coal.

Syversveen and Omre (1997) presented an extended marked point model with pairwise spatial interaction between objects. The model, which is described in Section 2, can handle conditioning on observations from more than one well for each object. An important problem, which was not treated completely in Syversveen and Omre (1997), is estimation of the parameters in the model. Some parameters may be given by geologists, others may be estimated based only on observations, and some parameters can be estimated from data combined with geological knowledge. In this paper we consider the last case, and the parameters to be estimated are related to the number of objects (intensity) and the interaction between objects. Prior distributions for the parameters must be specified based on geological experience, and Bayesian estimators combining information from data and geologists are obtained. Because data are scarce, the parameter estimates will rely heavily on the prior distribution.

There are several methods for parameter estimation in point processes. Three methods are compared and discussed in Diggle et al. (1994): approximate maximum likelihood (Ogata and Tanemura 1981, 1984), pseudo-likelihood estimation (Besag 1978), and the Takacs-Fiksel method (Takacs 1983, 1986; Fiksel 1988). The pseudo-likelihood and Takacs-Fiksel methods are special cases of a more general method described by Baddeley (1995). However, none of these methods is applicable for the model described in Section 2, because of hidden variables and scarce data. Instead we must rely on Markov chain Monte Carlo methods, like Monte Carlo maximum likelihood and simulated tempering, which will be discussed in Section 3. With simulated tempering, a fully Bayesian estimation of the parameters can be done. However, in our case, simulated tempering is very time-consuming. Therefore, we propose a faster approximate method for fully Bayesian estimation. The method works well because the data are scarce.
The approximate method is presented in Section 4 and tested with simulated and real data in Section 5.

2 The Model

We present a two-dimensional model for cross sections of the reservoir. A three-dimensional model is needed in practice, but the model can be generalized to 3D. Each object is modeled as a rectangle with center position C = (C¹, C²), length and height S = (L, H), and deformations of the top and bottom represented by a pair of random functions Z(·) = (Z_B(·), Z_T(·)). We write U = (C, S, Z(·)), see Figure 2.

Figure 2: Model for a single object.

The height of the rectangle equals the expected height of the object, while the length of the object equals the length of the rectangle. The deformations Z(·) on top and bottom are Gaussian random fields introduced in order to fulfill the conditioning; they are not in focus with respect to the parameter estimation problem discussed here.

The prior pdf for N objects in a domain is defined in the following way: We let θ_i = (c¹_i, c²_i, l_i, h_i) denote the rectangle associated with object number i, and z_i(·) its top and bottom positions. The joint density for the N objects can be written

f(θ, n, z(·)) = f(θ, n) f(z(·)|θ, n),  (1)

where θ = (θ₁, …, θ_n) and z(·) = (z₁(·), …, z_n(·)). The last term, f(z(·)|θ, n), is the density of the Gaussian random fields, see Syversveen and Omre (1997), and will not be further discussed in this article. The first term, f(θ, n), is a marked point field with pairwise interaction (Stoyan, Kendall, and Mecke 1995) of the form

f_{a,r}(θ, n) = (1/c(a, r)) exp{ Σ_{i=1}^{n} [b₁(c¹_i, l_i) + b₂(c²_i, h_i)] + Σ_{i<j} d_r(c_i, c_j) + na }.  (2)

Here c(a, r) is an unknown normalizing function of the parameters a and r, the term b₁(c¹_i, l_i) + b₂(c²_i, h_i) represents the marginal density for a single object if no interaction were present, and d_r(c_i, c_j) is a pairwise interaction function between center-points of objects, defined as

d_r(x_i, x_j) = { ln(γ)  if |x_i − x_j| < r;  0  if |x_i − x_j| > r, }  (3)

where γ is a fixed interaction constant. It might have been more natural to relate the interaction also to the size of the objects, not only the center-points, and this can be done. However, we believe that interaction between center-points models the interaction sufficiently well.

The parameter a in (2) is related to the intensity of objects. To be more specific, a is equal to ln(λ|D|), where λ is the intensity per unit area and |D| is the area of the region of interest D. The main objective of this paper is to estimate the intensity parameter a and the interaction parameter r.

The available observations are the dimensions of the rectangular reservoir and exact observations of the reservoir along the transects of a number of wells. It is known which observations belong to the same object. Observations of the depth to the top and bottom positions of objects are contained in the vector o. Neither θ nor n can be observed directly. The variable θ is not observable because of the way it is defined in the model: center-points and expected heights of objects are never observed, even if the whole reservoir is observed. The number of objects n is not observable since only well observations are available, not the whole reservoir. The missing data and the fact that θ is a hidden variable make the parameter estimation problem difficult. Taking the observations into account, a posterior pdf for Z, Θ, and N can be defined; see Syversveen and Omre (1997) for a definition of the likelihood function. The posterior pdf for Θ, N is written as

f_{a,r}(θ, n|o) = f(θ, n) f(o|θ, n)/c′₀(a, r)

 = exp{ Σ_{i=1}^{n} [b₁(c¹_i, l_i) + b₂(c²_i, h_i)] + Σ_{i<j} d_r(c_i, c_j) + na } f(o|θ, n)/c″₀(a, r)

 = exp{ Σ_{i=1}^{n} [b₁(c¹_i, l_i|o) + b₂(c²_i, h_i|o)] + Σ_{i<j} d_r(c_i, c_j) + na } I(o|n)/c₀(a, r)

 = h_{a,r}(θ, n|o)/c₀(a, r).  (4)

The function I(o|n) is an indicator function which equals one when the conditioning is fulfilled and zero otherwise, and c′₀(a, r), c″₀(a, r), and c₀(a, r) are normalizing functions. When a and r are given constants, we use the Metropolis-Hastings algorithm (Metropolis et al. 1953; Hastings 1970) to simulate from the distribution defined by the pdf (4). Proposals and acceptance rates are described in detail in Syversveen and Omre (1997).
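To make the simulation step concrete, the following sketch shows a simplified birth-death Metropolis-Hastings sampler for the unconditional model (2), keeping only the intensity and interaction terms; the marginal b-terms, the marks, and the well conditioning are omitted. The function names, the fixed interaction strength `log_gamma`, and the domain dimensions are hypothetical illustration choices, not the implementation used here. The acceptance ratios follow the standard construction of Geyer and Møller (1994): with the density taken with respect to a unit-rate Poisson process and a = ln(λ|D|) as in the text, the domain area cancels from the acceptance probabilities.

```python
import math
import random

def log_h(points, a, r, log_gamma=-1.0):
    """Unnormalized log-density of model (2), keeping only the
    interaction and intensity terms.  `points` is a list of
    center-point coordinates (x, y); each pair of points closer
    than r contributes ln(gamma)."""
    s = len(points) * a
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if math.hypot(dx, dy) < r:
                s += log_gamma
    return s

def birth_death_step(points, a, r, width=100.0, height=16.0):
    """One birth-death Metropolis-Hastings move; birth and death are
    each proposed with probability 1/2.  With a = ln(lambda*|D|), the
    birth acceptance is exp(dlog_h)/(n+1) and the death acceptance is
    n*exp(dlog_h), so |D| never appears explicitly."""
    n = len(points)
    if random.random() < 0.5:                       # birth: uniform new point
        new = (random.uniform(0, width), random.uniform(0, height))
        log_acc = log_h(points + [new], a, r) - log_h(points, a, r) - math.log(n + 1)
        if random.random() < math.exp(min(0.0, log_acc)):
            points.append(new)
    elif n > 0:                                     # death: remove a uniform point
        k = random.randrange(n)
        proposal = points[:k] + points[k + 1:]
        log_acc = log_h(proposal, a, r) - log_h(points, a, r) + math.log(n)
        if random.random() < math.exp(min(0.0, log_acc)):
            points[:] = proposal
    return points
```

A full sampler for the posterior (4) would add move and mark-update proposals and enforce the indicator I(o|n), as described in Syversveen and Omre (1997).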

3 Monte Carlo methods for parameter estimation

In this section we review two parameter estimation methods based on Markov chain Monte Carlo (MCMC) simulations. With the first method we obtain the maximum likelihood estimator and the maximum a posteriori (MAP) estimator. With the second method we are able to sample from the posterior distribution for the parameters, from which estimators such as the posterior mean can be obtained.

The basic idea behind MCMC simulation is to simulate an ergodic Markov chain having the desired distribution π(x) as its limiting distribution. The transition kernels can be defined in several ways, leading to different algorithms. Basic references in the field are Metropolis et al. (1953), Hastings (1970), and Geman and Geman (1984). Green (1995) describes how to do MCMC when the state space is a union of spaces of different dimensionality, and covers MCMC for spatial point processes as discussed by Geyer and Møller (1994). Good review papers on MCMC are Besag et al. (1995) and Tierney (1994).

3.1 Monte Carlo Maximum Likelihood (MCML)

Monte Carlo maximum likelihood (MCML) is described by Geyer and Thompson (1992) and Geyer (1994b). The method is well suited for families of densities of the form (2), where c(a, r) is the unknown normalizer of the family and h_{a,r}(θ, n) is the unnormalized density. MCMC methods can be used to generate realizations from this kind of distribution, as mentioned in Section 2.

Suppose first that we can observe (θ, n) directly. Define the log-likelihood ratio relative to a fixed reference value (a₀, r₀) as

l(a, r) = ln[ h_{a,r}(θ, n)/h_{a₀,r₀}(θ, n) ] − ln[ c(a, r)/c(a₀, r₀) ].

The last term is equal to ln( E_{a₀,r₀}[ h_{a,r}(Θ, N)/h_{a₀,r₀}(Θ, N) ] ), since

c(a, r)/c(a₀, r₀) = ∫ [h_{a,r}(θ, n)/h_{a₀,r₀}(θ, n)] f_{a₀,r₀}(θ, n) d(θ, n) = E_{a₀,r₀}[ h_{a,r}(Θ, N)/h_{a₀,r₀}(Θ, N) ].  (5)

This gives the following Monte Carlo approximation of l(a, r):

l̂_{a₀,r₀}(a, r) = ln[ h_{a,r}(θ, n)/h_{a₀,r₀}(θ, n) ] − ln( (1/m) Σ_{i=1}^{m} h_{a,r}(θ_i, n_i)/h_{a₀,r₀}(θ_i, n_i) ),  (6)

where (θ₁, n₁), (θ₂, n₂), … are m samples from f_{a₀,r₀}(θ, n) generated by MCMC. The Monte Carlo approximation l̂_{a₀,r₀}(a, r) converges almost surely to l(a, r) for any fixed (a, r) as the number of samples m goes to infinity.

The speed of convergence depends on the variability of the terms in the sum in (6). Therefore the approximation (6) to the log-likelihood becomes poorer when ||(a₀, r₀) − (a, r)||

gets large, and in practice the (a, r) value maximizing the log-likelihood is determined by iteration. The log-likelihood is calculated in a small area around (a₀, r₀), and the (a, r) value maximizing the log-likelihood is chosen as (a₀, r₀) in the next step.

However, when we have hidden variables as described in Section 2, the first term in (6) cannot be computed, because (θ, n) is not fully observed. We only observe o, which is a part of (θ, n). Instead this term must be estimated by simulations from the posterior distribution (4). The log-likelihood is now

l(a, r) = ln( E_{a₀,r₀}[ h_{a,r}(Θ, N|o)/h_{a₀,r₀}(Θ, N|o) | o ] ) − ln( E_{a₀,r₀}[ h_{a,r}(Θ, N)/h_{a₀,r₀}(Θ, N) ] ),  (7)

where the first expectation is obtained by integrating over the unobserved part of (θ, n) and the second expectation follows from (5). The Monte Carlo approximation becomes

l̂_{a₀,r₀}(a, r) = ln( (1/m) Σ_{i=1}^{m} h_{a,r}(θ_i, n_i|o)/h_{a₀,r₀}(θ_i, n_i|o) ) − ln( (1/m) Σ_{i=1}^{m} h_{a,r}(θ_i, n_i)/h_{a₀,r₀}(θ_i, n_i) ).  (8)

The first term contains samples of (Θ, N) from the posterior distribution (4), and the second term contains samples from the prior distribution (2), both with parameter values (a₀, r₀).
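The two terms in (8) are logarithms of averages of density ratios, which can easily overflow when the ratios vary strongly. A minimal sketch of a numerically stable way to compute them, using the log-sum-exp trick (the function names are ours, not from the text):

```python
import numpy as np

def log_mean_exp(d):
    """Compute log( mean( exp(d) ) ) stably by subtracting the maximum
    before exponentiating."""
    d = np.asarray(d, dtype=float)
    m = d.max()
    return m + np.log(np.mean(np.exp(d - m)))

def mcml_loglik(delta_post, delta_prior):
    """Monte Carlo approximation (8) of the log-likelihood ratio.

    delta_post  : values of log h_{a,r} - log h_{a0,r0} at posterior samples
    delta_prior : the same differences evaluated at prior samples"""
    return log_mean_exp(delta_post) - log_mean_exp(delta_prior)
```

Adding the term ln f(a, r) − ln f(a₀, r₀) to the returned value gives the MAP criterion of Subsection 3.2.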

3.2 The MAP estimator

If the parameters (a, r) are given a prior distribution f(a, r), the MCML method can be extended to give the maximum a posteriori (MAP) estimator. Instead of maximizing the likelihood, we maximize the posterior density ratio f(a, r|o)/f(a₀, r₀|o) of (a, r) and (a₀, r₀). Using the fact that f(a, r|o) ∝ L(a, r) f(a, r), where L is the likelihood function, together with (8), the expression to be maximized with respect to (a, r) is now

l̂_{a₀,r₀}(a, r) = ln( (1/m) Σ_{i=1}^{m} h_{a,r}(θ_i, n_i|o)/h_{a₀,r₀}(θ_i, n_i|o) ) − ln( (1/m) Σ_{i=1}^{m} h_{a,r}(θ_i, n_i)/h_{a₀,r₀}(θ_i, n_i) ) + ln( f(a, r)/f(a₀, r₀) )  (9)

in the case with hidden variables. Note that f(a₀, r₀) could be omitted in the last term; it has been included as a normalizing term in order to get l̂_{a₀,r₀}(a₀, r₀) = 0, as in the MCML case.

3.3 Simulated tempering for parameter estimation

With the MCML method we are in practice not able to determine the whole likelihood function or posterior pdf; only the maximum is obtained. In the following we describe a method to determine the whole posterior distribution for the parameters.

The goal is to sample from the joint posterior distribution of (θ, n) and the parameters (a, r). The posterior distribution for (a, r) is then found by integrating out (θ, n). The joint posterior probability density function for (θ, n) and (a, r) is

f(θ, n, a, r|o) ∝ f_{a,r}(θ, n|o) f(a, r) = h_{a,r}(θ, n|o) f(a, r)/c₀(a, r),

using (4). Since c₀(a, r) is unknown, it must be estimated. This can be done by reverse logistic regression, see Geyer (1994a). The idea is to sample from several distributions with different values of (a, r), and treat these as samples from a so-called mixture distribution. For the moment we simplify notation and set (θ, n) = x, writing h_{a,r}(x) for h_{a,r}(θ, n|o). The estimation procedure goes as follows:

1. For K different parameter values (a, r)₁, …, (a, r)_K, generate N samples from each of the corresponding distributions using MCMC, pooled as

x¹, …, x^N, x^{N+1}, …, x^{2N}, …, x^{KN},

where x^{(k−1)N+1}, …, x^{kN} are drawn from the distribution with unnormalized density h_{(a,r)_k}.

2. Compute the estimator ĉ = (ĉ_{(a,r)₁}, …, ĉ_{(a,r)_K}) for the normalizing constants by reverse logistic regression, solving the equations

Σ_{i=1}^{KN} p_k(x^i, ĉ) = N,  k = 1, …, K,

where

p_k(x, c) = [h_{(a,r)_k}(x)/c_k] / [Σ_{l=1}^{K} h_{(a,r)_l}(x)/c_l].

3. Construct the unnormalized mixture density

h₀(x) = Σ_{k=1}^{K} h_{(a,r)_k}(x)/ĉ_{(a,r)_k}.

4. Treat x¹, …, x^{KN} as realizations from h₀(x), and estimate c₀(a, r) as

ĉ₀(a, r) = (1/N) Σ_{i=1}^{KN} h_{a,r}(x^i)/h₀(x^i).  (10)

The sampling from f(θ, n, a, r|o) is now done by MCMC with two types of transitions: changing (θ, n) or changing (a, r). When (a, r) is changed, the acceptance rate contains the ratio c₀((a, r)_new)/c₀((a, r)_old), which is calculated from equation (10). In practice, c₀(a, r) must be precomputed on a grid, since the sampling procedure would otherwise be too time-consuming. The concept of sampling from distributions with different parameter values is known as simulated tempering, see Geyer and Thompson (1995). Higdon et al. (1995) and Weir (1997) describe fully Bayesian analysis along these guidelines for images constructed from emission computed tomography data.
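The estimating equations in step 2 have a fixed-point character: at a solution, Σ_i p_k(x^i, ĉ) = N is equivalent to ĉ_k = (1/N) Σ_i h_{(a,r)_k}(x^i)/h₀(x^i). The sketch below iterates this on log scale. It is our illustrative reading of Geyer's (1994a) reverse logistic regression, not code from this work; the constants are identified only up to a common factor, so the first one is pinned to zero.

```python
import numpy as np

def logsumexp(a, axis):
    """Stable log(sum(exp(a))) along the given axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True)),
                      axis=axis)

def estimate_normalizers(log_h, n_iter=200):
    """Estimate log normalizing constants by a fixed-point iteration on
    the reverse logistic regression equations.

    log_h : (K, KN) array; log_h[k, i] = log h_{(a,r)_k}(x^i), where the
            KN pooled samples contain N draws from each of the K
            distributions.  Returns K log-normalizers with the first
            fixed to 0 (they are identified only up to a common factor)."""
    K, KN = log_h.shape
    N = KN // K
    log_c = np.zeros(K)
    for _ in range(n_iter):
        # log h0(x^i) = log sum_k exp( log h_k(x^i) - log c_k )
        log_h0 = logsumexp(log_h - log_c[:, None], axis=0)
        # fixed point: c_k = (1/N) sum_i h_k(x^i) / h0(x^i)
        log_c = logsumexp(log_h - log_h0[None, :], axis=1) - np.log(N)
        log_c = log_c - log_c[0]
    return log_c
```

In the same spirit, equation (10) is then a mixture importance-sampling average of h_{a,r}(x^i)/h₀(x^i) over the pooled samples.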

3.4 Discussion

We will now discuss how the methods presented earlier in this section apply to the model in Section 2. The MCML method may locate several local maxima, in which case it is difficult to determine which one is the global maximum. The MAP estimator, on the other hand, will give only one maximum if the prior distribution is strong enough. This means that a geologist must provide sufficient information for a reasonable prior distribution for the parameters to be decided.

The best alternative, however, is to use fully Bayesian estimation, as described in Subsection 3.3, to determine the whole posterior distribution of the parameters. Unfortunately, this is computationally expensive for the model considered here, because of hidden variables, few observations, and the fact that the parameter space is two-dimensional. Instead, we use the MAP estimator and the MCML method to develop a fast method for deriving an approximation to the posterior distribution for the parameters. This is described in the next section.

4 Approximate fully Bayesian estimation

As mentioned above, a very limited amount of data is available in the reservoir characterization problem. In order to estimate parameters, prior knowledge about the parameters is needed. In Subsection 3.2, we described how the posterior density ratio f(a, r|o)/f(a₀, r₀|o) can be determined in a small area around the fixed parameter values (a₀, r₀). If (a₀, r₀) is the mode of the distribution, that is, the MAP estimate, the posterior density estimated locally can be used to find an approximate analytical expression for the global posterior density. If we assume that the posterior distribution belongs to a given family of distributions, the parameters of that family can be estimated from local characteristics around the mode. For example, if we have only one parameter and the posterior distribution is assumed to belong to the one-dimensional Gaussian family, the expectation is given by the mode. The variance is determined from the second derivative of the logarithm of the posterior density, using the fact that (d²/dθ²) ln f(θ) = −1/σ² for a Gaussian density.

It is natural to assume that the likelihood belongs to a conjugate family of the prior distribution. Hence the posterior distribution belongs to the same family as the prior distribution, but with different parameters. The assumption about conjugate families holds as long as the data are weak; the effect of the data is then to adjust the parameters of the prior distribution. Compared to the MAP estimate alone, we gain information about the uncertainty in the estimate. We are also able to obtain other estimators, such as the posterior mean. The computational cost is about the same as for the MAP estimate.
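In the one-parameter Gaussian case the recipe is simply: mean = mode, and variance = −1 divided by the second derivative of the log posterior at the mode. A small sketch, where the function name and the step size h are our own choices:

```python
def gaussian_from_mode(log_post, mode, h=1e-3):
    """Fit a 1-D Gaussian N(mu, sigma^2) to a posterior density known
    only up to a constant, using local information at the mode:
    mu = mode, sigma^2 = -1 / (d^2/dtheta^2) log f(theta) at the mode,
    with the second derivative estimated by a central finite difference."""
    d2 = (log_post(mode - h) - 2.0 * log_post(mode) + log_post(mode + h)) / h**2
    return mode, -1.0 / d2

# Example: an unnormalized N(2, 0.5^2) log-density recovers its parameters.
mu, var = gaussian_from_mode(lambda t: -((t - 2.0) ** 2) / (2 * 0.25) + 7.3, 2.0)
```

For a quadratic log density the finite difference is exact, which is precisely the Gaussian assumption used in this section.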

5 Examples

We will now discuss how to do approximate fully Bayesian estimation in the reservoir characterization model described in Section 2. The parameters to be estimated are the intensity parameter a and the interaction parameter r, which are given a joint Gaussian prior distribution. A reasonable procedure for choosing the prior mean is to first decide the r-value according to prior beliefs about the interaction, and then find an appropriate a-value such that prior assumptions about the number of objects are correct. The a-value corresponding to the chosen r-value is obtained by simulations from the prior model. The covariance matrix should reflect the uncertainty in our prior guess.

Since the b-terms in h_{a,r} do not depend on (a, r), they cancel in the ratios in (9), and the simulated logarithm of the posterior ratio can be written as

l̂_{a₀,r₀}(a, r) = ln( (1/m) Σ_{k=1}^{m} exp{ Σ_{i<j} d_r(cᵏ_i, cᵏ_j) + nᵏa } / exp{ Σ_{i<j} d_{r₀}(cᵏ_i, cᵏ_j) + nᵏa₀ } )
 − ln( (1/m) Σ_{k=1}^{m} exp{ Σ_{i<j} d_r(c̃ᵏ_i, c̃ᵏ_j) + ñᵏa } / exp{ Σ_{i<j} d_{r₀}(c̃ᵏ_i, c̃ᵏ_j) + ñᵏa₀ } ) + ln( f(a, r)/f(a₀, r₀) ),  (11)

where the first sum runs over samples (cᵏ, nᵏ) from the posterior (4) and the second over samples (c̃ᵏ, ñᵏ) from the prior (2), both generated with parameter values (a₀, r₀).

Under the assumption that the posterior for (a, r) is Gaussian with mean μ and covariance matrix

Σ = ( σ₁²     ρσ₁σ₂ )
    ( ρσ₁σ₂   σ₂²   ),

the second derivatives of the log posterior density are

∂² log f(a, r|o)/∂a² = −1/(σ₁²(1 − ρ²)),  (12)

∂² log f(a, r|o)/∂r² = −1/(σ₂²(1 − ρ²)),  (13)

and

∂² log f(a, r|o)/∂a∂r = ρ/(σ₁σ₂(1 − ρ²)).  (14)

Numerical values for the second derivatives are calculated from the simulations using standard finite difference approximations:

∂²l̂_{a₀,r₀}/∂a² = (1/h₁²)[l̂_{a₀,r₀}(a − h₁, r) − 2 l̂_{a₀,r₀}(a, r) + l̂_{a₀,r₀}(a + h₁, r)] + O(h₁²),  (15)

∂²l̂_{a₀,r₀}/∂r² = (1/h₂²)[l̂_{a₀,r₀}(a, r − h₂) − 2 l̂_{a₀,r₀}(a, r) + l̂_{a₀,r₀}(a, r + h₂)] + O(h₂²),  (16)

∂²l̂_{a₀,r₀}/∂a∂r = (1/(4h₁h₂))[l̂_{a₀,r₀}(a + h₁, r + h₂) + l̂_{a₀,r₀}(a − h₁, r − h₂) − l̂_{a₀,r₀}(a + h₁, r − h₂) − l̂_{a₀,r₀}(a − h₁, r + h₂)] + O(h₁²) + O(h₂²),  (17)

where the error terms are functions of fourth-order derivatives. Under the assumption that the posterior is Gaussian, the error terms are all zero, since fourth-order derivatives of the log density are zero. Setting the analytical expressions (12)-(14) for the second derivatives of the posterior pdf equal to the empirical estimates based on equations (15)-(17), respectively, an estimate for Σ is obtained. If this procedure is repeated several times, an estimate of the uncertainty in the estimate for Σ is also obtained. This uncertainty is caused by Monte Carlo error.

The rate of convergence for MCML depends on the variability of the terms in the first two sums in (11); the more variability, the slower the convergence. We illustrate the variability in the second sum in (11) by simulations from the prior model with different values of a₀ and r₀. We first use a₀ = 4.0 and r₀ = 30.0 and plot a histogram of the terms in the second sum in (11) from 10000 samples with a = 4.1 and r = 29.0, see the left part of Figure 3. In the right part we have used a₀ = 4.5, r₀ = 10.0, a = 4.6 and r = 9.0. All other model parameters are the same for the two cases. The difference in variability can be explained by the different variation in the number of objects, which causes different variability both in the interaction term and the intensity term. In the simulations in the left part of Figure 3 the number of objects varies between 2 and 12, while in the right part it varies between 15 and 38.

Expression (11) is only calculated for values of a and r on a grid spaced by h₁ and h₂, to avoid storing a lot of data. Since the posterior ratio is reliable only in a small area around (a₀, r₀), we iterate until a₀ and r₀ maximize (11). Then we repeat the procedure with these values of a₀ and r₀ in order to estimate the covariance matrix with its uncertainty.
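Under the Gaussian assumption, matching (12)-(14) to the finite differences (15)-(17) amounts to inverting the negated finite-difference Hessian of the log posterior ratio at the mode, since for a Gaussian log density the Hessian equals −Σ⁻¹. A sketch, where the function name is ours and `l_hat` stands for expression (11):

```python
import numpy as np

def covariance_from_mode(l_hat, a0, r0, h1=0.1, h2=1.0):
    """Estimate the posterior covariance matrix for (a, r) under the
    Gaussian assumption, by evaluating the simulated log posterior
    ratio on a small stencil around the mode (a0, r0) as in (15)-(17)
    and inverting the negated finite-difference Hessian."""
    daa = (l_hat(a0 - h1, r0) - 2 * l_hat(a0, r0) + l_hat(a0 + h1, r0)) / h1**2
    drr = (l_hat(a0, r0 - h2) - 2 * l_hat(a0, r0) + l_hat(a0, r0 + h2)) / h2**2
    dar = (l_hat(a0 + h1, r0 + h2) + l_hat(a0 - h1, r0 - h2)
           - l_hat(a0 + h1, r0 - h2) - l_hat(a0 - h1, r0 + h2)) / (4 * h1 * h2)
    hessian = np.array([[daa, dar], [dar, drr]])
    # For a Gaussian log density, Hessian = -inv(Sigma).
    return np.linalg.inv(-hessian)
```

Repeating the call on independently simulated versions of `l_hat` gives the Monte Carlo uncertainty in the estimated covariance matrix, as described above.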
The goodness of the approximation to the posterior distribution can be checked by comparing posterior density ratios f(a, r|o)/f(a₀, r₀|o) from the (approximate) analytical expression with the results from simulations, given by the exponential of expression (11). The comparison should be done for different parameter values a, r, a₀, and r₀.

Figure 3: Histograms for terms in the second sum in equation (11).

5.1 Example with simulated data

In our first example, we analyze a simulated dataset where the parameters a and r are drawn from their prior and the configuration of objects is then simulated from the model, producing our "true" petroleum reservoir. The prior mean is [4.0 30.0]^T and the prior covariance matrix is

( 1.0    5.0  )
( 5.0  100.0  ).

It is reasonable to introduce positive correlation between a and r, because the intensity of objects is influenced by the interaction parameter. Figure 4 shows the realization to the left and the available observations to the right. We drill four wells at horizontal positions 20, 40, 60, and 80 and observe the top and bottom of the objects in the wells. We use the discretization parameters h₁ = 0.1 and h₂ = 1.0 and collect 10000 samples, chosen as every 10th iteration after 1000 initial stabilizing iterations. This is repeated 10 times, all runs resulting in the same mode, and based on this, the posterior covariance matrix with uncertainties is calculated. The posterior distribution for a and r is approximately Gaussian, with mean μ = [4.1 30.0]^T and covariance

Σ = ( 0.62 ± 0.01   4.23 ± 0.09 )
    ( 4.23 ± 0.09   66.4 ± 2.1  ),

with the uncertainty given as one standard deviation. We see that the mean is almost the same as in the prior distribution, but the data have modified the covariance matrix. The variances of a and r have decreased relative to the prior distribution; that is, the observations have reduced the uncertainty about the parameter values. The goodness of the approximation to the posterior distribution is checked by computing the posterior density ratio for values of a and r close to a₀ and r₀, from simulations and from

Figure 4: To the left, realization from the prior model, regarded as the "true" reservoir. To the right, observations of top and bottom positions of objects collected from wells. The same symbol indicates the same object.

             Simulations                       Diff. estimated density − simulations
r\a     3.9    4.0    4.1    4.2    4.3       3.9     4.0     4.1     4.2     4.3
28.0   0.978  0.979  0.962  0.925  0.872     0.015   0.009  -0.002   0.006   0.014
29.0   0.985  0.999  0.992  0.966  0.923    -0.013  -0.008  -0.002   0.004   0.010
30.0   0.973  0.996  1.000  0.985  0.951    -0.011  -0.006   0.000   0.006   0.011
31.0   0.942  0.974  0.988  0.982  0.959    -0.010  -0.004   0.002   0.008   0.013
32.0   0.897  0.936  0.957  0.961  0.947    -0.011  -0.005   0.003   0.010   0.016

Table 1: Posterior ratio for the example with simulated data, with a₀ = 4.1, r₀ = 30.0.

the estimated density. The simulated values are based on 10000 samples. We choose two different combinations of a₀ and r₀: one at the mean of the posterior distribution, and one some distance away from the mean in both directions, see Tables 1 and 2. When a₀ and r₀ are chosen equal to the mean values (Table 1), there are very small differences between the posterior ratios obtained by simulation and from the estimated posterior density. Also when a₀ = 2.8 and r₀ = 18.0 (Table 2) the differences are quite small, and the approximation to the posterior distribution seems reasonable in this area as well.

We will now discuss how the choice of prior mean influences the posterior mean. As mentioned before, the prior mean should be chosen such that the expected number of objects under the prior distribution reflects the prior beliefs. Many combinations of a and r fulfill this. For each fixed value of r, we can find a unique maximum likelihood estimate of a. Some corresponding values of a and r are plotted in Figure 5. All these combinations of a and r give about the same number of objects, although there are small differences. If we consider the likelihood for both a and r, the likelihood is quite flat along

             Simulations                       Diff. estimated density − simulations
r\a     2.6    2.7    2.8    2.9    3.0       2.6     2.7     2.8     2.9     3.0
16.0   0.617  0.704  0.777  0.830  0.857    -0.021  -0.021  -0.009   0.018   0.061
17.0   0.681  0.793  0.894  0.977  1.034    -0.008  -0.013  -0.009   0.011   0.046
18.0   0.733  0.869  1.000  1.116  1.208     0.011   0.002   0.000   0.011   0.037
19.0   0.772  0.931  1.090  1.240  1.369     0.034   0.023   0.016   0.019   0.036
20.0   0.797  0.975  1.160  1.342  1.510     0.059   0.048   0.039   0.037   0.045

Table 2: Posterior ratio for the example with simulated data, with a₀ = 2.8, r₀ = 18.0.


Figure 5: Plot of maximum likelihood estimates of a for different fixed r values.

a curve fitted to these points, and the likelihood takes its maximum close to the curve. If the prior mean for a and r is chosen close to the area where the likelihood is largest, the posterior mean will be very close to the prior mean. On the other hand, if the prior mean is far away from this area, the data will correct the posterior mean in the direction where the likelihood is larger. The following examples illustrate this.

First we choose the prior mean equal to [4.5 25.0]^T and the prior covariance matrix as before. This is a rather "bad" prior, since the number of objects will be too high (the mean number of objects when simulating from the prior is about 10), and the prior mean is not close to the area of highest likelihood. The posterior mean becomes μ = [4.25 28.5]^T, which is closer to the area of highest likelihood. Then we choose the prior mean equal to [4.5 35.0]^T, which is a more reasonable prior with respect to the number of objects, with the same prior covariance matrix as before. The prior mean is now in the area where the likelihood is largest, and the posterior mean becomes equal to the prior mean. The data are not in disagreement with the chosen prior distribution, and therefore the posterior mean equals the prior mean.

5.2 Real data example

We now look at the outcrop shown in Figure 1. The outcrop is taken from the San Juan basin, Rocky Mountains, and shows a cross section of meandering channel sandstone embedded in lower to upper delta plain mudstone, siltstone, and coal. Hence, the objects are sand and the background is nonpermeable shale. The size of the formation is rescaled and taken to be 100 x 16 units, where one unit is about 11 meters. Three wells are assumed to be located at 20, 48, and 58 units, and from these wells we collect observations. In this example we assume that the interaction length is quite short compared to the size of the reservoir, so we choose the prior mean for r to be 20.0. The number of objects will be about 30, and this corresponds to an a value of about 9.0, so we choose the prior mean for a to be 9.0. The covariance matrix is set to

We still use the discretization parameters h₁ = 0.1 and h₂ = 1.0. This time we have to generate many more samples than in the first example in order to obtain convergence. The reason is that the number of objects is larger, which gives higher variability in the terms of the first two sums in equation (11) when a and r vary. 150000 samples were enough to get a stable estimate of the mean. Because the sampling was very time-consuming, the uncertainty in the covariance matrix was not computed. The posterior mean is estimated to [8.8 24.0]^T, and the covariance matrix is

Σ = ( 1.06   2.6   )
    ( 2.6    16.65 ).

The posterior mean is not far from the prior mean, so the data do not disagree strongly with the prior distribution. We see that the variance of a is almost unchanged compared to the prior distribution, while the variance of r is reduced. The correlation has increased from 0.5 to 0.62. We have also compared the posterior density ratio obtained from simulations with that from the approximate analytical expression. The difference is small, which supports our approximation to the parametric family of the posterior pdf.

6 Conclusion

A fast, approximate method is suggested for fully Bayesian analysis of parameters in a situation where the data contain little information and there are hidden variables.

In addition to point estimates for the parameters, the method gives us an estimate of the uncertainty in the point estimates. Comparison of the prior and posterior distributions for the parameters can tell us how much information the data give about the parameters.

7 Acknowledgements

I thank Håvard Rue, Håkon Tjelmeland, and Henning Omre for helpful comments, and the Safari project for permission to use data from the Menefee formation. The research is funded by a PhD grant from the Research Council of Norway.

References

Baddeley, A. (1995). Time-invariance estimating equations. Technical Report 22, Department of Mathematics, University of Western Australia.

Besag, J. (1978). Some methods of statistical analysis for spatial data. Bull. Int. Statist. Inst. 47(2), 77-92.

Besag, J., P. Green, D. Higdon, and K. Mengersen (1995). Bayesian computation and stochastic systems (with discussion). Statistical Science 10(1), 3-66.

Diggle, P., T. Fiksel, P. Grabarnik, Y. Ogata, D. Stoyan, and M. Tanemura (1994). On parameter estimation for pairwise interaction point processes. Int. Statist. Rev. 62(1), 99-117.

Fiksel, T. (1988). Estimation of interaction potentials of Gibbsian point processes. Statistics 19, 77-86.

Geman, S. and D. Geman (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intell. 6(6), 721-741.

Geyer, C. (1994a). Estimating normalizing constants and reweighting mixtures in Markov chain Monte Carlo. Technical report, School of Statistics, University of Minnesota.

Geyer, C. (1994b). On the convergence of Monte Carlo maximum likelihood calculations. J. R. Stat. Soc. B 56(1), 261-274.

Geyer, C. and J. Møller (1994). Simulation procedures and likelihood inference for spatial point processes. Scand. J. of Stat. 21(4), 359-373.

Geyer, C. and E. Thompson (1992). Constrained Monte Carlo maximum likelihood for dependent data. J. R. Stat. Soc. B 54(3), 657-699.

Geyer, C. and E. Thompson (1995). Annealing Markov chain Monte Carlo with applications to ancestral inference. Journal of the American Statistical Association 90(431), 909-920.

Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82(4), 711-732.

Haldorsen, H. and L. Lake (1984). A new approach to shale management in field-scale models. Society of Petroleum Engineers Journal, 447-457.

Hastings, W. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97-109.

Higdon, D., V. Johnson, T. Turkington, J. Bowsher, D. Gilland, and R. Jaszczak (1995). Fully Bayesian estimation of Gibbs hyperparameters for emission computed tomography data. Technical report, ISDS, Duke University. To appear in IEEE-TMI.

Metropolis, N., A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller (1953). Equation of state calculation by fast computing machines. Journal of Chemical Physics 21, 1087-1092.

Norsk Hydro, Statoil, Saga Petroleum, The Norwegian Petroleum Directorate (1995). Safari: Sedimentary Architecture of Field Analogues for Reservoir Information.

Ogata, Y. and M. Tanemura (1981). Estimation of interaction potentials of spatial point processes through the maximum likelihood procedure. Ann. Inst. Statist. Math. 33B, 315-338.

Ogata, Y. and M. Tanemura (1984). Likelihood analysis of spatial point patterns. J. R. Stat. Soc. B 46, 496-518.

Stoyan, D., W. Kendall, and J. Mecke (1995). Stochastic Geometry and Its Applications (2nd ed.). Wiley.

Syversveen, A. and H. Omre (1997). Conditioning of marked point processes within a Bayesian framework. Scand. J. of Stat. 24(3), 341-352.

Takacs, R. (1983). Estimator for the pair-potential of a Gibbsian point process. Technical report, Institut für Mathematik, Johannes Kepler Universität Linz, Austria.

Takacs, R. (1986). Estimator for the pair-potential of a Gibbsian point process. Statistics 17, 429-433.

Tierney, L. (1994). Markov chains for exploring posterior distributions (with discussion). The Annals of Statistics 22(4), 1701-1762.

Weir, I. S. (1997). Fully Bayesian reconstruction from single photon emission computed tomography data. Journal of the American Statistical Association 92(437), 49-60.

VI Bayesian Object Recognition with Baddeley's Delta Loss

Håvard Rue* and Anne Randi Syversveen
Department of Mathematical Sciences, NTNU, Norway

First version: October 1995
Revised: June 1996
Revised: October 1996
Minor revision: January 1997

*Address for correspondence: Department of Mathematical Sciences, Norwegian University of Science and Technology, N-7034 Trondheim, Norway. E-mail: [email protected] and [email protected]

Abstract

A common problem in Bayesian object recognition using marked point process models is to produce a point estimate of the true underlying object configuration: the number of objects and the size, location and shape of each object. We use decision theory and the concept of loss functions to design a more reasonable estimator for this purpose than the common zero-one loss corresponding to the maximum a posteriori estimator. We propose to use the squared Δ-metric of Baddeley (1992) as our loss function and demonstrate that the corresponding optimal Bayesian estimator can be well approximated by combining Markov chain Monte Carlo methods with simulated annealing into a two-step algorithm. The proposed loss function is tested using a marked point process model developed for locating cells in confocal microscopy images.

Bayesian inference; Unsymmetric loss functions; Object recognition; Template models; Markov chain Monte Carlo methods; Marked point processes; Distance between images; Confocal microscopy images

AMS 1991 Subject Classification: Primary 62M30; Secondary 60G35.

1 Introduction

Bayesian object recognition is the problem of estimating the number of objects and their locations in a non-ideal environment. The degree of difficulty varies with the degree of variation in size and shape of the objects, the number of object types present, and whether or not the objects are allowed to overlap. One relatively easy example is to locate nearly circular cells on a slowly varying background, as in the confocal microscopy images we study later. A more complex example is the estimation of the location, orientation and shape of mitochondria and membranes in electron micrographs of cardiac muscle cells, studied by Grenander and Miller (1994). There the objects have large variability in shape, and the images contain several types of objects. Industrial and practical applications can be found within document reading and robot vision.

Object recognition belongs to the field of high-level imaging, in which the image modelling is on a more global scale and directly connected to the object types in mind, as opposed to low-level imaging, which deals with (smoothing) prior models on a pixel level. Recently, there has been a growing interest in this field, especially along the guidelines of Grenander's general pattern theory using deformable templates. The original study is available in Grenander et al. (1991). The recent discussion paper by Grenander and Miller (1994) represents the state of the art of this approach and points out future directions for the practical use of general pattern theory, now collected in a monograph (Grenander, 1993).

A template can be thought of as a continuous model for a single object; typically a polygonal silhouette defining the contour of the object. The prior model focuses on how the object can be deformed from an ideal or typical outline, by affecting the length and orientation of the boundary segments in the case of a polygon. The observed data are most often an image with discrete pixel values available on a lattice. Even for perfect data, the (continuous) object of interest is only available in a discretized version. Combining the prior model and the likelihood, which contains the information from the data, we obtain the posterior distribution for the object. The posterior distribution is used in any further inferential issues concerning the object within the Bayesian paradigm. Often, this step relies heavily on iterative simulation methods like Markov chain Monte Carlo (MCMC) methods or jump-diffusion simulation (Grenander and Miller, 1994). (Refer to Tierney (1994) or Besag et al. (1995) for a general discussion of MCMC methods and to Geyer and Møller (1994) for applications to point processes.) Examples include a study of the silhouette of a hand (Grenander et al., 1991), a more challenging situation with a grey-level image of a hand (Amit et al., 1991), detection of defects in potatoes (Grenander and Manbeck, 1993), finding spiral structures in images of galaxies (Ripley and Sutherland, 1990), and modelling human faces (Phillips and Smith, 1994).

Lately, template models have been extended to handle the case when the number of objects is not known a priori; that is, the parameter space is a union of subspaces of varying dimension. Two approaches exist. Grenander and Miller (1994) suggest using multiple-graph deformable templates and jump-diffusion simulation. Baddeley and Van Lieshout (1991; 1993) and Baddeley (1994) argue for using marked point processes (mpp), with MCMC or birth-death methods for simulation. The shape of each object is described by a mark, and the location of the object by a point. Markov point processes model interaction between objects, and a deformable template models each object. The two approaches seem to differ in complexity by an order of magnitude. In this paper we will therefore concentrate on the mpp-approach, which seems well suited for our main objective: to obtain a good point estimate for object recognition models by using decision theory and the concept of loss functions.

Why is the concept of loss functions important within Bayesian object recognition? Assume for a moment that we will perform a simulation study for our object recognition model with a known true scene. After discretizing the continuous objects and adding noise to the image, we want to obtain a point estimate for the shape, size, location and number of objects. Suppose that some estimates are available; for example, the maximum a posteriori (MAP) estimate, approximated by the mode of the posterior distribution determined by a stochastic search procedure like Simulated Annealing, and estimates found by applying other variants of steepest ascent and coordinatewise steepest ascent algorithms. We want to quantify the quality of the estimates by comparing them to the known true scene. For this purpose, we use a distance measure or metric d. When the true scene is not known, we can compare the posterior expectation of d for the various estimates. We then use d implicitly as our loss function, since this is how we have chosen to quantify the distance between two scenes. By minimising the posterior expectation of the loss d (the risk), we obtain the optimal Bayes estimator (OBE) with respect to d.

This gives a lower value for the risk than using, for example, the MAP-estimate, which is the OBE with respect to the 0-1 loss function. It should be noted that an important assumption is made: we implicitly assume that we are able to compute the OBE for d. This may not always be possible. An example of this scenario is the simulation examples in Baddeley and Van Lieshout (1991) and Van Lieshout (1994), where various estimates are computed and their quality measured using Baddeley's Δ-metric (Baddeley, 1992). The main contribution in our work is to demonstrate how we can compute (approximately) the OBE with the (squared) Δ-metric as our loss function. In this way, we are able to do more coherent inference in object recognition models within the Bayesian framework.

The experience from low-level imaging is that loss functions more sensible than those leading to the MAP-estimate, the posterior marginal mean and the mode may give more reasonable estimates. See for example the results in Rue (1995) and Frigessi and Rue (1997) for image classification (which seems most promising) and Rue (1997) for grey-scale images. See also Rue (1997) for a discussion of understanding loss functions within low-level imaging. The proposed loss functions have a local structure, for which the OBE can be well approximated by the two-step MCMC and Simulated Annealing algorithm proposed by Rue (1995, 1996).

Baddeley's Δ-metric is of particular interest for object recognition, and has recently been applied as a loss function in image classification (Frigessi and Rue, 1997). The Δ-metric measures the distance between images of (discrete or continuous) objects, and has several appealing theoretical properties. It is a good distance measure between images of objects, and perhaps surprisingly, the computation of the approximate OBE is relatively easy and fast. Even if the inferential part is the main objective in this paper, we postpone the discussion of the Δ-metric and inference to Section 3, because we find it easier to explain the algorithm with one particular mpp-model in mind. We therefore begin in Section 2 by presenting a simple mpp-model for the location of certain cells in confocal microscopy images, before returning to the inference in Section 3. In Section 4 we discuss the MCMC algorithm for i) producing samples from the posterior, with a focus on how to jump efficiently between parameter subspaces by allowing objects to split and fuse, and ii) computing the (approximate) OBE. In Section 5 we present experiments with the proposed mpp-model, and compare the MAP-estimator with the Δ-estimator.

2 The Model

This section describes the marked point process (mpp) model we propose to use for locating cells in confocal microscopy images. The model is motivated by cells or objects similar to those in the 365 x 387 image displayed in Figure 1. The model has some similarity with the mitochondria model of Grenander and Miller (1994), but is simpler both in form and in the interpretation of the parameters. This simplification is possible because the cells are roughly circular. We will first describe the model for one object and then show how to integrate this into an mpp-model to handle an unknown number of objects with different size, shape, and location.

To describe our prior model for one object, we choose the origin as the centre of the object. As a first approximation, the object is assumed to be circular with stochastic radius r > 0, where the density of r is proportional to a Gaussian density with mean μ_r and variance σ_r². To allow for local deviation from the circle we divide the circle into n_z segments as displayed in Figure 2, and introduce z = (z_1, …, z_{n_z}). Each z_i is the deviation from the assumed circle along a line with angle 2π(i − 1)/n_z starting from the centre of the object. We take z to be a Gaussian vector conditionally on r, that is

z | r ~ N(0, σ_z²(r) S).

It is natural to choose S to be a Toeplitz circulant matrix with ones on the diagonal, to reflect invariance and stationarity properties of a cell. The deviation of an object from a circle is slowly varying, with an exponentially decaying correlation function ρ_z(τ) for 0 ≤ τ ≤ 1/2, and ρ_z(1 − τ) for 1/2 < τ ≤ 1, i.e.

(S)_{1j} = ρ_z( min(j − 1, n_z − j + 1)/n_z ),   j = 1, …, n_z.
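As an illustration, the circulant matrix S defined above can be constructed directly from its first row; this is a sketch, where the function name and the exponential choice ρ_z(τ) = exp(−τ/τ_c) with τ_c = 1/4 (used later in Section 5) are ours:

```python
import numpy as np

def correlation_matrix(n_z: int, rho) -> np.ndarray:
    """Toeplitz circulant correlation matrix S with
    (S)_{1j} = rho(min(j-1, n_z-j+1)/n_z), j = 1, ..., n_z."""
    j = np.arange(n_z)
    lag = np.minimum(j, n_z - j) / n_z          # circular lag in [0, 1/2]
    first_row = rho(lag)
    # Circulant: row i is a cyclic shift of the first row.
    idx = (j[None, :] - j[:, None]) % n_z
    return first_row[idx]

rho_z = lambda tau: np.exp(-tau / 0.25)         # exponential decay, tau_c = 1/4
S = correlation_matrix(30, rho_z)
```

Because the circular lag is symmetric, S is symmetric with ones on the diagonal, as required.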

The function ρ_z(τ) will be given later when needed. Let ν(z, r) = ν(z | r) ν(r) be the density of the object. In order to separate the interpretation of r and z, we choose

(2.1)   σ_z(r) = σ_z r,

such that the radius r is a hyperparameter that controls the overall size and shape, and z models the local variability. This choice ensures that a scaling of an object is invariant in z, since

ν(z, r)/ν(κz, κr) = κ^{n_z} ν(r)/ν(κr),   κ > 0,

only depends on r and κ. It is often reasonable not to allow the objects to cross themselves, or, even more restrictively, to require that r + z_i > 0 for all i. The choice in (2.1) makes such constraints easy to include, as the probability for a (random) configuration to be illegal does not depend on r. The conditional measure ν*(z, r) = ν(z, r | legal configuration) can then easily be found by estimating the probability of an illegal configuration from samples from the model. However, for reasonable values of the parameters the probability of an illegal configuration is small, so we do not pursue this question further in this article.
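A draw from this one-object prior might be sketched as follows, assuming the choice σ_z(r) = σ_z r from (2.1) and the exponential correlation function; all default parameter values are the ones used in Section 5.1:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_object(mu_r=5.0, sigma_r=0.5, sigma_z=0.05, n_z=30, tau_c=0.25):
    """Draw one object: r from a Gaussian restricted to r > 0, then
    z | r ~ N(0, sigma_z(r)^2 S) with sigma_z(r) = sigma_z * r as in (2.1)."""
    r = rng.normal(mu_r, sigma_r)
    while r <= 0:                               # density proportional to a Gaussian, truncated to r > 0
        r = rng.normal(mu_r, sigma_r)
    j = np.arange(n_z)
    idx = (j[None, :] - j[:, None]) % n_z
    S = np.exp(-np.minimum(j, n_z - j) / (n_z * tau_c))[idx]   # circulant correlation
    z = rng.multivariate_normal(np.zeros(n_z), (sigma_z * r) ** 2 * S)
    theta = 2 * np.pi * j / n_z                 # angles 2*pi*(i-1)/n_z
    vertices = (r + z)[:, None] * np.column_stack([np.cos(theta), np.sin(theta)])
    return r, z, vertices

r, z, vertices = sample_object()
```

The returned polygon vertices lie at radius r + z_i along the n_z rays, as in Figure 2.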

The next step is to include the model for one object into our mpp-model; refer to Baddeley and Møller (1989), Baddeley and Van Lieshout (1993) or Van Lieshout (1995) for further details. Each object x has a location s = (s_x, s_y) ∈ S, and size and shape m = (r, z) ∈ M. An object configuration is a finite unordered set x = {x_1, …, x_n} of objects x_i = (s_i, m_i) ∈ U, and we let Ω denote the set of all configurations. For our case, U = S × M, where S is a bounded subset of ℝ² and M = ℝ₊ × ℝ^{n_z}. A marked point process, or object process, is a point process on the class of objects U. The basic reference model is the Poisson object process in U. Let μ = λ × ν be a finite non-atomic measure on U, where λ is the Lebesgue measure on S with λ(S) < ∞ and ν is the probability measure on M with ν(M) = 1. Under the Poisson model, the number of objects has a Poisson distribution with mean μ(U). Conditionally on n objects, each object is independently distributed according to μ(·)/μ(U) in U. The joint probability density of (x_1, …, x_n) and n is defined by the density f relative to the Poisson object process, for instance by the pairwise interaction model

f{x, n) = a(3n ]H[ g{xi, xj). i

We will try to keep the model simple to make the interpretation of the examples in Section 5 with the new loss function as clear as possible. We do not model any interaction between objects apart from the non-overlapping property of the cells in Figure 1. We choose

g(x_i, x_j) = { 0, if objects x_i and x_j overlap,
              { 1, otherwise,
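A minimal sketch of this hard-core pairwise interaction; here each object is crudely approximated by a disc of its base radius (the model itself works with the full polygonal outlines), and the function names are ours:

```python
import numpy as np

def overlaps(obj_a, obj_b) -> bool:
    """Crude overlap test: treat each object (centre, radius) as a disc."""
    (sa, ra), (sb, rb) = obj_a, obj_b
    return np.hypot(sa[0] - sb[0], sa[1] - sb[1]) < ra + rb

def log_prior_interaction(objects) -> float:
    """log of prod_{i<j} g(x_i, x_j): 0 if no pair overlaps, -inf otherwise."""
    n = len(objects)
    for i in range(n):
        for j in range(i):
            if overlaps(objects[i], objects[j]):
                return -np.inf                  # g = 0 for an overlapping pair
    return 0.0                                  # all g factors equal one
```

Returning −∞ on the log scale means any proposed configuration with overlapping objects is rejected outright by an MCMC sampler.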

so our prior model becomes a hard core object process. We will comment on the importance of the interaction between objects in Section 6. To summarise, the prior model for x (and n) is

p(x) = α β^n exp(−λ(S)) ∏_i ν(m_i) ∏_{i>j} g(x_i, x_j).

Our data is an imperfect image of discretized objects, {y_t, t ∈ T}, where T is a lattice and y_t is the pixel value at pixel t, typically binary or byte valued. We assume that the y_t's are conditionally independent given x, so the likelihood reads

(2.2)   l(y | x) = ∏_{t∈T} l_t(y_t | x).

It is often not clear what the likelihood should be for real data. This is mainly because our model is defined on properties of the images, such as the number of objects and a continuous object shape, and not on a pixel basis where the data is recorded. One approach that is often (quite) satisfactory is the following: define the silhouette {R(t; x), t ∈ T} to be a binary image, where R(t; x) is zero if t is outside all objects and one if t is inside an object. The silhouette can be used to define the likelihood. Assume y_t is a Gaussian variable with mean ζ_0 and variance σ_0² in the background area, and mean ζ_1 and variance σ_1² in the foreground area. Then the likelihood is

(2.3)   l(y | x) = ∏_{t∈T} (2π σ²_{R(t;x)})^{−1/2} exp( −(y_t − ζ_{R(t;x)})² / (2σ²_{R(t;x)}) ).

In the two synthetic simulation experiments in Section 5, this likelihood model is correct. For confocal images like Figure 1 it is a little crude, since each cell contains texture information. Although this texture information can in principle be included in the likelihood (see for example Geman et al. (1990)), it seems sufficient for our application to allow the mean level to vary between the cells, but to be constant within each cell. We can include this feature in our mpp-model by expanding the mark model to carry an additional variable ζ on each object, m = (z, r, ζ), where the (prior) density p(ζ) has support in D_ζ, say. Since we cannot distinguish cells with level near ζ_0 from the background, we choose D_ζ and a (not too small) positive constant δ so that ζ′ − δ > ζ_0 for all ζ′ ∈ D_ζ. We redefine the silhouette in the foreground area so that R(t; x) equals one in the interior of x_1, two in the interior of x_2, and so on, to keep the likelihood (2.3) valid. We have used this approach on the confocal images in Section 5 without any significant increase of the total computational cost.
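The likelihood (2.3) can be evaluated directly from the silhouette. A sketch for the binary-silhouette case, with illustrative default parameters of ours:

```python
import numpy as np

def log_likelihood(y, R, zeta=(0.0, 1.0), sigma=(1.0, 1.0)):
    """Log of (2.3): each y_t is Gaussian with mean zeta_0, sd sigma_0 in the
    background (R(t;x) = 0) and mean zeta_1, sd sigma_1 in the foreground."""
    mean = np.asarray(zeta)[R]                  # pick mean/sd per pixel via the silhouette
    sd = np.asarray(sigma)[R]
    return float(np.sum(-0.5 * np.log(2 * np.pi * sd ** 2)
                        - (y - mean) ** 2 / (2 * sd ** 2)))

# A noise-free binary image should fit its own silhouette better than the inverted one.
R = np.zeros((8, 8), dtype=int); R[2:6, 2:6] = 1
y = R.astype(float)
```

Extending the lookup tables zeta and sigma to one entry per object index covers the redefined silhouette used for the confocal images.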

3 Inference

This section discusses the inference for our model. Recall that our goal is to obtain a point estimate of x that includes the number of objects n as well. (The algorithm for computing the OBE is given in Section 4.2.)

Recall that S is a finite portion of the plane, and that T is the finite lattice on which our data are observed. Assume for simplicity that S covers T. Let ρ(u_1, u_2), u_1, u_2 ∈ S, be a metric on S, for instance the Euclidean metric. Define d(u_1, x) to be the shortest distance from a point u_1 to the (contour defined by the) objects x, measured by ρ. If u_1 is inside, or on the boundary of, an object, the distance is zero. Baddeley (1992) suggests using the Δ-metric to measure the distance between two configurations x and x′,

(3.1)   Δ_w^p(x, x′) = [ Σ_{t∈T′} |w(d(t, x)) − w(d(t, x′))|^p ]^{1/p},   1 ≤ p < ∞,

where T′ is a lattice covering S with a resolution sufficient to approximate the continuous integral. For simplicity, and to keep the computational costs low, we take T′ equal to the lattice of our data T. In (3.1), w(·) is a concave and strictly increasing function satisfying w(v) = 0 when v = 0. Furthermore, w(·) is eventually constant, i.e. there is a value v_c such that w(v) = c for all v ≥ v_c. Similarly to Baddeley and Van Lieshout (1991) and Van Lieshout (1994), and for reasons discussed in Frigessi and Rue (1997), we choose p = 2 and

(3.2)   w(v) = min{c, v},   c > 0.

We write d_c(u_1, x) for min{c, d(u_1, x)}, and Δ_c(x, x̂) when we use (3.2). In (3.2), c is a truncation parameter that determines the degree of localness in Δ_c(x, x̂)². The contribution to one of the terms in the sum (3.1) is only influenced by the object configuration within a circle with radius c centred at t.
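On binary silhouettes, Δ_c(x, x̂)² with p = 2 and w(v) = min{c, v} can be computed with a distance transform. A sketch using SciPy's exact Euclidean transform (the chamfer algorithm of Borgefors (1986), mentioned in Section 4.2, would serve the same purpose); the function name is ours:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def delta_c_squared(mask_a, mask_b, c=5.0):
    """Squared truncated Delta-metric between two binary silhouettes on the
    same lattice: sum_t (d_c(t, a) - d_c(t, b))^2, where d_c is the distance
    to the nearest foreground pixel, truncated at c (zero inside an object)."""
    d_a = np.minimum(distance_transform_edt(~mask_a), c)
    d_b = np.minimum(distance_transform_edt(~mask_b), c)
    return float(np.sum((d_a - d_b) ** 2))
```

Passing the inverted mask to the transform gives, for every pixel, the distance to the nearest foreground pixel, which is exactly d(t, x) on the lattice.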

Refer to Baddeley (1992) for a discussion of general topological properties of the Δ-distance that are important in the context of the image analysis described in Serra (1982).

The Δ-metric is particularly suitable for our case, as it is natural to consider our objects as sets, for which the Δ-metric was derived. The argumentation in Section 1 is our motivation for adopting the square of (3.1) as loss function for configurations in Ω,

(3.3)   L_c(x̂, x) = { Δ_c(x̂, x)², if x̂ ∈ Ω,
                     { +∞,         otherwise.

The loss function L_c(x̂, x) describes the loss, or cost, if we estimate x as x̂. Note that the loss for estimating n̂ objects when there are n objects is implicitly present in (3.3). The optimal Bayes estimator (OBE) minimises the posterior expectation of the loss,

(3.4)   x̂_Δ = arg min_{x̂∈Ω} E_{x|y} L_c(x̂, x),

and we call it the Δ-estimator. By assigning an infinite loss if x̂ ∉ Ω, we implicitly require that x̂_Δ ∈ Ω. In Section 5 we will study how the estimates using this loss function compare with those obtained with the commonly used 0-1 loss,

(3.5)   L(x̂, x) = { 0, if x̂ = x,
                   { 1, otherwise.

The OBE for (3.5) is the posterior mode, x̂_MAP. The Δ-estimator has a nice property in common with the MAP-estimator: it is invariant under reparametrizations of the model parameters. This is obvious, since the loss only depends on the shape of the objects, and hence only implicitly on the underlying parameters. It seems reasonable to us to define the loss on the shape of the objects, since we only observe the objects and not the parameters.

To get a rough feeling for x̂_Δ, consider a one-dimensional object with its only edge at position θ, with posterior distribution π(θ). The inside of the object is defined to be to the left of θ and the outside to the right of θ. This is a simple model for one (isolated) node value in Figure 2, r + z_i say, where the centre is fixed. For a continuous lattice T, we find that for small c, the OBE for θ is

(3.6)   θ̂_Δ = median(θ) − c + O(c²).

In comparison, the MAP-estimator is the mode of π(θ). We can expect the mode of π(θ) to have more variability than the median. Thus we expect a smoother boundary for our objects using the new loss function, compared to the posterior mode, even if the real situation is considerably more complex than in this example. Another observation is that (3.6) shifts the OBE in the direction of the interior of the object. This is in agreement with the more complex situation in Figure 4, displaying three objects A, A⁺ and A⁻. The Δ-distance between A and A⁺ is much larger than the distance between A and A⁻, as the extra part in A⁺ influences the distance from far more sites t compared to the reversed situation in A⁻. Assume that our prior knowledge prefers smooth objects similar to A. If the true object is A⁺, then the estimated object will most likely contain the additional part in A⁺, but slightly diminished. If the true object is A⁻, then the estimated object can be more similar to the prior A. Refer also to Baddeley (1992) for more examples.

4 Algorithms

This section contains a description of the algorithms involved in this work. In Section 4.1 we discuss the MCMC algorithm for producing samples from the (posterior) model, and in Section 4.2 how we can compute an approximation to the OBE by using the ideas of the two-step MCMC and Simulated Annealing algorithm of Rue (1995).

4.1 Simulation from the Model

Markov chain Monte Carlo (MCMC) methods have been around for a while as the main tool for (iteratively) generating (dependent) samples from, and exploring, high dimensional (posterior) distributions. We will concentrate on the main ideas and the practical necessity of moving between different parameter subspaces by splitting and fusing objects in an efficient way within the (recent) powerful jump-MCMC framework of Green (1995), leaving the details for the Appendix.

Let π(dx) be the target distribution of interest. In the MCMC methodology, we randomly propose different ways to change the current state x of the Markov chain, and accept the move with a certain probability in order to maintain the correct equilibrium distribution. We construct a transition kernel P(x, dx′) that is irreducible and aperiodic, so that π(dx) is the equilibrium distribution for the chain. It is common to use transition kernels that are reversible with respect to π, i.e., P(x, dx′) satisfies the detailed balance condition (for all appropriate A and B)

(4.1)   ∫_A ∫_B π(dx) P(x, dx′) = ∫_B ∫_A π(dx′) P(x′, dx).

The detailed balance condition restricts P more than is actually needed, but makes the design of P more tractable. The algorithm goes as follows. Let x be the current state of the chain. We then propose a move of type j that moves x to dx′ with probability q_j(x, dx′). This move is accepted with probability α_j(x → x′) (Hastings, 1970),

(4.2)   α_j(x → x′) = min{ 1, [π(dx′) q_j(x′, dx)] / [π(dx) q_j(x, dx′)] };

otherwise we stay in the old state x. This chain can then be simulated to obtain a dependent sample from (approximately) the target distribution.

For our mpp-model we use the following types of moves: move and scale an object, change r and parts of z_i, and finally, birth and death of objects. These types of moves are sufficient for moving between two arbitrary configurations in Ω. In practice, however, we need more available moves when we have good quality images like the one shown in Figure 1. Assume one object is on its way to cover two cells lying close to each other. The strong likelihood will in practice prevent the necessary series of single steps needed to get out of this deadlock: remove the object, and then give birth to two new objects (one in each cell). The solution is to do the necessary series of single steps in one new step. An example of this scenario is shown in Figure 7b, an accepted split of an object covering two cells, using the guidelines below. (See also Grenander and Miller (1994) for examples of fusing and splitting of objects with a different model.)

When we move between subspaces, giving simultaneous death to one object and birth to two new objects, we must be careful to maintain the correct equilibrium distribution. Green (1995) discusses reversible jumps between parameter subspaces in general, how to meet the reversibility requirement and how to calculate the acceptance probability α_j(x → x′). Jumps between subspaces make proposals of varying dimension, and cannot easily be described by a density (as in (4.2)) in order to satisfy (4.1). The proposal kernel satisfies the following condition: π(dx) q_j(x, dx′) has a density f_j(x, x′) with respect to some symmetric measure ξ on the Cartesian product of the state space with itself. The acceptance probability becomes

(4.3)   α_j(x → x′) = min{ 1, f_j(x′, x) / f_j(x, x′) },

which reduces to (4.2) for x and x′ in the same subspace. The ratio between the densities in (4.3) is called the Green ratio (Geyer, 1996). Practical computation of (4.3) involves computing an appropriate Jacobian for dimension matching, compared to (4.2); refer to the Appendix for details.

We will give a brief outline of how we propose to do efficient fusing and splitting of objects for our model. (The details are in the Appendix.) Figure 3 shows schematically how we propose to split an object into two objects. Let (s, (r, z)) be the parameters of the original object, and (s_1, (r_1, z_1)) and (s_2, (r_2, z_2)) the parameters of the two new objects. The required steps are as follows:

1. We propose an angle ω uniformly in [0, 2π), and let L be the line with this angle through s. Then we select two new centres s_1 and s_2 uniformly on the line L in the interval s ± r, with the constraint |s_1 − s_2| = r. Figure 3 displays the case ω = 0.

2. The radii for the two new objects (r_1 and r_2) are drawn from a truncated Gaussian distribution, say, where the parameters depend on the distances |s_1 − s| and |s_2 − s| to reflect the sizes of objects 1 and 2 (see Figure 3). It is important that the radii are not too large, to prevent overlap of the two new objects.

3. We propose to handle the deviations z_1 and z_2 in the following way. First draw two lines orthogonal to L through s_1 and s_2 (see Figure 3). This divides the object into three regions: the left, interior and right parts. As the outer boundary is often well placed, it is of vital importance to keep the left and right parts approximately unchanged after the split. We let z_1 take values to fit the polygon in the left part in Figure 3, and similarly for z_2 in the right part. This introduces small deviations, which we interpret as random noise to ensure reversibility. The parts of z_1 and z_2 in the interior of the old object are updated from their Gaussian prior conditionally on the left and right parts, respectively. (See also the comments in Section 5.3.)

4. For the confocal microscopy images the mean level ζ of each object is a stochastic variable, with p(ζ) > 0 in D_ζ. The new levels ζ_1 and ζ_2 then become ζ ± v, where v is a stochastic variable with density truncated so that ζ ± v ∈ D_ζ.
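Steps 1 and 2 above can be sketched as follows; the proposal spread sigma_prop is an illustrative choice of ours, not a parameter from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def propose_split(s, r, sigma_prop=0.2):
    """Sketch of steps 1-2 of the split move: pick a random line L through s,
    place two new centres on it with |s1 - s2| = r and both within s +/- r,
    then draw radii from truncated Gaussians centred at |s1 - s| and |s2 - s|."""
    omega = rng.uniform(0.0, 2 * np.pi)         # step 1: random angle
    u = np.array([np.cos(omega), np.sin(omega)])
    t = rng.uniform(-0.5 * r, 0.5 * r)          # midpoint offset along L
    s1, s2 = s + (t + 0.5 * r) * u, s + (t - 0.5 * r) * u
    r1 = r2 = -1.0
    while r1 <= 0:                              # step 2: truncated Gaussian radii
        r1 = rng.normal(np.linalg.norm(s1 - s), sigma_prop * r)
    while r2 <= 0:
        r2 = rng.normal(np.linalg.norm(s2 - s), sigma_prop * r)
    return (s1, r1), (s2, r2)
```

Steps 3 and 4 (the deviations and mean levels) act on these proposed objects; the reverse (fuse) move must use matching proposals so the Green ratio (4.3) can be computed.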

This split move is quite useful in the following case as well. Assume an object is born that covers only part of a cell, and has a large part in the background area. This object is now likely first to be split into two objects, whereupon the object in the background area is removed. Figure 7c shows an example.

To fuse two objects, we do essentially the same as described above, with obvious adjustments like r = |s_1 − s_2| and, for the confocal microscopy images, ζ = (ζ_1 + ζ_2)/2.

The effort to code the proposed splitting and fusing of objects is significant, and required (for us) much more work than all the other basic steps in the MCMC algorithm altogether. However, the coding effort is rewarded by the improved convergence.

4.2 Computing the OBE

We will now explain how to obtain the corresponding OBE approximately by using the ideas of the two-step MCMC and Simulated Annealing algorithm of Rue (1995). Refer also to Rue (1997) for a general discussion of how to obtain the OBE, and to Frigessi and Rue (1997) for a more detailed discussion using (3.3) in image classification. First, suppose that

(4.4)   h(x̂) = { α exp( −E_{x|y} L_c(x̂, x) ), if x̂ ∈ Ω,
               { 0,                           otherwise

(where α > 0) is known and computable in practice. We will in a moment explain how to meet this requirement. For technical reasons we consider only finite objects, i.e. the Lebesgue measure of M₊, the subset of M with positive measure, is finite. Denote by η the uniform probability measure on M₊. Define a Poisson object process with measure μ′ = λ′ × η, where λ′ is the Lebesgue measure with λ′(S) = 1. Interpret h(x̂) as the density with respect to μ′, and then α as a normalisation constant. (Note that h(x̂) > 0 if and only if x̂ ∈ Ω.) Then x̂_Δ is the mode of the corresponding probability distribution p_Δ(x̂), and x̂_Δ ∈ Ω. As (4.4) is computable, the mode, or at least a good approximation of it, can be found using the standard technique of Simulated Annealing. Refer to Van Lieshout (1994) for a proof of convergence and technical details for object models. The annealing generates samples, using MCMC methods, from the tempered distribution p_Δ(x̂)^{1/b} as we let the temperature b tend to zero sufficiently slowly. In the limit, we observe samples uniformly on the set of solutions of (3.4). In practice, we let b tend to zero faster than theoretically required, with the consequence that we are not ensured to reach the global mode of p_Δ(x̂), or equivalently the minimum of (3.4). Our experience is that good estimates of x̂_Δ can be obtained even with fast cooling or some steepest ascent algorithm (Baddeley and Van Lieshout, 1993), meaning that this step need not be the most computationally demanding step in our algorithm. The next step is to ensure that (4.4) is computable. We expand the square in L_c(x̂, x) to obtain (for x̂ ∈ Ω)

h(x̂) = α exp( −E_{x|y} Σ_{t∈T} (d_c(t, x̂) − d_c(t, x))² )
     = α′ exp( −Σ_{t∈T} d_c(t, x̂) (d_c(t, x̂) − 2 C_t) ),

where

(4.5)   C_t = E_{x|y} d_c(t, x),

and the constants α and α′ are of no interest regarding the OBE. We make use of the MCMC algorithm in Section 4.1 to estimate {C_t, t ∈ T} by averaging over states x^{(1)}, x^{(2)}, …, x^{(K)} of the Markov chain with the posterior as equilibrium distribution,

(4.6)   Ĉ_t = (1/K) Σ_{k=1}^{K} d_c(t, x^{(k)}).

The law of large numbers still holds even if the samples are weakly dependent, so Ĉ_t converges to the correct value. Hence (4.4), with a finite value of c, is computable. Note that the computational cost will increase for increasing values of c. In practice, values of c in the range 3-7 seem reasonable.
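The estimate (4.6) amounts to averaging truncated distance maps over posterior samples. A sketch, using SciPy's exact distance transform in place of Borgefors' chamfer algorithm and binary silhouettes in place of the continuous contours:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def estimate_C(silhouettes, c=5.0):
    """Hat C_t = (1/K) sum_k d_c(t, x^(k)): average the truncated distance
    maps of K posterior silhouette samples x^(1), ..., x^(K)."""
    maps = [np.minimum(distance_transform_edt(~m), c) for m in silhouettes]
    return np.mean(maps, axis=0)
```

The resulting image {Ĉ_t} is all that the density h(x̂) needs from the posterior; the mpp-model itself does not enter again.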

Let us comment on the practical issue of how samples from p_Δ(x̂) can be generated. We suggest using the same kind of proposals q_j(x, dx′) as for generating samples from the posterior distribution; see Section 4.1. This is because h(x̂) contains a lot of information about the posterior distribution through the C_t's. The computation of x̂_Δ is now easy, since we only need to replace p(x | y) by p_Δ(x̂) when calculating the acceptance rate α_j(x → x′) after, for instance, a proposed fusing of two objects. An efficient algorithm for computing d_c(t, x̂) can make use of the fast algorithm of Borgefors (1986) for computing distances from a site in a discrete lattice to the nearest foreground site, say. Some modifications are however needed, because the contours of our objects are continuous.

To summarise, the algorithm proceeds as follows. First, run a Markov chain with the posterior as the equilibrium distribution and estimate {C_t, t ∈ T}. Secondly, run a Simulated Annealing algorithm on p_Δ(x̂) to obtain a good approximation to x̂_Δ. Note the absence of the mpp-model itself, which makes the algorithm trivial to apply to other mpp-models.
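The second (annealing) step can be sketched generically on a toy log-density; as in the text, the geometric cooling here is faster than theory requires, so only an approximate mode is obtained (the schedule parameters are illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(3)

def anneal(log_density, x0, n_iter=4000, b0=1.0, rate=0.999, step=0.5):
    """Simulated Annealing sketch: run MCMC on the tempered density
    p(x)^(1/b) while the temperature b decays geometrically."""
    x, b = x0, b0
    for _ in range(n_iter):
        x_new = x + rng.normal(0.0, step)
        if np.log(rng.uniform()) < (log_density(x_new) - log_density(x)) / b:
            x = x_new
        b *= rate                               # geometric cooling schedule
    return x

mode = anneal(lambda x: -(x - 2.0) ** 2, x0=-5.0)   # quadratic log-density, mode at 2
```

For the object model, the same loop runs over configurations with the jump-MCMC proposals of Section 4.1 and log p_Δ(x̂) in place of the toy log-density.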

5 Examples

This section contains some simulation experiments with the proposed mpp-model and a comparison between the Δ-estimator and the MAP-estimator. We start with an example with a single object and then move on to an example with an unknown number of objects. Finally, we apply our model to a real confocal microscopy image and show examples of the fusing and splitting of objects discussed in Section 4.

Our MCMC algorithm is implemented following the guidelines in Section 4. We have tuned the parameters to speed up the computation, for instance how often we shall propose to give birth to a new object etc., to obtain a reasonable balance between the different kinds of moves as they appear on the screen. The {C_t, t ∈ T} were estimated using (4.6) and a large number of samples from the Markov chain. We got essentially the same results when we increased the number of samples. The Simulated Annealing algorithm used for computing the MAP and the Δ-estimator (with a Euclidean metric ρ) used an exponential cooling schedule. In all the examples, we used as S the continuous lattice T normalised to λ(S) = 1. The parameters in the model are selected by trial and error, but the results are not too dependent on the parameter values if these are reasonable for the objects in the image.

5.1 Simulation Experiment 1

The purpose of this first experiment is to show a typical example of how the Δ-estimator estimates the shape of an object in comparison to the MAP-estimator. We assume it is known in advance that the image contains only one object.

Our image has size 32 x 32, with a near circular object of pixel value 1 on a background with pixel value 0 (Figure 5a). We add iid Gaussian noise with mean zero and variance 1 (Figure 5b). We choose the parameters in the model as: μ_r = 5, σ_r² = 0.5², n_z = 30, ρ_z(τ) = exp(−τ/τ_c) where τ_c = 1/4, and σ_z² = 0.05². Figures 5c and 5d display the MAP and Δ-estimates using c = 5.

The results from this example demonstrate typical behaviour. The contour of the MAP-estimate is more influenced by the noise, resulting in a more wiggly contour of the object. The Δ-estimate is more stable, as it is a compromise candidate, and gives a smoother estimate. Our experience with similar kinds of examples is that the Δ-estimator gives better, or equally good, estimates of the contour of the objects, most prominently for moderate to bad data. The MAP-estimator (too) often gives the impression of capturing non-existing details in the object (due to the noise) that are not present in the Δ-estimate.

5.2 Simulation Experiment 2

In this experiment, the number of objects is unknown a priori, and we want to study how the number of objects is estimated by the MAP and the Δ-estimator.

Our image is of size 64 x 64, with three nearly circular objects of pixel value 1 on a background with pixel value 0 (Figure 6a). Figure 6b displays the result after adding iid Gaussian noise with zero mean and variance 1. The parameters in the model were: β = 20, μ_r = 4.0, σ_r² = 0.5², n_z = 30, ρ_z(τ) = exp(−τ/τ_c) where τ_c = 1/4, and σ_z² = 0.05². Figures 6c and 6d display the MAP and Δ-estimates using c = 5.

The MAP-estimate yields four objects, while the Δ-estimate recovers the correct three objects, even if the prior model, through the parameter β, suggests more objects. For small objects, which effectively amount to bad data, it is likely that there are parts of the background area that might look like a small object. Our experience with similar kinds of examples is that the MAP-estimator frequently (in around 50% of the cases with new realizations of the noise in this particular example) tends to estimate more objects than there actually are. The Δ-estimate is more stable with respect to the number of objects and seldom overshoots. If the MAP-estimate fails to locate an object, the Δ-estimate is likely to fail too. Even if it is easier to estimate too many objects when the size of the objects is small, the same will happen for larger objects with a proper scaling of the noise variance. The intuitive explanation of why the MAP-estimate contains spurious objects is that the MAP-estimate depends only on the mode of the posterior distribution, and not on how much probability mass the mode contains. It is where (and how) the probability mass in the posterior distribution is located that is, and should be, important for the inference, not necessarily where the mode of the posterior distribution is. Realizations from the posterior distribution strongly indicate this fact, and support our claim of a higher reliability of the Δ-estimate compared to the MAP-estimate.

5.3 Application to a Confocal Microscopy Image

Finally, we apply our mpp-model to the real confocal microscopy image in Figure 1. It appears to be an easy task, since all the cells are clearly visible. However, our mpp-approach required the splitting and fusing moves in Section 4 to obtain reliable results. We have displayed some of the moves that appeared in our trials to check the reliability of the approach. Figure 7a displays two misplaced objects that fuse together; after a short while and further development, they split (Figure 7b). The situation in Figure 7c is the one occurring most frequently. Here a misplaced object is divided in two, and soon the object in the background area dies. Two objects may be born into the same cell and after a short while fuse into one object (Figure 7d).

An estimate of the location and shape of the cells in Figure 1 is displayed in Figure 8. We estimated the parameters in the prior distribution from a set of similar images by hand-tracking the cells. The correlation function $\rho_z$ was found to be well approximated by

(5.1)  $\rho_z(\tau) = \exp(-\tau/\tau_c) \cos(4\pi\tau).$

We chose $\tau_c = 1/3$ as a conservative estimate to avoid introducing too strong a dependency structure in $z$ for those cells with a slightly different correlation structure than (5.1). When we split an object, we obtain better splits using the correlation function $\rho_z(\tau) = \exp(-\tau/\tau_c)$ instead of (5.1) to propose new values for the deviations in the interior part of the object. This is because (5.1) fits well for one cell but is not always appropriate for the two new objects (see Figure 3), which may not (yet) have the fully developed shape of a cell.

The proposed model seems to work reasonably well. For the good data in Figure 1, samples of the Markov chain after an initial period, and often some crucial moves like those displayed in Figure 7, are very good. Thus, the MAP-estimate, the Δ-estimate and simply one late sample of the Markov chain are all nearly equal to Figure 8.

6 Discussion

We end this paper with some comments on what the important parts of our approach are. For real applications with good images like Figure 1, it is simply a reasonable model for an object combined with a good simulation algorithm that performs the correct types of moves within the MCMC framework. Figure 8 is an example of good data where various estimators will not differ much from a late (perhaps cooled) sample from the posterior distribution. When the quality of our data decreases, the object model is again important, but also what kind of loss function we use. Based on our experience, the Δ-estimator is to be preferred to the MAP-estimator in such situations, at a slight increase in computational and coding cost.

There is one part missing in our mpp-model, namely an object interaction term in addition to the hard-core part. We think that for situations where the objects are small (and more like points), where we have bad data, or where we have strong prior knowledge of how the objects interact, an interaction term can give significantly improved results. Otherwise, we can concentrate on the object model and an MCMC algorithm that handles jumps between parameter subspaces in a good way, as the basis for doing inference and producing point estimates with the Δ-estimator.

Acknowledgements

The authors thank Henning Omre and Arnoldo Frigessi for helpful discussions, and the referee for helpful comments and suggestions. The second author was funded by a Ph.D. grant from the Research Council of Norway.

A Details of the MCMC algorithm

In this section we give the necessary details of the MCMC algorithm in Section 4, for simulation from the posterior distribution. First we discuss moves that involve a change in the parameters of one object, then the birth and death moves, and finally the split and fuse moves. We use (4.2) in the first case and (4.3) for the remaining move types, as they involve a change in dimension. For simplicity we assume that $\lambda(S) = 1$, and denote by $x$ and $x'$ the old and (possibly) new state of the Markov chain, respectively.

A.1 Move Type: A Change in an Object

The first move type proposes a change of the parameters of a random object $x_i$ of random type $k$, keeping the other objects unchanged. We sample new parameters $(s'_i, m'_i)$ from some proposal density $q_k$ that typically depends on the current parameters of the object, $q_k(s'_i, m'_i \mid s_i, m_i)$. The types of changes are i) change parts of the deviations $z_i$, ii) move the object by moving the centre-point $s_i$, and iii) change the radius $r_i$. For all these changes, the acceptance rate follows directly from (4.2),

$$\alpha_k(x \to x') = \min\left\{1,\; \frac{\nu(m'_i)\, l(y \mid x')\, q_k(s_i, m_i \mid s'_i, m'_i)\, \prod_{j \neq i} g(x_j, x'_i)}{\nu(m_i)\, l(y \mid x)\, q_k(s'_i, m'_i \mid s_i, m_i)\, \prod_{j \neq i} g(x_j, x_i)}\right\}.$$

For our hard-core process, the term $\prod_{j \neq i} g(x_j, x'_i)$ is either zero or one, corresponding to an illegal or legal configuration $x'$, and $\prod_{j \neq i} g(x_j, x_i)$ is equal to one as $x$ is a legal configuration.

A.2 Move Type: Birth and Death of Objects

We will follow Geyer (1996) when we explain the birth or death of an object. The acceptance ratio for these move types follows from (4.3), as they involve a change in the dimension. Let Pr(birth) and Pr(death) be the probability of proposing a birth or a death move, respectively. Assume for simplicity that we sample the new object $x_{n+1}$ from the joint prior density of the position and the mark, to obtain the new state $x' = \{x, x_{n+1}\}$. In the (reverse) death move, we propose to remove a random object, $x_{n+1}$ say, from the configuration $x'$. To compute the Green ratio (4.3), we need to identify the symmetric measure $\xi$ and the densities $f_G(x, x')$ and $f_G(x', x)$. The measure $\xi$ consists of two non-zero parts, the birth-part, which is Lebesgue measure on

(A.1)  $D^+ = \{(x, x') \in U^n \times U^{n+1} : x'_i = x_i,\ i = 1, \ldots, n\},$

and the death-part, which is Lebesgue measure on

(A.2)  $D^- = \{(x', x) \in U^{n+1} \times U^n : x'_i = x_i,\ i = 1, \ldots, n\}.$

The density $f_G(x, x')$ for the birth-part is

$$\beta^n \prod_{i>j} g(x_i, x_j) \prod_{i=1}^{n} \nu(m_i)\; l(y \mid x)\; \nu(m_{n+1})\; \Pr(\text{birth})$$

for $(x, x') \in D^+$ and zero on $D^-$. The death-part has density $f_G(x', x)$ equal to

$$\beta^{n+1} \prod_{i>j} g(x_i, x_j) \prod_{i=1}^{n+1} \nu(m_i)\; l(y \mid x')\; \Pr(\text{death})/(n+1)$$

for $(x', x) \in D^-$ and zero on $D^+$. Equation (4.3) now gives the Green ratio for the birth of a new object,

$$R_{\text{birth}} = \frac{\beta}{n+1}\, \frac{l(y \mid x')}{l(y \mid x)}\, \frac{\Pr(\text{death})}{\Pr(\text{birth})} \prod_{i=1}^{n} g(x_i, x_{n+1}).$$

A new object is accepted with probability $\min(1, R_{\text{birth}})$, and a random object is removed with probability $\min(1, R_{\text{birth}}^{-1})$.

A.3 Move Type: Split and Fuse

The details of the split and fuse moves are more complicated than the move types described above. It is therefore useful to review a general and useful result from Green (1995): Suppose there are two subspaces, 1 and 2, with parameters $\theta^{(1)}$ and $\theta^{(2)}$ of different dimensions, and proper densities $\pi(1, \theta^{(1)})$ and $\pi(2, \theta^{(2)})$. Let $\Pr(1 \to 2)$ and $\Pr(2 \to 1)$ be the probability of proposing a move from subspace 1 to 2, and from 2 to 1, respectively. The move from 1 to 2 consists of simulating a continuous random vector $u^{(2)}$ from density $q(u^{(2)})$, and setting $\theta^{(2)}$ to be some deterministic function of $\theta^{(1)}$ and $u^{(2)}$. The move from 2 to 1 is similar, setting $\theta^{(1)}$ to be some deterministic function of $\theta^{(2)}$ and $u^{(1)}$, which has density $q(u^{(1)})$. For this move to be reversible, there must be a bijection between $(\theta^{(1)}, u^{(2)})$ and $(\theta^{(2)}, u^{(1)})$, with $\dim(\theta^{(1)}) + \dim(u^{(2)}) = \dim(\theta^{(2)}) + \dim(u^{(1)})$. The Green ratio is now (Green, 1995),

(A.3)  $R_{1 \to 2} = \frac{\pi(2, \theta^{(2)})\, \Pr(2 \to 1)\, q(u^{(1)})}{\pi(1, \theta^{(1)})\, \Pr(1 \to 2)\, q(u^{(2)})} \left| \frac{\partial(\theta^{(2)}, u^{(1)})}{\partial(\theta^{(1)}, u^{(2)})} \right|$

and the acceptance rates are $\alpha(1 \to 2) = \min(1, R_{1 \to 2})$ and $\alpha(2 \to 1) = \min(1, R_{1 \to 2}^{-1})$.

For our split and fuse moves, the underlying symmetric measure is constructed similarly to (A.1) and (A.2), except that parts of the splitting object will also be present in the two new objects (Figure 3). Denote by $O_M = (s_M, m_M)$ the fused mother object, and by $O_L = (s_L, m_L)$ and $O_R = (s_R, m_R)$ the two split objects, indexed by Left and Right. To split $O_M$ into $O_L$ and $O_R$, we first need to determine the centre-points ($s_L$ and $s_R$) and the radii ($r_L$ and $r_R$). We do this as follows: Sample $u_1$ from uniform$(0, 2\pi)$ and $u_2$ from uniform$(0, r_M)$, and set $s_L = s_M + u_2[\cos u_1, \sin u_1]$ and $s_R = s_M + (u_2 - r_M)[\cos u_1, \sin u_1]$. The radii $r_L$ and $r_R$ are sampled from the densities $f(r_L)$ and $f(r_R)$, where the parameters in the densities should depend on $u_2$. (We use $f$ as a generic density of its argument(s).) It is important that the radii are not too large, to prevent the two new objects from overlapping. In the reverse fuse move, we only need to determine the centre-point $s_M$, as $r_M = |s_L - s_R|$: Sample $u_1$ from uniform$(0, r_M)$ and set $s_M = s_L + u_1[\cos u_2, \sin u_2]$, where $u_2$ is the angle between $s_L$ and $s_R$.

The next step is to determine the deviations $z_L$, $z_R$ and $z_M$. The goal is to leave the contour in the left and right part of Figure 3 nearly unchanged after the split, for reasons discussed in Section 4. Figure 9a illustrates the details of the deviations $z_L$ in a split of $O_M$, and Figure 9b illustrates the details of the deviations $z_M$ in a fuse of $O_L$ and $O_R$. When we propose fusing $O_L$ and $O_R$, we divide $z_M$ into three parts, $z_M = (z_M^l, z_M^c, z_M^r)$. Here, $z_M^l$ are the deviations in the left half of $O_M$, marked as three black dots in Figure 9b. We set $z_M^l$ to interpolate the contour of the left half of $O_L$, and $z_M^r$ to interpolate the right half of $O_R$ (not shown). Then we update the deviations in between, $z_M^c$, conditionally on $z_M^l$ and $z_M^r$ with respect to their prior measure. In a split step the deviations are handled essentially in the same way as in the fuse step, but we need to ensure that there is a bijection between $(\theta^{(1)}, u^{(2)})$ and $(\theta^{(2)}, u^{(1)})$. The deviations $z_L$ and $z_R$ must therefore be conditioned on the following: fusing $O_L$ and $O_R$ could yield $O_M$. Figure 9a displays those deviations in $O_L$ ($z_L$) that are involved in this condition, marked with a thick line in between the nodes. The condition is that the thick line must go through the nodes of $O_M$, resulting in one degree of freedom less for $z_L$ in the left part of $O_L$ for each node of $O_M$ that is in this part. We may therefore divide $z_L$ into $(z_L^s, z_L^d, z_L^r)$, where $z_L^s$ are those deviations in the left part of $O_L$ that we must update stochastically, $z_L^d$ is the part of $z_L$ that is deterministically derived from $z_L^s$ (due to the bijection requirement), and $z_L^r$ is the rest. We take the mean of $z_L^s$ at the interpolated contour of $O_M$, plus a Normally distributed noise term $\epsilon_L^s$,

(A.4)  $(z_L^s)_i = \mathrm{intpol}\big(s_M + r_M + (z_M)_i,\; s_M + r_M + (z_M)_{i+1}\big) + (\epsilon_L^s)_i - r_L.$

Here, "intpol" is the linear interpolation between the nearest two nodes of $O_M$ on each side of the $i$'th node of $O_L$, see Figure 9a. (We used the same correlation structure in $\epsilon_L^s$ as for the deviations $z$.) We then update $z_L^d$ as a deterministic function of $z_L^s$, and finally sample $z_L^r$ from the conditional prior density given $z_L^s$ and $z_L^d$. For later use, we write the update of $z_L^r$ as

(A.5)  $z_L^r = A_L \gamma_L + B_L [z_L^s, z_L^d]^T + b_L$

for a vector $\gamma_L$ of iid standard Normal variates, some matrices $A_L$, $B_L$, and a vector $b_L$. We must go through a similar procedure for $z_R$, the deviations for $O_R$. The reason for the seemingly odd expression (A.5) is the problem of evaluating the Jacobian in the acceptance rate (A.3). We need to evaluate the Jacobian of the parameters in $O_L$ and $O_R$ and the random variables needed to propose $O_M$, with respect to the parameters in $O_M$ and the random variables needed to propose $O_L$ and $O_R$. Therefore (A.4) and (A.5)

are needed, as they give the necessary relations between $z_L$ and the parameters in $O_M$ and the random variables needed for proposing $O_L$ from $O_M$. The relations between the radii and centre-points are easy to establish, using the description earlier in this Appendix. We end this description with a minor detail: The dimension of the Jacobian will vary over split and fuse steps, but needs to be constant for the reverse step where the bijection requirement is valid. It should at this point not be of any surprise to the reader that we determine the determinant of the Jacobian numerically on-line for each split and fuse move. To summarise, the Green ratio for a proposed split move becomes

$$R_{\text{split}} = \frac{\Pr(\text{fuse})}{\Pr(\text{split})}\, \frac{\Pr(\text{this particular fuse})}{\Pr(\text{this particular split})}$$
$$\times\; \frac{\beta\, \nu(m_L)\, \nu(m_R)\, \prod_{i>j} g(x'_i, x'_j)\, l(y \mid x')}{\nu(m_M)\, \prod_{i>j} g(x_i, x_j)\, l(y \mid x)}$$
$$\times\; \frac{(1/r_M)\, f(z_M^c \mid z_M^l, z_M^r)}{(1/(2\pi))\,(1/r_M)\, f(r_L)\, f(r_R)\, f(\epsilon_L^s)\, f(\epsilon_R^s)\, f(\gamma_L)\, f(\gamma_R)}$$
$$\times\; [\text{Jacobian}].$$

The terms on the first line are the probabilities for selecting the split and fuse move types, and the probabilities for selecting $O_L$ and $O_R$ to fuse (for example $1/\binom{n}{2}$) and selecting $O_M$ to split (for example $1/(n-1)$). The second line contains the terms from the posterior distributions. The third line contains the densities for the random variables used to propose $O_L$ and $O_R$ from $O_M$ and vice versa. Finally, the last line contains the Jacobian, which is computed numerically for each proposal.
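The conditional prior updates used above, e.g. sampling one block of a Gaussian deviation vector given the remaining blocks as in (A.5), have a generic affine-plus-noise form. The following numpy sketch is ours, written for a zero-mean Gaussian vector (a nonzero mean would be absorbed by the offset $b_L$); the function name and indexing scheme are illustrative assumptions.

```python
import numpy as np

def conditional_gaussian_sample(Sigma, idx_u, idx_o, z_o, rng):
    """Sample z_u | z_o for a zero-mean Gaussian vector with covariance
    Sigma, in the form z_u = A @ gamma + B @ z_o with gamma iid standard
    Normal (cf. (A.5), with the offset b = 0 in the zero-mean case)."""
    S_uu = Sigma[np.ix_(idx_u, idx_u)]
    S_uo = Sigma[np.ix_(idx_u, idx_o)]
    S_oo = Sigma[np.ix_(idx_o, idx_o)]
    B = S_uo @ np.linalg.inv(S_oo)              # regression matrix
    A = np.linalg.cholesky(S_uu - B @ S_uo.T)   # Cholesky factor of conditional cov.
    gamma = rng.standard_normal(len(idx_u))
    return A @ gamma + B @ z_o
```

One convenience of writing the update this way: the map from $(\gamma, z_o)$ to $z_u$ is affine, so its contribution to a Jacobian such as the one in (A.3) is a constant-coefficient block, which keeps the numerical determinant evaluation straightforward.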

Figure 1: A typical confocal microscopy image of cells, which is our motivation for the proposed mpp-model.

Figure 2: The figure displays schematically our model for one object. The dotted circle with radius $r$ is the overall shape. Local variability is modelled through the Gaussian vector $z = (z_1, \ldots, z_{n_z})$, where $z_i$ denotes the deviation from the circle along the line with angle $2\pi(i-1)/n_z$.

Figure 3: The figure shows how we split one object $m$ (solid line) into two objects $m_1$ and $m_2$ (dotted lines), as explained in Section 3. The two vertical lines divide the object into a left, interior and right part.


Figure 4: The figure shows three objects, $A^+$, $A$ and $A^-$. The distance between $A$ and $A^+$ measured by the Δ-metric is (much) larger than the corresponding distance between $A$ and $A^-$. This is because the extra part in $A^+$ influences $d_c(t, A)$ for far more sites $t$ compared to the reversed situation in $A^-$.


Figure 5: The results from the first simulation experiment; (a) the true discretized object with levels zero and one; (b) the result of adding Gaussian noise with zero mean and variance one; (c) and (d) the MAP and Δ-estimate (with $c = 5$), respectively.



Figure 6: The results from the second simulation experiment; (a) the true discretized object with levels zero and one; (b) the result of adding Gaussian noise with zero mean and variance one; (c) and (d) the MAP and Δ-estimate (with $c = 5$), respectively.


Figure 7: A detail of Figure 1 and some of the moves that happened when running our MCMC algorithm on the larger image; (a) two misplaced objects first fuse together, and after a short while they split again as shown in (b); (c) splitting of one object into two objects, where the small object in the background area will soon die; (d) two objects fuse early in the MCMC run.

Figure 8: A typical result of applying our mpp-model to a real confocal microscopy image.

Figure 9: The details of the split and fuse step; (a) the details of how the deviations are updated in a split step; (b) the details of how the deviations are updated in a fuse step.

References

Amit, Y., Grenander, U. and Piccioni, M. (1991). Structural image restoration through deformable templates, J. Amer. Statist. Assoc. 86: 376-387.

Baddeley, A. J. (1992). Errors in binary images and an Lp version of the Hausdorff metric, Nieuw Arch. Wisk. 10: 157-183.

Baddeley, A. J. (1994). Contribution to the discussion of paper by Grenander and Miller (1994), J. Roy. Statist. Soc. Ser. B 56(4): 584-585.

Baddeley, A. J. and Møller, J. (1989). Nearest-neighbour Markov point processes and random sets, Int. Statist. Rev. 57(2): 89-121.

Baddeley, A. J. and Van Lieshout, M. N. M. (1991). Recognition of overlapping objects using Markov spatial processes, Research Report BS-R9109, Centrum voor Wiskunde en Informatica.

Baddeley, A. J. and Van Lieshout, M. N. M. (1993). Stochastic geometry models in high-level vision, in K. V. Mardia and G. K. Kanji (eds), Statistics and Images, Vol. 20, Abingdon: Carfax Publishing, chapter 11, pp. 235-256.

Besag, J., Green, P., Higdon, D. and Mengersen, K. (1995). Bayesian computation and stochastic systems (with discussion), Statist. Sci. 10(1): 3-66.

Borgefors, G. (1986). Distance transformations in digital images, Comp. Vis. Graph. Im. Proc. 34: 344-371.

Frigessi, A. and Rue, H. (1997). Bayesian image classification with Baddeley's delta loss, J. Comp. Graph. Statist. 6(1): 55-73.

Geman, S., Geman, D., Graffigne, C. and Dong, P. (1990). Boundary detection by constrained optimization, IEEE Trans. Pattern Anal. Machine Intell. 12: 609-628.

Geyer, C. (1996). Likelihood inference for spatial point processes, in W. S. Kendall (ed.), Proc. Séminaire Européen de Statistique Toulouse 1996, Stochastic Geometry: Theory and Applications, Springer Lecture Notes.

Geyer, C. J. and Møller, J. (1994). Simulation procedures and likelihood inference for spatial point processes, Scand. J. Statist. 21: 359-373.

Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika 82(4): 711-732.

Grenander, U. (1993). General Pattern Theory, Oxford University Press.

Grenander, U., Chow, Y. and Keenan, D. M. (1991). Hands: A Pattern Theoretic Study of Biological Shapes, Research Notes on Neural Computing, Springer, Berlin.

Grenander, U. and Manbeck, K. M. (1993). A stochastic model for defect detection in potatoes, J. Comp. Graph. Statist. 2: 131-151.

Grenander, U. and Miller, M. I. (1994). Representations of knowledge in complex systems (with discussion), J. Roy. Statist. Soc. Ser. B 56(4): 549-603.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57: 97-109.

Phillips, D. B. and Smith, A. F. M. (1994). Bayesian faces via hierarchical template modeling, J. Amer. Statist. Assoc. 89(428): 1151-1163.

Ripley, B. D. and Sutherland, A. I. (1990). Finding spiral structures in images of galaxies, Phil. Trans. Roy. Soc. Lond. A 332: 477-485.

Rue, H. (1995). New loss functions in Bayesian imaging, J. Amer. Statist. Assoc. 90: 900-908.

Rue, H. (1997). A loss function model for the restoration of grey level images, Scand. J. Statist. 24(1): 103-114.

Serra, J. (1982). Image Analysis and Mathematical Morphology, Academic Press.

Tierney, L. (1994). Markov chains for exploring posterior distributions (with discussion), Ann. Statist. 22(4): 1701-1762.

Van Lieshout, M. N. M. (1994). Stochastic annealing for nearest-neighbour point processes with application to object recognition, Adv. in Appl. Probab. 26: 281-300.

Van Lieshout, M. N. M. (1995). Markov point processes and their applications in high-level imaging (with discussion), Bull. Int. Statist. Inst. LVI, Book 2: 559-576.

V Log Gaussian Cox Processes

Log Gaussian Cox processes

JESPER MØLLER, Aalborg University
ANNE RANDI SYVERSVEEN, The Norwegian University of Science and Technology
RASMUS PLENGE WAAGEPETERSEN, University of Aarhus

ABSTRACT. Planar Cox processes directed by a log Gaussian intensity process are investigated in the univariate and multivariate cases. The appealing properties of such models are demonstrated theoretically as well as through data examples and simulations. In particular, the first, second and third-order properties are studied and utilized in the statistical analysis of clustered point patterns. Also empirical Bayesian inference for the underlying intensity surface is considered.

Key words: empirical Bayesian inference; Markov chain Monte Carlo; Metropolis-adjusted Langevin algorithm; multivariate Cox processes; Neyman-Scott processes; pair correlation function; parameter estimation; spatial point processes; third-order properties. AMS 1991 subject classification: Primary 60G55, 62M30. Secondary 60D05.

1 Introduction

Cox processes provide useful and frequently applied models for aggregated spatial point patterns where the aggregation is due to a stochastic environmental heterogeneity, see e.g. Diggle (1983), Cressie (1993), Stoyan et al. (1995), and the references therein. A Cox process is 'doubly stochastic' as it arises as an inhomogeneous Poisson process with a random intensity measure. The random intensity measure is often specified by a random intensity function, or, as we prefer to call it, an intensity process or surface. There may indeed be other sources of aggregation in a spatial point pattern than spatial heterogeneity. Cluster processes are a well-known class of models where clusters are generated by an unseen point process, cf. the references mentioned above. The class of nearest-neighbour Markov point processes (Baddeley and Møller, 1989) includes many specific models of cluster processes (Baddeley et al., 1996) as well as other types of processes with clustering modelled by 'interaction functions' (Møller, 1994), such as the penetrable sphere model (Widom and Rowlinson, 1970; Baddeley and Van Lieshout, 1995) and the continuum random cluster model (Møller, 1994; Häggström et al., 1996). In this paper we consider log Gaussian Cox processes, i.e. Cox processes where the logarithm of the intensity surface is a Gaussian process. We show that the class of

stationary log Gaussian Cox processes possesses various appealing properties: (i) The distribution is completely characterized by the intensity and the pair correlation function of the Cox process. This makes parametric models easy to interpret, and simple methods are available for parameter estimation and model checking. (ii) Theoretical properties are easily derived. Higher-order properties are for instance simply expressed by the intensity and the pair correlation function of the log Gaussian Cox process. Thereby summary statistics based on e.g. the third-order properties can be constructed and estimated. (iii) The underlying Gaussian process and intensity surface can be predicted from a realization of a log Gaussian Cox process observed within a bounded window using Bayesian methods. (iv) There is no problem with edge effects, as the distribution of a log Gaussian Cox process restricted to a bounded subset is known. The properties (i)-(iv) are rather characteristic for log Gaussian Cox processes. We shall further note that log Gaussian Cox processes are flexible models for clustering, easy to simulate, and that the definition of univariate log Gaussian Cox processes can be extended in a natural way to multivariate log Gaussian Cox processes. Other transformations than the exponential of the Gaussian process may be considered as well, and in particular $\chi^2$ Cox processes (as defined in Section 3) may be of interest. During the final preparation of this paper we realized that a definition of log Gaussian Cox processes has previously been given in Rathbun and Cressie (1994), but they restrict attention to the case where the intensity is constant within square quadrats and modelled by a conditional autoregression (Besag, 1974).
The advantage of these discretized models is mainly that they can easily be explored using Gibbs sampling, but as noticed by Rathbun and Cressie (1994), such models do not converge to anything reasonable as the sides of the quadrats tend to zero. Consequently, it is difficult to investigate the correlation structure of these models through those summary statistics which are usually estimated for a point pattern, such as the pair correlation function. The log Gaussian Cox processes studied in the present paper are, in contrast to this, specified by such characteristics, and discretized versions of our log Gaussian Cox processes can be simulated exactly without any problem with edge effects. Also the Metropolis-adjusted Langevin algorithm (Besag, 1994; Roberts and Tweedie, 1997) for simulating from the posterior of the intensity surface, as studied in Section 8, is both easy to specify and implement. In contrast, and even if we ignore the problem with edge effects, Gibbs sampling from the posterior becomes straightforward only if one uses conditional autoregression priors.

The paper is organized as follows. In Section 2 we give a formal definition of univariate log Gaussian Cox processes and inspect some of their properties by simulations. Theoretical results are established in Section 3. In Section 4 we compare log Gaussian Cox processes with the class of Neyman-Scott processes with a Poisson distributed number of offspring. Extensions of log Gaussian Cox processes to the multivariate case are studied in Section 5. In Section 6 we describe different simulation procedures. Section 7 is concerned with parameter estimation and model checking of parametric models for log Gaussian Cox processes. We illustrate this by considering

some datasets of univariate and bivariate point patterns. Finally, in Section 8 we discuss how empirical Bayesian methods may be used for the purpose of predicting the unobserved Gaussian process and intensity surface.

2 Univariate log Gaussian Cox processes

For specificity, and since all the examples presented later on are planar, we specify the model in $\mathbb{R}^2$, but our model can be defined completely analogously in $\mathbb{R}^d$, $d = 1, 2, \ldots$. Briefly, by a planar spatial point process we shall understand a locally finite random subset $X$ of the plane $\mathbb{R}^2$. This is said to be a Cox process directed by a random intensity process $\Lambda = \{\Lambda(s) : s \in \mathbb{R}^2\}$ if the conditional distribution of $X$ given $\Lambda$ is a Poisson process with intensity function $\Lambda(\cdot)$, i.e. when for bounded Borel sets $B \subset \mathbb{R}^2$ we have, conditional on $\Lambda$, that $\mathrm{card}(X \cap B)$ is Poisson distributed with mean $\int_B \Lambda(s)\,ds$, which is assumed to be nonnegative and finite. We restrict attention to the case where $\Lambda$, and hence $X$, is stationary and sometimes also isotropic, i.e. when the distribution of $\Lambda$ is invariant under translations and possibly also rotations in $\mathbb{R}^2$. The intensity $\rho = E\Lambda(s)$ is henceforth assumed to be strictly positive and finite. Throughout this paper we model the intensity process by a log Gaussian process: $\Lambda(s) = \exp\{Y(s)\}$ (1) where $Y = \{Y(s) : s \in \mathbb{R}^2\}$ is a real-valued Gaussian process (i.e., the joint distribution of any finite vector $(Y(s_1), \ldots, Y(s_n))$ is Gaussian). It is necessary to impose conditions on $Y$ so that the random mean measure $\nu$ given by $\nu(B) = \int_B \Lambda(s)\,ds$ for bounded Borel sets $B \subset \mathbb{R}^2$ becomes well-defined. First it is of course required that the realizations of $\Lambda$ are integrable almost surely. But further conditions are required in order that $\nu$ is uniquely determined by the distribution of $Y$. Here we impose the natural condition that $\nu$ is given in terms of a continuous modification of $Y$. Then $\nu$ is uniquely determined, since all the continuous modifications are indistinguishable (i.e. their realizations are identical with probability one), and it also follows that $\nu(B) < \infty$ for bounded $B$. A simple condition for the existence of a continuous modification of a Gaussian process is given in Lemma 1 in Appendix A.
By stationarity, the distribution of $Y$, and hence $X$, is specified by the mean $\mu = EY(s)$, the variance $\sigma^2 = \mathrm{Var}(Y(s))$, and the correlation function $r(s_1 - s_2) = \mathrm{Cov}(Y(s_1), Y(s_2))/\sigma^2$ of $Y$. The model is only well-defined for positive semi-definite correlation functions, i.e. when $\sum_{i,j} a_i a_j r(s_i - s_j) \geq 0$ for all $a_1, \ldots, a_n \in \mathbb{R}$, $s_1, \ldots, s_n \in \mathbb{R}^2$, $n = 1, 2, \ldots$. Whether a given function is positive semi-definite may best be answered through a spectral analysis, see e.g. Christakos (1984), Wackernagel (1995), and the references therein. The parameters $e^\mu > 0$ and $\sigma > 0$ have a clear interpretation as a scale and shape parameter, respectively, since we can write $\Lambda_{\mu,\sigma} = e^\mu \Lambda_{0,\sigma}$, where $\ln \Lambda_{\mu,\sigma}$ is a stationary Gaussian process with mean $\mu$, variance $\sigma^2$, and correlation function $r(\cdot)$. The homogeneous Poisson process may be considered as the limit of a log Gaussian Cox process as $\sigma \to 0$.
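A discretized version of the model (1) is straightforward to simulate: sample $Y$ on a grid from a multivariate Normal with covariance $\sigma^2 r(\cdot)$, exponentiate, and draw Poisson counts cell by cell. The sketch below is our own illustration (with the exponential correlation model from Table 1 and arbitrary parameter values), using a dense Cholesky factorization, which is adequate for coarse grids.

```python
import numpy as np

def simulate_lgcp_counts(n=16, mu=4.0, sigma=1.0, beta=0.1, seed=0):
    """Discretized log Gaussian Cox process on the unit square:
    Y ~ N(mu, sigma^2 exp(-|h|/beta)) on an n x n grid of cell centres,
    Lambda = exp(Y), counts ~ Poisson(Lambda * cell area)."""
    rng = np.random.default_rng(seed)
    xs = (np.arange(n) + 0.5) / n
    gx, gy = np.meshgrid(xs, xs)
    pts = np.column_stack([gx.ravel(), gy.ravel()])
    h = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    cov = sigma ** 2 * np.exp(-h / beta)
    L = np.linalg.cholesky(cov + 1e-9 * np.eye(n * n))   # jitter for stability
    y = mu + L @ rng.standard_normal(n * n)
    lam = np.exp(y)
    counts = rng.poisson(lam / n ** 2)                   # cell area = 1/n^2
    return counts.reshape(n, n), lam.reshape(n, n)
```

For these illustrative values the expected total count is roughly $\rho = \exp(\mu + \sigma^2/2) = \exp(4.5) \approx 90$.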

1. Gaussian: $\exp(-(a/\beta)^2)$
2. Exponential: $\exp(-a/\beta)$
3. Cardinal sine: $\sin(a/\beta)/(a/\beta)$
4. Stable: $\exp(-\sqrt{a/\beta})$
5. Hyperbolic: $(1 + a/\beta)^{-1}$
6. Bessel: $\exp(-(a/\beta)^2)\, J_0(a/\beta)$
7. Spherical: $\big(1 - \tfrac{3a}{2\beta} + \tfrac{1}{2}(a/\beta)^3\big)\, 1(a/\beta \leq 1)$

Table 1. Correlation functions. $K_1$ is the modified Bessel function of the second kind of order one and $J_0$ is the Bessel function of the first kind of order zero.
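The closed-form entries of Table 1 can be written directly as functions of the distance $a \geq 0$ and range parameter $\beta > 0$. The snippet below is our own transcription for reference; the Bessel model is omitted since $J_0$ is not in the Python standard library. All models equal one at $a = 0$.

```python
import math

# Correlation models of Table 1 (a >= 0, beta > 0); Bessel omitted (needs J0).
def gaussian_corr(a, beta):      return math.exp(-(a / beta) ** 2)
def exponential_corr(a, beta):   return math.exp(-a / beta)
def cardinal_sine_corr(a, beta): return math.sin(a / beta) / (a / beta) if a > 0 else 1.0
def stable_corr(a, beta):        return math.exp(-math.sqrt(a / beta))
def hyperbolic_corr(a, beta):    return 1.0 / (1.0 + a / beta)
def spherical_corr(a, beta):
    u = a / beta
    return 1.0 - 1.5 * u + 0.5 * u ** 3 if u <= 1.0 else 0.0
```

Note the design constraint stated in the text: such a function is only admissible as a correlation model if it is positive semi-definite, which the table's entries are in $\mathbb{R}^2$.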

The first four models in Table 1 represent well the correlation structures which can be achieved by using the correlation models in this table, so we have restricted attention to these four models in the following Figs. 1-3. In Theorem 1, Section 3, it is shown that the corresponding pair correlation functions are given by the exponential of the covariance function $\sigma^2 r(\cdot)$. These pair correlation functions are plotted in Fig. 1 a)-d) for various values of $\beta$ when $\sigma = 1$. If one 'standardizes' the pair correlation function $g(\cdot)$ to $g(0) = e$, or equivalently takes $\sigma = 1$, plots of corresponding pair correlation and covariance functions look very similar, cf. Fig. 1, a)-b) and g)-h). Simulated realizations of Gaussian processes on the unit square with correlation structure given by the four different types of correlation functions are shown in Fig. 2. Fig. 3 shows simulations of the corresponding log Gaussian Cox processes. The parameters in the first row in Fig. 3 are the same as in Fig. 2. In order to facilitate comparison of the four different Cox processes, the $\mu$-values in Fig. 3 are chosen so that the mean and variance of the number of points are equal for all Cox processes in the same row. By Theorem 1, Section 3, the mean is $\rho = \exp(\mu + \sigma^2/2)$ and



Fig. 1. Upper row, a)-d): Various pair correlation functions with varying values of $\beta$ (solid line = smallest value of $\beta$) when $\sigma = 1$. Lower row: e), f) pair correlation functions for the Thomas and Matérn cluster processes; g), h) Gaussian and exponential correlation functions with $\beta$ as in the upper row.

Fig. 2. Simulated realizations of Gaussian random fields with

$$\mathrm{Var}\big(\mathrm{card}(X \cap [0,1]^2)\big) = \rho + \rho^2 \int_{[0,1]^2} \int_{[0,1]^2} \big(g(s_1 - s_2) - 1\big)\, ds_1\, ds_2.$$

The variance thus increases when $\beta$, and hence the correlation, increases. It is difficult to compare unconditional simulations when the variance of the number of points is large, and the simulations in Figs. 2 and 3 are therefore performed conditional on the number $n$ of points being equal to the mean number of points ($n = \rho = 148$). In the upper row in Fig. 3, a moderate value of $\sigma$ but large values of $\beta$ give rise to large but not dense clusters of points. In the lower row, moderate values of $\beta$ but a higher value of $\sigma$ lead to many but small clusters. In the middle row, a high value of $\sigma$ and intermediate values of $\beta$ are used, and compared to the lowest row fewer but larger clusters appear. The realizations of the 'Gaussian' and 'cardinal sine' log Gaussian Cox processes are visually quite similar. The 'stable' log Gaussian Cox process is in general


Fig. 3. Simulations of log Gaussian Cox processes conditional on the number of points being 148. First column: Gaussian correlation function. Second column: Exponential. Third column: Cardinal sine. Fourth column: Stable. First row: Same values of the parameters as in Figure 2, i.e. $\sigma^2 = 1$ and $\beta = 0.172, 0.143, 0.094, 0.071$ (left to right). Second row: $\sigma^2 = 2.4$, $\beta = 0.110, 0.100, 0.053, 0.049$. Third row: $\sigma^2 = 2.4$, $\beta = 0.057, 0.050, 0.027, 0.020$. Mean and variance of the number of points are equal in each row.

less clustered than the other processes. This is not surprising, because the Gaussian random field with the stable correlation function is not very peaked except at the small scale, cf. Fig. 2. Finally, it should be noted that Cox processes may be extended to models on bounded regions $W \subset \mathbb{R}^2$, where the conditional distribution of $X_W = X \cap W$ given $\Lambda_W = \{\Lambda(s) : s \in W\}$ is of a Gibbsian type. For instance, consider a conditional distribution of $X_W$ given $\Lambda_W = \lambda_W$ with density

$$f(x_W \mid \lambda_W) = c(\lambda_W) \Big\{\prod_{i=1}^{n} \lambda(\xi_i)\Big\} \Big\{\prod_{i<j} \phi(\|\xi_i - \xi_j\|)\Big\}, \quad x_W = \{\xi_1, \ldots, \xi_n\},$$

where $\phi(\cdot) > 0$ specifies pairwise interaction terms at the small scale. Although the marginal distribution of $X$ restricted to $W$ becomes analytically intractable, such models may at least be simulated, and statistical inference may be performed by Markov chain Monte Carlo methods.

3 Theoretical results

Theoretical properties of Cox processes have been extensively studied, see e.g. Grandell (1976), Daley and Vere-Jones (1988), and Karr (1991). In this section we establish further results for log Gaussian Cox processes. In particular, we discuss the first, second and third-order properties of a univariate log Gaussian Cox process. As in the previous section we consider the planar case, but many of the presented results hold as well in $\mathbb{R}^d$, $d = 1, 2, \ldots$ (with obvious modifications in a few places). The most useful characteristics for our purpose are the $n$th order product densities $\rho^{(n)}$, $n = 1, 2, \ldots$, for the reduced moment measures of the Cox process $X$. These are given by the moments of the intensity process as

$$\rho^{(n)}(s_1, \ldots, s_n) = E \prod_{i=1}^{n} \Lambda(s_i)$$

for pairwise different $s_1, \ldots, s_n \in \mathbb{R}^2$. Intuitively speaking, $\rho^{(n)}(s_1, \ldots, s_n)\, ds_1 \cdots ds_n$ is the probability that $X$ has a point in each of $n$ infinitesimally small disjoint regions of volumes $ds_1, \ldots, ds_n$.

Theorem 1. A log Gaussian Cox process X is stationary if and only if the corresponding Gaussian field Y is stationary. For a stationary log Gaussian Cox process we have

ρ⁽ⁿ⁾(s₁, …, sₙ) = exp{ nμ + σ²( n/2 + Σ_{1≤i<j≤n} r(sᵢ − sⱼ) ) } = ρⁿ ∏_{1≤i<j≤n} g(sᵢ − sⱼ).

Proof. Let c(t) = exp(ξt + κt²/2) be the Laplace transform of the normal distribution N(ξ, κ) with mean ξ and variance κ. Since ρ⁽ⁿ⁾(s₁, …, sₙ) = E exp{Σᵢ Y(sᵢ)} and Σᵢ Y(sᵢ) is normally distributed, the result follows by evaluating c(1) with ξ = nμ and κ = σ²{n + 2Σ_{i<j} r(sᵢ − sⱼ)}. □

If X is stationary then ρ⁽¹⁾(s) = ρ, and we can write g(s₁, s₂) = g(s₁ − s₂).
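Theorem 1 translates directly into code. The following is a minimal NumPy sketch (the function names are ours, and the correlation function r is assumed to be supplied by the user) which evaluates ρ⁽ⁿ⁾ both from the exponent formula and from the equivalent factorization ρⁿ ∏ g; the two must agree:

```python
import numpy as np

def product_density(points, mu, sigma2, r):
    """rho^(n)(s_1,...,s_n) = exp{n*mu + sigma2*(n/2 + sum_{i<j} r(s_i - s_j))}
    for a stationary log Gaussian Cox process (Theorem 1)."""
    pts = [np.asarray(p, dtype=float) for p in points]
    n = len(pts)
    pair_sum = sum(r(pts[i] - pts[j]) for i in range(n) for j in range(i + 1, n))
    return float(np.exp(n * mu + sigma2 * (n / 2.0 + pair_sum)))

def product_density_factorized(points, mu, sigma2, r):
    """Equivalent form rho^n * prod_{i<j} g(s_i - s_j), with intensity
    rho = exp(mu + sigma2/2) and pair correlation g(s) = exp(sigma2 * r(s))."""
    pts = [np.asarray(p, dtype=float) for p in points]
    n = len(pts)
    rho = np.exp(mu + sigma2 / 2.0)
    prod = 1.0
    for i in range(n):
        for j in range(i + 1, n):
            prod *= np.exp(sigma2 * r(pts[i] - pts[j]))
    return float(rho ** n * prod)
```

For n = 1 both forms reduce to the intensity ρ = exp(μ + σ²/2), as they should.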

Theorem 1 reflects the fact that the distribution of a log Gaussian Cox process is completely determined by (μ, σ, r(·)) or equivalently by (ρ, g(·)). For comparison, consider a χ² Cox process, i.e. a Cox process whose intensity process is the square of a stationary Gaussian field with correlation function r(·); whilst for a homogeneous Poisson process r(·) = 0 and g(·) ≡ 1, for the χ² Cox process

g_{χ²}(s) = 1 + 2r(s)².

Hence there is not a one-to-one correspondence between (σ², r(·)) and (ρ_{χ²}, g_{χ²}(·)) unless the sign of the correlation function is known. In the statistical analysis of point processes mostly first and second-order properties are investigated (see Section 7), but we shall also explore the following correspondence between the second and third-order properties: For any stationary simple point process X with finite intensity ρ > 0, well-defined pair correlation function g(x₁, x₂) = g(x₁ − x₂) > 0, and third-order density ρ⁽³⁾(x₁, x₂, x₃) = ρ⁽³⁾(x₂ − x₁, x₃ − x₁), define

z(t) = (π²t⁴ρ³)⁻¹ ∫∫_{‖ξ‖<t, ‖η‖<t} ρ⁽³⁾(ξ, η) / {g(ξ)g(η)g(ξ − η)} dξ dη.   (5)

This has an interpretation as a third-order summary statistic, since

π²t⁴ρ²z(t) = E₀ Σ≠_{ξ,η ∈ X : ‖ξ‖<t, ‖η‖<t} 1 / {g(ξ)g(η)g(ξ − η)},

where Σ≠ means that the summation is over pairwise distinct points, and where the expectation E₀ is with respect to the reduced Palm distribution at the origin (heuristically this means that we have conditioned on there being a point at the origin, and X denotes the collection of the remaining points, cf. e.g. Stoyan et al., 1995). By Theorem 1,

z(t) = 1, t > 0, for a log Gaussian Cox process. (6)

This can be used to check our model assumptions as demonstrated in Section 7.

In the case of rotation invariance we propose an unbiased estimator which uses all triplets of observed points and which takes care of edge effects as follows. For a given 'window' W ⊂ ℝ² and x₁ ∈ W, a > 0, b > 0, 0 ≤ ψ < 2π, let

U_{x₁,a,b,ψ} = {θ ∈ [0, 2π) : x₁ + a e(θ) ∈ W and x₁ + b e(θ + ψ) ∈ W},  where e(θ) = (cos θ, sin θ),

and

w_{x₁,a,b,ψ} = 2π / {length of U_{x₁,a,b,ψ}},

taking 2π/0 = ∞. Then for given x₁, a, b and ψ, 1/w_{x₁,a,b,ψ} is the proportion of triangles which can be observed within W with vertices x₁, x₂, x₃ ∈ W such that ‖x₂ − x₁‖ = a, ‖x₃ − x₁‖ = b, and ψ is the angle (anticlockwise) between the vectors x₂ − x₁ and x₃ − x₁.

Theorem 2. Let ψ(x₁, x₂, x₃) denote the angle (anticlockwise) between x₂ − x₁ and x₃ − x₁. For any stationary simple point process X as considered above, assuming that the distribution of X is invariant under rotations about the origin,

2 Σ_{x₁ ∈ X∩W} Σ_{{x₂,x₃} ⊆ (X∩W)∖{x₁} : ‖x₁−x₂‖<t, ‖x₁−x₃‖<t} w_{x₁,‖x₁−x₂‖,‖x₁−x₃‖,ψ(x₁,x₂,x₃)} / {g(‖x₁−x₂‖) g(‖x₁−x₃‖) g(‖x₂−x₃‖)}   (7)

is an unbiased estimator of A(W)π²t⁴ρ³z(t) for all t < t*, where A(W) is the area of W and

t* = inf{ t > 0 : ∫_{x₁∈W} ∫_{a∈(0,t]} ∫_{b∈(0,t]} ∫_{ψ∈[0,2π)} 1[w_{x₁,a,b,ψ} = ∞] dψ db da dx₁ > 0 }.

Proof. Note that the factor 2 in (7) appears because the second summation is over unordered pairs of distinct points. By the rotation invariance, ρ⁽³⁾(x₂ − x₁, x₃ − x₁) = ρ̃⁽³⁾(a, b, ψ) for a function ρ̃⁽³⁾ of (a, b, ψ) = (‖x₁ − x₂‖, ‖x₁ − x₃‖, ψ(x₁, x₂, x₃)). Moreover, ‖x₂ − x₃‖ = f(a, b, ψ) is a function of (a, b, ψ) only. Hence, using that the correction factor w is the same for (x₁, x₂, x₃) as for (x₁, x₃, x₂), together with the fact that

E Σ≠_{x₁,x₂,x₃ ∈ X} h(x₁, x₂, x₃) = ∫∫∫ ρ⁽³⁾(x₁, x₂, x₃) h(x₁, x₂, x₃) dx₁ dx₂ dx₃

for nonnegative measurable functions h, we find that the mean of (7) equals

∫_W ∫∫_{‖x₁−x₂‖<t, ‖x₁−x₃‖<t} w_{x₁,‖x₁−x₂‖,‖x₁−x₃‖,ψ(x₁,x₂,x₃)} ρ⁽³⁾(x₂ − x₁, x₃ − x₁) / {g(‖x₁−x₂‖) g(‖x₁−x₃‖) g(‖x₂−x₃‖)} dx₂ dx₃ dx₁

= ∫_{x₁∈W} ∫_{a∈(0,t]} ∫_{b∈(0,t]} ∫_{ψ∈[0,2π)} ∫_{φ∈[0,2π)} 1[x₁ + a e(φ) ∈ W, x₁ + b e(φ + ψ) ∈ W] w_{x₁,a,b,ψ} ρ̃⁽³⁾(a, b, ψ) ab / {g(a) g(b) g(f(a, b, ψ))} dφ dψ db da dx₁

= A(W) ∫∫_{‖ξ‖<t, ‖η‖<t} ρ⁽³⁾(ξ, η) / {g(ξ) g(η) g(ξ − η)} dξ dη = A(W)π²t⁴ρ³z(t). □

The estimator (7) is of the same spirit as Ripley's (1977) estimator for the second-order reduced moment measure. In fact w_{x₁,a,b,ψ} agrees with Ripley's edge correction factor when a = b and ψ = 0. Our edge correction factor is of course also applicable for other third-order summary statistics than z. Applications of z and its estimator are discussed at the end of Section 4 and in Section 7, Example 1. In most applications W will be convex, in which case t* > I(W), the radius of the maximal inner ball contained in W. We have also considered a naive estimator based on 'minus sampling' which does not presume rotation invariance, viz. the unbiased estimator of A(W⊖t)π²t⁴ρ³z(t) given by

Σ≠_{x₁,x₂,x₃ ∈ X : x₁ ∈ W⊖t, ‖x₁−x₂‖<t, ‖x₁−x₃‖<t} {g(x₁ − x₂) g(x₁ − x₃) g(x₂ − x₃)}⁻¹,

where W⊖t = {s ∈ W : s + u ∈ W whenever ‖u‖ ≤ t}. Compared to (7) the variation of this estimator can be very large, since not all triplets of points in X ∩ W are used. Another problem may be caused by a clustering of points, so that no points are observed within W⊖t for even moderate values of t. Finally, we establish some simple results about ergodicity. Ergodicity may for instance become useful for establishing consistency of nonparametric estimators of ρ and g(·).

Theorem 3. (a) Let Z = {Z(s) : s ∈ ℝ²} be a stationary real-valued random field, let h : ℝ → [0, ∞) be measurable, and suppose that with probability one ∫_B h(Z(s)) ds < ∞ for bounded Borel sets B ⊂ ℝ². Then a Cox process with random intensity function {h(Z(s)) : s ∈ ℝ²} is ergodic if Z is ergodic. Conversely, assuming that the realizations of Z are continuous with probability one and that h is strictly monotone, ergodicity of the Cox process implies that Z is ergodic.

(b) If Z is a stationary Gaussian process where the correlations decay to zero, i.e. when

r(s) → 0 as ‖s‖ → ∞,   (8)

then the corresponding log Gaussian Cox process is ergodic. Especially, a stationary log Gaussian Cox process is ergodic if

g(s) → 1 as ‖s‖ → ∞.   (9)

Proof. We first need some measure-theoretic details. Let F = ℝ^{ℝ²} denote the space of functions f : ℝ² → ℝ equipped with the σ-field σ_F generated by the projections p_s : F → ℝ, s ∈ ℝ², where p_s(f) = f(s). Further, let (M, 𝓜) be the measure space of locally finite measures defined on the Borel σ-field 𝓑² in ℝ², where the σ-field 𝓜 is generated by the projections p_A : M → ℝ, A ∈ 𝓑², given by p_A(m) = m(A). Furthermore, let H : F → M be defined by

H(f)(A) = ∫_A h(f(s)) ds,  A ∈ 𝓑².

It is not difficult to show that for any fixed A ∈ 𝓑², the function H_A : F → ℝ given by H_A(f) = p_A(H(f)) is measurable. Hence H is measurable and so Ξ = H(Z) is a random measure. Now, consider a stationary Cox process as in (a). This is ergodic if and only if the random measure Ξ is ergodic, cf. e.g. Proposition 10.3.VII in Daley and Vere-Jones (1988). Ergodicity of Ξ means that P(Ξ ∈ I) ∈ {0, 1} for all events I ∈ 𝓜 which are invariant under translations in the plane (I is invariant if m ∈ I ⇒ m_t ∈ I for all t ∈ ℝ², where m_t(A) = m({s : s + t ∈ A})). Similarly, ergodicity of Z means that P(Z ∈ J) ∈ {0, 1} for all events J ∈ σ_F which are invariant under translations in the plane (i.e. J is invariant if f ∈ J ⇒ f_t ∈ J for all t ∈ ℝ², where f_t(s) = f(s + t)). Using these definitions it is straightforward to show the first implication in (a). Assuming that h is strictly monotone, H restricted to F_c = {f ∈ F : f continuous} becomes injective. Assume further that J ∈ σ_F is invariant and Ξ is ergodic. Then it follows that H(J) is invariant, so that P(Z ∈ H⁻¹(H(J))) ∈ {0, 1}. Under the additional assumption that the realizations of Z are continuous a.s., it is no restriction to assume that J ⊆ F_c. Then, since H is injective on F_c, H⁻¹(H(J)) ∖ J ⊆ F ∖ F_c, whereby P(Z ∈ J) ∈ {0, 1}, and the second implication in (a) is proved. According to (a), a stationary log Gaussian Cox process is ergodic if the underlying Gaussian process is ergodic. But ergodicity of the Gaussian process is in fact implied

by (8), cf. Theorem 6.5.4 in Adler (1981). Using (4) we get the equivalence between (8) and (9). This completes the proof. □

Conditions for continuity of random fields may be found in Adler (1981) or Ledoux & Talagrand (1991). Notice that (8) and (9) are equivalent and that (9) implies ergodicity also for a χ² Cox process.

4 Comparison with Neyman-Scott processes

We shall now compare our log Gaussian Cox processes with a popular and frequently used class of models which are simultaneously Poisson cluster and Cox processes, namely those Neyman-Scott processes where the number of points per cluster is Poisson distributed (see e.g. Bartlett, 1964; Diggle, 1983; Stoyan and Stoyan, 1994; Stoyan et al., 1995). Imagine a point process {pᵢ} ⊂ ℝ² of (unobserved) parents which form a homogeneous Poisson point process of intensity ω > 0, and which generate clusters of offspring ∪_{j=1}^{nᵢ} {pᵢ + x_ij}. The counts nᵢ are assumed to be iid Poisson distributed with mean ν > 0, and the relative positions x_ij of the offspring are iid with density f. Further, the {pᵢ}, {nᵢ}, and {x_ij} are mutually independent. The Poisson cluster process of offspring ∪ᵢ ∪_{j=1}^{nᵢ} {pᵢ + x_ij} is then stochastically equivalent to a Cox process with intensity process

Λ(s) = ν Σᵢ f(s − pᵢ).   (10)

The product densities of such Neyman-Scott processes are known: We have that ρ = νω,

g(s) = 1 + (1/ω) ∫ f(p) f(p + s) dp,

ρ⁽³⁾(s₁, s₂, s₃)/ρ³ = g(s₁ − s₂) + g(s₁ − s₃) + g(s₂ − s₃) − 2 + (1/ω²) ∫ f(p + s₁) f(p + s₂) f(p + s₃) dp,

and with similar but longer expressions for ρ⁽ⁿ⁾, n ≥ 4. The higher-order product densities of a log Gaussian Cox process as given by Theorem 1 are in general of a different and much simpler form than for Neyman-Scott processes. In the following we consider some particular but widely used models of Neyman-Scott processes, viz. a Matérn (1960) cluster process and a (modified) Thomas (1949) process (Bartlett, 1964). For the Thomas process, f is the density of a radially symmetric normal distribution with variance κ > 0, and the pair correlation function becomes

"(l)-1+d^“p ("£;)• “-0-

For the Matérn cluster process, f is the density of a uniform distribution on a disc with radius R > 0 centered at 0, and the pair correlation function becomes

g_M(a) = 1 + (2/(π²R²ω)) { arccos(a/(2R)) − (a/(2R)) √(1 − a²/(4R²)) } for 0 ≤ a ≤ 2R,  g_M(a) = 1 for a > 2R.

In Fig. 1 we have included plots of the pair correlation functions for Thomas and Matérn cluster processes. For comparison we have taken g_M(0) = g_T(0) = e. Then for the Thomas process ω = 1/(4πκ(e − 1)) is determined by the value of κ, whilst for the Matérn cluster process ω is determined by the value of R. At least for certain values of β and κ the Gaussian pair correlation function and g_T(·) appear to be very similar, whereas g_M(·) looks different from the other pair correlation functions in Fig. 1. For instance, by taking κ = .001 and minimizing ∫(g(a) − g_T(a))² da with respect to β, where log g(a) = exp(−(a/β)²), we get β = 1/13.45. The left plot in Fig. 4 shows that the logarithms of these pair correlation functions for the Thomas process and the log Gaussian Cox process with Gaussian correlation function are nearly identical. This may suggest that c_T(·) = log g_T(·) could be considered as a covariance function. One way to check this is through the Hankel transform of c_T(·) given by

C_T(t) = (1/(2π)) ∫₀^∞ J₀(at) c_T(a) a da,

where

J₀(t) = Σ_{k=0}^∞ (−1)ᵏ t²ᵏ / ((k!)² 2²ᵏ)

is the Bessel function of the first kind and order zero. Then c_T(·) is positive semi-definite if and only if C_T(t) ≥ 0 for all t ≥ 0, cf. e.g. Christakos (1984). The Hankel transforms C_T and C_M for the Thomas and Matérn cluster processes given in Fig. 4 show that neither of the two Neyman-Scott processes can be considered as log Gaussian Cox processes. But the close agreement with respect to the pair correlation functions and the remarks below and at the end of this section suggest that certain Thomas processes may in practice be difficult to distinguish from log Gaussian Cox processes with a Gaussian correlation function.
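The two pair correlation functions and the calibration g(0) = e are easy to evaluate numerically; a minimal sketch (the function names are ours) using the closed forms above:

```python
import numpy as np

def g_thomas(a, kappa, omega):
    """Pair correlation of the (modified) Thomas process: offspring displaced
    by a radially symmetric normal with variance kappa."""
    return 1.0 + np.exp(-a * a / (4.0 * kappa)) / (4.0 * np.pi * kappa * omega)

def g_matern(a, R, omega):
    """Pair correlation of the Matern cluster process: offspring uniform on a
    disc of radius R; equals 1 beyond distance 2R."""
    if a >= 2.0 * R:
        return 1.0
    x = a / (2.0 * R)
    return 1.0 + (2.0 / (np.pi ** 2 * R * R * omega)) * (
        np.arccos(x) - x * np.sqrt(1.0 - x * x))

def omega_thomas(kappa):
    """Parent intensity making g_T(0) = e, as used for Fig. 1."""
    return 1.0 / (4.0 * np.pi * kappa * (np.e - 1.0))

def omega_matern(R):
    """g_M(0) = 1 + 1/(pi R^2 omega) = e  =>  omega = 1/(pi R^2 (e - 1))."""
    return 1.0 / (np.pi * R * R * (np.e - 1.0))
```

With κ = .001 this reproduces the normalization used in the comparison above.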

Fig. 4. Left: Plot of Gaussian correlation function (solid) and ln g_T(·) (dotted line) for κ = 0.001. Middle: Hankel transform of ln g_T(·) for κ = 0.001. Right: Hankel transform of ln g_M(·) for R = 0.1.

Fig. 5 shows simulated distribution functions F and G for the distance to the nearest point of a point process X with respect to the origin and a typical point of X, respectively (see e.g. Diggle, 1983). Here we consider two kinds of point processes: a log Gaussian Cox process (the solid lines in Fig. 5) with a Gaussian correlation function and μ = 3.888, σ = 1, i.e. ρ = exp(μ + σ²/2), and correspondingly a Thomas process with ω = 1/(4π(e − 1)κ) and ν = ρ/ω (dotted lines); as before κ = .001 and β = 1/13.45. For each model we simulated 100 realizations and calculated the averages and the upper and lower envelopes for nonparametric estimates of F and G. (The upper and lower envelopes for F, G or any other summary statistic depending only on the distance are here and elsewhere in the following given by the maximum and minimum values obtained from the simulations at each distance; see e.g. Diggle, 1983.) The averages are then estimates of the theoretical F and G functions. Further simulations confirmed that the envelopes of F for the Thomas process lie beneath those for the log Gaussian Cox process, while the opposite statement holds for the envelopes of G. We recognized further that the G function distinguishes better between the two processes than the F function, but also that none of these summary statistics are really useful for discriminating between the two models. Another experiment confirmed that it may also be difficult to distinguish between the two models by means of the third-order characteristic z in (5).

Fig. 5. Left: Dotted lines: Average and envelopes for the nonparametric estimator of F based on 100 simulations of the Thomas process. Solid lines: The same but for the log Gaussian Cox process with Gaussian correlation function. Right: The same as the left plot but with F substituted by G.

In Section 7, plots of F, G, and z raise doubt about the appropriateness of the Matérn cluster process as a model for the data in Example 1, but give no reason to question the use of a log Gaussian Cox process with an exponential correlation function.

5 Multivariate log Gaussian Cox processes

Our model can immediately be extended to the case of multivariate Cox processes as follows. Let us for simplicity just consider the bivariate case of a Cox process X = (X₁, X₂) directed by random intensity processes Λ_j = {Λ_j(s) = exp(Y_j(s)) : s ∈ ℝ²}, j = 1, 2, where Y = {(Y₁(s), Y₂(s)) : s ∈ ℝ²} is a bivariate stationary and possibly isotropic Gaussian process with mean (μ₁, μ₂) and covariance functions c_ij(a) = Cov(Y_i(s₁), Y_j(s₂)) for a = ‖s₁ − s₂‖, i, j = 1, 2 (in the isotropic case we have that c₁₂(·) = c₂₁(·)). Then conditional on Y, X₁ and X₂ are independent Poisson processes with intensity functions Λ₁ and Λ₂, respectively. The covariance function matrix of the multivariate Gaussian process must be positive semi-definite. Restricting attention to absolutely integrable and isotropic covariance functions, this is equivalent to the requirement that

C₁₁(t) ≥ 0,  C₂₂(t) ≥ 0,  and  |C₁₂(t)|² ≤ C₁₁(t) C₂₂(t),  t ≥ 0,   (11)

where

C_ij(t) = (1/(2π)) ∫₀^∞ J₀(at) c_ij(a) a da

is the spectral density or Hankel transform of c_ij (Yaglom, 1986; Christakos, 1992; Wackernagel, 1995). Moreover, many of the results presented in Section 3 may easily be extended to the multivariate case. For example, by Theorem 1 the intensity and pair correlation function of X_j become

ρ_j = exp{μ_j + c_jj(0)/2},  g_jj(a) = exp{c_jj(a)},   (12)

and the mixed pair correlation function is given by

g₁₂(a) = E[Λ₁(s₁)Λ₂(s₂)]/(ρ₁ρ₂) = exp{c₁₂(a)},  a = ‖s₁ − s₂‖.   (13)

Especially, if we consider affine transformations Y_j(s) = Σ_{i=1}^k α_ij Z_i(s) + μ_j of k independent one-dimensional Gaussian processes Z_i = {Z_i(s) : s ∈ ℝ²}, i = 1, …, k, each with mean 0, variance 1, and a positive semi-definite correlation function r_i, then of course Y is well-defined and

c_jj(s) = Σ_{i=1}^k α_ij² r_i(s),  j = 1, 2,   c₁₂(s) = Σ_{i=1}^k α_i1 α_i2 r_i(s),  s ∈ ℝ²,   (14)

(in this case c₁₂(·) = c₂₁(·) no matter if isotropy is required or not). For example, if Y_j = σ_j Z + μ_j, j = 1, 2, where Z is a stationary Gaussian process with mean 0, variance 1, and correlation function r(·), then c₁₂(·) = σ₁σ₂ r(·), so the sign of σ₁σ₂ determines whether g₁₂(·) ≥ 1 or g₁₂(·) ≤ 1, i.e. whether the two point processes are positively or negatively correlated.
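The construction (14) is straightforward to code; a minimal sketch (helper names ours) that evaluates c₁₁, c₂₂ and c₁₂ and checks the pointwise Cauchy–Schwarz bound |c₁₂(s)| ≤ √(c₁₁(s)c₂₂(s)), which holds for this construction whenever the r_i(s) are nonnegative:

```python
import numpy as np

def bivariate_covariances(alpha, r_list, s):
    """Evaluate (14): c_jj(s) = sum_i alpha_ij^2 r_i(s) and
    c_12(s) = sum_i alpha_i1 alpha_i2 r_i(s), for Y_j = sum_i alpha_ij Z_i + mu_j.
    `alpha` has shape (k, 2); `r_list` holds the k correlation functions."""
    alpha = np.asarray(alpha, dtype=float)
    r_vals = np.array([r(s) for r in r_list])
    c11 = float(np.sum(alpha[:, 0] ** 2 * r_vals))
    c22 = float(np.sum(alpha[:, 1] ** 2 * r_vals))
    c12 = float(np.sum(alpha[:, 0] * alpha[:, 1] * r_vals))
    return c11, c22, c12

def cross_pair_correlation(c12):
    """Mixed pair correlation (13): g_12 = exp(c_12)."""
    return float(np.exp(c12))
```

With a single common field and coefficients of opposite sign, c₁₂ < 0 and hence g₁₂ < 1, i.e. the two components repel each other.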


Fig. 6. Left: Gaussian random field with exponential correlation function, μ₁ = μ₂ = 2.5, σ₁ = −σ₂ = 2.

6 Simulation algorithms

Some properties of Cox processes are hard to evaluate analytically. Fortunately, log Gaussian Cox processes are easy to simulate, so that Monte Carlo methods can be applied. An advantage of log Gaussian Cox processes is that there are no boundary effects, since all marginal distributions of a Gaussian field are known. In practice we represent the finite domain of simulation by a grid and approximate the Gaussian process by the values of the corresponding finite-dimensional Gaussian distribution on the grid. If we for example wish to simulate a log Gaussian Cox process on the unit square, we approximate the Gaussian process {Y(s)}_{s∈[0,1[²} on each cell D_ij = [i − 1/(2M), i + 1/(2M)[ × [j − 1/(2M), j + 1/(2M)[ by its value Y_ij = Y((i, j)) at the center (i, j) of D_ij, where (i, j) ∈ I = {1/(2M), 1/M + 1/(2M), …, (M − 1)/M + 1/(2M)}² and M is a suitable value for the discretization. Thus, simulations of the field Y = (Y_ij)_{(i,j)∈I} are required. For ease of presentation we shall here mainly focus on univariate log Gaussian Cox processes where the discretization is given by a square lattice I; at the end of Subsection 6.1 we consider briefly the case of a multivariate log Gaussian Cox process and a rectangular lattice. If the Cox process is moderately clustered and the intensity moderate, the very fine scale properties of the Gaussian field are probably not so important and a rather coarse discretization can be used. The choice of discretization also depends on the smoothness of the realizations of the Gaussian field, see Fig. 2. The error due to discretization is e.g. likely to be small when the Gaussian correlation function is used. For the simulations presented in this paper we found it sufficient to use either 65 × 65 or 129 × 129 grids. Simulation of a log Gaussian Cox process involves two steps.
First the Gaussian field is simulated and secondly, given the Gaussian field Y = (Y_ij)_{(i,j)∈I}, an inhomogeneous Poisson process can be simulated: either within each cell D_ij, where the Poisson process is homogeneous with intensity λ_ij = exp(y_ij), or by thinning a homogeneous Poisson process with intensity λ_max = max_ij λ_ij, so that a Poisson point situated in the ij-th cell is retained with probability λ_ij/λ_max. There are several methods available for simulation of a Gaussian random field, see e.g. Lantuéjoul (1994). The simulation method based on Cholesky decomposition of

the covariance matrix is too slow even for moderate grid sizes. We used another method based on diagonalization of the covariance matrix by the discrete Fourier transform (see Subsection 6.1) or alternatively the turning bands method (Matheron, 1973). In Subsection 6.2 we describe how simulations conditional on the number of points can be obtained. Finally, in Subsection 6.3 we briefly discuss how the Thomas and Matérn cluster processes studied in Section 4 are simulated.
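The second step of the two-step procedure can be sketched in a few lines of NumPy (a minimal version for the unit square; the field y on the M × M grid is assumed given, e.g. from the method of Subsection 6.1, and the function name is ours):

```python
import numpy as np

def lgcp_points(y, rng):
    """Given the Gaussian field y (an M x M array of values Y_ij), draw the
    point pattern: each cell D_ij receives a Poisson number of points with
    mean exp(y_ij)/M^2 (intensity exp(y_ij) times the cell area 1/M^2),
    placed uniformly within the cell."""
    M = y.shape[0]
    counts = rng.poisson(np.exp(y) / M ** 2)
    pts = []
    for (i, j), n in np.ndenumerate(counts):
        if n:
            pts.append((np.array([i, j]) + rng.random((n, 2))) / M)
    return np.vstack(pts) if pts else np.empty((0, 2))
```

The thinning variant mentioned above would instead simulate a homogeneous pattern with intensity max exp(y_ij) and keep each point with probability exp(y_ij)/max exp(y_ij).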

6.1 Simulation using diagonalization by the two-dimensional discrete Fourier transform: A detailed description of this method in the univariate case and any lattice dimension d = 1, 2, …, assuming only stationarity, can be found in Wood and Chan (1994). Below we summarize this for the two-dimensional case (the notation and the results are also used in Sections 7 and 8). For simplicity we assume isotropy. Suppose that an isotropic covariance function c : ℝ² → ℝ is given and we wish to simulate a Gaussian field Y = (Y_ij)_{(i,j)∈I} with covariance matrix Σ = (σ_{ij,kl})_{(i,j),(k,l)∈I} = (c(‖(i, j) − (k, l)‖))_{(i,j),(k,l)∈I} (here we use a lexicographic ordering of the indices ij). Note that Σ is block Toeplitz and block symmetric. Extend the lattice I to I_ext = {1/(2M), 1/M + 1/(2M), …, (2(M − 1) − 1)/M + 1/(2M)}² wrapped on a torus. Let d_ik = min(|i − k|, 2(M − 1)/M − |i − k|), (i, k) ∈ I_ext, and let d((i, j), (k, l)) = √(d_ik² + d_jl²) denote the shortest distance on the torus between (i, j) and (k, l). The symmetric matrix K = (κ_{ij,kl}) defined by κ_{ij,kl} = c(d((i, j), (k, l))) is block circulant with 2(M − 1) circulant blocks of dimension 2(M − 1) × 2(M − 1). Hence, by Theorem 5.8.1 in Davis (1979),

K = (F_{2(M−1)} ⊗ F_{2(M−1)}) E (F̄_{2(M−1)} ⊗ F̄_{2(M−1)}),   (15)

where F_{2(M−1)} ⊗ F_{2(M−1)} is unitary and E = diag(e_ij, (i, j) ∈ I_ext) is a diagonal matrix of the eigenvalues for K. Here F_{2(M−1)} = ((1/√(2(M − 1))) exp(−i2πkl/(2(M − 1))))_{k,l} is the (normalized) 2(M − 1) × 2(M − 1) discrete Fourier transform matrix, ‾ denotes complex conjugation, and ⊗ is the Kronecker product. Now, suppose that K is positive semi-definite (i.e. K has nonnegative eigenvalues). Then we can extend Y to a Gaussian field Y_ext = (Y_ij)_{(i,j)∈I_ext} with covariance matrix K. Using the above decomposition of K we find that

Y_ext = Γ D^{1/2} A (F_{2(M−1)} ⊗ F_{2(M−1)}),

where Γ ~ N_d(0, I) follows a d-dimensional standard normal distribution with d equal to the rank of K, D is a diagonal matrix given by the non-zero eigenvalues of K, and A is a certain d × (2(M − 1))² complex matrix of rank d. If M − 1 is a power of two (or three or five), the calculation of Y_ext is only an O((2(M − 1))² log₂((2(M − 1))²)) operation, as the two-dimensional fast Fourier transform (see e.g. Press et al., 1988) can be applied. Thereby a fast simulation algorithm is obtained. Notice that the extension of the lattice {1/(2M), 1/M + 1/(2M), …, (M − 1)/M + 1/(2M)}² to {1/(2M), 1/M + 1/(2M), …, (2(M − 1) − 1)/M + 1/(2M)}² is the minimal extension which gives a block circulant matrix K. If K turns out not to be

positive semi-definite, it may help to use a larger extension (see Wood and Chan, 1994). Also, if M − 1 is not a power of two (or three or five), a larger extension can be applied in order to use the two-dimensional fast Fourier transform. The algorithm can straightforwardly be generalized to the case of a multivariate Gaussian field Y = ((Y_ij1, …, Y_ijn))_{(i,j)∈I}, where I is an M × N rectangular lattice and n ≥ 1. In this case K becomes a 4(M − 1)(N − 1)n × 4(M − 1)(N − 1)n block circulant matrix given by 2(M − 1) blocks, which in turn are block circulant and of dimension 2(N − 1)n × 2(N − 1)n. By combining (5.6.3), Theorem 5.6.4, and (3.2.2) in Davis (1979) one obtains that

K = (F_{2(M−1)} ⊗ F_{2(N−1)} ⊗ F_n) G (F̄_{2(M−1)} ⊗ F̄_{2(N−1)} ⊗ F̄_n),

where G is a block diagonal matrix with 4(M − 1)(N − 1) blocks of dimension n × n. In the bivariate case, simulation of Y thus amounts to a linear transformation of 4(M − 1)(N − 1) independent two-dimensional Gaussian vectors. The method is fast and practically applicable. Problems with nonpositive semi-definiteness of K occurred very seldom, and were then due to slowly decaying correlation functions like the stable correlation function (see Figure 1).
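The univariate algorithm of this subsection can be sketched compactly with NumPy's FFT (a minimal isotropic version; `cov` maps distance to covariance and is supplied by the user, and instead of enlarging the extension we simply truncate tiny negative eigenvalues, which is a simplification of ours):

```python
import numpy as np

def gaussian_field_fft(cov, M, rng):
    """Simulate (Y_ij) on the M x M lattice with spacing 1/M by wrapping the
    covariance on the extended 2(M-1) x 2(M-1) torus lattice; the resulting
    block circulant K is diagonalized by the two-dimensional discrete Fourier
    transform (Wood and Chan, 1994)."""
    n = 2 * (M - 1)
    d = np.minimum(np.arange(n), n - np.arange(n)) / M   # per-axis torus distance
    base = cov(np.hypot(d[:, None], d[None, :]))         # first block row of K
    eig = np.fft.fft2(base).real                         # eigenvalues of K
    eig = np.clip(eig, 0.0, None)                        # truncate tiny negatives
    eps = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    # with Q the unitary inverse-DFT matrix, Re(Q E^{1/2} eps) ~ N(0, K)
    z = np.fft.ifft2(np.sqrt(eig) * eps) * n
    return z.real[:M, :M]                                # restrict torus field to I
```

On the retained M × M corner the torus distance coincides with the Euclidean distance, so the restricted field has exactly the target covariance whenever the embedding is positive semi-definite.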

6.2 Conditional simulation: It may sometimes be desired to simulate the conditional distribution of X ∩ [0, 1[² given that N(X) = card(X ∩ [0, 1[²) = n for n ∈ ℕ. Then we need first to simulate a realization y from Y | N(X) = n and secondly simulate from X | N(X) = n, Y = y. The last step is performed by distributing n independent points in the M² grid cells, where a cell D_ij is chosen with a probability proportional to λ_ij = exp(y_ij), (i, j) ∈ I, and the point is subsequently placed at a uniformly sampled location in the chosen cell. Rejection sampling (see e.g. Ripley, 1987) is used for the simulation of Y | N(X) = n as follows. For the conditional density of Y given N(X) = n we have that

f(y | n) ∝ f(n | y) f(y) ≤ (nⁿ e⁻ⁿ / n!) f(y).

The rejection sampling can thus be performed by generating realizations of Y until a realization y is accepted with probability (λ̄/n)ⁿ e^{n−λ̄}, where λ̄ = Σ_{(i,j)∈I} λ_ij / M². Considering λ̄ as a random variable, the mean of λ̄ approximates the intensity ρ of X. Thus the acceptance rates are reasonably high if n is close to ρ and the variance of λ̄ is moderate.
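The rejection step can be sketched as follows (`field_sampler` is a hypothetical stand-in for the field simulator of Subsection 6.1 and must return an M × M array; the function name is ours):

```python
import numpy as np

def conditional_lgcp(field_sampler, n, rng, max_tries=100000):
    """Simulate X | N(X) = n: accept a field y with probability
    (lam_bar/n)^n * exp(n - lam_bar), lam_bar = sum_ij exp(y_ij)/M^2, then
    place n points in cells chosen with probabilities proportional to
    exp(y_ij), uniformly within the chosen cell."""
    for _ in range(max_tries):
        y = field_sampler(rng)
        M = y.shape[0]
        lam = np.exp(y)
        lam_bar = lam.sum() / M ** 2
        if rng.random() < (lam_bar / n) ** n * np.exp(n - lam_bar):
            break
    else:
        raise RuntimeError("no field accepted")
    cells = rng.choice(M * M, size=n, p=(lam / lam.sum()).ravel())
    i, j = np.divmod(cells, M)
    return (np.column_stack([i, j]) + rng.random((n, 2))) / M
```

The acceptance probability equals 1 exactly when λ̄ = n, in line with the remark above that acceptance rates are high when n is close to ρ.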

6.3 Simulation of the Thomas and Matérn cluster processes: Procedures for simulation of the Thomas and Matérn cluster processes on a bounded region A follow straightforwardly from the definitions of these processes as Poisson cluster processes, see Section 4. In order to avoid boundary effects the parent process is simulated on an extended area B containing A. The area B is chosen so that offspring from a parent outside B fall into A with a negligible probability. An approximate procedure for simulation conditional on the number of points can be obtained by using that the Thomas and Matérn processes are Cox processes with intensity surface given by (10) and then proceeding as described in Subsection 6.2 above.

7 Parameter estimation and model checking

For simplicity we first restrict attention to the univariate case, but our methods for estimation and model checking can also be used in the multivariate case, see Example 2 at the end of this section. Suppose we have observed a point pattern x = {x₁, …, xₙ} within a bounded planar window W of area A(W). Under a homogeneous log Gaussian Cox model with a correlation function r_β(·), the density of X_W = X ∩ W with respect to a planar unit Poisson process is

L(x; μ, σ, β) = E_{μ,σ,β}[ exp( A(W) − ∫_W exp(Y(s)) ds ) ∏_{i=1}^n exp(Y(x_i)) ].

Except for very special models this likelihood is analytically intractable. Considering this as a 'missing data problem', the likelihood can be approximated by discretizing W as described in Section 6 and making importance sampling as follows: The density of the Gaussian field Y is proportional to

h_θ(y) = exp( −(y − μ) R(β)⁻¹ (y − μ)* / (2σ²) ),

where θ = (μ, σ, β), R(β) is the correlation matrix (here assumed to be positive definite), and * denotes transposition. For a given fixed parameter θ₀ = (μ₀, σ₀, β₀), suppose that y⁽¹⁾, …, y⁽ᴹ⁾ is a sample from the distribution of Y and y⁽¹⁾(x), …, y⁽ᴹ⁾(x) is a sample from the conditional distribution of Y given X_W = x (Section 8 describes how the latter sample can be generated). Since the conditional distribution of X_W given Y does not depend on θ, it is easily seen from the results in Gelfand and Carlin (1991) and Geyer (1994) that the Monte Carlo approximation of the log likelihood is

l(θ) ≈ const + log (1/M) Σ_{m=1}^M h_θ(y⁽ᵐ⁾(x)) / h_θ₀(y⁽ᵐ⁾(x)).

Actually we may replace Y with the extended Gaussian field Yext (see Section 6.1) for which it is easier to invert the correlation matrix. We have no experience about how this would work in practice, but we expect that multimodality of the likelihood may cause problems for finding the (approximate) maximum likelihood estimate. Since only the Gaussian density (up to scale) appears in the approximation of the log likelihood, there may be some analog here to Ripley ’s (1988) discussion on the difficulties associated with likelihood analysis for spatial Gaussian processes.

Pseudo-likelihood (Besag, 1977; Jensen and Møller, 1991) is not useful, since a closed expression for the density is not known, not even up to multiplication by a positive constant (so a closed expression for the so-called Papangelou conditional intensity is not known). For the same reason we also doubt the usefulness of the more general method of Takacs-Fiksel estimation (see e.g. Ripley, 1988, and the references therein). Since the distribution of a log Gaussian Cox process is completely determined by its first and second-order properties, we suggest instead to base the inference on corresponding summary statistics as described in the following. As a natural estimate of the intensity we shall use

ρ̂ = n/A(W).   (16)

This estimator is unbiased. If r_β(a) → 0 as a → ∞, then ergodicity implies that ρ̂ → ρ almost surely as W extends to ℝ², cf. Theorem 3. The parameters σ² and β > 0 are estimated by a minimum contrast method: Assume henceforth that the correlation function is isotropic. Let ĉ(·) denote a nonparametric estimate of the covariance function. Then consider

∫_ε^{a₀} { ĉ(a)^α − [σ² r_β(a)]^α }² da,   (17)

where 0 < ε < a₀ and α > 0 are user-specified parameters; in Examples 1 and 2 we take ε = min_{i≠j} ‖x_i − x_j‖, while a₀ and α are determined by the form of ĉ(·) and r_β(·). These parameters must of course be chosen so that the terms in (17) are well-defined. For fixed β the minimum of (17) is obtained at

σ̂₀² = [B(β)/A(β)]^{1/α}  with  B(β) = ∫_ε^{a₀} {ĉ(a) r_β(a)}^α da,  A(β) = ∫_ε^{a₀} r_β(a)^{2α} da,

provided B(β) > 0; otherwise there exists no minimum. Inserting this into (17) and using that ρ = exp(μ + σ²/2), we get

β̂ = argmax_β B(β)²/A(β),  σ̂² = [B(β̂)/A(β̂)]^{1/α},  μ̂ = log(ρ̂) − σ̂²/2.   (18)

Diggle (1983) describes a similar estimation procedure using the K-function

K(t) = 2π ∫₀^t a g(a) da,  t > 0,

instead of the covariance function, but for the data considered later on we found that there may be many local minima, and it may be difficult to find a global minimum. The procedure in (18) is computationally much simpler; we need only to maximize with respect to β, whereas Diggle's procedure involves σ² as well. In our examples the function B(β)²/A(β) turned out to be unimodal.
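For the exponential correlation model r_β(a) = exp(−a/β), the procedure (17)–(18) reduces to a one-dimensional search over β. A minimal sketch (with α = 1 and simple trapezoidal quadrature, both our choices; the nonparametric estimate ĉ is assumed given on a grid of distances):

```python
import numpy as np

def _trapz(f, x):
    # trapezoidal rule (avoids version differences around np.trapz/np.trapezoid)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

def min_contrast_exponential(c_hat, a_grid, betas, alpha=1.0):
    """Minimum contrast fit of (sigma^2, beta): for each candidate beta compute
    B(beta) = int {c_hat(a) r_beta(a)}^alpha da and
    A(beta) = int r_beta(a)^(2*alpha) da over the range of a_grid, choose beta
    maximizing B^2/A, and set sigma^2 = (B/A)^(1/alpha), as in (18)."""
    best = (-np.inf, None, None)
    for beta in betas:
        r = np.exp(-a_grid / beta)
        B = _trapz((c_hat * r) ** alpha, a_grid)
        A = _trapz(r ** (2.0 * alpha), a_grid)
        score = B * B / A
        if B > 0 and score > best[0]:
            best = (score, float(beta), (B / A) ** (1.0 / alpha))
    return best[1], best[2]
```

By the (weighted) Cauchy–Schwarz inequality, B(β)²/A(β) is maximized exactly when r_β is proportional to ĉ, so feeding an exact exponential covariance recovers the true parameters.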

As the nonparametric estimate of the covariance function we have used ĉ(·) = log ĝ(·) with

ĝ(a) = (2πa ρ̂² A(W))⁻¹ Σ_{i≠j} k_h(a − ‖x_i − x_j‖) / b_ij,  a ≤ a*,   (19)

where

k_h(x) = (3/(4h)) (1 − x²/h²) 1[−h ≤ x ≤ h]

is the Epanechnikov kernel with bandwidth h > 0, b_ij is the proportion of the circumference of the circle with center x_i and radius ‖x_i − x_j‖ lying within W, and a* is the circumradius of W. The estimator (19) and other estimators of the pair correlation function are discussed in Stoyan and Stoyan (1994); in particular they discuss how to choose the bandwidth of the kernel. To study how well our estimation procedure works, we performed 20 simulations from the model with an exponential covariance function where σ² = 2.0 and β = .05. A scatter plot of the estimated values of β and σ², together with the true values, is shown in Fig. 7. There is a large variation in the estimate of β, but the mean values of the 20 estimates are β̄ = .0513 and σ̄² = 2.08, not far from the true values. The other plot in Fig. 7 shows the mean covariance function c̄ (solid line) and upper and lower envelopes for the empirically estimated covariance functions obtained from the 20 simulations. The values of c̄ are close to the exponential covariance function, especially at small distances. Estimating the parameters from c̄ gives σ² = 2.145 and β = .0461, which indicates that a good estimate of the covariance function gives good parameter estimates.
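The estimator (19) can be sketched for the case W = unit square; the edge proportion b is approximated here by discretizing the circle (all helper names are ours, and the discretization is our simplification of the exact arc-length computation):

```python
import numpy as np

def circle_in_unit_square(x, radius, m=720):
    """b: proportion of the circle with centre x and given radius lying
    inside [0,1)^2, approximated on m equispaced angles."""
    t = np.linspace(0.0, 2.0 * np.pi, m, endpoint=False)
    px = x[0] + radius * np.cos(t)
    py = x[1] + radius * np.sin(t)
    return float(((px >= 0) & (px < 1) & (py >= 0) & (py < 1)).mean())

def pair_correlation_hat(points, a, h):
    """Kernel estimator (19) on the unit square (A(W) = 1) with the
    Epanechnikov kernel k_h(x) = (3/(4h))(1 - x^2/h^2) on [-h, h]."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    rho = float(n)                      # rho_hat = n / A(W)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = float(np.hypot(*(pts[i] - pts[j])))
            u = a - d
            if abs(u) < h:
                b = circle_in_unit_square(pts[i], d)
                if b > 0.0:
                    total += 0.75 * (1.0 - (u / h) ** 2) / (h * b)
    return total / (2.0 * np.pi * a * rho ** 2)
```

For points well inside the window the correction b equals 1 and the estimator reduces to the uncorrected kernel sum.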

Fig. 7. Left: Estimated parameters β and σ² from 20 simulations under the log Gaussian Cox process with exponential covariance function c(t) = 2.0 exp(−20t). The true parameter value is marked with a square. Right: The true covariance function (dotted line), the mean and upper and lower envelopes for the estimated covariance functions (solid lines).

Having estimated the parameters we may check our model assumptions by comparing nonparametric estimators of various summary statistics with those obtained under

the estimated log Gaussian Cox model. We have considered the distribution functions F and G of the distance to the nearest point in X from a fixed point in the plane and a 'typical point' in X, respectively. Under the log Gaussian Cox model, F = F_{μ,σ²,β} and G = G_{μ,σ²,β} are given by

F_{μ,σ²,β}(t) = 1 − E exp{ −∫_{‖s‖<t} Λ(s) ds }

and

G_{μ,σ²,β}(t) = 1 − E[ Λ(0) exp{ −∫_{‖s‖<t} Λ(s) ds } ] / ρ,

where the mean values may be approximated by Monte Carlo. As in Diggle (1983), Stoyan and Stoyan (1994), and Stoyan et al. (1995) we have in Examples 1 and 2 compared nonparametric estimates of F, G, and L = √(K/π) based on the data with those obtained by simulations under the estimated log Gaussian Cox model. For short we call such nonparametric estimates the empirical F-, G-, and L-functions. Moreover, we have obtained a nonparametric estimate of the third-order characteristic z in (5) by combining (7) with (16) and (19), and considered whether this summary statistic varies around 1 in accordance with the result (6) for log Gaussian Cox processes.

Example 1: The first data set consists of the locations of 126 Scots pine saplings in a square plot of 10 × 10 m². The pine forest has grown naturally in eastern Finland, and the data have previously been analyzed by Penttinen et al. (1992) and Stoyan and Stoyan (1994), who both fitted a Matérn cluster process using the L-function both for parameter estimation and model checking. The estimation in Penttinen et al. (1992) was carried out by trial-and-error, while Stoyan and Stoyan (1994) used a minimum contrast method. The fit in both cases seems quite good, see Fig. 11 in Penttinen et al. (1992) and Fig. 131 in Stoyan and Stoyan (1994), but one may object that the same summary statistics have been used for both estimation and model checking. Fig. 8 shows several characteristics for the pine data: The data normalized to a unit square are shown in a). The logarithm of the estimated pair correlation function is plotted in b) (solid line), and the shape of the curve suggests to use the exponential covariance function. We estimated the parameters by minimizing (17) with a₀ = 0.1 and α = 0.5, which are chosen to give more weight to values of a close to zero. The estimates are β̂ = 1/33 and σ̂² = 1.91. The dotted line in b) shows the covariance function for the estimated model. The plot in c) shows the empirical L-function for the data (solid line) and upper and lower envelopes of the L-function for the fitted model based on 19 simulations. This is the same as Stoyan and Stoyan (1994), Fig. 131, and our model shows a better fit with respect to the L-function than the Matérn cluster model. The empirical L-function falls within the envelopes from the simulations


Fig. 8. Example 1. Several characteristics for the pine data (see the text for explanations).

Fig. 9. Example 1. Estimate of z based on the data (solid line) and 'conditional' envelopes and 'unconditional' envelopes (dashed lines) based on 20 simulations. Left: Log Gaussian Cox process. Right: Matérn cluster process.

except for very small values of t. The plots in d) and e) show the nonparametric estimates F and G based on the data against the mean of these estimates obtained from 99 simulations under the estimated model. The plots show a reasonably good fit to the chosen model, and F and G fall within the upper and lower envelopes based on the 99 simulations. For the Matérn cluster model fitted by Stoyan and Stoyan (1994) we have also created plots similar to d) and e), which indicate that our model fits the data better. Finally, f) shows a realization under the estimated log Gaussian Cox process. We also used the third-order characteristic z to check our model assumptions. The left plot in Fig. 9 shows the estimated z for the data and two sets of envelopes based on 20 unconditional simulations of the estimated log Gaussian Cox process and 20 simulations where we condition on the observed number of points. The plot gives no reason to doubt the model no matter whether the 'unconditional' or 'conditional' envelopes are considered. The two sets of envelopes are not very different in this situation where $\beta$ is rather small and the correlation therefore not very strong. To check the discriminatory power of z we similarly calculated envelopes for the Matérn cluster process estimated by Stoyan and Stoyan (1994), see the right plot in Fig. 9. The estimated z-function based on the data crosses the envelopes in an interval of t-values, and even though the large variability of the estimator for small t makes it difficult to make definitive conclusions, the plot raises serious doubt concerning the appropriateness of the Matérn cluster process as a model for the data. □
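The parameter estimation step in Example 1 can be sketched as follows; the weight $s^{-a}$, the grid search, and the function name are illustrative assumptions standing in for the exact criterion (17) and a proper numerical optimiser:

```python
import math

def fit_exponential_covariance(s_vals, log_g_hat, a=0.5):
    """Minimum-contrast fit of c(s) = sigma2 * exp(-s/beta) to an
    empirical log pair correlation function log g_hat(s), using
    weights s**(-a) that emphasise small distances (a sketch, not
    the exact criterion (17) of the text)."""
    best = None
    # crude grid search over (sigma2, beta); a real implementation
    # would use a numerical optimiser
    for sigma2 in [0.5 + 0.05 * k for k in range(60)]:
        for beta in [0.005 + 0.001 * k for k in range(100)]:
            contrast = sum(
                (lg - sigma2 * math.exp(-s / beta)) ** 2 * s ** (-a)
                for s, lg in zip(s_vals, log_g_hat)
            )
            if best is None or contrast < best[0]:
                best = (contrast, sigma2, beta)
    return best[1], best[2]
```

For a log Gaussian Cox process the covariance function equals the log pair correlation function, which is what makes this fit so direct.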

Example 2: In this example we study a bivariate data set consisting of two types of trees, 219 spruces and 114 birches in a square plot of 50 × 50 m². The data have been collected by Kari Leinonen and Markku Nygren as a part of a larger data set where also a very few pines were present and marks consisting of tree length and diameter were included. These data have earlier been studied by Kuuluvainen et al. (1996). They found that small trees are clustered, while larger trees are regularly distributed. Fig. 10 a) shows the data normalized to unit area. The plot indicates clustering and a positive dependence between the two types of trees. In Fig. 10 b) the empirical covariance functions $\hat c_{22}$, $\hat c_{11}$ and $\hat c_{12}$ are plotted (solid lines, from top to bottom), using the equations (12) and (13) to obtain $\hat c_{ij} = \ln \hat g_{ij}$. Here $\hat g_{jj}(s)$ is the estimate (19) based on the point pattern $x_j = \{x_{j1}, \ldots, x_{jn_j}\}$ of type j trees. Further,

$$\hat g_{12}(s) = \frac{A(W)}{2\pi s\, n_1 n_2}\left\{\frac{n_2}{n_1 + n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} w_\delta\big(s - \|x_{1i} - x_{2j}\|\big)\, b_{ij} + \frac{n_1}{n_1 + n_2}\sum_{i=1}^{n_2}\sum_{j=1}^{n_1} w_\delta\big(s - \|x_{2i} - x_{1j}\|\big)\, b_{ij}\right\}$$

with the correction factor $b_{ij}$ similarly defined as in (19), and where we have combined kernel estimation with the way Lotwick and Silverman (1982) and Diggle (1983) recommend to estimate $K_{12}(t) = 2\pi \int_0^t s\, g_{12}(s)\, ds$, $t > 0$. Based on the plot of the empirical covariance functions we specify a model for a bivariate log Gaussian Cox process with exponential covariance functions $c_{ij}(s) =$



Fig. 10. Example 2. a) Plot of data, spruces marked with '•' and birches marked with '×'. b) Empirical covariance functions (solid lines), from top $\hat c_{22}$, $\hat c_{11}$, $\hat c_{12}$, and covariance functions for the fitted model (dotted lines). c) Empirical $L_{22}$ (solid line) together with lower and upper envelopes (dotted lines) plotted against the mean of 99 simulations from the fitted model.

8 Prediction and Bayesian inference We conclude this paper by considering prediction of the unobserved Gaussian process and intensity process under a given model for a univariate log Gaussian Cox process when this is observed within a bounded window. We use an empirical Bayesian approach, where the a posteriori distribution of the intensity process is obtained by considering the Gaussian distribution as a prior which smoothes the intensity surface, and where the prior may be estimated as described in Section 7. The posterior is not analytically tractable, so we use a Markov chain Monte Carlo algorithm to

simulate the posterior distribution, whereby various posterior characteristics can be estimated. The results are applied to the data set in Example 1 and we compare various Bayesian estimators of the intensity process with a nonparametric kernel estimator studied in Diggle (1985), Berman and Diggle (1989), and Cressie (1993). Ogata and Katsura (1988) developed another objective Bayesian method for estimating the intensity function of a marked inhomogeneous Poisson point process using spline functions. Other related research, but for Poisson (and more general) cluster processes, includes Lawson (1993), Baddeley and Van Lieshout (1993), and Granville and Smith (1995), who consider Bayesian estimation of cluster centres and cluster membership. Simultaneously with the development of the material of this section, Heikkinen and Arjas (1996) have been working with nonparametric Bayesian estimation of the intensity function of inhomogeneous planar Poisson processes, generalizing the method of Arjas and Gasbarra (1994).

Suppose that a realization x of a log Gaussian Cox process is observed within a bounded window $W^{(1)}$ and we wish to predict the Gaussian process and the intensity surface on a bounded set $W \supseteq W^{(1)}$. As in Section 6 we shall without loss of generality assume that W is the unit square and consider a finite subdivision of $W^{(1)}$ and $W^{(2)} = W \setminus W^{(1)}$ into cells $D_{ij}$ of area $A_{ij} > 0$, $(i,j) \in I$, where $I = \{1/(2M), 1/M + 1/(2M), \ldots, (M-1)/M + 1/(2M)\}^2$. Define the sublattices $I^{(a)} = W^{(a)} \cap I$, $a = 1, 2$. Further, we approximate the Gaussian field Y restricted to W by a Gaussian field $\tilde Y = (\tilde Y_{ij})_{(i,j)\in I}$ with mean vector $\tilde\mu = (\mu)_{(i,j)\in I}$ and a covariance matrix $\Sigma$ given by the covariance function of Y. As noticed in Subsection 6.1 we can extend $\tilde Y$ to $\tilde Y_{ext} = (\tilde Y_{ij})_{(i,j)\in I_{ext}} = \Gamma Q + \tilde\mu_{ext}$, where $I_{ext} = \{1/(2M), 1/M + 1/(2M), \ldots, (2(M-1)-1)/M + 1/(2M)\}^2$, K given by (15) is assumed to be positive semi-definite and of rank d, $\Gamma \sim N_d(0, I)$, Q is a certain $d \times (2(M-1))^2$ real matrix of rank d, and $\tilde\mu_{ext} = (\mu)_{(i,j)\in I_{ext}}$. We shall later on explain why it (apart from ease of exposition) may be preferred to use $\Gamma$ instead of $\tilde Y_{ext}$. Now, if $f(\gamma|x)$ denotes the density of the conditional distribution of $\Gamma$ given that $X \cap W^{(1)} = x$,

$$\log f(\gamma|x) = \mathrm{const}(x) - \frac{1}{2}\|\gamma\|^2 + \sum_{(i,j)\in I_{ext}} \big(n_{ij}\tilde y_{ij} - e^{\tilde y_{ij}} A_{ij}\big) \qquad (20)$$

where $\tilde y = \gamma Q + \tilde\mu_{ext}$, $n_{ij} = \mathrm{card}(x \cap D_{ij})$ is the number of points of x contained in the ij-th cell if $(i,j) \in I^{(1)}$, and we set $n_{ij} = A_{ij} = 0$ if $(i,j) \notin I^{(1)}$. Though this conditional distribution is not defined in accordance with the covariance structure of the Gaussian process outside W, we shall refer to this as the a posteriori distribution of $\Gamma$ given x; the important point is that the marginal distribution of $\tilde Y$ under this posterior agrees with the conditional distribution of $\tilde Y$ given $X \cap W^{(1)} = x$. In the following the gradient of the posterior

$$\nabla(\gamma) := \partial \log f(\gamma|x)/\partial\gamma = -\gamma + \big(n_{ij} - e^{\tilde y_{ij}} A_{ij}\big)_{(i,j)\in I_{ext}} Q^*$$

plays a key role. It is easily seen that $\partial\nabla(\gamma)/\partial\gamma^*$ is strictly negative definite. Thus the posterior is strictly log-concave.
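The log posterior (20) and its gradient are cheap to evaluate; a sketch in plain Python, with all names ours and the lattice flattened to a single index:

```python
import math

def log_posterior_and_gradient(gamma, Q, mu, counts, areas):
    """Unnormalised log posterior (20) and its gradient for the
    reparametrised field ytilde = gamma Q + mu.  gamma has length d;
    Q is a d x m list of lists; counts (n_ij) and areas (A_ij) run
    over the m cells of the extended lattice, and are set to zero
    outside the observation window as in the text."""
    d, m = len(gamma), len(mu)
    ytilde = [mu[j] + sum(gamma[k] * Q[k][j] for k in range(d))
              for j in range(m)]
    logpost = (-0.5 * sum(g * g for g in gamma)
               + sum(counts[j] * ytilde[j] - math.exp(ytilde[j]) * areas[j]
                     for j in range(m)))
    # gradient: -gamma + (n - exp(ytilde) * A) Q^T
    resid = [counts[j] - math.exp(ytilde[j]) * areas[j] for j in range(m)]
    grad = [-gamma[k] + sum(resid[j] * Q[k][j] for j in range(m))
            for k in range(d)]
    return logpost, grad
```

A finite-difference check of the gradient against the log posterior is a useful sanity test before plugging these into any sampler.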

For simulation of the posterior we use a Metropolis-adjusted Langevin algorithm (MALA) as suggested by Besag (1994) in the discussion of Grenander and Miller (1994) and further studied in Roberts and Tweedie (1996b). This is a Metropolis-Hastings type Markov chain Monte Carlo (MCMC) method inspired by the definition of a Langevin diffusion through a stochastic differential equation, which in the present context is

$$d\Gamma(t) = (h/2)\nabla(\Gamma(t))\,dt + \sqrt{h}\,dB(t)$$

where $B(\cdot)$ is standard Brownian motion and h > 0 is a user specified parameter (see Example 1 below); the posterior $\Gamma|x$ is a stationary distribution of this Markov process $\Gamma(\cdot)$. The MALA is given by two steps: First, if $\gamma^{(m)}$ is the current state of the chain, a 'proposal' $\gamma'^{(m+1)}$ is generated from a multivariate normal distribution with mean $\xi(\gamma^{(m)}) = \gamma^{(m)} + (h/2)\nabla(\gamma^{(m)})$ and independent coordinates with common variance h. In general, the use of gradient information in the proposal kernel may lead to much faster convergence than for e.g. a Metropolis chain (Roberts and Rosenthal, 1995). Secondly, with probability

$$\min\left\{1,\ \frac{f(\gamma'^{(m+1)}|x)\,\exp\big(-\|\gamma^{(m)} - \xi(\gamma'^{(m+1)})\|^2/(2h)\big)}{f(\gamma^{(m)}|x)\,\exp\big(-\|\gamma'^{(m+1)} - \xi(\gamma^{(m)})\|^2/(2h)\big)}\right\}$$

the next state becomes $\gamma^{(m+1)} = \gamma'^{(m+1)}$; otherwise $\gamma^{(m+1)} = \gamma^{(m)}$. This gives an irreducible and aperiodic Markov chain with the posterior as the stationary distribution, but it is not geometrically ergodic as the posterior has lighter tails than the Gaussian distribution (this can formally be verified using Theorem 4.2 in Roberts and Tweedie, 1996b). Briefly, the problem with the light tails is that the Markov chain may leave the center of the posterior for a very long time, since $\|\nabla(\gamma)\|$ may become extremely large if $\gamma$ is far away from the mode of the posterior. As suggested in Roberts and Tweedie (1996b), more robust geometric ergodicity properties may be obtained by truncating the gradient in the mean of the proposal kernel: In Appendix B we show that if $\nabla(\gamma)$ is replaced by

$$\nabla(\gamma)^{trunc} = -\gamma + \big(n_{ij} - (H \wedge e^{\tilde y_{ij}})A_{ij}\big)_{(i,j)\in I_{ext}} Q^* \qquad (21)$$

for some constant H > 0, then the 'truncated MALA' becomes geometrically ergodic when 0 < h < 2. However, if a sensible value of h is chosen, the undesirable properties of the (untruncated) MALA may not be a problem. In our examples the chain behaved very nicely and a truncation of the gradient (for a suitably large H) would not have made a difference. Note that $\Sigma$ and K do not need to be strictly positive definite. This is one reason for using $\Gamma$ instead of $\tilde Y$ when the posterior is considered. In the case where K is strictly positive definite we have compared MALAs for simulating the conditional distribution of $\Gamma$ respectively $\tilde Y_{ext}$ given x, where the gradient in the latter case is given by

$$\nabla(\tilde y_{ext}) = -(\tilde y_{ext} - \tilde\mu_{ext})K^{-1} + \big(n_{ij} - e^{\tilde y_{ij}}A_{ij}\big)_{(i,j)\in I_{ext}}.$$
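A generic MALA transition with the acceptance probability above can be sketched as follows (names ours; the target enters only through its log density and gradient):

```python
import math
import random

def mala_step(gamma, h, grad_log_post, log_post):
    """One Metropolis-adjusted Langevin step: propose from a normal
    distribution with mean gamma + (h/2) * gradient and independent
    coordinates of variance h, then accept or reject with the usual
    Metropolis-Hastings ratio.  Returns (new_state, accepted)."""
    d = len(gamma)
    g = grad_log_post(gamma)
    mean = [gamma[i] + 0.5 * h * g[i] for i in range(d)]
    prop = [m + math.sqrt(h) * random.gauss(0.0, 1.0) for m in mean]
    gp = grad_log_post(prop)
    mean_back = [prop[i] + 0.5 * h * gp[i] for i in range(d)]
    # log proposal densities (up to a common constant)
    log_q_fwd = -sum((prop[i] - mean[i]) ** 2 for i in range(d)) / (2 * h)
    log_q_back = -sum((gamma[i] - mean_back[i]) ** 2 for i in range(d)) / (2 * h)
    log_alpha = log_post(prop) - log_post(gamma) + log_q_back - log_q_fwd
    if math.log(random.random()) < log_alpha:
        return prop, True
    return gamma, False
```

The truncated variant of the text is obtained simply by passing the truncated gradient (21) as `grad_log_post`.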

For the data in Example 1 considered below we found that in the former case the algorithm mixes much faster (Fig. 11), so this is another reason to prefer the 'parametrization' given by $\Gamma$. By simulating the posterior we can obtain MCMC estimates of the posterior mean, credibility intervals, etc. for the Gaussian process and intensity surface. Conditional simulations of the unobserved part $X \cap W^{(2)}$ of the point process given $X \cap W^{(1)} = x$ can also be obtained. To do this one generates first a realization from the posterior distribution of the intensity surface, and given this realization, $X \cap W^{(2)}$ is simulated along the same lines as described in the beginning of Section 6. Maximum a posteriori (MAP) estimation is also possible. Since $f(\gamma|x)$ is strictly log-concave and its tails tend to zero at infinity, the MAP-estimate $\hat\gamma^{MAP}$ is the unique solution to $\nabla(\gamma) = 0$. Because of the linear relationship between $\tilde Y_{ext}$ and $\Gamma$, the MAP-estimate of $\tilde Y_{ext}$ is simply given by $\tilde y^{MAP}_{ext} = \hat\gamma^{MAP} Q + \tilde\mu_{ext}$. Note that the MAP-estimate $\tilde y^{MAP}$ of $\tilde Y$ agrees with $\tilde y^{MAP}_{ext}$ restricted to I. It can be shown that $\tilde y^{MAP}_{ext}$ restricted to $I^{(2)}$ is the same as the predictor of $(\tilde Y_{ij})_{(i,j)\in I^{(2)}}$ obtained from the 'data' $(\tilde y^{MAP}_{ij})_{(i,j)\in I^{(1)}}$ using kriging (see e.g. Cressie, 1993). The conditional density of the intensity surface $\tilde\Lambda_{ext} = (\exp(\tilde Y_{ij}))_{(i,j)\in I_{ext}}$ (with respect to d-dimensional Hausdorff measure with carrier space of dimension $(2(M-1))^2$) is not log-concave, and $(\exp(\tilde y^{MAP}_{ij}))_{(i,j)\in I}$ is clearly not the MAP-estimate of the intensity surface. If K is strictly positive definite, then, using an obvious notation, $f_{\tilde Y_{ext}}(\tilde y_{ext}|x)$ is strictly log-concave, and so

$$h(\tilde y_{ext}) = f_{\tilde\Lambda_{ext}}\big((\exp(\tilde y_{ij}))_{(i,j)\in I_{ext}}\,\big|\,x\big) = f_{\tilde Y_{ext}}(\tilde y_{ext}|x)\Big/\prod_{(i,j)\in I_{ext}} \exp(\tilde y_{ij})$$

is strictly log-concave. Consequently, in this case the MAP-estimate $\hat\lambda^{MAP}$ of the intensity surface on W is the same as $\tilde\lambda^{MAP}_{ext} = (\exp(\bar y_{ij}))_{(i,j)\in I_{ext}}$ restricted to I, where $\bar y$ is the unique solution to $\nabla(\tilde y_{ext}) = (1, \ldots, 1)$. Note here that since the log Gaussian distribution is heavy tailed and skewed, $\hat\lambda^{MAP}$ is not necessarily a sensible estimator of the intensity surface (see also the discussion in Example 1 below). We have used a discrete gradient ascent algorithm for finding $\hat\gamma^{MAP}$, since this algorithm involves only the calculation of the gradient: Given an initial value $\gamma^{(0)}$ the iteration is given by $\gamma^{(m+1)} = \gamma^{(m)} + \delta\nabla(\gamma^{(m)})$, $m = 0, 1, \ldots$, where $\delta > 0$ is a user specified parameter. The algorithm for finding $\bar y$ is similar, except that in each iteration we replace $\nabla$ by $\nabla - (1, \ldots, 1)$. A too high value of $\delta$ may cause the algorithm to diverge; we used in Example 1 the modest value $\delta = 0.1$.
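The discrete gradient ascent iteration can be sketched as (names ours):

```python
def map_estimate(gamma0, grad, delta=0.1, tol=1e-5, max_iter=100000):
    """Discrete gradient ascent gamma <- gamma + delta * grad(gamma),
    iterated until every coordinate of the gradient is below tol in
    absolute value (the stopping rule used in the text).  A delta
    that is too large makes the iteration diverge."""
    gamma = list(gamma0)
    for _ in range(max_iter):
        g = grad(gamma)
        if max(abs(v) for v in g) < tol:
            break
        gamma = [gamma[i] + delta * g[i] for i in range(len(gamma))]
    return gamma
```

Because the posterior is strictly log-concave, the fixed point of this iteration is the unique MAP-estimate.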

Example 1 (continued): We now consider estimation of the Gaussian process and the intensity surface on a grid $I = \{0, \ldots, 64\}^2$ under the log Gaussian Cox process which was estimated in Example 1, Section 7. In this example $W = W^{(1)} = [0, 1)^2$. After some preliminary runs of the MALA the parameter h was adjusted to be 0.06 in order to obtain an acceptance rate close to the optimal rate 0.574 given in Roberts and

Rosenthal (1995) (they formally prove their results for target distributions with i.i.d. components, but notice that various generalizations are possible and the optimal rate appears to be quite robust over changes in the model). Then a sample of length 300,000 was generated by the MALA and we used a subsample of this (with spacing equal to 10) for obtaining Monte Carlo estimates of the various characteristics of the posterior. These estimates are reported below. To study the convergence properties and to compare the different implementations of the MALA we have considered various plots of time series and estimated autocorrelations for selected cells on the initial as well as the extended lattice. It appears from these plots that the potential problem related to nongeometrical ergodicity of the MALA is rather hypothetical. As an illustration, Fig. 11 shows time series and estimated autocorrelations for a subsample of $\tilde Y_{ij}|x$ at a selected cell when the invariant distribution of the MALA is either $\Gamma|x$ or $\tilde Y_{ext}|x$. In the former case the autocorrelations die out much faster. Monte Carlo posterior means of the Gaussian process and the intensity surface are shown in Fig. 12. For comparison we have also included Diggle's (1985) nonparametric kernel estimate of the intensity surface. For the uniform kernel given by the uniform density on a disc, the bandwidth of the kernel can be chosen by minimization of an estimate of the mean square error (see Diggle, 1985, and Berman and Diggle, 1989). Instead of the uniform kernel we actually used a planar Epanechnikov kernel since the estimate obtained with this kernel has a more suitable smooth appearance. The bandwidth 0.089 for the planar Epanechnikov kernel was obtained by calibration of the chosen bandwidth for the uniform kernel as suggested in Diggle (1985). The posterior mean of the intensity surface is quite peaked since the minimum and maximum values are 56.86 and 2724.26.
This is not surprising recalling the heavy-tailedness of the log Gaussian distribution. The kernel estimate is less peaked, with a range 0-716.37. Integration of the Monte Carlo posterior mean of the intensity surface and the kernel estimate over the unit square yields 125.66 and 126.01, respectively, so the expected numbers of points for the inhomogeneous Poisson processes with intensity surfaces given by the posterior mean respectively the kernel estimate are practically equal and very near the observed number of points (126). We have also in Fig. 12 included a plot of the logarithm of the Monte Carlo posterior mean of the intensity surface, as this gives a better impression of the variability for intermediate values of the posterior mean. The application of MCMC also facilitates assessment of posterior uncertainty. The estimated posterior variance of the Gaussian process, $\mathrm{Var}(\tilde Y_{ij}|x)$, $(i,j) \in I$, is shown in the left plot in Fig. 13. The largest variance is 1.76 whilst the smallest is 0.69. By comparing this plot with the Monte Carlo posterior mean of the intensity surface in Fig. 12, we see that the posterior variance is smallest where the posterior mean is largest and vice versa. For the posterior distribution of the intensity surface we have further for selected cells $D_{ij}$ estimated the 10% and 90% quantiles. These credibility intervals are shown in Table 2 when $(i,j)$ are given by $(13a, 13b)$, $a, b = 1, \ldots, 4$, and $(14, 36)$. The credibility interval for $D_{14,36}$ is largest as this cell is situated in a peak of $E(\tilde\Lambda_{ij}|x)$. As an illustration of the simulation method on the extended lattice and the effect of wrapping the extended Gaussian field on a torus, the right plot in Fig. 13 shows

Fig. 11. Example 1. Upper plots: time series (left) and estimated autocorrelations (right) for $\tilde Y_{ij}|x$ at a selected cell, obtained by transforming a subsample of $\Gamma|x$ (spacing = 10) generated by the MALA. Lower row: same as upper row, but no transformation is used, i.e. $\tilde Y_{ij}|x$ is generated directly by the MALA.

the Monte Carlo posterior mean of $\tilde Y_{ext}$. Notice that outside the original field and away from the boundaries the estimated posterior mean is constant and equal to the unconditional mean. Finally, we have considered MAP-estimation of the Gaussian process and the intensity process. The extended matrix K was strictly positive definite, and $\tilde y^{MAP}_{ext}$ and $\tilde\lambda^{MAP}_{ext}$ were obtained by iterating the discrete gradient ascent algorithm until the gradient was practically zero (i.e. until its coordinates were numerically less than $10^{-5}$). The minimum and maximum values of $\tilde y^{MAP}$ are 3.53 and 7.96, while the corresponding values of the estimated posterior means $E(\tilde Y_{ij}|x)$, $(i,j) \in I$, are 3.24 and 7.59. Actually, $\tilde y^{MAP}$ is very similar to these posterior means, so we have omitted the plot of $\tilde y^{MAP}$. Since $\max \tilde\lambda^{MAP}_{ij} \approx 10^{-15}$ the MAP-estimate is clearly a totally unreasonable estimate of the intensity surface. This may, as noted before, be due to the skewness and heavy-tailedness of the log Gaussian distribution combined with the fact that the intensity surface is a random field of correlated log Gaussian random variates. □

In Example 1 the posterior mean and the nonparametric kernel estimate gave very different estimates of the intensity surface. To study these estimators under known conditions we simulated a point pattern on the unit square from the log Gaussian Cox process with exponential correlation function and parameters $\mu = 4$, $\sigma^2 = 2$, $\beta = 0.1$.
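The nonparametric kernel estimator used for comparison can be sketched as follows; this is a bare planar Epanechnikov kernel sum, with the edge correction and data-driven bandwidth selection of Diggle (1985) omitted, and the function name is ours:

```python
import math

def kernel_intensity(points, grid, bandwidth):
    """Kernel estimate of the intensity surface at each grid location,
    using the planar Epanechnikov kernel
        k(u) = 2/(pi h^2) * (1 - ||u||^2 / h^2)  for ||u|| < h,
    which integrates to one over the disc of radius h.  No edge
    correction is applied (a sketch, not Diggle's full estimator)."""
    h2 = bandwidth * bandwidth
    const = 2.0 / (math.pi * h2)
    est = []
    for (gx, gy) in grid:
        val = 0.0
        for (px, py) in points:
            r2 = (gx - px) ** 2 + (gy - py) ** 2
            if r2 < h2:
                val += const * (1.0 - r2 / h2)
        est.append(val)
    return est
```

Near the window boundary the full estimator would divide each kernel contribution by the kernel mass falling inside the window.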

Fig. 12. Example 1. Upper left plot: Monte Carlo posterior mean of the Gaussian field. Upper right plot: Monte Carlo posterior mean of the intensity surface. Lower left plot: Logarithm of the upper right plot. Lower right plot: Diggle's nonparametric kernel estimate of the intensity surface.

Fig. 13. Example 1. Left: Monte Carlo posterior variance of the Gaussian field on the original lattice. Right: Monte Carlo posterior mean of the Gaussian field on the extended lattice.

Using the same procedure as in Example 1, Section 7, the estimates of $\mu$, $\sigma^2$, $\beta$ are 3.78, 2.46, 0.077, and the procedure for choosing the bandwidth yields 0.061. Plots of the true intensity surface, the Monte Carlo posterior mean of the intensity surface under the estimated model, and the kernel estimate are shown in Fig. 14. In this case the two

         i = 13        i = 26        i = 39        i = 52
j = 52   7.0-181.3     26.6-546.2    7.4-190.4     28.6-541.3
         78.1          235.0         83.0          234.5
j = 39   7.9-207.5     8.8-218.5     8.0-215.3     8.0-207.3
         89.9          95.7          93.0          89.0
j = 26   6.5-173.3     11.5-282.0    11.6-282.9    6.7-170.4
         74.3          119.2         119.8         73.2
j = 13   17.2-380.3    12.9-311.4    10.5-248.4    5.0-138.2
         163.2         130.6         105.2         60.7

(i, j) = (36, 14):   363.3-3733.3   1734.1

Table 2. Example 1. Estimated 80%-credibility intervals and posterior means of the intensity surface at selected cells $D_{ij}$, organized in accordance with Figure 11, where (i,j) = (0,0), (0,64), (64,0), (64,64) correspond to the lower left, upper left, lower right, and upper right cells.

Fig. 14. Simulation study. Upper left plot: True Gaussian surface. Upper right plot: True intensity surface. Lower left plot: Monte Carlo posterior mean of the intensity surface. Lower right plot: Diggle's nonparametric kernel estimate.

estimates look much more similar than in Example 1. The large difference between the intensity surface estimates in Example 1 may be explained by the considerably larger bandwidth which was used in the kernel estimate in Example 1, and which yielded an oversmoothed estimate of the intensity surface. In Fig. 14 the ranges of the true intensity surface, the Monte Carlo posterior mean, and the kernel estimate are 0.44-7802.62, 21.27-7653.15, and 0-4898.43, respectively. Integration of the estimates gives 150.06 for the posterior mean and 147.69 for the kernel estimate, while the integral of the true surface is 158.32, and the true and estimated intensities are $\lambda = 148.41$ and $\hat\lambda = 150$. In conclusion, at least for the particular cases of Example 1 and the simulation study, the posterior mean seems to be the better estimate.

Acknowledgments This research will be a part of the second and third authors' Ph.D. dissertations. It has been funded by the Danish Informatics Network in the Agricultural Sciences, the Danish Natural Science Research Council, NORFA and the Research Council of Norway. Antti Penttinen kindly provided the data studied in Examples 1 and 2. We thank Anders Brix, Poul Svante Eriksen, Peter Green, Jørgen Hoffmann-Jørgensen, Steffen L. Lauritzen, Antti Penttinen, Gareth Roberts, Mats Rudemo, Dietrich Stoyan, the editor and two anonymous referees for helpful comments.

Appendix A Sample path continuity of Gaussian processes The following lemma gives a simple condition for the existence of an almost surely continuous modification of a Gaussian process $Y = (Y_s)_{s\in\mathbb{R}^d}$ with covariance function $c : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$.

Lemma 1. Suppose that there exist constants $\kappa, K > 0$ and $\alpha > 0$ such that

$$c(s, s) + c(t, t) - 2c(s, t) \le K\|s - t\|^{\alpha}$$

for all $s, t \in \mathbb{R}^d$ with $\|s - t\| < \kappa$. Then Y has a modification for which almost all sample paths are continuous.

Proof. Let B be any compact subset of $\mathbb{R}^d$, define the pseudo-metric $\delta_Y$ on $\mathbb{R}^d$ by

$$\delta_Y(s, t) = \sqrt{c(s, s) + c(t, t) - 2c(s, t)}, \quad s, t \in \mathbb{R}^d,$$

and let $N(B, \delta_Y, \varepsilon)$ denote the minimal number of $\delta_Y$-balls of radius $\varepsilon$ required to cover B. We show first that $(Y_s)_{s\in B}$ has an almost surely continuous modification. This result follows from Theorem 11.17 in Ledoux and Talagrand (1991) by showing that

$$\int_0^{\tilde\kappa} \sqrt{\log N(B, \delta_Y, \varepsilon)}\, d\varepsilon < \infty \qquad (A1)$$

for some arbitrary $\tilde\kappa > 0$. Choose $\tilde\kappa < (K\kappa^{\alpha})^{1/2}$. The set B is contained in $[-b, b]^d$ for some $b > 0$, and we have that $\|s - t\| \le (\varepsilon^2/K)^{1/\alpha} \Rightarrow \delta_Y(s, t) \le \varepsilon$ for all $\varepsilon < \tilde\kappa$. The number of Euclidean balls of radius r required to cover $[-b, b]^d$ is bounded by $(b\sqrt{d}/r)^d$, so that $N(B, \delta_Y, \varepsilon) \le \tilde K \varepsilon^{-2d/\alpha}$, $0 < \varepsilon < \tilde\kappa$, where $\tilde K = (b\sqrt{d}\, K^{1/\alpha})^d$. Since $\int_0^1 \sqrt{-\log \varepsilon}\, d\varepsilon = \sqrt{\pi}/2$ it is easy to see that (A1) holds.

Let now $Y^n$, $n = 1, 2, \ldots$ be almost surely continuous modifications of $(Y_s)_{s\in[-n,n]^d}$, $n = 1, 2, \ldots$, and let $\Omega$ be the underlying sample space on which Y is defined. Then there is a null set $N_0 \subset \Omega$ such that $Y^n_s(\omega) = Y^m_s(\omega)$ for any $n > m \ge 1$, $s \in [-m, m]^d$ and $\omega \in \Omega \setminus N_0$. An almost surely continuous modification $\tilde Y$ of Y is thereby obtained by defining $\tilde Y_s(\omega) = Y^{n(s)}_s(\omega)$ for all $\omega \in \Omega \setminus N_0$ and $s \in \mathbb{R}^d$, where $n(s) \ge 1$ is determined by $s \in [-n(s), n(s)]^d \setminus [-n(s)+1, n(s)-1]^d$. □
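As an illustration (our own, not from the text), the exponential covariance function used in Examples 1 and 2 satisfies the condition of Lemma 1 with $\alpha = 1$:

```latex
% For c(s,t) = \sigma^2 e^{-\|s-t\|/\beta}, using 1 - e^{-u} \le u for u \ge 0,
\begin{aligned}
c(s,s) + c(t,t) - 2c(s,t)
  &= 2\sigma^2\bigl(1 - e^{-\|s-t\|/\beta}\bigr) \\
  &\le \frac{2\sigma^2}{\beta}\,\|s-t\| ,
\end{aligned}
% so the condition of Lemma 1 holds with \alpha = 1, K = 2\sigma^2/\beta and
% any \kappa > 0, and the fitted log Gaussian Cox processes therefore have
% almost surely continuous Gaussian fields and intensity surfaces.
```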

Appendix B Geometrical ergodicity of the truncated MALA Below we prove that the truncated MALA discussed in Section 8 is geometrically ergodic when 0 < h < 2. For simplicity and without loss of generality we shall assume that $\mu = 0$. Letting $\tilde y = \gamma Q$, then by (20) the logarithm of the posterior density is

$$\log f(\gamma|x) = \mathrm{const}(x) - \frac{1}{2}\|\gamma\|^2 + n\tilde y^* - \sum_{(i,j)\in I_{ext}} e^{\tilde y_{ij}} A_{ij}$$

where $n = (n_{ij})_{(i,j)\in I_{ext}}$, and by (21) the truncated gradient is

$$\nabla(\gamma)^{trunc} = -\gamma + (n - R(\gamma))Q^*$$

where

$$R(\gamma) = \big(HA_{ij}e^{\tilde y_{ij}}/(H \vee e^{\tilde y_{ij}})\big)_{(i,j)\in I_{ext}} = \big((H \wedge e^{\tilde y_{ij}})A_{ij}\big)_{(i,j)\in I_{ext}}.$$

In the following g will denote a measurable function mapping $\mathbb{R}^d$ into $\mathbb{R}$. The results concerning geometrical ergodicity are the following:

Theorem 4. When 0 < h < 2 the truncated MALA is $V_s$-uniformly ergodic for $V_s(\gamma) = \exp(s\|\gamma\|)$ and any $s > 0$, i.e. there exist $0 < R_s < \infty$ and $0 < \rho_s < 1$ such that for any $\gamma \in \mathbb{R}^d$,

$$\big\|P^{(m)}(\gamma, \cdot) - \pi(\cdot)\big\|_{V_s} \le V_s(\gamma) R_s \rho_s^m, \quad m \ge 1.$$

Here $P^{(m)}$ denotes the m-step transition kernel of the truncated MALA, $\pi$ is the posterior distribution, and for a signed measure $\mu$, $\|\mu\|_{V_s} = \sup_{|g| \le V_s} |\mu(g)|$.

Corollary. Suppose that $|g(\gamma)| \le \exp(s\|\gamma\|/2)$, $\gamma \in \mathbb{R}^d$, and that $(\Gamma_l)_{l\ge 0}$ is generated from the truncated MALA where 0 < h < 2 and the initial distribution of $\Gamma_0$ is arbitrary. Define the Monte Carlo approximation $\hat\xi_m = \sum_{l=1}^m g(\Gamma_l)/m$ of the mean $\xi = Eg(\Gamma_0)$ calculated for the stationary chain. Assuming first that the density of $\Gamma_0$ is $f(\cdot|x)$, then

$$\sigma_g^2 := \lim_{m\to\infty} m\,\mathrm{Var}(\hat\xi_m) = \mathrm{Var}(g(\Gamma_0)) + 2\sum_{l=1}^{\infty} \mathrm{Cov}\big(g(\Gamma_0), g(\Gamma_l)\big) < \infty.$$

Moreover, if $\sigma_g > 0$, we have a central limit theorem independently of the chosen initial distribution of $\Gamma_0$:

$$(\hat\xi_m - \xi)\big/\sqrt{\sigma_g^2/m} \to N(0, 1) \quad \text{in distribution as } m \to \infty.$$
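The asymptotic variance $\sigma_g^2$ is in practice estimated from MCMC output; a standard batch-means sketch (our own illustration, not from the text):

```python
def batch_means_variance(samples, n_batches=20):
    """Batch-means estimate of the asymptotic variance sigma_g^2 in
    the central limit theorem for an ergodic average: split the chain
    into batches, and scale the sample variance of the batch means by
    the batch length."""
    m = len(samples) // n_batches
    means = [sum(samples[i * m:(i + 1) * m]) / m for i in range(n_batches)]
    overall = sum(means) / n_batches
    var_batch = sum((b - overall) ** 2 for b in means) / (n_batches - 1)
    return m * var_batch
```

For an i.i.d. sequence this recovers the marginal variance; for a correlated chain it inflates it by the integrated autocorrelation time, as in the display above.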

Proof. Let $\xi(\gamma) = \gamma + (h/2)\nabla(\gamma)^{trunc}$ and let $q(\gamma, \cdot)$ be the density of $N_d(\xi(\gamma), hI)$. Then (36) in Roberts and Tweedie (1996b) holds since $(n - R(\gamma))Q^*$ is bounded. Since the proof of Theorem 4.1 in Roberts and Tweedie (1996b) is also applicable in our situation, the geometrical ergodicity then follows if we can show that the truncated MALA 'converges inwards'. More precisely, let as in Roberts and Tweedie (1996b), $A(\gamma) = \{\gamma' : \|\gamma'\| \le \|\gamma\|\}$ and $I(\gamma) = \{\gamma' : f(\gamma|x)q(\gamma, \gamma') \le f(\gamma'|x)q(\gamma', \gamma)\}$. Then we need to show that

$$\int_{A(\gamma)\,\Delta\, I(\gamma)} q(\gamma, \gamma')\, d\gamma' \to 0 \quad \text{as } \|\gamma\| \to \infty.$$

Here $\Delta$ denotes symmetric difference, i.e. $A \Delta B = [A \setminus B] \cup [B \setminus A]$. Note that for any $\varepsilon > 0$ we can choose $\delta_\varepsilon$ such that $\int_{\mathbb{R}^d \setminus B_\varepsilon(\gamma)} q(\gamma, \gamma')\, d\gamma' \le \varepsilon$, where $B_\varepsilon(\gamma) = \{\gamma' : \|\gamma' - \xi(\gamma)\| \le \delta_\varepsilon\}$. Thus

$$\int_{A(\gamma)\Delta I(\gamma)} q(\gamma, \gamma')\, d\gamma' \le \int_{(A(\gamma)\Delta I(\gamma)) \cap B_\varepsilon(\gamma)} q(\gamma, \gamma')\, d\gamma' + \varepsilon.$$

Since

$$\gamma' \in B_\varepsilon(\gamma) \Rightarrow \|\gamma'\| \le \|\gamma' - \xi(\gamma)\| + \|\xi(\gamma)\| \le |1 - h/2|\,\|\gamma\| + \delta_\varepsilon + \text{constant}$$

we have for $\|\gamma\|$ sufficiently large that $B_\varepsilon(\gamma) \subseteq A(\gamma)$, so that $(A(\gamma)\Delta I(\gamma)) \cap B_\varepsilon(\gamma) = B_\varepsilon(\gamma) \setminus I(\gamma)$. It is therefore enough to show that when $\|\gamma\|$ is sufficiently large then $\gamma' \in B_\varepsilon(\gamma)$ implies that $\gamma' \in I(\gamma)$. It is straightforwardly seen that the inequality which defines the set $I(\gamma)$ is equivalent to $J_1 + J_2 + J_3 + J_4 + J_5 \ge 0$ where

$$J_1 = \frac{h}{8}\big(\|\gamma\|^2 - \|\gamma'\|^2\big), \qquad J_2 = \frac{h}{8}\Big(\big\|(n - R(\gamma))Q^*\big\|^2 - \big\|(n - R(\gamma'))Q^*\big\|^2\Big),$$

$$J_3 = \sum_{(i,j)\in I_{ext}} A_{ij}\big(e^{\tilde y_{ij}} - e^{\tilde y'_{ij}}\big), \qquad J_4 = (\tilde y' - \tilde y)\big(R(\gamma) + R(\gamma')\big)^*/2,$$

$$J_5 = (h/4)\big[\tilde y'(n - R(\gamma'))^* - \tilde y(n - R(\gamma))^*\big]$$

and $\tilde y' = \gamma' Q$. If $\gamma' \in B_\varepsilon(\gamma)$ and $\|\gamma\|$ is sufficiently large, then $J_1 + J_2 > 0$ since $J_2$ is bounded and 0 < h < 2. We therefore just need to show that $J_3 + J_4 + J_5 \ge 0$ for $\gamma' \in B_\varepsilon(\gamma)$ and $\|\gamma\|$ sufficiently large. If $\|\gamma\| \to \infty$ then also $\|\tilde y\| \to \infty$ because Q is of full rank. Furthermore, if $\gamma' \in B_\varepsilon(\gamma)$, then $\tilde y'_{ij} = (1 - h/2)\tilde y_{ij} + (v(\gamma)Q)_{ij}$ where $v(\gamma)$ is a uniformly bounded vector. Since 0 < h < 2, we have therefore that $\tilde y_{ij} \to \infty$ implies that $\tilde y'_{ij} \to \infty$, while $\tilde y_{ij} \to -\infty$ implies that $\tilde y'_{ij} \to -\infty$, where in both cases $|\tilde y_{ij}| \to \infty$ at a rate faster than $|\tilde y'_{ij}|$, since $\tilde y_{ij} - \tilde y'_{ij}$ is of the order $(h/2)\tilde y_{ij}$ asymptotically. Let now $B = \{(i,j) \in I_{ext} : |\tilde y_{ij}| \not\to \infty\}$. Then $J_3 + J_4 + J_5$ can be written as

$$\sum_{(i,j)\in I_{ext}\setminus B}\Big\{A_{ij}\big(e^{\tilde y_{ij}} - e^{\tilde y'_{ij}}\big) + (\tilde y'_{ij} - \tilde y_{ij})\big[HA_{ij}e^{\tilde y_{ij}}/(H \vee e^{\tilde y_{ij}}) + HA_{ij}e^{\tilde y'_{ij}}/(H \vee e^{\tilde y'_{ij}})\big]/2$$

$$+\ (h/4)\big[\tilde y'_{ij}\big(n_{ij} - HA_{ij}e^{\tilde y'_{ij}}/(H \vee e^{\tilde y'_{ij}})\big) - \tilde y_{ij}\big(n_{ij} - HA_{ij}e^{\tilde y_{ij}}/(H \vee e^{\tilde y_{ij}})\big)\big]\Big\} + \tilde H(B, \gamma, \gamma')$$

where $\tilde H(B, \gamma, \gamma')$ is a finite sum of bounded terms. Since for each $(i,j) \in I_{ext} \setminus B$ the corresponding term $\{\ldots\}$ in the sum converges to $\infty$ when $\|\gamma\| \to \infty$, the proof of the theorem is completed. The corollary is a direct consequence of Theorem 4.1 in Roberts and Tweedie (1996a). □

References

Adler, R. (1981). The Geometry of Random Fields. Wiley, New York.

Arjas, E. and Gasbarra, D. (1994). Nonparametric Bayesian inference from right censored survival data, using the Gibbs sampler. Statistica Sinica 4, 505-524.

Baddeley, A.J. and Van Lieshout, M.N.M. (1993). Stochastic geometry models in high- level vision. In K. Mardia and G.K. Kanji (eds.) Statistics and images, Advances in Applied Statistics, a supplement to J. Appl. Statist. 20, 231—256.

Baddeley, A.J. and Van Lieshout, M.N.M. (1995). Area-interaction point processes. Ann. Inst. Statist. Math. 47, 601-619.

Baddeley, A.J., Van Lieshout, M.N.M. and Møller, J. (1996). Markov properties of cluster processes. Adv. Appl. Prob. (SGSA) 28, 346-355.

Baddeley, A.J. and Møller, J. (1989). Nearest-neighbour Markov point processes and random sets. Int. Statist. Rev. 57, 89-121.

Bartlett, M.S. (1964). Spectral analysis of two-dimensional point processes. Biometrika 44, 299-311.

Berman, M. and Diggle, P.J. (1989). Estimating weighted integrals of the second-order intensity of a spatial point process. J. R. Statist. Soc. B 51, 81-92.

Besag, J.E. (1974). Spatial interaction and the statistical analysis of lattice systems. J. Roy. Statist. Soc. B 36, 192-225.

Besag, J.E. (1977). Some methods of statistical analysis for spatial data. Bull. Internat. Statist. Inst. 47, 77-92.

Besag, J.E. (1994). Discussion of the paper by Grenander and Miller. J. R. Statist. Soc. B 56, 591-592.

Christakos, G. (1984). On the problem of permissible covariance and covariogram models. Water Resources Research 20, 251-265.

Christakos, G. (1992). Random Field Models in Earth Sciences. Academic Press, San Diego.

Cressie, N. (1991). Statistics for Spatial Data. Wiley, New York.

Daley, D.J. and Vere-Jones, D. (1988). An Introduction to the Theory of Point Processes. Springer-Verlag, New York.

Davis, P.J. (1979). Circulant Matrices. Wiley, New York.

Diggle, P.J. (1983). Statistical Analysis of Spatial Point Patterns. Academic Press, London.

Diggle, P.J. (1985). A kernel method for smoothing point process data. Appl. Statist. 34, 138-147.

Diggle, P.J. and Milne, R.K. (1983). Bivariate Cox processes: Some models for bivariate spatial point processes. J. R. Statist. Soc. B 45, 11-21.

Gelfand, A.E. and Carlin, B.P. (1991). Maximum likelihood estimation for constrained or missing data models. Research Report 91-002, Division of Biostatistics, University of Minnesota.

Georgii, H.-O. (1988). Gibbs Measures and Phase Transitions. Walter de Gruyter, Berlin.

Geyer, C.J. (1994). On the convergence of Monte Carlo maximum likelihood calculations. J. R. Statist. Soc. B 56, 261-274.

Grandell, J. (1976). Doubly Stochastic Poisson Processes. Lecture Notes in Mathemat­ ics, 529. Springer-Verlag, Berlin.

Granville, V. and Smith, R.L. (1995). Clustering and Neyman-Scott process parameter simulation via Gibbs sampling. (Manuscript) Statistical Laboratory, University of Cambridge.

Grenander, U. and Miller, M.I. (1994). Representations of knowledge in complex systems (with discussion). J. R. Statist. Soc. B 56, 549-603.

Häggström, O., Van Lieshout, M.N.M. and Møller, J. (1996). Characterisation results and Markov chain Monte Carlo algorithms including exact simulation for some spatial point processes. Research Report R-96-2040, Department of Mathematics, Aalborg University. (Submitted for publication.)

Heikkinen, J. and Arjas, E. (1996). Nonparametric Bayesian estimation of a spatial Poisson intensity. Preprint 20, Department of Statistics, University of Jyväskylä. (Submitted for publication.)

Jensen, J.L. and Møller, J. (1991). Pseudolikelihood for exponential family models of spatial point processes. Annals of Applied Probability 1, 445-461.

Karr, A.F. (1991). Point Processes and Their Statistical Inference. (2nd ed.) Marcel Dekker, New York.

Kuuluvainen, T., Penttinen, A., Leinonen, K. and Nygren, M. (1996). Statistical opportunities for comparing stand structural heterogeneity in managed and primeval forests: an example from boreal spruce forest in southern Finland. Silva Fennica 30, 315-328.

Lantuéjoul, C. (1994). Nonconditional simulation of stationary isotropic multigaussian random functions. In Geostatistical Simulations (eds. M. Armstrong and P. Dowd). Kluwer Academic Publishers, Dordrecht.

Lawson, A.B. (1993). Discussion contribution to 'The Gibbs sampler and other Markov chain Monte Carlo methods'. J. R. Statist. Soc. B 55, 61-62.

Ledoux, M. and Talagrand, M. (1991). Probability in Banach spaces. Springer-Verlag, Berlin.

Lotwick, H.W. and Silverman, B.W. (1982). Methods for analysing spatial processes of several types of points. J. R. Statist. Soc. B 44, 406-413.

Mantouglou, A. and Wilson, J. L. (1982). The turning bands method for simulation of random fields using line generation by a spectral method. Water Resources Research 18 (5), 1379-1394.

Matérn, B. (1960). Spatial Variation. Meddelanden från Statens Skogsforskningsinstitut, Vol. 49 (5). Statens Skogsforskningsinstitut, Stockholm.

Matheron, G. (1973). The intrinsic random functions and their applications. Adv. Appl. Prob. 5, 439-468.

Møller, J. (1994). Markov chain Monte Carlo and spatial point processes. Research Report 293, Department of Theoretical Statistics, University of Aarhus. To appear in O.E. Barndorff-Nielsen et al. (eds.): Proc. Séminaire Européen de Statistique, Toulouse 1996, "Current trends in stochastic geometry with applications", Chapman and Hall.

Ogata, Y. and Katsura, K. (1988). Likelihood analysis of spatial inhomogeneity for marked point patterns. Ann. Inst. Statist. Math. 40, 29-39.

Penttinen, A., Stoyan, D. and Henttonen, H.M. (1992). Marked point processes in forest statistics. Forest Science 38 (4), 806-824.

Preston, C. (1976). Random Fields. Lecture Notes in Mathematics 534. Springer-Verlag, Berlin.

Rathbun, S.L. and Cressie, N. (1994). A space-time survival point process for a longleaf pine forest in Southern Georgia. J. Am. Statist. Ass. 89, 1164-1174.

Ripley, B.D. (1977). Modelling spatial patterns (with discussion). J. R. Statist. Soc. B 39, 172-212.

Ripley, B.D. (1987). Stochastic Simulation. Wiley, New York.

Ripley, B.D. (1988). Statistical Inference for Spatial Processes. Cambridge University Press, Cambridge.

Roberts, G.O. and Rosenthal, J.S. (1995). Optimal scaling of discrete approximations to Langevin diffusions. Research Report no. 95-11, Statistical Laboratory, Cambridge University.

Roberts, G.O. and Tweedie, R.L. (1996a). Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms. Biometrika 83, 95-110.

Roberts, G.O. and Tweedie, R.L. (1996b). Exponential convergence of Langevin diffusions and their discrete approximations. Bernoulli 2, 341-363.

Stoyan, D., Kendall, W.S. and Mecke, J. (1995). Stochastic Geometry and Its Applications. 2nd ed. Wiley, Chichester.

Stoyan, D. and Stoyan, H. (1994). Fractals, Random Shapes and Point Fields. Wiley, Chichester.

Thomas, M. (1949). A generalization of Poisson's binomial limit for use in ecology. Biometrika 36, 18-25.

Wackernagel, H. (1995). Multivariate Geostatistics. Springer-Verlag, Berlin.

Widom, B. and Rowlinson, J.S. (1970). New models for the study of liquid-vapor phase transitions. J. Chem. Physics 52, 1670-1684.

Wood, A.T.A. and Chan, G. (1994). Simulation of stationary Gaussian processes in [0, 1]^d. J. Computational and Graphical Statistics 3, 409-432.

Yaglom, A.M. (1986). Correlation Theory of Stationary and Related Random Functions I. Springer, Berlin.

Jesper Møller, Department of Mathematics, Aalborg University, Fredrik Bajers Vej 7E, DK-9220 Aalborg Ø, Denmark. Email: [email protected]

Anne Randi Syversveen, Department of Mathematical Sciences, The Norwegian University of Science and Technology, N-7034 Trondheim, Norway. Email: [email protected]

Rasmus Plenge Waagepetersen, Department of Theoretical Statistics and Operations Research, Department of Mathematical Sciences, University of Aarhus, DK-8000 Århus C, Denmark. Email: [email protected]

Appendix: A Method for Approximate Fully Bayesian Analysis of Parameters

A METHOD FOR APPROXIMATE FULLY BAYESIAN ANALYSIS OF PARAMETERS

Anne Randi Syversveen and Håvard Rue

Department of Mathematical Sciences Norwegian University of Science and Technology N-7034 Trondheim, Norway

Key Words: marked point process; parameter estimation; Markov chain Monte Carlo; reservoir characterization

ABSTRACT

Stochastic modeling of the geology in petroleum reservoirs has become an important tool for investigating flow properties in the reservoir. The stochastic models used contain parameters which must be estimated from observations and geological knowledge. The amount of data available is, however, quite limited due to high drilling costs, and the lack of data prevents the use of many of the standard data-driven approaches to the parameter estimation problem. Modern simulation-based methods using Markov chain Monte Carlo simulation can, however, be used to do fully Bayesian analysis with respect to the parameters in the reservoir model, with the drawback of relatively high computational costs. In this paper, we propose a simple, relatively fast approximate method for fully Bayesian analysis of the parameters. We illustrate the method on both simulated and real data using a two-dimensional marked point model for reservoir characterization.

1 INTRODUCTION

One problem in the evaluation of petroleum reservoirs is to characterize the geology in order to investigate the flow properties. An example is the location of shales in a background of sand, where shales create barriers to fluid flow, and sandstone



FIG. 1: Outcrop from the San Juan basin, Rocky Mountains, showing a cross section of meandering channel sandstone embedded in lower to upper delta plain mudstone, siltstone and coal.

allows fluid to flow easily. Sources for information about reservoir characteristics can be well observations, seismic data, and previous production history. In the following we will consider only well observations. Usually, the number of wells is small, due to the high drilling costs. Therefore, data will be scarce. To illustrate what the geology may look like, we show in Figure 1 an outcrop taken from the Safari (1995) database collected by Norsk Hydro, Statoil, Saga Petroleum and The Norwegian Petroleum Directorate. The outcrop has objects of permeable sandstone in a background of non-permeable shale, and is believed to be an analog to a non-observable oil reservoir.

Haldorsen and Lake (1984) described a simple marked point model for shales in a vertical cross section of a sand matrix. The model had several severe weaknesses; it did not contain spatial interaction between the shales, and it was not possible to condition on observations from more than one well for each shale. Recently, Syversveen and Omre (1997) presented an extended model with pairwise spatial interaction between objects. The model, which is described in Section 2, can handle conditioning on observations from more than one well for each object. However, the important problem of estimating the parameters in the model was not completely treated. Some of the parameters are naturally estimated from data

combined with geological knowledge, and in this paper we will consider estimation of these parameters. Prior distributions for the parameters are specified based on geological experience and combined with the observed data within the Bayesian framework. Because data are so scarce, we need the complete posterior distribution for the parameters, and not only a point estimate, to judge how the data influence the prior distribution. As a consequence, more classical methods for parameter estimation in point processes (see for example the review in Diggle et al. (1994)) are not appropriate for our problem.

FIG. 2: Model for a single object.

We can, however, do a fully Bayesian analysis of the parameters by use of modern Markov chain Monte Carlo (MCMC) methods, which are iterative simulation-based approaches. In Section 3 we give an introduction to these methods. MCMC methods are often computationally intensive, and in Section 4 we propose a relatively fast, approximate method for doing a fully Bayesian analysis. An important assumption is a quite low information content in the data, which is fulfilled in our problem where the data are scarce. The proposed method is presented in Section 4 and tested in Section 5 on simulated data and on the outcrop in Figure 1.

2 THE MODEL

We will now present a two-dimensional model for cross sections of the reservoir, like the outcrop shown in Figure 1. Practical use of such a model involves an extension to the three-dimensional space, but we will not discuss this here. The framework is a marked point process where the object is the “mark” and its location the “point”.

Each object is modeled as a rectangle with center position $C = (C^1, C^2)$, length and height $S = (L, H)$, and a deformation of top and bottom represented by a pair of random functions $Z(\cdot) = (Z_t(\cdot), Z_b(\cdot))$. We write $U = (C, S, Z(\cdot))$, see Figure 2. The height of the rectangle is equal to the expected height of the object, while the length of the object equals the length of the rectangle. The deformations $Z(\cdot)$ on top and bottom are Gaussian random fields, present in order to fulfill the conditioning on the data. As the deformation $Z(\cdot)$ is not important in the parameter estimation problem, we will not discuss it further.

The prior pdf for $N$ objects in a domain is defined in the following way: We let $\theta_i = (c_i^1, c_i^2, l_i, h_i)$ denote the rectangle associated with object number $i$, and $z_i(\cdot)$ its top and bottom positions. The joint density for the $N$ objects can be written

$$f(\theta, n, z(\cdot)) = f(\theta, n) f(z(\cdot) \mid \theta, n), \qquad (1)$$

where $\theta = (\theta_1, \ldots, \theta_n)$ and $z(\cdot) = (z_1(\cdot), \ldots, z_n(\cdot))$. (We use lower case letters to represent realizations of random variables.) The last term, $f(z(\cdot) \mid \theta, n)$, is given by Gaussian random fields (see Syversveen and Omre (1997)). The first term, $f(\theta, n)$, is a marked point process with pairwise interaction (Stoyan, Kendall, and Mecke 1995) of the form

$$f_{a,r}(\theta, n) = \frac{1}{c(a,r)} \exp\Big\{ \sum_{i=1}^{n} \big( b_1(c_i^1, l_i) + b_2(c_i^2, h_i) \big) + \sum_{i<j} d_r(c_i, c_j) + na \Big\} = h_{a,r}(\theta, n)/c(a,r). \qquad (2)$$

Here $c(a,r)$ is an unknown function of the parameters $a$ and $r$, and the term $b_1(c_i^1, l_i) + b_2(c_i^2, h_i)$ represents the marginal density for a single object if no interaction were present. The function $d_r(c_i, c_j)$ is a pairwise interaction function between center-points of objects, defined as

$$d_r(x_i, x_j) = \begin{cases} \ln(|x_i - x_j|/r) & \text{if } |x_i - x_j| < r \\ 0 & \text{if } |x_i - x_j| \geq r, \end{cases} \qquad (3)$$

where $|x_i - x_j|$ is the Euclidean distance between $x_i$ and $x_j$. It might have been more natural to relate the interaction between the objects to the size of the objects and not only to their center-points. However, we believe that interaction between center-points models the interaction sufficiently well, at least to first order. The parameter $a$ in equation (2) is related to the intensity of objects; increasing the value of $a$ yields an increasing (expected) number of objects.
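The interaction function (3) is simple to implement; the following is a minimal sketch (function names are ours, not from the paper) of $d_r$ and of the pair sum that enters the exponent of (2):

```python
import math

def d_r(x_i, x_j, r):
    """Pairwise interaction between two center-points, equation (3):
    ln(|x_i - x_j| / r) for distances below the interaction range r,
    and zero for distances at or beyond r."""
    dist = math.hypot(x_i[0] - x_j[0], x_i[1] - x_j[1])
    return math.log(dist / r) if dist < r else 0.0

def interaction_sum(centers, r):
    """Sum of d_r over all unordered pairs of center-points,
    i.e. the double sum over i < j in equation (2)."""
    n = len(centers)
    return sum(d_r(centers[i], centers[j], r)
               for i in range(n) for j in range(i + 1, n))
```

Note that $d_r \to -\infty$ as two centers approach each other, so configurations with nearly coincident objects get vanishingly small density: a soft repulsion with range $r$.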

The parameters to be estimated are the intensity parameter $a$ and the interaction parameter $r$. The data are the dimension of the rectangular reservoir and exact observations of the reservoir along the transects of a number of wells. It is known which observations belong to the same object. Observations of depth to top and bottom positions of objects are contained in the vector $o$. Note that neither $\theta$ nor $n$ can be observed directly. The variable $\theta$ is not observable because of the way it is defined in the model; center-points and expected heights of objects are never observed. The number of objects $n$ is unobservable since only well observations are available, not the whole reservoir. This means that the model contains hidden variables, which makes the parameter estimation problem non-standard.

Taking observations into account, a posterior pdf for $Z$, $\Theta$, and $N$ can be defined. See Syversveen and Omre (1997) for a definition of the likelihood function. The posterior pdf for $(\Theta, N)$ is written as

$$f_{a,r}(\theta, n \mid o) = f(\theta, n) f(o \mid \theta, n)/c'_o(a,r)$$
$$= \exp\Big\{ \sum_{i=1}^{n} \big( b_1(c_i^1, l_i \mid o) + b_2(c_i^2, h_i \mid o) \big) + \sum_{i<j} d_r(c_i, c_j) + na \Big\} I(o \mid n)/c_o(a,r). \qquad (4)$$

The function $I(o \mid n)$ is an indicator function which equals one when the conditioning is fulfilled and zero otherwise, and $c'_o(a,r)$ and $c_o(a,r)$ are unknown normalizing functions.

When $a$ and $r$ are given constants, we use the Metropolis-Hastings algorithm (Metropolis et al. 1953, Hastings 1970) to simulate from the distribution defined by the pdf (4). Proposals and acceptance rates are described in detail in Syversveen and Omre (1997). When $a$ and $r$ are unknown, they can be treated as stochastic variables, and we would wish to simulate from the joint density of $\Theta$, $N$, $A$ and $R$. This is, however, difficult, since the normalizing function $c_o(a,r)$ is unknown. We return to this in Section 3.2.

3 PARAMETER ESTIMATION BY MONTE CARLO

In this section we review two parameter estimation methods based on Markov chain Monte Carlo (MCMC) simulation. By the first method we obtain the maximum a posteriori (MAP) estimator through a minor modification of the maximum likelihood estimator. The second method gives realizations from the posterior distribution for the parameters, from which estimators such as the posterior mean can easily be obtained.

The basic idea behind MCMC simulation is to simulate an ergodic Markov chain having the desired distribution $\pi(x)$ as its limiting distribution. The transition kernels can be defined in several ways, leading to different algorithms. Good review papers on MCMC are Besag et al. (1995) and Tierney (1994). Green (1995) describes how to do MCMC when the state space is a union of spaces of different dimensionality, and covers MCMC for spatial point processes as discussed by Geyer and Møller (1994).
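The accept/reject core of such a chain can be sketched generically. This toy version (our naming; a fixed-dimension random walk, so without the birth/death moves that a varying number of objects requires) shows that only the unnormalized density is needed:

```python
import math
import random

def metropolis_hastings(log_h, x0, step, n_iter, seed=0):
    """Metropolis-Hastings with a symmetric Gaussian random-walk proposal:
    a move from x to x' is accepted with probability min(1, h(x')/h(x)),
    so the normalizing constant of the target cancels."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_iter):
        x_new = x + rng.gauss(0.0, step)
        # Accept/reject on the log scale to avoid overflow.
        if math.log(rng.random()) < log_h(x_new) - log_h(x):
            x = x_new
        samples.append(x)
    return samples

# Targeting a standard Gaussian via its unnormalized log density -x^2/2.
chain = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 1.0, 20000)
```

The empirical mean and variance of `chain` should settle near 0 and 1; the ergodic-limit argument above is exactly what justifies using such averages as estimates.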

3.1 The MAP estimator

The MAP estimator can be obtained by a minor modification of the maximum likelihood estimator, so we start with the latter. Monte Carlo maximum likelihood (MCML) is described by Geyer and Thompson (1992) and Geyer (1994b). The method is well suited for normalized families of densities with pdf of the form (2), where $c(a,r)$ is the normalizer of the family and $h_{a,r}(\theta, n)$ is the unnormalized density. Note that $c(a,r)$ is an unknown function. MCMC methods can be used to generate realizations from such distributions, as mentioned in Section 2.

Suppose first that we can observe $(\theta, n)$. Let the log-likelihood corresponding to an observation of $(\theta, n)$ be the log-likelihood ratio against an arbitrary fixed parameter point $(a_0, r_0)$,

$$l(a,r) = \ln\Big( \frac{h_{a,r}(\theta, n)}{h_{a_0,r_0}(\theta, n)} \Big) - \ln\Big( \frac{c(a,r)}{c(a_0,r_0)} \Big).$$

The last term is equal to $\ln\big( E_{a_0,r_0}\big[ h_{a,r}(\Theta, N)/h_{a_0,r_0}(\Theta, N) \big] \big)$, since

$$E_{a_0,r_0}\Big[ \frac{h_{a,r}(\Theta, N)}{h_{a_0,r_0}(\Theta, N)} \Big] = \int \frac{h_{a,r}(\theta, n)}{h_{a_0,r_0}(\theta, n)} \, \frac{h_{a_0,r_0}(\theta, n)}{c(a_0,r_0)} \, d\mu(\theta, n) = \frac{c(a,r)}{c(a_0,r_0)}. \qquad (5)$$

This gives the following Monte Carlo approximation of $l(a,r)$:

$$\hat{l}_{a_0,r_0}(a,r) = \ln\Big( \frac{h_{a,r}(\theta, n)}{h_{a_0,r_0}(\theta, n)} \Big) - \ln\Big( \frac{1}{m} \sum_{i=1}^{m} \frac{h_{a,r}(\theta_i, n_i)}{h_{a_0,r_0}(\theta_i, n_i)} \Big), \qquad (6)$$

where $(\theta_1, n_1), (\theta_2, n_2), \ldots$ are samples from $f_{a_0,r_0}(\theta, n)$ generated by MCMC. The Monte Carlo approximation $\hat{l}_{a_0,r_0}(a,r)$ converges almost surely to $l(a,r)$ for any fixed $(a,r)$ as the number of samples $m$ goes to infinity.
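The identity (5), writing a ratio of normalizing constants as an expectation under the reference parameter, can be checked on a toy family where the normalizer is known. This sketch (all names are ours) uses $h_p(n) = p^n$ on the nonnegative integers, with normalizer $c(p) = 1/(1-p)$, so the normalized density is geometric:

```python
import math
import random

def ratio_of_normalizers(log_h_new, log_h_ref, samples):
    """Monte Carlo estimate of c_new/c_ref = E_ref[h_new(X)/h_ref(X)],
    the identity in equation (5), from samples of the reference model."""
    return sum(math.exp(log_h_new(x) - log_h_ref(x)) for x in samples) / len(samples)

rng = random.Random(1)
def sample_geometric(p):
    # P(n) = (1 - p) p^n for n = 0, 1, 2, ...
    n = 0
    while rng.random() < p:
        n += 1
    return n

samples = [sample_geometric(0.3) for _ in range(200000)]
est = ratio_of_normalizers(lambda n: n * math.log(0.5),
                           lambda n: n * math.log(0.3), samples)
# est should be close to c(0.5)/c(0.3) = (1 - 0.3)/(1 - 0.5) = 1.4.
```

As in (6), the accuracy depends on the variability of the importance ratio; moving the "new" parameter far from the reference inflates the Monte Carlo variance, which is why the method works best locally around $(a_0, r_0)$.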

However, when we have hidden variables as described in Section 2, the first term in (6) cannot be computed, because $(\theta, n)$ is not fully observed. We only observe $o$, which is a part of $(\theta, n)$. Instead, the first term must be estimated by simulations from the posterior distribution (4). The log-likelihood is now

$$l(a,r) = \ln\Big( E_{a_0,r_0}\Big[ \frac{h_{a,r}(\Theta, N \mid o)}{h_{a_0,r_0}(\Theta, N \mid o)} \Big] \Big) - \ln\Big( E_{a_0,r_0}\Big[ \frac{h_{a,r}(\Theta, N)}{h_{a_0,r_0}(\Theta, N)} \Big] \Big), \qquad (7)$$

where the first expectation is obtained by integrating over the unobserved part of $(\theta, n)$, and the second expectation follows from equation (5). The Monte Carlo approximation becomes

$$\hat{l}_{a_0,r_0}(a,r) = \ln\Big( \frac{1}{m} \sum_{i=1}^{m} \frac{h_{a,r}(\theta_i, n_i \mid o)}{h_{a_0,r_0}(\theta_i, n_i \mid o)} \Big) - \ln\Big( \frac{1}{m} \sum_{i=1}^{m} \frac{h_{a,r}(\theta_i, n_i)}{h_{a_0,r_0}(\theta_i, n_i)} \Big). \qquad (8)$$

The first term contains samples of $(\Theta, N)$ from the posterior distribution (4), and the second term contains samples from the prior distribution (2), all sampled with parameter values $(a_0, r_0)$.

When the parameters $a, r$ are given a prior distribution $f(a,r)$, the MCML method can be extended to give the maximum a posteriori (MAP) estimator. Instead of maximizing the likelihood, we maximize the posterior density ratio $f(a,r \mid o)/f(a_0,r_0 \mid o)$ of $(a,r)$ and $(a_0, r_0)$. The expression to be maximized with respect to $a, r$ is now

$$\hat{l}_{a_0,r_0}(a,r) = \ln\Big( \frac{1}{m} \sum_{i=1}^{m} \frac{h_{a,r}(\theta_i, n_i \mid o)}{h_{a_0,r_0}(\theta_i, n_i \mid o)} \Big) - \ln\Big( \frac{1}{m} \sum_{i=1}^{m} \frac{h_{a,r}(\theta_i, n_i)}{h_{a_0,r_0}(\theta_i, n_i)} \Big) + \ln\Big( \frac{f(a,r)}{f(a_0,r_0)} \Big). \qquad (9)$$

Note that $f(a_0,r_0)$ can be omitted in the last term; it has been included as a normalizing term in order to get $\hat{l}_{a_0,r_0}(a_0,r_0) = 0$, as in the MCML case.

3.2 Simulated tempering

Simulated tempering is a way to use MCMC methods to sample from the whole posterior distribution for the parameters, in contrast to the MAP estimator, which only locates the mode of the posterior pdf. The goal is to sample from the joint posterior distribution of $(\theta, n)$ and the parameters $(a,r)$. Samples from the posterior distribution for $(a,r)$ are then obtained by collecting the simulated values of $(a,r)$. The joint posterior probability density function for $(\theta, n)$ and $(a,r)$ is

$$f(\theta, n, a, r \mid o) \propto f_{a,r}(\theta, n \mid o) f(a,r) = h_{a,r}(\theta, n \mid o) f(a,r)/c_o(a,r),$$

using equation (4). Since $c_o(a,r)$ is unknown, it must be estimated. This can be done by using reverse logistic regression, see Geyer (1994a). The idea is to sample from several distributions with different values of $(a,r)$, and treat these as samples from a so-called mixture distribution.

The sampling from $f(\theta, n, a, r \mid o)$ is done by MCMC with two types of transitions, changing $(\theta, n)$ or changing $(a,r)$. In practice, $c_o(a,r)$ must be precomputed on a grid, since the sampling procedure will otherwise be too time-consuming.

Higdon et al. (1995) and Weir (1995) describe fully Bayesian analysis for images constructed from emission computed tomography data following the guidelines above.

4 APPROXIMATE FULLY BAYESIAN ESTIMATION

We have in Section 3 discussed how to use modern Markov chain Monte Carlo methods to do Bayesian inference for the two parameters of interest in the model described in Section 2, the intensity parameter and the interaction parameter. The fully Bayesian approach is preferred in our case, as it gives a precise description of how the (limited) information contained in the observed data influences the posterior pdf of the parameters. From a computational point of view, however, the MAP estimator is preferable, as it is simpler and faster to compute than the fully Bayesian analysis through the more complicated simulated tempering approach.

To describe our approximate method for a fully Bayesian analysis, assume for a moment that the parametric family of the posterior pdf for the parameters of interest is known. Further assume that the parameters in the pdf can be determined from the local characteristics at the mode. As an example, let the posterior distribution belong to the one-dimensional Gaussian family. The two parameters, the expectation and the variance, are then uniquely determined from the mode and the second derivative of the logarithm of the density, respectively, using the fact that $\frac{d^2}{dx^2} \ln f(x) = -1/\sigma^2$.

Our method for an approximate fully Bayesian analysis of the parameters is based on the assumption of a known parametric family for the posterior pdf of the parameters of interest. With low information content in the observed data, it is quite accurate to assume that the posterior pdf belongs to the same family as the prior pdf. We compute the MAP estimator and then estimate the parameters in the posterior pdf from estimated local characteristics, to obtain the full posterior pdf for the parameters. The posterior pdf can of course be used later to obtain a point estimate by minimizing a suitable loss function, for example to obtain the posterior mean (which may or may not coincide with the mode).
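For the one-dimensional Gaussian case, the idea can be sketched numerically (helper name ours): the variance is recovered from the curvature of the unnormalized log density at the mode, without ever computing the normalizing constant.

```python
def curvature_at_mode(log_h, x_mode, h=1e-3):
    """Second derivative of the unnormalized log density at the mode,
    by a central finite difference. For a Gaussian posterior this equals
    -1/sigma^2, so sigma^2 = -1/curvature."""
    return (log_h(x_mode - h) - 2.0 * log_h(x_mode) + log_h(x_mode + h)) / h ** 2

# Unnormalized Gaussian with mean 2.0 and variance 4.0.
log_h = lambda x: -(x - 2.0) ** 2 / (2.0 * 4.0)
sigma2 = -1.0 / curvature_at_mode(log_h, 2.0)
# sigma2 recovers the true variance 4.0 (exactly, since the log density is quadratic).
```

The same recipe with a simulated, noisy log posterior ratio in place of `log_h` is what Section 5 carries out in two dimensions.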

The main advantage of our approach is the reduction of computational complexity, both with respect to coding effort and computational cost. However, the assumption of a known parametric family for the posterior pdf must hold, at least approximately. In the next section we illustrate our approach on the presented model by analyzing a simulated data set and the outcrop in Figure 1. We also demonstrate how to check the validity of the assumption of a known parametric family for the posterior pdf.

5 EXAMPLES

In this section we illustrate how to apply approximate fully Bayesian estimation in the reservoir characterization model described in Section 2, using the idea outlined in Section 4. The parameters to be estimated are the intensity parameter $a$ and the interaction parameter $r$. We assign a joint Gaussian prior distribution to the parameters, for two reasons. First, it has an easy interpretation for the geologists specifying the priors. Second, this choice simplifies the presentation of our main idea.

The parameters in the prior distribution can be set as follows. Based on simulations from the prior model, the geologist can evaluate the realizations and thereby gain insight into how the parameters affect the distribution of the number of objects and how the objects distribute within the window of interest.

We will now see how we can approximate the posterior pdf for the parameters by a bivariate Gaussian pdf, where the parameters are estimated by finite differences, which are in turn estimated by simulation. The simulated logarithm of

9 the posterior ratio (9) can be written as

$$\hat{l}_{a_0,r_0}(a,r) = \ln\Big[ \frac{1}{m} \sum_{i=1}^{m} \frac{\exp\big( \sum_{j<k} d_r(c_j^i, c_k^i) + n_i a \big)}{\exp\big( \sum_{j<k} d_{r_0}(c_j^i, c_k^i) + n_i a_0 \big)} \Big] - \ln\Big[ \frac{1}{m} \sum_{i=1}^{m} \frac{\exp\big( \sum_{j<k} d_r(c_j^i, c_k^i) + n_i a \big)}{\exp\big( \sum_{j<k} d_{r_0}(c_j^i, c_k^i) + n_i a_0 \big)} \Big] + \ln\Big[ \frac{f(a,r)}{f(a_0,r_0)} \Big], \qquad (10)$$

where the first term comes from simulations from the posterior model for $(\Theta, N)$ and the second term from simulations from the prior model. (The terms $b_1$ and $b_2$ cancel in the ratios, since they do not depend on $a$ and $r$.) In the Gaussian distribution, the mean equals the mode, so the posterior means for $a$ and $r$ are the values that maximize (10). The posterior covariance matrix is determined as follows. Let the covariance matrix be denoted

$$\Sigma = \begin{pmatrix} \sigma_a^2 & \rho \sigma_a \sigma_r \\ \rho \sigma_a \sigma_r & \sigma_r^2 \end{pmatrix}.$$

Let $f(a,r \mid o)$ be the Gaussian posterior density for $a$ and $r$. Then we have that

$$\frac{\partial^2 \ln f(a,r \mid o)}{\partial a^2} = \frac{-1}{\sigma_a^2 (1 - \rho^2)}, \qquad (11)$$

$$\frac{\partial^2 \ln f(a,r \mid o)}{\partial r^2} = \frac{-1}{\sigma_r^2 (1 - \rho^2)}, \qquad (12)$$

$$\frac{\partial^2 \ln f(a,r \mid o)}{\partial a \, \partial r} = \frac{\rho}{\sigma_a \sigma_r (1 - \rho^2)}. \qquad (13)$$

Numerical values for the second derivatives are estimated from the simulations using standard finite difference approximations,

$$\frac{\partial^2 \hat{l}_{a_0,r_0}(a,r)}{\partial a^2} = \frac{1}{h_1^2} \big[ \hat{l}_{a_0,r_0}(a - h_1, r) - 2 \hat{l}_{a_0,r_0}(a, r) + \hat{l}_{a_0,r_0}(a + h_1, r) \big] + O(h_1^2), \qquad (14)$$

$$\frac{\partial^2 \hat{l}_{a_0,r_0}(a,r)}{\partial r^2} = \frac{1}{h_2^2} \big[ \hat{l}_{a_0,r_0}(a, r - h_2) - 2 \hat{l}_{a_0,r_0}(a, r) + \hat{l}_{a_0,r_0}(a, r + h_2) \big] + O(h_2^2), \qquad (15)$$

$$\frac{\partial^2 \hat{l}_{a_0,r_0}(a,r)}{\partial a \, \partial r} = \frac{1}{4 h_1 h_2} \big[ \hat{l}_{a_0,r_0}(a + h_1, r + h_2) - \hat{l}_{a_0,r_0}(a + h_1, r - h_2) - \hat{l}_{a_0,r_0}(a - h_1, r + h_2) + \hat{l}_{a_0,r_0}(a - h_1, r - h_2) \big] + O(h_1^2) + O(h_2^2), \qquad (16)$$

where the error terms are functions of fourth order derivatives. Under the assumption that the posterior is Gaussian, the error terms are all zero, since fourth order derivatives are zero.

Setting the analytical expressions (11)-(13) for the second derivatives of the posterior pdf equal to the empirical estimates based on equations (14)-(16), respectively, we obtain an estimate for $\Sigma$.
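This step can be sketched numerically (function names ours): form the finite-difference Hessian from (14)-(16) and invert its negative to obtain $\Sigma$, since under the Gaussian assumption the Hessian of the log posterior at the mode equals $-\Sigma^{-1}$. As a check, applying it to an exact bivariate Gaussian log density with the covariance used as the prior in Section 5.2 recovers that matrix, because central differences are exact on quadratics:

```python
def hessian_2x2(f, a, r, h1, h2):
    """Central finite-difference Hessian of f at (a, r), equations (14)-(16)."""
    daa = (f(a - h1, r) - 2.0 * f(a, r) + f(a + h1, r)) / h1 ** 2
    drr = (f(a, r - h2) - 2.0 * f(a, r) + f(a, r + h2)) / h2 ** 2
    dar = (f(a + h1, r + h2) - f(a + h1, r - h2)
           - f(a - h1, r + h2) + f(a - h1, r - h2)) / (4.0 * h1 * h2)
    return [[daa, dar], [dar, drr]]

def covariance_from_hessian(H):
    """Invert the negative 2x2 Hessian: Sigma = (-H)^(-1)."""
    p11, p12, p22 = -H[0][0], -H[0][1], -H[1][1]
    det = p11 * p22 - p12 * p12
    return [[p22 / det, -p12 / det], [-p12 / det, p11 / det]]

# Check on a bivariate Gaussian log density with Sigma = [[1.0, 2.5], [2.5, 25.0]].
det = 1.0 * 25.0 - 2.5 * 2.5                        # det(Sigma) = 18.75
i11, i12, i22 = 25.0 / det, -2.5 / det, 1.0 / det   # entries of inverse(Sigma)
log_f = lambda a, r: -0.5 * (i11 * a * a + 2.0 * i12 * a * r + i22 * r * r)
S = covariance_from_hessian(hessian_2x2(log_f, 0.0, 0.0, 0.1, 1.0))
```

With the noisy, simulated $\hat{l}_{a_0,r_0}$ in place of `log_f`, the same two functions give the posterior covariance estimate used in Section 5.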

The rate of convergence of MCML (for obtaining the MAP estimate) depends on the variability of the terms in the first two sums in (10); the more variability, the slower the convergence. See for example Geyer and Thompson (1992) for details on this point. We can therefore monitor the variability of each of the two terms in (10) to gain insight into how many samples from the Markov chain we need to get a reasonable estimate.

Expression (10) is only calculated for values of $a$ and $r$ on a grid spaced by $h_1$ and $h_2$, to avoid storage problems. Since the posterior ratio is reliable only in a small area around $(a_0, r_0)$, we iterate until $a_0$ and $r_0$ maximize (10). If we want an estimate of the uncertainty in the estimate for $\Sigma$, we can repeat this procedure with these values for $a_0$ and $r_0$ and get an estimate of the Monte Carlo error.

The goodness of the approximation to the posterior distribution can be checked by comparing posterior density ratios $f(a,r \mid o)/f(a_0,r_0 \mid o)$ from the (approximate) analytical expression with results from simulations, given by the exponential of expression (10). The comparison can be done for different parameter values $a$, $r$, $a_0$, and $r_0$.

5.1 Analyzing a simulated data set

In our first example, we analyze a simulated data set, where the parameters $a$ and $r$ are drawn from their prior and the configuration of objects is then simulated from the model to give our petroleum reservoir. The prior mean is $[4.0 \; 30.0]^T$ and the covariance matrix is

$$\Sigma = \begin{pmatrix} \cdot & 5.0 \\ 5.0 & 100.0 \end{pmatrix}.$$

It is reasonable to introduce positive correlation between $a$ and $r$, because the intensity of objects is influenced by the interaction parameter. Figure 3 shows the realization to the left and the available observations to the right. We drill four wells at horizontal positions 20, 40, 60, and 80, and observe the top and bottom of the objects in the wells.

11 FIG. 3: To the left, realization from the prior model, regarded as the “true” reser­ voir. To the right, observations of top and bottom positions of objects collected from wells. Same symbol indicates same object.

We use the discretization parameters $h_1 = 0.1$ and $h_2 = 1.0$ and collect 10000 samples, chosen as every 10th iteration after 1000 initial stabilizing iterations. This is repeated 10 times, all runs resulting in the same mode, and based on this, the posterior covariance matrix is estimated. The posterior distribution for $a$ and $r$ is then (approximately) Gaussian, with mean $\mu = [4.1 \; 30.0]^T$ and covariance

$$\Sigma = \begin{pmatrix} 0.62 \pm 0.01 & 4.23 \pm 0.09 \\ 4.23 \pm 0.09 & 66.4 \pm 2.1 \end{pmatrix},$$

with the uncertainty given as one standard deviation. We see that the mean is almost the same as in the prior distribution, but the data have modified the covariance matrix. The variances of $a$ and $r$ have decreased compared to the prior distribution; that is, the observations have reduced the uncertainty about the parameter values.

We have checked the goodness of the approximation to the posterior distribution by computing the posterior density ratio, both from simulations and from the estimated density, for various values of $a$ and $r$ both close to and more distant from $a_0$ and $r_0$. The simulated values are based on 10000 samples. Table I shows the estimated differences.

TABLE I: Posterior ratio for the example with simulated data, with $a_0 = 2.8$, $r_0 = 18.0$. Entries are the difference between the estimated density and the simulations.

r \ a    2.6      2.7      2.8      2.9      3.0
16.0   -0.021   -0.021   -0.009    0.018    0.061
17.0   -0.008   -0.013   -0.009    0.011    0.046
18.0    0.011    0.002    0.000    0.011    0.037
19.0    0.034    0.023    0.016    0.019    0.036
20.0    0.059    0.048    0.039    0.037    0.045

5.2 Analyzing a real data set

In our last example we apply our method to the outcrop shown in Figure 1. The outcrop is taken from the San Juan basin, Rocky Mountains, and shows a cross section of meandering channel sandstone embedded in lower to upper delta plain mudstone, siltstone, and coal. The objects are sand and the background is non-permeable shale. The size of the formation is rescaled and taken to be 100 x 16 units, where one unit is about 11 meters. Three wells are drilled at positions 20, 48, and 58 units, and from these wells we collect observations. We chose the prior pdf for the two parameters as follows. We varied the parameters in the model in order to obtain a structure for realizations from the model comparable with other similar outcrops from the Safari database. This resulted in the following choice for $(a,r)$: mean $[9.0 \; 20.0]^T$ and covariance

$$\Sigma = \begin{pmatrix} 1.0 & 2.5 \\ 2.5 & 25.0 \end{pmatrix}.$$

We used the discretization parameters $h_1 = 0.1$ and $h_2 = 1.0$. From 150000 samples from the posterior pdf, we obtained the following estimates of the parameters of the posterior pdf: mean $[8.8 \; 24.0]^T$ and covariance

$$\Sigma = \begin{pmatrix} 1.06 & 2.61 \\ 2.61 & 16.6 \end{pmatrix}.$$

We have also compared the posterior density ratio obtained analytically and by simulation. The difference is small, which supports our choice of parametric family for the posterior pdf.

13 ACKNOWLEDGEMENTS

We thank the Safari project for giving us permission to use data from the Menefee formation. The first author is funded by a PhD grant from The Research Council of Norway.

BIBLIOGRAPHY

Besag, J., P. Green, D. Higdon, and K. Mengersen (1995). Bayesian computation and stochastic systems (with discussion). Statistical Science 10(1), 3-66.

Diggle, P., T. Fiksel, P. Grabarnik, Y. Ogata, D. Stoyan, and M. Tanemura (1994). On parameter estimation for pairwise interaction point processes. Int. Statist. Rev. 62(1), 99-117.

Geyer, C. (1994a). Estimating normalizing constants and reweighting mixtures in Markov chain Monte Carlo. Technical report, School of Statistics, University of Minnesota.

Geyer, C. (1994b). On the convergence of Monte Carlo maximum likelihood calculations. J. R. Stat. Soc. B 56(1), 261-274.

Geyer, C. and J. Møller (1994). Simulation procedures and likelihood inference for spatial point processes. Scand. J. of Stat. 21(4), 359-373.

Geyer, C. and E. Thompson (1992). Constrained Monte Carlo maximum likelihood for dependent data. J. R. Stat. Soc. B 54(3), 657-699.

Green, P. J. (1995). Reversible jump MCMC computation and Bayesian model determination. Biometrika 82(4), 711-732.

Haldorsen, H. and L. Lake (1984). A new approach to shale management in field-scale models. Society of Petroleum Engineers Journal, 447-457.

Hastings, W. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97-109.

Higdon, D., V. Johnson, T. Turkington, J. Bowsher, D. Gilland, and R. Jaszczak (1995). Fully Bayesian estimation of Gibbs hyperparameters for emission computed tomography data. Technical report, ISDS, Duke University. To appear in IEEE-TMI.

Metropolis, N., A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics 21, 1087-1092.

Norsk Hydro, Statoil, Saga Petroleum, The Norwegian Petroleum Directorate (1995). Safari: Sedimentary Architecture of Field Analogues for Reservoir Information. Norsk Hydro, Statoil, Saga Petroleum, The Norwegian Petroleum Directorate.

Stoyan, D., W. Kendall, and J. Mecke (1995). Stochastic Geometry and Its Applications. 2nd ed. Wiley.

Syversveen, A. and H. Omre (1997). Conditioning of marked point processes within a Bayesian framework. To appear in Scandinavian Journal of Statistics.

Tierney, L. (1994). Markov chains for exploring posterior distributions (with discussion). The Annals of Statistics 22(4), 1701-1762.

Weir, I. S. (1995). Fully Bayesian reconstruction from single photon emission computed tomography data. To appear in Journal of the American Statistical Association.
