Julian Besag FRS, 1945–2010

Peter Green

School of Mathematics

23 August 2011 / ISI World Statistics Congress

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 1 / 57 Outline

1 Biography

2 Methodology

3 Applications

4 Last seminars

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 2 / 57 Biography

Julian Besag as a young boy

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 3 / 57 Biography A brief biography

1945 Born in Loughborough 1965–68 BSc Mathematical Statistics, Birmingham 1968–69 Research Assistant to Maurice Bartlett, Oxford 1970–75 Lecturer in Statistics, Liverpool 1975–89 Reader (from 1985, Professor), Durham 1989–90 Visiting professor, U Washington 1990–91 Professor, Newcastle-upon-Tyne 1991–2007 Professor, U Washington 2007–09 Visiting professor, Bath 2010 Died in Bristol

Visiting appointments in Oxford, Princeton, Western Australia, ISI New Delhi, PBI Cambridge, Carnegie-Mellon, Stanford, Newcastle-u-Tyne, Washington, CWI Amsterdam, Bristol

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 4 / 57 Biography

Julian Besag in 1976

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 5 / 57 Methodology Methodology

Spatial statistics Modelling, conditional formulations Frequentist and Algebra of interacting systems Digital image analysis Monte Carlo computation and hypothesis testing Markov chain Monte Carlo methods Exploratory data analysis

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 6 / 57 Methodology Spatial statistics: Modelling, conditional formulations Modelling, conditional formulations: key papers

Nearest-neighbour systems and the auto-logistic model for binary data. Journal of the Royal Statistical Society B (1972). Spatial interaction and the statistical analysis of lattice systems (with Discussion). Journal of the Royal Statistical Society B (1974). On spatial-temporal models and Markov fields. Proceedings of the 10th (1974) European Meeting of Statisticians (1977).

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 7 / 57 Methodology Spatial statistics: Modelling, conditional formulations

The 1974 paper in JRSS(B) 192 [No.2,

Spatial Interactionand the StatisticalAnalysis of LatticeSystems

ByJULIAN BESAG UniversityofLiverpool [Read beforethe ROYAL STATISTICAL SOCIETY at a meetingorganized by the RESEARCH SECTION on Wednesday,March 13th,1974, Professor J. DURBIN in theChair] SUMMARY The formulationof conditionalprobability models for finitesystems of spatiallyinteracting random variables is examined.A simplealternative proofof theHammersley-Clifford theorem is presentedand thetheorem is then used to constructspecific spatial schemeson and offthe lattice. Particularemphasis is placed upon practicalapplications of themodels in plantecology when the variatesare binaryor Gaussian. Some aspectsof infinitelattice Gaussian processesare discussed. Methodsof statistical analysisfor lattice schemes are proposed,including a veryflexible coding technique.The methodsare illustratedby two numericalexamples. It is maintainedthroughout that the conditionalprobability approach to the specificationand analysisof spatialinteraction is moreattractive than the alternativejoint probability approach. Keywords:MARKOV FIELDS; SPATIAL INTERACTION; AUTO-MODELS; NEAREST-NEIGHBOUR SCHEMES; STATISTICAL ANALYSIS OF LATTICE SCHEMES; CODING TECHNIQUES; SIMULTANEOUSBILATERAL AUTOREGRESSIONS; CONDITIONAL PROBABILITY MODELS

Green (Bristol) Julian1. BesagINTRODUCTION FRS, 1945–2010 Dublin, August 2011 8 / 57 IN this paper, we examinesome stochasticmodels whichmay be used to describe certaintypes of spatial processes. Potential applications of the models occur in plant ecologyand the paper concludeswith two detailednumerical examples in this area. At a formallevel, we shall largelybe concernedwith a ratherarbitrary system, consistingof a finiteset of sites, each site having associated with it a univariate randomvariable. In most ecological applications,the sites will representpoints or regionsin the Euclideanplane and will oftenbe subjectto a rigidlattice structure. For example,Cochran (1936) discussesthe incidence of spottedwilt over a rectangular arrayof tomato plants. The disease is transmittedby insectsand, afteran initial period of time,we should clearlyexpect to observeclusters of infectedplants. The formulationof spatial stochasticmodels will be consideredin Sections2-5 of the paper. Once havingset up a modelto describea particularsituation, we shouldthen hope to be able to estimateany unknownparameters and to testthe goodness-of-fit of the model on the basis of observation.We shall discussthe statisticalanalysis of latticeschemes in Sections6 and 7. We begin by makingsome generalcomments on the types of spatial systems whichwe shall,and shall not,be discussing.Firstly, we shall not be concernedhere withany randomdistribution which may be associatedwith the locationsof the sites themselves.Indeed, when settingup models in practice,we shall require quite specificinformation on the relativepositions of sites,in orderto assess the likely interdependencebetween the associatedrandom variables. Secondly,although, as in Methodology Spatial statistics: Modelling, conditional formulations JRSS(B) 1974: the Hammersley–Clifford theorem

The theorem states that “Markov random fields are the same as Gibbs distributions”– that is, that a multivariate distribution satisfies the Markov random field property if and only if its log-density is additive over cliques. Besag gave a simple proof of this, assuming “positivity” – essentially that any combination of values realisable locally was realisable globally.

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 9 / 57 198 BESAG- StatisticalAnalysis of Lattice Systems [No. 2, Methodology Spatial statistics: Modelling, conditional formulations for any 1< i

Q(x)- Q(x1) = Xi G1(X1)+ E xi G1J(x1,x1) + XXi Xk GlJ,k(Xl, Xi, XJ) + 2

+ X2X3 ... xnG1,2,..n(x, x2,.. Xn),

Now suppose site 1 (# 1) is not a neighbourof site 1. Then Q(x) - Q(xl) must be independentof xl forall x e Q. Puttingxi =0 forio 1 or 1,we immediatelysee that G1,(xl, x) = 0 on Q. Similarly,by othersuitable choices of x, it is easilyseen succes- sivelythat all 3-,4-, ..., n-variableG-functions involving both xl andx1 must be null. The analogousresult holds for any pair of siteswhich are notneighbours of each otherand hence,in general,G can onlybe non-nullif the sitesi,j, ..., s forma clique. On theother hand, any set of G-functionsgives rise to a validprobability distri- butionP(x) whichsatisfies the positivity condition. Also sinceQ(x) - Q(xj) depends onlyupon x1 if there is a non-nullG-function involving both xi andx1, it follows that the same is trueof P(x*j xl, ..., xi-,, xi+, ..., xn). This completesthe proof. We nowconsider some simple extensions of the theorem. Suppose firstly that the variatescan take a denumerablyinfinite set of values. Then the theorem still holds if, Green (Bristol)in thesecond part, we imposeJulianthe Besagadded FRS,restriction 1945–2010that the G-functions be chosenDublin, August 2011 10 / 57 suchthat E expQ(x) is finite,where the summation is over all x E Q. Similarly,ifthe variateseach have absolutelycontinuous distributions and we interpretP(x) and alliedquantities as probabilitydensities, the theorem holds provided we ensurethat exp Q(x) is integrableover all x. Theseadditional requirements must not be taken lightly,as we shallsee by examples in Section4. Finally,we may consider the case of multivariaterather than univariate site variables. In particular,suppose that the randomvector at sitei has vi components.Then we mayreplace that site by v* notionalsites, each of whichis associatedwith a singlecomponent of therandom vector.An appropriatesystem of neighboursmay thenbe constructedand the univariatetheorem be appliedin theusual way. We shallnot considerthe multi- variatesituation any further in thepresent paper. As a straightforwardcorollary to thetheorem, itmay easily be establishedthat for anygiven Markov field P(X*= xi, Xi = Xi,..., X,= x I all othersite values) dependsonly upon xi,x1,...,x and thevalues at sitesneighbouring sites i,j, ...,s. In theHammersley-Clifford terminology, the local and global Markovian properties are equivalent. In practice,we shallusually find that the sites occur in a finiteregion of Euclidean spaceand thatthey often fall naturally into two sets: those which are internal to the Methodology Spatial statistics: Modelling, conditional formulations Markov random fields

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 11 / 57 Methodology Spatial statistics: Modelling, conditional formulations Markov random fields

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 12 / 57 Methodology Spatial statistics: Modelling, conditional formulations Markov random fields

?

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 13 / 57 Methodology Spatial statistics: Modelling, conditional formulations Markov random fields

?

  Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 14 / 57 Methodology Spatial statistics: Modelling, conditional formulations Markov random fields = Gibbs distributions

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 15 / 57 Methodology Spatial statistics: Modelling, conditional formulations Markov random fields = Gibbs distributions

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 16 / 57 Methodology Spatial statistics: Modelling, conditional formulations Markov random fields = Gibbs distributions

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 17 / 57 Methodology Spatial statistics: Modelling, conditional formulations The Hammersley–Clifford theorem

Many years later, the theorem was superseded by a more complete understanding of Markov properties in undirected graphical models: we can distinguish Global, Local and Pairwise Markov properties, and relate all these to the Factorisation property of Gibbs distributions; in general F = G = L = P ⇒ ⇒ ⇒ and under an additional condition implied by positivity they are all equivalent.

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 18 / 57 Methodology Spatial statistics: Modelling, conditional formulations Pairwise Markov property

 

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 19 / 57 Methodology Spatial statistics: Modelling, conditional formulations Global Markov property

 

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 20 / 57 Methodology Spatial statistics: Modelling, conditional formulations Other key ideas in the 1974 paper

Auto-models auto-normal, auto-logistic, etc establishes idea that the natural multivariate generalisations of standard univariate distributions might be based on conditional not marginal distributions having the assumed form. Inferential methods for MRFs Ecological applications Characterisation of MRFs as equilibria of certain space-time models

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 21 / 57 Methodology Spatial statistics: Frequentist and Bayesian inference Inference in spatial systems: key papers

Statistical analysis of non-lattice data. The Statistician (1975). On the estimation and testing of spatial interaction in Gaussian lattice processes. Biometrika (1975). (with Pat Moran) Errors-in-variables estimation for Gaussian lattice schemes. Journal of the Royal Statistical Society B (1977). Efficiency of pseudo-likelihood estimation for simple Gaussian fields. Biometrika (1977).

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 22 / 57 will not necessarilybe unique. For the simplestlattice schemes, the proportionof blackened sites may be as highas 50 percent but, in typical (non-lattice)geographical applications, one mightexpect a value nearer 30 percent. Experiencewith the codingtechnique has thus far been extremely limited.Two verysimple lattice examples are considerednumerically in Besag (1974a),whilst Besag and Moran (1975) discussthe efficiency of thetechnique for basic auto-normal lattice schemes. These investigations Methodology Spatial statistics: Frequentist and Bayesian inference are unfortunatelyof little direct relevance in non-experimentalsituations. In fact,as a generalrule, the coding technique would seem a somewhat Pseudo-likelihoodinefficient estimationprocedure in theway that the white-site terms pi(t) areentirely ignoredonce Sn,o has been generated.We thereforeconsider an alter- nativeproposition.

3.3. A Pseudo-likelihoodTechnique Giventhe previous set-up, perhaps the most naive approach to theesti- mationof theunknown parameters in theterms pi() wouldbe to take thatvector 4t which maximizes the quantity n Ln E In pi(qj (3 .2) i=l1 withrespect to 'P. Of course,Ln is not thetrue log-likelihood function forthe sample (except in thetrivial case of completeindependence) and yetits maximization,especially in viewof thecoding technique, would seemto presentan intuitivelyplausible method of estimation.That this intuitioncan be givena theoreticalfoundation will be seenlater on. Note that althoughthe pseudo-likelihoodtechnique confers the advantages thatno codingis requiredand thatall thep-functions are used in the maximizationprocess, it does have the drawback that no samplingproper- tiesof theestimates are yetknown. Nevertheless, one supposesthat the methodwill, on thewhole, produce better point estimates of thepara- Green (Bristol)metersthan does thecoding Juliantechnique. Besag FRS, 1945–2010 Dublin, August 2011 23 / 57 Althoughmaximum pseudo-likelihood estimation is intendedto have fairlywidespread applicability, it is of specialinterest to see how it fares withthe auto-normal schemes of section2. In particular,we supposethat

pi(.) = (2ru2)-1/2exp [-iu-{x2 p- _- P i,j(xj-_ ] (3.3)

so that,possibly following suitable rescaling, Xi has conditionalvariance a2 foreach i. Observingthe proposals in section2.4, we also assumethat ,u=DO and that i,3,=hi,ij, whereD and H_{hi,j} are knownn xp and nxn matrices,respectively; thus B=I-H# and 0, f and a together 190 Methodology Spatial statistics: Frequentist and Bayesian inference Pseudo-likelihood estimation

Thus in a Markov random field x, indexed by sites i S, with parameter ψ, the maximum ∈ pseudo-likelihood estimator of ψ is that maximising Y p(x x , ψ) i | S\i i∈S

... evidently not the joint probability of anything! It is motivated by Besag’s previous ‘coding technique’ estimator where the product is restricted to sites i that are not neighbours of each other, when it is a straightforward conditional likelihood. It has proved a powerful and durable idea in various contexts with dependent data, notching up now 733 citations.

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 24 / 57 Methodology Spatial statistics: Frequentist and Bayesian inference Pseudo-likelihood estimation

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 25 / 57 Methodology Spatial statistics: Algebra of interacting systems Algebra of interacting systems: key papers

On a system of two-dimensional recurrence relations. Journal of the Royal Statistical Society B (1981). On conditional and intrinsic autoregressions. Biometrika (1995) (with Charles Kooperberg). Markov random fields with higher-order interactions. Scandinavian Journal of Statistics (1998) (with Håkon Tjelmeland). A recursive algorithm for Markov random fields. Biometrika (2002) (with Francesco Bartolucci). First-order intrinsic autoregressions and the de Wijs process. Biometrika, 92, 909-920 (2005) (with Debashis Mondal).

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 26 / 57 Biometrika(2005), 92, 4, pp.909-920 (C2005 Biometrika Trust Printedin Great Britain

First-orderintrinsic autoregressions and thede Wijs process

BY JULIAN BESAG AND DEBASHIS MONDAL Department ofStatistics, Box 354322, Universityof Washington, Seattle, Washington 98195, U.S.A. julian~stat.washington.edudebashis~stat.washington.edu

SUMMARY We discuss intrinsic autoregressions for a first-orderneighbourhood on a two dimensional rectangular lattice and give an exact formula for thevariogram that extends known results to the asymmetriccase. We obtain a corresponding asymptotic expansion Methodology Spatial statistics: Algebra of interacting systems that ismore accurate and more general than previous ones and use this to derive the deWijs variogram under appropriate averaging, a result thatcan be interpretedas a two Intrinsic autoregressiondimensional spatial analogue and of the Brownian de motion Wijs obtained process as the limit of a random Biometrikawalk (2005), in one92, dimension. 4, pp.909-920 This provides a bridge between geostatistics,where the de Wijs (C2005 processBiometrika wasTrust once themost popular formulation,and Markov random fields,and also Printedexplainsin Great why Britain statistical analysis using intrinsicautoregressions isusually robust to changes of scale.We brieflydescribe correspondingcalculations in the frequencydomain, including limitingresults for higher-order autoregressions. The paper closes with some practical considerations, includingapplications to irregularly-spaceddata. First-orderintrinsic autoregressions and thede Wijs process Some keywords: Agricultural field trial;Asymptotic expansion; De Wijs process; Earth science;Environmetrics; Geographical ;Geostatistics; Intrinsicautoregression; Markov random field;Variogram. BY JULIAN BESAG AND DEBASHIS MONDAL Department ofStatistics, Box 354322, Universityof Washington, Seattle, Washington 98195, 1. INTRODUCTIONU.S.A. Let {X.,,: u,julian~stat.washington.edu v E if = 0, + 1, . .. } be a homogeneousdebashis~stat.washington.edu first-orderintrinsic autoregression on the two-dimensionalrectangular lattice i2 (Ktinsch, 1987; Besag & Kooperberg, 1995; Rue & Held, 2005, Ch. 3), with generalisedSUMMARY spectral density function We discuss intrinsicf(W,a)=K(1-2/3cosw-2ycosq)-' autoregressions for a first-order (wiqE(-m,])neighbourhood on a two(1) dimensionaland conditional rectangular expectation lattice structure and give an exact formula for thevariogram that extends known results to the asymmetriccase. We obtain a corresponding asymptotic expansion that ismore E(XuvI accurate ... )=and /(x1,,v+xx+1l,))+y(xu," more general than previous 1+xuv+1), ones andvar(XIvl use ...)=this toK, derive (2) the deWijs where variogram /3,y > 0 and under 3+ appropriatey= 2.Note that averaging,we use the a result term that'order'can to be identify interpretedthe neighbour as a two dimensionalhood structure spatialof analogue a scheme. of This Brownian is consistent motionwith obtained Besag (1974) as the for limit stationary of a random auto walkregressions in one dimension.but differs Thisfrom providesKiinsch (1987),a bridge which between follows geostatistics,Matheron (1973)where inthe using de Wijs the processterm wasto describe once thethemost level ofpopular degeneracy. formulation, It is convenientand Markov and incurs randomno loss herefields, toand assume also explainsthat whyK= 1 statistical and thatanalysis {X,v } is using Gaussian. intrinsic Theautoregressions above formulation isusually provides robust distributions to changes of scale.for allWe constrasts brieflydescribe among correspondingtheXu,, rathercalculations than for the inXuv the themselves. frequencydomain, In particular, including all limitingdifferencesresults XU for - Xu+1 higher-order +t have autoregressions.well-defined distributions, The paperwith closes zero withmeans some and practicallag (s, t) considerations,variogram includingapplications to irregularly-spaceddata. Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 27 / 57 Vst=1var(Xu U-X+ v~ (s,t E if). (3) Some keywords: Agricultural field trial;Asymptotic expansion; De Wijs process; Earth science;Environmetrics; Geographical epidemiology;Geostatistics; Intrinsicautoregression; Markov random field;Variogram.

1. INTRODUCTION Let {X.,,: u, v E if = 0, + 1, . .. } be a homogeneous first-orderintrinsic autoregression on the two-dimensionalrectangular lattice i2 (Ktinsch, 1987; Besag & Kooperberg, 1995; Rue & Held, 2005, Ch. 3), with generalised spectral density function f(W,a)=K(1-2/3cosw-2ycosq)-' (wiqE(-m,]) (1) and conditional expectation structure

E(XuvI ... )= /(x1,,v+xx+1l,))+y(xu," 1+xuv+1), var(XIvl ...)= K, (2) where /3,y > 0 and 3+ y= 2.Note thatwe use the term 'order' to identifythe neighbour hood structureof a scheme. This is consistentwith Besag (1974) for stationary auto regressionsbut differsfrom Kiinsch (1987), which followsMatheron (1973) in using the termto describe the level of degeneracy. It is convenient and incursno loss here to assume that K= 1 and that {X,v } is Gaussian. The above formulationprovides distributions for all constrasts among theXu,, rather than for theXuv themselves. In particular, all differencesXU - Xu+1 +t have well-defined distributions,with zero means and lag (s, t) variogram Vst=1var(Xu U-X+ v~ (s,t E if). (3) 910 JULIAN BESAG AND DEBASHIS MONDAL

Any first-orderintrinsic autoregression Methodology can beSpatial interpreted statistics:as a process Algebra of ofindependent interacting systems incrementsbetween adjacent sites of the lattice,conditioned by the logical zero-sum con straints that are induced around each cell. Separate variances are permitted for rows and forcolumns; see Ktinsch (1999) regarding the proof of this result,which also extends to Intrinsic autoregressionhigher-orderschemes and and to theintrinsicautoregressions de Wijs on finite processgraphs, provided all con trastshave well-defined distributions.Alternatively, the scheme (2) can be interpretedas a limitingform of the corresponding stationary autoregression, originally due to Levy (1948), as the interactionparameters /3and y approach the boundary 3+y=4 of the parameter space. This also generalises to higher-order lattice schemes. Besag (1981) discusses the autocorrelation properties of the first-orderstationary autoregression. We close this sectionwith some brief remarksabout thede Wijs process (deWijs, 1951, 1953;Matheron, 1971), a generalised continuum formulation that takes nonzero values with respect to contrasts between averages over areas rather than values at points. The restrictionto averages is relevant in practice, where 'point'measurements are generally idealisations.The deWijs process isGaussian andMarkovian and itsvariogram intensity increases as the logarithmof distance,which underlies its invariance to conformal trans formations and its physical appeal as a basic model. To be more specific, the de Wijs process {Y,,} is a generalised Gaussian Markov process indexed by functions (o on the plane that integrate to zero and that give rise to thewell-defined variance formula

var(Ye ) - log x-y11 11(x)q(py) dx dy.

In particular, let ~9(x)= 1A(x)IIAI- 1B(X)IJBI,where A and B are two regions of the plane with respective areas JAI> 0 and IBI> 0. Then we interpretY1, as the differencein the average values YA and YB of the de Wijs process over A and B. These averages correspond to an intrinsicprocess with a single degeneracy and generalised variogram defined by VAB= 2 var(YA- YB).Note that the spectral density functionof the de Wijs process is inverselyproportional to W2+ q2.We return to this in ? 4 2 togetherwith some generalisations. In mining, the de Wijs model was once the automatic choice of engineers but it seems to have gone largelyout of fashionwith thedevelopment of geostatistics, for reasons that are not entirely clear, though there are computational problems in dealing with large datasets. It is still popular in South Africa in its original context of gold mining but elsewhere is usually relegated to being a limitingcase of theMatern (1986) familyand is rarelymentioned by name. However, an unpublished University of Chicago technical report by P. McCullagh and D. Clifford, containing an extensive empirical study of agricultural uniformitytrials, argues persuasively that,when augmented by a white noise Green (Bristol) component, the de Wijs processJulian generally Besag FRS,provides 1945–2010 an adequate fitwithin theMatern Dublin, August 2011 28 / 57 class. Indeed, thepaper proposes a 'loi du terroir'on thisbasis. A by-product of our paper is that it enables a close approximation to thede Wijs process plus white noise to be fitted to large datasets and to be used forpredictive inference.

2. EXACT VARIOGRAM CALCULATIONS FOR FIRST-ORDER INTRINSIC AUTOREGRESSIONS It follows fromequation (2) that the v for the first-orderintrinsic autoregression satisfy the systemof recurrenceequations,

? vSt =- ss /3(V8.1,t? VS+1,) + y(vss,-1 ? vst+1) (S, t E A) (4) Methodology Spatial statistics: Algebra of interacting systems Original lattice with array and cells A and B Intrinsic autoregression andL1 the de WijsD1 process

B

A

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 29 / 57

24 Methodology Spatial statistics: Algebra of interacting systems Sublattice with subarray and cells A and B Intrinsic autoregressionL and2 the de WijsD2 process

B

A

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 30 / 57 Consider first–order intrinsic autoregression on averaged to L2 D1

29 Methodology Spatial statistics: Algebra of interacting systems Sublattice with subarray and cells A and B Intrinsic autoregressionL and4 the de WijsD4 process

B

A

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 31 / 57 Consider first–order intrinsic autoregression on averaged to L4 D1

31 Methodology Spatial statistics: Algebra of interacting systems Sublattice with subarray and cells A and B Intrinsic autoregressionL and8 the de WijsD8 process

B

A

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 32 / 57 Consider first–order intrinsic autoregression on averaged to L8 D1

32 Methodology Spatial statistics: Algebra of interacting systems Intrinsic autoregression and the de Wijs process

Integrated de Wijs process Intrinsic autoregression

256 256 arrays ... provides a rigorous link between geostatistical× models and (intrinsic) lattice Markov random fields, explaining empirical robustness of discrete-space formulations to changes of scale. 28

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 33 / 57 Methodology Spatial statistics: Digital image analysis Digital image analysis: key papers

On the statistical analysis of dirty pictures (with Discussion). Journal of the Royal Statistical Society B (1986). Towards Bayesian image analysis. Journal of Applied Statistics (1989). Bayesian image restoration, with two applications in spatial statistics (with Discussion). Annals of the Institute of Statistical Mathematics (1991) (with Jeremy York and Annie Mollié)

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 34 / 57 Methodology Spatial statistics: Digital image analysis

The 1986 paperJ. R. Statist.Soc. in B (1986)JRSS(B) 48, No. 3, pp. 259-302 On theStatistical Analysis of DirtyPictures

By JULIAN BESAG Universityof Durham, U.K. [Read beforethe Royal StatisticalSociety, at a meetingorganized by the Research Section on Wednesday, May 7th,1986, ProfessorA. F. M. Smithin theChair] SUMMARY A continuoustwo-dimensional region is partitionedinto a finerectangular array of sites or "pixels",each pixel having a particular"colour" belonging to a prescribedfinite set. The truecolouring of the region is unknownbut, associated with each pixel, there is a possibly multivariaterecord which conveys imperfect information about its colour according to a knownstatistical model. The aim is to reconstructthe truescene, with the additional knowledgethat pixels close together tend to havethe same or similar colours. In thispaper, it is assumedthat the local characteristicsofthe true scenie can be representedby a non- degenerateMarkov random field. Such information can be combinedwith the records by Bayes'theorem and the true scene can be estimated according to standard criteria. However, thecomputational burden is enormous and the reconstruction may reflect undesirable large- scaleproperties of the random field. Thus, a simple,iterative method of reconstruction is proposed,which does not dependon theselarge-scale characteristics. The methodis illustratedby computer simulations in which the original scene is notdirectly related to the assumedrandom field. Some complications, including parameter estimation, are discussed. Potentialapplications are mentionedbriefly. Keywords:IMAGE PROCESSING; PATTERN RECOGNITION; SEGMENTATION; CLASSIFICATION; IMAGE RESTORATION; REMOTE SENSING; MARKOV RANDOM FIELDS; PAIRWISE Green (Bristol) INTERACTIONS; MAXIMUMJulian A POSTERIORI Besag FRS, 1945–2010ESTIMATION; SIMULATED ANNEALING;Dublin, August 2011 35 / 57 GIBBS SAMPLER; ITERATED CONDITIONAL MODES; GREY-LEVEL SCENES; AUTO- NORMAL MODELS; PSEUDO-LIKELIHOOD ESTIMATION.

1. INTRODUCTION There has been much recentinterest in the followingtype of problem.A continuoustwo- dimensionalregion S is partitionedinto a finerectangular array of sitesor pictureelements ("pixels"), each having a particularcolour, lying in a prescribedset. The colours may be unordered,in whichcase theyare usuallytokens for other attributes associated with S, suchas croptypes in satelliteimages, or maybe ordered,in whichcase theyare usuallygrey levels and representthe value perpixel of some underlyingvariable, such as intensity.In eithercase, there is supposedto be a truebut unknowncolouring of the pixels in S and theaim is to reconstruct thisscene from two imperfectsources of information. The firstof theseis that,associated with each pixel,there is a possiblymultivariate record whichprovides data on thecolour there. It is assumedthat, for any particular scene, the records followa knownstatistical distribution. The secondsource is notdirectly quantitative but states thatpixels close togethertend to have the same colour or verysimilar colours, depending on whethercolours are unorderedor ordered.In thispaper, we seekto quantifythis second source probabilistically,by means of a non-degenerateMarkov randomfield which crudely repre- sentsthe local characteristicof the underlying scene. In principle,such an approachenables the two sourcesto be combinedby Bayes' theoremand the truescene to be estimatedaccording to standardcriteria. For discretecolours, with which this paper is almostwholly concerned, the obvious choices are (i) the colouringthat has overall maximumprobability, given the

Presentaddress: Departmentof MathematicalSciences, University of Durham,Durham DH1 3LE, England. ? 1986 Royal StatisticalSociety 0035-9246/86/48259 $2.00 Methodology Spatial statistics: Digital image analysis Dirty pictures 19861 Image Restoration 275

x -x X ~** 00 ox-xx**x

* * 0 x ~ ~ ~ ~ ~~ x _ ?~ xx_x.kx t .t____._ __ __ 0 0 % *- **t****~~2****** ? ~X-~XXXX ______o08 ;*X X 0 $0 * ~ ~ ~ ~ ~~~*~~~X ~ 0 *** 4: 0 ~Xx- ~~~~~ 0 0O

* xx xxx*- 0 u*X Treixco ** - n x 6 i . Fig.3_a x xx xxx

X -

** *.****co*4114*14*IXYR *-== x 'l x El-,'"tElEIi~~~~~~~~~~~~~~X Fig. b. Maimumlikelhoodclassfier:320.errorrate Fig 3a Tre dsixclourscne 6 x64

**********)( XXXXXX ______M M s ~~~~~* **** xx:xx:* :*** __ :_:_:_:__ r r ~~~~** ** * ******** Xx _ ?__ ___

____ 2

- - *i**$* * O ., . N j ]- 0X- *************

> **********66 I *l*6*6*66E6*_ '*66**i*6**+*$*6 Wi

Fig. 3c. ICM reconstructionwith / t 1.5: 1.2% Fig. 3d. ICM reconstructionwith /3 estimated: errorrate. , = 1.80: 1.10%error rate.

multi-componentrecords. The resultsare applied to the classificationof land usage and of Green (Bristol) clouds,from satellite data. Julian Besag FRS, 1945–2010 Dublin, August 2011 36 / 57 5.4. Block Reconstruction In Section2.6, it was proposedthat, instead of choosing ii to maximizeP(xi Iy, Xs\j)at any stageof reconstruction, one mightchoose k'Bto maximizeP(G'B I y, xS\B), whereB representsa smallblock of pixelsin the vicinityof pixel i. In particular,suppose that the Bs form2 x 2 blocksof four, these being addressed in a rasterscan withoverlap allowed betweensuccessive blocks.At each stage,the block in questionmust be assignedone of c4 colourings,based on four records and, in the case of second-orderneighbourhood, 26 direct and diagonal adjacencies.With AssumptionsI and 2 of Section 2.1 and {p(x)} as in (6), this reducesto 1986] Image Restoration 265 local characteristics(Sections 3 andMethodology 4), thesesame Spatial fieldsgenerally statistics:have Digital undesirable image analysislarge-scale properties.In particular,given the level of local dependencewe have in mind,realizations of gp(x)} on S will usuallyconsist almost entirelyof a singlecolour. Of course,this will not Dirty pictures generallybe trueof P(x Iy) because of the influenceof the records;nevertheless, one might prefera methodof reconstructionwhich is not onlycomputationally undemanding but also ignoresthe large scale deficienciesof {p(x)}. We now describea schemewhich satisfies these conditions. Suppose thatx denotesa provisionalestimate of the true scene x* and thatour aim is merely to update the currentcolour xi at pixel i in the lightof all available information.Then a plausiblechoice is thecolour which has maximumconditional probability, given the records y and the currentreconstruction xS\i elsewhere; that is, the new xi maximizesP(x, Iy, X\) with respectto xi. It followsfrom Bayes' theoremand equations (1) and (2) that

P(xi I y, xS\i) oc f(yi I xi)pi(xi Izai), (5) so thatimplementation is trivialfor any locally dependent M.r.f. {p(x)}. Whenapplied to each pixelin turn,this procedure defines a singlecycle of an iterativealgorithm for estimating x*. As an initialx, we shall normallyadopt the conventionalmaximum likelihood classifier, whichignores geometrical considerations and merelychooses xi to maximizef(yi Ixi) at each i separately.We thenapply thealgorithm for a fixednumber of cyclesor untilconvergence, to producethe finalestimate of x*: note that

P(x I y) = P(Xi I y, Xs\i)P(Xs\iI A so thatP(x Iy) neverdecreases at any stage and eventualconvergence is assured.In practice, convergence,to what musttherefore be a local maximumof P(x Iy), seemsextremely rapid, withfew if any changes occurring after about thesixth cycle. Indeed, it was as an approximation to maximumprobability estimation that the algorithmwas firstproposed (Besag, 1983), althoughwe no longerview it merelyin thatlight. The algorithmwas suggestedindependently by Kittlerand Foglein(1984b), who applied it to Landsat data, as did Kiiveriand Campbell (1986). Note thatits dependenceonly on the local characteristicsof {p(x)} is ensuredby the rapidconvergence. We label the methodICM, representing"iterated conditional modes". The actual mechanicsof updatingmay depend on computingenvironment. Thus, witha languagesuch as Fortran,updating is most convenientlyimplemented as a rasterscan. It is helpfulto varythe rasterfrom cycle to cycle,in orderto reducethe small directionaleffects whichmay otherwisebe produced.However, with a matrixlanguage such as APL, it will be Green (Bristol)muchfaster and moreconvenient Julianto modify Besagthe FRS,algorithm 1945–2010and use synchronousupdating: that Dublin, August 2011 37 / 57 is,to updatecycle by cycle.This eliminatesspurious directional effects but convergence can no longerbe guaranteedand small oscillationsmay occur.The partiallysynchronous scheme, in whichcoding sets of pixelsare simultaneouslyupdated, provides a usefulcompromise. Given the ideal of a genuinelyparallel system,with a singleprocessor dedicated to each pixel,the processorscould be allowed to run at theirown individualrates and restorationwould be immediate. Finally,as remarkedby PeterGreen, there is a somewhatincidental link with Section 2.3. Apartfrom the totallysynchronous option, which is invalidfor the Gibbs sampler,ICM is exactlyequivalent to instantaneousfreezing in simulatedannealing!

2.6. SomeModifications of ICM There are severalmodifications which can be made to the basic versionof ICM. A simple variantis to workup to thechosen {p(x)}, usinga sequenceof weaker fields on previouscycles. This can have theadvantage of not fixing pixel colours too earlyin therestoration, on thebasis of theunreliable initial estimate, though it naturallyrequires a littlemore iteration. Examples are providedin Sections3 and 5. Methodology Spatial statistics: Digital image analysis Iterated conditional modes

True scene (unobservable) x; observed digital image y.

Recover x by iteratively maximising the posterior local characteristics

p(x y, x ) i | S\i

... a ‘Gauss-Seidel’ approach. Empirically exhibits superior performance relative to (expensive) MAP estimator argmax p(x y) x |

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 38 / 57 Methodology Markov chain Monte Carlo methods MCMC: key papers

Spatial statistics and Bayesian computation (with Discussion). Journal of the Royal Statistical Society B (1993) (with Peter Green). Bayesian computation and stochastic systems (with Discussion). Statistical Science (1995) (with Peter Green, David Higdon and Kerrie Mengersen).

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 39 / 57 Methodology Markov chain Monte Carlo methods Contributions to MCMC

partial conditioning (extends multigrid Swendsen-Wang) randomised proposals (explains adaptive rejection Metropolis sampling) Langevin-Hastings (MALA) sequential MCMC prediction simultaneous credible regions promotion/adaptation of statistical physics ideas

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 40 / 57 Methodology Monte Carlo computation and hypothesis testing Monte Carlo testing: key papers

Simple Monte Carlo tests for spatial pattern. Applied Statistics (1977) (with ). Generalized Monte Carlo significance tests. Biometrika (1989) (with Peter Clifford). Sequential Monte Carlo p-values. Biometrika (1991) (with Peter Clifford).

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 41 / 57 Applications Applications

medical imaging remote sensing microarrays agricultural field trials geographical epidemiology biostatistics social networks ecology, etc.

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 42 / 57 Applications Microarrays Microarrays: key paper

Probabilistic segmentation and intensity estimation for microarray images. Biostatistics (2006) (with Raphael Gottardo, Matthew Stephens and Alejandro Murua).

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 43 / 57 86 R.GOTTARDO ET AL.

In a typical application of cDNA arrays, gene expression patterns between two samples (e.g. a treat- ment and a control) are compared. The RNA is extracted from both samples and each is labeled with a different fluorescent dye. Generally, one dye is red, the other green. Next, the RNA samples are mixed and cohybridized to the probes on the cDNA array, which is then scanned to provide a 16-bit gray-scale image for each dye. The relative intensity of the dyes in each spot measures the relative abundance of that particular RNA type in the sample. Several factors, such as the hydrophobicity of the pretreated glass surface, the humidity as the probe dries, and the speed of drying, induce unequal distribution of probe material in the spot (Hedge et al., 2000) and can result in spots having irregular shape and size. Figure 1 shows an image from one of the data sets discussed later in the paper; note the evidence of doughnut shapes in some spots. Image analysis is required to produce estimates of the foreground and background intensities for both the red and green dyes for each gene. These estimates are the starting point of any statistical analysis such as testing for differential expression (Tusher et al., 2001; Efron et al., 2001; Newton et al., 2001; Dudoit Downloaded from et al., 2002; Gottardo et al., 2003), discriminant analysis (Golub et al., 1999;Applications Tibshirani et al.,Microarrays 2002), and clustering (Eisen et al., 1998; Tamayo et al., 1999; Yeung et al., 2001). To estimate the intensities, one first needs to locate the spots on the images and then to classify each pixel either as part of a spot or as biostatistics.oxfordjournals.org background.Microarrays Chen et al. (1997) provide an early statistical treatment of this task and Yang et al. (2002) discuss the effects of different approaches. at University of Bristol Information Services on August 18, 2011

Raw image from cDNA array from an HIV experiment: the paper devises and investigates a probabilistic approach to simultaneous segmentation and intensity estimation, implemented using EM/ICM algorithms

Fig. 1. Three blocks of oneGreen of the HIV (Bristol) raw images. The whole image contains 12 blocks.Julian BesagEach block FRS, is formed1945–2010 by Dublin, August 2011 44 / 57 16 × 40 spots. At the bottom, we have enlarged two portions of the image containing several artifacts not caused by hybridization of the probes to the slide. Some spots are doughnut shaped with larger intensity on the perimeter of the spot. Applications Agricultural field trials Agricultural field trials: key papers

Statistical analysis of field experiments using neighbouring plots. Biometrics (1986) (with Rob Kempton). Bayesian analysis of agricultural field experiments (with Discussion). Journal of the Royal Statistical Society B (1999) (with David Higdon).

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 45 / 57 Applications Agricultural field trials Field trials: key ideas

linear models for yield, grounded and informed by well-understood practical context adjustment for variation in fertility using neighbouring plots decomposition Y = Fψ + T τ + error Bayesian formulation, MCMC computation simplifies interpretation (e.g. ranking, selection) allows complex formulations hierarchical t-formulation (outliers, jumps in fertility)

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 46 / 57 Applications Agricultural field trials Field trials

AgriculturalField Experiments 703 Raw yields

100I

Centredvariety effects

20 A) 10 20

0112

0) 3:(a)) Fertility >(b )_ ,

(a)J00

12 Residuals,,

(a)!t\flfrS

(b)@ f

Fig. 1. El Batanvariety trial: yields and additivedecompositions into variety, fertility and residualeffects under (a) Gaussian and (b) hierarchical-tformulations, with the same scale throughout:shaded regionsprovide pointwise90% credibleintervals; locations of varieties 1, 10, 11 and 20 ineach replicateare identified

Table 2. El Batanvariety trial: bivariate and marginalposterior distributions of vy and v/,in the hierarchical-tformulation

Green (Bristol) VY JulianPiPobabilities Besagfor the FRS,following 1945–2010values of vp,: Dublin, August 2011 47 / 57 1 2 4 8 16 32 64

2 0.03 0.04 0.01 0.08 4 0.13 0.14 0.03 0.01 0.01 0.32 8 0.10 0.10 0.01 0.01 0.23 16 0.07 0.05 0.01 0.14 32 0.06 0.04 0.01 * 0.12 64 0.06 0.04 0.01 * 0.11 0.45 0.41 0.07 0.03 0.02 0.01 0.01

suspiciouslylarge yield in thefinal replicate and henceis penalizedby thet-analysis. This analysisalso suggestsa fallin fertility in the first replicate between varieties 10 and 11,which thereforedemotes the former and promotesthe latter. Variety 20 has oneof the lowest yields in thefinal replicate but otherwise performs very well. Its mean is enhancedby the t-analysis Applications Geographical epidemiology Geographical epidemiology: key papers

The detection of clusters in rare diseases. Journal of the Royal Statistical Society A (1991) (with James Newell). Bayesian image restoration, with two applications in spatial statistics (with Discussion). Annals of the Institute of Statistical Mathematics (1991) (with Jeremy York and Annie Mollié). Modelling risk from a disease in time and space. Statistics in Medicine (1998) (with Leo Knorr-Held).

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 48 / 57 Applications Geographical epidemiology Space-time analysis of lung cancer incidence

MODELLING RISK FROM A DISEASE IN TIME AND SPACE 2053

MODELLING RISK FROM A DISEASE IN TIME AND SPACE 2047

Disease counts treated as independent Binomial, with logit probabilities modelled as year+age*time+ gender*race*time+ covariate+county, with county a Gaussian intrinsic autoregression plus noise.

Figure 1. Crude annual death rate × 1000 for each county in Ohio Figure 5. Medians and 50, 80 and 95 per cent credible intervals for the overall time trend and for the aggregate of this with each of the ÿve age group e ects

Figure 3 shows the observed mortality rate from lung cancer over time, broken down by gender ? 1998 John Wiley & Sons, Ltd. Statist. Med. 17, 2045–2060 (1998) and race in the top panel and, ignoringGreen deaths (Bristol) below age 45, by age group in the bottom one.Julian Besag FRS, 1945–2010 Dublin, August 2011 49 / 57 However, these plots can be misleading. For example, we must not conclude that there is a higher risk for white than for non-white women, since the di erence in the mortality rates might be an artifact of disparities in the age distributions of the two subgroups. Indeed, later we shall see that this appears to be the case. Of course, a major risk factor for lung cancer is cigarette consumption and any analysis that ignores this is highly questionable. Our basic formulation allows for unobserved covariates in each county but this device is unlikely to be entirely satisfactory here. Although there is no reliable direct information on smoking behaviour across Ohio, it should help to include a measure of urbanization in the model, primarily as a surrogate for cigarette consumption but also for other risk factors associated with urban areas, for example, industrial air pollution. For example, Kafadar and Tukey19 suggest the logarithm of the population size of the largest city in each county and compare this favourably with other, more obvious, measures, such as population density, in a nationwide analysis of deaths from lung cancer. Figure 4 shows a map of the Kafadar–Tukey (K–T) index for Ohio, based on population data from 1970. The four counties containing the major cities (Cleveland, on Lake Erie, in the north; Cincinnati, in the southwest; Columbus, approximately

? 1998 John Wiley & Sons, Ltd. Statist. Med. 17, 2045–2060 (1998) Last seminars Last seminars

Markov chains for DNA sequences Markov random graphs for social networks

Constrained Monte Carlo and Some Applications

JULIAN BESAG

Department of Statistics, University of Washington, Seattle, USA (emeritus) Department of Mathematics, University of Bristol, UK (honorary)

Markov random fields in statistical ecology Markov point processes

1

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 50 / 57 Last seminars Last seminars

Agenda

A statistician plays Sudoku ... and defends p–values in exploratory data analysis.

Simple Monte Carlo p–values. Examples

Markov chain Monte Carlo (MCMC) p–values. Applications

MCMC for p–values in multi–dimensional contingency tables Applications

Mobility and irreducibility in constrained sample spaces. Applications

Social networks : Markov random graphs : MCMC p–values. Application

2

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 51 / 57 Last seminars Last seminars

Continuum limits of Gaussian Markov random fields : resolving the conflict with geostatistics

JULIAN BESAG

Department of Mathematical Sciences, , England emeritus Department of Statistics, University of Washington, Seattle, USA

Joint work with DEBASHIS MONDAL Department of Statistics, University of Chicago, USA formerly Department of Statistics, University of Washington

Oxford, 23 October, 2008

1

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 52 / 57 Last seminars Last seminars

Agenda

Hidden Markov random fields (MRFs) and some applications. • Geostatistical versus MRF approach to spatial data. • Describe simplest Gaussian intrinsic autoregression on 2–d rectangular array • and its exact and asymptotic variograms.

Describe de Wijs process and its exact and approximate variograms. • Reconcile geostatistics and Gaussian MRFs via regional averages. • Generalizations and wrap–up. •

2

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 53 / 57 Last seminars Last seminars

Wrap up

Gaussian Markov random fields are alive and well !! • Precision matrix of Gaussian MRFs sparse efficient computation. • ⇒ rapid Regional averages of Gaussian MRFs continuum de Wijs process. • −→ Reconciliation between Gaussian MRF and original geostatistical formulation. • Empirical evidence for de Wijs process in agriculture : • P. McCullagh & D. Clifford (2006), “Evidence of conformal invariance for crop yields”, Proc. R. Soc. A, 462, 2119–2143. Consistently selects de Wijs within Mat´ern class of variograms (25 crops !).

de Wijs process also alive and well and can be fitted via Gaussian MRFs. • de Wijs process has separate life as Gaussian free field in statistical physics. •

54

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 54 / 57 Webpage: sustain.bris.ac.uk/JulianBesag/tributes Email: [email protected]

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 55 / 57 Honours

1983 RSS in Silver 1984 Member of the International Statistical Institute 1991 Fellow of the Institute of Mathematical Statistics 2001 Chancellor’s medal, University of California 2004 Fellow of the Royal Society

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 56 / 57 At the helm of his yacht Annie in 2006

Green (Bristol) Julian Besag FRS, 1945–2010 Dublin, August 2011 57 / 57