Learning Heterogeneous Hidden Markov Random Fields


Jie Liu (CS, UW-Madison), Chunming Zhang (Statistics, UW-Madison), Elizabeth Burnside (Radiology, UW-Madison), David Page (BMI & CS, UW-Madison)

Abstract

Hidden Markov random fields (HMRFs) are conventionally assumed to be homogeneous in the sense that the potential functions are invariant across different sites. However, in some biological applications it is desirable to make HMRFs heterogeneous, especially when there exists some background knowledge about how the potential functions vary. We formally define heterogeneous HMRFs and propose an EM algorithm whose M-step combines a contrastive divergence learner with a kernel smoothing step to incorporate the background knowledge. Simulations show that our algorithm is effective for learning heterogeneous HMRFs and outperforms alternative binning methods. We learn a heterogeneous HMRF in a real-world study.

1 Introduction

Hidden Markov models (HMMs) and hidden Markov random fields (HMRFs) are useful approaches for modelling structured data such as speech, text, vision and biological data. HMMs and HMRFs have been extended in many ways, such as the infinite models [Beal et al., 2002, Gael et al., 2008, Chatzis and Tsechpenakis, 2009], the factorial models [Ghahramani and Jordan, 1997, Kim and Zabih, 2002], the high-order models [Lan et al., 2006] and the nonparametric models [Hsu et al., 2009, Song et al., 2010]. HMMs are homogeneous in the sense that the transition matrix stays the same across different sites. HMRFs, intensively used in image segmentation tasks [Zhang et al., 2001, Celeux et al., 2003, Chatzis and Varvarigou, 2008], are also homogeneous. The homogeneity assumption for HMRFs in image segmentation tasks is legitimate, because people usually assume that the neighborhood system on an image is invariant across different regions.

However, it is necessary to bring heterogeneity to HMMs and HMRFs in some biological applications where the correlation structure can change over different sites. For example, a heterogeneous HMM is used for segmenting array CGH data [Marioni et al., 2006], and the transition matrix depends on some background knowledge, i.e. some distance measurement which changes over the sites. A heterogeneous HMRF is used to filter SNPs in genome-wide association studies [Liu et al., 2012a], and the pairwise potential functions depend on some background knowledge, i.e. some correlation measure between the SNPs which can be different between different pairs. In both of these applications, the transition matrix and the pairwise potential functions are heterogeneous and are parameterized as monotone parametric functions of the background knowledge. Although the algorithms tune the parameters in the monotone functions, there is no justification that the parameterization of the monotone functions is correct. Can we adopt the background knowledge about these heterogeneous parameters adaptively during HMRF learning, and recover the relation between the parameters and the background knowledge nonparametrically?

Our paper is the first to learn HMRFs with heterogeneous parameters by adaptively incorporating the background knowledge. It is an EM algorithm whose M-step combines a contrastive divergence style learner with a kernel smoothing step to incorporate the background knowledge. Details about our EM-kernel-PCD algorithm are given in Section 3 after we formally define heterogeneous HMRFs in Section 2. Simulations in Section 4 show that our EM-kernel-PCD algorithm is effective for learning heterogeneous HMRFs and outperforms alternative methods. In Section 5, we learn a heterogeneous HMRF in a real-world genome-wide association study. We conclude in Section 6.

Appearing in Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS) 2014, Reykjavik, Iceland. JMLR: W&CP volume 33. Copyright 2014 by the authors.

2 Models

2.1 HMRFs And Homogeneity Assumption

Suppose that $\mathcal{X} = \{0, 1, \ldots, m-1\}$ is a discrete space, and we have a Markov random field (MRF) defined on a random vector $X \in \mathcal{X}^d$. The conditional independence is described by an undirected graph $G(V, E)$. The node set $V$ consists of $d$ nodes. The edge set $E$ consists of $r$ edges. The probability of $x$ from the MRF with parameters $\theta$ is

$$P(x; \theta) = \frac{Q(x; \theta)}{Z(\theta)} = \frac{1}{Z(\theta)} \prod_{c \in C(G)} \phi_c(x; \theta_c), \qquad (1)$$

where $Z(\theta)$ is the normalizing constant, and $Q(x; \theta)$ is some unnormalized measure with $C(G)$ being some subset of the cliques in $G$. The potential function $\phi_c$ is defined on the clique $c$ and is parameterized by $\theta_c$. For simplicity in this paper, we consider pairwise MRFs, whose potential functions are defined on the edges, namely $|C(G)| = r$. We further assume that each pairwise potential function is parameterized by a single parameter, i.e. $\theta_c = \{\theta_c\}$.

A hidden Markov random field [Zhang et al., 2001, Celeux et al., 2003, Chatzis and Varvarigou, 2008] consists of a hidden random field $X \in \mathcal{X}^d$ and an observable random field $Y \in \mathcal{Y}^d$, where $\mathcal{Y}$ is another space (either continuous or discrete). The random field $X$ is a Markov random field with density $P(x; \theta)$, as defined in Formula (1), and its instantiation $x$ cannot be measured directly. Instead, we can observe the emitted random field $Y$ with its individual dimension $Y_i$ depending on $X_i$ for $i = 1, \ldots, d$, namely $P(y \mid x; \varphi) = \prod_{i=1}^{d} P(y_i \mid x_i; \varphi)$, where $\varphi = \{\varphi_0, \ldots, \varphi_{m-1}\}$ and $\varphi_{x_i}$ parameterizes the emitting distribution of $Y_i$ under the state $x_i$. Therefore, the joint probability of $x$ and $y$ is

$$P(x, y; \theta, \varphi) = P(x; \theta)\, P(y \mid x; \varphi) = \frac{1}{Z(\theta)} \prod_{c \in C(G)} \phi_c(x; \theta_c) \prod_{i=1}^{d} P(y_i \mid x_i; \varphi). \qquad (2)$$

Figure 1: The pairwise HMRF model with three latent nodes ($X_1$, $X_2$, $X_3$) and observable nodes ($Y_1$, $Y_2$, $Y_3$), with parameters $\theta = \{\theta_1, \theta_2, \theta_3\}$ and $\varphi = \{\varphi_0, \varphi_1\}$.

Example 1: One pairwise HMRF model with three latent variables ($X_1$, $X_2$, $X_3$) and three observable variables ($Y_1$, $Y_2$, $Y_3$) is given in Figure 1. Let $\mathcal{X} = \{0, 1\}$. $X_1$, $X_2$ and $X_3$ are connected by three edges. The pairwise potential function $\phi_i$ on edge $i$ (connecting $X_u$ and $X_v$), parameterized by $\theta_i$ ($0 < \theta_i < 1$), is $\phi_i(X; \theta_i) = \theta_i^{I(X_u = X_v)} (1 - \theta_i)^{I(X_u \neq X_v)}$ for $i = 1, 2, 3$, where $I$ is an indicator variable. Let $\mathcal{Y} = \mathbb{R}$. For $i = 1, 2, 3$, $Y_i \mid X_i = 0 \sim N(\mu_0, \sigma_0^2)$ and $Y_i \mid X_i = 1 \sim N(\mu_1, \sigma_1^2)$, namely $\varphi_0 = \{\mu_0, \sigma_0\}$ and $\varphi_1 = \{\mu_1, \sigma_1\}$.

In common applications of HMRFs, we observe only one instantiation $y$ which is emitted according to the hidden state vector $x$, and the task is to infer the most probable state configuration of $X$, or to compute the marginal probabilities of $X$. In both tasks, we need to estimate the parameters $\theta = \{\theta_1, \ldots, \theta_r\}$ and $\varphi = \{\varphi_0, \ldots, \varphi_{m-1}\}$. Usually, we seek maximum likelihood estimates of $\theta$ and $\varphi$ which maximize the log likelihood

$$L(\theta, \varphi) = \log P(y; \theta, \varphi) = \log \sum_{x \in \mathcal{X}^d} P(x, y; \theta, \varphi). \qquad (3)$$

Since we only have one instantiation $(x, y)$, we usually have to assume that the $\theta_i$'s are the same for $i = 1, \ldots, r$ for effective parameter learning. This homogeneity assumption is widely used in computer vision problems because people usually assume that the neighborhood system on an image is invariant across its different regions. Therefore, conventional HMRFs refer to homogeneous HMRFs, similar to conventional HMMs whose transition matrix is invariant across different sites.

2.2 Heterogeneous HMRFs

In a heterogeneous HMRF, the potential functions on different cliques can be different. Taking the model in Figure 1 as an example, $\theta_1$, $\theta_2$ and $\theta_3$ can be different if the HMRF is heterogeneous. As with conventional HMRFs, we want to be able to address applications that have one instantiation $(x, y)$ where $y$ is observable and $x$ is hidden. Therefore, learning an HMRF from one instantiation $y$ is infeasible if we free all $\theta$'s. To partially free the parameters, we assume that there is some background knowledge $k = \{k_1, \ldots, k_r\}$ about the parameters $\theta = \{\theta_1, \ldots, \theta_r\}$, in the form of some unknown smooth mapping function which maps $\theta_i$ to $k_i$ for $i = 1, \ldots, r$. The background knowledge describes how these potential functions differ across different cliques. Taking pairwise HMRFs for example, the potentials on the edges with similar background knowledge should have similar parameters. We can regard the homogeneity assumption in conventional HMRFs as an extreme type of background knowledge in which $k_1 = k_2 = \ldots = k_r$. The problem we solve in this paper is to estimate $\theta$ and $\varphi$ which maximize the log likelihood $L(\theta, \varphi)$ in Formula (3), subject to the condition that the estimate of $\theta$ is smooth with respect to $k$.

3 Parameter Learning Methods

Learning heterogeneous HMRFs in the above manner involves three difficulties: (i) the intractable $Z(\theta)$, (ii) the latent $x$, and (iii) the heterogeneous $\theta$. The way we handle the intractable $Z(\theta)$ is similar to using contrastive divergence [Hinton, 2002] to learn MRFs.

Algorithm 1 PCD-n Algorithm [Tieleman, 2008]
1: Input: independent samples $X = \{x^1, x^2, \ldots, x^s\}$ from $P(x; \theta)$, maximum iteration number $T$
2: Output: $\hat{\theta}$ from the last iteration
3: Procedure:
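To make Example 1 and Equations (1)-(3) concrete, the following is a minimal Python sketch (not taken from the paper) that scores the three-node model by brute-force enumeration of the $2^3$ hidden configurations. The edge list, parameter values and observation $y$ are arbitrary illustrative choices.

```python
import itertools
import numpy as np
from scipy.stats import norm

def unnormalized_joint(x, y, theta, mu, sigma, edges):
    """Q(x; theta) * P(y | x; varphi) for the pairwise model of Example 1."""
    q = 1.0
    for (u, v), th in zip(edges, theta):
        q *= th if x[u] == x[v] else (1.0 - th)      # phi_i(x; theta_i)
    emit = np.prod([norm.pdf(y[i], mu[x[i]], sigma[x[i]]) for i in range(len(x))])
    return q * emit

def log_likelihood(y, theta, mu, sigma, edges, d=3):
    """L(theta, varphi) = log sum_x P(x, y; theta, varphi), by enumeration."""
    states = list(itertools.product([0, 1], repeat=d))
    Z = sum(np.prod([th if x[u] == x[v] else 1.0 - th
                     for (u, v), th in zip(edges, theta)]) for x in states)
    joint = sum(unnormalized_joint(x, y, theta, mu, sigma, edges) for x in states)
    return np.log(joint) - np.log(Z)

# Triangle of three latent nodes, as in Figure 1; all values are hypothetical.
edges = [(0, 1), (1, 2), (0, 2)]
theta = [0.8, 0.6, 0.7]             # one parameter per edge (heterogeneous)
mu, sigma = [0.0, 2.0], [1.0, 1.0]  # varphi_0 = (mu_0, sigma_0), varphi_1 = (mu_1, sigma_1)
y = np.array([0.1, 1.9, 2.2])       # one observed instantiation
print(log_likelihood(y, theta, mu, sigma, edges))
```

Enumeration is only feasible for such toy graphs; for realistic $d$ the normalizer $Z(\theta)$ is intractable, which is exactly the difficulty the contrastive divergence machinery below addresses.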
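Algorithm 1 (PCD-n) is only partially visible in this excerpt. As a rough illustration of the persistent contrastive divergence idea it refers to, the sketch below, under the simplifying assumption of a fully observed binary pairwise MRF with the agreement potentials of Example 1, alternates Gibbs sweeps on persistent chains with gradient steps that contrast data statistics against chain statistics. It is not the paper's EM-kernel-PCD procedure, and all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def dlogQ_dtheta(x, theta, edges):
    """Gradient of log Q(x; theta) under the agreement potentials of Example 1."""
    return np.array([1.0 / th if x[u] == x[v] else -1.0 / (1.0 - th)
                     for (u, v), th in zip(edges, theta)])

def gibbs_sweep(x, theta, edges):
    """One in-place Gibbs sweep over all nodes of the binary pairwise MRF."""
    for j in range(len(x)):
        logits = np.zeros(2)
        for (u, v), th in zip(edges, theta):
            if j in (u, v):
                other = x[v] if j == u else x[u]
                for a in (0, 1):
                    logits[a] += np.log(th if a == other else 1.0 - th)
        p1 = 1.0 / (1.0 + np.exp(logits[0] - logits[1]))
        x[j] = int(rng.random() < p1)
    return x

def pcd(samples, edges, n_sweeps=1, iters=200, lr=0.01):
    """PCD-style estimation of edge parameters from fully observed samples."""
    theta = np.full(len(edges), 0.5)
    chains = samples.copy()              # persistent chains, one per sample
    for _ in range(iters):
        for c in chains:                 # advance each chain by n_sweeps Gibbs sweeps
            for _ in range(n_sweeps):
                gibbs_sweep(c, theta, edges)
        data_grad = np.mean([dlogQ_dtheta(x, theta, edges) for x in samples], axis=0)
        model_grad = np.mean([dlogQ_dtheta(x, theta, edges) for x in chains], axis=0)
        theta = np.clip(theta + lr * (data_grad - model_grad), 0.01, 0.99)
    return theta

edges = [(0, 1), (1, 2), (0, 2)]
samples = np.array([[0, 0, 0], [1, 1, 1], [0, 0, 1], [1, 1, 0]])  # toy observed data
print(pcd(samples, edges))
```

In the paper's setting the hidden $x$ is not observed, so this kind of update would run inside an EM loop on completed states rather than on observed samples directly.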
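The M-step described in the introduction couples such a contrastive divergence learner with a kernel smoothing step over the background knowledge $k$, so that edges with similar $k_i$ receive similar $\theta_i$. The paper's exact smoother is not shown in this excerpt; the sketch below uses a generic Nadaraya-Watson smoother with a Gaussian kernel as one plausible instantiation, and all numerical values are hypothetical.

```python
import numpy as np

def kernel_smooth(theta_raw, k, bandwidth=0.1):
    """Nadaraya-Watson smoothing of per-edge estimates against background knowledge k.

    theta_raw : raw per-edge parameter estimates (e.g. after a PCD-style update)
    k         : background knowledge value for each edge
    Returns smoothed estimates, so that edges with similar k get similar theta.
    """
    theta_raw, k = np.asarray(theta_raw), np.asarray(k)
    # Gaussian kernel weights K_h(k_i - k_j) for every pair of edges.
    w = np.exp(-0.5 * ((k[:, None] - k[None, :]) / bandwidth) ** 2)
    return (w @ theta_raw) / w.sum(axis=1)

# Edges whose background knowledge is close get pulled toward a common value.
theta_hat = np.array([0.82, 0.35, 0.78, 0.31])  # hypothetical raw estimates
k = np.array([0.9, 0.1, 0.85, 0.15])            # hypothetical background knowledge
print(kernel_smooth(theta_hat, k, bandwidth=0.2))
```

The bandwidth controls how strongly the heterogeneity is constrained: a very large bandwidth recovers the homogeneous case $k_1 = k_2 = \ldots = k_r$, while a very small one leaves every edge parameter free.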