A survey of probabilistic models, using the Bayesian Programming methodology as a unifying framework

Julien Diard,∗ Pierre Bessière, and Emmanuel Mazer
Laboratoire GRAVIR / IMAG – CNRS
INRIA Rhône-Alpes, 655 avenue de l'Europe
38330 Montbonnot Saint Martin, FRANCE
[email protected]

Abstract

This paper presents a survey of the most common probabilistic models for artefact conception. We use a generic formalism called Bayesian Programming, which we introduce briefly, for reviewing the main probabilistic models found in the literature. Indeed, we show that Bayesian Networks, Markov Localization, Kalman filters, etc., can all be captured under this single formalism. We believe it offers the novice reader a good introduction to these models, while still providing the experienced reader an enriching global view of the field.

1 Introduction

We think that over the next decade, probabilistic reasoning will provide a new paradigm for understanding neural mechanisms and the strategies of animal behaviour at a theoretical level, and will raise the performance of engineering artefacts to a point where they are no longer easily outperformed by the biological examples they are imitating.

Rational reasoning with incomplete and uncertain information is quite a challenge for artificial systems. The purpose of probabilistic inference is precisely to tackle this problem with a well-established formal theory. During the past years, a lot of progress has been made in this field, both from the theoretical and the applied point of view. The purpose of this paper is to give an overview of these works and, especially, to attempt a synthetic presentation using a generic formalism named Bayesian Programming (BP).

∗ Julien Diard is currently with the Laboratoire de Physiologie de la Perception et de l'Action, Collège de France, Paris, and the Department of Mechanical Engineering of the National University of Singapore. He can be reached at [email protected].

Several authors have already been interested in the relations between some of these models. Smyth, for instance, expresses Kalman Filters (KFs) and Hidden Markov Models (HMMs) in the Bayesian Network (BN) framework [1, 2]. Murphy replaces KFs, HMMs and many of their variants in the Dynamic Bayesian Network (DBN) formalism [3, 4]. Finally, Ghahramani focuses on the relations between BNs, DBNs, HMMs, and some variants of KFs [5, 6, 7].

The current paper includes the above, and adds Recursive Bayesian Estimation, Bayesian Filters (BFs), Particle Filters (PFs), Markov Localization models, Monte Carlo Markov Localization (MCML), and (Partially Observable) Markov Decision Processes (POMDPs and MDPs). This paper builds upon previous works, where more models are presented: Bessière et al. treat Mixture Models, Maximum Entropy Approaches, Sensor Fusion, and Classification in [8], while Diard treats Markov Chains and develops Bayesian Maps in [9].

All these models will be presented by rewriting them into the Bayesian Programming formalism, which allows us to use a single notation and structure throughout the paper. This gives the novice reader an efficient first introduction to a wide variety of models and their relations. This also, hopefully, brings other readers the "big picture". By making the hypotheses made by each model explicit, we also put all their variants into perspective, and shed light upon the possible variations that are yet to be explored.

We will present the different models following a general-to-specific ordering. The "generality" measure we have chosen takes into account the number of hypotheses made by each model: the fewer hypotheses made by a model, the more general it is. However, not all hypotheses are comparable: for example, specialising the semantics of variables or specialising the
dependency structure of the probabilistic models are not easily compared. Therefore, we obtain a partial ordering over the space of all possible models, which can be seen in Figure 1.

[Figure 1: Probabilistic modelling formalisms treated in this paper and their general-to-specific partial ordering. From the most general to the most specific: Bayesian Programs; Bayesian Networks and Bayesian Maps; DBNs; Bayesian Filters; Markov Localization, discrete, semi-continuous and continuous HMMs, Particle Filters and Kalman Filters; MCML and POMDPs; MDPs.]

Of course, generality allows a bigger power of expression (more models will be instances of the general formalisms), whereas specialization allows for efficient solutions to inference and learning issues. For instance, the Baum-Welch algorithm, and the closed-form solution of the state estimation problem in the case of KFs, both come from, and justify, the hypotheses made by HMMs and KFs, respectively.

The rest of the paper is organized as follows. Section 2 introduces the Bayesian Programming formalism. Each subsequent section briefly presents one of the formalisms of Figure 1, traversing the tree in an almost depth-first manner. For each of these formalisms, we rewrite them in the BP framework and notations, describe their most salient features, and provide references for the interested reader. Sections 2 to 5 present general purpose models, where the modelling choices are made independently of any specific knowledge about the phenomenon, while Sections 6 to 9 present problem oriented models, where problem dependent knowledge is exploited; these are taken from the domain of robotics.

2 Bayesian Programming

We now introduce BP, the Bayesian Programming methodology. We briefly summarize it here, but still invite the interested reader to refer to Lebeltel et al. for all the details about this methodology and its use in robotics [10]. As this formalism is based only on the inference rules needed for probability calculus, it is very general. Indeed, this formalism will be used in the rest of this paper to present all the other models we consider.

In the BP formalism, a bayesian program is a structure (see Figure 2) made of two components.

[Figure 2: Structure of a bayesian program. A Program is made of a Description and a Question; the Description consists of a Specification Spec (π), which lists the pertinent variables, the decomposition, and the parametric forms (or recursive questions), and of an Identification based on Data (δ).]

The first is a declarative component, where the user defines a description. The purpose of a description is to specify a method to compute a joint distribution over a set of relevant variables {X1, X2, ..., Xn}, given a set of experimental data δ and preliminary knowledge π. This joint distribution is denoted P(X1 X2 ... Xn | δ π). To specify this distribution, the programmer first lists the pertinent variables (and defines their domains); then, applying Bayes' rule, decomposes the joint distribution as a product of simpler terms (possibly stating hypotheses so as to simplify the model and/or the computations); and finally assigns forms to each term of the selected product (these forms can be parametric forms, or recursive questions to other bayesian programs). If there are free parameters in the parametric forms, they have to be assessed. They can be given by the programmer (a priori programming) or computed on the basis of a learning mechanism defined by the programmer and some experimental data δ.

The second component is of a procedural nature, and consists of using the previously defined description with a question, i.e. computing a probability distribution of the form P(Searched | Known). Answering a "question" consists in deciding a value for the variable Searched according to P(Searched | Known), where Searched and Known are conjunctions of variables appearing in disjoint subsets of the n variables.

It is well known that general Bayesian inference is a very difficult problem, which may be practically intractable. But, as this paper is mainly concerned with modelling issues, we will assume that the inference problems are solved and implemented in an efficient manner by the programmer or by an inference engine.¹

¹ The one we use to tackle these problems has been described elsewhere [8].
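As a concrete illustration of the two components, the question P(Searched | Known) can be answered by brute-force marginalization of the joint distribution defined by the description. The following sketch is our own toy example (illustrative numbers, not the inference engine of [8]): the description is P(A B C) = P(A) P(B | A) P(C | B) over three binary variables, and the question asked is P(A | C).

```python
# Toy bayesian program (illustrative numbers, not from the paper).
# Description: P(A B C) = P(A) P(B | A) P(C | B), all variables binary.
p_a = {0: 0.6, 1: 0.4}                                    # P(A)
p_b_given_a = {(0, 0): 0.9, (1, 0): 0.1,                  # P(B | A), keyed (b, a)
               (0, 1): 0.3, (1, 1): 0.7}
p_c_given_b = {(0, 0): 0.8, (1, 0): 0.2,                  # P(C | B), keyed (c, b)
               (0, 1): 0.5, (1, 1): 0.5}

def joint(a, b, c):
    """The decomposition: a product of simpler terms."""
    return p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]

def question(c):
    """Answer P(A | C = c): sum the joint over the unknown variable B, then normalize."""
    scores = {a: sum(joint(a, b, c) for b in (0, 1)) for a in (0, 1)}
    z = sum(scores.values())                              # normalization constant
    return {a: s / z for a, s in scores.items()}

posterior = question(c=1)   # P(A | C = 1)
```

This brute-force sum is exponential in the number of unknown variables, which is precisely why the specialized formalisms surveyed below trade generality for efficient inference.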
3 Bayesian Networks

Bayesian Networks, first introduced by Pearl [11], have emerged as a primary method for dealing with probabilistic and uncertain information. They are the result of the marriage between the theory of probabilities and the theory of graphs. They are defined by the Bayesian Program of Figure 3.

[Figure 3: The BN formalism rewritten in BP. Variables: X1, ..., XN. Decomposition: P(X1 ... XN) = ∏_{i=1}^{N} P(Xi | Pai), with Pai ⊆ {X1, ..., Xi−1}. Forms: any. Identification: any. Question: P(Xi | Known).]

The pertinent variables are not constrained and have no specific semantics.

The decomposition, on the contrary, is specific: it is a product of distributions of one variable Xi, conditioned by a conjunction of other variables, called its "parents", Pai, with Pai ⊆ {X1, ..., Xi−1}. This assumes that the variables are ordered, and ensures that applying Bayes' rule correctly defines the joint distribution. Also note that Xi denotes one and only one variable. Therefore, the following model, which can fit into a BP, does not fit into a BN:

P(A B C D) = P(A B) P(C | A) P(D | B).

In a BN, if A and B are to appear together on the left hand side of a term, as in P(A B), they have to be merged into a single variable ⟨A, B⟩, and cannot be subsequently separated, as in P(C | A) and P(D | B).

An obvious bijection exists between joint probability distributions defined by such a decomposition and directed acyclic graphs: nodes are associated to variables, and oriented edges are associated to conditional dependencies. Using graphs in probabilistic models leads to an efficient way to define hypotheses over a set of variables, an economic representation of a joint distribution and, most importantly, an easy and efficient way to do probabilistic inference.

The parametric forms are not constrained theoretically, but in commercial BN software they are very often restricted to probability tables (as in Netica), or to tables and constrained Gaussians (as in Hugin²).

Very efficient inference techniques have been developed to answer questions of the form P(Xi | Known), where Known is a subset of the other variables of the BN. However, some difficulties appear for more general questions (i.e. with more than one variable on the left hand side).

Readings on BNs should start with the books by Pearl [11], Lauritzen [12], Jordan [13] and Frey [14].

² http://www.norsys.com and http://www.hugin.com.

4 Dynamic Bayesian Networks

To deal with time and to model stochastic processes, the framework of Bayesian Networks has been extended to Dynamic Bayesian Networks [15]. Given a graph representing the structural knowledge at time t, supposing this structure to be time-invariant and time to be discrete, the resulting DBN is the repetition of the first structure from a start time to a final time. Each part at time t in the final graph is named a time slice. DBNs are defined by the Bayesian Program of Figure 4.

[Figure 4: The DBN formalism rewritten in BP. Variables: X^1_0, ..., X^N_0, ..., X^1_T, ..., X^N_T. Decomposition: P(X^1_0 ... X^N_T) = P(X^1_0 ... X^N_0) ∏_{t=1}^{T} ∏_{i=1}^{N} P(X^i_t | R^i_t). Forms: any. Identification: any. Question: P(X^i_T | Known).]

In Figure 4, R^i_t is a conjunction of variables taken in the set {X^1_t, ..., X^{i−1}_t} ∪ {X^1_{t−1}, ..., X^N_{t−1}}. This means that X^i_t depends only on its parents at time t ({X^1_t, ..., X^{i−1}_t}), as in a regular BN, and on some variables from the previous time slice ({X^1_{t−1}, ..., X^N_{t−1}}).

∏_{i=1}^{N} P(X^i_t | R^i_t) defines a graph for time slice t, and all time slices are identical as the time index t changes.

These hypotheses are very often made: the fact that a time slice only depends on the previous one is commonly called the first order Markov assumption. The fact that all time slices are identical is the stationarity hypothesis. In this case, the model defined by a time slice is called the local model, and is said to be time-invariant, or even homogeneous.

As can easily be seen in Figure 4, a DBN as a whole, "unrolled" over time, may be considered as a regular BN.
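To make the unrolling concrete, the following minimal sketch (our own toy numbers, a single binary state variable per slice, i.e. a Markov chain) builds the joint probability of a trajectory in the unrolled, homogeneous DBN, P(X_0 ... X_T) = P(X_0) ∏_{t=1}^{T} P(X_t | X_{t−1}):

```python
# A toy homogeneous DBN with one binary state variable per time slice,
# unrolled into a regular BN joint. Numbers are illustrative only.
prior = {0: 0.5, 1: 0.5}                      # P(X_0)
trans = {0: {0: 0.9, 1: 0.1},                 # P(X_t | X_{t-1}); the same table is
         1: {0: 0.2, 1: 0.8}}                 # reused for every slice (stationarity)

def unrolled_joint(trajectory):
    """Probability of a full state trajectory in the unrolled BN."""
    p = prior[trajectory[0]]
    for prev, cur in zip(trajectory, trajectory[1:]):
        p *= trans[prev][cur]                 # first-order Markov assumption
    return p
```

For instance, unrolled_joint([0, 0, 1]) multiplies P(X_0 = 0), P(X_1 = 0 | X_0 = 0) and P(X_2 = 1 | X_1 = 0); any inference algorithm valid for BNs can then be run on this unrolled joint.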
Consequently, the usual inference techniques applicable to BNs are still valid for such "unrolled" DBNs.

The best introduction, survey and starting point on DBNs is Murphy's Ph.D. thesis [4]. The interested reader can also refer to papers by Kanazawa et al. [16], or to Ghahramani for the learning aspects of DBNs [5].

5 Recursive Bayesian Estimation

5.1 Bayesian Filtering, Prediction and Smoothing

Recursive Bayesian Estimation is the generic denomination for a very widely applied class of numerous different probabilistic models of time series. They are defined by the Bayesian Program of Figure 5.

[Figure 5: Recursive Bayesian estimation in BP. Variables: S0, ..., ST, O0, ..., OT. Decomposition: P(S0 ... ST O0 ... OT) = P(S0) P(O0 | S0) ∏_{i=1}^{T} [P(Si | Si−1) P(Oi | Si)]. Forms: any. Identification: any. Question: P(St+k | O0 ... Ot), with k = 0 (filtering), k > 0 (prediction), or k < 0 (smoothing).]

Variables S0, ..., ST are a time series of "state" variables considered on a time horizon ranging from 0 to T. Variables O0, ..., OT are a time series of "observation" variables on the same horizon.

The decomposition is based on three terms. P(Si | Si−1), called the "system model" or "transition model", formalizes the knowledge about transitions from the state at time i − 1 to the state at time i. P(Oi | Si), called the "observation model", expresses what can be observed at time i when the system is in state Si. Finally, a prior P(S0) is defined over states at time 0.

The question usually asked of these models is P(St+k | O0 ... Ot): what is the probability distribution for the state at time t + k, knowing the observations made from time 0 to t, t ∈ {1, ..., T}? The most common case is Bayesian Filtering, where k = 0: one searches for the present state knowing the past observations. However, it is also possible to do "prediction" (k > 0), where one tries to extrapolate a future state from past observations, or "smoothing" (k < 0), where one tries to recover a past state from observations made either before or after that instant. Some more complicated questions may also be asked (see Section 5.2).

Bayesian Filters (k = 0) have a very interesting recursive property, which contributes largely to their interest. Indeed, P(St | O0 ... Ot) may be simply computed from P(St−1 | O0 ... Ot−1) with the following formula (derivation omitted):

P(St | O0 ... Ot) = P(Ot | St) Σ_{St−1} [P(St | St−1) P(St−1 | O0 ... Ot−1)]   (1)

In the case of prediction and smoothing, the imbricated sums needed for solving the same question can quickly become a huge computational burden. Readers interested in Bayesian Filtering should refer to [17].

5.2 Hidden Markov Models

Hidden Markov Models are a very popular specialization of Bayesian Filters. They are defined by the Bayesian Program of Figure 6.

[Figure 6: The HMM formalism rewritten in BP. Variables: S0, ..., ST, O0, ..., OT. Decomposition: P(S0 ... ST O0 ... OT) = P(S0) P(O0 | S0) ∏_{i=1}^{T} [P(Si | Si−1) P(Oi | Si)]. Forms: P(S0) ≡ Matrix; P(Si | Si−1) ≡ Matrix; P(Oi | Si) ≡ Matrix. Identification: Baum-Welch algorithm. Question: P(S0 S1 ... St−1 | St O0 ... Ot).]

Variables are supposed to be discrete. Therefore, the transition model P(Si | Si−1) and the observation model P(Oi | Si) are both specified using probability matrices (or CPTs, for conditional probability tables).

Variants exist on this particular point: when the observation variables are continuous, the formalism becomes known as "semi-continuous HMMs" [18, 4]. In this case, the observation model is associated either with a Gaussian form, or with a Mixture of Gaussians form.

The most popular question asked of HMMs is P(S0 S1 ... St−1 | St O0 ... Ot): what is the most probable series of states that leads to the present state, knowing the past observations? This particular question may be answered with a specific and very efficient algorithm called the "Viterbi algorithm".

Finally, the "Baum-Welch" algorithm is a specific learning algorithm that has been developed for HMMs: it computes the most probable observation and transition models from a set of experimental data.

Two nice entry points into the huge HMM literature are the tutorial by Rabiner [18] and chapter 6 of his book Fundamentals of Speech Recognition [19].

5.3 Kalman Filters

The very well known Kalman Filters [20] are another specialization of Bayesian Filters. They are defined by the Bayesian Program of Figure 7.

[Figure 7: The KF formalism rewritten in BP. Variables: S0, ..., ST, O0, ..., OT. Decomposition: P(S0 ... ST O0 ... OT) = P(S0) P(O0 | S0) ∏_{i=1}^{T} [P(Si | Si−1) P(Oi | Si)]. Forms: P(S0) ≡ G(S0, µ, σ); P(Si | Si−1) ≡ G(Si, A · Si−1, Q); P(Oi | Si) ≡ G(Oi, H · Si, R). Identification. Question: P(St | O0 ... Ot).]

Variables are continuous. The transition model P(Si | Si−1) and the observation model P(Oi | Si) are both specified using Gaussian laws, with means that are linear functions of the conditioning variables.

Due to these hypotheses, and using the recursive Equation 1, it is possible to analytically solve the inference problem to answer the usual P(St | O0 ... Ot) question. This leads to an extremely efficient algorithm, which explains the popularity of Kalman Filters and the number of their everyday applications.

When there are no obvious linear transition and observation models, it is still often possible, using a first order Taylor expansion, to consider these models as locally linear. This generalization is commonly called the Extended Kalman Filter (EKF). Sometimes KFs also include action variables, and thus become specializations of Markov Localization models (see Section 6).

A nice tutorial by Welch and Bishop may be found on the internet [21]. For a more complete mathematical presentation, one should refer to a report by Barker et al. [22]; but these are only two entries into a huge literature on the subject.

5.4 Particle Filters

The fashionable Particle Filters (PFs) may be seen as a specific implementation of Bayesian Filters. The distribution P(St−1 | O0 ... Ot−1) is approximated by a set of N particles having weights proportional to their probabilities. The recursive Equation 1 is then used to inspire a dynamic process that produces an approximation of P(St | O0 ... Ot). The principle of this dynamic process is that the particles are first moved according to the transition model P(St | St−1), and then their weights are updated according to the observation model P(Ot | St).

See the tutorial by Arulampalam et al. for a start [23].

6 Markov Localization

A possible alternative to Bayesian Filters is to add control variables A0, ..., At−1 to the model. This extension is sometimes called input-output HMM [24, 25, 6, 26], or sometimes still Bayesian Filters; but, in the field of robotics, it has received most attention under the name of Markov Localization (ML) [27, 28]. In this field, such an extension is natural: a robot can observe its state through its sensors, but can also influence its state via motor commands. ML models are defined by the Bayesian Program of Figure 8.

[Figure 8: The ML formalism rewritten in BP. Variables: S0, ..., ST, A0, ..., AT−1, O0, ..., OT. Decomposition: P(S0 ... ST A0 ... AT−1 O0 ... OT) = P(S0) P(O0 | S0) ∏_{i=1}^{T} [P(Ai−1) P(Si | Ai−1 Si−1) P(Oi | Si)]. Forms: Matrices or Particles. Identification. Question: P(St | A0 ... At−1 O0 ... Ot).]

Starting from a Bayesian Filter structure, the control variable is used to refine the transition model P(Si | Si−1) into P(Si | Ai−1 Si−1), which is then called the action model. The rest of the dependency structure is unchanged.

The forms can be defined in several ways: they are commonly matrices, but when they are implemented using particles (in a manner similar to the one presented in Section 5.4), the model takes the name of Monte Carlo Markov Localization (MCML).
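The recursive update of Equation 1, refined with an action variable as above, can be sketched in a few lines. The following toy example is our own (a hypothetical 1-D cyclic grid of positions with made-up action and observation models, not from the paper): the belief is first predicted through the action model P(St | At−1 St−1), then weighted by the observation model P(Ot | St) and renormalized.

```python
# One-step recursive state estimation with an action variable, on a toy
# 1-D cyclic grid of N cells. All numbers are illustrative assumptions.
N = 5  # number of grid cells

def normalize(belief):
    z = sum(belief)
    return [b / z for b in belief]

def action_model(s, a, s_prev):
    """P(S_t = s | A_{t-1} = a, S_{t-1} = s_prev): noisy shift by a cells."""
    return 0.8 if s == (s_prev + a) % N else 0.2 / (N - 1)

def observation_model(o, s):
    """P(O_t = o | S_t = s): the sensor reads the cell index, usually correctly."""
    return 0.7 if o == s else 0.3 / (N - 1)

def update(belief, a, o):
    """One step of Equation 1 with actions: predict, then correct."""
    predicted = [sum(action_model(s, a, sp) * belief[sp] for sp in range(N))
                 for s in range(N)]
    return normalize([observation_model(o, s) * predicted[s] for s in range(N)])

belief = [1.0 / N] * N             # uniform prior P(S_0)
belief = update(belief, a=1, o=1)  # command "move right", then observe cell 1
```

With matrices for the forms this is a discrete ML step; replacing the explicit sum over states by a set of weighted samples gives the MCML variant.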
Among the forms, P(Ai−1) is almost always assumed to be uniform (and thus disappears from Equation 2).

The resulting model is used to answer the question P(St | A0 ... At−1 O0 ... Ot), which estimates the state of the robot given past actions and observations: when this state represents the position of the robot in its environment, this amounts to localization. This question is similar to the Bayesian Filtering question, and can also be solved in a recursive manner:

P(St | A0 ... At−1 O0 ... Ot) = P(At−1) P(Ot | St) Σ_{St−1} [P(St | At−1 St−1) P(St−1 | A0 ... At−2 O0 ... Ot−1)]   (2)

A reference paper on ML and its use in robotics is the survey by Thrun [29].

7 Decision Theoretic Planning

Partially Observable Markov Decision Processes (POMDPs) and Markov Decision Processes (MDPs) are used in robotics to model a robot that has to plan and execute a sequence of actions. A complete review of POMDPs and MDPs by Boutilier et al. [30] is an interesting starting point.

7.1 Partially Observable Markov Decision Processes

Formally, POMDPs use the same probabilistic model as Markov Localization, except that it is enriched by the definition of a reward (and/or cost) function.

This reward function R models which states are good for the robot, and which actions are costly. In the most general notation, it therefore is a function that associates a real valued number to each couple state - action: R : Si, Ai → IR.

The reward function helps drive the planning process. Indeed, the aim of this process is to find an optimal plan, in the sense that it maximizes a certain measure based on the reward function. This measure is most frequently the expected discounted cumulative reward, ⟨Σ_{t=0}^{∞} γ^t Rt⟩, where γ is a discount factor (less than 1), Rt is the reward obtained at time t, and ⟨·⟩ is the mathematical expectation. Given this measure, the goal of the planning process is to find an optimal mapping from probability distributions over states to actions (a policy).

This planning process, which leads to intractable computations, is sometimes approximated using iterative algorithms called policy iteration or value iteration. These algorithms start with random policies, and improve them at each step until some numerical convergence criterion is met. Unfortunately, state-of-the-art implementations of these algorithms still cannot cope with state spaces of more than a hundred states [31].

An introduction to POMDPs is proposed by Kaelbling et al. [32].

7.2 Markov Decision Processes

Another class of approaches for tackling the intractability of the planning problem in POMDPs is to suppose that the robot knows what state it is in. The state becomes observable; therefore the observation variable and model are not needed anymore. The resulting formalism is called a (Fully Observable) Markov Decision Process (MDP), and is summed up by the Bayesian Program of Figure 9.

[Figure 9: The MDP formalism rewritten in BP. Variables: S0, ..., ST, A0, ..., AT−1. Decomposition: P(S0 ... ST A0 ... AT−1) = P(S0) ∏_{i=1}^{T} [P(Ai−1) P(Si | Ai−1 Si−1)]. Forms: Matrices. Identification. Question: P(A0 ... At−1 | St S0).]

MDPs can cope with planning in bigger state spaces than POMDPs, but are still limited to some hundreds of states. Therefore, many recent research efforts are aimed at hierarchical decomposition of the planning and modelling problems in MDPs, especially in the robotic field, where their full observability hypothesis makes their practical use difficult [33, 34, 31].

8 Bayesian Robot Programming

Lebeltel et al. applied the BP methodology to mobile robotics [35, 10]. The adequacy of the method as a robotic programming tool has been demonstrated through a succession of increasingly complex experiments: learning of simple behaviours, behaviour combination, sensor fusion, hierarchical behaviour composition, situation recognition and temporal sequencing. This series of experiments was also integrated for solving a complex robotic task, showing the possibility of incremental development.

9 Bayesian Maps

One of the most crucial problems in robotics is the modelling by the robot of its environment. Most common approaches rely either on ML models or on variants of KFs (for example, variants that include action variables). However, as can easily be seen in Figure 8, the only question asked of such models is a localization question, of the form P(St | A0 ... At−1 O0 ... Ot). Therefore, the action variables are merely used as input variables, as the model is not concerned with computing probability distributions over actions.

As noted by Diard [9] and Thrun [36], such a separation between the localization and control models is not satisfying. This remark, among others, led to the definition of the Bayesian Map model, which is a generalization of ML models. The local model (see Section 4) of a Bayesian Map is defined by the Bayesian Program of Figure 10.

[Figure 10: The Bayesian Map formalism in BP. Variables: St, St′, At, Ot. Decomposition: any. Forms: any. Identification: any. Question: behaviour definition, P(Ai | X), with Ai ⊆ A and X ⊆ {P, Lt, Lt′, A} \ Ai.]

A Bayesian Map is a model that includes four variables: an observation, an action, a state variable at time t, and a state variable at a later time t′. The decomposition is not constrained (if the four variables are atomic, there are already 1015 decompositions to choose from – Attias recently exploited one of these [37]); nor are the forms or the identification phase.

On the contrary, the use of this model is strongly constrained, as we require that the model generates behaviours, which are questions of the form P(Ai | X): what is the probability distribution over (a subset of) the action variable(s), knowing some other variables? This constraint ensures that the model will be, in fine, used for controlling the robot. That the model is also used for explicitly computing a distribution over state variables (i.e., localization) is an optional goal, which can be convenient for human-robot communication purposes, for example.

The interested reader can refer to Diard et al. for more details concerning the Bayesian Map formalism, and in particular for the definition of the Abstraction and Superposition Operators, which allow for building hierarchies of Bayesian Maps [9, 38].

10 Conclusion

In this paper, we have presented a large class of probabilistic modelling formalisms. They have all been rewritten in the Bayesian Programming framework, which proves its generality. Moreover, the exploration of the relations between the various formalisms was greatly eased, because BP forces one to express explicitly all the hypotheses made, from the choice of variables to the use of the model (questions).

We think that this study of the different hypotheses can be very fruitful, especially for designing new formalisms, as we have briefly illustrated by some of the reasoning that led to the Bayesian Map model.

Acknowledgements

This work has been supported by the BIBA European project (IST-2001-32115). The authors would also like to thank Prof. Marcelo H. Ang Jr.

References

[1] P. Smyth, D. Heckerman, and M. I. Jordan. Probabilistic independence networks for hidden Markov probability models. Neural Computation, 9(2):227–269, 1997.
[2] P. Smyth. Belief networks, hidden Markov models, and Markov random fields: a unifying view. Pattern Recognition Letters, 1998.
[3] K. Murphy. An introduction to graphical models. Technical report, University of California, Berkeley, May 2001.
[4] K. Murphy. Dynamic Bayesian Networks: Representation, Inference and Learning. Ph.D. thesis, University of California, Berkeley, Berkeley, CA, July 2002.
[5] Z. Ghahramani. Learning dynamic Bayesian networks. In C. Lee Giles and Marco Gori, editors, Adaptive Processing of Sequences and Data Structures, number 1387 in Lecture Notes in Artificial Intelligence, LNAI, pages 168–197. Springer-Verlag, 1998.
[6] Z. Ghahramani. An introduction to hidden Markov models and Bayesian networks. Journal of Pattern Recognition and Artificial Intelligence, 15(1):9–42, 2001.
[7] S. Roweis and Z. Ghahramani. A unifying review of linear Gaussian models. Neural Computation, 11(2):305–345, February 1999.
[8] P. Bessière, J.-M. Ahuactzin, O. Aycard, D. Bellot, F. Colas, C. Coué, J. Diard, R. Garcia, C. Koike, O. Lebeltel, R. LeHy, O. Malrait, E. Mazer, K. Mekhnacha, C. Pradalier, and A. Spalanzani. Survey: Probabilistic methodology and techniques for artefact conception and development. Technical Report RR-4730, INRIA Rhône-Alpes, Montbonnot, France, 2003.
[9] J. Diard. La carte bayésienne – Un modèle probabiliste hiérarchique pour la navigation en robotique mobile. Ph.D. thesis, Institut National Polytechnique de Grenoble, Grenoble, France, January 2003.
[10] O. Lebeltel, P. Bessière, J. Diard, and E. Mazer. Bayesian robot programming. Autonomous Robots (accepted for publication), 2003.
[11] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA, 1988.
[12] S. L. Lauritzen. Graphical Models. Clarendon Press, Oxford, 1996.
[13] M. Jordan. Learning in Graphical Models. MIT Press, 1998.
[14] B. J. Frey. Graphical Models for Machine Learning and Digital Communication. MIT Press, 1998.
[15] T. Dean and K. Kanazawa. A model for reasoning about persistence and causation. Computational Intelligence, 5(3):142–150, 1989.
[16] K. Kanazawa, D. Koller, and S. Russell. Stochastic simulation algorithms for dynamic probabilistic networks. In Philippe Besnard and Steve Hanks, editors, Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence (UAI'95), pages 346–351, San Francisco, CA, USA, August 1995. Morgan Kaufmann Publishers.
[17] A. H. Jazwinsky. Stochastic Processes and Filtering Theory. Academic Press, New York, 1970.
[18] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–285, February 1989.
[19] L. R. Rabiner and B.-H. Juang. Fundamentals of Speech Recognition, chapter Theory and implementation of Hidden Markov Models, pages 321–389. Prentice Hall, Englewood Cliffs, New Jersey, 1993.
[20] R. E. Kalman. A new approach to linear filtering and prediction problems. Transactions ASME, Journal of Basic Engineering, 82:34–45, 1960.
[21] G. Welch and G. Bishop. An introduction to the Kalman filter. Technical Report TR95-041, Computer Science Department, University of North Carolina, Chapel Hill, 1995.
[22] A. L. Barker, D. E. Brown, and W. N. Martin. Bayesian estimation and the Kalman Filter. Technical Report IPC-94-02, Institute for Parallel Computation, University of Virginia, August 5, 1994.
[23] S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp. A tutorial on particle filters for on-line non-linear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50(2):174–188, February 2002.
[24] Y. Bengio and P. Frasconi. An input/output HMM architecture. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems 7, pages 427–434. MIT Press, Cambridge, MA, 1995.
[25] T. W. Cacciatore and S. J. Nowlan. Mixtures of controllers for jump linear and non-linear plants. In Jack D. Cowan, Gerald Tesauro, and Joshua Alspector, editors, Advances in Neural Information Processing Systems, volume 6, pages 719–726. Morgan Kaufmann Publishers, Inc., 1994.
[26] M. Meila and M. I. Jordan. Learning fine motion by Markov mixtures of experts. In D. Touretzky, M. C. Mozer, and M. Hasselmo, editors, Advances in Neural Information Processing Systems, volume 8. Morgan Kaufmann Publishers, Inc., 1996.
[27] W. Burgard, D. Fox, D. Hennig, and T. Schmidt. Estimating the absolute position of a mobile robot using position probability grids. In Proceedings of the Thirteenth National Conference on Artificial Intelligence and the Eighth Innovative Applications of Artificial Intelligence Conference, pages 896–901, Menlo Park, August 4–8 1996. AAAI Press / MIT Press.
[28] S. Thrun, W. Burgard, and D. Fox. A probabilistic approach to concurrent mapping and localization for mobile robots. Machine Learning and Autonomous Robots (joint issue), 31/5:1–25, 1998.
[29] S. Thrun. Probabilistic algorithms in robotics. AI Magazine, 21(4):93–109, 2000.
[30] C. Boutilier, T. Dean, and S. Hanks. Decision theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 10:1–94, 1999.
[31] J. Pineau and S. Thrun. An integrated approach to hierarchy and abstraction for POMDPs. Technical Report CMU-RI-TR-02-21, Carnegie Mellon University, August 2002.
[32] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2):99–134, 1998.
[33] M. Hauskrecht, N. Meuleau, L. P. Kaelbling, T. Dean, and C. Boutilier. Hierarchical solution of Markov decision processes using macro-actions. In Gregory F. Cooper and Serafín Moral, editors, Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 220–229, San Francisco, July 24–26 1998. Morgan Kaufmann.
[34] T. Lane and L. P. Kaelbling. Toward hierarchical decomposition for planning in uncertain environments. In Proceedings of the 2001 IJCAI Workshop on Planning under Uncertainty and Incomplete Information, Seattle, WA, August 2001. AAAI Press.
[35] O. Lebeltel. Programmation Bayésienne des Robots. Ph.D. thesis, Institut National Polytechnique de Grenoble, Grenoble, France, September 1999.
[36] S. Thrun. Robotic mapping: A survey. Technical Report CMU-CS-02-111, Carnegie Mellon University, February 2002.
[37] H. Attias. Planning by probabilistic inference. In Ninth International Workshop on Artificial Intelligence and Statistics Proceedings, 2003.
[38] J. Diard, P. Bessière, and E. Mazer. Hierarchies of probabilistic models of space for mobile robots: the Bayesian Map and the Abstraction Operator. In Reasoning with Uncertainty in Robotics (IJCAI'03 Workshop) (to appear), 2003.