A survey of probabilistic models, using the Bayesian Programming methodology as a unifying framework

Julien Diard,∗ Pierre Bessière, and Emmanuel Mazer
Laboratoire GRAVIR / IMAG – CNRS
INRIA Rhône-Alpes, 655 avenue de l'Europe
38330 Montbonnot Saint Martin, FRANCE
[email protected]

Abstract

This paper presents a survey of the most common probabilistic models for artefact conception. We use a generic formalism called Bayesian Programming, which we introduce briefly, for reviewing the main probabilistic models found in the literature. Indeed, we show that Bayesian Networks, Markov Localization, Kalman filters, etc., can all be captured under this single formalism. We believe it offers the novice reader a good introduction to these models, while still providing the experienced reader an enriching global view of the field.

1 Introduction

We think that over the next decade, probabilistic reasoning will provide a new paradigm for understanding neural mechanisms and the strategies of animal behaviour at a theoretical level, and will raise the performance of engineering artefacts to a point where they are no longer easily outperformed by the biological examples they are imitating.

Rational reasoning with incomplete and uncertain information is quite a challenge for artificial systems. The purpose of probabilistic inference is precisely to tackle this problem with a well-established formal theory. During the past years, a lot of progress has been made in this field, both from the theoretical and the applied point of view. The purpose of this paper is to give an overview of these works and, especially, to attempt a synthetic presentation using a generic formalism named Bayesian Programming (BP).

∗ Julien Diard is currently with the Laboratoire de Physiologie de la Perception et de l'Action, Collège de France, Paris, and the Department of Mechanical Engineering of the National University of Singapore. He can be reached at [email protected].

Several authors have already been interested in the relations between some of these models. Smyth, for instance, expresses Kalman Filters (KFs) and Hidden Markov Models (HMMs) in the Bayesian Network (BN) framework [1, 2]. Murphy replaces KFs, HMMs and many of their variants in the Dynamic Bayesian Network (DBN) formalism [3, 4]. Finally, Ghahramani focuses on the relations between BNs, DBNs, HMMs, and some variants of KFs [5, 6, 7].

The current paper includes the above, and adds Recursive Bayesian Estimation, Bayesian Filters (BFs), Particle Filters (PFs), Markov Localization models, Monte Carlo Markov Localization (MCML), and (Partially Observable) Markov Decision Processes (POMDPs and MDPs). This paper builds upon previous works, where more models are presented: Bessière et al. treat Mixture Models, Maximum Entropy Approaches, Sensor Fusion, and Classification in [8], while Diard treats Markov Chains and develops Bayesian Maps in [9].

All these models will be presented by rewriting them into the Bayesian Programming formalism, which allows us to use a single notation and structure throughout the paper. This gives the novice reader an efficient first introduction to a wide variety of models and their relations. This also, hopefully, brings other readers the "big picture". By making the hypotheses made by each model explicit, we also put all their variants into perspective, and shed light upon the possible variations that are yet to be explored.

We will present the different models following a general-to-specific ordering. The "generality" measure we have chosen takes into account the number of hypotheses made by each model: the fewer hypotheses made by a model, the more general it is. However, not all hypotheses are comparable: for example, specialising the semantics of variables or specialising the
dependency structure of the probabilistic models are not easily compared. Therefore, we obtain a partial ordering over the space of all possible models, which can be seen in Figure 1.

[Figure 1: Probabilistic modelling formalisms treated in this paper and their general-to-specific partial ordering. From the most general to the most specific: Bayesian Programs; Bayesian Networks and Bayesian Maps; DBNs; Bayesian Filters; Markov Localization, discrete, semi-continuous and continuous HMMs, Particle Filters and Kalman Filters; MCML and POMDPs; MDPs.]

Of course, generality allows a bigger power of expression (more models will be instances of the general formalisms), whereas specialization allows for efficient solutions to inference and learning issues. For instance, the Baum-Welch algorithm, and the closed-form solution of the state estimation problem in the case of KFs, both come from, and justify, the hypotheses made by HMMs and KFs, respectively.

The rest of the paper is organized as follows. Section 2 introduces the Bayesian Programming formalism. Each subsequent section briefly presents one of the formalisms of Figure 1, traversing the tree in an almost depth-first manner. For each of these formalisms, we rewrite them in the BP framework and notations, describe their most salient features, and provide references for the interested reader. Sections 2 to 5 present general purpose models, where the modelling choices are made independently of any specific knowledge about the phenomenon, while Sections 6 to 9 present problem oriented models, where problem dependent knowledge is exploited; these are taken from the domain of robotics.

2 Bayesian Programming

We now introduce BP, the Bayesian Programming methodology. We briefly summarize it here, but still invite the interested reader to refer to Lebeltel et al. for all the details about this methodology and its use in robotics [10]. As this formalism is based only on the inference rules needed for probability calculus, it is very general. Indeed, this formalism will be used in the rest of this paper to present all the other models we consider.

In the BP formalism, a bayesian program is a structure (see Figure 2) made of two components.

[Figure 2: Structure of a bayesian program. A Program is made of a Description and a Question; the Description consists of a Specification Spec (π), which lists the pertinent variables, the decomposition, and the parametric forms (or recursive questions), and of an Identification based on Data (δ).]

The first is a declarative component, where the user defines a description. The purpose of a description is to specify a method to compute a joint distribution over a set of relevant variables {X1, X2, ..., Xn}, given a set of experimental data δ and preliminary knowledge π. This joint distribution is denoted P(X1 X2 ... Xn | δ π). To specify this distribution, the programmer first lists the pertinent variables (and defines their domains); then, applying Bayes' rule, decomposes the joint distribution as a product of simpler terms (possibly stating hypotheses so as to simplify the model and/or the computations); and finally assigns forms to each term of the selected product (these forms can be parametric forms, or recursive questions to other bayesian programs). If there are free parameters in the parametric forms, they have to be assessed. They can be given by the programmer (a priori programming) or computed on the basis of a learning mechanism defined by the programmer and some experimental data δ.

The second component is of a procedural nature, and consists of using the previously defined description with a question, i.e. computing a probability distribution of the form P(Searched | Known). Answering a "question" consists in deciding a value for the variable Searched according to P(Searched | Known), where Searched and Known are conjunctions of variables appearing in disjoint subsets of the n variables.

It is well known that general Bayesian inference is a very difficult problem, which may be practically intractable. But, as this paper is mainly concerned with modelling issues, we will assume that the inference problems are solved and implemented in an efficient manner by the programmer or by an inference engine.¹

¹ The one we use to tackle these problems has been described elsewhere [8].
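As a concrete illustration of the two components, the question P(Searched | Known) can be answered by brute-force marginalization of the joint distribution defined by the description. The following sketch is our own toy example (illustrative numbers, not the inference engine of [8]): the description is P(A B C) = P(A) P(B | A) P(C | B) over three binary variables, and the question asked is P(A | C).

```python
# Toy bayesian program (illustrative numbers, not from the paper).
# Description: P(A B C) = P(A) P(B | A) P(C | B), all variables binary.
p_a = {0: 0.6, 1: 0.4}                                    # P(A)
p_b_given_a = {(0, 0): 0.9, (1, 0): 0.1,                  # P(B | A), keyed (b, a)
               (0, 1): 0.3, (1, 1): 0.7}
p_c_given_b = {(0, 0): 0.8, (1, 0): 0.2,                  # P(C | B), keyed (c, b)
               (0, 1): 0.5, (1, 1): 0.5}

def joint(a, b, c):
    """The decomposition: a product of simpler terms."""
    return p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]

def question(c):
    """Answer P(A | C = c): sum the joint over the unknown variable B, then normalize."""
    scores = {a: sum(joint(a, b, c) for b in (0, 1)) for a in (0, 1)}
    z = sum(scores.values())                              # normalization constant
    return {a: s / z for a, s in scores.items()}

posterior = question(c=1)   # P(A | C = 1)
```

This brute-force sum is exponential in the number of unknown variables, which is precisely why the specialized formalisms surveyed below trade generality for efficient inference.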
3 Bayesian Networks

Bayesian Networks, first introduced by Pearl [11], have emerged as a primary method for dealing with probabilistic and uncertain information. They are the result of the marriage between the theory of probabilities and the theory of graphs. They are defined by the Bayesian Program of Figure 3.

[Figure 3: The BN formalism rewritten in BP. Variables: X1, ..., XN. Decomposition: P(X1 ... XN) = ∏_{i=1}^{N} P(Xi | Pai), with Pai ⊆ {X1, ..., Xi−1}. Forms: any. Identification: any. Question: P(Xi | Known).]

The pertinent variables are not constrained and have no specific semantics.

The decomposition, on the contrary, is specific: it is a product of distributions of one variable Xi, conditioned by a conjunction of other variables, called its "parents", Pai, with Pai ⊆ {X1, ..., Xi−1}. This assumes that the variables are ordered, and ensures that applying Bayes' rule correctly defines the joint distribution. Also note that Xi denotes one and only one variable. Therefore, the following model, which can fit into a BP, does not fit into a BN:

P(A B C D) = P(A B) P(C | A) P(D | B).

In a BN, if A and B are to appear together on the left hand side of a term, as in P(A B), they have to be merged into a single variable ⟨A, B⟩, and cannot be subsequently separated, as in P(C | A) and P(D | B).

An obvious bijection exists between joint probability distributions defined by such a decomposition and directed acyclic graphs: nodes are associated to variables, and oriented edges are associated to conditional dependencies. Using graphs in probabilistic models leads to an efficient way to define hypotheses over a set of variables, an economic representation of a joint distribution and, most importantly, an easy and efficient way to do probabilistic inference.

The parametric forms are not constrained theoretically, but in commercial BN software they are very often restricted to probability tables (as in Netica), or to tables and constrained Gaussians (as in Hugin²).

Very efficient inference techniques have been developed to answer questions of the form P(Xi | Known), where Known is a subset of the other variables of the BN. However, some difficulties appear for more general questions (i.e. with more than one variable on the left hand side).

Readings on BNs should start with the books by Pearl [11], Lauritzen [12], Jordan [13] and Frey [14].

² http://www.norsys.com and http://www.hugin.com.

4 Dynamic Bayesian Networks

To deal with time and to model stochastic processes, the framework of Bayesian Networks has been extended to Dynamic Bayesian Networks [15]. Given a graph representing the structural knowledge at time t, supposing this structure to be time-invariant and time to be discrete, the resulting DBN is the repetition of the first structure from a start time to a final time. Each part at time t in the final graph is named a time slice. DBNs are defined by the Bayesian Program of Figure 4.

[Figure 4: The DBN formalism rewritten in BP. Variables: X^1_0, ..., X^N_0, ..., X^1_T, ..., X^N_T. Decomposition: P(X^1_0 ... X^N_T) = P(X^1_0 ... X^N_0) ∏_{t=1}^{T} ∏_{i=1}^{N} P(X^i_t | R^i_t). Forms: any. Identification: any. Question: P(X^i_T | Known).]

In Figure 4, R^i_t is a conjunction of variables taken in the set {X^1_t, ..., X^{i−1}_t} ∪ {X^1_{t−1}, ..., X^N_{t−1}}. This means that X^i_t depends only on its parents at time t ({X^1_t, ..., X^{i−1}_t}), as in a regular BN, and on some variables from the previous time slice ({X^1_{t−1}, ..., X^N_{t−1}}).

∏_{i=1}^{N} P(X^i_t | R^i_t) defines a graph for time slice t, and all time slices are identical as the time index t changes.

These hypotheses are very often made: the fact that a time slice only depends on the previous one is commonly called the first order Markov assumption. The fact that all time slices are identical is the stationarity hypothesis. In this case, the model defined by a time slice is called the local model, and is said to be time-invariant, or even homogeneous.

As can easily be seen in Figure 4, a DBN as a whole, "unrolled" over time, may be considered as a regular BN.
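To make the unrolling concrete, the following minimal sketch (our own toy numbers, a single binary state variable per slice, i.e. a Markov chain) builds the joint probability of a trajectory in the unrolled, homogeneous DBN, P(X_0 ... X_T) = P(X_0) ∏_{t=1}^{T} P(X_t | X_{t−1}):

```python
# A toy homogeneous DBN with one binary state variable per time slice,
# unrolled into a regular BN joint. Numbers are illustrative only.
prior = {0: 0.5, 1: 0.5}                      # P(X_0)
trans = {0: {0: 0.9, 1: 0.1},                 # P(X_t | X_{t-1}); the same table is
         1: {0: 0.2, 1: 0.8}}                 # reused for every slice (stationarity)

def unrolled_joint(trajectory):
    """Probability of a full state trajectory in the unrolled BN."""
    p = prior[trajectory[0]]
    for prev, cur in zip(trajectory, trajectory[1:]):
        p *= trans[prev][cur]                 # first-order Markov assumption
    return p
```

For instance, unrolled_joint([0, 0, 1]) multiplies P(X_0 = 0), P(X_1 = 0 | X_0 = 0) and P(X_2 = 1 | X_1 = 0); any inference algorithm valid for BNs can then be run on this unrolled joint.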
Consequently, the usual inference techniques applicable to BNs are still valid for such "unrolled" DBNs.

The best introduction, survey and starting point on DBNs is Murphy's Ph.D. thesis [4]. The interested reader can also refer to papers by Kanazawa et al. [16], or to Ghahramani for the learning aspects of DBNs [5].

5 Recursive Bayesian Estimation

5.1 Bayesian Filtering, Prediction and Smoothing

Recursive Bayesian Estimation is the generic denomination for a very widely applied class of numerous different probabilistic models of time series. They are defined by the Bayesian Program of Figure 5.

[Figure 5: Recursive Bayesian estimation in BP. Variables: S0, ..., ST, O0, ..., OT. Decomposition: P(S0 ... ST O0 ... OT) = P(S0) P(O0 | S0) ∏_{i=1}^{T} [P(Si | Si−1) P(Oi | Si)]. Forms: any. Identification: any. Question: P(St+k | O0 ... Ot), with k = 0 (filtering), k > 0 (prediction), or k < 0 (smoothing).]

Variables S0, ..., ST are a time series of "state" variables considered on a time horizon ranging from 0 to T. Variables O0, ..., OT are a time series of "observation" variables on the same horizon.

The decomposition is based on three terms. P(Si | Si−1), called the "system model" or "transition model", formalizes the knowledge about transitions from the state at time i − 1 to the state at time i. P(Oi | Si), called the "observation model", expresses what can be observed at time i when the system is in state Si. Finally, a prior P(S0) is defined over states at time 0.

The question usually asked of these models is P(St+k | O0 ... Ot): what is the probability distribution for the state at time t + k, knowing the observations made from time 0 to t, t ∈ {1, ..., T}? The most common case is Bayesian Filtering, where k = 0: one searches for the present state knowing the past observations. However, it is also possible to do "prediction" (k > 0), where one tries to extrapolate a future state from past observations, or "smoothing" (k < 0), where one tries to recover a past state from observations made either before or after that instant. Some more complicated questions may also be asked (see Section 5.2).

Bayesian Filters (k = 0) have a very interesting recursive property, which contributes largely to their interest. Indeed, P(St | O0 ... Ot) may be simply computed from P(St−1 | O0 ... Ot−1) with the following formula (derivation omitted):

P(St | O0 ... Ot) = P(Ot | St) Σ_{St−1} [P(St | St−1) P(St−1 | O0 ... Ot−1)]   (1)

In the case of prediction and smoothing, the imbricated sums needed for solving the same question can quickly become a huge computational burden. Readers interested in Bayesian Filtering should refer to [17].

5.2 Hidden Markov Models

Hidden Markov Models are a very popular specialization of Bayesian Filters. They are defined by the Bayesian Program of Figure 6.

[Figure 6: The HMM formalism rewritten in BP. Variables: S0, ..., ST, O0, ..., OT. Decomposition: P(S0 ... ST O0 ... OT) = P(S0) P(O0 | S0) ∏_{i=1}^{T} [P(Si | Si−1) P(Oi | Si)]. Forms: P(S0) ≡ Matrix; P(Si | Si−1) ≡ Matrix; P(Oi | Si) ≡ Matrix. Identification: Baum-Welch algorithm. Question: P(S0 S1 ... St−1 | St O0 ... Ot).]

Variables are supposed to be discrete. Therefore, the transition model P(Si | Si−1) and the observation model P(Oi | Si) are both specified using probability matrices (or CPTs, for conditional probability tables).

Variants exist on this particular point: when the observation variables are continuous, the formalism becomes known as "semi-continuous HMMs" [18, 4]. In this case, the observation model is associated either with a Gaussian form, or with a Mixture of Gaussians form.

The most popular question asked of HMMs is P(S0 S1 ... St−1 | St O0 ... Ot): what is the most probable series of states that leads to the present state, knowing the past observations? This particular question may be answered with a specific and very efficient algorithm called the "Viterbi algorithm".

Finally, the "Baum-Welch" algorithm is a specific learning algorithm that has been developed for HMMs: it computes the most probable observation and transition models from a set of experimental data.

Two nice entry points into the huge HMM literature are the tutorial by Rabiner [18] and chapter 6 of his book Fundamentals of Speech Recognition [19].

5.3 Kalman Filters

The very well known Kalman Filters [20] are another specialization of Bayesian Filters. They are defined by the Bayesian Program of Figure 7.

[Figure 7: The KF formalism rewritten in BP. Variables: S0, ..., ST, O0, ..., OT. Decomposition: P(S0 ... ST O0 ... OT) = P(S0) P(O0 | S0) ∏_{i=1}^{T} [P(Si | Si−1) P(Oi | Si)]. Forms: P(S0) ≡ G(S0, µ, σ); P(Si | Si−1) ≡ G(Si, A · Si−1, Q); P(Oi | Si) ≡ G(Oi, H · Si, R). Identification. Question: P(St | O0 ... Ot).]

Variables are continuous. The transition model P(Si | Si−1) and the observation model P(Oi | Si) are both specified using Gaussian laws, with means that are linear functions of the conditioning variables.

Due to these hypotheses, and using the recursive Equation 1, it is possible to analytically solve the inference problem to answer the usual P(St | O0 ... Ot) question. This leads to an extremely efficient algorithm, which explains the popularity of Kalman Filters and the number of their everyday applications.

When there are no obvious linear transition and observation models, it is still often possible, using a first order Taylor expansion, to consider these models as locally linear. This generalization is commonly called the Extended Kalman Filter (EKF). Sometimes KFs also include action variables, and thus become specializations of Markov Localization models (see Section 6).

A nice tutorial by Welch and Bishop may be found on the internet [21]. For a more complete mathematical presentation, one should refer to a report by Barker et al. [22]; but these are only two entries into a huge literature on the subject.

5.4 Particle Filters

The fashionable Particle Filters (PFs) may be seen as a specific implementation of Bayesian Filters. The distribution P(St−1 | O0 ... Ot−1) is approximated by a set of N particles having weights proportional to their probabilities. The recursive Equation 1 is then used to inspire a dynamic process that produces an approximation of P(St | O0 ... Ot). The principle of this dynamic process is that the particles are first moved according to the transition model P(St | St−1), and then their weights are updated according to the observation model P(Ot | St).

See the tutorial by Arulampalam et al. for a start [23].

6 Markov Localization

A possible alternative to Bayesian Filters is to add control variables A0, ..., At−1 to the model. This extension is sometimes called input-output HMM [24, 25, 6, 26], or sometimes still Bayesian Filters; but, in the field of robotics, it has received most attention under the name of Markov Localization (ML) [27, 28]. In this field, such an extension is natural: a robot can observe its state through its sensors, but can also influence its state via motor commands. ML models are defined by the Bayesian Program of Figure 8.

[Figure 8: The ML formalism rewritten in BP. Variables: S0, ..., ST, A0, ..., AT−1, O0, ..., OT. Decomposition: P(S0 ... ST A0 ... AT−1 O0 ... OT) = P(S0) P(O0 | S0) ∏_{i=1}^{T} [P(Ai−1) P(Si | Ai−1 Si−1) P(Oi | Si)]. Forms: Matrices or Particles. Identification. Question: P(St | A0 ... At−1 O0 ... Ot).]

Starting from a Bayesian Filter structure, the control variable is used to refine the transition model P(Si | Si−1) into P(Si | Ai−1 Si−1), which is then called the action model. The rest of the dependency structure is unchanged.

The forms can be defined in several ways: they are commonly matrices, but when they are implemented using particles (in a manner similar to the one presented in Section 5.4), the model takes the name of Monte Carlo Markov Localization (MCML).
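The recursive update of Equation 1, refined with an action variable as above, can be sketched in a few lines. The following toy example is our own (a hypothetical 1-D cyclic grid of positions with made-up action and observation models, not from the paper): the belief is first predicted through the action model P(St | At−1 St−1), then weighted by the observation model P(Ot | St) and renormalized.

```python
# One-step recursive state estimation with an action variable, on a toy
# 1-D cyclic grid of N cells. All numbers are illustrative assumptions.
N = 5  # number of grid cells

def normalize(belief):
    z = sum(belief)
    return [b / z for b in belief]

def action_model(s, a, s_prev):
    """P(S_t = s | A_{t-1} = a, S_{t-1} = s_prev): noisy shift by a cells."""
    return 0.8 if s == (s_prev + a) % N else 0.2 / (N - 1)

def observation_model(o, s):
    """P(O_t = o | S_t = s): the sensor reads the cell index, usually correctly."""
    return 0.7 if o == s else 0.3 / (N - 1)

def update(belief, a, o):
    """One step of Equation 1 with actions: predict, then correct."""
    predicted = [sum(action_model(s, a, sp) * belief[sp] for sp in range(N))
                 for s in range(N)]
    return normalize([observation_model(o, s) * predicted[s] for s in range(N)])

belief = [1.0 / N] * N             # uniform prior P(S_0)
belief = update(belief, a=1, o=1)  # command "move right", then observe cell 1
```

With matrices for the forms this is a discrete ML step; replacing the explicit sum over states by a set of weighted samples gives the MCML variant.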
Among the forms, P(Ai−1) is almost always assumed to be uniform (and thus disappears from Equation 2).

The resulting model is used to answer the question P(St | A0 ... At−1 O0 ... Ot), which estimates the state of the robot given past actions and observations: when this state represents the position of the robot in its environment, this amounts to localization. This question is similar to the Bayesian Filtering question, and can also be solved in a recursive manner:

P(St | A0 ... At−1 O0 ... Ot) = P(At−1) P(Ot | St) Σ_{St−1} [P(St | At−1 St−1) P(St−1 | A0 ... At−2 O0 ... Ot−1)]   (2)

A reference paper on ML and its use in robotics is the survey by Thrun [29].

7 Decision Theoretic Planning

Partially Observable Markov Decision Processes (POMDPs) and Markov Decision Processes (MDPs) are used in robotics to model a robot that has to plan and execute a sequence of actions. A complete review of POMDPs and MDPs by Boutilier et al. [30] is an interesting starting point.

7.1 Partially Observable Markov Decision Processes

Formally, POMDPs use the same probabilistic model as Markov Localization, except that it is enriched by the definition of a reward (and/or cost) function.

This reward function R models which states are good for the robot, and which actions are costly. In the most general notation, it therefore is a function that associates a real valued number to each couple state - action: R : Si, Ai → IR.

The reward function helps drive the planning process. Indeed, the aim of this process is to find an optimal plan, in the sense that it maximizes a certain measure based on the reward function. This measure is most frequently the expected discounted cumulative reward, ⟨Σ_{t=0}^{∞} γ^t Rt⟩, where γ is a discount factor (less than 1), Rt is the reward obtained at time t, and ⟨·⟩ is the mathematical expectation. Given this measure, the goal of the planning process is to find an optimal mapping from probability distributions over states to actions (a policy).

This planning process, which leads to intractable computations, is sometimes approximated using iterative algorithms called policy iteration or value iteration. These algorithms start with random policies, and improve them at each step until some numerical convergence criterion is met. Unfortunately, state-of-the-art implementations of these algorithms still cannot cope with state spaces of more than a hundred states [31].

An introduction to POMDPs is proposed by Kaelbling et al. [32].

7.2 Markov Decision Processes

Another class of approaches for tackling the intractability of the planning problem in POMDPs is to suppose that the robot knows what state it is in. The state becomes observable; therefore the observation variable and model are not needed anymore. The resulting formalism is called a (Fully Observable) Markov Decision Process (MDP), and is summed up by the Bayesian Program of Figure 9.

[Figure 9: The MDP formalism rewritten in BP. Variables: S0, ..., ST, A0, ..., AT−1. Decomposition: P(S0 ... ST A0 ... AT−1) = P(S0) ∏_{i=1}^{T} [P(Ai−1) P(Si | Ai−1 Si−1)]. Forms: Matrices. Identification. Question: P(A0 ... At−1 | St S0).]

MDPs can cope with planning in bigger state spaces than POMDPs, but are still limited to some hundreds of states. Therefore, many recent research efforts are aimed at hierarchical decomposition of the planning and modelling problems in MDPs, especially in the robotic field, where their full observability hypothesis makes their practical use difficult [33, 34, 31].

8 Bayesian Robot Programming

Lebeltel et al. applied the BP methodology to mobile robotics [35, 10]. The adequacy of the method as a robotic programming tool has been demonstrated through a succession of increasingly complex experiments: learning of simple behaviours, behaviour combination, sensor fusion, hierarchical behaviour composition, situation recognition and temporal sequencing. This series of experiments was also integrated for solving a complex robotic task, showing the possibility of incremental development.

9 Bayesian Maps

One of the most crucial problems in robotics is the modelling by the robot of its environment. Most common approaches rely either on ML models or on variants of KFs (for example, variants that include action variables). However, as can easily be seen in Figure 8, the only question asked of such models is a localization question, of the form P(St | A0 ... At−1 O0 ... Ot). Therefore, the action variables are merely used as input variables, as the model is not concerned with computing probability distributions over actions.

As noted by Diard [9] and Thrun [36], such a separation between the localization and control models is not satisfying. This remark, among others, led to the definition of the Bayesian Map model, which is a generalization of ML models. The local model (see Section 4) of a Bayesian Map is defined by the Bayesian Program of Figure 10.

[Figure 10: The Bayesian Map formalism in BP. Variables: St, St′, At, Ot. Decomposition: any. Forms: any. Identification: any. Question: behaviour definition, P(Ai | X), with Ai ⊆ A and X ⊆ {P, Lt, Lt′, A} \ Ai.]

A Bayesian Map is a model that includes four variables: an observation, an action, a state variable at time t, and a state variable at a later time t′. The decomposition is not constrained (if the four variables are atomic, there are already 1015 decompositions to choose from – Attias recently exploited one of these [37]); nor are the forms or the identification phase.

On the contrary, the use of this model is strongly constrained, as we require that the model generates behaviours, which are questions of the form P(Ai | X): what is the probability distribution over (a subset of) the action variable(s), knowing some other variables? This constraint ensures that the model will be, in fine, used for controlling the robot. That the model is also used for explicitly computing a distribution over state variables (i.e., localization) is an optional goal, which can be convenient for human-robot communication purposes, for example.

The interested reader can refer to Diard et al. for more details concerning the Bayesian Map formalism, and in particular for the definition of the Abstraction and Superposition Operators, which allow for building hierarchies of Bayesian Maps [9, 38].

10 Conclusion

In this paper, we have presented a large class of probabilistic modelling formalisms. They have all been rewritten in the Bayesian Programming framework, which proves its generality. Moreover, the exploration of the relations between the various formalisms was greatly eased, because BP forces one to express explicitly all the hypotheses made, from the choice of variables to the use of the model (questions).

We think that this study of the different hypotheses can be very fruitful, especially for designing new formalisms, as we have briefly illustrated by some of the reasoning that led to the Bayesian Map model.

Acknowledgements

This work has been supported by the BIBA European project (IST-2001-32115). The authors would also like to thank Prof. Marcelo H. Ang Jr.

References

[1] P. Smyth, D. Heckerman, and M. I. Jordan. Probabilistic independence networks for hidden Markov probability models. Neural Computation, 9(2):227–269, 1997.
[2] P. Smyth. Belief networks, hidden Markov models, and Markov random fields: a unifying view. Pattern Recognition Letters, 1998.
[3] K. Murphy. An introduction to graphical models. Technical report, University of California, Berkeley, May 2001.
[4] K. Murphy. Dynamic Bayesian Networks: Representation, Inference and Learning. Ph.D. thesis, University of California, Berkeley, Berkeley, CA, July 2002.
[5] Z. Ghahramani. Learning dynamic Bayesian networks. In C. Lee Giles and Marco Gori, editors, Adaptive Processing of Sequences and Data Structures, number 1387 in Lecture Notes in Artificial Intelligence, LNAI, pages 168–197. Springer-Verlag, 1998.
[6] Z. Ghahramani. An introduction to hidden Markov models and Bayesian networks. Journal of Pattern Recognition and Artificial Intelligence, 15(1):9–42, 2001.
[7] S. Roweis and Z. Ghahramani. A unifying review of linear Gaussian models. Neural Computation, 11(2):305–345, February 1999.
[8] P. Bessière, J.-M. Ahuactzin, O. Aycard, D. Bellot, F. Colas, C. Coué, J. Diard, R. Garcia, C. Koike, O. Lebeltel, R. LeHy, O. Malrait, E. Mazer, K. Mekhnacha, C. Pradalier, and A. Spalanzani. Survey: Probabilistic methodology and techniques for artefact conception and development. Technical Report RR-4730, INRIA Rhône-Alpes, Montbonnot, France, 2003.
[9] J. Diard. La carte bayésienne – Un modèle probabiliste hiérarchique pour la navigation en robotique mobile. Ph.D. thesis, Institut National Polytechnique de Grenoble, Grenoble, France, January 2003.
[10] O. Lebeltel, P. Bessière, J. Diard, and E. Mazer. Bayesian robot programming. Autonomous Robots (accepted for publication), 2003.
[11] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA, 1988.
[12] S. L. Lauritzen. Graphical Models. Clarendon Press, Oxford, 1996.
[13] M. Jordan. Learning in Graphical Models. MIT Press, 1998.
[14] B. J. Frey. Graphical Models for Machine Learning and Digital Communication. MIT Press, 1998.
[15] T. Dean and K. Kanazawa. A model for reasoning about persistence and causation. Computational Intelligence, 5(3):142–150, 1989.
[16] K. Kanazawa, D. Koller, and S. Russell. Stochastic simulation algorithms for dynamic probabilistic networks. In Philippe Besnard and Steve Hanks, editors, Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence (UAI'95), pages 346–351, San Francisco, CA, USA, August 1995. Morgan Kaufmann Publishers.
[17] A. H. Jazwinsky. Stochastic Processes and Filtering Theory. Academic Press, New York, 1970.
[18] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–285, February 1989.
[19] L. R. Rabiner and B.-H. Juang. Fundamentals of Speech Recognition, chapter Theory and implementation of Hidden Markov Models, pages 321–389. Prentice Hall, Englewood Cliffs, New Jersey, 1993.
[20] R. E. Kalman. A new approach to linear filtering and prediction problems. Transactions ASME, Journal of Basic Engineering, 82:34–45, 1960.
[21] G. Welch and G. Bishop. An introduction to the Kalman filter. Technical Report TR95-041, Computer Science Department, University of North Carolina, Chapel Hill, 1995.
[22] A. L. Barker, D. E. Brown, and W. N. Martin. Bayesian estimation and the Kalman Filter. Technical Report IPC-94-02, Institute for Parallel Computation, University of Virginia, August 5, 1994.
[23] S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp. A tutorial on particle filters for on-line non-linear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50(2):174–188, February 2002.
[24] Y. Bengio and P. Frasconi. An input/output HMM architecture. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems 7, pages 427–434. MIT Press, Cambridge, MA, 1995.
[25] T. W. Cacciatore and S. J. Nowlan. Mixtures of controllers for jump linear and non-linear plants. In Jack D. Cowan, Gerald Tesauro, and Joshua Alspector, editors, Advances in Neural Information Processing Systems, volume 6, pages 719–726. Morgan Kaufmann Publishers, Inc., 1994.
[26] M. Meila and M. I. Jordan. Learning fine motion by Markov mixtures of experts. In D. Touretzky, M. C. Mozer, and M. Hasselmo, editors, Advances in Neural Information Processing Systems, volume 8. Morgan Kaufmann Publishers, Inc., 1996.
[27] W. Burgard, D. Fox, D. Hennig, and T. Schmidt. Estimating the absolute position of a mobile robot using position probability grids. In Proceedings of the Thirteenth National Conference on Artificial Intelligence and the Eighth Innovative Applications of Artificial Intelligence Conference, pages 896–901, Menlo Park, August 4–8 1996. AAAI Press / MIT Press.
[28] S. Thrun, W. Burgard, and D. Fox. A probabilistic approach to concurrent mapping and localization for mobile robots. Machine Learning and Autonomous Robots (joint issue), 31/5:1–25, 1998.
[29] S. Thrun. Probabilistic algorithms in robotics. AI Magazine, 21(4):93–109, 2000.
[30] C. Boutilier, T. Dean, and S. Hanks. Decision theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 10:1–94, 1999.
[31] J. Pineau and S. Thrun. An integrated approach to hierarchy and abstraction for POMDPs. Technical Report CMU-RI-TR-02-21, Carnegie Mellon University, August 2002.
[32] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2):99–134, 1998.
[33] M. Hauskrecht, N. Meuleau, L. P. Kaelbling, T. Dean, and C. Boutilier. Hierarchical solution of Markov decision processes using macro-actions. In Gregory F. Cooper and Serafín Moral, editors, Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 220–229, San Francisco, July 24–26 1998. Morgan Kaufmann.
[34] T. Lane and L. P. Kaelbling. Toward hierarchical decomposition for planning in uncertain environments. In Proceedings of the 2001 IJCAI Workshop on Planning under Uncertainty and Incomplete Information, Seattle, WA, August 2001. AAAI Press.
[35] O. Lebeltel. Programmation Bayésienne des Robots. Ph.D. thesis, Institut National Polytechnique de Grenoble, Grenoble, France, September 1999.
[36] S. Thrun. Robotic mapping: A survey. Technical Report CMU-CS-02-111, Carnegie Mellon University, February 2002.
[37] H. Attias. Planning by probabilistic inference. In Ninth International Workshop on Artificial Intelligence and Statistics Proceedings, 2003.
[38] J. Diard, P. Bessière, and E. Mazer. Hierarchies of probabilistic models of space for mobile robots: the Bayesian Map and the Abstraction Operator. In Reasoning with Uncertainty in Robotics (IJCAI'03 Workshop) (to appear), 2003.