Bayesian Cognition Probabilistic models of action, perception, inference, decision and learning

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #1 Bayesian Cognition

Lecture 2: Bayesian Programming

Julien Diard http://diard.wordpress.com [email protected] CNRS - Laboratoire de Psychologie et NeuroCognition, Grenoble Pierre Bessière CNRS - Institut des Systèmes Intelligents et de Robotique, Paris

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #2 Contents / Schedule

Content:
• c1: theoretical foundations, definition of the Bayesian Programming formalism
• c2: Bayesian robot programming
• c3: Bayesian cognitive modeling
• c4: Bayesian model comparison

Schedule:
• c1: Wednesday, October 11
• c2: Wednesday, October 18
• c3: Wednesday, October 25
• no class the week of October 30
• c4: Wednesday, November 8
• c5: Wednesday, November 15
• no class the week of November 20
• c6: Wednesday, November 29
• Exam: ?/?/? (for M2 students)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #3 Plan

• Summary & questions!
• Basic concepts: minimal example and spam detection example
• Bayesian Programming methodology
  – Variables
  – Decomposition & hypotheses
  – Parametric forms (demo)
  – Learning
  – Inference
• Taxonomy of Bayesian models

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #4 Probability Theory As Extended Logic

• "Frequentist" probabilities
  – A probability is a physical property of an object
  – Kolmogorov's axioms, set theory
  – P(A) = f_A = lim_{N→∞} n_A / N
  – Classical statistics (parent population, etc.)

• "Subjective" probabilities (E.T. Jaynes, 1922-1998)
  – Reference to a state of knowledge of a subject
    • P("it rains" | Jean), P("it rains" | Pierre)
    • No reference to the limiting frequency of occurrence of an event
  – Conditional probabilities: always P(A | π), never P(A)
  – Bayesian statistics

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #5 Principle

Incompleteness → Preliminary Knowledge + Experimental Data = (Bayesian Learning) → Probabilistic Representation → Uncertainty

P(a) + P(¬a) = 1
P(a ∧ b) = P(a) P(b | a) = P(b) P(a | b)

→ Decision

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #6 Computation rules

• Product rule:
  P(A ∧ B | C) = P(A | C) P(B | A ∧ C) = P(B | C) P(A | B ∧ C)

  → Bayes' theorem (Reverend Thomas Bayes, ~1702-1761):
  P(B | A ∧ C) = P(B | C) P(A | B ∧ C) / P(A | C), if P(A | C) ≠ 0

• Sum rule:
  P(A | C) + P(Ā | C) = 1 and, for a variable A, Σ_{a ∈ A} P([A = a] | C) = 1

  → Marginalization rule:
  Σ_A P(A ∧ B | C) = P(B | C)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #7 Bayesian Program

Program (PB)
  Description
    Specification
      • Variables
      • Decomposition
      • Parametric forms
    Identification
  Question

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #8 My first Bayesian Program

• Variables: A = {it rains, it does not rain}, B = {Jean has his umbrella, Jean does not have his umbrella}

• Decomposition: P(A ∧ B) = P(A) P(B | A)
• Parametric forms: conditional probability tables

• Identification:

  P(A):    A=it rains: 0.4    A=it does not rain: 0.6

  P(B | A)                              A=it rains    A=it does not rain
  B=Jean does not have his umbrella        0.05              0.9
  B=Jean has his umbrella                  0.95              0.1

• Question: P(A | B) = P(A) P(B | A) / P(B) (see the sketch below)
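A minimal sketch of this first Bayesian program in Python (the dictionary representation and names are my own, not from the slides): it stores the two tables above and answers the question P(A | B) by computing P(A) P(B | A) and normalizing.

```python
# Minimal sketch of the umbrella Bayesian program (names are illustrative).
P_A = {"rain": 0.4, "no_rain": 0.6}                      # P(A)
P_B_given_A = {                                          # P(B | A)
    "rain":    {"umbrella": 0.95, "no_umbrella": 0.05},
    "no_rain": {"umbrella": 0.10, "no_umbrella": 0.90},
}

def posterior_A_given_B(b):
    """Answer the question P(A | B=b) = P(A) P(B=b | A) / P(B=b)."""
    joint = {a: P_A[a] * P_B_given_A[a][b] for a in P_A}  # P(A) P(B=b | A)
    Z = sum(joint.values())                               # P(B=b), by marginalization
    return {a: p / Z for a, p in joint.items()}

print(posterior_A_given_B("umbrella"))
# {'rain': 0.863..., 'no_rain': 0.136...}
```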

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #9 Plan

• Summary & questions!
• Basic concepts: minimal example and spam detection example
• Bayesian Programming methodology
  – Variables
  – Decomposition & conditional independence hypotheses
  – Parametric forms (demo)
  – Learning
  – Inference
• Taxonomy of Bayesian models

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #10 Bayesian Spam Detection

• Classify texts into 2 categories, "spam" or "nonspam"
  – Only available information: a set of words
• Adapt to the user and learn from experience

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #11 Variables

• Spam – Boolean variable: {True, False}

• W0, W1, ..., WN-1

– Wi is the presence or absence of word i in a text – Boolean variables: {True, False}

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #12 Decomposition

P(Spam ∧W0 ∧ ... ∧WN−1)

= P(Spam)× P(W0 | Spam)× P(W1 |W0 ∧Spam)

× ... × P(WN−1 |WN−2 ∧ ... ∧W0 ∧Spam)

P(Spam ∧ W0 ∧ ... ∧ WN−1) ≈ P(Spam) × ∏_{n=0}^{N−1} P(Wn | Spam)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #13 Importance of the conditional independence hypotheses

• Number of probability values
  – Caution: number of probability values ≠ number of free parameters
  – Before the conditional independence hypotheses: P(Spam ∧ W0 ∧ ... ∧ WN−1) requires 2^(N+1) probability values
  – After the hypotheses: P(Spam) × ∏_{n=0}^{N−1} P(Wn | Spam) requires 2 + 4N probability values

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #14 Graphical representation

Exact decomposition:
P(Spam ∧ W0 ∧ ... ∧ WN−1) = P(Spam) × P(W0 | Spam) × P(W1 | W0 ∧ Spam) × ... × P(WN−1 | WN−2 ∧ ... ∧ W0 ∧ Spam)

Decomposition after the conditional independence hypotheses:
P(Spam ∧ W0 ∧ ... ∧ WN−1) ≈ P(Spam) × ∏_{n=0}^{N−1} P(Wn | Spam)

[Graphs: the two corresponding graphical models over Spam and W0, W1, W2, …, WN-1; the right one is the structure Spam → W0, W1, W2, …, WN-1]

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #15 Parametric forms and identification

Recall: P(Spam ∧ W0 ∧ ... ∧ WN−1) = P(Spam) × ∏_{n=0}^{N−1} P(Wn | Spam)

P([Spam = false]) = 0.25
P([Spam = true]) = 0.75

These values could also be computed from a learning database:
P([Spam = false]) = θ
P([Spam = true]) = 1 − θ

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #16 Parametric forms and identification

Recall: P(Spam ∧ W0 ∧ ... ∧ WN−1) = P(Spam) × ∏_{n=0}^{N−1} P(Wn | Spam)

P([Wn = true] | [Spam = false]) = n_f^n / Nbre_f
P([Wn = false] | [Spam = false]) = 1 − P([Wn = true] | [Spam = false])
P([Wn = true] | [Spam = true]) = n_t^n / Nbre_t
P([Wn = false] | [Spam = true]) = 1 − P([Wn = true] | [Spam = true])

where n_f^n and n_t^n are the numbers of nonspam and spam mails containing word n, and Nbre_f and Nbre_t are the total numbers of nonspam and spam mails in the learning database.

Caution: if a word Wn has never been seen in any spam of the database, then any mail m in which it appears can never be classified as spam.

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #17 Parametric forms and identification

Recall: P(Spam ∧ W0 ∧ ... ∧ WN−1) = P(Spam) × ∏_{n=0}^{N−1} P(Wn | Spam)

• Laplace's rule of succession (a sketch of this identification step follows below):

P([Wn = true] | [Spam = false]) = (1 + n_f^n) / (2 + Nbre_f)
P([Wn = false] | [Spam = false]) = 1 − P([Wn = true] | [Spam = false])
P([Wn = true] | [Spam = true]) = (1 + n_t^n) / (2 + Nbre_t)
P([Wn = false] | [Spam = true]) = 1 − P([Wn = true] | [Spam = true])
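A short sketch of this identification step, assuming a toy corpus represented as (set of words, is_spam) pairs (that representation is my own choice, not the slides'):

```python
def learn_laplace(corpus, vocabulary):
    """Identify P([Wn=true] | Spam) with Laplace's rule of succession.

    corpus: list of (set_of_words, is_spam) pairs; vocabulary: list of the N words.
    Returns (p_true_nonspam, p_true_spam) as lists indexed by n.
    """
    nb_spam = sum(1 for _, is_spam in corpus if is_spam)
    nb_nonspam = len(corpus) - nb_spam
    p_true_f, p_true_t = [], []
    for word in vocabulary:
        n_f = sum(1 for words, is_spam in corpus if not is_spam and word in words)
        n_t = sum(1 for words, is_spam in corpus if is_spam and word in words)
        p_true_f.append((1 + n_f) / (2 + nb_nonspam))   # P([Wn=true] | [Spam=false])
        p_true_t.append((1 + n_t) / (2 + nb_spam))      # P([Wn=true] | [Spam=true])
    return p_true_f, p_true_t

# Example with a tiny toy corpus of two mails:
corpus = [({"money", "you"}, True), ({"next", "programming"}, False)]
vocab = ["fortune", "next", "programming", "money", "you"]
print(learn_laplace(corpus, vocab))
```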

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #18 Identification

P([Wn = true] | [Spam = false]) = (1 + n_f^n) / (2 + Nbre_f)
P([Wn = true] | [Spam = true]) = (1 + n_t^n) / (2 + Nbre_t)

Notions of free parameter, learning database, and parameter identification algorithm.

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #19 Joint distribution and questions (1)

Recall: P(Spam ∧ W0 ∧ ... ∧ WN−1) = P(Spam) × ∏_{n=0}^{N−1} P(Wn | Spam)

P(Spam) = Σ_{W0 ∧ ... ∧ WN-1} P(Spam ∧ W0 ∧ ... ∧ Wn ∧ ... ∧ WN−1)
        = Σ_{W0 ∧ ... ∧ WN-1} P(Spam) × ∏_{n=0}^{N−1} P(Wn | Spam)
        = P(Spam)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #20 Joint distribution and questions (2)

Recall: P(Spam ∧ W0 ∧ ... ∧ WN−1) = P(Spam) × ∏_{n=0}^{N−1} P(Wn | Spam)

P(Wn) = Σ_{Spam ∧ Wi≠n} P(Spam ∧ W0 ∧ ... ∧ Wn ∧ ... ∧ WN−1)
      = Σ_{Spam ∧ Wi≠n} P(Spam) × ∏_{n=0}^{N−1} P(Wn | Spam)
      = Σ_{Spam} P(Spam) × P(Wn | Spam)

P([Wn = true]) = 0.25 × (1 + n_f^n) / (2 + Nbre_f) + 0.75 × (1 + n_t^n) / (2 + Nbre_t)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #21 Joint distribution and questions (3)

Recall: P(Spam ∧ W0 ∧ ... ∧ WN−1) = P(Spam) × ∏_{n=0}^{N−1} P(Wn | Spam)

P(Wn | [Spam = true]) = Σ_{Wi≠n} P(Spam ∧ W0 ∧ ... ∧ Wn ∧ ... ∧ WN−1) / Σ_{W0 ∧ ... ∧ WN-1} P(Spam ∧ W0 ∧ ... ∧ Wn ∧ ... ∧ WN−1)
P([Wn = true] | [Spam = true]) = (1 + n_t^n) / (2 + Nbre_t)

P(Spam | [Wn = true]) = P(Spam) × P([Wn = true] | Spam) / Σ_{Spam} P(Spam) × P([Wn = true] | Spam)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #22 Joint distribution and questions (4)

Recall: P(Spam ∧ W0 ∧ ... ∧ WN−1) = P(Spam) × ∏_{n=0}^{N−1} P(Wn | Spam)

P(Spam | W0 ∧ ... ∧ WN-1) = P(Spam ∧ W0 ∧ ... ∧ Wn ∧ ... ∧ WN−1) / Σ_{Spam} P(Spam ∧ W0 ∧ ... ∧ Wn ∧ ... ∧ WN−1)

P(Spam | W0 ∧ ... ∧ WN−1) = [P(Spam) × ∏_{n=0}^{N−1} P(Wn | Spam)] / [Σ_{Spam} P(Spam) × ∏_{n=0}^{N−1} P(Wn | Spam)]

(A sketch of this computation follows below.)
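A sketch of how this last question can be answered numerically, reusing the (hypothetical) tables produced by the learn_laplace sketch above:

```python
def p_spam_given_words(present_words, vocabulary, p_true_f, p_true_t,
                       prior_spam=0.75):
    """P([Spam=true] | W0 ... WN-1): normalize the two joint-probability terms."""
    joint_t, joint_f = prior_spam, 1.0 - prior_spam
    for n, word in enumerate(vocabulary):
        if word in present_words:
            joint_t *= p_true_t[n]        # P([Wn=true]  | [Spam=true])
            joint_f *= p_true_f[n]        # P([Wn=true]  | [Spam=false])
        else:
            joint_t *= 1.0 - p_true_t[n]  # P([Wn=false] | [Spam=true])
            joint_f *= 1.0 - p_true_f[n]  # P([Wn=false] | [Spam=false])
    return joint_t / (joint_t + joint_f)
```

In practice the products are usually computed in log space to avoid numerical underflow when N is large.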

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #23 Bayesian Program

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #24 Result: numerical example

• 5 words considered: fortune, next, programming, money, you
• 1000 mails in the learning database (250 nonspam, 750 spam)

n   Word         n_f^n   n_t^n   P(Wn=false | Spam=false)   P(Wn=true | Spam=false)   P(Wn=false | Spam=true)   P(Wn=true | Spam=true)
0   fortune        0      375        0.996032                   0.00396825                0.5                       0.5
1   next         125        0        0.5                        0.5                       0.99867                   0.00132979
2   programming  250        0        0.00396825                 0.996032                  0.99867                   0.00132979
3   money          0      750        0.996032                   0.00396825                0.00132979                0.99867
4   you          125      375        0.5                        0.5                       0.5                       0.5

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #25 Result: numerical example

• 2^5 = 32 possible mails

Subset number   Words present                P([Spam=false] | w0 … w4)   P([Spam=true] | w0 … w4)
3               {money}                      5.24907e-06                 0.999995
11              {next, money}                0.00392659                  0.996073
12              {next, money, you}           0.00392659                  0.996073
15              {next, programming, money}   0.998656                    0.00134393
27              {fortune, next, money}       1.57052e-05                 0.999984

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #26 Results

SpamSieve http://c-command.com/spamsieve/

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #27 Plan

• Summary & questions!
• Basic concepts: minimal example and spam detection example
• Bayesian Programming methodology
  – Variables
  – Decomposition & conditional independence hypotheses
  – Parametric forms (demo)
  – Learning
  – Inference
• Taxonomy of Bayesian models

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #28 Bayesian Program

Program (PB)
  Description
    Specification
      • Variables
      • Decomposition
      • Parametric forms
    Identification
  Question

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #29 Logical Proposition

Logical Propositions are denoted by lowercase names: a

Usual logical operators: a ∧ b, a ∨ b, ¬a

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #30 Probability of Logical Proposition To assign a probability to a given proposition a, it is necessary to have at least some preliminary knowledge, summed up by a proposition π. P(a | π) ∈ [0,1]

Probabilities of the conjunctions, disjunctions and negations of propositions:
P(a ∧ b | π), P(a ∨ b | π), P(¬a | π)

Probability of proposition a conditioned by both the preliminary knowledge π and some other proposition b:
P(a | b ∧ π)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #31 Normalization and Product rule

P(a | π) + P(¬a | π) =1

P(a ∧ b | π) = P(a | π) × P(b | a ∧ π) = P(b | π) × P(a | b ∧ π)

P(a ∨ b | π) = P(a | π) + P(b | π) − P(a ∧ b | π)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #32 Discrete Variable

• Variables are denoted by names starting with one uppercase letter: X
• Definition: a discrete variable X is a set of propositions xi
  – Mutually exclusive: i ≠ j ⇒ [xi ∧ xj] = false
  – Exhaustive: at least one xi is true
• The cardinal of X (its number of values) is denoted card(X)
• Continuous variable: limit case when card(X) → ∞

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #33 Variable combination

• Variable conjunction

X ∧ Y = {xi ∧ yj} = Z

• Variable disjunction

X ∨ Y = {xi ∨ yj}

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #34 Bayesian Program

Program (PB)
  Description
    Specification
      • Variables
      • Decomposition
      • Parametric forms
    Identification
  Question

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #35 Description

The purpose of a description is to specify an effective method to compute a joint distribution on a set of variables {X1, X2, ..., Xn}, given some preliminary knowledge π and a set of experimental data δ. This joint distribution is denoted as:

P(X1 ∧ X 2 ∧...∧ X n |δ ∧π)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #36 Decomposition

Partition the variables into K subsets: Li ≠ ∅, Li = Xi1 ∧ Xi2 ∧ ...

Conjunction rule:
P(X1 ∧ X2 ∧ ... ∧ Xn | δ ∧ π) = P(L1 | δ ∧ π) × P(L2 | L1 ∧ δ ∧ π) × ... × P(LK | LK−1 ∧ ... ∧ L1 ∧ δ ∧ π)

Conditional independence hypotheses: Ri ⊂ Li−1 ∧ ... ∧ L1 such that
P(Li | Li−1 ∧ ... ∧ L1 ∧ δ ∧ π) = P(Li | Ri ∧ δ ∧ π)

Decomposition:
P(X1 ∧ X2 ∧ ... ∧ Xn | δ ∧ π) = P(L1 | δ ∧ π) × P(L2 | R2 ∧ δ ∧ π) × ... × P(LK | RK ∧ δ ∧ π)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #37 Independence and conditional independence

• Independence
  – P(X ∧ Y) = P(X) P(Y)
  – P(X | Y) = P(X)
• Conditional independence
  – P(X ∧ Y | Z) = P(X | Z) P(Y | Z)
  – P(X | Y ∧ Z) = P(X | Z)
  – P(Y | X ∧ Z) = P(Y | Z)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #38 Independence vs Conditional Independence

Independence but not conditional independence:

I0 is independent of F0: P(F0 | I0) = P(F0)
I0 is not independent of F0 conditionally on S0: P(F0 | I0 ∧ S0) ≠ P(F0 | S0)

(A self-contained numeric sketch of this phenomenon follows below.)
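The I0/F0/S0 example above refers to a network shown as a figure in the original slides; the following self-contained toy sketch (my own example, not the slides') shows the same phenomenon with two independent binary causes A and B and their common effect C = A OR B:

```python
import itertools

# Joint P(A, B, C) where A and B are independent fair coins and C = (A or B).
joint = {}
for a, b in itertools.product([0, 1], repeat=2):
    for c in (0, 1):
        joint[(a, b, c)] = 0.25 if c == int(a or b) else 0.0

def p(query, given=None):
    """P(query | given), by brute-force summation over the joint.
    query/given map a variable index (0=A, 1=B, 2=C) to a value."""
    given = given or {}
    match = lambda k, cond: all(k[i] == v for i, v in cond.items())
    num = sum(v for k, v in joint.items() if match(k, {**query, **given}))
    den = sum(v for k, v in joint.items() if match(k, given))
    return num / den

# A and B are independent:
print(p({1: 1}), p({1: 1}, {0: 1}))                 # both 0.5
# ... but not independent conditionally on C ("explaining away"):
print(p({1: 1}, {2: 1}), p({1: 1}, {0: 1, 2: 1}))   # 2/3 vs 0.5
```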

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #39 Independence vs Conditional Independence

Conditional independence but not independence:

S2 is independent of C0 conditionally on O0: P(S2 | C0 ∧ O0) = P(S2 | O0)
S2 is not independent of C0: P(S2 | C0) ≠ P(S2)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #40 Bayesian Program

Program (PB)
  Description
    Specification
      • Variables
      • Decomposition
      • Parametric forms
    Identification
  Question

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #41 Parametrical Forms or Recursive Questions

Parametric form:

P(Li | Ri ∧ δ ∧ π) = f_{μ(Ri, δ)}(Li)

Recursive question:

P(Li | Ri ∧ δ ∧ π) = P(Li | Ri ∧ δ' ∧ π')

– modular and hierarchical programs
– (or coherence variables)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #42 "Probabilistic vocabulary"

• Mathematica demo

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #43 Bayesian Program

Program (PB)
  Description
    Specification
      • Variables
      • Decomposition
      • Parametric forms
    Identification
  Question

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #44 Learning

• "Algorithmically"
  – P(X) is Gaussian, data {X0, …, XT}: estimate μ(X0, …, XT), σ(X0, …, XT)
• "Learning as Bayesian inference"
  P(X0 ∧ ... ∧ XT ∧ XT+1 ∧ μ) = P(XT+1 | μ) P(μ | X0 ∧ ... ∧ XT) P(X0 ∧ ... ∧ XT)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #45 Learning as inference

P(X0 ∧ ... ∧ XT ∧ XT+1 ∧ μ) = P(XT+1 | μ) P(μ | X0 ∧ ... ∧ XT) P(X0 ∧ ... ∧ XT)

• Inference "without" learning (of values):
  P(XT+1 | X0 ∧ ... ∧ XT) ∝ Σ_μ P(X0 ∧ ... ∧ XT ∧ XT+1 ∧ μ)
• More natural model:
  P(X0 ∧ ... ∧ XT ∧ XT+1 ∧ μ) = P(μ) × ∏_t P(Xt | μ)
• Prior on the parameter space
  – "bet", convergence speed, etc.
• and likewise for σ, etc.

(A small sketch of this scheme follows below.)
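A minimal sketch of "learning as inference" with a discretized parameter μ and a known σ (the grid, the uniform prior and the data are my own illustrative choices): the posterior over μ is obtained by Bayesian inference, and the prediction of X_{T+1} sums over μ.

```python
import math

def gauss(x, mu, sigma=1.0):
    """P(X = x | mu), a Gaussian density with known sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

mu_grid = [i / 10 for i in range(-50, 51)]          # candidate values of mu
prior = {mu: 1.0 / len(mu_grid) for mu in mu_grid}  # uniform P(mu): the "bet"
data = [0.8, 1.1, 0.9, 1.3]                         # X0 ... XT

# P(mu | X0 ... XT) proportional to P(mu) * prod_t P(Xt | mu)
post = {mu: prior[mu] * math.prod(gauss(x, mu) for x in data) for mu in mu_grid}
Z = sum(post.values())
post = {mu: p / Z for mu, p in post.items()}

# Prediction without committing to a point estimate:
# P(X_{T+1} | X0 ... XT) = sum_mu P(X_{T+1} | mu) P(mu | X0 ... XT)
def predictive(x_next):
    return sum(gauss(x_next, mu) * post[mu] for mu in mu_grid)

print(max(post, key=post.get), predictive(1.0))
```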

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #46 Bayesian Program

Program (PB)
  Description of P(X1 … Xn | π δ)
    Specification of the preliminary knowledge π
      • Variables
      • Decomposition
      • Parametric forms
    Identification from the data δ
  Question

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #47 Translating knowledge into the model?

• No unique manner
  – Limiting the domain to [a,b] = setting a parametric form with probability 0 outside of [a,b]
  – Conditional independence hypothesis
    • Explicit in the decomposition
    • Implicit in the parametric form

[Illustration: an example discrete probability table, (0.1, 0.2, 0.1, 0.4, 0.2), repeated for several variables]

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #48 Bayesian Program

Program (PB)
  Description
    Specification
      • Variables
      • Decomposition
      • Parametric forms
    Identification
  Question

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #49 Question

Given a description, a question is obtained by partitioning the set of variables into 3 subsets: the searched variables (not empty), the known variables and the free variables.

We define the Search, Known and Free as the conjunctions of the variables belonging to these three sets.

We define the corresponding question as the distribution:

P(Search | Known∧δ ∧π)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #50 How to choose the question to solve?

• Non-systematic choice…
• Compute P(Search | Known δ π)
  – Draw at random: s ← draw_S P(Search | Known δ π)
  – Maximize probability: s ← argmax_S P(Search | Known δ π)
  – Multiply with a cost function and minimize the expected loss
    • = multiply with a reward function and maximize the expected gain
    • (Bayesian decision theory)
• Compute
  – P(X1 X2 | X4 δ π)
  – or first P(X1 | X4 δ π), then P(X2 | X4 δ π)?

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #51 Probabilistic inference

• Theorem
  – If the joint distribution P(X1 X2 … Xn) is known,
  – then any "question" can be computed.
• A question: any term P(S | K), with S and K subsets of {X1, X2, …, Xn}
• Examples:
  – P(X1 | [Xn = xn]), P(X2 X4 | [X3 = x3])

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #52 Proof

Searched variables: S = Xi ... Xj
Known variables: K = Xk ... Xl
Free variables: F = {X1, ..., Xn} \ (S ∪ K)

P(S | K) = Σ_F P(S F | K)
         = Σ_F P(S F K) / P(K),   with P(K) ≠ 0
         = Σ_F P(S F K) / Σ_{S,F} P(S F K)

P(S | K) = (1/Z) Σ_F P(S F K)

(A brute-force implementation of this computation is sketched below.)
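A brute-force sketch of this result: given any joint distribution (here a small hand-written one over three Boolean variables, my own example), any question P(Search | Known) is answered by summing over the free variables and normalizing.

```python
from itertools import product

def ask(joint_p, domains, search, known):
    """P(Search | Known) = (1/Z) * sum over the Free variables of the joint.

    joint_p: function mapping a full assignment {variable: value} to its probability.
    domains: {variable: list of values}; search: list of variable names;
    known: {variable: value}. The free variables are all the others.
    """
    free = [v for v in domains if v not in search and v not in known]
    unnorm = {}
    for s_vals in product(*(domains[v] for v in search)):
        s = dict(zip(search, s_vals))
        unnorm[s_vals] = sum(
            joint_p({**known, **s, **dict(zip(free, f_vals))})
            for f_vals in product(*(domains[v] for v in free))
        )
    Z = sum(unnorm.values())            # Z = P(Known), assumed non-zero
    return {k: v / Z for k, v in unnorm.items()}

# Illustrative joint P(X1, X2, X3) = P(X1) P(X2 | X1) P(X3 | X2) over Booleans.
p1 = {True: 0.3, False: 0.7}
p2 = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}   # P(X2 | X1)
p3 = {True: {True: 0.5, False: 0.5}, False: {True: 0.1, False: 0.9}}   # P(X3 | X2)
joint = lambda a: p1[a["X1"]] * p2[a["X1"]][a["X2"]] * p3[a["X2"]][a["X3"]]
dom = {v: [True, False] for v in ("X1", "X2", "X3")}

print(ask(joint, dom, search=["X1"], known={"X3": True}))   # P(X1 | [X3 = true])
```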

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #53 Probabilistic inference: in practice

• Inference in the general case is NP-hard (Cooper, 1990)
• Two optimization problems:
  – Draw(P(Search | Known ∧ δ ∧ π))
  – P(Search | Known ∧ δ ∧ π) = (1/Z) × Σ_Free P(Search ∧ Known ∧ Free | δ ∧ π)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #54 Probabilistic inference

• General optimization methods
  – Monte Carlo methods
    • MCMC (Markov chain Monte Carlo)
    • Metropolis, Metropolis-Hastings
    • Gibbs sampling
• Efficient problem-dependent solutions
  – Particle filter
• Analytical problem-dependent solutions
  – Kalman filter

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #55 Probabilistic inference: inference engines

http://probabilistic-programming.org/wiki/Home

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #56 Probabilistic inference: importance of the model

• Conditional independence hypotheses
  – Reduce the model space
  – Also reduce inference time
    • Nested summations
    • Small summation spaces
• Modular models, hierarchical models
  – Structured Bayesian Programming

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #57 What can we do with a p(X | Y)?

• Only 2 main operators:

1. Draw
   – Draw a sample x according to P(X | Y)
2. Compute
   – Compute the probability value P([X=x] | Y)
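A tiny sketch of these two operators for a discrete distribution represented as a Python dict (the representation is my own choice):

```python
import random

dist = {"a": 0.2, "b": 0.5, "c": 0.3}        # an example P(X | Y), for a fixed Y

def draw(p):
    """Draw a sample x according to P(X | Y)."""
    return random.choices(list(p), weights=list(p.values()), k=1)[0]

def compute(p, x):
    """Compute the probability value P([X=x] | Y)."""
    return p[x]

print(draw(dist), compute(dist, "b"))
```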

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #58 What can we do with a probability distribution p(X | Y)?

• Many “summary” operators

• Maximize
  – Find the point x where P([X=x] | Y) is maximal
• Compute moments
  – Mean μ, variance σ², etc., precision (inverse of the variance, 1/σ²)
• Compute an expectation
  – E[X] = ⟨X⟩ = Σ_x x P([X=x])
• Compute its entropy
  – H(P(X | Y)) = −Σ_X P(X | Y) log P(X | Y)
  – Surprise of a value x: S([X=x]) = log(1 / P([X=x])) = −log P([X=x])
  – Entropy is the expectation of surprise: a uniform distribution maximizes surprise and entropy, a Dirac minimizes them
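A sketch of the entropy and surprise computations for the same dict representation, illustrating the uniform-versus-Dirac remark (natural logarithms are an arbitrary choice here):

```python
import math

def entropy(p):
    """H(P) = -sum_x P(x) log P(x), in nats."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def surprise(p, x):
    """Surprise of a value x: S(x) = log(1 / P(x)) = -log P(x)."""
    return -math.log(p[x])

uniform = {x: 0.25 for x in "abcd"}
dirac = {"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0}
print(entropy(uniform), entropy(dirac))   # maximal (log 4) vs 0
print(surprise(uniform, "a"))             # log 4
```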

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #59 Plan

• Summary & questions!
• Basic concepts: minimal example and spam detection example
• Bayesian Programming methodology
  – Variables
  – Decomposition & conditional independence hypotheses
  – Parametric forms (demo)
  – Learning
  – Inference
• Taxonomy of Bayesian models

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #60 Synonymy: one thing, several names

• Kalman Filters
  – Linear-Gaussian state-space model
  – Linear dynamical system
• Dynamic Bayesian Network
  – Dynamic belief network
  – Temporal belief network
• Markov Localization
  – Bayesian filter
  – Input-output HMM
• State-space model
  – Continuous HMM
• Auto-regressive HMM
  – Correlation HMM
  – Conditionally Gaussian HMM / switching regression model
  – Switching Markov model
  – Switching regression model

• Bayesian networks
  – Belief networks
  – Generative models
  – Recursive graphical models
  – DPIN (directed probabilistic inference networks)
  – Causal (belief) networks
  – Probabilistic (causal) networks
• Markov Random Fields
  – UPIN (undirected probabilistic inference networks)
  – Undirected graphical models
  – Markov networks
  – Log-linear models
• Gibbs distribution
• Maxent models

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #61 Polysemy: one name, several things

• Bayesian filters
  – with a control variable Ut
  – without a control variable Ut
• Kalman filters
  – with a control variable Ut
  – without a control variable Ut

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #62 First-order Probabilistic Logic vs propositional models

[Figure: a taxonomy of first-order probabilistic languages, organized by Outcome Space (relational structures; proofs, tuples of ground terms; nested data structures), Specificity (full distribution; constraints), Parameterization (CPDs; weights), Decomposition (independent probabilistic choices; dependencies) and Set of Objects (known; unknown), with example languages including IBAL, SLPs, Halpern's logic, PLPs, knowledge-based model construction, object-oriented Bayes nets, probabilistic relational models, RMNs, Markov logic, PHA, ICL, PRISM, LPADs, BUGS, RBNs, BLPs, PRMs, BLOG, MEBN and DAPER models.]

Fig. 1. A taxonomy of first-order probabilistic languages.

Milch, B. and Russell, S. (2007). First-order probabilistic languages: Into the unknown. In Inductive Logic Programming, pages 10–24. Springer.

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #63 First-order Probabilistic Logic vs propositional models

K. Murphy. Dynamic Bayesian Networks: Representation, Inference and Learning. Ph.D. thesis, University of California, Berkeley, CA, July 2002.

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #64 S. Roweis and Z. Ghahramani. A unifying review of linear gaussian models. Neural Computation, 11(2):305–345, February 1999.

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #65 First-order Probabilistic Logic vs propositional models

Factor Graphs ⊃ Probabilistic Factor Graphs ⊃
• Bayesian networks
• Dynamic Bayesian networks
• Bayesian filters
• Hidden Markov models
• Kalman filters
• (Partially observable) Markov decision processes
• …

J. Diard, P. Bessière, and E. Mazer. A survey of probabilistic models, using the Bayesian programming methodology as a unifying framework. In The Second International Conference on Computational Intelligence, Robotics and Autonomous Systems (CIRAS 2003), Singapore, December 2003.

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #66 Factor Graphs

• Bipartite graph representing the factorization of a function:

g(X1, X2, ..., Xn) = ∏_{j=1}^{m} f_j(S_j), with S_j ⊆ {X1, X2, ..., Xn}

• Example (see the sketch below):

g(X1, X2, X3) = f1(X1) f2(X1, X2) f3(X1, X2) f4(X2, X3)
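A sketch of the example above: the global function g is the product of the four factors, and an (unnormalized) marginal is obtained by brute-force summation; the factor values are illustrative, not from the slides.

```python
from itertools import product

# Illustrative binary factors for g(X1, X2, X3) = f1(X1) f2(X1,X2) f3(X1,X2) f4(X2,X3).
f1 = lambda x1: 0.6 if x1 else 0.4
f2 = lambda x1, x2: 0.9 if x1 == x2 else 0.1
f3 = lambda x1, x2: 0.7 if x2 else 0.3
f4 = lambda x2, x3: 0.8 if x2 == x3 else 0.2

def g(x1, x2, x3):
    """The global function: product of the four factors."""
    return f1(x1) * f2(x1, x2) * f3(x1, x2) * f4(x2, x3)

# Unnormalized marginal of X3: sum g over X1 and X2.
marg_x3 = {x3: sum(g(x1, x2, x3) for x1, x2 in product([0, 1], repeat=2))
           for x3 in (0, 1)}
print(marg_x3)
```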

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #67 Probabilistic Factor Graphs

• Each f_i is a probability distribution P(·)
• No constraint relative to Bayes' theorem:
  P(X1 ∧ X2) = P(X1 | X2) P(X2 | X1)
  – is a valid probabilistic FG
  – but is not a valid probabilistic model
→ Probabilistic FGs are a strict superset of the set of probabilistic models

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #68 Bayesian Networks

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #69 Bayesian Networks

Exercise: are there valid decompositions that cannot be represented by a DAG?

• Directed Acyclic Graph (DAG)
  – Corresponds to a valid decomposition
• Structure
  – Analyze the local (in)dependencies (d-separation, etc.)

[Figure: example Bayesian network with nodes E, RA, GL, AC, IR, FL]

Judea Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (1988) Steffen Lauritzen, Graphical Models (1996) Michael Jordan, Learning in Graphical Models (1998) Brendan Frey, Graphical Models for Machine Learning and Digital Communication (1998)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #70 Bayesian Networks: inference

[Figure: example network with nodes E, RA, GL, AC, IR, FL; query node E, evidence nodes IR and FL]

• Exact inference: P(E | [IR=i] ∧ [FL=f])
  – Message passing
    • Local propagation of distributions
    • Clique tree, junction tree, etc.
• Approximate inference
  – Importance sampling, loopy belief propagation, variational methods

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #71 Bayesian Networks: learning

[Figure: example network with nodes E, RA, GL, AC, IR, FL]

• Parameter learning
  – Everything observed (easy)
  – Missing variables: Expectation-Maximization (EM)
    • Expectation over the missing variables given the observed variables
    • Maximum-probability model assuming the missing variables take their expected values
• Structure learning
  – Recovery algorithm (distinguishability of local structure)
  – Iterative methods:
    • Start from a given structure (e.g., fully connected)
    • Generate a variation, evaluate it, accept it if it is better
    • Iterate until convergence

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #72 Dynamic Bayesian Networks

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #73 Dynamic Bayesian Network

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #74 (recursive) Bayesian Filters

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #75 (recursive) Bayesian Filters

Global model

Local (stationary) model; first-order Markov hypothesis

prior → "learned"

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #76 Recursive (incremental) computation for filtering

Prediction: P(St | O0:t−1) = Σ_{St−1} [P(St | St−1) × P(St−1 | O0:t−1)]

Update: P(St | O0:t) = P(Ot | St) × P(St | O0:t−1) (up to normalization; see the sketch below)
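A sketch of this prediction/update recursion for a discrete state space (the two-state transition and observation tables are illustrative; the normalization of the update step is made explicit):

```python
def predict(belief, transition):
    """P(S^t | O^{0:t-1}) = sum_{s'} P(S^t | S^{t-1}=s') P(S^{t-1}=s' | O^{0:t-1})."""
    states = belief.keys()
    return {s: sum(transition[sp][s] * belief[sp] for sp in states) for s in states}

def update(belief_pred, observation_model, o):
    """P(S^t | O^{0:t}) proportional to P(O^t=o | S^t) P(S^t | O^{0:t-1})."""
    unnorm = {s: observation_model[s][o] * belief_pred[s] for s in belief_pred}
    Z = sum(unnorm.values())
    return {s: v / Z for s, v in unnorm.items()}

# Illustrative two-state example.
T = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.2, "B": 0.8}}            # P(S^t | S^{t-1})
O = {"A": {"hot": 0.7, "cold": 0.3}, "B": {"hot": 0.1, "cold": 0.9}}  # P(O^t | S^t)

belief = update({"A": 0.5, "B": 0.5}, O, "hot")   # P(S^0 | O^0), from the prior P(S^0)
for obs in ["hot", "cold"]:                        # then alternate prediction and update
    belief = update(predict(belief, T), O, obs)
print(belief)
```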

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #77

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #78 P(S0 | O0)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #79 P(S1 | O0) = Σ_{S0} [P(S1 | S0) × P(S0 | O0)]

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #80 P(S1 | O0:1) = P(O1 | S1) × P(S1 | O0)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #81 P(S2 | O0:1) = Σ_{S1} [P(S2 | S1) × P(S1 | O0:1)]

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #82 P(S2 | O0:2) = P(O2 | S2) × P(S2 | O0:1)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #83 P(S3 | O0:2) = Σ_{S2} [P(S3 | S2) × P(S2 | O0:2)]

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #84 P(S3 | O0:3) = P(O3 | S3) × P(S3 | O0:2)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #85 Hidden Markov Models

All discrete variables

Forward-backward, EM

max_{S0:T−1} P(S0:T−1 ∧ ST | O0:T)

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #86 Kalman Filters

Inference → matrix products! (a 1-D sketch follows below)
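A minimal one-dimensional sketch of a Kalman filter step, showing how the prediction and update reduce to a few products and sums of means and variances (all model parameters here are illustrative):

```python
# 1-D Kalman filter sketch: state x_t = a*x_{t-1} + noise(q), observation o_t = h*x_t + noise(r).
def kalman_step(mu, var, o, a=1.0, q=0.1, h=1.0, r=0.5):
    # Prediction: P(S^t | O^{0:t-1}) is Gaussian with
    mu_pred = a * mu
    var_pred = a * a * var + q
    # Update: multiply by the Gaussian likelihood P(O^t | S^t) and renormalize.
    k = var_pred * h / (h * h * var_pred + r)      # Kalman gain
    mu_new = mu_pred + k * (o - h * mu_pred)
    var_new = (1 - k * h) * var_pred
    return mu_new, var_new

mu, var = 0.0, 1.0                                  # prior P(S^0)
for o in [0.9, 1.1, 1.0]:
    mu, var = kalman_step(mu, var, o)
print(mu, var)
```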

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #87 Markov Localization models

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #88 (Partially Observable) Markov Decision Process

• Markov localisation

Bayesian Decision Theory

i i • Reward function R : S A R ⇥ – Cost, loss function

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #89 (Partially Observable) Markov Decision Process

• Maximize the expected gain
  – Infinite horizon: max E[Σ_{t=0}^{∞} γ^t Rt], with γ ∈ [0, 1[
• Approximate algorithms
  – Policy iteration
  – Value iteration

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #90 Thanks for your attention!

Questions ?

Diard – LPNC/CNRS Cognition Bayésienne – 2017-2018 #91