STAT 535 Lecture 2: Independence and conditional independence
© Marina Meilă
[email protected]

1 Conditional probability, total probability, Bayes' rule

Definition of the conditional distribution of A given B:

    P_{A|B}(a|b) = P_{AB}(a,b) / P_B(b)

whenever P_B(b) ≠ 0 (and P_{A|B}(a|b) is undefined otherwise).

Total probability formula. When A, B are events, and B̄ is the complement of B,

    P(A) = P(A,B) + P(A,B̄) = P(A|B)P(B) + P(A|B̄)P(B̄).

When A, B are random variables, the above gives the marginal distribution of A:

    P_A(a) = ∑_{b∈Ω(B)} P_{AB}(a,b) = ∑_{b∈Ω(B)} P_{A|B}(a|b) P_B(b).

Bayes' rule:

    P_{A|B} = P_A P_{B|A} / P_B.

The chain rule. Any multivariate distribution over n variables X_1, X_2, ..., X_n can be decomposed as

    P_{X_1 X_2 ... X_n} = P_{X_1} P_{X_2|X_1} P_{X_3|X_1 X_2} P_{X_4|X_1 X_2 X_3} ... P_{X_n|X_1 ... X_{n-1}}.

2 Probabilistic independence

    A ⊥ B ⟺ P_{AB} = P_A · P_B

We read A ⊥ B as "A independent of B". An equivalent definition of independence is

    A ⊥ B ⟺ P_{A|B} = P_A.

The above notations are shorthand for: for all a ∈ Ω(A), b ∈ Ω(B),

    P_{A|B}(a|b) = P_A(a)              whenever P_B(b) ≠ 0
    P_{AB}(a,b) / P_B(b) = P_A(a)      whenever P_B(b) ≠ 0
    P_{AB}(a,b) = P_A(a) P_B(b).

Intuitively, probabilistic independence means that knowing B does not bring any additional information about A (i.e., it does not change what we already believe about A). Indeed, the mutual information¹ of two independent variables is zero.

¹An information-theoretic quantity that measures how much information random variable A has about random variable B.

Exercise 1 [The exercises in this section are optional and will not influence your grade. Do them only if you have never done them before.] Prove that the two definitions of independence are equivalent.

Exercise 2 A, B are two real variables. Assume that A, B are jointly Gaussian. Give a necessary and sufficient condition for A ⊥ B in this case.
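To make these definitions concrete, here is a minimal numerical sketch in Python/NumPy. The joint table and all names (P_AB, P_A_given_B, and so on) are illustrative choices, not part of the notes: the code builds a small joint distribution, recovers the marginals by the total probability formula, verifies Bayes' rule, and checks the two equivalent definitions of independence.

import numpy as np

# An illustrative joint distribution P_AB (values assumed): rows index
# a in Omega(A), columns index b in Omega(B). This table is an outer
# product, so A and B are in fact independent.
P_AB = np.outer([0.3, 0.7], [0.2, 0.5, 0.3])

# Marginals via the total probability formula: P_A(a) = sum_b P_AB(a, b).
P_A = P_AB.sum(axis=1)
P_B = P_AB.sum(axis=0)

# Conditional distribution P_{A|B}(a|b) = P_AB(a, b) / P_B(b), defined
# wherever P_B(b) != 0 (all b here).
P_A_given_B = P_AB / P_B

# Bayes' rule: P_{A|B} = P_A P_{B|A} / P_B, checked entrywise.
P_B_given_A = P_AB / P_A[:, None]
assert np.allclose(P_A[:, None] * P_B_given_A / P_B, P_A_given_B)

# The two equivalent definitions of independence.
print(np.allclose(P_AB, np.outer(P_A, P_B)))     # P_AB = P_A . P_B
print(np.allclose(P_A_given_B, P_A[:, None]))    # P_{A|B}(.|b) = P_A for all b

Both printed checks return True for this table; perturbing any single entry of P_AB (and renormalizing) would make them False.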
Conditional independence

A richer and more useful concept is the conditional independence between sets of random variables:

    A ⊥ B | C ⟺ P_{AB|C} = P_{A|C} · P_{B|C} ⟺ P_{A|BC} = P_{A|C}

Once C is known, knowing B brings no additional information about A.

Here are a few more equivalent ways to express conditional independence. The expressions below must hold for every a, b, c for which the respective conditional probabilities are defined.

    P_{AB|C}(a,b|c) = P_{A|C}(a|c) P_{B|C}(b|c)

    P_{ABC}(a,b,c) / P_C(c) = P_{AC}(a,c) P_{BC}(b,c) / P_C(c)²

    P_{ABC}(a,b,c) / P_C(c) = P_{A|C}(a|c) P_{B|C}(b|c)

    P_{ABC}(a,b,c) / P_{BC}(b,c) = P_{AC}(a,c) / P_C(c)

    P_{A|BC}(a|b,c) = P_{A|C}(a|c)

    P_{ABC}(a,b,c) = P_C(c) P_{A|C}(a|c) P_{B|C}(b|c)

Since independence between two variables implies that the joint probability distribution factors, it implies that far fewer parameters are necessary to represent the joint distribution (r_A + r_B rather than r_A r_B, where r_A and r_B denote the number of values A and B can take), as well as other important simplifications such as A ⊥ B ⇒ E[AB] = E[A]E[B]. [Exercise 3 Prove this relationship.]

These definitions extend to sets of variables:

    AB ⊥ CD ≡ P_{ABCD} = P_{AB} P_{CD}

and

    AB ⊥ CD | EF ≡ P_{ABCD|EF} = P_{AB|EF} P_{CD|EF}.

Furthermore, independence of larger sets of variables implies independence of their subsets (for a fixed conditioning set):

    AB ⊥ CD | EF ⇒ A ⊥ CD | EF, A ⊥ C | EF, ...

[Exercise 4 Prove this.]

Notice that A ⊥ B does not imply A ⊥ B | C, nor the other way around. [Exercise 5 Find examples of each case.]

Lemma 1 (Factorization Lemma) A ⊥ C | B iff there exist functions φ_{AB}, φ_{BC} so that P_{ABC} = φ_{AB} φ_{BC}.

Proof The ⇒ statement is obvious (take, for instance, φ_{AB} = P_{AB} and φ_{BC} = P_{C|B}), so we prove the converse implication: if P_{ABC}(a,b,c) = φ_{AB}(a,b) φ_{BC}(b,c) for all (a,b,c) ∈ Ω_{ABC}, then A ⊥ C | B. Summing out c, respectively a, defines the functions ψ_B and ψ′_B:

    P_{AB}(a,b) = ∑_c φ_{AB}(a,b) φ_{BC}(b,c) = φ_{AB}(a,b) ψ_B(b)                    (1)

    P_{BC}(b,c) = ∑_a φ_{AB}(a,b) φ_{BC}(b,c) = φ_{BC}(b,c) ψ′_B(b)                   (2)

    P_B(b) = ψ_B(b) ψ′_B(b)                                                           (3)

    P_{AB}(a,b) P_{BC}(b,c) / P_B(b) = φ_{AB}(a,b) ψ_B(b) φ_{BC}(b,c) ψ′_B(b) / (ψ_B(b) ψ′_B(b)) = P_{ABC}(a,b,c)    (4)

Equation (4) is one of the equivalent forms of A ⊥ C | B, which completes the proof.

Two different factorizations of the joint probability P(a,b,c) when A ⊥ B | C are a good basis for understanding the two primary types of graphical models we will study: undirected graphs, also known as Markov random fields (MRFs), and directed graphs, also known as Bayes nets (BNs) or belief networks.

One factorization,

    P_{ABC}(a,b,c) = P_{AC}(a,c) P_{BC}(b,c) / P_C(c) = φ_{AC}(a,c) φ_{BC}(b,c),

is into a product of potential functions defined over subsets of variables, with a term in the denominator that compensates for the "double-counting" of variables in the intersection. From this factorization, an undirected graphical representation can be constructed by adding an edge between any two variables that co-occur in a potential.

The other factorization,

    P_{ABC}(a,b,c) = P_C(c) P_{A|C}(a|c) P_{B|C}(b|c),

is into a product of conditional probability distributions that imposes a partial order on the variables (e.g., C comes before A and B). From this factorization, a directed graphical model can be constructed by adding directed edges that match the "causality" implied by the conditional distributions. In particular, if the factorization involves P_{X|YZ}, then directed edges are added from Y and Z to X.

3 The (semi-)graphoid axioms

[Exercise: how many possible conditional independence relations are there between n variables?]

Some independence relations imply others. [Exercise: does A ⊥ B, B ⊥ C ⇒ A ⊥ C? Prove or give a counterexample.]

For any distribution P_V over a set of variables V, the following properties of independence hold. Let X, Y, Z, W be disjoint subsets of discrete variables from V.

[S] X ⊥ Y | Z ⇒ Y ⊥ X | Z (Symmetry)
    "If Y is irrelevant to X, then X is irrelevant to Y."

[D] X ⊥ YW | Z ⇒ X ⊥ Y | Z (Decomposition)
    "If some information is irrelevant to X, then any part of it is also irrelevant to X."

[WU] X ⊥ YW | Z ⇒ X ⊥ Y | WZ (Weak union)
    "If Y and W together are irrelevant to X, then learning one of them does not make the other one relevant."

[C] X ⊥ Y | Z and X ⊥ W | YZ ⇒ X ⊥ YW | Z (Contraction)
    "If Y is irrelevant to X, and if learning Y does not make W relevant to X, then W was irrelevant to start with."

[I] If P is strictly positive for all instantiations of the variables,
    X ⊥ Y | WZ and X ⊥ W | YZ ⇒ X ⊥ YW | Z (Intersection)

Properties S, D, WU and C are called the semi-graphoid axioms. The semi-graphoid axioms together with property I are called the graphoid axioms. Note that C is the converse of (WU and D).
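As a numerical sanity check of the Factorization Lemma above, here is a short sketch in NumPy. The potentials, their shapes, and the random seed are illustrative assumptions: the code builds a joint table that factors as φ_AB(a,b) φ_BC(b,c) and verifies that equation (4), one of the equivalent forms of A ⊥ C | B, then holds.

import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative potentials; any nonnegative tables work here.
phi_AB = rng.random((2, 3))          # phi_AB(a, b)
phi_BC = rng.random((3, 2))          # phi_BC(b, c)

# A joint that factors as phi_AB(a,b) phi_BC(b,c); by Lemma 1, A ⊥ C | B.
P_ABC = np.einsum('ab,bc->abc', phi_AB, phi_BC)
P_ABC /= P_ABC.sum()                 # normalize into a probability table

# Marginals used in the proof, equations (1)-(3).
P_AB = P_ABC.sum(axis=2)
P_BC = P_ABC.sum(axis=0)
P_B = P_ABC.sum(axis=(0, 2))

# Equation (4): P_AB(a,b) P_BC(b,c) / P_B(b) recovers P_ABC(a,b,c).
reconstructed = P_AB[:, :, None] * P_BC[None, :, :] / P_B[None, :, None]
print(np.allclose(P_ABC, reconstructed))   # True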

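A brute-force tester for conditional independence statements makes it possible to probe relations like the axioms above on small joint tables. The following is a sketch under stated assumptions: NumPy, a function name and interface (is_ci) of my own choosing, and the denominator-free form P(x,y,z) P(z) = P(x,z) P(y,z) of the definition, which avoids division by zero.

import numpy as np

def is_ci(P, X, Y, Z, tol=1e-10):
    # Test the statement X ⊥ Y | Z on a joint table P over n discrete
    # variables. P is an n-dimensional array summing to 1; X, Y, Z are
    # disjoint tuples of axis indices. Checks P(x,y,z) P(z) = P(x,z) P(y,z)
    # for every cell.
    rest = tuple(i for i in range(P.ndim) if i not in X + Y + Z)
    P_xyz = P.sum(axis=rest)             # marginalize out unused variables
    kept = sorted(X + Y + Z)             # surviving axes, in original order
    ax = {v: i for i, v in enumerate(kept)}
    P_xz = P_xyz.sum(axis=tuple(ax[v] for v in Y), keepdims=True)
    P_yz = P_xyz.sum(axis=tuple(ax[v] for v in X), keepdims=True)
    P_z = P_xyz.sum(axis=tuple(ax[v] for v in X + Y), keepdims=True)
    return np.allclose(P_xyz * P_z, P_xz * P_yz, atol=tol)

# Demo on an assumed illustrative table that factors over (A,B) and (B,C).
rng = np.random.default_rng(1)
P = np.einsum('ab,bc->abc', rng.random((2, 3)), rng.random((3, 2)))
P /= P.sum()
print(is_ci(P, X=(0,), Y=(2,), Z=(1,)))   # True: A ⊥ C | B by Lemma 1
print(is_ci(P, X=(0,), Y=(2,), Z=()))     # generally False: A ⊥ C need not hold

Running such checks on small random tables is a quick way to explore which implications between independence statements hold in general (the axioms above) and which fail.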