Advanced Machine Learning Summer 2019
Part 10 – Graphical Models IV
09.05.2019

Prof. Dr. Bastian Leibe
RWTH Aachen University, Computer Vision Group
http://www.vision.rwth-aachen.de

Course Outline
• Regression Techniques
  – Linear Regression
  – Regularization (Ridge, Lasso)
  – Kernels (Kernel Ridge Regression)
• Deep Reinforcement Learning
• Probabilistic Graphical Models
  – Bayesian Networks
  – Markov Random Fields
  – Inference (exact & approximate)
• Deep Generative Models
  – Generative Adversarial Networks
  – Variational Autoencoders

Topics of This Lecture
• Recap: Exact inference
  – Sum-Product algorithm
  – Max-Sum algorithm
  – Junction Tree algorithm
• Applications of Markov Random Fields
  – Application examples from computer vision
  – Interpretation of clique potentials (unary potentials, pairwise potentials)
• Solving MRFs with Graph Cuts
  – Graph cuts for image segmentation
  – s-t mincut algorithm
  – Extension to the non-binary case
  – Applications

Recap: Factor Graphs
• Joint probability
  – Can be expressed as a product of factors: $p(\mathbf{x}) = \prod_s f_s(\mathbf{x}_s)$
  – Factor graphs make this explicit through separate factor nodes.
• Converting a directed polytree
  – Conversion to an undirected tree creates loops due to moralization!
  – Conversion to a factor graph again results in a tree!
Image source: C. Bishop, 2006

Recap: Sum-Product Algorithm
• Objective
  – Efficient, exact inference algorithm for finding marginals in tree-structured factor graphs.
• Procedure
  – Pick an arbitrary node as root.
  – Compute and propagate messages from the leaf nodes to the root, storing received messages at every node.
  – Compute and propagate messages from the root to the leaf nodes, storing received messages at every node.
  – Compute the product of received messages at each node for which the marginal is required, and normalize if necessary.
• Computational effort
  – Simple propagation scheme.
  – Total number of messages = 2 × number of graph edges.

Recap: Sum-Product Algorithm
• Two kinds of messages
  – Message from factor node to variable node (sum of factor contributions):
    $\mu_{f_s \to x}(x) \equiv \sum_{X_s} F_s(x, X_s) = \sum_{X_s} f_s(x, X_s) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m)$
  – Message from variable node to factor node (product of incoming messages):
    $\mu_{x_m \to f_s}(x_m) \equiv \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m)$
Slide adapted from Chris Bishop

Recap: Sum-Product from Leaves to Root / from Root to Leaves
• The same message definitions are applied in both passes.
[Figure: example factor graph with factors f_a, f_b, f_c]

Max-Sum Algorithm
• Objective: an efficient algorithm for finding
  – the value $\mathbf{x}^{\max}$ that maximises $p(\mathbf{x})$;
  – the value of $p(\mathbf{x}^{\max})$.
  – An application of dynamic programming in graphical models.
• In general, the maximum of the marginals ≠ the joint maximum.

Max-Sum Algorithm – Key Ideas
• Key idea 1: Distributive law (again)
    $\max(ab, ac) = a \max(b, c)$
    $\max(a + b, a + c) = a + \max(b, c)$
  – Exchange products/summations and max operations, exploiting the tree structure of the factor graph.
• Key idea 2: Max-Product → Max-Sum
  – We are interested in the maximum value of the joint distribution:
    $p(\mathbf{x}^{\max}) = \max_{\mathbf{x}} p(\mathbf{x})$
  – Maximize the product $p(\mathbf{x})$.
  – For numerical reasons, use the logarithm: maximize the sum (of log-probabilities).
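As a concrete illustration of the sum-product messages recapped above, here is a minimal sketch on a toy chain factor graph. The chain, the binary variables, and the factor tables are my own example, not from the slides; the message updates themselves follow the definitions above.

```python
import numpy as np

# Toy chain factor graph x1 -- f12 -- x2 -- f23 -- x3 with binary
# variables. We compute the marginal p(x2) via sum-product messages
# and check it against brute-force summation of the joint.
rng = np.random.default_rng(0)
f12 = rng.random((2, 2))          # factor f12(x1, x2)
f23 = rng.random((2, 2))          # factor f23(x2, x3)

# Leaf variable nodes send the constant message 1 to their factors.
mu_x1_to_f12 = np.ones(2)
mu_x3_to_f23 = np.ones(2)

# Factor-to-variable messages: sum out all variables except the target.
mu_f12_to_x2 = f12.T @ mu_x1_to_f12   # sum_{x1} f12(x1, x2) * mu(x1)
mu_f23_to_x2 = f23 @ mu_x3_to_f23     # sum_{x3} f23(x2, x3) * mu(x3)

# Marginal at x2: product of all incoming messages, then normalize.
p_x2 = mu_f12_to_x2 * mu_f23_to_x2
p_x2 /= p_x2.sum()

# Brute-force check over the full joint distribution.
joint = np.einsum('ij,jk->ijk', f12, f23)
p_x2_brute = joint.sum(axis=(0, 2))
p_x2_brute /= p_x2_brute.sum()
assert np.allclose(p_x2, p_x2_brute)
```

On a chain every node has at most two neighbors, so each message is a single matrix-vector product; the same scheme extends to arbitrary trees with one message per edge and direction.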
Max-Sum Algorithm
• Maximizing over a chain (max-product)
  – Exchange max and product operators, exploiting the tree structure.
• Generalizes to tree-structured factor graphs.

Max-Sum Algorithm
• Initialization (leaf nodes)
• Recursion
  – Messages are propagated from the leaves towards the root.
  – For each node, keep a record of which values of the variables gave rise to the maximum state.

Max-Sum Algorithm
• Termination (root node)
  – Score of the maximal configuration.
  – Value of the root node variable giving rise to that maximum.
  – Back-track to get the remaining variable values:
    $x_{n-1}^{\max} = \phi(x_n^{\max})$

Visualization of the Back-Tracking Procedure
• Example: Markov chain
  – The stored links record, for each state of a variable, the maximizing state of the previous variable.
  – Same idea as in the Viterbi algorithm for HMMs…
[Figure: lattice of states vs. variables with stored back-tracking links]

Junction Tree Algorithm
• Motivation
  – Exact inference on general graphs.
  – Works by turning the initial graph into a junction tree with one node per clique and then running a sum-product-like algorithm.
  – Intractable on graphs with large cliques.
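The max-sum recursion and back-tracking above can be sketched on a small Markov chain. The chain length, initial distribution, and transition matrix are my own toy values, not from the slides; the structure is the Viterbi algorithm the slide refers to.

```python
import numpy as np

# Max-sum on a toy Markov chain x1 -> x2 -> x3 in log-space.
# phi records which previous state produced each maximum, so the
# back-tracking step x_{n-1}^max = phi(x_n^max) recovers the argmax.
log_p1 = np.log(np.array([0.6, 0.4]))          # log p(x1)
log_trans = np.log(np.array([[0.7, 0.3],       # log p(x_{n+1} | x_n)
                             [0.2, 0.8]]))
n_steps = 3

msg = log_p1                                   # message at x1
phi = []                                       # back-pointers
for _ in range(n_steps - 1):
    scores = msg[:, None] + log_trans          # scores[prev, next]
    phi.append(scores.argmax(axis=0))          # best predecessor per state
    msg = scores.max(axis=0)                   # max over the previous variable

# Termination at the root, then back-track through the stored links.
x = [msg.argmax()]
for back in reversed(phi):
    x.append(back[x[-1]])
x_max = x[::-1]

# Brute-force check over all 2^3 joint configurations.
best = max(((i, j, k) for i in range(2) for j in range(2) for k in range(2)),
           key=lambda s: log_p1[s[0]] + log_trans[s[0], s[1]] + log_trans[s[1], s[2]])
assert tuple(x_max) == best                    # here: (0, 0, 0)
```

Note that taking the componentwise argmax of the marginals instead of back-tracking can give an inconsistent configuration, which is exactly the "maximum of marginals ≠ joint maximum" point made earlier.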
Junction Tree Algorithm
• Main steps
  1. If starting from a directed graph, first convert it to an undirected graph by moralization.
  2. Introduce additional links by triangulation in order to reduce the size of cycles.
  3. Find the cliques of the moralized, triangulated graph.
  4. Construct a new graph from the maximal cliques.
  5. Remove minimal links to break cycles and get a junction tree.
  → Apply regular message passing to perform inference.
Slide adapted from Zoubin Gharahmani; image source: Z. Gharahmani

Junction Tree Algorithm
1. Convert to an undirected graph through moralization.
  – Marry the parents of each node.
  – Remove edge directions.
2. Triangulate,
  – such that there is no loop of length > 3 without a chord.
  – This is necessary so that the final junction tree satisfies the “running intersection” property (explained later).
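Step 1, moralization, can be sketched in a few lines. The function name and the DAG below are my own illustration, not from the slides.

```python
# Moralization sketch: for every node, "marry" its parents by
# connecting them pairwise, then drop all edge directions.
def moralize(parents):
    """parents: dict mapping node -> list of parent nodes (a DAG).
    Returns the undirected edge set of the moral graph."""
    edges = set()
    for child, ps in parents.items():
        for p in ps:                         # keep child-parent links
            edges.add(frozenset((child, p)))
        for i, p in enumerate(ps):           # marry all pairs of parents
            for q in ps[i + 1:]:
                edges.add(frozenset((p, q)))
    return edges

# Classic v-structure a -> c <- b: moralization adds the edge a - b.
dag = {'a': [], 'b': [], 'c': ['a', 'b']}
moral = moralize(dag)
assert frozenset(('a', 'b')) in moral
```

The added a–b edge is exactly the moralization link that creates loops when converting a directed polytree to an undirected graph, as recapped at the start of this lecture.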
Junction Tree Algorithm
3. Find the cliques of the moralized, triangulated graph.
4. Construct a new junction graph from the maximal cliques.
  – Create a node from each clique.
  – Each link carries a list of all variables in the intersection, drawn in a “separator” box.

Junction Tree Algorithm
5. Remove links to break cycles → junction tree.
  – For each cycle, remove the link(s) with the minimal number of shared nodes until all cycles are broken.
  – The result is a maximal spanning tree, the junction tree.

Junction Tree – Properties
• Running intersection property
  – “If a variable appears in more than one clique, it also appears in all intermediate cliques in the tree.”
  – This ensures that neighboring cliques have consistent probability distributions.
  – Local consistency → global consistency.

Interpretation of the Junction Tree
Junction Tree: Example 1
• Undirected graphical model
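Steps 4 and 5 above can be sketched as a maximal spanning tree over the junction graph, with separator sizes as edge weights. The cliques below are my own toy example, not from the slides, and the cycle breaking uses a Kruskal-style union-find.

```python
import itertools

# Junction graph -> junction tree sketch on toy maximal cliques.
# Edge weight = separator size; keeping the heaviest links while
# breaking cycles yields a maximal spanning tree (the junction tree).
cliques = [frozenset('ABC'), frozenset('BCD'), frozenset('CDE')]

# Candidate links between cliques with non-empty intersection,
# heaviest separators first.
links = sorted(((len(a & b), a, b)
                for a, b in itertools.combinations(cliques, 2) if a & b),
               reverse=True, key=lambda t: t[0])

parent = {c: c for c in cliques}             # union-find for cycle detection
def find(c):
    while parent[c] != c:
        c = parent[c]
    return c

tree = []
for w, a, b in links:
    ra, rb = find(a), find(b)
    if ra != rb:                             # adding this link breaks no cycle
        parent[ra] = rb
        tree.append((a, b, a & b))           # keep the separator set

# Result: the chain ABC -(BC)- BCD -(CD)- CDE; the weaker ABC-CDE link
# (separator {C}) is the one removed to break the cycle.
assert len(tree) == len(cliques) - 1
assert all(len(sep) == 2 for _, _, sep in tree)
```

On this example the resulting chain satisfies the running intersection property: C appears in all three cliques and in every clique on the path between them.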