Course Outline

Advanced Machine Learning, Summer 2019
Part 10 – Graphical Models IV
09.05.2019

Prof. Dr. Bastian Leibe
RWTH Aachen University, Computer Vision Group
http://www.vision.rwth-aachen.de

• Regression Techniques
  - Linear Regression
  - Regularization (Ridge, Lasso)
  - Kernels (Kernel Ridge Regression)
• Deep Reinforcement Learning
• Probabilistic Graphical Models
  - Bayesian Networks
  - Markov Random Fields
  - Inference (exact & approximate)
• Deep Generative Models
  - Generative Adversarial Networks
  - Variational Autoencoders


Topics of This Lecture

• Recap: Exact inference
  - Sum-Product algorithm
  - Max-Sum algorithm
  - Junction Tree algorithm
• Applications of Markov Random Fields
  - Application examples from computer vision
  - Interpretation of clique potentials
  - Unary potentials
  - Pairwise potentials
• Solving MRFs with Graph Cuts
  - Graph cuts for image segmentation
  - s-t mincut algorithm
  - Extension to non-binary case
  - Applications

Recap: Factor Graphs

• Joint probability
  - Can be expressed as a product of factors:  p(\mathbf{x}) = \prod_s f_s(\mathbf{x}_s)
  - Factor graphs make this explicit through separate factor nodes.
• Converting a directed polytree
  - Conversion to an undirected tree creates loops due to moralization!
  - Conversion to a factor graph again results in a tree!

Recap: Sum-Product Algorithm

• Objectives
  - Efficient, exact inference algorithm for finding marginals.
• Procedure
  - Pick an arbitrary node as root.
  - Compute and propagate messages from the leaf nodes to the root, storing received messages at every node.
  - Compute and propagate messages from the root to the leaf nodes, storing received messages at every node.
  - Compute the product of received messages at each node for which the marginal is required, and normalize if necessary.
• Computational effort
  - Simple propagation scheme.
  - Total number of messages = 2 · number of graph edges.

Recap: Sum-Product Algorithm

• Two kinds of messages
  - Message from factor node to variable node: sum of factor contributions

      \mu_{f_s \to x}(x) \equiv \sum_{X_s} F_s(x, X_s) = \sum_{X_s} f_s(\mathbf{x}_s) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m)

  - Message from variable node to factor node: product of incoming messages

      \mu_{x_m \to f_s}(x_m) \equiv \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m)

  (A small worked sketch of these two message types follows below.)
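The following is a minimal, illustrative sketch (my own toy chain, factor tables, and variable names, not code from the lecture) of the two message types on the chain x1 - f12 - x2 - f23 - x3 with binary variables.

```python
# Sum-product messages on a tiny chain-structured factor graph (toy numbers).
import numpy as np

K = 2
f12 = np.array([[0.9, 0.1], [0.2, 0.8]])   # factor over (x1, x2)
f23 = np.array([[0.7, 0.3], [0.4, 0.6]])   # factor over (x2, x3)

# Leaves-to-root pass with x3 as root.
mu_x1_to_f12 = np.ones(K)                   # variable->factor: product of incoming messages (none here)
mu_f12_to_x2 = f12.T @ mu_x1_to_f12         # factor->variable: sum over x1
mu_x2_to_f23 = mu_f12_to_x2                 # x2 has only one other neighbour, so just pass on
mu_f23_to_x3 = f23.T @ mu_x2_to_f23         # factor->variable: sum over x2

# Root-to-leaves pass (only the message needed for the marginal of x2).
mu_x3_to_f23 = np.ones(K)
mu_f23_to_x2 = f23 @ mu_x3_to_f23           # sum over x3

# Marginal at x2: product of all incoming messages, normalized.
p_x2 = mu_f12_to_x2 * mu_f23_to_x2
print(p_x2 / p_x2.sum())
```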


Recap: Sum-Product from Leaves to Root / from Root to Leaves

• Message definitions (example factor graph with factor nodes f_a, f_b, f_c; figure from C. Bishop, 2006):

    \mu_{f_s \to x}(x) \equiv \sum_{X_s} f_s(\mathbf{x}_s) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m)

    \mu_{x_m \to f_s}(x_m) \equiv \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m)


Max-Sum Algorithm

• Objective: an efficient algorithm for finding
  - the value x^max that maximises p(x);
  - the value of p(x^max).
• Application of dynamic programming in graphical models.
• In general, maximum marginals ≠ joint maximum.

Max-Sum Algorithm – Key Ideas

• Key idea 1: Distributive Law (again)

    \max(ab, ac) = a \max(b, c)
    \max(a + b, a + c) = a + \max(b, c)

  - Exchange products/summations and max operations, exploiting the tree structure of the factor graph.
• Key idea 2: Max-Product → Max-Sum
  - We are interested in the maximum value of the joint distribution, p(x^max) = max_x p(x).
  - Maximize the product p(x).
  - For numerical reasons, use the logarithm.
  - Maximize the sum (of log-probabilities).

Max-Sum Algorithm

• Maximizing over a chain (max-product)
• Exchange max and product operators
• Generalizes to tree-structured factor graphs

Max-Sum Algorithm

• Initialization (leaf nodes)
• Recursion
  - Messages
  - For each node, keep a record of which values of the variables gave rise to the maximum state.


Max-Sum Algorithm

• Termination (root node)
  - Score of the maximal configuration
  - Value of the root node variable giving rise to that maximum
  - Back-track to get the remaining variable values:

      x_{n-1}^{\max} = \phi(x_n^{\max})

Visualization of the Back-Tracking Procedure

• Example: Markov chain
  (Figure from C. Bishop, 2006: trellis of variable states with the stored links used for back-tracking.)
  - Same idea as in the Viterbi algorithm for HMMs… (a small sketch follows below)
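Below is a small, illustrative sketch (toy numbers and my own implementation, not the lecture's code) of Max-Sum on a short Markov chain: forward max-messages in log-space with back-pointers φ, then back-tracking from the root, exactly the Viterbi idea mentioned above.

```python
# Max-Sum (Viterbi-style) on a 3-node Markov chain with binary states (toy numbers).
import numpy as np

log_unary = np.log(np.array([[0.6, 0.4],     # log psi_n(x_n) for n = 1..3
                             [0.3, 0.7],
                             [0.5, 0.5]]))
log_pair = np.log(np.array([[0.8, 0.2],      # log psi(x_n, x_{n+1}), favours staying
                            [0.2, 0.8]]))

N, K = log_unary.shape
msg = log_unary[0]                           # initialization at the leaf node
phi = []                                     # back-pointers: best previous state per current state
for n in range(1, N):
    scores = msg[:, None] + log_pair         # scores[x_{n-1}, x_n]
    phi.append(scores.argmax(axis=0))        # record which x_{n-1} gave the maximum
    msg = scores.max(axis=0) + log_unary[n]  # max over x_{n-1}, plus local term

# Termination at the root, then back-track: x_{n-1}^max = phi(x_n^max)
x = [int(msg.argmax())]
for back in reversed(phi):
    x.append(int(back[x[-1]]))
x.reverse()
print("max log-prob:", msg.max(), " x^max =", x)
```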

Topics of This Lecture

• Recap: Exact inference
  - Sum-Product algorithm
  - Max-Sum algorithm
  - Junction Tree algorithm
• Applications of Markov Random Fields
  - Motivation
  - Unary potentials
  - Pairwise potentials
• Solving MRFs with Graph Cuts
  - Graph cuts for image segmentation
  - s-t mincut algorithm
  - Extension to non-binary case
  - Applications

Junction Tree Algorithm

• Motivation
  - Exact inference on general graphs.
  - Works by turning the initial graph into a junction tree with one node per clique and then running a sum-product-like algorithm.
  - Intractable on graphs with large cliques.


Junction Tree Algorithm

• Motivation
  - Exact inference on general graphs.
  - Works by turning the initial graph into a junction tree and then running a sum-product-like algorithm.
  - Intractable on graphs with large cliques.
• Main steps (starting from an undirected graph)
  1. If starting from a directed graph, first convert it to an undirected graph by moralization.
  2. Introduce additional links by triangulation in order to reduce the size of cycles.
  3. Find the cliques of the moralized, triangulated graph.
  4. Construct a new graph from the maximal cliques.
  5. Remove minimal links to break cycles and get a junction tree.
  - Then apply regular message passing to perform inference.
  (A small sketch of steps 1–2 follows below.)
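The sketch below illustrates, under my own toy graph and a naive elimination ordering (both assumptions made purely for illustration), how steps 1 and 2 (moralization and triangulation) can be implemented; practical junction-tree code would choose the elimination ordering more carefully to keep cliques small.

```python
# Moralization of a DAG and triangulation of the resulting undirected graph (toy example).
from itertools import combinations

def moralize(parents):
    """parents: dict node -> set of parent nodes (a DAG).
    Returns an undirected graph as dict node -> set of neighbours."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    und = {v: set() for v in nodes}
    for child, ps in parents.items():
        for p in ps:                          # keep existing edges, drop directions
            und[child].add(p)
            und[p].add(child)
        for a, b in combinations(ps, 2):      # "marry" the parents of each node
            und[a].add(b)
            und[b].add(a)
    return und

def triangulate(und, order=None):
    """Add fill-in edges along an elimination ordering so that every cycle of
    length > 3 has a chord. Any ordering works; a good one gives fewer fill-ins."""
    g = {v: set(nbrs) for v, nbrs in und.items()}
    filled = {v: set(nbrs) for v, nbrs in und.items()}
    order = order or sorted(g)                # naive ordering, fine for illustration
    for v in order:
        nbrs = list(g[v])
        for a, b in combinations(nbrs, 2):    # connect the remaining neighbours of v
            for h in (g, filled):
                h[a].add(b); h[b].add(a)
        for u in nbrs:                        # eliminate v
            g[u].discard(v)
        g[v].clear()
    return filled

if __name__ == "__main__":
    # toy DAG: D has parents B and C, so moralization adds the edge B-C
    dag = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
    moral = moralize(dag)
    print(sorted((u, v) for u in moral for v in moral[u] if u < v))
    chordal = triangulate(moral)
```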


Junction Tree Algorithm

1. Convert to an undirected graph through moralization.
   - Marry the parents of each node.
   - Remove edge directions.

Junction Tree Algorithm

2. Triangulate.
   - Such that there is no loop of length > 3 without a chord.
   - This is necessary so that the final junction tree satisfies the "running intersection" property (explained later).


Junction Tree Algorithm

3. Find the cliques of the moralized, triangulated graph.

Junction Tree Algorithm

4. Construct a new junction graph from the maximal cliques.
   - Create a node from each clique.
   - Each link carries a list of all variables in the intersection, drawn in a "separator" box.


Junction Tree Algorithm

5. Remove links to break cycles → junction tree.
   - For each cycle, remove the link(s) with the minimal number of shared nodes until all cycles are broken.
   - The result is a maximal spanning tree, the junction tree.

Junction Tree – Properties

• Running intersection property
  - "If a variable appears in more than one clique, it also appears in all intermediate cliques in the tree."
  - This ensures that neighboring cliques have consistent probability distributions.
  - Local consistency → global consistency.


Interpretation of the Junction Tree

• Undirected graph: the chain A - B - C
• Junction tree: AB - [B] - BC  (Clique - Separator - Clique)
• The joint factorizes as cliques over separators (a numerical check follows below):

    P(U) = \prod P(\text{Clique}) \,/\, \prod P(\text{Separator})
    P(A, B, C) = P(A, B)\, P(B, C) \,/\, P(B)

Junction Tree: Example 1

• Algorithm
  1. Moralization
  2. Triangulation (not necessary here)
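As a quick sanity check of the clique/separator factorization above, the following toy script (all numbers generated at random, purely illustrative) builds a joint distribution over the chain A - B - C and verifies P(A,B,C) = P(A,B) P(B,C) / P(B) numerically.

```python
# Numerical check of the junction-tree factorization for a chain A - B - C.
import numpy as np

rng = np.random.default_rng(0)

# Build a joint that respects the chain structure: P(A,B,C) = P(A) P(B|A) P(C|B)
pA = rng.dirichlet(np.ones(2))                     # shape (2,)
pB_given_A = rng.dirichlet(np.ones(3), size=2)     # shape (2, 3)
pC_given_B = rng.dirichlet(np.ones(2), size=3)     # shape (3, 2)
joint = pA[:, None, None] * pB_given_A[:, :, None] * pC_given_B[None, :, :]

# Clique and separator marginals
pAB = joint.sum(axis=2)       # P(A,B)
pBC = joint.sum(axis=0)       # P(B,C)
pB = joint.sum(axis=(0, 2))   # P(B)

reconstructed = pAB[:, :, None] * pBC[None, :, :] / pB[None, :, None]
print(np.allclose(joint, reconstructed))   # True: product of cliques over separators
```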

Junction Tree: Example 1 (cont'd)

• Algorithm
  1. Moralization
  2. Triangulation (not necessary here)
  3. Find cliques
  4. Construct junction graph
  5. Break links to get junction tree

Junction Tree: Example 2

• Without the triangulation step
  - The final graph will contain cycles that we cannot break without losing the running intersection property!
• When applying the triangulation step
  - Only small cycles remain that are easy to break.
  - The running intersection property is maintained.


Junction Tree Algorithm

• Good news
  - The junction tree algorithm is efficient in the sense that, for a given graph, there does not exist a computationally cheaper approach.
• Bad news
  - This may still be too costly.
  - The effort is determined by the number of variables in the largest clique and grows exponentially with this number (for discrete variables).
  - The algorithm becomes impractical if the graph contains large cliques!

Loopy Belief Propagation

• Alternative algorithm for loopy graphs (a small sketch follows below)
  - Sum-Product on general graphs.
  - Strategy: simply ignore the problem.
  - Initial unit messages are passed across all links, after which messages are passed around until convergence.
    - Convergence is not guaranteed!
    - Typically break off after a fixed number of iterations.
  - Approximate but tractable for large graphs.
  - Sometimes works well, sometimes not at all.
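The following is a minimal sketch (my own toy MRF with three binary nodes on a loop; the potentials and iteration count are assumptions) of loopy belief propagation: messages start uniform ("unit messages"), are passed around for a fixed number of sweeps, and approximate marginals are then read off.

```python
# Loopy belief propagation on a small pairwise MRF with a single loop (toy example).
import numpy as np

K = 2  # binary variables
edges = [(0, 1), (1, 2), (2, 0)]                          # three nodes on a loop
unary = {i: np.array([0.6, 0.4]) for i in range(3)}       # made-up local evidence
pairwise = np.array([[2.0, 1.0], [1.0, 2.0]])             # favours equal labels

# one message per directed edge, initialised to uniform ("unit") messages
msgs = {(i, j): np.ones(K) / K for (a, b) in edges for (i, j) in [(a, b), (b, a)]}

def neighbours(i):
    return [j for (a, b) in edges for j in ((b,) if a == i else (a,) if b == i else ())]

for _ in range(20):                                        # fixed number of sweeps, no convergence check
    new = {}
    for (i, j) in msgs:
        # product of unary term and all incoming messages except the one coming from j
        prod = unary[i].copy()
        for k in neighbours(i):
            if k != j:
                prod *= msgs[(k, i)]
        m = pairwise.T @ prod                              # sum over x_i of psi(x_i, x_j) * prod(x_i)
        new[(i, j)] = m / m.sum()                          # normalise for numerical stability
    msgs = new

for i in range(3):
    belief = unary[i].copy()
    for k in neighbours(i):
        belief *= msgs[(k, i)]
    print(i, belief / belief.sum())                        # approximate marginal at node i
```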


Topics of This Lecture

• Recap: Exact inference
  - Sum-Product algorithm
  - Max-Sum algorithm
  - Junction Tree algorithm
• Applications of Markov Random Fields
  - Motivation
  - Unary potentials
  - Pairwise potentials
• Solving MRFs with Graph Cuts
  - Graph cuts for image segmentation
  - s-t mincut algorithm
  - Extension to non-binary case
  - Applications

Markov Random Fields (MRFs)

• What we've learned so far…
  - We know they are undirected graphical models.
  - Their joint probability factorizes into clique potentials,

      p(\mathbf{x}) = \frac{1}{Z} \prod_C \psi_C(\mathbf{x}_C),

    which are conveniently expressed as energy functions,

      \psi_C(\mathbf{x}_C) = \exp\{-E(\mathbf{x}_C)\}.

    (A small numerical sketch follows below.)
  - We know how to perform inference for them:
    - Sum/Max-Product BP for exact inference in tree-shaped MRFs.
    - Loopy BP for approximate inference in arbitrary MRFs.
    - Junction Tree algorithm for converting arbitrary MRFs into trees.
• But what are they actually good for?
  - And how do we apply them in practice?
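To make the potential/energy correspondence concrete, here is a tiny brute-force sketch (all energies are made-up numbers, chosen only for illustration) that evaluates p(x) = (1/Z) ∏_C ψ_C(x_C) with ψ_C = exp{-E(x_C)} on a three-node binary chain.

```python
# Brute-force evaluation of an MRF defined via energies on a 3-node chain x1 - x2 - x3.
import itertools
import numpy as np

def pair_energy(a, b):
    return 0.0 if a == b else 1.0            # penalise disagreeing neighbours

def unary_energy(i, a):
    return [0.2, 0.9, 0.4][i] * a            # arbitrary per-node bias towards label 0

def total_energy(x):
    return (sum(unary_energy(i, a) for i, a in enumerate(x))
            + pair_energy(x[0], x[1]) + pair_energy(x[1], x[2]))

states = list(itertools.product([0, 1], repeat=3))
unnorm = np.array([np.exp(-total_energy(x)) for x in states])   # product of clique potentials
Z = unnorm.sum()                                                 # partition function
probs = unnorm / Z

best = states[int(np.argmax(probs))]
print("Z =", Z, " MAP state =", best)
```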

Markov Random Fields

• Allow rich probabilistic models.
  - But built in a local, modular way.
  - Learn local effects, get global effects out.
• Very powerful when applied to regular structures.
  - Such as images…
  (Figure: grid-structured MRF with observed evidence, hidden "true states", and neighborhood relations.)

Applications of MRFs

• Movie "No Way Out" (1987)


Applications of MRFs

• Many applications for low-level vision tasks
  - Image denoising
  (Figures: noisy observations, the observation process, the "true" image content, and "smoothness constraints"; results by [Roth & Black, CVPR'05].)

Applications of MRFs

• Many applications for low-level vision tasks
  - Image denoising
  - Inpainting
  - Image restoration
  (Result images by [Roth & Black, CVPR'05].)


Applications of MRFs

• Many applications for low-level vision tasks
  - Image denoising
  - Inpainting
  - Image restoration
  - Image segmentation
  - Super-resolution: convert a low-res image into a high-res image!
  (Figure: super-resolution vs. simple upscaling; [Freeman et al., CG&A'03].)


Applications of MRFs

• Many applications for low-level vision tasks
  - Image denoising
  - Inpainting
  - Image restoration
  - Image segmentation
  - Super-resolution
  - Optical flow
  (Figures: MRF linking observed image patches to hidden scene patches [Freeman et al., CG&A'03]; an image pair for optical flow.)


Applications of MRFs

• Many applications for low-level vision tasks
  - Image denoising
  - Inpainting
  - Image restoration
  - Image segmentation
  - Super-resolution
  - Optical flow
  - Stereo depth estimation
  (Figure: stereo image pair and the resulting disparity map.)
• MRFs have become a standard tool for such tasks.
  - Let's look at how they are applied in detail…


MRF Structure for Images

• Basic structure
  (Figure: grid MRF with noisy observations attached to the "true" image content nodes.)
• Two components
  - Observation model
    - How likely is it that node x_i has label L_i given observation y_i?
    - This relationship is usually learned from training data.
  - Neighborhood relations
    - Simplest case: 4-neighborhood.
    - Serve as smoothing terms: discourage neighboring pixels from having different labels.
    - These can either be learned or be set to fixed "penalties".

MRF Nodes as Pixels

• (Figure: original image, degraded image, and the reconstruction from an MRF modeling pixel-neighborhood statistics. These neighborhood statistics can be learned from training data!)


MRF Nodes as Patches

• (Figure: MRF with hidden scene patches x_i connected to observed image patches y_i.)

Network Joint Probability

• Interpretation of the factorized joint probability:

    P(\mathbf{x}, \mathbf{y}) = \prod_i \Phi(x_i, y_i) \prod_{i,j} \Psi(x_i, x_j)

  - Φ(x_i, y_i): image–scene compatibility function (local observations).
  - Ψ(x_i, x_j): scene–scene compatibility function (neighboring scene nodes).
  - More general relationships are expressed by the potential functions Φ and Ψ.

Energy Formulation

• Energy function (single-node potentials + pairwise potentials):

    E(\mathbf{x}, \mathbf{y}) = \sum_i \phi(x_i, y_i) + \sum_{i,j} \psi(x_i, x_j)

• Single-node (unary) potentials
  - Encode local information about the given pixel/patch.
  - How likely is a pixel/patch to belong to a certain class (e.g. foreground/background)?
• Pairwise potentials
  - Encode neighborhood information.
  - How different is a pixel/patch's label from that of its neighbor (e.g. based on intensity/color/texture differences, edges)?

How to Set the Potentials? Some Examples

• Unary potentials
  - E.g., a color model, modeled with a Mixture of Gaussians:

      \phi(x_i, y_i; \theta_\phi) = \log \sum_k \theta_\phi(x_i, k)\, p(k \mid x_i)\, \mathcal{N}(y_i; \bar{y}_k, \Sigma_k)

  - Learn color distributions for each label.
  (Figure: unary potentials φ(x_p = 1, y_p) and φ(x_p = 0, y_p) plotted over the pixel value y_p.)
  (A small sketch of evaluating such an energy follows below.)
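A small sketch (toy image, assumed Gaussian per-label means, and a Potts pairwise term, none of which come from the lecture) of evaluating the energy E(x, y) = Σ_i φ(x_i, y_i) + Σ_{i,j} ψ(x_i, x_j) for a candidate labeling of an image grid.

```python
# Evaluating an MRF energy with a per-label Gaussian unary term and a Potts pairwise term.
import numpy as np

H, W = 4, 5
rng = np.random.default_rng(1)
y = rng.random((H, W))                       # made-up grayscale "observations" in [0, 1]
x = (y > 0.5).astype(int)                    # a candidate binary labeling to evaluate

# unary: negative log-likelihood under a per-label Gaussian (means/sigma are assumptions)
means, sigma = np.array([0.3, 0.7]), 0.2
def unary(label, obs):
    return 0.5 * ((obs - means[label]) / sigma) ** 2

# pairwise: Potts model, constant penalty theta for neighbouring pixels with different labels
theta = 0.5
def potts(a, b):
    return theta * (a != b)

E = unary(x, y).sum()
E += sum(potts(x[i, j], x[i + 1, j]) for i in range(H - 1) for j in range(W))   # vertical pairs
E += sum(potts(x[i, j], x[i, j + 1]) for i in range(H) for j in range(W - 1))   # horizontal pairs
print("E(x, y) =", E)
```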


How to Set the Potentials? Some Examples

• Pairwise potentials
  - Potts model:

      \psi(x_i, x_j; \theta_\psi) = \theta_\psi\, \delta(x_i \neq x_j)

    - Simplest discontinuity-preserving model.
    - Discontinuities between any pair of labels are penalized equally.
    - Useful when labels are unordered or the number of labels is small.
  - Extension: "contrast-sensitive Potts model"

      \psi(x_i, x_j, g_{ij}(\mathbf{y}); \theta_\psi) = \theta_\psi\, g_{ij}(\mathbf{y})\, \delta(x_i \neq x_j)

    where

      g_{ij}(\mathbf{y}) = \exp\!\left( -\frac{\|y_i - y_j\|^2}{2\,\mathrm{avg}\big(\|y_i - y_j\|^2\big)} \right)

    - Discourages label changes except in places where there is also a large change in the observations (see the sketch after this slide).

Extension: Conditional Random Fields (CRF)

• Idea: Model the conditional instead of the joint probability.
  (Figure: pixels y and labels x, with unary potential, pairwise potential, and prior.)
• Energy formulation
  - Unary likelihood + contrast term + uniform prior (Potts model).
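The following short sketch shows one plausible way (an assumption following the contrast-sensitive formula above, not the lecture's reference code) to compute the pairwise weights g_ij(y) from neighbouring pixel differences, so that label changes are cheap where the image itself changes strongly.

```python
# Contrast-sensitive pairwise weights for horizontal neighbour pairs of a toy image.
import numpy as np

rng = np.random.default_rng(2)
y = rng.random((4, 5))                               # toy grayscale image

def contrast_weights(img):
    """Weights exp(-||y_i - y_j||^2 / (2 * avg ||y_i - y_j||^2)) for horizontal pairs."""
    diff2 = (img[:, 1:] - img[:, :-1]) ** 2          # squared differences of neighbours
    beta = 1.0 / (2.0 * diff2.mean() + 1e-12)        # data-dependent normalisation
    return np.exp(-beta * diff2)

g_h = contrast_weights(y)
# pairwise term for a labeling x would then be: theta * g_ij(y) if x_i != x_j, else 0
print(g_h)
```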

Example: MRF for Image Segmentation

• MRF structure
  (Figure: pixels y_i and labels x_i with unary potential φ(x_i, y_i), pairwise potential ψ(x_i, x_j), and a Potts-model prior.)
  (Figure: data (D), unary likelihood, pair-wise terms, and the MAP solution.)

Energy Minimization

• Goal:
  - Infer the optimal labeling of the MRF.
• Many inference methods are available, e.g.
  - Simulated annealing (what you saw in the movie)
  - Iterated conditional modes (ICM): too simple.
  - Belief propagation: last lecture.
  - Graph cuts: use this one!
  - Variational methods, Monte Carlo sampling: for more complex problems.
• Recently, Graph Cuts have become a popular tool.
  - Only suitable for a certain class of energy functions.
  - But the solution can be obtained very fast for typical vision problems (~1 MPixel/sec).

Topics of This Lecture

• Recap: Exact inference
  - Factor Graphs
  - Sum-Product / Max-Sum Belief Propagation
  - Junction Tree algorithm
• Applications of Markov Random Fields
  - Motivation
  - Unary potentials
  - Pairwise potentials
• Solving MRFs with Graph Cuts
  - Graph cuts for image segmentation
  - s-t mincut algorithm
  - Extension to non-binary case
  - Applications

Graph Cuts for Binary Problems   [Boykov & Jolly, ICCV'01]

• Idea: convert the MRF into a source-sink graph.
  (Figure: pixel grid with n-links, terminals s and t attached via hard constraints, and a cut separating them.)
  - n-link weights, e.g.

      w_{pq} = \exp\!\left( -\frac{\Delta I_{pq}^2}{2\sigma^2} \right)

  - The minimum cost cut can be computed in polynomial time (max-flow/min-cut algorithms).


Simple Example of Energy

• Energy with unary potentials (t-links) and pairwise potentials (n-links):

    E(L) = \sum_p D_p(L_p) + \sum_{pq \in N} w_{pq}\, \delta(L_p \neq L_q), \quad L_p \in \{s, t\}

  (binary object segmentation; see the graph-cut sketch below)

Adding Regional Properties   [Boykov & Jolly, ICCV'01]

• Regional bias example
  - Suppose the "expected" intensities I^s and I^t of object and background are given:

      D_p(s) = \exp\!\left( -\|I_p - I^s\|^2 / (2\sigma^2) \right)
      D_p(t) = \exp\!\left( -\|I_p - I^t\|^2 / (2\sigma^2) \right)

  - NOTE: hard constraints are not required, in general.
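Below is a minimal sketch of the s-t mincut construction for a binary labeling problem, using networkx's generic max-flow/min-cut routine rather than a specialized vision solver; the toy 1-D "image", the capacities, and the weight lambda are all assumptions made purely for illustration.

```python
# Binary MRF inference as an s-t mincut: unary costs become t-link capacities,
# pairwise Potts costs become n-link capacities (toy 1-D example).
import networkx as nx

# toy 1-D "image": bright pixels should go to the object ("s"), dark ones to background ("t")
intensities = [0.9, 0.8, 0.55, 0.2, 0.1]
lam = 0.5                                      # pairwise smoothness weight

G = nx.DiGraph()
for p, I in enumerate(intensities):
    G.add_edge("s", p, capacity=I)             # cut this edge => p is labeled background
    G.add_edge(p, "t", capacity=1.0 - I)       # cut this edge => p is labeled object
for p in range(len(intensities) - 1):
    # n-links in both directions: penalty for cutting between neighbouring pixels
    G.add_edge(p, p + 1, capacity=lam)
    G.add_edge(p + 1, p, capacity=lam)

cut_value, (source_side, sink_side) = nx.minimum_cut(G, "s", "t")
labels = {p: ("object" if p in source_side else "background")
          for p in range(len(intensities))}
print(cut_value, labels)
```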


Adding Regional Properties   [Boykov & Jolly, ICCV'01]

• The "expected" intensities I^s and I^t of object and background can be re-estimated:

    D_p(s) = \exp\!\left( -\|I_p - I^s\|^2 / (2\sigma^2) \right)
    D_p(t) = \exp\!\left( -\|I_p - I^t\|^2 / (2\sigma^2) \right)

  - EM-style optimization.

References and Further Reading

• A gentle introduction to Graph Cuts can be found in the following paper:
  - Y. Boykov, O. Veksler, Graph Cuts in Vision and Graphics: Theories and Applications. In Handbook of Mathematical Models in Computer Vision, edited by N. Paragios, Y. Chen, and O. Faugeras, Springer, 2006.
• Try the GraphCut implementation at https://pub.ist.ac.at/~vnk/software.html
