
Group Meeting - September 18, 2020
Paper review & Research progress

Truong Son Hy

Department of Computer Science, The University of Chicago

Ryerson Physical Lab

Plato

Never discourage anyone who continually makes progress, no matter how slow.

Machiavelli

Men judge generally more by the eye than by the hand, for everyone can see and few can feel. Every one sees what you appear to be, few really know what you are.

Papers – Synchronization

1. Probabilistic Permutation Synchronization using the Riemannian Structure of the Birkhoff Polytope (CVPR 2019) https://arxiv.org/pdf/1904.05814.pdf
2. Synchronizing Probability Measures on Rotations via Optimal Transport (CVPR 2020) https://arxiv.org/pdf/2004.00663.pdf
3. Message-passing algorithms for synchronization problems over compact groups https://arxiv.org/pdf/1610.04583.pdf

Additional reference for Paper 1:
1. Optimization methods on Riemannian manifolds and their application to shape space https://www.cis.upenn.edu/~cis610/Ring-Wirth-optim-Riemann.pdf

Paper 1

Probabilistic Permutation Synchronization using the Riemannian Structure of the Birkhoff Polytope (CVPR 2019)
Tolga Birdal, Umut Şimşekli
https://arxiv.org/pdf/1904.05814.pdf

Proposals

A new geometric and probabilistic approach to synchronization of correspondences across multiple sets of objects or images:
1. (Geometric) Birkhoff-Riemannian L-BFGS for optimizing the relaxed version of the combinatorially intractable cycle consistency loss in a principled manner.
2. (Geometric) Birkhoff-Riemannian Langevin Monte Carlo for generating samples on the Birkhoff Polytope and estimating the confidence of the found solutions.
3. (Probabilistic) Based on a Markov Random Field.

Multi-way graph matching problem

Map/Permutation Synchronization problem
The task is to refine the assignments across multiple images/scans (nodes) in a multi-way graph and to estimate the assignment confidences. Given the correspondences between image pairs as relative, total permutation matrices, we seek to find absolute permutations that re-arrange the detected keypoints into a single canonical, global order.

Background (1)

Definition 1 – Permutation

$$\mathcal{P}_n = \big\{P \in \{0,1\}^{n\times n} : P\mathbf{1}_n = \mathbf{1}_n,\ \mathbf{1}_n^\top P = \mathbf{1}_n^\top\big\}$$

Each $P \in \mathcal{P}_n$ is a total permutation, and $P_{ij} = 1$ implies that element $i$ is mapped to element $j$.

Definition 2 – Center of Mass
The center of mass of all the permutations on $n$ objects is defined as:

$$C = \frac{1}{n!}\sum_{P_i \in \mathcal{P}_n} P_i = \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^\top$$

Definition 3 – Relative Permutation
A permutation matrix is called relative if it is the ratio (or difference) of two group elements ($i \to j$): $P_{ij} = P_i P_j^\top$.
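A quick numerical sanity check of Definitions 2 and 3 for a small $n$ (a minimal sketch, not taken from the paper):

```python
import itertools
import numpy as np

n = 4

# Enumerate all n! permutation matrices of size n x n.
perms = [np.eye(n)[list(p)] for p in itertools.permutations(range(n))]

# Center of mass: (1/n!) * sum of all permutations equals (1/n) * ones matrix.
C = sum(perms) / len(perms)
assert np.allclose(C, np.ones((n, n)) / n)

# Relative permutation built from two absolute ones: P_ij = P_i P_j^T.
P_i, P_j = perms[5], perms[11]
P_ij = P_i @ P_j.T
# P_ij is itself a permutation matrix (binary, with unit row and column sums).
assert np.allclose(P_ij.sum(axis=0), 1) and np.allclose(P_ij.sum(axis=1), 1)
```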

Background (2)

Definition 4 – Permutation Synchronization Problem

Given a redundant set of measures of ratios $P_{ij}$, $(i, j) \in E \subset \{1, \dots, N\} \times \{1, \dots, N\}$, where $E$ denotes the edge set of a directed graph on $N$ nodes, permutation synchronization can be formulated as the problem of recovering $\{P_i\}$ such that the group consistency constraint is satisfied: $P_{ij} = P_i P_j^\top$.

Definition 5 – Doubly Stochastic (DS) Matrix
Permutation matrices are relaxed into their doubly-stochastic counterparts. The set of DS matrices is defined as:

$$\mathcal{DP}_n = \big\{X \in \mathbb{R}_+^{n\times n} : X\mathbf{1}_n = \mathbf{1}_n,\ \mathbf{1}_n^\top X = \mathbf{1}_n^\top\big\}$$

Background (3)

Definition 6 – Birkhoff Polytope
The multinomial manifold of DS matrices is identified with the convex object called the Birkhoff polytope. We use $\mathcal{DP}_n$ to refer to the Birkhoff polytope.

Background (4)

Theorem 1 – Birkhoff-von Neumann Theorem
The convex hull of the set of all permutation matrices is the set of doubly-stochastic matrices, and there exists a (potentially non-unique) $\theta$ such that any DS matrix $X$ can be expressed as a convex combination of $k$ permutation matrices:

$$X = \sum_{i=1}^{k} \theta_i P_i$$

where $\theta_i > 0$ and $\theta^\top \mathbf{1}_k = 1$.
Note: Finding the minimum $k$ is NP-hard, but the Marcus-Ree theorem shows that $k \le (n-1)^2 + 1$.

Source code: https://github.com/HyTruongSon/spectral_perm_synch/blob/master/test_birkhoff.py
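The theorem is constructive. Below is a minimal greedy sketch of a Birkhoff-von Neumann decomposition using scipy's Hungarian solver; it is illustrative only (independent of the linked test_birkhoff.py), and the greedy choice does not guarantee a minimal $k$:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff_decompose(X, tol=1e-9):
    """Greedily decompose a doubly stochastic X as sum_b theta_b * M_b."""
    n = X.shape[0]
    residual = X.astype(float).copy()
    thetas, perms = [], []
    while residual.max() > tol:
        # Find a permutation supported on the positive entries of the residual
        # (one exists because the residual is a scaled doubly stochastic matrix).
        support = (residual > tol).astype(float)
        rows, cols = linear_sum_assignment(support, maximize=True)
        theta = residual[rows, cols].min()
        if theta <= tol:
            break
        M = np.zeros((n, n))
        M[rows, cols] = 1.0
        thetas.append(theta)
        perms.append(M)
        residual -= theta * M
    return np.array(thetas), perms

# Usage: an average of permutation matrices is doubly stochastic.
rng = np.random.default_rng(0)
X = sum(np.eye(4)[rng.permutation(4)] for _ in range(3)) / 3.0
thetas, perms = birkhoff_decompose(X)
assert np.allclose(X, sum(t * M for t, M in zip(thetas, perms)))
```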

Background (5)

Notation: for every point $x$ on a given manifold $\mathcal{M}$, the tangent space $T_x\mathcal{M}$ is defined as a first-order approximation of $\mathcal{M}$ at $x$. The exponential map $\exp_x(v)$ sends a tangent vector $v \in T_x\mathcal{M}$ to the point on the manifold reached by following the geodesic that starts at $x$ in the direction of $v$ for geodesic distance $\|v\|$.

A Riemannian metric $g$ is a collection of inner products $T_x\mathcal{M} \times T_x\mathcal{M} \to \mathbb{R}$. A geodesic is a smooth curve $\gamma(t)$ that realizes the shortest distance between two points on a manifold.

Background (6)

Note: I could not find better figures from the Internet with the same notation.

Background (7)

Theorem 2 – Projection onto the tangent space

The projection operator $\Pi_X(Y)$, $Y \in \mathcal{DP}_n$, onto the tangent space of $X \in \mathcal{DP}_n$, denoted $T_X\mathcal{DP}_n$, is written as:

$$\Pi_X(Y) = Y - (\alpha\mathbf{1}^\top + \mathbf{1}\beta^\top) \odot X$$

where

$$\alpha = (I - XX^\top)^\dagger (Y - XY^\top)\mathbf{1}, \qquad \beta = Y^\top\mathbf{1} - X^\top\alpha,$$

$\dagger$ denotes the left pseudo-inverse, and $\odot$ denotes the Hadamard product.
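A direct numpy transcription of Theorem 2 (a sketch based on the formula above, not code from the paper; `project_tangent` is a hypothetical helper name):

```python
import numpy as np

def project_tangent(X, Y):
    """Project Y onto the tangent space T_X DP_n at a doubly stochastic X.

    Pi_X(Y) = Y - (alpha @ 1^T + 1 @ beta^T) * X  (elementwise product with X),
    with alpha and beta computed as in Theorem 2 (dagger = pseudo-inverse).
    """
    n = X.shape[0]
    one = np.ones((n, 1))
    alpha = np.linalg.pinv(np.eye(n) - X @ X.T) @ (Y - X @ Y.T) @ one
    beta = Y.T @ one - X.T @ alpha
    return Y - (alpha @ one.T + one @ beta.T) * X
```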

Background (8)

Background (9)

Theorem 3 – First order retraction map

For a tangent vector $\xi_X \in T_X\mathcal{DP}_n$ at $X \in \mathcal{DP}_n$, the first-order retraction map $R_X$ is given as follows:

$$R_X(\xi_X) = \Pi\big(X \odot \exp(\xi_X \oslash X)\big)$$

where the operator $\Pi$ denotes the projection onto $\mathcal{DP}_n$ (computed by the Sinkhorn algorithm), $\exp$ is applied elementwise, $\odot$ is the Hadamard product, and $\oslash$ is the Hadamard (elementwise) division.
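A minimal sketch of this retraction, using simple alternating row/column normalization as the Sinkhorn projection $\Pi$ (the Sinkhorn-Knopp iteration itself is spelled out on the next slide); `retract` and `sinkhorn_project` are hypothetical helper names:

```python
import numpy as np

def sinkhorn_project(A, n_iter=100):
    """Project a strictly positive matrix onto DP_n by alternating normalization."""
    X = A.copy()
    for _ in range(n_iter):
        X /= X.sum(axis=1, keepdims=True)  # make row sums 1
        X /= X.sum(axis=0, keepdims=True)  # make column sums 1
    return X

def retract(X, xi):
    """First-order retraction R_X(xi) = Pi(X . exp(xi / X)), elementwise ops."""
    return sinkhorn_project(X * np.exp(xi / X))
```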

Background (10)

Sinkhorn’s theorem
If A is an n × n matrix with strictly positive elements, then there exist diagonal matrices D and E with strictly positive diagonal elements such that DAE is doubly stochastic.

The Sinkhorn-Knopp algorithm finds a doubly stochastic scaling of a nonnegative matrix in an iterative manner:

1. Start with $D_0 = E_0 = I$.
2. Let $r_k = D_{k-1} A E_{k-1} \mathbf{1}$ and assign $D_k = \operatorname{diag}(r_k)^{-1} D_{k-1}$.
3. Let $c_k^\top = \mathbf{1}^\top D_k A E_{k-1}$ and assign $E_k = E_{k-1}\operatorname{diag}(c_k)^{-1}$.

Reference: A Relationship Between Arbitrary Positive Matrices and Doubly Stochastic Matrices, Richard Sinkhorn (1964). We will discuss this further in Part 2 on Optimal Transport.
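A direct transcription of the $D_k$/$E_k$ iteration above (an illustrative sketch; `sinkhorn_knopp` is a hypothetical helper name):

```python
import numpy as np

def sinkhorn_knopp(A, n_iter=200):
    """Find diagonal D, E with positive diagonals so that D @ A @ E is doubly
    stochastic (A must have strictly positive entries)."""
    n = A.shape[0]
    one = np.ones(n)
    D, E = np.eye(n), np.eye(n)
    for _ in range(n_iter):
        r = D @ A @ E @ one            # row sums of the current scaling D A E
        D = np.diag(1.0 / r) @ D       # rescale rows
        c = one @ D @ A @ E            # column sums after the row rescaling
        E = E @ np.diag(1.0 / c)       # rescale columns
    return D, E

# Usage: D @ A @ E should be (numerically) doubly stochastic.
A = np.random.rand(5, 5) + 0.1
D, E = sinkhorn_knopp(A)
S = D @ A @ E
assert np.allclose(S.sum(axis=0), 1) and np.allclose(S.sum(axis=1), 1)
```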

Background (11)

In classical optimization, line-search (first-order) methods update the iterate by choosing a search direction and then adding a multiple of this direction to the old iterate. This is not possible on general manifolds; the natural extension to manifolds is to follow the search direction along a path on the manifold.

Reference: Optimization methods on Riemannian manifolds and their application to shape space, Wolfgang Ring and Benedikt Wirth.

Background (12)

Manifold optimization

Note: I could not find a figure that uses the same notation as the algorithm, but the idea is the same.

Markov Random Field (1)

Assume that we are provided a set of pairwise permutations $P_{ij} \in \mathcal{P}_n$ for $(i, j) \in E$. In our case the graph is complete, so $E$ contains all possible edges. We need to find the underlying absolute permutations $X_i$ for $i \in \{1, \dots, N\}$ with respect to a common origin, with $X_1 = I$ fixed.

Because of the lack of a manifold structure for $\mathcal{P}_n$, we relax the domain of the absolute permutations by assuming that each $X_i \in \mathcal{DP}_n$ (the Birkhoff polytope). Let $\mathbf{P} \equiv \{P_{ij}\}_{(i,j)\in E}$ denote the observed random variables and $\mathbf{X} \equiv \{X_i\}_{i=1}^N$ denote the latent variables.

Markov Random Field (2)

The proposed pairwise Markov Random Field assumes that the full joint distribution factorizes as follows:

$$p(\mathbf{P}, \mathbf{X}) = \frac{1}{Z}\prod_{(i,j)\in E}\psi(P_{ij}, X_i, X_j)$$

where $Z$ denotes the normalization constant:

$$Z = \sum_{\mathbf{P}\in\mathcal{P}_n^{|E|}} \int_{\mathcal{DP}_n^N} \prod_{(i,j)\in E}\psi(P_{ij}, X_i, X_j)\, d\mathbf{X}$$

and $\psi$ is called the clique potential, defined as:

$$\psi(P_{ij}, X_i, X_j) = \exp\!\big(-\beta\,\|P_{ij} - X_i X_j^\top\|_F^2\big)$$

where $\|\cdot\|_F$ denotes the Frobenius norm and $\beta > 0$.

Markov Random Field (3)

If we define $X_{ij} = X_i X_j^\top \in \mathcal{DP}_n$, then by the Birkhoff-von Neumann theorem each (doubly-stochastic) $X_{ij}$ has a decomposition into a convex combination of permutation matrices:

$$X_{ij} = \sum_{b=1}^{B_{ij}} \theta_{ij,b}\, M_{ij,b}, \qquad \sum_{b=1}^{B_{ij}} \theta_{ij,b} = 1$$

where $B_{ij} \in \mathbb{N}_+$, $\theta_{ij,b} > 0$, and $M_{ij,b} \in \mathcal{P}_n$.

Markov Random Field (4)

The pairwise MRF implies the following hierarchical decomposition:

$$p(\mathbf{X}) = \frac{1}{C}\exp\!\Big(-\beta\sum_{(i,j)\in E}\|X_{ij}\|_F^2\Big)\prod_{(i,j)\in E} Z_{ij}$$

$$p(P_{ij}\mid X_i, X_j) = \frac{1}{Z_{ij}}\exp\!\big(2\beta\operatorname{trace}(P_{ij}^\top X_{ij})\big)$$

where $C$ and $Z_{ij}$ are normalization constants. For all $i, j$, $Z_{ij} \ge \prod_{b=1}^{B_{ij}} f(\beta, \theta_{ij,b})$, where $f$ is a positive function that is increasing in both $\beta$ and $\theta_{ij,b}$.

Inference Algorithms (1)

Permutation synchronization is formulated as a probabilistic inference problem:

1. Maximum a posteriori (MAP):

$$\mathbf{X}^* = \arg\max_{\mathbf{X}\in\mathcal{DP}_n^N} \log p(\mathbf{X}\mid\mathbf{P}), \qquad \text{where } \log p(\mathbf{X}\mid\mathbf{P}) = -\beta\sum_{(i,j)\in E}\|P_{ij} - X_i X_j^\top\|_F^2$$

2. The full posterior distribution:

$$p(\mathbf{X}\mid\mathbf{P}) \propto p(\mathbf{X}, \mathbf{P})$$

Inference Algorithms (2)

The MAP estimation problem can be cast as a minimization problem on $\mathcal{DP}_n^N$, given as follows:

$$\mathbf{X}^* = \arg\min_{\mathbf{X}\in\mathcal{DP}_n^N} U(\mathbf{X}), \qquad U(\mathbf{X}) = \sum_{(i,j)\in E}\|P_{ij} - X_i X_j^\top\|_F^2$$

where U is called the potential energy function.

Solving this difficult optimization:
1. Use Riemannian limited-memory BFGS (LR-BFGS), a Riemannian second-order method (but why do we need second order here?).
2. Finally, round the resulting approximate solution into a feasible one via the Hungarian algorithm (obtaining binary permutation matrices); see the sketch below.
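To make the objective concrete, here is a small sketch (not the paper's implementation) of the potential energy $U$, its Euclidean gradient with respect to one block $X_i$ (which would then be projected onto the tangent space and handed to a Riemannian optimizer such as LR-BFGS), and the final Hungarian rounding. The dictionary `P[(i, j)]` and list `X` are hypothetical data layouts:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def energy(P, X, edges):
    """Potential U(X) = sum over (i,j) in E of ||P_ij - X_i X_j^T||_F^2."""
    return sum(np.linalg.norm(P[(i, j)] - X[i] @ X[j].T) ** 2 for (i, j) in edges)

def euclidean_grad(P, X, edges, i):
    """Euclidean gradient of U with respect to X_i (before tangent projection)."""
    g = np.zeros_like(X[i])
    for (a, b) in edges:
        R = P[(a, b)] - X[a] @ X[b].T
        if a == i:
            g += -2.0 * R @ X[b]
        if b == i:
            g += -2.0 * R.T @ X[a]
    return g

def round_to_permutation(X_i):
    """Round a doubly stochastic X_i to a permutation via the Hungarian algorithm."""
    rows, cols = linear_sum_assignment(X_i, maximize=True)
    P = np.zeros_like(X_i)
    P[rows, cols] = 1.0
    return P
```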

Inference Algorithm (3)

Euclidean second-order method

The BFGS algorithm belongs to the family of quasi-Newton methods, in which the Hessian matrix of second derivatives is not computed explicitly. Instead, the Hessian is approximated using updates specified by gradient evaluations (or approximate gradient evaluations).
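For reference, the standard (Euclidean) BFGS update of the Hessian approximation $B_k$ is the textbook formula below (not stated on the slide): with step $s_k = x_{k+1} - x_k$ and gradient difference $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$,

$$B_{k+1} = B_k + \frac{y_k y_k^\top}{y_k^\top s_k} - \frac{B_k s_k s_k^\top B_k}{s_k^\top B_k s_k}$$

The limited-memory variant (L-BFGS) never forms $B_k$ explicitly; it stores only the last few $(s_k, y_k)$ pairs, which is the idea LR-BFGS carries over to the Riemannian setting.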

Inference Algorithm (4)

Riemannian first-order methods

Inference Algorithm (5)

If we don’t want to solve the optimization problem with first/second-order methods, there is an alternative: Riemannian Langevin Monte Carlo (which can be understood as a first-order method):

$$V_i^{(k+1)} = \Pi_{X_i^{(k)}}\!\Big(-h\,\nabla_{X_i} U\big(\mathbf{X}^{(k)}\big) + \sqrt{2h/\beta}\; Z_i^{(k+1)}\Big)$$

$$X_i^{(k+1)} = R_{X_i^{(k)}}\!\big(V_i^{(k+1)}\big)$$

for all $i \in \{1, \dots, N\}$, where $h > 0$ denotes the step size, $k$ denotes the iteration, $Z_i^{(k)}$ denotes standard Gaussian random variables in $\mathbb{R}^{n\times n}$, and $X_i^{(0)}$ are the initial absolute permutations.
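A sketch of one Langevin sweep under the update above, reusing the hypothetical `project_tangent`, `retract`, and `euclidean_grad` helpers sketched earlier (illustrative only, not the authors' sampler):

```python
import numpy as np

def langevin_sweep(P, X, edges, h=1e-3, beta=1.0, rng=None):
    """One Riemannian Langevin Monte Carlo sweep over all relaxed absolute permutations X_i."""
    rng = np.random.default_rng() if rng is None else rng
    n = X[0].shape[0]
    for i in range(len(X)):
        noise = rng.standard_normal((n, n))
        # Tangent-space step: projected negative gradient plus scaled Gaussian noise.
        step = -h * euclidean_grad(P, X, edges, i) + np.sqrt(2.0 * h / beta) * noise
        v = project_tangent(X[i], step)
        # Retract back onto the Birkhoff polytope.
        X[i] = retract(X[i], v)
    return X
```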

2D multi-image matching

The standard evaluation metric:

$$R(\{\hat{P}_i\}\mid \mathbf{P}^{gnd}) = \frac{1}{n|E|}\sum_{(i,j)\in E}\operatorname{tr}\!\Big((P_{ij}^{gnd})^\top\, \hat{P}_i \hat{P}_j^\top\Big)$$

where $P_{ij}^{gnd}$ are the ground-truth relative transformations and $\hat{P}_i$ is an estimated permutation. $R = 0$: nothing correct; $R = 1$: perfect.
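A sketch of this metric in numpy (assuming a dictionary of ground-truth relative permutations and a list of estimated absolute permutations; the data layout is hypothetical):

```python
import numpy as np

def recall(P_gnd, P_hat, edges):
    """R = 1/(n|E|) * sum over (i,j) in E of tr(P_gnd_ij^T  P_hat_i  P_hat_j^T)."""
    n = P_hat[0].shape[0]
    total = sum(np.trace(P_gnd[(i, j)].T @ P_hat[i] @ P_hat[j].T) for (i, j) in edges)
    return total / (n * len(edges))
```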

3D correspondences

Initial correspondences are obtained by running a Siamese PointNet (another paper from Guibas' team: http://stanford.edu/~rqi/pointnet/).
