<<

Probabilistic Permutation Synchronization using the Riemannian Structure of the Birkhoff Polytope

Tolga Birdal 1, 2 Umut S¸ims¸ekli 3 1 Probability Simplex Computer Science Department, Stanford University, CA 94305Orthogonal Stanford, Hypersphere Δ : US Birkhoff Polytope 2 Permutation Matrices Fakultat¨ fur¨ Informatik, Technische Universitat¨ Munchen,¨ 85748 Munchen,¨ Germany Tangent Space = ∩ 3 LTCI, Tel´ ecom´ ParisTech, Universite´ Paris-Saclay, 75013Common Paris, center of mass France

Abstract (a) Initialization (b) Solution (c) Certainty/Confidence (d) Multiple Hypotheses Solution

We present an entirely new geometric and probabilis-

Top-N tic approach to synchronization of correspondences across Certainty multiple sets of objects or images. In particular, we present two algorithms: (1) Birkhoff-Riemannian L-BFGS for op- timizing the relaxed version of the combinatorially in- Top-2 tractable cycle consistency loss in a principled manner, (2) Certainty Birkhoff-Riemannian Langevin Monte Carlo for generating Figure 1. Our algorithm robustly solves the multiway image samples on the Birkhoff Polytope and estimating the con- problem (a, b) and provides confidence maps (c) that fidence of the found solutions. To this end, we first intro- can be of great help in further improving the estimates (d). The duce the very recently developed Riemannian geometry of bar on the right is used to assign colors to confidences. For the the Birkhoff Polytope. Next, we introduce a new probabilis- rest, incorrect matches are marked in red and correct ones in blue. tic synchronization model in the form of a Markov Random globally coherent whole [30, 62, 73]. Field (MRF). Finally, based on the first order retraction op- In this paper, by using the fact that correspondences erators, we formulate our problem as simulating a stochas- are cycle consistent 1, we propose two novel algorithms tic differential equation and devise new integrators. We for refining the assignments across multiple images/scans show on both synthetic and real datasets that we achieve (nodes) in a multi-way graph and for estimating assignment high quality multi-graph matching results with faster con- confidences, respectively. We model the correspondences vergence and reliable confidence/uncertainty estimates. between image pairs as relative, total permutation matri- ces and seek to find absolute permutations that re-arrange 1. Introduction the detected keypoints to a single canonical, global order. Correspondences fuel a large variety of computer vi- This problem is known as map or permutation synchro- sion applications such as structure-from-motion (SfM) [62], nization [56, 70]. Even though in many practical scenar- SLAM [53], 3D reconstruction [20, 8, 6], camera re- ios matches are only partially available, when shapes are localization [60], image retrieval [44] and 3D scan stitch- complete and the density of matches increases, total permu- ing [38, 22]. In a typical scenario, given two scenes, an tations can suffice [36]. initial set of 2D/3D keypoints is first identified. Then the Similar to many well received works [84, 61], we re- neighborhood of each keypoint is summarized with a lo- lax the sought permutations to the set of doubly-stochastic cal descriptor [47, 23] and keypoints in the given scenes (DS) matrices. We then consider the geometric structure of are matched by associating the mutually closest descrip- DS, the Birkhoff Polytope [9]. We are - to the best of our tors. In a majority of practical applications, multiple images knowledge, for the first time introducing and applying the or 3D shapes are under consideration and ascertaining such recently developed Riemannian geometry of the Birkhoff two-view or pairwise correspondences is simply not suffi- Polytope [26] to tackle challenging problems of computer cient. This necessitates a further refinement ensuring global vision. Note that lack of this geometric understanding consistency. Unfortunately, at this stage even the well de- caused plenty of obstacles for scholars dealing with our veloped pipelines acquiesce either heuristic/greedy refine- problem [61, 77]. By the virtue of a first order retraction, we ment [21] or incorporate costly geometric cues related to 1Composition of correspondences for any circular path arrives back at the linking of individual correspondence estimates into a the start node.

111051 can use the recent Riemannian limited-memory BFGS (LR- The first applications of synchronization, a term coined BFGS) algorithm [82] to perform a maximum-a-posteriori by Singer [67, 66], to correspondences only date back to (MAP) estimation of the parameters of the consistency loss. early 2010s [54]. Pachauri et al. [56] gave a formal def- We coin our variation as Birkhoff-LRBFGS. inition and devised a spectral technique. The same au- At the next stage, we take on the challenge of confi- thors quickly extended their work to Permutation Diffu- dence/uncertainty estimation for the problem at hand by sion Maps [55] finding correspondence between images. drawing samples on the Birkhoff Polytope and estimating Unfortunately, this first method was quadratic in the num- the empirical posterior distribution. To achieve this, we first ber of images and hence was not computationally friendly. formulate a new geodesic stochastic differential equation In a sequel of works called MatchLift, Huang, Chen and (SDE). Our SDE is based upon the Riemannian Langevin Guibas [36, 15] were the firsts to cast the problem of es- Monte Carlo (RLMC) [31, 78, 58] that is efficient and ef- timating cycle-consistent maps as finding the closest pos- fective in sampling from Riemannian manifolds with true itive semidefinite matrix to an input matrix. They also exponential maps. Note that similar stochastic gradient addressed the case of partial permutations. Due to the geodesic MCMC (SG-MCMC) [46, 11] tools have already semidefinite programming (SDP) involved, this perspec- been used in the context of synchronization of spatial rigid tive suffered from high computational cost in real applica- transformations whose parameters admit an analytically de- tions. Similar to Pachauri [56], for N images and M edges, fined geodesic flow [7]. Unfortunately, for our manifold the this method required computing an eigendecomposition of retraction map is only up to first order and hence we can- an NM NM matrix. Zhou et al. [85] then introduced not use off-the-shelf schemes. Alleviating this nuisance, we MatchALS×, a new low-rank formulation with nuclear-norm further contribute a novel numerical integrator to solve our relaxation, globally solving the joint matching of a set of SDE by replacing the intractable exponential map of DS images without the need of SDP. Yu et al. [81] formulated matrices by the approximate retraction map. This leads to a synchronization energy similar to our method and pro- another new algorithm: Birkhoff-RLMC. posed proximal Gauss-Seidel methods for solving a relaxed In a nutshell, our contributions are: problem. However, unlike us, this paper did not use the ge- 1. We function as an ambassador and introduce the Rie- ometry of the constraints or variables and thereby resorted mannian geometry of the Birkhoff Polytope [26] to to complicated optimization procedures involving Frank- solve problems in computer vision. Wolfe subproblems for global constraint satisfaction. Ar- 2. We propose a new probabilistic model for the permu- rigoni et al. [2] and Maset et al. [2] extended Pachauri [56] tation synchronization problem. to operate on partial permutations using spectral decompo- 3. We minimize the cycle consistency loss via a sition. To do so, they considered the symmetric inverse Riemannian-LBFGS algorithm and outperfom the semigroup of the partial matches that are typically hard to state-of-the-art both in recall and in runtime. handle. Their closed form methods did not need initializa- 4. Based upon the Langevin mechanics, we introduce a tion steps to synchronize, but also did not establish an ex- new SDE and a numerical integrator to draw samples plicit cycle consistency. Tang et al. [71] opted to use or- on the high dimensional and complex manifolds with dering heuristics improving upon Pachauri [56]. Cosmo et approximate retractions, such as the Birkhoff Poly- al. [19] brought an interesting solution to the problem of es- tope. This lets us estimate the confidence maps, which timating consistent correspondences between multiple 3D can aid in improving the solutions and spotting consis- shapes, without requiring initial pairwise solutions as in- tency violations or outliers. put. Schiavinato and Torsello [61] tried to overcome the Note that the tools developed herewith can easily ex- lack of group structure of the Birkhoff polytope by trans- tend beyond our application and would hopefully facilitate forming any graph-matching problem into a multi-graph promising research directions regarding the combinatorial matching one. Bernard et al. [5] used an NMF-based ap- optimization problems in computer vision. proach to generate a cycle-consistent synchronization. Park and Yoon [57] used multi-layer random walks framework to 2. Related Work address the global correspondence search problem of multi- attributed graphs. Starting from a multi-layer random-walks Permutation synchronization is an emerging domain of initialization, the authors proposed a robust solver by itera- study due to its wide applicability, especially for the prob- tive reweighting. Hu et al. [35] revisited the MatchLift and lems in computer vision. We now review the developments developed a scalable, distributed solution with the help of in this field, as chronologically as possible. Note that multi- ADMMs, called DMatch. Their idea of splitting the input way graph matching problem formulations involving spatial into sub-collections can still lead to global consistency un- geometry are well studied [18, 50, 43, 29, 79, 19], as well der mild conditions while improving the efficiency. Finally, as transformation synchronization [76, 13, 72, 75, 3, 4, 33]. Wang et al. [77] made use of the domain knowledge and For brevity, we omit these literature and focus on works that added the geometric consistency of image coordinates as a explicitly operate on correspondence matrices.

11106 low-rank term to increase the recall. Probability Simplex The aforementioned works have neither considered the Orthogonal HypersphereΔ푛 : Riemennian structure of the common Birkhoff convex re- 푛−1 laxation nor have they provided a probabilistic framework, Birkhoff Polytope 푶 푆 which can pave the way to uncertainty estimation while Permutation Matrices퓓퓟 풏 simultaneously solving the optimization problem. This is Tangent Space 퓟 = 퓓퓟풏 ∩ 푶 what we propose in this work. Common center퓣 of퓧 퓓퓟mass풏 Figure 2. Simplified (matrices are vectorized) illustration of ge- 3. Preliminaries and Technical Background ometries we consider: (i) ∆n is convex, (ii) DPn is strictly con- Definition 1 (). A permutation matrix tained in ∆n. In low dimensions, such configuration cannot exist is defined as a sparse, square binary matrix, where each as there is no convex shape that touches ∆n only on the corners. column and each row contains only a single true (1) value: Theorem 1 (Birkhoff-von Neumann Theorem). The con-

n×n ⊤ ⊤ vex hull of the set of all permutation matrices is the set n := P 0, 1 : P1n = 1n , 1 P = 1 . (1) P { ∈ { } n n } of doubly-stochastic matrices and there exists a potentially non-unique θ such that any DS matrix can be expressed as where 1n denotes a n-dimensional ones vector. Every P n ∈ a linear combination of k permutation matrices [9, 39]: is a total permutation matrix and Pij = 1 implies that P element i is mapped to element j. Permutation matrices ⊤ X = θ1P1 + + θkPk , θi > 0 θ 1k = 1. (4) are the only strictly non-negative elements of the orthogonal ··· ∧ ⊤ group n n = O : O O = I , a special case of the While finding the minimum k is shown to be NP-hard [27], P ∈ O { } Stiefel manifold of m—frames in Rn when m = n. by Marcus-Ree theorem, we know that there exists one con- structible decomposition where k < (n 1)2 + 1. Definition 2 (Center of Mass). The center of mass for all − the permutations on n objects is defined in Rn×n as [59]: Definition 6 (Birkhoff Polytope). The multinomial mani- fold of DS matrices is incident to the convex object called 1 1 ⊤ 1 ⊤ the Birkhoff Polytope [9], an 2 dimensional con- Cn = Pi = (n 1)!1n1 = 1n1 . (n 1) P n n −n×n n! i∈Pn n! − n vex submanifold of the ambient R with n! vertices: X (2) n n n. We use n to refer to the Birkhoff Polytope. Notice that Cn / as shown in Fig. 2. B ≡ DP DP ∈P It is interesting to see that this is Definition 3 (Relative Permutation). We define a permuta- co-centered with n at Cn, Cn n and over- tion matrix to be relative if it is the ratio (or difference) of P ∈ DP P P P⊤ parameterizes the of the permutation vectors, two group elements (i j): ij = i j . → the permutahedron [32]. n can now be considered as an P ⊤ orthogonal subset of n: n = X n : XX = Definition 4 (Permutation Synchronization Problem). DP P { ∈ DP P I , i.e. the discrete set of permutation matrices is the inter- Given a redundant set of measures of ratios ij : } (i,j) 1,...,N 1,...,N , where { denotes} section of the convex set of DS matrices and the n. ∈ E ⊂ { } × { } E O the set of the edges of a directed graph of N nodes, the 3.1. Riemannian Geometry of the Birkhoff Polytope permutation synchronization [56] can be formulated as the problem of recovering Pi for i = 1,...,N such that the Recently, Douik et al. [26] endowed n with the { } P P P−1 DP group consistency constraint is satisfied: ij = i j . Fisher information metric, resulting in the Riemannian manifold of n. To the best of our knowledge, we are the If the input data is noise-corrupted, this consistency will first to exploitDP this manifold in the domain of computer vi- P not hold and to recover the absolute permutations i , et { } sion, and hence will now recall the main results of Douik some form of a consistency error is minimized. Typically, al. [26] and summarize the main constructs of Riemannian any form of minimization on the discrete space of permu- optimization on n. The proofs can be found in [26]. tations is intractable and these matrices are relaxed by their DP doubly-stochastic counterparts [10, 84, 61] (see Fig. 2). Definition 7 (Tangent Space and Bundle). The tangent bundle is referred to as the union of all tangent spaces Definition 5 (Doubly Stochastic (DS) Matrix). A DS matrix n = X∈DPn X n one of which is defined as: is a non-negative, square matrix whose rows and columns T DP ∪ T DP n×n ⊤ sum to 1. The set of DS matrices is defined as: X n := Z R : Z1n = 0n , Z 1n = 0n . (5) T DP { ∈ } n n Theorem 2. The projection operator ΠX(Y), Y n X Rn×n ∈ DP n = : xij = 1 xij = 1 . onto the tangent space of X , X is written as: DP { ∈ + ∧ } n n Xi=1 Xj=1 ∈ DP T DP ⊤ ⊤ (3) ΠX(Y)= Y (α1 + 1β ) X, with (6) − ⊙

11107 N ⊤ ⊤ ⊤ ⊤ P P and X X , all the observations α =(I XX )+(Y XY )1, β = Y 1 X α, ij (i,j)∈E i i=1 − − − and≡ all { the} latent variables,≡ respectively. { } + depicts the left pseudo-inverse and the Hadamard prod- A typical way to build a probabilistic model is to first X uct. Note that there exists a numerically⊙ more stable way to choose the prior distributions on n for each i and DP X compute the same concise formulation of ΠX(Y) [26]. then choose a conditional distribution on n for each ij given the latent variables. Unfortunately, standardP paramet- Theorem 3. For a vector ξX X n lying on the tan- ric distributions neither exist on n nor on n. The vari- ∈ T DP DP P gent space of X n, the first order retraction map RX ational stick breaking [45] yields an implicitly defined PDF ∈ DP is given as follows: on n and is not able to provide direct control on the re- sultingDP distribution. Defining Kantorovich distance-based RX(ξX) = Π(X exp(ξX X)), (7) distributions over the permutation matrices is possible [17], ⊙ ⊘ yet these models incur high computational costs since they where the operator Π denotes the projection onto n, ef- DP would require solving optimal transport problems during in- ficiently computed using the Sinkhorn algorithm [68] and ference. For these reasons, instead of constructing a hierar- is the Hadamard division. ⊘ chical probabilistic model, we will directly model the full joint distribution of P and X. Plis et al. [59] showed that on the n-dimensional Birkhoff Polytope all permutations are equidistant from the We propose a probabilistic model where we assume the full joint distribution admits the following factorized form: center of mass Cn, and thus the extreme points of n, that DP 2 are the permutation matrices, are located on an (n 1) - 1 2 − P X P X X dimensional hypersphere S(n−1) of radius √n 1, cen- p( , )= ψ( ij, i, j), (8) − Z Y(i,j)∈E tered at Cn. This hypersphere is incident to the Birkhoff Polytope on the vertices. where Z denotes the normalization constant with

Proposition 1. The gap as a ratio between n and both Z := ψ(P , X , X )dX, (9) 2 DP ij i j (n−1) ZDPN and n grows to infinity as n grows. PX|E| n (i,jY)∈E S O ∈Pn The proof is given in the supplementary document. While and ψ is called the ‘clique potential’ that is defined as: there exists polynomial time projections of the n!-element permutation space onto the continuous hypersphere repre- P X X , P X X⊤ 2 ψ( ij, i, j) exp( β ij i j ). (10) sentation and back [59], Prop. 1 prevents us from using hy- − k − kF persphere relaxations, as done in preceding works [59, 83]. Here F denotes the Frobenius norm, β R+ is the dispersionk · k parameter that controls the spread∈ of the distri- 4. Proposed Probabilistic Model bution. Note that the model is a Markov random field [40]. Let us take a closer look at the proposed model. If we We assume that we are provided a set of pairwise, total X X X⊤ define ij := i j n, then by Thm. 1, we have the permutations Pij n for (i,j) and we are inter- ∈ DP ∈ P ∈ E following decomposition for each Xij: ested in finding the underlying absolute permutations Xi for i 1,...,N with respect to a common origin (e.g. Bij Bij X ∈I { } Xij = θij,bMij,b, θij,b = 1, (11) 1 = , the identity matrix). We seek absolute permuta- b=1 b=1 tions that would respect the consistency of the underlying X X M graph structure. For conciseness, we also restrict our set- where Bij is a positive integer, each θij,b 0, and ij,b ≥ ∈ ting to total permutations, and leave the extension to partial n. The next result states that we have an equivalent hier- P permutations, which live on the monoid, as a future study. archical interpretation for the proposed model: Because operating directly on would require us to solve n Proposition 2. The probabilistic model defined in Eq. 8 im- a combinatorial optimizationP problem and because of the plies the following hierarchical decomposition: lack of a manifold structure for n, we follow the popular approach [45, 80, 48] and relax theP domain of the absolute X 1 X 2 permutations by assuming that each X . p( )= exp β ij Zij (12) i n C − k k  We formulate the permutation synchronization∈ DP problem (i,jX)∈E (i,jY)∈E in a probabilistic context where we treat the pairwise rela- P X X 1 P⊤X p( ij i, j)= exp 2β tr( ij ij) (13) tive permutations as observed random variables and the ab- | Zij   solute ones as latent random variables. In particular, our probabilistic construction enables us to cast the synchro- where C and Zij are normalization constants. Besides, for Bij nization problem as inferential in the model. With a slight all i,j, Zij f(β,θij,b), where f is a positive func- ≥ b=1 abuse of notation, in the rest of the paper, we will denote tion that is increasingQ in both β and θij,b.

11108 The proof is given in the supplementary and is based on Riemannian limited-memory BFGS (LR-BFGS) [37], a the simple decomposition p(P, X) = p(X)p(P X). This powerful optimization technique that attains faster conver- hierarchical point of view lets us observe some| interest- gence rates by incorporating local geometric information in ing properties: (1) the likelihood p(Pij Xi, Xj) mainly de- an efficient manner. This additional piece of information is P⊤X | pends on the term tr( ij ij) that measures the data fit- obtained through an approximation of the inverse Hessian, ness. We aptly call this term the ‘soft Hamming distance’ which is computed on the most recent values of the past between Pij and Xij since it would correspond to the ac- iterates with linear time- and space-complexity in the di- tual Hamming distance between two permutations if Xi, Xj mension of the problem. We give more detail on LR-BFGS were permutation matrices [41]. (2) On the other hand, the in our supp. material. The detailed description of the algo- prior distribution contains two competing terms: (i) the term rithm can be found in [37, 82]. Zij favors large θij,b, which would push Xij towards the Finally, we round the resulting approximate solutions X 2 corners of the Birkhoff polytope, (ii) the term ij F acts into a feasible one via Hungarian algorithm [52], obtaining as a regularizer on the latent variables and attractsk themk to- binary permutation matrices. wards the center of the Birkhoff polytope Cn (cf. Dfn. 2), which will be numerically beneficial for the inference algo- 5.2. Posterior Sampling via Riemannian Langevin rithms that will be developed in the following section. Monte Carlo with Retractions In this section we will develop a Markov Chain Monte 5. Inference Algorithms Carlo (MCMC) algorithm for generating samples from the posterior distribution p(X P), by borrowing ideas from [64, We can now formulate the permutation synchronization 46, 7]. Once such samples| are generated, we will be able problem as a probabilistic inference problem, where we will to quantify the uncertainty in our estimation by using the be interested in the following quantities: generated samples. 1. Maximum a-posteriori (MAP): The dimension and complexity of the Birkhoff manifold ⋆ makes it very challenging to generate samples on n or X = arg max log p(X P) (14) DP X N | its product spaces and to the best of our knowledge there is ∈DPn no Riemannian MCMC algorithm that is capable of achiev- + ⊤ 2 where log p(X P) = β Pij XiX , ing this. There are existing Riemannian MCMC algorithms | − (i,j)∈E k − j kF and =+ denotes equality upP to an additive constant. [11, 46], which are able to draw samples on embedded man- ifolds; however, they require the exact exponential map to 2. The full posterior distribution: p(X P) p(P, X). | ∝ be analytically available, which in our case, can only be ap- The MAP estimate is often easier to obtain and useful in proximated by the retraction map at best. practice. On the other hand, characterizing the full poste- To this end, we develop an algorithmically simpler yet rior can provide important additional information, such as effective algorithm. Let the posterior density of interest uncertainty; however, not surprisingly it is a much harder be πH(X) := p(X P) exp( βU(X)) with respect task. In addition to the usual difficulties associated with to the Hausdorff measure.| ∝ We then− define an embedding these tasks, in our context we are facing extra challenges RN(n−1)2 N X˜ X X˜ ξ : n such that ξ( ) = for 2 7→ DP ∈ due to the non-standard manifold of our latent variables. RN(n−1) . By the area formula (cf. Thm. 1 in [25]), we 5.1. Maximum A•Posteriori Estimation have the following expression for the embedded posterior density πλ (with respect to the Lebesgue measure): The MAP estimation problem can be cast as a minimiza- tion problem on n, given as follows: πH(x)= πλ(X˜ )/ G(X˜ ) , (15) DP q| | X⋆ X P X X⊤ 2 G = arg min U( ) := ij i j F where denotes the Riemann metric tensor. X∈DPN (i,j)∈E k − k n n X o We then consider the following stochastic differential equation (SDE), which is a slight modification of the SDE where U is called the potential energy function. We observe that is used to develop the Riemannian Langevin Monte that the choice of the dispersion parameter has no effect on Carlo algorithm [31, 78, 58]: the MAP estimate. Although this optimization problem re- sembles conventional norm minimization, the fact that X −1 −1 dX˜ t =( G X˜ Uλ(X˜ t)+ Γt)dt + 2/βG dBt, lives in the cartesian product of Birkhoff polytopes renders − ∇ p the problem very complicated. where Bt denotes the standard Brownian motion and Γt Thanks to the retraction operator over the Birkhoff poly- is called the correction term that is defined as follows: Γ X˜ N(n−1)2 −1 X˜ X˜ tope (cf. Thm. 3), we are able to use several Riemannian op- [ t( )]i = j=1 ∂[Gt ( )]ij/∂ j. timization algorithms [69], without resorting to projection- By Thm.P 1 of [49], it is easy to show that the solution based updates. In this study, we use the recently proposed process (X˜ t)t≥0 leaves the embedded posterior distribution

11109 6. Experiments and Evaluations 6.1. Real Data

Figure 3. Sample images and manually annotated correspondences 2D Multi-image Matching We run our method to per- from the challenging Willow dataset [16]. Images are plotted in form multiway graph matching on two datasets, CMU [12] pairs (there are multiple) and in gray for better viewing. and Willow Object Class [16]. CMU is composed of House and Hotel objects viewed under constant illumina- tion and smooth motion. Initial pairwise correspondences πλ invariant. Informally, this result means that if we could as well as ground truth (GT) absolute mappings are pro- exactly simulate the continuous-time process (X˜ t)t≥0, the distribution of the sample paths would converge to the em- vided within the dataset. Object images in Willow dataset include pose, lighting, instance and environment variation bedded posterior distribution πλ, and therefore the distribu- as shown in Fig. 3, rendering naive template matching in- tion of ξ(X˜ t) would converge to πH(X). However, unfor- tunately it is not possible to exactly simulate these paths and feasible. For our evaluations, we follow the same design therefore we need to consult approximate algorithms. as Wang et al. [77]. We first extract local features from a set of 227 227 patches centered around the annotated A possible way to numerically simulate the SDE would landmarks, using× the prosperous Alexnet [42] pretrained on be to use standard discretization tools, such as the Euler- ImageNet [24]. Our descriptors correspond to the feature Maruyama integrator [14]. However, this would require map responses of Conv4 and Conv5 layers anchored on the knowing the analytical expression of ξ and constructing hand annotated keypoints. These features are then matched G and Γ at each iteration. On the other hand, recent t t by the Hungarian algorithm [52] to obtain initial pairwise results have shown that we can simulate SDEs directly permutation matrices P . on their original manifolds by using geodesic integrators 0 We initialize our algorithm by the closed form [11, 46, 34], which bypasses these issues altogether. Yet, MatchEIG [51] and evaluate it against the state of the art these approaches require the exact exponential map of the methods of Spectral [56], MatchALS [85], MatchLift [36], manifold to be analytically available, restricting their appli- MatchEIG [51], and Wang et al. [77]. The size of the uni- cability in our context. verse is set to the number of features per image. We assume Inspired by the recent manifold optimization algorithms that this number is fixed and partial matches are not present. [74], we propose to replace the exact, intractable exponen- Handling partialities while using the Birkhoff structure is tial map arising in the geodesic integrator with the tractable left as a future work. Note that [77] uses a similar cost func- retraction operator given in Thm. 3. We develop our recur- tion to ours in order to initialize an alternating procedure sive scheme, we coin as retraction Euler integrator: that in addition exploits the geometry of image coordinates. Authors also use this term as an extra bit of information V(k+1) X(k) Z(k+1) during their initialization. The standard evaluation metric, i = ΠX(k) (h Xi U( i )+ 2h/β i ) (16) i ∇ p recall, is defined over the pairwise permutations as: X(k+1) V(k+1) i = RX(k) ( i ), i 1,...,N (17) i ∀ ∈ { } gnd 1 gnd ⊤ R( Pˆ i P )= P (Pˆ iPˆ ) (18) { }| n ij ⊙ j where h> 0 denotes the step-size, k denotes the iterations, |E| (i,jX)∈E Z(k) Rn×n i denotes standard Gaussian random variables in , gnd (0) where P are the GT relative transformations and Pˆ is an X denotes the initial absolute permutations. The deriva- ij i i estimated permutation. in the case of no correctly tion of this scheme is similar to [46] and we provide more R = 0 found correspondences and for a perfect solution. detailed information in the supplementary material. To the R = 1 Tab. 1 shows the results of different algorithms as well as best of our knowledge, the convergence properties of the ours. Note that our Birkhoff-LRBFGS method that oper- geodesic integrator that is approximated with a retraction ates solely on pairwise permutations outperforms all meth- operator have not yet been analyzed. We leave this analysis ods, even the ones which make use of geometry during ini- as a futurework, which is beyond the scope of this study. tialization. Moreover, when our method is used to initialize 2 We note that the term Xij plays an important role k kF Wang et al. [77] and perform geometric optimization, we in the overall algorithm since it prevents the latent variables attain the top results. These findings validate that walking X i to go the extreme points of the Birkhoff polytope, where on the Birkhoff Polytope, even approximately, and using the retraction operator becomes inaccurate. We also note Riemannian line-search algorithms constitute a promising that, when β , the distribution πH concentrates on direction for optimizing the problem at hand. the global optimum→ ∞X⋆ and the proposed retraction Euler integrator becomes the Riemannian gradient descent with a Uncertainty Estimation in Real Data We now run our retraction operator. confidence estimator on the same Willow Object Class [16].

11110 Table 1. Our results on the WILLOW Object Class graph matching dataset. Wang− refers to running Wang [77] without the geometric con- sistency term. The vanilla version of our method, Ours, already lacks this term. Ours-Geom then refers to initializing Wang’s verification method with our algorithm. For all the methods, we use the original implementation of the authors. Dataset Initial Spectral [56] MatchLift [36] MatchALS [85] MatchEig [51] Wang− [77] Ours Wang [77] Ours-Geom Car 0.48 0.55 0.65 0.69 0.66 0.72 0.71 1.00 1.00 Duck 0.43 0.59 0.56 0.59 0.56 0.63 0.67 0.932 0.96 Face 0.86 0.92 0.94 0.93 0.93 0.95 0.95 1.00 1.00 Motorbike 0.30 0.25 0.27 0.34 0.28 0.40 0.37 1.00 1.00 Winebottle 0.52 0.64 0.72 0.70 0.71 0.73 0.73 1.00 1.00 CMU-House 0.68 0.90 0.94 0.92 0.94 0.98 0.98 1.00 1.00 CMU-Hotel 0.64 0.81 0.87 0.86 0.92 0.94 0.96 1.00 1.00 Average 0.52 0.59 0.65 0.66 0.71 0.76 0.77 0.99 0.99

To do that, we first find the optimal point where synchro- map. The column (e) of the figure depicts the top-2 as- nization is at its best. Then, we set h 0.0001, β signments retained in the confidence map and (e) plots the [0.075, 0.1] and automatically start sampling← the posterior← assignments that overlap with the true solution. Note that, around this mode for 1000 iterations. Note that β is a criti- we might not have access to such an oracle in real applica- cal parameter which can also be dynamically controlled [7]. tions and only show this to illustrate potential use cases of Larger values of β cannot provide enough variation for a the estimated confidence map. good diversity of solutions. Smaller values cause greater random perturbations leading to samples far from the opti- 6.2. Evaluations on Synthetic Data mum. This can cause divergence or samples not explaining We synthetically generate 28 different problems with the local mode. Nevertheless, all our tests worked well with varying sizes: M [10, 100] nodes and n [16, 100] values in the given range. points in each node.∈ For the scenario of image∈ matching, The generated samples are useful in many applications, this would correspond to M cameras and N features in each e.g. fitting distributions or providing additional solution image. We then introduce 15% 35% random swaps to the hints. We address the case of multiple hypotheses genera- GT absolute permutations and− compute the observed rela- tion for the permutation synchronization problem and show tive ones. Details of this dataset are given in suppl. material. that generating an additional per-edge candidate with high Among all 28 sets of synthetic data, we attain an overall re- certainty helps to improve the recall. Tab. 2 shows the top- call of 91% whereas MatchEIG [51] remains about 83%. K scores we achieve by simply incorporating K likely sam- Runtime Analysis Next, we assess the computational ples. Note that, when 2 matches are drawn at random and cost of our algorithm against the state of the art methods, contribute as potentially correct matches, the recall is in- on the dataset explained above. All of our experiments are creased only by 2%, whereas including our samples instead run on a MacBook computer with an Intel i7 2.8GhZ CPU. boosts the multi-way matching by 6%. Our implementation uses a modified Ceres solver [1]. All Table 2. Using top-K errors to rank by uncertainty. Based on the the other algorithms use highly vectorized MATLAB 2017b confidence information we could retain multiple hypotheses. This code making our comparisons reasonably fair. Fig. 5 tab- is not possible by the other approaches such as Wang et al. [51, ulates runtimes for different methods excluding initializa- 77]. Rand-K refers to using K − 1 additional random hypotheses tion. MatchLift easily took more than 20min. for moderate to complement the found solution. Ours-K ranks assignments by problems and hence we choose to exclude it from this eval- our probabilistic certainty and retains top-K candidates per point. uation. It is noticeable that thanks to the ability of using Dataset Wang Ours Rand-2 Ours-2 Rand-3 Ours-3 more advanced solvers such as LBFGS, our method con- Car 0.72 0.71 0.73 0.76 0.76 0.81 verges much faster than Wang et al. and runs on par with Duck 0.63 0.67 0.67 0.69 0.67 0.72 the fastest yet least accurate spectral synchronization [56]. Face 0.95 0.95 0.96 0.97 0.96 0.98 The worst case theoretical computational complexity of our 2 3 Motorbike 0.40 0.37 0.45 0.49 0.52 0.60 algorithm is OB-LRBFGS := O(K KS(n + (2n) ) where |E| Winebottle 0.73 0.74 0.77 0.82 0.79 0.85 K is the number of LBFGS iterations and KS the number Avg. 0.69 0.69 0.71 0.75 0.74 0.79 of Sinkhorn iterations. While KS can be a bottleneck, in practice our matrices are already restricted to the Birkhoff We further present illustrative results for our confidence manifold and Sinkhorn early-terminates, letting KS remain prediction in Fig. 4. There, unsatisfactory solutions arising small. The complexity is: (1) linearly-dependent upon the in certain cases are improved by analyzing the uncertainty number of edges, which in the worst case relates quadrati- 2 0.88 0.93 cally to the number of images = N(N 1), (2) cubically [77] reports a value of , but for their method, we attained |E| − and therefore report this value. dependent on n. This is due to the fact that projection onto

11111 (a) Initialization (b) Solution (c) Confusion (d) Certainty Map (e) Top-2 (f) Top-2 Solution with Uncertainty Confusion

Figure 4. Results from our confidence estimation. Given potentially erroneous solutions (b) to the problems initialized as in (a), our latent samples discover the uncertain assignments as shown in the middle three columns (c-e). When multiple top-2 solutions are accepted as potential positives, our method can suggest high quality hypotheses (f). The edges in the last column (f) is colored by their confidence value. Note that even though, for the sake of space we show pairs of images, the datasets contain multiple sets of images.

300 the tangent space solves a system of 2n 2n equations. Spectral × MatchEIG MatchALS 200 7. Conclusion Wang et al. Ours In this work we have proposed two new frameworks 100 for relaxed permutation synchronization on the manifold of doubly stochastic matrices. Our novel model and formula- Runtime (seconds) Initial tion paved the way to using sophisticated optimizers such as 0 0 5 10 15 20 25 Riemannian limited-memory BFGS. We further integrated a Problem Complexity manifold-MCMC scheme enabling posterior sampling and Figure 5. Running times of different methods with increasing thereby confidence estimation. We have shown that our problem size: N ∈ [10, 100] and n ∈ [16, 100].

confidenceOptimized maps are informative about cycle inconsisten- cies and can lead to new solution hypotheses. We used these (iii) seek better use cases for our confidence estimates such hypotheses in a top-K evaluation and illustrated its benefits. as outlier removal. In the future, we plan to (i) address partial permutations, the Acknowledgements: Supported by the grant ANR-16-CE23-0014 (FBI- inner regionInitial of the Birkhoff Polytope (ii) investigate more MATRIX). Authors thank Haowen Deng for the initial 3D correspon- sophisticated MCMC schemes such as [28, 34, 63, 46, 65] dences and, Benjamin Busam and Jesus Briales for fruitful discussions.

Optimized 11112 References [13] Kunal N Chaudhury, Yuehaw Khoo, and Amit Singer. Global registration of multiple point clouds using [1] Sameer Agarwal, Keir Mierle, and Others. Ceres semidefinite programming. SIAM Journal on Opti- solver. http://ceres-solver.org. mization, 25(1):468–501, 2015. [2] Federica Arrigoni, Eleonora Maset, and Andrea [14] Changyou Chen, Nan Ding, and Lawrence Carin. On Fusiello. Synchronization in the symmetric inverse the convergence of stochastic gradient MCMC algo- semigroup. In International Conference on Im- rithms with high-order integrators. In Advances in age Analysis and Processing, pages 70–81. Springer, Neural Information Processing Systems, pages 2278– 2017. 2286, 2015. [3] Federica Arrigoni, Beatrice Rossi, and Andrea [15] Yuxin Chen, Leonidas Guibas, and Qixing Huang. Fusiello. Spectral synchronization of multiple views Near-optimal joint object matching via convex re- in se (3). SIAM Journal on Imaging Sciences, laxation. In Proceedings of the 31st International 9(4):1963–1990, 2016. Conference on International Conference on Machine [4] Florian Bernard, Johan Thunberg, Peter Gemmar, Learning - Volume 32, ICML’14, pages II–100–II– Frank Hertel, Andreas Husch, and Jorge Goncalves. 108. JMLR.org, 2014. A solution for multi-alignment by transformation syn- chronisation. In Proceedings of the IEEE Conference [16] Minsu Cho, Karteek Alahari, and Jean Ponce. Learn- on Computer Vision and Pattern Recognition, pages ing graphs to match. In Proceedings of the IEEE Inter- 2161–2169, 2015. national Conference on Computer Vision, pages 25– 32, 2013. [5] Florian Bernard, Johan Thunberg, Jorge Goncalves, and Christian Theobalt. Synchronisation of par- [17] Stephan´ Clemenc¸on´ and Jer´ emie´ Jakubowicz. Kan- tial multi-matchings via non-negative factorisations. torovich distances between rankings with applications CoRR, abs/1803.06320, 2018. to rank aggregation. In Joint European Conference on Machine Learning and Knowledge Discovery in [6] Tolga Birdal, Emrah Bala, Tolga Eren, and Slobodan Databases. Springer, 2010. Ilic. Online inspection of 3d parts via a locally over- lapping camera network. In 2016 IEEE Winter Con- [18] Luca Cosmo, Andrea Albarelli, Filippo Bergamasco, ference on Applications of Computer Vision (WACV), Andrea Torsello, Emanuele Rodola,` and Daniel Cre- pages 1–10. IEEE, 2016. mers. A game-theoretical approach for joint matching of multiple feature throughout unordered images. In [7] Tolga Birdal, Umut S¸ims¸ekli, M. Onur Eken, and Pattern Recognition (ICPR), 2016 23rd International Slobodan Ilic. Bayesian Pose Graph Optimization Conference on, pages 3715–3720. IEEE, 2016. via Bingham Distributions and Tempered Geodesic MCMC. In Advances in Neural Information Process- [19] Luca Cosmo, Emanuele Rodola,` Andrea Albarelli, Fa- ing Systems (NeurIPS), 2018. cundo Memoli,´ and Daniel Cremers. Consistent par- [8] Tolga Birdal and Slobodan Ilic. Cad priors for accu- tial matching of shape collections via sparse model- rate and flexible instance reconstruction. In Proceed- ing. In Computer Graphics Forum, volume 36, pages ings of the IEEE International Conference on Com- 209–221. Wiley Online Library, 2017. puter Vision, 2017. [20] Angela Dai, Matthias Nießner, Michael Zollhofer,¨ [9] Garrett Birkhoff. Tres observaciones sobre el algebra Shahram Izadi, and Christian Theobalt. Bundlefusion: lineal. Univ. Nac. Tucuman´ Rev. Ser. A, 5:147–151, Real-time globally consistent 3d reconstruction using 1946. on-the-fly surface reintegration. ACM Transactions on Graphics (TOG), 36(4):76a, 2017. [10] Benjamin Busam, Marco Esposito, Simon Che’Rose, Nassir Navab, and Benjamin Frisch. A stereo vision [21] Joseph DeGol, Timothy Bretl, and Derek Hoiem. Im- approach for cooperative robotic movement therapy. proved structure from motion using fiducial marker In IEEE International Conference on Computer Vision matching. In Proceedings of the European Conference Workshop (ICCVW), December 2015. on Computer Vision (ECCV), pages 273–288, 2018. [11] Simon Byrne and Mark Girolami. Geodesic monte [22] Haowen Deng, Tolga Birdal, and Slobodan Ilic. Ppf- carlo on embedded manifolds. Scandinavian Journal foldnet: Unsupervised learning of rotation invariant of Statistics, 40(4):825–845, 2013. 3d local descriptors. In Proceedings of the European [12] Tiberio´ S Caetano, Julian J McAuley, Li Cheng, Conference on Computer Vision (ECCV), pages 602– Quoc V Le, and Alex J Smola. Learning graph match- 618, 2018. ing. IEEE transactions on pattern analysis and ma- [23] Haowen Deng, Tolga Birdal, and Slobodan Ilic. chine intelligence, 31(6):1048–1058, 2009. Ppfnet: Global context aware local features for robust

11113 3d point matching. In The IEEE Conference on Com- [36] Qi-Xing Huang and Leonidas Guibas. Consis- puter Vision and Pattern Recognition (CVPR), June tent shape maps via semidefinite programming. 2018. In Proceedings of the Eleventh Eurograph- [24] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai ics/ACMSIGGRAPH Symposium on Geometry Li, and Li Fei-Fei. Imagenet: A large-scale hierarchi- Processing. Eurographics Association, 2013. cal image database. In IEEE Conference on Computer [37] Wen Huang, Kyle A Gallivan, and P-A Absil. A Vision and Pattern Recognition. Ieee, 2009. broyden class of quasi-newton methods for rieman- nian optimization. SIAM Journal on Optimization, [25] Persi Diaconis, Susan Holmes, Mehrdad Shahshahani, 25(3):1660–1685, 2015. et al. Sampling from a manifold. In Advances in Mod- ern Statistical Theory and Applications: A Festschrift [38] Daniel F Huber and Martial Hebert. Fully automatic in honor of Morris L. Eaton. Institute of Mathematical registration of multiple 3d data sets. Image and Vision Statistics, 2013. Computing, 21(7):637–650, 2003. [26] A. Douik and B. Hassibi. Manifold Optimization Over [39] GLENN Hurlbert. A short proof of the birkhoff-von the Set of Doubly Stochastic Matrices: A Second- neumann theorem. preprint (unpublished), 2008. Order Geometry. ArXiv e-prints, Feb. 2018. [40] Ross Kindermann. Markov random fields and their applications. American mathematical society, 1980. [27] Fanny Dufosse,´ Kamer Kaya, Ioannis Panagiotas, and Bora Uc¸ar. Further notes on Birkhoff-von Neumann [41] Anna Korba, Alexandre Garcia, and Florence d’Alche´ decomposition of doubly stochastic matrices. PhD Buc. A structured prediction approach for label rank- thesis, Inria-Research Centre Grenoble–Rhone-Alpes,ˆ ing. arXiv preprint arXiv:1807.02374, 2018. 2017. [42] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hin- [28] Alain Durmus, Umut Simsekli, Eric Moulines, Roland ton. Imagenet classification with deep convolutional Badeau, and Gael¨ Richard. Stochastic gradient neural networks. In Advances in neural information Richardson-Romberg Markov Chain Monte Carlo. In processing systems, pages 1097–1105, 2012. Advances in Neural Information Processing Systems, [43] Hongdong Li and Richard Hartley. The 3d-3d regis- pages 2047–2055, 2016. tration problem revisited. In Computer Vision, 2007. [29] Rizal Fathony, Sima Behpour, Xinhua Zhang, and ICCV 2007. IEEE 11th International Conference on, Brian Ziebart. Efficient and consistent adversarial bi- pages 1–8. IEEE, 2007. partite matching. In International Conference on Ma- [44] Xinchao Li, Martha Larson, and Alan Hanjalic. Pair- chine Learning, pages 1456–1465, 2018. wise geometric matching for large-scale object re- trieval. In Proceedings of the IEEE Conference [30] P. Gargallo. Using opensfm. Online. Accessed May- on Computer Vision and Pattern Recognition, pages 2018. 5153–5161, 2015. [31] Mark Girolami and Ben Calderhead. Riemann mani- [45] Scott Linderman, Gonzalo Mena, Hal Cooper, Liam fold Langevin and Hamiltonian Monte Carlo methods. Paninski, and John Cunningham. Reparameterizing Journal of the Royal Statistical Society: Series B 118 the birkhoff polytope for variational permutation in- (Statistical Methodology), 73(2):123–214, 2011. ference. In International Conference on Artificial In- [32] Michel X Goemans. Smallest compact formulation telligence and Statistics, pages 1618–1627, 2018. for the permutahedron. Mathematical Programming, [46] Chang Liu, Jun Zhu, and Yang Song. Stochastic gra- 2015. dient geodesic mcmc methods. In Advances in Neural [33] Venu Madhav Govindu. Lie-algebraic averaging for Information Processing Systems, pages 3009–3017, globally consistent motion estimation. In Proceed- 2016. ings of the 2004 IEEE Computer Society Conference [47] David G Lowe. Distinctive image features from scale- on Computer Vision and Pattern Recognition, 2004. invariant keypoints. International journal of computer CVPR 2004., volume 1, pages I–I. IEEE, 2004. vision, 60(2):91–110, 2004. [34] Andrew Holbrook. Note on the geodesic monte carlo. [48] Vince Lyzinski, Donniell E Fishkind, Marcelo Fiori, arXiv preprint arXiv:1805.05289, 2018. Joshua T Vogelstein, Carey E Priebe, and Guillermo [35] Nan Hu, Qixing Huang, Boris Thibert, UG Alpes, Sapiro. Graph matching: Relax at your own risk. and Leonidas Guibas. Distributable consistent multi- IEEE transactions on pattern analysis and machine object matching. In Proceedings of the IEEE Con- intelligence, 38(1):60–73, 2016. ference on Computer Vision and Pattern Recognition, [49] Yi-An Ma, Tianqi Chen, and Emily Fox. A complete 2018. recipe for stochastic gradient MCMC. In Advances in

11114 Neural Information Processing Systems, pages 2917– [62] Johannes L Schonberger and Jan-Michael Frahm. 2925, 2015. Structure-from-motion revisited. In Proceedings of [50] Joao˜ Maciel and Joao˜ Paulo Costeira. A global solu- the IEEE Conference on Computer Vision and Pattern tion to sparse correspondence problems. IEEE Trans- Recognition, pages 4104–4113, 2016. actions on Pattern Analysis and Machine Intelligence, [63] Umut S¸ims¸ekli. Fractional Langevin Monte Carlo: 25(2), 2003. Exploring Levy´ driven stochastic differential equa- [51] E. Maset, F. Arrigoni, and A. Fusiello. Practical and tions for Markov Chain Monte Carlo. In International efficient multi-view matching. In 2017 IEEE Inter- Conference on Machine Learning, pages 3200–3209. national Conference on Computer Vision (ICCV), Oct JMLR. org, 2017. 2017. [64] Umut Simsekli, Roland Badeau, Taylan Cemgil, and [52] James Munkres. Algorithms for the assignment and Gael¨ Richard. Stochastic quasi-Newton Langevin transportation problems. Journal of the society for in- Monte Carlo. In International Conference on Machine dustrial and applied mathematics, 5(1):32–38, 1957. Learning (ICML), 2016. [53] Raul´ Mur-Artal and Juan D. Tardos.´ ORB-SLAM2: [65] Umut S¸ims¸ekli, C¸agatay˘ Yıldız, Thanh Huy Nguyen, an open-source SLAM system for monocular, stereo Gael¨ Richard, and A Taylan Cemgil. Asynchronous and RGB-D cameras. IEEE Transactions on Robotics, stochastic quasi-Newton MCMC for non-convex op- 33(5):1255–1262, 2017. timization. In International Conference on Machine Learning, 2018. [54] Andy Nguyen, Mirela Ben-Chen, Katarzyna Wel- nicka, Yinyu Ye, and Leonidas Guibas. An opti- [66] Amit Singer. Angular synchronization by eigenvectors mization approach to improving collections of shape and semidefinite programming. Applied and computa- maps. In Computer Graphics Forum, volume 30, tional harmonic analysis, 30(1):20, 2011. pages 1481–1491. Wiley Online Library, 2011. [67] Amit Singer and Yoel Shkolnisky. Three-dimensional [55] Deepti Pachauri, Risi Kondor, Gautam Sargur, and structure determination from common lines in cryo- Vikas Singh. Permutation diffusion maps (pdm) with em by eigenvectors and semidefinite programming. application to the image association problem in com- SIAM journal on imaging sciences, 4(2):543–572, puter vision. In Advances in Neural Information Pro- 2011. cessing Systems, pages 541–549, 2014. [68] Richard Sinkhorn and Paul Knopp. Concerning non- [56] Deepti Pachauri, Risi Kondor, and Vikas Singh. Solv- negative matrices and doubly stochastic matrices. Pa- ing the multi-way matching problem by permutation cific Journal of Mathematics, 21(2):343–348, 1967. synchronization. In Advances in neural information [69] Steven T Smith. Optimization techniques on Rie- processing systems, pages 1860–1868, 2013. mannian manifolds. Fields institute communications, [57] Han-Mu Park and Kuk-Jin Yoon. Consistent multiple 3(3):113–135, 1994. graph matching with multi-layer random walks syn- [70] Yifan Sun, Zhenxiao Liang, Xiangru Huang, and Qix- chronization. Pattern Recognition Letters, 2018. ing Huang. Joint map and symmetry synchronization. [58] Sam Patterson and Yee Whye Teh. Stochastic gradi- In Proceedings of the European Conference on Com- ent Riemannian Langevin dynamics on the probability puter Vision (ECCV), pages 251–264, 2018. simplex. In Advances in Neural Information Process- [71] Da Tang and Tony Jebara. Initialization and coordi- ing Systems, pages 3102–3110, 2013. nate optimization for multi-way matching. In Artificial [59] Sergey Plis, Stephen McCracken, Terran Lane, and Intelligence and Statistics, pages 1385–1393, 2017. Vince Calhoun. Directional statistics on permutations. [72] Johan Thunberg, Florian Bernard, and Jorge In Proceedings of the Fourteenth International Con- Goncalves. Distributed methods for synchronization ference on Artificial Intelligence and Statistics, pages of orthogonal matrices over graphs. Automatica, 600–608, 2011. 80:243–252, 2017. [60] Torsten Sattler, Tobias Weyand, Bastian Leibe, and [73] Bill Triggs, Philip F McLauchlan, Richard I Hartley, Leif Kobbelt. Image retrieval for image-based local- and Andrew W Fitzgibbon. Bundle adjustment—a ization revisited, 2012. modern synthesis. In International workshop on vi- [61] Michele Schiavinato and Andrea Torsello. Synchro- sion algorithms, pages 298–372. Springer, 1999. nization over the birkhoff polytope for multi-graph [74] Nilesh Tripuraneni, Nicolas Flammarion, Francis matching. In International Workshop on Graph-Based Bach, and Michael I Jordan. Averaging stochastic gra- Representations in Pattern Recognition, pages 266– dient descent on Riemannian manifolds. In Confer- 275. Springer, 2017. ence on Learning Theory (COLT), 2018.

11115 [75] Roberto Tron and Rene Vidal. Distributed 3-d lo- calization of camera sensor networks from 2-d image measurements. IEEE Transactions on Automatic Con- trol, 59(12):3325–3340, 2014. [76] Lanhui Wang and Amit Singer. Exact and stable re- covery of rotations for robust synchronization. In- formation and Inference: A Journal of the IMA, 2(2):145–193, 2013. [77] Qianqian Wang, Xiaowei Zhou, and Kostas Daniilidis. Multi-image semantic matching by mining consistent features. In CVPR, 2018. [78] Tatiana Xifara, Chris Sherlock, Samuel Livingstone, Simon Byrne, and Mark Girolami. Langevin dif- fusions and the Metropolis-adjusted Langevin algo- rithm. Statistics & Probability Letters, 91:14–19, 2014. [79] Junchi Yan, Minsu Cho, Hongyuan Zha, Xiaokang Yang, and Stephen M Chu. Multi-graph matching via affinity optimization with graduated consistency regu- larization. IEEE transactions on pattern analysis and machine intelligence, 38(6):1228–1242, 2016. [80] Junchi Yan, Xu-Cheng Yin, Weiyao Lin, Cheng Deng, Hongyuan Zha, and Xiaokang Yang. A short survey of recent advances in graph matching. In Proceedings of the 2016 ACM on International Conference on Multi- media Retrieval, pages 167–174. ACM, 2016. [81] Jin-Gang Yu, Gui-Song Xia, Ashok Samal, and Jin- wen Tian. Globally consistent correspondence of mul- tiple feature sets using proximal gauss–seidel relax- ation. Pattern Recognition, 51:255–267, 2016. [82] Xinru Yuan, Wen Huang, P-A Absil, and Kyle A Gal- livan. A riemannian limited-memory BFGS algorithm for computing the matrix geometric mean. Procedia Computer Science, 80:2147–2157, 2016. [83] Andrei Zanfir and Cristian Sminchisescu. Deep learn- ing of graph matching. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018. [84] Mikhail Zaslavskiy, Francis Bach, and Jean-Philippe Vert. A path following algorithm for the graph match- ing problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12):2227–2242, 2009. [85] Xiaowei Zhou, Menglong Zhu, and Kostas Daniilidis. Multi-image matching via fast alternating minimiza- tion. In Proceedings of the IEEE International Con- ference on Computer Vision, pages 4032–4040, 2015.

11116