Distributed Consistent Data Association

Spyridon Leonardos1, Xiaowei Zhou2 and Kostas Daniilidis3

Abstract— Data association is one of the fundamental prob- lems in multi-sensor systems. Most current techniques rely on pairwise data associations which can be spurious even after the employment of outlier rejection schemes. Considering multiple pairwise associations at once significantly increases accuracy and leads to consistency. In this work, we propose two fully decentralized methods for consistent global data association from pairwise data associations. The first method is a consensus algorithm on the set of doubly stochastic matrices. The second method is a decentralization of the spectral method proposed by Pachauri et al. [28]. We demonstrate the effectiveness of both methods using theoretical analysis and experimental evaluation.

I.INTRODUCTION Multi-sensor data association has been a long standing problem in robotics and computer vision. It refers to the problem of establishing correspondences between feature Fig. 1. Illustration of consistent data association. points, regions, or objects observed by different sensors and serves as the basis for many high-level tasks such as localization, mapping and planning. Most of the efforts in neighbors, and finally producing globally consistent data previous works have been dedicated to improving the data as- association. sociation by designing new feature detectors, descriptors, and Our contributions are the following. 1) We propose a novel outlier removal algorithms in a pairwise setting. However, consensus-like algorithm for distributed data association. the problem setting in practice is often multi-way if the data 2) We propose a decentralized version of the centralized are collected by a sensor network or a multi-robot system, state-of-art spectral method [28]. 3) We show that both and how to establish consistent data association for multiple methods provably converge, do not depend on initialization, sensors and leverage the global reasoning to improve the are parameter-free and guarantee global consistency. 4) We association has drawn increasing attention in recent years. demonstrate the validity of proposed methods through both A necessary condition for good data association among theoretical analysis and experimentation with synthetic and multiple sensors is the cycle consistency meaning that the real data. composition of associations along a cycle of sensors should The rest of this paper is organized as follows. Section II be identity. The cycle consistency is often violated in practice includes a brief review of prior works. Section III introduces due to outliers and how to use such a constraint to remove the reader to notation and preliminaries essential to under- the false associations has been studied in robotics, computer standing the main results of this paper. A rigorous problem vision and graphics. The related work will be discussed formulation is the topic of Section IV. The two proposed arXiv:1609.07015v2 [cs.MA] 25 Oct 2016 in Section II. While promising empirical and theoretical methods are presented in Sections V and VI respectively. results have been reported in previous works, most of them Finally, experimental evaluation is presented in Section VII. addressed the problem in a centralized manner and assumed II. RELATED WORK that the observations and states of all sensors could be accessed at the same time. This assumption is impractical The cycle consistency has been explored in a variety of in a distributed system where the data is processed on computer vision applications. For example, Zach et al. [37] local sensors with limited computational and communication studied the problem of multi-view reconstruction and pro- resources. In this paper, we aim to develop distributed posed to identify unreliable geometric relations (e.g. relative algorithms that can efficiently operate on each sensor, re- poses) between views by measuring the consistencies along a quiring only local information and communication with its number of cycles. Nguyen et al. [26] addressed the problem of finding point-wise maps among a collection of shapes 1,2,3The authors are with the Department of Computer and Informa- and proposed to use the cycle consistency to identify the tion Science, University of Pennsylvania Philadelphia, PA 19104, USA correctness of maps and replace the incorrect maps with the {spyridon,xiaowz,kostas}@seas.upenn.edu Support by the grants ARL MAST-CTA W911NF-08-2-0004 and ARL compositions of correct ones. Yan et al. [35], [36] consid- RCTA W911NF-10-2-0016 is gratefully acknowledged. ered the multiple graph matching problem and imposed the cycle consistency by penalizing the inconsistency during the The ∆(G) is the such that optimization. Zhou et al. [38] developed a discrete searching [∆(G)]ii = |Ni|, where |·| denotes the cardinality of a set. algorithm that maximizes the cycle consistency to improve A or digraph is denoted by the pair G = the dense optical flow in an image collection. Instead of (V, E), where E ⊆ V ×V.A weighted digraph G = (V, E, w) iteratively optimizing the pairwise matches, [15], [21], [28] is a graph along with a function w : E → R+. The adjacency showed that finding globally consistent associations from matrix of a weighted digraph is defined by noisy pairwise associations can be formulated as a quadratic  w(j, i), if (j, i) ∈ E [A(G)] = (2) integer program and relaxed into a generalized Rayleigh ij 0, otherwise problem and solved by the eigenvalue decomposition. More recently, Huang and Guibas [14] and Chen et al. [6] the- Intuitively, if [A(G)]ij > 0 there is information flow from oretically proved that the consistent associations could be vertex j to vertex i. The neighborhood Ni of the vertex i is exactly recovered from noisy measurements under certain the subset of V defined by Ni = {j ∈ V | (j, i) ∈ E}. The conditions by convex relaxation and solved the problem by degree matrix ∆(G) semidefinite programming. Zhou et al. [39] proposed to solve . X [∆(G)] = d (i) = [A(G)] (3) the problem by rank minimization and developed a more ii in ij j∈N efficient algorithm. All of these works addressed problem i in a centralized setting, i.e., all pairwise measurements are A directed path is a sequence i0, i1, . . . , im of distinct ver- available and optimized jointly. tices such that (ik−1, ik) ∈ E for all k = 1, . . . , m. A digraph In the robotics community, numerous approaches for as- is strongly connected if there is a directed path between any sociating elements of two sets have been proposed and two vertices. A digraph is a rooted out-branching tree if it has extensively used in single robot Simultaneous Localization a vertex to which all other vertices are path connected and does not contain any cycles. A digraph is balanced if the in- And Mapping (SLAM) settings. These include but are not . P limited to Nearest Neighbor (NN) and Maximum Likelihood degree din(i) and the out-degree dout(i) = (i,j)∈E [A(G)]ji (ML) approaches [18], [40], Iterative Closest Point (ICP) are equal for all i ∈ V. From this point and on, all graphs [5], RANSAC [8], Joint Compatibility Branch and Bound under consideration are assumed to be directed unless stated (JCBB) [25], Maximum Common Subgraph (MCS) [3], otherwise. The graph Laplacian L(G) is defined as Random Finite Sets (RFS) [33]. More recently, multi-robot L(G) = ∆(G) − A(G) (4) localization and mapping algorithms have been developed. In almost all of them, the data association is addressed either in By construction, L(G)1 = 0. A digraph on n vertices a pairwise fashion by directly generalizing data association contains a rooted-out branching if and only the rank of its techniques from the single robot case [7], [9], [34] or by Laplacian is n − 1. A matrix closely related to the graph considering all associations jointly in a centralized manner Laplacian is defined by . [16]. Closer to our approach is the work by Aragues et al. [1] F (G) = (I + ∆(G))−1(I + A(G)) (5) which is a decentralized method for finding inconsistencies based on cycle detection. However, the proposed inconsis- B. Stochastic Matrices and Permutations tency resolution algorithm comes without any optimality or Stochastic matrices and their properties have been well guarantees. studied in the area of distributed dynamical systems [4], [17], [27] and in probability theory in the context of computing III. PRELIMINARIES, NOTATION AND invariant distributions of Markov chains [2], [4]. A nonneg- DEFINITIONS ative square matrix is stochastic if all its row sums are equal to 1. The spectral radius ρ(A) of a A is A. equal to 1 and it is an eigenvalue of A since A1 = 1. The k In this subsection, we review some elementary facts from existence and the form of the limit limk→∞ A depends on graph theory. For an in depth analysis, we refer the reader the multiplicity of 1 as an eigenvalue. A nonegative square to standard texts [10], [24]. An undirected graph or simply matrix A is irreducible if its associated graph [13] is strongly a graph is denoted by the pair G = (V, E), where V = connected. An irreducible stochastic matrix is primitive if it {1, 2, . . . , n} is the set of vertices and E ⊆ [V]2 is the has only one eigenvalue of maximum modulus. set of edges, where [V]2 denotes the set of unordered pairs A nonnegative square matrix is if both its row sums and column sums are equal to 1. of elements of V. The neighborhood Ni of the vertex i is A doubly stochastic matrix is a if its the subset of V defined by Ni = {j ∈ V | {i, j} ∈ E}.A elements are either 0 or 1. The sets of stochastic matrices path is a sequence i0, i1, . . . , im of distinct vertices such that and doubly stochastic matrices are compact convex sets. This {ik−1, ik} ∈ E for all k = 1, . . . , m. A graph is connected if there is a path between any two vertices. Given a graph is a rather useful property since it implies closure under G = (V, E), its is defined by convex combinations. For more details, regarding the theory of stochastic matrices, we refer the reader to [13]. .  1, if {i, j} ∈ E Let [n] = {1, 2, . . . , n} for some positive integer n.A [A(G)] = (1) ij 0, otherwise mapping π :[n] → [n] is a permutation of [n] if it is bijective. The set of all permutations of [n] forms a S1 S2 S3 group under composition, termed the symmetric group Sn.A 1 permutation π ∈ Sn is represented by a n×n permutation matrix Π defined by 1 1 1  1, if π(j) = i [Π] = (6) ij 0, otherwise 2 2 2 or equivalently Πej = eπ(j), where ej is the jth canonical 3 3 3 basis vector. The simplest choice for a distance on Sn is given by −1 . d(π1, π2) = d(e, π1 π2) = n − hΠ1, Π2i (7) 1 1 1 . T where hA, Bi = tr(A B), e is the identity map and Π1, Π2 are the matrix representations of the permutations π1 and π2 2 2 2 respectively. The distance function defined above is simply the number of labels assigned differently by permutations π1 3 3 3 and π2. With a slight abuse of notation, we write Π ∈ Sn meaning that Π is an n × n permutation matrix. Fig. 2. Example with n = 3 sensors S1,S2,S3 observing m = 3 targets. C. Consensus Algorithms Top: consistent data association. Bottom: inconsistent data association since π12 ◦ π23(2) = 3 but π31(2) = 2. Consensus algorithms have been extensively studied in the control community [4], [17], [27]. In its simplest form, a consensus algorithm is a decentralized protocol in which the localization problems. We denote by πeij ∈ Sm the, possibly agents, modeled as vertices of a graph, try to reach agreement erroneous, estimated pairwise association between sensor i by communicating only with their neighbors. More formally, and j and by Π the corresponding matrix representation. e ij. let xi(t) ∈ R denote the state of agent i at time time t. The, Moreover, let Πi = ρ(πi). the simplest consensus protocol is given by Related to the sensor graph is the data association graph X D = (VD, ED, wD), where VD = V × {1, 2, . . . , m}. There xi(t + 1) = aijxj(t) (8) is an edge from (i, k) to (j, l) if and only if (i, j) ∈ E and j∈Ni∪i [Πe ij]kl > 0. The corresponding edge weight is simply equal P where aij ≥ 0 and j aij = 1. A popular consensus to [Πe ij]kl. protocol [27], which converges to the average of the initial First, we need a precise definition of consistency. values provided that G is balanced, is described by Definition 4.1 (Consistency): A set of pairwise associa- tions {πij}(i,j)∈E is consistent if x(t + 1) = (I − L(G))x(t) (9) e πij ◦ πjk = πik (10) T e e e where x = [x1, . . . , xn] and 0 <  < 1/ maxi[∆(G)]ii. n for all valid indices i, j, k. A set of labels {πi}i=1 is con- sistent with respect to some consistent pairwise associations IV. PROBLEM FORMALIZATION {πeij}(i,j)∈E , if In this section, we formalize the problem of consistent −1 data association. We assume there are n sensors observing πeij = πi ◦ πj , ∀(i, j) ∈ E (11) m targets. For instance, assume we have a set of n cameras Based on the definition of consistency, the problem of observing a scene in the world described by a set of m consistent data association is naturally defined as follows. feature points. Sensors communicate only with a subset of all sensors. Communication constraints between sensors are Definition 4.2 (Consistent data association): encoded by the sensor graph. The sensor graph is the digraph Given pairwise associations {πeij}(i,j)∈E , find labels G = (V, E) where V = {1, 2, . . . , n} and (i, j) ∈ E if there π1, . . . , πn ∈ Sm, such that is information flow from sensor i to j. The pairwise associa- −1 πij = πi ◦ πj , ∀ (i, j) ∈ E (12) tion πij ∈ Sm is defined as follows: we have that πij(l) = k e if the lth target in jth sensor corresponds to the kth target in Remark 1: Under the presence of noise, it might not be n ith sensor. Observe that the pairwise associations πij can be possible to find labels {πi}i=1 satisfying (12) exactly. There- −1 fore, in practice we are looking for labels {π }n satisfying written as πij = πi ◦ πj , where πi ∈ Sm is the mapping i i=1 from the labels of sensor i to some global labels, termed (12) as much as possible according to some criterion. the “universe of features” in some works [6], [39]. This is V. DISTRIBUTED AVERAGING analogous to the absolute position in Euclidean spaces in In the classic consensus algorithms, each agent updates his 1 n estimate of a collective quantity by taking convex combina- A representation of a group G on R is a map ρ : G → GL(n) satisfying ρ(g1g2) = ρ(g1)ρ(g2). tions of the estimates of his neighbors. The problem at hand is different: we do not want all πi’s to converge to the same B. Properties value, rather to converge to a value satisfying consistency. First of all, as the sets of stochastic and doubly stochastic The first obstacle one encounters is the combinatorial nature matrices are convex and thus, closed under convex combi- of the problem at hand. Therefore, we relax the domain of nations, we naturally have the following lemma. problem and work with doubly stochastic matrices instead Lemma 5.1: If all Πe ij and Πi(0) are stochastic then, Πi(t) of permutation matrices. This is a natural choice since the is stochastic for all t ≥ 0. Similarly, if all Πe ij and Πi(0) convex hull of the set of m × m permutation matrices is are doubly stochastic then, Πi(t) is doubly stochastic for all exactly the set of m × m doubly stochastic matrices. t ≥ 0. A. Algorithm Next, we show that in the noiseless case Protocol (13) with a distinguished vertex converges to a consistent solution Based on the previous discussion, we propose the follow- under mild coniditions on the sensor graph. To prove this, ing update rule: we need the following lemma: 1 X  Lemma 5.2: Given a digraph G that contains a rooted-out Πi(t + 1) = Πi(t) + Πe ijΠj(t) (13) |Ni| + 1 branching tree, we have that the matrix F (G) as defined j∈Ni T T T in (5) satisfies: (a) ρ(F (G)) = 1, (b) 1 is an algebraically Let Π = [Π1 ,..., Πn ] . The update rule (13) can be simple√ eigenvalue of F (G) with corresponding eigenvector collectively written as 1/ n and (c) the limit F (G) exists and is given by Π(t + 1) = F (D)Π(t) (14) lim F (G)k = 1cT , (19) k→∞ where D is the data association graph and F (D) is defined T in (5). The convergence properties of the update rule (13) de- where c ≥ 0 and c 1 = 1. k A proof of lemma 5.2 can be found in Appendix IX-A. Next, pend solely on limk→∞ F (D) and Π(0). These properties will be presented in Section V-B. we present the theorem of convergence of Protocol 13 to a After the convergence of Protocol (13), we discretize the consistent labeling from arbitrary initialization. A proof of solution by solving the assignment problem, formally defined theorem 5.3 is presented in Appendix IX-B and heavily relies by on lemma 5.2. Theorem 5.3: If the sensor graph G contains a rooted- maximize hΠe i, Πii Πi (15) out branching and the pairwise associations are noiseless, subject to Πi ∈ Sm then the consensus protocol with the distinguished vertex converges to a set of consistent labels up to a global where Π is the result of Protocol (13). The assignment e i permutation, that is problem can be solved using the Hungarian algorithm [22] 3 in O(m ) time. lim Πi(t) = Πi0Π0 (20) t→∞ Remark 2: The update rule (13) can be seen as a discrete- where {Π }n is a consistent set of labels (with respect time system with state Πi and control input Ui, i.e. i0 i=1 to Πe ij) and Π0 is an arbitrary permutation. Furthermore, if Πi(t + 1) = Πi(t) + Ui(t) (16) T vertex 1 is the distinguished vertex, then Π0 = Π10Π1(0). 1 X  However, in the general case, pairwise associations will Ui(t) = Πe ijΠj(t) − Πi(t) (17) |Ni| + 1 contain noise. Therefore, we should prove convergence with- j∈N i out assuming perfect associations. We generalize lemma 5.2 n n Remark 3: (Intrinsic ambiguity) Let {Πi}i=1 ∈ (Sm) for the case of interest. Consider a digraph D with n vertices be a set of consistent labels. Then, for any Π0 ∈ Sm, and m < n distinguished vertices that have only outgoing n {ΠiΠ0}i=1 are consistent as well. This intrinsic ambiguity is edges. Assume that for every non-distinguished vertex there analogous to the ambiguity of the global reference frame in is a (directed) path from at least one distinguished vertex. rotation localization [30]. To remove this ambiguity, one can Then, the succeeding lemma provides us with the limit of fix the label of one sensor, say the first sensor, by modifying F (D). the sensor graph G so that the first sensor has only outgoing Lemma 5.4: The matrix F (D) as defined in (5) satisfies: edges resulting in Π1(t) = Π1(0) ∈ Sm for all t ≥ 0. This (a) ρ(F (D)) = 1, (b) F (D) has 1 as an eigenvalue with approach is necessary in the presence of noise and we will algebraic and geometric multiplicities equal to the number refer to it as Protocol (13) with a distinguished vertex. of distinguished vertices, (c) it asymptotically converges to Remark 4: (Equivalence to consensus) In the noiseless a limit of the form Π = Π Π−1   case, we can write e ij i0 j0 for some consistent set k Im 0 of labels {Π }n ∈ (S )n . By defining new variables lim F (D) = (21) i0 i=1 m k→∞ F21 0 0 . −1 e Πi = Πi0 Πi, we obtain A proof of lemma 5.4 is presented in Appendix IX-C. An immediate consequence of lemma 5.4 is the following 0 1 0 X 0  Πi(t + 1) = Πi(t) + Πj(t) (18) theorem. |Ni| + 1 j∈Ni Theorem 5.5: If the sensor graph G contains a rooted-out which is a component-wise Vicsek model [17], [32]. branching, then the consensus protocol with a distinguished . 1 Pn T vertex asymptotically converges to a solution that depends Z = n i=1 Zi = R R. Based on this observation, if all only on {Πe ij}(i,j)∈E and on the initial value of the labels of sensors know n and solve a consensus problem for com- the distinguished vertex. puting the average Z = RT R, then the orthogonalization Remark 5: Although Protocol (13) is not guaranteed to step can be performed locally using Cholesky decomposition. converge to the true solution in the presence of noise, it is This is summarized in Algorithm 2 where Zi denotes the Pn T verified experimentally (see Section VII) that for moderate estimate of Z by sensor i. Alternatively, the sum i=1 Yi Yi levels of noise the true solution is still recovered. can be computed using the Push-Sum algorithm [19] as in [20]. VI. DISTRIBUTED ORTHOGONAL ITERATION Remark 6: Kempe and McSherry [20] have shown that if In this section, we present a decentralized implementation the error in the computation of Ri can be made arbitrarily of the spectral method proposed by Pachauri et al. [28]. First, small in finite iterations, then Distributed Orthogonal Iter- we briefly review the main concept of [28]. Based on the ation converges asymptotically an invariant subspace of the discussion in Section IV, the multi-sensor data association input matrix. A rigorous analysis on the number of iterations can be cast as the following optimization problem: of the inner consensus is the subject of future work. X −1 minimize d(πeij, πi ◦ πj ) (22) π1,π2,...,πn∈Sm Algorithm 2 Distributed Orthogonal Iteration (i,j)∈E Require: Input associations {Π } , initial {Π (0)} which is equivalent to e ij (i,j)∈E i i∈V 1: for t = 0,...,N do X T 2: Y (t) = P Π Π (t) maximize tr(Πi Πe ijΠj) (23) i j∈Ni∪{i} e ij j Π1,Π2,...,Πn∈Sm (i,j)∈E 3: Estimate Zi(t) by consensus. In the above formulation, each Π is a permutation matrix 4: Ri(t) = chol(Zi(t)) i −1 which makes the problem computationally intractable. How- 5: Πi(t + 1) = Yi(t)Ri(t) ever, if the individual constraints are relaxed into a single col- 6: end for T lective constraint of the form Π Π = mIm as proposed in [28], the relaxed problem becomes computationally tractable Remark 7: Due to the nonlinear dependence of Zi on since it becomes a Rayleigh quotient problem: Yi, dynamic consensus algorithms [29] cannot be readily incorporated into the current approach. maximize tr(ΠT PΠ) Π (24) The solution obtained by the Orthogonal Iteration is opti- T subject to Π Π = mIm mal up to an arbitrary m×m orthogonal transformation. That is, if Π? is an optimal solution of (24), then any matrix of the where [P]ij = Πe ij. It is a well-known fact that the solution form Π?Q with Q ∈ O(m) is optimal as well. Finally, the of problem (24) is given by the matrix having as columns√ corrective orthogonal transformation is computed by solving the m leading eigenvectors of P scaled by a factor of m. the following Orthogonal Procrustes [11] problem: Numerically, the optimal solution in a centralized setting can 2 be computed using Orthogonal Iteration [11] (see Algorithm minimize kI − Π1QkF (25) Q∈O(m) 1). Intuitively, each iteration consists of two main steps: T a power step and an orthogonalization step based on QR- It is well known [11] that if Π1 = USV is an SVD of Π1, decomposition. For details on the convergence of Orthogonal then Q? = VU T . As a final step, we apply the corrective Iteration we refer the reader to [11] and references therein. transformation Q to all remaining Πi, i = 2, . . . , n and then find a discrete solution by the Hungarian algorithm. Algorithm 1 Orthogonal Iteration VII.EXPERIMENTS Require: Input matrix P, initial Π(0) 1: for t = 0,...,N do A. Synthetic data 2: Y(t) = PΠ(t) First, we experiment with synthetic data to validate the 3: Q(t)R(t) = Y(t) accuracy of proposed methods across different problem 4: Π(t + 1) = Q(t) settings. We generate instances with m = 50 targets and 5: end for n ∈ {20, 50, 100} sensors. As a performance criterion we use the accuracy of the obtained labels, which is simply defined The Orthogonal Iteration method is not readily decentral- as the percentage of correct labels. We vary the percentage izable since the QR-decomposition step requires information of erroneous associations in each pairwise association from from the entire sensor graph. Nevertheless, as observed 10% to 90% to test the robustness of each method against T T T by Kempe and McSherry [20], if Y = [Y1 ,...,Yn ] the presence of outliers. We first experiment with a fully where each sub-matrix Yi is available at sensor i, we obtain connected graph, see Fig. 3 (a)-(c), and then we remove half T T Pn T Pn T R R = Y Y = i=1 Yi Yi. If the sum i=1 Yi Yi can of the edges, see Fig. 3 (d)-(f). In the case of a fully con- be computed in a decentralized manner, then we can recover nected graph, the consensus protocol achieves exact recovery . T R(t) using Cholesky factorization. Let Zi = nYi Yi and of the labels for up to approximately 40% of outliers. The TABLE II spectral method clearly outperforms the consensus protocol GRAFFITI DATASETS RESULTS and is able to achieve exact recovery for up to 80%-90% Dataset Baseline Spectral Consensus of outliers. By reducing either the number of edges in the Graffiti 0.544 0.639 0.647 sensor graph or the number of sensors, the accuracy of both Wall 0.570 0.646 0.658 methods slightly drops. As expected, the benefit of using Trees 0.912 0.902 0.909 joint matching in accuracy increases with the size of the Leuven 0.962 0.972 0.979 Bikes 0.986 0.987 0.987 network and the number of edges. UBC 0.996 0.996 0.996 B. Real data First, we experiment with the CMU hotel sequence 2 which consists of 111 frames and the CMU house 3 sequence VIII.CONCLUSIONSANDFUTUREWORK which consists of 101 frames. We detect corner features In this paper, we proposed two fully decentralized methods [12] in the first image that remain visible across the entire for the problem of data association in sensor networks. sequence. The ground-truth associations are obtained by The first was a computationally inexpensive consensus-like tracking the features across the entire sequence. Specifically, algorithm and the second a decentralized spectral method. we used 89 points in the hotel sequence and 55 in the house We presented experimental results in the context of camera sequence. Pairwise matches are computed between an image sensor networks. We believe that the results of this work and at most 10 previous images and 10 subsequent images will find applications in other settings as well. In the future, in the sequence, thus the sensor graph is relatively sparse. we plan on generalizing these algorithms in order to handle Pairwise matches are obtained by extracting SIFT descriptors different number of targets in each sensor and time-varying [23] and then solving the assignment problem using the pairwise associations. Hungarian algorithm. We used the SIFT implementation of [31]. The baseline method consists of obtaining labels IX.APPENDIX by directly matching with the first image in the sequence. A. Proof of lemma 5.2 The ground-truth associations are extracted by tracking the features over the sequence. The results are presented in First, we need the following lemma: Table I. Both methods significantly outperform the baseline Lemma 9.1: Let Li(G) be the matrix obtained from the which shows that joint matching can significantly improve Laplacian L(G) by removing the ith row and ith column. the overall matching performance. Define Fi(G) analogously. Then,

TABLE I det(Li(G)) 6= 0 iff det(I − Fi(G)) 6= 0 (26) CMU DATASETS RESULTS Dataset Baseline Spectral Consensus Proof: By construction L(G) = (I + ∆(G))(I − F (G)) House 0.716 0.861 0.887 and thus, Li(G) = (I + ∆i(G))(I − Fi(G)) since ∆(G) is Hotel 0.666 0.872 0.878 diagonal. Therefore, since det(I +∆i(G)) > 0, we have that det(Li(G)) 6= 0 if and only if det(I − Fi(G)) 6= 0. Next, we experiment with the Affine Covariant Regions Now, we are ready to proceed to the proof of lemma 5.2 Datasets 4 which consist of sequences of 6 images with By Gersgorin’sˇ discs theorem, we know that the eigenvalues significant overlap but with viewpoint variability. We detect of F (G) lie in the unit circle and since it is stochastic, that is approximately 300 SIFT features [23] in each sequence F 1 = 1, it can be immediately concluded that ρ(F (G)) = 1. that are visible across the entire sequence. The ground- From Proposition 3.8 page 51 of [24], we have that if truth associations are extracted using the available ground- the (directed) graph G contains a rooted-out branching as a truth homographies. The results are presented in Table II. subgraph, then rank L(G) = n − 1. Combining this result Both methods significantly outperform the baseline in the with 9.1, we conclude that the existence of a rooted-out two challenging sequences Graffiti and Wall. However, in branching implies that rank(I−F (G)) = n−1. We conclude the remaining sequences the input pairwise associations are that 1 is a simple eigenvalue of F (G). already very accurate resulting in only improvements Let λ1, λ2, . . . , λn be an enumeration of the eigenvalues of −1 if any. It is worth noting that the spectral method and F (G) such that λ1 = 1. Moreover, let F (G) = PJ(Λ)P the consensus method have similar performance in all real be the Jordan decomposition of F (G). From, F (G)P = sequences despite the fact that the spectral method is com- PJ(Λ), it is easy to see that the the first column of P is in k T putationally more expensive. Therefore, in situations where the span of 1 and thus, limk→∞ F (G) = 1c . Moreover, the computational resources are limited, for instance swarms since F (G)k is stochastic for all positive integers k, it follows of UAVs with on-board cameras, the consensus algorithm that c ≥ 0 and cT 1 = 1. would be preferable. B. Proof of theorem 5.3 2 http://vasc.ri.cmu.edu//idb/html/motion/hotel/ 3http://vasc.ri.cmu.edu//idb/html/motion/house In the noiseless case, observe that 4http://www.robots.ox.ac.uk/˜vgg/data/data-aff. T html F (D) = Π0 (F (G) ⊗ Im) Π0 , (27) 1 CS 1 CS 1 CS SP SP SP

0.8 0.8 0.8

0.6 0.6 0.6 accuracy accuracy accuracy 0.4 0.4 0.4

0.2 0.2 0.2

0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Fraction of outliers Fraction of outliers Fraction of outliers (a) n = 20 (b) n = 50 (c) n = 100

1 CS 1 CS 1 CS SP SP SP

0.8 0.8 0.8

0.6 0.6 0.6 accuracy accuracy accuracy 0.4 0.4 0.4

0.2 0.2 0.2

0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Fraction of outliers Fraction of outliers Fraction of outliers (d) n = 20 (e) n = 50 (f) n = 100 Fig. 3. Accuracy of consensus protocol (CS) and spectral method (SP) versus percentage of outliers in pairwise data associations.

Fig. 4. Top: initial matches. Blue color corresponds to inlier matches whereas red color corresponds to outlier matches. Bottom: matches after the consensus algorithm.

where Π0 = diag(Π10,..., Πn0), each Πi0 ∈ Sm satisfies Under the assumption that G is contains a rooted-out branch- T Πi0Πj0 = Πe ij. ing and thus, null(L(G)) = span(1), we obtain Πi(∞) = Πi0Q where Q is an arbitrary matrix Q. Moreover, if one k ∞ T lim F (D) = Π0 (F (G) ⊗ Im) Π0 (28) enforces Π1(t) = Π1(0), Π1(0) ∈ Sm, for all t ≥ 0, it k→∞ T T follows that Q = Π10Π1(0 and Πi(∞) = Πi0Π10Π1(0), k that is equal to any valid labels up to a global permutation. since limk→∞ F (G) exists as we proved earlier. It follows that F (D)k converges to a limit as well. Any asymp-  totic solution of the consensus protocol satisfies Π(∞) = C. Proof of lemma 5.4 F (D)Π(∞). Then, for all i ∈ {1, . . . , n} we have Πi(∞) = (1/|N |) P Π Π (∞). Using Π = Π ΠT , we get Part (a) follows from the fact that F (G) is stochastic. i j∈Ni e ij j e ij i0 j0 Without loss of generality, assume that vertices {1, 2, . . . , m}  T  Π10Π1(∞) are the distinguished ones. Then, F (G) takes the form (L(G) ⊗ I )  .  = 0   m  .  Im 0 T F (G) = (29) Πn0Πn(∞) F21 F22 For all k = 1, 2,... we have by a simple induction: [21] Vladimir G Kim, Wilmot Li, Niloy J Mitra, Stephen DiVerdi, and   Thomas A Funkhouser. Exploring collections of 3d models using k Im 0 fuzzy correspondences. ACM Transactions on Graphics, 31(4):54, F (G) = k (30) 2012. F21,k F22 [22] Harold W Kuhn. The hungarian method for the assignment problem. k Naval research logistics quarterly, 2(1-2):83–97, 1955. where F21,1 = F21, F21,k+1 = F21,k + F F21 22 [23] David G Lowe. Distinctive image features from scale-invariant By the assumption that every non-distinguished vertex is keypoints. International journal of computer vision, 60(2):91–110, path-connected to some distinguished vertex, it follows that 2004. [24] Mehran Mesbahi and Magnus Egerstedt. Graph theoretic methods in for k > 0 large enough, each row of F21,k contains at least k multiagent networks. Princeton University Press, 2010. one positive entry. Thus, all eigenvalues of F22 lie strictly [25] Jose´ Neira and Juan D Tardos.´ Data association in stochastic mapping k inside the unit circle and thus, limk→∞ F22 = 0.  using the joint compatibility test. IEEE Transactions on robotics and automation, 17(6):890–897, 2001. REFERENCES [26] Andy Nguyen, Mirela Ben-Chen, Katarzyna Welnicka, Yinyu Ye, and Leonidas Guibas. An optimization approach to improving collections [1] Rosario Aragues, Eduardo Montijano, and Carlos Sagues. Consistent of shape maps. Computer Graphics Forum, 30(5):1481–1491, 2011. data association in multi-robot systems with limited communications. [27] Reza Olfati-Saber, J Alex Fax, and Richard M Murray. Consensus In Robotics: Science and Systems, pages 97–104, 2011. and cooperation in networked multi-agent systems. Proceedings of [2] Søren Asmussen. Applied probability and queues, volume 51. Springer the IEEE, 95(1):215–233, 2007. Science & Business Media, 2008. [28] Deepti Pachauri, Risi Kondor, and Vikas Singh. Solving the multi- [3] Tim Bailey and Eduardo Nebot. Localisation in large-scale environ- way matching problem by permutation synchronization. In Advances ments. Robotics and autonomous systems, 37(4):261–281, 2001. in neural information processing systems, pages 1860–1868, 2013. [4] Dimitri P Bertsekas and John N Tsitsiklis. Parallel and distributed [29] Demetri P Spanos, Reza Olfati-Saber, and Richard M Murray. Dy- computation: numerical methods, volume 23. Prentice hall Englewood namic consensus on mobile networks. In IFAC world congress, pages Cliffs, NJ, 1989. 1–6. Prague Czech Republic, 2005. [5] Paul J Besl and Neil D McKay. Method for registration of 3-d shapes. [30] Roberto Tron and Rene Vidal. Distributed 3-d localization of camera In Robotics-DL tentative, pages 586–606. International Society for sensor networks from 2-d image measurements. Automatic Control, Optics and Photonics, 1992. IEEE Transactions on, 59(12):3325–3340, 2014. [6] Yuxin Chen, Leonidas Guibas, and Qixing Huang. Near-optimal joint [31] Andrea Vedaldi and Brian Fulkerson. Vlfeat: An open and portable object matching via convex relaxation. In International Conference library of computer vision algorithms. In Proceedings of the 18th ACM on Machine Learning, 2014. international conference on Multimedia, pages 1469–1472. ACM, [7] Alexander Cunningham, Kai M Wurm, Wolfram Burgard, and Frank 2010. Dellaert. Fully distributed scalable smoothing and mapping with robust [32] Tamas´ Vicsek, Andras´ Czirok,´ Eshel Ben-Jacob, Inon Cohen, and Ofer multi-robot data association. In Robotics and Automation (ICRA), 2012 Shochet. Novel type of phase transition in a system of self-driven IEEE International Conference on, pages 1093–1100. IEEE, 2012. particles. Physical review letters, 75(6):1226, 1995. [8] Martin A Fischler and Robert C Bolles. Random sample consensus: [33] B-N Vo, Sumeetpal Singh, and Arnaud Doucet. Sequential monte a paradigm for model fitting with applications to image analysis and carlo methods for multitarget filtering with random finite sets. IEEE automated cartography. Communications of the ACM, 24(6):381–395, Transactions on Aerospace and electronic systems, 41(4):1224–1245, 1981. 2005. [9] Dieter Fox, Jonathan Ko, Kurt Konolige, Benson Limketkai, Dirk [34] Stefan B Williams, Gamini Dissanayake, and Hugh Durrant-Whyte. Schulz, and Benjamin Stewart. Distributed multirobot exploration and Towards multi-vehicle simultaneous localisation and mapping. In mapping. Proceedings of the IEEE, 94(7):1325–1339, 2006. Robotics and Automation, 2002. Proceedings. ICRA’02. IEEE Inter- [10] Chris Godsil and Gordon F Royle. Algebraic graph theory, volume national Conference on, volume 3, pages 2743–2748. IEEE, 2002. 207. Springer Science & Business Media, 2013. [35] Junchi Yan, Minsu Cho, Hongyuan Zha, Xiaokang Yang, and [11] Gene H Golub and Charles F Van Loan. Matrix computations, Stephen M Chu. Multi-graph matching via affinity optimization with volume 3. JHU Press, 2012. graduated consistency regularization. IEEE transactions on pattern [12] Chris Harris and Mike Stephens. A combined corner and edge detector. analysis and machine intelligence, 38(6):1228–1242, 2016. In Alvey vision conference, volume 15, page 50. Citeseer, 1988. [36] Junchi Yan, Jun Wang, Hongyuan Zha, Xiaokang Yang, and Stephen [13] Roger A Horn and Charles R Johnson. Matrix analysis. Cambridge Chu. Consistency-driven alternating optimization for multigraph university press, 2012. matching: A unified approach. IEEE Transactions on Image Process- [14] Qi-Xing Huang and Leonidas Guibas. Consistent shape maps via ing, 24(3):994–1009, 2015. semidefinite programming. Computer Graphics Forum, 32(5):177– [37] Christopher Zach, Manfred Klopschitz, and Manfred Pollefeys. Dis- 186, 2013. ambiguating visual relations using loop constraints. In IEEE Confer- [15] Qi-Xing Huang, Guo-Xin Zhang, Lin Gao, Shi-Min Hu, Adrian ence on Computer Vision and Pattern Recognition, 2010. Butscher, and Leonidas Guibas. An optimization approach for ex- [38] Tinghui Zhou, Yong Jae Lee, X Yu Stella, and Alexei A Efros. tracting and encoding consistent maps in a shape collection. ACM Flowweb: Joint image set alignment by weaving consistent, pixel-wise Transactions on Graphics, 31(6):167, 2012. correspondences. In IEEE Conference on Computer Vision and Pattern [16] Vadim Indelman, Erik Nelson, Nathan Michael, and Frank Dellaert. Recognition (CVPR), 2015. Multi-robot pose graph localization and data association from un- [39] Xiaowei Zhou, Menglong Zhu, and Kostas Daniilidis. Multi-image known initial relative poses via expectation maximization. In 2014 matching via fast alternating minimization. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), IEEE International Conference on Computer Vision, pages 4032–4040, pages 593–600. IEEE, 2014. 2015. [17] Ali Jadbabaie, Jie Lin, and A Stephen Morse. Coordination of groups [40] Xun S Zhou and Stergios I Roumeliotis. Multi-robot slam with of mobile autonomous agents using nearest neighbor rules. IEEE unknown initial correspondence: The robot rendezvous case. In 2006 Transactions on automatic control, 48(6):988–1001, 2003. IEEE/RSJ International Conference on Intelligent Robots and Systems, [18] Michael Kaess and Frank Dellaert. Covariance recovery from a square pages 1785–1792. IEEE, 2006. root information matrix for data association. Robotics and autonomous systems, 57(12):1198–1210, 2009. [19] David Kempe, Alin Dobra, and Johannes Gehrke. Gossip-based computation of aggregate information. In Foundations of Computer Science, 2003. Proceedings. 44th Annual IEEE Symposium on, pages 482–491. IEEE, 2003. [20] David Kempe and Frank McSherry. A decentralized algorithm for spectral analysis. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pages 561–568. ACM, 2004.