A Remark on Global Positioning from Local Distances

Amit Singer∗†

†Yale University, Department of Mathematics, New Haven, CT 06520

Submitted to Proceedings of the National Academy of Sciences of the United States of America

Finding the global positioning of points in Euclidean space from a local or partial set of pairwise distances is a problem in geometry that emerges naturally in sensor networks and in NMR spectroscopy of proteins. We observe that the eigenvectors of a certain sparse matrix exactly match the sought coordinates. This translates into a simple and efficient algorithm that is robust to noisy distance data.

distance geometry | sensor networks | multidimensional scaling | eigenvectors

∗To whom correspondence should be addressed. E-mail: [email protected]

Determining the configuration of N points r_i in R^p (e.g., p = 2, 3) given their noisy distance matrix ∆_ij is a long-standing problem. For example, the points r_i may represent coordinates of cities in the United States. Distances between nearby cities, such as New York and New Haven, are known, but no information is given about the distance between New York and Chicago. In general, for every city we are given distance measurements to its k ≪ N nearest neighbors, or to all neighbors that are at most ∆ away. The measurements ∆_ij of the true distances d_ij = ‖r_i − r_j‖ may also incorporate errors. The problem is to find the unknown coordinates r_i = (x_i, y_i) of all cities from the noisy local distances.

Even in the absence of noise the problem is solvable only if there are enough distance constraints. By solvable we mean that there is a unique set of coordinates satisfying the given distance constraints, up to rigid transformations (rotations, translations and reflection). Equivalently, the problem is solvable only if the underlying graph, consisting of the N cities as vertices and the distance constraints as edges, is rigid in R^p (see, e.g., [8]). Graph rigidity has drawn a lot of attention over the years; see [8, 9, 10] for some results in this area.

When all \binom{N}{2} pairwise distances are known, the corresponding complete graph is rigid, and the coordinates can be computed using a classical method known as multidimensional scaling (MDS) [13, 14, 15]. The underlying principle of MDS is to use the law of cosines to convert distances into an inner-product matrix, whose eigenvectors are the sought coordinates. The eigenvectors computed by MDS are also the solution to a specific minimization problem. However, when most of the distance matrix entries are missing, this minimization becomes significantly more challenging, because the rank-p constraint is not convex. Various solutions to this constrained optimization problem have been proposed, including relaxation of the rank constraint by semidefinite programming (SDP) [16], regularization [17, 18], smart initialization [19], expectation maximization [20] and other relaxation schemes [7]. Recently, the eigenvectors of the graph Laplacian [6] were used to approximate the large-scale SDP by a much smaller one, noting that linear combinations of only a few Laplacian eigenvectors can approximate the coordinates well.

The problem of finding the global coordinates from noisy local distances arises naturally in the localization of sensor networks [1, 2, 3] and in protein structure determination from NMR spectroscopy [4, 5], where noisy distances between neighboring hydrogen atoms are inferred from the nuclear Overhauser effect (NOE). The theory and practice draw on many different areas, such as machine learning [6], optimization techniques [7] and rigidity theory [8, 9, 10, 11, 2, 12]. We find it practically impossible to give a fair account of all the relevant literature, and recommend that the interested reader consult the references within the works listed.

In this paper we describe locally rigid embedding (LRE), a simple and efficient algorithm for finding the global configuration from noisy local distances, under the simplifying assumption of local rigidity. We assume that every city, together with its neighboring cities, forms a rigid subgraph that can be embedded uniquely (up to a rigid transformation). That is, there are enough distance constraints among the neighboring cities to make the local structure rigid. The "pebble game" [27] is a fast algorithm for determining graph rigidity. Thus, given a data set of distances, the local rigidity assumption can be tested quickly by applying the pebble game algorithm N times, once for each city. In the summary section we discuss the restrictiveness of the local rigidity assumption (see [11, 2, 12] for conditions that ensure rigidity), as well as possible variants of LRE for when the assumption does not hold.

The essence of LRE is the observation that one can construct an N × N sparse weight matrix W with the property that the coordinate vectors span an eigenspace of W. The matrix is constructed in linear time and has only O(N) non-zero elements, which enables efficient computation of that small eigenspace. W is formed by preprocessing the local distance information: each city and its neighboring points are embedded locally (e.g., by MDS or SDP), which is possible by the assumption that every local neighborhood is rigid. Once the local coordinates are obtained, we calculate weights much like the locally linear embedding (LLE) recipe [21] and its multiple-weights (MLLE) modification [26]. Similar to recent dimensionality reduction methods [21, 22, 23, 24], the motif "think globally, fit locally" [25] is repeated here. This is yet another example of the usefulness of eigenfunctions of operators on data sets, and of their ability to integrate local information into a consistent global solution. In contrast to propagation algorithms, which start from some local embedding and incrementally embed additional points while accumulating errors, the global eigenvector computation of LRE takes all local information into account at once, which makes it robust to noise.

Locally Rigid Embedding

We start by embedding points locally. For every point r_i (i = 1, ..., N) we examine its k_i neighbors r_{i_1}, r_{i_2}, ..., r_{i_{k_i}} for which we have distance information, and we find a p-dimensional embedding of those k_i + 1 points. If the local (k_i + 1) × (k_i + 1) distance matrix is fully known, this embedding can be obtained using MDS; if some local distances are missing, the embedding can be found using SDP or other optimization methods. The assumption of local rigidity ensures that in the noise-free case such an embedding is unique up to a rigid transformation. Note that the neighbors are not required to be physically close: they can be any collection of k_i points whose distance constraints to r_i and to one another make the subgraph rigid. This locally rigid embedding gives local coordinates for r_i, r_{i_1}, ..., r_{i_{k_i}}, up to translation and rotation, and perhaps reflection.
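As a concrete illustration of this local embedding step, here is a minimal sketch of classical MDS in NumPy, assuming the full local distance matrix is available (the function name mds_embed is ours, not part of the paper):

    import numpy as np

    def mds_embed(D, p=2):
        """Classical MDS: given a full n x n matrix of pairwise distances D,
        return n points in R^p, up to a rigid transformation."""
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n      # double-centering matrix
        B = -0.5 * J @ (D ** 2) @ J              # inner-product (Gram) matrix
        vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
        top = np.argsort(vals)[::-1][:p]         # indices of the p largest
        return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

The rows of the returned array serve as the local coordinates of r_i and its neighbors in the weight computation that follows.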

We define k_i weights W_{i,i_j} that satisfy the following p + 1 linear constraints:

    \sum_{j=1}^{k_i} W_{i,i_j} r_{i_j} = r_i,    [1]

    \sum_{j=1}^{k_i} W_{i,i_j} = 1.    [2]

The vector equation (1) says that the point r_i is the center of mass of its k_i neighbors, and the scalar equation (2) says that the weights sum to one. Together they make the weights invariant to rigid transformations (translation, rotation and reflection) of the locally embedded points. Such weights exist whenever k_i ≥ p + 1; in fact, for k_i > p + 1 there are infinitely many ways to choose them. In practice we choose the least-squares solution, i.e., the one minimizing \sum_{j=1}^{k_i} W_{i,i_j}^2. This prevents individual weights from becoming too large and keeps the weights balanced in magnitude. We rewrite equations (1)-(2) in matrix notation as

    R_i w_i = b,

where b = (0, ..., 0, 1)^T ∈ R^{p+1}, w_i = (W_{i,i_1}, W_{i,i_2}, ..., W_{i,i_{k_i}})^T, and R_i is the (p + 1) × k_i matrix whose first p rows are the relative coordinates and whose last row is all ones:

    R_i = \begin{pmatrix}
        x_{i_1} - x_i & x_{i_2} - x_i & \cdots & x_{i_{k_i}} - x_i \\
        y_{i_1} - y_i & y_{i_2} - y_i & \cdots & y_{i_{k_i}} - y_i \\
        \vdots        & \vdots        &        & \vdots            \\
        1             & 1             & \cdots & 1
    \end{pmatrix}.

It is easily verified that the minimum-norm least-squares solution is

    w_i = R_i^T (R_i R_i^T)^{-1} b.
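A short sketch of this weight computation (our own illustration; the arguments are assumed to hold the local coordinates produced by the embedding step above):

    import numpy as np

    def lre_weights(center, neighbors):
        """Minimum-norm weights satisfying eqs. (1)-(2).
        center: (p,) local coordinates of r_i; neighbors: (k_i, p) local
        coordinates of its neighbors. Returns w of shape (k_i,)."""
        k, p = neighbors.shape
        R = np.vstack([(neighbors - center).T,   # p rows of relative coordinates
                       np.ones((1, k))])         # last row: all ones
        b = np.zeros(p + 1)
        b[-1] = 1.0                              # R_i w_i = b = (0, ..., 0, 1)^T
        w, *_ = np.linalg.lstsq(R, b, rcond=None)
        return w

For k_i > p + 1 the system is underdetermined, and lstsq returns precisely the minimum-norm solution chosen in the text.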

Note that some of the weights may be negative. In fact, negative weights are unavoidable for points on the boundary of the convex hull of the set.

The computed weights W_{i,i_j} (j = 1, ..., k_i) are assigned to the i'th row of W, while the other N − k_i entries of that row, including the diagonal element W_{i,i}, are set to zero. This procedure of locally rigid embedding is repeated for every i = 1, ..., N, each time filling another row of the weight matrix W. It therefore takes O(N) operations to construct W, which ends up a sparse matrix with \bar{k}N non-zero elements, where \bar{k} = \frac{1}{N} \sum_{i=1}^{N} k_i is the average degree of the graph (i.e., the average number of neighbors). Both the least-squares solution and the local embedding (MDS or SDP) are polynomial in k_i, so repeating them N times is also O(N).

Next, we examine the spectrum of W. Observe that the all-ones vector 1 = (1, 1, ..., 1)^T satisfies

    W 1 = 1    [3]

due to (2). That is, 1 is an eigenvector of W with eigenvalue λ = 1; we refer to 1 as the trivial eigenvector. If the locally rigid embeddings were free of any error, then the p coordinate vectors x = (x_1, x_2, ..., x_N)^T, y = (y_1, y_2, ..., y_N)^T, ... would also be eigenvectors of W with the same eigenvalue λ = 1:

    W x = x,  W y = y,  ...    [4]

This follows immediately from (1) by reading it one coordinate at a time:

    \sum_{j=1}^{N} W_{ij} x_j = x_i,  \sum_{j=1}^{N} W_{ij} y_j = y_i,  ...

Thus, the eigenvectors of W corresponding to λ = 1 give the sought coordinates, e.g., (x_i, y_i) for p = 2. It is remarkable that these eigenvectors recover the global coordinates x and y despite the fact that different points have local coordinate sets that are arbitrarily rotated, translated and perhaps reflected with respect to one another.

Note that W is not symmetric, so its spectrum may be complex. Although every row of W sums to one, eigenvalues with |λ| > 1 are not excluded, because W is not stochastic: it has negative entries. Therefore, we compute the eigenvectors whose eigenvalues are closest to one in the complex plane.
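The following toy sketch (ours, not from the paper) assembles the sparse W for a synthetic noiseless configuration, reusing the hypothetical helpers mds_embed and lre_weights sketched above, and verifies property (4) numerically:

    import numpy as np
    from scipy.sparse import lil_matrix
    from scipy.spatial.distance import cdist

    rng = np.random.default_rng(0)
    N, p, k = 200, 2, 8
    pts = rng.random((N, p))                     # ground-truth coordinates
    D = cdist(pts, pts)                          # full distances (demo only)

    W = lil_matrix((N, N))
    for i in range(N):
        nbrs = np.argsort(D[i])[1:k + 1]         # k nearest neighbors of i
        idx = np.concatenate(([i], nbrs))        # local cluster {i} U neighbors
        local = mds_embed(D[np.ix_(idx, idx)], p)  # (k+1) x p local coordinates
        W[i, nbrs] = lre_weights(local[0], local[1:])
    W = W.tocsr()

    print(np.abs(W @ pts - pts).max())           # ~0 up to roundoff: Wx = x, Wy = y

Because the weights are invariant to rigid transformations of each local embedding, the global coordinates satisfy (4) even though every neighborhood was embedded in its own arbitrary frame.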

A symmetric weight matrix S with spectral properties similar to those of W can be constructed with the same effort (due to D. A. Spielman, private communication). For each point r_i, compute an orthonormal basis \tilde{w}_{i,1}, ..., \tilde{w}_{i,k_i-p} ∈ R^{k_i+1} for the nullspace of the (p + 1) × (k_i + 1) matrix

    \tilde{R}_i = \begin{pmatrix}
        x_i    & x_{i_1} & x_{i_2} & \cdots & x_{i_{k_i}} \\
        y_i    & y_{i_1} & y_{i_2} & \cdots & y_{i_{k_i}} \\
        \vdots & \vdots  & \vdots  &        & \vdots      \\
        1      & 1       & 1       & \cdots & 1
    \end{pmatrix}.    [5]

In other words, we find \tilde{w}_{i,j} that satisfy

    \tilde{R}_i \tilde{w}_{i,j} = 0,  j = 1, ..., k_i − p,

and

    \langle \tilde{w}_{i,j_1}, \tilde{w}_{i,j_2} \rangle = \begin{cases} 1 & j_1 = j_2 \\ 0 & j_1 \neq j_2 \end{cases}.

The symmetric product S_i = \sum_{j=1}^{k_i-p} \tilde{w}_{i,j} \tilde{w}_{i,j}^T of the nullspace basis with itself is a (k_i + 1) × (k_i + 1) positive semidefinite (PSD) matrix. We sum the N PSD matrices S_i (i = 1, ..., N) to form an N × N PSD matrix S, updating for each i only the corresponding (k_i + 1) × (k_i + 1) block of the large matrix:

    S = \sum_{i=1}^{N} \sum_{j=1}^{k_i-p} \hat{w}_{i,j} \hat{w}_{i,j}^T,    [6]

where \hat{w}_{i,j} ∈ R^N is obtained from \tilde{w}_{i,j} by zero padding. As in the non-symmetric case, it can be verified that the p + 1 vectors 1, x, y, ... belong to the nullspace of S:

    S 1 = 0,  S x = 0,  S y = 0,  ...

In fact, since the global graph of the N points is rigid, any vector in the nullspace of S is a linear combination of those p + 1 vectors. (The earlier construction of W may not enjoy this nice property; it is possible to construct examples in which the dimension of the λ = 1 eigenspace of W exceeds p + 1.) In rigidity theory, S would be called a stress of the framework [8].

From the computational point of view, we remark that, if needed, the eigenvector computation can be carried out in parallel: to multiply a given vector by W (or S), the storage of W (or S) is distributed among the N points, such that the i'th point stores only its k_i values of W (or S) together with the k_i entries of the given vector that correspond to its neighbors.
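A sketch of this symmetric construction under the same assumptions as before, reading the nullspace bases off the SVD of each \tilde{R}_i (names and the dense accumulation are our simplifications):

    import numpy as np

    def stress_matrix(N, clusters, embeddings, p=2):
        """Assemble the PSD stress matrix S of eq. [6].
        clusters[i]:   global indices of {i} U its neighbors;
        embeddings[i]: their (k_i + 1) x p local coordinates.
        Dense here for clarity; in practice S is kept sparse."""
        S = np.zeros((N, N))
        for idx, local in zip(clusters, embeddings):
            Rt = np.vstack([local.T, np.ones(len(idx))])  # (p+1) x (k_i+1), eq. [5]
            _, _, Vt = np.linalg.svd(Rt)
            null = Vt[p + 1:]          # orthonormal nullspace rows (generic rank p+1)
            S[np.ix_(idx, idx)] += null.T @ null          # PSD block update
        return S

On noiseless synthetic data one can check that S @ np.ones(N), S @ x and S @ y all vanish to machine precision, matching S1 = 0, Sx = 0, Sy = 0.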

Multiplicity and Noise

The eigenvalue λ = 1 of W (or 0 of S) is degenerate with multiplicity p + 1. Hence, the corresponding p + 1 computed eigenvectors φ^0, φ^1, ..., φ^p may be any linear combination of 1, x, y, .... We may assume that φ^0 = 1 and that ⟨φ^j, 1⟩ = 0 for j = 1, ..., p. We therefore look for a p × p matrix A that linearly maps the eigenmap φ_i = (φ_i^1, φ_i^2, ...) back to the original coordinate set r_i = (x_i, y_i, ...):

    r_i = A φ_i,  for i = 1, ..., N.

The squared distance between r_i and r_j is

    d_{ij}^2 = ‖r_i − r_j‖^2 = (φ_i − φ_j)^T A^T A (φ_i − φ_j).    [7]

Every pair of points for which the distance d_ij is known gives a linear equation for the entries of the matrix A^T A. Solving that set of equations yields A^T A, whose Cholesky decomposition gives A (up to an orthogonal transformation).

The locally rigid embedding incorporates errors when the distances are noisy. The noise breaks the degeneracy and splits the eigenvalues. If the noise level is small, we still expect the first p non-trivial eigenvectors to match the original coordinates, because small noise does not overcome the spectral gap

    \min_{\lambda_i \neq 1} |1 − \lambda_i|,

where λ_i are the eigenvalues of the noiseless W. For higher noise levels, some of the first p eigenvalues may cross the spectral gap, causing the coordinate vectors to spill over into the remaining eigenvectors. To overcome this problem, we allow A to be p × m instead of p × p, with m > p but still m ≪ N. In other words, every coordinate is approximated as a linear combination of the m non-trivial eigenvectors φ^1, ..., φ^m whose eigenvalues are closest to one. We replace equation (7) by a minimization problem for the m × m PSD matrix P = A^T A:

    \min_P \sum_{i \sim j} \left[ (φ_i − φ_j)^T P (φ_i − φ_j) − Δ_{ij}^2 \right]^2,    [8]

where the sum runs over all pairs i ∼ j for which there exists a distance measurement Δ_ij. The plain least-squares solution for P is often not valid: it may fail to be PSD, or its rank may exceed p. It was shown in [6] that the optimization problem (8) with the PSD constraint P ⪰ 0 can be formulated as an SDP using the Schur complement lemma. The matrix A is then reconstructed from the singular vectors of P corresponding to its largest p singular values.
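The paper solves (8) as an SDP with the PSD constraint. Purely as an illustration of why each measured pair contributes one linear equation in the entries of P, here is the unconstrained least-squares variant (our simplification; it omits P ⪰ 0):

    import numpy as np

    def fit_P_least_squares(phi, pairs, deltas):
        """Unconstrained least-squares fit of the m x m symmetric P in eq. [8].
        phi: N x m non-trivial eigenvectors; pairs: list of (i, j); deltas:
        measured distances. The paper instead adds P >= 0 and solves an SDP."""
        m = phi.shape[1]
        rows, rhs = [], []
        for (i, j), d in zip(pairs, deltas):
            u = phi[i] - phi[j]
            rows.append(np.outer(u, u).ravel())  # the residual is linear in vec(P)
            rhs.append(d ** 2)
        vecP, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
        P = vecP.reshape(m, m)
        return 0.5 * (P + P.T)                   # symmetrize

A is then recovered from the top-p eigenpairs of P, as described above: if P = UΣU^T, take A = Σ_p^{1/2} U_p^T and set r_i = A φ_i.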
Numerical Results

We consider a data set of N = 1097 cities in the United States; see Figure 1 (top left). For each city we have distance measurements to its 18 nearest cities (k_i = \bar{k} = 18). For simplicity, we assume that the \binom{18}{2} distances among the neighboring cities are also collected, so that MDS can be used to embed each neighborhood locally. We tested the locally rigid embedding algorithm at three levels of multiplicative Gaussian noise: no noise, 1% noise, and 10% noise. That is, Δ_{ij} = d_{ij} (1 + ε_{ij}) with ε_{ij} ~ N(0, δ^2) and δ = 0, 0.01, 0.1. We used SeDuMi [28] to solve the SDP (8).

The behavior of the numerical spectrum at the different noise levels is illustrated in Figure 2, where the magnitude of 1 − λ_i is plotted for the ten eigenvalues closest to one. In the noiseless case, λ = 1 is clearly degenerate with multiplicity 3, corresponding to 1, x, y. Note that the spectral gap is quite small. LRE finds the exact configuration without introducing any errors, as can be seen in Figure 1 (top right).

Adding 1% noise breaks the degeneracy and splits the eigenvalues; due to the small spectral gap, the coordinates spill over into the fourth eigenvector. For that noise level we therefore solved the minimization problem (8) with P a 3 × 3 matrix (m = 3). The largest singular values of P were 101.7, 42.5, and 0.13, indicating a successful two-dimensional embedding.

When the noise is increased further, the spectrum changes even more. We used m = 7 for the 10% noise level. The largest singular values of the optimal P were 76.6, 8.3, 2.0, and 0.002, which implies that the added noise effectively raised the dimension of the points from two to three. LRE distorts the boundary points, but does a good job in the interior.

Algorithm

In this section we summarize the steps of LRE for finding the global coordinates from a data set of vertices and distances, using the symmetric matrix S.

1. Input: a weighted graph G = (V, E, Δ), where Δ_ij is the measured distance between vertices i and j, and |V| = N is the number of vertices. G is assumed to be locally rigid.

2. Allocate memory for a sparse N × N matrix S.

3. For i = 1 to N:

   (a) Set k_i to be the degree of vertex i.

   (b) Let i_1, ..., i_{k_i} be the neighbors of i: (i, i_j) ∈ E for j = 1, ..., k_i.

   (c) Use SDP or MDS to find an embedding r_i, r_{i_1}, ..., r_{i_{k_i}} of i, i_1, ..., i_{k_i}.

   (d) Form the (p + 1) × (k_i + 1) matrix \tilde{R}_i following eq. (5).

   (e) Compute an orthonormal basis \tilde{w}_{i,1}, ..., \tilde{w}_{i,k_i−p} for the nullspace of \tilde{R}_i.

   (f) Update S ← S + \sum_{j=1}^{k_i−p} \tilde{w}_{i,j} \tilde{w}_{i,j}^T.

4. Compute the m + 1 eigenvectors of S whose eigenvalues are closest to 0: φ^0, φ^1, ..., φ^m, where φ^0 is the all-ones vector (a sketch of this step follows the list).

5. Use SDP as in eq. (8) to find the coordinate vectors x^1, ..., x^p as linear combinations of φ^1, ..., φ^m.
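Step 4 calls for the eigenvectors of the large sparse S with eigenvalues nearest zero. A minimal sketch using SciPy's shift-invert mode (our choice of solver; the paper does not prescribe one):

    from scipy.sparse import csr_matrix
    from scipy.sparse.linalg import eigsh

    def smallest_eigenvectors(S, m):
        """Eigenvectors of the sparse PSD matrix S with eigenvalues closest to 0.
        A tiny negative shift avoids factorizing the exactly singular S."""
        vals, vecs = eigsh(csr_matrix(S), k=m + 1, sigma=-1e-6, which='LM')
        return vals, vecs   # columns: phi^0 (all-ones direction), phi^1, ..., phi^m

Because S has only O(N) non-zero entries, each iteration of the underlying Lanczos solver costs O(N), which is what makes the global eigenvector computation efficient.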

Summary and Discussion

We presented an observation that leads to a simple algorithm for finding the global configuration of points from their noisy local distances under the assumption of local rigidity. The LRE algorithm that uses the matrix W is similar to the LLE algorithm [21] for dimensionality reduction. The input to LLE consists of high-dimensional data points, for which LLE constructs the weight matrix W by repeatedly solving overdetermined systems of linear equations. In the global positioning problem the input is not high-dimensional data points, but rather distance constraints that we preprocess (using MDS or SDP) to obtain a locally rigid low-dimensional embedding. The weights are therefore obtained by solving the underdetermined system (1)-(2) instead of an overdetermined one. The LRE variant that uses the symmetric matrix S is similar to the MLLE algorithm [26] in the sense that it uses more than a single weight vector, taking all basis vectors of the nullspace into account in the construction of S.

The assumption of local rigidity of the input graph is crucial for the algorithm to succeed. It is reasonable to ask whether this assumption holds in practice, and how the algorithm can be modified to prevail even when it fails. These are important research questions that we will attempt to address in a future publication. Early numerical experiments with globally rigid two-dimensional data sets show that even if a few vertices have corresponding subgraphs that are not rigid (non-localizable), so that their weights are not included in the construction of the stress matrix S, the matrix S still has a nullspace of dimension three. This can happen when the non-localizable vertices have enough localizable neighbors. Still, the algorithm fails when there are too many non-localizable vertices. For such non-localizable vertices, one possible approach is to consider larger subgraphs, such as the one containing all 2-hop neighbors (the neighbors of the neighbors), in the hope that the larger subgraph is rigid. A different approach is to consider smaller subgraphs, in the hope that removing some neighbors makes the subgraph rigid. This is clearly possible if the vertex is included in a triangle (or some other small, simple rigid subgraph), but at the risk that the small subgraph is too small to be rigidly connected to the remaining graph. Which approach works better is not clear and requires numerical experiments; however, this is beyond the scope of the current paper.

There are many other interesting questions that follow from both the theoretical and the algorithmic aspects, such as proving error bounds, finding other suitable operators for curves and surfaces, and dealing with specific constraints that arise from real-life data sets.

I would like to thank Ronald R. Coifman, who inspired this work. The center of mass operator emerged from my joint work with Yoel Shkolnisky on cryo-EM tomography. Dan Spielman is acknowledged for designing the symmetrized weight matrix. I would also like to thank Jia Fang, Yosi Keller, Yuval Kluger and Richard Yang for valuable discussions on sensor networks, NMR spectroscopy and rigidity. Finally, thanks are due to Rob Kirby for making valuable suggestions that improved the clarity of the manuscript. Research supported by NGA NURI 2006.

1. Biswas, P., Liang, T. C., Toh, K. C., Wang, T. C., and Ye, Y. (2006) IEEE Transactions on Automation Science and Engineering, 3(4):360–371.
2. Aspnes, J., Eren, T., Goldenberg, D. K., Morse, A. S., Whiteley, W., Yang, Y. R., Anderson, B. D. O., and Belhumeur, P. N. (2006) IEEE Transactions on Mobile Computing, 5(12):1663–1678.
3. Ji, X. and Zha, H. (2004) In INFOCOM 2004, 4:2652–2661.
4. Crippen, G. M. and Havel, T. F. (1988) Distance Geometry and Molecular Conformation. John Wiley & Sons.
5. Berger, B., Kleinberg, J., and Leighton, F. T. (1999) Journal of the ACM, 47(2):212–235.
6. Weinberger, K. Q., Sha, F., Zhu, Q., and Saul, L. K. (2007) In Schölkopf, B., Platt, J., and Hofmann, T., editors, Advances in Neural Information Processing Systems (NIPS) 19. MIT Press, Cambridge, MA.
7. Moré, J. J. and Wu, Z. (1997) SIAM Journal on Optimization, 7(3):814–836.
8. Roth, B. (1981) The American Mathematical Monthly, 88(1):6–21.
9. Hendrickson, B. (1995) SIAM Journal on Optimization, 5(4):835–857.
10. Hendrickson, B. (1992) SIAM Journal on Computing, 21(1):65–84.
11. Anderson, B. D. O., Belhumeur, P. N., Eren, T., Goldenberg, D. K., Morse, A. S., Whiteley, W., and Yang, Y. R. (2007) In ACM Baltzer Wireless Networks (WINET), Springer.
12. Aspnes, J., Goldenberg, D. K., and Yang, Y. R. (2004) In Proceedings of Algorithmic Aspects of Wireless Sensor Networks: First International Workshop (ALGOSENSORS), Lecture Notes in Computer Science 3121, Springer-Verlag, 32–44.
13. Schoenberg, I. J. (1935) Annals of Mathematics, 36:724–732.
14. Young, G. and Householder, A. S. (1938) Psychometrika, 3:19–22.
15. Cox, T. F. and Cox, M. A. A. (2001) Multidimensional Scaling. Monographs on Statistics and Applied Probability 88. Chapman & Hall/CRC.
16. Vandenberghe, L. and Boyd, S. (1996) SIAM Review, 38(1):49–95.
17. Wright, S. J., Wahba, G., Lu, F., and Keles, S. (2005) Proceedings of the National Academy of Sciences, 102(35):12332–12335.
18. Sun, J., Boyd, S., Xiao, L., and Diaconis, P. (2006) SIAM Review, 48(4):681–699.
19. Gotsman, C. and Koren, Y. (2005) Journal of Graph Algorithms and Applications, 9(3):327–346.
20. Srebro, N. and Jaakkola, T. (2003) In Proceedings of the Twentieth International Conference on Machine Learning (ICML), 720–727.
21. Roweis, S. T. and Saul, L. K. (2000) Science, 290(5500):2323–2326.
22. Donoho, D. L. and Grimes, C. (2003) Proceedings of the National Academy of Sciences, 100(10):5591–5596.
23. Belkin, M. and Niyogi, P. (2003) Neural Computation, 15:1373–1396.
24. Coifman, R. R., Lafon, S., Lee, A. B., Maggioni, M., Nadler, B., Warner, F., and Zucker, S. W. (2005) Proceedings of the National Academy of Sciences, 102(21):7426–7431.
25. Saul, L. K. and Roweis, S. T. (2003) The Journal of Machine Learning Research, 4:119–155.
26. Zhang, Z. and Wang, J. (2007) In Schölkopf, B., Platt, J., and Hofmann, T., editors, Advances in Neural Information Processing Systems (NIPS) 19. MIT Press, Cambridge, MA, pp. 1593–1600.
27. Jacobs, D. J. and Hendrickson, B. (1997) Journal of Computational Physics, 137:346–365.
28. Sturm, J. F. (1999) Optimization Methods and Software, 11(1):625–653.

[Figure 1 here: four scatter plots — "US Cities" (axes x, y) and "Locally Rigid Embedding" under no noise, 1% noise, and 10% noise (axes φ1, φ2).]
Fig. 1: Top left: map of 1097 cities in the United States. Top right: the locally rigid embedding obtained from clean local distances. Bottom left: LRE with 1% noise. Bottom right: LRE with 10% noise. Cities are colored by their x coordinate.

[Figure 2 here: three bar plots of |1 − λi| for the ten eigenvalues closest to one.]

Fig. 2: Numerical spectrum of W for different levels of noise: clean distances (left), 1% noise (center), and 10% noise (right).
