Distance Geometry and Data Science

Leo Liberti, CNRS & Ecole Polytechnique [email protected]

Winter School in GCS @FieldsInstitute 210111-15

Most material from [L., TOP 28:271-339, 2020] http://www.lix.polytechnique.fr/~liberti/dgds.pdf

1 / 131 DG’s best known result

DG's best known result
  Proof of Heron's theorem
Metric spaces representability
  Missing distances
  Noisy distances
  Principal Component Analysis
  Summary: Isomap
Distance geometry problem
  Applications
  Complexity
  Number of solutions
MP formulations
  Formulations in position variables
  Formulations in matrix variables
  Diagonal dominance
  Dual DD
  Dimensional reduction
  Barvinok's Naive Algorithm
High-dimensional weirdness
  Random projections
  Distance instability
  Are these results related?
Graph embeddings for ANN
  A clustering task

2 / 131 A gem in Distance Geometry

I Heron's theorem
I Heron lived around year 0
I Hung out at Alexandria's library

A = √(s(s − a)(s − b)(s − c))

I A = area of a triangle with side lengths a, b, c
I s = (a + b + c)/2

Useful to measure areas of agricultural land

3 / 131 Subsection 1

Proof of Heron’s theorem

4 / 131 Heron's theorem: Proof [M. Edwards, high school student, 2007]

A. 2α + 2β + 2γ = 2π ⇒ α + β + γ = π

   r + ix = u e^{iα},   r + iy = v e^{iβ},   r + iz = w e^{iγ}

   ⇒ (r + ix)(r + iy)(r + iz) = (uvw) e^{i(α+β+γ)} = uvw e^{iπ} = −uvw ∈ R
   ⇒ Im((r + ix)(r + iy)(r + iz)) = 0
   ⇒ r²(x + y + z) = xyz ⇒ r = √(xyz/(x + y + z))

B. s = (a + b + c)/2 = x + y + z
   s − a = x + y + z − y − z = x
   s − b = x + y + z − x − z = y
   s − c = x + y + z − x − y = z

   A = (ra + rb + rc)/2 = r(a + b + c)/2 = rs = √(s(s − a)(s − b)(s − c))

5 / 131 Metric spaces representability


6 / 131 Representing metric spaces in Rn

I Given (X, d) with dist. matrix D = (dij), embed X in a vector space with the same dist. matrix
I Consider the i-th row xi = (di1, . . . , din) of D
I Embed i ∈ X by the vector U^D(i) = xi ∈ R^n
  i.e. define U^D : {1, . . . , n} → R^n s.t. U^D(i) = xi
I Thm.: (U^D, ℓ∞) is a metric space with dist. matrix D
  i.e. ∀i, j ≤ n   ‖xi − xj‖∞ = dij
I U^D is called the Universal Isometric Embedding (UIE)
  also known as Fréchet's embedding
I Practical issue: the embedding is high-dimensional (R^n)

[Kuratowski 1935]
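A minimal Python sketch of the Fréchet/Kuratowski embedding described above; the function name and the toy 3-point metric space are illustrative assumptions, not from the slides. Point i is mapped to the i-th row of D, and the ℓ∞ distances between rows reproduce D.

import numpy as np

def universal_isometric_embedding(D):
    """Map point i to the i-th row of the distance matrix D (Fréchet/Kuratowski)."""
    return np.asarray(D, dtype=float)  # row i is U^D(i)

# toy metric space on 3 points (any metric works)
D = np.array([[0., 2., 3.],
              [2., 0., 4.],
              [3., 4., 0.]])
X = universal_isometric_embedding(D)

# check: l_infinity distances between rows equal the original distances
for i in range(len(D)):
    for j in range(len(D)):
        assert abs(np.max(np.abs(X[i] - X[j])) - D[i, j]) < 1e-12
print("l_inf distances match D")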

7 / 131 Proof

I Consider i, j ∈ X with distance d(i, j) = dij
I Then

  ‖xi − xj‖∞ = max_{k≤n} |dik − djk| ≤ max_{k≤n} dij = dij

  the inequality ≤ above follows from the triangle inequalities in the metric space:

  dik ≤ dij + djk   ∧   djk ≤ dij + dik

  ⇒ dik − djk ≤ dij   ∧   djk − dik ≤ dij

  ⇒ |dik − djk| ≤ dij

  valid ∀k ≤ n, hence valid for the max over k

I max_{k≤n} |dik − djk| is achieved when k ∈ {i, j}

  ⇒ ‖xi − xj‖∞ = dij

8 / 131 Subsection 1

Missing distances

9 / 131 UIE from incomplete metrics

I If your metric space is missing some distances
I Get an incomplete distance matrix D
I Cannot define the vectors U^D(i) in the UIE
I Note: D defines a graph (on vertices 1, 2, 3, 4)

  D = \begin{pmatrix} 0 & 1 & √2 & 1 \\ 1 & 0 & 1 & ? \\ √2 & 1 & 0 & 1 \\ 1 & ? & 1 & 0 \end{pmatrix}

I Complete this graph with shortest paths: d24 = 2

10 / 131 Floyd-Warshall algorithm 1/2

I Given an n × n partial matrix D, computes all shortest path lengths
I For each triplet u, v, z of vertices in the graph, test: when going u → v, is it convenient to pass through z?

I If so, then change the path length

11 / 131 Floyd-Warshall algorithm 2/2

# initialization
for u ≤ n, v ≤ n do
  if duv = ? then
    duv ← ∞
  end if
end for
# main loop
for z ≤ n do
  for u ≤ n do
    for v ≤ n do
      if duv > duz + dzv then
        duv ← duz + dzv
      end if
    end for
  end for
end for
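A small Python version of the pseudocode above, assuming (as an illustration) that the partial matrix is a nested list with None marking missing entries; the function name is mine.

import math

def floyd_warshall_complete(D):
    """Complete a partial distance matrix (None = missing) with shortest-path lengths."""
    n = len(D)
    d = [[math.inf if D[u][v] is None else D[u][v] for v in range(n)] for u in range(n)]
    for z in range(n):            # intermediate vertex
        for u in range(n):
            for v in range(n):
                if d[u][v] > d[u][z] + d[z][v]:
                    d[u][v] = d[u][z] + d[z][v]
    return d

# example from the slides: d_{24} missing, completed to 2
D = [[0, 1, math.sqrt(2), 1],
     [1, 0, 1, None],
     [math.sqrt(2), 1, 0, 1],
     [1, None, 1, 0]]
print(floyd_warshall_complete(D)[1][3])  # 2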

12 / 131 Subsection 2

Noisy distances

13 / 131 Schoenberg’s theorem

I [I. Schoenberg, Remarks to Maurice Fréchet's article "Sur la définition axiomatique d'une classe d'espaces distanciés vectoriellement applicable sur l'espace de Hilbert", Ann. Math., 1935]
I Question: When is a given matrix a Euclidean Distance Matrix (EDM)?

Thm.
D = (dij) is an EDM iff the matrix ( (d²1i + d²1j − d²ij)/2 | 2 ≤ i, j ≤ n ) is PSD of rank K

14 / 131 Gram in function of EDM

I x = (x1, . . . , xn) ⊆ R^K, written as an n × K matrix
I the matrix G = x x^⊤ = (xi · xj) is the Gram matrix of x
  Lemma: G ⪰ 0, and each M ⪰ 0 is the Gram matrix of some x
I Useful variant of Schoenberg's theorem
  relates EDMs and Gram matrices:

  G = −½ J D² J   (⋆)

I where D² = (d²ij) and

  J = I − (1/n) 1 1^⊤ = \begin{pmatrix} 1 − 1/n & −1/n & ⋯ & −1/n \\ −1/n & 1 − 1/n & ⋯ & −1/n \\ ⋮ & ⋮ & ⋱ & ⋮ \\ −1/n & −1/n & ⋯ & 1 − 1/n \end{pmatrix}

15 / 131 Multidimensional scaling (MDS)

I Often get approximate EDMs D̃ from raw data (dissimilarities, discrepancies, differences)
I G̃ = −½ J D̃² J is an approximate Gram matrix
I Approximate Gram ⇒ the spectral decomposition P Λ̃ P^⊤ has Λ̃ ≱ 0
I Let Λ be the closest PSD diagonal matrix to Λ̃: zero the negative components of Λ̃
I x = P √Λ is an "approximate embedding" of D̃
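A compact numpy sketch of the classical MDS step just described (the function name and the toy distance matrix are illustrative): it forms G̃ = −½JD̃²J, zeroes the negative eigenvalues, and returns x = P√Λ.

import numpy as np

def classical_mds(D):
    """Approximate embedding from a (possibly noisy) distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    G = -0.5 * J @ (D ** 2) @ J                  # approximate Gram matrix
    lam, P = np.linalg.eigh(G)                   # spectral decomposition
    lam = np.clip(lam, 0, None)                  # zero the negative eigenvalues
    return P @ np.diag(np.sqrt(lam))             # x = P sqrt(Lambda)

# quick sanity check on an exact EDM of points on a line: 0, 1, 3
D = np.array([[0., 1., 3.], [1., 0., 2.], [3., 2., 0.]])
x = classical_mds(D)
print(np.round(np.linalg.norm(x[0] - x[2]), 6))  # ~3.0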

16 / 131 Classic MDS: Main result

1. Prove lemma: matrix is Gram if it is PSD 1 2 2. Prove that G = − 2 JD J

17 / 131 Proof of lemma

I Gram ⊆ PSD
I x is an n × K real matrix
I G = x x^⊤ its Gram matrix
I For each y ∈ R^n we have

  y G y^⊤ = y (x x^⊤) y^⊤ = (y x)(x^⊤ y^⊤) = (y x)(y x)^⊤ = ‖y x‖²₂ ≥ 0

I ⇒ G ⪰ 0
I PSD ⊆ Gram
I Let G ⪰ 0 be n × n
I Spectral decomposition: G = P Λ P^⊤ (P orthogonal, Λ ≥ 0 diagonal)
  Λ ≥ 0 ⇒ √Λ exists
I G = P Λ P^⊤ = (P √Λ)(√Λ^⊤ P^⊤) = (P √Λ)(P √Λ)^⊤
I Let x = P √Λ, then G is the Gram matrix of x

18 / 131 Schoenberg’s theorem proof (1/2)

I Assume zero centroid WLOG (can translate x as needed)
I Expand: d²ij = ‖xi − xj‖²₂ = (xi − xj)·(xi − xj) = xi·xi + xj·xj − 2 xi·xj   (∗)
I Aim at "inverting" (∗) to express xi·xj in function of d²ij
I Sum (∗) over i:   Σ_i d²ij = Σ_i xi·xi + n xj·xj − 2 xj·Σ_i xi
  where Σ_i xi = 0 by the zero centroid assumption
I Similarly for j; divide by n, get:

  (1/n) Σ_{i≤n} d²ij = (1/n) Σ_{i≤n} xi·xi + xj·xj   (†)

  (1/n) Σ_{j≤n} d²ij = xi·xi + (1/n) Σ_{j≤n} xj·xj   (‡)

I Sum (†) over j, get:

  (1/n) Σ_{i,j} d²ij = Σ_i xi·xi + Σ_j xj·xj = 2 Σ_i xi·xi

I Divide by n, get:

  (1/n²) Σ_{i,j} d²ij = (2/n) Σ_i xi·xi   (∗∗)

19 / 131 Schoenberg’s theorem proof (2/2)

I Rearrange (∗), (†), (‡) as follows:

  2 xi·xj = xi·xi + xj·xj − d²ij   (1)

  xi·xi = (1/n) Σ_j d²ij − (1/n) Σ_j xj·xj   (2)

  xj·xj = (1/n) Σ_i d²ij − (1/n) Σ_i xi·xi   (3)

I Replace the LHS of Eq. (2)-(3) in the RHS of Eq. (1), get

  2 xi·xj = (1/n) Σ_k d²ik + (1/n) Σ_k d²kj − d²ij − (2/n) Σ_k xk·xk

I By (∗∗) replace (2/n) Σ_i xi·xi with (1/n²) Σ_{i,j} d²ij, get

  2 xi·xj = (1/n) Σ_k (d²ik + d²kj) − d²ij − (1/n²) Σ_{h,k} d²hk   (⋆)

expressing xi·xj in function of D² (showing (⋆) ≡ (2G = −J D² J) is messy but easy)

20 / 131 Subsection 3

Principal Component Analysis

21 / 131 Principal Component Analysis (PCA)

I Given an approximate distance matrix D, find x = MDS(D)
I However, you want x = P √Λ in K dimensions, but rank(Λ) > K
I Only keep the K largest components of Λ, zero the rest
I Get an embedding in the desired (lower) dimension

22 / 131 Example 1/3 Mathematical genealogy skeleton

23 / 131 Example 2/3 A partial view

           Thibaut  Pfaff  Lagrange  Laplace  Möbius  Gudermann  Dirksen  Gauss  Kästner
Euler        10       1       1         9       8        2          2       2      2
Thibaut              11       9         1       3       10         12      12      8
Pfaff                         2        10      10        3          1       1      3
Lagrange                                8       8        1          3       3      1
Laplace                                         2        9         11      11      7
Möbius                                                   9         11      11      7
Gudermann                                                            4       4      2
Dirksen                                                                      2      4
Gauss                                                                               4

 0 10 1 1 9 8 2 2 2 2   10 0 11 9 1 3 10 12 12 8     1 11 0 2 10 10 3 1 1 3     1 9 2 0 8 8 1 3 3 1     9 1 10 8 0 2 9 11 11 7  D =    8 3 10 8 2 0 9 11 11 7     2 10 3 1 9 9 0 4 4 2     2 12 1 3 11 11 4 0 2 4   2 12 1 3 11 11 4 2 0 4  2 8 3 1 7 7 2 4 4 0

24 / 131 Example 3/3

(figures: the PCA embedding in 2D and in 3D)

25 / 131 Subsection 4

Summary: Isomap

26 / 131 Isomap for DG

1. Let D⁰ be the (square) weighted adjacency matrix of G
2. Complete D⁰ to an approximate EDM D̃
3. Perform PCA on D̃ given K dimensions:
   (a) Let B̃ = −(1/2) J D̃ J, where J = I − (1/n) 1 1^⊤
   (b) Find eigenvalues/vectors Λ, P so that B̃ = P^⊤ Λ P
   (c) Keep the ≤ K largest nonneg. eigenvalues of Λ to get Λ̃
   (d) Let x̃ = P^⊤ √Λ̃

Vary Step 2 to generate Isomap heuristics

27 / 131 Variants for Step 2

A. Floyd-Warshall all-shortest-paths algorithm on G (classic Isomap) B. Find a spanning tree (SPT) of G and compute a random realization in x¯ ∈ RK , use its sqEDM C. and more...

[Liberti et al., SEA 17]

28 / 131 Distance geometry problem


29 / 131 The Distance Geometry Problem (DGP)

Given K ∈ N and G = (V, E, d) with d : E → R₊, find x : V → R^K s.t.

  ∀{i, j} ∈ E   ‖xi − xj‖²₂ = d²ij

Defn. x is a realization of G in RK

Given a weighted graph , draw it so edges are drawn as segments

with lengths = weights

30 / 131 Subsection 1

Applications

31 / 131 Some applications

I clock synchronization (K = 1) I sensor network localization (K = 2) I molecular structure from distance data (K = 3) I autonomous underwater vehicles (K = 3) I distance matrix completion (whatever K) I finding graph embeddings

32 / 131 Clock synchronization

From [Singer, Appl. Comput. Harmon. Anal. 2011]

Determine a set of unknown timestamps from partial measurements of their time dierences

I K = 1 I V : timestamps I {u, v} ∈ E if known time diference between u, v I d: values of the time diferences

Used in time synchronization of distributed networks

33 / 131 Clock synchronization

34 / 131 Sensor network localization

From [Yemini, Proc. CDSN, 1978]

The positioning problem arises when it is necessary to locate a set of geographically distributed objects using measurements of the distances between some object pairs

I K = 2 I V : (mobile) sensors I {u, v} ∈ E if distance between u, v is measured I d: distance values

Used whenever GPS not viable (e.g. underwater) duv ∼∝ battery consumption in P2P communication betw. u, v

35 / 131 Sensor network localization

36 / 131 Molecular structure from distance data From [Liberti et al., SIAM Rev., 2014]

I K = 3 I V : atoms I {u, v} ∈ E if distance between u, v is known I d: distance values

Used whenever X-ray crystallography does not apply (e.g. liquid) Covalent bond lengths and angles known precisely Distances / 5.5 measured approximately by NMR

37 / 131 Graph embeddings

I Relational knowledge best represented by graphs I We have fast algorithms for clustering vectors I Task: represent a graph in Rn I “Graph embeddings” and “distance geometry”: almost synonyms I Used in Natural Language Processing (NLP) obtain “word vectors” & “concept vectors”

38 / 131 Subsection 2

Complexity

39 / 131 Complexity

I DGP1 with d : E → Q₊ is in NP
I if the instance is YES, ∃ realization x ∈ R^{n×1}
I if some component xi ∉ Q, translate x so that xi ∈ Q
I consider some other xj
I let ℓ = |shortest path p : i → j| = Σ_{{u,v}∈p} (−1)^{suv} duv ∈ Q for some suv ∈ {0, 1}
I then xj = xi ± ℓ → xj ∈ Q
I ⇒ verification of

  ∀{i, j} ∈ E   |xi − xj| = dij

  in polytime

I DGPK may not be in NP for K > 1
  don't know how to check ‖xi − xj‖₂ = dij in polytime for x ∉ Q^{nK}

40 / 131 Hardness

Partition is NP-hard
Given a = (a1, . . . , an) ∈ N^n, is there I ⊆ {1, . . . , n} s.t. Σ_{i∈I} ai = Σ_{i∉I} ai ?

I Reduce Partition to DGP1
I a −→ cycle C with V(C) = {1, . . . , n}, E(C) = {{1, 2}, . . . , {n, 1}}
I For i < n let d_{i,i+1} = ai, and d_{n,n+1} = d_{n1} = an
I E.g. for a = (1, 4, 1, 3, 3), get the cycle graph:

[Saxe, 1979] 41 / 131 Partition is YES ⇒ DGP1 is YES

I Given: I ⊂ {1, . . . , n} s.t. Σ_{i∈I} ai = Σ_{i∉I} ai
I Construct: realization x of C in R
  1. x1 = 0                                        // start
  2. induction step: suppose xi known
     if i ∈ I let x_{i+1} = xi + d_{i,i+1}          // go right
     else let x_{i+1} = xi − d_{i,i+1}              // go left
I Correctness proof: by the same induction
  but careful when i = n: have to show x_{n+1} = x1
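A short Python sketch of this construction, under the assumption that the Partition certificate I is given as a set of 0-based edge indices; it realizes the cycle on the real line and checks that it closes up.

def realize_cycle(a, I):
    """Given a Partition certificate I, realize the weighted cycle C on the real line."""
    n = len(a)
    x = [0.0]                                 # x_1 = 0
    for i in range(n):                        # edge {i, i+1} has weight a[i] (indices mod n)
        step = a[i] if i in I else -a[i]      # go right if i in I, left otherwise
        x.append(x[-1] + step)
    return x                                  # x[n] must equal x[0] for a valid realization

a = (1, 4, 1, 3, 3)                           # slide example; {1, 4, 1} and {3, 3} both sum to 6
x = realize_cycle(a, I={0, 1, 2})
print(x, "closes up:", abs(x[-1] - x[0]) < 1e-12)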

42 / 131 Partition is YES ⇒ DGP1 is YES

  (1) = Σ_{i∈I} (x_{i+1} − xi) = Σ_{i∈I} d_{i,i+1} = Σ_{i∈I} ai = Σ_{i∉I} ai = Σ_{i∉I} d_{i,i+1} = Σ_{i∉I} (xi − x_{i+1}) = (2)

  (1) = (2) ⇒ Σ_{i∈I} (x_{i+1} − xi) = Σ_{i∉I} (xi − x_{i+1}) ⇒ Σ_{i≤n} (x_{i+1} − xi) = 0

⇒ (xn+1 − xn) + (xn − xn−1) + ··· + (x3 − x2) + (x2 − x1) = 0

⇒ xn+1 = x1

43 / 131 Partition is NO ⇒ DGP1 is NO

I By contradiction: suppose DGP1 is YES, x realization of C

I F = {{u, v} ∈ E(C) | xu ≤ xv},   E(C) ∖ F = {{u, v} ∈ E(C) | xu > xv}
I Trace x1, . . . , xn: follow edges in F (→) and in E(C) ∖ F (←)

  Σ_{{u,v}∈F} (xv − xu) = Σ_{{u,v}∉F} (xu − xv)
  Σ_{{u,v}∈F} |xu − xv| = Σ_{{u,v}∉F} |xu − xv|
  Σ_{{u,v}∈F} duv = Σ_{{u,v}∉F} duv

I Let J = {i < n | {i, i + 1} ∈ F} ∪ {n | {n, 1} ∈ F}

  ⇒ Σ_{i∈J} ai = Σ_{i∉J} ai

I So J solves the Partition instance, contradiction

I ⇒ DGP is NP-hard, DGP1 is NP-complete

44 / 131 Subsection 3

Number of solutions

45 / 131 Number of solutions

I (G, K): DGP instance I X˜ ⊆ RKn: set of solutions I Congruence: composition of translations, rotations, re ections I C = set of congruences in RK I x ∼ y means ∃ρ ∈ C (y = ρx): distances in x are preserved in y through ρ I ⇒ if |X˜| > 0, |X˜| = 2ℵ0

46 / 131 Number of solutions modulo congruences

I Congruence is an equivalence relation ∼ on X˜ (re exive, symmetric, transitive)

I Partitions X˜ into equivalence classes I X = X/˜ ∼: sets of representatives of equivalence classes I Focus on |X| rather than |X˜|

47 / 131 Rigidity, flexibility and |X|

I infeasible ⇔ |X| = 0
I rigid graph ⇔ |X| < ℵ₀
I globally rigid graph ⇔ |X| = 1
I flexible graph ⇔ |X| = 2^ℵ₀
I |X| = ℵ₀: impossible by Milnor's theorem

48 / 131 Milnor’s theorem implies |X|= 6 ℵ0

I System S of polynomial equations of degree 2

∀i ≤ m pi(x1, . . . , xnK ) = 0

nK I Let X be the set of x ∈ R satisfying S I Number of connected components of X is O(3nK ) [Milnor 1964]

I Assume |X| is countable; then G cannot be  exible ⇒ each incongruent rlz is in a separate component ⇒ by Milnor’s theorem, there’s nitely many of them

49 / 131 Examples

50 / 131 Mathematical Programming formulations


51 / 131 Short introduction to MP 1/3

I Formal language to describe optimization problems
I Sentences are called formulations

  [P]   min f(x)
        ∀i ≤ m   gi(x) ≤ 0
        ∀j ∈ Z   xj ∈ ℤ

I f, g: represented by an expression language

I Formulation entities:

  sets: Z    parameters: m    decision variables: x    objective: f(x)    constraints: gi(x) ≤ 0, xj ∈ ℤ

52 / 131 Short introduction to MP 2/3

I Syntax highlights: I no explicit open sets: use ≤, ≥, = not <, >, 6= P I no vars in quantifiers (e.g. yi) i∈I xi=1 I Semantics: assignment of values to decision variables I carried out by solver: solution alg. for MP subclass defined by given math. properties I requires formulation taxonomy w.r.t. solvers LP, SOCP, SDP, QP, QCP, QCQP, NLP, MILP, MIQP, MIQCP, MIQCQP, MINLP c(MI)QP, c(MI)QCP, c(MI)QCQP, c(MI)NLP, BLP, BQP, MOP, BLevP, SIP, ... I MP shifts focus from algorithmics to modelling

53 / 131 Short introduction to MP 3/3

I Solvers can be: global or local, exact, approximate or heuristic, (worst-case) polytime or exponential, fast or slow I Examples: I LP: simplex alg., interior point method (IPM); CPLEX, GuRoBi, XPressMP, GLPK, CLP; global, exact, fast, IPM is polytime I SOCP, SDP: IPM; Mosek, SCS; global, -approx., fast, polytime I QP, QCP, QCQP, NLP: Succ. Quadr. Prog. (SQP), IPM; Snopt, IPOPT; local, heuristic (-approx. if cvx), fast I BLP, MILP: Cutting plane (CP), Branch-and-Bound (BB); CPLEX, XPressMP, GLPK; global, exact, slow, exponential I cMIQP, cMIQCP, cMIQCQP, cMINLP: BB, Outer Approx. (OA), CP; GuRoBi, BonMin, alphaECP; global, -approx., slow, exponential I MIQP, MIQCP, MIQCQP, MINLP: spatial BB (sBB); Baron, GuRoBi, Couenne, Antigone; global, -approx., slow, exponential

54 / 131 Subsection 1

Formulations in position variables

55 / 131 QCP and QCQP

Original DGP system is a QCP:

2 2 ∀{u, v} ∈ E kxu − xvk = duv (4)

Computationally: useless

Reformulate using slack variables, get a QCQP:

  min_{x,s} Σ_{{u,v}∈E} s²uv
  s.t. ∀{u, v} ∈ E   ‖xu − xv‖² = d²uv + suv   (5)

56 / 131 Unconstrained NLP

  min_{x∈R^{nK}} Σ_{{u,v}∈E} ( ‖xu − xv‖² − d²uv )²   (6)

Sum of squares, but nonconvex in x (a quartic polynomial)

Globally optimal obj. fun. value of (6) is 0 if x solves (4)

Computational experiments in [Liberti et al., 2006]: I Solvers from 15 years ago I randomly generated protein data: ≤ 50 atoms I cubic crystallographic grids: ≤ 64 atoms
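As an illustration of how formulation (6) can be attacked with an off-the-shelf local NLP solver, here is a small scipy-based sketch; the random instance, the starting point and the function names are my own assumptions (a serious run would use multistart or the solvers cited above).

import numpy as np
from scipy.optimize import minimize

def dgp_penalty(xflat, edges, d2, K):
    x = xflat.reshape(-1, K)
    return sum((np.sum((x[u] - x[v]) ** 2) - d2[(u, v)]) ** 2 for u, v in edges)

# tiny instance: a unit square in K = 2 (4 sides + 1 diagonal)
K = 2
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
d2 = {e: 1.0 for e in edges}; d2[(0, 2)] = 2.0
rng = np.random.default_rng(0)
res = minimize(dgp_penalty, rng.standard_normal(4 * K),
               args=(edges, d2, K), method="L-BFGS-B")
print("objective at local optimum:", round(res.fun, 9))  # 0 if a realization was found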

57 / 131 “Push-and-pull” QCQP

I min_{x∈R^{nK}} Σ_{{u,v}∈E} | ‖xu − xv‖² − d²uv | exactly reformulates (4)
I Relax the objective f to its concave part, remove the constant term, rewrite min −f as max f
I Reformulate the convex part of the obj. fun. to convex constraints
I Exact reformulation

  max_x Σ_{{u,v}∈E} ‖xu − xv‖²
  s.t. ∀{u, v} ∈ E   ‖xu − xv‖²₂ ≤ d²uv        (7)

Objective pulls points apart, constraints push them together
Thm. At a glob. opt. x∗ of a YES instance, all constraints of (7) are active

58 / 131 Multiplicative Weights Update (MWU) alg.

  [QCQP]   max_x Σ_{{u,v}∈E} ‖xu − xv‖²
           s.t. ∀{u, v} ∈ E   ‖xu − xv‖²₂ ≤ d²uv

1. ‖xu − xv‖²₂ = ⟨(xu − xv), (xu − xv)⟩ → ⟨wuv, (xu − xv)⟩ for a fixed wuv ∈ R^K; get the cNLP

   min { Σ_{{u,v}∈E} ⟨wuv, (xu − xv)⟩  |  ∀{u, v} ∈ E   ‖xu − xv‖²₂ ≤ d²uv }

   solve efficiently, get x′ with EDM D′

2. ∀{u, v} ∈ E define ψuv ≜ |D′uv − duv| / max(D′uv, duv)
   the relative error between D′ and d

3. ∀{u, v} ∈ E update wuv ← wuv(1 − η ψuv) where η is a constant in (0, 1)
4. repeat for T iterations, record the best solution

59 / 131 MWU relative guarantee I index quantities by iteration t: (x0)t, wt, ψt t t t I get distribution ρuv = wuv/h1, w i P t t t t I ψuvρuv = hψ , ρ i {u,v}∈E weighted relative error between D0 and d at itn t I can prove that ! t t 1 3 X t minhψ , ρ i ≤ 2 ln m + min ψuv t≤T T 2 {u,v}∈E t≤T

I relative UB to error of MWU output in terms of best cumulative distance error

[D’Ambrosio et al., DCG 17, Mencarelli et al. EJCO 17]

60 / 131 Subsection 2

Formulations in matrix variables

61 / 131 Linearization

Replace nonlinear terms with additional variables:

Back to the original system   ∀{i, j} ∈ E   ‖xi − xj‖²₂ = d²ij

  ⇒ ∀{i, j} ∈ E   ‖xi‖²₂ + ‖xj‖²₂ − 2 xi · xj = d²ij

  ⇒   ∀{i, j} ∈ E   Xii + Xjj − 2Xij = d²ij
      X = x x^⊤

Note: X = x x^⊤ ⇔ [∀i, j ≤ n   Xij = xi xj]

  ⇔ \begin{pmatrix} X11 & X12 & ⋯ & X1n \\ X12 & X22 & ⋯ & X2n \\ ⋮ & ⋮ & ⋱ & ⋮ \\ X1n & X2n & ⋯ & Xnn \end{pmatrix} = \begin{pmatrix} x1² & x1x2 & ⋯ & x1xn \\ x1x2 & x2² & ⋯ & x2xn \\ ⋮ & ⋮ & ⋱ & ⋮ \\ xnx1 & xnx2 & ⋯ & xn² \end{pmatrix}

which implies that X must have rank ≤ 1

62 / 131 Relaxation

Replace nonconvex set with a convex relaxation:

  X = x x^⊤
  ⇒ X − x x^⊤ = 0        (all eigenvalues = 0)
  ⇒ X − x x^⊤ ⪰ 0        (relax: all eigenvalues ≥ 0)
  ⇒ Schur(X, x) ≜ \begin{pmatrix} I_K & x^⊤ \\ x & X \end{pmatrix} ⪰ 0

I If x does not appear elsewhere in the formulation, can eliminate it (e.g. choose x = 0)
I ⇒ Replace Schur(X, x) ⪰ 0 by X ⪰ 0

63 / 131 SDP relaxation

  min  F • X
  s.t. ∀{i, j} ∈ E   Xii + Xjj − 2Xij = d²ij
       X ⪰ 0

How do we choose F ?

F • X = tr(F^⊤ X)
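If a conic modelling tool such as cvxpy with an SDP-capable solver is available, the relaxation can be sketched as follows; the toy instance is mine, and the trace objective anticipates Ye's choice discussed on the next slide.

import cvxpy as cp
import numpy as np

n = 4
edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 0): 1.0, (0, 2): 2 ** 0.5}

X = cp.Variable((n, n), PSD=True)            # Gram matrix variable, X >= 0 (PSD)
cons = [X[i, i] + X[j, j] - 2 * X[i, j] == d ** 2 for (i, j), d in edges.items()]
prob = cp.Problem(cp.Minimize(cp.trace(X)), cons)   # F = I
prob.solve()
print("rank of X* ~", np.linalg.matrix_rank(X.value, tol=1e-6))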

64 / 131 Some possible objective functions

I For protein conformation:

  min Σ_{{i,j}∈E} (Xii + Xjj − 2Xij)

with = changed to ≥ in constraints (or max and ≤) SDP relaxation of “push-and-pull” QCQP

I [Ye, 2003], application to wireless sensors localization

min tr(X)

  tr(X) = tr(P Λ P⁻¹) = tr(P⁻¹ P Λ) = tr(Λ) = Σ_i λi   ⇒ hope to minimize the rank

I How about “just random”? [Barvinok, DCG 1995]

65 / 131 How do you choose?

for want of some better criterion... TEST!

I Download protein files from the Protein Data Bank (PDB)
  they contain atom realizations
I Mimic a Nuclear Magnetic Resonance experiment
  keep only pairwise distances < 5.5
I Try and reconstruct the protein shape from those weighted graphs
I Quality evaluation of results:

  LDE(x) = max_{{i,j}∈E} | ‖xi − xj‖ − dij |

  MDE(x) = (1/|E|) Σ_{{i,j}∈E} | ‖xi − xj‖ − dij |

66 / 131 Empirical choice

I Ye very fast but often imprecise I Random good but nondeterministic 2 I Push-and-Pull: can relax Xii + Xjj − 2Xij = dij to 2 Xii + Xjj − 2Xij ≥ dij easier to satisfy feasibility, useful later on I Heuristic: add +ηtr(X) to objective, with η  1 might help minimize solution rank P I min (Xii + Xjj − 2Xij) + ηtr(X) {i,j}∈E

67 / 131 Subsection 3

Diagonal dominance

68 / 131 When SDP solvers hit their size limit

I SDP solver: technological bottleneck I Can we use an LP solver instead? I Diagonally Dominant (DD) matrices are PSD I Not vice versa: inner approximate PSD cone Y  0 I Idea by A.A. Ahmadi [Ahmadi & Hall 2015]

69 / 131 Diagonally dominant matrices

An n × n symmetric matrix X is DD if

  ∀i ≤ n   Xii ≥ Σ_{j≠i} |Xij|

E.g.

  \begin{pmatrix} 1 & 0.1 & −0.2 & 0 & 0.04 & 0 \\ 0.1 & 1 & −0.05 & 0.1 & 0 & 0 \\ −0.2 & −0.05 & 1 & 0.1 & 0.01 & 0 \\ 0 & 0.1 & 0.1 & 1 & 0.2 & 0.3 \\ 0.04 & 0 & 0.01 & 0.2 & 1 & −0.3 \\ 0 & 0 & 0 & 0.3 & −0.3 & 1 \end{pmatrix}

70 / 131 Gershgorin's circle theorem

I Let A be symmetric n × n
I ∀i ≤ n let Ri = Σ_{j≠i} |Aij| and Ii = [Aii − Ri, Aii + Ri]
I Then for every eigenvalue λ of A there is i ≤ n s.t. λ ∈ Ii

Proof
I Let λ be an eigenvalue of A with eigenvector x

I Normalize x s.t. ∃i ≤ n with xi = 1 and ∀j ≠ i |xj| ≤ 1
  (let i = argmax_j |xj|, divide x by sgn(xi)|xi|)
I Ax = λx ⇒ Σ_{j≠i} Aij xj + Aii xi = Σ_{j≠i} Aij xj + Aii = λ xi = λ
I Hence Σ_{j≠i} Aij xj = λ − Aii

I and |xj| ≤ 1 for all j ≠ i ⇒

  |λ − Aii| = | Σ_{j≠i} Aij xj | ≤ Σ_{j≠i} |Aij| |xj| ≤ Σ_{j≠i} |Aij| = Ri

  hence λ ∈ Ii

71 / 131 DD ⇒ PSD

I Assume A is DD, λ an eigenvalue of A
I ⇒ ∀i ≤ n   Aii ≥ Σ_{j≠i} |Aij| = Ri
I ⇒ ∀i ≤ n   Aii − Ri ≥ 0
I By Gershgorin's circle theorem, λ ≥ 0
I ⇒ A is PSD

73 / 131 DD Linearization

  ∀i ≤ n   Xii ≥ Σ_{j≠i} |Xij|   (∗)

I linearize | · | by an additional matrix variable T ⇒ write |X| as T
I ⇒ (∗) becomes

  Xii ≥ Σ_{j≠i} Tij

I add "sandwich" constraints −T ≤ X ≤ T
I Can easily prove (∗) in case X ≥ 0 or X ≤ 0:

  Xii ≥ Σ_{j≠i} Tij ≥ Σ_{j≠i} Xij
  Xii ≥ Σ_{j≠i} Tij ≥ Σ_{j≠i} −Xij

73 / 131 DD Programming (DDP)

  ∀{i, j} ∈ E   Xii + Xjj − 2Xij = d²ij
  X is DD

  ⇒   ∀{i, j} ∈ E   Xii + Xjj − 2Xij = d²ij
      ∀i ≤ n   Σ_{j≤n, j≠i} Tij ≤ Xii
      −T ≤ X ≤ T

74 / 131 The issue with inner approximations

DDP could be infeasible!

75 / 131 Exploit push-and-pull

I Enlarge the feasible region
I From

  ∀{i, j} ∈ E   Xii + Xjj − 2Xij = d²ij

I Relax to "pull" constraints

  ∀{i, j} ∈ E   Xii + Xjj − 2Xij ≥ d²ij

I Then use the "push" objective

  min Σ_{{i,j}∈E} (Xii + Xjj − 2Xij)

76 / 131 Hope to achieve LP feasibility

77 / 131 DDP formulation for the DGP

  min  Σ_{{i,j}∈E} (Xii + Xjj − 2Xij)
  s.t. ∀{i, j} ∈ E   Xii + Xjj − 2Xij ≥ d²ij
       ∀i ≤ n   Σ_{j≤n, j≠i} Tij ≤ Xii
       −T ≤ X ≤ T
       T ≥ 0

78 / 131 Subsection 4

Dual DD

79 / 131 Cones I Set C is a cone if: ∀A, B ∈ C, α, β ≥ 0 αA + βB ∈ C

I If C is a cone, the dual cone is C∗ = {y | ∀x ∈ C hx, yi ≥ 0}

  vectors making acute angles with all elements of C
I If C ⊂ Sⁿ (the set of n × n symmetric matrices)

  C∗ = {Y | ∀X ∈ C (Y • X ≥ 0)}

I An n × n matrix cone C is finitely generated by X ⊂ Rⁿ if

  X = {x1, . . . , xp}   ∧   ∀X ∈ C ∃δ ∈ R^p₊   X = Σ_{ℓ≤p} δℓ xℓ xℓ^⊤

80 / 131 Representations of DD

I Consider Eii, E⁺ij, E⁻ij in Sⁿ
  Define E₀ = {Eii | i ≤ n}, E₁ = {E^±ij | i < j}, E = E₀ ∪ E₁

I Eii = diag(0, . . . , 0, 1i, 0, . . . , 0)
I E⁺ij has the minor \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} in rows/columns i, j, and 0 elsewhere
I E⁻ij has the minor \begin{pmatrix} 1 & −1 \\ −1 & 1 \end{pmatrix} in rows/columns i, j, and 0 elsewhere

I Thm. DD = cone generated by E [Barker & Carlson 1975]
  Pf. Rays in E are extreme, all DD matrices generated by E
I Cor. DD finitely generated by X_DD = {ei | i ≤ n} ∪ {(ei ± ej) | i < j ≤ n}
  Pf. Verify Eii = ei ei^⊤ and E^±ij = (ei ± ej)(ei ± ej)^⊤, where ei is the i-th std basis element of Rⁿ

81 / 131 Finitely generated dual cone representation

Thm. If C is finitely generated by X, then C∗ = {Y ∈ Sⁿ | ∀x ∈ X (Y • x x^⊤ ≥ 0)}
  recall C∗ ≜ {Y ∈ Sⁿ | ∀X ∈ C   Y • X ≥ 0}

I (⊇) Let Y s.t. ∀x ∈ X (Y • x x^⊤ ≥ 0)
I ∀X ∈ C, X = Σ_{x∈X} δx x x^⊤ (by finite generation)
I hence Y • X = Σ_x δx Y • x x^⊤ ≥ 0 (by defn. of Y)
I whence Y ∈ C∗ (by defn. of C∗)
I (⊆) Suppose Z ∈ C∗ ∖ {Y | ∀x ∈ X (Y • x x^⊤ ≥ 0)}
I then ∃X′ ⊂ X s.t. ∀x ∈ X′ (Z • x x^⊤ < 0)
I consider any Y = Σ_{x∈X′} δx x x^⊤ ∈ C with δ ≥ 0
I then Z • Y = Σ_{x∈X′} δx Z • x x^⊤ < 0, so Z ∉ C∗
I contradiction ⇒ C∗ = {Y | ∀x ∈ X (Y • x x^⊤ ≥ 0)}

82 / 131 Dual cone constraints

I Remark: X • v v^⊤ = v^⊤ X v
I Use the finitely generated dual cone theorem
I Decision variable matrix X
I Constraints: ∀v ∈ X   v^⊤ X v ≥ 0

I Cor. PSD ⊂ DD∗
  Pf. X ∈ PSD if ∀v ∈ Rⁿ   v^⊤ X v ≥ 0, so certainly valid ∀v ∈ X
I If |X| is polysized, get a compact formulation
  otherwise use column generation
I |X_DD| = |E| = O(n²)

83 / 131 Dual cone DDP formulation for DGP

  min  Σ_{{i,j}∈E} (Xii + Xjj − 2Xij)
  s.t. ∀{i, j} ∈ E   Xii + Xjj − 2Xij = d²ij
       ∀v ∈ X_DD   v^⊤ X v ≥ 0

I v^⊤ X v ≥ 0 for v ∈ X_DD is equivalent to:

  ∀i ≤ n   Xii ≥ 0

  ∀{i, j} ∉ E   Xii + Xjj − 2Xij ≥ 0

  ∀i < j   Xii + Xjj + 2Xij ≥ 0

Note we went back to equality “pull” constraints

Quantifier ∀{i, j} 6∈ E should be ∀i < j but we already have those constraints ∀{i, j} ∈ E

84 / 131 Properties of the Dual DDP formulation

I SDP relaxes original problem I DualDDP relaxes SDP hence also relaxes original problem

I Yields extremely tight obj fun bounds w.r.t. SDP I Solutions may have large negative rank

85 / 131 Subsection 5

Dimensional reduction

86 / 131 Retrieving realizations in RK

I SDP/DDP yield an n × n PSD matrix X∗
I We need an n × K realization matrix x∗
I Recall PSD ⇔ Gram
I Apply PCA to X∗, keep the K largest components, get x′
I This yields solutions with errors
I Refinement: use x′ as a starting point for a local NLP solver on the QCQP
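A possible numpy/scipy sketch of this retrieval-plus-refinement step; the function names and the use of L-BFGS-B are my own choices, not prescribed by the slides.

import numpy as np
from scipy.optimize import minimize

def gram_to_points(Xstar, K):
    """Keep the K largest eigenpairs of the PSD matrix X* and return an n x K realization."""
    lam, P = np.linalg.eigh(Xstar)
    idx = np.argsort(lam)[::-1][:K]
    lam = np.clip(lam[idx], 0, None)
    return P[:, idx] * np.sqrt(lam)

def refine(x0, edges, d2):
    """Local NLP refinement of the PCA output on the squared-error formulation."""
    n, K = x0.shape
    f = lambda z: sum((np.sum((z.reshape(n, K)[u] - z.reshape(n, K)[v]) ** 2) - d2[(u, v)]) ** 2
                      for u, v in edges)
    return minimize(f, x0.ravel(), method="L-BFGS-B").x.reshape(n, K)

# usage (assuming Xstar from an SDP/DDP solver): x = refine(gram_to_points(Xstar, K), edges, d2)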

87 / 131 Subsection 6

Barvinok’s Naive Algorithm

88 / 131 Concentration of measure

From [Barvinok, 1997] The value of a “well behaved” function at a random point of a “big” probability space X is “very close” to the mean value of the function.

and

In a sense, measure concentration can be considered as an extension of the law of large numbers.

89 / 131 Concentration of measure

Given Lipschitz function f : X → R s.t.

  ∀x, y ∈ X   |f(x) − f(y)| ≤ L ‖x − y‖₂

for some L ≥ 0, there is concentration of measure if ∃ constants c, C s.t.

  ∀ε > 0   P_x( |f(x) − E(f)| > ε ) ≤ c e^{−Cε²/L²}

where E(·) is w.r.t. given Borel measure µ over X

≡ “discrepancy from mean is unlikely”

90 / 131 Barvinok’s theorem

Consider:
I for each k ≤ m, manifolds X_k = {x ∈ Rⁿ | x^⊤ Q^k x = a_k}, where m ≤ poly(n)
I feasibility problem F ≡ ( ⋂_{k≤m} X_k ≠ ∅ ? )
I SDP relaxation ∀k ≤ m (Q^k • X = a_k) ∧ X ⪰ 0, with soln. X̄
I Algorithm: T ← factor(X̄); y ∼ Nⁿ(0, 1); x′ ← T y

Then:
I ∃c > 0, n₀ ∈ N such that ∀n ≥ n₀

  Prob( ∀k ≤ m   dist(x′, X_k) ≤ c √(‖X̄‖₂ ln n) ) ≥ 0.9

91 / 131 Algorithmic application

0 0 I x is “close” to each Xk: try local descent from x

I ⇒ Feasible QP solution from an SDP relaxation

92 / 131 Elements of Barvinok’s formula

 q  0 ¯ Prob ∀k ≤ m dist(x , Xk) ≤ c kXk2 ln n ≥ 0.9.

p ¯ ¯ I kXk2 arises from T (a factor of X) √ I ln n arises from concentration of measure I 0.9 follows by adjusting parameter values in “union bound”

93 / 131 Application to the DGP

I ∀{i, j} ∈ E   X_ij = {x | ‖xi − xj‖²₂ = d²ij}

I The DGP can be written as   ⋂_{{i,j}∈E} X_ij ≠ ∅ ?

I SDP relaxation   Xii + Xjj − 2Xij = d²ij ∧ X ⪰ 0, with soln. X̄

I Difference with Barvinok: x ∈ R^{Kn}, rk(X̄) ≤ K

I IDEA: sample y ∼ N^{nK}(0, 1/√K)
I Thm. Barvinok's theorem works in rank K

94 / 131 Proof structure > k K ¯ I Show that, on average, ∀k ≤ m tr((T y) Q (T y)) = Q • X = ak I compute multivariate integrals I bilinear terms disappear because y normally distributed I decompose multivariate int. to a sum of univariate int. I Exploit concentration of measure to show errors happen rarely I a couple of technical lemmata yielding bounds I ⇒ bound Gaussian measure µ of ε-neighbourhoods of − n×K i i ¯ Ai = {y ∈ R | Q (T y) ≤ Q • X} + n×K i i ¯ Ai = {y ∈ R | Q (T y) ≥ Q • X} n×K i i ¯ Ai = {y ∈ R | Q (T y) = Q • X}. − + I use “union bound” for measure of Ai (ε) ∩ Ai (ε) − + I show Ai (ε) ∩ Ai (ε) = Ai(ε) I use “union bound” to measure intersections of Ai(ε) I appropriate values for some parameters ⇒ result

95 / 131 The heuristic

1. Solve the SDP relaxation of the DGP, get soln. X̄
   (use DDP+LP if SDP+IPM is too slow)
2. a. T = factor(X̄)
   b. y ∼ N^{nK}(0, 1/√K)
   c. x′ = T y
3. Use x′ as a starting point for a local NLP solver on the formulation

  min_x Σ_{{i,j}∈E} ( ‖xi − xj‖² − d²ij )²

and return improved solution x
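A minimal numpy sketch of step 2 of the heuristic, assuming X̄ comes from an SDP/DDP solver; the factorization via eigendecomposition and the function name are my own choices.

import numpy as np

def barvinok_sample(Xbar, K, rng=None):
    """Step 2: factor Xbar = T T^t and return x' = T y with y ~ N(0, 1/sqrt(K))."""
    rng = rng or np.random.default_rng()
    lam, P = np.linalg.eigh(Xbar)
    T = P * np.sqrt(np.clip(lam, 0, None))       # T T^t = Xbar (up to numerical noise)
    y = rng.normal(0.0, 1.0 / np.sqrt(K), size=(Xbar.shape[0], K))
    return T @ y                                  # n x K starting point for local refinement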

96 / 131 MP formulations for the DGP: Summary

I Try nonconvex formulations in position vars I Quadratic nonconvex too dicult? I Solve SDP relaxation I SDP relaxation too large? I Solve DDP approximation I Get n × n matrix solution, need K × n I Use PCA or Barvinok’s alg. and refine w/local NLP

97 / 131 High-dimensional weirdness


98 / 131 Subsection 1

Random projections

99 / 131 The gist

I Let A be an m × n data matrix (columns in R^m, m ≫ 1)
I T short & fat (k × m), normally sampled componentwise

  (k × m matrix T) · (m × n matrix A) = (k × n matrix TA)

I Then ∀i < j   ‖Ai − Aj‖₂ ≈ ‖TAi − TAj‖₂   "wahp"

Some more dimensional reduction!

100 / 131 wahp “wahp” = “with arbitrarily high probability” the probability of Ek (depending on some parameter k) approaches 1 “exponentially fast” as k increases

  P(E_k) ≈ 1 − O(e^{−k})

101 / 131 Johnson-Lindenstrauss Lemma

Thm.
Given A ⊆ R^m with |A| = n and ε > 0, there is k ∼ O((1/ε²) ln n) and a k × m matrix T s.t.

  ∀x, y ∈ A   (1 − ε)‖x − y‖ ≤ ‖T x − T y‖ ≤ (1 + ε)‖x − y‖

If the k × m matrix T is sampled componentwise from N(0, 1/√k), then
  P(A and TA are approximately congruent) ≥ 1/n
the result follows by the probabilistic method
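A small numpy/scipy experiment illustrating the JLL (sizes and the random seed are arbitrary choices): distances after projection typically stay within the (1 ± ε) band.

import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
m, n, eps = 10000, 50, 0.2                      # original dim, nb of points, distortion
k = int(np.ceil(np.log(n) / eps ** 2))          # k ~ O(eps^-2 ln n)
A = rng.standard_normal((n, m))                 # point cloud, one point per row
T = rng.normal(0.0, 1.0 / np.sqrt(k), (k, m))   # RP matrix, entries N(0, 1/sqrt(k))
TA = A @ T.T                                    # projected points

ratio = pdist(TA) / pdist(A)                    # distortion of all pairwise distances
print(ratio.min(), ratio.max())                 # typically within [1 - eps, 1 + eps]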

102 / 131 In practice

1 I P(A and TA are approximately congruent) ≥ n I re-sampling suciently many times gives wahp I Empirically, sample T very few times (e.g. once will do!) ET (kT x − T yk) = kx − yk probability of error decreases exponentially fast

Surprising fact: k is independent of the original number of dimensions m

103 / 131 Clustering Google images

[L. & Lavor, 2017]

104 / 131 Clustering without random projections

VHimg = Map[Flatten[ImageData[#]] &, Himg];

VHcl = Timing[ClusteringComponents[VHimg, 3, 1]] Out[29]= {0.405908, {1, 2, 2, 2, 2, 2, 3, 2, 2, 2, 3}}

Too slow!

105 / 131 Clustering with random projections

Get["Projection.m"]; VKimg = JohnsonLindenstrauss[VHimg, 0.1]; VKcl = Timing[ClusteringComponents[VKimg, 3, 1]] Out[34]= {0.002232, {1, 2, 2, 2, 2, 2, 3, 2, 2, 2, 3}}

From 0.405s CPU time to 0.00232s Same clustering

106 / 131 Approximating the identity

I Let n ∈ N and ε ∈ R₊ with n ≫ 1 and ε ∈ (0, 1/2)
I Let T be a k × n Random Projection (RP) matrix with k = O(ε⁻² ln n)
I Look at the relations between T T^⊤ and I_k, and between T^⊤ T and I_n
I By [Zhang et al., COLT 13], ∃C ≥ 1/4 s.t. ∀δ (n ≥ C ε⁻² (k + 1) ln(2k/δ))

  Prob( ‖ (1/n) T T^⊤ − I_k ‖₂ ≤ ε ) ≥ 1 − δ

I By Thm. 7 in [L. TOP 20], given any fixed x ∈ Rⁿ

  Prob( −1ε ≤ T^⊤ T x − I_n x ≤ 1ε ) ≥ 1 − 4 e^{−dCε²}

I Note: T^⊤ T only behaves like I_n for a fixed x
  T is k × n, so T^⊤ T is n × n with rank k < n; hence there can be an arbitrary difference
  w.r.t. I_n acting on all vectors in Rⁿ

107 / 131 Projecting formulations

I Can we apply RPs to MP formulations? I decrease dimension of parameter vectors I decrease dimension of variable vectors I Three major diculties 1. Globally optimal objective function value of projected formulation must be approximately the same as of original one 2. Must be able to retrieve global optima of original formulation from those of projected 3. Projecting variables is equivalent to projecting an innite (possibly uncountable) number of vectors, JLL does not apply I Can project LP, SDP, QP

[Vu et al. DAM 19, Vu et al. MOR 19, Vu et al. IPCO 19, D’Ambrosio et al. MPB 20]

108 / 131 Subsection 2

Distance instability

109 / 131 Nearest Neighbours k-Nearest Neighbours (k-NN). Given:

I k ∈ N n n I a distance function d : R × R → R+ I a set X ⊂ Rn I a point z ∈ Rn r X , find the subset Y ⊂ X such that: (a) |Y| = k (b) ∀y ∈ Y, x ∈ X (d(z, y) ≤ d(z, x)) I basic problem in data science I pattern recognition, computational geometry, machine learning, data compression, robotics, recommender systems, information retrieval, natural language processing and more I Example: Used in Step 2 of k-means: assign points to closest centroid

[Cover & Hart 1967]

110 / 131 With random variables

I Consider 1-NN
I Let ℓ = |X|
I Distance function family {d^m : R^n × R^n → R₊}_m
I For each m:
  I a random variable Z^m with some distribution over R^n
  I for i ≤ ℓ, random variables X^m_i with some distrib. over R^n
  I the X^m_i are iid w.r.t. i, and Z^m is independent of all X^m_i
  I D^m_min = min_{i≤ℓ} d^m(Z^m, X^m_i)
  I D^m_max = max_{i≤ℓ} d^m(Z^m, X^m_i)

111 / 131 Distance Instability Theorem

I Let p > 0 be a constant
I If

  ∃i ≤ ℓ   (d^m(Z^m, X^m_i))^p converges as m → ∞

then, for any ε > 0,

  closest and furthest point are at about the same distance

Note: "∃i" suffices since ∀m the X^m_i are iid w.r.t. i

[Beyer et al. 1999]

112 / 131 Distance Instability Theorem

I Let p > 0 be a constant
I If

  ∀i ≤ ℓ   lim_{m→∞} Var( (d^m(Z^m, X^m_i))^p / E((d^m(Z^m, X^m_i))^p) ) = 0

then, for any ε > 0,

  lim_{m→∞} P( D^m_max ≤ (1 + ε) D^m_min ) = 1

Note: "∃i" suffices since ∀m the X^m_i are iid w.r.t. i

[Beyer et al. 1999]

113 / 131 Preliminary results

I Lemma. {B^m}_m seq. of rnd. vars with finite variance, lim_{m→∞} E(B^m) = b and lim_{m→∞} Var(B^m) = 0; then

  ∀ε > 0   lim_{m→∞} P( ‖B^m − b‖ ≤ ε ) = 1

  denoted B^m →_P b

I Slutsky's theorem. {B^m}_m seq. of rnd. vars and g a continuous function; if B^m →_P b and g(b) exists, then g(B^m) →_P g(b)

I Corollary. If {A^m}_m, {B^m}_m are seq. of rnd. vars s.t. A^m →_P a and B^m →_P b ≠ 0, then A^m/B^m →_P a/b

115 / 131 Proof

1. µm = E((d^m(Z^m, X^m_i))^p) independent of i (since all X^m_i iid)
2. V^m_i = (d^m(Z^m, X^m_i))^p / µm →_P 1:
   I E(V^m_i) = 1 (rnd. var. over its mean) ⇒ lim_m E(V^m_i) = 1
   I hypothesis of the thm. ⇒ lim_m Var(V^m_i) = 0
   I Lemma ⇒ V^m_i →_P 1
3. V^m = (V^m_i | i ≤ ℓ) →_P 1 (by iid)
4. Slutsky's thm. ⇒ min(V^m) →_P min(1) = 1, similarly for max
5. Corollary ⇒ max(V^m)/min(V^m) →_P 1
6. D^m_max / D^m_min = (µm max(V^m)) / (µm min(V^m)) →_P 1
7. Result follows (defn. of →_P and D^m_max ≥ D^m_min)

115 / 131 When it applies

I iid random variables from any distribution

I Particular forms of correlation
  e.g. Ui ∼ Uniform(0, √i), X1 = U1, Xi = Ui + (X_{i−1}/2) for i > 1
I Variance tending to zero
  e.g. Xi ∼ N(0, 1/i)
I Discrete uniform distribution on the m-dimensional hypercube, for both data and query
I Computational experiments with k-means: instability already with n > 15
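A tiny simulation of this phenomenon for iid Uniform(0, 1) data (dimensions and sample sizes are arbitrary): the ratio D_max/D_min shrinks towards 1 as the dimension m grows.

import numpy as np

rng = np.random.default_rng(0)
for m in (2, 10, 100, 1000):                  # dimension
    X = rng.random((1000, m))                 # data points, iid Uniform(0,1)^m
    z = rng.random(m)                         # query point
    d = np.linalg.norm(X - z, axis=1)
    print(m, round(d.max() / d.min(), 2))     # ratio D_max / D_min shrinks towards 1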

116 / 131 Example of k-means in R100

117 / 131 ...and when it doesn’t

I Complete linear dependence on all distributions can be reduced to NN in 1D I Exact and approximate matching query point = (or ≈) data point I Query point in a well-separated cluster in data I Implicitly low dimensionality project; but NN must be stable in lower dim.

119 / 131 Loss of resolution ε for K ≤ 10000

(plots of the loss of resolution for Uniform(0, 1), Normal(0, 1) and Exponential(1) data)

I Resolution falls exponentially fast I Hard to tell closest from furthest pt a fortiori, picking k closest neighb I Generates algorithmic instability

119 / 131 Subsection 3

Are these results related?

120 / 131 Relations between distance instability and JLL?

I On the one hand, I n-dimensional point cloud X with n  1 I Distance instability ⇒ Dmax ≤ (1 + ε)Dmin all the useful information within ε I T a rnd. prj. ⇒ ε-congruent T X in k = O(ε−2 ln |X |) dim. all of the useful information is lost (?) I On the other hand, I n-dimensional point cloud X with n  1 k I T = (Tij) with Tij ∼ N (0, 1) ⇒ ε-congruent T X ⊂ R I Columns in T X are identically distributed I k still “largish” (because of ε−2) distance instability ⇒ closest/furthest points could be mistaken

121 / 131 Graph embeddings for artificial Neural Networks


122 / 131 Artificial neural networks

I Artificial Neural Network (ANN): explicit approximate representation of a function oracle

ξ = f(χ)

where χ ∈ Rn and ξ ∈ Rk I Parametrized discrete dynamical system with 2 phases 1. training (slow) 2. evaluation (fast) I Learning phase assigns values to ANN parameters learning algorithm is based on a given training dataset I Once parametrized, ANN evaluates given input to output

123 / 131 Formal denition

I ANN A: triplet (G, T, φ : R → [−1, 1]) I G is a Directed Acyclic Graph (DAG) I T is a training set I φ is an activation function I DAG G = (V, A, v, b, w, I, O): I node weight function v : V → R (variables) I node weight function b : V → R (parameter to be learned) I arc weight function w : A → R (parameter to be learned) − I I ⊂ V s.t. |I| = n and N (I) = ∅: input nodes + I O ⊂ V s.t. |O| = k and N (O) = ∅: output nodes I Training set T = (X ⊂ Rn,Y ⊂ Rk) with |X| = |Y | = Rˆ and R = {1,..., Rˆ}

124 / 131 Training

I Finds values for b, w using the training data from T
I Decision variable map u : R × V → R
I For subset S ⊆ R and U ⊂ V, let u[S, U] be the submatrix of u with rows indexed by S and columns indexed by U
I Let dist(·, ·) denote a function evaluating a non-negative difference score between two equally sized matrices
I Training problem:

  min_{w,b,u}  dist(u[R, O], Y)
  s.t.  u[R, I] = X
        ∀t ∈ R, j ∈ V ∖ I   u_{tj} = φ( Σ_{i∈N⁻(j)} w_{ij} u_{ti} + b_j )

125 / 131 Evaluation

I b, w have been fixed by training
I Evaluation problem:

  ∀j ∈ I        v_j = χ_j
  ∀j ∈ V ∖ I    v_j = φ( Σ_{i∈N⁻(j)} w_{ij} v_i + b_j )
  ∀j ∈ O        ξ_j = v_j

I Since G is a DAG, for given χ can solve for ξ in linear time
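A toy Python sketch of this linear-time forward pass over a DAG; the graph, weights and tanh activation are made-up illustrations.

import math

def evaluate(order, inputs, outputs, pred, w, b, chi, phi=math.tanh):
    """Forward pass: visit nodes in topological order and propagate values."""
    v = dict(chi)                                   # input nodes get the given values
    for j in order:
        if j not in inputs:
            v[j] = phi(sum(w[i, j] * v[i] for i in pred[j]) + b[j])
    return {j: v[j] for j in outputs}

# toy DAG: inputs 1, 2 -> hidden 3 -> output 4
order = [1, 2, 3, 4]
pred = {3: [1, 2], 4: [3]}
w = {(1, 3): 0.5, (2, 3): -0.2, (3, 4): 1.0}
b = {3: 0.1, 4: 0.0}
print(evaluate(order, {1, 2}, {4}, pred, w, b, chi={1: 1.0, 2: 2.0}))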

126 / 131 Subsection 1

A clustering task

127 / 131 Natural language processing I Aims at determining: I segmentation of text into sentences and words I lemmatization (removing desinence/conjugation syllables) I part-of-speech tagging (grammatical category of word) I nding the grammatical structure of sentences (parsing) I named entity recognition I relation extraction I machine translation I sentiment analysis I ... I Common methodologies: I deterministic algorithms (automaton-based) I symbolic articial intelligence (e.g. LISP, Prolog) I supervised, semi-supervised, unsupervised machine learning I In particular, ANNs very successful in machine translation

128 / 131 Clustering sentences

I Cluster sentences according to a similarity using an ANN
  assume the input text is "clean" (lemmatized, stopwords removed)
I Sentences are linear sequences of words
  transform them into a graph on words:
  I vertices are words
  I sliding window on word sequences: adjacency relation (word neighbourhoods)
  I contract vertices with the same word; parallel edges represented by single edges with summed weights
I ANN requires vectors in Rⁿ as input
  but we have differently sized graphs over the sentences
I Use DG methods, stack word vectors, pad with zeros, apply dimensional reduction
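A possible sketch of the sliding-window word-graph construction described above; the window size, function name and example sentences are my own assumptions.

from collections import defaultdict

def word_graph(sentences, window=2):
    """Build a weighted graph on words: edge weight = nb of co-occurrences within the window."""
    w = defaultdict(float)
    for words in sentences:
        for i, u in enumerate(words):
            for v in words[i + 1:i + 1 + window]:
                if u != v:                        # vertex contraction: no self-loops
                    w[tuple(sorted((u, v)))] += 1.0
    return dict(w)

print(word_graph([["distance", "geometry", "embeds", "graphs"],
                  ["graphs", "represent", "distance", "data"]]))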

129 / 131 Some results I Simple ANN, one hidden layer I Training sets formed by k-means over sentence vectors I formed with inc, qrt, sdp I dimensional reduction with pca, rp I table below reports ANN loss function values

Rows: training set inputs (µ, ρ); columns: training set outputs (µ, ρ); entries: ANN loss values

  µ, ρ        inc,pca  inc,rp   qrt,pca  qrt,rp   sdp,pca  sdp,rp   sum
  inc, pca     0.052    0.013    0.106    0.164    0.079    0.161   0.575
  inc, rp      0.001    0.000    0.106    0.167    0.080    0.159   0.513
  qrt, pca     0.063    0.022    0.038    0.218    0.079    0.159   0.579
  qrt, rp      0.062    0.024    0.120    0.035    0.076    0.164   0.481
  sdp, pca     0.063    0.021    0.126    0.195    0.033    0.149   0.587
  sdp, rp      0.059    0.021    0.121    0.176    0.083    0.037   0.497

130 / 131 Some of my references for this tutorial 1. L., DG and Data Science, TOP 28:271-339, 2020 2. L., A New DG Method for Constructing Word and Sentence Vectors, in Proc. of WWW, Companion Volume, p. 679-685, 2020 3. D’Ambrosio, L., Poirion, Vu, Random projections for quadratic programs, Math. Prog. B, 183:619-647, 2020 4. Vu, Poirion, D’Ambrosio, L., Random Projections for Quadratic Programs over a Euclidean Ball, in A. Lodi et al. (eds.), Proc. of IPCO, LNCS 11480:442-452, 2019 5. Vu, Poirion, L., Random Projections for LP, Math. Oper. Res., 43:1051-1071, 2019 6. Vu, Poirion, L., Gaussian random projections for Euclidean membership problems, Discr. Appl. Math, 253:93-102, 2019 7. L., Vu, Barvinok’s naive algorithm in DG, Op. Res. Lett., 46:476-481, 2018 8. D’Ambrosio, Vu, Lavor, L., Maculan, New Error Measures and Methods for Realizing Protein Graphs from Distance Data, Discr. Comp. Geom., 57(2):371-418, 2017 9. Mencarelli, Sahraoui, L., A multiplicative weights update algorithm for MINLP, Eur. J. Comp. Opt., 5:31-86, 2017 10. L., Lavor, Euclidean DG: an Introduction, Springer, New York, 2017 11. L., D’Ambrosio, The Isomap algorithm in distance geometry, in Iliopoulos et al. (eds.), Proceedings of SEA, LIPICS 75(5)1:13, Dagstuhl Publishing, 2017 12. Dias, L., Diagonally Dominant Programming in DG, in Cerulli et al. (eds.) Proceedings of ISCO, LNCS 9849:225-246, Springer, Zürich, 2016 13. L., Lavor, Maculan, Mucherino, Euclidean distance geometry and applications, SIAM Review, 56(1):3-69, 2014 14. Lavor, L., Maculan, Computational Experience with the Molecular DG Problem, in Pintér (ed.) Global Optimization: Scientific and Engineering Case Studies, Springer, Berlin, 2006 131 / 131