Lecture 8: Multidimensional scaling
Advanced Applied Multivariate Analysis

STAT 2221, Spring 2015

Sungkyu Jung, Department of Statistics, University of Pittsburgh
Xingye Qiao, Department of Mathematical Sciences, Binghamton University, State University of New York

E-mail: [email protected]

Outline

1 Introduction to MDS

2 Classical MDS

3 Metric and non-Metric MDS

The next section would be ...

1 Introduction to MDS

2 Classical MDS

3 Metric and non-Metric MDS

Multidimensional scaling

Goal of multidimensional scaling (MDS): given pairwise dissimilarities, reconstruct a map that preserves the distances. It works from any dissimilarity (which need not be a metric). The reconstructed map has coordinates $x_i = (x_{i1}, x_{i2}, \dots, x_{ip})$ and uses the natural distance $\|x_i - x_j\|_2$.

Multidimensional scaling

MDS is a family of different algorithms, each designed to arrive at an optimal low-dimensional configuration ($p = 2$ or $3$). MDS methods include
1 Classical MDS
2 Metric MDS
3 Non-metric MDS

Perception of Color in human vision

To study the perception of color in human vision (Ekman, 1954; Izenman 13.2.1), 14 colors differing only in their hue (i.e., wavelengths from 434 nm to 674 nm) were used. 31 people rated each of the $\binom{14}{2} = 91$ pairs of colors on a five-point scale from 0 (no similarity at all) to 4 (identical). The average of the 31 ratings for each pair (representing similarity) is then scaled by $1/4$ and subtracted from 1 to represent dissimilarity.
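As a small illustration of this construction (a sketch written for these notes, not the original analysis; ratings is a hypothetical 91 x 31 matrix holding the raw ratings, one row per color pair):

# average the 31 ratings for each pair, rescale to [0, 1], and convert to dissimilarity
avg_sim <- rowMeans(ratings) / 4   # average similarity, scaled by 1/4
dissim  <- 1 - avg_sim             # dissimilarity = 1 - scaled similarity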

Perception of Color in human vision

The resulting 14 × 14 dissimilarity matrix is symmetric and contains zeros on the diagonal. MDS seeks a 2D configuration to represent these colors.

      434   445   465   472   490   504   537   555   584   600   610   628   651
445  0.14
465  0.58  0.50
472  0.58  0.56  0.19
490  0.82  0.78  0.53  0.46
504  0.94  0.91  0.83  0.75  0.39
537  0.93  0.93  0.90  0.90  0.69  0.38
555  0.96  0.93  0.92  0.91  0.74  0.55  0.27
584  0.98  0.98  0.98  0.98  0.93  0.86  0.78  0.67
600  0.93  0.96  0.99  0.99  0.98  0.92  0.86  0.81  0.42
610  0.91  0.93  0.98  1.00  0.98  0.98  0.95  0.96  0.63  0.26
628  0.88  0.89  0.99  0.99  0.99  0.98  0.98  0.97  0.73  0.50  0.24
651  0.87  0.87  0.95  0.98  0.98  0.98  0.98  0.98  0.80  0.59  0.38  0.15
674  0.84  0.86  0.97  0.96  1.00  0.99  1.00  0.98  0.77  0.72  0.45  0.32  0.24

Perception of Color in human vision

MDS reproduces the well-known two-dimensional color circle.

Distance, dissimilarity and similarity

Distance, dissimilarity and similarity (or proximity) are defined for any pair of objects in any space. In mathematics, a distance function (one that gives a distance between two objects) is also called a metric, satisfying
1 $d(x, y) \ge 0$,
2 $d(x, y) = 0$ if and only if $x = y$,
3 $d(x, y) = d(y, x)$,
4 $d(x, z) \le d(x, y) + d(y, z)$.
Given a set of dissimilarities, one can ask whether these values are distances and, moreover, whether they can even be interpreted as Euclidean distances.
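This question can be checked numerically. The following is a hedged helper written for these notes (not part of the lecture): a dissimilarity matrix is Euclidean exactly when the doubly centered matrix $B = -\frac{1}{2} C D_2 C$, introduced in the classical MDS section below, is positive semi-definite.

# check whether a dissimilarity matrix D (n x n) can be realized by Euclidean distances
is_euclidean <- function(D, tol = 1e-8) {
  n <- nrow(D)
  C <- diag(n) - matrix(1 / n, n, n)      # centering matrix
  B <- -0.5 * C %*% (D^2) %*% C           # doubly centered squared dissimilarities
  all(eigen(B, symmetric = TRUE, only.values = TRUE)$values > -tol)
}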

Euclidean and non-Euclidean distance

Given a dissimilarity (distance) matrix $D = (d_{ij})$, MDS seeks to find $x_1, \dots, x_n \in \mathbb{R}^p$ (called a configuration) so that $d_{ij} \approx \|x_i - x_j\|_2$ as closely as possible. Oftentimes, for some large $p$, there exists a configuration $x_1, \dots, x_n$ with an exact/perfect distance match $d_{ij} \equiv \|x_i - x_j\|_2$. In such a case the distance $d$ involved is called a Euclidean distance. There are, however, cases where the dissimilarity is a distance, but there exists no configuration in any $\mathbb{R}^p$ with a perfect match:

$d_{ij} \ne \|x_i - x_j\|_2$ for some $i, j$. Such a distance is called a non-Euclidean distance.

non-Euclidean distance

The radian distance function on a circle is a metric. It cannot be embedded in $\mathbb{R}^1$ (in other words, one cannot find $x_1, \dots, x_4 \in \mathbb{R}^1$ matching the distances), and in fact not in any $\mathbb{R}^p$ (not shown here).

Nevertheless, MDS seeks to find an optimal configuration $x_i$ that gives $d_{ij} \approx \|x_i - x_j\|_2$ as closely as possible.

The next section would be ...

1 Introduction to MDS

2 Classical MDS

3 Metric and non-Metric MDS

Classical Multidimensional Scaling (cMDS): theory

Suppose for now that $D = (d_{ij})$ is Euclidean.

The objective of classical Multidimensional Scaling (cMDS) is to find $X = [x_1, \dots, x_n]$ so that $\|x_i - x_j\| = d_{ij}$. Such a solution is not unique, because if $X$ is a solution, then $x_i^* := x_i + c$, for any $c \in \mathbb{R}^q$, also satisfies
$$\|x_i^* - x_j^*\| = \|(x_i + c) - (x_j + c)\| = \|x_i - x_j\| = d_{ij}.$$
Any location $c$ can be used, but the assumption of a centered configuration, i.e.,
$$\sum_{i=1}^n x_i = 0, \tag{1}$$
serves well for the purpose of dimension reduction.

In short, cMDS finds the centered configuration $x_1, \dots, x_n \in \mathbb{R}^q$, for some $q \le n - 1$, so that their pairwise distances are the same as the corresponding distances in $D$.

We may first find the $n \times n$ Gram matrix $B = X^\top X$ rather than $X$ itself. The Gram matrix is the matrix of inner products. Denoting the $(i,j)$th element of $B$ by $b_{ij}$, we have
$$d_{ij}^2 = b_{ii} + b_{jj} - 2 b_{ij}, \tag{2}$$
from the fact that $d_{ij}^2 = \|x_i - x_j\|^2 = x_i^\top x_i + x_j^\top x_j - 2 x_i^\top x_j$. Remember, we seek to solve for the $b_{ij}$'s from the $d_{ij}$'s (see the next few slides).

The centering constraint (1) leads to
$$\sum_{i=1}^n b_{ij} = \sum_{i=1}^n x_i^\top x_j = \sum_{i=1}^n \sum_{k=1}^q x_{ik} x_{jk} = \sum_{k=1}^q x_{jk} \sum_{i=1}^n x_{ik} = 0, \qquad j = 1,\dots,n.$$
Hence, the sum of each row or column of $B$ is 0. With the notation $T = \operatorname{trace}(B) = \sum_{i=1}^n b_{ii}$, we have
$$\sum_{i=1}^n d_{ij}^2 = T + n b_{jj}, \qquad \sum_{j=1}^n d_{ij}^2 = T + n b_{ii}, \qquad \sum_{i=1}^n \sum_{j=1}^n d_{ij}^2 = 2nT. \tag{3}$$

Combining (2) and (3), the solution is unique:
$$b_{ij} = -\frac{1}{2}\left(d_{ij}^2 - d_{\cdot j}^2 - d_{i\cdot}^2 + d_{\cdot\cdot}^2\right),$$
where $d_{\cdot j}^2$ is the average of $\{d_{ij}^2,\ i = 1,\dots,n\}$ for each $j$, $d_{i\cdot}^2$ is the average of $\{d_{ij}^2,\ j = 1,\dots,n\}$ for each $i$, and $d_{\cdot\cdot}^2$ is the average of $\{d_{ij}^2,\ i, j = 1,\dots,n\}$; or, equivalently,
$$B = -\frac{1}{2} C D_2 C^\top,$$
where $D_2 = \{d_{ij}^2\}$ and $C = I_n - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^\top$ is the centering matrix. A solution $X$ is then given by the eigen-decomposition of $B\ (= X^\top X)$. That is, for $B = V \Lambda V^\top$,
$$X = \Lambda^{1/2} V^\top. \tag{4}$$

The space in which $X$ lies is spanned by the rows of $V^\top$ (i.e., by the eigenvectors of $B$).

Consider PCA based on $\{x_i\}$ (already centered) through the singular value decomposition. We have $X = U \Theta V^\top$, and the PC scores are $Z = U^\top X = \Theta V^\top$, also in the space spanned by the rows of $V^\top$. It turns out that $U = I_q$ and $\Theta = \Lambda^{1/2}$, so the first coordinate of $X$ has the largest variation (recall the interpretation of $X$ via the PCA scores above). If we wish to reduce the dimension to $p \le q$, then the first $p$ rows of $X$, denoted $X_{(p)}$, best preserve the distances $d_{ij}$ among all linear dimension reductions of $X$:

$$X_{(p)} = \Lambda_p^{1/2} V_p^\top,$$ where $\Lambda_p$ is the upper-left $p \times p$ submatrix of $\Lambda$ and $V_p$ consists of the first $p$ columns of $V$.
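The construction in (1)-(4) translates directly into a few lines of R. This is a hedged sketch written for these notes (not the lecture's own code); it returns the n x p configuration, i.e. the transpose of $X_{(p)}$ above, and can be compared with the built-in cmdscale().

# classical MDS from a dissimilarity matrix D (n x n), following (1)-(4)
cmds <- function(D, p = 2) {
  n <- nrow(D)
  C <- diag(n) - matrix(1 / n, n, n)       # centering matrix C
  B <- -0.5 * C %*% (D^2) %*% C            # Gram matrix B = -1/2 C D2 C
  e <- eigen(B, symmetric = TRUE)          # B = V Lambda V', eigenvalues in decreasing order
  lam <- e$values[1:p]                     # assumes the first p eigenvalues are positive
  X <- e$vectors[, 1:p, drop = FALSE] %*% diag(sqrt(lam), p)   # rows of X are x_1, ..., x_n
  list(points = X, eig = e$values)
}
# example: cmds(as.matrix(dist(swiss)), p = 2)$points, comparable to
# cmdscale(dist(swiss), k = 2) (columns may differ in sign)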

To see that the first $p$ coordinates of $x_i$ indeed best preserve the distances, note that the squared distance between $x_i, x_j \in \mathbb{R}^q$ is
$$d_{ij}^2 = \|x_i - x_j\|^2 = \left\|x_i^{(1:p)} - x_j^{(1:p)}\right\|^2 + \left\|x_i^{(*)} - x_j^{(*)}\right\|^2,$$
where $x_i^{(1:p)}$ is the subvector of $x_i$ that we keep and $x_i^{(*)}$ is the part we discard. Since the variation of $x_i^{(*)}$ is small, the value of $\|x_i^{(*)} - x_j^{(*)}\|^2$ is small too (on average).

cMDS remarks

cMDS gives configurations $X_{(p)}$ in $\mathbb{R}^p$ for any dimension $1 \le p \le q$. The configuration is centered, and the coordinates are given by the principal scores, ordered from largest to smallest variation.

Dimension reduction from $X = X_{(q)}$ to $X_{(p)}$ ($p < q$) is the same as in PCA (cutting some PC scores out). cMDS leads to an exact solution if the dissimilarity is based on Euclidean distances, and it can also be used for non-Euclidean distances, in fact for any dissimilarities.

cMDS examples

Consider two worked examples:
1 with Euclidean geometry (a tetrahedron with unit edge length)
2 with circular geometry

And the airline distances example (Izenman 13.1.1)

cMDS examples: tetrahedron

Pairwise distance matrix for the tetrahedron (all pairwise distances equal to 1):
$$D = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix},$$
leading to the Gram matrix $B$ ($4 \times 4$) with eigenvalues $(0.5, 0.5, 0.5, 0)$. Using dimension $p = 3$, we have perfectly retrieved the tetrahedron.
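This can be verified numerically with base R; a brief sketch:

D <- matrix(1, 4, 4) - diag(4)       # unit-distance tetrahedron: ones off the diagonal
fit <- cmdscale(D, k = 3, eig = TRUE)
round(fit$eig, 4)                    # eigenvalues of B: 0.5 0.5 0.5 0.0
dist(fit$points)                     # all pairwise distances are 1 again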

cMDS examples: circular distances

Pairwise (radian) distance matrix
$$D = \begin{pmatrix} 0 & 3.1416 & 0.7854 & 1.5708 \\ 3.1416 & 0 & 2.3562 & 1.5708 \\ 0.7854 & 2.3562 & 0 & 2.3562 \\ 1.5708 & 1.5708 & 2.3562 & 0 \end{pmatrix},$$

leading to the Gram matrix $B$ ($4 \times 4$) with eigenvalues

$$\operatorname{diag}(\Lambda) = (5.6117,\ -1.2039,\ -0.0000,\ 2.2234).$$

In retrieving the coordinate matrix $X$, we cannot take the square root of $\Lambda$, since the negative eigenvalues would give complex numbers. Remedy: keep only the positive eigenvalues and the corresponding coordinates; in this case, take coordinates 1 and 4. This is the price we pay to represent non-Euclidean geometry by Euclidean geometry.

cMDS examples: circular distances

Using dimension $p = 2$ (we cannot use $p > 2$), we obtain the configuration $X_{(2)}$.

Compare the original distance matrix $D$ with the approximated distance matrix $\hat D = \{\|x_i - x_j\|_2\}$:
$$D = \begin{pmatrix} 0 & 3.1416 & 0.7854 & 1.5708 \\ 3.1416 & 0 & 2.3562 & 1.5708 \\ 0.7854 & 2.3562 & 0 & 2.3562 \\ 1.5708 & 1.5708 & 2.3562 & 0 \end{pmatrix}, \qquad
\hat D = \begin{pmatrix} 0 & 3.1489 & 1.4218 & 1.9784 \\ 3.1489 & 0 & 2.5482 & 1.8557 \\ 1.4218 & 2.5482 & 0 & 2.3563 \\ 1.9784 & 1.8557 & 2.3563 & 0 \end{pmatrix}.$$
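A hedged sketch of the same computation with cmdscale() (note that cmdscale() sorts the eigenvalues in decreasing order, so the two positive ones come first rather than in positions 1 and 4):

# radian distances between four points on a circle (the matrix D above)
D <- matrix(c(0,      pi,     pi/4,   pi/2,
              pi,     0,      3*pi/4, pi/2,
              pi/4,   3*pi/4, 0,      3*pi/4,
              pi/2,   pi/2,   3*pi/4, 0), 4, 4, byrow = TRUE)
fit <- cmdscale(D, k = 2, eig = TRUE)     # keeps the two positive-eigenvalue coordinates
round(fit$eig, 4)                         # two positive, one zero, one negative eigenvalue
round(as.matrix(dist(fit$points)), 4)     # approximate distances; compare with D-hat above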

cMDS examples: Airline distances


Airline distances are non-Euclidean. We take the first 3 largest eigenvalues, based on inspection of the scree plot (a brief sketch follows).
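A small sketch of that inspection (assuming the airline distance matrix is stored in a matrix airdist, which is not reproduced here):

fit <- cmdscale(as.dist(airdist), k = 3, eig = TRUE)
plot(fit$eig, type = "h", xlab = "index", ylab = "eigenvalue of B")   # scree plot
abline(h = 0, lty = 2)   # negative eigenvalues indicate a non-Euclidean distance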



The next section would be ...

1 Introduction to MDS

2 Classical MDS

3 Metric and non-Metric MDS

Distance Scaling

Classical MDS seeks to find an optimal configuration $x_i$ that gives $d_{ij} \approx \hat d_{ij} = \|x_i - x_j\|_2$ as closely as possible.

Distance Scaling

Relax $d_{ij} \approx \hat d_{ij}$ from cMDS by allowing $\hat d_{ij} \approx f(d_{ij})$, for some monotone function $f$.

Called metric MDS if dissimilarities dij are quantitative

Called non-metric MDS if the dissimilarities $d_{ij}$ are qualitative (e.g., ordinal). Unlike cMDS, distance scaling is an optimization process minimizing a stress function, and it is solved by iterative algorithms.

The (usual) metric MDS

Given a (low) dimension $p$ and a monotone function $f$, metric MDS seeks to find an optimal configuration $X \subset \mathbb{R}^p$ that gives $f(d_{ij}) \approx \hat d_{ij} = \|x_i - x_j\|_2$ as closely as possible. The function $f$ can be taken to be a parametric monotone function, such as $f(d_{ij}) = \alpha + \beta d_{ij}$. 'As close as possible' is now made explicit by the squared loss
$$\text{stress} = L(\hat d_{ij}) = \left[ \frac{1}{\sum_{\ell < k} d_{\ell k}^2} \sum_{i < j} \bigl(\hat d_{ij} - f(d_{ij})\bigr)^2 \right]^{1/2},$$
and metric MDS minimizes $L(\hat d_{ij})$ over all $\hat d_{ij}$ and over $\alpha, \beta$.

The usual metric MDS is the special case $f(d_{ij}) = d_{ij}$; its solution (obtained by optimization) essentially coincides with that of classical MDS.
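The following is a minimal optimization sketch of the usual metric MDS (with $f$ the identity), written for these notes rather than taken from the lecture. It minimizes the stress above over the configuration using optim(), starting from the cMDS solution.

# usual metric MDS: minimize stress with f(d) = d over an n x p configuration
metric_mds <- function(D, p = 2) {
  n <- nrow(D)
  lw <- lower.tri(D)                                  # use each pair (i < j) once
  stress <- function(x) {
    Dhat <- as.matrix(dist(matrix(x, n, p)))          # fitted distances for configuration x
    sqrt(sum((Dhat[lw] - D[lw])^2) / sum(D[lw]^2))
  }
  init <- cmdscale(D, k = p)                          # classical MDS as the starting value
  out  <- optim(as.vector(init), stress, method = "BFGS", control = list(maxit = 500))
  list(points = matrix(out$par, n, p), stress = out$value)
}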

Sammon mapping

Sammon mapping is a generalization of the usual metric MDS. Sammon’s stress (to be minimized) is

$$\text{Sammon's stress}(\hat d_{ij}) = \frac{1}{\sum_{\ell < k} d_{\ell k}} \sum_{i < j} \frac{(d_{ij} - \hat d_{ij})^2}{d_{ij}}$$

This weighting normalizes the squared errors in pairwise distances by the distances in the original space. As a result, Sammon mapping preserves small $d_{ij}$ better, giving them a greater degree of importance in the fitting procedure than larger values of $d_{ij}$. It is useful for identifying clusters. The optimal solution is found by numerical computation, with the initial value supplied by cMDS; a brief usage sketch follows.
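A brief usage sketch with MASS (assumptions: D is a symmetric dissimilarity matrix with no zero off-diagonal entries, such as the 14 x 14 color dissimilarities shown earlier):

library(MASS)
fit <- sammon(as.dist(D), y = cmdscale(as.dist(D), k = 2), k = 2)   # cMDS as initial value
fit$points   # the 2-dimensional configuration
fit$stress   # final value of Sammon's stress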

cMDS vs. Sammon Mapping

Izenman Figure 13.9 (lower panel) shows the results of cMDS and Sammon mapping for $p = 2$: Sammon mapping better preserves the inter-point distances for smaller dissimilarities, while proportionally squeezing the inter-point distances for larger dissimilarities. There is no ground truth here.

Non-metric MDS

In many applications of MDS, dissimilarities are known only by their rank order, and the spacing between successively ranked dissimilarities is of no interest or is unavailable.

Non-metric MDS: Given a (low) dimension $p$, non-metric MDS seeks to find an optimal configuration $X \subset \mathbb{R}^p$ that gives $f(d_{ij}) \approx \hat d_{ij} = \|x_i - x_j\|_2$ as closely as possible.

Unlike metric MDS, here $f$ is much more general and is only implicitly defined. The values $f(d_{ij}) = d_{ij}^*$ are called disparities, which only preserve the order of the $d_{ij}$, i.e.,

$$d_{ij} < d_{k\ell} \;\Leftrightarrow\; f(d_{ij}) \le f(d_{k\ell}) \;\Leftrightarrow\; d_{ij}^* \le d_{k\ell}^*. \tag{5}$$

Kruskal's non-metric MDS

Kruskal’s non-metric MDS minimizes the stress-1

$$\text{stress-1}(\hat d_{ij}, d_{ij}^*) = \left[ \frac{1}{\sum_{k < \ell} \hat d_{k\ell}^2} \sum_{i < j} (\hat d_{ij} - d_{ij}^*)^2 \right]^{1/2}.$$

Note that the original dissimilarities are used only in checking (5). In fact, only the ordering $d_{ij} < d_{k\ell} < \dots$ among the dissimilarities is needed.

The function $f$ works as if it were a regression curve: the approximated dissimilarities $\hat d_{ij}$ (Izenman calls them $d_{ij}$) play the role of the observed $y$, the disparities $d_{ij}^*$ (Izenman calls them $\hat d_{ij}$) play the role of the predicted $\hat y$, and the order of the dissimilarities is the explanatory variable ($x$).
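A short usage sketch with MASS, where d is a dissimilarity object (e.g. d <- as.dist(D)); the Shepard() call returns exactly this monotone "regression" of fitted distances on the ordered dissimilarities:

library(MASS)
fit <- isoMDS(d, k = 2)               # Kruskal's non-metric MDS; fit$stress is stress-1 (in percent)
sh  <- Shepard(d, fit$points)         # ordered dissimilarities, fitted distances, and disparities
plot(sh$x, sh$y, xlab = "dissimilarity d_ij", ylab = "fitted distance")
lines(sh$x, sh$yf, type = "S")        # the monotone function f, i.e. the disparities d*_ij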

Example: Letter recognition

Wolford and Hollingsworth (1974) were interested in the confusions made when a person attempts to identify letters of the alphabet viewed for only a few milliseconds. A confusion matrix was constructed that shows the frequency with which each stimulus letter was mistakenly called something else. A section of this matrix is shown in the table below.

Is this a dissimilarity matrix?

Example: Letter recognition

How do we deduce dissimilarities from a similarity matrix? From similarities $\delta_{ij}$, choose a constant $c \ge \max \delta_{ij}$ and set
$$d_{ij} = \begin{cases} c - \delta_{ij}, & i \ne j, \\ 0, & i = j, \end{cases}$$
so that larger $d_{ij}$ means $i$ and $j$ are less similar.

Which method is more appropriate? Because we have deduced dissimilarities from similarities, the absolute dissimilarities $d_{ij}$ depend on the personally chosen value of $c$. This is the case where non-metric MDS makes the most sense. However, we will also see that the metric scalings (cMDS and Sammon mapping) do the job as well.

How many dimensions? Decide by inspection of the eigenvalues from the cMDS solution.
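In code the conversion is a one-liner (a sketch; S denotes the hypothetical similarity/confusion matrix):

c0 <- max(S) + 1       # any c >= max similarity will do; the choice is essentially arbitrary
D  <- c0 - S           # dissimilarity: larger when two letters are confused less often
diag(D) <- 0           # objects are not dissimilar from themselves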

Letter recognition

First choose $c = 21$ ($= \max \delta_{ij} + 1$). Compare MDS with $p = 2$, from cMDS, Sammon mapping, and non-metric scaling (stress-1):

Letter recognition

With the same choice $c = 21 = \max \delta_{ij} + 1$, compare MDS with $p = 3$, from cMDS, Sammon mapping, and non-metric scaling (stress-1):

Letter recognition

Do you see any clusters?

With $c = 21 = \max \delta_{ij} + 1$, the eigenvalues of the Gram matrix $B$ in the cMDS calculation are:
508.5707  236.0530  124.8229  56.0627  39.7347  -0.0000  -35.5449  -97.1992

The choice of p = 2 or p = 3 seems reasonable.

Letter recognition

Second, choose $c = 210$ ($= \max \delta_{ij} + 190$). Compare MDS with $p = 2$, from cMDS, Sammon mapping, and non-metric scaling (stress-1):

Letter recognition

With the same choice $c = 210 = \max \delta_{ij} + 190$, compare MDS with $p = 3$, from cMDS, Sammon mapping, and non-metric scaling (stress-1):

Letter recognition

With $c = 210$, the eigenvalues of the Gram matrix $B$ in the cMDS calculation are ($\times 10^4$):

2.7210  2.2978  2.1084  1.9623  1.9133  1.7696  1.6842  0.0000

We may need more than 3 dimensions ($p > 3$).

Letter recognition: Summary

The structure of the data is appropriate for non-metric MDS. Kruskal's non-metric scaling:
1 appropriate for non-metric dissimilarities (the goal is to preserve order);
2 optimization is susceptible to local minima (leading to different configurations);
3 time-consuming.
cMDS is fast and overall good. Sammon mapping fails when $c = 210$.

Letter recognition: Summary

Clusters (C, G), (D, Q), (H, M, N, W) are confirmed, for either choice of $c$, by agglomerative hierarchical clustering with average linkage (a brief sketch follows):
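A matching sketch (assuming the letter dissimilarity matrix for the chosen c is stored in D):

hc <- hclust(as.dist(D), method = "average")   # agglomerative clustering with average linkage
plot(hc)                                       # dendrogram; look for (C,G), (D,Q), (H,M,N,W)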

MDS in R

library(MASS)

# compute a dissimilarity matrix from a dataset
d <- dist(swiss)      # d holds the n(n-1)/2 lower-triangle distances
cmdscale(d, k = 2)    # classical MDS
sammon(d, k = 1)      # Sammon mapping (here with a 1-dimensional configuration)
isoMDS(d, k = 2)      # Kruskal's non-metric MDS
