VOLUME 88, NUMBER 1   PHYSICAL REVIEW LETTERS   7 JANUARY 2002

Algorithm for Data Clustering in Pattern Recognition Problems Based on Quantum Mechanics

David Horn and Assaf Gottlieb
School of Physics and Astronomy, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
(Received 16 July 2001; published 20 December 2001)

We propose a novel clustering method that is based on physical intuition derived from quantum mechanics. Starting with given data points, we construct a scale-space probability function. Viewing the latter as the lowest eigenstate of a Schrödinger equation, we use simple analytic operations to derive a potential function whose minima determine cluster centers. The method has one parameter, determining the scale over which cluster structures are searched. We demonstrate it on data analyzed in two dimensions (chosen from the eigenvectors of the correlation matrix). The method is applicable in higher dimensions by limiting the evaluation of the Schrödinger potential to the locations of data points.

DOI: 10.1103/PhysRevLett.88.018702
PACS numbers: 89.75.Kd, 02.70.–c, 03.65.Ge, 03.67.Lx

Clustering of data is a well-known problem of pattern recognition, covered in textbooks such as [1–3]. The problem we are looking at is defining clusters of data solely by the proximity of data points to one another. This problem is one of unsupervised learning, and is in general ill defined. Solutions to such problems can be based on intuition derived from physics. A good example of the latter is the algorithm by [4] that is based on associating points with Potts spins and formulating an appropriate model of statistical mechanics. We propose an alternative that is also based on physical intuition, this one being derived from quantum mechanics.

As an introduction to our approach we start with the scale-space algorithm by [5], who uses a Parzen-window estimator [3] of the probability distribution leading to the data at hand. The estimator is constructed by associating a Gaussian with each of the N data points in a Euclidean space of dimension d and summing over all of them. This can be represented, up to an overall normalization, by

\[ \psi(x) = \sum_i e^{-(x - x_i)^2 / 2\sigma^2} , \qquad (1) \]

where the x_i are the data points. Roberts [5] views the maxima of this function as determining the locations of cluster centers.

An alternative, and somewhat related, method is support vector clustering (SVC) [6], which is based on a Hilbert-space analysis. In SVC one defines a transformation from data space to vectors in an abstract Hilbert space. SVC proceeds to search for the minimal sphere surrounding these states in Hilbert space. We will also associate data points with states in Hilbert space. Such states may be represented by Gaussian wave functions, whose sum is ψ(x). This is the starting point of our quantum clustering (QC) method. We will search for the Schrödinger potential for which ψ(x) is a ground state. The minima of the potential define our cluster centers.

The Schrödinger potential.—We wish to view ψ as an eigenstate of the Schrödinger equation

\[ H\psi \equiv \left( -\frac{\sigma^2}{2}\nabla^2 + V(x) \right) \psi = E\psi . \qquad (2) \]

Here we rescaled H and V of the conventional quantum mechanical equation to leave only one free parameter, σ. For comparison, the case of a single point at x_1 corresponds to Eq. (2) with V = (1/2σ²)(x − x_1)² and E = d/2, thus coinciding with the ground state of the harmonic oscillator in quantum mechanics.

Given ψ for any set of data points we can solve Eq. (2) for V:

\[ V(x) = E + \frac{\sigma^2}{2} \frac{\nabla^2 \psi}{\psi} = E - \frac{d}{2} + \frac{1}{2\sigma^2 \psi} \sum_i (x - x_i)^2 \, e^{-(x - x_i)^2 / 2\sigma^2} . \qquad (3) \]

Let us furthermore require that min V = 0. This sets the value of

\[ E = -\min \frac{\sigma^2}{2} \frac{\nabla^2 \psi}{\psi} \qquad (4) \]

and determines V(x) uniquely. E has to be positive since V is a non-negative function. Moreover, since the last term in Eq. (3) is positive definite, it follows that

\[ 0 < E \le \frac{d}{2} . \qquad (5) \]

We note that ψ is positive definite. Hence, being an eigenfunction of the operator H in Eq. (2), its eigenvalue E is the lowest eigenvalue of H, i.e., it describes the ground state. All higher eigenfunctions have nodes whose numbers increase as their energy eigenvalues increase. (In quantum mechanics, where one interprets |ψ|² as the probability distribution, all eigenfunctions of H have physical meaning. Although this approach could be adopted, we have chosen ψ as the probability distribution because of the simplicity of algebraic manipulations.)
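The analytic construction above is easy to state in code. The following NumPy sketch (an illustration of Eqs. (1)–(4), not code from the paper; the function name and array layout are our own) evaluates ψ and V at arbitrary locations and fixes E through the requirement min V = 0.

```python
import numpy as np

def qc_potential(X, points, sigma):
    """Evaluate the Parzen wave function psi (Eq. 1) and the quantum-clustering
    potential V (Eq. 3) at locations X, given the data array `points` (N, d).
    Illustrative sketch only; names and conventions are not from the paper."""
    d2 = ((X[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)  # |x - x_i|^2, shape (M, N)
    gauss = np.exp(-d2 / (2.0 * sigma ** 2))
    psi = gauss.sum(axis=1)                                        # Eq. (1)
    w = (d2 * gauss).sum(axis=1) / (2.0 * sigma ** 2 * psi)        # last term of Eq. (3)
    dim = points.shape[1]
    E = dim / 2.0 - w.min()       # fix E so that min V = 0 (Eq. 4), estimated on X
    V = E - dim / 2.0 + w                                          # Eq. (3)
    return psi, V, E
```

Evaluating the potential only at the data points themselves, as done later for high-dimensional data, amounts to calling qc_potential(points, points, sigma).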

Given a set of points defined within some region of space, we expect V(x) to grow quadratically outside this region, and to exhibit one or several local minima within the region. We identify these minima with cluster centers, which seems natural in view of the opposite roles of the two terms in Eq. (2): given a potential function, it attracts the data distribution function ψ to its minima, while the Laplacian drives it away. The diffused character of the distribution is the balance of the two effects.

As an example we display results for the crab data set taken from Ripley's book [7]. These data, given in a five-dimensional parameter space, show nice separation of the four classes contained in them when displayed in two dimensions spanned by the second and third principal components [8] (eigenvectors) of the correlation matrix of the data. The information supplied to the clustering algorithm contains only the coordinates of the data points. We display the correct classification to allow for visual comparison of the clustering method with the data. Starting with σ = 1/√2 we see in Fig. 1 that the Parzen probability distribution, or the wave function ψ, has only a single maximum. Nonetheless, the potential, displayed in Fig. 2, already shows four minima at the relevant locations. The overlap of the topographic map of the potential with the true classification is quite amazing. The minima are the centers of attraction of the potential, and they are clearly evident although ψ does not display local maxima at these points. The fact that V(x) = E lies above the range where all valleys merge explains why ψ(x) is smoothly distributed over the whole domain.

As σ is being decreased, more minima will appear in V(x). For the crab data, we find two new minima as σ is decreased to one-half. Nonetheless, the previous minima become deeper and still dominate the scene. The new minima are insignificant, in the sense that they lie at high values (of order E). Classifying data points to clusters according to their topographic location on the surface of V(x), roughly the same clustering assignment is expected for a range of σ values. One important advantage of quantum clustering is that E sets the scale on which minima are observed. Thus, we learn from Fig. 2 that the cores of all 4 clusters can be found at V values below 0.4E. In comparison, the additional maxima of ψ, which start to appear at lower values of σ, may lie much lower than the leading maximum and may be hard to locate numerically.

Principal component analysis (PCA).—In our example, data were given in some high-dimensional space and we analyzed them after defining a projection and a metric, using the PCA approach. The latter defines a metric that is intrinsic to the data, determined by second order statistics. But, even then, several possibilities exist, leading to nonequivalent results.

Principal component decomposition can be applied both to the correlation matrix C_ab = ⟨x_a x_b⟩ and to the covariance matrix

\[ \tilde{C}_{ab} = \langle (x_a - \langle x \rangle_a)(x_b - \langle x \rangle_b) \rangle = C_{ab} - \langle x \rangle_a \langle x \rangle_b . \qquad (6) \]

In both cases averaging is performed over all data points, and the indices indicate spatial coordinates from 1 to d. The principal components are the eigenvectors of these matrices. Thus we have two natural bases in which to represent the data. Moreover, one often renormalizes the eigenvector projections, dividing them by the square roots of their eigenvalues. This procedure is known as "whitening," leading to a renormalized correlation or covariance matrix of unity. This is a scale-free representation that would naturally lead one to start with σ = 1 in the search for (higher order) structure of the data.
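The whitening step just described can be sketched as follows (illustrative function and argument names, not code from the paper): project onto the leading eigenvectors of the correlation or covariance matrix and rescale each projection by the square root of its eigenvalue.

```python
import numpy as np

def whitened_projections(data, n_components, use_covariance=False):
    """Whitened principal-component projections as described in the text.
    Illustrative sketch; names are not from the paper."""
    X = np.asarray(data, dtype=float)
    if use_covariance:
        X = X - X.mean(axis=0)              # subtract the means, as in Eq. (6)
    C = X.T @ X / len(X)                    # second-order statistics <x_a x_b>
    eigvals, eigvecs = np.linalg.eigh(C)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    proj = X @ eigvecs[:, order]            # principal-component projections
    return proj / np.sqrt(eigvals[order])   # whitening: unit variance per component
```

For the crab example one would keep, e.g., the second and third components of the correlation-matrix version to reproduce the coordinates used in Figs. 1 and 2.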

FIG. 1. Ripley's crab data [7] displayed on a plot of their second and third principal components (PC3 on the horizontal axis, PC2 on the vertical axis) with a superimposed topographic map of Roberts' probability distribution for σ = 1/√2.

FIG. 2. A topographic map of the potential for the crab data with σ = 1/√2, displaying four minima (denoted by crossed circles) that are interpreted as cluster centers. The contours of the topographic map are set at values of V(x)/E = 0.2, 0.4, 0.6, 0.8, 1.
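A map such as the one in Fig. 2 can be summarized numerically by locating grid cells where V is a local minimum lying below a chosen fraction of E. The sketch below does this for two-dimensional data; the grid resolution, padding, and the 0.4E cutoff are our illustrative choices, not a procedure specified in the paper.

```python
import numpy as np

def grid_minima(points, sigma, cutoff=0.4, grid_n=80, pad=1.0):
    """Locate local minima of V on a 2D grid, keeping only those below
    `cutoff` * E.  Illustrative sketch, not the authors' code."""
    lo, hi = points.min(axis=0) - pad, points.max(axis=0) + pad
    gx, gy = np.meshgrid(np.linspace(lo[0], hi[0], grid_n),
                         np.linspace(lo[1], hi[1], grid_n))
    X = np.column_stack([gx.ravel(), gy.ravel()])
    d2 = ((X[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    gauss = np.exp(-d2 / (2 * sigma ** 2))
    w = (d2 * gauss).sum(1) / (2 * sigma ** 2 * gauss.sum(1))
    V = (w - w.min()).reshape(grid_n, grid_n)      # min V = 0 on the grid
    E = points.shape[1] / 2.0 - w.min()            # Eq. (4), grid estimate
    # a cell is a local minimum if none of its 8 neighbours is lower
    interior = V[1:-1, 1:-1]
    neighbours = np.stack([V[1 + dx:grid_n - 1 + dx, 1 + dy:grid_n - 1 + dy]
                           for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                           if (dx, dy) != (0, 0)])
    is_min = (interior <= neighbours.min(axis=0)) & (interior < cutoff * E)
    iy, ix = np.nonzero(is_min)
    return np.column_stack([gx[1:-1, 1:-1][iy, ix], gy[1:-1, 1:-1][iy, ix]])
```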


The PCA approach that we have used in our example was based on whitened correlation matrix projections. Had we used the covariance matrix instead, we would get similar, but slightly worse, separation of the crab data. Our example is meant to convince the reader that once a good metric is found, QC conveys the correct information. Hence we allowed ourselves to search first for the best geometric representation, and then apply QC.

QC in higher dimensions.—Increasing dimensionality means higher computational complexity, often limiting the applicability of a numerical method. Nonetheless, here we can overcome this "curse of dimensionality" by limiting ourselves to evaluating V at locations of data points only. Since we are interested in where the minima lie, and since invariably they lie near data points, no harm is done by this limitation. The results are depicted in Fig. 3. Here we analyzed the crab problem in a three-dimensional (3D) space, spanned by the first three PCs. Shown in this figure are V/E values as functions of the serial number of the data, using the same symbols as in Fig. 2 to allow for comparison. Using all data of V < 0.3E, one obtains cluster cores that are well separated in space, corresponding to the four classes that exist in the data. Only 9 of the 129 points that obey V < 0.3E are misclassified by this procedure. Adding higher PCs, first component 4 and then component 5, leads to deterioration in clustering quality. In particular, lower cutoffs in V/E, including lower fractions of data, are required to define cluster cores that are well separated in their relevant spaces.

One may locate the cluster centers, and deduce the clustering allocation of the data, by following the dynamics of gradient descent into the potential minima. By defining y_i(0) = x_i, one follows the steps of y_i(t + Δt) = y_i(t) − η(t)∇V(y_i(t)), letting the points y_i reach an asymptotic fixed value coinciding with a cluster center. More sophisticated minimum search algorithms (see, e.g., chapter 10 in [9]) can be applied to reach the fixed points faster. The results of a gradient-descent procedure, applied to the 3D analysis of the crab data shown in Fig. 3, are that the three classes of data points 51 to 200 are clustered correctly with only five misclassifications. The first class, data points 1–50, has 31 points forming a new cluster, with most of the rest joining the cluster of the second class. Only 3 points of the first class fall outside the 4 clusters.
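A minimal implementation of this gradient-descent dynamics might look as follows. The constant step size and iteration count are our assumptions, since the paper leaves the schedule η(t) open, and the gradient is obtained by differentiating Eq. (3) directly.

```python
import numpy as np

def qc_gradient_descent(points, sigma, eta=0.1, steps=200):
    """Roll a replica y_i of every data point downhill on V until it settles
    near a potential minimum.  Illustrative sketch; step size, step count,
    and names are not from the paper."""
    y = np.asarray(points, dtype=float).copy()
    for _ in range(steps):
        diff = y[:, None, :] - points[None, :, :]          # y_i - x_j, shape (N, N, d)
        d2 = (diff ** 2).sum(-1)
        gauss = np.exp(-d2 / (2 * sigma ** 2))
        psi = gauss.sum(1)
        num = (d2 * gauss).sum(1)              # numerator of the last term in Eq. (3)
        # gradient of V = E - d/2 + num / (2 sigma^2 psi), by the quotient rule
        grad_num = (2 * diff * gauss[..., None]).sum(1) \
                   - (diff * d2[..., None] * gauss[..., None]).sum(1) / sigma ** 2
        grad_psi = -(diff * gauss[..., None]).sum(1) / sigma ** 2
        grad_V = (grad_num - (num / psi)[:, None] * grad_psi) / (2 * sigma ** 2 * psi[:, None])
        y -= eta * grad_V
    return y        # replicas sharing a fixed point belong to the same cluster
```

Points whose replicas end up within a small distance of one another (a fraction of σ, say) would then be assigned to the same cluster.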
We also ran our method on the iris data set [10], which is a standard benchmark obtainable from the UC Irvine (UCI) repository [11]. The data set contains 150 instances, each composed of four measurements of an iris flower. There are three types of flowers, represented by 50 instances each. Clustering of these data in the space of the first two principal components, using σ = 1/4, has the amazing result of misclassification of 3 instances only. Quantum clustering can be applied to the raw data in four dimensions, leading to misclassifications of the order of 15 instances, similar to the clustering quality of [4].

FIG. 3. Values of V(x)/E are depicted for the crab problem with three leading PCs and σ = 1/2. They are presented as a function of the serial number of the data, using the same symbols employed previously. One observes low-lying data of all four classes.

Distance-based QC formulation.—Gradient descent calls for the calculation of V both on the original data points and on the trajectories they follow. An alternative approach is to restrict oneself to the original values of V, as in the example displayed in Fig. 3, and follow a hybrid algorithm to be described below. Before turning to such an algorithm let us note that, in this case, we evaluate V on a discrete set of points, V(x_i) = V_i. We can then express V in terms of the distance matrix D_ij = |x_i − x_j| as

\[ V_i = E - \frac{d}{2} + \frac{1}{2\sigma^2} \, \frac{\sum_j D_{ij}^2 \, e^{-D_{ij}^2 / 2\sigma^2}}{\sum_j e^{-D_{ij}^2 / 2\sigma^2}} , \qquad (7) \]

with E chosen appropriately so that min V_i = 0. This type of formulation is of particular importance if the original information is given in terms of distances between data points rather than their locations in space. In this case we have to proceed with distance information only.

By applying QC we can reach results such as in Fig. 3 without invoking any explicit spatial distribution of the points in question. One may then analyze the results by choosing a cutoff, e.g., V < 0.2E, such that a fraction (e.g., one-third) of the data will be included. On this subset we select groups of points whose distances from one another are smaller than, e.g., 2σ, thus defining cores of clusters. Then we continue with higher values of V, e.g., 0.2E < V < 0.4E, allocating points to previous clusters or forming new cores. Since the choice of distance cutoff in cluster allocation is quite arbitrary, this method cannot be guaranteed to work as well as the gradient-descent approach.
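The distance-based formulation and the hybrid core allocation just outlined can be sketched as below. The function names and the connected-components grouping are our reading of the description; the cutoff values are the illustrative ones quoted in the text.

```python
import numpy as np

def qc_potential_from_distances(D, dim, sigma):
    """Eq. (7): compute V_i at every data point from the distance matrix alone
    (D[i, j] = |x_i - x_j|), with E fixed so that min V_i = 0."""
    D2 = np.asarray(D, dtype=float) ** 2
    gauss = np.exp(-D2 / (2 * sigma ** 2))
    w = (D2 * gauss).sum(axis=1) / ((2 * sigma ** 2) * gauss.sum(axis=1))
    E = dim / 2.0 - w.min()          # analogue of Eq. (4) on the data points
    return E - dim / 2.0 + w, E      # V_i and the ground-state energy E

def cluster_cores(D, V, E, sigma, v_cut=0.2, d_cut=2.0):
    """Hybrid allocation sketch: keep points with V < v_cut * E and group those
    lying within d_cut * sigma of one another (connected components of the
    thresholded distance graph)."""
    keep = list(np.flatnonzero(V < v_cut * E))
    labels, current = {}, 0
    for seed in keep:
        if seed in labels:
            continue
        frontier = [seed]                    # grow one core by flood fill
        while frontier:
            i = frontier.pop()
            if i in labels:
                continue
            labels[i] = current
            frontier += [j for j in keep
                         if j not in labels and D[i, j] < d_cut * sigma]
        current += 1
    return labels                            # point index -> core label
```

Note that the d/2 terms cancel (V_i = w_i − min_j w_j in the notation of the sketch), so the potential values themselves do not require knowing the dimension d; d only enters the reported value of E.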

Generalization.—Our method can be easily generalized to allow for different weighting of different points, as in

\[ \psi(x) = \sum_i c_i \, e^{-(x - x_i)^2 / 2\sigma^2} , \qquad (8) \]

with c_i ≥ 0. This is important if we have some prior information or some other means for emphasizing or deemphasizing the influence of data points. An example of the latter is using QC in conjunction with SVC [6]. SVC has the possibility of labeling points as outliers. This is done by applying quadratic maximization to the Lagrangian

\[ W = 1 - \sum_{i,j} \beta_i \beta_j \, e^{-(x_i - x_j)^2 / 2\sigma^2} \qquad (9) \]

over the space of all 0 ≤ β_i ≤ 1/(pN), subject to the constraint Σ_i β_i = 1. The points for which the upper bound of β_i is reached are labeled as outliers. Their number is regulated by p, being limited by pN. Using for the QC analysis a choice of c_i = 1/(pN) − β_i will eliminate the outliers of SVC and emphasize the role of the points expected to lie within the clusters.
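A weighted version of the potential follows by carrying the coefficients c_i through the derivation of Eq. (3). The sketch below (illustrative; a uniform default and our own names) accepts, for instance, the SVC-derived weights c_i = 1/(pN) − β_i mentioned above.

```python
import numpy as np

def weighted_qc_potential(X, points, sigma, weights=None):
    """Eq. (8) generalization: each data point enters psi with a non-negative
    coefficient c_i.  Illustrative sketch, not code from the paper."""
    c = np.ones(len(points)) if weights is None else np.asarray(weights, float)
    d2 = ((X[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    gauss = c * np.exp(-d2 / (2 * sigma ** 2))        # c_i weights each Gaussian
    psi = gauss.sum(axis=1)
    w = (d2 * gauss).sum(axis=1) / (2 * sigma ** 2 * psi)
    dim = points.shape[1]
    E = dim / 2.0 - w.min()                           # again fix E so min V = 0
    return psi, E - dim / 2.0 + w                     # weighted psi and V
```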
Discussion.—QC constructs a potential function V(x) on the basis of data points, using one parameter, σ, that controls the width of the structures that we search for. The advantage of the potential V over the scale-space probability distribution is that the minima of the former are better defined (deep and robust) than the maxima of the latter. However, both of these methods put the emphasis on cluster centers, rather than, e.g., cluster boundaries. Since the equipotentials of V may take arbitrary shapes, the clusters are not spherical, as in the k-means approach. Nonetheless, spherical clusters appear more naturally than, e.g., ring-shaped or toroidal clusters, even if the data would accommodate them. If some global symmetry is to be expected, e.g., global spherical symmetry, it can be incorporated into the original Schrödinger equation defining the potential function.

QC can be applied in high dimensions by limiting the evaluation of the potential, given as an explicit analytic expression of Gaussian terms, to locations of data points only. Thus the complexity of evaluating V_i is of order N², independent of dimensionality.

Our algorithm has one free parameter, the scale σ. In all examples we confined ourselves to scales that are of order 1, because we have worked within whitened PCA spaces. If our method is applied to a different data space, the range of scales to be searched for could be determined by some other prior information.

Since the strength of our algorithm lies in the easy selection of cluster cores, it can be used as a first stage of a hybrid approach employing other techniques after the identification of cluster centers. The fact that we do not have to take care of feeble minima, but consider only robust deep minima, turns the identification of a core into an easy problem. Thus, an approach that derives its rationale from physical intuition in quantum mechanics can lead to interesting results in the field of pattern classification.

We thank B. Reznik for a helpful discussion.

[1] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data (Prentice-Hall, Englewood Cliffs, NJ, 1988).
[2] K. Fukunaga, Introduction to Statistical Pattern Recognition (Academic Press, San Diego, CA, 1990).
[3] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification (Wiley-Interscience, New York, 2001), 2nd ed.
[4] M. Blatt, S. Wiseman, and E. Domany, Phys. Rev. Lett. 76, 3251 (1996).
[5] S. J. Roberts, Pattern Recognit. 30, 261 (1997).
[6] A. Ben-Hur, D. Horn, H. T. Siegelmann, and V. Vapnik, in Proceedings of the Conference on Advances in Neural Information Processing Systems 13, 2000, edited by Todd K. Leen, Thomas G. Dietterich, and Volker Tresp (MIT Press, Cambridge, MA, 2001), p. 367.
[7] B. D. Ripley, Pattern Recognition and Neural Networks (Cambridge University Press, Cambridge, UK, 1996).
[8] I. T. Jolliffe, Principal Component Analysis (Springer-Verlag, New York, 1986).
[9] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes: The Art of Scientific Computing (Cambridge University Press, Cambridge, UK, 1992), 2nd ed.
[10] R. A. Fisher, Ann. Eugenics 7, 179 (1936).
[11] C. L. Blake and C. J. Merz, UCI Repository of Machine Learning Databases, 1998.
