An Introduction to Diffusion Maps

J. de la Porte†, B. M. Herbst†, W. Hereman*, S. J. van der Walt†

† Applied Mathematics Division, Department of Mathematical Sciences, University of Stellenbosch, South Africa
* Colorado School of Mines, United States of America
Abstract

This paper describes a mathematical technique [1] for dealing with dimensionality reduction. Given data in a high-dimensional space, we show how to find parameters that describe the lower-dimensional structures of which it is comprised. Unlike other popular methods such as Principal Component Analysis and Multi-dimensional Scaling, diffusion maps are non-linear and focus on discovering the underlying manifold (the lower-dimensional constrained “surface” upon which the data is embedded). By integrating local similarities at different scales, a global description of the data-set is obtained. In comparisons, it is shown that the technique is robust to noise perturbation and is computationally inexpensive. Illustrative examples and an open implementation are given.

1. Introduction: Dimensionality Reduction

The curse of dimensionality, a term which vividly reminds us of the problems associated with the processing of high-dimensional data, is ever-present in today’s

ances in the observations, and therefore clusters the data in 10 groups, one for each digit. Inside each of the 10 groups, digits are furthermore arranged according to the angle of rotation. This organisation leads to a simple two-dimensional parametrisation, which significantly reduces the dimensionality of the data-set, whilst preserving all important attributes.

Figure 1: Two images of the same digit at different rotation angles.

On the other hand, a computer sees each image as a data point in $\mathbb{R}^{nm}$, an $nm$-dimensional coordinate space.
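To make the last statement concrete, the following minimal sketch flattens an n-by-m image into a single point with nm coordinates. It is an illustration only: the function name `image_to_point` and the tiny 2-by-3 images are hypothetical and are not taken from the paper or its implementation, and plain Python lists stand in for whatever array type a real implementation would use.

```python
from itertools import chain


def image_to_point(image):
    """Flatten an n-by-m image (a list of n rows, each with m pixel values)
    into one point with n*m coordinates, reading the rows top to bottom."""
    return list(chain.from_iterable(image))


# Two hypothetical 2-by-3 greyscale images (n = 2, m = 3); the pixel values
# are made up for illustration.
image_a = [[0, 255, 0],
           [0, 255, 0]]
image_b = [[0, 0, 255],
           [0, 255, 0]]

point_a = image_to_point(image_a)
point_b = image_to_point(image_b)

print(len(point_a))  # number of coordinates: n * m = 6
print(point_a)
print(point_b)
```

Under this view, every pixel is a separate coordinate, so even a modest 100 by 100 pixel image becomes a point with 10,000 coordinates.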