Intrinsic Dimension Estimation Using Simplex Volumes


Intrinsic Dimension Estimation Using Simplex Volumes. Dissertation for the attainment of the doctoral degree (Dr. rer. nat.) of the Mathematisch-Naturwissenschaftliche Fakultät of the Rheinische Friedrich-Wilhelms-Universität Bonn, submitted by Daniel Rainer Wissel from Aschaffenburg. Bonn, 2017. Prepared with the approval of the Mathematisch-Naturwissenschaftliche Fakultät of the Rheinische Friedrich-Wilhelms-Universität Bonn. First reviewer: Prof. Dr. Michael Griebel. Second reviewer: Prof. Dr. Jochen Garcke. Date of the doctoral examination: 24 November 2017. Year of publication: 2018.

Abstract. In this thesis we present a new method for estimating the so-called intrinsic dimension of a (usually finite) set of points. Such methods matter, for instance, for performing dimensionality reduction of a multivariate data set, a processing step frequently required in data mining and machine learning. The growing number of often automatically generated, multi-dimensional data sets of enormous size calls for special methods that compute a correspondingly reduced data set; ideally, redundancies in the original data are removed while the information crucial to the user or to further processing is preserved. Dimensionality reduction methods compute, from a given point set, a new set of the same cardinality but consisting of points of lower dimension. This lower target dimension is usually unknown. Under certain model assumptions, for example for points on a low-dimensional manifold, this quantity, which then equals the dimension of the manifold, is also called the intrinsic dimension of the point set. Various methods exist for estimating this intrinsic dimension.
Many of these methods rest on the evaluation of local, low-dimensional quantities, in particular Euclidean distances or angles, whose distributions differ across spaces of different dimension and thus allow inference of the sought value. We develop a new approach by considering the volumes of simplices of arbitrarily high dimension. The vertices of such a simplex are drawn at random from a set of neighboring data points. The empirical mean of many such volumes is then used to estimate the underlying intrinsic dimension.

The structure of this thesis can be summarized as follows. We first recapitulate and analyze some relationships in high-dimensional spaces in order to sharpen our intuition for them; in particular, we examine how the volumes of simple geometric objects evolve, as well as general concentration effects of certain quantities, in the limit of a space dimension tending to infinity. We then give a brief introduction to the topic of dimensionality reduction, which provides the frame for the main chapter.

In the main part, we first present various notions of dimension, then describe and classify the most important methods for estimating the intrinsic dimension. A selection of six of these methods is explained in greater detail and serves as a benchmark for our later numerical investigations. The concrete procedure of our own approach, which we call "Sample Simplex Volume", is motivated and justified theoretically as well as described algorithmically in detail. A core component is an algorithm for the fast and stable computation of a large number of high-dimensional simplex volumes.
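The volume computation at the core of this approach can be illustrated with the standard Gram-determinant formula: a k-simplex with vertices v_0, ..., v_k has k-volume sqrt(det(A^T A)) / k!, where the columns of A are the edge vectors v_i - v_0. A minimal sketch in Python (an illustration of the formula only, not the thesis's fast and stable algorithm):

```python
import math
import numpy as np

def simplex_volume(vertices):
    """k-volume of a k-simplex given its k+1 vertices (rows) in R^d, d >= k.

    Uses the Gram determinant: vol = sqrt(det(A^T A)) / k!, where the
    columns of A are the edge vectors v_i - v_0.
    """
    v = np.asarray(vertices, dtype=float)
    edges = (v[1:] - v[0]).T              # d x k matrix of edge vectors
    k = edges.shape[1]
    gram = edges.T @ edges                # k x k Gram matrix
    return math.sqrt(max(np.linalg.det(gram), 0.0)) / math.factorial(k)

# A right triangle with legs of length 1 has area 1/2, even embedded in R^3.
area = simplex_volume([[0, 0, 0], [1, 0, 0], [0, 1, 0]])
```

The Gram matrix makes the formula work for simplices embedded in a higher-dimensional ambient space, which is exactly the situation exploited here: mean volumes of simplices built from neighboring sample points shrink at a rate characteristic of the intrinsic dimension.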
Based on runtime considerations and on tests with noisy input data, we develop an alternative variant of the first approach and accordingly use the designations SSV1 and SSV2. We carry out a series of numerical experiments, both with randomly generated data and with freely available data sets from a wide range of applications. In the first case, the main focus lies on geometric structures of relatively high intrinsic dimension, where we also address problem areas such as undersampling (too few points relative to the dimension) and noisy data. For data sets stemming from actual measurements, the expected outcome of the dimension estimate is not always clear-cut; we discuss the associated, partly application-dependent questions. In most cases, our own methods prove at least on par with the established techniques used for comparison.

Acknowledgements. First, I would like to thank Prof. Dr. Michael Griebel for making this thesis possible, for many helpful discussions, for all his professional support, and in general for the pleasant working atmosphere at the Institut für Numerische Simulation. My thanks also go to Prof. Dr. Jochen Garcke for valuable suggestions from the field of machine learning and for kindly agreeing to act as second reviewer. I would especially like to mention my colleagues at the institute, with whom I discussed numerous mathematical questions and who also enriched my everyday work with interesting debates on a wide variety of topics. Alexander Rüttgers, Bastian Bohn and Jens Oettershagen deserve thanks for their attentive proofreading of this thesis.
I look back fondly on the time with my office colleague Jutta Neuen, and Ralph Thesen was always at hand with his competence and helpfulness whenever technical problems arose. Last but not least, I would like to thank my friends, in particular Roman, Yinan, Daniel and Roderich, and above all my parents, my brother Johannes, as well as Qiuqiu and Shiyin, who have accompanied and supported me in all circumstances. Bonn, August 2017, Daniel Wissel.

Contents

1 Introduction
2 High-dimensional Data Analysis and Dimensionality Reduction
  2.1 Particularities of High-Dimensional Structures
    2.1.1 Some considerations on high-dimensional volumes
    2.1.2 Concentration of measure
  2.2 Dimensionality Reduction
    2.2.1 A compact introduction to dimensionality reduction
    2.2.2 An explanatory example: dimensionality reduction with ISOMAP
3 Intrinsic Dimension Estimation
  3.1 Concepts of Dimension
    3.1.1 Lebesgue covering dimension
    3.1.2 Hausdorff dimension
    3.1.3 The box-counting dimension
    3.1.4 The correlation dimension
    3.1.5 The q-dimension
    3.1.6 Further notions of dimension
  3.2 Estimation of the Intrinsic Dimension
    3.2.1 Classification and characteristics of IDE methods
    3.2.2 Selected approaches
  3.3 Simplex Volume Computation
    3.3.1 Theoretical preliminaries
    3.3.2 Efficient numerical computation of simplex volumes
  3.4 Intrinsic Dimension Estimation via Sample Simplex Volumes (SSV)
    3.4.1 Why simplex volumes?
    3.4.2 Model and theoretical background
    3.4.3 Concept of the SSV approach
    3.4.4 Influence of noise on simplex volumes
    3.4.5 Algorithmic description of the SSV1 and SSV2 methods
    3.4.6 Complexity analysis of the SSV methods
  3.5 Numerical Results
    3.5.1 Challenges of dimension estimation
    3.5.2 Configuration of the tested methods
    3.5.3 Technical prerequisites
    3.5.4 Results from synthetic data
    3.5.5 Results from real-world data
    3.5.6 Further numerical evaluations
    3.5.7 Runtime evaluations
4 Application of the SSV Method in Dimensionality Reduction
5 Conclusion
Bibliography

Chapter 1: Introduction

Apart from a small number of prototypes and high-priced, specialized machines, the first personal computers were developed almost exclusively by American companies in the mid-seventies. Today, about fifty years later, computers have pervaded modern societies, triggering changes in various domains, from the global to the local: from political, medical, and even ethical issues down to the everyday work routines and habits of individuals. Probably no previous human invention has had a social impact of comparable scale within such a short timespan.

The workflow of a computer system can be divided into three basic steps: input, processing, output. While the growth in the accumulated processing power of computing devices over the last decades has been tremendous, the current quantity and diversity of input devices — and hence the amount of gathered input data — seems at least equally impressive. Due to decreasing hardware manufacturing costs, sensors and other kinds of automatic input devices have recently become omnipresent in modern industrial and consumer products.
However, collecting and storing this growing amount of data is essentially worthless without tools to process it in an appropriate way.
Recommended publications
  • Investigation of Nonlinearity in Hyperspectral Remotely Sensed Imagery
    Investigation of Nonlinearity in Hyperspectral Remotely Sensed Imagery, by Tian Han (B.Sc., Ocean University of Qingdao, 1984; M.Sc., University of Victoria, 2003). A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science. © Tian Han, 2009, University of Victoria. All rights reserved. This dissertation may not be reproduced in whole or in part, by photocopying or other means, without the permission of the author. ISBN: 978-0-494-60725-1 (Library and Archives Canada). The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats. The author retains copyright ownership and moral rights in this thesis.
  • Fractal Geometry
    Fractal Geometry. Special Topics in Dynamical Systems, WS 2020. Sascha Troscheit, Faculty of Mathematics, University of Vienna, Oskar-Morgenstern-Platz 1, 1090 Wien, Austria. [email protected]. July 2, 2020. Abstract: Classical shapes in geometry – such as lines, spheres, and rectangles – are only rarely found in nature. More common are shapes that share some sort of "self-similarity". For example, a mountain is not a pyramid, but rather a collection of "mountain-shaped" rocks of various sizes, down to the size of a grain of sand. Without any sort of scale reference, it is difficult to distinguish a mountain from a ragged hill, a boulder, or even a small uneven pebble. These shapes are ubiquitous in the natural world, from clouds to lightning strikes or even trees. What is a tree but a collection of "tree-shaped" branches? A central component of fractal geometry is the description of how various properties of geometric objects scale with size. Dimension theory in particular studies scalings by means of various dimensions, each capturing a different characteristic. The most frequent scaling encountered in geometry is exponential scaling (e.g. surface area and volume of cubes and spheres), but even natural measures can simultaneously exhibit very different behaviour on an average scale, fine scale, and coarse scale. Dimensions are used to classify these objects and distinguish them when traditional means, such as cardinality, area, and volume, are no longer appropriate. We shall establish fundamental results in dimension theory which in turn influence research in diverse subject areas such as combinatorics, group theory, number theory, coding theory, data processing, and financial mathematics.
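    The scaling behaviour described above can be tried out numerically with the box-counting dimension: count the number N(eps) of boxes of side eps needed to cover a set and fit the slope of log N(eps) against log(1/eps). A small sketch for the middle-thirds Cantor set, whose box-counting dimension is log 2 / log 3 ≈ 0.631 (a finite-level approximation of the set is an assumption of this sketch, not part of the lecture notes):

```python
import numpy as np

def cantor_points(level):
    """Interval endpoints of the level-n middle-thirds Cantor construction."""
    pts = np.array([0.0, 1.0])
    for _ in range(level):
        pts = np.concatenate([pts / 3.0, pts / 3.0 + 2.0 / 3.0])
    return np.unique(pts)

def box_counting_dimension(points, scales):
    """Slope of log N(eps) against log(1/eps), N(eps) = # of occupied boxes."""
    counts = [len(np.unique(np.floor(points / eps))) for eps in scales]
    coeffs = np.polyfit(np.log(1.0 / np.asarray(scales)), np.log(counts), 1)
    return coeffs[0]

pts = cantor_points(12)
dim = box_counting_dimension(pts, scales=[3.0 ** -k for k in range(1, 9)])
# dim is close to log(2) / log(3)
```

The slope is insensitive to constant factors in the box counts, which is why a crude occupancy count over a finite approximation already recovers the dimension well.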
  • Assouad Type Dimensions and Homogeneity of Fractals
    Assouad type dimensions and homogeneity of fractals. Jonathan M. Fraser, The University of St Andrews. Dimension theory: a 'dimension' is a function that assigns a (usually positive, finite real) number to a metric space, attempting to quantify how 'large' the set is. In particular, it usually studies how much space the set takes up on small scales. Common examples of 'dimensions' are the Hausdorff, packing and box dimensions. Fractals are sets with a complex structure on small scales and thus they may have fractional dimension! People working in dimension theory and fractal geometry are often concerned with the rigorous computation of the dimensions of abstract classes of fractal sets.
  • Conformal Assouad Dimension and Modulus S
    GAFA, Geom. funct. anal. Vol. 14 (2004) 1278–1321. © Birkhäuser Verlag, Basel 2004. DOI 10.1007/s00039-004-0492-5. Conformal Assouad Dimension and Modulus. S. Keith and T. Laakso. Abstract: Let α ≥ 1 and let (X, d, µ) be an α-homogeneous metric measure space with conformal Assouad dimension equal to α. Then there exists a weak tangent of (X, d, µ) with uniformly big 1-modulus. 1 Introduction. The existence of families of curves with non-vanishing modulus in a metric measure space is a strong and useful property, perhaps most appreciated in the study of abstract quasiconformal and quasisymmetric maps, Sobolev spaces, Poincaré inequalities, and geometric rigidity questions. See for example [BoK3], [BouP2], [Ch], [HK], [HKST], [K2,3], [KZ], [P1], [Sh], [T2] and the many references contained therein. In this paper we give an existence theorem for families of curves with non-vanishing modulus in weak tangents of certain metric measure spaces, and show that this existence is fundamentally related to the conformal Assouad dimension of the underlying metric space. Theorem 1.0.1. Let α ≥ 1 and let (X, d, µ) be an α-homogeneous metric measure space with conformal Assouad dimension equal to α. Then there exists a weak tangent of (X, d, µ) with uniformly big 1-modulus. The terminology of Theorem 1.0.1 will be explained in section 2, and several applications will be given momentarily. For the moment we note that the notion of uniformly big 1-modulus is new, and describes those metric measure spaces that are rich in curves at every location and scale, as measured in a scale-invariant way by modulus.
  • On the Dimensions of Attractors of Random Self-Similar Graph Directed Iterated Function Systems (arXiv:1511.03461v2 [math.MG], 19 May 2017)
    ON THE DIMENSIONS OF ATTRACTORS OF RANDOM SELF-SIMILAR GRAPH DIRECTED ITERATED FUNCTION SYSTEMS. SASCHA TROSCHEIT. Abstract. In this paper we propose a new model of random graph directed fractals that encompasses the current well-known model of random graph directed iterated function systems, V-variable attractors, and fractal and Mandelbrot percolation. We study its dimensional properties for similarities with and without overlaps. In particular we show that for the two classes of 1-variable and ∞-variable random graph directed attractors we introduce, the Hausdorff and upper box counting dimension coincide almost surely, irrespective of overlap. Under the additional assumption of the uniform strong separation condition we give an expression for the almost sure Hausdorff and Assouad dimension. 1. Introduction. The study of deterministic and random fractal geometry has seen a lot of interest over the past 30 years. While we assume the reader is familiar with standard works on the subject (e.g. [7], [12], [13]), we repeat some of the material here for completeness, enabling us to set the scene for how our model fits in with, and also differs from, previously considered models. In the study of strange attractors in dynamical systems and in fractal geometry, one of the most commonly encountered families of attractors is the invariant set under a finite family of contractions. This is the family of Iterated Function System (IFS) attractors. An IFS is a set of mappings I = {f_i}_{i∈I}, with associated attractor F that satisfies (1.1) F = ⋃_{i∈I} f_i(F). If I is a finite index set and each f_i : R^d → R^d is a contraction, then there exists a unique compact and non-empty set F in the family of compact subsets K(R^d) that satisfies this invariance (see Hutchinson [21]).
  • A Review of Dimension Reduction Techniques∗
    A Review of Dimension Reduction Techniques. Miguel Á. Carreira-Perpiñán. Technical Report CS-96-09, Dept. of Computer Science, University of Sheffield. [email protected]. January 27, 1997. Abstract: The problem of dimension reduction is introduced as a way to overcome the curse of the dimensionality when dealing with vector data in high-dimensional spaces and as a modelling tool for such data. It is defined as the search for a low-dimensional manifold that embeds the high-dimensional data. A classification of dimension reduction problems is proposed. A survey of several techniques for dimension reduction is given, including principal component analysis, projection pursuit and projection pursuit regression, principal curves and methods based on topologically continuous maps, such as Kohonen's maps or the generalised topographic mapping. Neural network implementations for several of these techniques are also reviewed, such as the projection pursuit learning network and the BCM neuron with an objective function. Several appendices complement the mathematical treatment of the main text. Contents: 1 Introduction to the Problem of Dimension Reduction (1.1 Motivation; 1.2 Definition of the problem; 1.3 Why is dimension reduction possible?; 1.4 The curse of the dimensionality and the empty space phenomenon; 1.5 The intrinsic dimension of a sample; 1.6 Views of the problem of dimension reduction; 1.7 Classes of dimension reduction problems; 1.8 Methods for dimension reduction. Overview of the report); 2 Principal Component Analysis; 3 Projection Pursuit (3.1 Introduction ...)
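    The first of the surveyed techniques, principal component analysis, also yields one of the oldest intrinsic-dimension heuristics: take the smallest number of principal components that explains a fixed fraction of the total variance. A minimal sketch (the 95% threshold is an illustrative assumption, not a value taken from the report):

```python
import numpy as np

def pca_variances(X):
    """Variances along the principal components of the rows of X."""
    Xc = X - X.mean(axis=0)
    # Singular values of the centered data give the component variances.
    s = np.linalg.svd(Xc, compute_uv=False)
    return s ** 2 / (len(X) - 1)

def pca_dimension(X, var_threshold=0.95):
    """Smallest k whose top-k components explain var_threshold of the variance."""
    var = pca_variances(X)
    ratio = np.cumsum(var) / var.sum()
    return int(np.searchsorted(ratio, var_threshold)) + 1

# Data on a noisy 2-dimensional subspace of R^5: the estimate should be 2.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(1000, 2)), np.zeros((1000, 3))])
X += 0.01 * rng.normal(size=X.shape)
```

Because PCA is a global, linear method, this heuristic overestimates the dimension of curved manifolds, which is one motivation for the local and fractal-based estimators discussed elsewhere on this page.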
  • Hölder Parameterization of Iterated Function Systems and a Self-Affine Phenomenon
    Anal. Geom. Metr. Spaces 2021; 9:90–119. Research Article, Open Access. Matthew Badger* and Vyron Vellis. Hölder Parameterization of Iterated Function Systems and a Self-Affine Phenomenon. https://doi.org/10.1515/agms-2020-0125. Received November 1, 2020; accepted May 25, 2021. Abstract: We investigate the Hölder geometry of curves generated by iterated function systems (IFS) in a complete metric space. A theorem of Hata from 1985 asserts that every connected attractor of an IFS is locally connected and path-connected. We give a quantitative strengthening of Hata's theorem. First we prove that every connected attractor of an IFS is (1/s)-Hölder path-connected, where s is the similarity dimension of the IFS. Then we show that every connected attractor of an IFS is parameterized by a (1/α)-Hölder curve for all α > s. At the endpoint, α = s, a theorem of Remes from 1998 already established that connected self-similar sets in Euclidean space that satisfy the open set condition are parameterized by (1/s)-Hölder curves. In a secondary result, we show how to promote Remes' theorem to self-similar sets in complete metric spaces, but in this setting require the attractor to have positive s-dimensional Hausdorff measure in lieu of the open set condition. To close the paper, we determine sharp Hölder exponents of parameterizations in the class of connected self-affine Bedford-McMullen carpets and build parameterizations of self-affine sponges. An interesting phenomenon emerges in the self-affine setting. While the optimal parameter s for a self-similar curve in R^n is always at most the ambient dimension n, the optimal parameter s for a self-affine curve in R^n may be strictly greater than n.
  • Fractal-Based Methods As a Technique for Estimating the Intrinsic Dimensionality of High-Dimensional Data: a Survey
    INFORMATICA, 2016, Vol. 27, No. 2, 257–281. © 2016 Vilnius University. DOI: http://dx.doi.org/10.15388/Informatica.2016.84. Fractal-Based Methods as a Technique for Estimating the Intrinsic Dimensionality of High-Dimensional Data: A Survey. Rasa Karbauskaitė and Gintautas Dzemyda, Institute of Mathematics and Informatics, Vilnius University, Akademijos 4, LT-08663, Vilnius, Lithuania. e-mail: [email protected], [email protected]. Received: December 2015; accepted: April 2016. Abstract: The estimation of the intrinsic dimensionality of high-dimensional data still remains a challenging issue. Various approaches to interpret and estimate the intrinsic dimensionality have been developed. Referring to the following two classifications of estimators of the intrinsic dimensionality – local/global estimators and projection techniques/geometric approaches – we focus on the fractal-based methods that belong to the global estimators and geometric approaches. The computational aspects of estimating the intrinsic dimensionality of high-dimensional data are the core issue of this paper. The advantages and disadvantages of the fractal-based methods are disclosed, and applications of these methods are presented briefly. Key words: high-dimensional data, intrinsic dimensionality, topological dimension, fractal dimension, fractal-based methods, box-counting dimension, information dimension, correlation dimension, packing dimension. 1. Introduction. In real applications, we are confronted with data of very high dimensionality. For example, in image analysis, each image is described by a large number of pixels of different colour. The analysis of DNA microarray data (Kriukienė et al., 2013) deals with high dimensionality, too. The analysis of high-dimensional data is usually challenging.
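    Of the fractal-based estimators such surveys cover, the correlation dimension is the simplest to sketch: the correlation integral C(r), i.e. the fraction of point pairs at distance below r, scales like r^D for small r, and D is read off as a log-log slope (the Grassberger-Procaccia procedure). A rough sketch with brute-force O(n²) pair counting (the radii chosen below are illustrative assumptions):

```python
import numpy as np

def correlation_dimension(points, radii):
    """Grassberger-Procaccia estimate: slope of log C(r) versus log r."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pair_d = dist[np.triu_indices(len(pts), k=1)]    # distances over pairs i < j
    c = np.array([np.mean(pair_d < r) for r in radii])   # correlation integral
    return np.polyfit(np.log(radii), np.log(c), 1)[0]

# Points sampled on the unit circle in R^2: intrinsic dimension 1.
rng = np.random.default_rng(1)
t = rng.uniform(0.0, 2.0 * np.pi, 1000)
circle = np.column_stack([np.cos(t), np.sin(t)])
dim = correlation_dimension(circle, radii=np.logspace(-1.5, -0.5, 8))
```

Practical implementations avoid the quadratic pair enumeration and must choose the fitting range of r carefully; too-large radii see the global shape of the set, too-small radii see only sampling noise.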
  • A Continuous Formulation of Intrinsic Dimension
    A Continuous Formulation of Intrinsic Dimension. Norbert Krüger (Computer Science and Engineering, Aalborg University Esbjerg, 6700 Esbjerg, Denmark) and Michael Felsberg (Electrical Engineering, Linköping University, SE-58183 Linköping, Sweden). [email protected], [email protected]. Abstract: The intrinsic dimension (see, e.g., [29, 11]) has proven to be a suitable descriptor to distinguish between different kinds of image structures such as edges, junctions or homogeneous image patches. In this paper, we will show that the intrinsic dimension is spanned by two axes: one axis represents the variance of the spectral energy and the other represents a weighted variance in orientation. Moreover, we will show that the topological structure of intrinsic dimension has the form of a triangle. We will review diverse definitions of intrinsic dimension and show that they can be subsumed within the above-mentioned scheme. We will then give a concrete continuous definition of intrinsic dimension that realizes its triangular structure. 1 Introduction. Natural images are dominated by specific local sub-structures, such as edges, junctions, or texture. Sub-domains of Computer Vision have analyzed these sub-structures by making use of certain concepts (such as, e.g., orientation, position, or texture gradient). These concepts were then utilized for a variety of tasks, such as edge detection (see, e.g., [6]), junction classification (see, e.g., [27]), and texture interpretation (see, e.g., [26]). However, before interpreting image patches via such concepts we want to know whether and how they apply. For example, the idea of orientation makes sense for edges or lines but not for a junction or most textures.
  • On the Intrinsic Dimensionality of Image Representations
    On the Intrinsic Dimensionality of Image Representations. Sixue Gong, Vishnu Naresh Boddeti, Anil K. Jain. Michigan State University, East Lansing MI 48824. Abstract: This paper addresses the following questions pertaining to the intrinsic dimensionality of any given image representation: (i) estimate its intrinsic dimensionality, (ii) develop a deep neural network based non-linear mapping, dubbed DeepMDS, that transforms the ambient representation to the minimal intrinsic space, and (iii) validate the veracity of the mapping through image matching in the intrinsic space. Experiments on benchmark image datasets (LFW, IJB-C and ImageNet-100) reveal that the intrinsic dimensionality of deep neural network representations is significantly lower than the dimensionality of the ambient features. For instance, SphereFace's [26] 512-dim face representation and ResNet's [16] 512-dim image representation have an intrinsic dimensionality of 16 and 19, respectively. ...erations, such as ease of learning the embedding function [38], constraints on system memory, etc., instead of the effective dimensionality necessary for image representation. This naturally raises the following fundamental but related questions: How compact can the representation be without any loss in recognition performance? In other words, what is the intrinsic dimensionality of the representation? And how can one obtain such a compact representation? Addressing these questions is the primary goal of this paper. The intrinsic dimensionality (ID) of a representation refers to the minimum number of parameters (or degrees of freedom) necessary to capture the entire information present in the representation [4]. Equivalently, it refers to the dimensionality of the m-dimensional manifold M embedded within the d-dimensional ambient (representation) space.
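    A popular way to obtain such estimates is the two-nearest-neighbor (TwoNN) idea of Facco et al., named here only as an illustration (the paper above develops its own estimation pipeline): on a d-dimensional manifold, the ratio μ = r₂/r₁ of second- to first-neighbor distances follows P(μ > x) = x^(−d), which gives the maximum-likelihood estimate d ≈ N / Σᵢ log μᵢ. A minimal brute-force sketch:

```python
import numpy as np

def twonn_dimension(X):
    """TwoNN intrinsic-dimension estimate from nearest-neighbor distance ratios."""
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                  # ignore self-distances
    nearest2 = np.sort(d2, axis=1)[:, :2]         # squared r1^2, r2^2 per point
    mu = np.sqrt(nearest2[:, 1] / nearest2[:, 0])
    # Maximum-likelihood fit of the Pareto law P(mu > x) = x^(-d).
    return len(X) / np.log(mu).sum()

# A curved 2-dimensional sheet embedded in R^3: the estimate should be near 2.
rng = np.random.default_rng(2)
u = rng.uniform(size=(1200, 2))
sheet = np.column_stack([u, u[:, 0] * u[:, 1]])
dim = twonn_dimension(sheet)
```

Because only the two nearest neighbors of each point enter the estimate, the method is largely insensitive to the curvature of the manifold, though it remains sensitive to noise and to duplicated points (which make μ degenerate).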
  • Assouad Dimension and the Open Set Condition
    Assouad Dimension and the Open Set Condition. Alexander M. Henderson, Department of Mathematics and Statistics, University of Nevada, Reno. 19 April 2013. Abstract: The Assouad dimension is a measure of the complexity of a fractal set similar to the box-counting dimension, but with an additional scaling requirement. We generalize Moran's open set condition and introduce a notion called grid-like which allows us to compute upper bounds and exact values for the Assouad dimension of certain fractal sets that arise as the attractors of self-similar iterated function systems. Then, for an arbitrary fractal set A, we explore the question of whether the Assouad dimension of the set of differences A − A obeys any bound related to the Assouad dimension of A. This question is of interest, as infinite-dimensional dynamical systems with attractors possessing sets of differences of finite Assouad dimension allow embeddings into finite-dimensional spaces without losing the original dynamics. We find that even in very simple, natural examples, such a bound does not generally hold. This result demonstrates how a natural phenomenon with a simple underlying structure can be difficult to measure. Selected references: Jouni Luukkainen, Assouad dimension: antifractal metrization, porous sets, and homogeneous measures, Journal of the Korean Mathematical Society, 35(1):23–76, 1998. Eric J. Olson and James C. Robinson, Almost bi-Lipschitz embeddings and almost homogeneous sets, Transactions of the American Mathematical Society, 362:145–168, 2010. K. J. Falconer, The Geometry of Fractal Sets, Cambridge University Press, New York, 1985. John M. Mackay, Assouad dimension of self-affine carpets.
  • Dimension Theory of Random Self-Similar and Self-Affine Constructions
    Dimension Theory of Random Self-similar and Self-affine Constructions. Sascha Troscheit. A thesis submitted in partial fulfilment for the degree of Doctor of Philosophy at the University of St Andrews, April 21, 2017. Full metadata for this item is available in the St Andrews Research Repository at: http://research-repository.st-andrews.ac.uk/. Please use this identifier to cite or link to this item: http://hdl.handle.net/10023/11033. This item is protected by original copyright and is licensed under a Creative Commons Licence. To the North Sea. Acknowledgements: First and foremost I thank my supervisors Kenneth Falconer and Mike Todd for their countless hours of support: mathematical, academical, and otherwise; for reading through many drafts and providing valuable feedback; for the regular Analysis group outings that were more often than not the highlight of my week. I thank my father, Siegfried Troscheit; his wife, Christiane Miethe; and my mother, Krystyna Troscheit, for their support throughout all the years of university that culminated in this thesis. I would not be where I am today if it were not for my teachers Richard Stöve and Klaus Bovermann, who supported my early adventures into physics and mathematics and enabled me to attend university lectures while at school. I thank my friends, both in St Andrews and the rest of the world.