Geometrical Aspects of Statistical Learning Theory
Dissertation approved by the Department of Computer Science of the Technische Universität Darmstadt in fulfilment of the requirements for the academic degree of Doctor rerum naturalium (Dr. rer. nat.), submitted by Dipl.-Phys. Matthias Hein from Esslingen am Neckar.

Examination committee:
Chair: Prof. Dr. B. Schiele
First referee: Prof. Dr. T. Hofmann
Co-referee: Prof. Dr. B. Schölkopf

Date of submission: 30.9.2005
Date of defense: 9.11.2005

Darmstadt, 2005
Hochschulkennziffer: D17

Abstract

Geometry plays an important role in modern statistical learning theory, and many different aspects of geometry can be found in this fast-developing field. This thesis addresses some of these aspects. A large part of this work is concerned with so-called manifold methods, which have recently attracted a great deal of interest. The key point is that for many real-world data sets it is natural to assume that the data lies on a low-dimensional submanifold of a potentially high-dimensional Euclidean space. We develop a rigorous and quite general framework for the estimation and approximation of some geometric structures and other quantities of this submanifold, using certain corresponding structures on neighborhood graphs built from random samples of that submanifold.
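To make the neighborhood-graph construction concrete, the following is a minimal sketch, not code from the thesis: it samples points from a one-dimensional submanifold of R^3 (a circle), builds an eps-neighborhood graph with Gaussian edge weights, and forms the unnormalized graph Laplacian L = D - W together with the induced smoothness functional S(f) = f^T L f. The sample size, the radius eps, and the bandwidth h are illustrative choices, and the circle stands in for an arbitrary submanifold.

    import numpy as np

    # Sample n points from a 1-D submanifold of R^3 (a unit circle in the
    # x-y plane); the circle is an illustrative stand-in for the submanifold.
    rng = np.random.default_rng(0)
    n = 200
    t = rng.uniform(0.0, 2.0 * np.pi, size=n)
    X = np.stack([np.cos(t), np.sin(t), np.zeros(n)], axis=1)

    # Squared Euclidean distances in the ambient space R^3.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)

    # eps-neighborhood graph with Gaussian edge weights; eps and the
    # bandwidth h are hypothetical parameters, not values from the thesis.
    eps, h = 0.3, 0.2
    W = np.exp(-sq_dists / (2.0 * h**2)) * (sq_dists <= eps**2)
    np.fill_diagonal(W, 0.0)  # no self-loops

    # Degree function and unnormalized graph Laplacian L = D - W.
    degrees = W.sum(axis=1)
    L = np.diag(degrees) - W

    # Smoothness functional S(f) = f^T L f = (1/2) sum_ij w_ij (f_i - f_j)^2,
    # evaluated here for the first ambient coordinate as a function on the vertices.
    f = X[:, 0]
    print(f @ L @ f)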
Another part of this thesis deals with the generalization of the maximum margin principle to arbitrary metric spaces. This generalization follows quite naturally from a change of viewpoint on the well-known support vector machine (SVM): the SVM can be seen as an algorithm which applies the maximum margin principle to a subclass of metric spaces. The motivation to consider the generalization to arbitrary metric spaces arose from the observation that in practice the condition for the applicability of the SVM is rather difficult to check for a given metric. Nevertheless, one would like to apply the successful maximum margin principle even in cases where the SVM cannot be applied.

The last part deals with the specific construction of so-called Hilbertian metrics and positive definite kernels on probability measures. We consider several ways of building such metrics and kernels, with emphasis on the incorporation of different desired properties into the metric and kernel. Such metrics and kernels have a wide applicability in so-called kernel methods, since probability measures occur as inputs in various situations.
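To illustrate what a positive definite kernel on probability measures looks like in the simplest discrete case, here is a small sketch, not the construction developed in the thesis: the Bhattacharyya coefficient k(p, q) = sum_i sqrt(p_i q_i) is positive definite because it is the inner product of the feature maps p -> sqrt(p), and the Hellinger distance it induces is a Hilbertian metric. The two histograms are made-up examples.

    import numpy as np

    def bhattacharyya_kernel(p: np.ndarray, q: np.ndarray) -> float:
        # k(p, q) = sum_i sqrt(p_i * q_i); positive definite since it is the
        # inner product of the feature maps p -> sqrt(p).
        return float(np.sum(np.sqrt(p * q)))

    def hellinger_distance(p: np.ndarray, q: np.ndarray) -> float:
        # Hilbertian metric induced by the kernel:
        # d(p, q)^2 = k(p, p) + k(q, q) - 2 k(p, q) = ||sqrt(p) - sqrt(q)||^2.
        return float(np.linalg.norm(np.sqrt(p) - np.sqrt(q)))

    # Two made-up histograms, normalized to probability distributions.
    p = np.array([0.2, 0.5, 0.3])
    q = np.array([0.3, 0.3, 0.4])
    print(bhattacharyya_kernel(p, q), hellinger_distance(p, q))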
Summary (Zusammenfassung)

Geometry plays an important role in modern statistical learning theory, and many aspects of geometry can be found in this fast-developing field. This dissertation is concerned with some of these aspects. A large part of this work deals with so-called manifold methods. The main motivation is that for data sets arising in applications it is in many cases an accurate assumption that the data lies on a low-dimensional submanifold of a potentially high-dimensional Euclidean space. In this work, a mathematically rigorous and general framework for the estimation and approximation of geometric structures and other quantities of the submanifold is developed. To this end, corresponding structures on a neighborhood graph generated from a sample of points of the submanifold are used. A further part of this dissertation treats the generalization of the so-called maximum margin principle to general metric spaces. This generalization follows naturally from a new viewpoint on the so-called support vector machine (SVM). It is shown that the SVM can be seen as an algorithm which applies the maximum margin principle to a subclass of metric spaces. The motivation for this generalization arose from the problem, frequently encountered in practice, that the conditions for using a particular metric in the SVM are hard to verify. Nevertheless, one would like to use the successful maximum margin principle even in cases where the SVM cannot be applied. The concluding part of this work is concerned with the specific construction of so-called Hilbertian metrics and positive definite kernels on probability measures. Several ways of constructing such metrics and kernels are investigated. The emphasis lies on incorporating various desired properties into the metric and kernel, respectively. Such metrics and kernels have a wide range of applications in so-called kernel methods, since probability measures occur as inputs in the most diverse situations.

Academic Career of the Author

10/1996–02/2002: Studies of physics with mathematics as a minor subject at the Universität Tübingen.
02/2002: Diploma in physics. Topic of the diploma thesis: Numerical simulation of axisymmetric, isolated systems in general relativity. Supervisor: PD Dr. J. Frauendiener.
06/2002–11/2005: Research scientist at the Max Planck Institute for Biological Cybernetics in Tübingen, in the department of Prof. Dr. Bernhard Schölkopf.

Declaration (Erklärung)

I hereby declare that I have written this thesis independently, except for the aids explicitly mentioned in it.

Acknowledgements

First of all I would like to thank Bernhard Schölkopf for giving me the opportunity to do my doctoral thesis in an excellent research environment. He gave me the freedom to pursue my own lines of research while always providing ideas on how to progress. I also very much appreciated his advice and support in times when it was needed.

I am especially thankful to Olivier Bousquet for guiding me into the world of learning theory. In our long discussions we usually grazed through all sorts of topics, ranging from pure mathematics to machine learning to theoretical physics. This was very inspiring and raised my interest in several branches of mathematics. He always had time for questions and was a constant source of ideas for me.

I want to thank Thomas Hofmann for giving me the opportunity to do my thesis at the TU Darmstadt. I am very thankful for his support in these last steps towards the thesis.

A special thanks goes to Olaf Wittich for reading parts of the second chapter and for giving helpful comments which improved the clarity of this part.

During these three years I had the pleasure to work or discuss with several other nice people. They all influenced the way I think about learning theory. I thank all of them for their time and help: Jean-Yves Audibert, Goekhan Bakır, Stephane Boucheron, Olivier Chapelle, Jan Eichhorn, André Elisseeff, Matthias Franz, Arthur Gretton, Jeremy Hill, Kwang-In Kim, Malte Kuss, Matti Kääriäinen, Navin Lal, Cheng Soon Ong, Petra Philips, Carl Rasmussen, Gunnar Rätsch, Lorenzo Rosasco, Alexander Smola, Koji Tsuda, Ulrike von Luxburg, Felix Wichmann, Olaf Wittich, Dengyong Zhou, Alexander Zien, Laurent Zwald.

I would like to thank the whole AGBS team, and in particular all the PhD students in our lab, for a very nice atmosphere and a lot of fun. In particular I would like to thank our pioneer Ulrike von Luxburg for pleasant and helpful discussions and for the mutual support of our small 'theory' group, Navin Lal for a nice time here in Tübingen, Malte Kuss for providing me his Matlab script to produce the nice manifold figures, my office mate Arthur Gretton for his subtle jokes and the nice atmosphere, and all AOE participants for relaxing afterhours in our lab.

Finally I would like to thank my family for their unconditional help and support during my studies, and Kathrin for her understanding and for reminding me sometimes that there is more in life than a thesis.

Contents

1 Introduction
  1.1 Introduction to statistical learning theory
    1.1.1 Empirical risk minimization
    1.1.2 Regularized empirical risk minimization
  1.2 Geometry in statistical learning theory
  1.3 Summary of Contributions of this thesis

2 Consistent Continuum Limit for Graph Structure on Point Clouds
  2.1 Abstract Definition of the Graph Structure
    2.1.1 Hilbert spaces of functions on the vertices V and the edges E
    2.1.2 The difference operator d and its adjoint d*
    2.1.3 The general graph Laplacian
    2.1.4 The special case of an undirected graph
    2.1.5 Smoothness functionals for regularization on undirected graphs
  2.2 Submanifolds in R^d and associated operators
    2.2.1 Basics of submanifolds
    2.2.2 The weighted Laplacian and the continuous smoothness functional
  2.3 Continuum limit of the graph structure
    2.3.1 Notations and assumptions
    2.3.2 Asymptotics of Euclidean convolutions on the submanifold M
    2.3.3 Pointwise consistency of the degree function d, or kernel density estimation on a submanifold in R^d
    2.3.4 Pointwise consistency of the normalized and unnormalized graph Laplacian
    2.3.5 Weak consistency of H_V and the smoothness functional S(f)
    2.3.6 Summary and fixation of H_V by mutual consistency requirement
  2.4 Applications
    2.4.1 Intrinsic dimensionality estimation of submanifolds in R^d
  2.5 Appendix
    2.5.1 U-statistics

3 Kernels, Associated Structures and Generalizations
  3.1 Introduction
  3.2 Positive Definite