Support Vector Machines

Total Page:16

File Type:pdf, Size:1020Kb

Support Vector Machines Support Vector Machines Some slides adapted from •Aliferis & Tsamardinos, Vanderbilt University http://discover1.mc.vanderbilt.edu/discover/public/ml_tutorial_ol d/index.html •Rong Jin, Language Technology Institute www.contrib.andrew.cmu.edu/~jin/ir_proj/svm.ppt 1 ABDBM © Ron Shamir Support Vector Machines • Decision surface: a hyperplane in feature space • One of the most important tools in the machine learning toolbox • In a nutshell: – map the data to a predetermined very high- dimensional space via a kernel function – Find the hyperplane that maximizes the margin between the two classes – If data are not separable - find the hyperplane that maximizes the margin and minimizes the (weighted average of the) misclassifications ABDBM © Ron Shamir 2 Support Vector Machines • Three main ideas: 1. Define what an optimal hyperplane is (taking into account that it needs to be computed efficiently): maximize margin 2. Generalize to non-linearly separable problems: have a penalty term for misclassifications 3. Map data to high dimensional space where it is easier to classify with linear decision surfaces: reformulate problem so that data are mapped implicitly to this space ABDBM © Ron Shamir 3 Support Vector Machines • Three main ideas: 1. Define what an optimal hyperplane is (taking into account that it needs to be computed efficiently): maximize margin 2. Generalize to non-linearly separable problems: have a penalty term for misclassifications 3. Map data to high dimensional space where it is easier to classify with linear decision surfaces: reformulate problem so that data are mapped implicitly to this space ABDBM © Ron Shamir 4 Which Separating Hyperplane to Use? Var1 Var2 ABDBM © Ron Shamir 5 Maximizing the Margin Var 1 IDEA 1: Select the separating hyperplane that maximizes the margin! Margin Width Margin Width Var2 ABDBM © Ron Shamir 6 Support Vectors Var1 Support Vectors Margin Width Var2 ABDBM © Ron Shamir 7 Setting Up the Optimization Problem Var1 The width of the margin is: 2 k w w x b k So, the problem is: w 2 k max w w x b k k Var2 s. t . ( w x b ) k , x of class 1 w x b 0 (w x b ) k , x of class 2 ABDBM © Ron Shamir 8 Setting Up the Optimization Problem Var1 Scaling w, b so that k=1, the problem becomes: 2 max w x b 1 w w s. t . ( w x b ) 1, x of class 1 (w x b ) 1, x of class 2 w x b 1 1 Var2 w x b 0 ABDBM © Ron Shamir 9 Setting Up the Optimization Problem • If class 1 corresponds to 1 and class 2 corresponds to -1, we can rewrite (w xi b ) 1, x i with y i 1 (w xi b ) 1, x i with y i 1 • as yi( w x i b ) 1, x i • So the problem becomes: 2 1 2 max min w w or 2 s. t . y( w x b ) 1, x s. t . yi( w x i b ) 1, x i i i i ABDBM © Ron Shamir 10 Linear, Hard-Margin SVM Formulation • Find w,b that solve • Quadratic program: quadratic objective, linear (in)equality constraints • Problem is convex there is a unique global minimum value (when feasible) • There is also a unique minimizer, i.e. w and b values that provide the minimum • No solution if the data are not linearly separable 1 2 • Objective is PD polynomial-timemin solnw • Very efficient soln with modern optimization2 s. t . y( w x b ) 1, x software (handles 1000s of constraintsi iand i training instances). ABDBM © Ron Shamir 11 Lagrange multipliers 1 2 Minimize (w) || w || i yi (w xi b) i w,b, 2 s.t.1 0,...,l 0 • Convex quadratic programming problem • Duality theory applies! ABDBM © Ron Shamir 12 Dual Space • Dual Problem 1 Maximize F(Λ) Λ 1 Λ DΛ 2 Λ y 0 subject to Λ 0 where Λ (1,2 ,...,l ),y ( y1, y2 ,...,yn ),Dij yi y jxi x j • Representation for w l w i yixi • Decision function i 1 l f (x) sign( yii (x xi ) b) i 1 ABDBM © Ron Shamir 13 Comments • Representation of vector w – Linear combination of examples xi • # parameters = # examples – i: the importance of each examples • Only the points closest to the bound have i0 • Core of the algorithm: xx’ – Both matrix D and decision function require the knowledge of xx’ Dij yi y jxi x j (More on this soon) l w i yixi i 1 ABDBM © Ron Shamir 14 Support Vector Machines • Three main ideas: 1. Define what an optimal hyperplane is (taking into account that it needs to be computed efficiently): maximize margin 2. Generalize to non-linearly separable problems: have a penalty term for misclassifications 3. Map data to high dimensional space where it is easier to classify with linear decision surfaces: reformulate problem so that data are mapped implicitly to this space ABDBM © Ron Shamir 15 Non-Linearly Separable Data Var1 Introduce slack variables i i Allow some instances to fall j w x b 1 within the margin, w but penalize them w x b 1 1 Var w x b 0 2 ABDBM © Ron Shamir 16 Formulating the Optimization Problem Constraint becomes : Var1 yi( w x i b ) 1 i , x i i i 0 Objective function penalizes for j w x b 1 w misclassified instances and those within the margin w x b 1 1 1 2 min wC Var2 i w x b 0 2 i C trades-off margin width & misclassifications ABDBM © Ron Shamir 17 Linear, Soft-Margin SVMs yi( w x i b ) 1 i , x i 0 • Algorithm tries to keep i at zero whilei maximizing margin • Alg does not minimize the no. of misclassifications (NP-complete problem) but the sum of distances from the margin hyperplanes 2 • Other formulations use i instead • C: penalty for misclassification 1 2 • As C, we get closer to the hard-marginmin solutionwC i 2 i ABDBM © Ron Shamir 18 Dual Space • Dual Problem 1 Maximize F(Λ) Λ 1 Λ DΛ 2 Λ y 0 subject to Λ 0 Λ C1 where Λ (1,2 ,...,l ),y ( y1, y2 ,...,yn ),Dij yi y jxi x j • Only difference: upper bound C on i • Representation for w l w i yixi • Decision function i 1 l f (x) sign( yii (x xi ) b) i 1 ABDBM © Ron Shamir 19 Comments • Param C – Controls the range of i avoids over emphasizing some examples – i (C - i) = 0 (“complementary slackness”) – C can be extended to be case-dependent • Weight i – i < C i = 0 i-th example is correctly classified not quite important – i = C i can be nonzero i-th training example may be misclassified very important ABDBM © Ron Shamir 20 Robustness of Soft vs Hard Margin SVMs Var Var1 1 i i Var Var2 2 w x b 0 Soft Margin SVM Hard Margin SVM ABDBM © Ron Shamir 21 Soft vs Hard Margin SVM • Soft-Margin always has a solution • Soft-Margin is more robust to outliers – Smoother surfaces (in the non-linear case) • Hard-Margin does not require to guess the cost parameter (requires no parameters at all) ABDBM © Ron Shamir 22 Support Vector Machines • Three main ideas: 1. Define what an optimal hyperplane is (taking into account that it needs to be computed efficiently): maximize margin 2. Generalize to non-linearly separable problems: have a penalty term for misclassifications 3. Map data to high dimensional space where it is easier to classify with linear decision surfaces: reformulate problem so that data are mapped implicitly to this space ABDBM © Ron Shamir 23 Disadvantages of Linear Decision Surfaces Var1 Var2 ABDBM © Ron Shamir 24 Advantages of Non-Linear Surfaces Var1 Var2 ABDBM © Ron Shamir 25 Linear Classifiers in High- Dimensional Spaces Constructed Var 1 Feature 2 Var 2 Constructed Feature 1 Find function (x) to map to a different space ABDBM © Ron Shamir 26 Mapping Data to a High- Dimensional Space • Find function (x) to map to a different space, then SVM formulation becomes: s. t . yi( w ( x i ) b ) 1 i , x i i 0 • Data appear as (x), weights w are now weights in the new space • Explicit mapping expensive if (x) is1 very2 min wC high dimensional i 2 i • Can we solve the problem without explicitly mapping the data ? ABDBM © Ron Shamir 27 The Dual of the SVM Formulation • Original SVM formulation 1 2 – n inequality constraints min w C i w,b – n positivity constraints 2 i – n number of variables s.t. yi (w (x) b) 1 i , xi • The (Wolfe) dual of this i 0 problem – one equality constraint 1 – n positivity constraints min i j yi y j ((xi ) (x j )) i ai 2 – n number of variables i, j i (Lagrange multipliers) s.t. C 0, x – Objective function more i i complicated i yi 0 i • But: Data only appear as (xi) (xj) ABDBM © Ron Shamir 28 The Kernel Trick t • (xi) (xj) means: map data into new space, then take the inner product of the new vectors • Suppose we can find a function such that: K(xi , xj) = t (xi) (xj) , i.e., K is the inner product of the images of the data • For training, no need to explicitly map the data into the high-dimensional space to solve the optimization problem • How do we classify without explicitly mapping the new instances? Turns out sgn(wx b) sgn(i yi K(xi , x) b) i where b solves j ( y j i yi K(xi , x j ) b 1) 0, i for any j with j 0 ABDBM © Ron Shamir 29 Examples of Kernels • Assume we measure x1,x2 and we use the mapping: : x , x {,,2 x22 x x x ,2,2,1} x x 1 21 2 1 2 1 2 • Consider the function: K(x, z) (x z 1)2 • Then: t 2 2 2 2 • 휙 푥 휙 푧 = 푥1 푧1 + 푥2 푧2 + 2푥1푥2푧1푧2 + 2 2푥1푧1 + 2푥2푧2 + 1 = 푥1푧1 + 푥2푧2 + 1 = 푥 ⋅ 푧 + 1 2 = 퐾(푥, 푧) ABDBM © Ron Shamir 30 Polynomial and Gaussian Kernels K(x, z) (x z 1) p is called the polynomial kernel of degree p.
Recommended publications
  • INSTITUTE for COMPUTATIONAL and BIOLOGICAL LEARNING (ICBL)
    ICBL (HANSON, GLUCK, VAPNIK) 1 August 20th 2003 Governor's Commission on Jobs & Economic Growth INSTITUTE FOR COMPUTATIONAL and BIOLOGICAL LEARNING (ICBL) PRIMARY PARTICIPANTS: Stephen José Hanson, Psych. Rutgers-Newark; Cog. Sci. Rutgers-NB, Info. Sci., NJIT; Advanced Imaging Center, UMDNJ (Principal Investigator and Co-Director) Mark A. Gluck, CMBN Rutgers-Newark (co-Principal Investigator and Co-Director) Vladimir Vapnik, NEC (co-Principal Investigator and Co-Director) We also include participants and contact personnel for other participating institutions including Rutgers NB-Piscataway, Rutgers-Camden, UMDNJ, NJIT, Princeton, NEC, ATT, Siemens Corporate Research, and Merck Pharmaceuticals (See attachments that follow this document.) PRIMARY OBJECTIVE: The Institute for Computational and Biological Learning (ICBL) will be an interdisciplinary center for research and training in the learning sciences, encompassing computational, biological,and psychological studies of systems that adapt, extract patterns from high complexity data and in general improve from experience. Core programs include learning theory, neuroinformatics, computational neuroscience of learning and memory, neural-network models, machine learning, human computer interaction and financial/economic modeling. The learning sciences is an emerging interdisciplinary technology, recently recognized by the National Science Foundation (NSF) as a national priority. The NSF announced this year a major multi-disciplinary initiative which will fund approximately five new national Learning Sciences Centers with budgets of $3M to $5M a year for up to 10 years, for a total commitment of up to $50,000,000 per center. As envisioned by NSF, these centers will be “large-scale, long-term Centers that will extend the frontiers of knowledge on learning and create the intellectual, organizational, and physical infrastructure needed for the long-term advancement of learning research.
    [Show full text]
  • A Priori Knowledge from Non-Examples
    A priori Knowledge from Non-Examples Diplomarbeit im Fach Bioinformatik vorgelegt am 29. März 2007 von Fabian Sinz Fakultät für Kognitions- und Informationswissenschaften Wilhelm-Schickard-Institut für Informatik der Universität Tübingen in Zusammenarbeit mit Max-Planck-Institut für biologische Kybernetik Tübingen Betreuer: Prof. Dr. Bernhard Schölkopf (MPI für biologische Kybernetik) Prof. Dr. Andreas Schilling (WSI für Informatik) Prof. Dr. Hanspeter Mallot (Biologische Fakultät) 1 Erklärung Die folgende Diplomarbeit wurde ausschließlich von mir (Fabian Sinz) erstellt. Es wurden keine weiteren Hilfmittel oder Quellen außer den Angegebenen benutzt. Tübingen, den 29. März 2007, Fabian Sinz Contents 1 Introduction 4 1.1 Foreword . 4 1.2 Notation . 5 1.3 Machine Learning . 7 1.3.1 Machine Learning . 7 1.3.2 Philosophy of Science in a Nutshell - About the Im- possibility to learn without Assumptions . 11 1.3.3 General Assumptions in Bayesian and Frequentist Learning . 14 1.3.4 Implicit prior Knowledge via additional Data . 17 1.4 Mathematical Tools . 18 1.4.1 Reproducing Kernel Hilbert Spaces . 18 1.5 Regularisation . 22 1.5.1 General regularisation . 22 1.5.2 Data-dependent Regularisation . 27 2 The Universum Algorithm 32 2.1 The Universum Idea . 32 2.1.1 VC Theory and Structural Risk Minimisation (SRM) in a Nutshell . 32 2.1.2 The original Universum Idea by Vapnik . 35 2.2 Implementing the Universum into SVMs . 37 2.2.1 Support Vector Machine Implementations of the Uni- versum . 38 2.2.1.1 Hinge Loss Universum . 38 2.2.1.2 Least Squares Universum . 46 2.2.1.3 Multiclass Universum .
    [Show full text]
  • United States Patent 19 11 Patent Number: 5,640,492 Cortes Et Al
    US005640492A United States Patent 19 11 Patent Number: 5,640,492 Cortes et al. 45 Date of Patent: Jun. 17, 1997 54). SOFT MARGIN CLASSIFIER B.E. Boser, I. Goyon, and V.N. Vapnik, “A Training Algo rithm. For Optimal Margin Classifiers”. Proceedings Of The 75 Inventors: Corinna Cortes, New York, N.Y.: 4-th Workshop of Computational Learning Theory, vol. 4, Vladimir Vapnik, Middletown, N.J. San Mateo, CA, Morgan Kaufman, 1992. 73) Assignee: Lucent Technologies Inc., Murray Hill, Y. Le Cun, B. Boser, J. Denker, D. Henderson, R. Howard, N.J. W. Hubbard, and L. Jackel, "Handwritten Digit Recognition with a Back-Propagation Network', (D. Touretzky, Ed.), Advances in Neural Information Processing Systems, vol. 2, 21 Appl. No.: 268.361 Morgan Kaufman, 1990. 22 Filed: Jun. 30, 1994 A.N. Referes et al., “Stock Ranking: Neural Networks Vs. (51] Int. Cl. ................ G06E 1/00; G06E 3100 Multiple Linear Regression':, IEEE International Conf. on 52 U.S. Cl. .................................. 395/23 Neural Networks, vol. 3, San Francisco, CA, IEEE, pp. 58) Field of Search ............................................ 395/23 1419-1426, 1993. M. Rebollo et al., “AMixed Integer Programming Approach 56) References Cited to Multi-Spectral Image Classification”, Pattern Recogni U.S. PATENT DOCUMENTS tion, vol. 9, No. 1, Jan. 1977, pp. 47-51, 54-55, and 57. 4,120,049 10/1978 Thaler et al. ........................... 365/230 V. Uebele et al., "Extracting Fuzzy Rules From Pattern 4,122,443 10/1978 Thaler et al. ....... 340/.46.3 Classification Neural Networks”, Proceedings From 1993 5,040,214 8/1991 Grossberg et al. ....................... 381/43 International Conference on Systems, Man and Cybernetics, 5,214,716 5/1993 Refregier et al..
    [Show full text]
  • On the State of the Art in Machine Learning: a Personal Review
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by CiteSeerX University of Bristol DEPARTMENT OF COMPUTER SCIENCE On the state of the art in Machine Learning: a personal review Peter A. Flach December 2000 CSTR-00-018 On the state of the art in Machine Learning: a personal review Peter A. Flach December 15, 2000 Abstract This paper reviews a number of recent books related to current developments in machine learning. Some (anticipated) trends will be sketched. These include: a trend towards combining approaches that were hitherto regarded as distinct and were studied by separate research communities; a trend towards a more prominent role of representation; and a tighter integration of machine learning techniques with techniques from areas of application such as bioinformatics. The intended readership has some knowledge of what machine learning is about, but brief tuto- rial introductions to some of the more specialist research areas will also be given. 1 Introduction This paper reviews a number of books that appeared recently in my main area of exper- tise, machine learning. Following a request from the book review editors of Artificial Intelligence, I selected about a dozen books with which I either was already familiar, or which I would be happy to read in order to find out what’s new in that particular area of the field. The choice of books is therefore subjective rather than comprehensive (and also biased towards books I liked). However, it does reflect a personal view of what I think are exciting developments in machine learning, and as a secondary aim of this paper I will expand a bit on this personal view, sketching some of the current and anticipated future trends.
    [Show full text]
  • UNIVERSITY of CALIFORNIA Los Angeles Data-Driven Learning Of
    UNIVERSITY OF CALIFORNIA Los Angeles Data-Driven Learning of Invariants and Specifications A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Computer Science by Saswat Padhi 2020 © Copyright by Saswat Padhi 2020 ABSTRACT OF THE DISSERTATION Data-Driven Learning of Invariants and Specifications by Saswat Padhi Doctor of Philosophy in Computer Science University of California, Los Angeles, 2020 Professor Todd D. Millstein, Chair Although the program verification community has developed several techniques for analyzing software and formally proving their correctness, these techniques are too sophisticated for end users and require significant investment in terms of time and effort. In this dissertation, I present techniques that help programmers easily formalize the initial requirements for verifying their programs — specifications and inductive invariants. The proposed techniques leverage ideas from program synthesis and statistical learning to automatically generate these formal requirements from readily available program-related data, such as test cases, execution traces etc. I detail three of these data-driven learning techniques – FlashProfile and PIE for specification learning, and LoopInvGen for invariant learning. I conclude with some principles for building robust synthesis engines, which I learned while refining the aforementioned techniques. Since program synthesis is a form of function learning, it is perhaps unsurprising that some of the fundamental issues in program synthesis have also been explored in the machine learning community. I study one particular phenomenon — overfitting. I present a formalization of overfitting in program synthesis, and discuss two mitigation strategies, inspired by existing techniques. ii The dissertation of Saswat Padhi is approved. Adnan Darwiche Sumit Gulwani Miryung Kim Jens Palsberg Todd D.
    [Show full text]
  • Accelerating Text-As-Data Research in Computational Social Science
    Accelerating Text-as-Data Research in Computational Social Science Dallas Card August 2019 CMU-ML-19-109 Machine Learning Department School of Computer Science Carnegie Mellon University Pittsburgh, PA Thesis Committee: Noah A. Smith, Chair Artur Dubrawski Geoff Gordon Dan Jurafsky (Stanford University) Submitted in partial fulfillment of the requirements for the Degree of Doctor of Philosophy Copyright c 2019 Dallas Card This work was supported by a Natural Sciences and Engineering Research Council of Canada Postgraduate Scholarship, NSF grant IIS-1211277, an REU supplement to NSF grant IIS-1562364, a Bloomberg Data Science Research Grant, a University of Washington Innovation Award, and computing resources provided by XSEDE. Keywords: machine learning, natural language processing, computational social science, graphical models, interpretability, calibration, conformal methods Abstract Natural language corpora are phenomenally rich resources for learning about people and society, and have long been used as such by various disciplines such as history and political science. Recent advances in machine learning and natural language pro- cessing are creating remarkable new possibilities for how scholars might analyze such corpora, but working with textual data brings its own unique challenges, and much of the research in computer science may not align with the desiderata of social scien- tists. In this thesis, I present a line of work on developing methods for computational social science focused primarily on observational research using natural language text. Throughout, I take seriously the concerns and priorities of the social sciences, leading to a focus on aspects of machine learning which are otherwise sometimes secondary, including calibration, interpretability, and transparency. Two ideas which unify this work are the problems of exploration and measurement, and as a running example I consider the problem of analyzing how news sources frame contemporary political issues.
    [Show full text]
  • Teaching with IMPACT
    Teaching with IMPACT Carl Trimbach [email protected] Michael Littman [email protected] Abstract Like many problems in AI in their general form, supervised learning is computationally intractable. We hypothesize that an important reason humans can learn highly complex and varied concepts, in spite of the computational difficulty, is that they benefit tremendously from experienced and insightful teachers. This paper proposes a new learning framework that provides a role for a knowledgeable, benevolent teacher to guide the process of learning a target concept in a series of \curricular" phases or rounds. In each round, the teacher's role is to act as a moderator, exposing the learner to a subset of the available training data to move it closer to mastering the target concept. Via both theoretical and empirical evidence, we argue that this framework enables simple, efficient learners to acquire very complex concepts from examples. In particular, we provide multiple examples of concept classes that are known to be unlearnable in the standard PAC setting along with provably efficient algorithms for learning them in our extended setting. A key focus of our work is the ability to learn complex concepts on top of simpler, previously learned, concepts|a direction with the potential of creating more competent artificial agents. 1. Introduction In statistical learning theory, the PAC (probably approximately correct) model (Valiant, 1984) has been a valuable tool for framing and understanding the problem of learning from examples. Many interesting and challenging problems can be analyzed and attacked in the PAC setting, resulting in powerful algorithms. Nevertheless, there exist many central and relevant concept classes that are provably unlearnable this way.
    [Show full text]
  • April 8-11, 2019 the 2019 Franklin Institute Laureates the 2019 Franklin Institute AWARDS CONVOCATION APRIL 8–11, 2019
    april 8-11, 2019 The 2019 Franklin Institute Laureates The 2019 Franklin Institute AWARDS CONVOCATION APRIL 8–11, 2019 Welcome to The Franklin Institute Awards, the range of disciplines. The week culminates in a grand oldest comprehensive science and technology medaling ceremony, befitting the distinction of this awards program in the United States. Each year, the historic awards program. Institute recognizes extraordinary individuals who In this convocation book, you will find a schedule of are shaping our world through their groundbreaking these events and biographies of our 2019 laureates. achievements in science, engineering, and business. We invite you to read about each one and to attend We celebrate them as modern day exemplars of our the events to learn even more. Unless noted otherwise, namesake, Benjamin Franklin, whose impact as a all events are free and open to the public and located scientist, inventor, and statesman remains unmatched in Philadelphia, Pennsylvania. in American history. Along with our laureates, we honor Franklin’s legacy, which has inspired the We hope this year’s remarkable class of laureates Institute’s mission since its inception in 1824. sparks your curiosity as much as they have ours. We look forward to seeing you during The Franklin From shedding light on the mechanisms of human Institute Awards Week. memory to sparking a revolution in machine learning, from sounding the alarm about an environmental crisis to making manufacturing greener, from unlocking the mysteries of cancer to developing revolutionary medical technologies, and from making the world III better connected to steering an industry giant with purpose, this year’s Franklin Institute laureates each reflect Ben Franklin’s trailblazing spirit.
    [Show full text]
  • On-Device Machine Learning: an Algorithms and Learning Theory Perspective
    On-Device Machine Learning: An Algorithms and Learning Theory Perspective SAUPTIK DHAR, America Research Center, LG Electronics JUNYAO GUO, America Research Center, LG Electronics JIAYI (JASON) LIU, America Research Center, LG Electronics SAMARTH TRIPATHI, America Research Center, LG Electronics UNMESH KURUP, America Research Center, LG Electronics MOHAK SHAH, America Research Center, LG Electronics The predominant paradigm for using machine learning models on a device is to train a model in the cloud and perform inference using the trained model on the device. However, with increasing number of smart devices and improved hardware, there is interest in performing model training on the device. Given this surge in interest, a comprehensive survey of the field from a device-agnostic perspective sets the stage for both understanding the state-of-the-art and for identifying open challenges and future avenues of research. However, on-device learning is an expansive field with connections to a large number of related topics in AI and machine learning (including online learning, model adaptation, one/few-shot learning, etc.). Hence, covering such a large number of topics in a single survey is impractical. This survey finds a middle ground by reformulating the problem of on-device learning as resource constrained learning where the resources are compute and memory. This reformulation allows tools, techniques, and algorithms from a wide variety of research areas to be compared equitably. In addition to summarizing the state-of-the-art, the survey also identifies a number of challenges and next steps for both the algorithmic and theoretical aspects ofon-device learning. ACM Reference Format: Sauptik Dhar, Junyao Guo, Jiayi (Jason) Liu, Samarth Tripathi, Unmesh Kurup, and Mohak Shah.
    [Show full text]
  • Lipics-ESA-2020-38.Pdf (0.8
    A Sub-Linear Time Framework for Geometric Optimization with Outliers in High Dimensions Hu Ding School of Computer Science and Technology, University of Science and Technology of China, Anhui, China http://staff.ustc.edu.cn/~huding/index.html [email protected] Abstract Many real-world problems can be formulated as geometric optimization problems in high dimensions, especially in the fields of machine learning and data mining. Moreover, we often need to take into account of outliers when optimizing the objective functions. However, the presence of outliers could make the problems to be much more challenging than their vanilla versions. In this paper, we study the fundamental minimum enclosing ball (MEB) with outliers problem first; partly inspired by the core-set method from Bădoiu and Clarkson, we propose a sub-linear time bi-criteria approximation algorithm based on two novel techniques, the Uniform-Adaptive Sampling method and Sandwich Lemma. To the best of our knowledge, our result is the first sub-linear time algorithm, which has the sample size (i.e., the number of sampled points) independent of both the number of input points n and dimensionality d, for MEB with outliers in high dimensions. Furthermore, we observe that these two techniques can be generalized to deal with a broader range of geometric optimization problems with outliers in high dimensions, including flat fitting, k-center clustering, and SVM with outliers, and therefore achieve the sub-linear time algorithms for these problems respectively. 2012 ACM Subject Classification Theory of computation → Approximation algorithms analysis Keywords and phrases minimum enclosing ball, outliers, shape fitting, high dimensions, sub-linear time Digital Object Identifier 10.4230/LIPIcs.ESA.2020.38 Related Version A full version of the paper is available at https://arxiv.org/abs/2004.10090.
    [Show full text]
  • Arxiv:2105.01867V1 [Cs.LG] 5 May 2021
    A Theoretical-Empirical Approach to Estimating Sample Complexity of DNNs Devansh Bisla,* Apoorva Nandini Saridena,* Anna Choromanska Department of Electrical and Computer Engineering, Tandon School of Engineering, New York University fdb3484, ans609, [email protected] Abstract nition [49, 37], speech recognition [26, 13], and natural lan- This paper focuses on understanding how the general- guage processing [7, 20]. In recent years DL approaches ization error scales with the amount of the training data for were also shown to outperform humans in classic games deep neural networks (DNNs). Existing techniques in sta- such as Go [52]. The performance of DL models depends tistical learning theory require a computation of capacity on three factors: the model’s architecture, data set, and measures, such as VC dimension, to provably bound this training infrastructure, which together constitute the artifi- error. It is however unclear how to extend these measures cial intelligence (AI) trinity framework [3]). It has been ob- to DNNs and therefore the existing analyses are applicable served empirically [56] that increasing size of the training to simple neural networks, which are not used in practice, data is an extremely effective way for improving the perfor- e.g., linear or shallow (at most two-layer) ones or other- mance of DL models, more so than modifying the network’s wise multi-layer perceptrons. Moreover many theoretical architecture or training infrastructure. Generalization abil- error bounds are not empirically verifiable. In this paper ity of a network is defined as difference between the training we derive estimates of the generalization error that hold and test performance of the network.
    [Show full text]
  • Ieee-Level Awards
    IEEE-LEVEL AWARDS The IEEE currently bestows a Medal of Honor, fifteen Medals, thirty-three Technical Field Awards, two IEEE Service Awards, two Corporate Recognitions, two Prize Paper Awards, Honorary Memberships, one Scholarship, one Fellowship, and a Staff Award. The awards and their past recipients are listed below. Citations are available via the “Award Recipients with Citations” links within the information below. Nomination information for each award can be found by visiting the IEEE Awards Web page www.ieee.org/awards or by clicking on the award names below. Links are also available via the Recipient/Citation documents. MEDAL OF HONOR Ernst A. Guillemin 1961 Edward V. Appleton 1962 Award Recipients with Citations (PDF, 26 KB) John H. Hammond, Jr. 1963 George C. Southworth 1963 The IEEE Medal of Honor is the highest IEEE Harold A. Wheeler 1964 award. The Medal was established in 1917 and Claude E. Shannon 1966 Charles H. Townes 1967 is awarded for an exceptional contribution or an Gordon K. Teal 1968 extraordinary career in the IEEE fields of Edward L. Ginzton 1969 interest. The IEEE Medal of Honor is the highest Dennis Gabor 1970 IEEE award. The candidate need not be a John Bardeen 1971 Jay W. Forrester 1972 member of the IEEE. The IEEE Medal of Honor Rudolf Kompfner 1973 is sponsored by the IEEE Foundation. Rudolf E. Kalman 1974 John R. Pierce 1975 E. H. Armstrong 1917 H. Earle Vaughan 1977 E. F. W. Alexanderson 1919 Robert N. Noyce 1978 Guglielmo Marconi 1920 Richard Bellman 1979 R. A. Fessenden 1921 William Shockley 1980 Lee deforest 1922 Sidney Darlington 1981 John Stone-Stone 1923 John Wilder Tukey 1982 M.
    [Show full text]