A Discriminative Framework for Clustering Via Similarity Functions
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Avrim Blum, John Hopcroft, Ravindran Kannan
Foundations of Data Science This book provides an introduction to the mathematical and algorithmic founda- tions of data science, including machine learning, high-dimensional geometry, and analysis of large networks. Topics include the counterintuitive nature of data in high dimensions, important linear algebraic techniques such as singular value decompo- sition, the theory of random walks and Markov chains, the fundamentals of and important algorithms for machine learning, algorithms, and analysis for clustering, probabilistic models for large networks, representation learning including topic modeling and nonnegative matrix factorization, wavelets, and compressed sensing. Important probabilistic techniques are developed including the law of large num- bers, tail inequalities, analysis of random projections, generalization guarantees in machine learning, and moment methods for analysis of phase transitions in large random graphs. Additionally, important structural and complexity measures are discussed such as matrix norms and VC-dimension. This book is suitable for both undergraduate and graduate courses in the design and analysis of algorithms for data. Avrim Blum is Chief Academic Officer at the Toyota Technological Institute at Chicago and formerly Professor at Carnegie Mellon University. He has over 25,000 citations for his work in algorithms and machine learning. He has received the AI Journal Classic Paper Award, ICML/COLT 10-Year Best Paper Award, Sloan Fellowship, NSF NYI award, and Herb Simon Teaching Award, and is a fellow of the Association for Computing Machinery. John Hopcroft is the IBM Professor of Engineering and Applied Mathematics at Cornell University. He is a member National Academy of Sciences and National Academy of Engineering, and a foreign member of the Chinese Academy of Sciences. -
A Learning Theoretic Framework for Clustering with Similarity Functions
A Learning Theoretic Framework for Clustering with Similarity Functions Maria-Florina Balcan∗ Avrim Blum∗ Santosh Vempala† November 23, 2007 Abstract Problems of clustering data from pairwise similarity information are ubiquitous in Computer Science. Theoretical treatments typically view the similarity information as ground-truthand then design algorithms to (approximately) optimize various graph-based objective functions. However, in most applications, this similarity information is merely based on some heuristic; the ground truth is really the (unknown) correct clustering of the data points and the real goal is to achieve low error on the data. In this work, we develop a theoretical approach to clustering from this perspective. In particular, motivated by recent work in learning theory that asks “what natural properties of a similarity (or kernel) function are sufficient to be able to learn well?” we ask “what natural properties of a similarity function are sufficient to be able to cluster well?” To study this question we develop a theoretical framework that can be viewed as an analog of the PAC learning model for clustering, where the object of study, rather than being a concept class, is a class of (concept, similarity function) pairs, or equivalently, a property the similarity function should satisfy with respect to the ground truth clustering. We then analyze both algorithmic and information theoretic issues in our model. While quite strong properties are needed if the goal is to produce a single approximately-correct clustering, we find that a number of reasonable properties are sufficient under two natural relaxations: (a) list clustering: analogous to the notion of list-decoding, the algorithm can produce a small list of clusterings (which a user can select from) and (b) hierarchical clustering: the algorithm’s goal is to produce a hierarchy such that desired clustering is some pruning of this tree (which a user could navigate). -
IARCS, the Indian Association for Research in Computing Science
IARCS, the Indian Association for Research in Computing Science, announces the 38th Foundations of Software Technology and Theoretical Computer Science conference at Ahmedabad University, Ahmedabad, Gujarat. The FSTTCS conference is a forum for presenting original results in foundational aspects of Computer Science and Software Technology. Representative areas include, but are not limited to, the following. ● Algorithmic Algebra ● Model Theory, Modal and Temporal ● Algorithms and Data Structures Logics ● Algorithmic Graph Theory and ● Models of Concurrent and Distributed Combinatorics Systems ● Approximation Algorithms ● Models of Timed, Reactive, Hybrid and ● Automata, Games and Formal Stochastic Systems Languages ● Parallel, Distributed and Online ● Combinatorial Optimization Algorithms ● Communication Complexity ● Parameterized Complexity ● Computational Biology ● Principles and Semantics of ● Computational Complexity Programming Languages ● Computational Geometry ● Program Analysis and Transformation ● Computational Learning ● Proof Complexity Theory ● Quantum Computing ● Cryptography and Security ● Randomness in Computing ● Data Streaming and Sublinear ● Specification, Verification, and algorithm Synthesis ● Game Theory and Mechanism ● Theorem Proving, Decision Procedures, Design Model Checking and Reactive Synthesis ● Logic in Computer Science ● Theoretical Aspects of Mobile and ● Machine Learning High-Performance Computing ● Quantum Computing Accepted Papers View the Accepted Papers. Click Here To Download PDF. Submissions Submissions -
Curriculum Vitae
Curriculum Vitae David P. Woodruff Biographical Computer Science Department Gates-Hillman Complex Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA 15213 Citizenship: United States Email: [email protected] Home Page: http://www.cs.cmu.edu/~dwoodruf/ Research Interests Compressed Sensing, Data Stream Algorithms and Lower Bounds, Dimensionality Reduction, Distributed Computation, Machine Learning, Numerical Linear Algebra, Optimization Education All degrees received from Massachusetts Institute of Technology, Cambridge, MA Ph.D. in Computer Science. September 2007 Research Advisor: Piotr Indyk Thesis Title: Efficient and Private Distance Approximation in the Communication and Streaming Models. Master of Engineering in Computer Science and Electrical Engineering, May 2002 Research Advisor: Ron Rivest Thesis Title: Cryptography in an Unbounded Computational Model Bachelor of Science in Computer Science and Electrical Engineering May 2002 Bachelor of Pure Mathematics May 2002 Professional Experience August 2018-present, Carnegie Mellon University Computer Science Department Associate Professor (with tenure) August 2017 - present, Carnegie Mellon University Computer Science Department Associate Professor June 2018 – December 2018, Google, Mountain View Research Division Visiting Faculty Program August 2007 - August 2017, IBM Almaden Research Center Principles and Methodologies Group Research Scientist Aug 2005 - Aug 2006, Tsinghua University Institute for Theoretical Computer Science Visiting Scholar. Host: Andrew Yao Jun - Jul -
Visions in Theoretical Computer Science
VISIONS IN THEORETICAL COMPUTER SCIENCE A Report on the TCS Visioning Workshop 2020 The material is based upon work supported by the National Science Foundation under Grants No. 1136993 and No. 1734706. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. VISIONS IN THEORETICAL COMPUTER SCIENCE A REPORT ON THE TCS VISIONING WORKSHOP 2020 Edited by: Shuchi Chawla (University of Wisconsin-Madison) Jelani Nelson (University of California, Berkeley) Chris Umans (California Institute of Technology) David Woodruff (Carnegie Mellon University) 3 VISIONS IN THEORETICAL COMPUTER SCIENCE Table of Contents Foreword ................................................................................................................................................................ 5 How this Document came About ................................................................................................................................... 6 Models of Computation .........................................................................................................................................7 Computational Complexity ................................................................................................................................................7 Sublinear Algorithms ........................................................................................................................................................ -
Caltech/Mit Voting Technology Project
CALTECH/MIT VOTING TECHNOLOGY PROJECT A multi-disciplinary, collaborative project of the California Institute of Technology – Pasadena, California 91125 and the Massachusetts Institute of Technology – Cambridge, Massachusetts 02139 ADVANCES IN CRYPTOGRAPHIC VOTING SYSTEMS BEN ADIDA MIT Key words: voting systems, cryptographic, election administration, secret- ballot elections VTP WORKING PAPER #51 September 2006 Advances in Cryptographic Voting Systems by Ben Adida Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY August 2006 c Massachusetts Institute of Technology 2006. All rights reserved. Author . .................................................................. Department of Electrical Engineering and Computer Science August 31st, 2006 Certified by . ............................................................. Ronald L. Rivest Viterbi Professor of Electrical Engineering and Computer Science Thesis Supervisor Accepted by . ............................................................ Arthur C. Smith Chairman, Department Committee on Graduate Students 2 Advances in Cryptographic Voting Systems by Ben Adida Submitted to the Department of Electrical Engineering and Computer Science on August 31st, 2006, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science Abstract Democracy depends on the proper administration of popular -
Random Projection in the Brain and Computation with Assemblies Of
1 Random Projection in the Brain and Computation 2 with Assemblies of Neurons 3 Christos H. Papadimitriou 4 Columbia University, USA 5 [email protected] 6 Santosh S. Vempala 7 Georgia Tech, USA 8 [email protected] 9 Abstract 10 It has been recently shown via simulations [7] that random projection followed by a cap operation 11 (setting to one the k largest elements of a vector and everything else to zero), a map believed 12 to be an important part of the insect olfactory system, has strong locality sensitivity properties. 13 We calculate the asymptotic law whereby the overlap in the input vectors is conserved, verify- 14 ing mathematically this empirical finding. We then focus on the far more complex homologous 15 operation in the mammalian brain, the creation through successive projections and caps of an 16 assembly (roughly, a set of excitatory neurons representing a memory or concept) in the presence 17 of recurrent synapses and plasticity. After providing a careful definition of assemblies, we prove 18 that the operation of assembly projection converges with high probability, over the randomness 19 of synaptic connectivity, even if plasticity is relatively small (previous proofs relied on high plas- 20 ticity). We also show that assembly projection has itself some locality preservation properties. 21 Finally, we propose a large repertoire of assembly operations, including associate, merge, recip- 22 rocal project, and append, each of them both biologically plausible and consistent with what we 23 know from experiments, and show that this computational system is capable of simulating, again 24 with high probability, arbitrary computation in a quite natural way. -
Fair and Diverse Data Representation in Machine Learning
FAIR AND DIVERSE DATA REPRESENTATION IN MACHINE LEARNING A Dissertation Presented to The Academic Faculty By Uthaipon Tantipongpipat In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Algorithms, Combinatorics, and Optimization Georgia Institute of Technology August 2020 Copyright © Uthaipon Tantipongpipat 2020 FAIR AND DIVERSE DATA REPRESENTATION IN MACHINE LEARNING Approved by: Dr. Mohit Singh, Advisor Dr. Sebastian Pokutta School of Industrial and Systems Institute of Mathematics Engineering Technical University of Berlin Georgia Institute of Technology Dr. Santosh Vempala Dr. Rachel Cummings School of Computer Science School of Industrial and Systems Georgia Institute of Technology Engineering Georgia Institute of Technology Date Approved: May 8, 2020 Dr. Aleksandar Nikolov Department of Computer Science University of Toronto ACKNOWLEDGEMENTS This thesis would not be complete without many great collaborators and much support I re- ceived. My advisor, Mohit Singh, is extremely supportive for me to explore my own research interests, while always happy to discuss even a smallest technical detail when I need help. His ad- vising aim has always been for my best – for my career growth as a researcher and for a fulfilling life as a person. He is unquestionably an integral part of my success today. I want to thank Rachel Cummings, Aleksandar (Sasho) Nikolov, Sebastian Pokutta, and San- tosh Vempala for many research discussions we had together and their willingness to serve as my dissertation committees amidst COVID-19 disruption. I want to thank my other coauthors as part of my academic path: Digvijay Boob, Sara Krehbiel, Kevin Lai, Vivek Madan, Jamie Morgen- stern, Samira Samadi, Amaresh (Ankit) Siva, Chris Waites, and Weijun Xie. -
Conference Program
Last update: June 11 STOC 2004 Schedule Chicago Hilton and Towers Hotel 720 South Michigan Avenue, Chicago Saturday, June 12, 2004 5:00 - 9:00 PM Registration desk open 7:00 - 9:00 PM Welcome Reception. Boulevard Room and Foyer, 2nd floor Sunday, June 13, 2004 8:00 AM - 5:00 PM Registration desk open 8:00 - 8:35 AM Continental breakfast. Continental Ballroom Lobby, 1st floor 8:35 - 10:10 AM Session 1A. Continental Ballroom A-B. Chair: Oded Regev 8:35 - 8:55 AM Robust PCPs of Proximity, Shorter PCPs and Applications to Coding Eli Ben-Sasson (Radcliffe/Harvard), Oded Goldreich (Weizmann) Prahladh Harsha (MIT), Madhu Sudan (MIT), Salil Vadhan (Harvard) 9:00 - 9:20 AM A New PCP Outer Verifier with Applications to Homogeneous Linear Equations and Max-Bisection Jonas Holmerin (KTH, Stockholm), Subhash Khot (Georgia Tech) 9:25 - 9:45 AM Asymmetric k-Center is log∗ n - Hard to Approximate Julia Chuzhoy (Technion), Sudipto Guha (UPenn), Eran Halperin (Berkeley), Sanjeev Khanna (UPenn), Guy Kortsarz (Rutgers), Joseph (Seffi) Naor (Technion) 9:50 - 10:10 AM New Hardness Results for Congestion Minimization and Machine Scheduling Julia Chuzhoy (Technion), Joseph (Seffi) Naor (Technion) 8:35 - 10:10 AM Session 1B. Continental Ballroom C. Chair: Sandy Irani 8:35 - 8:55 AM On the Performance of Greedy Algorithms in Packet Buffering Susanne Albers (Freiburg), Markus Schmidt (Freiburg) 9:00 - 9:20 AM Adaptive Routing with End-to-End Feedback: Distributed Learning and Geometric Approaches Baruch Awerbuch (Johns Hopkins), Robert D. Kleinberg (MIT) 9:25 - 9:45 AM Know Thy Neighbor’s Neighbor: The Power of Lookahead in Randomized P2P Net- works Gurmeet Singh Manku (Stanford), Moni Naor (Weizmann), Udi Wieder (Weizmann) 9:50 - 10:10 AM The Zero-One Principle for Switching Networks Yossi Azar (Tel Aviv), Yossi Richter (Tel Aviv) 10:10 - 10:35 AM Coffee break. -
Manuel Blum's CV
Manuel Blum [email protected] MANUEL BLUM Bruce Nelson Professor of Computer Science Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA 15213 Telephone: Office: (412) 268-3742, Fax: (412) 268-5576 Home: (412) 687-8730, Mobile: (412) 596-4063 Email: [email protected] Personal Born: 26 April1938 in Caracas, Venezuela. Citizenship: Venezuela and USA. Naturalized Citizen of USA, January 2000. Wife: Lenore Blum, Distinguished Career Professor of Computer Science, CMU Son: Avrim Blum, Professor of Computer Science, CMU. Interests: Computational Complexity; Automata Theory; Algorithms; Inductive Inference: Cryptography; Program Result-Checking; Human Interactive Proofs. Employment History Research Assistant and Research Associate for Dr. Warren S. McCulloch, Research Laboratory of Electronics, MIT, 1960-1965. Assistant Professor, Department of Mathematics, MIT, 1966-68. Visiting Assistant Professor, Associate Professor, Professor, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, 1968-2001. Associate Chair for Computer Science, U.C. Berkeley, 1977-1980. Arthur J. Chick Professor of Computer Science, U.C. Berkeley, 1995-2001. Group in Logic and Methodology of Science, U.C. Berkeley, 1974-present. Visiting Professor of Computer Science. City University of Hong Kong, 1997-1999. Bruce Nelson Professor of Computer Science, Carnegie Mellon University, 2001-present. Education B.S., Electrical Engineering, MIT, 1959; M.S., Electrical Engineering, MIT. 1961. Ph.D., Mathematics, MIT, 1964, Professor Marvin Minsky, supervisor. Manuel Blum [email protected] Honors MIT Class VIB (Honor Sequence in EE), 1958-61 Sloan Foundation Fellowship, 1972-73. U.C. Berkeley Distinguished Teaching Award, 1977. Fellow of the IEEE "for fundamental contributions to the abstract theory of computational complexity,” 1982. -
Foundations of Data Science∗
Foundations of Data Science∗ Avrim Blum, John Hopcroft and Ravindran Kannan Thursday 9th June, 2016 ∗Copyright 2015. All rights reserved 1 Contents 1 Introduction 8 2 High-Dimensional Space 11 2.1 Introduction . 11 2.2 The Law of Large Numbers . 11 2.3 The Geometry of High Dimensions . 14 2.4 Properties of the Unit Ball . 15 2.4.1 Volume of the Unit Ball . 15 2.4.2 Most of the Volume is Near the Equator . 17 2.5 Generating Points Uniformly at Random from a Ball . 20 2.6 Gaussians in High Dimension . 21 2.7 Random Projection and Johnson-Lindenstrauss Lemma . 23 2.8 Separating Gaussians . 25 2.9 Fitting a Single Spherical Gaussian to Data . 27 2.10 Bibliographic Notes . 29 2.11 Exercises . 30 3 Best-Fit Subspaces and Singular Value Decomposition (SVD) 38 3.1 Introduction and Overview . 38 3.2 Preliminaries . 39 3.3 Singular Vectors . 41 3.4 Singular Value Decomposition (SVD) . 44 3.5 Best Rank-k Approximations . 45 3.6 Left Singular Vectors . 47 3.7 Power Method for Computing the Singular Value Decomposition . 49 3.7.1 A Faster Method . 50 3.8 Singular Vectors and Eigenvectors . 52 3.9 Applications of Singular Value Decomposition . 52 3.9.1 Centering Data . 52 3.9.2 Principal Component Analysis . 53 3.9.3 Clustering a Mixture of Spherical Gaussians . 54 3.9.4 Ranking Documents and Web Pages . 59 3.9.5 An Application of SVD to a Discrete Optimization Problem . 60 3.10 Bibliographic Notes . 63 3.11 Exercises . -
Lipics-ITCS-2018-0.Pdf (0.4
9th Innovations in Theoretical Computer Science ITCS 2018, January 11–14, 2018, Cambridge, MA, USA Edited by Anna R. Karlin LIPIcs – Vol. 94 – ITCS2018 www.dagstuhl.de/lipics Editor Anna R. Karlin Allen School of Computer Science and Engineering University of Washington [email protected] ACM Classification 1998 F. Theory of Computation, G. Mathematics of Computing ISBN 978-3-95977-060-6 Published online and open access by Schloss Dagstuhl – Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing, Saarbrücken/Wadern, Germany. Online available at http://www.dagstuhl.de/dagpub/978-3-95977-060-6. Publication date January, 2018 Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available in the Internet at http://dnb.d-nb.de. License This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC-BY 3.0): http://creativecommons.org/licenses/by/3.0/legalcode. In brief, this license authorizes each and everybody to share (to copy, distribute and transmit) the work under the following conditions, without impairing or restricting the authors’ moral rights: Attribution: The work must be attributed to its authors. The copyright is retained by the corresponding authors. Digital Object Identifier: 10.4230/LIPIcs.ITCS.2018.0 ISBN 978-3-95977-060-6 ISSN 1868-8969 http://www.dagstuhl.de/lipics 0:iii LIPIcs – Leibniz International Proceedings in Informatics LIPIcs is a series of high-quality conference proceedings across all fields in informatics. LIPIcs volumes are published according to the principle of Open Access, i.e., they are available online and free of charge.