Optimal Quantum Sample Complexity of Learning Algorithms
Total pages: 16
File type: PDF; size: 1020 KB
Recommended publications
Simulating Quantum Field Theory with a Quantum Computer
John Preskill, Lattice 2018, 28 July 2018.
This talk has two parts: (1) near-term prospects for quantum computing, and (2) opportunities in quantum simulation of quantum field theory. Exascale digital computers will advance our knowledge of QCD, but some challenges will remain, especially concerning real-time evolution and properties of nuclear matter and quark-gluon plasma at nonzero temperature and chemical potential. Digital computers may never be able to address these (and other) problems; quantum computers will solve them eventually, though I'm not sure when. The physics payoff may still be far away, but today's research can hasten the arrival of a new era in which quantum simulation fuels progress in fundamental physics.
[Slide: Frontiers of Physics. Short distance: Higgs boson, neutrino masses, supersymmetry, quantum gravity, string theory. Long distance: large-scale structure, cosmic microwave background, dark matter, dark energy, gravitational waves. Complexity ("more is different"): many-body entanglement, phases of quantum matter, quantum computing, quantum spacetime.]
A quantum computer can simulate efficiently any physical process that occurs in Nature. (Maybe. We don't actually know for sure.) [Illustrations: particle collision, molecular chemistry, entangled electrons, superconductor, black hole, early universe.]
Two fundamental ideas: (1) quantum complexity, which is why we think quantum computing is powerful, and (2) quantum error correction, which is why we think quantum computing is scalable. A complete description of a typical quantum state of just 300 qubits requires more bits than the number of atoms in the visible universe. Why we think quantum computing is powerful: we know examples of problems that can be solved efficiently by a quantum computer, where we believe the problems are hard for classical computers.
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers (arXiv:1811.04918v6 [cs.LG], 1 Jun 2020)
Zeyuan Allen-Zhu (Microsoft Research AI), Yuanzhi Li (Stanford University), Yingyu Liang (University of Wisconsin-Madison). November 12, 2018 (version 6).
Abstract: The fundamental learning theory behind neural networks remains largely open. What classes of functions can neural networks actually learn? Why doesn't the trained network overfit when it is overparameterized? In this work, we prove that overparameterized neural networks can learn some notable concept classes, including two- and three-layer networks with fewer parameters and smooth activations. Moreover, the learning can be simply done by SGD (stochastic gradient descent) or its variants in polynomial time using polynomially many samples. The sample complexity can also be almost independent of the number of parameters in the network. On the technique side, our analysis goes beyond the so-called NTK (neural tangent kernel) linearization of neural networks in prior works. We establish a new notion of quadratic approximation of the neural network (that can be viewed as a second-order variant of NTK), and connect it to the SGD theory of escaping saddle points.
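As a purely illustrative companion to the abstract above, the sketch below trains an overparameterized two-layer ReLU network with plain SGD on data labeled by a much smaller target network. The widths, sample size and learning rate are made-up toy values, and this is not the paper's construction or proof.

    import torch
    import torch.nn as nn

    # Toy setup: labels come from a small "target" network, the learner is much wider.
    torch.manual_seed(0)
    d, m, n = 10, 2000, 512                      # input dim, hidden width (>> data size), samples
    target = nn.Sequential(nn.Linear(d, 3), nn.ReLU(), nn.Linear(3, 1))
    X = torch.randn(n, d)
    y = target(X).detach()

    learner = nn.Sequential(nn.Linear(d, m), nn.ReLU(), nn.Linear(m, 1))
    opt = torch.optim.SGD(learner.parameters(), lr=1e-2)
    for step in range(2000):
        opt.zero_grad()
        loss = ((learner(X) - y) ** 2).mean()    # squared loss, full-batch SGD for simplicity
        loss.backward()
        opt.step()
    print(float(loss))                           # training loss after SGD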
Quantum Machine Learning: Benefits and Practical Examples
Frank Phillipson (TNO, Den Haag, The Netherlands).
Abstract: A quantum computer that is useful in practice is expected to be developed in the next few years. An important application is expected to be machine learning, where benefits are expected in run time, capacity and learning efficiency. In this paper, these benefits are presented, and for each benefit an example application is given: a quantum hybrid Helmholtz machine uses quantum sampling to improve run time, a quantum Hopfield neural network shows an improved capacity, and a neural network based on variational quantum circuits is expected to deliver a higher learning efficiency.
Keywords: Quantum Machine Learning, Quantum Computing, Near Future Quantum Applications.
1 Introduction: Quantum computers make use of quantum-mechanical phenomena, such as superposition and entanglement, to perform operations on data [1]. Where classical computers require the data to be encoded into binary digits (bits), each of which is always in one of two definite states (0 or 1), quantum computation uses quantum bits, which can be in superpositions of states. These computers would theoretically be able to solve certain problems much more quickly than any classical computer using even the best currently known algorithms. Examples are integer factorization using Shor's algorithm or the simulation of quantum many-body systems. This benefit is also called 'quantum supremacy' [2], which only recently has been claimed for the first time [3]. There are two different quantum computing paradigms.
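To make the "capacity" claim concrete, here is a minimal classical Hopfield network with Hebbian storage, whose pattern capacity (roughly 0.14 n patterns for n neurons) is the quantity the quantum Hopfield network is reported to improve. This is the textbook classical model, not the paper's quantum construction, and all sizes are made-up toy values.

    import numpy as np

    # Classical Hopfield network: store k binary patterns, then recall one from a noisy cue.
    rng = np.random.default_rng(0)
    n, k = 100, 10                                   # neurons, stored patterns (classical limit ~0.14*n)
    patterns = rng.choice([-1, 1], size=(k, n))
    W = (patterns.T @ patterns) / n                  # Hebbian weight matrix
    np.fill_diagonal(W, 0)

    def recall(x, steps=20):
        for _ in range(steps):                       # synchronous sign updates toward a fixed point
            x = np.sign(W @ x)
            x[x == 0] = 1
        return x

    noisy = patterns[0] * rng.choice([1, 1, 1, 1, -1], size=n)   # flip roughly 20% of the bits
    print(np.mean(recall(noisy) == patterns[0]))                 # fraction of bits recovered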
Analysis of Perceptron-Based Active Learning
Journal of Machine Learning Research 10 (2009) 281-299. Submitted 12/07; Revised 12/08; Published 2/09.
Sanjoy Dasgupta (Department of Computer Science and Engineering, University of California, San Diego), Adam Tauman Kalai (Microsoft Research, Cambridge, MA), Claire Monteleoni (Center for Computational Learning Systems, Columbia University). Editor: Manfred Warmuth.
Abstract: We start by showing that in an active learning setting, the Perceptron algorithm needs Ω(1/ε²) labels to learn linear separators within generalization error ε. We then present a simple active learning algorithm for this problem, which combines a modification of the Perceptron update with an adaptive filtering rule for deciding which points to query. For data distributed uniformly over the unit sphere, we show that our algorithm reaches generalization error ε after asking for just Õ(d log(1/ε)) labels. This exponential improvement over the usual sample complexity of supervised learning had previously been demonstrated only for the computationally more complex query-by-committee algorithm.
Keywords: active learning, perceptron, label complexity bounds, online learning.
1. Introduction: In many machine learning applications, unlabeled data is abundant but labeling is expensive. This distinction is not captured in standard models of supervised learning, and has motivated the field of active learning, in which the labels of data points are initially hidden, and the learner must pay for each label it wishes revealed.
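The idea in the abstract can be sketched in a few lines: a label is requested only when a point falls close to the current hyperplane, and a mistake triggers a norm-preserving perceptron-style update. The sketch below uses a fixed query threshold rather than the paper's adaptive filtering rule, so it illustrates the idea rather than reproducing the exact algorithm.

    import numpy as np

    def active_perceptron(points, oracle, threshold=0.1):
        # Query a label only when |w.x| is small (the point is near the current boundary).
        # `oracle(x)` returns the true label in {-1, +1}; the threshold is fixed here,
        # whereas the paper adapts it over time.
        w = oracle(points[0]) * points[0]            # initialize from one labeled point
        queries = 1
        for x in points[1:]:
            margin = float(w @ x)
            if abs(margin) <= threshold:             # filtering rule (simplified to a constant)
                queries += 1
                if oracle(x) * margin < 0:           # mistake: perceptron-style correction
                    w = w - 2 * margin * x           # reflection update keeps ||w|| stable for unit x
        return w, queries

    # Toy usage: unit-sphere points in R^20 labeled by a hidden separator w*.
    rng = np.random.default_rng(0)
    d, n = 20, 5000
    w_star = np.eye(d)[0]
    points = rng.normal(size=(n, d))
    points /= np.linalg.norm(points, axis=1, keepdims=True)
    w, q = active_perceptron(points, lambda x: 1.0 if w_star @ x >= 0 else -1.0)
    print(q, float(w @ w_star / np.linalg.norm(w)))  # labels asked, alignment with w*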
COVID-19 Detection on IBM Quantum Computer with Classical-Quantum Transfer Learning
medRxiv preprint, doi: https://doi.org/10.1101/2020.11.07.20227306 (posted November 10, 2020; CC-BY-NC-ND 4.0). Turk J Elec Eng & Comp Sci, © TÜBİTAK, doi: 10.3906/elk-.
Erdi Acar (Department of Computer Engineering, Institute of Science, Çanakkale Onsekiz Mart University, Çanakkale, Turkey), İhsan Yılmaz (Department of Computer Engineering, Faculty of Engineering, Çanakkale Onsekiz Mart University, Çanakkale, Turkey).
Abstract: Diagnosing infected patients as soon as possible in the coronavirus 2019 (COVID-19) outbreak, which has been declared a pandemic by the World Health Organization (WHO), is extremely important. Experts recommend CT imaging as a diagnostic tool because of the weak points of the nucleic acid amplification test (NAAT). In this study, the detection of COVID-19 from CT images, which give the most accurate response in a short time, was investigated on classical computers and, for the first time, on quantum computers. Using the quantum transfer learning method, we experimentally perform COVID-19 detection on different real quantum processors of IBM (IBMQx2, IBMQ-London and IBMQ-Rome), as well as on different simulators (Pennylane, Qiskit-Aer and Cirq). Using a small data set of 126 COVID-19 and 100 normal CT images, we obtained a positive or negative classification of COVID-19 with 90% success on classical computers, while we achieved a high success rate of 94-100% on quantum computers.
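A hedged sketch of the classical side of the transfer-learning pipeline described above: a pretrained CNN is frozen and used as a feature extractor, and only a small head is trained on the CT images. In the paper the head is a variational quantum circuit evaluated on IBM processors and simulators; here it is replaced by a plain linear layer, and the backbone choice (ResNet-18) and hyperparameters are assumptions made only for illustration.

    import torch
    import torch.nn as nn
    from torchvision import models          # torchvision >= 0.13 for the weights argument

    backbone = models.resnet18(weights="DEFAULT")   # fixed classical feature extractor
    backbone.fc = nn.Identity()                     # expose 512-dimensional features
    for p in backbone.parameters():
        p.requires_grad = False

    head = nn.Linear(512, 2)                        # trainable classifier: COVID-19 vs. normal
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    def train_step(images, labels):                 # images: (B, 3, 224, 224) preprocessed CT slices
        with torch.no_grad():
            feats = backbone(images)                # frozen feature extraction
        loss = criterion(head(feats), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return float(loss)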
The Sample Complexity of Agnostic Learning Under Deterministic Labels
JMLR: Workshop and Conference Proceedings vol 35:1-16, 2014.
Shai Ben-David (Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada), Ruth Urner (School of Computer Science, Georgia Institute of Technology, Atlanta, Georgia, USA).
Abstract: With the emergence of machine learning tools that allow handling data with a huge number of features, it becomes reasonable to assume that, over the full set of features, the true labeling is (almost) fully determined. That is, the labeling function is deterministic, but not necessarily a member of some known hypothesis class. However, agnostic learning of deterministic labels has so far received little research attention. We investigate this setting and show that it displays a behavior that is quite different from that of the fundamental results of the common (PAC) learning setups. First, we show that the sample complexity of learning a binary hypothesis class (with respect to deterministic labeling functions) is not fully determined by the VC-dimension of the class. For any d, we present classes of VC-dimension d that are learnable from Õ(d/ε)-many samples and classes that require samples of size Ω(d/ε²). Furthermore, we show that in this setup, there are classes for which any proper learner has suboptimal sample complexity. While the class can be learned with sample complexity Õ(d/ε), any proper (and therefore, any ERM) algorithm requires Ω(d/ε²) samples. We provide combinatorial characterizations of both phenomena, and further analyze the utility of unlabeled samples in this setting.
Quantum Inductive Learning and Quantum Logic Synthesis
Portland State University, PDXScholar: Dissertations and Theses, 2009.
Martin Lukac, Portland State University.
Recommended citation: Lukac, Martin, "Quantum Inductive Learning and Quantum Logic Synthesis" (2009). Dissertations and Theses, Paper 2319. https://doi.org/10.15760/etd.2316
A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and Computer Engineering, presented January 9, 2009.
Abstract: Since the quantum computer is almost realizable on a large scale and quantum technology is one of the main solutions to the Moore limit, Quantum Logic Synthesis (QLS) has become a required theory and tool for designing quantum logic circuits.
Sinkhorn Autoencoders
Giorgio Patrini (UvA Bosch Delta Lab), Rianne van den Berg (University of Amsterdam), Patrick Forré (University of Amsterdam), Marcello Carioni (KFU Graz), Samarth Bhargav (University of Amsterdam), Max Welling (University of Amsterdam, CIFAR), Tim Genewein (Bosch Center for Artificial Intelligence), Frank Nielsen (École Polytechnique, Sony CSL).
Abstract: Optimal transport offers an alternative to maximum likelihood for learning generative autoencoding models. We show that minimizing the p-Wasserstein distance between the generator and the true data distribution is equivalent to the unconstrained min-min optimization of the p-Wasserstein distance between the encoder aggregated posterior and the prior in latent space, plus a reconstruction error. We also identify the role of its trade-off hyperparameter as the capacity of the generator: its Lipschitz constant. Moreover, we prove that optimizing the encoder over any class of universal approximators, such as deterministic neural networks, ...
1 Introduction: Unsupervised learning aims at finding the underlying rules that govern a given data distribution PX. It can be approached by learning to mimic the data generation process, or by finding an adequate representation of the data. Generative Adversarial Networks (GAN) (Goodfellow et al., 2014) belong to the former class, by learning to transform noise into an implicit distribution that matches the given one. AutoEncoders (AE) (Hinton and Salakhutdinov, 2006) are of the latter type, by learning a representation that maximizes the mutual information between the data and its reconstruction, subject to an information bottleneck. Variational AutoEncoders (VAE) (Kingma and Welling, 2013; Rezende et al., 2014) provide both a generative model, i.e. ...
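The "Sinkhorn" of the title refers to the standard iterative algorithm for entropically regularized optimal transport; a minimal NumPy version is sketched below (generic textbook iterations with made-up toy data, not the authors' autoencoder training code).

    import numpy as np

    def sinkhorn(a, b, C, eps=0.05, iters=200):
        # Entropic-regularized optimal transport between histograms a and b for cost matrix C.
        K = np.exp(-C / eps)                   # Gibbs kernel
        u = np.ones_like(a)
        v = np.ones_like(b)
        for _ in range(iters):
            u = a / (K @ v)                    # scale rows to match marginal a
            v = b / (K.T @ u)                  # scale columns to match marginal b
        return u[:, None] * K * v[None, :]     # transport plan P = diag(u) K diag(v)

    # Toy usage: transport cost between two small point clouds (e.g. latent codes vs. prior samples).
    rng = np.random.default_rng(0)
    x, y = rng.normal(size=(6, 2)), rng.normal(size=(8, 2)) + 1.0
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    P = sinkhorn(np.full(6, 1 / 6), np.full(8, 1 / 8), C)
    print((P * C).sum())                       # entropic approximation of the squared 2-Wasserstein cost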
Data-Dependent Sample Complexity of Deep Neural Networks Via Lipschitz Augmentation
Colin Wei and Tengyu Ma (Computer Science Department, Stanford University).
Abstract: Existing Rademacher complexity bounds for neural networks rely only on norm control of the weight matrices and depend exponentially on depth via a product of the matrix norms. Lower bounds show that this exponential dependence on depth is unavoidable when no additional properties of the training data are considered. We suspect that this conundrum comes from the fact that these bounds depend on the training data only through the margin. In practice, many data-dependent techniques such as Batchnorm improve the generalization performance. For feedforward neural nets as well as RNNs, we obtain tighter Rademacher complexity bounds by considering additional data-dependent properties of the network: the norms of the hidden layers of the network, and the norms of the Jacobians of each layer with respect to all previous layers. Our bounds scale polynomially in depth when these empirical quantities are small, as is usually the case in practice. To obtain these bounds, we develop general tools for augmenting a sequence of functions to make their composition Lipschitz and then covering the augmented functions. Inspired by our theory, we directly regularize the network's Jacobians during training and empirically demonstrate that this improves test performance.
1 Introduction: Deep networks trained in ...
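The final sentence, directly regularizing the network's Jacobians during training, can be approximated with automatic differentiation. The sketch below penalizes a Hutchinson-style estimate of the squared Frobenius norm of the input-output Jacobian; this is a simplification (the paper works with per-layer Jacobians), and the names and usage pattern are assumptions for illustration only.

    import torch

    def jacobian_penalty(model, x):
        # Hutchinson-style estimate of ||d model(x) / d x||_F^2, averaged over the batch.
        x = x.clone().requires_grad_(True)
        y = model(x)
        v = torch.randn_like(y)                                  # random probe vector
        (jtv,) = torch.autograd.grad((y * v).sum(), x, create_graph=True)
        return (jtv ** 2).sum() / x.shape[0]

    # Hypothetical usage inside a training step (model, inputs, task_loss, lam assumed to exist):
    #     loss = task_loss + lam * jacobian_penalty(model, inputs)
    #     loss.backward(); optimizer.step()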
Nearest Centroid Classification on a Trapped Ion Quantum Computer
npj Quantum Information (2021) 7:122; https://doi.org/10.1038/s41534-021-00456-5 (www.nature.com/npjqi, open-access article).
Sonika Johri, Shantanu Debnath, Avinash Mocherla, Alexandros Singk, Anupam Prakash, Jungsang Kim, Iordanis Kerenidis.
Abstract: Quantum machine learning has seen considerable theoretical and practical developments in recent years and has become a promising area for finding real world applications of quantum computers. In pursuit of this goal, here we combine state-of-the-art algorithms and quantum hardware to provide an experimental demonstration of a quantum machine learning application with provable guarantees for its performance and efficiency. In particular, we design a quantum Nearest Centroid classifier, using techniques for efficiently loading classical data into quantum states and performing distance estimations, and experimentally demonstrate it on an 11-qubit trapped-ion quantum machine, matching the accuracy of classical nearest centroid classifiers for the MNIST handwritten digits dataset and achieving up to 100% accuracy for 8-dimensional synthetic data.
Introduction: Quantum technologies promise to revolutionize the future of information and communication, in the form of quantum computing devices able to communicate and process massive amounts of data both efficiently and securely using quantum resources. Tremendous progress is continuously being made both ... Thus, one might hope that noisy quantum computers are inherently better suited for machine learning computations than for other types of problems that need precise computations like factoring or search problems. However, there are significant challenges to be overcome to make QML practical.
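For reference, the classical nearest centroid classifier whose accuracy the quantum experiment matches takes only a few lines; the sketch below shows that baseline on made-up 8-dimensional synthetic data (the paper's quantum data loading and distance-estimation circuits are not represented here).

    import numpy as np

    def fit_centroids(X, y):
        # One centroid (mean vector) per class.
        labels = np.unique(y)
        return labels, np.stack([X[y == c].mean(axis=0) for c in labels])

    def predict(X, labels, centroids):
        # Assign each point to the class of its nearest centroid (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        return labels[d.argmin(axis=1)]

    # Toy usage with two 8-dimensional clusters, echoing the synthetic-data experiment.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(3, 1, (50, 8))])
    y = np.array([0] * 50 + [1] * 50)
    labels, centroids = fit_centroids(X, y)
    print((predict(X, labels, centroids) == y).mean())   # training accuracy of the classical baseline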
Nearly Tight Sample Complexity Bounds for Learning Mixtures of Gaussians Via Sample Compression Schemes∗
Hassan Ashtiani (Department of Computing and Software, McMaster University, and Vector Institute, ON, Canada), Shai Ben-David (School of Computer Science, University of Waterloo, Waterloo, ON, Canada), Nicholas J. A. Harvey (Department of Computer Science, University of British Columbia, Vancouver, BC, Canada), Christopher Liaw (Department of Computer Science, University of British Columbia, Vancouver, BC, Canada), Abbas Mehrabian (School of Computer Science, McGill University, Montréal, QC, Canada), Yaniv Plan (Department of Mathematics, University of British Columbia, Vancouver, BC, Canada).
Abstract: We prove that Θ̃(kd²/ε²) samples are necessary and sufficient for learning a mixture of k Gaussians in R^d, up to error ε in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that Õ(kd/ε²) samples suffice, matching a known lower bound. The upper bound is based on a novel technique for distribution learning based on a notion of sample compression. Any class of distributions that allows such a sample compression scheme can also be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. The core of our main result is showing that the class of Gaussians in R^d has a small-sized sample compression.
1 Introduction: Estimating distributions from observed data is a fundamental task in statistics that has been studied for over a century.
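The result above is information-theoretic and does not prescribe an estimator. As a practical stand-in for the learning task it bounds, the sketch below fits a k-component Gaussian mixture with EM (scikit-learn) from samples of a known two-component mixture; all sizes are made-up toy values.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Draw n samples from a known 2-component mixture in R^3, then estimate it with EM.
    rng = np.random.default_rng(0)
    k, d, n = 2, 3, 5000                       # components, dimension, sample size
    means = np.array([[0.0] * d, [4.0] * d])
    X = np.vstack([rng.normal(means[i], 1.0, size=(n // k, d)) for i in range(k)])

    gmm = GaussianMixture(n_components=k, covariance_type="full", random_state=0).fit(X)
    print(np.round(gmm.means_, 2))             # estimated means should be close to the true ones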
Sample Complexity Results
10-601 Machine Learning, Maria-Florina Balcan, Spring 2015. Generalization Abilities: Sample Complexity Results.
The ability to generalize beyond what we have seen in the training phase is the essence of machine learning, essentially what makes machine learning, machine learning. In these notes we describe some basic concepts and the classic formalization that allows us to talk about these important concepts in a precise way.
Distributional Learning: The basic idea of the distributional learning setting is to assume that examples are being provided from a fixed (but perhaps unknown) distribution over the instance space. The assumption of a fixed distribution gives us hope that what we learn based on some training data will carry over to new test data we haven't seen yet. A nice feature of this assumption is that it provides us a well-defined notion of the error of a hypothesis with respect to a target concept. Specifically, in the distributional learning setting (captured by the PAC model of Valiant and the Statistical Learning Theory framework of Vapnik) we assume that the input to the learning algorithm is a set of labeled examples S: (x1, y1), ..., (xm, ym), where the xi are drawn i.i.d. from some fixed but unknown distribution D over the instance space X and are labeled by some target concept c*, so yi = c*(xi). The goal is to do optimization over the given sample S in order to find a hypothesis h : X → {0, 1} that has small error over the whole distribution D. The true error of h with respect to a target concept c* and the underlying distribution D is defined as

    err(h) = Pr_{x∼D}[h(x) ≠ c*(x)],

where Pr_{x∼D}(A) means the probability of event A given that x is selected according to distribution D. We denote by

    err_S(h) = Pr_{x∼S}[h(x) ≠ c*(x)] = (1/m) Σ_{i=1}^{m} I[h(xi) ≠ c*(xi)]

the empirical error of h over the sample S (that is, the fraction of examples in S misclassified by h).
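A small worked example of the two error notions, using a hypothetical threshold target and hypothesis with D uniform on [0, 1]; the true error err(h) is approximated by a large fresh Monte Carlo sample.

    import numpy as np

    # Hypothetical example: target concept c*(x) = 1[x >= 0.5], hypothesis h(x) = 1[x >= 0.6].
    rng = np.random.default_rng(0)
    c_star = lambda x: (x >= 0.5).astype(int)
    h = lambda x: (x >= 0.6).astype(int)

    S = rng.uniform(0, 1, size=30)                    # training sample of m = 30 points drawn from D
    emp_err = np.mean(h(S) != c_star(S))              # err_S(h): fraction of S misclassified by h

    fresh = rng.uniform(0, 1, size=1_000_000)         # Monte Carlo stand-in for err(h) over D
    true_err = np.mean(h(fresh) != c_star(fresh))     # should be close to 0.1, the measure of [0.5, 0.6)

    print(emp_err, true_err)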