On the Sample Complexity of Reinforcement Learning with a Generative Model


Mohammad Gheshlaghi Azar [email protected]
Department of Biophysics, Radboud University Nijmegen, 6525 EZ Nijmegen, The Netherlands

Rémi Munos [email protected]
INRIA Lille, SequeL Project, 40 avenue Halley, 59650 Villeneuve d'Ascq, France

Hilbert J. Kappen [email protected]
Department of Biophysics, Radboud University Nijmegen, 6525 EZ Nijmegen, The Netherlands

Abstract

We consider the problem of learning the optimal action-value function in discounted-reward Markov decision processes (MDPs). We prove a new PAC bound on the sample complexity of the model-based value iteration algorithm in the presence of the generative model, which indicates that for an MDP with N state-action pairs and discount factor γ ∈ [0, 1), only O(N log(N/δ)/((1 − γ)³ε²)) samples are required to find an ε-optimal estimate of the action-value function with probability 1 − δ. We also prove a matching lower bound of Θ(N log(N/δ)/((1 − γ)³ε²)) on the sample complexity of estimating the optimal action-value function by every RL algorithm. To the best of our knowledge, this is the first matching result on the sample complexity of estimating the optimal (action-)value function, in which the upper bound matches the lower bound of RL in terms of N, ε, δ and 1/(1 − γ). Also, both our lower bound and our upper bound significantly improve on the state of the art in terms of 1/(1 − γ).

Appearing in Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 2012. Copyright 2012 by the author(s)/owner(s).

1. Introduction

Model-based value iteration (VI) (Kearns & Singh, 1999; Buşoniu et al., 2010) is a well-known reinforcement learning (RL) (Szepesvári, 2010; Sutton & Barto, 1998) algorithm which relies on an empirical estimate of the state-transition distributions to estimate the optimal (action-)value function through the Bellman recursion. In finite state-action problems, it has been shown that an action-value-based variant of VI, model-based Q-value iteration (QVI), finds an ε-optimal estimate of the action-value function with high probability using only T = Õ(N/((1 − γ)⁴ε²)) samples (Kearns & Singh, 1999; Kakade, 2004, chap. 9.1), where N and γ denote the size of the state-action space and the discount factor, respectively.¹ Although this bound matches the best existing upper bound on the sample complexity of estimating the action-value function (Azar et al., 2011a), it has not been clear, so far, whether this bound is tight for QVI or whether it can be improved by a more careful analysis of the algorithm. This is mainly due to the fact that there is a gap of order 1/(1 − γ)² between the upper bound of QVI and the state-of-the-art lower bound, which is of order Ω(N/((1 − γ)²ε²)) (Azar et al., 2011b).

In this paper, we focus on problems formulated as finite state-action discounted infinite-horizon Markov decision processes (MDPs), and prove a new tight bound of O(N log(N/δ)/((1 − γ)³ε²)) on the sample complexity of the QVI algorithm. The new upper bound improves on the existing bound for QVI by an order of 1/(1 − γ).² We also present a new matching lower bound of Θ(N log(N/δ)/((1 − γ)³ε²)), which also improves on the best existing lower bound of RL by an order of 1/(1 − γ). The new results, which close the above-mentioned gap between the lower and upper bounds of RL, guarantee that no learning method, given a generative model of the MDP, can be significantly more efficient than QVI in terms of the sample complexity of estimating the action-value function.

¹ The notation g = Õ(f) implies that there are constants c₁ and c₂ such that g ≤ c₁ f log^c₂(f).
² In this paper, to keep the presentation succinct, we only consider the value iteration algorithm. However, one can prove upper bounds of the same order for other model-based methods, such as policy iteration and linear programming, using the results of this paper.
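To see what the 1/(1 − γ) improvement means numerically, the two scalings can be compared directly. The snippet below is purely illustrative: it drops all constants (the bounds are only stated up to constant factors), so it tracks scaling, not actual sample counts.

```python
import math

def old_scaling(N, gamma, eps):
    # Kearns & Singh (1999) / Kakade (2004) style scaling:
    # N / ((1 - gamma)^4 * eps^2), constants and log factors dropped.
    return N / ((1 - gamma) ** 4 * eps ** 2)

def new_scaling(N, gamma, eps, delta):
    # This paper's scaling: N * log(N / delta) / ((1 - gamma)^3 * eps^2),
    # constants dropped.
    return N * math.log(N / delta) / ((1 - gamma) ** 3 * eps ** 2)

# Ignoring the log(N/delta) factor, the ratio of the two scalings is exactly
# 1/(1 - gamma), so the saving grows without bound as gamma approaches 1.
for gamma in (0.9, 0.99, 0.999):
    ratio = old_scaling(100, gamma, 0.1) / (
        new_scaling(100, gamma, 0.1, 0.05) / math.log(100 / 0.05))
    print(f"gamma = {gamma}: old/new (log factor removed) = {ratio:,.0f}")
```

For γ = 0.99, for instance, the old scaling is a hundred times larger than the new one (up to the log factor), which is why closing this gap matters precisely in the long-horizon regime γ → 1.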
The main idea behind the improved upper bound is to express the performance loss of QVI in terms of the variance of the sum of discounted rewards, as opposed to the maximum V_max = R_max/(1 − γ) used in previous results. For this we make use of Bernstein's concentration inequality (Cesa-Bianchi & Lugosi, 2006, appendix, pg. 361), which bounds the estimation error in terms of the variance of the value function. We also rely on the fact that the variance of the sum of discounted rewards, like its expected value (the value function), satisfies a Bellman-like equation, in which the variance of the value function plays the role of the instant reward (Munos & Moore, 1999; Sobel, 1982); this allows us to derive a sharp bound on the variance of the value function.

For the lower bound, we improve on the result of Azar et al. (2011b) by adding some structure to the class of MDPs for which we prove the lower bound: in the new model, there is a high probability of transition from every intermediate state to itself. This adds to the difficulty of estimating the value function, since even a small estimation error may propagate throughout the recursive structure of the MDP and inflict a large performance loss, especially for γ close to 1.
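The Bellman-like equation for the variance invoked above can be written out explicitly. The following is a sketch in the paper's operator notation; the symbols σ^π and v^π are our labels, and this is the standard form going back to Sobel (1982), not necessarily the exact statement the paper uses.

```latex
% Discounted return from state-action pair z under policy \pi:
%   D^\pi(z) = \sum_{t \ge 0} \gamma^t\, r(Z_t), \qquad Z_0 = z .
% Writing \sigma^\pi(z) = \mathbb{V}\big[D^\pi(z)\big], the law of total
% variance applied to one step of the recursion D^\pi(z) = r(z) + \gamma D^\pi(Z_1) gives
\sigma^\pi \;=\; v^\pi + \gamma^2 P^\pi \sigma^\pi ,
\qquad
v^\pi(z) \;\triangleq\; \gamma^2\, \mathbb{V}_{y \sim P(\cdot \mid z)}\!\big[\, Q^\pi(y, \pi(y)) \,\big].
```

Here v^π, the one-step variance of the next state's value, plays exactly the role the reward r plays in the Bellman equation Q^π = r + γP^πQ^π; since P^π is a stochastic operator, solving the recursion yields ‖σ^π‖ ≤ ‖v^π‖/(1 − γ²), which is the kind of sharp control on the variance that the Bernstein-based argument needs.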
The rest of the paper is organized as follows. After introducing the notation used in the paper in Section 2, we describe the model-based Q-value iteration (QVI) algorithm in Subsection 2.1. We then state our main theoretical results, which are in the form of PAC sample complexity bounds, in Section 3. Section 4 contains the detailed proofs of these results, i.e., the sample complexity bound of QVI and a new general lower bound for RL. Finally, we conclude the paper and propose some directions for future work in Section 5.

2. Background

In this section, we review some standard concepts and definitions from the theory of Markov decision processes (MDPs). We then present the model-based Q-value iteration algorithm of Kearns & Singh (1999). We consider the standard reinforcement learning (RL) framework (Bertsekas & Tsitsiklis, 1996; Sutton & Barto, 1998), in which a learning agent interacts with a stochastic environment, and this interaction is modeled as a discounted MDP. A discounted MDP is a quintuple (X, A, P, R, γ), where X and A are the sets of states and actions, P is the state-transition distribution, R is the reward function, and γ ∈ (0, 1) is a discount factor. We denote by P(·|x, a) and r(x, a) the probability distribution over the next state and the immediate reward of taking action a at state x, respectively.

Remark 1. To keep the presentation succinct, in the sequel, we use the notation Z for the joint state-action space X × A. We also make use of the shorthand notations z and β for the state-action pair (x, a) and 1/(1 − γ), respectively.

Assumption 1 (MDP Regularity). We assume that Z and, subsequently, X and A are finite sets with cardinalities N, |X| and |A|, respectively. We also assume that the immediate reward r(x, a) is taken from the interval [0, 1].

A mapping π : X → A is called a stationary and deterministic Markovian policy, or just a policy for short. Following a policy π in an MDP means that at each time step t the control action A_t ∈ A is given by A_t = π(X_t), where X_t ∈ X. The value and action-value functions of a policy π, denoted respectively by V^π : X → ℝ and Q^π : Z → ℝ, are defined as the expected sum of discounted rewards encountered when the policy π is executed. Given an MDP, the goal is to find a policy that attains the best possible values, V*(x) ≜ sup_π V^π(x), for all x ∈ X. The function V* is called the optimal value function. Similarly, the optimal action-value function is defined as Q*(x, a) = sup_π Q^π(x, a). We say that a policy π* is optimal if it attains the optimal value V*(x) for all x ∈ X. The policy π defines the state-transition kernel P_π by P_π(y|x) ≜ P(y|x, π(x)) for all x ∈ X. The right-linear operators P^π·, P· and P_π· are then defined as (P^π Q)(z) ≜ Σ_{y∈X} P(y|z) Q(y, π(y)) and (PV)(z) ≜ Σ_{y∈X} P(y|z) V(y) for all z ∈ Z, and (P_π V)(x) ≜ Σ_{y∈X} P_π(y|x) V(y) for all x ∈ X, respectively. Finally, ‖·‖ denotes the supremum (ℓ∞) norm, defined as ‖g‖ ≜ max_{y∈Y} |g(y)|, where Y is a finite set and g : Y → ℝ is a real-valued function.³

2.1. Model-based Q-value Iteration (QVI)

The algorithm makes n transition samples from each state-action pair z ∈ Z, for which it makes n calls to the generative model.⁴ It then builds an empirical model of the transition probabilities as P̂(y|z) ≜ m(y, z)/n,

³ For ease of exposition, in the sequel, we remove the dependence on z and x, e.g., writing Q for Q(z) and V for V(x), when there is no possible confusion.
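The construction just described — n generative-model calls per state-action pair, an empirical transition model P̂, then value iteration under that model — can be sketched as follows. This is a minimal illustration, not the paper's pseudocode: the function names, the fixed iteration count, and the tiny sanity-check MDP at the end are our own choices.

```python
import numpy as np

def qvi(sample_next_state, rewards, gamma, n, num_iters):
    """Model-based Q-value iteration from a generative model.

    sample_next_state(x, a) -> next-state index, drawn from P(.|x, a)
    rewards: array of shape (X, A) with r(x, a) in [0, 1]
    n: number of generative-model calls per state-action pair
    """
    X, A = rewards.shape
    # Empirical model: P_hat(y | x, a) = m(y, x, a) / n, where m counts how
    # many of the n sampled transitions from (x, a) landed in state y.
    P_hat = np.zeros((X, A, X))
    for x in range(X):
        for a in range(A):
            for _ in range(n):
                P_hat[x, a, sample_next_state(x, a)] += 1.0 / n
    # Value iteration with the empirical Bellman optimality operator:
    # Q <- r + gamma * P_hat V, where V(y) = max_b Q(y, b).
    Q = np.zeros((X, A))
    for _ in range(num_iters):
        Q = rewards + gamma * (P_hat @ Q.max(axis=1))
    return Q

# Sanity check: one state, one action, deterministic self-loop, r = 1.
# The optimal action-value is 1 / (1 - gamma); Q[0, 0] is close to 2.0 here.
Q = qvi(lambda x, a: 0, np.array([[1.0]]), gamma=0.5, n=10, num_iters=200)
```

With n samples per pair, the total number of generative-model calls is T = nN, which is the quantity the sample complexity bounds above control.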