Be a Smooth Manifold, and F : M → R a Function
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
A Corrective View of Neural Networks: Representation, Memorization and Learning
Proceedings of Machine Learning Research vol 125:1{54, 2020 33rd Annual Conference on Learning Theory A Corrective View of Neural Networks: Representation, Memorization and Learning Guy Bresler [email protected] Department of EECS, MIT Cambridge, MA, USA. Dheeraj Nagaraj [email protected] Department of EECS, MIT Cambridge, MA, USA. Editors: Jacob Abernethy and Shivani Agarwal Abstract We develop a corrective mechanism for neural network approximation: the total avail- able non-linear units are divided into multiple groups and the first group approx- imates the function under consideration, the second approximates the error in ap- proximation produced by the first group and corrects it, the third group approxi- mates the error produced by the first and second groups together and so on. This technique yields several new representation and learning results for neural networks: 1. Two-layer neural networks in the random features regime (RF) can memorize arbi- Rd ~ n trary labels for n arbitrary points in with O( θ4 ) ReLUs, where θ is the minimum distance between two different points. This bound can be shown to be optimal in n up to logarithmic factors. 2. Two-layer neural networks with ReLUs and smoothed ReLUs can represent functions with an error of at most with O(C(a; d)−1=(a+1)) units for a 2 N [ f0g when the function has Θ(ad) bounded derivatives. In certain cases d can be replaced with effective dimension q d. Our results indicate that neural networks with only a single nonlinear layer are surprisingly powerful with regards to representation, and show that in contrast to what is suggested in recent work, depth is not needed in order to represent highly smooth functions. -
Smooth Bumps, a Borel Theorem and Partitions of Smooth Functions on P.C.F. Fractals
SMOOTH BUMPS, A BOREL THEOREM AND PARTITIONS OF SMOOTH FUNCTIONS ON P.C.F. FRACTALS. LUKE G. ROGERS, ROBERT S. STRICHARTZ, AND ALEXANDER TEPLYAEV Abstract. We provide two methods for constructing smooth bump functions and for smoothly cutting off smooth functions on fractals, one using a prob- abilistic approach and sub-Gaussian estimates for the heat operator, and the other using the analytic theory for p.c.f. fractals and a fixed point argument. The heat semigroup (probabilistic) method is applicable to a more general class of metric measure spaces with Laplacian, including certain infinitely ramified fractals, however the cut off technique involves some loss in smoothness. From the analytic approach we establish a Borel theorem for p.c.f. fractals, showing that to any prescribed jet at a junction point there is a smooth function with that jet. As a consequence we prove that on p.c.f. fractals smooth functions may be cut off with no loss of smoothness, and thus can be smoothly decom- posed subordinate to an open cover. The latter result provides a replacement for classical partition of unity arguments in the p.c.f. fractal setting. 1. Introduction Recent years have seen considerable developments in the theory of analysis on certain fractal sets from both probabilistic and analytic viewpoints [1, 14, 26]. In this theory, either a Dirichlet energy form or a diffusion on the fractal is used to construct a weak Laplacian with respect to an appropriate measure, and thereby to define smooth functions. As a result the Laplacian eigenfunctions are well un- derstood, but we have little knowledge of other basic smooth functions except in the case where the fractal is the Sierpinski Gasket [19, 5, 20]. -
High Energy and Smoothness Asymptotic Expansion of the Scattering Amplitude
HIGH ENERGY AND SMOOTHNESS ASYMPTOTIC EXPANSION OF THE SCATTERING AMPLITUDE D.Yafaev Department of Mathematics, University Rennes-1, Campus Beaulieu, 35042, Rennes, France (e-mail : [email protected]) Abstract We find an explicit expression for the kernel of the scattering matrix for the Schr¨odinger operator containing at high energies all terms of power order. It turns out that the same expression gives a complete description of the diagonal singular- ities of the kernel in the angular variables. The formula obtained is in some sense universal since it applies both to short- and long-range electric as well as magnetic potentials. 1. INTRODUCTION d−1 d−1 1. High energy asymptotics of the scattering matrix S(λ): L2(S ) → L2(S ) for the d Schr¨odinger operator H = −∆+V in the space H = L2(R ), d ≥ 2, with a real short-range potential (bounded and satisfying the condition V (x) = O(|x|−ρ), ρ > 1, as |x| → ∞) is given by the Born approximation. To describe it, let us introduce the operator Γ0(λ), −1/2 (d−2)/2 ˆ 1/2 d−1 (Γ0(λ)f)(ω) = 2 k f(kω), k = λ ∈ R+ = (0, ∞), ω ∈ S , (1.1) of the restriction (up to the numerical factor) of the Fourier transform fˆ of a function f to −1 −1 the sphere of radius k. Set R0(z) = (−∆ − z) , R(z) = (H − z) . By the Sobolev trace −r d−1 theorem and the limiting absorption principle the operators Γ0(λ)hxi : H → L2(S ) and hxi−rR(λ + i0)hxi−r : H → H are correctly defined as bounded operators for any r > 1/2 and their norms are estimated by λ−1/4 and λ−1/2, respectively. -
Bertini's Theorem on Generic Smoothness
U.F.R. Mathematiques´ et Informatique Universite´ Bordeaux 1 351, Cours de la Liberation´ Master Thesis in Mathematics BERTINI1S THEOREM ON GENERIC SMOOTHNESS Academic year 2011/2012 Supervisor: Candidate: Prof.Qing Liu Andrea Ricolfi ii Introduction Bertini was an Italian mathematician, who lived and worked in the second half of the nineteenth century. The present disser- tation concerns his most celebrated theorem, which appeared for the first time in 1882 in the paper [5], and whose proof can also be found in Introduzione alla Geometria Proiettiva degli Iperspazi (E. Bertini, 1907, or 1923 for the latest edition). The present introduction aims to informally introduce Bertini’s Theorem on generic smoothness, with special attention to its re- cent improvements and its relationships with other kind of re- sults. Just to set the following discussion in an historical perspec- tive, recall that at Bertini’s time the situation was more or less the following: ¥ there were no schemes, ¥ almost all varieties were defined over the complex numbers, ¥ all varieties were embedded in some projective space, that is, they were not intrinsic. On the contrary, this dissertation will cope with Bertini’s the- orem by exploiting the powerful tools of modern algebraic ge- ometry, by working with schemes defined over any field (mostly, but not necessarily, algebraically closed). In addition, our vari- eties will be thought of as abstract varieties (at least when over a field of characteristic zero). This fact does not mean that we are neglecting Bertini’s original work, containing already all the rele- vant ideas: the proof we shall present in this exposition, over the complex numbers, is quite close to the one he gave. -
Jia 84 (1958) 0125-0165
INSTITUTE OF ACTUARIES A MEASURE OF SMOOTHNESS AND SOME REMARKS ON A NEW PRINCIPLE OF GRADUATION BY M. T. L. BIZLEY, F.I.A., F.S.S., F.I.S. [Submitted to the Institute, 27 January 19581 INTRODUCTION THE achievement of smoothness is one of the main purposes of graduation. Smoothness, however, has never been defined except in terms of concepts which themselves defy definition, and there is no accepted way of measuring it. This paper presents an attempt to supply a definition by constructing a quantitative measure of smoothness, and suggests that such a measure may enable us in the future to graduate without prejudicing the process by an arbitrary choice of the form of the relationship between the variables. 2. The customary method of testing smoothness in the case of a series or of a function, by examining successive orders of differences (called hereafter the classical method) is generally recognized as unsatisfactory. Barnett (J.1.A. 77, 18-19) has summarized its shortcomings by pointing out that if we require successive differences to become small, the function must approximate to a polynomial, while if we require that they are to be smooth instead of small we have to judge their smoothness by that of their own differences, and so on ad infinitum. Barnett’s own definition, which he recognizes as being rather vague, is as follows : a series is smooth if it displays a tendency to follow a course similar to that of a simple mathematical function. Although Barnett indicates broadly how the word ‘simple’ is to be interpreted, there is no way of judging decisively between two functions to ascertain which is the simpler, and hence no quantitative or qualitative measure of smoothness emerges; there is a further vagueness inherent in the term ‘similar to’ which would prevent the definition from being satisfactory even if we could decide whether any given function were simple or not. -
Robinson Showed That It Is Possible to Rep- Resent Each Schwartz Distribution T As an Integral T(4) = Sf4, Where F Is Some Nonstandard Smooth Function
Indag. Mathem., N.S., 11 (1) 129-138 March 30,200O Constructive nonstandard representations of generalized functions by Erik Palmgren* Department of Mathematics,Vppsala University, P. 0. Box 480, S-75106 Uppa& &v&n, e-mail:[email protected] Communicated by Prof. A.S. Troelstra at the meeting of May 31, 1999 ABSTRACT Using techniques of nonstandard analysis Abraham Robinson showed that it is possible to rep- resent each Schwartz distribution T as an integral T(4) = sf4, where f is some nonstandard smooth function. We show that the theory based on this representation can be developed within a constructive setting. I. INTRODUCTION Robinson (1966) demonstrated that Schwartz’ theory of distributions could be given a natural formulation using techniques of nonstandard analysis, so that distributions become certain nonstandard smooth functions. In particular, Dirac’s delta-function may then be taken to be the rational function where E is a positive infinitesimal. As is wellknown, classical nonstandard analysis is based on strongly nonconstructive assumptions. In this paper we present a constructive version of Robinson’s theory using the framework of constructive nonstandard analysis developed in Palmgren (1997, 1998), which was based on Moerdijk’s (1995) sheaf-theoretic nonstandard models. The main *The author gratefully acknowledges support from the Swedish Research Council for Natural Sciences (NFR). 129 points of the present paper are the elimination of the standard part map from Robinson’s theory and a constructive proof of the representation of distribu- tions as nonstandard smooth functions (Theorem 2.6). We note that Schmieden and Laugwitz already in 1958 introduced their ver- sion of nonstandard analysis to give an interpretation of generalized functions which are closer to those of the Mikusinski-Sikorski calculus. -
Delta Functions and Distributions
When functions have no value(s): Delta functions and distributions Steven G. Johnson, MIT course 18.303 notes Created October 2010, updated March 8, 2017. Abstract x = 0. That is, one would like the function δ(x) = 0 for all x 6= 0, but with R δ(x)dx = 1 for any in- These notes give a brief introduction to the mo- tegration region that includes x = 0; this concept tivations, concepts, and properties of distributions, is called a “Dirac delta function” or simply a “delta which generalize the notion of functions f(x) to al- function.” δ(x) is usually the simplest right-hand- low derivatives of discontinuities, “delta” functions, side for which to solve differential equations, yielding and other nice things. This generalization is in- a Green’s function. It is also the simplest way to creasingly important the more you work with linear consider physical effects that are concentrated within PDEs, as we do in 18.303. For example, Green’s func- very small volumes or times, for which you don’t ac- tions are extremely cumbersome if one does not al- tually want to worry about the microscopic details low delta functions. Moreover, solving PDEs with in this volume—for example, think of the concepts of functions that are not classically differentiable is of a “point charge,” a “point mass,” a force plucking a great practical importance (e.g. a plucked string with string at “one point,” a “kick” that “suddenly” imparts a triangle shape is not twice differentiable, making some momentum to an object, and so on. -
Lecture 2 — February 25Th 2.1 Smooth Optimization
Statistical machine learning and convex optimization 2016 Lecture 2 — February 25th Lecturer: Francis Bach Scribe: Guillaume Maillard, Nicolas Brosse This lecture deals with classical methods for convex optimization. For a convex function, a local minimum is a global minimum and the uniqueness is assured in the case of strict convexity. In the sequel, g is a convex function on Rd. The aim is to find one (or the) minimum θ Rd and the value of the function at this minimum g(θ ). Some key additional ∗ ∗ properties will∈ be assumed for g : Lipschitz continuity • d θ R , θ 6 D g′(θ) 6 B. ∀ ∈ k k2 ⇒k k2 Smoothness • d 2 (θ , θ ) R , g′(θ ) g′(θ ) 6 L θ θ . ∀ 1 2 ∈ k 1 − 2 k2 k 1 − 2k2 Strong convexity • d 2 µ 2 (θ , θ ) R ,g(θ ) > g(θ )+ g′(θ )⊤(θ θ )+ θ θ . ∀ 1 2 ∈ 1 2 2 1 − 2 2 k 1 − 2k2 We refer to Lecture 1 of this course for additional information on these properties. We point out 2 key references: [1], [2]. 2.1 Smooth optimization If g : Rd R is a convex L-smooth function, we remind that for all θ, η Rd: → ∈ g′(θ) g′(η) 6 L θ η . k − k k − k Besides, if g is twice differentiable, it is equivalent to: 0 4 g′′(θ) 4 LI. Proposition 2.1 (Properties of smooth convex functions) Let g : Rd R be a con- → vex L-smooth function. Then, we have the following inequalities: L 2 1. -
Chapter 16 the Tangent Space and the Notion of Smoothness
Chapter 16 The tangent space and the notion of smoothness We will always assume K algebraically closed. In this chapter we follow the approach of ˇ n Safareviˇc[S]. We define the tangent space TX,P at a point P of an affine variety X A as ⇢ the union of the lines passing through P and “ touching” X at P .Itresultstobeanaffine n subspace of A .Thenwewillfinda“local”characterizationofTX,P ,thistimeinterpreted as a vector space, the direction of T ,onlydependingonthelocalring :thiswill X,P OX,P allow to define the tangent space at a point of any quasi–projective variety. 16.1 Tangent space to an affine variety Assume first that X An is closed and P = O =(0,...,0). Let L be a line through P :if ⇢ A(a1,...,an)isanotherpointofL,thenageneralpointofL has coordinates (ta1,...,tan), t K.IfI(X)=(F ,...,F ), then the intersection X L is determined by the following 2 1 m \ system of equations in the indeterminate t: F (ta ,...,ta )= = F (ta ,...,ta )=0. 1 1 n ··· m 1 n The solutions of this system of equations are the roots of the greatest common divisor G(t)of the polynomials F1(ta1,...,tan),...,Fm(ta1,...,tan)inK[t], i.e. the generator of the ideal they generate. We may factorize G(t)asG(t)=cte(t ↵ )e1 ...(t ↵ )es ,wherec K, − 1 − s 2 ↵ ,...,↵ =0,e, e ,...,e are non-negative, and e>0ifandonlyifP X L.The 1 s 6 1 s 2 \ number e is by definition the intersection multiplicity at P of X and L. -
[Math.FA] 1 Apr 1998 E Words Key 1991 Abstract *)Tersac Fscn N H Hr Uhrhsbe Part Been Has Author 93-0452
RENORMINGS OF Lp(Lq) by R. Deville∗, R. Gonzalo∗∗ and J. A. Jaramillo∗∗ (*) Laboratoire de Math´ematiques, Universit´eBordeaux I, 351, cours de la Lib´eration, 33400 Talence, FRANCE. email : [email protected] (**) Departamento de An´alisis Matem´atico, Universidad Complutense de Madrid, 28040 Madrid, SPAIN. email : [email protected] and [email protected] Abstract : We investigate the best order of smoothness of Lp(Lq). We prove in particular ∞ that there exists a C -smooth bump function on Lp(Lq) if and only if p and q are even integers and p is a multiple of q. arXiv:math/9804002v1 [math.FA] 1 Apr 1998 1991 Mathematics Subject Classification : 46B03, 46B20, 46B25, 46B30. Key words : Higher order smoothness, Renormings, Geometry of Banach spaces, Lp spaces. (**) The research of second and the third author has been partially supported by DGICYT grant PB 93-0452. 1 Introduction The first results on high order differentiability of the norm in a Banach space were given by Kurzweil [10] in 1954, and Bonic and Frampton [2] in 1965. In [2] the best order of differentiability of an equivalent norm on classical Lp-spaces, 1 < p < ∞, is given. The best order of smoothness of equivalent norms and of bump functions in Orlicz spaces has been investigated by Maleev and Troyanski (see [13]). The work of Leonard and Sundaresan [11] contains a systematic investigation of the high order differentiability of the norm function in the Lebesgue-Bochner function spaces Lp(X). Their results relate the continuously n-times differentiability of the norm on X and of the norm on Lp(X). -
Convergence Theorems for Gradient Descent
Convergence Theorems for Gradient Descent Robert M. Gower. October 5, 2018 Abstract Here you will find a growing collection of proofs of the convergence of gradient and stochastic gradient descent type method on convex, strongly convex and/or smooth functions. Important disclaimer: Theses notes do not compare to a good book or well prepared lecture notes. You should only read these notes if you have sat through my lecture on the subject and would like to see detailed notes based on my lecture as a reminder. Under any other circumstances, I highly recommend reading instead the first few chapters of the books [3] and [1]. 1 Assumptions and Lemmas 1.1 Convexity We say that f is convex if d f(tx + (1 − t)y) ≤ tf(x) + (1 − t)f(y); 8x; y 2 R ; t 2 [0; 1]: (1) If f is differentiable and convex then every tangent line to the graph of f lower bounds the function values, that is d f(y) ≥ f(x) + hrf(x); y − xi ; 8x; y 2 R : (2) We can deduce (2) from (1) by dividing by t and re-arranging f(y + t(x − y)) − f(y) ≤ f(x) − f(y): t Now taking the limit t ! 0 gives hrf(y); x − yi ≤ f(x) − f(y): If f is twice differentiable, then taking a directional derivative in the v direction on the point x in (2) gives 2 2 d 0 ≥ hrf(x); vi + r f(x)v; y − x − hrf(x); vi = r f(x)v; y − x ; 8x; y; v 2 R : (3) Setting y = x − v then gives 2 d 0 ≤ r f(x)v; v ; 8x; v 2 R : (4) 1 2 d The above is equivalent to saying the r f(x) 0 is positive semi-definite for every x 2 R : An analogous property to (2) holds even when the function is not differentiable. -
Online Methods in Machine Learning: Recitation 1
Online Methods in Machine Learning: Recitation 1. As seen in class, many offline machine learning problems can be written as: n 1 X min `(w, (xt, yt)) + λR(w), (1) w∈C n t=1 where C is a subset of Rd, R is a penalty function, and ` is the loss function that measures how well our predictor, w, explains the relationship between x and y. Example: Support Vector λ 2 w Machines with R(w) = ||w|| and `(w, (xt, yt)) = max(0, 1 − ythw, xti) where h , xti is 2 2 ||w||2 the signed distance to the hyperplane {x | hw, xi = 0} used to classify the data. A couple of remarks: 1. In the context of classification, ` is typically a convex surrogate loss function in the sense that (i) w → `(w, (x, y)) is convex for any example (x, y) and (ii) `(w, (x, y)) is larger than the true prediction error given by 1yhw,xi≤0 for any example (x, y). In general, we introduce this surrogate loss function because it would be difficult to solve (1) computationally. On the other hand, convexity enables the development of general- purpose methods (e.g. Gradient Descent) that are fast, easy to implement, and simple to analyze. Moreover, the techniques and the proofs generalize very well to the online setting, where the data (xt, yt) arrive sequentially and we have to make predictions online. This theme will be explored later in the class. 2. If ` is a surrogate loss for a classification problem and we solve (1) (e.g with Gradient Descent), we cannot guarantee that our predictor will perform well with respect to our original classification problem n X min 1yhw,xi≤0 + λR(w), (2) w∈C t=1 hence it is crucial to choose a surrogate loss function that reflects accurately the classifi- cation problem at hand.