Finite Range Method of Approximation for Balance Laws in Measure Spaces

Kinetic and Related Models, Volume 10, Number 3, September 2017, pp. 669–688
doi:10.3934/krm.2017027
© American Institute of Mathematical Sciences

Piotr Gwiazda∗, Piotr Orlinski and Agnieszka Ulikowska
Institute of Applied Mathematics and Mechanics, University of Warsaw
ul. Banacha 2, 02-097 Warsaw, Poland

(Communicated by José Antonio Carrillo)

Abstract. In this paper we reconsider a numerical scheme recently introduced in [10]. The method was designed for a wide class of size structured population models with a nonlocal term describing the birth process. Despite its numerous advantages, it suffers from exponential growth in time of the number of particles constituting the numerical solution. We introduce a new algorithm free from this inconvenience. The improvement is based on the application of the Finite Range Approximation to the nonlocal term. We prove the convergence of the derived method and provide its rate of convergence. Moreover, the results are illustrated by numerical simulations applied to various test cases.

1. Introduction. The aim of this paper is to present an improvement of the numerical scheme which was introduced in [10] and which possesses an unfavourable feature: in many cases the number of particles constituting the numerical solution can increase exponentially in time. To deal with this problem we apply a Finite Range Approximation method to the nonlocal term appearing on the right-hand side of the model equation (1); see Subsection 2.1 for details. As shown in [10], the convergence of the scheme follows from the stability estimate [9, Theorem 2.11 (ii)]; however, its assumptions are not fulfilled after the application of the approximation procedure.
For that reason we need to establish a relaxed version of the stability estimate and apply a new strategy in the proof of convergence of the scheme.

The scheme under consideration follows a current trend which is based on a kinetic approach to population dynamics problems [2, 3, 17, 21, 22, 23, 26]. Within this approach a population of individuals is divided into groups, so-called cohorts. Each cohort is represented by the number of individuals and by their average state. Thus it seems natural to approximate the distribution of the population by a linear combination of Dirac measures. Such a method of approximation is very suitable for numerical studies, especially when it comes to compatibility of a model with experimental data. Indeed, the result of a measurement of a population usually boils down to providing the number of individuals together with their average state within the underlying cohort. Demographic studies, which provide data about the sizes of age cohorts, are a good example of such measurements.

2010 Mathematics Subject Classification. P92D25, 65M12, 65M75.
Key words and phrases. Structured population models, particle method, measure valued solutions, Radon measures, flat metric.
∗ Corresponding author: Piotr Gwiazda.

A broad group of methods originating from the kinetic theory are particle methods, which are designed to model the behavior of large groups of interacting particles or individuals. Over the last decades they have been successfully applied to solve numerically many problems originating from physics, such as the Euler equation in fluid mechanics [19, 32] and the Vlasov equation in plasma physics [5, 12, 18]. Recently, particle methods have been used in problems related to crowd dynamics and the flow of pedestrians [17, 26, 27], models of the collective motion of large groups of agents [8, 16, 4] and population dynamics [9]. For more applications see [24, 25, 28, 29] and references therein.
In this paper we focus on population dynamics and the following size structured population model

$$\frac{\partial}{\partial t}\mu + \frac{\partial}{\partial x}\big(b(t,\mu)\,\mu\big) + c(t,\mu)\,\mu = \int_{\mathbb{R}_+} \big(\eta(t,\mu)\big)(y)\, d\mu(y), \tag{1}$$

where $t \in [0, T]$ and $x \ge 0$ denote, respectively, time and the size of an individual. Solutions of (1) are considered in a weak sense; see [9, Definition 2.2] for the exact definition. In general, the $x$ variable can describe other physiological states (e.g. length or weight), but for the sake of simplicity we stick to the size variable. The measure $\mu$ is a distribution of individuals with respect to $x$. We assume that an individual changes its size according to the ODE

$$\dot{x} = b(t,\mu)(x),$$

where $b$ describes the dynamics of the transformation, that is, the speed of the individual's growth. The function $c(t,\mu)(x)$ stands for the death rate, and the integral term describes the birth process.

To explain briefly the meaning of the nonlocal term in (1), we assume for a moment that the function $\eta$ depends neither on time $t$ nor on the population state $\mu$. Then, for a fixed $y \ge 0$, $\eta(y)$ describes a distribution (with respect to $x$) of the offspring of individuals of size $y$. Additionally, in the particular case where all newborn individuals have the same size $x^b$, we set

$$\eta(y) = \beta(y)\,\delta_{x = x^b}, \tag{2}$$

where $\beta(y)$ is related to the probability that individuals of size $y$ procreate. If (2) holds, then the integral in (1) transforms into a boundary condition and, as a consequence, (1) can be reduced to the following classical renewal equation with a nonlocal boundary condition

$$\begin{aligned} \frac{\partial}{\partial t}\mu + \frac{\partial}{\partial x}(b\mu) + c\mu &= 0 \quad \text{for } x \ge x^b, \\ b(x^b)\, D_\lambda \mu(x^b) &= \int_{\mathbb{R}_+} \beta(y)\, d\mu(y), \end{aligned} \tag{3}$$

where $D_\lambda \mu(x^b)$ is the Radon-Nikodym derivative, if it exists, of $\mu$ with respect to the Lebesgue measure at $x^b$.

The model (1) describes a population which undergoes processes of birth, death and development. The number of individuals in the population and its total biomass change in time, which is clearly indicated by the nonconservative character of the problem.
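The nonconservative character of (1) can be seen on a particle measure $\mu = \sum_i m_i \delta_{x_i}$. The following is a minimal sketch, not the scheme of [10]: it assumes the birth law (2), coefficients $b$, $c$, $\beta$ that depend on $x$ only, and a plain explicit Euler step. The names `Particle`, `euler_step` and `total_mass` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    x: float  # size: location of the Dirac mass
    m: float  # mass: number of individuals in the cohort


def euler_step(particles, dt, b, c, beta, x_b):
    """One explicit Euler step for (1) with eta as in (2) (toy illustration)."""
    # Birth influx: the integral of beta against mu = sum_i m_i * delta_{x_i}.
    influx = sum(p.m * beta(p.x) for p in particles)
    # Each Dirac mass moves along x' = b(x) and loses mass at rate c(x).
    moved = [
        Particle(p.x + dt * b(p.x), p.m * (1.0 - dt * c(p.x)))
        for p in particles
    ]
    # All newborns share the size x_b, so they form a single new Dirac mass.
    moved.append(Particle(x_b, dt * influx))
    return moved


def total_mass(particles):
    """Total mass of the particle measure, i.e. mu(R_+)."""
    return sum(p.m for p in particles)
```

For a single unit mass at $x = 1$ with $b \equiv 1$, $c \equiv 0.5$, $\beta \equiv 2$ and a step of $0.1$, the total mass after one step is $0.95 + 0.2 = 1.15$, so the mass is not preserved.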
We need to underline that the lack of mass conservation is the main challenge associated with the application of particle methods in population dynamics. Let us briefly recall that the most common mathematical framework for the kinetic theory is a space of probability measures equipped with a Wasserstein distance. Unfortunately, the 1-Wasserstein distance $W_1$ between two Radon measures $\mu$ and $\nu$ such that $\int d\mu \ne \int d\nu$ is infinite, which is the reason why natural distances for measures, like the Wasserstein distances, cannot be exploited in the case of nonconservative problems. Indeed, let $\mu$, $\nu$ be finite Radon measures on $\mathbb{R}$ such that $\mu(\mathbb{R}) \ne \nu(\mathbb{R})$. Then, according to [30, Section 7.1],

$$W_1(\mu,\nu) = \sup\Big\{ \int_{\mathbb{R}} \varphi(x)\, d(\mu-\nu)(x) : \mathrm{Lip}(\varphi) \le 1 \Big\} \ge \int_{\mathbb{R}} a\, d(\mu-\nu)(x) = a\big(\mu(\mathbb{R}) - \nu(\mathbb{R})\big),$$

which yields $W_1(\mu,\nu) = +\infty$, since the constant $a \in \mathbb{R}$ can be chosen arbitrarily large in absolute value, with the same sign as $\mu(\mathbb{R}) - \nu(\mathbb{R})$.

Therefore, to consider the well-posedness of models of population dynamics in the space of measures, a suitable framework is needed in the first place. It has been recently established by replacing the Wasserstein distance by the flat metric (see Section 3 for definitions and technical details). One of the first steps in that field was made in [22, 23], where existence, uniqueness and stability of solutions to (3) in the space of finite, nonnegative Radon measures equipped with the flat metric were proved. Within this framework the first formal proof of convergence of a corresponding particle method for (3) was conducted in [6]. The method is called the Escalator Boxcar Train (EBT), and although it was described for the first time in the 1980s in [13], the proof of its convergence and the convergence rate [20] is very recent. Well-posedness of the general size-structured population model (1) in the space of measures was established in [9], and a numerical scheme based on particle methods was developed in [10].
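A small worked example may make the contrast between the two distances concrete. The measures $\mu = 2\delta_0$ and $\nu = \delta_0$ are an illustrative choice, not taken from the paper, and the flat metric $\rho_F$ is written here in its usual bounded Lipschitz form, which is assumed to agree with the definition given in Section 3.

```latex
% Illustrative choice: \mu = 2\delta_0, \nu = \delta_0, so that
% \int \varphi \, d(\mu - \nu) = 2\varphi(0) - \varphi(0) = \varphi(0).
\begin{align*}
W_1(\mu,\nu) &\ge \int_{\mathbb{R}} a \, d(\mu-\nu)(x) = a
  \quad \text{for every constant } a > 0
  \quad\Longrightarrow\quad W_1(\mu,\nu) = +\infty, \\
\rho_F(\mu,\nu) &= \sup\Big\{ \int_{\mathbb{R}} \varphi \, d(\mu-\nu) :
  \|\varphi\|_{\infty} \le 1,\ \mathrm{Lip}(\varphi) \le 1 \Big\}
  = \sup_{\|\varphi\|_{\infty} \le 1} \varphi(0) = 1 .
\end{align*}
```

The additional bound $\|\varphi\|_\infty \le 1$ excludes the large constant test functions that make $W_1$ blow up, so the distance between measures of different total mass stays finite.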
In the latter paper an essential assumption is the particular form of the function $\eta$, namely

$$\eta(y) = \sum_{p=1}^{r} \beta_p(y)\, \delta_{x = f_p(y)}, \tag{4}$$

which means that the size of a child belongs to the set $\{f_p(y)\}_{p=1}^{r}$, where $y$ stands for the size of its parent. For instance, letting $r = 1$, $f_1(y) = x^b$ and $\beta_1(y) = \beta > 0$ corresponds to the special case of (2) and leads to equation (3). Another common example is a simple symmetric cell division model, which is obtained with $r = 1$, $f_1(y) = \frac{1}{2} y$ and $\beta_1(y) = \beta > 0$. The asymmetric case may be obtained with $r = 2$, $f_1(y) = \sigma y$, $f_2(y) = (1 - \sigma) y$, where $0 < \sigma < 1$, and $\beta_1(y) = \beta_1 > 0$, $\beta_2(y) = \beta_2 > 0$. In both cases the cell division process is understood as the birth of two new cells and the death of a mother cell, which should be incorporated in the death rate.

As has already been stated above, in the kinetic approach a solution is approximated by a linear combination of Dirac measures at each discrete time moment. In the case of the algorithm developed in [10], Dirac deltas represent cohorts, that is, groups of individuals of a similar size. Since a population undergoes a process of births, at least one additional Dirac measure is created at each time step of the algorithm. For the particular choice (2) it is exactly one Dirac measure, since all newborn individuals have the same size. However, in the case of the symmetric cell division model the number of Dirac measures is doubled at each time step, which results in exponential growth of the number of particles.
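The difference between linear and exponential growth of the particle count can be checked on a toy model. The sketch below is not the algorithm of [10]: it assumes a constant growth speed and birth rate, ignores death, and uses exact rational arithmetic so that two Dirac masses are merged only when their locations coincide exactly. The names `step`, `particle_counts`, `renewal` and `division` are hypothetical.

```python
from fractions import Fraction


def step(measure, dt, speed, beta, offspring_maps):
    """Toy time step: transport every Dirac mass, then add its offspring.

    `measure` maps a location to the mass sitting there.
    """
    transported = {}
    for x, m in measure.items():
        y = x + dt * speed
        transported[y] = transported.get(y, 0) + m
    new_measure = dict(transported)
    for y, m in transported.items():
        for f in offspring_maps:  # one map f_p per term of the sum in (4)
            child = f(y)
            new_measure[child] = new_measure.get(child, 0) + dt * beta * m
    return new_measure


def particle_counts(offspring_maps, steps, dt=Fraction(1, 10)):
    """Number of Dirac masses after 0, 1, ..., `steps` time steps."""
    measure = {Fraction(1): Fraction(1)}  # one unit mass at x = 1
    counts = [len(measure)]
    for _ in range(steps):
        measure = step(measure, dt, Fraction(1), Fraction(1), offspring_maps)
        counts.append(len(measure))
    return counts


renewal = [lambda y: Fraction(0)]  # all newborns at x^b = 0, as in (2)
division = [lambda y: y / 2]       # symmetric division, f_1(y) = y / 2
```

With these settings, `particle_counts(renewal, 3)` gives `[1, 2, 3, 4]`, while `particle_counts(division, 3)` gives `[1, 2, 4, 8]`: in the renewal case all newborns of a step share one location, whereas under division every existing particle spawns a child at a new location.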