How Svms Can Estimate Quantiles and the Median

Total Page:16

File Type:pdf, Size:1020Kb

How Svms Can Estimate Quantiles and the Median How SVMs can estimate quantiles and the median Ingo Steinwart Andreas Christmann Information Sciences Group CCS-3 Department of Mathematics Los Alamos National Laboratory Vrije Universiteit Brussel Los Alamos, NM 87545, USA B-1050 Brussels, Belgium [email protected] [email protected] Abstract We investigate quantile regression based on the pinball loss and the ǫ-insensitive loss. For the pinball loss a condition on the data-generating distribution P is given that ensures that the conditional quantiles are approximated with respect to 1. This result is then used to derive an oracle inequality for an SVM based kon · thek pinball loss. Moreover, we show that SVMs based on the ǫ-insensitive loss estimate the conditional median only under certain conditions on P . 1 Introduction Let P be a distribution on X Y , where X is an arbitrary set and Y R is closed. The goal of quantile regression is to estimate× the conditional quantile, i.e., the set valued⊂ function F ∗ (x) := t R :P ( , t] x τ and P [t, ) x 1 τ , x X, τ,P ∈ −∞ | ≥ ∞ | ≥ − ∈ where τ (0, 1) is a fixed constant and P( x), x X, is the (regular) conditional probability. For conceptual∈ simplicity (though mathematically· | this is∈ not necessary) we assume throughout this paper that F ∗ (x) consists of singletons, i.e., there exists a function f ∗ : X R, called the conditional τ,P τ,P → τ-quantile function, such that F ∗ (x) = f ∗ (x) , x X. Let us now consider the so-called τ,P { τ,P } ∈ τ-pinball loss L : R R [0, ) defined by L (y, t) := ψ (y t), where ψ (r) = (τ 1)r, if τ × → ∞ τ τ − τ − r < 0, and ψτ (r) = τr, if r 0. Moreover, given a (measurable) function f : X R we define the L -risk of f by (f) :=≥ E L (y, f(x)). Now recall that f ∗ is up to→ zero sets the only τ RLτ ,P (x,y)∼P τ τ,P function that minimizes the -risk, i.e. ∗ ∗ , where the infi- Lτ Lτ ,P(fτ,P) = inf Lτ ,P(f) =: Lτ ,P mum is taken over all f : X R. Based onR this observation severalR estimatorsR minimizing a (mod- → ified) empirical Lτ -risk were proposed (see [5] for a survey on both parametric and non-parametric methods) for situations where P is unknown, but i.i.d. samples D := ((x1, y1),..., (xn, yn)) drawn from P are given. In particular, [6, 4, 10] proposed an SVM that finds a solution fD,λ H of n ∈ 2 1 arg min λ f H + Lτ (yi, f(xi)) , (1) f∈H k k n i=1 X where λ > 0 is a regularization parameter and H is a reproducing kernel Hilbert space (RKHS) over X. Note that this optimization problem can be solved by considering the dual problem [4, 10], but since this technique is nowadays standard in machine learning we omit the details. Moreover, [10] contains an exhaustive empirical study as well some theoretical considerations. Empirical methods estimating quantiles with the help of the pinball loss typically obtain functions for which is close to ∗ with high probability. However, in general this only fD Lτ ,P(fD) Lτ ,P R ∗ R implies that fD is close to fτ,P in a very weak sense (see [7, Remark 3.18]), and hence there is so far only little justification for using fD as an estimate of the quantile function. Our goal is to address this issue by showing that under certain realistic assumptions on P we have an inequality of the form ∗ ∗ f f cP L ,P(f) . (2) k − τ,PkL1(PX ) ≤ R τ − RLτ ,P q We then use this inequality to establish an oracle inequality for SVMs defined by (1). In addition, we illustrate how this oracle inequality can be used to obtain learning rates and to justify a data- dependent method for finding the hyper-parameter λ and H. Finally, we generalize the methods for establishing (2) to investigate the role of ǫ in the ǫ-insensitive loss used in standard SVM regression. 2 Main results In the following X is an arbitrary, non-empty set equipped with a σ-algebra, and Y R is a closed non-empty set. Given a distribution P on X Y we further assume throughout this⊂ paper that the × σ-algebra on X is complete with respect to the marginal distribution PX of P, i.e., every subset of a PX -zero set is contained in the σ-algebra. Since the latter can always be ensured by increasing the original σ-algebra in a suitable manner we note that this is not a restriction at all. Definition 2.1 A distribution Q on R is said to have a τ-quantile of type α > 0 if there exists a τ-quantile t∗ R and a constant c > 0 such that for all s [0, α] we have ∈ Q ∈ Q (t∗, t∗ + s) c s and Q (t∗ s, t∗) c s . (3) ≥ Q − ≥ Q It is not difficult to see that a distribution Q having a τ-quantile of some type α has a unique τ- ∗ quantile t . Moreover, if Q has a Lebesgue density hQ then Q has a τ-quantile of type α if hQ is ∗ ∗ ∗ ∗ bounded away from zero on [t α, t +α] since we can use cQ := inf hQ(t): t [t α, t +α] in (3). This assumption is general− enough to cover many distributions{ used in parametric∈ − statistics} such as Gaussian, Student’s t, and logistic distributions (with Y = R), Gamma and log-normal distributions (with Y = [0, )), and uniform and Beta distributions (with Y = [0, 1]). ∞ The following definition describes distributions on X Y whose conditional distributions P( x), x X, have the same τ-quantile type α. × · | ∈ Definition 2.2 Let p (0, ], τ (0, 1), and α > 0. A distribution P on X Y is said to have a ∈ ∞ ∈ × τ-quantile of p-average type α, if Qx := P( x) has PX -almost surely a τ-quantile type α and b : X (0, ) defined by b(x):=c , where· | c is the constant in (3), satisfies b−1 L (P ). → ∞ P( · |x) P( · |x) ∈ p X Let us now give some examples for distributions having τ-quantiles of p-average type α. Example 2.3 Let P be a distribution on X R with marginal distribution PX and regular condi- ×−z tional probability Qx ( , y] := 1/(1+e ), y R, where z := y m(x) /σ(x), m : X R describes a location shift,−∞ and σ : X [β, 1/β] describes∈ a scale modification− for some constant→ → β (0, 1]. Let us further assume that the functions m and σ are measurable. Thus Qx is a logistic ∈ −z −z 2 R distribution having the positive and bounded Lebesgue density hQx (y) = e /(1 + e ) , y . The -quantile function is ∗ ∗ τ , , and we can choose∈ τ t (x) := fτ,Qx = m(x) + σ(x) log( 1−τ ) x X ∗ ∗ ∈ b(x) = inf hQx (t): t [t (x) α, t (x) + α] . Note that hQx (m(x) + y) = hQx (m(x) y) for all y R,{ and h (y)∈is strictly− decreasing for}y [m(x), ). Some calculations show − ∈ Qx ∈ ∞ ∗ ∗ u1(x) u2(x) 1 b(x) = min hQx (t (x) α), hQx (t (x)+α) = min 2 , 2 cα,β , , − (1+) u1(x) (1+) u2(x) ∈ 4 1−τ −α/σ(x) 1−τ α/σ(x) n o where u1(x) := τ e , u2(x) := τ e and cα,β > 0 can be chosen independent of x, because σ(x) [β, 1/β]. Hence b−1 L (P ) and P has a τ-quantile of -average type α. ∈ ∈ ∞ X ∞ Example 2.4 Let P˜ be a distribution on X Y with marginal distribution P˜ and regular con- × X ditional probability Q˜ x := P(˜ x) on Y . Furthermore, assume that Q˜ x is P˜X -almost surely of τ-quantile type α. Let us now· | consider the family of distributions P with marginal distribu- tion P˜ and regular conditional distributions Q := P˜ ( m(x))/σ(x) x , x X, where X x · − ∈ m : X R and σ : X (β, 1/β) are as in the previous example. Then Qx has a τ-quantile ∗ → ∗ → fτ,Q = m(x) + σ(x)f ˜ of type αβ, because we obtain for s [0, αβ] the inequality x τ,Qx ∈ ∗ ∗ ˜ ∗ ∗ Qx (fτ,Q , fτ,Q + s) = Qx (f ˜ , f ˜ + s/σ(x)) b(x)s/σ(x) b(x)βs . x x τ,Qx τ,Qx ≥ ≥ Consequently, P has a τ-quantile of p-average type αβ if and only if P˜ does have a τ-quantile of p-average type α. The following theorem shows that for distributions having a quantile of p-average type the condi- tional quantile can be estimated by functions that approximately minimize the pinball risk. Theorem 2.5 Let p (0, ], τ (0, 1), α > 0 be real numbers, and q := p . Moreover, let ∈ ∞ ∈ p1+ P be a distribution on X Y that has a τ-quantile of p-average type α. Then for all f : X R × − p+2 2p → satisfying (f) ∗ 2 1p+ α 1p+ we have RLτ ,P − RLτ ,P ≤ ∗ √ −1 1/2 ∗ f f L (P ) 2 b Lτ ,P(f) . k − τ,Pk q X ≤ k kLp(PX ) R − RLτ ,P q Our next goal is to establish an oracle inequality for SVMs defined by (1). To this end let us assume Y = [ 1, 1]. Then we have Lτ (y, t¯) Lτ (y, t) for all y Y , t R, where t¯ denotes t clipped to the interval− [ 1, 1], i.e., t¯ := max ≤1, min 1, t .
Recommended publications
  • 5. the Student T Distribution
    Virtual Laboratories > 4. Special Distributions > 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 5. The Student t Distribution In this section we will study a distribution that has special importance in statistics. In particular, this distribution will arise in the study of a standardized version of the sample mean when the underlying distribution is normal. The Probability Density Function Suppose that Z has the standard normal distribution, V has the chi-squared distribution with n degrees of freedom, and that Z and V are independent. Let Z T= √V/n In the following exercise, you will show that T has probability density function given by −(n +1) /2 Γ((n + 1) / 2) t2 f(t)= 1 + , t∈ℝ ( n ) √n π Γ(n / 2) 1. Show that T has the given probability density function by using the following steps. n a. Show first that the conditional distribution of T given V=v is normal with mean 0 a nd variance v . b. Use (a) to find the joint probability density function of (T,V). c. Integrate the joint probability density function in (b) with respect to v to find the probability density function of T. The distribution of T is known as the Student t distribution with n degree of freedom. The distribution is well defined for any n > 0, but in practice, only positive integer values of n are of interest. This distribution was first studied by William Gosset, who published under the pseudonym Student. In addition to supplying the proof, Exercise 1 provides a good way of thinking of the t distribution: the t distribution arises when the variance of a mean 0 normal distribution is randomized in a certain way.
    [Show full text]
  • Lipschitz Continuity of Convex Functions
    Lipschitz Continuity of Convex Functions Bao Tran Nguyen∗ Pham Duy Khanh† November 13, 2019 Abstract We provide some necessary and sufficient conditions for a proper lower semi- continuous convex function, defined on a real Banach space, to be locally or globally Lipschitz continuous. Our criteria rely on the existence of a bounded selection of the subdifferential mapping and the intersections of the subdifferential mapping and the normal cone operator to the domain of the given function. Moreover, we also point out that the Lipschitz continuity of the given function on an open and bounded (not necessarily convex) set can be characterized via the existence of a bounded selection of the subdifferential mapping on the boundary of the given set and as a consequence it is equivalent to the local Lipschitz continuity at every point on the boundary of that set. Our results are applied to extend a Lipschitz and convex function to the whole space and to study the Lipschitz continuity of its Moreau envelope functions. Keywords Convex function, Lipschitz continuity, Calmness, Subdifferential, Normal cone, Moreau envelope function. Mathematics Subject Classification (2010) 26A16, 46N10, 52A41 1 Introduction arXiv:1911.04886v1 [math.FA] 12 Nov 2019 Lipschitz continuous and convex functions play a significant role in convex and nonsmooth analysis. It is well-known that if the domain of a proper lower semicontinuous convex function defined on a real Banach space has a nonempty interior then the function is continuous over the interior of its domain [3, Proposition 2.111] and as a consequence, it is subdifferentiable (its subdifferential is a nonempty set) and locally Lipschitz continuous at every point in the interior of its domain [3, Proposition 2.107].
    [Show full text]
  • The Beginnings 2 2. the Topology of Metric Spaces 5 3. Sequences and Completeness 9 4
    MATH 3963 NONLINEAR ODES WITH APPLICATIONS R MARANGELL Contents 1. Metric Spaces - The Beginnings 2 2. The Topology of Metric Spaces 5 3. Sequences and Completeness 9 4. The Contraction Mapping Theorem 13 5. Lipschitz Continuity 19 6. Existence and Uniqueness of Solutions to ODEs 21 7. Dependence on Initial Conditions and Parameters 26 8. Maximal Intervals of Existence 29 Date: September 6, 2017. 1 R Marangell Part II - Existence and Uniqueness of ODES 1. Metric Spaces - The Beginnings So.... I am from California, and so I fly through L.A. a lot on my way to my parents house. How far is that from Sydney? Well... Google tells me it's 12000 km (give or take), and in fact this agrees with my airline - they very nicely give me 12000 km on my frequent flyer points. But.... well.... are they really 12000 km apart? What if I drilled a hole through the surface of the Earth? Using a little trigonometry, and the fact that the radius of the earth is R = 6371 km, we have that the distance `as the mole digs' so-to-speak from Sydney to Los Angeles CA is given by (see Figure 1): 12000 x = 2R sin ≈ 10303 km (give or take) 2R Figure 1. A schematic of the earth showing an `equatorial' (or great) circle from Sydney to L.A. So I have two different answers, to the same question, and both are correct. Indeed, Sydney is both 12000 km and around 10303 km away from L.A. What's going on here is pretty obvious, but we're going to make it precise.
    [Show full text]
  • A Tail Quantile Approximation Formula for the Student T and the Symmetric Generalized Hyperbolic Distribution
    A Service of Leibniz-Informationszentrum econstor Wirtschaft Leibniz Information Centre Make Your Publications Visible. zbw for Economics Schlüter, Stephan; Fischer, Matthias J. Working Paper A tail quantile approximation formula for the student t and the symmetric generalized hyperbolic distribution IWQW Discussion Papers, No. 05/2009 Provided in Cooperation with: Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics Suggested Citation: Schlüter, Stephan; Fischer, Matthias J. (2009) : A tail quantile approximation formula for the student t and the symmetric generalized hyperbolic distribution, IWQW Discussion Papers, No. 05/2009, Friedrich-Alexander-Universität Erlangen-Nürnberg, Institut für Wirtschaftspolitik und Quantitative Wirtschaftsforschung (IWQW), Nürnberg This Version is available at: http://hdl.handle.net/10419/29554 Standard-Nutzungsbedingungen: Terms of use: Die Dokumente auf EconStor dürfen zu eigenen wissenschaftlichen Documents in EconStor may be saved and copied for your Zwecken und zum Privatgebrauch gespeichert und kopiert werden. personal and scholarly purposes. Sie dürfen die Dokumente nicht für öffentliche oder kommerzielle You are not to copy documents for public or commercial Zwecke vervielfältigen, öffentlich ausstellen, öffentlich zugänglich purposes, to exhibit the documents publicly, to make them machen, vertreiben oder anderweitig nutzen. publicly available on the internet, or to distribute or otherwise use the documents in public. Sofern die Verfasser die Dokumente unter Open-Content-Lizenzen (insbesondere CC-Lizenzen) zur Verfügung gestellt haben sollten, If the documents have been made available under an Open gelten abweichend von diesen Nutzungsbedingungen die in der dort Content Licence (especially Creative Commons Licences), you genannten Lizenz gewährten Nutzungsrechte. may exercise further usage rights as specified in the indicated licence. www.econstor.eu IWQW Institut für Wirtschaftspolitik und Quantitative Wirtschaftsforschung Diskussionspapier Discussion Papers No.
    [Show full text]
  • Sampling Student's T Distribution – Use of the Inverse Cumulative
    Sampling Student’s T distribution – use of the inverse cumulative distribution function William T. Shaw Department of Mathematics, King’s College, The Strand, London WC2R 2LS, UK With the current interest in copula methods, and fat-tailed or other non-normal distributions, it is appropriate to investigate technologies for managing marginal distributions of interest. We explore “Student’s” T distribution, survey its simulation, and present some new techniques for simulation. In particular, for a given real (not necessarily integer) value n of the number of degrees of freedom, −1 we give a pair of power series approximations for the inverse, Fn ,ofthe cumulative distribution function (CDF), Fn.Wealsogivesomesimpleandvery fast exact and iterative techniques for defining this function when n is an even −1 integer, based on the observation that for such cases the calculation of Fn amounts to the solution of a reduced-form polynomial equation of degree n − 1. We also explain the use of Cornish–Fisher expansions to define the inverse CDF as the composition of the inverse CDF for the normal case with a simple polynomial map. The methods presented are well adapted for use with copula and quasi-Monte-Carlo techniques. 1 Introduction There is much interest in many areas of financial modeling on the use of copulas to glue together marginal univariate distributions where there is no easy canonical multivariate distribution, or one wishes to have flexibility in the mechanism for combination. One of the more interesting marginal distributions is the “Student’s” T distribution. This statistical distribution was published by W. Gosset in 1908.
    [Show full text]
  • Convergence Rates for Deterministic and Stochastic Subgradient
    Convergence Rates for Deterministic and Stochastic Subgradient Methods Without Lipschitz Continuity Benjamin Grimmer∗ Abstract We extend the classic convergence rate theory for subgradient methods to apply to non-Lipschitz functions. For the deterministic projected subgradient method, we present a global O(1/√T ) convergence rate for any convex function which is locally Lipschitz around its minimizers. This approach is based on Shor’s classic subgradient analysis and implies generalizations of the standard convergence rates for gradient descent on functions with Lipschitz or H¨older continuous gradients. Further, we show a O(1/√T ) convergence rate for the stochastic projected subgradient method on convex functions with at most quadratic growth, which improves to O(1/T ) under either strong convexity or a weaker quadratic lower bound condition. 1 Introduction We consider the nonsmooth, convex optimization problem given by min f(x) x∈Q for some lower semicontinuous convex function f : Rd R and closed convex feasible → ∪{∞} region Q. We assume Q lies in the domain of f and that this problem has a nonempty set of minimizers X∗ (with minimum value denoted by f ∗). Further, we assume orthogonal projection onto Q is computationally tractable (which we denote by PQ( )). arXiv:1712.04104v3 [math.OC] 26 Feb 2018 Since f may be nondifferentiable, we weaken the notion of gradients to· subgradients. The set of all subgradients at some x Q (referred to as the subdifferential) is denoted by ∈ ∂f(x)= g Rd y Rd f(y) f(x)+ gT (y x) . { ∈ | ∀ ∈ ≥ − } We consider solving this problem via a (potentially stochastic) projected subgradient method.
    [Show full text]
  • The Lipschitz Constant of Self-Attention
    The Lipschitz Constant of Self-Attention Hyunjik Kim 1 George Papamakarios 1 Andriy Mnih 1 Abstract constraint for neural networks, to control how much a net- Lipschitz constants of neural networks have been work’s output can change relative to its input. Such Lips- explored in various contexts in deep learning, chitz constraints are useful in several contexts. For example, such as provable adversarial robustness, estimat- Lipschitz constraints can endow models with provable ro- ing Wasserstein distance, stabilising training of bustness against adversarial pertubations (Cisse et al., 2017; GANs, and formulating invertible neural net- Tsuzuku et al., 2018; Anil et al., 2019), and guaranteed gen- works. Such works have focused on bounding eralisation bounds (Sokolic´ et al., 2017). Moreover, the dual the Lipschitz constant of fully connected or con- form of the Wasserstein distance is defined as a supremum volutional networks, composed of linear maps over Lipschitz functions with a given Lipschitz constant, and pointwise non-linearities. In this paper, we hence Lipschitz-constrained networks are used for estimat- investigate the Lipschitz constant of self-attention, ing Wasserstein distances (Peyré & Cuturi, 2019). Further, a non-linear neural network module widely used Lipschitz-constrained networks can stabilise training for in sequence modelling. We prove that the stan- GANs, an example being spectral normalisation (Miyato dard dot-product self-attention is not Lipschitz et al., 2018). Finally, Lipschitz-constrained networks are for unbounded input domain, and propose an al- also used to construct invertible models and normalising ternative L2 self-attention that is Lipschitz. We flows. For example, Lipschitz-constrained networks can be derive an upper bound on the Lipschitz constant of used as a building block for invertible residual networks and L2 self-attention and provide empirical evidence hence flow-based generative models (Behrmann et al., 2019; for its asymptotic tightness.
    [Show full text]
  • Nonparametric Multivariate Kurtosis and Tailweight Measures
    Nonparametric Multivariate Kurtosis and Tailweight Measures Jin Wang1 Northern Arizona University and Robert Serfling2 University of Texas at Dallas November 2004 – final preprint version, to appear in Journal of Nonparametric Statistics, 2005 1Department of Mathematics and Statistics, Northern Arizona University, Flagstaff, Arizona 86011-5717, USA. Email: [email protected]. 2Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas 75083- 0688, USA. Email: [email protected]. Website: www.utdallas.edu/∼serfling. Support by NSF Grant DMS-0103698 is gratefully acknowledged. Abstract For nonparametric exploration or description of a distribution, the treatment of location, spread, symmetry and skewness is followed by characterization of kurtosis. Classical moment- based kurtosis measures the dispersion of a distribution about its “shoulders”. Here we con- sider quantile-based kurtosis measures. These are robust, are defined more widely, and dis- criminate better among shapes. A univariate quantile-based kurtosis measure of Groeneveld and Meeden (1984) is extended to the multivariate case by representing it as a transform of a dispersion functional. A family of such kurtosis measures defined for a given distribution and taken together comprises a real-valued “kurtosis functional”, which has intuitive appeal as a convenient two-dimensional curve for description of the kurtosis of the distribution. Several multivariate distributions in any dimension may thus be compared with respect to their kurtosis in a single two-dimensional plot. Important properties of the new multivariate kurtosis measures are established. For example, for elliptically symmetric distributions, this measure determines the distribution within affine equivalence. Related tailweight measures, influence curves, and asymptotic behavior of sample versions are also discussed.
    [Show full text]
  • Variance Reduction Techniques for Estimating Quantiles and Value-At-Risk" (2010)
    New Jersey Institute of Technology Digital Commons @ NJIT Dissertations Electronic Theses and Dissertations Spring 5-31-2010 Variance reduction techniques for estimating quantiles and value- at-risk Fang Chu New Jersey Institute of Technology Follow this and additional works at: https://digitalcommons.njit.edu/dissertations Part of the Databases and Information Systems Commons, and the Management Information Systems Commons Recommended Citation Chu, Fang, "Variance reduction techniques for estimating quantiles and value-at-risk" (2010). Dissertations. 212. https://digitalcommons.njit.edu/dissertations/212 This Dissertation is brought to you for free and open access by the Electronic Theses and Dissertations at Digital Commons @ NJIT. It has been accepted for inclusion in Dissertations by an authorized administrator of Digital Commons @ NJIT. For more information, please contact [email protected]. Cprht Wrnn & trtn h prht l f th Untd Stt (tl , Untd Stt Cd vrn th n f phtp r thr rprdtn f prhtd trl. Undr rtn ndtn pfd n th l, lbrr nd rhv r thrzd t frnh phtp r thr rprdtn. On f th pfd ndtn tht th phtp r rprdtn nt t b “d fr n prp thr thn prvt td, hlrhp, r rrh. If , r rt fr, r ltr , phtp r rprdtn fr prp n x f “fr tht r b lbl fr prht nfrnnt, h ntttn rrv th rht t rf t pt pn rdr f, n t jdnt, flfllnt f th rdr ld nvlv vltn f prht l. l t: h thr rtn th prht hl th r Inttt f hnl rrv th rht t dtrbt th th r drttn rntn nt: If d nt h t prnt th p, thn lt “ fr: frt p t: lt p n th prnt dl rn h n tn lbrr h rvd f th prnl nfrtn nd ll ntr fr th pprvl p nd brphl th f th nd drttn n rdr t prtt th dntt f I rdt nd flt.
    [Show full text]
  • Hölder-Continuity for the Nonlinear Stochastic Heat Equation with Rough Initial Conditions
    Stoch PDE: Anal Comp (2014) 2:316–352 DOI 10.1007/s40072-014-0034-6 Hölder-continuity for the nonlinear stochastic heat equation with rough initial conditions Le Chen · Robert C. Dalang Received: 22 October 2013 / Published online: 14 August 2014 © Springer Science+Business Media New York 2014 Abstract We study space-time regularity of the solution of the nonlinear stochastic heat equation in one spatial dimension driven by space-time white noise, with a rough initial condition. This initial condition is a locally finite measure μ with, possibly, exponentially growing tails. We show how this regularity depends, in a neighborhood of t = 0, on the regularity of the initial condition. On compact sets in which t > 0, 1 − 1 − the classical Hölder-continuity exponents 4 in time and 2 in space remain valid. However, on compact sets that include t = 0, the Hölder continuity of the solution is α ∧ 1 − α ∧ 1 − μ 2 4 in time and 2 in space, provided is absolutely continuous with an α-Hölder continuous density. Keywords Nonlinear stochastic heat equation · Rough initial data · Sample path Hölder continuity · Moments of increments Mathematics Subject Classification Primary 60H15 · Secondary 60G60 · 35R60 L. Chen and R. C. Dalang were supported in part by the Swiss National Foundation for Scientific Research. L. Chen (B) · R. C. Dalang Institut de mathématiques, École Polytechnique Fédérale de Lausanne, Station 8, CH-1015 Lausanne, Switzerland e-mail: [email protected] R. C. Dalang e-mail: robert.dalang@epfl.ch Present Address: L. Chen Department of Mathematics, University of Utah, 155 S 1400 E RM 233, Salt Lake City, UT 84112-0090, USA 123 Stoch PDE: Anal Comp (2014) 2:316–352 317 1 Introduction Over the last few years, there has been considerable interest in the stochastic heat equation with non-smooth initial data: ∂ ν ∂2 ˙ ∗ − u(t, x) = ρ(u(t, x)) W(t, x), x ∈ R, t ∈ R+, ∂t 2 ∂x2 (1.1) u(0, ·) = μ(·).
    [Show full text]
  • Modified Odd Weibull Family of Distributions: Properties and Applications Christophe Chesneau, Taoufik El Achi
    Modified Odd Weibull Family of Distributions: Properties and Applications Christophe Chesneau, Taoufik El Achi To cite this version: Christophe Chesneau, Taoufik El Achi. Modified Odd Weibull Family of Distributions: Properties and Applications. 2019. hal-02317235 HAL Id: hal-02317235 https://hal.archives-ouvertes.fr/hal-02317235 Submitted on 15 Oct 2019 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Modified Odd Weibull Family of Distributions: Properties and Applications Christophe CHESNEAU and Taoufik EL ACHI LMNO, University of Caen Normandie, 14000, Caen, France 17 August 2019 Abstract In this paper, a new family of continuous distributions, called the modified odd Weibull-G (MOW-G) family, is studied. The MOW-G family has the feature to use the Weibull distribu- tion as main generator and a new modification of the odd transformation, opening new horizon in terms of statistical modelling. Its main theoretical and practical aspects are explored. In particular, for the mathematical properties, we investigate some results in distribution, quantile function, skewness, kurtosis, moments, moment generating function, order statistics and en- tropy. For the statistical aspect, the maximum likelihood estimation method is used to estimate the model parameters.
    [Show full text]
  • 1. the Rademacher Theorem
    GEOMETRIC ANALYSIS PIOTR HAJLASZ 1. The Rademacher theorem 1.1. The Rademacher theorem. Lipschitz functions defined on one dimensional inter- vals are differentiable a.e. This is a classical result that is covered in most of courses in mea- sure theory. However, Rademacher proved a much deeper result that Lipschitz functions defined on open sets in Rn are also differentiable a.e. Let us first discuss differentiability of Lipschitz functions defined on one dimensional intervals. This result is a special case of differentiability of absolutely continuous functions. Definition 1.1. We say that a function f :[a; b] ! R is absolutely continuous if for very " > 0 there is δ > 0 such that if (x1; x1 +h1);:::; (xk; xk +hk) are pairwise disjoint intervals Pk in [a; b] of total length less than δ, i=1 hi < δ, then k X jf(xi + hi) − f(xi)j < ": i=1 The definition of an absolutely continuous function reminds the definition of a uniformly continuous function. Indeed, we would obtain the definition of a uniformly continuous function if we would restrict just to a single interval (x; x+h), i.e. if we would assume that k = 1. Despite similarity, the class of absolutely continuous function is much smaller than the class of uniformly continuous function. For example Lipschitz function f :[a; b] ! R are absolutely continuous, but in general H¨oldercontinuous functions are uniformly continuous, but not absolutely continuous. Proposition 1.2. If f; g :[a; b] ! R are absolutely continuous, then also the functions f ± g and fg are absolutely continuous.
    [Show full text]