Data Density and Structure

Part II
A canonical problem in statistics is to gain understanding of a given random sample, $y_1, y_2, \ldots, y_n$, so as to understand the process that yielded the data. The specific objective is to make inferences about the population from which the random sample arose. In many cases we wish to make inferences only about some finite set of parameters, such as the mean and variance, that describe the population. In other cases we want to predict a future value of an observation. Sometimes the objective is more difficult; we want to estimate a function that characterizes the distribution of the population. The cumulative distribution function (CDF) or the probability density function (PDF) provides a complete description of the population, and so we may wish to estimate these functions.
In the simpler cases of statistical inference, we assume that the form of the CDF $P$ is known, and that there is a parameter, $\theta = \Theta(P)$, of finite dimension that characterizes the distribution within that assumed family of forms. An objective in such cases may be to determine an estimate $\hat{\theta}$ of the parameter $\theta$. The parameter may completely characterize the probability distribution of the population, or it may just determine an important property of the distribution, such as its mean or median. If the distribution or density function is assumed known up to a vector of parameters, the complete description is provided by the parameter estimate. For example, if the distribution is assumed to be normal, the form of $P$ is known. It involves two parameters, the mean $\mu$ and the variance $\sigma^2$. The problem of completely describing the distribution is merely the problem of estimating $\theta = (\mu, \sigma^2)$. In this case, the estimates of the CDF, $\hat{P}$, and the density, $\hat{p}$, are the normal CDF and density with the estimate of the parameter, $\hat{\theta}$, plugged in.
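For the normal case just described, the plug-in estimate can be sketched with Python's standard `statistics` module (Python 3.8 or later). The function name `normal_plugin_estimate` is hypothetical, and plugging in the maximum likelihood variance (divisor $n$) is an assumption of this sketch; the text does not say which variance estimator to use.

```python
from statistics import NormalDist, fmean, pvariance

def normal_plugin_estimate(sample):
    """Estimate theta = (mu, sigma^2) and return it with the plug-in normal distribution."""
    mu_hat = fmean(sample)
    # pvariance divides by n (the maximum likelihood estimate);
    # statistics.variance would divide by n - 1 instead
    sigma2_hat = pvariance(sample, mu=mu_hat)
    if sigma2_hat == 0.0:
        raise ValueError("need at least two distinct observations")
    # The fitted object's .cdf and .pdf methods are the plug-in estimates of P and p
    fitted = NormalDist(mu=mu_hat, sigma=sigma2_hat ** 0.5)
    return (mu_hat, sigma2_hat), fitted

theta_hat, fitted = normal_plugin_estimate([1.0, 2.0, 3.0, 4.0, 5.0])
print(theta_hat)                  # (3.0, 2.0)
print(fitted.cdf(3.0))            # 0.5: the estimated median equals the sample mean
print(fitted.pdf(3.0))            # 1/sqrt(4*pi), roughly 0.2821
```

Once $\hat{\theta}$ is computed, every other property of the fitted distribution (quantiles, tail probabilities, the density at any point) follows from the assumed form, which is why the parametric estimate is "complete".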
If no assumptions, or only weak assumptions, are made about the form of the distribution or density function, the estimation problem is much more difficult. Because the distribution function or density function is a characterization from which all other properties of the distribution could be determined, we expect the estimation of the function to be the most difficult type of statistical inference. “Most difficult” is clearly a heuristic concept and here may mean that the estimator is most biased, most variable, most difficult to compute, most mathematically intractable, and so on.
Estimators such as $\hat{\theta}$ for the parameter $\theta$ or $\hat{p}$ for the density $p$ are usually random variables; hence, we are interested in the statistical properties of these estimators. If our approach to the problem treats $\theta$ and $p$ as fixed (but unknown), then the distribution of $\hat{\theta}$ and $\hat{p}$ can be used to make informative statements about $\theta$ and $p$. Alternatively, if $\theta$ and $p$ are viewed as realizations of random variables, then the distribution of $\hat{\theta}$ and $\hat{p}$ can be used to make informative statements about conditional distributions of the parameter and the function, given the observed data.
While the CDF in some ways is more fundamental in characterizing a probability distribution (it always exists and is defined the same for both continuous and discrete distributions), the probability density function is more familiar to most data analysts. Important properties such as skewness, modes, and so on can be seen more readily from a plot of the probability density function than from a plot of the CDF. We are therefore usually more interested in estimating the density, $p$, than the CDF, $P$. Some methods of estimating the density, however, are based on estimates of the CDF. The simplest estimate of the CDF is the empirical cumulative distribution function, the ECDF, which is defined as
$$P_n(y) = \frac{1}{n} \sum_{i=1}^{n} I_{(-\infty, y]}(y_i). \tag{7.6}$$
(See page 363 for the definition and properties of the indicator function $I_S(\cdot)$ in the ECDF.)
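A minimal pure-Python sketch of equation (7.6) follows. The helper names `make_ecdf` and `mean_ecdf_at` are hypothetical, and the Monte Carlo check assumes a standard uniform population, whose CDF is $P(y) = y$. Because each indicator $I_{(-\infty, y]}(y_i)$ is a Bernoulli variable with mean $P(y)$, we have $E[P_n(y)] = P(y)$ for every $n$, which is why the ECDF is pointwise unbiased; the simulated average should therefore land near $P(0.3) = 0.3$.

```python
import random
from bisect import bisect_right

def make_ecdf(sample):
    """Return the ECDF P_n(y) = (1/n) * sum_i I_(-inf, y](y_i) of equation (7.6)."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("the ECDF needs at least one observation")

    def ecdf(y):
        # bisect_right returns how many sorted observations satisfy y_i <= y,
        # i.e. how many indicators I_(-inf, y](y_i) equal 1
        return bisect_right(ordered, y) / n

    return ecdf

def mean_ecdf_at(y0, n, reps, seed=0):
    """Monte Carlo average of P_n(y0) over `reps` samples of size n from U(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += make_ecdf([rng.random() for _ in range(n)])(y0)
    return total / reps

F = make_ecdf([3.0, 1.0, 2.0, 2.0])
print(F(0.5), F(2.0), F(3.0))     # prints 0.0 0.75 1.0
# For U(0, 1) the true CDF is P(y) = y, so this average should be close to 0.3
print(mean_ecdf_at(0.3, n=10, reps=20000))
```

Sorting once and using binary search makes each evaluation $O(\log n)$ rather than a scan over all $n$ indicators; the result is the same step function either way.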
As we have seen on page 11, the ECDF is pointwise unbiased for the CDF. The derivative of the ECDF, the empirical probability density function (EPDF),
$$p_n(y) = \frac{1}{n} \sum_{i=1}^{n} \delta(y - y_i), \tag{7.7}$$
where $\delta$ is the Dirac delta function, is just a series of spikes at points corresponding to the observed values. It is not very useful as an estimator of the probability density. It is, however, unbiased for the probability density function at any point.
In the absence of assumptions about the form of the density $p$, the estimation problem may be computationally intensive. A very large sample is usually required in order to get a reliable estimate of the density. How good the estimate is depends on the dimension of the random variable. Heuristically, the higher the dimension, the larger the sample required to provide adequate representation of the sample space.
Density estimation generally has more modest goals than the development of a mathematical expression that yields the probability density function $p$ everywhere. Although we may develop such an expression, the purpose of the estimate is usually a more general understanding of the population:
• to identify structure in the population, its modality, tail behavior, skewness, and so on;
• to classify the data and to identify different subpopulations giving rise to it; or
• to make a visual presentation that represents the population density.
There are several ways to approach the probability density estimation problem. In a parametric approach, mentioned above, a parametric family of distributions, such as a normal distribution or a beta distribution, is assumed. The density is estimated by estimating the parameters of the distribution and substituting the estimates into the expression for the density. In a nonparametric approach, only very general assumptions are made about the distribution.
These assumptions may only address the shape of the distribution, such as an assumption of unimodality, or an assumption of continuity or other degrees of smoothness of the density function. There are various semi-parametric approaches in which, for example, parametric assumptions may be made only over a subset of the range of the distribution, or, in a multivariate case, a parametric approach may be taken for some elements of the random vector and a nonparametric approach for others. Another approach is to assume a more general family of distributions, perhaps one characterized by a differential equation, and to fit the equation by equating properties of the sample, such as sample moments, with the corresponding properties of the equation.
In the case of parametric estimation, we have a complete estimate of the density; that is, an estimate at all points. In nonparametric estimation, we generally develop estimates of the ordinate of the density function at specific points. After the estimates are available at given points, a smooth function can be fitted.
In the next few chapters we will be concerned primarily with nonparametric estimation of probability densities and identification of structure in the data. In Chapter 11 we will consider building models that express asymmetric relationships between variables, and making inference about those models.
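The moment-matching idea described above can be sketched for a concrete family. As an assumption of this sketch, the two-parameter beta family (a member of the Pearson system, whose densities satisfy a first-order differential equation) stands in for the "more general family", and only the first two moments are matched; the function name `beta_moment_fit` is hypothetical.

```python
from statistics import fmean, pvariance

def beta_moment_fit(sample):
    """Fit beta(alpha, beta) by equating its mean and variance to the sample's."""
    m = fmean(sample)
    v = pvariance(sample, mu=m)
    # A beta distribution has 0 < mean < 1 and 0 < variance < mean * (1 - mean)
    if not (0.0 < m < 1.0) or not (0.0 < v < m * (1.0 - m)):
        raise ValueError("sample moments are not attainable by any beta distribution")
    # For beta(alpha, beta): mean = alpha / (alpha + beta) and
    # variance = mean * (1 - mean) / (alpha + beta + 1), so solve for alpha + beta
    total = m * (1.0 - m) / v - 1.0
    return m * total, (1.0 - m) * total

print(beta_moment_fit([0.25, 0.75]))   # (1.5, 1.5)
```

Richer families in the Pearson system use the third and fourth sample moments as well to select the type and its parameters; the two-moment beta case is simply the smallest instance of the same idea.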