3. The Multivariate Normal Distribution


3.1 Introduction

• A generalization of the familiar bell-shaped normal density to several dimensions plays a fundamental role in multivariate analysis.

• While real data are never exactly multivariate normal, the normal density is often a useful approximation to the "true" population distribution because of a central limit effect.

• One advantage of the multivariate normal distribution is that it is mathematically tractable, and "nice" results can be obtained.

To summarize, many real-world problems fall naturally within the framework of normal theory. The importance of the normal distribution rests on its dual role as both a population model for certain natural phenomena and an approximate sampling distribution for many statistics.

3.2 The Multivariate Normal Density and Its Properties

• Recall that the univariate normal distribution, with mean μ and variance σ², has the probability density function

  f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-[(x-\mu)/\sigma]^2/2}, \qquad -\infty < x < \infty.

• The term in the exponent,

  \left( \frac{x-\mu}{\sigma} \right)^2 = (x-\mu)(\sigma^2)^{-1}(x-\mu),

  can be generalized for a p × 1 vector x of observations on several variables as

  (x - \mu)' \Sigma^{-1} (x - \mu),

  where the p × 1 vector μ represents the expected value of the random vector X, and the p × p matrix Σ is the variance-covariance matrix of X.

• A p-dimensional normal density for the random vector X = [X₁, X₂, ..., X_p]' has the form

  f(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \, e^{-(x-\mu)' \Sigma^{-1} (x-\mu)/2},

  where -∞ < x_i < ∞ for i = 1, 2, ..., p. We denote this p-dimensional normal density by N_p(μ, Σ).

Example 3.1 (Bivariate normal density) Let us evaluate the p = 2 variate normal density in terms of the individual parameters μ₁ = E(X₁), μ₂ = E(X₂), σ₁₁ = Var(X₁), σ₂₂ = Var(X₂), and ρ₁₂ = σ₁₂/(√σ₁₁ √σ₂₂) = Corr(X₁, X₂).

Result 3.1 If Σ is positive definite, so that Σ⁻¹ exists, then

  \Sigma e = \lambda e \quad \text{implies} \quad \Sigma^{-1} e = \frac{1}{\lambda} e,

so (λ, e) is an eigenvalue–eigenvector pair for Σ corresponding to the pair (1/λ, e) for Σ⁻¹. Also, Σ⁻¹ is positive definite.

Constant probability density contour = {all x such that (x − μ)'Σ⁻¹(x − μ) = c²} = surface of an ellipsoid centered at μ.

Contours of constant density for the p-dimensional normal distribution are ellipsoids defined by the x such that (x − μ)'Σ⁻¹(x − μ) = c². These ellipsoids are centered at μ and have axes ±c√λ_i e_i, where Σe_i = λ_i e_i for i = 1, 2, ..., p.

Example 3.2 (Contours of the bivariate normal density) Obtain the axes of constant probability density contours for a bivariate normal distribution when σ₁₁ = σ₂₂.

The solid ellipsoid of x values satisfying

  (x - \mu)' \Sigma^{-1} (x - \mu) \le \chi^2_p(\alpha)

has probability 1 − α, where χ²_p(α) is the upper (100α)th percentile of a chi-square distribution with p degrees of freedom.
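To make these formulas concrete, here is a short numerical sketch (the bivariate μ and Σ below are illustrative values, not taken from the notes). It evaluates the N_p(μ, Σ) density directly from the formula above, computes the contour half-lengths c√λ_i along the eigenvectors of Σ, and checks the 1 − α coverage of the chi-square ellipsoid by simulation.

```python
import numpy as np
from scipy import stats

# Illustrative bivariate parameters (not taken from the notes)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

def mvn_density(x, mu, Sigma):
    """Evaluate the N_p(mu, Sigma) density at x, straight from the formula above."""
    p = len(mu)
    dev = x - mu
    quad = dev @ np.linalg.inv(Sigma) @ dev            # (x - mu)' Sigma^{-1} (x - mu)
    norm_const = (2 * np.pi) ** (p / 2) * np.linalg.det(Sigma) ** 0.5
    return np.exp(-quad / 2) / norm_const

x = np.array([1.5, 1.5])
print(mvn_density(x, mu, Sigma))
print(stats.multivariate_normal(mu, Sigma).pdf(x))     # agrees with the direct formula

# Axes of the constant-density contour (x - mu)' Sigma^{-1} (x - mu) = c^2:
# half-lengths c * sqrt(lambda_i) along the eigenvectors e_i of Sigma.
p = len(mu)
alpha = 0.05
c2 = stats.chi2.ppf(1 - alpha, df=p)                   # chi^2_p(alpha), the upper (100*alpha)th percentile
lam, vecs = np.linalg.eigh(Sigma)
for lam_i, e_i in zip(lam, vecs.T):
    print("half-length", np.sqrt(c2 * lam_i), "along axis", e_i)

# Simulation check: the chi-square ellipsoid should contain about 1 - alpha of the mass.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu, Sigma, size=100_000)
d2 = np.einsum("ij,jk,ik->i", X - mu, np.linalg.inv(Sigma), X - mu)
print("coverage ~", np.mean(d2 <= c2))                 # close to 0.95
```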
Additional Properties of the Multivariate Normal Distribution

The following are true for a random vector X having a multivariate normal distribution:

1. Linear combinations of the components of X are normally distributed.
2. All subsets of the components of X have a (multivariate) normal distribution.
3. Zero covariance implies that the corresponding components are independently distributed.
4. The conditional distributions of the components are (multivariate) normal.

Result 3.2 If X is distributed as N_p(μ, Σ), then any linear combination of its variables, a'X = a₁X₁ + a₂X₂ + ··· + a_pX_p, is distributed as N(a'μ, a'Σa). Also, if a'X is distributed as N(a'μ, a'Σa) for every a, then X must be N_p(μ, Σ).

Example 3.3 (The distribution of a linear combination of the components of a normal random vector) Consider the linear combination a'X of a multivariate normal random vector determined by the choice a' = [1, 0, ..., 0].

Result 3.3 If X is distributed as N_p(μ, Σ), the q linear combinations

  A_{(q \times p)} X_{(p \times 1)} =
  \begin{bmatrix}
    a_{11} X_1 + \cdots + a_{1p} X_p \\
    a_{21} X_1 + \cdots + a_{2p} X_p \\
    \vdots \\
    a_{q1} X_1 + \cdots + a_{qp} X_p
  \end{bmatrix}

are distributed as N_q(Aμ, AΣA'). Also, X_{p×1} + d_{p×1}, where d is a vector of constants, is distributed as N_p(μ + d, Σ).

Example 3.4 (The distribution of two linear combinations of the components of a normal random vector) For X distributed as N₃(μ, Σ), find the distribution of

  \begin{bmatrix} X_1 - X_2 \\ X_2 - X_3 \end{bmatrix}
  =
  \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix}
  \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}
  = A X.

Result 3.4 All subsets of X are normally distributed. If we respectively partition X, its mean vector μ, and its covariance matrix Σ as

  X_{(p \times 1)} =
  \begin{bmatrix} X_1 \; (q \times 1) \\ X_2 \; ((p-q) \times 1) \end{bmatrix},
  \qquad
  \mu_{(p \times 1)} =
  \begin{bmatrix} \mu_1 \; (q \times 1) \\ \mu_2 \; ((p-q) \times 1) \end{bmatrix},

and

  \Sigma_{(p \times p)} =
  \begin{bmatrix}
    \Sigma_{11} \; (q \times q) & \Sigma_{12} \; (q \times (p-q)) \\
    \Sigma_{21} \; ((p-q) \times q) & \Sigma_{22} \; ((p-q) \times (p-q))
  \end{bmatrix},

then X₁ is distributed as N_q(μ₁, Σ₁₁).

Example 3.5 (The distribution of a subset of a normal random vector) If X is distributed as N₅(μ, Σ), find the distribution of [X₂, X₄]'.

Result 3.5
(a) If X₁ and X₂ are independent, then Cov(X₁, X₂) = 0, a q₁ × q₂ matrix of zeros, where X₁ is a q₁ × 1 random vector and X₂ is a q₂ × 1 random vector.
(b) If the stacked vector [X₁; X₂] is N_{q₁+q₂}([μ₁; μ₂], [Σ₁₁ Σ₁₂; Σ₂₁ Σ₂₂]), then X₁ and X₂ are independent if and only if Σ₁₂ = Σ₂₁ = 0.
(c) If X₁ and X₂ are independent and are distributed as N_{q₁}(μ₁, Σ₁₁) and N_{q₂}(μ₂, Σ₂₂), respectively, then [X₁; X₂] has the multivariate normal distribution N_{q₁+q₂}([μ₁; μ₂], [Σ₁₁ 0; 0 Σ₂₂]).

Example 3.6 (The equivalence of zero covariance and independence for normal variables) Let X_{3×1} be N₃(μ, Σ) with

  \Sigma = \begin{bmatrix} 4 & 1 & 0 \\ 1 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}.

Are X₁ and X₂ independent? What about (X₁, X₂) and X₃?

Result 3.6 Let X = [X₁; X₂] be distributed as N_p(μ, Σ) with μ = [μ₁; μ₂], Σ = [Σ₁₁ Σ₁₂; Σ₂₁ Σ₂₂], and |Σ₂₂| > 0. Then the conditional distribution of X₁, given that X₂ = x₂, is normal and has

  Mean = μ₁ + Σ₁₂ Σ₂₂⁻¹ (x₂ − μ₂)

and

  Covariance = Σ₁₁ − Σ₁₂ Σ₂₂⁻¹ Σ₂₁.

Note that the covariance does not depend on the value x₂ of the conditioning variable.

Example 3.7 (The conditional density of a bivariate normal distribution) Obtain the conditional density of X₁, given that X₂ = x₂, for any bivariate distribution.

Result 3.7 Let X be distributed as N_p(μ, Σ) with |Σ| > 0. Then
(a) (X − μ)'Σ⁻¹(X − μ) is distributed as χ²_p, where χ²_p denotes the chi-square distribution with p degrees of freedom.
(b) The N_p(μ, Σ) distribution assigns probability 1 − α to the solid ellipsoid {x : (x − μ)'Σ⁻¹(x − μ) ≤ χ²_p(α)}, where χ²_p(α) denotes the upper (100α)th percentile of the χ²_p distribution.

Result 3.8 Let X₁, X₂, ..., X_n be mutually independent with X_j distributed as N_p(μ_j, Σ). (Note that each X_j has the same covariance matrix Σ.) Then

  V_1 = c_1 X_1 + c_2 X_2 + \cdots + c_n X_n

is distributed as N_p\!\left( \sum_{j=1}^{n} c_j \mu_j, \; \Big(\sum_{j=1}^{n} c_j^2\Big) \Sigma \right). Moreover, V₁ and V₂ = b₁X₁ + b₂X₂ + ··· + b_nX_n are jointly multivariate normal with covariance matrix

  \begin{bmatrix}
    \Big(\sum_{j=1}^{n} c_j^2\Big) \Sigma & (b'c)\,\Sigma \\
    (b'c)\,\Sigma & \Big(\sum_{j=1}^{n} b_j^2\Big) \Sigma
  \end{bmatrix}.

Consequently, V₁ and V₂ are independent if b'c = \sum_{j=1}^{n} c_j b_j = 0.
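As a small illustration of Results 3.3 and 3.6, the sketch below reuses the Σ of Example 3.6 together with a hypothetical mean vector (chosen only for illustration). It computes the exact mean and covariance of AX for the A of Example 3.4, and the conditional mean and covariance of X₁ given (X₂, X₃) = x₂.

```python
import numpy as np

# Sigma from Example 3.6; the mean vector is hypothetical (for illustration only)
mu = np.array([1.0, -1.0, 2.0])
Sigma = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 0.0],
                  [0.0, 0.0, 2.0]])

# Result 3.3: AX is N_q(A mu, A Sigma A'), with A as in Example 3.4
A = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
print("mean of AX :", A @ mu)
print("cov of AX  :\n", A @ Sigma @ A.T)

# Result 3.6: conditional distribution of the first q components given the rest
q = 1
S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
S21, S22 = Sigma[q:, :q], Sigma[q:, q:]
x2 = np.array([0.0, 3.0])                              # an arbitrary conditioning value
cond_mean = mu[:q] + S12 @ np.linalg.solve(S22, x2 - mu[q:])
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)       # free of x2, as noted above
print("conditional mean       :", cond_mean)
print("conditional covariance :", cond_cov)
```

Because σ₁₃ = 0 in this Σ, the value of X₃ drops out of the conditional mean of X₁, consistent with Example 3.6's point that zero covariance corresponds to independence for jointly normal components.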
Example 3.8 (Linear combinations of random vectors) Let X₁, X₂, X₃, and X₄ be independent and identically distributed 3 × 1 random vectors with

  \mu = \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix}
  \qquad \text{and} \qquad
  \Sigma = \begin{bmatrix} 3 & -1 & 1 \\ -1 & 1 & 0 \\ 1 & 0 & 2 \end{bmatrix}.

(a) Find the mean and variance of the linear combination a'X₁ of the three components of X₁, where a' = [a₁ a₂ a₃].

(b) Consider the two linear combinations of random vectors

  \tfrac{1}{2} X_1 + \tfrac{1}{2} X_2 + \tfrac{1}{2} X_3 + \tfrac{1}{2} X_4

and

  X_1 + X_2 + X_3 - 3 X_4.

Find the mean vector and covariance matrix for each linear combination of vectors, and also the covariance between them.

3.3 Sampling from a Multivariate Normal Distribution and Maximum Likelihood Estimation

The Multivariate Normal Likelihood

• The joint density function of all n observed p × 1 random vectors X₁, X₂, ..., X_n is

  \prod_{j=1}^{n} \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \, e^{-(x_j - \mu)' \Sigma^{-1} (x_j - \mu)/2}
  = \frac{1}{(2\pi)^{np/2} |\Sigma|^{n/2}} \, e^{-\sum_{j=1}^{n} (x_j - \mu)' \Sigma^{-1} (x_j - \mu)/2}
  = \frac{1}{(2\pi)^{np/2} |\Sigma|^{n/2}}
    \exp\!\left\{ -\,\mathrm{tr}\!\left[ \Sigma^{-1} \left( \sum_{j=1}^{n} (x_j - \bar{x})(x_j - \bar{x})' + n(\bar{x} - \mu)(\bar{x} - \mu)' \right) \right] \Big/ 2 \right\}.

• Likelihood: When the numerical values of the observations become available, they may be substituted for the x_j in the equation above. The resulting expression, now considered as a function of μ and Σ for the fixed set of observations x₁, x₂, ..., x_n, is called the likelihood.

• Maximum likelihood estimation: One meaning of "best" is to select the parameter values that maximize the joint density evaluated at the observations.
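The following sketch (with simulated, illustrative data) evaluates the log-likelihood via the trace identity above and cross-checks it against a direct sum of log densities. The closed-form maximizers quoted at the end, the sample mean and the sample covariance scaled by 1/n, are the standard results; their derivation is not part of this excerpt.

```python
import numpy as np
from scipy import stats

def log_likelihood(mu, Sigma, X):
    """Multivariate normal log-likelihood, written via the trace identity above.
    X is an (n, p) array holding the observations x_1, ..., x_n as rows."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    S_sum = (X - xbar).T @ (X - xbar)                  # sum_j (x_j - xbar)(x_j - xbar)'
    A = S_sum + n * np.outer(xbar - mu, xbar - mu)
    _, logdet = np.linalg.slogdet(Sigma)
    return (-0.5 * n * p * np.log(2 * np.pi)
            - 0.5 * n * logdet
            - 0.5 * np.trace(np.linalg.solve(Sigma, A)))

# Simulated data under illustrative parameters (not from the notes)
rng = np.random.default_rng(1)
mu_true = np.array([0.0, 2.0])
Sigma_true = np.array([[2.0, 0.5],
                       [0.5, 1.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=500)

# Cross-check: the trace form equals the sum of the individual log densities
print(log_likelihood(mu_true, Sigma_true, X))
print(stats.multivariate_normal(mu_true, Sigma_true).logpdf(X).sum())

# Standard maximizers (quoted here, not derived in this excerpt): the sample
# mean and the sample covariance scaled by 1/n.
mu_hat = X.mean(axis=0)
Sigma_hat = (X - mu_hat).T @ (X - mu_hat) / len(X)
print(log_likelihood(mu_hat, Sigma_hat, X) >= log_likelihood(mu_true, Sigma_true, X))  # True
```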