Lecture 1. Random Vectors and Multivariate Normal Distribution


1.1 Moments of a random vector

A random vector $X$ of size $p$ is a column vector consisting of $p$ random variables $X_1, \ldots, X_p$, written $X = (X_1, \ldots, X_p)'$. The mean or expectation of $X$ is defined by the vector of expectations,
\[
\mu \equiv E(X) = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_p) \end{pmatrix},
\]
which exists if $E|X_i| < \infty$ for all $i = 1, \ldots, p$.

Lemma 1. Let $X$ be a random vector of size $p$ and $Y$ be a random vector of size $q$. For any non-random matrices $A_{(m \times p)}$, $B_{(m \times q)}$, $C_{(1 \times n)}$, and $D_{(m \times n)}$,
\[
E(AX + BY) = A\,E(X) + B\,E(Y), \qquad E(AXC + D) = A\,E(X)\,C + D.
\]

For a random vector $X$ of size $p$ satisfying $E(X_i^2) < \infty$ for all $i = 1, \ldots, p$, the variance–covariance matrix (or just covariance matrix) of $X$ is
\[
\Sigma \equiv \mathrm{Cov}(X) = E[(X - EX)(X - EX)'].
\]
The covariance matrix of $X$ is a $p \times p$ square, symmetric matrix; in particular, $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i) = \Sigma_{ji}$. Some properties:

1. $\mathrm{Cov}(X) = E(XX') - E(X)E(X)'$.
2. If $c_{(p \times 1)}$ is a constant vector, $\mathrm{Cov}(X + c) = \mathrm{Cov}(X)$.
3. If $A_{(m \times p)}$ is a constant matrix, $\mathrm{Cov}(AX) = A\,\mathrm{Cov}(X)\,A'$.

Lemma 2. The $p \times p$ matrix $\Sigma$ is a covariance matrix if and only if it is non-negative definite.

1.2 Multivariate normal distribution – nonsingular case

Recall that the univariate normal distribution with mean $\mu$ and variance $\sigma^2$ has density
\[
f(x) = (2\pi\sigma^2)^{-1/2} \exp\left[-\tfrac{1}{2}(x - \mu)\sigma^{-2}(x - \mu)\right].
\]
Similarly, the multivariate normal distribution for the special case of a nonsingular covariance matrix $\Sigma$ is defined as follows.

Definition 1. Let $\mu \in \mathbb{R}^p$ and $\Sigma_{(p \times p)} > 0$. A random vector $X \in \mathbb{R}^p$ has the $p$-variate normal distribution with mean $\mu$ and covariance matrix $\Sigma$ if it has probability density function
\[
f(x) = |2\pi\Sigma|^{-1/2} \exp\left[-\tfrac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\right], \tag{1}
\]
for $x \in \mathbb{R}^p$. We use the notation $X \sim N_p(\mu, \Sigma)$.

Theorem 3. If $X \sim N_p(\mu, \Sigma)$ for $\Sigma > 0$, then

1. $Y = \Sigma^{-1/2}(X - \mu) \sim N_p(0, I_p)$,
2. $X \stackrel{d}{=} \Sigma^{1/2} Y + \mu$ where $Y \sim N_p(0, I_p)$,
3. $E(X) = \mu$ and $\mathrm{Cov}(X) = \Sigma$,
4. for any fixed $v \in \mathbb{R}^p$, $v'X$ is univariate normal,
5. $U = (X - \mu)'\Sigma^{-1}(X - \mu) \sim \chi^2(p)$.

Example 1 (Bivariate normal).

1.2.1 Geometry of the multivariate normal

The multivariate normal distribution has location parameter $\mu$ and shape parameter $\Sigma > 0$. In particular, let's look into the contour of equal density
\[
E_c = \{x \in \mathbb{R}^p : f(x) = c_0\} = \{x \in \mathbb{R}^p : (x - \mu)'\Sigma^{-1}(x - \mu) = c^2\}.
\]
Moreover, consider the spectral decomposition $\Sigma = U\Lambda U'$, where $U = [u_1, \ldots, u_p]$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$. Then $E_c$, for any $c > 0$, is an ellipsoid centered at $\mu$ with principal axes $u_i$ of length proportional to $\sqrt{\lambda_i}$. If $\Sigma = I_p$, the ellipsoid is the surface of a sphere of radius $c$ centered at $\mu$.

As an example, consider the bivariate normal distribution $N_2(0, \Sigma)$ with
\[
\Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}
= \begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix}
\begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix}'.
\]
The location of the distribution is the origin ($\mu = 0$), and the shape ($\Sigma$) of the distribution is determined by the ellipse given by the two principal axes (one along the 45-degree line, the other along the $-45$-degree line). Figure 1 shows the density function and the corresponding $E_c$ for $c = 0.5, 1, 1.5, 2, \ldots$.

Figure 1: Bivariate normal density and its contours.

Notice that an ellipse in the plane can represent a bivariate normal distribution. In higher dimensions $d > 2$, ellipsoids play the same role.
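As a quick numerical illustration of Theorem 3 and the contour geometry above (a minimal NumPy sketch; the seed and sample size are arbitrary choices, and the code is an added illustration rather than part of the original derivation):

```python
# Minimal sketch: Theorem 3 and the contour geometry, with the example
# covariance Sigma = [[2, 1], [1, 2]] from Section 1.2.1. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
p, n = 2, 100_000
mu = np.zeros(p)
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

# Spectral decomposition Sigma = U Lambda U'; the columns of U are the
# principal axes u_i, with axis lengths proportional to sqrt(lambda_i).
lam, U = np.linalg.eigh(Sigma)
print(lam[::-1])       # eigenvalues 3, 1 (descending)
print(U[:, ::-1])      # axes along the 45-degree and -45-degree lines

# Theorem 3.2: X = Sigma^{1/2} Y + mu with Y ~ N_p(0, I_p), where
# Sigma^{1/2} = U Lambda^{1/2} U' is the symmetric square root.
Sigma_half = U @ np.diag(np.sqrt(lam)) @ U.T
X = rng.standard_normal((n, p)) @ Sigma_half + mu   # Sigma_half is symmetric

print(X.mean(axis=0))            # approx mu     (Theorem 3.3)
print(np.cov(X, rowvar=False))   # approx Sigma  (Theorem 3.3)

# Theorem 3.5: (X - mu)' Sigma^{-1} (X - mu) ~ chi^2(p): mean p, variance 2p.
D = X - mu
Ustat = np.einsum('ij,ij->i', D, np.linalg.solve(Sigma, D.T).T)
print(Ustat.mean(), Ustat.var())  # approx 2 and 4
```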
1.3 General multivariate normal distribution

The characteristic function of a random vector $X$ is defined as
\[
\varphi_X(t) = E(e^{it'X}), \qquad t \in \mathbb{R}^p.
\]
Note that the characteristic function is $\mathbb{C}$-valued and always exists. We collect some important facts:

1. $\varphi_X(t) = \varphi_Y(t)$ if and only if $X \stackrel{d}{=} Y$.
2. If $X$ and $Y$ are independent, then $\varphi_{X+Y}(t) = \varphi_X(t)\varphi_Y(t)$.
3. $X_n \Rightarrow X$ if and only if $\varphi_{X_n}(t) \to \varphi_X(t)$ for all $t$.

An important corollary follows from the uniqueness of the characteristic function.

Corollary 4 (Cramér–Wold device). If $X$ is a $p \times 1$ random vector, then its distribution is uniquely determined by the distributions of the linear functions $t'X$, for every $t \in \mathbb{R}^p$.

Corollary 4 paves the way to the definition of the (general) multivariate normal distribution.

Definition 2. A random vector $X \in \mathbb{R}^p$ has a multivariate normal distribution if $t'X$ is univariate normal for all $t \in \mathbb{R}^p$.

The definition says that $X$ is MVN if every projection of $X$ onto a 1-dimensional subspace is normal, with the convention that a degenerate distribution $\delta_c$ is normal with variance 0, i.e., $c \sim N(c, 0)$. The definition does not require that $\mathrm{Cov}(X)$ be nonsingular.

Theorem 5. The characteristic function of a multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma \ge 0$ is, for $t \in \mathbb{R}^p$,
\[
\varphi(t) = \exp\left[it'\mu - \tfrac{1}{2}t'\Sigma t\right].
\]
If $\Sigma > 0$, then the pdf exists and is the same as (1).

In the following, the notation $X \sim N(\mu, \Sigma)$ is valid for a non-negative definite $\Sigma$. However, whenever $\Sigma^{-1}$ appears in a statement, $\Sigma$ is assumed to be positive definite.

Proposition 6. If $X \sim N_p(\mu, \Sigma)$ and $Y = AX + b$ for $A_{(q \times p)}$ and $b_{(q \times 1)}$, then $Y \sim N_q(A\mu + b, A\Sigma A')$.

The next two results concern independence and conditional distributions of normal random vectors. Let $X_1$ and $X_2$ be a partition of $X$ with dimensions $r$ and $s$, $r + s = p$, and suppose $\mu$ and $\Sigma$ are partitioned accordingly. That is,
\[
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N_p\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right).
\]

Proposition 7. The normal random vectors $X_1$ and $X_2$ are independent if and only if $\mathrm{Cov}(X_1, X_2) = \Sigma_{12} = 0$.

Proposition 8. The conditional distribution of $X_1$ given $X_2 = x_2$ is
\[
N_r\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right).
\]

Proof. Consider the new random vectors $X_1^* = X_1 - \Sigma_{12}\Sigma_{22}^{-1}X_2$ and $X_2^* = X_2$, so that
\[
X^* = \begin{pmatrix} X_1^* \\ X_2^* \end{pmatrix} = AX, \qquad A = \begin{pmatrix} I_r & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0_{(s \times r)} & I_s \end{pmatrix}.
\]
By Proposition 6, $X^*$ is multivariate normal. An inspection of the covariance matrix of $X^*$ shows that $X_1^*$ and $X_2^*$ are independent. The result follows by writing
\[
X_1 = X_1^* + \Sigma_{12}\Sigma_{22}^{-1}X_2,
\]
and noting that the distribution (law) of $X_1$ given $X_2 = x_2$ is $\mathcal{L}(X_1 \mid X_2 = x_2) = \mathcal{L}(X_1^* + \Sigma_{12}\Sigma_{22}^{-1}X_2 \mid X_2 = x_2) = \mathcal{L}(X_1^* + \Sigma_{12}\Sigma_{22}^{-1}x_2 \mid X_2 = x_2)$, which is an MVN of dimension $r$.

1.4 Multivariate Central Limit Theorem

If $X_1, X_2, \ldots \in \mathbb{R}^p$ are i.i.d. with $E(X_i) = \mu$ and $\mathrm{Cov}(X_i) = \Sigma$, then
\[
n^{-1/2} \sum_{j=1}^n (X_j - \mu) \Rightarrow N_p(0, \Sigma) \quad \text{as } n \to \infty,
\]
or equivalently,
\[
n^{1/2}(\bar{X}_n - \mu) \Rightarrow N_p(0, \Sigma) \quad \text{as } n \to \infty,
\]
where $\bar{X}_n = n^{-1} \sum_{j=1}^n X_j$.

The delta method can be used to obtain asymptotic normality of $h(\bar{X}_n)$ for a smooth function $h: \mathbb{R}^p \to \mathbb{R}$. Denote by $\nabla h(x)$ the gradient of $h$ at $x$. Using the first two terms of the Taylor series,
\[
h(\bar{X}_n) = h(\mu) + (\nabla h(\mu))'(\bar{X}_n - \mu) + O_p(\|\bar{X}_n - \mu\|_2^2).
\]
Then Slutsky's theorem gives the result:
\[
\sqrt{n}\,(h(\bar{X}_n) - h(\mu)) = (\nabla h(\mu))'\sqrt{n}\,(\bar{X}_n - \mu) + O_p\!\left(\sqrt{n}\,(\bar{X}_n - \mu)'(\bar{X}_n - \mu)\right) \Rightarrow N\!\left(0,\; (\nabla h(\mu))'\Sigma\,\nabla h(\mu)\right) \quad \text{as } n \to \infty.
\]
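The following sketch simulates both the multivariate CLT and the delta method for $h(x) = x_1 x_2$, so that $\nabla h(\mu) = (\mu_2, \mu_1)'$. It is an added illustration: the bivariate exponential construction $X = (E_1, E_1 + E_2)'$ is an arbitrary non-normal example chosen so that $\mu = (1, 2)'$ and $\Sigma$ are known in closed form.

```python
# Minimal sketch of the multivariate CLT and the delta method with
# h(x) = x_1 x_2. The data-generating mechanism is an arbitrary choice.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 500, 5_000

# Non-normal i.i.d. vectors X = (E1, E1 + E2)' with E1, E2 ~ Exp(1),
# so mu = (1, 2)' and Sigma = [[1, 1], [1, 2]].
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 1.0],
                  [1.0, 2.0]])
E = rng.exponential(1.0, size=(reps, n, 2))
X = E @ np.array([[1.0, 1.0],
                  [0.0, 1.0]])
Xbar = X.mean(axis=1)              # `reps` independent sample means

# Multivariate CLT: sqrt(n) (Xbar_n - mu) is approximately N_2(0, Sigma).
Z = np.sqrt(n) * (Xbar - mu)
print(np.cov(Z, rowvar=False))     # approx Sigma

# Delta method: sqrt(n) (h(Xbar_n) - h(mu)) => N(0, grad' Sigma grad).
grad = np.array([mu[1], mu[0]])    # gradient of h at mu
T = np.sqrt(n) * (Xbar[:, 0] * Xbar[:, 1] - mu[0] * mu[1])
print(T.var(), grad @ Sigma @ grad)  # both approx 10
```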
1.5 Quadratic forms in normal random vectors

Let $X \sim N_p(\mu, \Sigma)$. A quadratic form in $X$ is a random variable of the form
\[
Y = X'AX = \sum_{i=1}^p \sum_{j=1}^p X_i a_{ij} X_j,
\]
where $A$ is a $p \times p$ symmetric matrix. We are interested in the distribution of quadratic forms and in the conditions under which two quadratic forms are independent.

Example 2. A special case: if $X \sim N_p(0, I_p)$ and $A = I_p$, then
\[
Y = X'AX = X'X = \sum_{i=1}^p X_i^2 \sim \chi^2(p).
\]

Fact 1. Recall the following:

1. A $p \times p$ matrix $A$ is idempotent if $A^2 = A$.
2. If $A$ is symmetric, then $A = \Gamma'\Lambda\Gamma$, where $\Lambda = \mathrm{diag}(\lambda_i)$ and $\Gamma$ is orthogonal.
3. If $A$ is symmetric and idempotent, then
   (a) its eigenvalues are either 0 or 1,
   (b) $\mathrm{rank}(A) = \#\{\text{nonzero eigenvalues}\} = \mathrm{trace}(A)$.

Theorem 9. Let $X \sim N_p(0, \sigma^2 I)$ and let $A$ be a $p \times p$ symmetric matrix. Then
\[
Y = \frac{X'AX}{\sigma^2} \sim \chi^2(m)
\]
if and only if $A$ is idempotent of rank $m < p$.

Corollary 10. Let $X \sim N_p(0, \Sigma)$ and let $A$ be a $p \times p$ symmetric matrix. Then $Y = X'AX \sim \chi^2(m)$ if and only if either (i) $A\Sigma$ is idempotent of rank $m$ or (ii) $\Sigma A$ is idempotent of rank $m$.

Example 3. If $X \sim N_p(\mu, \Sigma)$, then $(X - \mu)'\Sigma^{-1}(X - \mu) \sim \chi^2(p)$.

Theorem 11. Let $X \sim N_p(0, I)$, let $A$ be a $p \times p$ symmetric matrix, and let $B$ be a $k \times p$ matrix.
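A numerical sanity check of Theorem 9 (a minimal added sketch; constructing $A$ as an orthogonal projection is one convenient way, among others, to produce a symmetric idempotent matrix of rank $m$):

```python
# Minimal sketch: for X ~ N_p(0, sigma^2 I) and A symmetric idempotent of
# rank m, Theorem 9 says X'AX / sigma^2 ~ chi^2(m). Illustrative only.
import numpy as np

rng = np.random.default_rng(2)
p, m, sigma = 6, 3, 2.0

# A = Q Q' with Q'Q = I_m projects onto a random m-dimensional subspace.
Q, _ = np.linalg.qr(rng.standard_normal((p, m)))
A = Q @ Q.T
assert np.allclose(A @ A, A)       # idempotent (Fact 1.1)
assert np.isclose(np.trace(A), m)  # rank(A) = trace(A) = m (Fact 1.3b)

X = sigma * rng.standard_normal((500_000, p))
Y = np.einsum('ij,jk,ik->i', X, A, X) / sigma**2   # row-wise X'AX / sigma^2

# chi^2(m) has mean m and variance 2m.
print(Y.mean(), Y.var())   # approx 3 and 6
```

The same check adapts to Corollary 10 by replacing the idempotency test on $A$ with one on $A\Sigma$.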