Multivariate Normal Distribution Edps/Soc 584, Psych 594

Multivariate Normal Distribution
Edps/Soc 584, Psych 594
Carolyn J. Anderson
Department of Educational Psychology
University of Illinois at Urbana-Champaign
© Board of Trustees, University of Illinois, Spring 2017

Outline
◮ Motivation
◮ The multivariate normal distribution
◮ The bivariate normal distribution
◮ More properties of the multivariate normal
◮ Estimation of µ and Σ
◮ Central limit theorem
Reading: Johnson & Wichern, pages 149–176.

Motivation
◮ To make inferences about populations, we need a model for the distribution of random variables. We will use the multivariate normal distribution, because...
◮ It is often a good population model: it is a reasonably good approximation of many phenomena, and many variables are approximately normal (due to the central limit theorem for sums and averages).
◮ The sampling distributions of (test) statistics are often approximately multivariate or univariate normal, again due to the central limit theorem.
◮ Because of its central importance, we need to thoroughly understand the multivariate normal and know its properties.

Introduction to the Multivariate Normal
◮ The probability density function of the univariate normal distribution (p = 1 variable) is
  f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\}, \qquad -\infty < x < \infty.
◮ The parameters that completely characterize the distribution are µ = E(X), the mean, and σ² = var(X), the variance.
◮ Area corresponds to probability: 68% of the area lies between µ ± σ and 95% between µ ± 1.96σ.

Generalization to the Multivariate Normal
The exponent of the univariate normal can be written as
  \left(\frac{x-\mu}{\sigma}\right)^2 = (x-\mu)(\sigma^2)^{-1}(x-\mu),
a squared statistical distance between x and µ in standard deviation units. Generalizing to p > 1 variables:
◮ We have x_{p×1} and parameters µ_{p×1} and Σ_{p×p}.
◮ The exponent term for the multivariate normal is
  (x-\mu)'\Sigma^{-1}(x-\mu), \qquad -\infty < x_i < \infty \text{ for } i = 1, \ldots, p.
◮ This is a scalar and reduces to the univariate expression above when p = 1.
◮ It is a squared statistical distance from x to µ (provided Σ⁻¹ exists); it takes into account both variability and covariability.
◮ Integrating the exponential term over all of R^p gives
  \int \cdots \int \exp\left\{-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right\} dx_1 \cdots dx_p = (2\pi)^{p/2}\,|\Sigma|^{1/2}.

Proper Distribution
Since a density must integrate to 1 over all possible values, we divide by (2π)^{p/2}|Σ|^{1/2} to obtain a "proper" density function. The multivariate normal density is
  f(x) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right\}, \qquad -\infty < x_i < \infty \text{ for } i = 1, \ldots, p.
To denote this distribution we write x ∼ N_p(µ, Σ). For p = 1, it reduces to the univariate normal p.d.f.
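As a quick numerical check of the density formula, the minimal sketch below (assuming NumPy and SciPy are available; the evaluation point x is arbitrary and chosen only for illustration) evaluates f(x) directly from the formula above and compares it with SciPy's built-in multivariate normal density. The mean and covariance are taken from the worked example later in these notes.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_density(x, mu, Sigma):
    """Evaluate the N_p(mu, Sigma) density at x from the formula above."""
    p = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)            # (x - mu)' Sigma^{-1} (x - mu)
    norm_const = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

# Mean and covariance from the example used later in these notes
mu = np.array([5.0, 10.0])
Sigma = np.array([[9.0, 16.0],
                  [16.0, 64.0]])
x = np.array([6.0, 12.0])                                 # illustrative evaluation point

print(mvn_density(x, mu, Sigma))                          # density from the formula
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))     # same value from SciPy
```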
Bivariate Normal: p = 2
  x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad
  E(x) = \begin{pmatrix} E(x_1) \\ E(x_2) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} = \mu, \qquad
  \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix},
and
  \Sigma^{-1} = \frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2}
  \begin{pmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{pmatrix}.
If we replace σ₁₂ by ρ₁₂√(σ₁₁σ₂₂), then
  \Sigma^{-1} = \frac{1}{\sigma_{11}\sigma_{22}(1-\rho_{12}^2)}
  \begin{pmatrix} \sigma_{22} & -\rho_{12}\sqrt{\sigma_{11}\sigma_{22}} \\ -\rho_{12}\sqrt{\sigma_{11}\sigma_{22}} & \sigma_{11} \end{pmatrix}.
Using this, let's look at the statistical distance of x from µ.

Bivariate Normal & Statistical Distance
The quantity in the exponent of the bivariate normal is
  (x-\mu)'\Sigma^{-1}(x-\mu)
  = \big((x_1-\mu_1),\,(x_2-\mu_2)\big)\,
    \frac{1}{\sigma_{11}\sigma_{22}(1-\rho_{12}^2)}
    \begin{pmatrix} \sigma_{22} & -\rho_{12}\sqrt{\sigma_{11}\sigma_{22}} \\ -\rho_{12}\sqrt{\sigma_{11}\sigma_{22}} & \sigma_{11} \end{pmatrix}
    \begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}
  = \frac{1}{1-\rho_{12}^2}\left[\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2
    + \left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2
    - 2\rho_{12}\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)\right]
  = \frac{1}{1-\rho_{12}^2}\left(z_1^2 + z_2^2 - 2\rho_{12} z_1 z_2\right).

Bivariate Normal & Independence
The bivariate normal density is
  f(x) = \frac{1}{2\pi\sqrt{\sigma_{11}\sigma_{22}(1-\rho_{12}^2)}}
  \exp\left\{-\frac{1}{2(1-\rho_{12}^2)}\left[\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2
  + \left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2
  - 2\rho_{12}\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)\right]\right\}.
If σ₁₂ = 0, or equivalently ρ₁₂ = 0, then X₁ and X₂ are uncorrelated. For the bivariate normal, σ₁₂ = 0 also implies that X₁ and X₂ are statistically independent, because the density factors:
  f(x) = \frac{1}{2\pi\sqrt{\sigma_{11}\sigma_{22}}}
  \exp\left\{-\frac{1}{2}\left[\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2 + \left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2\right]\right\}
  = \frac{1}{\sqrt{2\pi\sigma_{11}}}\exp\left\{-\frac{1}{2}\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2\right\}
    \times
    \frac{1}{\sqrt{2\pi\sigma_{22}}}\exp\left\{-\frac{1}{2}\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2\right\}
  = f_1(x_1) \times f_2(x_2).

[Figure: bivariate normal density surface, µ_k = 0, σ_kk = 1, r = 0.0; x-axis, y-axis]
[Figure: overhead (contour) view, µ_k = 0, σ_kk = 1, r = 0.0]
[Figure: bivariate normal density surface, µ_k = 0, σ_kk = 1, r = 0.75]
[Figure: overhead (contour) view, µ_k = 0, σ_kk = 1, r = 0.75]

Summary: Comparing r = 0.0 vs r = 0.75
For the figures shown, µ₁ = µ₂ = 0 and σ₁₁ = σ₂₂ = 1:
◮ With r = 0.0,
  ◮ Σ = diag(σ₁₁, σ₂₂), a diagonal matrix.
  ◮ the density has no directional tilt in the x–y plane.
  ◮ a slice taken parallel to the x–y plane is a circle.
◮ With r = 0.75,
  ◮ Σ is not a diagonal matrix.
  ◮ the density is not spread evenly in the x–y plane; there is a linear tilt (i.e., the density is concentrated along a line).
  ◮ a slice taken parallel to the x–y plane is a tilted ellipse.
  ◮ the tilt depends on the relative values of σ₁₁ and σ₂₂ (and on the scale used in plotting).
◮ When Σ = σ²I (i.e., diagonal with equal variances), the distribution is called the "spherical normal."
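The algebraic identity between the matrix form of the exponent and its standardized (z₁, z₂) form can be checked numerically. The sketch below is for illustration only (NumPy assumed; the parameter values match the r = 0.75 figures and the random seed is arbitrary): it draws one bivariate normal observation and confirms the two forms agree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameter values matching the r = 0.75 figures (means 0, unit variances)
mu = np.array([0.0, 0.0])
s11, s22, rho = 1.0, 1.0, 0.75
cov12 = rho * np.sqrt(s11 * s22)
Sigma = np.array([[s11,   cov12],
                  [cov12, s22 ]])

x = rng.multivariate_normal(mu, Sigma)        # one simulated observation

# Matrix form of the exponent: (x - mu)' Sigma^{-1} (x - mu)
diff = x - mu
quad_matrix = diff @ np.linalg.solve(Sigma, diff)

# Standardized form: (z1^2 + z2^2 - 2*rho*z1*z2) / (1 - rho^2)
z1 = (x[0] - mu[0]) / np.sqrt(s11)
z2 = (x[1] - mu[1]) / np.sqrt(s22)
quad_z = (z1**2 + z2**2 - 2 * rho * z1 * z2) / (1 - rho**2)

print(quad_matrix, quad_z)                    # the two forms agree
```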
Real-Time Software Demo
◮ binormal.m (Peter Dunn)
◮ Graph Bivariate .R (http://www.stat.ucl.ac.be/ISpersonnel/lecoutre/stats/fichiers/˜gallery)

Slices of the Multivariate Normal Density
◮ For the bivariate normal, a slice of constant density is an ellipse with equation
  (x-\mu)'\Sigma^{-1}(x-\mu) = c^2,
  which gives all (x₁, x₂) pairs having the same probability density.
◮ These ellipses are called contours, and all are centered at µ.
◮ Definition: a constant probability contour is
  {all x such that (x − µ)'Σ⁻¹(x − µ) = c²} = {the surface of an ellipsoid centered at µ}.

Probability Contours: Axes of the Ellipsoid
Important points:
◮ (x-\mu)'\Sigma^{-1}(x-\mu) \sim \chi^2_p (provided |Σ| > 0).
◮ The solid ellipsoid of values x that satisfy
  (x-\mu)'\Sigma^{-1}(x-\mu) \le c^2 = \chi^2_p(\alpha)
  has probability (1 − α), where χ²_p(α) is the (1 − α)·100% point of the chi-square distribution with p degrees of freedom.

Example: Axes of Ellipses & Probability Contours
Back to the example where x ∼ N₂ with
  \mu = \begin{pmatrix} 5 \\ 10 \end{pmatrix}, \qquad
  \Sigma = \begin{pmatrix} 9 & 16 \\ 16 & 64 \end{pmatrix} \quad\Rightarrow\quad \rho = .667,
and we want the "95% probability contour." The upper 5% point of the chi-square distribution with 2 degrees of freedom is χ²₂(.05) = 5.9915, so c = √5.9915 = 2.4478.
Axes: µ ± c√λᵢ eᵢ, where (λᵢ, eᵢ) is the i-th (i = 1, 2) eigenvalue/eigenvector pair of Σ:
  λ₁ = 68.316, e₁ = (.2604, .9655)'
  λ₂ = 4.684, e₂ = (.9655, −.2604)'
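The axis calculation and the 95% coverage claim can also be reproduced numerically. The sketch below is a rough check (NumPy/SciPy assumed; the Monte Carlo sample size and seed are arbitrary): it computes c, the eigenvalue/eigenvector pairs, and the axis half-lengths c√λᵢ for the example above, then verifies that roughly 95% of simulated observations fall inside the contour. Eigenvector signs are arbitrary, so the directions may print with the opposite sign from those listed above.

```python
import numpy as np
from scipy.stats import chi2

mu = np.array([5.0, 10.0])
Sigma = np.array([[9.0, 16.0],
                  [16.0, 64.0]])

# c^2 is the upper 5% point of the chi-square distribution with 2 df
c2 = chi2.ppf(0.95, df=2)                     # about 5.9915
c = np.sqrt(c2)                               # about 2.4478

# Eigenvalue/eigenvector pairs of Sigma (eigh returns eigenvalues in ascending order)
eigvals, eigvecs = np.linalg.eigh(Sigma)
for lam, e in zip(eigvals[::-1], eigvecs[:, ::-1].T):
    print("lambda =", lam, "axis half-length =", c * np.sqrt(lam), "direction =", e)

# Monte Carlo check of the 95% coverage of the contour
rng = np.random.default_rng(1)
X = rng.multivariate_normal(mu, Sigma, size=100_000)
diffs = X - mu
d2 = np.einsum('ij,jk,ik->i', diffs, np.linalg.inv(Sigma), diffs)
print("estimated coverage:", np.mean(d2 <= c2))
```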