Some Probability and Statistics

Appendix A: Some probability and statistics

A.1 Probabilities, random variables and their distribution

We summarize a few of the basic concepts of random variables, usually denoted by capital letters, $X, Y, Z$, etc., and their probability distributions, defined by the cumulative distribution function (CDF) $F_X(x) = P(X \le x)$, etc.

To a random experiment we associate a sample space $\Omega$ that contains all the outcomes that can occur in the experiment. A random variable $X$ is a function defined on the sample space $\Omega$: to each outcome $\omega \in \Omega$ it assigns a real number $X(\omega)$, representing the value of a numeric quantity that can be measured in the experiment. For $X$ to be called a random variable, the probability $P(X \le x)$ has to be defined for all real $x$.

Distributions and moments

A probability distribution with cumulative distribution function $F_X(x)$ can be discrete or continuous, with probability function $p_X(k)$ and probability density function $f_X(x)$, respectively, such that
$$
F_X(x) = P(X \le x) =
\begin{cases}
\sum_{k \le x} p_X(k), & \text{if } X \text{ takes only integer values,} \\
\int_{-\infty}^{x} f_X(y)\,dy, & \text{if } X \text{ has a density.}
\end{cases}
$$
A distribution can also be of mixed type; the distribution function is then an integral plus discrete jumps; see Appendix B.

The expectation of a random variable $X$ is defined as the center of gravity of the distribution,
$$
E[X] = m_X =
\begin{cases}
\sum_{k} k\, p_X(k), \\
\int_{-\infty}^{\infty} x\, f_X(x)\,dx.
\end{cases}
$$
The variance is a simple measure of the spread of the distribution and is defined as
$$
V[X] = E[(X - m_X)^2] = E[X^2] - m_X^2 =
\begin{cases}
\sum_{k} (k - m_X)^2\, p_X(k), \\
\int_{-\infty}^{\infty} (x - m_X)^2 f_X(x)\,dx.
\end{cases}
$$
Chebyshev's inequality states that, for all $\varepsilon > 0$,
$$
P(|X - m_X| > \varepsilon) \le \frac{E[(X - m_X)^2]}{\varepsilon^2}.
$$

In order to describe the statistical properties of a random function one needs the notion of multivariate distributions. If the result of an experiment is described by two different quantities, denoted by $X$ and $Y$, e.g. the length and height of a randomly chosen individual in a population, or the values of a random function at two different time points, one has to deal with a two-dimensional random variable. This is described by the two-dimensional distribution function $F_{X,Y}(x,y) = P(X \le x, Y \le y)$, or by the corresponding two-dimensional probability or density function,
$$
f_{X,Y}(x,y) = \frac{\partial^2 F_{X,Y}(x,y)}{\partial x\, \partial y}.
$$
Two random variables $X, Y$ are independent if, for all $x, y$,
$$
F_{X,Y}(x,y) = F_X(x)\, F_Y(y).
$$
An important concept is the covariance between two random variables $X$ and $Y$, defined as
$$
C[X,Y] = E[(X - m_X)(Y - m_Y)] = E[XY] - m_X m_Y.
$$
The correlation coefficient is equal to the dimensionless, normalized covariance,[1]
$$
\rho[X,Y] = \frac{C[X,Y]}{\sqrt{V[X]\, V[Y]}}.
$$
Two random variables with zero correlation, $\rho[X,Y] = 0$, are called uncorrelated. Note that if two random variables $X$ and $Y$ are independent, then they are also uncorrelated, but the reverse does not hold: two uncorrelated variables can still be dependent.

[1] Speaking about the "correlation between two random quantities", one often means the degree of covariation between the two. However, one has to remember that the correlation only measures the degree of linear covariation. Another meaning of the term correlation is used in connection with two data series, $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$; the sum of products, $\sum_{k=1}^{n} x_k y_k$, is then sometimes called "correlation", and a device that produces this sum is called a "correlator".
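To illustrate numerically that uncorrelated does not imply independent, here is a minimal Python sketch (numpy only; the choice $Y = X^2$ and all variable names are ours, purely for illustration). $Y$ is a deterministic function of $X$, hence fully dependent on it, yet the estimated covariance and correlation are close to zero.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 100_000

x = rng.standard_normal(n)   # X ~ N(0, 1)
y = x**2                     # Y is a function of X, hence dependent on X

# Sample versions of C[X,Y] = E[XY] - m_X m_Y and rho[X,Y]
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
rho_xy = cov_xy / np.sqrt(np.var(x) * np.var(y))

print(f"estimated covariance:  {cov_xy:.4f}")   # close to 0
print(f"estimated correlation: {rho_xy:.4f}")   # close to 0, although Y = X**2
```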
Conditional distributions

If $X$ and $Y$ are two random variables with bivariate density function $f_{X,Y}(x,y)$, we can define the conditional distribution of $X$ given $Y = y$ by the conditional density,
$$
f_{X \mid Y=y}(x) = \frac{f_{X,Y}(x,y)}{f_Y(y)},
$$
for every $y$ where the marginal density $f_Y(y)$ is non-zero. The expectation in this distribution, the conditional expectation, is a function of $y$, and is denoted and defined as
$$
E[X \mid Y = y] = \int_x x\, f_{X \mid Y=y}(x)\,dx = m(y).
$$
The conditional variance is defined as
$$
V[X \mid Y = y] = \int_x (x - m(y))^2 f_{X \mid Y=y}(x)\,dx = \sigma^2_{X \mid Y}(y).
$$
The unconditional expectation of $X$ can be obtained from the law of total probability, and computed as
$$
E[X] = \int_y \left( \int_x x\, f_{X \mid Y=y}(x)\,dx \right) f_Y(y)\,dy = E\big[E[X \mid Y]\big].
$$
The unconditional variance of $X$ is given by
$$
V[X] = E\big[V[X \mid Y]\big] + V\big[E[X \mid Y]\big].
$$

A.2 Multidimensional normal distribution

A one-dimensional normal random variable $X$ with expectation $m$ and variance $\sigma^2$ has probability density function
$$
f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2} \left( \frac{x - m}{\sigma} \right)^2 \right),
$$
and we write $X \sim N(m, \sigma^2)$. If $m = 0$ and $\sigma = 1$, the normal distribution is standardized. If $X \sim N(0,1)$, then $\sigma X + m \sim N(m, \sigma^2)$, and if $X \sim N(m, \sigma^2)$, then $(X - m)/\sigma \sim N(0,1)$. We accept a constant random variable, $X \equiv m$, as a normal variable, $X \sim N(m, 0)$.

Now, let $X_1, \ldots, X_n$ have expectations $m_k = E[X_k]$ and covariances $\sigma_{jk} = C[X_j, X_k]$, and define (with $'$ for transpose)
$$
\mu = (m_1, \ldots, m_n)', \qquad
\Sigma = (\sigma_{jk}) = \text{the covariance matrix of } X_1, \ldots, X_n.
$$
It is a characteristic property of the normal distributions that every linear combination of a multivariate normal variable also has a normal distribution. To formulate the definition, write $a = (a_1, \ldots, a_n)'$ and $X = (X_1, \ldots, X_n)'$, with $a'X = a_1 X_1 + \cdots + a_n X_n$. Then,
$$
E[a'X] = a'\mu = \sum_{j=1}^{n} a_j m_j, \qquad
V[a'X] = a'\Sigma a = \sum_{j,k} a_j a_k \sigma_{jk}. \tag{A.1}
$$

Definition A.1. The random variables $X_1, \ldots, X_n$ are said to have an $n$-dimensional normal distribution if every linear combination $a_1 X_1 + \cdots + a_n X_n$ has a normal distribution. From (A.1) it follows that $X = (X_1, \ldots, X_n)'$ is $n$-dimensional normal if and only if $a'X \sim N(a'\mu, a'\Sigma a)$ for all $a = (a_1, \ldots, a_n)'$.

Obviously, $X_k = 0 \cdot X_1 + \cdots + 1 \cdot X_k + \cdots + 0 \cdot X_n$ is normal, i.e., all marginal distributions in an $n$-dimensional normal distribution are one-dimensional normal. However, the reverse is not necessarily true: there are variables $X_1, \ldots, X_n$, each of which is one-dimensional normal, while the vector $(X_1, \ldots, X_n)'$ is not $n$-dimensional normal. It is an important consequence of the definition that sums and differences of $n$-dimensional normal variables have a normal distribution.

If the covariance matrix $\Sigma$ is non-singular, the $n$-dimensional normal distribution has a probability density function (with $x = (x_1, \ldots, x_n)'$)
$$
\frac{1}{(2\pi)^{n/2} \sqrt{\det \Sigma}} \exp\left( -\frac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu) \right). \tag{A.2}
$$
The distribution is then said to be non-singular. The density (A.2) is constant on every ellipsoid $(x - \mu)' \Sigma^{-1} (x - \mu) = C$ in $\mathbb{R}^n$.

Note: the density function of an $n$-dimensional normal distribution is uniquely determined by the expectations and covariances.
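As a quick sanity check on formula (A.2), the following sketch evaluates the density directly from the formula and compares it with scipy's multivariate normal density. The mean vector, covariance matrix, and evaluation point are arbitrary choices made only for this illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters (not from the text, chosen arbitrarily)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = np.array([0.5, -1.5])

# Density from formula (A.2): (2*pi)^(-n/2) * det(Sigma)^(-1/2) * exp(-Q/2)
n = len(mu)
diff = x - mu
Q = diff @ np.linalg.inv(Sigma) @ diff
pdf_formula = np.exp(-0.5 * Q) / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))

# Reference value from scipy
pdf_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(x)

print(pdf_formula, pdf_scipy)   # the two values agree
```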
Example A.1. Suppose $X_1, X_2$ have a two-dimensional normal distribution. If
$$
\det \Sigma = \sigma_{11}\sigma_{22} - \sigma_{12}^2 > 0,
$$
then $\Sigma$ is non-singular, and
$$
\Sigma^{-1} = \frac{1}{\det \Sigma}
\begin{pmatrix}
\sigma_{22} & -\sigma_{12} \\
-\sigma_{12} & \sigma_{11}
\end{pmatrix}.
$$
With $Q(x_1, x_2) = (x - \mu)' \Sigma^{-1} (x - \mu)$,
$$
Q(x_1, x_2)
= \frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2}
\Big( (x_1 - m_1)^2 \sigma_{22} - 2 (x_1 - m_1)(x_2 - m_2)\, \sigma_{12} + (x_2 - m_2)^2 \sigma_{11} \Big)
$$
$$
= \frac{1}{1 - \rho^2}
\left( \left( \frac{x_1 - m_1}{\sqrt{\sigma_{11}}} \right)^2
- 2\rho\, \frac{x_1 - m_1}{\sqrt{\sigma_{11}}} \cdot \frac{x_2 - m_2}{\sqrt{\sigma_{22}}}
+ \left( \frac{x_2 - m_2}{\sqrt{\sigma_{22}}} \right)^2 \right),
$$
where we also used the correlation coefficient $\rho = \sigma_{12}/\sqrt{\sigma_{11}\sigma_{22}}$, and
$$
f_{X_1,X_2}(x_1, x_2) = \frac{1}{2\pi \sqrt{\sigma_{11}\sigma_{22}(1 - \rho^2)}}
\exp\left( -\frac{1}{2} Q(x_1, x_2) \right). \tag{A.3}
$$
For variables with $m_1 = m_2 = 0$ and $\sigma_{11} = \sigma_{22} = \sigma^2$, the bivariate density is
$$
f_{X_1,X_2}(x_1, x_2) = \frac{1}{2\pi \sigma^2 \sqrt{1 - \rho^2}}
\exp\left( -\frac{1}{2\sigma^2 (1 - \rho^2)} \big( x_1^2 - 2\rho\, x_1 x_2 + x_2^2 \big) \right).
$$
We see that, if $\rho = 0$, this is the density of two independent normal variables.

Figure A.1 shows the density function $f_{X_1,X_2}(x_1, x_2)$ and level curves for a bivariate normal distribution with expectation $\mu = (0,0)'$ and covariance matrix
$$
\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}.
$$
The correlation coefficient is $\rho = 0.5$.

[Figure A.1: Two-dimensional normal density. Left: density function; Right: elliptic level curves at levels 0.01, 0.02, 0.05, 0.1, 0.15.]

Remark A.1. If the covariance matrix $\Sigma$ is singular, and hence non-invertible, then there exists at least one set of constants $a_1, \ldots, a_n$, not all equal to $0$, such that $a'\Sigma a = 0$. From (A.1) it follows that $V[a'X] = 0$, which means that $a'X$ is constant, equal to $a'\mu$. The distribution of $X$ is concentrated on a hyperplane $a'x = \text{constant}$ in $\mathbb{R}^n$. The distribution is then said to be singular, and it has no density function in $\mathbb{R}^n$.

Remark A.2. Formula (A.1) implies that every covariance matrix $\Sigma$ is positive definite or positive semi-definite, i.e., $\sum_{j,k} a_j a_k \sigma_{jk} \ge 0$ for all $a_1, \ldots, a_n$. Conversely, if $\Sigma$ is a symmetric, positive definite matrix of size $n \times n$, i.e., if $\sum_{j,k} a_j a_k \sigma_{jk} > 0$ for all $(a_1, \ldots, a_n) \ne (0, \ldots, 0)$, then (A.2) defines the density function of an $n$-dimensional normal distribution with expectations $m_k$ and covariances $\sigma_{jk}$.
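The bivariate case of Example A.1 and Definition A.1 can also be checked by simulation. The sketch below (numpy only; the coefficient vector $a$ is an arbitrary choice for illustration) draws samples from the distribution of Figure A.1 and verifies that the sample correlation is close to $\rho = 0.5$, and that a linear combination $a'X$ has mean and variance close to $a'\mu$ and $a'\Sigma a$ as in (A.1).

```python
import numpy as np

rng = np.random.default_rng(seed=2)

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])          # correlation coefficient rho = 0.5

# Rows of X are samples of (X1, X2) from the distribution in Figure A.1
X = rng.multivariate_normal(mu, Sigma, size=200_000)

# Sample correlation should be close to 0.5
print(np.corrcoef(X.T)[0, 1])

# Check (A.1): a'X has mean a'mu and variance a'Sigma a
a = np.array([2.0, -1.0])               # arbitrary coefficients, for illustration
lin = X @ a
print(lin.mean(), a @ mu)               # both close to 0
print(lin.var(),  a @ Sigma @ a)        # both close to 4 - 2 + 1 = 3
```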