CENTRAL LIMIT THEOREM

FREDERICK VU

Abstract. This expository paper provides a short introduction to probability theory before proving a central theorem in probability theory, the central limit theorem. The theorem concerns the eventual convergence to a normal distribution of an average of a sample of independently distributed random variables with identical mean and variance. The paper uses Lévy's continuity theorem to prove the central limit theorem.

Contents
1. Introduction
2. Convergence
3. Variance Matrices
4. Multivariate Normal Distribution
5. Characteristic Functions and the Lévy Continuity Theorem
Acknowledgments
References

1. Introduction

Before we state the central limit theorem, we must first define several terms. An understanding of the terms relies on basic functional analysis fitted with new probability terminology.

Definition 1.1. A probability space is a triple (Ω, F, P) where Ω is a non-empty set, F is a σ-algebra of measurable subsets of Ω (a collection of subsets closed under complements and countable unions and intersections), and P is a finite measure on the measurable space (Ω, F) with P(Ω) = 1. P is referred to as a probability.

Definition 1.2. A random variable X is a measurable function from a probability space (Ω, F, P) to a measurable space (S, S), where S is a σ-algebra of measurable subsets of S. Normally (S, S) is the real numbers with the Borel σ-algebra. We will maintain this general notation, but conform to the norm throughout the paper. A random vector is a column vector whose components are real-valued random variables defined on the same probability space. In many places in this paper, a statement concerning random variables will presume the existence of some general probability space.

Definition 1.3.
The expected value of a real-valued random variable X is defined as the Lebesgue integral of X with respect to the measure P:

E(X) := ∫_Ω X dP.

For a random vector X, the expected value E(X) is the vector whose components are E(X_i).

Definition 1.4. Because independence is such a central notion in probability, it is best to define it early. First, define the distribution of a random variable X as the measure Q := P ∘ X⁻¹ on (S, S), given by

Q(B) := P(X⁻¹(B)) ≡ P(X ∈ B) ≡ P({ω ∈ Ω : X(ω) ∈ B}),  B ∈ S.

This possibly confusing notation can be understood as the pushforward measure of P to (S, S).

Definition 1.5. A set of random variables X_1, ..., X_n, with X_i a map from (Ω, F, P) to (S_i, S_i), is called independent if the distribution Q of X := (X_1, ..., X_n) on the product space (S = S_1 × ··· × S_n, S = S_1 × ··· × S_n) is the product measure Q = Q_1 × ··· × Q_n, where Q_i is the distribution of X_i, or more compactly:

Q(B_1 × ··· × B_n) = ∏_{i=1}^n Q_i(B_i).

Two random vectors are said to be independent if their components are pairwise independent as above.

Since the (multivariate) central limit theorem won't be stated until much further along, due to the required definitions of normal distributions and many lemmas along the way, we pause here to give an informal statement of the central theorem before continuing on with a few basic lemmas from probability theory. The central limit theorem says, roughly, that if one were to repeatedly, but independently, sample from a fixed distribution, the average value will approach the expected value of the corresponding random variable, and a histogram of suitably rescaled averages will approach a bell-shaped curve.

The following are simple inequalities used often in the paper.

Lemma 1.6 (Markov's Inequality). If X is a nonnegative random variable and a > 0, then

P(X ≥ a) ≤ E(X)/a.

Proof. For U ⊆ Ω, denote by I_U the indicator function of U.
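As a quick numerical illustration (not part of the original argument), Markov's inequality can be sanity-checked by simulation; the Exponential(1) distribution below is an arbitrary choice of nonnegative random variable:

```python
import random

random.seed(0)

n = 100_000
a = 3.0

# Exponential(1) samples: nonnegative, with E(X) = 1.
xs = [random.expovariate(1.0) for _ in range(n)]

mean = sum(xs) / n
p_tail = sum(x >= a for x in xs) / n

# Markov's bound: P(X >= a) <= E(X) / a.
assert p_tail <= mean / a
```

Here the true tail probability is e^{-3}, well below the Markov bound of 1/3; the bound is crude but requires nothing beyond nonnegativity and a finite mean.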
Then, since X ≥ a·I_{X≥a}, monotonicity and linearity of the integral give

E(X) ≥ E(a I_{X≥a}) = a E(I_{X≥a}) = a P(X ≥ a). ∎

Corollary 1.7 (Chebyshev's Inequality). For any random variable X and a > 0,

P(|X − E(X)| ≥ a) ≤ E((X − E(X))²)/a².

Proof. Consider the random variable (X − E(X))² and apply Markov's inequality. ∎

There are many ways to understand probability measures, and it is from these different points of view and their interrelations that one can derive the multitude of theorems that follow.

Definitions 1.8. The cumulative distribution function (cdf) of a random vector X = (X_1, ..., X_n) is the function F_X : Rⁿ → R,

F_X(x) = P(X_1 ≤ x_1, ..., X_n ≤ x_n).

For a continuous random vector X, define the probability density function as

f_X(x) = ∂ⁿ/∂x_1 ··· ∂x_n F_X(x_1, ..., x_n).

This provides us with another way to write the distribution of a random vector X. For A ⊆ Rⁿ,

P(X ∈ A) = ∫_A f_X(x) dx.

Remark 1.9. For a continuous random variable X, there is also another way to express the expected value of powers of X:

(1.10)  E(Xⁿ) = ∫_R xⁿ f_X(x) dx.

This is just a specific case of

(1.11)  E(g(X)) = ∫_R g(x) f_X(x) dx,

where g is a measurable function.

2. Convergence

Definition 2.1. A sequence of cumulative distribution functions {F_n} is said to converge in distribution, or converge weakly, to the cumulative distribution function F, denoted F_n ⇒ F, if

(2.2)  lim_n F_n(x) = F(x)

for every continuity point x of F. If Q_n and Q are the corresponding distributions, then we may equivalently define Q_n ⇒ Q if for every set A = (−∞, x] with Q({x}) = 0,

lim_n Q_n(A) = Q(A).

Similarly, if X_n and X are the respective random variables corresponding to F_n and F, we write X_n ⇒ X, defined equivalently. Since distributions are just measures on some measurable space (S, S), which again is generally the reals, we have a similar understanding of convergence of measures rather than just distributions.
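Definition 2.1 can be seen in action with a simulation sketch (an illustration of ours, not part of the paper): taking X_n to be the standardized mean of n i.i.d. Uniform(0,1) variables, the empirical cdf approaches the standard normal cdf Φ at fixed points, which is exactly the convergence the central limit theorem asserts.

```python
import math
import random

random.seed(1)

def phi(x):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def standardized_mean(n):
    # Uniform(0,1) has mean 1/2 and variance 1/12, so the sum S_n
    # is standardized as (S_n - n/2) / sqrt(n/12).
    s = sum(random.random() for _ in range(n))
    return (s - n / 2.0) / math.sqrt(n / 12.0)

samples = [standardized_mean(30) for _ in range(20_000)]

# The empirical cdf of the standardized means should be close to
# phi at every continuity point; check a few fixed points.
for x in (-1.0, 0.0, 1.0):
    emp = sum(s <= x for s in samples) / len(samples)
    assert abs(emp - phi(x)) < 0.02
```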
The following theorem allows the representation of weakly convergent measures as the distributions of random variables defined on a common probability space.

Theorem 2.3. Suppose that μ_n and μ are probability measures on (R, R) and μ_n ⇒ μ. Then there exist random variables X_n and X on some (Ω, F, P) such that X_n, X have respective distributions μ_n, μ, and X_n(ω) → X(ω) for each ω ∈ Ω.

Proof. Take (Ω, F, P) to be the interval (0, 1) with its Borel subsets and Lebesgue measure. Denote the cumulative distribution functions associated with μ_n, μ by F_n, F, and put

X_n(ω) = inf{x : ω ≤ F_n(x)}  and  X(ω) = inf{x : ω ≤ F(x)}.

The set {x : ω ≤ F(x)} is closed on the left since F is right-continuous, as are all cumulative distribution functions, and therefore it is the set [X(ω), ∞). Hence ω ≤ F(x) if and only if X(ω) ≤ x, and P[ω : X(ω) ≤ x] = P[ω : ω ≤ F(x)] = F(x). Thus X has cumulative distribution function F; similarly, X_n has cumulative distribution function F_n.

To prove pointwise convergence, given ε > 0, choose x so that X(ω) − ε < x < X(ω) and μ({x}) = 0. Then F(x) < ω, and F_n(x) → F(x) implies that for large enough n, F_n(x) < ω, and therefore X(ω) − ε < x < X_n(ω). Thus

lim inf_n X_n(ω) ≥ X(ω).

Now for ω′ > ω, we may similarly choose y with X(ω′) < y < X(ω′) + ε and μ({y}) = 0; then ω < ω′ ≤ F(y), so for large enough n, ω ≤ F_n(y) and hence X_n(ω) ≤ y < X(ω′) + ε. Thus

lim sup_n X_n(ω) ≤ X(ω′).

Therefore, if X is continuous at ω, then X_n(ω) → X(ω). Since X is increasing on (0, 1), it has at most countably many discontinuities. For any point of discontinuity ω, redefine X_n(ω) = X(ω) = 0. Since the set of discontinuities has Lebesgue measure 0, the distributions remain unchanged. ∎

At the heart of many theorems in probability are the properties of convergence of distribution functions. We now come to several fundamental convergence theorems in probability, though in essence they are rehashings of conventional proofs from functional analysis. The first theorem essentially says that measurable maps preserve weak limits.

Theorem 2.4. Let h : R → R be measurable and let the set D_h of its discontinuities be measurable. If μ_n ⇒ μ as before and μ(D_h) = 0, then μ_n ∘ h⁻¹ ⇒ μ ∘ h⁻¹.

Proof. Using the random variables X_n, X defined in the previous proof, we see that h(X_n(ω)) → h(X(ω)) almost everywhere. Therefore h(X_n) ⇒ h(X), where such notation means the composition h ∘ X_n. For A ⊆ R, since

P[h(X) ∈ A] = P[X ∈ h⁻¹(A)] = μ(h⁻¹(A)),

h ∘ X has distribution μ ∘ h⁻¹; similarly, h ∘ X_n has distribution μ_n ∘ h⁻¹, again abusing notation for the composition. Thus h(X_n) ⇒ h(X) is equivalent to μ_n ∘ h⁻¹ ⇒ μ ∘ h⁻¹. ∎

Corollary 2.5. If X_n ⇒ X and P[X ∈ D_h] = 0, then h(X_n) ⇒ h(X).

Lemma 2.6. μ_n ⇒ μ if and only if for every bounded, continuous function f, ∫ f dμ_n → ∫ f dμ.

Proof. For the forward direction, by the same construction as in the proof of Theorem 2.3, we have f(X_n) → f(X) almost everywhere. By change of variables and the dominated convergence theorem,

∫ f dμ_n = E(f(X_n)) → E(f(X)) = ∫ f dμ.

Conversely, consider the cumulative distribution functions F_n, F associated with μ_n, μ and suppose x < y. Define the function f by f(t) = 1 for t ≤ x, f(t) = 0 for t ≥ y, and f(t) = (y − t)/(y − x) for x ≤ t ≤ y.
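The quantile construction X(ω) = inf{x : ω ≤ F(x)} used in the proof of Theorem 2.3 is also the basis of inverse-transform sampling. A simulation sketch of ours (the Exponential(1) distribution is an arbitrary choice whose quantile function has the closed form −log(1 − ω)):

```python
import math
import random

random.seed(2)

def F(x):
    """cdf of the Exponential(1) distribution."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def quantile(w):
    # X(w) = inf{x : w <= F(x)}; for Exponential(1) this has the
    # closed form -log(1 - w).
    return -math.log(1.0 - w)

# Feeding Uniform(0,1) draws (the omega of the proof, with Lebesgue
# measure on (0,1)) through the quantile function yields samples
# whose cdf is F, just as the theorem asserts.
n = 50_000
xs = [quantile(random.random()) for _ in range(n)]

t = 1.0
emp = sum(x <= t for x in xs) / n
assert abs(emp - F(t)) < 0.01
```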