1 Random Variables


Political Science 100a/200a, Fall 2001
Probability, part II [1]

1 Random variables

• Recall the idea of a variable and the types of variables: ...

• We will now redescribe these in probability-theory terms as random variables. Here is a technical/mathematical definition:

Defn: A random variable is a function that assigns a number to each point in a sample space S.

For social science purposes, a more intuitive definition is this: A random variable is a process or mechanism that assigns values of a variable to different cases.

• e.g.: Let S = {a list of countries}, with an arbitrary country in the sample space denoted s ∈ S.

  1. Then X(s) = 1990 per capita GDP of country s is a random variable. For instance, X(Ghana) = $902, X(France) = $13,904, etc.

  2. Likewise, Y(s) = 1 if country s had a civil war in the 1990s, and Y(s) = 0 if not, is a random variable.

  3. S = {list of survey respondents}, X(s) = party ID of respondent s.

• Typically, we will drop the notation X(s) and just write X and Y, leaving it implicit that this is a random variable that associates a number with each point in a sample space.

• Why "random"?

  - Because we will be thinking about "experiments" where we draw a point or collection of points from the sample space at random, or according to some probability distribution.

  - Because we may believe that the process that assigns a number like GDP (the value of the variable) to a case has a random or stochastic element.

  - Some statisticians, like Friedman, actually define a random variable as "a chance procedure for generating a number," by which he means some repeatable process that has a well-understood source of randomness and for which "objective" probabilities can in principle be computed.

[1] Notes by James D. Fearon, Dept. of Political Science, Stanford University, October 2001.
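The definition above — a random variable as a function from sample points to numbers — can be sketched directly in code. A minimal illustration, assuming a tiny made-up sample space (the Ghana and France GDP figures are from the text; Senegal's figure and all the civil-war codings are placeholders, not real data):

```python
# A random variable is a function from a sample space S to numbers.
# Tiny illustrative sample space: a (hypothetical) list of countries.
S = ["Ghana", "France", "Senegal"]

# X(s) = 1990 per capita GDP of country s.
# (Ghana and France figures are from the text; Senegal's is invented.)
gdp = {"Ghana": 902, "France": 13904, "Senegal": 1100}
def X(s):
    return gdp[s]

# Y(s) = 1 if country s had a civil war in the 1990s, 0 if not
# (codings here are placeholders, not actual history).
civil_war = {"Ghana": 0, "France": 0, "Senegal": 0}
def Y(s):
    return civil_war[s]

print(X("Ghana"), X("France"))  # 902 13904
print(Y("France"))              # 0
```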
  - For others (and most social scientists, I think) there is no necessary implication that the value of the variable (e.g., a country's per capita GDP) is actually produced by a stochastic process, although in fact this is often how we will think about it (treating it as if it were the result of a repeatable random process).

    * e.g.: an individual's propensity for suicide is assumed to be based on characteristics of the person (such as degree of social integration) plus "random noise."

    * e.g.: a particular measurement of the weight of an object is understood to be produced as equal to true weight + bias + noise.

• Observation: Things that are true of functions are also true, generally, of random variables.

  - Thus, if X(s) and Y(s) are random variables (defined on the same sample space), then so are X(s) + Y(s), X(s) − Y(s), X(s)Y(s), and X(s)/Y(s) (provided Y(s) ≠ 0 for all s).

  - In general: Thm: If φ(z) is a function and X(s) is a random variable, then φ(X(s)) is a random variable as well.

  - How are such combinations and compositions of two random variables formed? Case by case. For example, we form Z(s) = X(s) + Y(s) by adding the values of X(s) and Y(s) for each point s in the sample space.

• Important: Instead of working with a probability distribution or measure defined on the sample space S, it is often more convenient to work with the implied probability distribution on the different possible values of the random variable X(s).

• Let

    f(x) = P({s : X(s) = x}).

  Thus, f(x) is the probability that the random variable will take the value x. The set {s : X(s) = x} is the set of all points in the sample space (possible outcomes of the experiment) such that X(s) = x. (Note: We need to do it like this because there might be many such s ∈ S.)

• So f(x) is a probability distribution on possible outcomes of X(s), which need not be the same as the probability distribution on the sample space S.
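The two ideas above — forming Z(s) = X(s) + Y(s) case by case, and pushing the probability measure on S through a random variable to get f(x) = P({s : X(s) = x}) — can be sketched together. A minimal illustration (my example, not from the notes), using the roll of two fair dice as the sample space:

```python
from fractions import Fraction
from collections import defaultdict

# Sample space S: ordered pairs (first die, second die), all equally likely.
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = {s: Fraction(1, 36) for s in S}

# X(s) and Y(s) read off the two faces; Z(s) = X(s) + Y(s) is built case by case.
def X(s): return s[0]
def Y(s): return s[1]
def Z(s): return X(s) + Y(s)

# f(z) = P({s : Z(s) = z}): sum the probability of every sample point that Z maps to z.
f = defaultdict(Fraction)
for s in S:
    f[Z(s)] += P[s]

print(f[7])  # 1/6  (six of the 36 points sum to 7)
print(f[2])  # 1/36 (only (1, 1) sums to 2)
```

Note that f is a distribution on the values of Z, not on S itself: it has only eleven support points (2 through 12) even though S has 36.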
  - e.g.: Let X(s) be the number of heads that appear in two tosses of a fair coin, with S taken as {hh, ht, th, tt}. Then

      f(0) = P({tt}) = 1/4
      f(1) = P({ht, th}) = 1/2
      f(2) = P({hh}) = 1/4
      f(x) = 0 for all other x ≠ 0, 1, 2.

  - e.g.: Let X(s) be 1 for dyad years with a war and 0 for dyad years with no war, with S taken as a sample of dyad years. Then

      f(0) = P({s : s is a peaceful dyad year}) = .999
      f(1) = P({s : s is a dyad year at war}) = .001

• From now on, we will often work directly with random variables and the associated probability distributions on them as defined here, bypassing the underlying sample space.

2 Discrete random variables

Defn: A random variable X has a discrete probability distribution f(x) if X can take only a finite number of values (or a countably infinite number of values).

• e.g.: Let X be the number of heads in n = 10 tosses of a fair coin (thus X is a random variable). Then for x ∈ {0, 1, 2, ..., 10},

    f(x) = (10 choose x) (1/2^10),

  and for all other x, f(x) = 0. Plot with Stata ...

• Note that this is the same idea as for a histogram, which may take a more continuous variable, group it into categories, and display the probability associated with each of these categories.

• e.g.: Let X be the number of bills sent by Congress to the president in a given year. f(x) would then refer to the probability that x bills are sent.

Another way to describe the probability distribution of a random variable ...

Defn: If a random variable X has a discrete probability distribution given by f(x), then the cumulative distribution function (c.d.f.) of X is

    F(x) = P(X ≤ x).

• In words, F(x) is the probability that the random variable produces a value less than or equal to x.

• e.g.: Graph the c.d.f. of a random variable distributed by a binomial distribution for n = 2 and p = 1/2. ... e.g.: for n = 3 ...
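The "Plot with Stata" and c.d.f. exercises above can also be reproduced numerically. A sketch in Python rather than Stata, using only the fair-coin binomial formula f(x) = (n choose x) / 2^n:

```python
from math import comb

# pmf of X = number of heads in n tosses of a fair coin.
def f(x, n):
    return comb(n, x) / 2**n if 0 <= x <= n else 0.0

# c.d.f.: F(x) = P(X <= x), a step function summing the pmf up to x.
def F(x, n):
    return sum(f(k, n) for k in range(0, int(x) + 1)) if x >= 0 else 0.0

print([round(f(x, 10), 4) for x in range(11)])  # symmetric, peaks at x = 5
print([F(x, 2) for x in range(3)])              # [0.25, 0.75, 1.0]
```

The second line prints the c.d.f. for n = 2, p = 1/2: it jumps at 0, 1, and 2 and reaches 1 at the largest possible value, which is what the graph in the exercise should show.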
• Defn: If an experiment or process produces two random variables X and Y, then the joint probability distribution of X and Y may be written as

    f(x, y) = P(X = x & Y = y).

• e.g.: You roll two dice, letting X be the random variable referring to the result on the first die, and Y the value that shows on the second. What is f(x, y)?

• Recall the defn of independence, and an example where it does not hold (e.g., income and life expectancy) ...

• Draw 3D picture of a discrete joint density ...

3 Continuous random variables

• Whenever you work with a sample space S that is finite, you get a discrete distribution f(x). The samples we observe in practice are always finite and thus have discrete distributions.

• For theoretical reasons, though, it is extremely useful to be able to represent the idea of a sample space that is (uncountably) infinite.

• e.g.: Take as the sample space all the points in the unit interval, so that S = [0, 1], with typical element x ∈ [0, 1].

• Suppose further that we consider a probability measure such that every point in this interval is equally likely. This is called a uniform distribution on [0, 1], and is often denoted U[0, 1].

• How do we characterize this in terms of a probability distribution function like f(x) for a discrete distribution? A paradox: If there are an infinite number of possible outcomes in the sample space, and each is equally likely, then what is the probability of any particular point, e.g., s = .378? ... if 0, won't work; if positive, won't work ...

• Instead, we define a probability measure that assigns positive probability numbers only to sets of points in [0, 1], and in particular, only to what are called measurable sets of points in the sample space.

• e.g.: Intuitively, what would be the probability of drawing a point from the subinterval [0, 1/2] if all points are equally likely? What about from [1/4, 1/2]?

• In general, we will characterize the uniform probability distribution on [0, 1] in terms of a density function f(x).
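For the two-dice question above, the joint distribution can be written out exhaustively. The sketch below (my illustration, not from the notes) also checks the factorization property of independence, f(x, y) = fX(x) fY(y), which holds for the two dice but would fail for dependent variables like income and life expectancy:

```python
from fractions import Fraction

# Joint pmf for two fair dice: every pair (x, y) has probability 1/36.
f = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

# Marginal distributions of X and Y, obtained by summing out the other variable.
fX = {x: sum(f[(x, y)] for y in range(1, 7)) for x in range(1, 7)}
fY = {y: sum(f[(x, y)] for x in range(1, 7)) for y in range(1, 7)}

# Independence: the joint probability factors into the marginals at every point.
independent = all(f[(x, y)] == fX[x] * fY[y] for (x, y) in f)
print(fX[3], f[(3, 5)], independent)  # 1/6 1/36 True
```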
  This is related to but NOT the same thing as f(x) = P(X(s) = x) in the discrete case.

Defn: A density function f(x) for a continuous random variable X has the property that the area under f(x) for any interval [a, b] is equal to the probability of drawing an x ∈ [a, b].

• In the case of U[0, 1], f(x) = 1 for all x ∈ [0, 1].

• Show graphically ...

• Demonstrate that P([a, b]) equals the area under f(x) for [a, b].

• What about the probability of drawing x = .358? Or what about the probability of drawing x ∈ {.358, .989}? These sets are defined as having "measure zero" (i.e., zero probability). This is true for any collection of particular points, even if infinite.

• In calculus terms, a random variable X has a continuous probability distribution or continuous density if there exists a nonnegative function f(x) defined on the real line such that for any interval A ⊂ R,

    P(X ∈ A) = ∫_A f(x) dx.

  (For those who haven't had calculus, think of the integral sign as another form of summation operator.)
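The claim that P([a, b]) equals the area under f(x) can be checked numerically. A sketch (my illustration) that approximates the integral with a Riemann sum and cross-checks it against simulated U[0, 1] draws:

```python
import random

# Riemann-sum approximation to the integral of a density f over [a, b].
def prob_interval(a, b, f=lambda x: 1.0, n=100_000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# For U[0, 1] the density is f(x) = 1, so P([1/4, 1/2]) should be 1/4.
print(prob_interval(0.25, 0.5))  # ~0.25

# Monte Carlo cross-check: the fraction of uniform draws landing in [1/4, 1/2].
random.seed(0)
draws = [random.random() for _ in range(100_000)]
print(sum(0.25 <= x <= 0.5 for x in draws) / len(draws))  # close to 0.25
```

Note that any single point contributes zero area here: prob_interval(0.358, 0.358) is exactly 0, matching the "measure zero" discussion above.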