Lecture 7, Review: Data Reduction. 1. Sufficient Statistics


We should think about the following questions carefully before the "simplification" process:

• Is there any loss of information due to summarization?
• How do we compare the amount of information about θ in the original data X and in T(X)?
• Is it sufficient to consider only the "reduced data" T?

1. Sufficient Statistics

A statistic T is called sufficient if the conditional distribution of X given T is free of θ (that is, the conditional distribution is a completely known distribution).

Example. Toss a coin n times, where the probability of a head is an unknown parameter θ. Let T = the total number of heads. Is T sufficient for θ?

Sufficiency Principle

If T is sufficient, the "extra information" carried by X is worthless as far as θ is concerned. It is then only natural to consider inference procedures which do not use this extra irrelevant information. This leads to the Sufficiency Principle: any inference procedure should depend on the data only through sufficient statistics.

Definition: Sufficient Statistic (in terms of conditional probability, discrete case). If for any x and t the conditional pmf of X given T,

    P(X = x | T(X) = t) = P(X = x, T(X) = t) / P(T(X) = t) = P(X = x) / P(T(X) = t)

(the last equality holds because {X = x} ⊆ {T(X) = t} when T(x) = t), does not depend on θ, then we say T(X) is a sufficient statistic for θ.

Sufficient Statistic, the general definition (for both discrete and continuous variables): Let the pdf of the data X be f(x; θ) and the pdf of T be q(t; θ). If

    f(x; θ) / q(T(x); θ) is free of θ (it may depend on x)    (∗)

for all x and θ, then T is a sufficient statistic for θ.

Example. Toss a coin n times, where the probability of a head is an unknown parameter θ. Let T = the total number of heads. Is T sufficient for θ?
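For the coin example, the defining property can be checked directly: whatever θ is, the conditional probability of any particular head/tail sequence given its total is the same. The short Python sketch below (the helper name `conditional_prob` is ours, purely for illustration) computes P(X = x | T = t) from the formulas above:

```python
from math import comb

def conditional_prob(x, theta):
    """P(X = x | T = t) for an iid Bernoulli(theta) sequence x, where t = sum(x)."""
    n, t = len(x), sum(x)
    joint = theta**t * (1 - theta)**(n - t)                  # P(X = x); note T(x) = t
    marginal = comb(n, t) * theta**t * (1 - theta)**(n - t)  # P(T = t), Binomial(n, theta)
    return joint / marginal

# The conditional probability is 1/C(n, t), no matter what theta is:
for theta in (0.2, 0.5, 0.9):
    print(theta, conditional_prob((1, 0, 1, 1, 0), theta))  # always 0.1 = 1/C(5, 3)
```

The θ-dependent factors cancel between numerator and denominator, which is exactly why the conditional distribution carries no information about θ.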
The X_i are i.i.d. Bernoulli(θ):

    f(x) = θ^x (1 − θ)^(1−x),  x = 0, 1,

so the joint pmf is

    f(x; θ) = f(x_1, …, x_n) = θ^(Σ x_i) (1 − θ)^(n − Σ x_i).

Since T = Σ_{i=1}^n X_i ~ Binomial(n, θ),

    q(t; θ) = C(n, t) θ^t (1 − θ)^(n−t),

where C(n, t) is the binomial coefficient. Thus

    f(x; θ) / q(T(x); θ) = 1 / C(n, t)

is free of θ. So by the definition, Σ_{i=1}^n X_i is a sufficient statistic for θ.

Example. X_1, …, X_n iid N(θ, 1), T = X̄.

Remarks: The definition (∗) is not always easy to apply:

• One needs to guess the form of a sufficient statistic.
• One needs to figure out the distribution of T.

How do we find a sufficient statistic?

2. (Neyman-Fisher) Factorization Theorem

T is sufficient if and only if f(x; θ) can be written as the product g(T(x); θ) h(x), where the first factor depends on x only through T(x) and the second factor is free of θ.

Example. Binomial, iid Bin(1, θ).

Solution 1: Bernoulli: f(x) = θ^x (1 − θ)^(1−x), x = 0, 1, so

    f(x; θ) = f(x_1, …, x_n) = θ^(Σ x_i) (1 − θ)^(n − Σ x_i)
            = [θ^(Σ x_i) (1 − θ)^(n − Σ x_i)] · [1]
            = g(Σ x_i, θ) · h(x_1, …, x_n).

So according to the factorization theorem, T = Σ X_i is a sufficient statistic for θ.

Solution 2:

    f(x; θ) = f(x_1, …, x_n | θ)
            = θ^(Σ x_i) (1 − θ)^(n − Σ x_i)  if x_i = 0, 1 for i = 1, …, n;  0 otherwise
            = θ^(Σ x_i) (1 − θ)^(n − Σ x_i) h(x_1, …, x_n)
            = g(t, θ) h(x_1, …, x_n),

where g(t, θ) = θ^t (1 − θ)^(n−t) and

    h(x_1, …, x_n) = 1 if x_i = 0, 1 for i = 1, …, n;  0 otherwise.

Hence T is a sufficient statistic for θ.

Example. Exp(λ). Let X_1, …, X_n be a random sample from an exponential distribution with rate λ, let T = X_1 + X_2 + ⋯ + X_n, and let f be the joint density of X_1, …, X_n. Then

    f(x; λ) = f(x_1, …, x_n | λ)
            = λ^n e^(−λ Σ x_i)  if x_i > 0 for i = 1, …, n;  0 otherwise
            = λ^n e^(−λ Σ x_i) h(x_1, …, x_n)
            = g(t, λ) h(x_1, …, x_n),

where g(t, λ) = λ^n e^(−λt) and

    h(x_1, …, x_n) = 1 if x_i > 0 for i = 1, …, n;  0 otherwise.

Hence T is a sufficient statistic for λ.

Example. Normal, iid N(θ, 1). Please derive the sufficient statistic for θ by yourself.

When the range of X depends on θ, one should be more careful about the factorization: indicator functions must be used explicitly.

Example. Uniform, iid U(0, θ).

Solution 1: Let X_1, …, X_n be a random sample from the uniform distribution on (0, θ).
Let T = X_(n) (the sample maximum) and let f be the joint density of X_1, …, X_n. Then

    f(x; θ) = f(x_1, …, x_n | θ)
            = 1/θ^n  if θ > x_i > 0 for i = 1, …, n;  0 otherwise
            = 1/θ^n  if θ > x_(n) ≥ ⋯ ≥ x_(1) > 0;  0 otherwise
            = g(t, θ) h(x_1, …, x_n),

where

    g(t, θ) = 1/θ^n if θ > t > 0;  0 otherwise,

and

    h(x_1, …, x_n) = 1 if x_(1) > 0;  0 otherwise.

Hence T is a sufficient statistic for θ.

*** I personally prefer this approach because it is the most straightforward. Alternatively, one can use the indicator function and simplify the solution, as illustrated next.

Definition: Indicator function

    I_A(x) = 1 if x ∈ A;  0 if x ∉ A.

Solution 2 (in terms of the indicator function): Uniform: f(x) = 1/θ, x ∈ (0, θ), so

    f(x; θ) = f(x_1, …, x_n) = 1/θ^n,  x_i ∈ (0, θ) for all i
            = (1/θ^n) ∏ I_(0,θ)(x_i)
            = (1/θ^n) I_(0,θ)(x_(n)) · I_(0,∞)(x_(1))
            = [(1/θ^n) I_(0,θ)(x_(n))] · [I_(0,∞)(x_(1))]
            = g(x_(n), θ) · h(x_1, …, x_n).

So by the factorization theorem, X_(n) is a sufficient statistic for θ.

Example: Derive the sufficient statistics for θ, given a random sample of size n from U(θ, θ + 1).

Solution:

1. Indicator function approach: Uniform: f(x) = 1, x ∈ (θ, θ + 1), so

    f(x_1, …, x_n | θ) = 1,  x_i ∈ (θ, θ + 1) for all i
                       = ∏ I_(θ,θ+1)(x_i)
                       = I_(θ,∞)(x_(1)) · I_(−∞,θ+1)(x_(n))
                       = [I_(θ,∞)(x_(1)) · I_(−∞,θ+1)(x_(n))] · [1]
                       = g(x_(1), x_(n), θ) · h(x_1, …, x_n).

So T = (X_(1), X_(n)) is a sufficient statistic for θ.

2. Without the indicator function:

    f(x_1, …, x_n | θ) = 1  if θ + 1 > x_i > θ for i = 1, …, n;  0 otherwise
                       = 1  if θ + 1 > x_(n) ≥ ⋯ ≥ x_(1) > θ;  0 otherwise
                       = g(x_(1), x_(n), θ) h(x_1, …, x_n),

where

    g(x_(1), x_(n), θ) = 1 if θ + 1 > x_(n) and x_(1) > θ;  0 otherwise,

and h(x_1, …, x_n) = 1. So T = (X_(1), X_(n)) is a sufficient statistic for θ.

Two-Dimensional Examples

Example. Normal, iid N(μ, σ²), θ = (μ, σ²) (both unknown). Let X_1, …, X_n be a random sample from N(μ, σ²), let

    X̄ = (1/n) Σ_{i=1}^n X_i,   S² = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)²,

and let f be the joint density of X_1, …, X_n. Then
    f(x; θ) = f(x_1, …, x_n | μ, σ²) = ∏_{i=1}^n (2πσ²)^(−1/2) e^(−(x_i − μ)²/(2σ²))
            = (2πσ²)^(−n/2) exp( −(1/(2σ²)) Σ (x_i − μ)² ).

Now

    Σ (x_i − μ)² = Σ (x_i − x̄ + x̄ − μ)²
                 = Σ (x_i − x̄)² + 2 Σ (x_i − x̄)(x̄ − μ) + Σ (x̄ − μ)²
                 = (n − 1)s² + 2(x̄ − μ) Σ (x_i − x̄) + n(x̄ − μ)²
                 = (n − 1)s² + n(x̄ − μ)²,

since Σ (x_i − x̄) = 0. Thus

    f(x; θ) = f(x_1, …, x_n | μ, σ²)
            = (2πσ²)^(−n/2) exp( −(1/(2σ²)) [(n − 1)s² + n(x̄ − μ)²] )
            = g(x̄, s², μ, σ²) h(x_1, …, x_n),

where

    g(x̄, s², μ, σ²) = (2πσ²)^(−n/2) exp( −(1/(2σ²)) [(n − 1)s² + n(x̄ − μ)²] )

and h(x_1, …, x_n) = 1. In this case we say (X̄, S²) is sufficient for (μ, σ²).

3. (Regular) Exponential Family

The density function of a regular exponential family is:

    f(x; θ) = c(θ) h(x) exp( Σ_{j=1}^k w_j(θ) t_j(x) ),   θ = (θ_1, …, θ_k).

Example. Poisson(θ):

    f(x; θ) = (θ^x / x!) e^(−θ) = (1/x!) exp(−θ) exp(ln(θ) · x).

Example. Normal N(μ, σ²), θ = (μ, σ²) (both unknown):

    f(x; μ, σ²) = (2πσ²)^(−1/2) e^(−(x − μ)²/(2σ²))
                = (2πσ²)^(−1/2) exp( −(x² − 2xμ + μ²)/(2σ²) )
                = (2πσ²)^(−1/2) exp( −μ²/(2σ²) ) exp( −(x² − 2xμ)/(2σ²) ).

4. Theorem (Exponential Family & Sufficient Statistic)

Let X_1, …, X_n be a random sample from a regular exponential family. Then

    T(X) = ( Σ_{i=1}^n t_1(X_i), …, Σ_{i=1}^n t_k(X_i) )

is sufficient for θ = (θ_1, …, θ_k).

Example. Poisson(θ). Let X_1, …, X_n be a random sample from Poisson(θ). Then T(X) = Σ_{i=1}^n X_i is sufficient for θ.

Example. Normal N(μ, σ²), θ = (μ, σ²) (both unknown). Let X_1, …, X_n be a random sample from N(μ, σ²). Then T(X) = ( Σ_{i=1}^n X_i, Σ_{i=1}^n X_i² ) is sufficient for θ = (μ, σ²).

Exercise. Apply the general exponential family result to all the standard families discussed above, such as binomial, Poisson, normal, exponential, and gamma.

A Non-Exponential-Family Example. Discrete uniform: P(X = x) = 1/θ, x = 1, …, θ, where θ is a positive integer.

Another Non-Exponential Example. X_1, …, X_n iid U(0, θ), T = X_(n).

Universal Cases. X_1, …, X_n are iid with density f.

• The original data X_1, …, X_n are always sufficient for θ. (They form a trivial statistic, since they do not lead to any data reduction.)
• The order statistics T = (X_(1), …, X_(n)) are always sufficient for θ. (The dimension of the order statistics is n, the same as the dimension of the data. Still, this is a nontrivial reduction, as n! different values of the data correspond to one value of T.)
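The algebraic identity Σ (x_i − μ)² = (n − 1)s² + n(x̄ − μ)² used in the normal example is easy to verify numerically. A small Python sketch (the data and the value of μ are arbitrary, chosen only for illustration):

```python
import random
import statistics

random.seed(1)
x = [random.gauss(0, 1) for _ in range(20)]  # an arbitrary sample
n = len(x)
mu = 0.7                                     # an arbitrary value of the mean parameter

xbar = statistics.fmean(x)
s2 = statistics.variance(x)                  # sample variance with the 1/(n-1) convention

lhs = sum((xi - mu)**2 for xi in x)          # sum of squared deviations about mu
rhs = (n - 1) * s2 + n * (xbar - mu)**2      # the decomposed form
print(abs(lhs - rhs))                        # agrees up to floating-point rounding
```

The point of the decomposition is that the joint density depends on the data only through (x̄, s²), which is exactly what the factorization theorem needs.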
5. Theorem (Rao-Blackwell)

Let X_1, …, X_n be a random sample from the population with pdf f(x; θ). Let T(X) be a sufficient statistic for θ, and let U(X) be any unbiased estimator of θ. Define U*(X) = E[U(X) | T]. Then:

(1) U*(X) is an unbiased estimator of θ,
(2) U*(X) is a function of T, and
(3) Var(U*) ≤ Var(U) for every θ, with Var(U*) < Var(U) for some θ unless U* = U with probability 1.
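To see the theorem in action, take Bernoulli(θ) data with the crude unbiased estimator U = X_1 and the sufficient statistic T = Σ X_i; by symmetry E[X_1 | T] = T/n = X̄, so the Rao-Blackwellized estimator is the sample mean. The following simulation is an illustrative sketch (the parameter values are ours), showing the variance drop from roughly θ(1 − θ) down to θ(1 − θ)/n:

```python
import random

random.seed(0)
theta, n, reps = 0.3, 10, 20000

u_vals, ustar_vals = [], []
for _ in range(reps):
    x = [1 if random.random() < theta else 0 for _ in range(n)]
    u_vals.append(x[0])            # U = X_1: unbiased but wasteful
    ustar_vals.append(sum(x) / n)  # U* = E[X_1 | T] = T/n, the Rao-Blackwellized estimator

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((vi - m)**2 for vi in v) / len(v)

print(mean(u_vals), mean(ustar_vals))  # both close to theta = 0.3 (unbiasedness, part (1))
print(var(u_vals), var(ustar_vals))    # ~theta(1-theta) vs ~theta(1-theta)/n (part (3))
```

Both estimators are unbiased, but conditioning on the sufficient statistic cuts the variance by a factor of about n, which is the practical content of the theorem.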