Chapter 4 Sufficient Statistics


4.1 Statistics and Sampling Distributions

Suppose that the data $Y_1, \ldots, Y_n$ is drawn from some population. The observed data is $Y_1 = y_1, \ldots, Y_n = y_n$ where $y_1, \ldots, y_n$ are numbers. Let $\boldsymbol{y} = (y_1, \ldots, y_n)$. Real valued functions $T(y_1, \ldots, y_n) = T(\boldsymbol{y})$ are of interest, as are vector valued functions $\boldsymbol{T}(\boldsymbol{y}) = (T_1(\boldsymbol{y}), \ldots, T_k(\boldsymbol{y}))$. Sometimes the data $\boldsymbol{Y}_1, \ldots, \boldsymbol{Y}_n$ are random vectors. Again interest is in functions of the data. Typically the data has a joint pdf or pmf $f(y_1, \ldots, y_n \mid \boldsymbol{\theta})$ where the vector of unknown parameters is $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)$. (In the joint pdf or pmf, the $y_1, \ldots, y_n$ are dummy variables, not the observed data.)

Definition 4.1. A statistic is a function of the data that does not depend on any unknown parameters. The probability distribution of the statistic is called the sampling distribution of the statistic.

Let the data $\boldsymbol{Y} = (Y_1, \ldots, Y_n)$ where the $Y_i$ are random variables. If $T(y_1, \ldots, y_n)$ is a real valued function whose domain includes the sample space $\mathcal{Y}$ of $\boldsymbol{Y}$, then $W = T(Y_1, \ldots, Y_n)$ is a statistic provided that $T$ does not depend on any unknown parameters. The data comes from some probability distribution, and the statistic is a random variable and hence also comes from some probability distribution. To avoid confusing the distribution of the statistic with the distribution of the data, the distribution of the statistic is called the sampling distribution of the statistic. If the observed data is $Y_1 = y_1, \ldots, Y_n = y_n$, then the observed value of the statistic is $W = w = T(y_1, \ldots, y_n)$. Similar remarks apply when the statistic $T$ is vector valued and when the data $\boldsymbol{Y}_1, \ldots, \boldsymbol{Y}_n$ are random vectors.

Often $Y_1, \ldots, Y_n$ will be iid and statistics of the form
$$\sum_{i=1}^n a_i Y_i \quad \text{and} \quad \sum_{i=1}^n t(Y_i)$$
are especially important. Chapter 10 and Theorems 2.17, 2.18, 3.6 and 3.7 are useful for finding the sampling distributions of some of these statistics when the $Y_i$ are iid from a given brand name distribution that is usually an exponential family. The following example lists some important statistics.

Example 4.1. Let $Y_1, \ldots, Y_n$ be the data.

a) The sample mean
$$\overline{Y} = \frac{\sum_{i=1}^n Y_i}{n}. \tag{4.1}$$

b) The sample variance
$$S^2 \equiv S_n^2 = \frac{\sum_{i=1}^n (Y_i - \overline{Y})^2}{n-1} = \frac{\sum_{i=1}^n Y_i^2 - n(\overline{Y})^2}{n-1}. \tag{4.2}$$

c) The sample standard deviation $S \equiv S_n = \sqrt{S_n^2}$.

d) If the data $Y_1, \ldots, Y_n$ is arranged in ascending order from smallest to largest and written as $Y_{(1)} \le \cdots \le Y_{(n)}$, then $Y_{(i)}$ is the $i$th order statistic and the $Y_{(i)}$'s are called the order statistics.

e) The sample median
$$\text{MED}(n) = Y_{((n+1)/2)} \text{ if } n \text{ is odd}, \qquad \text{MED}(n) = \frac{Y_{(n/2)} + Y_{((n/2)+1)}}{2} \text{ if } n \text{ is even}. \tag{4.3}$$

f) The sample median absolute deviation or median deviation is
$$\text{MAD}(n) = \text{MED}(|Y_i - \text{MED}(n)|,\ i = 1, \ldots, n). \tag{4.4}$$

g) The sample maximum
$$\max(n) = Y_{(n)} \tag{4.5}$$
and the observed max $y_{(n)}$ is the largest value of the observed data.

h) The sample minimum
$$\min(n) = Y_{(1)} \tag{4.6}$$
and the observed min $y_{(1)}$ is the smallest value of the observed data.

Example 4.2. Usually the term "observed" is dropped. Hence below "data" is "observed data", "observed order statistics" is "order statistics" and "observed value of MED(n)" is "MED(n)." Let the data be 9, 2, 7, 4, 1, 6, 3, 8, 5 (so $Y_1 = y_1 = 9, \ldots, Y_9 = y_9 = 5$). Then the order statistics are 1, 2, 3, 4, 5, 6, 7, 8, 9. Then MED(n) = 5 and MAD(n) = 2 = MED$\{0, 1, 1, 2, 2, 3, 3, 4, 4\}$.

Example 4.3. Let $Y_1, \ldots, Y_n$ be iid $N(\mu, \sigma^2)$. Then
$$T_n = \frac{\sum_{i=1}^n (Y_i - \mu)^2}{n}$$
is a statistic iff $\mu$ is known.
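The statistics of Examples 4.1 and 4.2 are easy to check numerically. The following sketch, which assumes NumPy is available (the variable names are illustrative, not from the text), computes them for the data of Example 4.2.

```python
import numpy as np

y = np.array([9, 2, 7, 4, 1, 6, 3, 8, 5])   # data of Example 4.2

ybar = y.mean()                    # sample mean (4.1): 5.0
s2 = y.var(ddof=1)                 # sample variance (4.2), divisor n - 1: 7.5
s = np.sqrt(s2)                    # sample standard deviation
order_stats = np.sort(y)           # Y_(1) <= ... <= Y_(n): 1, 2, ..., 9
med = np.median(y)                 # MED(n) = 5 (n = 9 is odd, so Y_(5))
mad = np.median(np.abs(y - med))   # MAD(n) = MED(|Y_i - MED(n)|) = 2
ymin, ymax = y.min(), y.max()      # min(n) = 1, max(n) = 9

print(ybar, s2, med, mad, ymin, ymax)
```

The printed values agree with Example 4.2: MED(n) = 5 and MAD(n) = 2.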
The following theorem is extremely important and the proof follows Rice (1988, p. 171-173) closely.

Theorem 4.1. Let $Y_1, \ldots, Y_n$ be iid $N(\mu, \sigma^2)$.
a) The sample mean $\overline{Y} \sim N(\mu, \sigma^2/n)$.
b) $\overline{Y}$ and $S^2$ are independent.
c) $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$. Hence $\sum_{i=1}^n (Y_i - \overline{Y})^2 \sim \sigma^2 \chi^2_{n-1}$.

Proof. a) follows from Theorem 2.17e.

b) The moment generating function of $(\overline{Y}, Y_1 - \overline{Y}, \ldots, Y_n - \overline{Y})$ is
$$m(s, t_1, \ldots, t_n) = E(\exp[s\overline{Y} + t_1(Y_1 - \overline{Y}) + \cdots + t_n(Y_n - \overline{Y})]).$$
By Theorem 2.22, $\overline{Y}$ and $(Y_1 - \overline{Y}, \ldots, Y_n - \overline{Y})$ are independent if
$$m(s, t_1, \ldots, t_n) = m_{\overline{Y}}(s)\, m(t_1, \ldots, t_n)$$
where $m_{\overline{Y}}(s)$ is the mgf of $\overline{Y}$ and $m(t_1, \ldots, t_n)$ is the mgf of $(Y_1 - \overline{Y}, \ldots, Y_n - \overline{Y})$. Now
$$\sum_{i=1}^n t_i(Y_i - \overline{Y}) = \sum_{i=1}^n t_i Y_i - \overline{Y}\, n \overline{t} = \sum_{i=1}^n t_i Y_i - \sum_{i=1}^n \overline{t}\, Y_i$$
and thus
$$s\overline{Y} + \sum_{i=1}^n t_i(Y_i - \overline{Y}) = \sum_{i=1}^n \left[\frac{s}{n} + (t_i - \overline{t})\right] Y_i = \sum_{i=1}^n a_i Y_i.$$
Now $\sum_{i=1}^n a_i = \sum_{i=1}^n [\frac{s}{n} + (t_i - \overline{t})] = s$ and
$$\sum_{i=1}^n a_i^2 = \sum_{i=1}^n \left[\frac{s^2}{n^2} + 2\frac{s}{n}(t_i - \overline{t}) + (t_i - \overline{t})^2\right] = \frac{s^2}{n} + \sum_{i=1}^n (t_i - \overline{t})^2.$$
Hence
$$m(s, t_1, \ldots, t_n) = E\left(\exp\left[s\overline{Y} + \sum_{i=1}^n t_i(Y_i - \overline{Y})\right]\right) = E\left[\exp\left(\sum_{i=1}^n a_i Y_i\right)\right] = m_{Y_1,\ldots,Y_n}(a_1, \ldots, a_n) = \prod_{i=1}^n m_{Y_i}(a_i)$$
since the $Y_i$ are independent. Now
$$\prod_{i=1}^n m_{Y_i}(a_i) = \prod_{i=1}^n \exp\left(\mu a_i + \frac{\sigma^2}{2} a_i^2\right) = \exp\left(\mu \sum_{i=1}^n a_i + \frac{\sigma^2}{2} \sum_{i=1}^n a_i^2\right)$$
$$= \exp\left[\mu s + \frac{\sigma^2}{2}\frac{s^2}{n} + \frac{\sigma^2}{2} \sum_{i=1}^n (t_i - \overline{t})^2\right] = \exp\left[\mu s + \frac{\sigma^2}{2n} s^2\right] \exp\left[\frac{\sigma^2}{2} \sum_{i=1}^n (t_i - \overline{t})^2\right].$$
Now the first factor is the mgf of $\overline{Y}$ and the second factor is $m(t_1, \ldots, t_n) = m(0, t_1, \ldots, t_n)$ since the mgf of the marginal is found from the mgf of the joint distribution by setting all terms not in the marginal to 0 (ie set $s = 0$ in $m(s, t_1, \ldots, t_n)$ to find $m(t_1, \ldots, t_n)$). Hence the mgf factors and $\overline{Y} \perp (Y_1 - \overline{Y}, \ldots, Y_n - \overline{Y})$. Since $S^2$ is a function of $(Y_1 - \overline{Y}, \ldots, Y_n - \overline{Y})$, it is also true that $\overline{Y} \perp S^2$.

c) $(Y_i - \mu)/\sigma \sim N(0, 1)$ so $(Y_i - \mu)^2/\sigma^2 \sim \chi^2_1$ and
$$\frac{1}{\sigma^2} \sum_{i=1}^n (Y_i - \mu)^2 \sim \chi^2_n.$$
Now
$$\sum_{i=1}^n (Y_i - \mu)^2 = \sum_{i=1}^n (Y_i - \overline{Y} + \overline{Y} - \mu)^2 = \sum_{i=1}^n (Y_i - \overline{Y})^2 + n(\overline{Y} - \mu)^2.$$
Hence
$$W = \frac{1}{\sigma^2} \sum_{i=1}^n (Y_i - \mu)^2 = \frac{1}{\sigma^2} \sum_{i=1}^n (Y_i - \overline{Y})^2 + \left(\frac{\overline{Y} - \mu}{\sigma/\sqrt{n}}\right)^2 = U + V.$$
Since $U \perp V$ by b), $m_W(t) = m_U(t)\, m_V(t)$. Since $W \sim \chi^2_n$ and $V \sim \chi^2_1$,
$$m_U(t) = \frac{m_W(t)}{m_V(t)} = \frac{(1 - 2t)^{-n/2}}{(1 - 2t)^{-1/2}} = (1 - 2t)^{-(n-1)/2}$$
which is the mgf of a $\chi^2_{n-1}$ distribution. QED

Theorem 4.2. Let $Y_1, \ldots, Y_n$ be iid with cdf $F_Y$ and pdf $f_Y$.
a) The pdf of $T = Y_{(n)}$ is
$$f_{Y_{(n)}}(t) = n[F_Y(t)]^{n-1} f_Y(t).$$
b) The pdf of $T = Y_{(1)}$ is
$$f_{Y_{(1)}}(t) = n[1 - F_Y(t)]^{n-1} f_Y(t).$$
c) Let $2 \le r \le n$. Then the joint pdf of $Y_{(1)}, Y_{(2)}, \ldots, Y_{(r)}$ is
$$f_{Y_{(1)},\ldots,Y_{(r)}}(t_1, \ldots, t_r) = \frac{n!}{(n-r)!} [1 - F_Y(t_r)]^{n-r} \prod_{i=1}^r f_Y(t_i).$$

Proof of a) and b). a) The cdf of $Y_{(n)}$ is
$$F_{Y_{(n)}}(t) = P(Y_{(n)} \le t) = P(Y_1 \le t, \ldots, Y_n \le t) = \prod_{i=1}^n P(Y_i \le t) = [F_Y(t)]^n.$$
Hence the pdf of $Y_{(n)}$ is
$$\frac{d}{dt} F_{Y_{(n)}}(t) = \frac{d}{dt} [F_Y(t)]^n = n[F_Y(t)]^{n-1} f_Y(t).$$
b) The cdf of $Y_{(1)}$ is
$$F_{Y_{(1)}}(t) = P(Y_{(1)} \le t) = 1 - P(Y_{(1)} > t) = 1 - P(Y_1 > t, \ldots, Y_n > t) = 1 - \prod_{i=1}^n P(Y_i > t) = 1 - [1 - F_Y(t)]^n.$$
Hence the pdf of $Y_{(1)}$ is
$$\frac{d}{dt} F_{Y_{(1)}}(t) = \frac{d}{dt} \left(1 - [1 - F_Y(t)]^n\right) = n[1 - F_Y(t)]^{n-1} f_Y(t). \quad \text{QED}$$

To see that c) may be true, consider the following argument adapted from Mann, Schafer and Singpurwalla (1974, p. 93). Let $\Delta t_i$ be a small positive number and notice that
$$P(E) \equiv P(t_1 < Y_{(1)} < t_1 + \Delta t_1,\ t_2 < Y_{(2)} < t_2 + \Delta t_2, \ldots, t_r < Y_{(r)} < t_r + \Delta t_r)$$
$$= \int_{t_r}^{t_r + \Delta t_r} \cdots \int_{t_1}^{t_1 + \Delta t_1} f_{Y_{(1)},\ldots,Y_{(r)}}(w_1, \ldots, w_r)\, dw_1 \cdots dw_r \approx f_{Y_{(1)},\ldots,Y_{(r)}}(t_1, \ldots, t_r) \prod_{i=1}^r \Delta t_i.$$
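A quick simulation makes Theorem 4.2 a) and b) concrete. The sketch below assumes NumPy and uses an Exponential(1) sample purely as an illustrative choice (so $F_Y(t) = 1 - e^{-t}$); the sample size, replication count, evaluation point, and seed are also illustrative. It compares the empirical cdfs of $Y_{(n)}$ and $Y_{(1)}$ with the closed forms $[F_Y(t)]^n$ and $1 - [1 - F_Y(t)]^n$ derived in the proof.

```python
import numpy as np

# Monte Carlo check of Theorem 4.2 a), b) for Y_1,...,Y_n iid Exponential(1),
# where F_Y(t) = 1 - exp(-t).
rng = np.random.default_rng(0)
n, reps, t = 5, 200_000, 2.0

samples = rng.exponential(scale=1.0, size=(reps, n))
y_max = samples.max(axis=1)   # replications of Y_(n)
y_min = samples.min(axis=1)   # replications of Y_(1)

F = 1 - np.exp(-t)            # F_Y(t)
print(np.mean(y_max <= t), F**n)            # empirical vs [F_Y(t)]^n
print(np.mean(y_min <= t), 1 - (1 - F)**n)  # empirical vs 1 - [1 - F_Y(t)]^n
```

Both empirical proportions should fall within Monte Carlo error of the theoretical cdf values; differentiating those cdfs in $t$ gives the pdfs stated in parts a) and b) of the theorem.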