Advanced Statistical Theory I Spring 2010

Prof. J. Jin∗

Chris Almost†

Contents

1 Exponential Families
  1.1 Introduction and examples
  1.2 Moments
  1.3 Stein's identity

2 Statistics
  2.1 Sufficient statistics
  2.2 Ancillary and complete statistics

3 Unbiased Estimators
  3.1 Uniform minimum variance unbiased estimators
  3.2 Power series distributions
  3.3 Non-parametric UMVUE

4 Lower Bounds on Risk
  4.1 Hellinger distance
  4.2 Fisher's information
  4.3 The Cramér-Rao lower bound

5 Decision Theory
  5.1 Game theory: the finite case
  5.2 Minimax rules, Bayes rules, and admissibility
  5.3 Conjugate priors
  5.4 Inadmissibility

[email protected][email protected]


6 Hypothesis Testing
  6.1 Simple tests: the Neyman-Pearson lemma
  6.2 Complex tests
  6.3 UMPU tests
  6.4 Nuisance parameters

7 Maximum Likelihood Estimators
  7.1 Kullback-Leibler information
  7.2 Examples
  7.3 Method of Moments

Index

1 Exponential Families

1.1 Introduction and examples

1.1.1 Definition. A family of distributions is said to form an exponential family if the distribution $P_\theta$ (the $\theta$ may be multi-dimensional) has density of the form
\[
p_\theta(x) = \exp\Big(\sum_{i=1}^s \eta_i(\theta) T_i(x) - B(\theta)\Big) h(x).
\]
This decomposition is not unique, but $B$ is determined uniquely up to an additive constant (a constant multiple can be absorbed into $h$). Writing $\eta_i = \eta_i(\theta)$, the natural form is
\[
p(x \mid \eta) = \exp\Big(\sum_{i=1}^s \eta_i T_i(x) - A(\eta)\Big) h(x),
\]
where $A(\eta) = B(\theta)$.

1.1.2 Examples.
(i) The family of normal distributions has parameters $(\xi, \sigma^2)$ and density functions
\[
p_\theta(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big({-\frac{(x-\xi)^2}{2\sigma^2}}\Big)
= \exp\Big(\frac{\xi}{\sigma^2}x - \frac{1}{2\sigma^2}x^2 - \frac{\xi^2}{2\sigma^2} - \log\sqrt{2\pi\sigma^2}\Big),
\]
so
\[
T_1(x) = x, \quad T_2(x) = x^2, \quad
B(\xi,\sigma^2) = \frac{\xi^2}{2\sigma^2} + \log\sqrt{2\pi\sigma^2}, \quad
\eta_1 = \frac{\xi}{\sigma^2}, \quad \eta_2 = -\frac{1}{2\sigma^2}, \quad
A(\eta) = -\frac{\eta_1^2}{4\eta_2} + \log\sqrt{\frac{\pi}{-\eta_2}}.
\]
(ii) For the Poisson distribution,
\[
p_\theta(x) = \frac{e^{-\theta}\theta^x}{x!} = \exp\big(x\log\theta - \theta\big)\frac{1}{x!},
\]
so $T(x) = x$, $h(x) = 1/x!$, $B(\theta) = \theta$, $\eta(\theta) = \log\theta$, and $A(\eta) = e^\eta$.

(iii) For the binomial distribution,
\[
p_\theta(x) = \binom{n}{x}\theta^x(1-\theta)^{n-x}
= \exp\Big(x\log\frac{\theta}{1-\theta} + n\log(1-\theta)\Big)\binom{n}{x},
\]
so $T(x) = x$, $h(x) = \binom{n}{x}$, $B(\theta) = -n\log(1-\theta)$, $\eta(\theta) = \log\frac{\theta}{1-\theta}$, and $A(\eta) = n\log(1+e^{\eta})$.
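The following minimal sketch (not part of the original notes; it assumes numpy and scipy are available) checks numerically that the exponential-family forms above reproduce the standard Poisson and binomial pmfs.

```python
import numpy as np
from math import factorial
from scipy.stats import poisson, binom
from scipy.special import comb

# Poisson in exponential-family form: eta = log(theta), A(eta) = e^eta, h(x) = 1/x!
def poisson_expfam(x, theta):
    eta = np.log(theta)
    return np.exp(eta * x - np.exp(eta)) / factorial(x)

# Binomial in exponential-family form: eta = log(theta/(1-theta)), A(eta) = n*log(1+e^eta)
def binom_expfam(x, n, theta):
    eta = np.log(theta / (1 - theta))
    return np.exp(eta * x - n * np.log1p(np.exp(eta))) * comb(n, x)

theta, n, p = 2.5, 10, 0.3
for x in range(5):
    assert np.isclose(poisson_expfam(x, theta), poisson.pmf(x, theta))
    assert np.isclose(binom_expfam(x, n, p), binom.pmf(x, n, p))
print("exponential-family forms reproduce the standard pmfs")
```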

Suppose we have $n$ independent samples $X_1, \dots, X_n$ from $P_\theta$ in an exponential family. Individually, the density functions are $p_\theta(x_k)$, so the joint density function is
\[
\prod_{k=1}^n p_\theta(x_k)
= \exp\Big(\sum_{i=1}^s \eta_i(\theta)\sum_{k=1}^n T_i(x_k) - nA(\theta)\Big)\, h(x_1)\cdots h(x_n)
=: \exp\Big(n\sum_{i=1}^s \eta_i \tilde T_i(x) - nA(\theta)\Big)\, h(x_1)\cdots h(x_n),
\]
where $\tilde T_i(x) := \frac{1}{n}\sum_{k=1}^n T_i(x_k)$.

1.1.3 Example. If we have $n$ independent samples from a $N(\xi, \sigma^2)$ distribution then $\tilde T_1(x) = \bar x$ and $\tilde T_2(x) = \frac{1}{n}\sum_{k=1}^n x_k^2$, so the joint density is
\[
\exp\Big(\frac{n\xi}{\sigma^2}\bar x - \frac{n}{2\sigma^2}\tilde T_2 - \frac{n\xi^2}{2\sigma^2} - n\log\sqrt{2\pi\sigma^2}\Big).
\]

1.2 Moments

Moments are important in large deviation theory. Suppose we have samples $X_1, \dots, X_n$ from a distribution with first two moments $\mu$ and $\sigma^2$. Then by the central limit theorem $\sqrt{n}(\bar X - \mu)/\sigma \xrightarrow{(d)} N(0,1)$, so
\[
P\Big[\frac{\sqrt{n}|\bar X - \mu|}{\sigma} \ge t\Big] = P\Big[|\bar X - \mu| \ge \frac{\sigma t}{\sqrt{n}}\Big] \approx 2(1 - \Phi(t)).
\]
By Mills' ratio, $1 - \Phi(t) \sim \phi(t)/t$ for large $t$, where $\phi(t) = c e^{-t^2/2}$. A small deviation in this setting is of order $\lambda_n = \sigma t/\sqrt{n} = O(1/\sqrt{n})$, a moderate deviation is of order $\lambda = O(\sqrt{\log n}/\sqrt{n})$, and a large deviation is $\lambda \gg \sqrt{\log n}/\sqrt{n}$. This theory was developed by Hoeffding, Bernstein, et al. (see Shorack and Wellner, Empirical Processes with Applications to Statistics). Where do moments come in? Later we will see that similar results can be derived from the higher moments.

1.2.1 Definition. The (generalized) moment generating function of an exponential family is defined in terms of $T_1(x), \dots, T_s(x)$. The m.g.f. is $M_T : \mathbb{R}^s \to \mathbb{R}$ defined by
\[
M_T(u_1, \dots, u_s) := E\big[\exp(u_1 T_1(X) + \cdots + u_s T_s(X))\big]
= \sum_{r_1, \dots, r_s \ge 0} \beta_{r_1, \dots, r_s} \frac{u_1^{r_1} \cdots u_s^{r_s}}{r_1! \cdots r_s!}.
\]

It can be shown that if $M_T$ is defined in a neighbourhood of the origin then it is smooth and
\[
\beta_{r_1, \dots, r_s} = \alpha_{r_1, \dots, r_s} := E\big[T_1(X)^{r_1} \cdots T_s(X)^{r_s}\big].
\]

1.2.2 Example. Suppose $X \sim N(\xi, \sigma^2)$. Then the $k$th centred moment is
\[
E[(X - \xi)^k] = \sigma^k\, E\Big[\Big(\frac{X-\xi}{\sigma}\Big)^k\Big],
\]
and the latter expectation is the $k$th moment of a standard normal random variable, so it suffices to find those quantities. For $Z \sim N(0,1)$, we calculate the usual m.g.f. and read off the moments:
\[
M_Z(u) = \int e^{ux}\phi(x)\,dx = e^{u^2/2} = \sum_{k=0}^{\infty} \frac{u^{2k}}{2^k k!} = \sum_{k=0}^{\infty} (2k-1)(2k-3)\cdots 1 \cdot \frac{u^{2k}}{(2k)!}.
\]

From this, $\alpha_k = 0$ if $k$ is odd and $\alpha_k = (k-1)(k-3)\cdots 1$ if $k$ is even.

1.2.3 Definition. Harald Cramér introduced the (generalized) cumulant generating function,
\[
K_T(u) := \log M_T(u) = \sum_{r_1, \dots, r_s \ge 0} K_{r_1, \dots, r_s} \frac{u_1^{r_1} \cdots u_s^{r_s}}{r_1! \cdots r_s!}.
\]
The first few cumulants can be found in terms of the first few moments, and vice versa, by expanding the power series and equating terms. For an exponential family, the c.g.f. is generally quite simple.

1.2.4 Theorem. For an exponential family with
\[
p(x \mid \eta) = \exp\Big(\sum_{i=1}^s \eta_i T_i(x) - A(\eta)\Big) h(x),
\]
and $\eta$ in the interior of the set over which the density is defined, $M_T$ and $K_T$ both exist in some neighbourhood of the origin and are given by
\[
K_T(u) = A(\eta + u) - A(\eta), \qquad M_T(u) = e^{K_T(u)} = e^{A(\eta+u)}/e^{A(\eta)}.
\]
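As a quick numerical illustration of Theorem 1.2.4 (a sketch, not from the notes; it assumes numpy is available), one can compare the empirical cumulant generating function of a Poisson sample with $A(\eta + u) - A(\eta) = e^{\eta+u} - e^{\eta}$.

```python
import numpy as np

# For the Poisson family, A(eta) = exp(eta) with eta = log(theta) and T(X) = X,
# so Theorem 1.2.4 predicts K_T(u) = exp(eta + u) - exp(eta).
rng = np.random.default_rng(0)
theta = 2.0
eta = np.log(theta)
X = rng.poisson(theta, size=2_000_000)

for u in [0.1, 0.3, 0.5]:
    empirical_cgf = np.log(np.mean(np.exp(u * X)))   # log of the empirical m.g.f.
    predicted_cgf = np.exp(eta + u) - np.exp(eta)    # A(eta + u) - A(eta)
    print(u, empirical_cgf, predicted_cgf)           # agree up to Monte Carlo error
```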

1.2.5 Example. The generalized cumulant generating function for the normal family $N(\xi, \sigma^2)$ is as follows; setting $u_2 = 0$ recovers the usual cumulant generating function.
\[
K(u_1, u_2) = \log\sqrt{-\eta_2} - \log\sqrt{-\eta_2 - u_2} + \frac{\eta_1^2}{4\eta_2} - \frac{(\eta_1 + u_1)^2}{4(\eta_2 + u_2)}
= \log\sqrt{\tfrac{1}{2\sigma^2}} - \log\sqrt{\tfrac{1}{2\sigma^2} - u_2} - \frac{\xi^2}{2\sigma^2} + \frac{(\frac{\xi}{\sigma^2} + u_1)^2}{4(\frac{1}{2\sigma^2} - u_2)}.
\]
Recall that the characteristic function of a random variable $X$ is
\[
\varphi_X(u) := \int e^{iux} f_X(x)\,dx.
\]

If $\int_{\mathbb{R}} |\varphi_X(u)|\,du < \infty$ then the Fourier transform can be inverted:
\[
f_X(x) = \frac{1}{2\pi}\int \varphi_X(u)\, e^{-iux}\,du.
\]
The inversion formula for the usual m.g.f. is more complicated. Under some conditions, pointwise,
\[
f_X(x) = \lim_{\gamma\to\infty} \int_{x-i\gamma}^{x+i\gamma} M_X(u)\, e^{-xu}\,du.
\]
On the other hand, for an exponential family the associated m.g.f. is $M_T(u) = E[\exp(u\cdot T(X))]$. For example, for the normal family the m.g.f. is given by
\[
\int \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{u_1 x + u_2 x^2}\, e^{-\frac{(x-\xi)^2}{2\sigma^2}}\,dx.
\]

1.3 Stein’s identity

1.3.1 Example (Stein’s shrinkage). This example is discussed in two papers, one in 1958 by Stein and one in 1962 by James and Stein. If X N(µ, Ip) with p 3 This is likely incorrect. It does not then the best estimator for µ is not X but is ∼ ≥ match with the “Stein’s identity” given in a later lecture. One or both may X need updating. µˆ = 1 c p 2 X−X − · for some 0 < c < 1. In these papers the following useful identity was used.

1.3.2 Lemma. Let $X$ be distributed as an element of an exponential family
\[
p(x \mid \eta) = \exp\Big(\sum_{i=1}^s \eta_i T_i(x) - A(\eta)\Big) h(x).
\]
If $g$ is any differentiable function such that $E|g'(X)| < \infty$ then
\[
E\Big[\Big(\frac{h'(X)}{h(X)} + \sum_{i=1}^s \eta_i T_i'(X)\Big) g(X)\Big] = -E[g'(X)].
\]

This is a generalization of the well-known normal integration-by-parts rule. Indeed, in the normal case the lemma gives
\[
E[g'(X)] = -E\Big[\Big(\frac{\xi}{\sigma^2} - \frac{1}{2\sigma^2}(2X)\Big) g(X)\Big] = \frac{1}{\sigma^2} E[(X - \xi) g(X)].
\]
Consider some particular examples. If $g \equiv 1$ then we see $E[X - \xi] = 0$. If $g(x) = x$ then $E[(X - \xi)X] = \sigma^2$, giving the second moment. Higher moments can be calculated similarly. Further motivation for finding higher moments is obtained by noting that
\[
g(X) = g(\mu) + g'(\mu)(X - \mu) + \cdots
\]
and also from applications of Chebyshev's inequality,
\[
P[|X - \mu| \ge t] \le \frac{E[|X - \mu|^k]}{t^k}.
\]
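A Monte Carlo sanity check of the normal case of Lemma 1.3.2 might look as follows (a sketch, not part of the notes; numpy is assumed and the test function $g$ is an arbitrary choice).

```python
import numpy as np

# Monte Carlo check of the normal case of Lemma 1.3.2:
# E[g'(X)] = E[(X - xi) g(X)] / sigma^2 for X ~ N(xi, sigma^2).
rng = np.random.default_rng(1)
xi, sigma = 1.5, 2.0
X = rng.normal(xi, sigma, size=1_000_000)

g = np.sin            # any smooth bounded test function
g_prime = np.cos

lhs = np.mean(g_prime(X))
rhs = np.mean((X - xi) * g(X)) / sigma**2
print(lhs, rhs)       # the two estimates agree up to Monte Carlo error
```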

2 Statistics

2.1 Sufficient statistics

2.1.1 Definition. Let $X \sim P_\theta$ for some element of a family of distributions $\{P_\theta : \theta \in \Theta\}$ (not necessarily an exponential family). A statistic $T(X)$ (not the same $T$ as for exponential families) is sufficient for $\theta$ if the conditional distribution $P_\theta[X \in \cdot \mid T(X)]$ does not depend on $\theta$.

There are several concepts of estimation for the parameter $\theta$. One is point estimation, where we are looking for a specific value of $\theta$; in this case sufficient statistics are very important. When looking for a confidence interval for $\theta$, sufficient statistics are less important.

2.1.2 Example. If $X_1, \dots, X_n \sim \text{Bernoulli}(\theta)$ then $T = \sum_i X_i$ is sufficient for $\theta$. Indeed, $T \sim \text{Binomial}(n, \theta)$, so for $\sum_i x_i = t$,
\[
P_\theta(X = x \mid T = t) = \frac{P_\theta(X = x, T = t)}{P_\theta(T = t)}
= \frac{\prod_{i=1}^n \theta^{x_i}(1-\theta)^{1-x_i}}{\binom{n}{t}\theta^t(1-\theta)^{n-t}}
= \frac{1}{\binom{n}{t}}.
\]

Similarly, if $X_1, \dots, X_n \sim \text{Poisson}(\theta)$ then $T = \sum_i X_i$ is sufficient for $\theta$. Indeed, $T \sim \text{Poisson}(n\theta)$, so for $\sum_i x_i = t$,
\[
P_\theta(X = x \mid T = t) = \frac{P_\theta(X = x, T = t)}{P_\theta(T = t)}
= \frac{\prod_{i=1}^n e^{-\theta}\theta^{x_i}/x_i!}{e^{-n\theta}(n\theta)^t/t!}
= \Big(\frac{1}{n}\Big)^{t}\binom{t}{x_1 \cdots x_n},
\]
which does not depend on $\theta$.
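The following simulation sketch (not from the notes; numpy assumed) illustrates sufficiency in the Bernoulli case: conditionally on $T = t$, the probability of any particular configuration is $1/\binom{n}{t}$ regardless of $\theta$.

```python
import numpy as np

# Empirical check of sufficiency: conditionally on T = sum(X_i), the distribution
# of (X_1, ..., X_n) should not depend on theta.
rng = np.random.default_rng(2)
n, t = 5, 2
target = np.array([1, 1, 0, 0, 0])      # one particular configuration with sum 2

def conditional_freq(theta, reps=200_000):
    X = rng.binomial(1, theta, size=(reps, n))
    keep = X[X.sum(axis=1) == t]         # condition on T = t
    return np.mean(np.all(keep == target, axis=1))

for theta in [0.2, 0.5, 0.8]:
    print(theta, conditional_freq(theta))   # all close to 1 / C(5,2) = 0.1
```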

2.1.3 Example. Suppose that $X_1, \dots, X_n$ are samples from a continuous distribution $F$. Then the order statistic $(X_{(1)}, \dots, X_{(n)})$ is sufficient for $F$. Warning: the order statistic, for this course, is reduced, in that if there are ties then the number of samples at each level is not recorded.

For a continuous distribution the probability that any pair of the $X_i$ are equal is zero, so we may assume there are no ties. It follows that
\[
P[X = x \mid (X_{(1)}, \dots, X_{(n)})] = \frac{1}{n!},
\]
since we are, effectively, picking a random permutation of the data uniformly. Clearly this does not depend on the distribution $F$.

2.1.4 Theorem (Neyman-Fisher-Halmos-Savage). If the distributions $\{P_\theta : \theta \in \Theta\}$ have densities $\{p_\theta : \theta \in \Theta\}$ with respect to a $\sigma$-finite measure $\mu$, then $T$ is sufficient for $\theta$ if and only if $p_\theta(x) = g_\theta(T(x))\,h(x)$ for some functions $g_\theta$ (depending on $\theta$) and $h$ (not depending on $\theta$).

2.1.5 Corollary. If $T(X)$ is a sufficient statistic and $T(X) = f(U(X))$ (i.e. $T$ is a function of another statistic $U$) then $U(X)$ is also a sufficient statistic.

PROOF: $p_\theta(x) = g_\theta(T(x))h(x) = g_\theta(f(U(x)))h(x)$, so $U$ is sufficient by the factorization theorem. $\square$

The motivation for finding sufficient statistics is data reduction, so we are led naturally to the concept of a minimal sufficient statistic. $T$ is said to be a m.s.s. if, for any sufficient statistic $U$, there is a function $f$ such that $T = f(U)$.

2.1.6 Example. For the normal distribution $N(\xi, \sigma^2)$, $(\sum_i X_i, \sum_i X_i^2)$ is a m.s.s. (It follows that $(\bar X_n, S_n^2)$ is also a m.s.s., since there is a one-to-one correspondence taking one pair to the other.) That it is sufficient is clear from the representation of $N(\xi, \sigma^2)$ as an exponential family. For the $k$-dimensional multivariate normal $N_k(\xi, \Sigma)$, $(\sum_i X_i, \sum_i X_i X_i^T)$ is a sufficient statistic. This follows from the form of the density function

\[
p_\theta(x_1, \dots, x_n) = \frac{1}{(2\pi)^{nk/2} |\Sigma|^{n/2}} \exp\Big({-\frac{1}{2}\sum_{i=1}^n (x_i - \xi)^T \Sigma^{-1} (x_i - \xi)}\Big).
\]

Expanding this out, a term $\sum_i x_i^T \Sigma^{-1} x_i$ appears. But
\[
x_i^T \Sigma^{-1} x_i = \operatorname{tr}(x_i^T \Sigma^{-1} x_i) = \operatorname{tr}(\Sigma^{-1} x_i x_i^T),
\]
so the density depends on the data only through $(\sum_i x_i, \sum_i x_i x_i^T)$, which concludes the proof.

2.1.7 Example (Two-parameter exponential). The one-parameter exponential distributions
\[
p_\theta(x) = \frac{1}{\theta}\, e^{-x/\theta}\, 1_{x > 0}
\]
clearly form an exponential family. However, the two-parameter exponential distributions
\[
p_{a,b}(x) = \frac{1}{b}\, e^{-(x-a)/b}\, 1_{x > a}
\]
do not. Let $X_1, \dots, X_n \sim E(a, b)$. Then
\[
p_{a,b}(x_1, \dots, x_n) = \frac{1}{b^n}\, e^{-\frac{1}{b}\sum_i (x_i - a)} \prod_{i=1}^n 1_{x_i > a}
= \frac{1}{b^n}\, e^{-\frac{1}{b}(\sum_i x_i - na)}\, 1_{\min_i x_i > a},
\]
so $(\sum_i X_i, \min_i X_i)$ is a sufficient statistic for $(a, b)$.

2.2 Ancillary and complete statistics

2.2.1 Definition. Let $X \sim P_\theta$ for some element of a family of distributions $\{P_\theta : \theta \in \Theta\}$. A statistic $V(X)$ is said to be ancillary if the distribution of $V(X)$ does not depend on $\theta$. $V(X)$ is said to be first-order ancillary if $E_\theta[V(X)]$ is constant with respect to $\theta$. A statistic $T(X)$ is complete if it has the property "for any measurable function $f$, if $E_\theta[f(T(X))] = 0$ for all $\theta \in \Theta$ then $f = 0$ a.s."

2.2.2 Theorem (Basu). If T is a complete sufficient statistic for the family Pθ : θ Θ then any ancillary statistic V is independent of T. { ∈ }

PROOF: If $V$ is ancillary then $p_A := P_\theta[V \in A]$ does not depend on $\theta$, since the distribution of $V$ does not depend on $\theta$. Let $\eta_A(t) := P_\theta[V \in A \mid T = t]$, which also does not depend on $\theta$ since $T$ is sufficient, i.e. the conditional distribution of $X$ (and hence of $V$) given $T$ does not depend on $\theta$. By the tower property of conditional expectation, for any $\theta \in \Theta$,
\[
E_\theta[\eta_A(T) - p_A] = E_\theta[P_\theta[V \in A \mid T]] - p_A
= E_\theta[E_\theta[1_{V \in A} \mid T]] - p_A
= E_\theta[1_{V \in A}] - p_A
= P_\theta[V \in A] - p_A = 0.
\]
Since $T$ is complete, this implies that $\eta_A(T) = p_A = P[V \in A]$ a.s. Because the conditional distribution of $V$ given $T$ equals the unconditional distribution of $V$, $V$ and $T$ are independent. $\square$
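A small simulation (a sketch, not from the notes; numpy assumed) illustrating Basu's theorem for $N(\theta, 1)$: the sample mean is complete sufficient and the sample variance is ancillary, so the two are independent.

```python
import numpy as np

# Illustration of Basu's theorem: for X_1,...,X_n ~ N(theta, 1) the sample mean is
# complete sufficient for theta and the sample variance is ancillary, hence independent.
rng = np.random.default_rng(3)
theta, n, reps = 4.0, 10, 100_000
X = rng.normal(theta, 1.0, size=(reps, n))
xbar = X.mean(axis=1)
s2 = X.var(axis=1, ddof=1)
print(np.corrcoef(xbar, s2)[0, 1])       # close to 0
print(np.corrcoef(xbar, s2**2)[0, 1])    # also close to 0 (consistent with independence)
```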

Finding m.s.s. is hard, because the definition of minimal is hard to check. Completeness and sufficiency are easier, and fortunately these properties imply minimality. Warning: Minimal sufficient statistics are not necessarily complete.

2.2.3 Theorem. Complete sufficient statistics are m.s.s.

2.2.4 Theorem. For exponential families in natural form,

\[
p(x \mid \eta) = \exp\Big(\sum_{i=1}^s \eta_i T_i(x) - A(\eta)\Big) h(x),
\]
the statistic $T = (T_1, \dots, T_s)$ is sufficient. $T$ is complete if the natural parameter space $\Pi_0 \subseteq \mathbb{R}^s$ contains an $s$-dimensional rectangle.

The natural parameter space is convex, i.e. if $\eta$ and $\tilde\eta$ are parameters for which $p(x \mid \eta)$ and $p(x \mid \tilde\eta)$ are well-defined densities, then $p(x \mid (1-\lambda)\eta + \lambda\tilde\eta)$ is a well-defined density. (This is a consequence of Hölder's inequality.) We have seen that, unfortunately, $U(0, \theta)$ (uniform) and $E(a, b)$ (two-parameter exponential) are not exponential families.

2.2.5 Examples.
(i) For $X_1, \dots, X_n \sim \text{Bernoulli}(\theta)$, $\bar X$ is complete and sufficient.
(ii) For $X_1, \dots, X_n \sim N(\xi, \sigma^2)$, $(\bar X, S^2)$ is complete and sufficient.
(iii) For $U(0, \theta)$, $X_{(n)} = \max_i X_i$ is sufficient because, conditionally on $X_{(n)} = t$, the remaining observations are i.i.d. $U(0, t)$, which does not depend on $\theta$. It is also complete. Indeed, the density of $X_{(n)}$ is $\frac{n}{\theta^n} t^{n-1} 1_{(0,\theta)}(t)$. Suppose that $\int_0^\theta f(t)\frac{n}{\theta^n} t^{n-1}\,dt = 0$ for all $\theta > 0$. Then
\[
\int_0^\theta f^+(t)\, t^{n-1}\,dt = \int_0^\theta f^-(t)\, t^{n-1}\,dt
\]
for all $\theta > 0$. But because this holds for all $\theta$, it implies
\[
\int_A f^+(t)\, t^{n-1}\,dt = \int_A f^-(t)\, t^{n-1}\,dt
\]
for all Borel sets $A \subseteq \mathbb{R}_{++}$. Taking $A = \{f > 0\}$ shows that $f^+ \equiv 0$ a.s., and similarly $f^- \equiv 0$ a.s.
(iv) For $E(a, b)$, $(X_{(1)}, \sum_i (X_i - X_{(1)}))$ is complete and sufficient. It can be shown that $X_{(1)} \sim E(a, b/n)$ and $\sum_i (X_i - X_{(1)}) \sim \Gamma(n-1, b)$.

(v) For $U(\theta - \tfrac12, \theta + \tfrac12)$, $T = (X_{(1)}, X_{(n)})$ is sufficient. It can be shown that $T$ is minimal. Clearly $V = X_{(n)} - X_{(1)}$ is ancillary, so $T$ is not complete: otherwise $T$ would be independent of $V$, which is not true. This is an example of a m.s.s. that is not complete.

(vi) We have seen that $(X_{(1)}, \dots, X_{(n)})$ is sufficient for a continuous distribution function $F$. The ranks $(R_1, \dots, R_n)$, where $R_i := \#\{j : X_j \le X_i\}$, are ancillary. Indeed, $P_F[R = (r_1, \dots, r_n)] = \frac{1}{n!}$ does not depend on $F$. It can be shown that the order statistic is also complete, so it is independent of $R$. Given any function $h$, there is a symmetric function $g$ such that $h(X_{(1)}, \dots, X_{(n)}) = g(X_1, \dots, X_n)$. Suppose that $E_F[h(T)] = 0$ for all continuous distributions $F$. Then
\[
0 = E_F[h(T)] = \int g(x_1, \dots, x_n)\, f(x_1) \cdots f(x_n)\,dx_1 \cdots dx_n
\]
for all densities $f$. Let $f_1, \dots, f_n$ be densities and let $f = \sum_i \alpha_i f_i$ be a convex combination. Then
\[
0 = \int g(x_1, \dots, x_n) \prod_{j=1}^n \Big(\sum_{i=1}^n \alpha_i f_i(x_j)\Big)\,dx_j.
\]
The right hand side is a polynomial of degree $n$ in the $n$ variables $\alpha_i$. Since it is identically zero, all its coefficients must be zero. In particular, the coefficient of $\alpha_1 \cdots \alpha_n$ is zero, so
\[
0 = \sum_{\pi \in \mathrm{Sym}_n} \int g(x_1, \dots, x_n)\, f_1(x_{\pi(1)}) \cdots f_n(x_{\pi(n)})\,dx_{\pi(1)} \cdots dx_{\pi(n)}
= n! \int g(x_1, \dots, x_n)\, f_1(x_1) \cdots f_n(x_n)\,dx_1 \cdots dx_n,
\]
where we may rearrange the variables arbitrarily because $g$ is symmetric. Therefore $\int g(x_1, \dots, x_n) f_1(x_1) \cdots f_n(x_n)\,dx_1 \cdots dx_n = 0$, and from this it follows that $g$, and hence $h$, is almost surely zero.

2.2.6 Definition. The risk function associated with a loss function $L(\theta, a)$ is defined to be $R(\theta, \delta) := E_\theta[L(\theta, \delta(X))]$. When writing $L(\theta, a)$ for a loss function, $\theta$ is intended to be the true value of the parameter, $a$ is the estimate, and $L(\theta, a)$ is the penalty if $a$ is incorrect. Often $L(\theta, \theta) = 0$.

Recall Jensen’s inequality: if φ is convex then E[φ(X )] φ(E[X ]). If φ is strictly convex then the inequality is strict unless X is a.s. constant.≥ 12 AST I

2.2.7 Theorem (Rao-Blackwell). Let $X$ be distributed as an element of the family $\{P_\theta : \theta \in \Theta\}$, and let $T(X)$ be sufficient for $\theta$. Let $L(\theta, a)$ be a loss function that is convex in $a$ for every $\theta \in \Theta$. Let $\delta(X)$ be an estimator for $g(\theta)$ with finite risk. Then $\eta(T) := E[\delta(X) \mid T]$ is an estimator for $g(\theta)$ satisfying $R(\theta, \eta) \le R(\theta, \delta)$ for all $\theta \in \Theta$, with equality everywhere only if $\delta(X) = \eta(T)$ a.s.

PROOF: Note first that $\eta(T)$ is a well-defined estimator (in that it does not depend on $\theta$) because $T$ is sufficient. The risk is computed as follows:
\[
R(g(\theta), \delta) = E[L(g(\theta), \delta(X))]
= E[E[L(g(\theta), \delta(X)) \mid T]] \quad\text{(tower property)}
\ge E[L(g(\theta), E[\delta(X) \mid T])] \quad\text{(Jensen's inequality)}
= E[L(g(\theta), \eta(T))] = R(g(\theta), \eta). \qquad\square
\]

2.2.8 Examples.
(i) The loss function defined by $L(\theta, a) = 0$ if $\theta = a$ and $1$ otherwise is not convex. This is the Hamming distance or 0-1 loss function.
(ii) Another non-convex loss function that arises is $L(\theta, a) = \sqrt{|\theta - a|}$.
(iii) Some common convex loss functions are $L(\theta, a) = |\theta - a|^p$ for $p \ge 1$. When $p = 2$ this is referred to as squared error loss.

3 Unbiased Estimators

3.1 Uniform minimum variance unbiased estimators

3.1.1 Definition. For squared error loss, the risk can be decomposed as follows:
\[
R(\theta, \delta(X)) = E[(\theta - \delta(X))^2] = (\theta - E[\delta(X)])^2 + \operatorname{Var}(\delta(X)).
\]
The first term on the right hand side is the squared bias. An unbiased estimator for $g(\theta)$ is a statistic $\delta(X)$ such that $E[\delta(X)] = g(\theta)$. We call an unbiased estimator with smallest possible variance, uniformly in $\theta$, a uniform minimum variance unbiased estimator or UMVUE.

3.1.2 Example (Stein). Unbiased estimators are not necessarily the estimators with smallest risk. Let $X \sim N(\mu, I_k)$, where $k \ge 3$. It can be shown that $X$ is the UMVUE for $\mu$. But Stein showed that the following is a better estimator:
\[
\hat\mu := \Big(1 - \frac{k-2}{\sum_i X_i^2}\Big) X.
\]

Unbiased estimators may not even exist for some problems. Let $g(\theta) := 1/\theta$ and $X \sim \text{Binomial}(n, \theta)$. For an unbiased estimator $\delta(X)$ we would need
\[
\frac{1}{\theta} = E_\theta[\delta(X)] = \sum_{k=0}^n \delta(k) \binom{n}{k} \theta^k (1-\theta)^{n-k},
\]
but this is impossible because no polynomial in $\theta$ equals $1/\theta$.

3.1.3 Theorem (Lehmann-Scheffé). Let T be a complete sufficient statistic for θ and let δ(X ) be an unbiased estimator for g(θ) with finite variance. Then η(T) := E[δ(X ) T] is the unique UMVUE. |

PROOF: Note that $\eta(T)$ is a well-defined estimator (in that it does not depend on $\theta$) because $T$ is sufficient. By the tower property,
\[
E_\theta[\eta(T)] = E_\theta[\delta(X)] = g(\theta),
\]
so $\eta(T)$ is unbiased. By the Rao-Blackwell theorem, applied with squared error loss, $\operatorname{Var}(\eta(T)) \le \operatorname{Var}(\delta(X))$.

If $\delta'(X)$ is any other unbiased estimator of $g(\theta)$ and $\eta'(T) := E[\delta'(X) \mid T]$, then $E_\theta[\eta(T) - \eta'(T)] = 0$ for all $\theta$. Since $T$ is complete, $\eta(T) = \eta'(T)$ a.s. In particular,
\[
\operatorname{Var}(\eta(T)) = \operatorname{Var}(\eta'(T)) \le \operatorname{Var}(\delta'(X)),
\]
so $\eta(T)$ is a UMVUE. If $\delta'(X)$ is itself UMVUE then $\operatorname{Var}(\eta'(T)) = \operatorname{Var}(\delta'(X))$, so $\delta'(X) = \eta'(T) = \eta(T)$ a.s. by the equality case of the Rao-Blackwell theorem and the completeness of $T$. Therefore $\eta(T)$ is the unique UMVUE. $\square$

How do we actually find a UMVUE? One approach is to try to solve the equation $E_\theta[\delta(T)] = g(\theta)$ directly, for all $\theta$ (this gives a UMVUE because, when $T$ is complete and sufficient, any unbiased $\delta(T)$ is UMVUE by Lehmann-Scheffé), and another is to find an unbiased estimator $\delta(X)$ and apply Lehmann-Scheffé.

3.1.4 Example. Let $X_1, \dots, X_n \sim N(\mu, \sigma^2)$, so $\theta = (\mu, \sigma^2)$. We have seen that $(\bar X, S^2)$ is a complete sufficient statistic for $\theta$.
(i) For $g(\theta) = \mu$, $\bar X$ is an unbiased estimator. Since this estimator is a function of a complete sufficient statistic it is UMVUE.
(ii) Assume $\mu = 0$ and $\sigma^2$ is unknown. Consider $g(\theta) = \sigma^r$, with $r > 1 - n$. It can be shown that $T = \sum_i X_i^2$ is complete and sufficient for $\sigma^2$, and $T/\sigma^2 \sim \chi^2_n(0)$. Then
\[
E\Big[\Big(\frac{T}{\sigma^2}\Big)^{r/2}\Big] = E[(\chi^2_n(0))^{r/2}] = \frac{2^{r/2}\,\Gamma(\frac{n+r}{2})}{\Gamma(\frac{n}{2})} =: K_{n,r}^{-1},
\]
whence $K_{n,r}\, T^{r/2}$ is an unbiased estimator for $g(\theta)$.
(iii) It can be shown that when both $\mu$ and $\sigma^2$ are unknown, $K_{n-1,r}\, S^r$ is an unbiased estimator of $\sigma^r$.

(iv) Consider $g(\theta) = \mu/\sigma$. It can be shown that $\bar X$ and $K_{n-1,-1}\, S^{-1}$ are independent, so $K_{n-1,-1}\, S^{-1} \bar X$ is an unbiased estimator of $g(\theta)$, and it is UMVUE since it is a function of a complete sufficient statistic.
(v) Assume $\sigma^2 = 1$ and $\mu$ is unknown, and take $g(\theta) = P_\mu[X \le x] = \Phi(x - \mu)$. We know that $\bar X$ is complete and sufficient for $\mu$. To find a UMVUE we find an unbiased estimator and then condition on $\bar X$. $\delta(X) := 1_{X_1 \le x}$ is an unbiased estimator of $g(\theta)$ (but clearly not a great estimator). Then
\[
E[\delta(X) \mid \bar X = \bar x] = P[X_1 \le x \mid \bar X = \bar x]
= P[X_1 - \bar X \le x - \bar x \mid \bar X = \bar x]
= P[X_1 - \bar X \le x - \bar x] \;\;\text{(Basu's theorem)}
= \Phi\Big(\frac{x - \bar x}{\sqrt{1 - 1/n}}\Big).
\]
(vi) If both $\mu$ and $\sigma^2$ are unknown then take $g(\theta) = P_{\mu,\sigma^2}[X \le x] = \Phi(\frac{x - \mu}{\sigma})$. The same sorts of computations yield
\[
E[\delta(X) \mid T] = P\Big[\frac{X_1 - \bar X}{S} \le \frac{x - \bar x}{s}\Big],
\]
and in principle the distribution of the random variable $(X_1 - \bar X)/S$ can be computed.

3.1.5 Example. Let $X_1, \dots, X_n \sim \text{Bernoulli}(\theta)$. Since the Bernoulli family is an exponential family, $T := \sum_i X_i$ is sufficient and complete for $\theta$. Clearly $E[T] = n\theta$, so $\bar X = T/n$ is UMVUE. How do we find a UMVUE $\delta(T)$ for $\theta(1-\theta)$? Note that $T \sim \text{Binomial}(n, \theta)$, so we can compute the expected value and set it equal to the target:
\[
\theta(1-\theta) \overset{\text{set}}{=} E_\theta[\delta(T)] = \sum_{k=0}^n \delta(k) \binom{n}{k} \theta^k (1-\theta)^{n-k}.
\]
Divide by $(1-\theta)^n$ and change variables to $\rho := \frac{\theta}{1-\theta}$:
\[
\sum_{k=0}^n \delta(k)\binom{n}{k}\rho^k = \theta(1-\theta)^{1-n} = \rho(1+\rho)^{n-2}
= \sum_{k=0}^{n-2}\binom{n-2}{k}\rho^{k+1} = \sum_{k=1}^{n-1}\binom{n-2}{k-1}\rho^k.
\]
We conclude $\delta(0) = \delta(n) = 0$ and, for $0 < t < n$,
\[
\delta(t) = \frac{\binom{n-2}{t-1}}{\binom{n}{t}} = \frac{(n-2)!}{(t-1)!\,(n-t-1)!}\cdot\frac{t!\,(n-t)!}{n!} = \frac{t(n-t)}{n(n-1)}.
\]
Similar things could be done to find a UMVUE for $\theta^r$ with $r \le n$. But we could also condition on $T$. Note that $1_{X_1 = 1, \dots, X_r = 1}$ is an unbiased estimator of $\theta^r$ since $P[X_1 = 1, \dots, X_r = 1] = \theta^r$. Now, $\{X_1 = 1, \dots, X_r = 1\}$ is the event that the first $r$ observations are all ones, and it is not hard to condition on the event that we got a total of $t$ ones, $\{T = t\}$. With $n \ge t \ge r$,
\[
P[X_1 = 1, \dots, X_r = 1 \mid T = t]
= \frac{\theta^r \binom{n-r}{t-r}\theta^{t-r}(1-\theta)^{n-t}}{\binom{n}{t}\theta^t(1-\theta)^{n-t}}
= \frac{\binom{n-r}{t-r}}{\binom{n}{t}}
= \frac{t(t-1)\cdots(t-r+1)}{n(n-1)\cdots(n-r+1)}.
\]
For $r = 1$ we get $T/n$ and for $r = 2$ we get
\[
\frac{T(T-1)}{n(n-1)} = \frac{T^2 - T}{n(n-1)}.
\]
Notice that taking the difference between these gives an estimator for $\theta - \theta^2$ that agrees with the one found above.
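The unbiasedness of $\delta(T) = T(n-T)/(n(n-1))$ can be checked exactly by summing against the binomial pmf; the sketch below (not from the notes; numpy and scipy assumed) does this for a few values of $\theta$.

```python
import numpy as np
from scipy.stats import binom

# Exact check that delta(T) = T(n-T)/(n(n-1)) is unbiased for theta(1-theta)
# when T ~ Binomial(n, theta).
n = 10
t = np.arange(n + 1)
delta = t * (n - t) / (n * (n - 1))
for theta in [0.1, 0.37, 0.8]:
    expectation = np.sum(delta * binom.pmf(t, n, theta))
    print(theta, expectation, theta * (1 - theta))   # the last two columns match
```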

3.1.6 Example (One-sample exponential). Let $X_1, \dots, X_n \sim E(a, b)$,
\[
p_{a,b}(x) = \frac{1}{b}\, e^{-(x-a)/b}\, 1_{x > a}.
\]
(i) Assume $b = 1$ is known. If $Y \sim U(0, \theta)$ then $-\log Y \sim E(-\log\theta, 1)$. From the uniform case, if $Y_1, \dots, Y_n \sim U(0, \theta)$ then $Y_{(n)}$ is complete and sufficient for $\theta$. We have $X_i \sim -\log Y_i$ for $i = 1, \dots, n$, so $X_{(1)} \sim -\log Y_{(n)}$ and it should be complete and sufficient for $a = -\log\theta$. Sufficiency is easy and completeness follows from the definition (try it).
(ii) From the assignment, $T_1 := X_{(1)}$ and $T_2 := \sum_i (X_i - X_{(1)})$ are sufficient for $(a, b)$. We show now that they are complete for $(a, b)$. Suppose that $E_{a,b} f(T_1, T_2) = 0$ for all $a$ and $b$. We must show that $f = 0$ a.s. We have seen that $T_1 \sim E(a, b/n)$ and $T_2 \sim \Gamma(n-1, b) = \frac{b}{2}\chi^2_{2n-2}$, and that they are independent. Define $g(t_1, b) := E_b f(t_1, T_2)$. Then, with $b$ fixed, for all $a$ (up to a positive constant),
\[
0 = E_{a,b} f(T_1, T_2) = E_a[g(T_1, b)] = \int_a^\infty g(t_1, b)\, e^{-n t_1/b}\,dt_1.
\]
It follows that $g(t_1, b) = 0$ a.s.-$t_1$, for any fixed $b$. (It also follows because, when $b$ is known, $T_1$ is complete for $a$ in $E(a, b)$.) For any $b$, there is a set $N_b$ of measure zero such that $g(t_1, b) = 0$ for $t_1 \in N_b^c$. Now,
\[
0 = \int |g(t_1, b)|\,dt_1 = \iint |g(t_1, b)|\,dt_1\,db = \iint |g(t_1, b)|\,db\,dt_1
\]
by Fubini's theorem, since $|g(t_1, b)|$ is a non-negative function. Therefore, for almost all $t_1$, $g(t_1, b) = 0$ a.s.-$b$. However, $g(t_1, b) = E_b f(t_1, T_2)$ and $T_2$ is a scaled $\chi^2$, so $b \mapsto E_b f(t_1, T_2)$ is a continuous function for all $t_1$. It follows that, for almost all $t_1$, for all $b$,
\[
E_b f(t_1, T_2) = g(t_1, b) = 0.
\]
For fixed $t_1 = X_{(1)}$, $T_2 = \sum_i (X_i - t_1)$ is complete for $b$ because $\frac{b}{2}\chi^2_{2n-2}$ is an exponential family. Therefore $f(t_1, t_2) = 0$ a.s.-$t_2$, for almost all $t_1$. More precisely, there is a set $N$ of measure zero such that, for all $t_1 \in N^c$, there is a set $M_{t_1}$ of measure zero such that $f(t_1, t_2) = 0$ for all $t_2 \in M_{t_1}^c$. Whence
\[
0 = \int |f(t_1, t_2)|\,dt_2 \;\text{ a.s.-}t_1,
\quad\text{so}\quad
0 = \iint |f(t_1, t_2)|\,dt_2\,dt_1.
\]
Therefore $f = 0$ a.s.

Completeness is very important, but can be hard to verify.

3.2 Power series distributions

3.2.1 Definition. A power series distribution has a density function or probability mass function of the form
\[
p_\theta(x) = \frac{a(x)\,\theta^x}{c(\theta)}.
\]

The binomial distribution is seen to be a power series distribution by reparameterizing with $\rho = \frac{\theta}{1-\theta}$. The Poisson distribution is already in this form. If $X_1, \dots, X_n$ are distributed as a power series distribution then it is easy to see that $T := X_1 + \cdots + X_n$ is sufficient and complete, because power series distributions are exponential families. More explicitly,
\[
P[T = t] = \sum_{x_1 + \cdots + x_n = t} P[X_1 = x_1, \dots, X_n = x_n]
= \frac{\sum_{\sum_i x_i = t} a(x_1)\cdots a(x_n)\,\theta^{t}}{c(\theta)^n}
=: \frac{A(t, n)\,\theta^t}{c(\theta)^n},
\]
where $A(t, n) := \sum_{x_1 + \cdots + x_n = t} a(x_1)\cdots a(x_n)$. Now, $c(\theta) = \sum_x a(x)\theta^x$, so
\[
c(\theta)^n = \sum_{t=0}^\infty \Big(\sum_{x_1+\cdots+x_n = t} a(x_1)\cdots a(x_n)\Big)\theta^t = \sum_{t=0}^\infty A(t, n)\,\theta^t.
\]

To find $A(t, n)$ we can expand $c(\theta)^n$ into a power series and read off the coefficients. When looking for a UMVUE for $\theta^r$, we look for a function $\delta(T)$ such that
\[
\theta^r = E[\delta(T)] = \sum_{t=0}^\infty \delta(t)\, \frac{A(t, n)\,\theta^t}{c(\theta)^n},
\quad\text{so}\quad
\sum_{t=0}^\infty \delta(t) A(t, n)\, \theta^{t-r} = c(\theta)^n = \sum_{t=0}^\infty A(t, n)\,\theta^t.
\]
Therefore $\delta(t) = 0$ for $t < r$ and $\delta(t) = A(t-r, n)/A(t, n)$ for $t \ge r$. It can be seen that these estimators agree with the estimators obtained in the binomial case.

3.2.2 Example. For the Poisson distribution, $c(\theta) = e^\theta$, so
\[
\sum_{t=0}^\infty A(t, n)\,\theta^t = c(\theta)^n = e^{n\theta} = \sum_{t=0}^\infty \frac{(n\theta)^t}{t!}.
\]
Thus $A(t, n) = n^t/t!$ and it follows that the UMVUE for $\theta^r$ is
\[
\delta(T) = \begin{cases} T(T-1)\cdots(T-r+1)/n^r & \text{if } T \ge r, \\ 0 & \text{otherwise.} \end{cases}
\]

However, things are not always peachy. Suppose $X \sim \text{Poisson}(\theta)$ and we use this method to find a UMVUE for $g(\theta) = e^{-a\theta}$:
\[
e^{-a\theta} = E_\theta[\delta(X)] = \sum_{x=0}^\infty \delta(x)\,\frac{e^{-\theta}\theta^x}{x!},
\quad\text{so}\quad
\sum_{x=0}^\infty \delta(x)\,\frac{\theta^x}{x!} = e^{(1-a)\theta} = \sum_{x=0}^\infty (1-a)^x \frac{\theta^x}{x!}.
\]
Thus $\delta(x) = (1-a)^x$. When $a > 2$ this is a very bad estimator, despite being UMVUE. A much better, but biased, estimator could be obtained by expanding the power series for $g(\theta)$ to a few terms and using the estimators for $\theta^r$.
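The following sketch (not from the notes; numpy assumed; the parameter values are arbitrary choices) shows how badly behaved the UMVUE $(1-a)^X$ is when $a > 2$, despite being exactly unbiased.

```python
import numpy as np

# For X ~ Poisson(theta), the UMVUE of exp(-a*theta) is (1-a)^X.
# Once a > 2 it oscillates wildly in sign even though it is exactly unbiased.
rng = np.random.default_rng(4)
theta, a, reps = 1.0, 2.5, 500_000
X = rng.poisson(theta, size=reps)

umvue = (1.0 - a) ** X
print("target           ", np.exp(-a * theta))
print("mean of UMVUE    ", umvue.mean())   # matches the target up to Monte Carlo error
print("std dev of UMVUE ", umvue.std())    # much larger than the target itself
```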

3.3 Non-parametric UMVUE

Let $X_1, \dots, X_n \sim F$, where $F$ is any continuous c.d.f. We have shown that the order statistic is complete and sufficient and the rank statistics are ancillary. Suppose we want to estimate $\xi = \int x\,dF(x)$. But the first moment may not exist for all $F$, so we must restrict the class under consideration. This may change the properties of the above statistics; e.g. there is no reason to suspect that the order statistics remain complete and sufficient. However, Hoeffding showed that the order statistics are complete and sufficient for each class $\mathcal{F}_r$, the class of distribution functions having finite $r$th moment. It follows that any UMVUE must be symmetric in its arguments $x_1, \dots, x_n$, because at its heart it is a function of the order statistic. Therefore, in the non-parametric case, to find a UMVUE it suffices to find a symmetric unbiased estimator.

3.3.1 Examples.
(i) For $P[X \le a]$, $1_{X_1 \le a}$ is an unbiased estimator, so symmetrize to obtain the UMVUE $\frac{1}{n}\sum_i 1_{X_i \le a}$.
(ii) For $\xi(F) = \int x\,dF(x)$, $\bar X$ is symmetric and unbiased, hence a UMVUE.
(iii) $S^2$ is symmetric and unbiased, hence a UMVUE, for the variance $\sigma^2(F)$ of $F$ (over $\mathcal{F}_2$).
(iv) To find a UMVUE for $(\xi(F))^2$ we have to be a bit careful. Notice
\[
(\xi(F))^2 = \int x^2\,dF(x) - \sigma^2(F),
\]
and for the first term $\frac{1}{n}\sum_i X_i^2$ is symmetric and unbiased. Alternatively, $\xi^2 = \xi\cdot\xi$, so $X_1 X_2$ is unbiased. Symmetrizing, $\frac{1}{n(n-1)}\sum_{i \ne j} X_i X_j$ is UMVUE, and these two estimators can be seen to be the same.

4 Lower Bounds on Risk

Recall that risk is expected loss. More precisely, the risk associated with an estimator $T$ of a parameter $\theta$ is $E_\theta[L(T, \theta)]$, where $L$ is the loss function, a non-negative function that quantifies the error when $T$ is not equal to $\theta$. For this section we will usually consider loss functions of the form $L(T, \theta) = \ell(|T - \theta|)$, where $\ell : [0,\infty) \to [0,\infty)$ is convex and $\ell(0) = 0$. Many common loss functions are of this form, including the very commonly used squared error loss.

Let $X_1, \dots, X_n \sim P_\theta$ where $\theta \in \Theta \subseteq \mathbb{R}^k$. The minimax risk is
\[
R_n^* = \inf_{T_n} \sup_\theta E_\theta[L(T_n, \theta)],
\]
where the infimum is taken over all statistics $T_n = T_n(X_1, \dots, X_n)$. We would like bounds on this risk, and ideally to find an estimator that attains it.

Upper bounds can be found by constructing specific statistics Tn∗. With clever constructions the upper bound R∗n supθ Eθ [L(Tn∗, θ)] can be quite good. Lower bounds are trickier. The≤ Cramer-Rao lower bound was one of the first lower bounds on risk that was discovered, and is basically a consequence of the Cauchy-Schwarz inequality. A different approach was taken by Lucien Le Cam, and developed by Lucien Birgé, David Donoho, and others, and is discussed in detail in the next section.

To this end, consider the true value $\theta_0$ and an alternative value $\theta_1$, presumably close to $\theta_0$, such that the test of $H_0 : \theta = \theta_0$ vs. $H_1 : \theta = \theta_1$ is "indistinguishable." It was shown in the paper Geometrizing rates of convergence by Donoho and Liu, going back to earlier work of Lucien Birgé, that if $L(T, \theta) = \ell(|T - \theta|)$ for some convex function $\ell$, then $R_n^*$ is bounded below by a multiple of $\ell(|\theta_1 - \theta_0|)$.

What does it mean for two hypotheses to be indistinguishable? Intuitively, if the sum of the type I and type II errors is close to one for any test then the hypotheses cannot be differentiated. We will make this formal and derive estimates based on the following notions of distance between hypotheses.

(i) Hellinger distance, via the Hellinger affinity (ii) χ2-distance (not actually a distance) (iii) L1-distance (iv) Kullback-Leibler divergence, from information theory.

4.1 Hellinger distance

To simplify the exposition, in this section all measures considered are assumed to have a density with respect to Lebesgue measure on the real line. Most of the results still hold in a more general setting.

4.1.1 Definition. The Hellinger distance between two densities p and q is

\[
H(p, q) := \sqrt{\frac{1}{2}\int_{\mathbb{R}} \big(\sqrt{p(x)} - \sqrt{q(x)}\big)^2\,dx} = \sqrt{1 - A(p, q)},
\]
where $A(p, q) := \int_{\mathbb{R}} \sqrt{p(x)\, q(x)}\,dx$ is the Hellinger affinity.

Remark. Note that $H(p, q)^2 = 1 - A(p, q)$, and $0 \le A(p, q) \le 1$ by Cauchy-Schwarz, so $0 \le H(p, q) \le 1$ for all densities $p$ and $q$.

4.1.2 Definition. Recall that the $L^1$-distance between densities $p$ and $q$ is
\[
\|p - q\|_1 := \int |p(x) - q(x)|\,dx.
\]

The χ2-distance between p and q is

\[
\int \frac{(p(x) - q(x))^2}{p(x)}\,dx.
\]

Remark. Note that the $\chi^2$-distance is not symmetric in $p$ and $q$. When $p$ is close to $q$, in the sense that $\|(q - p)/p\|_\infty$ is small,
\[
H(p, q)^2 = 1 - \int \sqrt{p(x) q(x)}\,dx
= 1 - \int \sqrt{1 + \frac{q(x) - p(x)}{p(x)}}\; p(x)\,dx
\approx 1 - \int \Big(1 + \frac{q(x) - p(x)}{2 p(x)} - \frac{(q(x) - p(x))^2}{8\, p(x)^2}\Big) p(x)\,dx
= \frac{1}{8}\int \frac{(q(x) - p(x))^2}{p(x)}\,dx,
\]
since $\int (q(x) - p(x))\,dx = 0$; that is, $H^2$ is approximately $1/8$ of the $\chi^2$-distance.

Claim. For $i \ge 0$, let $H_i : \theta = \theta_i$, and let $p_{\theta_i, n}$ be the joint density of $X_1, \dots, X_n$ under hypothesis $H_i$. If the single-observation Hellinger distance $H(p_{\theta_0}, p_{\theta_n})$ is $o(1/\sqrt{n})$ then the sum of the type I and type II errors converges to 1 for any test, i.e. the hypotheses $H_0$ and $H_n$ become indistinguishable.

The remainder of this section is devoted to substantiating this claim.

4.1.3 Definition. A test is a mapping φ : R [0, 1]. The power function of a test → is π : Θ [0, 1] : θ Eθ [φ(X )]. → 7→

4.1.4 Example. Let $\phi(x) := 1_R(x)$ for some $R \subseteq \mathbb{R}$, the rejection region for this test. The power function is $\pi(\theta) = E_\theta[\phi(X)] = P_\theta[X \in R]$. Under the simple setup of testing $H_0 : \theta = \theta_0$ vs. $H_1 : \theta = \theta_1$, $\pi(\theta_0) = P_{\theta_0}[X \in R]$ is the level of the test (the type I error), and $\pi(\theta_1) = P_{\theta_1}[X \in R]$ is the power of the test. Tests with higher power reject the null hypothesis with greater probability when the alternative is true. Note that
\[
\pi(\theta_1) - \pi(\theta_0) = 1 - P_{\theta_1}[\text{Accept}] - P_{\theta_0}[\text{Reject}]
= 1 - (\text{type II error} + \text{type I error}).
\]

4.1.5 Lemma. $\pi(\theta_1) - \pi(\theta_0) \le \frac{1}{2}\|p_{\theta_1} - p_{\theta_0}\|_1$.

PROOF:
\[
\pi(\theta_1) - \pi(\theta_0)
= \int \phi(x)\,(p_1(x) - p_0(x))\,dx
= \int_{p_1 > p_0} \phi\,(p_1 - p_0)\,dx + \int_{p_1 < p_0} \phi\,(p_1 - p_0)\,dx
\le \int_{p_1 > p_0} (p_1 - p_0)\,dx
= \frac{1}{2}\|p_1 - p_0\|_1,
\]
where the last equality holds because $\int (p_1 - p_0)\,dx = 0$. $\square$

Remark. The above lemma implies that
\[
\text{type I error} + \text{type II error} \ge 1 - \tfrac{1}{2}\|p_{\theta_1} - p_{\theta_0}\|_1,
\]
so if $\|p_{\theta_n} - p_{\theta_0}\|_1 \to 0$ then the hypotheses become indistinguishable.

4.1.6 Lemma. $1 - H(p_{\theta_n,n}, p_{\theta_0,n})^2 = (1 - H(p_{\theta_n}, p_{\theta_0})^2)^n$.

PROOF: The Hellinger affinity of product measures satisfies $A(p_{\theta_n,n}, p_{\theta_0,n}) = A(p_{\theta_n}, p_{\theta_0})^n$. $\square$

Exercise: show that if $\int (p \wedge q)\,dx \to 0$ then $H \to 1$, and if $\int (p \wedge q)\,dx \to 1$ then $\|p - q\|_1 \to 0$ and $H^2 \to 0$. The chain of reductions is: minimax risk lower bound, testing argument, type I error $+$ type II error, $L^1$-distance, Hellinger distance, Hellinger affinity.

4.1.7 Lemma.

\[
0 \le 2H(p, q)^2 \le \|p - q\|_1 \le \big(2 - A(p, q)^2\big) \wedge 2\sqrt{2}\, H(p, q).
\]
PROOF: For the middle inequalities,
\[
2H(p, q)^2 = \int_{\mathbb{R}} (\sqrt{p} - \sqrt{q})^2\,dx
\le \int_{\mathbb{R}} |\sqrt{p} - \sqrt{q}|\,(\sqrt{p} + \sqrt{q})\,dx
= \int_{\mathbb{R}} |p - q|\,dx = \|p - q\|_1,
\]
and
\[
\|p - q\|_1 = \int_{\mathbb{R}} |\sqrt{p} - \sqrt{q}|\,|\sqrt{p} + \sqrt{q}|\,dx
\le \sqrt{\int_{\mathbb{R}} |\sqrt{p} - \sqrt{q}|^2\,dx}\,\sqrt{\int_{\mathbb{R}} |\sqrt{p} + \sqrt{q}|^2\,dx}
\le \sqrt{(2H^2)(4)} = 2\sqrt{2}\, H.
\]
It can be shown that $2\int_{\mathbb{R}} (p \wedge q)\,dx = 2 - \|p - q\|_1$ by noting that
\[
p + q = p \vee q + p \wedge q \quad\text{and}\quad |p - q| = p \vee q - p \wedge q.
\]
Then
\[
A(p, q) = \int_{\mathbb{R}} \sqrt{pq}\,dx = \int_{\mathbb{R}} \sqrt{(p \vee q)(p \wedge q)}\,dx
\le \int_{\mathbb{R}} \sqrt{(p + q)(p \wedge q)}\,dx
\le \sqrt{\int_{\mathbb{R}} (p + q)\,dx \int_{\mathbb{R}} (p \wedge q)\,dx}
= \sqrt{2 - \|p - q\|_1},
\]
which gives $\|p - q\|_1 \le 2 - A(p, q)^2$. $\square$
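The inequalities in Lemma 4.1.7 can be checked numerically on a grid; the sketch below (not from the notes; numpy assumed; the two normal densities are arbitrary choices) does so.

```python
import numpy as np

# Numerical illustration of Lemma 4.1.7 for two normal densities,
# approximating the integrals on a fine grid.
x = np.linspace(-20, 20, 200_001)
dx = x[1] - x[0]

def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

p = normal_pdf(x, 0.0, 1.0)
q = normal_pdf(x, 1.0, 1.5)

A = np.sum(np.sqrt(p * q)) * dx           # Hellinger affinity
H = np.sqrt(1 - A)                        # Hellinger distance
L1 = np.sum(np.abs(p - q)) * dx           # L1-distance

print(2 * H**2, "<=", L1, "<=", min(2 - A**2, 2 * np.sqrt(2) * H))
```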

It follows from this lemma that $H(p, q) \approx 0$ if and only if $\|p - q\|_1 \approx 0$, and $H(p, q) \approx 1$ if and only if $\|p - q\|_1 \approx 2$. In some sense these distances are "equivalent."

4.1.8 Example. Let $p_0$ and $p_1$ be densities, and let $\nu$ be a functional of densities. Let $\ell : [0, \infty) \to \mathbb{R}$ be convex with $\ell(0) = 0$. If $H(p_0, p_1) < 1$ then
\[
\inf_{T_n} \max\big\{ E_{n,0}[\ell(|T_n - \nu(p_0)|)],\; E_{n,1}[\ell(|T_n - \nu(p_1)|)] \big\}
\ge \frac{1}{4}\,\ell(|\nu(p_1) - \nu(p_0)|)\,\big(1 - H(p_0, p_1)^2\big)^{2n}.
\]
To prove it, note that
\[
E[\ell(|T_n - \nu(p_0)|)] \ge \ell\big(E[|T_n - \nu(p_0)|]\big)
\]
by Jensen's inequality, so it suffices to prove the result for the identity function. Writing $p_{i,n}(x) = p_i(x_1)\cdots p_i(x_n)$,
\begin{align*}
\inf_{T_n} \max\big\{ E_{n,0}[|T_n - \nu(p_0)|],\; E_{n,1}[|T_n - \nu(p_1)|] \big\}
&\ge \inf_{T_n} \tfrac{1}{2}\big( E_{n,0}[|T_n - \nu(p_0)|] + E_{n,1}[|T_n - \nu(p_1)|] \big) \\
&= \inf_{T_n} \tfrac{1}{2} \int \big( |T_n - \nu(p_0)|\, p_{0,n}(x) + |T_n - \nu(p_1)|\, p_{1,n}(x) \big)\,dx_1 \cdots dx_n \\
&\ge \inf_{T_n} \tfrac{1}{2} \int \big( |T_n - \nu(p_0)| + |T_n - \nu(p_1)| \big)\, \big(p_{0,n}(x) \wedge p_{1,n}(x)\big)\,dx_1 \cdots dx_n \\
&\ge \tfrac{1}{2} \int |\nu(p_1) - \nu(p_0)|\, \big(p_{0,n}(x) \wedge p_{1,n}(x)\big)\,dx_1 \cdots dx_n \\
&\ge \tfrac{1}{4}\, |\nu(p_1) - \nu(p_0)|\, \big(1 - H(p_{0,n}, p_{1,n})^2\big)^2 \\
&= \tfrac{1}{4}\, |\nu(p_1) - \nu(p_0)|\, \big(1 - H(p_0, p_1)^2\big)^{2n},
\end{align*}
using $\int (p \wedge q)\,dx \ge \frac{1}{2} A(p, q)^2 = \frac{1}{2}(1 - H(p, q)^2)^2$ from Lemma 4.1.7, and then Lemma 4.1.6.

Suppose the family $\{p_\theta : \theta \in \Theta \subseteq \mathbb{R}^k\}$ has a common support that does not depend on $\theta$, and that the $p_\theta$ are differentiable with respect to $\theta$. Moreover, suppose there is $\dot\ell_{\theta_0}$ (which may usually be taken to be $\nabla_\theta p_\theta(x)/p_\theta(x)$ at $\theta_0$) such that
\[
\int_{\mathbb{R}} \Big( \sqrt{p_\theta} - \sqrt{p_{\theta_0}} - \tfrac{1}{2}(\theta - \theta_0)^T \dot\ell_{\theta_0} \sqrt{p_{\theta_0}} \Big)^2\,dx = o(|\theta - \theta_0|^2).
\]
In the univariate case $(\sqrt{p_\theta})'\big|_{\theta_0} = \frac{1}{2\sqrt{p_{\theta_0}}} \big(\frac{d}{d\theta} p_\theta\big)\big|_{\theta = \theta_0}$. Essentially, the condition asks that the linear approximation of $\sqrt{p_\theta}$ around $\theta_0$ be good.

In the claim near the beginning of this section, take $\theta_n := \theta_0 + h/\sqrt{n}$, where $h$ is a fixed vector. Then
\begin{align*}
n H(p_{\theta_n}, p_{\theta_0})^2
&= \frac{n}{2} \int \big(\sqrt{p_{\theta_n}} - \sqrt{p_{\theta_0}}\big)^2\,dx
= \frac{1}{2} \int \big(\sqrt{n}\,(\sqrt{p_{\theta_n}} - \sqrt{p_{\theta_0}})\big)^2\,dx \\
&= \frac{1}{8} \int h^T \dot\ell_{\theta_0}(x)\, \dot\ell_{\theta_0}(x)^T h\; p_{\theta_0}(x)\,dx + o(1)
= \frac{1}{8}\, h^T I(\theta_0)\, h + o(1),
\end{align*}
where $I$ is the information matrix $I(\theta) := E_\theta[\dot\ell_\theta \dot\ell_\theta^T]$, to be discussed at length in the next section. In particular, if $\ell(x) = x$ and $\nu(p_\theta) = c^T\theta$ for some vector $c$, then
\[
\inf_{T_n} \max\big\{ E_{\theta_n}[|T_n - c^T\theta_n|],\; E_{\theta_0}[|T_n - c^T\theta_0|] \big\}
\ge \frac{1}{4}\, |c^T\theta_n - c^T\theta_0|\, \exp\Big({-\frac{1}{4}}\, h^T I(\theta_0)\, h\Big)
= \frac{1}{4\sqrt{n}}\, |h^T c|\, \exp\Big({-\frac{1}{4}}\, h^T I(\theta_0)\, h\Big),
\]
where the exponential comes from estimating $(1 - H(p_{\theta_n}, p_{\theta_0})^2)^{2n}$, using $(1 + \frac{x}{n})^n \to e^x$.

4.2 Fisher's information

In the univariate case, with $\{p_\theta : \theta \in \Theta \subseteq \mathbb{R}\}$, Fisher's information is
\[
I(\theta) = \int_{\mathbb{R}} \Big(\frac{\partial_\theta p_\theta(x)}{p_\theta(x)}\Big)^2 p_\theta(x)\,dx
= E_\theta\Big[\Big(\frac{d}{d\theta}\log p_\theta(X)\Big)^2\Big].
\]
The name may come from information theory. In regions where the density changes quickly with $\theta$, we can estimate $\theta$ more accurately; correspondingly, the information is large because the derivative is large. See Elements of Information Theory by Cover and Thomas.

4.2.1 Example. For $X \sim N(\theta, \sigma^2)$, with $\sigma^2$ known,
\[
\frac{\partial_\theta p_\theta(x)}{p_\theta(x)} = \frac{1}{\sigma^2}(x - \theta),
\qquad
I(\theta) = E_\theta\Big[\frac{1}{\sigma^4}(X - \theta)^2\Big] = \frac{1}{\sigma^2}.
\]
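A Monte Carlo check (a sketch, not from the notes; numpy assumed) that $I(\theta) = 1/\sigma^2$ for the normal family, using $I(\theta) = E[(\text{score})^2]$.

```python
import numpy as np

# Monte Carlo check that the Fisher information of N(theta, sigma^2) in theta is
# 1/sigma^2, using I(theta) = E[score^2].
rng = np.random.default_rng(5)
theta, sigma = 0.7, 2.0
X = rng.normal(theta, sigma, size=1_000_000)

score = (X - theta) / sigma**2            # d/dtheta log p_theta(X)
print(np.mean(score**2), 1 / sigma**2)    # the two agree up to Monte Carlo error
```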

Unfortunately, $I(\theta)$ depends on the parameterization, so reparameterizing can change the information. Suppose that $\xi = h(\theta)$, where $h$ is differentiable. Then
\[
I(\theta) = E_\theta\Big[\Big(\frac{d}{d\theta}\log p(x; h(\theta))\Big)^2\Big]
= E_\theta\Big[\Big(\frac{d\xi}{d\theta}\,\frac{d}{d\xi}\log p(x; \xi)\Big)^2\Big]
= (h'(\theta))^2\, I(\xi).
\]
Therefore $I(\xi) = I(\theta)/(h'(\theta))^2$.

In the multivariate case, where $\Theta \subseteq \mathbb{R}^k$, $I(\theta)$ is a matrix with
\[
I_{ij}(\theta) = E_\theta\Big[\frac{\partial}{\partial\theta_i}\log p_\theta(x)\;\frac{\partial}{\partial\theta_j}\log p_\theta(x)\Big]
\]
for $1 \le i, j \le k$. If $\xi_i = h_i(\theta_1, \dots, \theta_k)$ for $i = 1, \dots, k$ then we have a similar relationship between the information matrices, $I(\xi) = J I(\theta) J^T$, where $J$ is the Jacobian of the inverse transformation: $J_{ij} = \frac{\partial\theta_j}{\partial\xi_i}$. Note that $J$ is often symmetric in applications.

4.2.2 Example. Let $X$ be distributed as an element of an exponential family

\[
p_\theta(x) = \exp\Big(\sum_{i=1}^k \eta_i(\theta) T_i(x) - A(\eta(\theta))\Big) h(x).
\]
Let $\tau_i(\theta) := E_\theta[T_i(X)]$ for $i = 1, \dots, k$. Then $I(\tau) = C^{-1}$, where $C$ is the covariance matrix of the random vector $(T_1(X), \dots, T_k(X))$. Indeed,
\[
1 = \int_{\mathbb{R}} \exp\Big(\sum_i \eta_i T_i(x) - A(\eta)\Big) h(x)\,dx,
\]
and differentiating with respect to $\eta_i$ under the integral sign,
\[
0 = \int_{\mathbb{R}} \Big(T_i(x) - \frac{\partial A(\eta)}{\partial\eta_i}\Big)\, p_\theta(x)\,dx,
\]
so $\tau_i = E[T_i(X)] = \frac{\partial A(\eta)}{\partial\eta_i}$. Taking the derivative again yields
\[
0 = \int_{\mathbb{R}} \bigg( \Big(T_i(x) - \frac{\partial A}{\partial\eta_i}\Big)\Big(T_j(x) - \frac{\partial A}{\partial\eta_j}\Big) - \frac{\partial^2 A}{\partial\eta_i\,\partial\eta_j} \bigg)\, p_\theta(x)\,dx,
\]
so $\operatorname{Cov}_\theta(T_i(X), T_j(X)) = \frac{\partial^2 A}{\partial\eta_i\,\partial\eta_j}$. From the definition,
\[
I_{ij}(\eta) = E_\theta\Big[\Big(T_i(X) - \frac{\partial A}{\partial\eta_i}\Big)\Big(T_j(X) - \frac{\partial A}{\partial\eta_j}\Big)\Big] = \operatorname{Cov}_\theta(T_i(X), T_j(X)),
\]
so $I(\eta) = C$. The Jacobian of the transformation taking $\eta$ to $\tau$ is again $C$, so $I(\tau) = C^{-1} C C^{-1} = C^{-1}$.

4.2.3 Example. Let $X \sim N(\mu, \sigma^2)$. Then
\[
I(\mu, \sigma) = \begin{pmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{2}{\sigma^2} \end{pmatrix}
\quad\text{and}\quad
I(\mu, \sigma^2) = \begin{pmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{2\sigma^4} \end{pmatrix}.
\]

4.2.4 Example. Consider $X$ distributed as a member of an exponential family with $k = 1$, so that
\[
p_\theta(x) = \exp(\eta(\theta) T(x) - B(\theta))\, h(x).
\]
We have seen that $I(g(\theta)) = I(\theta)/(g'(\theta))^2$. By differentiating under the integral sign it can be shown that $\eta'(\theta)\, E_\theta[T(X)] = B'(\theta)$. Therefore
\[
I(\theta) = E_\theta\Big[\Big(\frac{\partial_\theta p_\theta(X)}{p_\theta(X)}\Big)^2\Big]
= E_\theta[(\eta'(\theta) T(X) - B'(\theta))^2]
= (\eta'(\theta))^2 \operatorname{Var}_\theta(T(X)).
\]
Combining these formulae,
\[
I(g(\theta)) = \Big(\frac{\eta'(\theta)}{g'(\theta)}\Big)^2 \operatorname{Var}_\theta(T(X)).
\]

For τ(θ) := E[T(X )] we have seen that I(τ) = 1/ Varθ (T), so we get the formulae

\[
\operatorname{Var}_\theta(T(X)) = \frac{\tau'(\theta)}{\eta'(\theta)}
\quad\text{and}\quad
\tau(\theta) = \frac{B'(\theta)}{\eta'(\theta)}.
\]
(i) For $X \sim \Gamma(\alpha, \beta)$, with $\alpha$ known,
\[
p_\beta(x) = \frac{1}{\Gamma(\alpha)\beta^\alpha}\, x^{\alpha-1} e^{-x/\beta},
\]
so $\eta(\beta) = -\frac{1}{\beta}$, $T(x) = x$, and $B(\beta) = \alpha\log\beta$. Therefore
\[
\tau(\beta) := E_\beta[X] = \frac{\alpha/\beta}{1/\beta^2} = \alpha\beta,
\qquad
\operatorname{Var}_\beta(X) = \alpha\beta^2,
\qquad
I(\alpha\beta) = \frac{1}{\alpha\beta^2}.
\]
(ii) For $X \sim N(\mu, \sigma^2)$ with $\sigma^2$ known, $B(\mu) = \mu^2/(2\sigma^2)$, $\eta(\mu) = \mu/\sigma^2$, and $T(x) = x$. Therefore
\[
I(\mu^2) = \Big(\frac{\eta'(\mu)}{2\mu}\Big)^2 \operatorname{Var}(X) = \frac{1}{4\mu^2\sigma^2}.
\]
These formulae can be used to do many calculations for exponential families.

4.3 The Cramér-Rao lower bound

One-dimensional case

Let $\Theta \subseteq \mathbb{R}$ be a one-dimensional parameter space, let $\{P_\theta : \theta \in \Theta\}$ be a family of probability measures with densities $p_\theta$ with respect to a $\sigma$-finite measure $\mu$ on $\mathbb{R}$, and let $T$ be a measurable function.

4.3.1 Conditions.
(C1) $\Theta$ is open in $\mathbb{R}$.
(C2) (a) There is a set $B \subseteq \mathbb{R}$ such that $\mu(B^c) = 0$ and $\frac{d}{d\theta} p_\theta(x)$ exists for all $\theta \in \Theta$ and all $x \in B$.
     (b) $A := \{x : p_\theta(x) = 0\}$ does not depend on $\theta$.
(C3) $I(\theta) := E_\theta[(\dot\ell_\theta(X))^2] < \infty$, where $\dot\ell_\theta(x) = \frac{d}{d\theta}\log p_\theta(x)$ is the score.
(C4) $\int_{\mathbb{R}} p_\theta(x)\,\mu(dx)$ and $\int_{\mathbb{R}} T(x) p_\theta(x)\,\mu(dx)$ can both be differentiated with respect to $\theta$ under the integral sign. In this case $E_\theta[\dot\ell_\theta(X)] = 0$.
(C5) $\int_{\mathbb{R}} p_\theta(x)\,\mu(dx)$ can be differentiated twice with respect to $\theta$ under the integral sign. In this case we have the formula
\[
I(\theta) = -E[\ddot\ell_\theta(X)] = -E\Big[\frac{d^2}{d\theta^2}\log p_\theta(X)\Big].
\]

4.3.2 Theorem. Let $T(X)$ be an estimator for $q(\theta)$, and let $b(\theta)$ be its bias. Then, given that (C1)-(C4) hold,
\[
\operatorname{Var}_\theta(T(X)) \ge \frac{(q'(\theta) + b'(\theta))^2}{I(\theta)}
\]
for all $\theta \in \Theta$. When $T$ is unbiased this reduces to $(q'(\theta))^2/I(\theta)$.

PROOF: Assuming (C1)–(C4) all hold,

\begin{align*}
E_\theta[T(X)] = q(\theta) + b(\theta) &= \int_{\mathbb{R}} T(x)\, p_\theta(x)\,\mu(dx) = \int_{B \cap A^c} T(x)\, p_\theta(x)\,\mu(dx), \\
q'(\theta) + b'(\theta) &= \int_{B \cap A^c} T(x)\, \frac{d}{d\theta} p_\theta(x)\,\mu(dx)
= \int_{B \cap A^c} T(x)\, \dot\ell_\theta(x)\, p_\theta(x)\,\mu(dx)
= \operatorname{Cov}_\theta(T(X), \dot\ell_\theta(X)), \\
(q'(\theta) + b'(\theta))^2 &= \big(\operatorname{Cov}_\theta(T(X), \dot\ell_\theta(X))\big)^2 \le \operatorname{Var}_\theta(T(X))\, I(\theta),
\end{align*}
since the score has mean zero and by the Cauchy-Schwarz inequality. $\square$

If the lower bound is achieved then, from the equality case of the Cauchy-Schwarz inequality, it must be that $\dot\ell_\theta(X) = A(\theta)(T(X) - E_\theta[T(X)])$ for some constant $A(\theta)$. This is why the MLE often achieves the lower bound, while other statistics fail to do so. It can be shown that the lower bound can only be achieved for parameters of an exponential family. [Transcriber's note: this may be false.]
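As an illustration of attainment of the bound (a sketch, not from the notes; numpy assumed), the variance of the sample mean of $n$ normal observations matches $1/(n I(\theta)) = \sigma^2/n$.

```python
import numpy as np

# For X_1,...,X_n i.i.d. N(theta, sigma^2), the sample mean is unbiased for theta
# and attains the Cramér-Rao bound 1/(n*I(theta)) = sigma^2/n.
rng = np.random.default_rng(6)
theta, sigma, n, reps = 1.0, 2.0, 25, 100_000
X = rng.normal(theta, sigma, size=(reps, n))

var_xbar = X.mean(axis=1).var()
cr_bound = sigma**2 / n                   # 1 / (n * I(theta)) with I(theta) = 1/sigma^2
print(var_xbar, cr_bound)                 # essentially equal: the bound is attained
```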

Multivariate case

Let $\Theta \subseteq \mathbb{R}^k$ be a $k$-dimensional parameter space, and let $\{P_\theta : \theta \in \Theta\}$ be a family of probability measures with densities $p_\theta$ with respect to a $\sigma$-finite measure $\mu$ on $\mathbb{R}^k$. Suppose $X \sim P_\theta$ for some $\theta \in \Theta$ and $T(X)$ is an estimator for $q(\theta)$, where $q : \mathbb{R}^k \to \mathbb{R}$ is differentiable. As usual, $b(\theta) := E_\theta[T(X)] - q(\theta)$ is the bias.

4.3.3 Theorem. Suppose that (M1)-(M4) below hold. Then
\[
\operatorname{Var}_\theta(T(X)) \ge \alpha^T I(\theta)^{-1} \alpha
\]
for all $\theta \in \Theta$, where $\alpha := \nabla(q(\theta) + b(\theta))$. If $T(X)$ is unbiased this reduces to $\operatorname{Var}_\theta(T(X)) \ge \dot q(\theta)^T I^{-1}(\theta)\, \dot q(\theta)$.

4.3.4 Conditions.
(M1) $\Theta$ is open in $\mathbb{R}^k$.
(M2) (a) There is a set $B \subseteq \mathbb{R}^k$ such that $\mu(B^c) = 0$ and $\frac{\partial}{\partial\theta_i} p_\theta(x)$ exists for all $\theta \in \Theta$, $1 \le i \le k$, and all $x \in B$.
     (b) $A := \{x : p_\theta(x) = 0\}$ does not depend on $\theta$.
(M3) The $k \times k$ matrix $I(\theta) := E_\theta[\dot\ell_\theta(X)\,\dot\ell_\theta(X)^T]$ is positive definite, where $\dot\ell_\theta(x) = \nabla_\theta \log p_\theta(x)$ is the score vector.
(M4) $\int_{\mathbb{R}^k} p_\theta(x)\,\mu(dx)$ and $\int_{\mathbb{R}^k} T(x) p_\theta(x)\,\mu(dx)$ can both be differentiated with respect to $\theta$ under the integral sign. In this case $E_\theta[\dot\ell_\theta(X)] = 0$.
(M5) $\int_{\mathbb{R}^k} p_\theta(x)\,\mu(dx)$ can be differentiated twice with respect to $\theta$ under the integral sign. In this case we have the formula
\[
I_{ij}(\theta) = -E_\theta\Big[\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\log p_\theta(X)\Big].
\]

5 Decision Theory

Let $\theta \in \Theta$, the parameter space, let $a \in A$, the action space, and let $x \in X$, the sample space. Let $L : \Theta \times A \to \mathbb{R}_+$ be a loss function and let $d : A \times X \to [0, 1]$ be defined by $d(a, x) = d(a \mid x) = $ "the probability of taking action $a$ given that $x$ is observed." This is a randomized decision rule. Given a decision rule $d(x) := d(\cdot \mid x)$ (actually a probability measure on $A$), the decision loss is $L(\theta, d(x))$ and the decision risk is
\[
R(\theta, d) = \int_X \int_A L(\theta, a)\, d(da \mid x)\, P_\theta(dx).
\]
If the decision rule is non-randomized then the risk is $\int_X L(\theta, d(x))\, P_\theta(dx)$. Let $D$ denote the collection of all decision rules.

5.0.5 Definition. A decision rule $d$ is an inadmissible decision rule if there is another rule $d'$ such that $R(\theta, d') \le R(\theta, d)$ for all $\theta$ and $R(\theta, d') < R(\theta, d)$ for some $\theta$. An admissible decision rule is one that is not inadmissible. A decision rule $d^*$ is minimax if
\[
\inf_{d \in D} \sup_{\theta \in \Theta} R(\theta, d) = \sup_{\theta \in \Theta} R(\theta, d^*).
\]
A probability distribution $\Lambda$ on $\Theta$ is called a prior distribution. Given a prior $\Lambda$ and $d \in D$, the Bayes risk is
\[
R(\Lambda, d) = \int_\Theta R(\theta, d)\,\Lambda(d\theta).
\]

A Bayes rule is any decision rule $d_\Lambda$ satisfying $R(\Lambda, d_\Lambda) = \inf_{d \in D} R(\Lambda, d)$.

5.1 Game theory: the finite case

Suppose Θ, X , and A are all finite sets, of sizes `, n, and k, respectively. 5.1.1 Lemma. D is a bounded convex set.

PROOF: In this setting, D is the collection of k n matrices with non-negative entries, all of whose columns sum to one. × ƒ

5.1.2 Theorem. $R := \{(R(\theta_1, d), \dots, R(\theta_\ell, d)) \in \mathbb{R}_+^\ell : d \in D\}$ is convex.

PROOF: Let $\alpha \in [0, 1]$ and $d_1, d_2 \in D$ be given. Then
\begin{align*}
\alpha R(\theta, d_1) + (1-\alpha) R(\theta, d_2)
&= \alpha \sum_{j=1}^n \sum_{i=1}^k d^{(1)}_{ij} L(\theta, a_i)\, p_\theta(x_j)
 + (1-\alpha) \sum_{j=1}^n \sum_{i=1}^k d^{(2)}_{ij} L(\theta, a_i)\, p_\theta(x_j) \\
&= \sum_{j=1}^n \sum_{i=1}^k (\alpha d_1 + (1-\alpha) d_2)_{ij}\, L(\theta, a_i)\, p_\theta(x_j)
 = R(\theta, \alpha d_1 + (1-\alpha) d_2). \qquad\square
\end{align*}

5.1.3 Theorem. Every $d \in D$ can be expressed as a convex combination of non-randomized decision rules.

PROOF: The extreme points of $D$ are the non-randomized decision rules. In fact $d = \sum_{r=1}^{k^n} \lambda_r\, \delta^{(r)}$, where
\[
\lambda_r := \prod_{j=1}^n \prod_{i=1}^k d_{ij}^{\,\delta^{(r)}_{ij}}
\]
and $\{\delta^{(r)} : 1 \le r \le k^n\}$ enumerates the non-randomized rules. [Transcriber's note: this formula for $\lambda_r$ is unverified.] $\square$

5.1.4 Example. Let $\Theta = \Theta_0 \cup \Theta_1$, $A = \{0, 1\}$, $d(1 \mid x) = \phi(x) = 1 - d(0 \mid x)$, $L(\theta, 0) = \ell_0\, 1_{\theta \in \Theta_1}$, and $L(\theta, 1) = \ell_1\, 1_{\theta \in \Theta_0}$. Then
\[
R(\theta, d) = \begin{cases} \ell_1\, E_\theta[\phi(X)] & \theta \in \Theta_0, \\ \ell_0\, E_\theta[1 - \phi(X)] & \theta \in \Theta_1. \end{cases}
\]
Thus decision theory is clearly a generalization of hypothesis testing, where the risk corresponds to the type I and type II error probabilities.

5.1.5 Example (Urns). Let $\Theta = \{1, 2\}$, the urns. In urn 1 there are 10 red, 20 blue, and 70 green balls, and in urn 2 there are 40 red, 40 blue, and 20 green balls. One ball is drawn, and the problem is to decide which urn it came from. Whence $X = \{R, B, G\}$ and $A = \Theta$. Arbitrarily, let
\[
L(\theta, a) = \begin{pmatrix} 0 & 10 \\ 6 & 0 \end{pmatrix},
\]
which can be regarded as a weighted Hamming loss $1_{a \ne \theta}$, $\theta$-by-$\theta$. (In particular it is not convex.) Let $d = (d_R, d_B, d_G)$ be the decision rule, specified as the probability of choosing urn 1 given that the colour $R$, $B$, or $G$ was observed. The risk is computed as follows:
\begin{align*}
R(1, d) &= 10\big(0.1(1 - d_R) + 0.2(1 - d_B) + 0.7(1 - d_G)\big) = 10 - d_R - 2 d_B - 7 d_G, \\
R(2, d) &= 6\big(0.4\, d_R + 0.4\, d_B + 0.2\, d_G\big) = 2.4\, d_R + 2.4\, d_B + 1.2\, d_G.
\end{align*}
For a prior $\Lambda = (\lambda, 1-\lambda)$, the Bayes risk is simply
\begin{align*}
R(\Lambda, d) &= \lambda R(1, d) + (1 - \lambda) R(2, d) \\
&= \lambda(10 - 3.4\, d_R - 4.4\, d_B - 8.2\, d_G) + (2.4\, d_R + 2.4\, d_B + 1.2\, d_G) \\
&= 10\lambda + (2.4 - 3.4\lambda)\, d_R + (2.4 - 4.4\lambda)\, d_B + (1.2 - 8.2\lambda)\, d_G.
\end{align*}
The Bayes rule for $\lambda = 0.5$ is $d_R = d_B = 0$ (since those coefficients are positive) and $d_G = 1$ (because its coefficient is negative). In fact, the Bayes rule will always be a non-randomized rule. For the case $\lambda = 12/17$ the Bayes rule is not unique, because $d_R$ may be any number in $[0, 1]$ and the Bayes risk is still minimized.
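The coordinate-wise minimization of the Bayes risk above is easy to automate; the sketch below (not from the notes; numpy assumed) recovers the Bayes rule for several values of $\lambda$.

```python
import numpy as np

# Urns example: for a prior lambda on urn 1, the Bayes risk is
# 10*lam + sum(coef * d), minimized coordinate-wise over d in [0,1]^3.
def bayes_rule(lam):
    coef = np.array([2.4 - 3.4 * lam, 2.4 - 4.4 * lam, 1.2 - 8.2 * lam])  # (d_R, d_B, d_G)
    d = (coef < 0).astype(float)      # set d = 1 exactly where the coefficient is negative
    risk = 10 * lam + np.dot(coef, d)
    return d, risk

for lam in [0.3, 0.5, 12 / 17, 0.9]:
    print(lam, bayes_rule(lam))
```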

5.2 Minimax rules, Bayes rules, and admissibility

Remark (Non-randomized rules vs. randomized rules). Corollary 1.7.9 of the textbook says that if the loss function is convex then we can focus on non-randomized rules. More precisely, if $L(\theta, a)$ is convex in $a$ for all $\theta$ then, given any randomized estimator, there is a non-randomized estimator that is at least as good, and uniformly better if the loss is strictly convex.

Remark (Admissibility and Bayes). Under mild conditions it can be shown that every admissible rule is a Bayes rule for some prior distribution. Theorem 4.2.4 in the textbook says that if a Bayes rule is unique (i.e. if there is a unique $d$ minimizing $\int_\Theta R(\theta, d)\,\Lambda(d\theta)$) then it must be admissible.

5.2.1 Definition. A prior $\Lambda$ is a least favourable prior if $\gamma_\Lambda \ge \gamma_{\Lambda'}$ for all other priors $\Lambda'$, where $\gamma_\Lambda := \int R(\theta, d_\Lambda)\,\Lambda(d\theta)$ is the Bayes risk of the Bayes rule.

5.2.2 Theorem (Least favourable Bayes rules are minimax).

If γΛ = supθ R(θ, dΛ) then Λ is a least favourable prior and dΛ is minimax. Further, if dΛ is the unique Bayes rule with respect to Λ then it is the unique minimax rule.

PROOF: Let $d$ be any other decision procedure. Then
\[
\sup_\theta R(\theta, d) \ge \int R(\theta, d)\,\Lambda(d\theta)
\ge \int R(\theta, d_\Lambda)\,\Lambda(d\theta) \;\;\text{(by definition of $d_\Lambda$)}
= \sup_\theta R(\theta, d_\Lambda) \;\;\text{(by assumption)}.
\]
If $d_\Lambda$ is the unique Bayes rule for $\Lambda$ then the second inequality is strict. $\square$

5.2.3 Corollary. If a Bayes rule has constant risk (i.e. R(θ, dΛ) does not depend on θ) then it is minimax.

5.2.4 Corollary. Let $\Theta_\Lambda := \{\theta \in \Theta : R(\theta, d_\Lambda) = \sup_{\theta'} R(\theta', d_\Lambda)\}$. Then $d_\Lambda$ is minimax if $\Lambda(\Theta_\Lambda) = 1$.

5.2.5 Example (Urns, revisited). The minimax rule is found by solving
\[
\min_d\; \max\{\, 10 - d_R - 2 d_B - 7 d_G,\;\; 2.4\, d_R + 2.4\, d_B + 1.2\, d_G \,\}.
\]
The solution is $d^* = (d_R, d_B, d_G) = (0, 9/22, 1)$. This is a Bayes rule with $\lambda = 6/11$, but there is no unique Bayes rule for this $\Lambda$. This $d^*$ has constant risk $R(1, d^*) = R(2, d^*) = 24/11$.

5.2.6 Theorem. Let $L(\theta, a) = K(\theta)(g(\theta) - a)^2$ be weighted squared error loss ($K$ a positive function). Then the Bayes estimator for $g(\theta)$ is
\[
d_\Lambda(x) = \frac{E[K(\theta) g(\theta) \mid x]}{E[K(\theta) \mid x]}
= \frac{\int_\Theta K(\theta) g(\theta)\,\Lambda(d\theta \mid x)}{\int_\Theta K(\theta)\,\Lambda(d\theta \mid x)},
\]
where $\Lambda(\cdot \mid x)$ is the posterior distribution. When $K(\theta) \equiv 1$, $d_\Lambda(x) = E[g(\theta) \mid x]$.

PROOF: Let $d$ be any other estimator and let $h(t)$ be the Bayes risk of $d_\Lambda + t\,d$ conditional on $x$:
\[
h(t) := \int_\Theta K(\theta)\big(g(\theta) - d_\Lambda(x) - t\,d(x)\big)^2\,\Lambda(d\theta \mid x).
\]
For each $x$, $h$ is a differentiable function of $t$, and $h'(0) = 0$ since $d_\Lambda$ minimizes the Bayes risk:
\[
0 = \frac{d}{dt} h(t)\Big|_{t=0} = -2\, d(x) \int_\Theta K(\theta)\big(g(\theta) - d_\Lambda(x)\big)\,\Lambda(d\theta \mid x).
\]
For each $x$ there is a $d$ with $d(x) \ne 0$, so we can solve for $d_\Lambda(x)$ and see that it has the desired expression. $\square$

5.3 Conjugate Priors

5.3.1 Definition. Suppose that x has density p(x θ) and Λ is a prior distribution. If the posterior Λ( x) has the same form (i.e. lies| in the same parametric family of probability distributions)·| as the prior then Λ is a conjugate prior.

5.3.2 Example. Let $X \sim \text{Binomial}(n, \theta)$ and $\theta \sim \text{Beta}(\alpha, \beta) = \Lambda$. It can be shown that $(\theta \mid x) \sim \text{Beta}(\alpha + x, \beta + n - x)$. If $L(\theta, a) = (\theta - a)^2$ is squared error loss then
\[
d_\Lambda = E[\theta \mid X] = \frac{\alpha + X}{\alpha + \beta + n}
= \frac{\alpha + \beta}{\alpha + \beta + n}\cdot\frac{\alpha}{\alpha + \beta} + \frac{n}{\alpha + \beta + n}\cdot\frac{X}{n}.
\]
The risk in this case is
\begin{align*}
R(\theta, d_\Lambda) = E[(d_\Lambda - \theta)^2]
&= \Big(\frac{\alpha + \beta}{\alpha + \beta + n}\Big)^2\Big(\frac{\alpha}{\alpha + \beta} - \theta\Big)^2
+ \Big(\frac{n}{\alpha + \beta + n}\Big)^2\frac{\theta(1 - \theta)}{n} \\
&= \frac{\alpha^2 + (n - 2\alpha(\alpha + \beta))\theta + ((\alpha + \beta)^2 - n)\theta^2}{(\alpha + \beta + n)^2}.
\end{align*}
If we can choose $\alpha$ and $\beta$ so that this does not depend on $\theta$ then we will have the minimax rule by 5.2.3. We need $(\alpha + \beta)^2 = 2\alpha(\alpha + \beta) = n$, so, solving, $\alpha = \beta = \sqrt{n}/2$ and the minimax rule is obtained by taking $\Lambda = \text{Beta}(\sqrt{n}/2, \sqrt{n}/2)$. In this case
\[
d^* = \frac{1}{1 + \sqrt{n}}\Big(\frac{1}{2} + \frac{X}{\sqrt{n}}\Big) = \bar x + o(1),
\]
and the minimax risk is $1/(4(1 + \sqrt{n})^2)$.
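The constancy of the risk of the Beta$(\sqrt n/2, \sqrt n/2)$ Bayes rule can be verified exactly; the sketch below (not from the notes; numpy and scipy assumed) does so for a few values of $\theta$.

```python
import numpy as np
from scipy.stats import binom

# The risk of the Beta(sqrt(n)/2, sqrt(n)/2) Bayes rule for Binomial(n, theta)
# is constant in theta and equals 1/(4(1+sqrt(n))^2).
n = 20
alpha = beta = np.sqrt(n) / 2
x = np.arange(n + 1)
d_star = (alpha + x) / (alpha + beta + n)

for theta in [0.1, 0.5, 0.9]:
    risk = np.sum((d_star - theta)**2 * binom.pmf(x, n, theta))
    print(theta, risk, 1 / (4 * (1 + np.sqrt(n))**2))   # same value for every theta
```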

5.3.3 Example (Normal with known variance). Let $X \sim N(\theta, \sigma^2)$ and $\theta \sim N(\mu, \tau^2)$. It can be shown that
\[
(\theta \mid x) \sim N\Big( \frac{1/\tau^2}{1/\tau^2 + 1/\sigma^2}\,\mu + \frac{1/\sigma^2}{1/\tau^2 + 1/\sigma^2}\,x,\;\; \frac{1}{1/\tau^2 + 1/\sigma^2} \Big),
\]
and so, for squared error loss,
\[
d_\Lambda = E[\theta \mid X] = \frac{1/\tau^2}{1/\tau^2 + 1/\sigma^2}\,\mu + \frac{1/\sigma^2}{1/\tau^2 + 1/\sigma^2}\,X.
\]
The risk is
\[
R(\theta, d_\Lambda) = E[(d_\Lambda - \theta)^2]
= \Big(\frac{1/\tau^2}{1/\tau^2 + 1/\sigma^2}\Big)^2 (\mu - \theta)^2
+ \Big(\frac{1/\sigma^2}{1/\tau^2 + 1/\sigma^2}\Big)^2 \sigma^2.
\]

Until now we have not used the fact that $\sigma^2$ is known. Even if it is known, there is no prior that can be chosen so that the risk does not depend on $\theta$, so we cannot apply 5.2.3. But by taking $\tau$ larger and larger, the term involving $\theta$ can be made as small as desired. In the limit with $\tau = \infty$, there is no dependence on $\theta$. This motivates the definition of an improper prior, a measure $\Lambda$ on $\Theta$ with $\Lambda(\Theta) = \infty$. This example is continued below.

5.3.4 Theorem. Suppose that $\theta \sim \Lambda$ and $(X \mid \theta) \sim P_\theta$. Assume there is a rule $d_0$ that has finite risk. If, for a.e. $x$, $d_\Lambda(\cdot \mid x)$ minimizes the posterior expected loss
\[
E[L(\theta, d(\cdot \mid x)) \mid X = x] = \int_\Theta L(\theta, d(\cdot \mid x))\,\Lambda(d\theta \mid x)
\]
over decision rules $d$, then $d_\Lambda$ is a Bayes rule.

5.3.5 Definition. A generalized Bayes rule is a rule satisfying the condition of 5.3.4, but with $\Lambda$ an improper prior instead. A rule $d$ is a limiting Bayes rule if there is a sequence of proper priors $\{\Lambda_k\}_{k=1}^\infty$ such that $d(x) = \lim_{k\to\infty} d_{\Lambda_k}(x)$ a.s. Such a sequence is said to be a least favourable sequence of priors if $\gamma_{\Lambda_k} \to \gamma$ and $\gamma_\Lambda \le \gamma$ for all proper priors $\Lambda$.

5.3.6 Theorem. Suppose that $\{\Lambda_k\}_{k=1}^\infty$ is a sequence of priors with $\gamma_{\Lambda_k} \to \gamma$ and $d^*$ is an estimator with $\sup_\theta R(\theta, d^*) = \gamma$. Then $d^*$ is minimax and the sequence is least favourable.

PROOF: Let $d$ be any other decision procedure. For all $k \ge 1$,
\[
\sup_\theta R(\theta, d) \ge \int_\Theta R(\theta, d)\,\Lambda_k(d\theta) \ge \gamma_{\Lambda_k}.
\]
Taking $k \to \infty$, and noting that the right hand side converges to $\gamma$,
\[
\sup_\theta R(\theta, d) \ge \gamma = \sup_\theta R(\theta, d^*),
\]
so $d^*$ is minimax. Further, for any prior $\Lambda$,
\[
\gamma_\Lambda = \int_\Theta R(\theta, d_\Lambda)\,\Lambda(d\theta) \le \int_\Theta R(\theta, d^*)\,\Lambda(d\theta) \le \sup_\theta R(\theta, d^*) = \gamma,
\]
so the sequence is least favourable. $\square$

In 5.3.3, take $\Lambda_k = N(\mu, k)$, where $\mu$ is arbitrary because it is irrelevant in the limit. Then $\gamma_{\Lambda_k} \to \sigma^2$ and the rules $d_{\Lambda_k}$ converge to the rule $d(x) = x$. We have seen that $R(\theta, X) = \sigma^2$, so the sample mean is minimax by 5.3.6.

5.3.7 Lemma. Let $X$ be a random quantity with distribution $F$ and let $g(F)$ be a functional defined over a set $\mathcal{F}_1$ of distributions. Suppose that $\delta$ is a minimax estimator of $g(F)$ when $F$ is restricted to a subset $\mathcal{F}_0 \subseteq \mathcal{F}_1$. Then $\delta$ is a minimax estimator of $g(F)$ for $F \in \mathcal{F}_1$ if $\sup_{F \in \mathcal{F}_0} R(F, \delta) = \sup_{F \in \mathcal{F}_1} R(F, \delta)$.

5.3.8 Example (Normal with unknown variance). Let $X \sim N(\theta, \sigma^2)$, where $\theta \in \mathbb{R}$ and $\sigma^2 \le M$. For squared error loss, from the calculations in 5.3.3, $\sup_{\theta, \sigma^2} R(\theta, X) = M$. (Note that there is a big problem if we consider $M = \infty$.) In 5.3.3 we saw that the minimax estimator is the sample mean when $\sigma^2$ is known. Applying the lemma, the sample mean remains minimax when $\sigma^2$ is unknown but known to be bounded.

5.3.9 Example. Let $X_1, \dots, X_n \sim F$, where $F \in \mathcal{F}_\mu := \{F : E_F|X| < \infty\}$.
(i) Let $\mathcal{F}_B := \{F \in \mathcal{F}_\mu : P_F[0 \le X \le 1] = 1\}$. What is the minimax estimator for $\theta = E_F[X]$ under squared error loss? Consider the subclass $\mathcal{F}_0 := \{\text{Bernoulli}(p) : 0 \le p \le 1\}$. The sum of $n$ Bernoulli r.v.'s is binomial, so the estimator $d(x) = (1/2 + \sqrt{n}\,\bar x)/(1 + \sqrt{n})$ is minimax over the class $\mathcal{F}_0$. For any $F \in \mathcal{F}_B$, write
\[
d = \frac{1}{1 + \sqrt{n}}\cdot\frac{1}{2} + \frac{\sqrt{n}}{1 + \sqrt{n}}\,\bar x,
\]
so that
\[
E[(d - \theta)^2] = \frac{1}{(1 + \sqrt{n})^2}\Big(\frac{1}{2} - \theta\Big)^2 + \Big(\frac{\sqrt{n}}{1 + \sqrt{n}}\Big)^2 \operatorname{Var}(\bar X)
= \frac{1}{(1 + \sqrt{n})^2}\Big( E_F[X^2] - \theta + \frac{1}{4} \Big).
\]
Now, $E_F[X^2] \le E_F[X] = \theta$, so $E[(d - \theta)^2] \le 1/(4(1 + \sqrt{n})^2)$, which is the minimax risk in the class $\mathcal{F}_0$. Therefore the minimax estimator is $d$.
(ii) Let $\mathcal{F}_V := \{F \in \mathcal{F}_\mu : \operatorname{Var}_F(X) \le M\}$ and again $\theta = E_F[X]$. Consider the subclass $\mathcal{F}_0 := \{N(\mu, \sigma^2) : \mu \in \mathbb{R}, \sigma^2 \le M\}$. It can be shown that $\sup_{F \in \mathcal{F}_V} R(F, \bar X) = M/n$, which is the minimax risk in the subclass, so the sample mean is minimax for $\theta$.

5.4 Inadmissibility

Recall that any unique Bayes rule is admissible by Theorem 4.2.4 in the textbook.

5.4.1 Example. Let $X_1, \dots, X_n \sim N(\theta, \sigma^2)$ with $\sigma^2$ known. For $\theta \sim N(\mu, \tau^2)$ and squared error loss, we derived that $d_\Lambda(x) = p\mu + (1 - p)\bar x$, where
\[
p := \frac{1/\tau^2}{1/\tau^2 + n/\sigma^2}.
\]
This is the unique Bayes rule, so it is admissible. It is interesting to ask: for which $(a, b)$ is $d(x) := a\bar x + b$ admissible? This example shows that any such rule with $0 < a < 1$ and $b$ arbitrary is admissible.

5.4.2 Theorem. Let $X$ have mean $\theta$ and variance $\sigma^2$ (but not necessarily normally distributed). Then $d_{a,b}(x) = ax + b$ is inadmissible for $\theta$ under squared error loss if (i) $a > 1$; (ii) $a < 0$; or (iii) $a = 1$, $b \ne 0$.

PROOF: We must demonstrate a rule $d$ such that $R(\theta, d_{a,b}) \ge R(\theta, d)$ for all $\theta$, with strict inequality for some $\theta$. Compute
\begin{align*}
R(\theta, d_{a,b}) &= E_\theta[(aX + b - \theta)^2]
= E_\theta[(a(X - \theta) + b + (a - 1)\theta)^2] \\
&= a^2 E_\theta[(X - \theta)^2] + (b + (a - 1)\theta)^2
= a^2\sigma^2 + (b + (a - 1)\theta)^2.
\end{align*}
Recall that the risk of $d(x) = x$ is $\sigma^2$. If $a > 1$, or if $a = 1$ and $b \ne 0$, then $R(\theta, d_{a,b}) > \sigma^2 = R(\theta, d)$, so this takes care of (i) and (iii). For case (ii), if $a < 0$ then
\[
R(\theta, d_{a,b}) > (b + (a - 1)\theta)^2 = (1 - a)^2\Big(\theta - \frac{b}{1 - a}\Big)^2 > \Big(\theta - \frac{b}{1 - a}\Big)^2 = R\Big(\theta, \frac{b}{1 - a}\Big),
\]
the risk of the constant rule $d(x) \equiv b/(1 - a)$. $\square$

5.4.3 Theorem (Stein, 1956). Let $\mu \in \mathbb{R}^k$ and $X_1, \dots, X_n \sim N(\mu, \sigma^2 I_k)$, so that $\bar X = \frac{1}{n}\sum_i X_i \sim N(\mu, \frac{\sigma^2}{n} I_k)$.
(i) If $k = 1$ then $\bar X$ is admissible.
(ii) If $k = 2$ then $\bar X$ is admissible.
(iii) If $k \ge 3$ then $\bar X$ is not admissible.

Remark. This is discussed in Stein (1956) and at length in James and Stein (1961). They showed that the estimator
\[
\hat\mu := \Big(1 - \frac{\sigma^2(k - 2)}{n\,\|\bar X\|^2}\Big)\bar X
\]
(or something similar) has strictly smaller risk than $\bar X$ when $k \ge 3$. When $k$ is very large this is apparently very close to $\bar X$. In many practical cases $\mu$ will have most coordinates equal to zero, and in these cases a threshold estimator is better still:
\[
\hat\mu_i := \begin{cases} \bar X_i & |\bar X_i| > t \\ 0 & |\bar X_i| \le t \end{cases}
\qquad\text{or}\qquad
\hat\mu_i := \begin{cases} \bar X_i - t & \bar X_i > t \\ \bar X_i + t & \bar X_i < -t \\ 0 & |\bar X_i| \le t. \end{cases}
\]
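A quick simulation (a sketch, not from the notes; numpy assumed; the mean vector and dimensions are arbitrary choices) comparing the risk of $X$ with the James-Stein-type shrinkage estimator for $k \ge 3$.

```python
import numpy as np

# Compare the squared error risk of the MLE with a James-Stein-type shrinkage
# estimator for k >= 3, one observation per replication, sigma = 1.
rng = np.random.default_rng(7)
k, reps = 10, 50_000
mu = np.zeros(k)
mu[:3] = 2.0                              # a sparse-ish mean vector

X = rng.normal(mu, 1.0, size=(reps, k))
norm2 = np.sum(X**2, axis=1, keepdims=True)
js = (1 - (k - 2) / norm2) * X            # shrink toward the origin

risk_mle = np.mean(np.sum((X - mu)**2, axis=1))
risk_js = np.mean(np.sum((js - mu)**2, axis=1))
print(risk_mle, risk_js)                  # the shrinkage estimator has smaller total risk
```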

PROOF ($k = 1$, USING THE CRAMÉR-RAO BOUND): We prove that $\bar X$ is admissible when $k = 1$ and $\sigma^2 = 1$. For squared error loss, $R(\theta, \bar X) = 1/n$. Assume that $d$ is an estimator such that $R(\theta, d) \le 1/n$ for all $\theta$. Note that
\[
R(\theta, d) = E_\theta[(d(X) - \theta)^2] = \operatorname{Var}_\theta(d(X)) + (b(\theta))^2,
\]
where $b(\theta) := E_\theta[d(X)] - \theta$ is the bias. By the Cramér-Rao lower bound,
\[
\operatorname{Var}_\theta(d(X)) \ge \frac{(1 + b'(\theta))^2}{n I(\theta)} = \frac{(1 + b'(\theta))^2}{n}.
\]
Hence, for all $\theta$,
\[
(b(\theta))^2 + \frac{(1 + b'(\theta))^2}{n} \le \frac{1}{n}.
\]
In particular, $|b(\theta)| \le 1/\sqrt{n}$ for all $\theta$, so $b$ is a bounded function, and $b'(\theta) \le 0$ for all $\theta$, so $b$ is a decreasing function. Since $b$ is bounded and decreasing, there are sequences $\theta_i^+ \to \infty$ and $\theta_i^- \to -\infty$ such that $b'(\theta_i^+) \to 0$ and $b'(\theta_i^-) \to 0$. From the inequality, it follows that $b(\theta_i^+) \to 0$ and $b(\theta_i^-) \to 0$. Since $b$ is decreasing, it must be the zero function. Thus $d$ is unbiased with variance at most $1/n$; since $\bar X$ is the UMVUE, $d(X) = \bar X$ a.s. $\square$

PROOF ($k = 1$, USING BAYES ESTIMATION): Here is another proof for the case $k = 1$ and $\sigma^2 = 1$. Assume for contradiction that $\bar X$ is inadmissible; then there is $d^*$ such that $R(\theta, d^*) \le 1/n$ for all $\theta$ and strictly less for some $\theta$. The risk is a continuous function of $\theta$, so there are an interval $[\theta_0, \theta_1]$ and an $\varepsilon > 0$ such that $R(\theta, d^*) < 1/n - \varepsilon$ for all $\theta_0 \le \theta \le \theta_1$. Let $\Lambda_\tau$ denote the prior $N(0, \tau^2)$, and let $d_\tau$ be the associated Bayes estimator. Then
\[
\gamma^*_\tau := \int R(\theta, d^*)\,\Lambda_\tau(d\theta) \ge \int R(\theta, d_\tau)\,\Lambda_\tau(d\theta) = \gamma_\tau
\]
by the definition of the Bayes estimator. We have seen that, since $\sigma^2 = 1$,
\[
\gamma_\tau = \frac{1}{1/\tau^2 + n/\sigma^2} = \frac{\tau^2}{1 + n\tau^2}.
\]
Now,
\begin{align*}
\frac{1/n - \gamma^*_\tau}{1/n - \gamma_\tau}
&= \frac{\frac{1}{\sqrt{2\pi}\,\tau}\int_{-\infty}^{\infty} (1/n - R(\theta, d^*))\, e^{-\theta^2/2\tau^2}\,d\theta}{1/n - \tau^2/(1 + n\tau^2)} \\
&= \frac{n(1 + n\tau^2)}{\sqrt{2\pi}\,\tau} \int_{-\infty}^{\infty} (1/n - R(\theta, d^*))\, e^{-\theta^2/2\tau^2}\,d\theta \\
&\ge \frac{n(1 + n\tau^2)}{\sqrt{2\pi}\,\tau} \int_{\theta_0}^{\theta_1} (1/n - R(\theta, d^*))\, e^{-\theta^2/2\tau^2}\,d\theta \\
&> \frac{n(1 + n\tau^2)}{\sqrt{2\pi}\,\tau}\, \varepsilon \int_{\theta_0}^{\theta_1} e^{-\theta^2/2\tau^2}\,d\theta.
\end{align*}
This last quantity goes to $\infty$ as $\tau \to \infty$. Therefore, for sufficiently large $\tau$, $1/n - \gamma^*_\tau > 1/n - \gamma_\tau$, i.e. $\gamma^*_\tau < \gamma_\tau$. But this contradicts $\gamma_\tau \le \gamma^*_\tau$. $\square$

Recall Stein's lemma: when $k = 1$, if $X \sim N(\theta, \sigma^2)$ and $g$ is a function with $E[|g'(X)|] < \infty$ then
\[
\sigma^2\, E[g'(X)] = E[(X - \theta)\, g(X)].
\]
In $k$ dimensions, if $X \sim N(\theta, \sigma^2 I_k)$ and $g : \mathbb{R}^k \to \mathbb{R}$ is differentiable with $E\big[\big|\frac{\partial}{\partial x_i} g(X)\big|\big] < \infty$ then
\[
\sigma^2\, E\Big[\frac{\partial}{\partial x_i} g(X)\Big] = E[(X_i - \theta_i)\, g(X)].
\]

PROOF ($k \ge 3$, INADMISSIBILITY): For simplicity, take $\sigma^2 = 1$, so that each coordinate of $\bar X$ has variance $1/n$. We look for an estimator $\hat\theta_n$ of the form
\[
\hat\theta_n = \bar X + \frac{1}{n}\, g(\bar X),
\]

where g : ℝ^k → ℝ^k is to be determined, such that for all θ,

0 < E_θ[‖X̄ − θ‖^2] − E_θ[‖X̄ + (1/n) g(X̄) − θ‖^2]
  = −(1/n^2) E[‖g(X̄)‖^2] − (2/n) Σ_{i=1}^k E[(X̄_i − θ_i) g_i(X̄)]
  = −(1/n^2) E[‖g(X̄)‖^2] − (2/n^2) Σ_{i=1}^k E[(∂/∂x_i) g_i(X̄)].

Furthermore, we would like g = ∇ log ψ for some ψ : ℝ^k → (0, ∞). Then
(∂/∂x_i) g_i(x) = (∂/∂x_i)( (1/ψ(x)) ∂ψ/∂x_i ) = −(1/ψ(x)^2)(∂ψ/∂x_i)^2 + (1/ψ(x)) ∂^2ψ/∂x_i^2.
With regard to the desired expression for g, pointwise in x,

−(1/n^2) Σ_{i=1}^k (1/ψ(x)^2)(∂ψ/∂x_i)^2 − (2/n^2) Σ_{i=1}^k ( −(1/ψ(x)^2)(∂ψ/∂x_i)^2 + (1/ψ(x)) ∂^2ψ/∂x_i^2 )

= (1/n^2) Σ_{i=1}^k (1/ψ(x)^2)(∂ψ/∂x_i)^2 − (2/n^2) Δψ(x)/ψ(x).

So we want
0 < (1/n^2) E[‖g(X̄)‖^2] − (2/n^2) E[Δψ(X̄)/ψ(X̄)].
If we take ψ to be harmonic then we get what we wanted. Take ψ(x) = ‖x‖^{2−k}, which is harmonic (away from the origin) when k ≥ 3. Then g(x) = ∇ log ψ(x) = (2 − k)x/‖x‖^2. Plugging this in,

θ̂_n = (1 − (k − 2)/(n‖X̄‖^2)) X̄

is a strictly better estimator, with a difference in squared error risk of

((k − 2)^2/n^2) E[1/‖X̄‖^2]. □

The problem of estimating the mean of n normals has been studied in depth for a long time. It is important in non-parametric estimation, in that such problems can often be reduced to estimating the mean of n normals, where n is a large number. Often Stein's shrinkage is not enough of a correction. Herbert Robbins, Bradley Efron, and Lawrence Brown introduced and studied the empirical Bayes method to solve the same problem. If X ∼ N(θ, σ^2 I_k), with σ^2 known, and θ ∼ N(0, τ^2 I_k), then d_τ = (τ^2/(σ^2 + τ^2)) X is the Bayes estimator. The idea is to use the data to pick the "best" τ, say τ = τ̂, and then use d_{τ̂} as the estimator.

5.4.4 Example. Let X be as above, so that marginally X ∼ N(0, (σ^2 + τ^2) I_k), where σ^2 is known and τ^2 is to be estimated. The MLE for τ^2 is τ̂^2 = (‖X‖^2/k − σ^2)_+, so
d_{τ̂}(X) = (τ̂^2/(τ̂^2 + σ^2)) X = (1 − kσ^2/‖X‖^2)_+ X,
which is very close to Stein's estimator for large k and n = 1.

6 Hypothesis Testing

6.1 Simple tests: the Neyman-Pearson lemma

Consider the problem of testing H_0 : X ∼ P_0 vs. H_1 : X ∼ P_1, where P_0 and P_1 have densities p_0 and p_1 with respect to a σ-finite measure µ. Recall that a (generalized) decision rule is

d(a | x) = probability of taking action a when we observe outcome x.

In this case the action space is {0, 1}, corresponding to the hypothesis that we accept. Let ψ(x) := d(1 | x), the probability of accepting the alternative hypothesis H_1 given datum x. The level of a test or size of a test is E_0[ψ(X)], the probability of a type I error (rejecting the null hypothesis despite it being true). The power of a test is E_1[ψ(X)], the probability of accepting the alternative when it is true. The probability of a type II error (accepting the null hypothesis despite the alternative being true) is E_1[1 − ψ(X)].
Recall that (type I error) + (type II error) ≥ 1 − (1/2)‖p_0 − p_1‖_1. In proving this, the function
φ_1(x) = 1 if p_1(x) > p_0(x), and 0 if p_1(x) < p_0(x),
played an important role.

6.1.1 Definition. A level α most powerful test, or MP test, is a test φ with the property that E_0[φ] ≤ α (i.e. it is level α) and, for every other test ψ of level α, E_1[ψ] ≤ E_1[φ].
In this Neyman-Pearson setting, the objective is to fix the level of a test and then maximize the power. Equivalently, fix an acceptable probability of type I error and then find the test with the minimum probability of type II error. It turns out that in this simple setup, the most powerful test is of the form

φ_k(x) := 1 if p_1(x) > k p_0(x); γ if p_1(x) = k p_0(x); 0 if p_1(x) < k p_0(x).

6.1.2 Theorem (Neyman-Pearson). Let 0 ≤ α ≤ 1.
(i) There are constants k and γ such that E_0[φ_k(X)] = α.
(ii) The test φ_k from (i) is most powerful for testing P_0 vs. P_1.
(iii) If ψ is a level α most powerful test then it must be equal to φ_k µ-a.e., unless there is a test of size less than α with power 1.

PROOF: Let Y = p_1(X)/p_0(X), and let F_Y be the CDF of Y under P_0, so that
P_0[p_1(X)/p_0(X) > k] = 1 − F_Y(k).

Let k := inf{c : 1 − F_Y(c) < α}, a non-negative number since Y ≥ 0, and let
γ := (α − P_0[Y > k]) / P_0[Y = k]
if P_0[Y = k] ≠ 0, and otherwise take γ = 1. Then
E_0[φ_k(X)] = P_0[Y > k] + γ P_0[Y = k]

= P_0[Y > k] + ((α − P_0[Y > k]) / P_0[Y = k]) P_0[Y = k] = α.

Therefore φ_k is a level α test. Now we check that it is most powerful. Assume for simplicity that µ[p_1 = kp_0] = 0. Let ψ be another test with E_0[ψ] ≤ α. Then

E_1[φ_k] − E_1[ψ] − k(E_0[φ_k] − E_0[ψ])
  = ∫ (φ_k − ψ)(p_1(x) − kp_0(x)) µ(dx)
  = ∫_{p_1 > kp_0} (φ_k − ψ)(p_1 − kp_0) dµ + ∫_{p_1 < kp_0} (φ_k − ψ)(p_1 − kp_0) dµ
  ≥ 0,

since on {p_1 > kp_0} we have φ_k = 1 ≥ ψ and p_1 − kp_0 > 0, while on {p_1 < kp_0} we have φ_k = 0 ≤ ψ and p_1 − kp_0 < 0. Since E_0[φ_k] = α ≥ E_0[ψ], it follows that E_1[φ_k] − E_1[ψ] ≥ k(E_0[φ_k] − E_0[ψ]) ≥ 0, so φ_k is most powerful. For (iii), if ψ is also a level α most powerful test then the expression above gives
0 = ∫ (φ_k − ψ)(p_1(x) − kp_0(x)) µ(dx),
so it follows that ψ = φ_k = 1 on {p_1/p_0 > k} and ψ = φ_k = 0 on {p_1/p_0 < k}, µ-a.e. If E_0[ψ] < α then k = 0, so there is a test of power 1. □
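To make the construction of k and γ in this proof concrete, here is a small sketch for a discrete case, testing P_0 = Binomial(10, 0.3) against P_1 = Binomial(10, 0.6) at level α = 0.05; the model and numbers are illustrative choices, not from the notes.

from scipy.stats import binom

n, p0, p1, alpha = 10, 0.3, 0.6, 0.05
xs = list(range(n + 1))
lr = [binom.pmf(x, n, p1) / binom.pmf(x, n, p0) for x in xs]   # Y = p1(x)/p0(x)

# k = inf{c : P0[Y > c] < alpha}, attained at one of the observed ratio values
cands = sorted(set(lr))
k = min(c for c in cands
        if sum(binom.pmf(x, n, p0) for x in xs if lr[x] > c) < alpha)

p_above = sum(binom.pmf(x, n, p0) for x in xs if lr[x] > k)
p_at = sum(binom.pmf(x, n, p0) for x in xs if lr[x] == k)
gamma = (alpha - p_above) / p_at if p_at > 0 else 1.0

size = p_above + gamma * p_at                                   # equals alpha exactly
power = (sum(binom.pmf(x, n, p1) for x in xs if lr[x] > k)
         + gamma * sum(binom.pmf(x, n, p1) for x in xs if lr[x] == k))
print(k, gamma, size, power)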

6.1.3 Corollary. If 0 < α < 1 and β is the power of the most powerful level α test then α < β, unless p0 = p1 µ-a.e.

PROOF: Consider the trivial test of level α, namely the constant α. Then

β = E_1[φ_k] ≥ E_1[α] = α.
Moreover, if α = β then the constant test α is also most powerful, so by the Neyman-Pearson lemma p_1/p_0 = k with probability 1; then k = 1 and the densities are equal. □

6.2 Complex tests

Let {p_θ : θ ∈ Θ ⊆ ℝ} be a family of densities. The last section was concerned with testing θ = θ_0 vs. θ = θ_1, a simple test. In this case we know the most powerful tests. A complex test involves testing whether θ lies in some infinite subset of Θ. We will consider the following four complex tests in detail.

H_1 : θ ≤ θ_0 vs. K_1 : θ > θ_0
H_2 : θ ≤ θ_1 or θ ≥ θ_2 vs. K_2 : θ_1 < θ < θ_2
H_3 : θ_1 ≤ θ ≤ θ_2 vs. K_3 : θ < θ_1 or θ > θ_2
H_4 : θ = θ_0 vs. K_4 : θ ≠ θ_0

What are the most powerful tests in each case?

6.2.1 Definition. For testing θ ∈ Θ_0 vs. θ ∈ Θ_1, a test φ is uniformly most powerful, or UMP, at level α if sup_{θ ∈ Θ_0} E_θ[φ] ≤ α (i.e. it has level α) and, for any other level α test ψ, E_θ[ψ] ≤ E_θ[φ] for all θ ∈ Θ_1.

6.2.2 Definition. The family of densities {p_θ : θ ∈ Θ ⊆ ℝ} has the monotone likelihood ratio property, or MLR, on the interval [θ_0, θ_1] ⊆ Θ if there is a statistic T such that, for each pair θ < θ′ in [θ_0, θ_1],

p_{θ′}(x)/p_θ(x) = g(T(x))

for some nondecreasing function g.

6.2.3 Example. Consider a one-parameter exponential family,

p_θ(x) = c(θ) e^{Q(θ)T(x)} h(x).

If Q(θ) is nondecreasing then this family has MLR. Indeed, if θ < θ′ then

p_{θ′}(x)/p_θ(x) = (c(θ′)/c(θ)) e^{(Q(θ′) − Q(θ))T(x)},
which is a nondecreasing function of T(x). The binomial and Poisson families have this property.

6.2.4 Example. The Cauchy location family p_θ(x) = 1/(π(1 + (x − θ)^2)) does not have MLR: the ratio p_{θ′}(x)/p_θ(x) = (1 + (x − θ)^2)/(1 + (x − θ′)^2) tends to 1 both as x → ∞ and as x → −∞, yet it is not identically 1 (it is greater than 1 at x = θ′ and less than 1 at x = θ), so it cannot be a nondecreasing function of x.

6.2.5 Example. Some important MLR families are
(i) N(θ, 1), θ ∈ ℝ;
(ii) non-central χ^2_k(δ), δ ≥ 0, with k fixed;
(iii) non-central t_r(δ), δ ≥ 0, with r fixed, where
t_r(δ) = N(δ, 1)/√(χ^2_r / r);

(iv) non-central F_{m,n}(δ), δ ≥ 0, with m and n fixed, where
F_{m,n}(δ) = (χ^2_m(δ)/m)/(χ^2_n(0)/n),
the numerator and denominator being independent.

These families show up often, and we often wish to test δ = 0 vs. δ > 0.

6.2.6 Theorem (Karlin-Rubin). Let X be distributed according to a density p_θ in a family with MLR in the statistic T(X).
(i) There is a UMP level α test φ_c of H_1 : θ ≤ θ_0 vs. K_1 : θ > θ_0 with the property that E_{θ_0}[φ_c(X)] = α, namely
φ_c(x) = 1 if T(x) > c; γ if T(x) = c; 0 if T(x) < c.

(ii) The power function β(θ) := E_θ[φ_c(X)] is increasing in θ wherever β(θ) < 1.
(iii) For every θ < θ_0, φ_c minimizes E_θ[φ(X)] among all tests φ with E_{θ_0}[φ(X)] = α.

PROOF (SKETCH): The idea is to consider the simple test H : θ = θ_0 vs. K : θ = θ_1, where θ_1 > θ_0 is fixed for now. By the Neyman-Pearson lemma we get a test φ_0(x), which the MLR property lets us write as ψ(T(x)). From the MLR property, β(θ) := E_θ[φ_0(X)] is increasing. As a result, E_{θ_0}[φ_0(X)] = α and E_θ[φ_0(X)] ≤ α for all θ ≤ θ_0, so φ_0 is level α in the current setting. The second key fact that follows from the MLR property is that φ_0 is uniquely determined by (θ_0, α); in particular it does not depend on θ_1. □
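A quick numerical illustration of the theorem (a sketch; the Binomial(20, θ) model, θ_0 = 0.4, and α = 0.05 are arbitrary choices, not from the notes): the binomial family has MLR in T(x) = x, so the one-sided test below should have size exactly α at θ_0 and a nondecreasing power function.

import numpy as np
from scipy.stats import binom

n, theta0, alpha = 20, 0.4, 0.05

# Randomized test phi_c: reject if X > c, with probability gamma if X = c.
c = int(binom.ppf(1 - alpha, n, theta0))      # smallest c with P_{theta0}[X > c] <= alpha
while binom.sf(c, n, theta0) > alpha:
    c += 1
gamma = (alpha - binom.sf(c, n, theta0)) / binom.pmf(c, n, theta0)

def power(theta):
    return binom.sf(c, n, theta) + gamma * binom.pmf(c, n, theta)

grid = np.linspace(0.05, 0.95, 19)
betas = [power(t) for t in grid]
assert abs(power(theta0) - alpha) < 1e-12                      # size is exactly alpha
assert all(b2 >= b1 for b1, b2 in zip(betas, betas[1:]))       # power is nondecreasing
print(c, round(gamma, 4), [round(b, 3) for b in betas])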

There is a geometric representation of the simple test H : θ = θ_0 vs. K : θ = θ_1. Let C := {φ measurable : 0 ≤ φ ≤ 1} be the set of critical functions and
Ω := {(α, β) = (E_0[φ(X)], E_1[φ(X)]) : φ ∈ C} ⊆ [0, 1]^2.
This set has the following properties.
(i) (α, α) ∈ Ω for all 0 ≤ α ≤ 1.
(ii) If (α, β) ∈ Ω then (1 − α, 1 − β) ∈ Ω.
(iii) Ω is convex, because integration is linear and a convex combination of tests is a test.

(iv) Ω is closed, because {0 ≤ φ ≤ 1} is a weakly compact subset of L^∞(?).

6.2.7 Theorem (Generalized Neyman-Pearson lemma).

Let f_1, ..., f_{m+1} be real-valued functions defined on a Euclidean space X and integrable with respect to a σ-finite measure µ (e.g. density functions). Suppose that, for given constants α_1, ..., α_m, there exists a critical function φ ∈ C such that

∫_X φ(x) f_i(x) µ(dx) = α_i

for i = 1, ..., m. Let C_0 be the class of all critical functions such that the above equations hold. Then
(i) Over all φ ∈ C_0 there is one that maximizes ∫_X φ(x) f_{m+1}(x) µ(dx). (When m = 1 this quantity is the power.)
(ii) A sufficient condition for φ ∈ C_0 to maximize ∫_X φ(x) f_{m+1}(x) µ(dx) is that there are constants k_1, ..., k_m ∈ ℝ such that
φ(x) = 1 if f_{m+1}(x) > Σ_{i=1}^m k_i f_i(x), and φ(x) = 0 if f_{m+1}(x) < Σ_{i=1}^m k_i f_i(x).
(iii) Further, if the k_i may be chosen to be non-negative then φ maximizes ∫_X φ(x) f_{m+1}(x) µ(dx) over the larger class
{φ : ∫_X φ(x) f_i(x) µ(dx) ≤ α_i for all i = 1, ..., m}.
(iv) Define M := {(∫_X φ f_1 dµ, ..., ∫_X φ f_m dµ) : φ ∈ C} ⊆ ℝ^m. If (α_1, ..., α_m) is an interior point of M then there are constants k_1, ..., k_m and a test function φ ∈ C_0 of the form above that maximizes ∫_X φ(x) f_{m+1}(x) µ(dx). Moreover, a necessary condition for an element of C_0 to maximize ∫_X φ(x) f_{m+1}(x) µ(dx) is that it is of the form above µ-a.e.

6.2.8 Theorem. Consider a one parameter exponential family

p_θ(x) = c(θ) e^{Q(θ)T(x)} h(x)
with Q strictly increasing. For the problem of testing

H_2 : θ ≤ θ_1 or θ ≥ θ_2 vs. K_2 : θ_1 < θ < θ_2,
there is a test
φ(x) = 1 if c_1 < T(x) < c_2; γ_i if T(x) = c_i, i = 1, 2; 0 if T(x) < c_1 or T(x) > c_2,
where c_1, c_2, γ_1, γ_2 are determined by α = E_{θ_1}[φ(X)] = E_{θ_2}[φ(X)], such that
(i) φ is UMP, i.e. φ maximizes E_θ[φ(X)] over all tests of the same level, for every θ_1 < θ < θ_2;
(ii) φ minimizes E_θ[φ(X)] over all tests of the same level, for every θ < θ_1 and θ > θ_2;
(iii) the associated power function is unimodal, unless T(X) is concentrated on the same two-point set {t_1, t_2} for all θ.

We will refer to Lemma 3.4.2 from the textbook: if φ = 1_{[a,b]} satisfies the constraints α = E_{θ_1}[φ(X)] = E_{θ_2}[φ(X)] and θ_1 ≠ θ_2, then φ is uniquely determined µ-a.e.

PROOF: We claim that (α, α) is an interior point of the region Ω for the pair (θ_1, θ_2). Indeed, it can be shown that (α, α + Δ) is in the region for Δ sufficiently small: take φ to be a level α MP test of θ = θ_1 vs. θ = θ_2, so that β = E_{θ_2}[φ(X)] > α; convexity gives the points in between, and the symmetry of Ω completes the proof of the claim.
For part (i), note that T is a sufficient statistic for θ, so we may focus on functions of the form φ(x) = ψ(T(x)) = ψ(t). Let p^T_θ(t) ν(dt) = c(θ) e^{Q(θ)t} ν(dt) denote the distribution of T = T(X). First fix θ′ with θ_1 < θ′ < θ_2 and choose a test ψ(T) such that E_{θ_1}[ψ(T)] = E_{θ_2}[ψ(T)] = α and E_{θ′}[ψ(T)] is as large as possible. Note that for the region M = {(E_{θ_1}[ψ(T)], E_{θ_2}[ψ(T)]) : ψ}, the point (α, α) is interior. Apply Theorem 3.6.1 (the generalized Neyman-Pearson lemma) with f_1 = p^T_{θ_1}(t), f_2 = p^T_{θ_2}(t), and f_3 = p^T_{θ′}(t) (so m = 2) and the constraints ∫ψ f_1 = α = ∫ψ f_2, to maximize ∫ψ f_3. There is a ψ_0 that does this, and there are numbers k_1 and k_2 such that
ψ_0(t) = 1 if f_3(t) > k_1 f_1(t) + k_2 f_2(t), and 0 if f_3(t) < k_1 f_1(t) + k_2 f_2(t).

The desired test will be φ_0(x) = ψ_0(T(x)). We have expressions for all these densities, so ψ_0(t) = 1 only if

k_1 c(θ_1) e^{Q(θ_1)t} + k_2 c(θ_2) e^{Q(θ_2)t} < c(θ′) e^{Q(θ′)t}.

Write this as a_1 e^{b_1 t} + a_2 e^{b_2 t} < 1, where a_1 = k_1 c(θ_1)/c(θ′), a_2 = k_2 c(θ_2)/c(θ′), b_1 = Q(θ_1) − Q(θ′) < 0, and b_2 = Q(θ_2) − Q(θ′) > 0. There are four cases:
(a) a_1 ≤ 0 and a_2 ≤ 0;
(b) a_1 ≤ 0 and a_2 > 0;
(c) a_1 > 0 and a_2 ≤ 0;
(d) a_1 > 0 and a_2 > 0.
We claim that only (d) can hold. If (a) holds then ψ_0 ≡ 1, which contradicts E_{θ_1}[ψ(T)] = α < 1. If (b) or (c) holds then the left hand side of the inequality is a monotone function of t, implying that the rejection region is a half-infinite interval; by Lemma 3.4.2 and Lemma 3.7.1, E_θ[ψ(T(X))] is then a strictly monotone function of θ, which contradicts E_{θ_1}[ψ(T)] = E_{θ_2}[ψ(T)] = α.
So we are in case (d). The inequality then holds for c_1 < t < c_2 for some constants c_1 and c_2, and ψ_0 = 1_{[c_1, c_2]}, more or less. It remains to determine γ_1 := ψ_0(c_1) and γ_2 := ψ_0(c_2), but by Lemma 3.7.1 there is only one choice µ-a.e. Further, these depend on (θ_1, θ_2, α) and not on θ′. It follows that φ_0 is MP for testing θ ∈ {θ_1, θ_2} vs. θ = θ′, and is independent of θ′. Therefore it is UMP for testing θ ∈ {θ_1, θ_2} vs. K_2 : θ_1 < θ < θ_2. To finish the proof we need to show that this test also works for H_2 : θ ≤ θ_1 or θ ≥ θ_2, which involves showing that E_θ[ψ_0(T)] ≤ α for all θ with θ ≤ θ_1 or θ ≥ θ_2.
Now fix, without loss of generality, θ′ < θ_1. Find another test ψ* for which E_{θ_1}[ψ*(T)] = E_{θ_2}[ψ*(T)] = α and E_{θ′}[ψ*(T)] is as small as possible. We will be able to show that ψ* = ψ_0, which proves in particular that E_{θ′}[ψ_0(T)] ≤ α (by comparing with the constant test α). Similar things can be done for θ′ > θ_2.

To find ψ*, instead consider 1 − ψ* and try to maximize E_{θ′}[ψ(T)] subject to E_{θ_1}[ψ(T)] = E_{θ_2}[ψ(T)] = 1 − α. As before, there are k_1 and k_2 such that the optimum satisfies
1 − ψ*(t) = 1 if f_3(t) > k_1 f_1(t) + k_2 f_2(t), and 0 if f_3(t) < k_1 f_1(t) + k_2 f_2(t).

The same analysis shows that 1 − ψ* is 1 only if a_1 + a_2 e^{b_2 t} > e^{b_1 t}, after dividing by c(θ′) e^{Q(θ_1)t}; since Q is increasing, here b_1 > 0 and b_2 < 0. We can show that a_2 < 0, so again the test is of the form ψ = 1_{[c_1, c_2]} for some c_1 and c_2 depending only on (θ_1, θ_2, α), and it must be the same as ψ_0.
For the third part. . . (omitted). □

6.3 UMPU tests

6.3.1 Definition. For testing θ ∈ Θ_0 vs. θ ∈ Θ_1, a test φ is an unbiased test at level α if E_θ[φ] ≤ α for all θ ∈ Θ_0 and E_θ[φ] ≥ α for all θ ∈ Θ_1. A test is UMPU at level α if it is UMP over the class of unbiased tests at level α.

There is no UMP test for H_3 vs. K_3. Indeed, such a test would have to satisfy sup_{θ_1 ≤ θ ≤ θ_2} E_θ[φ(X)] ≤ α. Consider the problem of testing H_3 vs. K : θ = θ′ where θ′ > θ_2 (this is easier than K_3). When the power function is strictly monotone, there is an MP test. But in this case the MP test cannot be unbiased for testing H_3 vs. K_3 (see the homework). If we restrict the class of tests to those with E_{θ_1}[φ(X)] = α = E_{θ_2}[φ(X)] then a UMP test exists. Such tests will be unbiased, so the most powerful test over this class is the UMPU. (Marginal note: this paragraph is quite confusing; see the homework solutions for a slightly clearer explanation of why there is no UMP test for H_3.)

6.3.2 Definition. Let ω be the common boundary of the two sets Ω_H and Ω_K. We say that φ is similar on the boundary, or SB, if β_φ(θ) = E_θ[φ] = α for all θ ∈ ω.

6.3.3 Lemma. If βφ(θ) is continuous and φ is unbiased then φ is SB.

PROOF: Intermediate value theorem. □

6.3.4 Lemma. If the distribution pθ is such that the power function of every test is continuous and φ0 is UMP over the class of SB tests then φ0 is UMPU (i.e. UMP over the class of unbiased tests).

PROOF: If the power function is always continuous then the class of SB tests contains the class of unbiased tests. □

6.3.5 Theorem. Consider a one parameter exponential family

p_θ(x) = c(θ) e^{Q(θ)T(x)} h(x)
with Q strictly increasing. For the problem of testing

H_3 : θ_1 ≤ θ ≤ θ_2 vs. K_3 : θ < θ_1 or θ > θ_2,
there is a unique level α UMPU test, namely
φ(x) = 1 if T(x) < c_1 or T(x) > c_2; γ_i if T(x) = c_i, i = 1, 2; 0 if c_1 < T(x) < c_2.

PROOF: The power functions are all continuous for this family. Consider the problem of maximizing E_{θ′}[ψ(T)] over unbiased ψ, for some fixed θ′ ∉ [θ_1, θ_2]. This is equivalent to the minimization problem for 1 − ψ, which brings us back to the setting of the proof of the last theorem. □

Last week we talked about the tests:

H_1 : θ ≤ θ_0 vs. K_1 : θ > θ_0 (UMP)
H_2 : θ ≤ θ_1 or θ ≥ θ_2 vs. K_2 : θ_1 < θ < θ_2 (UMP)
H_3 : θ_1 ≤ θ ≤ θ_2 vs. K_3 : θ < θ_1 or θ > θ_2 (UMPU)

We expect the test for H_4 to be a special case of the test for H_3, and so we expect there to be a UMPU test. The problem is to determine c_1, c_2, γ_1, and γ_2 from some constraints. In the third case, the constraints E_{θ_1}[φ(X)] = α and E_{θ_2}[φ(X)] = α determine the four parameters. In case four we only have the one constraint E_{θ_0}[φ(X)] = α, which does not seem like it will be enough. Recall that φ is unbiased for this case if E_θ[φ(X)] ≤ α when θ = θ_0 and E_θ[φ(X)] ≥ α when θ ≠ θ_0. Therefore β_φ attains its minimum at θ_0, and if it is differentiable then β′_φ(θ_0) = 0. Furthermore, if P_θ(dt) = c(θ) e^{θt} ν(dt) is the distribution of T(X) and we can write φ(x) = ψ(T(x)), then we can differentiate under the integral sign to compute

(d/dθ) E_θ[ψ(T(X))] = E_θ[Tψ(T)] + (c′(θ)/c(θ)) E_θ[ψ(T)].

This identity holds for all tests, so in particular it holds for φ(x) ≡ α. Then

0 = α E_θ[T(X)] + α c′(θ)/c(θ),
giving E_θ[T] = −c′(θ)/c(θ). Plugging this into the original expression yields
(d/dθ) E_θ[ψ(T(X))] = E_θ[Tψ(T)] − E_θ[T] E_θ[ψ(T)] = Cov_θ(T, ψ(T)).
Finally, if ψ is unbiased then E_{θ_0}[ψ(T)] = α and the derivative is zero at θ = θ_0, so
E_{θ_0}[Tψ(T)] = α E_{θ_0}[T] = −α c′(θ_0)/c(θ_0).

This is the second constraint that can be used to find c_1, c_2, γ_1, and γ_2.
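For a concrete (and symmetric) instance of the two constraints, suppose T ∼ N(θ, 1) and θ_0 = 0, and take the equal-tailed test ψ(t) = 1{|t − θ_0| > z_{1−α/2}}. The following sketch (my own check, not from the notes) verifies numerically that ψ satisfies both E_{θ_0}[ψ(T)] = α and E_{θ_0}[Tψ(T)] = α E_{θ_0}[T].

from scipy.stats import norm

alpha, theta0 = 0.05, 0.0
z = norm.ppf(1 - alpha / 2)
c1, c2 = theta0 - z, theta0 + z          # reject when T < c1 or T > c2

# First constraint: the size E_{theta0}[psi(T)] should equal alpha.
size = norm.cdf(c1, loc=theta0) + norm.sf(c2, loc=theta0)

# Second constraint: E_{theta0}[T psi(T)] should equal alpha * E_{theta0}[T].
# For T ~ N(m, 1): E[T 1{T > c}] = m * P(T > c) + pdf(c - m).
def tail_mean(c, m):
    return m * norm.sf(c - m) + norm.pdf(c - m)

ETpsi = (theta0 - tail_mean(c1, theta0)) + tail_mean(c2, theta0)

print(size, alpha)                 # 0.05  0.05
print(ETpsi, alpha * theta0)       # 0.0   0.0  (both zero, by symmetry)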

6.4 Nuisance parameters

Suppose, in addition, we have a nuisance parameter ϑ; e.g. with X_i ∼ N(θ, σ^2), testing θ = θ_0 vs. θ ≠ θ_0 has the nuisance parameter σ^2.
Review: unbiasedness of the test + continuity of the power function ⟹ SB property.

6.4.1 Definition. Let T be sufficient for P = {p_θ(x) : θ ∈ ω = ∂Θ} and let P^T = {p^T_θ : θ ∈ ω} be the corresponding family of distributions of T(X). A test function ψ is said to have Neyman structure with respect to T if E_θ[ψ(X) | T] = α a.e.-P^T_θ for θ ∈ ω.

6.4.2 Lemma. If φ has Neyman structure with respect to T then φ is SB.

PROOF: For all θ ∈ ω,
E_θ[φ(X)] = E_θ[E[φ(X) | T]] = E_θ[α] = α. □

Neyman structure allows us to use conditioning (on complete sufficient statistics) to reduce a high dimensional problem to fewer dimensions. The question that remains is: when does SB imply Neyman structure? Note that if all unbiased tests are SB and all SB tests have Neyman structure, then to find the UMPU test it suffices to find the UMP test over the class of tests with Neyman structure. Why does this make the problem any easier?

The point is that Neyman structure pins down the conditional level: E_θ[φ | T = t] = α for all t and all θ ∈ ω, while for a general parameter point
E_{θ,ϑ}[φ] = ∫∫ φ(u, t) P^{U|t}_θ(du) P^T_{θ,ϑ}(dt).

For every fixed t, if φ(u, t) is a level α UMP test conditionally on T = t, then its power is greater than or equal to the power of any other level α test with Neyman structure.

6.4.3 Theorem. Let X be a random variable with distribution p_θ ∈ P = {p_θ : θ ∈ Θ} and let T be sufficient for P_B = {p_θ : θ ∈ ω = ∂Θ}. Then all SB tests have Neyman structure if and only if the family of distributions P^T = {p^T_θ : θ ∈ ω} is boundedly complete, i.e. "if ψ is bounded and E_θ[ψ(T)] = 0 for all θ ∈ ω then ψ = 0 a.e."

PROOF: Suppose P^T is boundedly complete and φ is a level α SB test; let ψ = E[φ | T]. Since E_θ[φ] = α for all θ ∈ ω, we have E_θ[ψ(T) − α] = 0 for all θ ∈ ω, so E[φ | T] = α a.e.-P^T.
Conversely, suppose that all SB tests have Neyman structure, but that P^T is not boundedly complete. Then there is a bounded h ≠ 0 with E_θ[h(T)] = 0 for all θ ∈ ω. Define ψ(T) = ch(T) + α, where c is chosen small enough that ψ is a test. Then E_θ[ψ] = α for all θ ∈ ω (so ψ is SB), but E[ψ(T) | T] = ψ(T) = α + ch(T), which differs from α with positive p_{θ_0}-probability for some θ_0 ∈ ω. □

The setting is now as follows. We have a (k + 1)-parameter exponential family with density functions

p_{θ,ϑ}(x) = c(θ, ϑ) exp( θU(x) + Σ_{i=1}^k ϑ_i T_i(x) )

with respect to a σ-finite dominating measure µ. Suppose that Θ ⊆ ℝ^{k+1} is convex with non-empty interior. Then T := (T_1, ..., T_k) is complete and sufficient for the nuisance parameters ϑ when θ is fixed. In this setting the hypotheses of the above theorem hold, so we need merely find the UMP test over the class of tests with Neyman structure.

1. H : θ ≤ θ_0 vs. K : θ > θ_0
2. H : θ ≤ θ_1 or θ ≥ θ_2 vs. K : θ_1 < θ < θ_2
3. H : θ_1 ≤ θ ≤ θ_2 vs. K : θ < θ_1 or θ > θ_2
4. H : θ = θ_0 vs. K : θ ≠ θ_0
The following are UMPU tests:
φ_1(u, t) = 1 if u > c(t); γ(t) if u = c(t); 0 if u < c(t);
φ_2(u, t) = 1 if c_1(t) < u < c_2(t); γ_i(t) if u = c_i(t); 0 otherwise;
φ_3(u, t) = 1 if u < c_1(t) or u > c_2(t); γ_i(t) if u = c_i(t); 0 otherwise;
φ_4(u, t) = 1 if u < c_1(t) or u > c_2(t); γ_i(t) if u = c_i(t); 0 otherwise;
where t = T(x). For φ_1 the conditions E_{θ_0}[φ_1(U, T) | T = t] = α for all t determine the coefficients. For φ_2 and φ_3 the conditions are E_{θ_i}[φ(U, T) | T = t] = α for i = 1, 2 and all t, and for φ_4 the conditions are E_{θ_0}[φ_4(U, T) | T = t] = α and E_{θ_0}[Uφ_4(U, T) | T = t] = α E_{θ_0}[U | T = t] for all t.

PROOF: T is sufficient for ϑ when θ is fixed. The associated family of distributions of T is
dP^T_{θ_j, ϑ}(t) = c(θ_j, ϑ) exp( Σ_{i=1}^k ϑ_i t_i ) dν_{θ_j}(t),
where θ_j is θ_0, θ_1, or θ_2 (j = 0, 1, or 2), and ν_{θ_j} is like a push-forward of µ with respect to t = T(x). By the assumptions on the domain Θ, ω_j := {θ = θ_j : (θ, ϑ) ∈ Θ} contains a k-dimensional rectangle. Therefore T is sufficient and complete. If φ is SB then E_{θ_j}[φ(U, T) | T = t] = α for all t. We now show that φ_1 is UMP over all SB tests (in order to conclude that it is UMPU).

Consider case 1 for now. For any point (θ, ϑ) in the alternative K_1,

E_{θ,ϑ}[φ(U, T)] = ∫∫ φ(u, t) P^{U|t}_θ(du) P^T_{θ,ϑ}(dt)

= E_{θ,ϑ}[ E_θ[φ(U, T) | T = t] ].
(The distribution of (U | T = t) does not depend on ϑ because T is sufficient.) For all SB tests and all t,
E_{θ_0}[φ(U, T) | T = t] = α.
We claim that φ_1 maximizes the inner integral for every t subject to this constraint. The rest of the proof is an exercise. □

6.4.4 Example. Suppose that X ∼ Poisson(λ) and Y ∼ Poisson(µ). We are interested in comparing λ and µ (say, testing λ ≤ µ or λ = µ).

P[X = x, Y = y] = (e^{−(λ+µ)}/(x! y!)) exp( y log(µ/λ) + (x + y) log λ ).

Let θ = log(µ/λ), ϑ = log λ, U = Y, and T = X + Y. For the test H : µ ≤ aλ (where a is fixed), H is log(µ/λ) ≤ log a, so we test H : θ ≤ log a vs. K : θ > log a with nuisance parameter ϑ. There is a UMPU test of the form
φ(u, t) = 1 if u > c(t); γ(t) if u = c(t); 0 if u < c(t),
where c and γ are determined by E_{θ_0}[φ(U, T) | T = t] = α for all t. Here θ_0 = log a, so given t,

P_{θ_0}[U > c(t) | T = t] + γ(t) P_{θ_0}[U = c(t) | T = t] = α.
What is the distribution of (U | T = t)? It is Binomial(t, µ/(λ + µ)), and under P_{θ_0}, µ/(λ + µ) = a/(a + 1). This (in principle) determines c and γ as functions of t:

Σ_{k = c(t)+1}^{t} C(t, k) (a/(a+1))^k (1/(a+1))^{t−k} + γ(t) C(t, c(t)) (a/(a+1))^{c(t)} (1/(a+1))^{t−c(t)} = α.
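This equation can be solved numerically for c(t) and γ(t); here is a sketch (the helper name and the numbers a = 2, α = 0.05, t = 12 are mine, not the notes').

from scipy.stats import binom

def conditional_test(t, a, alpha):
    """Given T = X + Y = t, return (c, gamma) so that the conditional test
    'reject if U > c, with probability gamma if U = c' has conditional size alpha,
    where U | T = t is Binomial(t, a/(a+1)) under the boundary theta0 = log a."""
    p = a / (a + 1.0)
    c = 0
    while binom.sf(c, t, p) > alpha:   # smallest c with P[U > c] <= alpha
        c += 1
    gamma = (alpha - binom.sf(c, t, p)) / binom.pmf(c, t, p)
    return c, gamma

# e.g. testing H: mu <= 2*lambda at level 5%, having observed X + Y = 12
print(conditional_test(t=12, a=2.0, alpha=0.05))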

6.4.5 Example. Let X_i ∼ N(ξ, σ^2), 1 ≤ i ≤ n. We could test H_1 : σ ≥ σ_0 (a UMP test exists), or H_2 : σ ≤ σ_0 (no UMP test, but a UMPU test exists), or H_3 : ξ ≤ ξ_0 (no UMP, but UMPU exists), or H_4 : ξ ≥ ξ_0 (no UMP, but UMPU exists). Suppose also Y_1, ..., Y_m ∼ N(η, τ^2); we could then test H : ξ ≤ η or H′ : σ^2 ≤ aτ^2. For H_1, the joint density of the X's is

(1/√(2πσ^2))^n exp( −nξ^2/(2σ^2) ) exp( −(1/(2σ^2)) Σ_i x_i^2 + (nξ/σ^2) x̄ ).

Take θ = −1/(2σ^2), ϑ = nξ/σ^2, U = Σ_i X_i^2, and T = X̄. Then H_1 is equivalent to testing θ ≥ θ_0. There is a UMPU test of the form
φ(u, t) = 1 if u < c(t); 0 if u > c(t),
and we have the constraint

α = P_{θ_0}[U < c(T) | T = t]
  = P_{θ_0}[U − nT^2 < c_0(T) | T = t]    (with c_0(t) := c(t) − nt^2)
  = P_{θ_0}[U − nt^2 < c_0(t)].
(Note that U − nT^2 = Σ_i (X_i − X̄)^2 is independent of X̄ = T.) It follows that c_0(t) = c_0 is a constant independent of t. Now U − nT^2 is σ^2 χ^2_{n−1} under P_{θ_0}, so we obtain the usual χ^2-test: taking c_0/σ_0^2 to be the α-percentile of χ^2_{n−1} determines the test, and we reject when Σ_i (X_i − X̄)^2 ≤ c_0.
Suppose that V = h(U, T) is monotonically increasing in u for every fixed t. Then
φ(u, t) = 1 if u < c(t), 0 if u > c(t)    and    φ(v) = 1 if v < c′(t), 0 if v > c′(t)
define the same test. If V is independent of T when θ = θ_0 then φ(v) is UMPU for H_1. If V is independent of T when θ = θ_1 and θ = θ_2 then we get the corresponding statements for H_2 and H_3. If V is independent of T for θ = θ_0 and h(u, t) = a(t)u + b(t) then φ_4(v) is UMPU for H_4. (See the theorem on the bottom of page 151 of the textbook.)
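A sketch of the resulting variance test in code (the helper name and the simulated data are mine, not the notes'): reject H_1 when Σ_i (X_i − X̄)^2 ≤ c_0 with c_0 = σ_0^2 times the α-quantile of χ^2_{n−1}.

import numpy as np
from scipy.stats import chi2

def variance_test(x, sigma0_sq, alpha=0.05):
    """Test of H1: sigma^2 >= sigma0_sq vs K: sigma^2 < sigma0_sq, as derived above."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ss = np.sum((x - x.mean()) ** 2)              # U - n*T^2 in the notation above
    c0 = sigma0_sq * chi2.ppf(alpha, df=n - 1)     # alpha-percentile of sigma0^2 * chi2_{n-1}
    return ss, c0, ss <= c0                        # reject when ss <= c0

rng = np.random.default_rng(1)
data = rng.normal(5.0, 0.6, size=30)               # true sd 0.6 is below sigma0 = 1
print(variance_test(data, sigma0_sq=1.0))          # rejection is expected most of the time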

6.4.6 Example (Two sample normal with equal variances).
Let X_1, ..., X_m ∼ N(ξ, σ^2) and Y_1, ..., Y_n ∼ N(η, σ^2). Consider testing the hypotheses H : η = ξ or H′ : η ≤ ξ. The joint density of the data is

c(ξ, η, σ^2) exp( −(1/(2σ^2))( Σ_{i=1}^m X_i^2 + Σ_{j=1}^n Y_j^2 ) + (mξ/σ^2) X̄ + (nη/σ^2) Ȳ ),

so the natural parameters are −1/(2σ^2), mξ/σ^2, and nη/σ^2. Rearrange the parameters to get something useful:

(mξ/σ^2) X̄ + (nη/σ^2) Ȳ = [ (η − ξ)/((1/m + 1/n)σ^2) ] (Ȳ − X̄) + [ (mξ + nη)/((m + n)σ^2) ] (mX̄ + nȲ),

so we may write the density as

c exp( [ (η − ξ)/((1/m + 1/n)σ^2) ] (Ȳ − X̄) + [ (mξ + nη)/((m + n)σ^2) ] (mX̄ + nȲ) − (1/(2σ^2))( Σ_{i=1}^m X_i^2 + Σ_{j=1}^n Y_j^2 ) ).

Now write θ = (η − ξ)/((1/m + 1/n)σ^2), with nuisance parameters ϑ_1 = (mξ + nη)/((m + n)σ^2) and ϑ_2 = −1/(2σ^2), and correspondingly U = Ȳ − X̄, T_1 = mX̄ + nȲ, and T_2 = Σ_i X_i^2 + Σ_j Y_j^2. We look for V = h(U, T) such that h is increasing in u and h(U, T) is independent of T when θ = 0. Note that when θ = 0 then Ȳ − X̄ ∼ N(0, (1/m + 1/n)σ^2), so with this in mind take

h(U, T) = U / √( T_2 − T_1^2/(m + n) − U^2/(1/m + 1/n) )

and note that, when θ = 0,

V′ := [ U/√(1/m + 1/n) ] / √( ( T_2 − T_1^2/(m + n) − U^2/(1/m + 1/n) )/(m + n − 2) ) ∼ t_{m+n−2}.

Therefore we recover the classical t-test. If the variances are not the same then there is no good answer for the UMPU.

6.4.7 Example (Two sample normal with equal ). From the exam.

Review for the second midterm. How to find a Bayes or minimax rule? To find a Bayes rule, calculate the posterior and then find the posterior mean (for squared error loss). For minimax, if you can find δ(X) that is Bayes and has constant risk then it is minimax (and the prior is least favourable). Less rigidly, if δ has constant risk γ and γ = lim_{n→∞} γ_{Λ_n} for a sequence of priors Λ_n, then δ is minimax (think improper priors, normal mean with variance known). A third approach is to find a minimax estimator for a restricted family and then show that the maximum risk over the smaller set is the same as over the larger set (think normal mean with variance unknown but bounded).
How to find or determine admissible estimators? Recall that aX + b is inadmissible when a < 0, a > 1, or a = 1 and b ≠ 0; otherwise it is admissible. When the dimension is three or greater, X̄ is inadmissible (Stein).
Testing: the Neyman-Pearson lemma for the simple test setting shows the MP test is of the form
φ(x) = 1 if f_K/f_H > k; γ if f_K/f_H = k; 0 if f_K/f_H < k,
and the proof is not very complicated (read it!). For complex tests, we defined a monotone likelihood ratio family to be one for which f_{θ′}/f_θ is monotonically increasing in x (or in T(x)) when θ′ > θ, e.g. a one parameter exponential family c(θ)e^{Q(θ)U(x)}h(x) with increasing Q, or the location families of some distributions. There is a UMP test for θ ≤ θ_0, found by showing that the MP test for θ = θ_0 vs. θ = θ_1 does not depend on θ_1 for all θ_1 > θ_0, and then that the power is increasing in θ; see the Karlin-Rubin theorem. For the one parameter exponential family in particular, we considered four testing problems. There are UMP tests for the first two but not the last two, and if we restrict to the family of unbiased tests then there are UMPU tests. For the multi-parameter exponential family we know the UMPU tests; see the theorem on page 151 of the textbook.

7 Maximum Likelihood Estimators

This section follows TPE §1.8.

7.0.8 Theorem (Delta method). Let g be a function that is continuously differentiable at θ_0, with g′(θ_0) ≠ 0. If √n(T_n − θ_0) →_d N(0, τ^2) then
√n(g(T_n) − g(θ_0)) →_d N(0, τ^2 (g′(θ_0))^2).

PROOF: Note that T_n − θ_0 →_d 0, so T_n − θ_0 →_p 0. By continuity, g′(ξ_n) →_p g′(θ_0) whenever ξ_n →_p θ_0. By Taylor's theorem, for some ξ_n between T_n and θ_0,

√n(g(T_n) − g(θ_0)) = g′(ξ_n) √n(T_n − θ_0) →_d g′(θ_0) N(0, τ^2) = N(0, τ^2 (g′(θ_0))^2). □
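A quick simulation of the delta method (a sketch; the Exponential(1) model and g(x) = x^2 are arbitrary illustrative choices, not from the notes): with T_n = X̄_n, θ_0 = 1, and τ^2 = 1, the limiting variance should be τ^2 (g′(θ_0))^2 = 4.

import numpy as np

rng = np.random.default_rng(0)
n, reps = 400, 5000
theta0, tau2 = 1.0, 1.0          # mean and variance of Exponential(1)
g = lambda x: x ** 2             # g'(theta0) = 2, so the limit variance is tau2 * 4

Tn = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
Z = np.sqrt(n) * (g(Tn) - g(theta0))

print(Z.mean())                  # approximately 0
print(Z.var())                   # approximately tau2 * g'(theta0)^2 = 4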

7.0.9 Assumptions. Let {p_θ : θ ∈ Θ ⊆ ℝ^d} be a family of densities with respect to a σ-finite measure µ, satisfying the following assumptions.
A0: (Identifiable) θ ≠ θ′ implies p_θ ≠ p_{θ′}.
A1: (Common support) A = {x : p_θ(x) > 0} does not depend on θ.
A2: X_1, ..., X_n ∼ P_{θ_0} =: P_0.
A3: (Smooth around θ_0) Θ contains a neighbourhood Θ_0 of θ_0 such that, for all θ ∈ Θ_0,
a) for µ-a.e. x, ℓ(θ | x) = log p_θ(x) is twice continuously differentiable in θ;
b) for µ-a.e. x, the third partial derivatives ℓ‴_{jkl}(θ | x) exist and satisfy |ℓ‴_{jkl}(θ | x)| ≤ M_{jkl}(x) for all 1 ≤ j, k, l ≤ d, where the bounds M_{jkl} do not depend on θ.
A4: (Regular at θ_0)
a) E_0[ℓ̇(θ_0 | X)] = 0;
b) E_0[|ℓ̇(θ_0 | X)|^2] < ∞;
c) I(θ_0)_{ij} := −E_0[ℓ̈_{ij}(θ_0 | X)] is a positive definite matrix.
We fix the following notation: L_n(θ) := Π_{i=1}^n p_θ(x_i) is the likelihood function, and ℓ_n(θ) := log L_n(θ) is the log-likelihood function. For B ⊆ Θ, L_n(B) := sup_{θ ∈ B} L_n(θ) and ℓ_n(B) := sup_{θ ∈ B} ℓ_n(θ).

7.0.10 Theorem. Assume that A0–A4 are satisfied.
(i) (Consistency of MLE) With probability one, there is a sequence θ̃_n of solutions to the sequence of equations ℓ̇_n(θ | x_1, ..., x_n) = 0 such that θ̃_n →_p θ_0.
(ii) (Asymptotic efficiency of MLE) θ̃_n is "asymptotically linear" with respect to ℓ̇_n(θ_0 | x_1, ..., x_n), i.e.
√n(θ̃_n − θ_0) = I^{−1}(θ_0) Z_n + o_P(1) = (1/√n) Σ_{i=1}^n ℓ̃(θ_0 | x_i) + o_P(1) →_d I^{−1}(θ_0) Z ∼ N(0, I^{−1}(θ_0)),
where Z_n := (1/√n) Σ_{i=1}^n ℓ̇(θ_0 | x_i), Z ∼ N(0, I(θ_0)), and ℓ̃(θ_0 | x) := I^{−1}(θ_0) ℓ̇(θ_0 | x).
(iii) (Asymptotic efficiency of LRT) 2 log λ̃_n →_d Z^T I^{−1}(θ_0) Z ∼ χ^2_d, where
λ̃_n := L_n(θ̃_n | x_1, ..., x_n) / L_n(θ_0 | x_1, ..., x_n).

Remark. It suffices to remember that "identifiability + common support + smoothness around θ_0 + regularity at θ_0 ⟹ the MLE is consistent and efficient, and 2 log(LRT) →_d χ^2_d."
Note that the general likelihood ratio test statistic

λ̂_n := max_{θ ∈ Θ} L_n(θ | x_1, ..., x_n) / L_n(θ_0 | x_1, ..., x_n) = L_n(θ̂_n^{MLE} | x_1, ..., x_n) / L_n(θ_0 | x_1, ..., x_n)

is slightly different from the likelihood ratio test statistic λ̃_n.

PROOF: Fix a > 0 small and let Q_a := {θ ∈ Θ : |θ − θ_0| = a}. We will show that
P_0[ℓ_n(θ) < ℓ_n(θ_0) for all θ ∈ Q_a] → 1 as n → ∞.
It will follow from this that, for n large enough, ℓ̇_n(θ) = 0 has one or more solutions inside Q_a with high probability. By Taylor's theorem, for θ ∈ Q_a,
(1/n)(ℓ_n(θ) − ℓ_n(θ_0)) = (θ − θ_0)^T (1/n) ℓ̇_n(θ_0) − (1/2)(θ − θ_0)^T ( −(1/n) ℓ̈_n(θ_0) )(θ − θ_0) + O(|θ − θ_0|^3),
where we have used the regularity and the boundedness of the third derivative. By the SLLN,
(1/n) ℓ̇_n(θ_0) = (1/n) Σ_{i=1}^n ℓ̇(θ_0 | x_i) → E_0[ℓ̇(θ_0 | X)] = 0
and
−(1/n) ℓ̈_n(θ_0) = −(1/n) Σ_{i=1}^n ℓ̈(θ_0 | x_i) → −E_0[ℓ̈(θ_0 | X)] = I(θ_0).

Therefore, since |θ − θ_0| = a,
(1/n)(ℓ_n(θ) − ℓ_n(θ_0)) ≤ a o_P(1) − (1/2)(θ − θ_0)^T I(θ_0)(θ − θ_0) + Ba^3
                        ≤ a o_P(1) − (1/2) λ_d a^2 + Ba^3,

where B is some constant and λ_d > 0 is the smallest eigenvalue of the information matrix. In particular, for small enough a and big enough n, ℓ_n(θ) − ℓ_n(θ_0) < 0 for all θ ∈ Q_a.
For the next part,
0 = (1/√n) ℓ̇_n(θ̃_n)
  = (1/√n) ℓ̇_n(θ_0) − ( −(1/n) ℓ̈_n(θ_n*) ) √n(θ̃_n − θ_0),
so
√n(θ̃_n − θ_0) = ( −(1/n) ℓ̈_n(θ_n*) )^{−1} (1/√n) ℓ̇_n(θ_0),
where θ_n* is a point between θ̃_n and θ_0. By the CLT, the "numerator" converges in distribution to a normal random variable with mean zero and covariance matrix E[ℓ̇(θ_0 | X) ℓ̇(θ_0 | X)^T] = I(θ_0). For the "denominator,"

(1/n) ℓ̈_n(θ_n*) = (1/n) ℓ̈_n(θ_0) + (1/n) ℓ‴_n(θ_n*)(θ_n* − θ_0) →_p −I(θ_0),
by the SLLN and because the third derivative is bounded. Therefore
√n(θ̃_n − θ_0) →_d (I(θ_0))^{−1} N(0, I(θ_0)) = N(0, I(θ_0)^{−1}).
The final part is more of the same. □

7.0.11 Definition. If δ(X) is an estimator of θ and √n(δ(X) − θ) →_d N(0, τ^2), then we say that δ is an asymptotically unbiased estimator with asymptotic variance τ^2.
It can be shown that the asymptotic variance of an asymptotically unbiased estimator is bounded below by I(θ)^{−1}, and that the MLE (under certain conditions) attains this lower bound. A sequence of estimators that attains the lower bound is said to be asymptotically efficient.

7.0.12 Example (Newton-Raphson one-step estimator). If √n(θ̃_n − θ) = O_P(1) then θ̃_n is said to be a √n-consistent sequence of estimators for θ. For the MLE θ̂_n,
0 = ℓ̇_n(θ̂_n) ≈ ℓ̇_n(θ̃_n) + ℓ̈_n(θ̃_n)(θ̂_n − θ̃_n),
so
θ̂_n ≈ θ̃_n − ℓ̇_n(θ̃_n)/ℓ̈_n(θ̃_n).
This gives a method of obtaining a sequence of asymptotically efficient estimators from a √n-consistent sequence.

7.1 Kullback-Leibler information

7.1.1 Definition. Let P and Q be two probability measures with densities p and q with respect to a σ-finite measure µ. The Kullback-Leibler information is
K(P, Q) := d_{KL}(P, Q) := E_P[ log(p(X)/q(X)) ] = −E_P[ log(q(X)/p(X)) ].

7.1.2 Lemma. K(P, Q) ≥ 0, with equality if and only if P = Q.

PROOF: Since x ↦ −log x is a convex function, by Jensen's inequality,
K(P, Q) = E_P[ −log(q/p) ] ≥ −log E_P[ q/p ] = −log 1 = 0. □
Note that, by the SLLN,

(1/n) log( L_n(θ_0)/L_n(θ) ) = (1/n) Σ_{i=1}^n log( p_{θ_0}(x_i)/p_θ(x_i) ) → E_{θ_0}[ log( p_{θ_0}(X)/p_θ(X) ) ] = K(P_{θ_0}, P_θ).
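A small Monte Carlo check of this limit (a sketch with arbitrary parameter values, not from the notes): for the N(θ, 1) family, K(P_{θ_0}, P_θ) = (θ_0 − θ)^2/2 in closed form, so the normalized log-likelihood ratio should be close to that value for large n.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
theta0, theta, n = 0.0, 1.5, 200_000

x = rng.normal(theta0, 1.0, size=n)
loglik_ratio = (norm.logpdf(x, theta0, 1).sum() - norm.logpdf(x, theta, 1).sum()) / n

print(loglik_ratio)                 # Monte Carlo value of (1/n) log(L_n(theta0)/L_n(theta))
print((theta0 - theta) ** 2 / 2)    # K(P_theta0, P_theta) = 1.125 for these parameters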

7.2 Examples

Note that the condition of common support excludes the uniform family U(0, θ) and the one-sided double-parameter exponential family, and the smoothness requirement excludes the double exponential family (1/2) exp(−|x − θ|).

7.2.1 Exercise. Show that if X_1, ..., X_n ∼ U(0, θ) then
n(θ − X_(n))/θ →_d Exponential(1).
Also show that if X_1, ..., X_n ∼ N(0, 1) then
X_(n)/√(2 log n) →_p 1.
It can be shown that there are sequences b_n ∼ √(2 log n) and a_n ∼ √(2 log n) such that
b_n(X_(n) − a_n) →_d −log(Exponential(1)) =: Gumbel(1).
These limits come up in the theory of empirical processes and in extreme value theory.

Clearly, if the equations ℓ̇_n(θ) = 0 each have only one root, then the sequence of roots converges to the true parameter. If those equations have multiple roots, then it is not necessarily the case that every sequence of roots converges to the true parameter.

7.2.2 Example (One parameter exponential family). In this case the conditions always all hold.

p_θ(x) = exp( θT(x) − A(θ) ) h(x)
ℓ(θ) = θT(x) − A(θ) + log h(x)
ℓ̇(θ) = T(x) − A′(θ).
Then ℓ̇(θ) = 0 if and only if T(x) = A′(θ). Note that A″(θ) = Var_θ(T(X)) > 0, so A′ is an increasing function. Hence there is at most one root, and the MLE is

θ̃ = (A′)^{−1}(T(x)). Here I(θ) = Var_θ(T(X)), and with n samples θ̃ satisfies (1/n) Σ_{i=1}^n T(x_i) = A′(θ̃). The binomial(n, θ), Poisson(θ), and normal (with one parameter known) families fall under this example.

7.2.3 Example (Multi-parameter exponential family). In this case the conditions always all hold.

p_η(x) = exp( Σ_{i=1}^d η_i T_i(x) − A(η) ) h(x)
∂ℓ/∂η_j = T_j(x) − ∂A(η)/∂η_j.

The matrix of second derivatives of A is Cov_η(T, T), which is positive definite.

7.2.4 Example (Double exponential). Also referred to as the Laplace distribution. The joint density is f(x_1, ..., x_n; θ) = (1/2^n) exp( −Σ_{i=1}^n |x_i − θ| ). The MLE is found by making Σ_{i=1}^n |x_i − θ| as small as possible. It turns out that θ̂ = the sample median. Indeed, if the sample median is t, then

Σ_{i=1}^n |x_i − t| = Σ_{x_i > t} (x_i − t) + Σ_{x_i < t} (t − x_i),
and moving t in either direction away from the median increases one sum at least as fast as the other decreases, so the minimum is attained at the sample median. Since the density is not smooth in θ, the general theory does not directly show that

√n(θ̂ − θ) →_d N(0, I(θ)^{−1}),

even though it is true. Let U_1, ..., U_n ∼ U(0, 1). Following pages 89–93 of Ferguson's book, let 0 < q < 1 and k := ⌊nq⌋. The empirical q-quantile is U_(k), and
√n(U_(k) − q) →_d N(0, q(1 − q)).
Now if Y_1, ..., Y_n are distributed according to some distribution function F, then F(Y_i) ∼ U(0, 1), and the order statistics are preserved. It follows that
√n(Y_([nq]) − y_q) →_d N(0, q(1 − q)/f(y_q)^2),
where y_q is the q-quantile of F and f is the density. In the case of the double exponential, the theoretical median is θ and the value of the density at the median is 1/2, so
√n(θ̂ − θ) →_d N(0, 1).

7.2.5 Example (Cauchy location family). Here f(x) = 1/(π(1 + x^2)). The derivative of the log-likelihood function for a single observation is hence
ℓ̇(θ | x) = 2(x − θ)/(1 + (x − θ)^2).

Setting ℓ̇_n(θ | x_1, ..., x_n) = 0 and rearranging,
0 = Σ_{i=1}^n 2(x_i − θ)/(1 + (x_i − θ)^2)
0 = Σ_{i=1}^n (x_i − θ) Π_{j ≠ i} (1 + (x_j − θ)^2).
This is a polynomial equation in θ, and it may have as many as 2n − 1 different roots. The method we use is as follows: find a √n-consistent estimator θ̃_n and then take one Newton-Raphson step to get
θ̂_n = θ̃_n − ℓ̇_n(θ̃_n)/ℓ̈_n(θ̃_n).
The initial estimator we choose is θ̃_n = the sample median (the theoretical mean does not exist, so the sample mean is useless here). Via the same analysis as the last example, since the theoretical median is θ and the density there is 1/π,
√n(θ̃_n − θ) →_d N(0, π^2/4).
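Here is a sketch of the one-step procedure for the Cauchy location family (the helper name, the simulated sample, and the true θ are mine, not the notes'): start from the sample median and take a single Newton-Raphson step on the log-likelihood.

import numpy as np

def cauchy_one_step(x):
    """One Newton-Raphson step from the sample median for the Cauchy location MLE."""
    x = np.asarray(x, dtype=float)
    t0 = np.median(x)                                      # sqrt(n)-consistent start
    r = x - t0
    score = np.sum(2 * r / (1 + r ** 2))                   # l_n'(t0)
    hess = np.sum(2 * (r ** 2 - 1) / (1 + r ** 2) ** 2)    # l_n''(t0)
    return t0 - score / hess

rng = np.random.default_rng(0)
theta = 3.0
sample = theta + rng.standard_cauchy(1000)
print(np.median(sample), cauchy_one_step(sample))   # the one-step estimate is typically closer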

7.3 Method of Moments

The method of moments is very useful for constructing √n-consistent estimators. Assume that f is a density function with sufficiently thin tails that a few moments exist, and let X_1, ..., X_n ∼ f(x − θ). Then
E[X] = ∫ x f(x − θ) dx = ∫ (θ + x) f(x) dx = θ + m,
where m = ∫ x f(x) dx is the theoretical mean, and
√n((X̄_n − m) − θ) →_d N(0, σ^2),
where σ^2 is the theoretical variance. Applying one Newton-Raphson step to the √n-consistent estimator X̄_n − m, a good estimator is hence
θ̂ = (X̄_n − m) − ℓ̇_n(X̄_n − m)/ℓ̈_n(X̄_n − m).

7.3.1 Example (Mixture distributions). Assume X_1, ..., X_n ∼ θF + (1 − θ)G. This can come up in Bayesian situations, e.g. X_i is class 1 with probability θ and class 2 with probability 1 − θ, and we wish to estimate θ, the proportion of the population in class 1. Then

E[X̄_n] = θ m_F + (1 − θ) m_G = θ(m_F − m_G) + m_G.
Assuming m_F ≠ m_G and both are known, (X̄_n − m_G)/(m_F − m_G) is a √n-consistent estimator of θ. In practical applications it is often the case that m_G is not known, but similar estimators can be found.
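A sketch of this moment estimator on simulated data (the component distributions F = N(2, 1), G = N(−1, 1) and θ = 0.3 are arbitrary choices, not from the notes).

import numpy as np

rng = np.random.default_rng(0)
theta, mF, mG, n = 0.3, 2.0, -1.0, 5000

labels = rng.random(n) < theta                   # class 1 with probability theta
x = np.where(labels, rng.normal(mF, 1.0, n), rng.normal(mG, 1.0, n))

theta_hat = (x.mean() - mG) / (mF - mG)          # method-of-moments estimator
print(theta_hat)                                 # about 0.3, with O(1/sqrt(n)) error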

7.3.2 Example (Hodges' super-efficiency). Suppose that √n(δ_n − θ) →_d N(0, ν(θ)). One would expect that ν(θ) ≥ I(θ)^{−1}, but this is not always the case. Suppose X_1, ..., X_n ∼ N(θ, 1) and let
δ_n = X̄_n if |X̄_n| > n^{−1/4}, and δ_n = aX̄_n if |X̄_n| ≤ n^{−1/4}.
When θ ≠ 0: since √n(X̄_n − θ) →_d N(0, 1), so that X̄_n = θ + O_P(1/√n), it follows that P[X̄_n ≠ δ_n] = P[|X̄_n| ≤ n^{−1/4}] → 0 as n → ∞, and hence √n(δ_n − θ) →_d N(0, 1) when θ ≠ 0. On the other hand, when θ = 0, δ_n = aX̄_n with high probability, so √n δ_n →_d N(0, a^2). In this case
ν(θ) = 1 for θ ≠ 0, and ν(0) = a^2.

It turns out that this can only happen on a set of parameters of Lebesgue measure zero.
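A simulation sketch of Hodges' estimator (the value a = 0.5 and the sample sizes are arbitrary choices, not from the notes): the normalized squared error n(δ_n − θ)^2 should average about a^2 at θ = 0 and about 1 away from 0.

import numpy as np

rng = np.random.default_rng(0)
a, n, reps = 0.5, 10_000, 100_000

for theta in (0.0, 1.0):
    xbar = rng.normal(theta, 1.0 / np.sqrt(n), size=reps)          # law of the sample mean
    delta = np.where(np.abs(xbar) > n ** (-0.25), xbar, a * xbar)  # Hodges' estimator
    print(theta, np.mean(n * (delta - theta) ** 2))  # about a^2 = 0.25 at theta = 0, about 1 otherwise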

7.3.3 Example. Let p_{µ,σ}(x) = (1/2)φ(x) + (1/(2σ))φ((x − µ)/σ) be a mixture of normals: a standard normal half of the time, and a N(µ, σ^2) half of the time. Then trying to maximize the likelihood yields an infinite value, so there is no MLE:

sup_{µ,σ} L(µ, σ | x) ≥ sup_{σ, µ = x} { (1/2)φ(x) + (1/(2σ))φ(0) } = ∞.

Index

χ^2-distance, 19 · √n-consistent, 53 · 0-1 loss function, 12
action space, 27 · admissible decision rule, 28 · ancillary, 9 · asymptotic variance, 53 · asymptotically efficient, 53 · asymptotically unbiased estimator, 53
Bayes risk, 28 · Bayes rule, 28 · bias, 12 · bounded complete, 46
centred moment, 5 · characteristic function, 6 · complete, 9 · complex test, 40 · confidence interval, 7 · conjugate prior, 31 · critical function, 41 · critical functions, 41 · cumulant, 5 · cumulant generating function, 5
decision loss, 27 · decision risk, 27
empirical Bayes method, 37 · estimation, 7 · exponential family, 3
first-order ancillary, 9 · Fisher's information, 23
general likelihood ratio test statistic, 52 · generalized Bayes rule, 32
hamming distance, 12 · Hellinger affinity, 19 · Hellinger distance, 19
improper prior, 32 · inadmissible decision rule, 27 · information matrix, 23
Jensen's inequality, 11
Kullback-Leibler information, 53
Laplace distribution, 55 · least favourable prior, 30 · least favourable sequence of priors, 32 · level of a test, 20, 38 · likelihood function, 51 · likelihood ratio test statistic, 52 · limiting Bayes rules, 32 · log-likelihood function, 51 · loss function, 11
m.s.s., 8 · method of moments, 56 · Mill's ratio, 4 · minimal sufficient statistic, 8 · minimax, 28 · minimax risk, 18 · MLR, 40 · moment generating function, 5 · monotone likelihood ratio property, 40 · most powerful test, 38 · MP test, 38
Neyman structure, 46
order statistic, 8
parameter space, 27 · point estimation, 7 · power function, 20, 41 · power of a test, 20, 38 · power series distribution, 16 · prior distribution, 28
randomized decision rule, 27 · rejection region, 20 · risk function, 11
sample space, 27 · SB, 44 · score, 26 · score vector, 27 · similar on the boundary, 44 · simple test, 39 · size of a test, 38 · squared error loss, 12 · statistic, 7 · Stein's Lemma, 36 · sufficient, 7
test, 20 · type I error, 20, 38 · type II error, 38
UMP, 40 · UMPU, 44 · UMVUE, 12 · unbiased estimator, 12 · unbiased test, 44 · uniform minimum variance unbiased estimator, 12 · uniformly most powerful, 40
weighted squared error loss, 30