The Delta method and Fieller's technique for a confidence interval for a ratio

1 The delta method; a general approach to handling nonlinear functions

We provide here a general discussion of what is commonly known as the delta method, which provides a method for approximating the mean and variance of nonlinear functions of random quantities. The presentation is completely general, with applications in the regression context appearing in later sections.

Consider random variables $W_1, \ldots, W_K$, with $E(W_k) = \mu_k$, $V(W_k) = \sigma_k^2$ and $\mathrm{cov}(W_k, W_j) = \sigma_{kj}$. Write $\Sigma_W$ for the variance-covariance matrix and $\mu$ for the mean vector. As before, $\sigma_{jj}$ and $\sigma_j^2$ are the same thing. Let $T = g(W_1, \ldots, W_K) = g(W)$. Unless $g$ is linear we cannot get the mean and variance of $T$ from just the given moments of the $W$'s. Using a second-order Taylor series expansion of $g$ around $\mu_1, \ldots, \mu_K$ leads to
\[
E(T) \approx g(\mu) + \sum_j e_j \sigma_j^2/2 + \sum_j \sum_{k>j} e_{jk}\,\sigma_{jk},
\]
where $e_j = \partial^2 g(W)/\partial W_j^2\,\big|_{W=\mu}$ and $e_{jk} = \partial^2 g(W)/\partial W_j\,\partial W_k\,\big|_{W=\mu}$. This provides an approximate bias for $g(W)$ as an estimator of $g(\mu)$. Also,
\[
V(T) \approx d'\Sigma_W d = \sum_j d_j^2 \sigma_j^2 + 2 \sum_j \sum_{k>j} d_j d_k \sigma_{jk},
\]
where $d = (d_1, \ldots, d_K)'$, with $d_j = \partial g(W)/\partial W_j\,\big|_{W=\mu}$. This way of approximating the variance is often referred to as the delta method.

Ratios: $W_1 = \hat\theta_1$, $W_2 = \hat\theta_2$, $V(\hat\theta_1) = \sigma_{11}$, $V(\hat\theta_2) = \sigma_{22}$ and $\mathrm{cov}(\hat\theta_1, \hat\theta_2) = \sigma_{12}$. With $T = \hat\theta_1/\hat\theta_2$ and $\rho = \theta_1/\theta_2$, we have $d_1 = 1/\theta_2$ and $d_2 = -\theta_1/\theta_2^2$, leading to an approximate variance of
\[
V(T) \approx d_1^2\sigma_{11} + d_2^2\sigma_{22} + 2 d_1 d_2 \sigma_{12} = \frac{\sigma_{11} + \rho^2\sigma_{22} - 2\rho\sigma_{12}}{\theta_2^2}.
\]
Similarly,
\[
E(T) \approx \rho + \frac{1}{\theta_2^2}\,(\rho\sigma_{22} - \sigma_{12}),
\]
so $\theta_2^{-2}(\rho\sigma_{22} - \sigma_{12})$ is an approximation to the bias.
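As a small numerical illustration of these ratio formulas, the sketch below (in Python; not part of the original notes) plugs hypothetical values of $\theta_1$, $\theta_2$ and the $\sigma_{ij}$ into the variance and bias approximations. In practice the estimates and estimated (co)variances would be used in their place.

```python
import math


def delta_ratio(theta1, theta2, s11, s22, s12):
    """Delta-method variance and bias approximations for T = theta1_hat / theta2_hat.

    Uses V(T) ~ (s11 + rho^2 s22 - 2 rho s12) / theta2^2 and
    bias ~ (rho s22 - s12) / theta2^2, with rho = theta1 / theta2.
    """
    rho = theta1 / theta2
    var = (s11 + rho ** 2 * s22 - 2 * rho * s12) / theta2 ** 2
    bias = (rho * s22 - s12) / theta2 ** 2
    return var, bias


# Hypothetical values standing in for the estimates and their (co)variances.
var_T, bias_T = delta_ratio(theta1=2.0, theta2=5.0, s11=0.25, s22=0.16, s12=0.05)
print("approximate variance:", var_T)          # 0.009424
print("approximate se:", math.sqrt(var_T))     # about 0.097
print("approximate bias:", bias_T)             # 0.00056
```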
2 Confidence Sets for a ratio

Not counting bootstrapping, which we consider soon, there are two general approaches to getting a confidence "set" for a ratio.

i) Use what is known as Fieller's theorem. In a number of problems this will lead to an exact confidence set. In particular, with linear regression under either constant variance or known weights, plus normality, this leads to an exact confidence set.

ii) Obtain an approximate standard error (se) and create an interval using $\hat{x}_0 \pm t\,(\mathrm{se})$, where $t$ is a suitable table value.

The Fieller method is preferred for ratios, but we will also describe the second method since that general approach can be used for any nonlinear function of the parameters.

Fieller's theorem is used in finding a confidence set for a ratio of parameters, $\rho = \theta_1/\theta_2$. In general there are two statistics, $\hat\theta_1$ and $\hat\theta_2$, which estimate $\theta_1$ and $\theta_2$ respectively. It is assumed that $(\hat\theta_1, \hat\theta_2)$ follows, either exactly or approximately, a bivariate normal distribution with mean $(\theta_1, \theta_2)$, with $\sigma_{11} = \mathrm{var}(\hat\theta_1)$, $\sigma_{22} = \mathrm{var}(\hat\theta_2)$, and $\sigma_{12} = \mathrm{cov}(\hat\theta_1, \hat\theta_2)$. An estimate of the covariance matrix is available, with $\hat\sigma_{ij}$ denoting the estimate of $\sigma_{ij}$. In the original, exact form of Fieller's theorem (Fieller 1940), $(\hat\theta_1, \hat\theta_2)$ is exactly normally distributed with $\sigma_{ij} = \sigma^2 v_{ij}$, where the $v_{ij}$ are known constants, and there is a $\hat\sigma$, independent of $(\hat\theta_1, \hat\theta_2)$, such that $d\hat\sigma^2/\sigma^2$ follows a chi-square distribution with $d$ degrees of freedom. This leads to
\[
H(\rho) = \frac{\hat\theta_1 - \rho\hat\theta_2}{(\hat\sigma_{11} - 2\rho\hat\sigma_{12} + \rho^2\hat\sigma_{22})^{1/2}}
\]
following exactly a $t$-distribution with $d$ degrees of freedom. This arises when estimating a ratio of linear combinations in a normal linear model; see Zerbe (1978). In other contexts $(\hat\theta_1, \hat\theta_2)$ is only approximately normal and the $\hat\sigma_{ij}$'s are consistent estimators of the $\sigma_{ij}$'s, leading to $H(\rho)$ being approximately $t$-distributed with $d$ degrees of freedom, where $d = \infty$ designates a standard normal distribution.

With $t_{1-\alpha/2}(d)$ denoting the $100(1-\alpha/2)$th percentile of the $t$ distribution with $d$ degrees of freedom,
\[
P\big(H(\rho)^2 \le t_{1-\alpha/2}(d)^2\big) = 1 - \alpha. \qquad (1)
\]
This holds exactly or approximately according to whether $H(\rho)$ follows exactly or approximately a $t$ distribution, and the resulting confidence set is exact or approximate accordingly. (1) can be rewritten as
\[
P\big(Q(\rho) \le 0\big) = 1 - \alpha,
\]
where $Q(\rho) = f_0 - 2 f_1 \rho + f_2 \rho^2$ is a quadratic function of $\rho$, with
\[
f_0 = \hat\theta_1^2 - t_{1-\alpha/2}(d)^2\,\hat\sigma_{11}, \quad f_1 = \hat\theta_1\hat\theta_2 - t_{1-\alpha/2}(d)^2\,\hat\sigma_{12}, \quad f_2 = \hat\theta_2^2 - t_{1-\alpha/2}(d)^2\,\hat\sigma_{22}.
\]
A confidence set for $\rho$ with confidence coefficient $1-\alpha$ is given by the set of values $c$ satisfying $Q(c) \le 0$. Defining $D = f_1^2 - f_0 f_2$, $r_1 = (f_1 - D^{1/2})/f_2$, and $r_2 = (f_1 + D^{1/2})/f_2$, the confidence set for $\rho$ is:

• Case 1: a finite interval $[r_1, r_2]$, if $D \ge 0$ and $f_2 \ge 0$.
• Case 2: the complement of a finite interval, $(-\infty, r_1] \cup [r_2, \infty)$, if $D \ge 0$ and $f_2 < 0$.
• Case 3: $(-\infty, \infty)$, if $D < 0$ and $f_2 < 0$.

It is known that $D < 0$ and $f_2 \ge 0$ cannot occur together; see Steffens (1971). Hence we get a finite interval if and only if $f_2 \ge 0$, or equivalently $|\hat\theta_2/\hat\sigma_{22}^{1/2}| \ge t_{1-\alpha/2}(d)$, which means rejecting $H_0: \theta_2 = 0$. When $f_2 < 0$ we do not reject $H_0: \theta_2 = 0$ and the confidence set is infinite (cases 2 and 3). Note that if $\theta_2 = 0$, $\rho$ is ill-defined. While such confidence sets have often been dismissed as uninformative or worse (Miller 1986 calls them "absurdities"), they can have a reasonable interpretation. Fortunately, a finite interval usually results in practice.

One alternative confidence interval uses
\[
\hat\rho \pm t_{1-\alpha/2}(d)\,\big[(\hat\sigma_{11} - 2\hat\rho\hat\sigma_{12} + \hat\rho^2\hat\sigma_{22})/\hat\theta_2^2\big]^{1/2},
\]
where $\hat\rho = \hat\theta_1/\hat\theta_2$. This results from a delta-method approximation to the variance of $\hat\rho$ (see next section). While this interval (which is close to $[r_1, r_2]$ from Fieller's theorem for many data sets) is sometimes suitable, it can perform badly in terms of achieving the desired confidence coefficient of $1-\alpha$.
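To make these computations concrete, the following sketch (again Python; not from the original notes) assembles $f_0$, $f_1$, $f_2$ and returns the Fieller confidence set together with the delta-method interval for comparison. The function name and the example numbers are hypothetical, and `scipy.stats.t` is assumed to be available for the $t$ percentile.

```python
import math
from scipy.stats import t as t_dist  # assumed available for the t percentile


def fieller_and_delta(th1, th2, s11, s22, s12, d, alpha=0.05):
    """Fieller confidence set and delta-method interval for rho = theta1/theta2.

    th1, th2 are the estimates; s11, s22, s12 their estimated (co)variances;
    d is the degrees of freedom.
    """
    tval = t_dist.ppf(1 - alpha / 2, d)          # t_{1-alpha/2}(d)
    f0 = th1 ** 2 - tval ** 2 * s11
    f1 = th1 * th2 - tval ** 2 * s12
    f2 = th2 ** 2 - tval ** 2 * s22
    D = f1 ** 2 - f0 * f2

    if D >= 0 and f2 > 0:
        # Case 1: finite interval [r1, r2] (f2 > 0 used here to avoid dividing by zero).
        roots = sorted(((f1 - math.sqrt(D)) / f2, (f1 + math.sqrt(D)) / f2))
        fieller = ("finite interval", tuple(roots))
    elif D >= 0 and f2 < 0:
        # Case 2: complement of a finite interval, i.e. (-inf, lo] U [hi, +inf).
        roots = sorted(((f1 - math.sqrt(D)) / f2, (f1 + math.sqrt(D)) / f2))
        fieller = ("complement of the interval", tuple(roots))
    else:
        # Case 3: the whole real line (D < 0 with f2 >= 0 cannot occur; Steffens 1971).
        fieller = ("whole real line", None)

    # Delta-method interval: rho_hat +/- t * se, with se from the Section 1 formula.
    rho_hat = th1 / th2
    se = math.sqrt((s11 - 2 * rho_hat * s12 + rho_hat ** 2 * s22) / th2 ** 2)
    delta = (rho_hat - tval * se, rho_hat + tval * se)
    return fieller, delta


# Hypothetical estimates and estimated (co)variances.
print(fieller_and_delta(th1=2.0, th2=5.0, s11=0.25, s22=0.16, s12=0.05, d=20))
```

For these hypothetical numbers both methods return a finite interval of roughly $(0.20, 0.61)$, consistent with the remark that the two intervals are often close; when $f_2 < 0$, however, the Fieller set becomes infinite while the delta-method interval is always finite.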