
ASA Section on Survey Research Methods

On Small Area Interval Problems

Snigdhansu Chatterjee, University of Minnesota; Parthasarathi Lahiri, University of Maryland; Huilin Li, University of Maryland

Abstract

Empirical best linear unbiased prediction (EBLUP) uses a linear mixed model in combining information from different sources, and is particularly useful in small area problems. The variability of an EBLUP is measured by the mean squared prediction error (MSPE), and interval estimates are generally constructed using estimates of the MSPE. Such methods have shortcomings like undercoverage, excessive length and lack of interpretability. We propose a parametric bootstrap driven approach, and obtain coverage accuracy of O(d^3 n^{-3/2}), where d is the number of parameters and n the number of observations. Simulation results demonstrate the superiority of our method over the existing ones.

Keywords: Predictive distribution, prediction interval, general linear mixed model, bootstrap, coverage accuracy.

1 Introduction

Large scale sample surveys are usually designed to produce reliable estimates of various characteristics of interest for large geographic areas. However, for effective planning of health, social and other services, and for apportioning government funds, there is a growing demand to produce similar estimates for smaller geographic areas and for other sub-populations. To meet this demand, it is necessary to supplement the survey data with other relevant information that is often obtained from different administrative and census records. In many small area applications, mixed linear models are now routinely used in combining information from various sources and explaining different sources of errors. These models incorporate area specific random effects which explain the "between small area variations" not otherwise explained by the fixed effects part of the model.

For a good review of small area research, the readers are referred to the review papers by Rao (1986, 1999, 2001, 2004), Ghosh and Rao (1994), Pfeffermann (2002), Marker (1999), and the book by Rao (2003), among others. Point prediction using the empirical best linear unbiased predictor (EBLUP) and the associated mean square prediction error (MSPE) estimation have been discussed extensively in the small area literature, but little advancement has been made in interval prediction problems. Needless to say, interval prediction is a useful data analysis tool, since it integrates the concepts of both point prediction and hypothesis testing in an effective manner. Prediction intervals are useful in small area studies in many ways. For example, prediction intervals may help establish if different counties have similar resources and needs, or if different ethnic or other sub-population groups are equally exposed to a particular disease.

In the small area context, prediction intervals are often produced using the standard EBLUP ± z_{α/2} √mspe rule, where mspe is an estimate of the true MSPE of the EBLUP and z_{α/2} is the upper 100(1 − α/2)% point of the standard normal distribution. These prediction intervals are asymptotically correct, in the sense that the coverage probability converges to 1 − α for large sample size n. However, they are not efficient, in the sense that they have either an under-coverage or an over-coverage problem for small n, depending on the particular choice of the MSPE estimator. In statistical terms, the coverage error of such an interval is of the order O(n^{-1}), which is not accurate enough for most applications of small area studies, many of which involve small n. If the naive mspe, i.e. the one that does not account for the uncertainties involving the model parameter estimation, is used, the resulting prediction interval suffers from an under-coverage problem. On the other hand, if the Prasad-Rao (1990) second-order unbiased MSPE estimator is used, the interval suffers from over-coverage and over-lengthening problems.

For a general mixed model, Jeske and Harville (1988) proposed a prediction interval for a mixed effect. But their method does not address the complexity introduced by the estimation of the variance components. In particular, they did not study the effect of the estimated unknown variance components on the accuracy of the coverage error of their proposed interval.

Jiang and Zhang (2002) used a distribution-free method for constructing prediction intervals for a future observation under a non-Gaussian linear mixed model, based on the theory developed by Jiang (1998). This technique does not employ any area specific information and can be useful in constructing intervals when there is no survey data on the response variable. Jiang and Zhang (2002) proposed another method which can be applied to the situation when the sample size is large within each area. This is a technique of first obtaining the EBLUP for the random effects and the residuals. Then, under conditions sufficient to imply that the number of times each random effect is repeated (i.e., the number of observations in each small area) tends to infinity, the empirical distributions of the random effects as well as the residuals converge appropriately. This technique fails when we do not have large samples for each small area, a situation that is common in many small area applications.

Other than the papers cited above, research on small area prediction intervals is limited except for some special cases of the Fay-Herriot model (see Fay and Herriot 1979), a well-celebrated mixed regression model. Varieties of Bayesian and empirical Bayes methods are used in the Fay-Herriot model for such intervals. Because of the extensive use of the Fay-Herriot model and its particular cases in small area interval estimation, in Section 2 we review different approaches to interval estimation for this model.

In order to study the MSPE of EBLUP predictors of linear combinations of fixed and random effects, Das, Jiang and Rao (2004) considered the following general linear mixed model:

Y = Xβ + Zv + e_n, (1)

where Y ∈ R^n is a vector of observed responses, X_{n×p} and Z_{n×q} are known matrices, and v and e_n are independent random variables with dispersion matrices D(ψ) and R_n(ψ) respectively. Here β ∈ R^p and ψ ∈ R^k are fixed parameters. The mixed ANOVA model and longitudinal models, including the Fay-Herriot model and the nested error regression model, are special cases of the above framework.

In this paper we address the problem of obtaining prediction intervals of a mixed effect for the above general linear mixed model. Our approach is to employ the parametric bootstrap. This results in a higher order coverage accuracy of the interval compared to the existing methods, most of which are applicable only to special cases like the Fay-Herriot model. It has long been recognized that health, economic activity and other measures of human well-being depend on a number of exogenous and endogenous factors, many of which must be measured at the individual level and incorporated in the model. In statistical terms, this translates to high dimensionality of β and ψ. In order to address the dimension asymptotics aspect of small area prediction, we allow the parameter dimension d = p + k to grow with the sample size n, and obtain coverage accuracy of the order O(d^3 n^{-3/2}).

Traditional small area interval estimates achieve O(n^{-1}) coverage error for the Fay-Herriot model or its particular cases, for fixed values of p and k. As stated earlier, the O(n^{-1}) rate is typically not adequate for related follow-up applications. Hence, calibrations and corrections of these intervals have been proposed, some of which are discussed in Section 2. For clarity of our proposed methodology and technical conditions, and also for easy use in future applications, in Subsection 2.3 we discuss our proposed prediction interval in detail for the Fay-Herriot model.

In Section 3, we present our prediction interval algorithm for the more general Das-Jiang-Rao model (1). In Section 4, our method is compared to two other existing methods using a Monte Carlo simulation study. The naive method, i.e. the one which uses the naive MSPE estimator, provides the shortest intervals among all the methods considered, but suffers from a severe under-coverage problem. The Prasad-Rao method provides over-coverage at the expense of wider intervals. Our method is just about right in terms of coverage and is better than the Prasad-Rao method in terms of length.

2 The Fay-Herriot model

In many practical applications, it is difficult to retrieve information at the respondent level for various reasons (including maintaining confidentiality), and researchers are provided only with design-based estimates and their associated estimated standard errors for the small areas. In such cases, models are developed on the estimates themselves. In order to produce per-capita income estimates for small places (population less than 1,000), Fay and Herriot (1979) considered the following aggregate level model and used an empirical Bayes method which combines survey data from the U.S. Current Population Survey with various administrative and census data. Their model is now widely used in the small-area literature.

The Fay-Herriot Model:

1. Conditional on θ = (θ_1, ..., θ_n)^T, Y = (Y_1, ..., Y_n)^T follows an n-variate normal distribution with mean θ and dispersion matrix D with diagonal entries D_ii = σ_i^2 and off-diagonal entries 0. The sequence {σ_i} is a known sequence.

Here (and in the sequel) all vectors are taken to be column vectors; for any vector a (matrix A), the notation a^T (A^T) denotes its transpose.

2. The variable θ follows an n-variate normal distribution with mean given by Xβ for a known n × p matrix X and unknown but fixed vector β ∈ R^p. The dispersion matrix is given by τ^2 I_n, where I_n is the n-dimensional identity matrix and τ is an unknown constant.

We are interested in obtaining a prediction interval for θ_i = x_i^T β + v_i. There are several options for constructing such intervals: one may use only the Level 1 model for the observed data, or only the Level 2 model for the borrowed strength component, or a combination of both.

The interval for θ_i based only on the Level 1 model is given by I_i^D(α): Y_i ± z_{α/2} σ_i, where z_{α/2} is the upper 100(1 − α/2)% point of N(0, 1). Obviously, for this interval, the coverage probability is 1 − α. However, it is not efficient, since its average length is too large to allow any reasonable conclusion. This is due to the high variability of the point predictor Y_i.

An interval based only on Level 2 ignores the crucial area specific data that is modeled in Level 1, and hence falls short on two counts: it fails to be relevant to the specific small area under consideration, and it fails to achieve sufficient coverage accuracy. We show this latter property in a small example in Subsection 2.2.

Thus interval estimation techniques that combine both levels of the Fay-Herriot model are required. A popular approach is to employ empirical Bayes methodology, for example, by using empirical Bayes confidence intervals for prediction purposes. In the next subsection we review some of the important developments in empirical Bayes interval construction for the Fay-Herriot model or its particular cases.

2.1 Empirical Bayes intervals in the Fay-Herriot model

The posterior distribution of θ given Y is the n-variate normal distribution with ith mean (1 − B_i)Y_i + B_i x_i^T β, where x_i^T is the ith row of X and B_i = σ_i^2/(σ_i^2 + τ^2). The posterior variance is the diagonal matrix whose ith diagonal entry is σ_i^2 τ^2/(σ_i^2 + τ^2). Replacing β and τ^2 by their consistent estimators, we get the following empirical Bayes estimator of θ_i:

θ̂_i^EB = (1 − B̂_i) y_i + B̂_i x_i^T β̂,

where B̂_i and β̂ are estimators of B_i and β respectively.

Cox (1975) initiated the idea of developing an empirical Bayes confidence interval. In the current context, his suggestion generates the following prediction interval:

I_i^C(α): θ̂_i^EB ± z_{α/2} σ_i (1 − B̂_i)^{1/2}.

Under certain regularity conditions, it is easy to check that P(θ_i ∈ I_i^C(α)) = 1 − α + O(n^{-1}), where P denotes the probability measure induced by the joint distribution of Levels 1 and 2. Thus, this prediction interval attains the desired coverage probability asymptotically, but the coverage error is of the order O(n^{-1}), which is not accurate enough for many small area applications. Intuitively, this lack of accuracy could be due to the fact that the interval does not take into account the additional errors incurred by the estimation of the hyperparameters β and τ^2.

For a special case of the Fay-Herriot model with common mean and equal sampling variances σ_i = σ, Morris (1983a) used a different measure of uncertainty for the empirical Bayes estimator, one that incorporates the additional uncertainty due to the estimation of the hyperparameters. In a different work, Morris (1983b) considered a variation of his (1983a) empirical Bayes confidence interval where he used a hierarchical Bayes type point estimator in place of the previously used empirical Bayes estimator, and conjectured with some evidence that the coverage probability for his interval is at least 1 − α. He also noted that the coverage probability tends to 1 − α as n goes to ∞ or σ goes to zero.

Basu, Ghosh and Mukerjee (2003) showed that the empirical Bayes interval proposed by Morris (1983a) does not improve upon Cox (1975), in the sense that the coverage error remains O(n^{-1}). However, using a Taylor series expansion of the coverage probability, they obtained an expression for the order O(n^{-1}) term, which may then be used to calibrate Morris' interval to reduce the coverage error to o(n^{-1}). They also studied a prediction interval proposed by Carlin and Louis (1996, p. 98), and showed that the Carlin-Louis interval has a coverage bias of the order o(n^{-1}). As a by-product of this analytical result, they obtained an explicit expression for a new prediction interval with o(n^{-1}) coverage bias and expected length comparable to the Carlin-Louis or the calibrated Morris intervals.

Using a multi-level model, Nandram (1999) obtained an empirical Bayes confidence interval for a small area mean and showed that asymptotically it converges to the nominal coverage probability. However, accuracy results and other asymptotic properties are not known for his interval.

Datta, Ghosh, Smith and Lahiri (2002) used an approach similar to Basu, Ghosh and Mukerjee (2003) in order to calibrate the Cox-type prediction interval for a more general Fay-Herriot model with covariates but with equal sampling variances. The coverage error for their interval is of the order O(n^{-3/2}). Hill (1990) suggested a general framework for constructing an empirical Bayes confidence interval conditional on some suitable ancillary statistic. In a simple normal-normal setting this matches an exact hierarchical Bayes confidence interval. Datta et al. (2002) followed up Hill's idea in proposing an empirical Bayes confidence interval which is correct up to O(n^{-1}), a property not noted by Hill (1990).

2.2 The use of bootstrap in empirical Bayes intervals

It can be seen that empirical Bayes intervals require corrections in order to achieve high coverage accuracy. These corrections may be based on asymptotic expansions as in Basu, Ghosh and Mukerjee (2003), a hierarchical Bayesian approach as in Hill (1990) or Morris (1983b), or the bootstrap.

Different bootstrap methods have been used to improve the naive empirical Bayes confidence intervals. The methods differ in the generation of the bootstrap samples and the type of correction made. For a special case of the Fay-Herriot model where Y_1, ..., Y_n are independent and identically distributed, Laird and Louis (1987) proposed three different ways of generating bootstrap samples. They differ in the degree of the parametric assumptions involved. One problem with the nonparametric and semi-parametric methods in the small area context is that the bootstrap approximation of the distribution of the EBLUP is generally not even consistent. The Laird-Louis Type III bootstrap is the usual parametric bootstrap.

Once the bootstrap sample (nonparametric, semi-parametric or parametric) is generated, the next challenge is to find a method that will correct the naive empirical Bayes confidence interval I_i^C(α) in an effort to achieve better coverage. The method proposed by Laird and Louis (1987) is to consider an imitation of the hierarchical Bayes approach. Conditional on the values of β and ψ, let g(θ_i | Y, β, ψ) be the posterior density of θ_i given the data Y. If a prior π(·) is available for β and ψ, then in the hierarchical Bayesian model one might consider the distribution

∫ g(θ_i | Y, β, ψ) π(β, ψ | Y), (2)

where both terms in the above integral are appropriate posterior distributions. Laird and Louis (1987) proposed to use

B^{-1} Σ_{j=1}^{B} g(θ_i | Y, β_j^*, ψ_j^*), (3)

where B is the bootstrap sample size and β_j^* and ψ_j^* are the estimates of β and ψ from the jth bootstrap sample. The discrete mixture distribution (3) is a Monte Carlo approximation of

∫ g(θ_i | Y, s, t) dL^*(s, t), (4)

where L^*(·, ·) is the bootstrap approximation of the sampling distribution of the parameter estimators β̂ and ψ̂. Formula (4) is motivated by (2), but it is also similar to the bootstrap prediction of Harris (1989).

Carlin and Gelfand (1990, 1991) point out another issue with the hierarchical Bayesian approach of (2) and methods like those of Laird and Louis (1987). These approaches necessarily lead to a lengthening of the interval obtained from the naive empirical Bayes posterior g(θ_i | Y, β̂, ψ̂). However, a correction to the naive interval is needed, which is not the same as lengthening. Carlin and Gelfand (1990) discuss an example where increasing the length further exacerbates the coverage bias. These authors suggest using the parametric bootstrap to calibrate the naive empirical Bayes interval.

Calibration of intervals has been one of the major uses of the bootstrap for some time, and can lead to considerable improvement of coverage accuracy. Coupled with the use of bias correction, pivotal or nearly pivotal statistics, and Edgeworth corrections, improvements from calibration can sometimes be dramatic. See Abramovitch and Singh (1985), Beran (1990a, 1990b), the book by Efron and Tibshirani (1993) and references therein for further details on these issues. On the other hand, calibration is both time and computational effort intensive, often requiring iterative searches; it typically increases variability; the results often lack straightforward interpretability; and successive calibrating steps typically have diminishing returns in terms of improvement of coverage.

Hall (2006) suggests an application of the nonparametric bootstrap confidence interval based on the generated θ_i^*'s only. In the small area context, this may be applicable when the differences between the small areas are minor, or carried only in the fixed effects. In surveys, robustness is always an important issue, and practitioners are always interested in efficient nonparametric methods. However, due to scarce data at the small area level, nonparametric estimators tend to underperform, often severely. This is because the nonparametric models typically permit the generation of bootstrap histograms based on a synthetic model or the regression model, but do not permit approximation of the conditional distribution of θ_i given the data Y. As a result, the nonparametric bootstrap prediction interval for θ_i is likely to underweight the area specific data. Accurately weighting the area specific data is important for achieving good coverage properties, as the example below shows. Hall (2006) also pointed out the importance of the parametric bootstrap in small area estimation and other related problems.

EXAMPLE. Consider the following special case of the Fay-Herriot model where σ_i ≡ 1 and x_i^T β ≡ µ. Thus, at Level 1, the Y_i's given the θ_i's are independently distributed as N(θ_i, 1) random variables; and at Level 2 the θ_i's are independent, identically distributed N(µ, τ^2) random variables. The estimators of µ and τ^2 are given respectively by µ̂ = Ȳ and τ̂^2 = max(0, s^2 − 1), where s^2 = Σ_i (y_i − ȳ)^2/(n − 1). Assume τ̂^2 > 0, a condition that is satisfied in many problems. The bootstrap procedure would require us to generate θ_i^* iid ∼ N(µ̂, τ̂^2) and Y_i^* | θ_i^* ind ∼ N(θ_i^*, 1). Then we have µ̂^* = Ȳ^* and τ̂^{*2} = max(0, s^{*2} − 1), where s^{*2} = Σ_i (y_i^* − ȳ^*)^2/(n − 1).

An obvious Level 2 based bootstrap prediction interval for θ_i that is not area specific is given by

(µ̂ − t_1 √(τ̂^2), µ̂ + t_2 √(τ̂^2)), (5)

where (t_1, t_2) are cutoff points satisfying

P(µ̂^* − t_1 √(τ̂^{*2}) ≤ θ_i^* ≤ µ̂^* + t_2 √(τ̂^{*2})) = 1 − α.

It can be shown that interval (5) has coverage of 1 − α + O(n^{-1/2}), which makes it consistent, but hardly accurate enough. The lack of accuracy is due to the use of the Level 2 distribution only, so that the Level 1 data Y_i plays no special role in the interval construction.

2.3 A parametric bootstrap prediction interval

Instead of using the bootstrap for calibration or as a surrogate for a prior, our approach is to obtain a prediction interval directly from the bootstrap distribution. We do not attempt a nonparametric or semiparametric bootstrap, owing to their obvious consistency problems in all but the extremely simplified special cases.

For the general Fay-Herriot model, we begin by defining the following quantities:

β̂ = (X^T X)^{-1} X^T Y, (6)
τ̂^2 = (r^T r − Σ_i (1 − h_ii) σ_i^2)/(n − p), (7)
τ̂_0^2 = max(τ̂^2, ε^2), (8)
B̂_i = σ_i^2/(σ_i^2 + τ̂_0^2). (9)

Here ε > 0 is a fixed small number, h_ii is the ith diagonal element of the projection matrix on the column space of X, and r = Y − Xβ̂ is the vector of residuals. Under the condition τ^2 > 2ε^2, the truncated estimator τ̂_0^2 differs from the unbiased estimator τ̂^2 only on a set of negligible probability. We use it to avoid some messy algebra, but it is entirely dispensable otherwise. Other estimators, for example the weighted least squares estimator for β, can also be chosen.

The parametric bootstrap procedure is as follows:

1. Conditional on θ̃^* = (θ_1^*, ..., θ_n^*)^T, Y^* = (Y_1^*, ..., Y_n^*)^T follows an n-variate normal distribution with mean θ̃^* and the known Level 1 dispersion matrix D (diagonal entries σ_i^2).

2. θ̃^* follows an n-variate normal distribution with mean Xβ̂ and dispersion matrix τ̂_0^2 I_n.

The parametric bootstrap prediction interval for θ_i is given by

I_i(α): θ̂_i^EB ± t_i σ_i (1 − B̂_i)^{1/2}.

Recall that the formula for θ̂_i^EB is given in Subsection 2.1. The quantities β̂^*, τ̂^{*2}, τ̂_0^{*2} and B̂_i^* are computed using (6)-(9) with the resample Y^* in place of Y. The value of t_i is such that

P^*[θ_i^* ∈ {θ̂_i^{EB*} ± t_i σ_i (1 − B̂_i^*)^{1/2}}] = 1 − α, (10)

where θ̂_i^{EB*} = (1 − B̂_i^*) Y_i^* + B̂_i^* x_i^T β̂^*. The probability P^* is computed conditional on the data.
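The procedure defined by (6)-(10) can be sketched in code. The following is a minimal illustration, not the authors' implementation: it resolves (10) by taking t_i to be the (1 − α) bootstrap quantile of the absolute standardized deviation |θ_i^* − θ̂_i^{EB*}|/(σ_i (1 − B̂_i^*)^{1/2}), which matches the symmetric form of the interval; the truncation constant and bootstrap size are arbitrary choices.

```python
import numpy as np

def fh_estimates(Y, X, sigma2, eps2=1e-4):
    """Estimators (6)-(9): OLS beta-hat, the moment estimator of tau^2
    truncated below at eps^2, and the shrinkage weights B_i-hat."""
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ Y)                       # (6)
    h = np.einsum('ij,jk,ik->i', X, np.linalg.inv(X.T @ X), X)     # leverages h_ii
    r = Y - X @ beta
    tau2 = (r @ r - np.sum((1.0 - h) * sigma2)) / (n - p)          # (7)
    tau2 = max(tau2, eps2)                                         # truncation (8)
    B = sigma2 / (sigma2 + tau2)                                   # (9)
    return beta, tau2, B

def fh_bootstrap_interval(Y, X, sigma2, i, alpha=0.05, n_boot=1000, seed=None):
    """Parametric bootstrap interval I_i(alpha) for theta_i."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    beta, tau2, B = fh_estimates(Y, X, sigma2)
    theta_eb = (1.0 - B[i]) * Y[i] + B[i] * (X[i] @ beta)
    pivot = np.empty(n_boot)
    for b in range(n_boot):
        theta_star = X @ beta + rng.normal(0.0, np.sqrt(tau2), n)   # Level 2 draw
        Y_star = theta_star + rng.normal(0.0, np.sqrt(sigma2), n)   # Level 1 draw
        beta_s, tau2_s, B_s = fh_estimates(Y_star, X, sigma2)
        eb_s = (1.0 - B_s[i]) * Y_star[i] + B_s[i] * (X[i] @ beta_s)
        pivot[b] = abs(theta_star[i] - eb_s) / np.sqrt(sigma2[i] * (1.0 - B_s[i]))
    t_i = np.quantile(pivot, 1.0 - alpha)       # cutoff solving (10)
    half = t_i * np.sqrt(sigma2[i] * (1.0 - B[i]))
    return theta_eb - half, theta_eb + half
```

Unlike the naive rule, the cutoff t_i here adapts to the extra variability introduced by estimating β and τ^2, because the estimation step is repeated inside every bootstrap replicate.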

We assume the following conditions:

(A1) The matrix X is of full column rank, and the projection matrix on the columns of X (the so-called "hat" matrix), given by P_x = X(X^T X)^{-1} X^T, has diagonal entries h_ii satisfying

sup_{1≤i≤n} h_ii = O(p/n). (11)

(A2) The sequence {σ_i} lies in a compact subset of (0, ∞).

(A3) For a known ε > 0, τ^2 > 2ε^2.

Assumptions (A1) and (A2) are routine, while (A3) is a minor restriction that essentially simplifies the algebra.

Our result for the parametric bootstrap prediction interval is as follows:

Theorem 2.1 Under assumptions (A1)-(A3) for the Fay-Herriot model, the following holds:

P[θ_i ∈ {θ̂_i^EB ± t_i σ_i (1 − B̂_i)^{1/2}}] = 1 − α + O(p^3 n^{-3/2}).

A proof of Theorem 2.1 can be found in Chatterjee and Lahiri (2002).

3 Parametric bootstrap prediction interval for a general linear mixed model

We consider the model:

Y = Xβ + Zv + e_n, (12)

where X is a known (n × p) matrix, Z is a known (n × q) matrix, Y ∈ R^n is the vector of observed data, β ∈ R^p is a fixed but unknown parameter vector, and v ∈ R^q and e_n ∈ R^n are random variables following the normal distributions N_q(0, D) and N_n(0, R_n) respectively. Assume v and the sequence {e_n} are independent. The first term Xβ represents the fixed effects, and the second term Zv the random effects. Thus Xβ + Zv constitutes the signal component of the observed data, while e_n is the noise. The properties of the signal, which depend on the unknown parameters β, D and R_n, are of interest.

Assume that the (q × q) matrix D and the (n × n) matrix R_n are known up to a (k × 1) vector of unknown parameters, thus D = D(ψ) and R_n = R_n(ψ) for a fixed but unknown ψ = (ψ_1, ..., ψ_k)^T ∈ R^k. Note that the dispersion matrix of the observed data Y is given by

Σ_n = Σ_n(ψ) = R_n(ψ) + Z D(ψ) Z^T.

We henceforth drop the n from e_n, R_n and Σ_n to simplify notation. We take d = p + k, the dimension of the parameter space, and let θ = (β, ψ) denote the unknown parameters. We allow q to be arbitrary, and possibly to vary with n.

Das, Jiang and Rao (2004) show that several linear mixed models, including analysis of variance (ANOVA) models and longitudinal models of both balanced and unbalanced nature, are special cases of the model (12). Unbalanced ANOVA models arise, for example, when R_n = σ_0^2 I_n and D = diag(σ_1^2 I_{r_1}, ..., σ_{k−1}^2 I_{r_{k−1}}), where I_r is the r × r identity matrix. Here ψ is the vector of variance components ψ = (σ_0^2, ..., σ_{k−1}^2). Unbalanced longitudinal models arise when Σ has a block diagonal structure.

Let T = c^T(Xβ + Zv), where c is a fixed and known (n × 1) vector. The conditional distribution of T given Y is N(µ_T, σ_T^2), where

µ_T = c^T Xβ + c^T Z D Z^T Σ^{-1} (Y − Xβ)
    = c^T R Σ^{-1} Xβ + c^T Z D Z^T Σ^{-1} Y, and (13)

σ_T^2 = c^T Z (D − D Z^T Σ^{-1} Z D) Z^T c. (14)

Generally, β and ψ (and hence D and R_n) are estimated from the data Y by using the marginal distribution of Y, given by N_n(Xβ, Σ). The resulting estimates µ̂_T and σ̂_T^2 of the mean and variance of T are expressions similar to (13) and (14), with β̂ and ψ̂ in place of β and ψ. For algebraic simplicity, in the rest of this paper we assume that X is of full column rank and use the estimator

β̂ = (X^T X)^{-1} X^T Y = β + (X^T X)^{-1} X^T (Zv + e).

This is the ordinary least squares estimator of β. Using the weighted least squares estimator with appropriate conditions on the weights is another possibility that makes little difference in the asymptotic analysis. The estimator ψ̂ is typically obtained by maximum likelihood or restricted maximum likelihood techniques.

Based on the fact that σ_T^{-1}(T − µ_T) is a standard Normal pivot, a naive approach to interval estimation for T is to take (µ̂_T ± z σ̂_T) for the appropriate Normal quantile z. Unfortunately, σ̂_T^{-1}(T − µ̂_T) is not a pivot, and the naive approach produces too-short intervals that typically under-cover. Let L_n denote the distribution of σ̂_T^{-1}(T − µ̂_T). Recognizing that L_n is not the standard Normal distribution, we propose to estimate it using the parametric bootstrap.

The parametric bootstrap algorithm is based on the estimates β̂ and ψ̂ of β and ψ respectively. Define

Y^* = Xβ̂ + Zv^* + e^*,

where v^* ∼ N_q(0, D(ψ̂)) and e^* ∼ N_n(0, R(ψ̂)) are independent of each other. From Y^*, obtain β̂^* and ψ̂^* using the same techniques used to obtain β̂ and ψ̂ earlier. Next, obtain µ̂_T^* and σ̂_T^* from β̂^* and ψ̂^* using (13) and (14). Define T^* = c^T(Xβ̂ + Zv^*). The distribution of

σ̂_T^{*−1}(T^* − µ̂_T^*),

conditional on the data Y, is the parametric bootstrap approximation L_n^* of L_n. We then obtain the interval estimate for T as (µ̂_T − q_1 σ̂_T, µ̂_T + q_2 σ̂_T), where q_1 and q_2 are appropriate quantiles of the bootstrap approximation L_n^* of L_n. Our main result in this section is that L_n^* approximates L_n up to O(d^3 n^{-3/2}) terms.

In order to state the assumptions for our result, we introduce some terminology and notation. For any function f(ψ): R^a → R, f′(ψ) denotes its first derivative, written as an a × 1 column vector, and f″(ψ) denotes the a × a second derivative matrix. For a symmetric matrix A, λ_max(A) and λ_min(A) respectively denote its maximum and minimum eigenvalues.

The following are the assumptions for our result in this section:

1. The following relations hold:

||X^T c|| = O(1), (15)
||X^T Σ^{-1} Z D Z^T c|| = O(1), (16)
c^T Z D Z^T c = O(1), (17)
c^T Z D Z^T Σ^{-1} Z D Z^T c = O(1). (18)

In addition,

σ_T^2 = c^T Z (D − D Z^T Σ^{-1} Z D) Z^T c > M > 0,

for some constant M > 0.

2. The following hold:

sup_{1≤i≤n} Σ_{j=1}^{p} [ Σ_{a=1}^{n} X_{ja} (Σ^{1/2})_{ai} ]^2 = O(p/n), (19)

λ_min(n^{-1} X^T X) > M > 0, (20)

for some constant M > 0.

3. The eigenvalues of the matrices D and R lie in (L^{-1}, L) for some L > 1. The eigenvalues of D(ψ̂) and R(ψ̂) lie in ((2L)^{-1}, 2L). The eigenvalues of Σ lie in a compact set on the positive half of the real line.

In the representations

D = D_1(ψ) D_1^T(ψ),  D̂ = D_1(ψ) Λ_D(ψ) D_1^T(ψ), (21)
R = R_1(ψ) R_1^T(ψ),  R̂ = R_1(ψ) Λ_R(ψ) R_1^T(ψ), (22)

where Λ_D and Λ_R are diagonal matrices, the following conditions are satisfied: all the entries of the q × q matrix Λ_D = diag(Λ_{D1}, ..., Λ_{Dq}) and the n × n matrix Λ_R = diag(Λ_{R1}, ..., Λ_{Rn}) have three bounded continuous derivatives.

We denote by Λ′_D the k × q matrix whose (j, i)th entries are given by

(Λ′_D)_{j,i}(ψ) = ∂Λ_{Di}(ψ)/∂ψ_j, j = 1, ..., k; i = 1, ..., q.

The (j, i)th entry of the k^2 × q matrix Λ″_D is

(Λ″_D)_{j,i}(ψ) = ∂^2 Λ_{Di}(ψ)/(∂ψ_{j1} ∂ψ_{j2}), j_1 + (j_2 − 1)k = j, j_1, j_2 = 1, ..., k; j = 1, ..., k^2; i = 1, ..., q.

The (j, i)th entry of the k^3 × q matrix Λ_D^{(3)} is

(Λ_D^{(3)})_{j,i}(ψ) = ∂^3 Λ_{Di}(ψ)/(∂ψ_{j1} ∂ψ_{j2} ∂ψ_{j3}), j_1 + (j_2 − 1)k + (j_3 − 1)k^2 = j, j_1, j_2, j_3 = 1, ..., k; j = 1, ..., k^3; i = 1, ..., q.

We define the k × n matrix Λ′_R, the k^2 × n matrix Λ″_R and the k^3 × n matrix Λ_R^{(3)} along identical lines. The following conditions are assumed:

λ_max(Λ′_D^T(ψ) Λ′_D(ψ)) = O(1), (23)
λ_max(Λ′_R^T(ψ) Λ′_R(ψ)) = O(1), (24)
λ_max(Λ″_D^T(ψ) Λ″_D(ψ)) = O(1), (25)
λ_max(Λ″_R^T(ψ) Λ″_R(ψ)) = O(1), (26)
λ_max(Λ_D^{(3)T}(ψ^*) Λ_D^{(3)}(ψ^*)) < M = O(1), (27)
λ_max(Λ_R^{(3)T}(ψ^*) Λ_R^{(3)}(ψ^*)) < M = O(1), (28)

for some constant M > 0, for all ψ^* in a neighborhood of the true value ψ.

4. Let S = (k/n)^{1/2}(ψ̂ − ψ). Assume that all the moments of ||S|| are O(1). Moreover, the following relations are also satisfied:

E S_j = O((k/n)^{1/2}), (29)
E S_a S_b = O((k/n)^{1/2}), (30)
E S_j (Zv + e)_i = O((k/n)^{1/2}), (31)
E S_a S_b (Zv + e)_i = O((k/n)^{1/2}). (32)

We now state our main theorem for this section.

Theorem 3.1 Fix α ∈ (0, 1), and let q_1 and q_2 be real numbers such that

L_n^*(q_2) − L_n^*(−q_1) = 1 − α. (33)

Under Assumptions (1)-(4), if d^2/n → 0, we have

P[µ̂_T − q_1 σ̂_T ≤ T ≤ µ̂_T + q_2 σ̂_T] = 1 − α + O(d^3 n^{-3/2}). (34)

We now discuss the assumptions leading to Theorem 3.1, and some additional features of our result.

REMARK 1. Note that the dimension q of the random effect v is arbitrary, and may or may not depend on n. Owing to this generalization, our analysis is for T = c^T(Xβ + Zv), rather than the more traditional T̃ = c_1^T β + c_2^T v. Since X is of full column rank, the fixed effects in T and T̃ are equivalent.

REMARK 2. In the development of all the assumptions above, we have preferred simplicity over generality. The requirement d^2 n^{-1} → 0 is standard in dimension asymptotics. Assumption 1 ensures that T is a non-trivial quantity; that is, it ensures that both the fixed component and the variance of the random component of µ_T are O(1), and that the variance σ_T^2 is bounded away from zero and infinity. By suitably scaling the norm of the vector c this assumption is satisfied. Assumption 2 is a standard assumption on the behavior of X. It ensures that the norm of each fixed effects covariate is of suitable order, and that the fixed effects design is not singular.

assumption can be modified to suit cases where $X$ is not of full column rank, but such generalizations are routine.

Assumption 3 consists of standard differentiability and eigenvalue conditions. Here again, we have tried to adopt simple conditions rather than the most general ones. Note that the existence of the representations (21) and (22) is not part of the assumptions; these representations will be established in the proof of Theorem 3.1. Also note that the eigenvalues of $D(\hat{\psi})$ and $R(\hat{\psi})$ are estimates of the variance components in typical applications. We do not allow these to be zero, since they must always lie in $(L^{-1/2}, 2L)$. However, $L$ may be arbitrarily large; consequently this assumption does not limit the applicability of our results.

In Assumption 4 we take all moments of $S$ to exist in order to achieve simplicity. Our result involves computation of several terms involving $S$, and having all the moments of $S$ available simplifies the algebra. In most applications, both $\psi$ and $\hat{\psi}$ lie in a compact set, hence this is not a strong condition. The other conditions on $S$ given by (29)-(30) are routine. These hold when $\hat{\psi}$ is obtained using either the maximum likelihood or the restricted maximum likelihood formulation; see Jiang (1998) for related developments.

The conditions (31)-(32) are interesting, since they effectively set a limit on the amount of dependency structure we can have in $\Sigma$. To visualize this, suppose $\hat{\psi}^{(-i)}$ is the estimator of $\psi$ obtained by using only those observations that are independent of $Y_i$, and let $S^{(-i)} = (k/n)^{1/2}(\hat{\psi}^{(-i)} - \psi)$. Then a sufficient condition for $E S_j (Zv + e)_i = O((k/n)^{1/2})$ is that $S^{(-i)} - S = O_P((k/n)^{1/2})$. This is routinely achieved; in particular, if $Y_i$ is independent of all but a finite number of observations, we have $S^{(-i)} - S = O_P((k/n)^{1/2})$. This is the typical situation in almost all applications of small area studies. Thus, the effect of Assumption 4 is to restrict the complexity of the matrices $D$ and $R$.

REMARK 3. Unlike in the Fay-Herriot model, in the general Das-Jiang-Rao mixed effects model $n$ does not represent the number of small areas. In fact, it is the sum total of all observations made, counting each repeated measurement on each individual unit in each small area as a distinct observation. This allows Theorem 3.1 to be used with considerable flexibility, for example when the number of individual units in small areas is large, or when the number of small areas is large, or both. However, requirements of asymptotic negligibility, as in (31)-(32), must still be met. Our assumptions are designed for the more realistic applications where the number of small areas is large.

4 Simulation example based on income and poverty statistics

We illustrate the performance of our proposed parametric bootstrap approach with a simulated application based on a data set on small-area income and poverty estimates for the 50 states and the District of Columbia of the United States of America. We use a Fay-Herriot model, since in that model we have viable competitors for our approach. We use three major covariates and the sampling variances $D_i$ from the data set; take estimates of the hyperparameters $\beta$ and $\psi$ obtained by analysis of the data as the true values of these hyperparameters; and artificially generate the response variable $Y$ from a Normal distribution. Along with an intercept term, we use as covariates two measures related to tax return data and one measure related to food stamp data. The performance of the proposed method is more interesting for small values of $n$; hence we took $n = 10$ states arbitrarily.

We compare our method with (a) the naive plug-in method, and (b) the Prasad-Rao method. The naive method uses asymptotic normality of $T$ with variance approximated by the estimate of $E(\hat{\theta}_i - \theta_i^B)^2$, where $\theta_i^B$ is the Bayes estimator of $\theta_i$. In the Prasad-Rao method, an improved Taylor series based approximation of the asymptotic variance is used.

Based on 10,000 iterations, for the three methods we compute the coverage probabilities and average lengths for all 10 selected states (small areas). The results are reported in Table 1. The naive method has a severe undercoverage problem: its coverage can be as low as 0.31, compared to the nominal value of 0.95. The Prasad-Rao method is more conservative than our proposed method, and in terms of average length it is much more inefficient than our bootstrap method (our method can cut the average length of the Prasad-Rao interval by more than one-third). This example illustrates that the proposed bootstrap based interval achieves the desired coverage accuracy without unduly lengthening the interval.

Table 1: Coverage and average length of different intervals (nominal coverage = 0.95); entries are coverage, with average length in parentheses.

State   Naive         Prasad-Rao     Bootstrap
1       0.43 (4.12)   0.99 (12.96)   0.96 (11.25)
2       0.45 (4.13)   0.99 (12.38)   0.96 (10.61)
3       0.43 (4.10)   0.99 (13.07)   0.96 (11.20)
4       0.47 (3.94)   0.99 (11.87)   0.97 (9.61)
5       0.47 (2.55)   0.99 (16.28)   0.95 (5.72)
6       0.46 (3.77)   0.99 (12.72)   0.97 (9.15)
7       0.42 (4.06)   0.98 (13.76)   0.95 (11.51)
8       0.31 (4.64)   0.96 (20.41)   0.95 (19.97)
9       0.46 (4.21)   0.99 (12.04)   0.96 (1.76)
10      0.47 (3.03)   0.99 (14.54)   0.95 (7.00)

References

[1] ABRAMOVITCH, L. AND SINGH, K. (1985), Edgeworth corrected pivotal statistics and the bootstrap, Ann. Statist., 13, 116-132.

[2] BASU, R., GHOSH, J. K., AND MUKERJEE, R. (2003), Empirical Bayes prediction intervals in a normal regression model: higher order asymptotics, Statist. Probab. Lett., 63, 197-203.

[3] BERAN, R. (1990a), Refining bootstrap simultaneous confidence sets, J. Amer. Statist. Assoc., 85, 417-426.

[4] BERAN, R. (1990b), Calibrating prediction regions, J. Amer. Statist. Assoc., 85, 715-723.

[5] CARLIN, B. P. AND LOUIS, T. A. (1996), Bayes and Empirical Bayes Methods for Data Analysis, Chapman and Hall, London.

[6] CARLIN, B. AND GELFAND, A. (1990), Approaches for empirical Bayes confidence intervals, J. Amer. Statist. Assoc., 85, 105-114.

[7] CARLIN, B. AND GELFAND, A. (1991), A sample reuse method for accurate parametric empirical Bayes confidence intervals, J. Roy. Statist. Soc. Ser. B, 53, 189-200.

[8] CHATTERJEE, S. AND LAHIRI, P. (2002), Parametric bootstrap confidence intervals in small area estimation problems, unpublished manuscript.

[9] COX, D. R. (1975), Prediction intervals and empirical Bayes confidence intervals, in Perspectives in Probability and Statistics, papers in honor of M. S. Bartlett (ed. J. Gani), Academic Press, 47-55.

[10] DAS, K., JIANG, J., AND RAO, J. N. K. (2004), Mean squared error of empirical predictor, Ann. Statist., 32, 818-840.

[11] DATTA, G. S., GHOSH, M., SMITH, D., AND LAHIRI, P. (2002), On an asymptotic theory of conditional and unconditional coverage probabilities of empirical Bayes confidence intervals, Scand. J. Statist., 29, 139-152.

[12] EFRON, B. AND TIBSHIRANI, R. J. (1993), An Introduction to the Bootstrap, Chapman and Hall, New York.

[13] FAY, R. E. AND HERRIOT, R. A. (1979), Estimates of income for small places: an application of James-Stein procedure to census data, J. Amer. Statist. Assoc., 74, 269-277.

[14] GHOSH, M. AND RAO, J. N. K. (1994), Small area estimation: an appraisal (with discussion), Statist. Sci., 9, 55-93.

[15] HALL, P. (2006), Discussion of "Mixed model prediction and small area estimation", Test, 15, 1-96.

[16] HARRIS, I. R. (1989), Predictive fit for natural exponential families, Biometrika, 76, 675-684.

[17] HILL, J. R. (1990), A general framework for model-based statistics, Biometrika, 77, 115-126.

[18] JESKE, D. R. AND HARVILLE, D. A. (1988), Prediction-interval procedures and (fixed-effects) confidence-interval procedures for mixed linear models, Commun. Statist. - Theory Meth., 17, 1053-1087.

[19] JIANG, J. (1998), Asymptotic properties of the empirical BLUP and BLUE in mixed linear models, J. Amer. Statist. Assoc., 93, 720-729.

[20] JIANG, J. AND ZHANG, W. (2002), Distribution-free prediction intervals in mixed linear models, Statistica Sinica, 12, 537-553.

[21] LAIRD, N. M. AND LOUIS, T. A. (1987), Empirical Bayes confidence intervals based on bootstrap samples (with discussion), J. Amer. Statist. Assoc., 82, 739-750.

[22] MARKER, D. A. (1999), Organization of small area estimators using generalized regression framework, J. Official Statist., 15, 1-24.

[23] MORRIS, C. N. (1983a), Parametric empirical Bayes inference: theory and applications (with discussion), J. Amer. Statist. Assoc., 78, 47-65.

[24] MORRIS, C. N. (1983b), Parametric empirical Bayes confidence intervals, in Scientific Inference, Data Analysis and Robustness, Academic Press, New York, 25-50.

[25] NANDRAM, B. (1999), An empirical Bayes prediction interval for the finite population mean of a small area, Statist. Sinica, 9, 325-343.

[26] PFEFFERMANN, D. (2002), Small area estimation - new developments and directions, Int. Statist. Rev., 70, 125-143.

[27] PRASAD, N. G. N. AND RAO, J. N. K. (1990), The estimation of the mean squared error of small-area estimators, J. Amer. Statist. Assoc., 85, 163-171.

[28] RAO, C. R. (1963), Linear Statistical Inference and its Applications, Wiley Eastern, New Delhi.

[29] RAO, J. N. K. (1986), Synthetic estimators, SPREE and the best model based predictors, Proceedings of the Conference on Survey Methods in Agriculture, 1-16, U.S. Dept. of Agriculture, Washington, D.C.

[30] RAO, J. N. K. (1999), Some recent advances in model-based small area estimation, Surv. Meth., 25, 175-186.

[31] RAO, J. N. K. (2001), EB and EBLUP in small area estimation, in Empirical Bayes and Likelihood Inference, Lecture Notes in Statistics No. 148 (ed. Ahmed, S. E. and Reid, N.), Springer, New York.

[32] RAO, J. N. K. (2003), Small Area Estimation, Wiley, New York.

[33] RAO, J. N. K. (2004), Small area estimation with applications to agriculture, J. Indian Soc. Agricultural Statist., 57, 159-170.
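To make the Section 4 procedure concrete, the following is a hypothetical, simplified sketch (ours, not the authors' code) of a parametric bootstrap prediction interval in the Fay-Herriot model: it fits $(\beta, A)$ with a Prasad-Rao type moment estimator, bootstraps a studentized prediction pivot, and calibrates the interval endpoints. The function names, the floor of 0.01 on the variance estimate, and the percentile-pivot calibration are all our illustrative choices; the paper's actual interval uses a more refined calibration achieving $O(d^3 n^{-3/2})$ coverage error.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_fh(y, X, D):
    """Fit the Fay-Herriot model y_i = x_i'beta + v_i + e_i,
    v_i ~ N(0, A), e_i ~ N(0, D_i) with D_i known.  Uses a
    Prasad-Rao type moment estimator for A (floored at 0.01, our
    choice, to avoid a zero variance estimate) and GLS for beta."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)          # OLS hat matrix
    r = y - H @ y                                  # OLS residuals
    A = max((r @ r - np.sum(D * (1 - np.diag(H)))) / (n - p), 0.01)
    w = 1.0 / (A + D)
    beta = np.linalg.solve((X.T * w) @ X, (X.T * w) @ y)
    return beta, A

def eblup(y, X, D, beta, A):
    """EBLUP of theta_i = x_i'beta + v_i."""
    B = A / (A + D)                                # shrinkage factors
    return B * y + (1 - B) * (X @ beta)

def boot_interval(y, X, D, i, alpha=0.10, n_boot=200):
    """Percentile-type parametric bootstrap interval for theta_i:
    a simplified stand-in for the paper's calibrated interval."""
    beta, A = fit_fh(y, X, D)
    n = len(y)
    pivots = []
    for _ in range(n_boot):
        theta = X @ beta + rng.normal(0.0, np.sqrt(A), n)  # bootstrap truth
        ystar = theta + rng.normal(0.0, np.sqrt(D))        # bootstrap data
        b_s, A_s = fit_fh(ystar, X, D)
        th_hat = eblup(ystar, X, D, b_s, A_s)[i]
        s = np.sqrt(A_s * D[i] / (A_s + D[i]))             # naive sd
        pivots.append((theta[i] - th_hat) / s)
    lo_q, hi_q = np.quantile(pivots, [alpha / 2, 1 - alpha / 2])
    th_hat = eblup(y, X, D, beta, A)[i]
    s = np.sqrt(A * D[i] / (A + D[i]))
    return th_hat + lo_q * s, th_hat + hi_q * s

# toy run with n = 10 areas, mimicking the scale of the simulation
n = 10
X = np.column_stack([np.ones(n), rng.normal(size=n)])
D = rng.uniform(0.5, 2.0, n)
theta_true = X @ np.array([1.0, 2.0]) + rng.normal(0.0, 1.0, n)
y = theta_true + rng.normal(0.0, np.sqrt(D))
lo, hi = boot_interval(y, X, D, i=0)
```

Because the bootstrap quantiles replace the normal quantiles $\pm z_{\alpha/2}$, the interval inherits the finite-sample shape of the pivot's distribution rather than assuming symmetry.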