Resampling Hypothesis Tests for Autocorrelated Fields
Total Page:16
File Type:pdf, Size:1020Kb
JANUARY 1997 WILKS 65 Resampling Hypothesis Tests for Autocorrelated Fields D. S. WILKS Department of Soil, Crop and Atmospheric Sciences, Cornell University, Ithaca, New York (Manuscript received 5 March 1996, in ®nal form 10 May 1996) ABSTRACT Presently employed hypothesis tests for multivariate geophysical data (e.g., climatic ®elds) require the as- sumption that either the data are serially uncorrelated, or spatially uncorrelated, or both. Good methods have been developed to deal with temporal correlation, but generalization of these methods to multivariate problems involving spatial correlation has been problematic, particularly when (as is often the case) sample sizes are small relative to the dimension of the data vectors. Spatial correlation has been handled successfully by resampling methods when the temporal correlation can be neglected, at least according to the null hypothesis. This paper describes the construction of resampling tests for differences of means that account simultaneously for temporal and spatial correlation. First, univariate tests are derived that respect temporal correlation in the data, using the relatively new concept of ``moving blocks'' bootstrap resampling. These tests perform accurately for small samples and are nearly as powerful as existing alternatives. Simultaneous application of these univariate resam- pling tests to elements of data vectors (or ®elds) yields a powerful (i.e., sensitive) multivariate test in which the cross correlation between elements of the data vectors is successfully captured by the resampling, rather than through explicit modeling and estimation. 1. Introduction dle portion of Fig. 1. First, standard tests, including the t test, rest on the assumption that the underlying data It is often of interest to compare two datasets for are composed of independent samples from their parent differences with respect to particular attributes. For ex- populations. Very often, atmospheric and other geo- ample, one might want to investigate whether the mean physical data do not satisfy this assumption even ap- value under one set of conditions is different than the proximately. In particular, such data typically exhibit mean in some other circumstance. In order to make fair serial (or auto-) correlation. The effect of this autocor- and reliable comparisons, it is necessary to account for relation on the sampling distribution of the mean is to sampling variability in this process. That is, even if there increase its variance above the level that would be in- is no difference with respect to an attribute of interest ferred under the assumption of independence. That is, (e.g., the mean) in the generating processes for two sample means are less consistent from batch to batch batches of data, the two sample estimates of the quantity than would be the case for independent data. (the pair of sample means) will rarely be exactly equal. The estimated variance of the sampling distribution Standard statistical tests (e.g., the familiar t test) have of the mean appears in the denominator of the test sta- been devised to discern what magnitude of difference, tistic for the t test: in relation to the variability evident in the data samples, can be declared with high con®dence to be real (i.e., xÅ 2 m t 5 0 . (1) statistically signi®cant). Statistical tests have most com- 1/2 monly been applied to atmospheric datasets in the con- [var(xÅ)] text of climate studies (e.g., Livezey 1985), although Underestimation of the variance of the sample mean thus their range of potential applicability is much broader in¯ates the standard t statistic, leading to unwarranted (e.g., Daley and Chervin 1985). rejections of the null hypothesis (e.g., Cressie 1980; Two complications arise when attempting to apply Wilks 1995). However, a number of univariate (i.e., sca- standard hypothesis tests to climatic or other geophys- lar) tests have been developed that perform properly, ical data. These are illustrated schematically in the mid- even when the data upon which they operate exhibit serial correlation (Albers 1978; Chervin and Schneider 1976; Jones 1975; Katz 1982; Zwiers and ThieÂbaux 1987; Zwiers and von Storch 1995). One approach, Corresponding author address: Dr. Daniel S. Wilks, Department of Soil, Crop and Atmospheric Sciences, Cornell University, 1123 which is successful when the sample size n is suf®- Brad®eld Hall, Ithaca, NY 14853-1901. ciently large, is to estimate the variance of the sampling E-mail: [email protected] distribution of the mean as q1997 American Meteorological Society Unauthenticated | Downloaded 10/02/21 01:12 AM UTC 66 JOURNAL OF CLIMATE VOLUME 10 FIG. 1. Conceptual illustration of therelationships between conventional univariate hypothesis tests (top), univariate tests modi®ed to account for temporal correlation in the data (middle left), multivariate tests assuming serial independence (middle right), and tests respecting both types of correlations (bottom) that are the subject of this paper. s2 for problems involving estimation of the mean (Livezey var(xÅ) 5 V x , (2) n 1995; ThieÂbaux and Zwiers 1984; von Storch 1995; Zwiers and von Storch 1995). Similarly, (3) is some- 2 wheresx is the sample variance of the data and V is the times used to de®ne the ``effective sample size'' ``variance in¯ation factor,'' which depends on the au- n tocorrelation in the data according to n 5 , (4) e V n21 k V 5 1 1 212rk. (3) which also pertains only to inferences concerning O n k5112 means. Here, the rk are estimates of the autocorrelations at lags The second complication indicated in Fig. 1 is that k. For independent data, rk 5 0 for k ± 0, in which case the data of interest may be vector valued, or multivar- V 5 1 and the conventional t test is recovered in (1). iate. If the elements of the data vectors are mutually The variance in¯ation factor is sometimes called the independent, it is straightforward to evaluate jointly the ``time between effectively independent samples,'' (Leith results of a collection of scalar hypothesis tests (the 1973; Trenberth 1984), or the ``decorrelation time,'' al- ``®eld signi®cance'') using the binomial distribution though this terminology is only really applicable to (3) (e.g., Livezey and Chen 1983; von Storch 1982). How- Unauthenticated | Downloaded 10/02/21 01:12 AM UTC JANUARY 1997 WILKS 67 ever, for problems of practical interest, it is often the on the test statistic without the need to explicitly model case that the data exhibit substantial cross correlation. those correlations. Section 4 summarizes and concludes For example, it might be of interest to test whether the with suggestions for future directions. means of two sets of atmospheric ®elds (these could be values at a spatial array of points, or expansion coef- ®cients for the data projected onto a different basis, such 2. Univariate tests as selected Fourier harmonics) are signi®cantly different a. Bootstrap tests and the moving blocks bootstrap between two time series of these ®elds. If the data are cross correlated but temporally independent, the clas- The bootstrap is a relatively recent computer-based sical Hotelling test [the multivariate analog of the simple statistical technique (Efron 1982; Efron and Tibshirani t test, e.g., Johnson and Wichern (1982)] or related tests 1993). The philosophical basis of the bootstrap is the (e.g., Hasselmann 1979) can be used, provided the num- idea that the sampling characteristics (i.e., batch-to- ber of samples is much larger than the dimension of the batch variations exhibited by different samples from a vector data. parent population) of a statistic of interest can be sim- Alternatively, resampling tests (Chung and Fraser ulated by repeatedly treating a single available batch of 1958; Livezey and Chen 1983; Mielke et al. 1981; Prei- data in a way that mimics the process of sampling from sendorfer and Barnett 1983; Wilks 1996; Zwiers 1987) the parent population. As a nonparametric approach, it can be employed. These tests involve, in one form or is often useful when the validity of assumptions un- another, developing sampling distributions for a test sta- derlying more traditional theoretical approaches is ques- tistic (e.g., the length of the difference between vector tionable and when the more traditional approaches are means, according to a particular norm) by repeated ran- either unavailable or intractable. dom reordering the data vectors (®elds). This procedure Bootstrapping is used most frequently to estimate is attractive because the resampling procedure captures sampling distributions, or particular aspects of sampling the cross correlation between elements of the data vec- distributions, such as standard errors and con®dence in- tors without requiring that the covariance structure be tervals. Although computationally intensive, the pro- explicitly modeled and estimated. However, conven- cedure is algorithmically simple. The bootstrap approx- tional resampling procedures, which repeatedly rear- imation to the sampling distribution of a statistic of range the time ordering of individual data points, nec- interest is constructed by repeatedly resampling the essarily destroy important information regarding the available data with replacement to yield multiple syn- temporal correlation that may be present in the data thetic samples of the same size as the original set of (e.g., Zwiers 1990). Such tests are thus applicable only observations. For example, consider an arbitrary statis- to situations where either the temporal correlation is negligible,