7 Detection and Modeling of Nonconstant Variance
CHAPTER 7 ST 762, M. DAVIDIAN

7.1 Introduction

So far, we have focused on approaches to inference in mean-variance models of the form

$$E(Y_j \mid x_j) = f(x_j, \beta), \qquad \mathrm{var}(Y_j \mid x_j) = \sigma^2 g^2(\beta, \theta, x_j) \tag{7.1}$$

under the assumption that we have already specified such a model.

• Often, a model for the mean may be suggested by the nature of the response (e.g., binary or count), by subject-matter theoretical considerations (e.g., pharmacokinetics), or by the empirical evidence (e.g., models for assay response).

• A model for variance may or may not be suggested by these features.

When the response is binary, the form of the variance is indeed dictated by the Bernoulli distribution, while for data in the form of counts or proportions, for which the Poisson or binomial distributions may be appropriate, the form of the variance is again suggested. One may wish to consider the possibility of over- or underdispersion in these situations; this may reasonably be carried out by fitting a model that accommodates these features and determining if an improvement in fit is apparent using methods for inference on variance parameters we will discuss in Chapter 12.

Alternatively, when the response is continuous (or approximately continuous), it is often the situation that there is not necessarily an obvious relevant distributional model. As we have discussed in some of the examples we have considered, several sources of variation may combine to produce patterns that are not well described by the kinds of variance models dictated by popular distributional assumptions such as the gamma or lognormal distributions.

In fact, it may be unclear whether heterogeneity of variance is even an issue at all. In some applications, it is expected, and popular models may be available; in others, whether or not variance changes with the mean or covariate values may need to be deduced from the data.
• In these situations, methods are required for detecting nonconstant variance, determining whether or not it changes smoothly across the range of the response or covariates, and identifying an appropriate model to characterize the change.

To address these issues, both formal and informal approaches have been proposed:

• Graphical techniques. Both for detection and modeling, these often have a subjective flavor. In this chapter, we will focus on these procedures.

• Formal hypothesis testing. Formal procedures are mainly used for detection. We will defer discussion of these until after we have covered the large-sample theoretical developments on which they are based. Because of the complexity of (7.1), no finite-sample, “exact” methods are available in general.

COMMON THEME: Most graphical approaches are based on the OLS residuals

$$r_j = Y_j - f(x_j, \hat{\beta}_{OLS})$$

and functions thereof, or on related constructs. Our main focus will be on detection and modeling in situations where the response is continuous (or nearly continuous, such as in the case of moderate-to-large counts).

A complementary treatment of some of the approaches we will discuss may be found in Carroll and Ruppert (1988, Sections 2.7 and 2.8).

Note that, in what follows, distributional statements are conditional on the $x_j$.

7.2 Plots based on residuals

We begin by first reviewing the basic rationale for the use of residuals as a tool for detecting nonconstant variance in regression. The “usual” residual plots described in a first course in linear regression analysis apply equally well in the nonlinear model situation.
Specifically, one usually plots the $r_j$ or the “standardized” residuals $r_j/\hat{\sigma}_{OLS}$, where

$$\hat{\sigma}^2_{OLS} = (n - p)^{-1} \sum_{j=1}^{n} r_j^2,$$

versus one or more of the following:

• Predicted values $\hat{Y}_j = f(x_j, \hat{\beta}_{OLS})$

• Covariates (elements of $x_j$)

• $\log \hat{Y}_j$ in cases where many responses tend to be clustered in a very narrow range in order to “stretch things out” so that any patterns might be more readily discernible. We will see the value of this for some nonlinear models and designs later.

If the plot(s) exhibit an apparent pattern, with the magnitude of residuals changing with level of predicted value or covariate, this is taken as evidence of potential nonconstant variance. In particular, for the plot of residuals vs. predicted values or their logarithms, a “fan-shape” is accepted as evidence that variance increases smoothly with the level of the response (mean). More generally, any “nonhaphazard,” “systematic” pattern may well be evidence that variance does not remain constant over the range of the response.

One must be careful, however.

• A systematic pattern may also be the result of an ill-fitting mean model. The nature of the pattern must be critically assessed by the data analyst to determine a reasonable explanation for it given the particular mean model and circumstances. For example, for the indomethacin pharmacokinetic data in Examples 1.1 and 1.2, the model was the sum of two exponential terms. If a simple model containing only a single exponential term were fitted to these data, one would expect to see a systematic pattern in the residuals reflecting the lack of fit of this model. There is certainly subjectivity involved in this endeavor.

• When responses are collected in time order, e.g., repeated measurements on the same individuals, one often plots the residuals against time to look for temporal patterns that may suggest possible serial correlation. Alternatively, more sophisticated plots for investigating this are available.
We defer discussion of serial correlation until later chapters, as our current focus is on detecting and modeling nonconstant variance when the assumption of independence is reasonable. It is important to recognize, however, that this is an assumption that should be considered carefully in practice.

MOTIVATION: The obvious motivation for the usual plots is that $r_j$ is a “proxy” for the true deviation $Y_j - f(x_j, \beta)$.

• If the data are normally (or at least symmetrically) distributed with constant variance, we would expect the $r_j$ to be roughly symmetrically distributed about 0 and to have approximately constant variance.

• We would thus expect a “haphazard” pattern, with approximately equal numbers of positive and negative residuals with approximately the same magnitude across their entire range.

• Even if the variance were nonconstant, if the data were at least normally or symmetrically distributed, we would still expect approximately equal numbers of positive and negative residuals. However, we would expect changing magnitude across the range.

PROBLEMS WITH THE USUAL PLOTS: The OLS residuals $r_j$ may not have exactly the same properties as the true deviations because $\beta$ is replaced by the OLS estimator $\hat{\beta}_{OLS}$. We will tackle this issue shortly. Some more immediate problems that may make the usual plots difficult to interpret are as follows:

• The data may not be normally or even symmetrically distributed but may instead arise from a skewed (asymmetric) distribution.

• The design (the settings of the $x_j$) may be such that an unusual pattern of residuals may be due to something other than nonconstant variance.

• Furthermore, although the usual plots may be sufficient for detection, they may not be very helpful for modeling of nonconstant variance.

We thus consider refinements of the usual plots.

REFINEMENT 1.
A common idea is to base plots on transformations of (absolute) residuals or other residuals in order to account for sample size or asymmetry. A seminal reference for some of these ideas in the context of linear regression is Cook and Weisberg (1983). We have already discussed estimation of variance parameters based on transformations of absolute residuals, so it should come as no surprise that diagnostic plots would also be based on them.

IDEA 1: “Visually double the sample size.” The usual plots may be difficult to interpret because the sample size is small. Under such conditions, a change in the placement of just a single residual in the plot can change the apparent pattern substantially. Thus, each observation may be very influential to the eye in gauging the pattern.

A simple remedy is to plot $r_j^2$ or $r_j^2/\hat{\sigma}^2_{OLS}$ instead. In this plot, the magnitude but not the sign of the residuals is emphasized. Because the contribution of all residuals is positive, this has the effect of creating a “larger” sample size for the purpose of spotting changes in magnitude. Moreover, the visual influence of any single observation in dictating the pattern is reduced.

Recall the data on the pharmacokinetics of indomethacin in Examples 1.1 and 1.2. Here, $n = 11$ concentration responses were collected over time on a single subject. The data are plotted again in Figure 7.3 in Section 7.5; a usual residual plot was given in Figure 1.3 and exhibits a “fan-shaped” pattern that appears roughly symmetric about zero.

Note that the residuals have been plotted against the logarithm of predicted values; because the response “tails off” rather quickly, there are many residuals at very small values of the response, so that residuals plotted against predicted values themselves are “bunched up” near zero, making the pattern difficult to assess.
Figure 7.4 in Section 7.5 shows a plot of squared, standardized residuals against log predicted values and shows a “wedge shape” indicating the increase in magnitude across the range.

One could substitute absolute residuals $|r_j|$ for squared ones and make similar plots. A purported advantage of squared over absolute residuals themselves is that squaring tends to highlight residuals that are “large” in magnitude and downplay those that are “small,” thus drawing attention to changes in magnitude over the range.
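The quantities plotted in this section can be computed in a few lines. The following is a minimal sketch only, assuming that predicted values from an OLS fit are already available and are positive (so that their logarithms exist). The function name `residual_diagnostics` and the numbers used are hypothetical, chosen for illustration; they are not the indomethacin data.

```python
import math

def residual_diagnostics(y, y_hat, p):
    """Coordinates for the residual plots discussed in Section 7.2.

    y     : observed responses Y_j
    y_hat : OLS predicted values f(x_j, beta_hat_OLS), assumed already fitted
    p     : number of mean parameters (dimension of beta)
    """
    n = len(y)
    if n != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    if n <= p:
        raise ValueError("need n > p to estimate sigma^2")

    # OLS residuals r_j = Y_j - f(x_j, beta_hat_OLS)
    r = [yj - fj for yj, fj in zip(y, y_hat)]

    # sigma_hat^2_OLS = (n - p)^{-1} * sum of squared residuals
    sigma2 = sum(rj ** 2 for rj in r) / (n - p)
    sigma = math.sqrt(sigma2)

    return {
        "log_pred": [math.log(fj) for fj in y_hat],      # horizontal axis
        "resid": r,                                      # usual plot
        "std_resid": [rj / sigma for rj in r],           # standardized
        "sq_std_resid": [rj ** 2 / sigma2 for rj in r],  # squared, standardized
        "abs_resid": [abs(rj) for rj in r],              # absolute residuals
        "sigma2_ols": sigma2,
    }

# Made-up numbers for illustration only (hypothetical, not from the notes).
y = [2.1, 1.2, 0.55, 0.31, 0.09]
y_hat = [2.0, 1.0, 0.5, 0.25, 0.1]
out = residual_diagnostics(y, y_hat, p=2)
```

Plotting `out["sq_std_resid"]` or `out["abs_resid"]` against `out["log_pred"]` gives the kind of display described above. By construction the squared, standardized residuals sum to n − p, which is a convenient check on the computation.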