Goodness-Of-Fit Tests for Parametric Regression Models


Goodness-of-Fit Tests for Parametric Regression Models

Jianqing Fan and Li-Shan Huang

Several new tests are proposed for examining the adequacy of a family of parametric models against large nonparametric alternatives. These tests formally check whether the bias vector of the residuals from parametric fits is negligible, using the adaptive Neyman test and other methods. The testing procedures formalize the traditional model diagnostic tools based on residual plots. We examine the rates of contiguous alternatives that can be detected consistently by the adaptive Neyman test. Applications of the procedures to partially linear models are thoroughly discussed. Our simulation studies show that the new testing procedures are indeed powerful and omnibus: their power is comparable to that of the F-test statistic even in situations where the F test is known to be suitable, and can be far greater in other situations. An application to testing linear models versus additive models is also discussed.

KEY WORDS: Adaptive Neyman test; Contiguous alternatives; Partial linear model; Power; Wavelet thresholding.

1. INTRODUCTION

Parametric linear models are frequently used to describe the association between a response variable and its predictors, and the question of the adequacy of such parametric fits often arises. Conventional methods rely on residual plots against the fitted values or against a covariate to detect systematic departures from zero in the residuals. One drawback of the conventional methods is that a systematic departure smaller than the noise level cannot be observed easily.

More formally, let (x₁, Y₁), …, (xₙ, Yₙ) be independent observations from a population,

    Y = m(x) + ε,   ε ~ N(0, σ²),                      (1.1)

where x is a p-dimensional vector and m(·) is a smooth regression surface. Let f(·, θ) be a given parametric family. The null hypothesis is

    H₀: m(·) = f(·, θ₀)   for some θ₀.                 (1.2)
Examples of f(·, θ) include the widely used linear model f(x, θ) = xᵀθ and a logistic model f(x, θ) = θ₀/(1 + θ₁ exp(θ₂ᵀx)). The alternative hypothesis is often vague. Depending on the situation of application, one possible choice is the saturated nonparametric alternative

    H₁: m(·) ≠ f(·, θ)   for all θ,                    (1.3)

and another possible choice is the partially linear model (see Green and Silverman 1994 and the references therein)

    H₁: m(x) = f(x₁) + x₂ᵀβ₂,   with f(·) nonlinear,   (1.4)

where x₁ and x₂ are the covariates. Traditional model diagnostic techniques involve plotting the residuals against each covariate to examine whether there is any systematic departure, against the index sequence to check whether there is any serial correlation, and against the fitted values to detect possible heteroscedasticity, among others. This amounts to informally checking whether biases are negligible in the presence of large stochastic noise.

Recently, many articles have investigated the use of nonparametric techniques for model diagnostics. Most of them focus on one-dimensional problems, including Dette (1999), Dette and Munk (1998), and Kim (2000). The book by Hart (1997) gives an extensive overview and useful references, and Chapter 5 of Bowman and Azzalini (1997) outlines the work by Azzalini, Bowman, and Härdle (1989) and Azzalini and Bowman (1993). Although much work has focused on the univariate case, there is relatively little on the multiple regression setting. Hart (1997, Section 9.3) considered two approaches: the first regresses the residuals on a scalar function of the covariates and then applies one-dimensional goodness-of-fit tests; the second explicitly takes the multivariate nature of the problem into account. The test of Härdle and Mammen (1993) was based on the L₂ error criterion in d dimensions, Stute, González Manteiga, and Presedo Quindimil (1998) investigated a marked empirical process based on the residuals, and Aerts, Claeskens, and Hart (1999) constructed tests based on orthogonal series that involve choosing a nested model sequence in bivariate regression.
In this article, a completely different approach is proposed and studied. The basic idea is that if a parametric family of models fits the data adequately, then the residuals should have nearly zero bias. The informal diagnostic techniques above can be formalized as follows. Let θ̂ be an estimate under the null hypothesis, and let ε̂ = (ε̂₁, …, ε̂ₙ)ᵀ be the resulting residuals, with ε̂ᵢ = Yᵢ − f(xᵢ, θ̂). Usually, θ̂ converges to some vector θ₀. Then, under model (1.1), conditioned on {xᵢ}ᵢ₌₁ⁿ, ε̂ is nearly independently and normally distributed with mean vector η = (η₁, …, ηₙ)ᵀ, where ηᵢ = m(xᵢ) − f(xᵢ, θ₀); that is, any finite-dimensional component of ε̂ is asymptotically independent and normally distributed, with mean the corresponding subvector of η.

[Jianqing Fan is Professor, Department of Statistics, Chinese University of Hong Kong, and Professor, Department of Statistics, University of North Carolina, Chapel Hill, NC 27599-3260 (E-mail: [email protected]). Li-Shan Huang is Assistant Professor, Department of Biostatistics, University of Rochester Medical Center, Rochester, NY 14642 (E-mail: [email protected]). Fan's research was partially supported by NSA grant 96-1-0015 and NSF grant DMS-9803200. Huang's work was partially supported by NSF grant DMS-9710084. The authors thank the referee and the associate editor for their constructive and detailed suggestions, which led to significant improvement in the presentation of the article. © 2001 American Statistical Association, Journal of the American Statistical Association, June 2001, Vol. 96, No. 454, Theory and Methods.]
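To make the formalization concrete, the following sketch simulates from model (1.1), fits a linear null model by least squares, and extracts the residual vector. The regression function, sample size, and noise level here are hypothetical choices of ours, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1.0, 1.0, n)
# Simulated data from model (1.1) whose regression function departs
# smoothly from the linear null: m(x) = 1 + 2x + 0.5 sin(3x).
y = 1.0 + 2.0 * x + 0.5 * np.sin(3.0 * x) + 0.3 * rng.standard_normal(n)

# Least-squares fit of the parametric null f(x, theta) = theta0 + theta1 * x.
X = np.column_stack([np.ones(n), x])
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ theta_hat

# Under H0 the residual bias vector eta is (nearly) zero; here the sin
# term leaves a systematic low-frequency pattern in resid that is
# smaller than the noise level pointwise, which is exactly what a
# formal test of eta = 0 is designed to detect.
```

Plotting `resid` against `x` would show the departure only faintly, which illustrates the drawback of purely visual diagnostics noted above.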
The problem thus becomes testing

    H₀: η = 0   versus   H₁: η ≠ 0,                    (1.5)

based on the observations ε̂. Note that the ε̂ᵢ are not independent in general after θ has been estimated, but the dependence is practically negligible, as will be demonstrated, so techniques for independent samples, such as the adaptive Neyman test of Fan (1996), continue to apply. Plotting a covariate Xⱼ against the residuals ε̂ amounts to testing whether the biases in ε̂ along the direction Xⱼ are negligible, namely, whether the scatterplot {(Xⱼ, η)} is negligible in the presence of the noise.

In testing the significance of the high-dimensional problem (1.5), the conventional likelihood ratio statistic is not powerful, because of noise accumulation in the n-dimensional space, as demonstrated in Fan (1996). An innovative idea proposed by Neyman (1937) is to test only the first m-dimensional subproblem when there is prior knowledge that most of the nonzero elements lie in the first m dimensions. Exploiting this idea requires the residuals to be ordered so that likely departures are concentrated in the first few coordinates. When there is only one predictor variable (i.e., p = 1), such an ordering is straightforward. Ordering a multivariate vector, however, is quite challenging; a method based on principal component analysis is described in Section 2.2.

When there is some information about the alternative hypothesis, such as the partially linear model in (1.4), it is sensible to order the residuals according to the covariate x₁. Indeed, our simulation studies show that this improves the power of the generic ordering procedure a great deal. The parameters β₂ in model (1.4) can be estimated easily at the n^(−1/2) rate via a difference-based estimator, for example, that of Yatchew (1997). The problem is then, heuristically, reduced to the one-dimensional nonparametric setting; see Section 2.4 for details.

The wavelet transform is another popular family of orthogonal transforms and can also be applied to the testing problem (1.5). For comparison purposes, wavelet thresholding tests are included in our simulation studies; see Fan (1996) and Spokoiny (1996) for various discussions of their properties.
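The difference-based reduction for the partially linear model (1.4) can be sketched as follows. This is a minimal first-order version in the spirit of Yatchew (1997); the function name and the simulated data are our own illustration, not the paper's.

```python
import numpy as np

def difference_estimator(x1, x2, y):
    """First-order difference-based estimate of beta2 in the partially
    linear model y = f(x1) + x2 @ beta2 + eps (sketch in the spirit of
    Yatchew 1997; optimal higher-order differencing weights omitted)."""
    order = np.argsort(x1)
    x2s, ys = x2[order], y[order]
    # Differencing adjacent observations (sorted by x1) removes the
    # smooth component f(x1) up to asymptotically negligible terms,
    # leaving an ordinary linear regression in the differenced data.
    dx2 = np.diff(x2s, axis=0)
    dy = np.diff(ys)
    beta2, *_ = np.linalg.lstsq(dx2, dy, rcond=None)
    return beta2

# Hypothetical simulation: f(x1) = sin(2*pi*x1), beta2 = (1.5,).
rng = np.random.default_rng(1)
n = 2000
x1 = rng.uniform(0.0, 1.0, n)
x2 = rng.standard_normal((n, 1))
y = np.sin(2.0 * np.pi * x1) + 1.5 * x2[:, 0] + 0.1 * rng.standard_normal(n)
beta2_hat = difference_estimator(x1, x2, y)
```

With β₂ removed in this way, the residuals Y − x₂ᵀβ̂₂ can be treated as a one-dimensional nonparametric testing problem in x₁.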
To obtain such a qualitative prior, the wavelet thresholding tests. a Fourier transform is applied to the residuals in an attempt This article is organized as follows. In Section 2, we pro- to compress nonzero signals into lower frequencies. In test- pose a few new tests for the testing problems (1.2) ver- ing goodness of t for a distribution function, Fan (1996) sus (1.3) and (1.2) versus (1.4). These tests include the proposed a simple data-driven approach to choose m based adaptive Neyman test and the wavelet thresholding tests in on power considerations. This results in an adaptive Neyman Sections 2.1 and 2.3, respectively. Ordering of multivari- test. See Rayner and Best (1989) for more discussions on ate vectors is discussed in Section 2.2. Applications of the the Neyman test. Some other recent work motivated by the test statistics in the partially linear models are discussed in Neyman test includes Eubank and Hart (1992), Eubank and Section 2.4 and their extension to additive models is outlined LaRiccia (1992), Kim (2000), Inglot, Kallenberg, and Ledwina brie y in Section 2.5. Section 2.6 outlines simple methods for (1994), Ledwina (1994), Kuchibhatla and Hart (1996), and estimating the residual variance ‘ 2 in the high-dimensional Lee and Hart (1998), among others. Also see Bickel and setting. In Section 3, we carry out a number of numerical stud- Ritov (1992) and Inglot and Ledwina (1996) for illuminating ies to illustrate the power of our testing procedures. Technical insights into nonparametric tests. In this article, we adopt the proofs are relegated to the Appendix. adaptive Neyman test proposed by Fan (1996) to the testing problem (1.5).