Partially Linear Models

Munich Personal RePEc Archive

Härdle, Wolfgang and Liang, Hua and Gao, Jiti

Humboldt-Universität zu Berlin; University of Rochester, USA; Monash University, Australia

1 September 2000

Online at https://mpra.ub.uni-muenchen.de/39562/
MPRA Paper No. 39562, posted 20 Jun 2012 22:03 UTC

PARTIALLY LINEAR MODELS

December 23, 1999

Wolfgang Härdle
Institut für Statistik und Ökonometrie
Humboldt-Universität zu Berlin
D-10178 Berlin, Germany

Hua Liang
Department of Statistics
Texas A&M University
College Station TX 77843-3143, USA
and
Institut für Statistik und Ökonometrie
Humboldt-Universität zu Berlin
D-10178 Berlin, Germany

Jiti Gao
School of Mathematical Sciences
Queensland University of Technology
Brisbane 4001, Australia
and
Department of Mathematics and Statistics
The University of Western Australia
Perth WA 6907, Australia

Electronic Version: http://www.xplore-stat.de/ebooks.html

PREFACE

In the last ten years, there has been increasing interest and activity in the general area of partially linear regression smoothing in statistics. Many methods and techniques have been proposed and studied. This monograph hopes to bring an up-to-date presentation of the state of the art of partially linear regression techniques. The emphasis of this monograph is on methodologies rather than on the theory, with a particular focus on applications of partially linear regression techniques to various statistical problems. These problems include least squares regression, asymptotically efficient estimation, bootstrap resampling, censored data analysis, linear measurement error models, nonlinear measurement models, nonlinear and nonparametric time series models.

We hope that this monograph will serve as a useful reference for theoretical and applied statisticians and to graduate students and others who are interested in the area of partially linear regression.
While advanced mathematical ideas have been valuable in some of the theoretical development, the methodological power of partially linear regression can be demonstrated and discussed without advanced mathematics.

This monograph can be divided into three parts: part one (Chapter 1 through Chapter 4), part two (Chapter 5), and part three (Chapter 6).

In the first part, we discuss various estimators for partially linear regression models, establish theoretical results for the estimators, propose estimation procedures, and implement the proposed estimation procedures through real and simulated examples.

The second part is of more theoretical interest. In this part, we construct several adaptive and efficient estimates for the parametric component. We show that the LS estimator of the parametric component can be modified to have both Bahadur asymptotic efficiency and second order asymptotic efficiency.

In the third part, we consider partially linear time series models. First, we propose a test procedure to determine whether a partially linear model can be used to fit a given set of data. Asymptotic test criteria and power investigations are presented. Second, we propose a Cross-Validation (CV) based criterion to select the optimum linear subset from a partially linear regression and establish a CV selection criterion for the bandwidth involved in the nonparametric kernel estimation. The CV selection criterion can be applied to the case where the observations fitted by the partially linear model (1.1.1) are independent and identically distributed (i.i.d.). For this reason, we have not provided a separate chapter to discuss the selection problem for the i.i.d. case. Third, we provide recent developments in nonparametric and semiparametric time series regression.

This work of the authors was supported partially by the Sonderforschungsbereich 373 “Quantifikation und Simulation Ökonomischer Prozesse”.
The second author was also supported by the National Natural Science Foundation of China and an Alexander von Humboldt Fellowship at the Humboldt University, while the third author was also supported by the Australian Research Council.

The second and third authors would like to thank their teachers: Professors Raymond Carroll, Guijing Chen, Xiru Chen, Ping Cheng and Lincheng Zhao for their valuable inspiration on the two authors’ research efforts. We would like to express our sincere thanks to our colleagues and collaborators for many helpful discussions and stimulating collaborations, in particular, Vo Anh, Shengyan Hong, Enno Mammen, Howell Tong, Axel Werwatz and Rodney Wolff. For various ways in which they helped us, we would like to thank Adrian Baddeley, Rong Chen, Anthony Pettitt, Maxwell King, Michael Schimek, George Seber, Alastair Scott, Naisyin Wang, Qiwei Yao, Lijian Yang and Lixing Zhu.

The authors are grateful to everyone who has encouraged and supported us to finish this undertaking. Any remaining errors are ours.

Berlin, Germany: Wolfgang Härdle
Texas, USA and Berlin, Germany: Hua Liang
Perth and Brisbane, Australia: Jiti Gao

Symbols and Notation

The following notation is used throughout the monograph.

a.s.: almost surely (that is, with probability one)
i.i.d.: independent and identically distributed
F: the identity matrix of order p
CLT: central limit theorem
LIL: law of the iterated logarithm
MLE: maximum likelihood estimate
$\mathrm{Var}(\xi)$: the variance of $\xi$
$N(a, \sigma^2)$: normal distribution with mean $a$ and variance $\sigma^2$
$U(a, b)$: uniform distribution on $(a, b)$
$\stackrel{\mathrm{def}}{=}$: denote
$\stackrel{L}{\longrightarrow}$: convergence in distribution
$\stackrel{P}{\longrightarrow}$: convergence in probability
$X$: $(X_1, \ldots, X_n)$
$Y$: $(Y_1, \ldots, Y_n)$
$T$: $(T_1, \ldots, T_n)$
$\omega_{nj}(\cdot)$ or $\omega^*_{nj}(\cdot)$: weight functions
$\widetilde{S}^T$: $(\widetilde{S}_1, \ldots, \widetilde{S}_n)$ with $\widetilde{S}_i = S_i - \sum_{j=1}^{n} \omega_{nj}(T_i) S_j$, where $S_i$ represents a random variable or a function
$\widetilde{G}$: $(\widetilde{g}_1, \ldots, \widetilde{g}_n)^T$ with $\widetilde{g}_i = g(T_i) - \sum_{j=1}^{n} \omega_{nj}(T_i) g(T_j)$
$\xi_n = O_p(\eta_n)$: $P\{|\xi_n| \ge M|\eta_n|\} < \zeta$ for each $\zeta > 0$, some $M$ and large enough $n$
$\xi_n = o_p(\eta_n)$: $P\{|\xi_n| \ge \zeta|\eta_n|\} \to 0$ for each $\zeta > 0$
$\xi_n = o_p(1)$: $\xi_n$ converges to zero in probability
$O_p(1)$: stochastically bounded
$S^T$: the transpose of vector or matrix $S$
$S^{\otimes 2}$: $SS^T$
$S^{-1} = (s^{ij})_{p \times p}$: the inverse of $S = (s_{ij})_{p \times p}$
$\Phi(x)$: standard normal distribution function
$\phi(x)$: standard normal density function

For convenience and simplicity, we always let C denote some positive constant which may have different values at each appearance throughout this monograph.

Contents

Preface
Symbols and Notation
List of Figures

1 INTRODUCTION
  1.1 Background, History and Practical Examples
  1.2 The Least Squares Estimators
  1.3 Assumptions and Remarks
  1.4 The Scope of the Monograph
  1.5 The Structure of the Monograph

2 ESTIMATION OF THE PARAMETRIC COMPONENT
  2.1 Estimation with Heteroscedastic Errors
    2.1.1 Introduction
    2.1.2 Estimation of the Non-constant Variance Functions
    2.1.3 Selection of Smoothing Parameters
    2.1.4 Simulation Comparisons
    2.1.5 Technical Details
  2.2 Estimation with Censored Data
    2.2.1 Introduction
    2.2.2 Synthetic Data and Statement of the Main Results
    2.2.3 Estimation of the Asymptotic Variance
    2.2.4 A Numerical Example
    2.2.5 Technical Details
  2.3 Bootstrap Approximations
    2.3.1 Introduction
    2.3.2 Bootstrap Approximations
    2.3.3 Numerical Results

3 ESTIMATION OF THE NONPARAMETRIC COMPONENT
  3.1 Introduction
  3.2 Consistency Results
  3.3 Asymptotic Normality
  3.4 Simulated and Real Examples
  3.5 Appendix

4 ESTIMATION WITH MEASUREMENT ERRORS
  4.1 Linear Variables with Measurement Errors
    4.1.1 Introduction and Motivation
    4.1.2 Asymptotic Normality for the Parameters
    4.1.3 Asymptotic Results for the Nonparametric Part
    4.1.4 Estimation of Error Variance
    4.1.5 Numerical Example
    4.1.6 Discussions
    4.1.7 Technical Details
  4.2 Nonlinear Variables with Measurement Errors
    4.2.1 Introduction
    4.2.2 Construction of Estimators
    4.2.3 Asymptotic Normality
    4.2.4 Simulation Investigations
    4.2.5 Technical Details

5 SOME RELATED THEORETIC TOPICS
  5.1 The Laws of the Iterated Logarithm
    5.1.1 Introduction
    5.1.2 Preliminary Processes
    5.1.3 Appendix
  5.2 The Berry-Esseen Bounds
    5.2.1 Introduction and Results
    5.2.2 Basic Facts
    5.2.3 Technical Details
  5.3 Asymptotically Efficient Estimation
    5.3.1 Motivation
    5.3.2 Construction of Asymptotically Efficient Estimators
    5.3.3 Four Lemmas
    5.3.4 Appendix
  5.4 Bahadur Asymptotic Efficiency
    5.4.1 Definition
    5.4.2 Tail Probability
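
The tilde notation in the symbol list (each observation minus a weighted average of its neighbours in T) is the building block of the least squares estimator that the contents list under Section 1.2. The following is a minimal sketch of how that transform can be used, not the monograph's own code. It assumes the model has the usual partially linear form Y_i = X_i'β + g(T_i) + ε_i, takes ω_nj to be Gaussian-kernel Nadaraya-Watson weights (one possible choice of weight function), and uses hypothetical names (`nw_weights`, `fit_plm`) and an arbitrary bandwidth.

```python
import numpy as np

def nw_weights(t, h):
    """Weight matrix W[i, j] = omega_nj(T_i).

    Assumption: Gaussian-kernel Nadaraya-Watson weights with bandwidth h;
    the monograph allows other weight functions.
    """
    d = (t[:, None] - t[None, :]) / h
    k = np.exp(-0.5 * d ** 2)
    return k / k.sum(axis=1, keepdims=True)  # each row sums to one

def fit_plm(x, t, y, h):
    """Least squares fit of Y = X beta + g(T) + eps via the tilde transform."""
    w = nw_weights(t, h)
    x_tilde = x - w @ x  # X_i minus the weighted average of the X_j
    y_tilde = y - w @ y  # Y_i minus the weighted average of the Y_j
    beta, *_ = np.linalg.lstsq(x_tilde, y_tilde, rcond=None)
    g_hat = w @ (y - x @ beta)  # smooth the partial residuals to estimate g
    return beta, g_hat

# Simulated data (illustrative values only).
rng = np.random.default_rng(0)
n = 400
t = rng.uniform(0.0, 1.0, n)
x = rng.normal(size=(n, 2))
beta_true = np.array([1.5, -0.7])
y = x @ beta_true + np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=n)

beta_hat, g_hat = fit_plm(x, t, y, h=0.05)
print("estimated beta:", beta_hat)
```

Subtracting the smoothed values removes most of g(T) from both sides, so an ordinary least squares fit on the transformed data targets β alone; the nonparametric part is then recovered by smoothing the residuals Y − Xβ̂.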
Recommended publications
  • Instrumental Regression in Partially Linear Models
    10-167, Research Group: Econometrics and Statistics, September 2009. Instrumental Regression in Partially Linear Models. Jean-Pierre Florens, Jan Johannes and Sébastien Van Bellegem. First version: January 24, 2008. This version: September 10, 2009. Abstract: We consider the semiparametric regression $X^t\beta + \phi(Z)$ where $\beta$ and $\phi(\cdot)$ are unknown slope coefficient vector and function, and where the variables (X, Z) are endogenous. We propose necessary and sufficient conditions for the identification of the parameters in the presence of instrumental variables. We also focus on the estimation of $\beta$. An incorrect parameterization of $\phi$ may generally lead to an inconsistent estimator of $\beta$, whereas even consistent nonparametric estimators for $\phi$ imply a slow rate of convergence of the estimator of $\beta$. An additional complication is that the solution of the equation necessitates the inversion of a compact operator that has to be estimated nonparametrically. In general this inversion is not stable, thus the estimation of $\beta$ is ill-posed. In this paper, a $\sqrt{n}$-consistent estimator for $\beta$ is derived under mild assumptions. One of these assumptions is given by the so-called source condition that is explicitly interpreted in the paper. Finally we show that the estimator achieves the semiparametric efficiency bound, even if the model is heteroscedastic. Monte Carlo simulations demonstrate the reasonable performance of the estimation procedure on finite samples. Keywords: Partially linear model, semiparametric regression, instrumental variables, endogeneity, ill-posed inverse problem, Tikhonov regularization, root-N consistent estimation, semiparametric efficiency bound. JEL classifications: Primary C14; secondary C30. ∗We are grateful to R.
  • Semiparametric Identification and Fisher Information
    Semiparametric Identification and Fisher Information. Juan Carlos Escanciano, Universidad Carlos III de Madrid. September 25th, 2019. arXiv:1609.06421v4 [math.ST] 26 Sep 2019. Abstract: This paper provides a systematic approach to semiparametric identification that is based on statistical information as a measure of its “quality”. Identification can be regular or irregular, depending on whether the Fisher information for the parameter is positive or zero, respectively. I first characterize these cases in models with densities linear in a nonparametric parameter. I then introduce a novel “generalized Fisher information”. If positive, it implies (possibly irregular) identification when other conditions hold. If zero, it implies impossibility results on rates of estimation. Three examples illustrate the applicability of the general results. First, I find necessary conditions for semiparametric regular identification in a structural model for unemployment duration with two spells and nonparametric heterogeneity. Second, I show irregular identification of the median willingness to pay in contingent valuation studies. Finally, I study identification of the discount factor and average measures of risk aversion in a nonparametric Euler Equation with nonparametric measurement error in consumption. Keywords: Identification; Semiparametric Models; Fisher Information. JEL classification: C14; C31; C33; C35. First version: September 20th, 2016. Author affiliation: Department of Economics, Universidad Carlos III de Madrid, email: [email protected]. Research funded by the Spanish Programa de Generación de Conocimiento, reference number PGC2018-096732-B-I00. I thank Michael Jansson, Ulrich Müller, Whitney Newey, Jack Porter, Pedro Sant’Anna, Ruli Xiao, and seminar participants at BC, Indiana, MIT, Texas A&M, UBC, Vanderbilt and participants of the 2018 Conference on Identification in Econometrics for useful comments.
  • Semiparametric Efficiency
    Faculty of Sciences (Faculteit Wetenschappen), Department of Applied Mathematics and Computer Science (Vakgroep Toegepaste Wiskunde en Informatica). Semiparametric Efficiency. Karel Vermeulen. Supervisor: Prof. Dr. S. Vansteelandt. Thesis submitted to obtain the degree of Master in Mathematics, specialization Applied Mathematics. Academic year 2010-2011. To my parents, my brother Lukas. To my best friend Sara. "A mathematical theory is not to be considered complete until you have made it so clear that you can explain it to the first man whom you meet on the street." (David Hilbert) Preface: My interest in semiparametric theory awoke several years ago, two and a half to be precise. At that time, I was in my third year of mathematics. I had to choose a subject for my Bachelor thesis. My decision was: A geometrical approach to the asymptotic efficiency of estimators, based on the monograph by Anastasios Tsiatis, [35], under the supervision of Prof. Dr. Stijn Vansteelandt. In this manner, I entered the world of semiparametric efficiency. However, at that point I did not know it was just the beginning. Shortly after I wrote this Bachelor thesis, Prof. Dr. Stijn Vansteelandt asked me to be involved in research on semiparametric inference for so-called probabilistic index models, in the context of a one-month student job. Under his guidance, I applied semiparametric estimation theory to this setting and obtained the semiparametric efficiency bound against which the efficiency of estimators in this model can be measured. Some results of this research are contained within this thesis. While short, this experience really convinced me I wanted to write a thesis in semiparametric efficiency. That feeling was only further encouraged after following the course Causality and Missing Data, taught by Prof.
  • Partially Linear Hazard Regression for Multivariate Survival Data
    Partially Linear Hazard Regression for Multivariate Survival Data. Jianwen Cai, Jianqing Fan, Jiancheng Jiang, and Haibo Zhou. This article studies estimation of partially linear hazard regression models for multivariate survival data. A profile pseudo-partial likelihood estimation method is proposed under the marginal hazard model framework. The estimation of the parameters for the linear part is accomplished by maximization of a pseudo-partial likelihood profiled over the nonparametric part. This enables us to obtain $\sqrt{n}$-consistent estimators of the parametric component. Asymptotic normality is obtained for the estimates of both the linear and nonlinear parts. The new technical challenge is that the nonparametric component is indirectly estimated through its integrated derivative function from a local polynomial fit. An algorithm of fast implementation of our proposed method is presented. Consistent standard error estimates using sandwich-type ideas are also developed, which facilitates inferences for the model. It is shown that the nonparametric component can be estimated as well as if the parametric components were known and the failure times within each subject were independent. Simulations are conducted to demonstrate the performance of the proposed method. A real dataset is analyzed to illustrate the proposed methodology. KEY WORDS: Local pseudo-partial likelihood; Marginal hazard model; Multivariate failure time; Partially linear; Profile pseudo-partial likelihood. 1. INTRODUCTION types of dependence that can be modeled, and model
  • Partially Linear Spatial Probit Models
    Partially Linear Spatial Probit Models. Mohamed-Salem Ahmed, University of Lille, LEM-CNRS 9221, Lille, France, [email protected]. Sophie Dabo, INRIA-MODAL, University of Lille, LEM-CNRS 9221, Lille, France, [email protected]. arXiv:1803.04142v1 [stat.ME] 12 Mar 2018. Abstract: A partially linear probit model for spatially dependent data is considered. A triangular array setting is used to cover various patterns of spatial data. Conditional spatial heteroscedasticity and non-identically distributed observations and a linear process for disturbances are assumed, allowing various spatial dependencies. The estimation procedure is a combination of a weighted likelihood and a generalized method of moments. The procedure first fixes the parametric components of the model and then estimates the non-parametric part using weighted likelihood; the obtained estimate is then used to construct a GMM parametric component estimate. The consistency and asymptotic distribution of the estimators are established under sufficient conditions. Some simulation experiments are provided to investigate the finite sample performance of the estimators. Keywords: Binary choice model, GMM, non-parametric statistics, spatial econometrics, spatial statistics. Introduction: Agriculture, economics, environmental sciences, urban systems, and epidemiology activities often utilize spatially dependent data. Therefore, modelling such activities requires one to find a type of correlation between some random variables in one location with other variables in neighbouring locations; see for instance Pinkse & Slade (1998). This is a significant feature of spatial data analysis. Spatial/Econometrics statistics provides tools to perform such modelling. Many studies on spatial effects in statistics and econometrics using many diverse models have been published; see Cressie (2015), Anselin (2010), Anselin (2013) and Arbia (2006) for a review.
  • Efficient Estimation in Single Index Models Through Smoothing Splines
    arXiv: 1612.00068. Efficient Estimation in Single Index Models through Smoothing Splines. Arun K. Kuchibhotla and Rohit K. Patra, University of Pennsylvania and University of Florida, e-mail: [email protected]; [email protected]. Abstract: We consider estimation and inference in a single index regression model with an unknown but smooth link function. In contrast to the standard approach of using kernels or regression splines, we use smoothing splines to estimate the smooth link function. We develop a method to compute the penalized least squares estimators (PLSEs) of the parametric and the nonparametric components given independent and identically distributed (i.i.d.) data. We prove the consistency and find the rates of convergence of the estimators. We establish asymptotic normality under mild assumptions and prove asymptotic efficiency of the parametric component under homoscedastic errors. A finite sample simulation corroborates our asymptotic theory. We also analyze a car mileage data set and an ozone concentration data set. The identifiability and existence of the PLSEs are also investigated. Keywords and phrases: least favorable submodel, penalized least squares, semiparametric model. 1. Introduction: Consider a regression model where one observes i.i.d. copies of the predictor $X \in \mathbb{R}^d$ and the response $Y \in \mathbb{R}$ and is interested in estimating the regression function $E(Y \mid X = \cdot)$. In nonparametric regression $E(Y \mid X = \cdot)$ is generally assumed to satisfy some smoothness assumptions (e.g., twice continuously differentiable), but no assumptions are made on the form of dependence on X. While nonparametric models offer flexibility in modeling, the price for this flexibility can be high for two main reasons: the estimation precision decreases rapidly as d increases (“curse of dimensionality”) and the estimator can be hard to interpret when d > 1.
  • Restoring the Individual Plaintiff to Tort Law by Rejecting ‘Junk Logic’
    Maurice A. Deane School of Law at Hofstra University, Scholarly Commons at Hofstra Law, Hofstra Law Faculty Scholarship, 2004. Restoring the Individual Plaintiff to Tort Law by Rejecting ‘Junk Logic’ about Specific Causation. Vern R. Walker, Maurice A. Deane School of Law at Hofstra University. Follow this and additional works at: https://scholarlycommons.law.hofstra.edu/faculty_scholarship Recommended Citation: Vern R. Walker, Restoring the Individual Plaintiff to Tort Law by Rejecting ‘Junk Logic’ about Specific Causation, 56 Ala. L. Rev. 381 (2004). Available at: https://scholarlycommons.law.hofstra.edu/faculty_scholarship/141 This Article is brought to you for free and open access by Scholarly Commons at Hofstra Law. It has been accepted for inclusion in Hofstra Law Faculty Scholarship by an authorized administrator of Scholarly Commons at Hofstra Law. For more information, please contact [email protected]. Contents: INTRODUCTION. I. UNCERTAINTIES AND WARRANT IN FINDING GENERAL CAUSATION: PROVIDING A MAJOR PREMISE FOR A DIRECT INFERENCE TO SPECIFIC CAUSATION. A. Acceptable Measurement Uncertainty: Evaluating the Precision and Accuracy of Classifications. B. Acceptable Sampling Uncertainty: Evaluating the Population-Representativeness of
  • A Generalized Partially Linear Model of Asymmetric Volatility
    Journal of Empirical Finance 9 (2002) 287–319, www.elsevier.com/locate/econbase. A generalized partially linear model of asymmetric volatility. Guojun Wu (a,*), Zhijie Xiao (b). (a) University of Michigan Business School, 701 Tappan Street, Ann Arbor, MI 48109, USA. (b) 185 Commerce West Building, University of Illinois at Urbana-Champaign, 1206 South Sixth Street, Champaign, IL 61820, USA. Accepted 23 November 2001. Abstract: In this paper we conduct a close examination of the relationship between return shocks and conditional volatility. We do so in a framework where the impact of return shocks on conditional volatility is specified as a general function and estimated nonparametrically using implied volatility data, the Market Volatility Index (VIX). This setup can provide a good description of the impact of return shocks on conditional volatility, and it appears that the news impact curves implied by the VIX data are useful in selecting ARCH specifications at the weekly frequency. We find that the Exponential ARCH model of Nelson [Econometrica 59 (1991) 347] is capable of capturing most of the asymmetric effect, when return shocks are relatively small. For large negative shocks, our nonparametric function points to larger increases in conditional volatility than those predicted by a standard EGARCH. Our empirical analysis further demonstrates that an EGARCH model with separate coefficients for large and small negative shocks is better able to capture the asymmetric effect. © 2002 Elsevier Science B.V. All rights reserved. JEL classification: G10; C14. Keywords: Asymmetric volatility; News impact curve. * Corresponding author. Tel.: +1-734-936-3248; fax: +1-734-764-2555. E-mail addresses: [email protected] (G.
  • Introduction to Empirical Processes and Semiparametric Inference
    Introduction to Empirical Processes and Semiparametric Inference. Michael R. Kosorok. August 2006. © 2006 SPRINGER SCIENCE+BUSINESS MEDIA, INC. All rights reserved. Permission is granted to print a copy of this preliminary version for non-commercial purposes but not to distribute it in printed or electronic form. The author would appreciate notification of typos and other suggestions for improvement. Preface: The goal of this book is to introduce statisticians, and other researchers with a background in mathematical statistics, to empirical processes and semiparametric inference. These powerful research techniques are surprisingly useful for studying large sample properties of statistical estimates from realistically complex models as well as for developing new and improved approaches to statistical inference. This book is more a textbook than a research monograph, although some new results are presented in later chapters. The level of the book is more introductory than the seminal work of van der Vaart and Wellner (1996). In fact, another purpose of this work is to help readers prepare for the mathematically advanced van der Vaart and Wellner text, as well as for the semiparametric inference work of Bickel, Klaassen, Ritov and Wellner (1997). These two books, along with Pollard (1990) and chapters 19 and 25 of van der Vaart (1998), formulate a very complete and successful elucidation of modern empirical process methods. The present book owes much by the way of inspiration, concept, and notation to these previous works. What is perhaps new is the introductory, gradual and unified way this book introduces the reader to the field.
  • Variable Selection in Semiparametric Regression Modeling
    The Annals of Statistics 2008, Vol. 36, No. 1, 261–286. DOI: 10.1214/009053607000000604. © Institute of Mathematical Statistics, 2008. VARIABLE SELECTION IN SEMIPARAMETRIC REGRESSION MODELING. By Runze Li and Hua Liang, Pennsylvania State University and University of Rochester. In this paper, we are concerned with how to select significant variables in semiparametric modeling. Variable selection for semiparametric regression models consists of two components: model selection for nonparametric components and selection of significant variables for the parametric portion. Thus, semiparametric variable selection is much more challenging than parametric variable selection (e.g., linear and generalized linear models) because traditional variable selection procedures including stepwise regression and the best subset selection now require separate model selection for the nonparametric components for each submodel. This leads to a very heavy computational burden. In this paper, we propose a class of variable selection procedures for semiparametric regression models using nonconcave penalized likelihood. We establish the rate of convergence of the resulting estimate. With proper choices of penalty functions and regularization parameters, we show the asymptotic normality of the resulting estimate and further demonstrate that the proposed procedures perform as well as an oracle procedure. A semiparametric generalized likelihood ratio test is proposed to select significant variables in the nonparametric component. We investigate the asymptotic behavior of the proposed test and demonstrate that its limiting null distribution follows a chi-square distribution which is independent of the nuisance parameters. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed variable selection procedures.
  • Goodness-Of-Fit Tests for Parametric Regression Models
    Goodness-of-Fit Tests for Parametric Regression Models. Jianqing Fan and Li-Shan Huang. Several new tests are proposed for examining the adequacy of a family of parametric models against large nonparametric alternatives. These tests formally check if the bias vector of residuals from parametric fits is negligible by using the adaptive Neyman test and other methods. The testing procedures formalize the traditional model diagnostic tools based on residual plots. We examine the rates of contiguous alternatives that can be detected consistently by the adaptive Neyman test. Applications of the procedures to the partially linear models are thoroughly discussed. Our simulation studies show that the new testing procedures are indeed powerful and omnibus. The power of the proposed tests is comparable to the F-test statistic even in the situations where the F test is known to be suitable and can be far more powerful than the F-test statistic in other situations. An application to testing linear models versus additive models is also discussed. KEY WORDS: Adaptive Neyman test; Contiguous alternatives; Partial linear model; Power; Wavelet thresholding. 1. INTRODUCTION: Parametric linear models are frequently used to describe the association between a response variable and its predictors. The adequacy of such parametric fits often arises. Conventional methods rely on residual plots against fitted values or a covariate variable to detect if there are any systematic departures from zero in the residuals. [...] independent observations from a population, $Y = m(x) + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$, (1.1) where $x$ is a p-dimensional vector and $m(\cdot)$ is a smooth regression surface. Let $f(\cdot, \theta)$ be a given parametric family. The
  • Estimation of Partially Linear Regression Model Under Partial
    Estimation of Partially Linear Regression Model under Partial Consistency Property. Xia Cui (School of Mathematics and Information Science, Guangzhou University, Guangzhou, China), Ying Lu (Center for the Promotion of Research Involving Innovative Statistical Methodology, Steinhardt School of Culture, Education and Human Development, New York University, New York, USA) and Heng Peng (Department of Mathematics, Hong Kong Baptist University, Hong Kong). arXiv:1401.2163v1 [stat.ME] 9 Jan 2014. Abstract: In this paper, utilizing recent theoretical results in high dimensional statistical modeling, we propose a model-free yet computationally simple approach to estimate the partially linear model $Y = X\beta + g(Z) + \varepsilon$. Motivated by the partial consistency phenomena, we propose to model $g(Z)$ via incidental parameters. Based on partitioning the support of $Z$, a simple local average is used to estimate the response surface. The proposed method seeks to strike a balance between computation burden and efficiency of the estimators while minimizing model bias. Computationally this approach only involves least squares. We show that given the inconsistent estimator of $g(Z)$, a root-n consistent estimator of the parametric component $\beta$ of the partially linear model can be obtained with little cost in efficiency. Moreover, conditional on the $\beta$ estimates, an optimal estimator of $g(Z)$ can then be obtained using classic nonparametric methods. The statistical inference problem regarding $\beta$ and a two-population nonparametric testing problem regarding $g(Z)$ are considered. Our results show that the behavior of test statistics is satisfactory. To assess the performance of our method in comparison with other methods, three simulation studies are conducted and a real dataset about risk factors of birth weights is analyzed.
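
The last abstract in the list describes an estimator that needs only least squares: partition the support of Z, treat g(Z) as one incidental parameter per cell, and estimate β after removing the cell averages. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The equal-count partition, the cell size of 5, the within-cell demeaning used to eliminate the incidental parameters, and the function name `fit_beta_by_partition` are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_beta_by_partition(x, z, y, bin_size=5):
    """Estimate beta in Y = X beta + g(Z) + eps by binning Z.

    Assumption: observations are sorted by Z and cut into consecutive cells
    of bin_size points; g is treated as constant within each cell.
    """
    n = len(z)
    order = np.argsort(z)
    groups = np.empty(n, dtype=int)
    groups[order] = np.arange(n) // bin_size  # cell index of each observation
    counts = np.bincount(groups)

    def demean(a):
        """Subtract the cell average (the local average over each cell)."""
        a = a.reshape(n, -1)
        sums = np.zeros((counts.size, a.shape[1]))
        np.add.at(sums, groups, a)
        return a - (sums / counts[:, None])[groups]

    x_c = demean(x)
    y_c = demean(y).ravel()
    # Ordinary least squares on the demeaned data; this equals the fit with
    # one dummy (incidental) parameter per cell.
    beta, *_ = np.linalg.lstsq(x_c, y_c, rcond=None)
    return beta

# Simulated data (illustrative values only); X is correlated with Z.
rng = np.random.default_rng(1)
n = 500
z = rng.uniform(0.0, 1.0, n)
x = rng.normal(size=(n, 2)) + z[:, None]
beta_true = np.array([2.0, -1.0])
y = x @ beta_true + np.sin(2 * np.pi * z) + 0.1 * rng.normal(size=n)

beta_hat = fit_beta_by_partition(x, z, y)
print("estimated beta:", beta_hat)
```

With β estimated this way, g can afterwards be estimated by applying any standard smoother to the residuals Y − Xβ̂ against Z, which matches the two-stage description in the abstract.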