Adaptive Estimation of High Dimensional Partially Linear Model


Fang Han,∗ Zhao Ren,† and Yuxin Zhu‡

May 6, 2017

Abstract

Consider the partially linear model (PLM) with random design: $Y = X^{\mathsf T}\beta^* + g(W) + u$, where $g(\cdot)$ is an unknown real-valued function, $X$ is $p$-dimensional, $W$ is one-dimensional, and $\beta^*$ is $s$-sparse. Our aim is to efficiently estimate $\beta^*$ based on $n$ i.i.d. observations of $(Y, X, W)$ with possibly $n < p$. The popular approaches and their theoretical properties so far have mainly been developed with an explicit knowledge of some function classes. In this paper, we propose an adaptive estimation procedure, which automatically adapts to the model via a tuning-insensitive bandwidth parameter. In many cases, the proposed procedure also proves to attain weaker scaling and noise requirements than the best existing ones. The proof rests on a general method for determining the estimation accuracy of a perturbed M-estimator and new U-statistics tools, which are of independent interest.

Keywords: partially linear model; pairwise difference approach; adaptivity; minimum sample size requirement; heavy-tailed noise; degenerate U-processes.

1 Introduction

Partially linear regression involves estimating a vector $\beta^* = (\beta_1^*, \ldots, \beta_p^*)^{\mathsf T} \in \mathbb{R}^p$ from independent and identically distributed (i.i.d.) observations $\{(Y_i, X_i, W_i),\ i = 1, \ldots, n\}$ satisfying

\[ Y_i = X_i^{\mathsf T}\beta^* + g(W_i) + u_i, \quad \text{for } i = 1, \ldots, n. \tag{1.1} \]

Here $X_i \in \mathbb{R}^p$, $Y_i, W_i, u_i \in \mathbb{R}$, $g(\cdot)$ is an unknown real-valued function, and $u_i$ stands for a noise term of finite variance and independent of $(X_i, W_i)$. This paper is focused on such problems when the number of covariates $p + 1$ is much larger than the sample size $n$, which in the sequel we shall refer to as a high dimensional partially linear model (PLM). For this, we regulate $\beta^*$ to be $s$-sparse, namely, the number of nonzero elements in $\beta^*$, $s$, is smaller than $n$.
According to the smoothness of the function $g(\cdot)$, the following regression problems, with sparse regression coefficient $\beta^*$, are special instances of the studied model (1.1):

(1) The ordinary linear regression models, when $g(\cdot)$ is constant-valued.
(2) Partially linear Lipschitz models, when $g(\cdot)$ satisfies a Lipschitz-type condition (see, for example, Condition 7.1 in Li and Racine (2007) for detailed descriptions).
(3) Partially smoothing spline models (Engle et al., 1986; Wahba, 1990), when $g(\cdot)$ can be well approximated by splines.
(4) Partially linear jump discontinuous regression, when $g(\cdot)$ can contain numerous jumps.

∗ Department of Statistics, University of Washington, Seattle, WA 98195, USA.
† Department of Statistics, University of Pittsburgh, Pittsburgh, PA 15260, USA.
‡ Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA.

In the literature, Bunea (2004), Bunea and Wegkamp (2004), Fan and Li (2004), Liang and Li (2009), Müller and van de Geer (2015), Sherwood and Wang (2016), Yu et al. (2016), Zhu (2017), among many others, have studied the high dimensional partially linear model, largely following the least squares approaches (Chen, 1988; Robinson, 1988; Speckman, 1988; Donald and Newey, 1994; Carroll et al., 1997; Fan and Huang, 2005; Cattaneo et al., 2017). For example, Zhu (2017) proposed to estimate $\beta^*$ through the following two-stage projection strategy:

\[ \hat m_j = \operatorname*{argmin}_{\tilde m_j \in \mathcal{F}_j} \Big\{ \frac{1}{n} \sum_{i=1}^{n} \big( Z_{ij} - \tilde m_j(W_i) \big)^2 + \lambda_{j,n} \|\tilde m_j\|_{\mathcal{F}_j}^2 \Big\}, \quad \text{where } Z_{i0} = Y_i \text{ and } Z_{ij} = X_{ij} \text{ for } j \in [p]; \]

\[ \hat\beta^{\mathrm{proj}} = \operatorname*{argmin}_{\beta \in \mathbb{R}^p} \Big\{ \frac{1}{n} \sum_{i=1}^{n} \big( \hat V_{0i} - \hat V_i^{\mathsf T} \beta \big)^2 + \lambda_n \|\beta\|_1 \Big\}, \quad \text{where } \hat V_{0i} = Y_i - \hat m_0(W_i) \text{ and } \hat V_{ij} = X_{ij} - \hat m_j(W_i). \]

Here $\{\mathcal{F}_j,\ 0 \le j \le p\}$ are a series of pre-specified function classes, $\|\cdot\|_{\mathcal{F}_j}$ is the associated norm, $\|\cdot\|_q$ is the vector $\ell_q$-norm, and $[p]$ denotes the set of integers between 1 and $p$.
In a related study, Müller and van de Geer (2015) proposed a penalized least squares approach:

\[ \big( \hat\beta^{\mathrm{LSE}}, \hat g \big) := \operatorname*{argmin}_{\beta \in \mathbb{R}^p,\ \tilde g \in \mathcal{G}} \Big\{ \frac{1}{n} \sum_{i=1}^{n} \big( Y_i - X_i^{\mathsf T}\beta - \tilde g(W_i) \big)^2 + \lambda_n \|\beta\|_1 + \mu_n^2 \|\tilde g\|_{\mathcal{G}}^2 \Big\}. \]

Here $\mathcal{G}$ is a pre-specified function class allowed to be of infinite dimension. Two tuning parameters $\lambda_n, \mu_n$ are employed to induce sparsity and smoothness separately.

Both approaches mentioned above, exemplifying a large fraction of existing methods, often require knowledge of some function classes, either $\mathcal{G}$ for penalized least squares estimators or $\{\mathcal{F}_j,\ 0 \le j \le p\}$ for the projection approach. In this paper, we consider adaptive estimation and adopt an alternative approach, Honoré and Powell's pairwise difference method (Honoré and Powell, 2005) with an extra lasso-type penalty, for estimating $\beta^*$. In order to motivate this procedure, for $i, j \in [n]$, write $\tilde Y_{ij} = Y_i - Y_j$, $\tilde X_{ij} = X_i - X_j$, $\tilde W_{ij} = W_i - W_j$, and $\tilde u_{ij} = u_i - u_j$. To avoid trivial settings of discrete type $W$, hereafter we assume $W$ to be absolutely continuous without loss of generality. The fact that

\[ \mathbb{E}\big[ \tilde Y_{ij} \mid \tilde W_{ij} = 0, X_i, X_j \big] = \tilde X_{ij}^{\mathsf T} \beta^* \]

then implies

\[ \beta^* = \operatorname*{argmin}_{\beta \in \mathbb{R}^p} L_0(\beta), \quad \text{where } L_0(\beta) := f_{\tilde W_{ij}}(0)\, \mathbb{E}\big[ (\tilde Y_{ij} - \tilde X_{ij}^{\mathsf T}\beta)^2 \mid \tilde W_{ij} = 0 \big], \]

and $f_{\tilde W_{ij}}(0)$ stands for the density value of $\tilde W_{ij}$ at 0. A natural estimator of $\beta^*$ is

\[ \hat\beta_{h_n} = \operatorname*{argmin}_{\beta \in \mathbb{R}^p} \big\{ \hat L_n(\beta; h_n) + \lambda_n \|\beta\|_1 \big\}. \tag{1.2} \]

Here $h_n$ is a specified bandwidth, $\lambda_n$ is a tuning parameter to control the sparsity level,

\[ \hat L_n(\beta; h_n) := \binom{n}{2}^{-1} \sum_{i<j} \frac{1}{h_n} K\Big( \frac{\tilde W_{ij}}{h_n} \Big) \big( \tilde Y_{ij} - \tilde X_{ij}^{\mathsf T}\beta \big)^2 \tag{1.3} \]

is an empirical approximation to $L_0(\beta)$, and $K(\cdot)$ is a nonnegative kernel function which will be specified later in Assumption 5. When $\lambda_n = 0$, we recover the original Honoré and Powell estimator (Honoré and Powell, 2005). When $\lambda_n > 0$, we obtain a sparse solution that is more suitable for tackling high dimensional data. Our analysis is under the triangular array setting where $p = p_n$ and $s = s_n$ are allowed to grow with $n$.
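As a rough illustration of the estimator in (1.2) and (1.3), the following Python sketch minimizes the penalized pairwise-difference loss by proximal gradient descent (ISTA). It is not the authors' implementation: the Gaussian kernel, the choice of solver, the function names `gaussian_kernel` and `pairwise_difference_lasso`, and the values of `h` and `lam` in the demo are all assumptions made here for illustration only.

```python
import numpy as np


def gaussian_kernel(t):
    """Standard normal density, used here as one possible nonnegative kernel K (an assumption)."""
    return np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)


def pairwise_difference_lasso(Y, X, W, h, lam, n_iter=500):
    """Minimize L_n(beta; h) + lam * ||beta||_1 by proximal gradient (ISTA).

    L_n is the kernel-weighted average of squared pairwise-difference
    residuals over all pairs i < j, as in (1.3).
    """
    n, p = X.shape
    i, j = np.triu_indices(n, k=1)             # index all pairs with i < j
    dY, dX, dW = Y[i] - Y[j], X[i] - X[j], W[i] - W[j]

    # Pair weights (1/h) K(dW/h), divided by the number of pairs binom(n, 2).
    w = gaussian_kernel(dW / h) / h / len(i)

    # L_n(beta) = sum_k w_k (dY_k - dX_k^T beta)^2 is a weighted least squares
    # loss; its gradient is 2 (A beta - b) with A and b as below.
    A = dX.T @ (w[:, None] * dX)
    b = dX.T @ (w * dY)

    # Step size 1 / L, where L = 2 * lambda_max(A) bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.eigvalsh(A)[-1])

    beta = np.zeros(p)
    for _ in range(n_iter):
        z = beta - step * 2.0 * (A @ beta - b)                          # gradient step
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)     # soft-thresholding
    return beta


# Small synthetic demo (all settings hypothetical): sparse beta, smooth g.
rng = np.random.default_rng(0)
n, p = 300, 10
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
X = rng.normal(size=(n, p))
W = rng.uniform(size=n)
u = 0.5 * rng.normal(size=n)
Y = X @ beta_true + np.sin(2 * np.pi * W) + u

beta_hat = pairwise_difference_lasso(Y, X, W, h=0.1, lam=0.1)
print(np.round(beta_hat, 2))
```

Note that the function $g$ is never estimated: differencing pairs with nearby $W$ values removes it approximately, which is why only a bandwidth and a sparsity penalty appear as tuning parameters. The sketch forms all $n(n-1)/2$ pairs explicitly, so memory grows quadratically in $n$.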
Three targets are in order: (1) we aim to show that the pairwise difference approach enjoys the adaptivity property in the sense that our procedure does not rely on the knowledge of smoothness of $g(\cdot)$, and as $h_n$ and $\lambda_n$ appropriately scale with $n$, the estimation error $\|\hat\beta_{h_n} - \beta^*\|_2^2$ can achieve sharp rates of convergence for a wide range of $g(\cdot)$ function classes of different degrees of smoothness. In addition, we will unveil an intrinsic tuning insensitivity property for the choice of bandwidth $h_n$; (2) we will investigate the minimum sample size requirement for the pairwise difference approach; and (3) we will study the impact of heavy-tailed noise $u$ on estimation.

1.1 Adaptivity of pairwise difference approach

One major aim of this paper is to show the adaptivity of $\hat\beta_{h_n}$ to the smoothness of the function $g(\cdot)$. From the methodological perspective, no matter how smooth (complex) the $g(\cdot)$ function is, one does not need to tailor the pairwise difference approach. In addition, we will show that the bandwidth $h_n$ bears a tuning insensitive property (Sun and Zhang, 2012) in the sense that $h_n$ does not depend on the smoothness of the function $g(\cdot)$. Hence, methodologically, the procedure can automatically adapt to the model.

From the theoretical perspective, the adaptivity property of the pairwise difference approach is shown via analyzing $\|\hat\beta_{h_n} - \beta^*\|_2^2$ for a range of smooth function classes of $g(\cdot)$. To this end, we first construct a general characterization of the "smoothness" degree for $g(\cdot)$. In contrast to commonly used smoothness conditions, a new characterization of smoothness degree is introduced. Recall that $\hat L_n(\beta; h_n)$ is an empirical approximation of $L_0(\beta)$ as $h_n \to 0$. Define

\[ \beta_h^* = \operatorname*{argmin}_{\beta \in \mathbb{R}^p} \mathbb{E} \hat L_n(\beta; h) \tag{1.4} \]

to be the minimizer of a perturbed version of the objective function $L_0$. Define $\beta^*$ to be the minimizer of $L_0$. Intuitively, as $h$ is closer to 0, $L_h(\beta) := \mathbb{E}\hat L_n(\beta; h)$ is pointwise closer to $L_0(\beta)$, and hence $\beta_h^*$ shall converge to $\beta^*$.
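To see what drives the gap between the perturbed minimizer and $\beta^*$, one can write out the first-order condition of the population objective. The following is only a sketch and is not taken from the paper: it assumes that the matrix $\mathbb{E}[K_h \tilde X_{12}\tilde X_{12}^{\mathsf T}]$ is invertible and that differentiation under the expectation is permitted, and the shorthand $K_h := h^{-1}K(\tilde W_{12}/h)$ and $\tilde g_{12} := g(W_1) - g(W_2)$ is introduced here for illustration. Since every pair has the same distribution, $\mathbb{E}\hat L_n(\beta; h) = \mathbb{E}[K_h(\tilde Y_{12} - \tilde X_{12}^{\mathsf T}\beta)^2]$, and by model (1.1), $\tilde Y_{12} = \tilde X_{12}^{\mathsf T}\beta^* + \tilde g_{12} + \tilde u_{12}$. Setting the gradient to zero gives

\[ \mathbb{E}\big[K_h \tilde X_{12}\tilde X_{12}^{\mathsf T}\big]\,\beta_h^* = \mathbb{E}\big[K_h \tilde X_{12}\tilde Y_{12}\big] = \mathbb{E}\big[K_h \tilde X_{12}\tilde X_{12}^{\mathsf T}\big]\,\beta^* + \mathbb{E}\big[K_h \tilde X_{12}\,\tilde g_{12}\big] + \mathbb{E}\big[K_h \tilde X_{12}\,\tilde u_{12}\big]. \]

The last term vanishes because $u$ is independent of $(X, W)$ and $\tilde u_{12}$ has mean zero, so

\[ \beta_h^* - \beta^* = \Big(\mathbb{E}\big[K_h \tilde X_{12}\tilde X_{12}^{\mathsf T}\big]\Big)^{-1} \mathbb{E}\big[K_h \tilde X_{12}\,\tilde g_{12}\big]. \]

Under these assumptions, the perturbation bias depends on $g$ only through the differences $g(W_1) - g(W_2)$ over pairs whose $W$ values receive non-negligible kernel weight, which shrink with $h$ when $g$ is regular.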
In Assumption 11, we characterize this rate of convergence. Specifically, it is assumed that there exist two positive absolute constants $\zeta$ and $\gamma$, such that for any $h$ small enough,

\[ \text{(general smoothness condition)} \quad \|\beta_h^* - \beta^*\|_2 \le \zeta h^{\gamma}. \tag{1.5} \]

Condition (1.5) is related to Honoré and Powell's condition on the existence of a Taylor expansion of $\beta_h^*$ around $\beta^*$, i.e.,

\[ \beta_h^* = \beta^* + \sum_{\ell=1}^{D} b_\ell h^{\ell} + o(n^{-1/2}), \tag{1.6} \]

where, when $p$ is fixed, (1.6) implies (1.5) with $\gamma = 1$. However, compared to (1.6), (1.5) is more general and suitable when $p = p_n \to \infty$ as $n \to \infty$, and can be verified for many function classes. To illustrate it, in Theorems 3.4 and 3.5, we calculate a lower bound of $\gamma$ for a series of (discontinuous) piecewise $\alpha$-Hölder function classes.

We are now ready to describe our main theoretical results. They show the adaptivity of $\hat\beta_{h_n}$ through the analysis of $\|\hat\beta_{h_n} - \beta^*\|_2^2$ with either $\gamma = 1$ (corresponding to the "smooth" case) or $\gamma < 1$ (corresponding to the "non-smooth" case). In addition to Condition (1.5), we impose the following six commonly used conditions on the choice of kernel and distribution of the model:

(1) the kernel function $K(\cdot)$ satisfies several routinely required conditions (Assumption 5);
(2) $W$ is not degenerate given $X$ (Assumption 6);
(3) conditioning on $\tilde W_{12} = 0$, $\tilde X_{12}$ satisfies a restricted eigenvalue condition (Bickel et al., 2009; van de Geer and Bühlmann, 2009) (Assumption 7);
(4) $\tilde W_{12}$ is informative near zero (Assumption 8);
(5) conditioning on any $W = w$, $X$ is multivariate subgaussian, i.e., any linear contrast of $X$ is subgaussian (Assumption 9);
(6) $u$ is subgaussian, or as later studied, of finite variance (Assumptions 10 and 17 respectively).