Minimax estimation in regression and random censorship models

Citation for published version (APA): Belitser, E. N. (2000). Minimax estimation in regression and random censorship models. (CWI tracts; Vol. 127). Centrum voor Wiskunde en Informatica.

Document status and date: Published: 01/01/2000

Document Version: Publisher's PDF, also known as Version of Record (includes final page, issue and volume numbers)
Minimax estimation in regression and random censorship models

E.N. Belitser

1991 Mathematics Subject Classification: 62G07 (Curve estimation: nonparametric regression, density estimation, etc.), 62G05 (Estimation), 62G20 (Asymptotic properties).
ISBN 90 6196 488 1
NUGI-code: 811
Copyright ©2000, Stichting Mathematisch Centrum, Amsterdam
Printed in the Netherlands

CWI Tracts

Managing Editors
A.M.H. Gerards (CWI, Amsterdam)
M. Hazewinkel (CWI, Amsterdam)
J.W. Klop (CWI, Amsterdam)
N.M. Temme (CWI, Amsterdam)

Executive Editor
M. Bakker (CWI Amsterdam, e-mail: Miente [email protected])

Editorial Board
W. Albers (Enschede)
K.R. Apt (Amsterdam)
M.S. Keane (Amsterdam)
P.W.H. Lemmens (Utrecht)
J.K. Lenstra (Eindhoven)
M. van der Put (Groningen)
A.J. van der Schaft (Enschede)
J.M. Schumacher (Tilburg)
H.J. Sips (Delft, Amsterdam)
M.N. Spijker (Leiden)
H.C. Tijms (Amsterdam)

CWI
P.O. Box 94079, 1090 GB Amsterdam, The Netherlands
Telephone + 31-20 592 9333
Telefax + 31-20 592 4199
WWW page http://www.cwi.nl/publications...bibl/

CWI is the nationally funded Dutch institute for research in Mathematics and Computer Science.

Contents

1 Preliminaries
1.1 Nonparametric minimax estimation
1.2 Minimax estimation: a brief survey
1.3 Scope

2 Minimax filtering over ellipsoids
2.1 "Coloured" Gaussian noise model
2.2 Minimax linear estimation
2.3 Asymptotically minimax estimation
2.4 Examples
2.5 Proofs
2.6 Bibliographic remarks

3 Minimax nonparametric regression
3.1 The model
3.2 Minimax consistency
3.3 Main results
3.4 Examples
3.5 Proofs
3.6 Bibliographic remarks

4 Efficient density estimation with censored data
4.1 Introduction
4.2 Definitions and main results
4.3 Auxiliary results
4.4 Preliminaries: the Kaplan-Meier estimator
4.5 Approximation Lemma
4.6 Proofs of Theorems
4.7 Bibliographic remarks

A Appendix
A.1 A technical lemma
A.2 The van Trees inequality
A.3 An approximation of the Kaplan-Meier estimator

Bibliography

Chapter 1
Preliminaries

1.1 Nonparametric minimax estimation

Fisher's works in the 1920s laid the foundations for statistics to become a separate discipline of mathematics. During the last 50 years, a large part of statistics has finally been incorporated into the rigid framework of theoretical mathematics, primarily through the elegant use of measure theoretic concepts. At the same time estimation theory has grown, from a mathematical technique established in 1806 with the first publication on the least squares estimator by Legendre, to become an independent topic in statistics.

In probability theory a random phenomenon is described by a probability space $(\Omega, \mathcal{A}, P)$. The measurable space $(\Omega, \mathcal{A})$ gives its qualitative and the measure $P$ its quantitative descriptions respectively. In the theory of probability, the underlying probability space $(\Omega, \mathcal{A}, P)$ is assumed to be predetermined and one studies its properties. In statistics one deals with the converse situation. That is to say, one tries to retrieve certain characteristics of the unknown probability space on the basis of some observed properties.

Observation is one of the fundamental notions in statistics. The observations may be a sample of real valued random variables, a stochastic process or of some other nature obtained as a result of a statistical experiment. The general statistical estimation problem is to gain information about some features of the underlying probability measure, using the observed data.

Mathematically, the observations are usually interpreted as a sample of random elements $X_1, X_2, \ldots, X_n$ from the probability distribution $P$ on a measurable space $(\mathcal{X}, \mathcal{B})$. Let $\mathcal{X}$ be some metric space and $\mathcal{B}$ be its Borel $\sigma$-algebra. Another ingredient of a statistical estimation problem is the following formalization of prior knowledge about the distribution $P$: one thinks of $P$ as ranging over $\mathcal{P}$, a class of distributions on $(\mathcal{X}, \mathcal{B})$. The class $\mathcal{P}$ is assumed to be known to the statistician and is, in fact, the statistical model. Depending on how "big" the class $\mathcal{P}$ is, one can speak of parametric or nonparametric models. Recently, a class of semiparametric models was recognized as intermediate between parametric and nonparametric models (see van der Vaart (1988), Bickel et al. (1993)).

For a long time parametric modeling has been a subject of investigation. The results that are developed are applied to the problem of fitting probability laws to data. A parametric model is usually described by assuming that the family of distributions $\mathcal{P}$ can be parameterized and represented in the form $\mathcal{P} = \{P_\theta, \theta \in \Theta\}$, where $\Theta$ is a subset of a Euclidean space. Thus, the problem of retrieving information about $P_\theta$ is equivalent to the problem of retrieving information about the parameter $\theta$.

A disadvantage of parametric modeling is that prior information about the underlying probability law is often more vague than any parametric family would allow: parametric families are too specific, or "narrow". Therefore, parametric modeling may in general not be robust in the sense that a slight contamination of the data might lead to wrong conclusions. Moreover, the data might be of such a type that there is no suitable parametric family that gives a good fit.
Under these circumstances, nonparametric modeling can serve as a good alternative. Broadly speaking, nonparametric models are those that are characterized only by a qualitative description of the class $\mathcal{P}$. A way to describe a nonparametric model is to assume that $\mathcal{P} = \{P_\theta, \theta \in \Theta\}$, where $\Theta$ is a subset of an infinite dimensional space.

The first paper in the area of nonparametric density estimation is due to Rosenblatt (1956). Since then, a large amount of literature dedicated to methods for estimating infinite dimensional objects (densities, regression functions, spectral densities, distribution functions, failure rates, images etc.) has appeared. It should be mentioned though that the change-over from parametric to nonparametric modeling has produced a side effect: the theory of nonparametric estimation still lacks coherence and generality: "...instead of a single, natural minimax theorem, there is a whole forest of results growing in various and sometimes conflicting directions..." (Donoho et al. (1995)).

Traditionally, two types of statistical estimation problem are recognized: so-called regular problems and irregular (or nonregular) ones. By regular problems one conventionally understands problems in which one wants to estimate a "smooth" functional of the underlying distribution. Standard examples of regular estimation problems include estimation of the distribution function, mean and median. For other applications, one can refer to, among others, van der Vaart (1988), (1991), Groeneboom and Wellner (1992), Bickel et al. (1993), Groeneboom (1996). A typical feature of regular estimation problems is that $\sqrt{n}$-consistent estimation is possible, where $n$ denotes the sample size. Usually the notion of a regular estimation problem is associated with differentiability of the functional that is to be estimated (see Koshevnik and Levit (1976), Levit (1978), Pfanzagl (1982), van der Vaart (1991)).
Another beneficial feature of regular models is that a unified and relatively simple treatment of asymptotic lower bounds in estimating a differentiable functional is possible. The construction of asymptotically exact lower bounds for various risks is essentially implemented by classical methods.

By nonregular problems one usually understands all problems of estimating a functional of interest (of the underlying distribution) that is not differentiable. Common examples of nonregular estimation problems are density estimation and regression estimation problems. In contrast to regular models, estimation theory in nonregular models is more complicated and more varied. There is no single general theorem describing lower bounds for the minimax risk (a measure of the complexity of the estimation problem) and optimal estimators.
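
The contrast between regular and nonregular problems can be made concrete with a small simulation. The sketch below is only an illustration and is not taken from the text: it compares the empirical distribution function (a regular problem) with a kernel density estimate in the spirit of Rosenblatt (1956) (a nonregular problem). The Gaussian kernel, the bandwidth $h = n^{-1/5}$, the standard normal data, the evaluation point $x = 0$ and all function names are assumptions chosen for the example.

```python
import math
import random


def ecdf(sample, x):
    """Empirical distribution function: fraction of observations <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)


def kernel_density(sample, x, h):
    """Kernel density estimate at x with a Gaussian kernel and bandwidth h."""
    norm = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in sample)


def rmse(estimates, truth):
    """Root mean squared error of a list of estimates around the true value."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def compare(n, reps=200, seed=0):
    """Monte Carlo RMSE of both estimators at x = 0 for standard normal data."""
    rng = random.Random(seed)
    h = n ** (-1 / 5)  # assumed bandwidth order, not prescribed by the text
    cdf_est, pdf_est = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        cdf_est.append(ecdf(sample, 0.0))
        pdf_est.append(kernel_density(sample, 0.0, h))
    true_cdf = 0.5                               # F(0) for N(0, 1)
    true_pdf = 1.0 / math.sqrt(2.0 * math.pi)    # f(0) for N(0, 1)
    return rmse(cdf_est, true_cdf), rmse(pdf_est, true_pdf)


if __name__ == "__main__":
    for n in (100, 400, 1600):
        cdf_err, pdf_err = compare(n)
        # Errors rescaled by sqrt(n): roughly stable for the regular problem,
        # expected to drift upwards for the nonregular one.
        print(n, round(cdf_err * math.sqrt(n), 3), round(pdf_err * math.sqrt(n), 3))
```

Under these assumptions one would expect the rescaled error of the empirical distribution function to stay roughly constant as $n$ grows, while the rescaled error of the density estimate tends to increase, reflecting a rate slower than $\sqrt{n}$. With a finite number of replications the trend may be noisy.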