Semiparametric Efficiency

Faculty of Sciences, Department of Applied Mathematics and Computer Science

Semiparametric Efficiency

Karel Vermeulen

Prof. Dr. S. Vansteelandt

Thesis submitted to obtain the degree of Master in Mathematics, specialization Applied Mathematics

Academic year 2010-2011

To my parents, my brother Lukas
To my best friend Sara

A mathematical theory is not to be considered complete until you have made it so clear that you can explain it to the first man whom you meet on the street...
David Hilbert

Preface

My interest in semiparametric theory awoke several years ago, two and a half to be precise. At that time, I was in my third year of mathematics and had to choose a subject for my Bachelor thesis. My decision was: A geometrical approach to the asymptotic efficiency of estimators, based on the monograph by Anastasios Tsiatis, [35], under the supervision of Prof. Dr. Stijn Vansteelandt. In this manner, I entered the world of semiparametric efficiency. However, at that point I did not know it was just the beginning.

Shortly after I wrote this Bachelor thesis, Prof. Dr. Stijn Vansteelandt asked me to be involved in research on semiparametric inference for so-called probabilistic index models, in the context of a one-month student job. Under his guidance, I applied semiparametric estimation theory to this setting and obtained the semiparametric efficiency bound against which the efficiency of estimators in this model can be measured. Some results of this research are contained within this thesis. While short, this experience really convinced me that I wanted to write a thesis on semiparametric efficiency. That feeling was only strengthened after following the course Causality and Missing Data, taught by Prof. Dr. Stijn Vansteelandt. This course showed me how semiparametric estimators are used in real-life applications. Hence, my decision was made: I really wanted to write a thesis on semiparametric efficiency and after some meetings with Prof. Dr.
Stijn Vansteelandt, my decision became reality.

During August 2010, I had several talks with Prof. Dr. Stijn Vansteelandt about which monographs and articles we should use, since there is so much information available. Luckily, semiparametric theory is one of the interests of Prof. Dr. Stijn Vansteelandt, so he could really guide me through the large amount of available information. Shortly after these talks, I started studying the monograph by Aad van der Vaart, [40]. It was really tough to go through this manuscript and, in addition, to understand everything that was written down. It made me somewhat uneasy, since I lost track among all the abstract definitions and did not quite understand the relation with the monograph by Anastasios Tsiatis. This feeling more or less gave my thesis its structure: studying both theories in detail and pointing out the relations and differences. That is why my thesis is partitioned into different parts: Part II describes the theory as presented in the monograph by Anastasios Tsiatis and Part III describes the theory as presented in the monograph by Aad van der Vaart. Part I is primarily dedicated to the basics of functional analysis, on which semiparametric theory heavily relies.

As I went through the literature, these relations and differences between both approaches to semiparametric efficiency became clear to me. While the abstract theory is developed in Part III, the relations and differences with Part II are indicated, and it is shown that Part III is a proper generalization of Part II. Some hints were given in the landmark paper by Whitney Newey, [27]. In my opinion, this paper is a mix of both theories and actually a nice transition from the geometrical approach to semiparametric efficiency to the more abstract approach, although the abstract approach is also of a quite geometrical nature.
It was my experience that the literature is often so difficult to understand that it is no longer attractive. Hence, this thesis is also written with the intention that those interested in semiparametric theory and just entering the magnificent world of semiparametric models and estimation have a nice document available that explains the theory in great detail without being too complicated or obscure, or so I hope.

Finally, I also want to use this opportunity to thank everyone who by some means or other has contributed to this thesis.

• First and most importantly, I want to thank my promoter Stijn. He made it possible for me to write this thesis and stood by my side while I was entering the world of semiparametric efficiency. I want to thank Stijn for all the hours we spent in his office discussing all the questions I had, pointing out my mistakes, making improvements and sharing his ideas with me. I also want to thank him for all the opportunities he gave me during the past five years. He really made my interest in statistics grow to pleasant heights. In summary, without Stijn it would not have been possible to write this thesis, and it was a great time to write it under his supervision. In addition, I cannot believe how much I learned during the last year thanks to Stijn. I cannot imagine a better coach than Stijn.
• Next, I want to thank my parents, Ann and Philip, who let me choose my studies independently and supported me unconditionally during the past five years. They gave me the great opportunity to study mathematics in Ghent. In particular, I want to thank my mother Ann for always listening to my (to her) boring and tedious explanations whenever I was excited about the insights I obtained in semiparametric theory. She always tried to listen with full attention, even though she did not understand a thing of what I was saying.
I also want to thank my little brother Lukas, who always made me laugh with his funny jokes and remarks about my passion for mathematics.
• Another crucial person I want to thank is my best friend and study mate, Sara. Writing a thesis involves a lot of solitude. Luckily, I could rely on her presence during the dark winter months, as she was at my place for days while we each worked on our own theses. Sara was (and still is) a great refuge when I was a bit down because writing this thesis was so difficult and sometimes I just wanted to give up. I cannot imagine how this year would have been without her presence and support. In addition, she gave me a lot of advice on the difficulties I encountered when typing my thesis in LaTeX, as well as hints for making nice figures to illustrate the theory.
• It may sound a bit funny, but I also want to thank Lady Gaga for her music, which really helped me to relax during the periods when I was really stressed. Her music sometimes helped me to escape from reality when I needed it.
• I also want to thank the rest of my family, other study mates and friends, especially my sister Annelies, my niece Helena, my aunt Greta and my friends Machteld, Catherine, Lotte, Wendy, Elien, Xavier, Lieselot, Bert and many others. Further, the intellectual progress I made during the past five years would not have been possible without the educational staff of the University of Ghent, so many thanks to them as well.
• Many thanks also to Jan for reading some parts of this thesis and pointing out some typographical matters.

Karel Vermeulen
May 2011
Ghent

Permission for Loan

The author gives permission to make this master's thesis available for consultation and to copy parts of the master's thesis for personal use.
Any other use is subject to the restrictions of copyright law, in particular with regard to the obligation to cite the source explicitly when quoting results from this master's thesis.

Karel Vermeulen
May 2011

Dutch Summary

Introduction

Parametric models, roughly speaking models described by a finite-dimensional parameter, and estimation theory for parametric models have historically received a great deal of attention in the literature. The best-known statistical methods are parametric. A very important result is the asymptotic normality and asymptotic efficiency of the maximum likelihood estimator (MLE). This result originates in the work of Fisher in the 1920s. Another fundamental result in parametric estimation theory is the Cramér-Rao lower bound for unbiased estimators. This lower bound originates in the work of Cramér and Rao in the 1940s. We can thus conclude that a great deal of theory is available when we wish to estimate parameters in parametric models.

In the second half of the previous century, however, new types of models were introduced: semiparametric models. These can be seen as the intermediate case between parametric models and nonparametric models. Roughly speaking, semiparametric models are models described by a parameter that contains both a finite-dimensional and an infinite-dimensional part. The finite-dimensional part (or at least a portion of it) is then usually the parameter of interest, and the infinite-dimensional part is usually called the nuisance parameter. Only in recent decades have semiparametric models received increasing attention. This attention was motivated mainly by model misspecification.
The semiparametric approach to model misspecification is to leave certain parts of the joint density function that describes the model completely unknown. By keeping the parameter space infinite-dimensional, we impose far fewer restrictions on the observed data. As a consequence, solutions, if they exist and are reasonable, have wider applicability. Semiparametric models offer an important advantage over fully nonparametric models, because semiparametric models will perform better with small amounts of data than nonparametric models, which tend to perform poorly with small amounts of data.
