Efficient nonparametric inference for discretely observed compound Poisson processes

Alberto Jesús Coca Cabrero

Darwin College, University of Cambridge

A thesis submitted for the degree of Doctor of Philosophy

August 2016

Abstract

Compound Poisson processes are the textbook example of pure jump stochastic processes and the building blocks of Lévy processes. They have three defining parameters: the distribution of the jumps, the intensity driving the frequency at which these occur, and the drift. They are used in numerous applications and, hence, statistical inference on them is of great interest. In particular, nonparametric estimation is increasingly popular for its generality and reduction of misspecification issues. In many applications, the underlying process is not observed directly but at discrete times. Therefore, important information is missed between observations and we face a (non-linear) inverse problem. Using the intimate relationship between Lévy processes and infinitely divisible distributions, we construct new estimators of the jump distribution and of the so-called Lévy distribution. Under mild assumptions, we prove Donsker theorems for both (i.e. functional central limit theorems with the uniform norm) and identify the limiting Gaussian processes. This allows us to conclude that our estimators are efficient, or optimal from an information-theoretic point of view, and to give new insight into the topic of efficiency in this and related problems. We allow the jump distribution to potentially have a discrete component and include a novel way of estimating the mass function using a kernel estimator. We also construct new estimators of the intensity and of the drift, and show joint asymptotic normality of all the estimators. Many relevant inference procedures are derived, including confidence regions, goodness-of-fit tests, two-sample tests and tests for the presence of discrete and absolutely continuous jump components.

In related literature, two apparently different approaches have been taken: a natural direct approach, and the spectral approach we use. We show that these are formally equivalent and that the existing estimators are very close relatives of each other. However, those from the first approach can only be used on small compact intervals of the positive real line whilst ours work on the whole real line and, furthermore, are the first to be efficient. We describe how the former can attain efficiency and propose several open problems not yet identified in the field. We also include an exhaustive simulation study of our and other estimators, in which we illustrate their behaviour in a number of realistic situations and their suitability for each of them. This type of study cannot be found in the existing literature; it provides several insights not yet pointed out and a solid understanding of the practical side of the problem, on which real-data studies can be based. The implementation of all the estimators is discussed in detail and practical recommendations are given.

A mi familia

Acknowledgements

I am all too aware that the completion of my doctoral studies would not have been possible were it not for the generous support of so many. Herein, I express my deep gratitude to all who have contributed to making this journey a unique, meaningful and life-changing experience.

Firstly, I wish to thank my doctoral supervisors, Richard Nickl and L. Christopher G. Rogers. To Richard, my principal supervisor, because he not only introduced me to the fascinating world of nonparametric statistics, but his passion for the subject has enthused me and his mentoring has opened up the academic world to me; I am honoured that I will continue having the opportunity to learn from his immense mathematical knowledge after the Ph.D., as a postdoc with him. I wish too to thank Chris for sharing his exceptional insights into some challenging open problems in probability, as well as for his dedication and personal support.

I would like to express my gratitude to the CCA and its directors for putting their trust in me when they accepted me onto what has been a fantastic Ph.D. programme and to the CCA administrators for their continued support; I am especially thankful to CCA director James Norris for acting as internal examiner in my defence. I am similarly grateful to Markus Reiß for acting as external examiner; the defence was a truly enriching experience in which a number of exciting future lines of research arose. I am also very much indebted to Markus for his hospitality during two visits to Berlin.

I consider myself most fortunate for the rest of the people who have walked this journey with me and who have provided me with firm personal support of incalculable value. My utmost gratitude must go to my family; especially to my parents, siblings, and grandmothers. They have loved me unconditionally throughout, being the pillar on which I could always rely and showing boundless generosity. Most especially, to my parents, Jesús Coca Gradín and Purificación Cabrero Maroto, who have taught me the most important lessons in life and without whom I would not have been able to succeed in this enterprise. A few lines herein could never suffice to thank you enough: every achievement of mine is yours.

I am most grateful to my beloved friends; unfortunately, I do not have enough space to thank them all. To Alexandra Gandini who, after my family, has been my greatest influence, both personally and professionally; I wish you and your family the very best. To my childhood friends, Jorge Dávila, Javier González-Adalid, Clara López and Robert

Stocks, who still are my main confidants and with whom my friendship only grows with time despite the distance. To my friends from Cambridge colleges, Erik van Bergen, Mathew Edwards, Douwe Kiela and Kattria van der Ploeg, who have made these years truly outstanding, not only through memorable experiences but also through the highly stimulating academic discussions I was fortunate to have with them. To Simon Walsh, an exceptional friend and an inspirational man who keeps challenging me intellectually and who was always there in tough times. To the mathematicians, Milana Gatarić, Gil Ramos and Martin Taylor, for their sincere friendship, for the exciting mathematical discussions we have had and for the invaluable advice they have provided me with to navigate the academic world. And to Julia López Canelada, for her immeasurable patience and generous support whilst I was writing this thesis, a truly demanding period that she illuminated with her everlasting happiness and radiant smile. Thank you all for giving me the opportunity to share the joy of life and knowledge with you; I can only hope to be honoured to continue doing so in the near future.

I wish also to recognise the generosity and support I received from many organisations during the course of my Ph.D. studies: Fundación “La Caixa”, EPSRC, Fundación Mutua Madrileña, Access to Learning Fund, Cambridge Philosophical Society, Lundgren Fund and Darwin College.

And lastly, to luck, for all the above and much more: please, receive my most profound gratitude.

Declaration

This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. It is not substantially the same as any that I have submitted, or is being concurrently submitted, for a degree or diploma or other qualification at the University of Cambridge or any other University or similar institution. I further state that no substantial part of my dissertation has already been submitted, or is being concurrently submitted, for any such degree, diploma or other qualification at the University of Cambridge or any other University or similar institution.

Contents

Abstract iii

Acknowledgements vii

Declaration ix

Contents xiii

List of Notation 1

1 Introduction 3
  1.1 Compound Poisson processes 3
  1.2 Why nonparametric estimation? 7
  1.3 Estimation of compound Poisson processes from continuous observations 8
    1.3.1 The estimation problem 8
    1.3.2 A glimpse of empirical process theory and of asymptotic efficiency 9
  1.4 Estimation of compound Poisson processes from discrete observations 12
    1.4.1 The estimation problem 12
    1.4.2 A brief history of the solution to the problem 14
  1.5 Organisation and contributions of the thesis 17
    1.5.1 Chapter 2 17
    1.5.2 Chapter 3 17
    1.5.3 Chapter 4 18

2 Unifying existing literature 19
  2.1 The direct approach 19
    2.1.1 Compound distributions 20
    2.1.2 Nonparametric estimation with the direct approach 20
  2.2 The spectral approach 27
    2.2.1 Lévy processes 28
    2.2.2 Nonparametric estimation with the spectral approach 29
  2.3 Unifying the two approaches 39


    2.3.1 The formal equivalence 39
    2.3.2 Similarities between existing estimators 41
  2.4 Estimators of the intensity and the drift 43
    2.4.1 Estimating the intensity 43
    2.4.2 Estimating the drift 47
  2.5 Estimators of the process 50
  2.6 Asymptotic efficiency of the estimators 53

3 Theory 61
  3.1 Definitions and notation 61
  3.2 Assumptions and estimators 62
  3.3 Central limit theorems 67
  3.4 Proofs 72
    3.4.1 Joint convergence of one-dimensional distributions 72
    3.4.2 Proof of Propositions 3.3.1 and 3.3.3 92
    3.4.3 Proof of Theorem 3.3.4 92
      3.4.3.1 Estimation of N 92
      3.4.3.2 Estimation of F 98
    3.4.4 Proof of Theorem 3.3.5 102
    3.4.5 Proof of Lemma 3.3.2 105

4 Applications 107
  4.1 Introduction 107
  4.2 Statistical procedures and their construction 108
    4.2.1 Approximating the limiting processes 109
    4.2.2 Confidence regions 112
    4.2.3 Goodness-of-fit tests 113
    4.2.4 Two-sample tests 114
    4.2.5 Approximations for λ∆ sufficiently small 115
  4.3 Implementation 117
    4.3.1 Spectral approach-based estimators 117
    4.3.2 Direct approach-based estimators 123
  4.4 Simulated illustrations 126
    4.4.1 Preliminary remarks 127
      4.4.1.1 Division of Section 4.4 127
      4.4.1.2 Remarks on the practical implementation 129
      4.4.1.3 A guide to read the tables 131
    4.4.2 Moderate intensity (1 ≤ λ ≲ 5) 132
    4.4.3 High intensity (λ ≳ 5) 137
    4.4.4 Low intensity (λ < 1) 139


    4.4.5 Conclusions and remarks 142
    4.4.6 Further investigations 144

Bibliography 147


List of Notation

∆  The size of the time-increments at which a process is (discretely) observed
F, F(·)  The distribution function of the underlying jumps of a compound Poisson process
F̃, F̃(·), F̂, F̂(·)  Estimators of F of Buchmann and Grübel [2003] and Coca [2015], respectively
F, F⁻¹  The Fourier(–Plancherel) transform and its inverse transform
γ  The drift of a compound Poisson process
γ̃, γ̂  Estimators of γ (naive and spectral of Coca [2015], respectively)
h, h_n  The bandwidth of a kernel function
n  Size of a sample
i.i.d.  Independent and identically distributed
K  A kernel function
λ  The intensity of a compound Poisson process
λ̃, λ̂, λ̌  Estimators of λ (naive, spectral mass of Coca [2015] and spectral origin, respectively)
ℓ∞(C)  Space of real-valued uniformly bounded functions on C ⊆ ℝ
L∞(C)  Equivalence class of real-valued essentially bounded functions on C ⊆ ℝ
L^r(C), 1 ≤ r < ∞  Equivalence class of real-valued, Lebesgue-measurable functions f on C ⊆ ℝ under the norm ‖f‖_r := (∫_C |f(x)|^r dx)^{1/r}
Log, log  The distinguished and the natural logarithms
L, L⁻¹  The Laplace transform and its inverse transform
max, ∨  Maximum
min, ∧  Minimum
µ  A finite measure
(N_t)_{t≥0}  A Poisson process
N(0, σ²)  A normal distribution with mean zero and variance σ²


N, N(·)  The Lévy distribution function
N̂, N̂(·)  Estimator of N of Coca [2015]
N^G, N^G(·)  The generalised Lévy distribution function
N̂^G, N̂^G(·)  Estimator of N^G of Nickl and Reiß [2012]
𝒩, 𝒩(·)  A functional of the Lévy measure
𝒩̂, 𝒩̂(·)  Estimator of 𝒩 of Nickl et al. [2016]
ℕ  The set of natural numbers (nonnegative integer numbers)
ν  The Lévy measure
O, o  Big-O and small-o notation
O_{Pr}  Big-O notation in Pr-probability
(Ω, A, Pr)  The underlying probability space
P, P_n  The law of an increment of a Lévy process and its empirical counterpart
ϕ, ϕ_n  The characteristic function of a Lévy process and its empirical counterpart
Re, Im  The real and the imaginary parts of a complex number
ℝ  The set of real numbers
ℝ^d  The d-dimensional Euclidean space
supp(·)  The support of a function
T  The total length of the observation interval
(X_t)_{t≥0}  A Lévy process (generally a compound Poisson process)
Y₁, Y₂, ...  The underlying jumps of a compound Poisson process
Z₁, ..., Z_n  The increments of a discretely observed Lévy process
ℤ  The set of integer numbers
⌊x⌋  The greatest integer less than or equal to x
‖·‖_{ℓ∞(C)}  Uniform or supremum norm on C ⊆ ℝ
‖·‖_∞  Short version of ‖·‖_{ℓ∞(ℝ)}
‖·‖_{∞,τ}  The (τ-)exponentially downweighted uniform norm on ℝ⁺, sup_{x≥0} |e^{−τx} f(x)| with τ ≥ 0
‖·‖_TV  Total variation norm
→d  Classical convergence in distribution
→D  Convergence in distribution in the sense introduced by Hoffmann-Jørgensen
≲, ≳  Uniform order relations
∼  Uniform equivalence relation

Chapter 1

Introduction

The purpose of this chapter is to introduce the problem we solve in this thesis: to make efficient nonparametric inference on discretely observed compound Poisson processes. In Section 1.1 we introduce these processes and some of their probabilistic properties, which are exploited in subsequent chapters. Based on this, in Section 1.2 we justify why, when making statistical inference on them, we take a nonparametric approach instead of a more classical parametric one. In Sections 1.3 and 1.4 we introduce nonparametric estimation of compound Poisson processes in two different frameworks depending on the type of data available to the statistician: in the first, the process is fully observed, whilst in the second it is only observed at discrete times. Estimation in the first framework follows from standard results in classical statistics. Nonetheless, it allows us to naturally motivate the type of results that are the focus of this thesis and which are developed in the second framework. In Section 1.4 we state the exact problem that concerns us here, and in Section 1.5 we clearly indicate our contributions to it and give an outline of the rest of the thesis.

1.1 Compound Poisson processes

Compound Poisson processes are one of the most basic examples of continuous-time pure-jump stochastic processes. Nevertheless, they provide enough flexibility to model a large number of phenomena observed in numerous fields. They are highly tractable from a mathematical point of view and are building blocks of many more elaborate stochastic processes such as queues and Lévy processes. The construction of a compound Poisson process can be summarised as follows:

(a) the occurrence of the jumps is determined by a Poisson process;

(b) the size of the jumps is distributed according to a general distribution function;


(c) the interarrival times between jumps and the sizes of the jumps are mutually independent random variables; and,

(d) the process may increase or decrease steadily due to a drift factor.

In mathematical terms, let (N_t)_{t≥0} be a Poisson process with intensity λ > 0. Let Y₁, Y₂, ... be a sequence of independent and identically distributed (i.i.d.) random variables taking values in ℝ^d with common distribution F. Assume this sequence is independent of the Poisson process and let γ ∈ ℝ^d. Then a d-dimensional compound Poisson process (X_t)_{t≥0} with drift γ, intensity λ and jump size distribution F can be written as

X_t = γt + Σ_{j=1}^{N_t} Y_j,   t ≥ 0,   (1.1.1)

where an empty sum is zero by convention so, in particular, the process always starts at zero. Figure 1.1 illustrates a typical path of a one-dimensional compound Poisson process without drift. In what follows we concentrate on the case d = 1.

Figure 1.1: (Xt)t∈[0,10] with γ = 0, λ = 0.5 and F = N(0, 1)
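A path like the one in Figure 1.1 can be generated directly from the construction (a)-(d) above. The following sketch simulates jump times from i.i.d. exponential interarrival times and accumulates the jumps as in (1.1.1); the function name and interface are illustrative choices, not from the thesis, and the parameters match the caption of Figure 1.1.

```python
import numpy as np

def simulate_cpp(T, lam, gamma, jump_sampler, rng):
    """Simulate one path of a compound Poisson process with drift on [0, T].

    Returns the jump times (starting at 0) and the process values at those
    times, following (1.1.1): X_t = gamma*t + sum of the first N_t jumps.
    """
    # Interarrival times are i.i.d. Exponential(lam); accumulate until time T.
    times, jumps = [0.0], [0.0]
    t = rng.exponential(1.0 / lam)
    while t <= T:
        times.append(t)
        jumps.append(jump_sampler(rng))
        t += rng.exponential(1.0 / lam)
    times = np.array(times)
    x = gamma * times + np.cumsum(jumps)  # process value at each jump time
    return times, x

rng = np.random.default_rng(0)
# Parameters of Figure 1.1: gamma = 0, lambda = 0.5, F = N(0, 1).
times, x = simulate_cpp(10.0, 0.5, 0.0, lambda r: r.standard_normal(), rng)
print(len(times) - 1, "jumps on [0, 10]")
```

Between jump times the path is the straight line γt plus the accumulated jumps; with γ = 0 it is piecewise constant, as in the figure.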

A one-dimensional Poisson process with intensity λ can be constructed as follows. Let

E₁, E₂, ... be a sequence of independent and identically distributed exponential random variables with parameter λ, with the convention that λ⁻¹ is their expected value. Then the Poisson process can be written as

N_t = max{ k ∈ ℕ : Σ_{j=1}^{k} E_j ≤ t },   t ≥ 0,

where, again, an empty sum is zero by convention so, in particular, the process always starts at zero. From this construction it is apparent that the interarrival times of the compound Poisson process are exponentially distributed. As pointed out in Section 5.4 in Bingham and Kiesel [2004], the exponential distribution is special in that, subject to minimal regularity assumptions, it is the only distribution with the so-called lack-of-memory property: if (Ω, A, Pr) denotes the underlying probability space on which all random elements of this thesis are defined, then

Pr(E₁ > s + t | E₁ > s) = Pr(E₁ > t),   for all s, t > 0.   (1.1.2)
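Indeed, for the exponential survival function Pr(E₁ > t) = e^{−λt}, property (1.1.2) is a one-line computation:

```latex
\Pr(E_1 > s + t \mid E_1 > s)
  = \frac{\Pr(E_1 > s + t)}{\Pr(E_1 > s)}
  = \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}}
  = e^{-\lambda t}
  = \Pr(E_1 > t).
```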

This property provides the compound Poisson process with the (strong) Markov property, which makes it very mathematically tractable. In particular, it means that the survival function of the interarrival distributions does not change depending on where we observe them. Intuitively, the random time of occurrence of a jump shows no ageing. Consequently, compound Poisson processes are used in numerous applications to model random events that occur 'out of the blue'.

In order to gain more intuition into these processes, let us introduce two applications in which they are used. These also motivate the division between Section 1.3 and Section 1.4. Consider an ATM where deposits can be made. Customers arrive at random and withdraw or deposit a random amount of money. The interest of an ATM manager is to understand how much money is likely to be enough to cover the random transactions without having to risk introducing too much of it into the machine. In normal conditions, the assumptions of mutually independent exponential interarrival times and F-distributed amounts of money are satisfied. However, the homogeneity of the underlying parameters may be violated, so in practice the model is applied after splitting the observations into time-intervals where homogeneity is a reasonable assumption.

A second example comes from ecology. Consider a nature reserve where groups of birds stop in the course of their migration. Within certain weeks of the year they arrive and leave at random with an approximately constant intensity rate. Furthermore, in those weeks, the size of their groups also varies randomly according to an approximately time-homogeneous distribution. A manager of the park needs to understand the minimum and maximum number of birds that typically occupy a certain area in order to build appropriate infrastructure.

Therefore, in both applications one observes the aggregated effect of all the random events, and this justifies the practical importance of compound Poisson processes. As we mention in Sections 1.3 and 1.4, the difference between the two is the frequency at which the process is observed. In either case, the challenge is to estimate the defining parameters of the underlying process, all of which are generally unknown to the practitioner. Although we discuss estimation of all of them, the most challenging is F and this is the focus of this thesis.

Returning to the mathematical properties of the compound Poisson process, the lack-of-memory property and the independence assumptions imply it belongs to the wider class of Lévy processes. In other words, (X_t)_{t≥0} as defined above enjoys the following properties (cf. Sato [1999]):

(i) its paths are Pr-almost surely càdlàg (right continuous with left limits) starting at


zero; and,

(ii) it has stationary and independent increments.

The second is the key property of Lévy processes. It has been extensively exploited to make statistical inference on them, and so do we in subsequent chapters. Furthermore, it gives rise to the intimate relationship that these processes have with infinitely divisible distributions, which were introduced by De Finetti [1929]. As a consequence, and as shown by Lévy [1934], Khintchine [1937] and Itô [1942], the characteristic function of a Lévy process at a certain time is given by the Lévy–Khintchine formula, which explicitly depends on the defining parameters of the process. This can be exploited to build estimators of them, especially when having discrete observations of the process. Indeed, it is the starting point of the so-called spectral approach, which we introduce in Section 2.2 and use in Chapter 3 to show our novel results. We postpone introducing the formula in full generality until Section 2.2.1 and for now we focus on its expression for the case of a compound Poisson process. For this purpose, let E denote the expectation under Pr and let F denote the Fourier(–Plancherel) transform acting on finite measures with the convention

Fµ(u) = ∫_ℝ e^{iux} µ(dx),   u ∈ ℝ.   (1.1.3)

Then, by a simple calculation using (1.1.1), the characteristic function of X_t can be written as

ϕ_t(u) := E[e^{iuX_t}] = e^{t(iuγ + Fν(u) − λ)},   t ≥ 0, u ∈ ℝ,   (1.1.4)

where ν is generally referred to as the Lévy measure and satisfies ν = λ dF. Throughout, we denote the measure associated to a distribution function F by dF. Note, in particular, that

−2λ ≤ Re(Fν(u) − λ) ≤ 0 and |Im(Fν(u) − λ)| ≤ λ,   u ∈ ℝ,   (1.1.5)

and therefore

1 ≥ inf_{u∈ℝ} |ϕ_t(u)| ≥ e^{−2λt} > 0,   t > 0.   (1.1.6)

We repeatedly use these properties in subsequent sections and chapters. We remark that from now on we assume F has no atom at the origin. This is a harmless requirement that guarantees identifiability of all the parameters: otherwise, if a jump can take the value zero, the process does not jump and the mass of such an atom cannot be identified. For the same reason, estimation of the intensity cannot be performed. Indeed, if p₀ ∈ [0, 1) is the mass of dF at the origin, identity (1.1.4) can be written as

ϕ_t(u) = e^{t(iuγ + Fν₀(u) − λ₀)},   t ≥ 0, u ∈ ℝ,


where ν₀ := λ₀ dF₀, λ₀ := λ(1 − p₀) and dF₀ := (1 − p₀)⁻¹(dF − p₀δ₀). Therefore, the zero jumps coming from p₀ and the flat areas arising from λ tangle up to be indistinguishable from a process with intensity λ₀, and this is the only parameter that can be identified. Moreover, all the compound Poisson processes resulting from all possible choices of p₀ ∈ [0, 1) and λ > 0 for which the product λ(1 − p₀) remains constant are indistinguishable from each other. Consequently, we may assume p₀ = 0 without loss of generality and this way avoid the lack of identifiability of the parameters p₀ and λ.

The statistical relevance of identity (1.1.4) is the following: assume we only have i.i.d. observations of X_t for some t ≥ 0 fixed. Then, due to the definition of ϕ_t as an expectation, it is possible to approximate it by its empirical version, the empirical characteristic function. The idea of the spectral approach to construct estimators is to find relationships between each of the parameters and the characteristic function using (1.1.4), and to 'plug in' the empirical version of the latter in place of it. As should be clear by now, the fact that ϕ_t is the Fourier transform of the law of X_t, and that we hence resort to the frequency domain to construct estimators, gives the whole approach the name of spectral approach. We postpone giving more details of it until the next chapter.
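As an illustration of this plug-in idea, the following sketch compares the empirical characteristic function of simulated increments with the closed form (1.1.4). All parameter choices here (λ = 2, γ = 0.5, t = 0.25, Exp(1) jumps) are hypothetical and purely for illustration; for Exp(1) jumps, Fν(u) = λ/(1 − iu).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameters: lambda = 2, gamma = 0.5, jumps ~ Exp(1), t = 0.25.
lam, gamma, t, n = 2.0, 0.5, 0.25, 50_000

# n i.i.d. copies of X_t: a Poisson number of jumps plus drift, cf. (1.1.1).
counts = rng.poisson(lam * t, size=n)
X = gamma * t + np.array([rng.exponential(1.0, k).sum() for k in counts])

u = 1.3  # an arbitrary frequency
phi_emp = np.mean(np.exp(1j * u * X))                 # empirical characteristic function
Fnu = lam / (1 - 1j * u)                              # F(nu)(u) for nu = lam * dExp(1)
phi_true = np.exp(t * (1j * u * gamma + Fnu - lam))   # identity (1.1.4)
print(abs(phi_emp - phi_true))  # small for large n
```

The error is of order 1/√n, and the lower bound (1.1.6) guarantees that |ϕ_t(u)| stays bounded away from zero, which is what makes inverting the plug-in relationships stable.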

1.2 Why nonparametric estimation?

The most classical branch of statistics is concerned with drawing conclusions about models that are defined through a finite-dimensional parameter. This is termed parametric statistics and has a long history, dating back to J. Bernoulli, A. de Moivre and T. Bayes in the 18th century, and further developed by P. S. Laplace and C. F. Gauss in the late 18th and early 19th centuries. Its mathematical foundations were laid in the first half of the 20th century by the remarkably mathematically-gifted biologist R. A. Fisher and, in the second half of the century, the mathematical theory was finally formalised and unified into its modern form by L. M. Le Cam. For an extensive account of its fascinating history we refer the reader to Hald [2007], and for a comprehensive account of its mathematical foundations to van der Vaart [1998].

One of the advantages of parametric procedures is that they are generally simple to formulate and fast to compute. However, in many modern practical situations there is no reason to believe that the assumption of the data being generated by a specific finite-dimensional model is correct. In fact, this assumption naturally gives rise to the problem of robustness to the choice of model. Recall that in the definition of the compound Poisson process above no assumptions were made on the distribution F other than it having no mass at the origin. Consequently, the main estimation problem that concerns us here is that of estimating a family of functionals of a general probability measure, and it naturally falls into the realm of inference on infinite-dimensional models, or nonparametric statistics. The development of this branch is more contemporary than that of parametric statistics and arguably began in the 1930s

with the Glivenko–Cantelli theorem, independently proved by Glivenko [1933] and Cantelli [1933], and the Kolmogorov–Smirnov statistic studied in Kolmogorov [1933] and Smirnov [1939]. We note in passing that this coincides with the development of the study of Lévy processes mentioned above. As a result of the advent of powerful computers and of new and challenging modelling demands, the last few decades have witnessed the largest development of this field and it is currently a very active area of research. We refer the reader to the recent and unified account of its mathematical foundations given in Giné and Nickl [2016]. In contrast to parametric statistical procedures, nonparametric analogues do not suffer as much from the highly undesirable drawback of model misspecification. For all these reasons, herein we focus on nonparametric estimation of compound Poisson processes.

A thoughtful reader may argue that we are making a parametric assumption by letting the interarrival times be exponentially distributed. Nonetheless, we emphasise that we are interested in modelling a process with jumps whose occurrence shows no ageing, because in many applications this is a reasonable approximation. This qualitative property, depicted by (1.1.2), is mathematically equivalent to requiring that the survival function of the interarrival distribution satisfies the Cauchy functional equation f(x + y) = f(x)f(y), x, y > 0. As mentioned above, the only solution to this functional equation under minimal regularity assumptions is the parametric exponential distribution. The drift term could also be generalised to a time-dependent function but, in the discrete observation setting introduced in Section 1.4 and considered in the rest of the chapters, strong assumptions have to be made to obtain weak conclusions. We therefore lose almost no generality by assuming the relevant-in-practice modelling assumption of a linear deterministic drift in (1.1.1).

1.3 Estimation of compound Poisson processes from continuous observations

We begin by introducing the problem of nonparametric estimation of compound Poisson processes under a simple observation assumption. We do this in Section 1.3.1, where we also give its solution. Then, in Section 1.3.2 we briefly discuss a few concepts from empirical process theory and information theory arising from it. These are repeatedly used in subsequent chapters and the section serves as a natural motivation for the results therein.

1.3.1 The estimation problem

Recall the example of the ATM introduced in Section 1.1. At the end of the day, the ATM manager can access the machine’s database and see the time at which each transaction took place and its value, just as in Figure 1.1. Therefore, there exists a T > 0 such that they

observe X_t for all t ∈ [0, T]. This is commonly referred to as the continuous observation scheme. In this case, γ is observed directly and so are the independent interarrival times E₁, ..., E_{N_T} and the independent jumps Y₁, ..., Y_{N_T}. Thus, classical statistical techniques can be used to estimate λ and F (cf. van der Vaart [1998]): if we condition on N_T = m, the intensity can be estimated by the maximum likelihood estimator

λ_m := 1 / ( (1/m) Σ_{j=1}^{m} E_j )

and the jump distribution function by the empirical distribution function

F_m(x) := (1/m) Σ_{j=1}^{m} 1_{(−∞,x]}(Y_j),   x ∈ ℝ.

The asymptotic behaviour of these estimators is well known: the delta method and the classical central limit theorem guarantee that

√m (λ_m − λ) →d N(0, λ²)   as m → ∞,   (1.3.1)

and Donsker's theorem states that

√m (F_m − F) →D G_F in ℓ∞(ℝ)   as m → ∞,   (1.3.2)

where G_F is the so-called F-Brownian bridge, i.e. the zero-mean Gaussian process with covariance function given by Σ_{x,y} := F(min{x, y}) − F(x)F(y), x, y ∈ ℝ. The meaning of →d and →D is discussed in the next section.

These results can be considered as generalisations of the classical central limit theorem, the pillar upon which countless statistical procedures rest. Its statistical importance lies in that it not only guarantees consistent estimation at the parametric rate of convergence 1/√m, but it also states that if the fluctuations of the estimator around the truth are rescaled by the inverse of this rate, they are asymptotically normally distributed. Thus, the quantiles of the limiting distribution can be calculated, and confidence intervals and statistical tests can be derived from it. Analogous procedures can be derived from (1.3.1) and (1.3.2), and this justifies why in this thesis we focus on discussing and developing this type of result.
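In the continuous observation scheme these estimators are straightforward to compute. A minimal sketch (with a hypothetical truth λ = 3 and standard normal jumps; none of these choices are prescribed by the thesis) illustrates λ_m, F_m and the 1/√m error scale suggested by (1.3.1) and (1.3.2):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
lam, m = 3.0, 20_000  # hypothetical intensity and sample size

# Under continuous observation the interarrival times and jumps are seen directly.
E = rng.exponential(1.0 / lam, m)   # E_1, ..., E_m
Y = rng.standard_normal(m)          # Y_1, ..., Y_m with F = N(0, 1)

lam_m = 1.0 / E.mean()  # maximum likelihood estimator of lambda

def F_m(x):
    """Empirical distribution function of the jumps, evaluated at the points x."""
    x = np.atleast_1d(x)
    return np.mean(Y[None, :] <= x[:, None], axis=1)

grid = np.linspace(-3.0, 3.0, 61)
true_cdf = np.array([0.5 * (1.0 + erf(g / sqrt(2.0))) for g in grid])  # N(0,1) CDF

err_lam = abs(lam_m - lam)                    # ~ lam / sqrt(m) by (1.3.1)
err_F = np.max(np.abs(F_m(grid) - true_cdf))  # uniformly ~ 1 / sqrt(m) by (1.3.2)
print(err_lam, err_F)
```

With m = 20,000 both errors are of order 1/√m ≈ 0.007, up to the constants suggested by the limiting distributions.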

1.3.2 A glimpse of empirical process theory and of asymptotic efficiency

In (1.3.1) we have used the notation →d to denote convergence in distribution. Recall that, if D is a metric space, a sequence of random variables A_m : Ω → D, m ∈ ℕ, converges in


distribution, or weakly, to a random variable A : Ω → D if

E f(A_m) → E f(A)   for all f ∈ C_b(D),   (1.3.3)

where C_b(D) is the space of real-valued continuous and bounded functions on D. In (1.3.2) we used a different notation, →D, because, there, convergence in distribution is not necessarily well defined: throughout we interpret the random elements on both sides as taking values in the space ℓ∞(ℝ) of real-valued bounded functions on ℝ under the uniform norm ‖F‖∞ := sup_{x∈ℝ} |F(x)|; the issue is that, under this interpretation, the left hand side is not necessarily a bona fide random variable because it may not be measurable with respect to the corresponding Borel σ-field, as this turns out to be too large (note that ℓ∞(ℝ) is not even separable). We denote by →D convergence in distribution in the sense introduced by Hoffmann-Jørgensen [1984]: in this, the expectation on the left hand side of (1.3.3) is substituted by the more general operation of outer expectation,

E* B := inf{ E U : U ≥ B, U : Ω → ℝ̄ is measurable and such that E U exists },

where ℝ̄ is the extended real line. Provided that the limiting random element A is Borel measurable, corresponding versions of the Portmanteau theorem, the continuous mapping theorem and Prokhorov's theorem, together with a notion of tightness and tools to show it and to show weak convergence, still hold under this notion of weak convergence. We implicitly but heavily make use of these generalisations in the proofs of our results in Chapter 3, and refer the reader to any of the following excellent accounts for more details: van der Vaart and Wellner [1996], Dudley [1999] and Giné and Nickl [2016].

We remark that this is the modern approach to empirical process theory because of its power, but others were introduced before: Skorokhod [1957] introduced the Skorokhod topology, under which the random elements on the left hand side of (1.3.2) are measurable if interpreted as mappings taking values in the space of functions that are right continuous with left limits; Dudley [1966] proposed an alternative weak convergence theory based on a smaller σ-algebra than the one considered above, for which measurability holds; this approach was general enough to be the first to yield the multidimensional version of Donsker's theorem; and Pyke and Shorack [1968] proposed yet another version of weak convergence requiring (1.3.3) to hold only for those functions f for which f(A_m) is measurable. All this fruitful research was sparked by the heuristic introduction of Donsker's theorem in Doob [1949] and by the first attempt to prove it by Donsker [1952]. The latter overlooked the delicate measurability issue mentioned above, and it was precisely this that triggered all the research in the field.
With the modern interpretation of Donsker's theorem, it can also be understood as a functional central limit theorem (in the same way that the aforementioned Glivenko–Cantelli theorem can be understood as a functional law of large numbers) with the uniform norm. Therefore, (1.3.2) is of great statistical interest because it guarantees 1/√m-consistent uniform estimation, and because confidence bands, goodness-of-fit tests and two-sample tests can be derived from it. To develop all of these, the Kolmogorov–Smirnov statistic, also mentioned in the previous section, plays a major role. Its limiting distribution is that of kGk∞, where G is a standard Brownian bridge, i.e. GF from above with F = U(0, 1), the uniform distribution on [0, 1], and the content of their result is that kGk∞ and kGF k∞ are equal in distribution for any continuous F. We note in passing that this was rigorously proved prior to Donsker's theorem. Thus, in view of (1.3.2), to construct these inference procedures for any such F we simply have to calculate the quantiles of this distribution-free statistic.
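As a practical aside (our illustration, not part of the original text): the quantiles of the distribution of kGk∞ are available in scipy.stats under the name kstwobign, so the confidence band implied by (1.3.2) is immediate to compute. The sketch below builds the asymptotic 95% uniform band around the empirical distribution function of a simulated N(0, 1) sample; all parameter choices are arbitrary.

```python
import numpy as np
from scipy.stats import kstwobign, norm

rng = np.random.default_rng(7)
m = 500
x = np.sort(rng.normal(size=m))    # i.i.d. sample from F = N(0, 1)

# 95% quantile of ||G||_inf, the supremum of a standard Brownian bridge
# (the Kolmogorov distribution, implemented in scipy as kstwobign).
q95 = kstwobign.ppf(0.95)          # approximately 1.358

# Kolmogorov-Smirnov statistic sup_x |F_m(x) - F(x)|, computed at the
# order statistics, where the supremum of the difference is attained.
F_at_x = norm.cdf(x)
i = np.arange(1, m + 1)
D = np.max(np.maximum(i / m - F_at_x, F_at_x - (i - 1) / m))

# Asymptotic 95% uniform confidence band: F_m(x) -/+ q95 / sqrt(m).
# It contains the true F exactly when sqrt(m) * D <= q95.
covers = np.sqrt(m) * D <= q95
```

The same quantile q95 calibrates the goodness-of-fit and two-sample tests mentioned above, precisely because the statistic is distribution-free for continuous F.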

The last desirable property of Fm that we remark upon here is that of asymptotic efficiency, which also follows from (1.3.2): if we observe m independent realisations of an F-distributed random variable and no assumption is imposed on F, the covariance function of the F-Brownian bridge coincides with the Cramér–Rao lower bound of the model and therefore Fm is asymptotically efficient from an information-theoretic point of view. This was first shown by Dvoretzky et al. [1956], and the modern formulation of semiparametric efficiency was introduced by Hájek [1970, 1972] and Le Cam [1972]. Intuitively, it means that, in the limit as m → ∞, the empirical distribution function extracts as much information about F as it is possible to extract from the observations at hand. We remark that asymptotic efficiency is relative to the model at hand because so is the Cramér–Rao lower bound, and what we just stated is that Fm has this property when nothing is known about F. A priori knowledge of some qualitative properties of F may make the Cramér–Rao lower bound decrease, although this is not always the case: as shown by Kiefer and Wolfowitz [1976], it does not change if F is known to be concave or convex, and thus

Fm is still efficient for such a model. However, if stronger qualitative assumptions on F are made it does decrease, such as when F is determined by a finite-dimensional parameter, in which case the bound is simply given by the Fisher information matrix. Under mild regularity assumptions, asymptotic normality of the maximum likelihood estimator around the true parameter holds and the variance of the limiting normal distribution attains the new information lower bound. This is the case for the exponential distribution, for which the variance in (1.3.1) is the lower bound, and the estimator of the intensity introduced above is also efficient. The dependency on the model is important and will play a crucial role in the discussions about asymptotic efficiency in the next chapter. This property, regardless of the assumptions on F, is important for statistical purposes because it implies optimality of the size of confidence regions around estimates and of the statistical power of the resulting testing procedures. In the context above, the latter means that if a test is constructed using the confidence regions arising from the central limit theorems, the probability that it rejects the null hypothesis when it is not true is maximised.


1.4 Estimation of compound Poisson processes from discrete observations

We can now introduce the more involved framework under which the results of the rest of the chapters are developed. In Section 1.4.1 we describe the estimation problem that concerns us in this thesis and touch upon some of its particularities, and in Section 1.4.2 we give a brief history of the problem, which allows us to clearly describe our contributions to it.

1.4.1 The estimation problem

Recall the example of bird migration. Unlike the manager of the ATM, the manager of the nature reserve does not have a database of the times at which a group of birds arrived or left and of the sizes of these groups. Instead, they count or estimate the total number of birds on a (discrete) regular basis, such as once a day. This means that (Xt)t≥0 is not continuously observed up to some time T > 0 but rather discretely observed every ∆ amount of time for some ∆ > 0. In other words, the observations the manager has are X∆, . . . , Xn∆, where n = ⌊T/∆⌋ ∈ N. In Figure 1.2 we have included the path in Figure 1.1 when it is discretely observed with ∆ = 2.5.

Figure 1.2: X∆,...,Xn∆, ∆ = 2.5 and n = 4 (γ = 0, λ = 0.5,F = N(0, 1))
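A minimal simulation of this sampling scheme may help fix ideas; the following sketch (our illustration, with the parameter values of Figure 1.2 and an arbitrary truncation of the jump sequence) generates the discrete observations X∆, . . . , Xn∆.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_discrete_cpp(T=10.0, delta=2.5, gamma=0.0, lam=0.5):
    """Observations X_delta, ..., X_{n*delta} of a compound Poisson process
    with drift gamma, intensity lam and N(0, 1) jumps, n = floor(T/delta)."""
    n = int(T // delta)
    # Jump times of a Poisson process: cumulative Exp(1/lam) waiting times
    # (200 of them amply cover [0, T] for these parameter values).
    jump_times = np.cumsum(rng.exponential(1.0 / lam, size=200))
    jump_sizes = rng.normal(size=200)
    times = delta * np.arange(1, n + 1)
    return np.array([gamma * t + jump_sizes[jump_times <= t].sum()
                     for t in times])

obs = sample_discrete_cpp()    # n = 4 values, as in the setting of Figure 1.2
```

Everything the path does strictly between the sampling times is discarded, which is precisely the loss of information discussed below.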

Unless otherwise stated we assume ∆ is fixed and does not change with n, which is generally called the low-frequency observation scheme. This framework is very common in practice, especially when considering storage systems or chemical-biological reactions, where it may be expensive or impossible to take measurements continuously. We can therefore formulate the problem as follows:

Problem: In this thesis we focus on developing the same type of results as those from Section 1.3 but in the discrete observation regime. More precisely, we consider a one-dimensional compound Poisson process observed at the discrete times ∆, 2∆, . . . , n∆ for some ∆ > 0 fixed, and the problem is that of proving optimal (in the sense of asymptotic efficiency introduced in Section 1.3.2) central limit theorems for the parameters F, λ and γ. The estimation of the first is nonparametric and under the standard supremum norm, whilst that of the other two is parametric and under the Euclidean norm. The most challenging part of the problem is estimating F and it is the main focus of the thesis. Yet, we also discuss estimation of the other parameters in detail.

From the figure above, it is apparent that this estimation problem is much more challenging than that considered in Section 1.3: in between two observations we may miss information such as the number of jumps that took place, their sizes and the times at which they occurred. Therefore, when making inference on the parameters we are confronted with a statistical inverse problem and, a priori, there is no reason to believe estimators with the same type of desirable properties mentioned in Section 1.3.2 exist. Let us elaborate on this focusing on the estimation of the jump distribution. In view of property (ii) on page 5, the increments Zk := Xk∆ − X(k−1)∆, k = 1, . . . , n, are independent and identically distributed. As is customary in the literature, we work with these instead of with the observations of the process because of their tractability. Due to the fact that X0 ≡ 0 Pr-almost surely by property (i) on page 5, the common distribution of the increments is then given by that of

Z1 := X∆ = γ∆ + ∑_{j=1}^{N∆} Yj.    (1.4.1)

By the mutual independence assumption (c) on page 3, we are essentially observing Y1, the random variable with the distribution of interest, 'corrupted' by a sum of a random number of independent copies of itself plus a drift term. This means the inverse problem is non-linear, which is more precisely depicted by the non-linear relationship between ϕ∆ and Fν following from (1.1.4). Then, at first it is not clear that estimators attaining the 1/√n-rate of convergence exist in this inverse setting. Neumann and Reiß [2009] were the first to study the "ill-posedness" of the problem for the more general model of Lévy processes and they showed it directly depends on the decay of the characteristic function of the increments: the faster it decays, the harder it is to invert the problem and the slower the convergence rate of an optimal estimator can be. This parallels the classical deconvolution problem (see Fan [1991]), but with the 'noise' or error being the observation itself and, furthermore, unknown (up to its structural properties). Indeed, and as already pointed out by Nickl and Reiß [2012], the linearised problem is exactly of deconvolution type with such an error, and the problem is generally referred to as being of auto-deconvolution type. We note in passing that from the last display one already sees that the observation is approximately Y1 plus an independent copy of the increment Z1. Therefore, to study the ill-posedness of the problem we need to look at the decay of ϕ∆. Taking t = ∆ > 0 in


(1.1.6) we observe that

inf_{u∈R} |ϕ∆(u)| ≥ e^{−2λ∆} > 0,    (1.4.2)

and the characteristic function does not decay to zero at the tails. Hence, the inverse problem is not ill-posed in terms of convergence rates, and estimators of F that converge with the parametric rate 1/√n and that satisfy functional central limit theorems under certain norms may exist. Indeed, they do exist and can be found in the works of Buchmann and Grübel [2003] and Coca [2015], on which we focus throughout this thesis with particular emphasis on the latter and on extensions of it. In the following section we give a brief description of their results, and in Chapters 2 and 4 we explore their differences and similarities in depth.
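The bound (1.4.2) is easy to check numerically from the exponential form of ϕ∆ in (1.1.6), which for a compound Poisson process reads ϕ∆(u) = exp(∆(iγu + λ(FdF(u) − 1))). The sketch below (our illustration only) uses the parameters of Figure 1.2 with N(0, 1) jumps.

```python
import numpy as np

lam, delta, gamma = 0.5, 2.5, 0.0
u = np.linspace(-50.0, 50.0, 10_001)

# Levy-Khintchine form of the characteristic function of the increment:
# phi_Delta(u) = exp(Delta * (i*gamma*u + lam * (phi_F(u) - 1))).
phi_F = np.exp(-u**2 / 2)          # characteristic function of N(0, 1) jumps
phi_delta = np.exp(delta * (1j * gamma * u + lam * (phi_F - 1)))

# Since |phi_F| <= 1, the modulus never drops below exp(-2 * lam * delta):
# the characteristic function does not vanish at the tails.
assert np.abs(phi_delta).min() >= np.exp(-2 * lam * delta)
```

This non-vanishing denominator is what makes spectral inversion possible at the parametric rate, in contrast with classical deconvolution with smooth error densities.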

1.4.2 A brief history of the solution to the problem

The first to tackle the problem stated above were Buchmann and Grübel [2003], who partially solved it assuming γ = 0, knowledge of λ and that supp(F) ⊆ R+. Using a direct approach (cf. Section 2.1), they constructed an estimator of F given by an infinite series making no regularity assumptions. Due to this series, the estimator of F(x) may diverge as x → ∞ and, to control it, they introduced the exponentially downweighted norm kFk∞,τ = sup_{x≥0} |e^{−τx}F(x)|, where τ ≥ 0 is such that ∫_{R+} e^{−τx} dF(x) < log(2)/(λ∆). Then, they proved a functional central limit theorem under this norm, identifying the limiting Gaussian process. It follows that their estimator satisfies a classical Donsker theorem, i.e. under the standard uniform norm k·k∞ = k·k∞,0, only if λ∆ ≤ log(2) ≈ 0.69. Note that λ∆ represents the expected number of jumps the compound Poisson process makes in a ∆-observation interval, so this condition is quite restrictive in practice. Furthermore, and as discussed in Section 2.1, the introduction of the exponentially downweighted norm is necessary in the remaining cases. Consequently, in general the resulting estimation is only valid at small positive values and, as observed in our and their simulations, the deterioration of the estimates away from the origin is noticeable and severe in some realistic cases. Nonetheless, the work of Buchmann and Grübel [2003] deserves much credit and has been cited by numerous authors, as it is one of the first of its kind. Furthermore, they developed it when the understanding of this type of problem was still very limited, as the first half of Section 6.4 "Concluding remarks" in their work clearly shows:

“The theorems in Section 2 show that the decompounding problem can be solved on the usual n^{−1/2}-level, a fact that we continue to find slightly surprising. At least in the general case we were initially regarding the problem as “ill-posed”, with the corresponding consequences such as a rate lower than n^{−1/2} for the estimates. Of course, the classification of a problem as ill-posed or inverse, etc. depends on the choice of topologies, so our results may be rephrased as saying that there are statistically meaningful choices for the latter where decompounding can be considered to be a perfectly regular problem.”

As mentioned in the previous section, the ill-posedness of the problem was finally understood after the work of Neumann and Reiß [2009], who studied it for more general Lévy processes using the spectral approach first introduced by Belomestny and Reiß [2006]. It follows from their results that, in addition to the compound Poisson case, the parametric rate of convergence 1/√n can only be attained for a class of Lévy processes with infinite activity and Blumenthal–Getoor index 0 (i.e. with 'the mildest infinite activity possible'). In this case, the jump measure driving the jumps is not finite, unlike in the compound Poisson case, due to a non-integrable discontinuity at the origin giving rise to the infinite activity (for more details see Section 2.2.1). Therefore, the classical distribution function F of the jump measure is only well-defined on R−, and on R+ one should consider the integral of the right tail instead. Nickl and Reiß [2012] constructed a kernel-type estimator of this generalised distribution function using the spectral approach and, following the insight of Neumann and Reiß [2009], proved a functional central limit theorem under the norm kGk_{`∞(R\(−ζ,ζ))} := sup_{|x|≥ζ} |G(x)|, for some ζ > 0 fixed, identifying the limiting process. We point out that their result also applies to discretely observed compound Poisson processes whose jump measure has a density with a finite second moment. The machinery they developed and the strategy they took to prove their result were general enough to prove functional central limit theorems under the standard uniform norm in related problems. In particular, Söhl and Trabs [2012] showed such a theorem for the simpler linear inverse problem of deconvolution mentioned above. Nickl and Reiß [2012] conjectured that, in the compound Poisson case, a central limit theorem for the classical distribution function F and with the unrestricted standard supremum norm k·k∞ should hold with their estimator and under the same assumptions.
Indeed, Coca [2015] proved such a result without any knowledge of γ ∈ R and λ > 0, following their strategy but constructing a different estimator and considerably relaxing the regularity assumptions therein. In Chapter 2 we discuss the difficulties faced when trying to use the estimator of Nickl and Reiß [2012] in this case. Regarding the regularity assumptions, Coca [2015] simply assumed a finite logarithmic moment. Moreover, and despite using a kernel estimator, Coca [2015] was able to make inference on a potential discrete component in the jump distribution. To our knowledge, this is the first work in the literature of nonparametric statistics to use such estimators for this purpose, especially when showing functional central limit theorems. More importantly, the strategy he used is general enough to apply to many other kernel estimators. It allowed him to construct statistical tests of each of the components, which are also new. We remark that Buchmann and Grübel [2003] were able to show a central limit theorem for the mass function of the jump measure if this was the only component present in it. In fact, the conclusions of their results are stronger for this estimator than for that of F, although the assumption they make is still stronger than the above-mentioned finite logarithmic moment condition. Coca [2015] also constructed

estimators of λ and γ. For the former he showed a central limit theorem identifying the variance of the asymptotic normal distribution, and for the latter he showed it converges to γ at rate hn/√n, where hn is the bandwidth of the estimators. Due to the lack of ill-posedness of the problem, hn can be chosen to decay exponentially fast to zero and the rate of convergence of the estimator of γ is much faster than the parametric rate 1/√n. This parallels the setting of continuous observations of Section 1.3, in which γ can be perfectly learnt Pr-almost surely with T > 0 arbitrarily small because it is the slope of the observed process in any region with no jumps. No central limit theorem can be shown for this estimator. All the estimators of F in the works mentioned so far, except for those in Coca [2015], have the drawback that they do not return bona fide distribution functions because they are not monotonic and do not necessarily end at 1. The estimators in Coca [2015] do have the former undesirable property which, as we argue in the next section, is unavoidable in these inverse problems unless further transformations are made. In the case of a purely atomic jump distribution supported in R+, Buchmann and Grübel [2004] showed asymptotic normality of two estimators with such a transformation, although these do not necessarily end at 1. Having stated the existence of functional central limit theorems for F in this inverse setting, it remains to touch upon the asymptotic efficiency of these estimators. Trabs [2015a] recently identified the information lower bounds for this problem and for the more general Lévy processes considered by Nickl and Reiß [2012], assuming the associated Lévy measure is supported in R (more precisely, in R^d, but we are only concerned with the case d = 1). Recall that Buchmann and Grübel [2003] assumed F is supported in R+ and hence the lower bound may decrease in this restricted model.
Nonetheless, and as we show in Section 2.6, the covariance function of their limiting process does not attain the Cramér–Rao bound of the unrestricted model and therefore their estimator is not efficient. This point was also briefly mentioned by Buchmann and Grübel [2003], who already anticipated that their covariance may not be optimal; we quote the lines after the paragraph quoted above from Section 6.4 "Concluding remarks" in their work:

“However, numerical experiments such as given in Section 3 remind us of the fact that a good rate is not a guarantee for high precision: comparing the left- hand and the right-hand plots in Figure 1 shows that the constant in front of the rate may be rather high.”

On the other hand, the covariance of the limiting process in the limit theorem in Coca [2015], where no assumption on the support of the Lévy measure is made, does attain the information lower bound and our estimator is the first to be asymptotically efficient. Similarly, the estimator of the intensity constructed by Coca [2015] is efficient too, as we argue in Section 2.6, and it is also the first to be so in the setting we consider. We note in passing that the estimator of the generalised distribution function of Nickl and Reiß [2012] does attain the Cramér–Rao bound for that particular object.

1.5 Organisation and contributions of the thesis

This thesis revolves around our manuscript Coca [2015] which, at the time of writing, is under revision. In every chapter we explore the problem introduced in the previous section focusing on different aspects of it, with each being a novel contribution to the solution and understanding of it. No chapter or section in this thesis has arisen from any collaboration with other individuals.

1.5.1 Chapter 2

As mentioned in Section 1.4.2, the problem has been tackled using two different approaches: the direct approach in Buchmann and Grübel [2003] and the spectral approach in Coca [2015]. The former is a natural approach in the more general problem of nonparametric estimation of compound distributions, whilst the latter has been extensively used in the also more general problem of nonparametric estimation of discretely observed Lévy processes. In Chapter 2 we discuss the literature in these two fields that is directly related to the estimation problem considered in this thesis. In particular, we construct and carefully analyse existing estimators in different nonparametric problems of discretely observed compound Poisson processes using both approaches. This allows us to conclude that the two are formally equivalent and that all the estimators considered are very close relatives of each other. To our knowledge, this has not yet been pointed out in the literature. In fact, the two approaches have even been described as different alternatives (cf. van Es et al. [2007] and Comte et al. [2014]) or even as "quite different" (cf. Duval [2013a]). We identify the reasons why the direct approach must require stronger assumptions than the spectral approach, hence advocating the use of the latter and giving rise to a number of open problems that have not yet been proposed. We also show the close relationship between existing estimators constructed through the spectral approach. Additionally, we study the efficiency of the estimators, which allows us to further explore their differences. We provide new insight into the topic of efficiency in our setting and, in particular, argue how estimators that are not efficient can achieve this property. All these discussions are novel and give much more understanding of the problem than what there currently seems to be in the literature.

1.5.2 Chapter 3

Chapter 3 contains the theoretical results in Coca [2015]. As mentioned in Section 1.4.2, we are the first to solve the general problem stated in Section 1.4.1 and we do so under mild

assumptions. More specifically, under the low-frequency regime of Section 1.4 we construct efficient nonparametric estimators of the jump distribution and of the Lévy distribution (the integrated Lévy measure introduced in Section 1.1). These include a kernel estimator of a potential mass function without knowledge of it, which is a novel addition to the field of nonparametric statistics. Furthermore, the construction generalises to other existing kernel estimators and it gives rise to new procedures to test the presence of an atomic and/or an absolutely continuous component in a distribution. We prove functional central limit theorems using the uniform norm for all quantities, and the limiting Gaussian processes are identified. This allows us to conclude efficiency of both estimators. We also construct novel and efficient estimators of λ and show asymptotic normality of them. These improve upon existing naive estimators in that they are the first to be consistent, and also efficient, in the setting where jumps may cancel each other out in the observation (1.4.1). We also construct a novel estimator of γ and prove that its rate of convergence is exponential in the number of observations. This is an unusual rate and has not yet been observed in the literature discussed in Chapter 2. The main result in Coca [2015] and in Chapter 3 also identifies the asymptotic dependence structure of all quantities involved. We are the first to include this, which is of great interest for applications.

1.5.3 Chapter 4

In Chapter 4 we illustrate the applications, implementation and practical performance of the results from Chapter 3. In particular, in Section 4.2 we discuss the construction of confidence regions, goodness-of-fit tests, two-sample tests and tests for the presence of an atomic and of an absolutely continuous jump component. In Section 4.3 we give numerous details of the implementation required to use these in practice and also include the implementation using the estimator of the jump distribution proposed by Buchmann and Grübel [2003]; indeed, they only included the implementation of their estimator of the mass function, which is what they refer to in the second quote we gave in Section 1.4.2. Lastly, in Section 4.4 we illustrate, analyse and compare their practical performance through simulations in several realistic settings. We also include the study of the empirical coverage of the respective confidence regions, both individually and jointly in the parameters, in order to analyse their fitness for testing procedures. These studies are scarce, if not nonexistent, in existing literature on our problem and on closely related ones. Furthermore, due to our detailed analysis, we identify behaviours that have not yet been reported in related literature and, therefore, the section serves as a solid basis for further interesting investigations that we propose there. We also give practical recommendations to circumvent issues arising in the implementation and simulations of the estimators. Most of the discussions of Section 4.2 appeared in the first version of the manuscript Coca [2015] but not in the second one, and the results in the rest of the sections are new contributions to the literature.

Chapter 2

Unifying existing literature

The problem we consider in this thesis falls into two more general problems: nonparametric estimation of compound distributions and of discretely observed Lévy processes. Nonparametric estimators in these problems have generally been constructed through two natural but different approaches, especially to prove functional central limit theorems: what we refer to as the direct approach and the so-called spectral approach. We discuss them in Sections 2.1 and 2.2, respectively, focusing mostly on nonparametric estimation of F and related quantities in the compound Poisson case. We review closely related literature along the way. In Section 2.3 we explicitly identify the links between these two (generally regarded as different) approaches and conclude that the existing estimators constructed from them are very close relatives of each other. We also identify why, in the low-frequency regime introduced in Section 1.4, results following from the direct approach do, and must, require stronger assumptions than those derived from the spectral one, and thus argue that the latter should be the preferred approach when developing results in this area. In Section 2.4 we discuss existing estimators of the intensity and of the drift, and in Section 2.5 we discuss estimation of the process as a whole. Finally, in Section 2.6 we study the efficiency of all the estimators and make several new remarks on the topic. In Sections 2.3 and 2.6 we propose some new open problems. We therefore believe this chapter considerably increases the understanding of the problem and related ones, and we consider it a contribution to the literature in itself.

2.1 The direct approach

The direct approach is a natural one when nonparametrically estimating compound distributions. Therefore, we briefly introduce these distributions prior to introducing the approach.


2.1.1 Compound distributions

In Section 1.4 of the last chapter we set up the framework in which the results of the rest of the thesis are developed. In particular, in (1.4.1) we gave the expression for the quantities we work with throughout. In view of it and if we assume γ = 0, the increment of a (zero-drift) compound Poisson process can be written as

Z = ∑_{j=1}^{M} Yj,    (2.1.1)

where M is a Poisson random variable with parameter λ∆ and Y1, Y2, . . . is any sequence of i.i.d. real-valued random variables with distribution F. With this interpretation, it is said that Z follows a compound Poisson distribution and, if we let M have a general distribution supported in N and drop the i.i.d. assumption on the jumps, it follows a general compound distribution. Therefore, our problem falls into that of estimating the defining parameters of compound distributions from n (independent) observations of them, Z1, . . . , Zn. Estimating discretely (and possibly randomly) observed renewal processes also falls into this problem, and that of estimating some queues is intimately related to it. These models are of great interest in a myriad of modern applications and we refer the reader to Daley and Vere-Jones [1998], Asmussen [2008] and Embrechts et al. [2013] for more details. Consequently, in the last decade and a half, research on nonparametric estimation of these distributions, and in particular of compound Poisson distributions, has been very active. In this section we review the literature that is most related to Coca [2015], focusing our attention on those works concerning estimation of compound Poisson distributions. In order of appearance, these are Bøgsted and Pitts [2010], Buchmann and Grübel [2003], Hansen and Pitts [2006], Duval [2013a], Comte et al. [2014] and Buchmann and Grübel [2004]. We heuristically discuss the construction of their estimators and we refer the reader to the respective works for the rigorous results.
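To fix ideas, sampling from a compound Poisson distribution is straightforward; the sketch below (our illustration, not part of the original text; the Exp(1) jump law and all parameter values are arbitrary choices) simulates such draws and checks Wald's identity E Z = E M · E Y1 by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)
lam_dt, n = 1.5, 200_000    # lambda * Delta and the number of samples

# Draw M ~ Poisson(lam_dt) for each sample and pile up M i.i.d. Exp(1)
# jumps per sample; bincount sums the jumps belonging to each sample.
M = rng.poisson(lam_dt, size=n)
Y = rng.exponential(1.0, size=M.sum())
Z = np.bincount(np.repeat(np.arange(n), M), weights=Y, minlength=n)

# Wald's identity: E Z = E M * E Y_1 = lam_dt * 1, up to Monte Carlo error.
mean_Z = Z.mean()
```

Samples of exactly this form are the raw material for all the estimators reviewed in this section.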

2.1.2 Nonparametric estimation with the direct approach

Let %i denote the probability of M being equal to i ∈ N which, for a compound Poisson distribution with parameter λ∆, is e^{−λ∆}(λ∆)^i/i!. Assume the jumps Y1, Y2, . . . are i.i.d. with distribution F. Then, with probability %i, Z is the sum of i i.i.d. random variables with distribution F. If we denote the distribution function of Z by G, this means that

G(x) = ∑_{i=0}^{∞} %i F^{∗i}(x),    x ∈ R,    (2.1.2)

where ∗i denotes the i-fold convolution, and we write

F^{∗i}(x) := ∫_{−∞}^{x} dF^{∗i},    x ∈ R,    for any distribution function F.    (2.1.3)

The mapping taking F to G following from display (2.1.2) is generally called the convolution operator. Hence, most of the literature on nonparametric estimation of the jump distribution of a compound distribution focuses on inverting this operator appropriately under different settings. In essence, this undoes the compounding operation in Z by untangling the series above and gives an expression for F in terms of G. Then, estimators can be constructed by estimating G from the observations Z1, . . . , Zn and plugging this into the corresponding expression for F. Generally, and partly following the discussions in Bøgsted and Pitts [2010], the first step to invert the convolution operator is to write the expression equivalent to (2.1.2) for some transform such as the Fourier transform:

FdG(u) = ∑_{i=0}^{∞} %i (FdF(u))^i,    u ∈ R,    (2.1.4)

or, more concisely,

∞ X i FdG = Γ(FdF ), where Γ(z) := %iz , z ∈ C . (2.1.5) i=0

This is a power series with radius of convergence greater than or equal to 1, so (2.1.4) is well-defined because sup_{u∈R} |FdF(u)| ≤ 1. The idea then is to invert or 'revert' the series and take inverse Fourier transforms: define Γ0 := Γ − %0 and dG0 := dG − %0 δ0, where δ0 is Dirac's delta distribution at zero. Then we have that, formally, for some sequence of real numbers π1, π2, . . . ,

FdF(u) = ∑_{i=1}^{∞} πi (FdG0(u))^i =: Γ0^{−1}(FdG0(u)),    u ∈ R,    (2.1.6)

and the resulting expression is

dF = F^{−1}(Γ0^{−1}(FdG0)) = ∑_{i=1}^{∞} πi dG0^{∗i},    (2.1.7)

where, in line with the definition in (1.1.3),

F^{−1}f := (1/(2π)) ∫_R e^{−i·u} f(u) du

for any f for which this exists in a distributional sense. Note that we have written dF


in terms of dG0 and not of dG. In view of (2.1.1), the probability of Z being an empty sum is %0, so dG0 represents the component of the distribution function of Z arising from sums of one or more jumps. This is the only component containing information about F and hence it is not surprising that it features in (2.1.7) in place of dG. The formal expression (2.1.7) is then the departure point of Buchmann and Grübel [2003], Hansen and Pitts [2006] and Bøgsted and Pitts [2010] to construct estimators of the jump distribution F, and of Duval [2013a] and Comte et al. [2014] to construct estimators of its density when it exists. The first to take this approach were Buchmann and Grübel [2003], who developed their results for compound Poisson distributions in order to estimate discretely observed compound Poisson processes, and named the problem decompounding. The approach was later extended to compound geometric distributions by Hansen and Pitts [2006] to estimate M/G/1 queues and, finally, Bøgsted and Pitts [2010] solved the problem for more general compound distributions. All three assumed knowledge of the compounding distribution M which, in the compound Poisson case, amounts to assuming knowledge of the intensity λ. However, we emphasise that this is not needed to use the direct approach and, indeed, Duval [2013a] and Comte et al. [2014] developed their results assuming no knowledge of λ. Formally, (2.1.7) can be integrated to give

F(x) = Λ(G0)(x) := ∑_{i=1}^{∞} πi G0^{∗i}(x),    x ∈ R,    (2.1.8)

where we have used the notation introduced in (2.1.3). Then, if we have knowledge of the sequence π1, π2, . . . , we can estimate G0 from the observations Z1, . . . , Zn, plug its expression into the last display and obtain an estimator of F. In order to estimate G0 we need to estimate G and %0. The former can be achieved by the empirical distribution function introduced in Section 1.3.1. Yet, how to estimate %0 is not so clear: it is the probability of Z being the sum of no jumps, so the first idea is to estimate it by the proportion of zero observations. However, some zero observations may be the result of cancellations of several jumps, since the mass of dG at zero can be larger than %0. It is easy to check that the probability of this happening is zero when dF has no discrete component in R+ and/or R−. This includes the case of dF being supported only in R+ or R−, and that of dF being absolutely continuous with respect to Lebesgue's measure. Therefore, in such situations we can use the naive estimator

$$\tilde{\varrho}_{0,n} := \frac{1}{n}\sum_{k=1}^{n} \mathbf{1}_{\{0\}}(Z_k). \qquad (2.1.9)$$

This is one of the reasons why Buchmann and Grübel [2003], Hansen and Pitts [2006] and Bøgsted and Pitts [2010] assume dF is supported in R^+ although, as we mention later, it

is not the only one. With this assumption, let

$$\widetilde{G}_{0,n}(x) := \frac{1}{n}\sum_{k=1}^{n} \mathbf{1}_{[0,x]}(Z_k) - \tilde{\varrho}_{0,n} = \frac{1}{n}\sum_{k=1}^{n} \mathbf{1}_{(0,x]}(Z_k), \qquad x \in \mathbb{R},$$

and, thus, F can be estimated by

$$\widetilde{F}_n(x) := \Lambda(\widetilde{G}_{0,n})(x) := \sum_{i=1}^{\infty} \pi_i\, \widetilde{G}_{0,n}^{*i}(x), \qquad x \in \mathbb{R}. \qquad (2.1.10)$$
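As a purely numerical illustration (not part of the cited authors' implementations), the truncated plug-in estimator (2.1.10) can be sketched in Python for the compound Poisson case with Exp(1) jumps. All parameters, the truncation level I and the resampling approximation of the convolution powers of G̃0,n are ad hoc choices; the coefficients πi are the compound Poisson ones recalled in the text just below, and λ∆ < log 2 so that the series converges.

```python
import numpy as np

rng = np.random.default_rng(1)

lam, Delta = 0.5, 1.0   # lambda*Delta < log 2, so the series below converges
n, I = 50000, 12        # sample size and (ad hoc) truncation level

# simulate n increments of a compound Poisson process with Exp(1) jumps
counts = rng.poisson(lam * Delta, size=n)
Z = np.array([rng.exponential(1.0, size=c).sum() for c in counts])

Z_nz = Z[Z > 0]          # increments containing at least one jump
p = len(Z_nz) / n        # total mass of dG0_tilde, i.e. 1 - rho_tilde_{0,n}

# compound Poisson coefficients pi_i = (-1)^{i+1} e^{i*lam*Delta}/(lam*Delta*i)
i_idx = np.arange(1, I + 1)
pi = (-1.0) ** (i_idx + 1) * np.exp(i_idx * lam * Delta) / (lam * Delta * i_idx)

def F_tilde(x, nsim=20000):
    """Truncated plug-in estimator (2.1.10); the convolution powers
    G0_tilde^{*i}(x) are approximated by resampling i-fold sums."""
    est = 0.0
    for i in i_idx:
        sums = rng.choice(Z_nz, size=(nsim, i)).sum(axis=1)
        est += pi[i - 1] * p ** i * np.mean(sums <= x)
    return est

x = 1.0
print(F_tilde(x), 1.0 - np.exp(-x))   # estimator vs. true Exp(1) distribution F
```

The resampling step is only a convenient stand-in for the exact (empirical-measure) convolution powers, which are combinatorially expensive to evaluate directly.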

Subject to the aforementioned specialisations of the sequence π1, π2, . . . and setting ∆ = 1 without loss of generality, this is the estimator Buchmann and Grübel [2003], Hansen and Pitts [2006] and Bøgsted and Pitts [2010] propose. A priori, it is not clear that the infinite sum in (2.1.10) converges. In fact, it may well diverge as x → ∞: noting that the mass of dG̃0,n is 1 − %˜0,n, the formal limit is

$$\lim_{x\to\infty} \widetilde{F}_n(x) = \sum_{i=1}^{\infty} \pi_i\,(1 - \tilde{\varrho}_{0,n})^i.$$

In general, the convergence of this series is not guaranteed, as is easily seen for the compound Poisson distribution with parameter λ∆ > 0. In this case, πi = (−1)^{i+1} e^{iλ∆}/(λ∆ i), so

$$\lim_{x\to\infty} \widetilde{F}_n(x) = \frac{1}{\lambda\Delta}\sum_{i=1}^{\infty} (-1)^{i+1}\, \frac{\bigl(e^{\lambda\Delta}(1 - \tilde{\varrho}_{0,n})\bigr)^i}{i}. \qquad (2.1.11)$$

This is the famous Mercator series and it converges if e^{λ∆}(1 − %˜0,n) ≤ 1. In view of the definition of %˜0,n in (2.1.9), together with the assumption of dG having mass %0 at zero and the law of large numbers, %˜0,n is arbitrarily close to %0 on sets of probability approaching 1 as n → ∞. Noting that for the compound Poisson distribution %0 = e^{−λ∆}, we can thus take the condition for the last display to converge (on sets of probability approaching 1 as n → ∞) to be e^{λ∆} − 1 < 1. Equivalently, λ∆ < log 2, in which case,

$$\lim_{n\to\infty}\lim_{x\to\infty} \widetilde{F}_n(x) = \frac{\log(e^{\lambda\Delta})}{\lambda\Delta} = 1,$$

as expected. Intuitively, λ∆ is the expected number of jumps in an observation of Z. Since log 2 < 1, the condition λ∆ < log 2 means that on many occasions we expect to observe directly the jumps whose distribution we are trying to estimate. These calculations also show that for the more interesting case of larger values of λ∆, (2.1.10) diverges as x → ∞. The assumption of dF being supported in R^+ allows Buchmann and Grübel [2003], Hansen and Pitts [2006] and Bøgsted and Pitts [2010] to circumvent this problem and to justify all of the steps we took to arrive at (2.1.8), at some price: for any τ > 0, they introduce the norm ‖f‖∞,τ := sup_{x≥0} e^{−τx}|f(x)| and the space Dτ := {f :


f càdlàg and lim_{x→∞} e^{−τx} f(x) exists}. The metric space (Dτ, ‖·‖∞,τ) is Banach and weak convergence in it is with respect to the σ-algebra generated by its closed balls as defined in Pollard [1984], p. 199. Then, assuming τ is large enough, they prove that the series in (2.1.8) converges in Dτ, and so does that in (2.1.10) with probability tending to 1 as n → ∞. Furthermore, they show that, as n → ∞,

$$\sqrt{n}\,\bigl(\widetilde{F}_n - F\bigr) \to B^F \qquad (2.1.12)$$

in distribution in (Dτ, ‖·‖∞,τ), where B^F is a centred Gaussian process with covariance structure

$$\Sigma^{B^F}_{x,y} := \int_{[0,x]}\int_{[0,y]} G_0\bigl((x - z_1) \wedge (y - z_2)\bigr)\, H(dz_1)\, H(dz_2) \;-\; G_0 * H(x)\; G_0 * H(y), \qquad x, y \ge 0,$$

with

$$H(x) := \sum_{i=1}^{\infty} i\,\pi_i\, G_0^{*(i-1)}(x), \qquad x \ge 0.$$

In their proofs they work with the Laplace transform L in place of the Fourier transform to justify the formal calculations above. For their (damped) Laplace transforms to be smaller than the radius of convergence of the series in (2.1.6), they need to assume τ is large enough. In the case of a compound Poisson distribution with parameter λ∆, the condition is

$$\mathcal{L}dF(\tau) := \int e^{-\tau x}\, dF(x) < \frac{\log 2}{\lambda\Delta}. \qquad (2.1.13)$$

This agrees with our discussion above, in which we argued that if λ∆ < log 2 then F̃n is well-defined without having to introduce any dampening. The results of Buchmann and Grübel [2003], Hansen and Pitts [2006] and Bøgsted and Pitts [2010] are important for several reasons: they were the first of their kind; they apply to very general compounding distributions M and make no further assumptions on F than it being supported in R^+; and, in addition to resulting in 1/√n-consistent estimation of F, confidence regions and goodness-of-fit tests for F can be developed. Nevertheless, they have some undesirable practical drawbacks: they assume knowledge of the sequence

π1, π2, . . ., or in particular of λ in the compound Poisson case; they require F to be supported in R^+; and the convergence under the exponentially downweighted supremum norm ‖·‖∞,τ implies the resulting estimation of F is not valid away from the origin. In line with this, the confidence regions derived from their results are not ‘bands’ but, in general, diverge exponentially away from the origin. In addition, and as we argue in Section 2.6 below, the covariance Σ^{B^F}_{x,y} does not coincide with the lower bound developed by Trabs [2015a] for the compound Poisson case. The qualitative assumption of the support of F being in R^+ means the lower bound in the model of Buchmann and Grübel [2003] cannot be larger and hence their estimator is not asymptotically efficient. As mentioned in Section


1.3.2, this has the drawbacks that the size of the resulting confidence regions and the power of the goodness-of-fit tests following from their results are not optimal. Imposing some alternative but mild assumptions on F, and in the compound Poisson setting, in Section 2.2.2 we construct an estimator that avoids all these limitations and that is asymptotically efficient. Duval [2013a] and Comte et al. [2014] make inference on a discretely observed (zero-mean) compound Poisson process with unknown, although bounded, intensity λ assuming dF is absolutely continuous with respect to Lebesgue's measure. Furthermore, they develop their results under the so-called high-frequency observation regime, in which

∆ = ∆n → 0 and n∆n → ∞ as n → ∞. This means that, as more data becomes available, jumps are directly observed in the limit and the number of these tends to infinity. Therefore the ill-posedness of the problem vanishes asymptotically and the problem should simplify to the classical problem of density estimation from i.i.d. observations. Indeed, the recent results of Mariucci [2016] imply both experiments are asymptotically equivalent (in the Le Cam sense), which was already hinted at by the statement of the results of Duval [2013a] and Comte et al. [2014]. Their estimators are built upon the following observations. Under the assumption of dF having a density f and in view of (2.1.2), dG = %0δ0 + (1 − %0)g0 for a density g0, and these densities are explicitly related by formal expression (2.1.7). Hence, recalling that πi = (−1)^{i+1} e^{iλ∆}/(λ∆ i) and %0 = e^{−λ∆}, and due to ∆ = ∆n → 0, we have that for a finite I ∈ N

$$f(x) \approx \sum_{i=1}^{I+1} \frac{(-1)^{i+1}}{i}\, \frac{(e^{\lambda\Delta} - 1)^i}{\lambda\Delta}\, g_0^{*i}(x), \qquad x \in \mathbb{R}. \qquad (2.1.14)$$

Two of the standard ways to estimate a density from i.i.d. observations are through wavelet and kernel estimators. Duval [2013a] and Comte et al. [2014] take these two approaches, respectively, and input the resulting density estimates of g0 on the right hand side of the last display to obtain an estimator of f. They also estimate the coefficients in the linear combination: Duval [2013a] plugs in the estimator

$$\tilde{\lambda}_n := -\frac{1}{\Delta}\log\bigl(\tilde{\varrho}_{0,n}\bigr) \qquad (2.1.15)$$

in them, motivated by the fact that %0 = e^{−λ∆} and that the naive estimator (2.1.9) is consistent because they assume dF has no atoms; and Comte et al. [2014] propose an alternative quantity that directly estimates the coefficients. Below we make a comparison with the estimator of f of Comte et al. [2014], so we give a few details of it now. They estimate g0^{*i} using the kernel estimators for convolutions of densities proposed in Chesneau et al. [2013]. Note that g0 is the density of the conditional distribution of the increments given that they are not zero. Thus, conditioning on the number of non-zero increments being m = m(n) ∈ N and denoting them by Z̃1, . . . , Z̃m, the

empirical version of its characteristic function is

$$\varphi_{0,n}(u) := \frac{1}{m}\sum_{k=1}^{m} e^{iu\widetilde{Z}_k}, \qquad u \in \mathbb{R}.$$
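The basic ingredients appearing so far — the naive estimator (2.1.9), the intensity estimator (2.1.15) and the empirical characteristic function of the non-zero increments above — can be sketched numerically as follows. The simulation (Exp(1) jumps and the chosen λ, ∆, n) is a hypothetical concrete case, not taken from the cited references.

```python
import numpy as np

rng = np.random.default_rng(7)
lam, Delta, n = 2.0, 0.25, 100000     # hypothetical intensity and sampling step

# compound Poisson increments with Exp(1) jumps; an increment is exactly zero
# iff it contains no jump (dF is atomless, so jumps cannot cancel)
counts = rng.poisson(lam * Delta, size=n)
Z = np.array([rng.exponential(1.0, size=c).sum() for c in counts])

rho0_tilde = np.mean(Z == 0)               # naive estimator (2.1.9)
lam_tilde = -np.log(rho0_tilde) / Delta    # estimator (2.1.15)

Z_nz = Z[Z > 0]                            # the m non-zero increments
phi_0n = lambda u: np.mean(np.exp(1j * u * Z_nz))  # empirical char. function above

print(lam_tilde, abs(phi_0n(1.0)))
```

With dF atomless, λ̃n is consistent; the code prints the estimated intensity, which should be close to the true λ = 2 for this sample size.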

Then, for any i ≥ 1 they propose the estimator

$$g_{0,n}^{*i}(x) := \mathcal{F}^{-1}\Bigl[(\varphi_{0,n})^i\, \mathbf{1}_{[-h_n^{-1},\, h_n^{-1}]}\Bigr](x), \qquad x \in \mathbb{R}, \qquad (2.1.16)$$

for some appropriate choice of hn, the so-called bandwidth, such that hn → 0 as n → ∞. Note that the indicator function can be introduced into the i-th power and merged with ϕ0,n. By the property that products of functions in the frequency domain become convolutions in the ‘space’ domain, we see that their estimator of f is essentially a plug-in estimator where g0 in (2.1.14) is substituted by a kernel estimator. After constructing these estimators, Duval [2013a] and Comte et al. [2014] assume the unknown density f belongs to a Besov and to a Sobolev ball, respectively, and show they are adaptive to its regularity with the L^p(C), p ≥ 1, and L²(R) losses, respectively, where C ⊆ R is compact. As anticipated, the minimax rates of the problem coincide with those of the classical density estimation problem. Notice that the estimators obtained by setting I = 0 are the classical density estimators, which is a natural choice since in the limit we are in the continuous observations regime of Section 1.3. Yet, this estimator only achieves the minimax rates of the problem if ∆n decays sufficiently fast, and this condition can be relaxed by larger choices of I. Lastly, note that, due to λ∆n < log 2 for n large enough, the tails of the series in (2.1.8) and (2.1.10) are negligible, so there is no need to consider the spaces Dτ for those expressions to be well-defined. Therefore, in this observation regime, the estimator of Buchmann and Grübel [2003] and the estimators of F resulting from integrating the density estimators of Duval [2013a] and Comte et al. [2014] are very similar for finite samples. The approach of inverting the convolution operator has been used by other authors to solve related problems. Antunes and Pipiras [2011] extend the results of Bøgsted and Pitts [2010] to unknown compounding distribution functions M. For this they need to assume they have i.i.d.
observations of M, which departs from the ‘know-nothing’ approach we take here. Concerning density estimation, Duval [2013b] extends the results of Duval [2013a] to renewal processes and Comte et al. [2015] extend those in Comte et al. [2014] to mixed compound Poisson processes. The difference of the latter with our setting is that the intensity λ is random and, therefore, instead of assuming they have n observations of one path, they assume they have one discrete observation of n independent paths. This falls into the problem of estimating renewal processes that have been observed discretely at random. Also with the direct approach, Duval [2014] studies the identifiability of a discretely observed compound Poisson process when ∆ = ∆n → ∞ as n → ∞.


In addition to the estimator of F discussed above, Buchmann and Grübel [2003] constructed an estimator of the jump mass function assuming dF is purely discrete and supported in N \{0}. Its expression can be found in Section 4.3.2 and here we limit ourselves to giving the general idea behind it. Note that in this setting, dG is also purely atomic but with support in N. The construction of their discrete estimator is similar to that of the general estimator of F discussed above: they depart from the so-called Panjer recursion, which expresses the mass of the atoms of dG recursively and in terms of those of dF (we emphasise that this recursion can be derived from (2.1.2)); they invert the series and recursively write the mass of the atoms of dF in terms of those of dG; finally, they input the empirical value of the mass of each atom of dG. We remark that if λ is substituted by λ̃n in πi = (−1)^{i+1} e^{iλ∆}/(λ∆ i) in (2.1.10), then, crucially, the expression for the estimator of the mass of an atom of dF we just constructed corresponds to the evaluation of (2.1.10) at the corresponding integer. Therefore, the cumulative sum of these corresponds to their general estimator of F in Buchmann and Grübel [2003] with the empirical version of its coefficients. Assuming the mass function of dF is in the space

$$\Bigl\{ a \in [0,1]^{\mathbb{N}\setminus\{0\}} : \sum_{j=1}^{\infty} a_j = 1 \ \text{and}\ \sum_{j=1}^{\infty} a_j^{1/2} < \infty \Bigr\},$$

they prove a central limit theorem in this space with the norm Σ_{j=1}^∞ |aj| and identify the limiting centred Gaussian process. In Section 2.6 we show that, unlike Σ^{B^F}, its covariance coincides with the lower bound developed by Trabs [2015a], as a consequence of the substitution of λ by λ̃n in the coefficients πi. Efficiency of the estimator does not necessarily follow because we ignore whether the qualitative assumption of the distribution being discrete and supported in R^+ makes the bound decrease. Note that, due to the connection with (2.1.10), whose series has alternating signs, the estimator of the mass of each atom is not necessarily contained in [0, 1]. Subsequently, in Buchmann and Grübel [2004] they propose a transformation to circumvent this and a very similar estimator combining likelihood ideas with this transformation. They show the estimators converge at the parametric rate, and identify the limiting processes of the finite-dimensional distributions. These are no longer Gaussian, except when dF does not have compact support, in which case they coincide with the limiting Gaussian process in the discrete case considered by Buchmann and Grübel [2003].

2.2 The spectral approach

The spectral approach is a natural one when nonparametrically estimating discretely observed Lévy processes. Therefore, we briefly introduce these processes prior to introducing the approach.


2.2.1 Lévy processes

The definition of a Lévy process has already been introduced in Section 1.1 so we do not give it again. Instead, here we focus on giving intuition about these processes and on the general Lévy–Khintchine formula for their characteristic function. Roughly speaking, a Lévy process is a generalisation of a compound Poisson process with drift (recall its expression in (1.1.1)) in that it contains an independent Brownian motion and an independent pure jump process with infinitely many small jumps in any finite interval. Due to this rich structure and their tractability they have proven very useful when modelling systems with more complicated random shocks than those observed in the applications mentioned in Section 1.1. They have been used in many fields and are particularly important in mathematical finance (cf. Cont and Tankov [2003] and references therein for instance). The celebrated Lévy–Khintchine representation (cf. Chapter 2.8 in Sato [1999]) states that if (Xt)t≥0 is a one-dimensional real-valued Lévy process then its characteristic function can be written as

$$\varphi_t(u) = \exp\biggl( t \Bigl( ibu - \frac{\sigma^2 u^2}{2} + \int_{\mathbb{R}\setminus\{0\}} \bigl(e^{iux} - 1 - iux\,\mathbf{1}_{|x|\le 1}\bigr)\, \nu(dx) \Bigr) \biggr), \qquad (2.2.1)$$

where b ∈ R is a drift parameter, σ² ≥ 0 is a diffusion parameter and ν, the Lévy measure, is a σ-finite nonnegative measure concentrated on R \{0} satisfying

$$\int_{\mathbb{R}\setminus\{0\}} 1 \wedge x^2\, \nu(dx) < \infty. \qquad (2.2.2)$$
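For concreteness, (2.2.1) can be evaluated numerically for a hypothetical triplet (b, σ², ν). Here we take ν to be λ times the standard normal density, for which the integral has a closed form (the small-jump compensator vanishes by symmetry), so a plain trapezoidal quadrature can be checked against it. All numerical values are ad hoc.

```python
import numpy as np

b, sigma2, lam, t, u = 0.3, 0.04, 1.0, 2.0, 1.7   # hypothetical triplet and arguments

# Levy measure nu(dx) = lam * N(0,1) density dx, discretised on a grid
x = np.linspace(-10.0, 10.0, 200001)
nu = lam * np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# integrand of the Levy-Khintchine exponent, including the compensator
integrand = (np.exp(1j * u * x) - 1 - 1j * u * x * (np.abs(x) <= 1)) * nu
dx = x[1] - x[0]
integral = np.sum((integrand[1:] + integrand[:-1]) / 2) * dx   # trapezoidal rule

phi = np.exp(t * (1j * b * u - sigma2 * u**2 / 2 + integral))

# closed form for this nu: int (e^{iux}-1) nu(dx) = lam*(e^{-u^2/2} - 1),
# and the compensator integral is zero because nu is symmetric
phi_exact = np.exp(t * (1j * b * u - sigma2 * u**2 / 2
                        + lam * (np.exp(-u**2 / 2) - 1)))
print(abs(phi - phi_exact))
```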

In view of (1.1.4), we readily see that our compound Poisson setting corresponds to σ = 0, ν being a finite measure, and b = γ + ∫ x 1_{|x|≤1} ν(dx). In fact, by the definition of a compound Poisson process, see Chapter 1.4 in Sato [1999], and by the Lévy–Khintchine representation above, ν has finite mass if and only if the jump component of the Lévy process it arises from is a compound Poisson process; this is called the finite activity case. The reason why ν is not a finite measure for the remaining Lévy processes is that in general it has a nonintegrable singularity at the origin controlled by condition (2.2.2). Indeed, it is this singularity that gives rise to the infinitely many small jumps in any finite interval mentioned above. This is referred to as the infinite activity case and is generally split into processes of (locally) finite variation and of (locally) infinite variation, depending on whether

$$\int_{\mathbb{R}\setminus\{0\}} (1 \wedge |x|)\, \nu(dx) \qquad (2.2.3)$$

is finite or infinite, respectively. This division of the degree of activity of the small jumps can be further refined by introducing the Blumenthal–Getoor index of the process, defined

as

$$\inf\Bigl\{ p \ge 0 : \int_{\mathbb{R}\setminus\{0\}} (1 \wedge |x|^p)\, \nu(dx) < \infty \Bigr\} = \inf\Bigl\{ p \ge 0 : \lim_{|u|\to\infty} \frac{\Psi(u)}{|u|^p} = 0 \Bigr\},$$

where Ψ is the so-called Lévy exponent and satisfies ϕ_t(u) =: exp(tΨ(u)), u ∈ R. Whilst the first expression is the usual definition of the index, the second characterisation provides us with more insight for statistical purposes: if the Blumenthal–Getoor index is strictly positive or if there is a Gaussian component in the Lévy process, the tails of the characteristic function decay exponentially fast in the sense of the last display. If the index and σ are zero, they decay polynomially fast if the infimum is not attained, and they are bounded below if it is attained. The last case of course corresponds to the compound Poisson process. The Blumenthal–Getoor index can be defined for the more general Feller processes and, for more details, we refer the reader to Sections 5.2 and 5.3 in Böttcher et al. [2014], and to Section 5 in Schilling [1998] for a proof of the second characterisation in the last display.

2.2.2 Nonparametric estimation with the spectral approach

In this section we are concerned with nonparametric estimation of the Lévy measure, or of quantities closely related to it, of a ∆-discretely observed Lévy process. We therefore assume we have observations X_∆, . . . , X_{n∆} for some ∆ > 0 and n ∈ N, and, by property (ii) on page 5, we work with the i.i.d. increments Zk := X_{k∆} − X_{(k−1)∆}, k = 1, . . . , n. Just as above, when we speak of the observations, we are referring to the increments. The starting point when estimating compound distributions through the direct approach was the explicit expression (2.1.2) of the distribution of the increments in terms of the jump distribution. However, due to the richer structure of Lévy processes, no such general expression for their increments exists. Instead, a more natural way of relating the law of the increments with the defining parameters of the process is through the Lévy–Khintchine formula for its characteristic function given by (2.2.1) with t = ∆. Intuitively, this expression has untangled at once the parameters implicitly contained in the observations. In the compound Poisson case, we immediately see this by comparing it to the much-less-informative relationship between them of the direct approach in (2.1.2). Consequently, the focus is no longer on decompounding (at least explicitly) but on how to isolate the influence that each parameter has on the characteristic function, or equivalently on the observations, and on how to invert it. As we conclude in Section 2.3, in the compound Poisson case this approach gives similar estimators but with more desirable properties that hold under milder assumptions. Prior to embarking on the construction of the estimators for this particular case, we discuss estimation for more general Lévy processes. We remark that the spectral approach we use in this section was first introduced by Belomestny and Reiß [2006].

In Section 2.2.1 we saw that, in general, the Lévy measure has a nonintegrable singularity at the origin. Therefore, the only Lévy processes for which the cumulative distribution function of this measure exists are those whose jump component is a compound Poisson process. From now on, let us denote this quantity by

$$N(x) := \int_{-\infty}^{x} d\nu = \lambda F(x), \qquad x \in \mathbb{R}$$

(this is a capital ν and should not be confused with the Poisson process (Nt)t≥0 in the previous chapter or with a normal distribution N(·, ·)). In the rest of the cases, existing literature in the field has considered functionals of the Lévy measure that in one way or another avoid its singularity at the origin. For instance, the generalised Lévy distribution

$$N^G(x) := \begin{cases} \int_{-\infty}^{x} d\nu & \text{if } x < 0, \\[2pt] \int_{x}^{\infty} d\nu & \text{if } x > 0, \end{cases}$$

and more general functionals that smooth out the singularity, such as

$$\mathcal{N}(x) := \int_{-\infty}^{x} \rho(y)\, y^2\, \nu(dy), \qquad x \in \mathbb{R},$$

where, for some C > 0, the function 0 < ρ(x) ≤ C (1 ∧ x^{−2}) satisfies some regularity assumptions. Neumann and Reiß [2009] studied the ill-posedness of the inverse problem of estimating quantities of this type and showed its auto-deconvolution structure mentioned in Section 1.4.1. We note in passing that Chen et al. [2010] also studied the optimal rates of convergence in a related problem, obtaining the same results and others that complement them. Therefore, and in view of the remark about the decay of the characteristic function made at the end of Section 2.2.1, the parametric rate of convergence 1/√n may only be attained when σ and the Blumenthal–Getoor index are 0. Indeed, Nickl and Reiß [2012] proved a Donsker theorem for the generalised Lévy distribution N^G for processes whose characteristic function decays at most with a low polynomial order. These, of course, include the compound Poisson process, so we now heuristically construct their estimator. By the remarks at the end of Section 2.2.1, these processes are (locally) of bounded variation, in which case the characteristic function of the increments can be written as

$$\varphi := \varphi_\Delta(u) = \exp\biggl( \Delta\Bigl( ibu + \int_{\mathbb{R}\setminus\{0\}} (e^{iux} - 1)\, \nu(dx) \Bigr) \biggr), \qquad (2.2.4)$$

for some b ∈ R that, for simplicity of the exposition, we assume to be zero from now on. We note in passing that, in the compound Poisson case, ϕ corresponds to FdG of Section 2.1.2 and we use both notations interchangeably throughout. In view of condition (2.2.3) and assuming ν has finite first moment, the Lévy measure can be reached if we differentiate

the last display once: we have that

$$\frac{\varphi'(u)}{\varphi(u)} = i\Delta \int_{\mathbb{R}} e^{iux}\, x\, \nu(dx) =: i\Delta\, \mathcal{F}[\,\cdot\,\nu](u), \qquad u \in \mathbb{R},$$

so, heuristically,

$$N^G(x) = \frac{1}{i\Delta}\int_{\mathbb{R}} f_x^{(G)}(y)\, \mathcal{F}^{-1}\Bigl[\frac{\varphi'}{\varphi}\Bigr](dy), \qquad \text{where}\quad f_x^{(G)}(y) = \begin{cases} \frac{1}{y}\,\mathbf{1}_{(-\infty,x]}(y) & \text{if } x < 0, \\[2pt] \frac{1}{y}\,\mathbf{1}_{[x,\infty)}(y) & \text{if } x > 0. \end{cases}$$

Nickl and Reiß [2012] assumed ν is absolutely continuous with respect to Lebesgue's measure and that F[·ν] is square integrable. Thus, for |x| ≥ ζ with ζ > 0 fixed, this expression for N^G is well-defined by Plancherel's theorem. The idea to construct the estimator is to substitute the characteristic function and its derivative by the empirical counterparts

$$\varphi_n(u) := \frac{1}{n}\sum_{k=1}^{n} e^{iuZ_k} \qquad \text{and} \qquad \varphi'_n(u) := \frac{i}{n}\sum_{k=1}^{n} Z_k\, e^{iuZ_k}, \qquad u \in \mathbb{R}.$$
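These empirical counterparts are straightforward to compute. The following sketch compares them with the exact ϕ and ϕ' for a hypothetical compound Poisson process with Exp(1) jumps, for which Fν(u) = λ/(1 − iu); the parameters are ad hoc.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, Delta, n = 1.0, 1.0, 100000

# compound Poisson increments with Exp(1) jumps
counts = rng.poisson(lam * Delta, size=n)
Z = np.array([rng.exponential(1.0, size=c).sum() for c in counts])

u = np.linspace(-5.0, 5.0, 101)
phi_n = np.array([np.mean(np.exp(1j * uu * Z)) for uu in u])        # phi_n
dphi_n = np.array([np.mean(1j * Z * np.exp(1j * uu * Z)) for uu in u])  # phi_n'

# exact quantities: phi = exp(lam*Delta*(1/(1-iu) - 1)) and its derivative
phi = np.exp(lam * Delta * (1.0 / (1.0 - 1j * u) - 1.0))
dphi = phi * lam * Delta * 1j / (1.0 - 1j * u) ** 2

print(np.max(np.abs(phi_n - phi)), np.max(np.abs(dphi_n - dphi)))
```

On such a fixed compact interval both uniform errors are of order 1/√n, in line with the convergence results of Neumann and Reiß [2009] recalled next.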

Indeed, Neumann and Reiß [2009] showed that ϕn and ϕ'n converge uniformly to ϕ and ϕ' on compact intervals of slowly enough increasing size if ∫(1 ∨ |·|^β) dν < ∞ and ∫(1 ∨ |·|^{2+β}) dν < ∞, respectively, for some β > 0. With this in mind, Nickl and Reiß [2012] introduced a kernel function K such that

$$\int_{\mathbb{R}} K(x)\, dx = 1 \qquad \text{and} \qquad \operatorname{supp}(\mathcal{F}K) \subseteq [-1, 1], \qquad (2.2.5)$$

and a bandwidth hn → 0 as n → ∞. Let K_{hn} := hn^{−1} K(·/hn), which is called an approximation to the identity because in the limit as n → ∞ it acts as the identity of the convolution operation or, equivalently, as the identity of the product in the frequency domain. With the conditions above, supp(FK_{hn}) = supp(FK(hn·)) ⊆ [−hn^{−1}, hn^{−1}], and their estimator is

$$\widehat{N}^G_n(x) = \frac{1}{i\Delta}\int f_x^{(G)}(y)\, \mathcal{F}^{-1}\Bigl[\frac{\varphi'_n}{\varphi_n}\, \mathcal{F}K_{h_n}\Bigr](y)\, dy, \qquad |x| \ge \zeta.$$

Assuming the finite 2 + β moment condition, Nickl and Reiß [2012] showed this estimator is well-defined (asymptotically in probability) and, as n → ∞,

$$\sqrt{n}\,\bigl(\widehat{N}^G_n - N^G\bigr) \xrightarrow{D} \mathbb{G}^G \qquad \text{in } \ell^\infty\bigl(\mathbb{R}\setminus(-\zeta,\zeta)\bigr), \qquad (2.2.6)$$


where 𝔾^G is a centred Gaussian process with covariance structure

$$\Sigma^{\mathbb{G}^G}_{x,y} := \frac{1}{\Delta^2}\int_{\mathbb{R}} \Bigl(\bigl(\cdot f_x^{(G)}\bigr) * \mathcal{F}^{-1}\bigl[\varphi^{-1}(-\cdot)\bigr]\Bigr)(z)\; \Bigl(\bigl(\cdot f_y^{(G)}\bigr) * \mathcal{F}^{-1}\bigl[\varphi^{-1}(-\cdot)\bigr]\Bigr)(z)\; P(dz),$$

with |x|, |y| ≥ ζ, ϕ^{−1} := 1/ϕ, and where P is the law of the increments (for the compound Poisson case it corresponds to dG of Section 2.1.2 and we use both notations interchangeably). As Nickl and Reiß [2012] remarked, convolution with F^{−1}[ϕ^{−1}(−·)] is mathematically equivalent to deconvolution with P and hence the former is the key quantity in Σ^{𝔾^G}_{x,y}. By the arguments above, the class of Lévy processes Nickl and Reiß [2012] considered is, essentially, the largest class for which such a result can hold. This has a few consequences: they had to stay away from the origin and, furthermore, they had to differentiate the characteristic function, hence having to assume ν has a density such that

$$\mathcal{F}[\,\cdot\,\nu] \in L^2(\mathbb{R}) \qquad \text{and} \qquad \int (|\cdot| \vee |\cdot|^{2+\beta})\, d\nu < \infty \ \text{for some } \beta > 0.$$

In Section 3.2 they conjectured that for the compound Poisson case, where ν has no singularity at the origin, it should be possible to take ζ = 0 and avoid the first condition of this display by adapting their method. We were not able to do this with their estimator because, when |x| ≥ ζ = 0, fx^{(G)} is no longer square integrable, so issues arise in the proofs. Instead, we constructed a different estimator which, moreover, allowed us to avoid the condition of ν having a density (although this can also be avoided with their estimator by the same idea we introduce in Chapter 3) and the moment condition. To construct it, we do not differentiate the characteristic function and only have to control ϕn so, from the arguments above, we only need ∫(1 ∨ |·|^β) dν < ∞ for some β > 0. In fact, in Chapter 3, we refine the result of Neumann and Reiß [2009] to only require a finite logarithmic moment. We remark that this simplification can only be attained when controlling ϕn and not any of its derivatives (at least with the arguments in Neumann and Reiß [2009]). Prior to focusing on the compound Poisson case, we briefly discuss the other existing functional central limit theorems for general Lévy processes. Recall the high-frequency regime introduced in Section 2.1.2, in which ∆ = ∆n → 0 and n∆n → ∞ as n → ∞. In this setting, an increasingly large number of jumps is observed as n → ∞ so, intuitively, the ill-posedness of the problem should decrease considerably in the limit and it should be possible to attain the parametric rate of convergence for a much larger class of processes than that considered in Nickl and Reiß [2012]. Indeed, Kappus and Reiß [2010] extended the results of Neumann and Reiß [2009] to this observation regime showing that, even when σ² > 0, the parametric rate can be attained if ∆n is chosen appropriately. With this insight and the machinery developed in Nickl and Reiß [2012], Nickl et al.
[2016] have recently constructed estimators of 𝒩, and showed a functional central limit theorem under the standard supremum norm for, essentially, the largest class of processes for which such

a result can hold. They also include results for N^G but, for simplicity, here we only discuss those for 𝒩. They construct two estimators: a naive, or linear, one based on the intuitive fact that the jumps are observed in the limit as ∆n → 0, and a non-linear one based on the spectral approach. For the first, they use that the law of the increments P = P_∆ satisfies

$$\int_{\mathbb{R}} \rho(x)\, x^2\, \frac{P_\Delta(dx)}{\Delta} \;\to\; \sigma^2\rho(0) + \int_{\mathbb{R}} \rho(x)\, x^2\, \nu(dx) \qquad \text{as } \Delta \to 0,$$

for any continuous and bounded ρ, and, assuming σ = 0, the estimator is thus given by

$$\widetilde{\mathcal{N}}_n(x) := \frac{1}{n\Delta_n}\sum_{k=1}^{n} \rho(Z_k)\, Z_k^2\, \mathbf{1}_{(-\infty,x]}(Z_k), \qquad x \in \mathbb{R}.$$
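A numerical sketch of this naive estimator, evaluated at x = ∞ for a hypothetical compound Poisson process with Exp(1) jumps and weight ρ(x) = 1/(1 + x²); the limit it should approach is the integral ∫ρ(x)x²ν(dx), computed here by quadrature. Parameter choices are ad hoc.

```python
import numpy as np

rng = np.random.default_rng(11)
lam, Delta_n, n = 1.0, 0.02, 200000   # high-frequency: Delta_n small, n*Delta_n = 4000

counts = rng.poisson(lam * Delta_n, size=n)
Z = np.array([rng.exponential(1.0, size=c).sum() for c in counts])

rho = lambda x: 1.0 / (1.0 + x ** 2)   # satisfies 0 < rho(x) <= 1 and <= x^{-2}

# naive (linear) estimator evaluated at x = +infinity
N_tilde_inf = np.sum(rho(Z) * Z ** 2) / (n * Delta_n)

# limiting value int rho(x) x^2 nu(dx), with nu = lam * Exp(1), by quadrature
x = np.linspace(0.0, 50.0, 500001)
integrand = rho(x) * x ** 2 * lam * np.exp(-x)
dx = x[1] - x[0]
target = np.sum((integrand[1:] + integrand[:-1]) / 2) * dx

print(N_tilde_inf, target)
```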

For the second they do not assume σ = 0 and, noting that the Lévy exponent satisfies

$$\Psi''(u) = \frac{\varphi''_\Delta(u)\,\varphi_\Delta(u) - (\varphi'_\Delta)^2(u)}{\Delta\,(\varphi_\Delta)^2(u)} = -\sigma^2 - \mathcal{F}\bigl[\,\cdot^2\,\nu\bigr](u),$$

the spectral estimator is given by

$$\widehat{\mathcal{N}}_n(x) := -\int_{-\infty}^{x} \rho(y)\, \mathcal{F}^{-1}\bigl[\bigl(\Psi''_n + \hat{\sigma}_n^2\bigr)\, \mathcal{F}K_{h_n}\bigr](y)\, dy, \qquad x \in \mathbb{R},$$

where Ψ''n is the empirical version of the fraction in the second to last display and σ̂²n is a consistent estimator of σ² with certain properties. Then, under some stronger assumptions than in Nickl and Reiß [2012], they show that, as n → ∞,

$$\sqrt{n\Delta_n}\,\bigl(\widetilde{\mathcal{N}}_n - \mathcal{N}\bigr) \xrightarrow{D} \mathbb{G}^{\mathcal{N}} \qquad \text{and} \qquad \sqrt{n\Delta_n}\,\bigl(\widehat{\mathcal{N}}_n - \mathcal{N}\bigr) \xrightarrow{D} \mathbb{G}^{\mathcal{N}} \qquad \text{in } \ell^\infty(\mathbb{R}),$$

where 𝔾^𝒩 is a centred Gaussian process with covariance structure

$$\Sigma^{\mathbb{G}^{\mathcal{N}}}_{x,y} := \int_{-\infty}^{x \wedge y} z^4\, \rho(z)^2\, \nu(z)\, dz, \qquad x, y \in \mathbb{R}. \qquad (2.2.7)$$

In subsequent sections we also compare our results to this one. We note in passing that, prior to the results of Nickl and Reiß [2012] and Nickl et al. [2016], Buchmann [2009] showed a functional central limit theorem for general Lévy processes when continuous but perhaps incomplete observations of the process are available. For this, he also constructed a naive or linear estimator, staying away from the origin like in Nickl and Reiß [2012], and used the exponentially downweighted norms in Buchmann and Grübel [2003]. The limit process coincides, up to the modifications required by the weighting, with that in the second to last display when ρ = 1. Let us now focus on the compound Poisson case and the low-frequency observation regime, i.e. when ∆ > 0 is fixed. As mentioned above, the key observation of the spectral

approach is that, in view of (1.1.4) taking t = ∆ and γ = 0, (2.1.5) can be written as

$$\mathcal{F}dG(u) = \varphi_\Delta(u) = e^{\Delta(\mathcal{F}\nu(u) - \lambda)}, \qquad u \in \mathbb{R}, \qquad (2.2.8)$$

where we recall that ν := λ dF. Therefore, this expression explicitly shows the relationship between G and F and, just as in the estimators we just constructed, the challenge is to rigorously invert it. Crucially, and unlike Nickl and Reiß [2012] and Nickl et al. [2016], we can separate the terms in (2.2.4) due to the finiteness of ν, and we do not need to differentiate the last display to isolate the Lévy measure. To do this, we take logarithms on both sides and, rearranging terms and taking the Fourier inverse, it can be formally inverted to give

$$\nu(dx) = \mathcal{F}^{-1}\Bigl[\frac{1}{\Delta}\operatorname{Log}(\mathcal{F}dG) + \lambda\Bigr](dx), \qquad x \in \mathbb{R}. \qquad (2.2.9)$$
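Numerically, the logarithm in (2.2.9) has to be the continuous (distinguished) logarithm, as discussed next; on a grid it can be obtained by unwrapping the phase. Below is a sketch for a hypothetical characteristic function — a compound Poisson increment with unit jumps and large λ∆ — whose phase winds far beyond (−π, π], where the principal branch visibly fails.

```python
import numpy as np

lam, Delta = 10.0, 1.0                 # lam*Delta >> pi, so the phase winds many times
u = np.linspace(0.0, 20.0, 40001)      # fine grid starting at u = 0, where phi = 1

# characteristic function of a compound Poisson increment with unit jumps (F = delta_1)
phi = np.exp(lam * Delta * (np.exp(1j * u) - 1.0))

# distinguished logarithm on the grid: log-modulus plus the unwrapped phase,
# anchored at Log(phi)(0) = 0
Log_phi = np.log(np.abs(phi)) + 1j * np.unwrap(np.angle(phi))

exact = lam * Delta * (np.exp(1j * u) - 1.0)   # the true continuous exponent
principal = np.log(phi)                         # principal branch, for comparison

print(np.max(np.abs(Log_phi - exact)), np.max(np.abs(principal - exact)))
```

The unwrapping requires the grid to be fine enough that consecutive phase increments stay below π; this mirrors the role of continuity in the construction of the distinguished logarithm.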

Here, Log denotes the distinguished logarithm: if φ : R → C \{0} is a continuous function such that φ(0) = 1, then Log(φ) is the unique continuous function satisfying exp(Log(φ)(u)) = φ(u) for all u ∈ R (the reader is referred to the proof of Theorem 7.6.2 in Chung [2001] for its construction). In terms of basic operations it enjoys the same properties as the standard logarithm, so we postpone its analysis until the proofs in Chapter 3. For now, we mention that the logarithm in the last display is well-defined in view of (1.4.2), and that the reason to choose this logarithm is the following: the exponent on the right hand side of (2.2.8) is continuous; if we instead take any branch of the complex logarithm on both sides, the left side becomes discontinuous; therefore, the corresponding identity cannot be true unless we take the distinguished logarithm. Furthermore, to construct the estimators we input an empirical version of FdG into the right hand side of (2.2.9) so, if we take a branch of the complex logarithm, the resulting expression may end in a different winding number than that of the right hand side of the last display. This is avoided by use of the distinguished logarithm. The first to use the spectral approach to estimate discretely observed compound Poisson processes were van Es et al. [2007]. They assume dF is absolutely continuous with respect to Lebesgue's measure and construct a kernel estimator for the jump density f as follows. Recall from Section 2.1.2 that, under this assumption, dG = %0δ0 + (1 − %0)g0, where %0 = e^{−λ∆} and g0 is the density of the non-zero observations. Then, for any x ∈ R, the last display can be rewritten as

$$f(x) = \frac{1}{\lambda\Delta}\,\mathcal{F}^{-1}\bigl[\operatorname{Log}\bigl(e^{\lambda\Delta}\,\mathcal{F}dG\bigr)\bigr](x) = \frac{1}{\lambda\Delta}\,\mathcal{F}^{-1}\bigl[\operatorname{Log}\bigl(1 + (e^{\lambda\Delta} - 1)\,\mathcal{F}g_0\bigr)\bigr](x). \qquad (2.2.10)$$

In view of the first expression, the moduli of the arguments of the logarithms are contained in [e^{−λ∆}, e^{λ∆}], so the last expression is well-defined if we cut off the tails of the logarithm. This can be achieved by using the kernel function introduced above and, for all x ∈ R,

$$f(x) \approx \frac{1}{\lambda\Delta}\,\mathcal{F}^{-1}\bigl[\operatorname{Log}\bigl(1 + (e^{\lambda\Delta} - 1)\,\mathcal{F}g_0\,\mathcal{F}K_{h_n}\bigr)\bigr](x) = \frac{1}{\lambda\Delta}\,\mathcal{F}^{-1}\bigl[\operatorname{Log}\bigl(1 + (e^{\lambda\Delta} - 1)\,\mathcal{F}[g_0 * K_{h_n}]\bigr)\bigr](x). \qquad (2.2.11)$$

This suggests estimating g0 by the classical kernel density estimator using the non-zero observations, i.e. again conditioning on the number of non-zero increments being m ∈ N and denoting them by Z̃1, . . . , Z̃m,

$$\tilde{g}_{0,n}(x) := \frac{1}{m}\sum_{k=1}^{m} K_{h_n}\bigl(x - \widetilde{Z}_k\bigr), \qquad x \in \mathbb{R}.$$
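A quick sketch of this kernel density estimator on simulated data. Note two deliberate simplifications: the Gaussian kernel used below does not satisfy the band-limitedness condition (2.2.5), and the bandwidth is the standard m^{−1/5} rate choice; both are ad hoc and serve only for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
lam, Delta, n = 1.0, 1.0, 20000

counts = rng.poisson(lam * Delta, size=n)
Z = np.array([rng.exponential(1.0, size=c).sum() for c in counts])

Z_nz = Z[Z > 0]                      # condition on the non-zero increments
m = len(Z_nz)

h = m ** (-1.0 / 5.0)                # standard kernel-density bandwidth rate
K = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)  # Gaussian kernel
# (this K does not satisfy supp(FK) in [-1,1]; it is only a simple stand-in)

def g0_tilde(x):
    return np.mean(K((x - Z_nz) / h)) / h

grid = np.linspace(0.0, 20.0, 2001)
vals = np.array([g0_tilde(x) for x in grid])
mass = np.sum((vals[1:] + vals[:-1]) / 2) * (grid[1] - grid[0])
print(mass)   # total mass on [0, 20]: near 1, up to boundary leakage at 0
```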

Indeed, van Es et al. [2007] proceed in this way assuming knowledge of λ and that f possesses enough regularity, including belonging to a ball of Hölder and Nikol'ski functions. They show pointwise weak consistency and asymptotic normality of the resulting estimator, paralleling the classical results of kernel density estimation. Formal expression (2.2.9) also leads to estimators of the distribution function F. This has recently been exploited by Coca [2015], as we describe next. If we assume dF has a density, one way of approximating F(x) is by integrating (2.2.11) over (−∞, x]. However, we take a slightly different route that does not assume this on dF: notice that (2.2.9) can be formally rewritten as

\frac{1}{\Delta}\,\mathcal{F}^{-1}\big[\mathrm{Log}(\mathcal{F}dG)\big](dx) = \nu(dx) - \lambda\,\delta_0(dx), \qquad x \in \mathbb{R}.   (2.2.12)

The second summand on the right carries the influence of the observations of Z in which no jump took place. Therefore, if this display is integrated excluding the origin, it can be ruled out at once. Furthermore, due to the ongoing assumption that dF has no atom at the origin, neither does ν := λ dF and, formally,

\frac{1}{\Delta}\int_{-\infty}^{x} \mathbf{1}_{\mathbb{R}\setminus\{0\}}(y)\, \mathcal{F}^{-1}\big[\mathrm{Log}(\mathcal{F}dG)\big](dy) = \int_{-\infty}^{x}\nu(dy) =: N(x), \qquad x \in \mathbb{R}.   (2.2.13)

Using the same arguments as above, the logarithm is bounded above and below. However, there is no a priori reason to believe the Fourier inverse is well-defined. To circumvent this, we use a kernel function K with the properties in (2.2.5) although, due to its compact support, it cannot be introduced into the logarithm directly. Consequently, we estimate the last display by

\widehat N_n(x) := \frac{1}{\Delta}\int_{-\infty}^{x} \mathbf{1}_{\mathbb{R}\setminus\{0\}}(y)\, \mathcal{F}^{-1}\big[\mathrm{Log}(\mathcal{F}dG_n)\,\mathcal{F}K_{h_n}\big](y)\,dy, \qquad x \in \mathbb{R},   (2.2.14)

where $G_n$ is the empirical distribution function of G constructed from the observations $Z_1, \dots, Z_n$ and hence

\mathcal{F}dG_n(u) = \varphi_n(u) := \frac{1}{n}\sum_{k=1}^{n} e^{iuZ_k}, \qquad u \in \mathbb{R}.
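Although no implementation is given in the text, the construction of $\widehat N_n$ lends itself to a direct numerical sketch: form the empirical characteristic function on a grid, take a continuous (distinguished) logarithm along it, apply a spectral cut-off playing the role of $\mathcal{F}K_{h_n}$, Fourier-invert and integrate away from the origin. Every tuning choice below (grid, taper, truncation levels) is illustrative and differs from the rate-optimal choices of Chapter 3.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated compound Poisson increments with Exp(1) jumps (illustrative).
lam, Delta, n = 1.0, 1.0, 4_000
counts = rng.poisson(lam * Delta, size=n)
Z = np.array([rng.exponential(1.0, size=k).sum() for k in counts])

# Empirical characteristic function on a symmetric grid (odd M: u = 0 included).
U, M = 30.0, 801
u = np.linspace(-U, U, M)
phi_n = np.exp(1j * u[:, None] * Z[None, :]).mean(axis=1)

# Distinguished logarithm along the grid: continuous argument, zero at u = 0.
i0 = M // 2
arg = np.angle(phi_n)
arg_c = np.concatenate([np.unwrap(arg[i0::-1])[::-1][:-1], np.unwrap(arg[i0:])])
log_phi = np.log(np.abs(phi_n)) + 1j * arg_c

# Spectral cut-off playing the role of FK(h u): flat-top cosine taper.
h = 1.0 / U
v = np.abs(h * u)
FK = np.where(v <= 0.5, 1.0, np.where(v <= 1.0, np.cos(np.pi * (v - 0.5)) ** 2, 0.0))

# Fourier inversion of (1/Delta) Log(F dG_n) F K_h on a spatial grid.
x = np.linspace(-10.0, 10.0, 1001)
du, dx = u[1] - u[0], 20.0 / 1000
nu_hat = (np.exp(-1j * x[:, None] * u[None, :]) * (log_phi * FK)).real.sum(axis=1) * du / (2 * np.pi * Delta)

# Integrate excluding a neighbourhood of the origin (removing -lam*delta_0).
eps = 0.1
N_hat = np.cumsum(nu_hat * (np.abs(x) > eps) * dx)
lam_hat = N_hat[-1]        # estimate of lambda, i.e. N(infinity)
F_hat = N_hat / lam_hat    # estimate of the jump distribution function F
```

On this simulated sample, `lam_hat` is close to the true intensity λ = 1 and `F_hat` approximates the Exp(1) distribution function away from the origin.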

Then, in view of limx→∞ N(x) = λ, Coca [2015] proposes the estimators

\widehat\lambda_n := \widehat N_n(\infty) \quad\text{and}\quad \widehat F_n(x) := \frac{\widehat N_n(x)}{\widehat\lambda_n}, \qquad x \in \mathbb{R}.   (2.2.15)

Dividing $\widehat N_n$ by $\widehat\lambda_n$ guarantees the desirable property that $\lim_{x\to\infty}\widehat F_n(x) = 1$ for any n ∈ ℕ. More importantly, and as we discuss in Section 2.6, it guarantees that the resulting estimator of F is efficient, whilst dividing by the naive estimator $\widetilde\lambda_n$ does not when we allow for discrete jumps that can cancel each other. We remark that the estimators Coca [2015] proposes, included in the next chapter, differ slightly from the quantities we have just proposed and should be used instead for rigorous and practical purposes: notice that, in practice, implementing $\mathbf{1}_{\mathbb{R}\setminus\{0\}}$ is not possible because of the truncation at a set of measure zero and the infinite integration interval it results in. Therefore, this indicator is substituted by the more realistic quantity $\mathbf{1}_{[-H_n,H_n]\setminus(-\varepsilon_n,\varepsilon_n)}$, where $H_n^{-1}, \varepsilon_n \to 0$ as n → ∞. Furthermore, we modify the resulting estimator to accommodate a potential atomic component in dF, but omit this for now for the sake of clarity of exposition. In a few paragraphs we give details of how we estimate the discrete component, and the full estimator including both is only introduced in Chapter 3. We note in passing that it allows us to construct the first test of whether there is a discrete and/or an absolutely continuous component. Similarly, the estimators of Duval [2013a], Comte et al. [2014] and van Es et al. [2007] differ slightly from those we constructed above. For the properties that concern us here it suffices to take these simpler versions. To prove our results we require the following mild assumptions on F. We assume

$F = F_{ac} + F_d$, where $dF_{ac}$ and $dF_d$ are absolutely continuous and purely atomic with respect to Lebesgue's measure, respectively. On the absolutely continuous function $F_{ac}$ we assume it is uniformly Hölder continuous over ℝ, and on the discrete measure $dF_d$ we assume it is supported in $\varepsilon\mathbb{Z}\setminus\{0\}$ for some fixed ε > 0. We can relax the former to $F_{ac}$ having a modulus of continuity as slow as a power of a logarithm, as found in Coca [2015]. However, this comes at the expense of more involved statements of the results and more cumbersome proofs, so we choose to present this assumption instead. The exclusion of the origin in the latter goes in line with the ongoing and unavoidable assumption that dF has no atom at zero, as otherwise the problem lacks identifiability. Taking ε = 1 we include the

important-in-practice case of jumps taking values in the integers. We emphasise that the estimators can be used when one of the two components $F_{ac}$ and $F_d$ is zero and without prior knowledge of this. Lastly, we assume all the jumps have a finite $\log^\beta$-tail moment for some β > 2. We remark that one of the reasons we are able to prove our results under this mild tail condition, instead of a β-polynomial one, β > 0, is the truncation of the tails resulting from inputting $\mathbf{1}_{[-H_n,H_n]\setminus(-\varepsilon_n,\varepsilon_n)}$ in place of $\mathbf{1}_{\mathbb{R}\setminus\{0\}}$ in the definition of $\widehat N_n$. Our results are closest to those of Buchmann and Grübel [2003], so we compare our assumptions to theirs. We do not assume knowledge of λ —furthermore, we can include and estimate an unknown drift term, although we postpone this until Section 2.4— and we do not require one-sidedness of the jump distribution. However, we do need the mild assumptions of the previous paragraph. On the other hand, with just these we are able to avoid using the exponentially downweighted norms they use and to develop our results under the norm $\|\cdot\|_\infty := \|\cdot\|_{\infty,0}$. In particular, we show that, as n → ∞, our estimators are well-defined on sets of probability approaching 1 and

\sqrt{n}\,\big(\widehat F_n - F\big) \xrightarrow{\mathcal{D}} \mathbb{G}^F \quad\text{in } \ell^\infty(\mathbb{R}),   (2.2.16)

where, under the notation $\varphi = \varphi_\Delta = \mathcal{F}dG$, $\mathbb{G}^F$ is a centred Gaussian process with covariance structure

\Sigma^{\mathbb{G}^F}_{x,y} := \frac{1}{\Delta^2}\int_{\mathbb{R}} \Big(f^{(F)}_x * \mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big]\Big)(z)\, \Big(f^{(F)}_y * \mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big]\Big)(z)\, dG(z), \qquad x, y \in \mathbb{R},

and $f^{(F)}_x := \lambda^{-1}\big(\mathbf{1}_{(-\infty,x]} - F(x)\big)\mathbf{1}_{\mathbb{R}\setminus\{0\}}$, x ∈ ℝ. As pointed out by Nickl and Reiß [2012] in the compound Poisson case, $\mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big]$ is a finite signed measure on ℝ satisfying

\mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big] = e^{\lambda\Delta}\sum_{i=0}^{\infty}(-1)^i\,\frac{(\lambda\Delta)^i}{i!}\, d\bar F^{*i},   (2.2.17)

where $d\bar F(A) := dF(-A)$ for all Borel measurable A ⊆ ℝ. This, and the fact that it has mass equal to 1, follows by similar arguments to those used to justify

dG = e^{-\lambda\Delta}\sum_{i=0}^{\infty}\frac{(\lambda\Delta)^i}{i!}\, dF^{*i},   (2.2.18)

(see Remark 27.3 in Sato [1999]). Similarly to the classical Donsker theorem and to (2.1.12), this result allows us to perform $1/\sqrt{n}$-consistent estimation of F, and to derive confidence regions and goodness-of-fit tests for it. In contrast with those following from (2.1.12), these are for the supremum norm, so the convergence of $\widehat F_n$ is uniform in ℝ and the confidence regions are bands valid for the whole of ℝ. The relationship between the limiting processes $\mathbb{G}^F$ and $\mathbb{B}^F$ in (2.1.12) is

made clear in Section 2.6. For now we just mention that, due to the choice of estimator of λ and to the truncation at the origin in the definition of $\widehat N_n$, which is carried to $f^{(F)}_x$, $\Sigma^{\mathbb{G}^F}_{x,y}$ coincides with the lower bound developed by Trabs [2015a]. Therefore, our estimators are asymptotically efficient and they are the first to be so in the context of estimating the jump distribution function F of a compound Poisson distribution. As mentioned in Section 1.3.2, the importance of this is that the size of the confidence regions and the statistical power of the inference procedures that follow from the functional central limit theorem in (2.2.16) are optimal. In Chapter 3, we construct estimators of the mass of the potential atoms in F. The idea to estimate the mass at εj, j ∈ ℤ∖{0}, is simple: in the same way that we integrated (the empirical kernel-regularised version of) (2.2.12) up to x to reach N(x) and then divided by $\widehat\lambda_n$, we integrate (2.2.12) over a small interval $[\varepsilon j - \varepsilon_n, \varepsilon j + \varepsilon_n]$ with $\varepsilon_n \to 0$, divide by $\widehat\lambda_n$ and introduce the kernel to ensure the resulting expression is well-defined. We note in passing that this shows the relationship between the notation $\varepsilon_n$ and ε, which should not be confused, and without loss of generality $\varepsilon_n < \varepsilon$. Therefore, under the assumption of F only having an atomic component in $\varepsilon\mathbb{Z}\setminus\{0\}$, the estimator of F obtained by adding up these estimators is equal to $\widehat F_n$, up to exponentially negligible terms. In addition to asymptotic normality of the estimator of the mass of each of the atoms, in Chapter 3 we show joint convergence, and a functional central limit theorem for the mass function follows. The only regularity assumption we make is the abovementioned finite logarithmic moment condition. Lastly, we remark that the spectral approach has also been used when estimating the jump densities of discretely observed processes from superclasses of the compound Poisson process. The general idea, again, is to differentiate the characteristic function depending on the complexity of the underlying process and to assume the density belongs to a certain functional space. The most complete reference is Belomestny et al. [2015], which includes some but not all of the following works. Gugushvili [2009, 2012] extended the results of van Es et al. [2007] in several directions, including adding a diffusion component to the unknown compound Poisson process. Chen et al. [2010] generalised the noise component to a symmetric stable process. Assuming the logarithm of financial assets is a compound Poisson process with drift and with a diffusion component, Cont and Tankov [2003], Belomestny and Reiß [2006], Söhl [2014] and Söhl and Trabs [2014] perform density estimation and develop confidence sets using prices of financial options.
In the case of a pure jump Lévy process of finite variation, Comte and Genon-Catalot [2009] and Bec and Lacour [2015] perform adaptive estimation in the high-frequency regime, and Comte and Genon-Catalot [2010a,b] obtain the same results in the low-frequency regime when equispaced or irregularly sampled and noisy observations are available, respectively. In the high-frequency regime, Comte and Genon-Catalot [2011] extend the results of Comte and Genon-Catalot [2009] to a considerably larger class of Lévy processes. In addition,

the spectral approach has also been used to address other relevant statistical problems such as testing the characteristics of a Lévy process by Reiß [2013] and adaptive quantile estimation by Trabs [2015b].

2.3 Unifying the two approaches

2.3.1 The formal equivalence

We just discussed the two main approaches to estimating the jump distribution of a discretely observed compound Poisson process. As mentioned at the start of each of those sections, in essence they focus on tackling two fundamentally different problems (although they implicitly tackle both): untangling the parameters, and rigorously inverting the identities that relate them to the data, respectively. For this reason they lead to estimators for which different conclusions are obtained under distinct assumptions. However, as may already be clear, the formal identities from which they are derived are equivalent and hence all the estimators are very similar to each other. Let us make this more precise for the Poisson case. In view of expression (2.1.5), and recalling that $\varrho_i = e^{-\lambda\Delta}\frac{(\lambda\Delta)^i}{i!}$, i ∈ ℕ, or equivalently due to (2.2.8),

\Gamma(z) = e^{\lambda\Delta(z-1)} \quad\text{and}\quad \Gamma_0(z) := \Gamma(z) - \varrho_0 = e^{-\lambda\Delta}\big(e^{\lambda\Delta z} - 1\big), \qquad z \in \mathbb{C}.

Therefore,

\Gamma_0^{-1}(z) = \frac{1}{\lambda\Delta}\,\mathrm{Log}\big(1 + e^{\lambda\Delta} z\big),

where Log should be the distinguished logarithm if z = z(θ) is a continuous path in ℂ∖{0}, for the same reasons as those mentioned after (2.2.9). Inputting $z = \mathcal{F}dG_0$ into the last display, where we recall that $dG_0 := dG - \varrho_0\delta_0$, (2.1.6) can be alternatively written as

\mathcal{F}dF(u) = \frac{1}{\lambda\Delta}\,\mathrm{Log}\big(1 + e^{\lambda\Delta}\,\mathcal{F}dG_0(u)\big), \qquad u \in \mathbb{R}.   (2.3.1)

The estimators of the direct approach were derived from a formal transformation of this expression, and so were those of the spectral approach in view of (2.2.10) and of its equivalent form (2.2.9). Therefore, this display shows that both approaches are formally equivalent and it explicitly highlights their distinction: the former differs from the latter in that, prior to taking inverse Fourier transforms, it insists on expanding the logarithm instead of working directly with it. This difference determines the conditions to be imposed and results in the superiority of the spectral approach: crucially, as it stands, the right hand side of the last display is well-defined for all u ∈ ℝ. This is because the argument of the logarithm equals $e^{\lambda\Delta}\,\mathcal{F}dG(u)$ by the definition of $dG_0$, and due to the strictly positive lower bound for $\mathcal{F}dG(u) = \varphi_\Delta(u)$ in (1.4.2). However, the expansion of the logarithm is not always

39 Unifying existing literature justified and further assumptions are needed to embark on this route. This is the content of our discussion after (2.1.10) and the last display further clarifies it: taking t = ∆ and γ = 0 in (1.1.5) it follows that, if λ∆ < π, the distinguished logarithm in the last display coincides with the principal branch of the complex logarithm1 because the argument does λ∆ not wind about the origin. Furthermore, if λ∆ < log 2, |e FdG0(u)| ≤ 1 because the −λ∆ mass of dG0 is 1 − e , and just in this case the logarithm can be rigorously expanded using Mercator’s complex series. For larger values of λ∆ this expansion only makes sense if the Laplace transform L as defined in (2.1.13) is used assuming the support of dF , or + λ∆ equivalently of dG0, is in R and taking u > 0 large enough so that |e LdG0(u)| is smaller or equal to the radius of convergence of the series. Indeed, the precise condition is (2.1.13), which Buchmann and Gr¨ubel [2003], Hansen and Pitts [2006] and Bøgsted and Pitts [2010] assumed. Notice that, after this, the direct approach still has to invert the last display just as in the spectral approach. This justifies the fact that, at least in the low-frequency regime, the former has to make stronger assumptions than the latter and, thus, the spectral approach should be preferable in these problems. An alternative way of phrasing this is that, even though the approaches are formally equivalent, there is a fundamental difference between them: the direct approach insists on explicitly decompounding the sum in Z or, equivalently, untangling relationship (2.1.3), whilst in the spectral approach this is achieved at once by resorting to the frequency domain, hence requiring milder assumptions. 
Consequently, this raises a natural and interesting question: in the context of general compound distributions, is it possible to extend the result in (2.2.16) by working with $\Gamma_0^{-1}$, as in the spectral approach, instead of with its expansion or with Λ in (2.1.8), as in the direct approach? Moreover, can (2.2.16) be extended to more general inverse problems in which the distribution of the observations G is related to that of interest F by $\mathcal{F}dG = \Gamma(\mathcal{F}dF)$ for some Γ (the starting point of the last paragraph)? The only result in this direction of which we are aware is that of den Boer and Mandjes [2016]. They work with the Laplace transform assuming the supports of dF and dG are in $\mathbb{R}^+$, and show pointwise consistency of their estimator at rate $\log n/\sqrt{n}$ under much stronger assumptions than those used for (2.2.16). Comparing their results to (2.2.16), they seem far from optimal and, thus, the problems we have just proposed remain very much open. We emphasise that some preliminary calculations suggest they should hold under appropriate conditions on Γ. We can further ask under what assumptions on Γ these hold in the multidimensional and/or noisy case. Although we do not include the proofs here, (2.2.16) does extend to the multidimensional and noisy compound Poisson case under the same (multidimensional-analogous) assumptions.

1This was missed by van Es et al. [2007], who mention this property holds for λ∆ < log 2 and restrict their simulations to the not-so-interesting case of λ∆ = 0.3 to avoid the implementation of the distinguished logarithm. However, we remark that the practical construction of their estimators in Section 3 in van Es et al. [2007] is also valid for λ∆ < π, at least in sets of probability approaching 1 as n → ∞.


2.3.2 Similarities between existing estimators

With the comparison of the two approaches made above, we can explicitly compare the estimators derived from them. In view of (2.2.13), Coca [2015] exploits the fact that

F(x) = \frac{1}{\lambda\Delta}\int_{-\infty}^{x} \mathbf{1}_{\mathbb{R}\setminus\{0\}}(y)\, \mathcal{F}^{-1}\big[\mathrm{Log}(\mathcal{F}dG)\big](dy).

Furthermore, and as argued therein, the right hand side equals

\frac{1}{\lambda\Delta}\int_{-\infty}^{x} \mathcal{F}^{-1}\big[\mathrm{Log}(\mathcal{F}dG) + \lambda\Delta\big](dy) \approx \frac{1}{\lambda\Delta}\int_{-\infty}^{x} \mathcal{F}^{-1}\big[(\mathrm{Log}(\mathcal{F}dG) + \lambda\Delta)\,\mathcal{F}K_h\big](dy) = \frac{1}{\lambda\Delta}\int_{-\infty}^{x} \mathcal{F}^{-1}\big[\mathrm{Log}\big(1 + e^{\lambda\Delta}\,\mathcal{F}dG_0\big)\,\mathcal{F}K_h\big](dy),

where the last equality follows by the definition of $dG_0$. We remark that, unlike its series representation, the last expression is well-defined because (2.3.1) is, due to the spectral cut-off induced by $\mathcal{F}K_h$ and to the fact that $\|K_h\|_1 = \|K\|_1 < \infty$ in view of (2.2.5). Formally expanding the logarithm and rearranging terms, the last display can be written as

\sum_{i=1}^{\infty}(-1)^{i+1}\frac{e^{i\lambda\Delta}}{i\lambda\Delta}\int_{-\infty}^{x} \mathcal{F}^{-1}\big[(\mathcal{F}dG_0)^i\,\mathcal{F}K_h\big](dy) = \sum_{i=1}^{\infty}(-1)^{i+1}\frac{e^{i\lambda\Delta}}{i\lambda\Delta}\int_{-\infty}^{x}\big(dG_0^{*i} * K_h\big)(dy) = \sum_{i=1}^{\infty}(-1)^{i+1}\frac{e^{i\lambda\Delta}}{i\lambda\Delta}\,\big(G_0^{*i} * K_h\big)(x).

The last expression shows that our estimator can be interpreted as a kernel-regularised version of that in Buchmann and Grübel [2003], with the novelty that we can estimate a discrete jump component and that λ is substituted by an (efficient) estimator of it. On the left hand side, assume $dG_0 = (1-e^{-\lambda\Delta})g_0$ for some density $g_0$ and take $\mathcal{F}K = \mathbf{1}_{[-1,1]}$. Then, in view of (2.1.16), this expression makes clear that our estimator is an integrated low-frequency version of that in Comte et al. [2014], but in which the estimator for the coefficients is a plug-in estimator using an (efficient) one for λ. Let us comment on the estimator of van Es et al. [2007]. Under the density assumption, they exploit the fact that

f(x) = \frac{1}{\lambda\Delta}\,\mathcal{F}^{-1}\big[\mathrm{Log}\big(1 + (e^{\lambda\Delta}-1)\,\mathcal{F}g_0\big)\big](x) \approx \frac{1}{\lambda\Delta}\,\mathcal{F}^{-1}\big[\mathrm{Log}\big(1 + (e^{\lambda\Delta}-1)\,\mathcal{F}g_0\big)\,\mathcal{F}K_h\big](x),

where we again remark that, unlike its series representation, the last expression is also well-defined for the same reasons the second to last display was. Formally arguing as

before, the last display can be written as

\sum_{i=1}^{\infty}(-1)^{i+1}\frac{(e^{\lambda\Delta}-1)^i}{i\lambda\Delta}\,(g_0 * K_h)^{*i} = \sum_{i=1}^{\infty}(-1)^{i+1}\frac{e^{i\lambda\Delta}}{i\lambda\Delta}\,(dG_0 * K_h)^{*i}.

In view of the right hand side, the estimator of van Es et al. [2007] can be interpreted as the density version of the estimator of Buchmann and Grübel [2003], but in which the kernel density estimator is plugged in in place of the empirical distribution function. Furthermore, in view of the left hand side and by the discussion right after expression (2.1.16), the estimator of Comte et al. [2014] is the high-frequency version of that in van

Es et al. [2007] with $\mathcal{F}K = \mathbf{1}_{[-1,1]}$, assuming no knowledge of λ. With this interpretation, the estimator in Duval [2013a] is the wavelet density version of these in the high-frequency regime. Since using a kernel or wavelet estimator does not affect the asymptotic results, we conclude that, formally, all the estimators are very close relatives of each other. When λ∆ < log 2 they behave similarly in practice. As we point out in Section 2.6, the true distinction between them lies in whether they estimate λ or not (regardless of whether knowledge of it is available), as their asymptotic properties do depend on this. Recall that at the end of Sections 2.1.2 and 2.2.2 we also briefly mentioned the construction of the estimators of Buchmann and Grübel [2003] and Coca [2015] for the jump mass function. In particular, assuming dF only has a discrete component, we remarked that the cumulative sum of the former coincides with $\widetilde F_n$ when an estimate of λ is plugged in, and that of the latter is, up to negligible terms, equal to $\widehat F_n$. By the arguments of the previous paragraph, they are thus formally equal up to convolution with the kernel introduced in Section 2.2.2. Indeed, as we mention in Section 2.6, the covariance of the limiting processes in their central limit theorems is the same and, furthermore, it coincides with the information lower bound of the estimation problem when no restriction on the support of F is made. Buchmann and Grübel [2003] restricted the support to $\mathbb{R}^+$ and Trabs [2015a] assumed the Lévy measure is absolutely continuous with respect to Lebesgue's measure, so efficiency cannot be concluded but, if such changes in the model do not change the information bounds, both will be efficient. We remark that, with this unified view, all the estimators can be interpreted as arising from an infinite sum of terms with alternating signs.
This corresponds to their attempt to invert the compounding operation and justifies the common feature that the resulting estimates of the distribution are not monotonic. In other words, this feature does not depend on the type of estimator used, such as a kernel one, and, a priori, seems unavoidable without further transformations such as the ones Buchmann and Grübel [2004] made in the purely atomic case mentioned at the end of Section 2.1.2. Due to the close relationship between the decompounding problem and the estimation of discretely observed Lévy processes, this also justifies the appearance of this feature in the estimators of functionals of the Lévy measure.
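The alternating-series structure common to all these estimators can be checked numerically at the population level. Taking dF = δ₁, so that dG is the Poisson(λ∆) distribution on the integers, and λ∆ = 0.5 < log 2 (so that the Mercator expansion is valid), the truncated series $\sum_i(-1)^{i+1}\frac{e^{i\lambda\Delta}}{i\lambda\Delta}dG_0^{*i}$ recovers dF; the sketch below is illustrative.

```python
import numpy as np
from math import exp, factorial

lam_dt = 0.5  # lambda * Delta < log 2, so the series expansion is justified

# dF = delta_1, hence dG is the Poisson(lam_dt) pmf on {0, 1, 2, ...}.
K = 40
dG = np.array([exp(-lam_dt) * lam_dt ** i / factorial(i) for i in range(K)])
dG0 = dG.copy()
dG0[0] = 0.0  # remove the atom at zero, i.e. the no-jump mass e^{-lam_dt}

# Truncated alternating series: sum_i (-1)^{i+1} e^{i*lam_dt}/(i*lam_dt) dG0^{*i}.
dF_rec = np.zeros(1)
conv = np.array([1.0])  # dG0^{*0} = delta_0
for i in range(1, 31):
    conv = np.convolve(conv, dG0)
    coef = (-1) ** (i + 1) * exp(i * lam_dt) / (i * lam_dt)
    padded = np.zeros(len(conv))
    padded[: len(dF_rec)] = dF_rec
    dF_rec = padded + coef * conv
```

The atom at j only receives contributions from the first j convolution powers, so the low-order atoms are recovered essentially exactly: `dF_rec[1]` is 1 and the other low-order atoms vanish.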


Even though the estimators of Nickl and Reiß [2012] and Nickl et al. [2016] were derived for more general Lévy processes, a natural question is how they compare to that in Coca [2015] when the underlying process is compound Poisson. These also turn out to be the same, except for exponentially negligible quantities, although we postpone the comparison until Section 2.5 for reasons that will become apparent there. Lastly, we remark that a third approach has recently been introduced to nonparametrically estimate the density of a discretely observed compound Poisson process. Gugushvili et al. [2015] and Gugushvili et al. [2016] use a Bayesian approach and are able to tackle the multidimensional case in the low- and the high-frequency regimes, respectively. Their methodology differs considerably from the one we deal with here and we therefore refer the reader to their works for more details.

2.4 Estimators of the intensity and the drift

In this section we review existing estimators of the intensity and of the drift of a compound Poisson distribution. As pointed out by Coca [2015] and further expanded upon in Section 2.6, the estimation of the former plays a key role in the efficiency of the estimators of the jump distribution. In Section 2.4.1 we introduce the estimators of λ, describing their differences and similarities, and discussing their asymptotic properties. In Section 2.4.2 we discuss the estimators of γ; until then we continue to assume γ = 0.

2.4.1 Estimating the intensity

In the previous section we already intuitively motivated two estimators of the intensity: the naive one in (2.1.15),

\widetilde\lambda_n := -\frac{1}{\Delta}\log(\widetilde\varrho_{0,n}) := -\frac{1}{\Delta}\log\Big(\frac{1}{n}\sum_{k=1}^{n}\mathbf{1}_{\{0\}}(Z_k)\Big);

and the one derived from the spectral approach in (2.2.15) which, including the modifications discussed therein, is given by

\widehat\lambda_n := \widehat N_n(\infty) := \frac{1}{\Delta}\int_{\mathbb{R}} \mathbf{1}_{[-H_n,H_n]\setminus(-\varepsilon_n,\varepsilon_n)}(x)\, \mathcal{F}^{-1}\big[\mathrm{Log}(\mathcal{F}dG_n)\,\mathcal{F}K_{h_n}\big](x)\,dx,

where $\mathcal{F}dG_n(u) := \frac{1}{n}\sum_{k=1}^{n} e^{iuZ_k}$ is the empirical characteristic function and $h_n, \varepsilon_n, H_n^{-1} \to 0$ as n → ∞. Yet a third estimator can be constructed through the spectral approach: recall that we motivated the form of $\widehat\lambda_n$ by the fact that the mass of dN is λ. In addition, we argued that in the construction of $\widehat N_n$ we ruled out the influence of the observations of Z in which no jump takes place. We achieved this by introducing the truncation at the origin that annihilates the second summand in (2.2.12). Using the same argument, it should be

possible to exploit that second summand to isolate the influence of such observations of Z. Indeed, an alternative estimator for λ is given by

\check\lambda_n := -\frac{1}{\Delta}\int_{\mathbb{R}} \mathbf{1}_{[-\varepsilon_n,\varepsilon_n]}(x)\, \mathcal{F}^{-1}\big[\mathrm{Log}(\mathcal{F}dG_n)\,\mathcal{F}K_{h_n}\big](x)\,dx.

Just as the estimators of the jump distribution from the previous section were close relatives of each other, so are these. Furthermore, the main difference between them is again asymptotic efficiency. Recall that the expression for $\widetilde\lambda_n$ was motivated by the fact that the probability of observing an instance of Z with no jumps is $e^{-\lambda\Delta}$. Let us condition on the number of observations with jumps being m ∈ ℕ and on the number of non-zero such observations being $\widetilde m \in \mathbb{N}$. Then, we have that

\widetilde\lambda_n := -\frac{1}{\Delta}\log\frac{n-\widetilde m}{n} = -\frac{1}{\Delta}\log\frac{n-m}{n} - \frac{1}{\Delta}\log\frac{n-\widetilde m}{n-m}.

This explicitly shows the issue pointed out in the previous section: the first summand on the right hand side is a consistent estimator for λ so, if jumps can cancel out resulting in zero observations, the second summand converges to a non-zero quantity and $\widetilde\lambda_n$ cannot be consistent. As mentioned when we introduced $\widetilde\varrho_{0,n}$ in (2.1.9), if dF has no discrete component in $\mathbb{R}^+$ and/or $\mathbb{R}^-$, the probability of such cancellations is zero. Then $e^{-\lambda\Delta}$ is all the mass of dG at zero and $\widetilde\lambda_n$ is asymptotically consistent. In any case, due to the intuition behind $\check\lambda_n$, this new estimator should be related to $\widetilde\lambda_n$. Indeed, denoting the observations of Z with jumps by $\widetilde Z_1, \dots, \widetilde Z_m$,

\check\lambda_n = -\frac{1}{\Delta}\int_{\mathbb{R}} \mathbf{1}_{[-\varepsilon_n,\varepsilon_n]}(x)\, \mathcal{F}^{-1}\Big[\mathrm{Log}\Big(\frac{n-m}{n}\Big(1 + \frac{1}{n-m}\sum_{k=1}^{m} e^{i\cdot \widetilde Z_k}\Big)\Big)\,\mathcal{F}K_{h_n}\Big](x)\,dx
= \Big(\widetilde\lambda_n + \frac{1}{\Delta}\log\frac{n-\widetilde m}{n-m}\Big)\int_{-\varepsilon_n/h_n}^{\varepsilon_n/h_n} K(x)\,dx - \frac{1}{\Delta}\int_{\mathbb{R}} \mathbf{1}_{[-\varepsilon_n,\varepsilon_n]}(x)\, \mathcal{F}^{-1}\Big[\mathrm{Log}\Big(1 + \frac{n}{n-m}\,\mathcal{F}d\widetilde G_{0,n}\Big)\,\mathcal{F}K_{h_n}\Big](x)\,dx,   (2.4.1)

where $d\widetilde G_{0,n} := \frac{1}{n}\sum_{k=1}^{m}\delta_{\widetilde Z_k}$.

In Chapter 3 we assume εn/hn → ∞ as n → ∞ so, by the first property of K in (2.2.5), the integral in the first summand converges to 1 as n → ∞. In view of (2.3.1), the second summand is estimating the mass of ν := λdF at the origin, which is 0 by assumption, and is therefore negligible. This shows that λˇn gets rid of the summand in λ˜n that may make it lack consistency. More importantly, it explicitly reflects the power of the spectral approach in which, as mentioned at the beginning of Section 2.2, the untangling problem is solved at once: λˇn is able to distinguish between the zero observations coming from empty sums in Z and those that result from cancellations of jumps; this can only be because λˇn is able to decompound the zero observations, unlike λ˜n. As expected, below we see that, in general, their asymptotic properties are different. Nonetheless, when the naive estimator


is consistent, the two are equal up to terms that are negligible as n → ∞. In order to link $\check\lambda_n$ with $\widehat\lambda_n$, notice that

\widehat\lambda_n - \check\lambda_n = \frac{1}{\Delta}\int_{\mathbb{R}} \mathbf{1}_{[-H_n,H_n]}(x)\, \mathcal{F}^{-1}\big[\mathrm{Log}(\mathcal{F}dG_n)\,\mathcal{F}K_{h_n}\big](x)\,dx.   (2.4.2)

The right hand side converges to 0 as n → ∞ because the integral of (2.2.12) over ℝ is zero, so the two estimators are essentially the same in all situations. So what can be said about their asymptotic properties? To answer this question for $\widetilde\lambda_n$, its expression suggests using the delta method. Indeed, defining $\psi(x) := -\frac{1}{\Delta}\log x$,

\widetilde\lambda_n = \psi\Big(\frac{1}{n}\sum_{k=1}^{n}\mathbf{1}_{\{0\}}(Z_k)\Big),

and, by the delta method combined with the central limit theorem, it follows that, as n → ∞,

\sqrt{n}\,\big(\widetilde\lambda_n - \psi(dG(\{0\}))\big) \xrightarrow{d} N\Big(0, \frac{dG(\{0\})^{-1} - 1}{\Delta^2}\Big),   (2.4.3)

where dG({0}) is the mass of dG at the origin. We readily see that $\widetilde\lambda_n$ is consistent if and only if $dG(\{0\}) = e^{-\lambda\Delta}$, which holds in the situations for which it was proposed, and, in these, the asymptotic variance is $(e^{\lambda\Delta}-1)/\Delta^2$. Proving the asymptotic properties of $\widehat\lambda_n$ and $\check\lambda_n$ is much more involved and is part of the proofs in Chapter 3. The assumptions needed are the same as those mentioned when constructing our estimator for the jump distribution.
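The limit (2.4.3) is easy to verify by Monte Carlo in the case without cancellations, where dG({0}) = $e^{-\lambda\Delta}$: since $\widetilde\lambda_n$ only depends on the proportion of zero increments, it suffices to simulate binomial counts. All parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

lam, Delta, n, R = 1.0, 1.0, 2_000, 4_000
p0 = np.exp(-lam * Delta)  # dG({0}) when jumps cannot cancel each other

# Number of zero increments in each of R samples of size n, and lambda-tilde.
zeros = rng.binomial(n, p0, size=R)
lam_tilde = -np.log(zeros / n) / Delta

emp_var = np.var(np.sqrt(n) * (lam_tilde - lam))
asy_var = (np.exp(lam * Delta) - 1) / Delta ** 2  # delta-method limit in (2.4.3)
```

On this run the empirical variance of $\sqrt{n}(\widetilde\lambda_n - \lambda)$ matches the delta-method limit $(e^{\lambda\Delta}-1)/\Delta^2 \approx 1.72$ up to Monte Carlo error.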

Under these assumptions, and choosing $h_n$, $\varepsilon_n$ and $H_n$ appropriately, the estimators satisfy, as n → ∞,

\sqrt{n}\,\big(\widehat\lambda_n - \lambda\big) \xrightarrow{d} N\big(0, \sigma_\lambda^2\big) \quad\text{and}\quad \sqrt{n}\,\big(\check\lambda_n - \lambda\big) \xrightarrow{d} N\big(0, \sigma_\lambda^2\big),   (2.4.4)

where

\sigma_\lambda^2 := \frac{1}{\Delta^2}\int_{\mathbb{R}} \Big(f^{(\lambda)} * \mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big]\Big)^2\, dG

with $f^{(\lambda)} := \mathbf{1}_{\mathbb{R}\setminus\{0\}}$ and $\mathcal{F}^{-1}[\varphi^{-1}(-\cdot)]$ as in (2.2.17). We note in passing that this, together with (2.4.2), implies that the asymptotic correlation between the two is 1. After some algebra, and under the assumption that if dF has a non-zero atomic component its support is in $\varepsilon\mathbb{Z}\setminus\{0\}$ for some ε > 0,

\sigma_\lambda^2 = \frac{1}{\Delta^2}\Big(\sum_{j\in\mathbb{Z}} dG(\{\varepsilon j\})\,\big(\mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big](\{\varepsilon j\})\big)^2 - 1\Big)   (2.4.5)
= \frac{dG(\{0\})\,\big(\mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big](\{0\})\big)^2 - 1}{\Delta^2} + \sum_{j\in\mathbb{Z}\setminus\{0\}} \frac{dG(\{\varepsilon j\})\,\big(\mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big](\{\varepsilon j\})\big)^2}{\Delta^2}.


In view of the expressions for dG and $\mathcal{F}^{-1}[\varphi^{-1}(-\cdot)]$ in (2.2.18) and (2.2.17), respectively, the second summand is zero if dF has no discrete component in $\mathbb{R}^+$ and/or $\mathbb{R}^-$. This has two consequences: in these situations, $dG(\{0\}) = e^{-\lambda\Delta}$, so $\widetilde\lambda_n$ is consistent, and, furthermore, $\mathcal{F}^{-1}[\varphi^{-1}(-\cdot)](\{0\}) = e^{\lambda\Delta} = dG(\{0\})^{-1}$, so the three estimators are asymptotically equivalent with correlation 1 in view of (2.4.1); however, in general they are not, because the second summand is strictly positive due to the zero observations of Z arising from cancellations between jumps, as anticipated above. In Section 2.6 we argue that $\widehat\lambda_n$ and

$\check\lambda_n$ are efficient, and from the first equality in the last display we can draw an interesting and intuitive conclusion: in the general case, many observations may be repeated and, for an optimal estimator, the null increments are no longer special because the process may have a non-zero drift, so it needs to decompound each and every one of them; the contribution of each of them to the difficulty of estimating λ is given by $\big(\mathcal{F}^{-1}[\varphi^{-1}(-\cdot)](\{\varepsilon j\})\big)^2$, weighted according to the probability of observing such a jump.
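The inconsistency of $\widetilde\lambda_n$ under cancelling jumps can be quantified in a simple example: for $dF = \tfrac12\delta_{-1} + \tfrac12\delta_{+1}$ and λ∆ = 1, zero increments can arise from 2k symmetric jumps cancelling, so $dG(\{0\}) = \sum_k e^{-\lambda\Delta}\frac{(\lambda\Delta)^{2k}}{(2k)!}\binom{2k}{k}2^{-2k} > e^{-\lambda\Delta}$ and the probability limit of $\widetilde\lambda_n$ falls strictly below λ. A sketch (illustrative values):

```python
from math import comb, exp, factorial, log

lam_dt = 1.0  # lambda * Delta, with Delta = 1 so that lambda = 1

# dG({0}): a zero increment needs 2k jumps of +-1 (each with prob. 1/2)
# summing to zero, i.e. exactly k positive ones among them.
g0_mass = sum(
    exp(-lam_dt) * lam_dt ** (2 * k) / factorial(2 * k) * comb(2 * k, k) / 4 ** k
    for k in range(40)
)

plim = -log(g0_mass)  # probability limit of lambda-tilde_n; about 0.76 < 1
```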

Nonetheless, the practical implementation of $\widetilde\lambda_n$ is trivial, whilst implementing $\widehat\lambda_n$ and

$\check\lambda_n$ requires approximating an integral and, more importantly, depends on the choice of $h_n$, $\varepsilon_n$ and $H_n$. Furthermore, as just mentioned, $\widetilde\lambda_n$ is asymptotically equivalent to the other two estimators in many realistic cases. Therefore, the naive estimator is very attractive in practice and, moreover, as mentioned in Chapter 4, it behaves better than the other two when the problem is mildly inverse. However, it has a very undesirable practical limitation that $\widehat\lambda_n$ and $\check\lambda_n$ do not have: when there are no zero observations it is undefined. If $dG(\{0\}) = e^{-\lambda\Delta}$, the probability of this occurring in a sample of size n is

\Pr(Z_1 \neq 0, \dots, Z_n \neq 0) = \big(1 - e^{-\lambda\Delta}\big)^n.

When λ∆ ≈ 0 this is approximately $(\lambda\Delta)^n \approx 0$. Nevertheless, for larger values it may not be negligible, as shown in Table 2.1.

λ∆          3       4       5       6       7       8
n = 100     0.61    15.75   50.86   78.02   91.28   96.70
n = 250     0       0.98    18.45   53.77   79.61   91.95
n = 500     0       0.01    3.40    28.91   63.37   84.56
n = 1000    0       0       0.12    8.36    40.16   71.50

Table 2.1: Probability (in %) of not observing a zero-increment in n ∆-increments of a process with intensity λ and distribution function with no jumps potentially cancelling each other.
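The entries of Table 2.1 follow directly from the last display; for instance, the rows for n = 100 and n = 500 can be reproduced with a one-line computation:

```python
from math import exp

def prob_no_zero(lam_dt, n):
    """Probability (in %) that none of n increments is zero: (1 - e^{-lam_dt})^n."""
    return 100 * (1 - exp(-lam_dt)) ** n

row_100 = [round(prob_no_zero(ld, 100), 2) for ld in (3, 4, 5, 6, 7, 8)]
row_500 = [round(prob_no_zero(ld, 500), 2) for ld in (3, 4, 5, 6, 7, 8)]
# row_100 -> [0.61, 15.75, 50.86, 78.02, 91.28, 96.7], matching Table 2.1
```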

Lastly, let us remark that, as ∆ → 0, these three estimators converge to the maximum likelihood estimator proposed in Section 1.3.1. In this regime, the three estimators above are equal up to negligible terms because the jumps are observed directly and no


cancellations can arise. Therefore, we can focus on $\widetilde\lambda_n$. When ∆ → 0, the number of zero observations increases and its expression can be approximated by

\frac{1}{n\Delta}\sum_{k=1}^{n}\mathbf{1}_{\mathbb{R}\setminus\{0\}}(Z_k).   (2.4.6)

Note that $n\Delta = \lfloor T/\Delta\rfloor\Delta$ is roughly the length of the total observation interval, which is approximately equal to the sum of the underlying interarrival times, whilst the sum is counting the number of jumps. Therefore, the inverse of the last display converges to the average length of each interarrival time and the conclusion follows. Indeed, the asymptotic variance of $\widetilde\lambda_n$, $(e^{\lambda\Delta}-1)/\Delta^2$, is approximately λ/∆, so if (2.4.3) is multiplied by $\sqrt{\lambda\Delta}$, which is the square root of the expected number of jumps per observation interval, we recover (1.3.1).

2.4.2 Estimating the drift

Up to now we have assumed γ = 0. This is because the works mentioned so far that deal strictly with the compound Poisson case, except for Coca [2015], assume so. In order to construct the spectral estimator of γ of Coca [2015], we note that, if γ ≠ 0, formal expression (2.2.12) becomes

\frac{1}{\Delta}\,\mathcal{F}^{-1}\big[\mathrm{Log}(\mathcal{F}dG)\big](dx) = \nu(dx) - \lambda\,\delta_0(dx) + \gamma\,\mathcal{F}^{-1}[i\cdot](dx), \qquad x \in \mathbb{R}.   (2.4.7)

In line with the other estimators, we consider an integral functional of this display. In particular, we multiply both sides by x and integrate the resulting expression over an arbitrarily small interval around the origin. More precisely, for ς > 0 small, we consider

\frac{1}{\Delta}\int_{\mathbb{R}} \mathbf{1}_{[-\varsigma,\varsigma]}(x)\,x\,\mathcal{F}^{-1}\big[\mathrm{Log}(\mathcal{F}dG)\big](dx) = \int_{\mathbb{R}} \mathbf{1}_{[-\varsigma,\varsigma]}(x)\,x\,\nu(dx) - \lambda\int_{\mathbb{R}} \mathbf{1}_{[-\varsigma,\varsigma]}(x)\,x\,\delta_0(dx) + \gamma\int_{\mathbb{R}} \mathbf{1}_{[-\varsigma,\varsigma]}(x)\,x\,\mathcal{F}^{-1}[i\cdot](dx).

The first summand is arbitrarily small for two reasons: the finite measure ν has no mass at the origin by assumption; and, if it has any atoms, we have assumed they are in $\varepsilon\mathbb{Z}\setminus\{0\}$ for some ε > 0, so when ς < ε the integral is over a density and it is restricted to an arbitrarily small interval. The second summand is exactly zero by the defining property of $\delta_0$ and, thus, we are left with analysing the third term: since, formally, $(\mathcal{F}^{-1}\mu)' = -\mathcal{F}^{-1}[i\cdot\mu]$ and $\mathcal{F}^{-1}[1] = \delta_0$, $\mathcal{F}^{-1}[i\cdot]$ can be interpreted as the distributional derivative of

$-\delta_0$. Then, using integration by parts, the last integral is
\[
-\int_{\mathbb{R}} \mathbf{1}_{[-\varsigma,\varsigma]}(x)\,x\,(\delta_0)'(dx) = \int_{\mathbb{R}} \left( \left(\delta_{-\varsigma}(x) - \delta_{\varsigma}(x)\right)x + \mathbf{1}_{[-\varsigma,\varsigma]}(x) \right) \delta_0(dx) = 1.
\]

47 Unifying existing literature

In practice, we use a kernel estimator and therefore the value of this integral is slightly different: substituting $\delta_0$ by $K_{h_n}$, the second integral becomes

\[
\int_{\mathbb{R}} \left( \left(\delta_{-\varsigma}(x) - \delta_{\varsigma}(x)\right)x + \mathbf{1}_{[-\varsigma,\varsigma]}(x) \right) K_{h_n}(x)\,dx = \int_{-\varsigma/h_n}^{\varsigma/h_n} K(x)\,dx - \frac{\varsigma}{h_n}\left( K\!\left(\frac{\varsigma}{h_n}\right) + K\!\left(-\frac{\varsigma}{h_n}\right) \right).
\]

Consequently, the estimator we propose is
\[
\hat\gamma_n := \frac{1}{c\Delta}\int_{\mathbb{R}} \mathbf{1}_{[-h_n,h_n]}(x)\,x\,\mathcal{F}^{-1}\left[\operatorname{Log}(\mathcal{F}dG_n)\,\mathcal{F}K_{h_n}\right](x)\,dx, \qquad (2.4.8)
\]
where $c := \int_{-1}^{1} K - \left(K(1) + K(-1)\right)$. We emphasise that, in view of the second to last expression, $\varsigma = \varsigma_n$ cannot be chosen to converge to zero at a faster rate than $h_n$ for this estimator to work. Furthermore, the estimators of $F$ and $\lambda$ we constructed in previous sections can still be used when a non-zero drift is present in the data: this is due to identity (2.4.7) and because, since
\[
\int_{\mathbb{R}} \mathbf{1}_{\mathbb{R}\setminus[-\varsigma,\varsigma]}(x)\,x\,\mathcal{F}^{-1}\left[i\cdot K_{h_n}\right](x)\,dx = 1 - \int_{\mathbb{R}} \left( \left(\delta_{-\varsigma}(x) - \delta_{\varsigma}(x)\right)x + \mathbf{1}_{[-\varsigma,\varsigma]}(x) \right) K_{h_n}(x)\,dx \qquad (2.4.9)
\]

if $\lim_{x\to\pm\infty} xK(x) = 0$ and recalling that $\int_{\mathbb{R}} K = 1$, the left hand side of this display vanishes for $\varsigma_n = \varepsilon_n$ when $\varepsilon_n/h_n \to \infty$. In Chapter 3 we show that, under the same assumptions as above and as $n \to \infty$,

\[
h_n^{-1}\sqrt{n}\,(\hat\gamma_n - \gamma) \xrightarrow{Pr} 0.
\]

This, together with the comment right after (2.4.8), shows that the rate of convergence of $\hat\gamma_n$ is exactly $h_n/\sqrt{n}$. In the next chapter we choose $h_n$ to vanish at an exponential rate.

Therefore, as already mentioned in the previous chapter, $\hat\gamma_n$ converges to $\gamma$ exponentially fast. This is an unusual property, since the rate of convergence of an estimator is generally upper bounded by the parametric $1/\sqrt{n}$-rate. It can be intuitively explained as follows: unlike in typical statistical problems, here the quantity of interest, $\gamma$, is directly observed with non-zero probability. This corresponds to those observations with no jumps, which were considered before when constructing $\tilde\lambda_n$. Now, however, we are not interested in the parameter of the distribution generating them but simply in their value $\gamma\Delta$, where we recall that $\Delta$ is known. In fact, in the abovementioned cases when $\tilde\lambda_n$ is a consistent estimator for $\lambda$, $\gamma$ can be learnt without error for (large enough) finite samples. This is because, if $dF$ has no discrete component in $\mathbb{R}^-$ (and/or $\mathbb{R}^+$), when $n$ is large enough there will always be at least two observations equal to $\gamma\Delta$ and these will be the smallest (only/largest) ones among all of those that are equal. If there is a priori knowledge that $dF$ enjoys such a property, for instance if it has a density or if it is supported only in $\mathbb{R}^+$

48 2.4. Estimators of the intensity and the drift

or $\mathbb{R}^-$, we propose to use the naive estimator
\[
\tilde\gamma_n := \frac{1}{\Delta}\,Z_{\tilde k(n)}, \qquad (2.4.10)
\]
where

\[
\tilde k(n) := \operatorname{argmin}_{k \in \tilde K(n)} |Z_k| \qquad \text{and} \qquad \tilde K(n) := \left\{ k = 1,\dots,n : \sum_{l=1}^{n} \mathbf{1}_{Z_k = Z_l} > 1 \right\}.
\]
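A minimal sketch of the naive estimator (2.4.10), under assumed parameters; the Exp(1) jump law is an illustrative choice giving $dF$ a density, so the repeated-value argument applies and the drift is recovered exactly:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)

# Hypothetical parameters (illustration only, not from the text).
gamma, lam, Delta, n = 0.7, 2.0, 0.05, 5_000

num_jumps = rng.poisson(lam * Delta, size=n)
Z = gamma * Delta + np.array([rng.exponential(1.0, m).sum() for m in num_jumps])

# (2.4.10): jump-free increments all equal gamma * Delta exactly, while
# increments containing jumps are almost surely pairwise distinct.  Hence
# any value repeated among the Z_k identifies gamma * Delta; we take the
# repeated value of smallest absolute value and rescale by Delta.
counts = Counter(Z.tolist())
repeated = [z for z in Z if counts[z] > 1]
gamma_naive = min(repeated, key=abs) / Delta if repeated else 0.0
```

With these parameters most intervals contain no jump, so `repeated` is non-empty with overwhelming probability and the drift is learnt without error, illustrating the discussion above.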

When the set $\tilde K(n)$ is empty we take $\tilde\gamma_n := 0$ but, again, the probability of this happening tends to zero as $n \to \infty$. In fact, in general, the minimum in the definition of $\tilde k(n)$ is attained for more than one $k$; because the set of these is finite, we can take the minimum of them without loss of generality. Trivially, and naturally paralleling the behaviour of $\tilde\lambda_n$ above, $\tilde\gamma_n$ is a consistent estimator (almost surely) in the cases mentioned above when $\tilde\lambda_n$ is too, but not in general. In these special cases, all the existing estimators of $F$ and $\lambda$ require no modification, as we can simply work with $Z_k - \tilde\gamma_n\Delta$, $k = 1,\dots,n$, instead of with the original observations. In the remaining cases, the only safe strategy is to use

$\hat\gamma_n$, constructed through the spectral approach. In practice it is very well behaved when the problem is mildly inverse, honouring its exponential rate of convergence. However, as soon as the proportion of observations without jumps is small (whose probability is reflected in Table 2.1 above), it behaves erratically. This is consistent with the intuition given before, since these are the observations that truly inform us about $\gamma$. To our knowledge, these are the first estimators of the drift in the strict setting of estimating discretely observed compound Poisson processes. In other settings, such as that of estimating more general discretely observed Lévy processes, estimators have been developed by authors such as Gugushvili [2009], Trabs [2014] and Belomestny et al. [2015]. Ours is closer to the estimator of the first, although they still exploit different properties of the spectral approach. Moreover, those estimators require stronger assumptions and their rates of convergence cannot be exponential but only polynomial or logarithmic. Therefore we feel the estimators for $\gamma$ proposed above and the discussions arising from them add value and further insight into the problem. Lastly, let us compute the limit of these estimators as $\Delta \to 0$. That of the naive estimator is trivially equivalent to measuring the slope in the continuous observation regime of Section 1.3.1. To analyse that of $\hat\gamma_n$, we work with the quantity inside the logarithm, which is $1$ plus $\mathcal{F}dG_n - 1 = (\mathcal{F}dG_n - \mathcal{F}dG) + (\mathcal{F}dG - 1)$. In the compact support of $\mathcal{F}K_{h_n}$, the first term is uniformly negligible by the control of the empirical characteristic function developed by Kappus and Reiß [2010]. So is the second because $e^{-2\lambda\Delta} \leq |\mathcal{F}dG| \leq 1$ by the Lévy–Khintchine representation of $\mathcal{F}dG$ in (2.2.8). Therefore,


the expression for $\hat\gamma_n$ is, up to negligible terms, equal to
\[
\frac{1}{c\Delta}\int_{\mathbb{R}} \mathbf{1}_{[-h_n,h_n]}(x)\,x\,\mathcal{F}^{-1}\left[(\mathcal{F}dG_n - 1)\,\mathcal{F}K_{h_n}\right](x)\,dx = \frac{1}{c\Delta}\int_{\mathbb{R}} \mathbf{1}_{[-h_n,h_n]}(x)\,x\,dG_n * K_{h_n}(x)\,dx - \frac{h_n}{c\Delta}\int_{\mathbb{R}} \mathbf{1}_{[-1,1]}(x)\,x\,K(x)\,dx.
\]
The second summand is exactly zero if $K$ is symmetric, as we will assume, and exponentially negligible otherwise because we will be able to make $h_n$ vanish exponentially fast. And the first term is the kernel-regularised version of

\[
\frac{1}{\Delta n}\sum_{k=1}^{n} \mathbf{1}_{[-h_n,h_n]}(Z_k)\,Z_k. \qquad (2.4.11)
\]

Since $\nu$ has no atom at the origin by assumption and the number of observations with jumps is asymptotically negligible, this is estimating $\gamma$ in a very intuitive way when taking $\Delta_n = o(h_n)$.
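The truncated average (2.4.11) can be checked numerically in the regime $\Delta_n = o(h_n)$. In the sketch below all parameters and the Exp(1) jump law are illustrative assumptions: the window $[-h, h]$ catches essentially all jump-free increments (each exactly $\gamma\Delta$) and very few increments containing a jump.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical high-frequency parameters: Delta much smaller than h.
gamma, lam, Delta, h, n = 0.7, 2.0, 1e-3, 0.05, 100_000

num_jumps = rng.poisson(lam * Delta, size=n)
Z = gamma * Delta + np.array([rng.exponential(1.0, m).sum() for m in num_jumps])

# (2.4.11): average only the small increments and rescale by Delta; the
# jump-free ones dominate, so this tracks gamma closely.
gamma_trunc = np.sum(Z[np.abs(Z) <= h]) / (n * Delta)
```

With these values `gamma_trunc` should land close to $\gamma = 0.7$; the residual error comes from the rare small jumps that survive the truncation.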

2.5 Estimators of the process

Most of the discussions so far have concerned estimating the parameters separately. To make inference on the whole underlying compound Poisson process, one typically shows joint convergence of the estimators of all the parameters. In the case when $dF$ has an absolutely continuous component, and with the exception of the work of Coca [2015], this has not been addressed in the literature discussed above. There are two different reasons: on the one hand, Buchmann and Grübel [2003] and van Es et al. [2007] assumed $\gamma = 0$ and that the intensity is known; and, on the other, Duval [2013a] and Comte et al. [2014] performed estimation in the high-frequency regime, so as $\Delta \to 0$ the setting tends to the continuous observation regime of Section 1.3 and estimation of the parameters is asymptotically independent. Notice that, due to the asymptotic properties of our estimator for $\gamma$ included in the previous section, making inference on the process reduces to making joint inference on $F$ and $\lambda$. This goes in line with the remark in Trabs [2015a], where it was pointed out that the addition of a drift to the model does not affect the information lower bounds. The exact asymptotic dependence of the estimators of $F$ and $\lambda$, and of the estimators for the mass of the atoms, was given by Coca [2015] and is part of the main result of Chapter 3. Therefore, joint estimation is possible using these results and in Section 4.4 we show their joint behaviour in practice. We remark that, in the case when only a discrete component is present, joint estimation was addressed by Buchmann and Grübel [2003] and Buchmann and Grübel [2004], and is asymptotically equivalent to that following from our results. An alternative to jointly estimating the parameters is to merge them into a single

parameter. This is precisely the role of the Lévy distribution $N$ introduced in Section 2.2, which carries all the information of the process. There, we constructed an estimator of it, namely $\hat N_n$. In Coca [2015] and in Chapter 3, we show that, under the same assumptions considered in Section 2.2 and as $n \to \infty$, the estimator is well-defined in sets of probability approaching $1$ and
\[
\sqrt{n}\left(\hat N_n - N\right) \xrightarrow{D} \mathbb{B}^N \quad \text{in } \ell^\infty(\mathbb{R}), \qquad (2.5.1)
\]

where $\mathbb{B}^N$ is a centred Gaussian process with covariance structure

\[
\Sigma^{\mathbb{B}^N}_{x,y} := \frac{1}{\Delta^2}\int_{\mathbb{R}} \left( f^{(N)}_x * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right] \right)(z)\left( f^{(N)}_y * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right] \right)(z)\,dG(z), \qquad x, y \in \mathbb{R},
\]

with $f^{(N)}_x := \mathbf{1}_{(-\infty,x]}\mathbf{1}_{\mathbb{R}\setminus\{0\}}$, $x \in \mathbb{R}$. As we see in the next section, the covariance structure of this process coincides with the information lower bound developed by Trabs [2015a], so

$\hat N_n$ is asymptotically efficient and optimal estimation of the process can also be achieved this way. Furthermore, and as pointed out in Coca [2015], it turns out to give important insights that allow us to justify why our estimators are efficient and others are not. Moreover, note the similarity between $\Sigma^{\mathbb{B}^N}_{x,y}$ and the covariance $\Sigma^{\mathbb{B}^G}_{x,y}$ from Nickl and Reiß [2012] given in Section 2.2.2. Indeed, $N^G$ is the generalised version of $N$ and next we see that, under the assumptions in Nickl and Reiß [2012], the two estimators are, up to exponentially negligible terms, the same. The strategy is to convert $\hat N_n(x)$ into $\hat N^G_n(x)$ and, for simplicity, we assume $\gamma = 0$ and $x \leq -\zeta < 0$. Then, we can introduce any of the estimators of the intensity, say $\hat\lambda_n$, in the Fourier inverse in (2.2.14) by simply adding and subtracting a (uniformly in $x$) negligible term, and

\[
\begin{aligned}
\hat N_n(x) &\approx \frac{1}{\Delta}\int_{-\infty}^{x} \mathcal{F}^{-1}\Big[\big(\operatorname{Log}(\mathcal{F}dG_n) + \hat\lambda_n\big)\,\mathcal{F}K_{h_n}\Big](y)\,dy \\
&= \frac{1}{\Delta}\int_{-\infty}^{x} \frac{1}{iy}\,\mathcal{F}^{-1}\Big[ i\,\cdot\,\big(\operatorname{Log}(\mathcal{F}dG_n) + \hat\lambda_n\big)\,\mathcal{F}K_{h_n}\Big](y)\,dy \\
&= \frac{1}{\Delta}\int_{-\infty}^{x} \frac{1}{iy}\,\mathcal{F}^{-1}\Big[ \big(\operatorname{Log}(\mathcal{F}dG_n) + \hat\lambda_n\big)'\,\mathcal{F}K_{h_n} + \big(\operatorname{Log}(\mathcal{F}dG_n) + \hat\lambda_n\big)\,(\mathcal{F}K_{h_n})'\Big](y)\,dy,
\end{aligned}
\]
where the equalities hold in sets of probability approaching $1$ as $n \to \infty$. The first summand in the Fourier inverse gives rise to exactly $\hat N^G_n(x)$ because of the finite moment assumption, so we need to show that the second is uniformly negligible. In the next chapter, and in Nickl and Reiß [2012], we assume the tails of $K$ decay faster than quadratically. Therefore, $(\mathcal{F}K_{h_n})' = i\mathcal{F}^{-1}[\cdot K_{h_n}] = ih_n\mathcal{F}^{-1}[(\cdot K)_{h_n}]$ and this is supported in $[-h_n^{-1}, h_n^{-1}]$, so $(\mathcal{F}K_{h_n})' \in L^r(\mathbb{R})$ for all $r \geq 1$. In this compact interval, $\operatorname{Log}(\mathcal{F}dG_n) + \hat\lambda_n$ is bounded above and below in sets of probability approaching $1$ and its product with $(\mathcal{F}K_{h_n})'$ has $L^2(\mathbb{R})$ norm of order $h_n^{1/2}$ if $\cdot K \in L^2(\mathbb{R})$, which we will assume. Furthermore, the supremum in $x \leq -\zeta < 0$ of the $L^2(\mathbb{R})$ norm of $\frac{1}{y}\mathbf{1}_{(-\infty,x]}(y)$ is bounded above and therefore the second summand in the last display is of order $h_n$ uniformly in $x$, which we will take


to decay to zero exponentially fast. We note that, if $\hat N^G_n(x)$, $x \leq -\zeta < 0$, is rescaled by $\hat\lambda_n$ then, by the same arguments, it is equal to $\hat F_n(x)$ up to uniformly and exponentially negligible additive terms. Another interesting question is whether, as $\Delta \to 0$ and under the assumptions of Nickl et al. [2016], the appropriate modification of $\hat N_n$ is, up to uniformly negligible terms, equal to the estimators $\tilde N_n$ and $\hat N_n$. This is indeed the case, and for the latter it can be argued as in the previous paragraph assuming $K$ decays fast enough. For the former, we argue as follows: the appropriate modification of $\hat N_n$ is
\[
\frac{1}{\Delta}\int_{-\infty}^{x} \rho(y)\,y^2\,\mathcal{F}^{-1}\left[\operatorname{Log}\left(1 + (\mathcal{F}dG_n - 1)\right)\mathcal{F}K_{h_n}\right](y)\,dy.
\]

Arguing as at the end of Section 2.4.2, this can be shown to be, up to uniformly in $x$ negligible terms, equal to

\[
\frac{1}{\Delta}\int_{-\infty}^{x} \rho(y)\,y^2\,\mathcal{F}^{-1}\left[(\mathcal{F}dG_n - 1)\,\mathcal{F}K_{h_n}\right](y)\,dy = \frac{1}{\Delta}\int_{-\infty}^{x} \rho(y)\,y^2\,dG_n * K_{h_n}(y)\,dy - \frac{1}{\Delta}\int_{-\infty}^{x} \rho(y)\,y^2\,K_{h_n}(y)\,dy.
\]

As expected, the first summand is the kernel-regularised version of $\tilde N_n$, and the second is uniformly negligible because $\rho(y) \leq C\left(1 \wedge y^{-2}\right)$ for some $C > 0$ by assumption on $\rho$. Hence, we also conclude that the existing estimators developed from the spectral approach are equal up to uniformly negligible terms. We remark that, in the compound Poisson case, the Lévy measure has no singularity at the origin and the same calculation shows that, as

$\Delta \to 0$, our estimator $\hat N_n$ is asymptotically equivalent to the cumulative function of the linear Lévy density estimator

\[
\frac{1}{\Delta n}\sum_{k=1}^{n} K_{h_n}(\cdot - Z_k)\,\mathbf{1}_{\{0\}^c}(Z_k)
\]

(note the similarity of this estimator with the intensity and drift estimators in (2.4.6) and (2.4.11)). This estimator and its wavelet versions resemble those of Comte et al. [2014] and Duval [2013a] with $I = 1$ (cf. (2.1.14) and the discussions after it), with the distinction that they use the non-zero observations conditioning on the total number of them, while the last display takes all and automatically discards the null ones. We note in passing that in this form they have not been studied in the literature, and preliminary calculations indicate they enjoy the same features (minimax optimality, concentration of measure, etc.) as the respective estimators in the standard i.i.d. case. Furthermore, if we rescale our estimator $\hat N_n$ by $\hat\lambda_n$, the resulting estimator is asymptotically equivalent to the empirical distribution function as $\Delta \to 0$.
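A minimal numerical sketch of the linear Lévy density estimator in the last display, in a high-frequency setting with zero drift; the parameters, the Exp(1) jump law and the Gaussian kernel are all illustrative assumptions. Each non-zero increment then essentially carries a single jump, so smoothing them recovers the Lévy density $\nu = \lambda f$:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical parameters: nu = lam * f with f the Exp(1) density, so
# nu(1) = 2 / e under this assumed model.
lam, Delta, n, h = 2.0, 0.01, 500_000, 0.1

num_jumps = rng.poisson(lam * Delta, size=n)
Z = np.array([rng.exponential(1.0, m).sum() for m in num_jumps])
Z_nonzero = Z[Z != 0.0]  # the indicator 1_{{0}^c} in the display above

def K_h(u):
    # Gaussian kernel with bandwidth h -- an assumed choice of K.
    return np.exp(-(u / h) ** 2 / 2.0) / (h * np.sqrt(2.0 * np.pi))

x = 1.0
nu_hat = K_h(x - Z_nonzero).sum() / (n * Delta)
```

Here `nu_hat` should sit near $2e^{-1} \approx 0.736$, up to smoothing bias of order $h^2$ and sampling noise.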


2.6 Asymptotic efficiency of the estimators

In the preceding sections we have focused on analysing the existing estimators of the defining quantities of a discretely observed compound Poisson process. Moreover, we have shown that they are all very similar estimators and, consequently, one would expect the limiting quantities in their central limit theorems to be very similar too. Here we show that this is indeed the case and that the differences arise in the attainment of efficiency. We also show how to transform estimators so that the limiting processes associated to them have smaller covariances, with the objective of making them efficient. We mainly focus on the estimators of Buchmann and Grübel [2003] and Coca [2015], and then extend the conclusions to the rest of the estimators by adapting the arguments. We also give some insights into the topic that are not present in existing literature. First we make some remarks that apply to the whole section. In (2.2.17) and (2.2.18) we gave explicit expressions for $\mathcal{F}^{-1}[\varphi^{-1}(-\cdot)]$ and $dG$ in terms of $F$, $\lambda$ and $\Delta$ assuming that $\gamma = 0$. From now on in this thesis, let us write $P$ for $dG$. Then, keeping the rest of the notation fixed, if $\gamma \neq 0$ the expressions change to

\[
\mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right] = e^{\lambda\Delta}\,\delta_{\Delta\gamma} * \sum_{k=0}^{\infty} \frac{(-\lambda\Delta)^k}{k!}\,d\bar F^{*k} \qquad (2.6.1)
\]
and
\[
P = e^{-\lambda\Delta}\,\delta_{\Delta\gamma} * \sum_{k=0}^{\infty} \frac{(\lambda\Delta)^k}{k!}\,dF^{*k}. \qquad (2.6.2)
\]
It still holds that, for any bounded functions $f, g$ on $\mathbb{R}$,
\[
\int_{\mathbb{R}} \left( f * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right] \right)(x)\left( g * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right] \right)(x)\,P(dx) < \infty. \qquad (2.6.3)
\]

Furthermore, if $\mathcal{F}^{-1}\left[\varphi_0^{-1}(-\cdot)\right]$ and $P_0$ represent the finite measures above when $\gamma = 0$, the integral in the last display can be written as
\[
\begin{aligned}
&\int_{\mathbb{R}} \left( f * \mathcal{F}^{-1}\left[\varphi_0^{-1}(-\cdot)\right] * \delta_{\Delta\gamma}\right)(x)\left( g * \mathcal{F}^{-1}\left[\varphi_0^{-1}(-\cdot)\right] * \delta_{\Delta\gamma}\right)(x)\,P_0 * \delta_{\Delta\gamma}(dx) \\
&\qquad = \int_{\mathbb{R}} \left( f * \mathcal{F}^{-1}\left[\varphi_0^{-1}(-\cdot)\right]\right)(x)\left( g * \mathcal{F}^{-1}\left[\varphi_0^{-1}(-\cdot)\right]\right)(x)\,P_0(dx).
\end{aligned}
\]
Quantities like (2.6.3) are central in the discussions that follow and we therefore lose no generality by assuming $\gamma = 0$ when making conclusions that only depend on them. With some abuse of notation, we drop the subscript $0$ from the finite measures in them. Let us recall the results of Trabs [2015a] that concern $\Delta$-discretely observed compound Poisson processes. Assuming $dF$ is absolutely continuous with respect to Lebesgue's measure, he studied semiparametric efficiency for estimation of regular enough functionals of

the Lévy density $\nu := \lambda\,dF$. In particular, he showed that if $f$ is regular enough and a central limit theorem holds for a consistent estimator of $\int_{\mathbb{R}} f\,d\nu$, the variance of the (centred) limiting normal distribution is bounded below by$^2$
\[
\frac{1}{\Delta^2}\int_{\mathbb{R}} \left( \left(f\,\mathbf{1}_{\mathbb{R}\setminus\{0\}}\right) * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right] \right)^2(x)\,P(dx). \qquad (2.6.4)
\]

Recalling that $N(x) := \int_{\mathbb{R}} \mathbf{1}_{(-\infty,x]}\,d\nu$ and $\lambda = \int_{\mathbb{R}} d\nu$, and in view of (2.5.1) and (2.4.4), we immediately see that their limiting variances coincide with this lower bound. We are considering a larger model than Trabs [2015a] because we allow $dF$ to have an atomic component, and hence its lower bound must be bounded below by the last display. Thus, our results imply that its expression does not change and that our estimators for $N$ and $\lambda$ are efficient. Furthermore, and in view of the covariance of the limiting process in (2.2.6), so is the estimator of Nickl and Reiß [2012] as an estimator of $N^G$ in the compound Poisson case. This agrees with the asymptotic equivalence between the two estimators (in the negative real line) pointed out in Section 2.5. As we show in Chapter 3, from the expressions for $\mathcal{F}^{-1}[\varphi^{-1}(-\cdot)]$ and $P$ in (2.6.1) and (2.6.2), we can also see that the last display can be written as
\[
\frac{1}{\Delta}\int_{\mathbb{R}} f(x)^2\,\nu(dx) + O(1). \qquad (2.6.5)
\]
Let us multiply this quantity by $\Delta$ and take $f(x) = \mathbf{1}_{(-\infty,\cdot]}(x)\rho(x)x^2$. Then, in view of (2.2.7), we again observe the asymptotic equivalence between our estimator and those in Nickl et al. [2016] in the high-frequency regime that we already mentioned at the end of Section 2.5. We note in passing that, if the last display is multiplied by $\lambda\Delta$, i.e. by the effective number of observations per observation interval, and we take $f(x) = 1$, we see that the limiting quantity for our estimators of the intensity coincides with that of the maximum likelihood estimator in Section 1.3.2, as anticipated at the end of Section 2.4.1. Indeed, Duval and Hoffmann [2011] studied the optimal limiting variance when estimating the intensity $\lambda$ in a simple parametric example and under the high-frequency regime, and obtained this same limiting quantity. At first view, the truncation $\mathbf{1}_{\mathbb{R}\setminus\{0\}}$ appearing in (2.6.4) may seem superfluous, especially if $dF$ has a density. However, the value of that expression does change depending on whether $\mathbf{1}_{\mathbb{R}\setminus\{0\}}$ is included in it.
This is due to the atomic measures at the origin in the first summand of the expressions for $\mathcal{F}^{-1}[\varphi^{-1}(-\cdot)]$ and $P$ in (2.2.17) and (2.2.18), respectively, which, crucially, carry no information about $dF$. If these infinite series representations are plugged into (2.6.4), the term arising from the two Dirac deltas is killed by $\mathbf{1}_{\mathbb{R}\setminus\{0\}}$. In turn, it is also the reason why (2.6.5) holds and has no term of order $\Delta^{-2}$, which is key for optimality in the high-frequency regime. Note that, in the larger model

$^2$In personal communication with M. Trabs it was confirmed that the indicator $\mathbf{1}_{\mathbb{R}\setminus\{0\}}$ should feature in his results in Section 4.2 for them to be correct.

we consider, $\mathcal{F}^{-1}[\varphi^{-1}(-\cdot)]$ and $P$ may have atoms at zero coming from other summands. These contain information about $F$ and the terms arising from them in (2.6.4) are not annihilated by the truncation. This insight also extends to the case when $\gamma \neq 0$. Let us now justify the efficiency of our estimator of the jump distribution $F$. When there is knowledge of $\lambda$, Trabs [2015a] derived the lower bound for the problem of estimating $N$. This can be found right after Corollary 4.5 and is given by $\lambda^2$ times$^3$

\[
\frac{1}{\Delta^2}\int_{\mathbb{R}} \left( f^{(F)}_x * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right] \right)^2(y)\,P(dy), \qquad (2.6.6)
\]

where we recall that $f^{(F)}_x := \lambda^{-1}\left(\mathbf{1}_{(-\infty,x]} - F(x)\right)\mathbf{1}_{\mathbb{R}\setminus\{0\}}$. If the intensity is known, estimating $N$ and $F$ is the same problem and we note that the last display coincides with the covariance of $\mathbb{G}^F$ in (2.2.16). However, we attained it in the harder setting of not knowing $\lambda$, and therefore it must also be the lower bound for estimating $F$ in this more general setting. This adds new insight into the problem and allows us to conclude that our estimator of $F$ is efficient. To show the lack of efficiency of the estimator of $F$ in Buchmann and Grübel [2003], we first give intuition of why our estimator does have this property. As mentioned at the end of Section 2.5, one can show that our estimator converges to the empirical distribution function as $\Delta \to 0$. A first indication of this is that, as shown in Chapter 3, the last display can be written as
\[
\frac{1}{\lambda\Delta}\,F(x)\left(1 - F(x)\right) + O(1). \qquad (2.6.7)
\]
Therefore, at least for small $\Delta$, the limit process in (2.2.16) is approximately tied up to $0$ in the limit as $x \to \infty$. Indeed, separating the atomic and absolutely continuous measures in $\mathcal{F}^{-1}[\varphi^{-1}(-\cdot)]$ and $P$, and by the dominated convergence theorem, the limit of (2.6.6) as $x \to \pm\infty$ is $0$ for any $\Delta > 0$. We interpret this as meaning that the limiting process is of 'Brownian bridge type'. This was achieved by the division of $\hat N_n$ by $\hat\lambda_n$ in the definition of $\hat F_n$ that forced the end point to be exactly $1$ for any $n$. In other words,
\[
\lim_{n\to\infty}\lim_{x\to\pm\infty}\sqrt{n}\left(\hat F_n(x) - F(x)\right) \equiv 0.
\]

We remark that this behaviour is also guaranteed if we divide by $\check\lambda_n$ instead: while it is clear that $\lim_{x\to-\infty} \hat N_n(x)/\check\lambda_n = 0$ for any $n$,

\[
\lim_{n\to\infty}\lim_{x\to\infty}\sqrt{n}\left(\frac{\hat N_n(x)}{\check\lambda_n} - F(x)\right) = \lim_{n\to\infty}\lim_{x\to\infty}\sqrt{n}\left(\frac{\hat\lambda_n}{\check\lambda_n}\,\hat F_n(x) - F(x)\right) = \lim_{n\to\infty}\sqrt{n}\,\frac{\hat\lambda_n - \check\lambda_n}{\check\lambda_n}.
\]

$^3$In personal communication with the author it was confirmed that $\tilde\chi_\nu$ defined after the corollary should equal $\lambda f^{(F)}_x$ for the results to be correct.


In general, this may not converge to a null distribution but, right after (2.4.4), we concluded that the correlation between $\hat\lambda_n$ and $\check\lambda_n$ is asymptotically $1$ and the conclusion then follows. In fact, there we also argued that in the cases when $\tilde\lambda_n$ is consistent, the three have asymptotic correlation $1$ and, by the same argument, the naive one can also be used for the rescaling of $\hat N_n$, but only in these cases. On the other hand, and in view of (2.6.5) with $f = \mathbf{1}_{(-\infty,x]}$, a different behaviour for the limiting process in (2.5.1) is expected. Indeed, with this choice of $f$ and by the same arguments as above, the limit of (2.6.4) as $x \to -\infty$ and $x \to \infty$ is $0$ and $\sigma_\lambda^2 > 0$, respectively, for any $\Delta > 0$. We refer to a limiting process with this behaviour as of 'Brownian motion type'. It agrees with the above because, while $\lim_{x\to-\infty} \hat N_n(x) = 0$ for all $n$,
\[
\lim_{n\to\infty}\lim_{x\to\infty}\sqrt{n}\left(\hat N_n(x) - N(x)\right) = \lim_{n\to\infty}\sqrt{n}\left(\hat\lambda_n - \lambda\right) \stackrel{d}{=} \mathcal{N}\!\left(0, \sigma_\lambda^2\right).
\]

Recall that $\hat N_n$ is an estimator of the whole process. Thus, intuitively, the division of $\hat N_n$ by $\hat\lambda_n$ in the definition of $\hat F_n$ is removing the uncertainty of not knowing $\lambda$, and this division is precisely what guarantees that $\hat F_n$ is efficient. This reinforces the remark of the previous paragraph since it means that, if we know $\lambda$, we should still divide the estimator of $N$ by

$\hat\lambda_n$ when estimating $F$, and hence the asymptotic properties do not change. Furthermore, to estimate $N$ we should rescale the resulting $\hat F_n$ by $\lambda$ and not use $\hat N_n$ directly. Let us now analyse the behaviour of the limiting process for the estimator of $F$ in

Buchmann and Grübel [2003]. Recall that in the definition of $\tilde F_n$ in (2.1.10), Buchmann and Grübel [2003] do not input an estimator of $\lambda$. This means $\tilde F_n$ is not forced to end exactly at one for finite samples, but at the value in (2.1.11). Hence, and assuming $\lambda\Delta < \log 2$ for simplicity, for $n$ large enough there is a set of probability approaching $1$ as $n \to \infty$ in which

\[
\begin{aligned}
\lim_{n\to\infty}\lim_{x\to\infty}\sqrt{n}\left(\tilde F_n(x) - F(x)\right) &= \lim_{n\to\infty}\sqrt{n}\left( \frac{1}{\lambda\Delta}\log\left(1 + e^{\lambda\Delta}\left(1 - e^{-\tilde\lambda_n\Delta}\right)\right) - 1 \right) \\
&\approx \frac{1}{\lambda\Delta}\lim_{n\to\infty}\sqrt{n}\left( e^{-\lambda\Delta} - e^{-\tilde\lambda_n\Delta} \right) \\
&\stackrel{d}{=} \mathcal{N}\!\left(0,\; \frac{e^{-\lambda\Delta}\left(1 - e^{-\lambda\Delta}\right)}{(\lambda\Delta)^2}\right). \qquad (2.6.8)
\end{aligned}
\]

We conclude that the limiting process in the central limit theorem in Buchmann and Grübel [2003] is of Brownian motion type and therefore their estimator cannot be efficient$^4$. The conclusion extends to $\lambda\Delta \geq \log 2$ by working directly with the covariance of $\mathbb{B}^F$ in (2.1.12): note that it can be rewritten as

\[
\Sigma^{\mathbb{B}^F}_{x,x} = \int_{[0,x]}\int_{[0,x]} \left( G_0\left((x-y)\wedge(x-z)\right) - G_0(x-y)\,G_0(x-z) \right) H(dy)\,H(dz),
\]

$^4$This was also confirmed through personal communication with M. Trabs.


where we recall that $G_0(x) := G(x) - G(0) = \int_{(0,x]} P$ and

\[
H(x) := \frac{1}{\lambda\Delta}\sum_{i=1}^{\infty} (-1)^{i+1} e^{i\lambda\Delta}\,G_0^{*(i-1)}(x), \qquad x \geq 0.
\]

From this expression it is clear that $\Sigma^{\mathbb{B}^F}_{0,0} = 0$ and, to compute the limit as $x \to \infty$, we rewrite $H$ in terms of the finite measure $\mathcal{F}^{-1}[\varphi^{-1}(-\cdot)]$. To derive expression (2.2.17) for the latter, one uses the series representation of the exponential expression of $\varphi^{-1}$ arising from the Lévy–Khintchine formula for $\varphi$. An alternative expression is obtained by computing the Taylor series of $1/\cdot$ around $P(\{0\})$, which gives
\[
\mathcal{F}^{-1}\varphi^{-1} = \mathcal{F}^{-1}\left[ \sum_{i=1}^{\infty} (-1)^{i+1} \frac{\left(\varphi - P(\{0\})\right)^{i-1}}{P(\{0\})^{i}} \right].
\]
Note that $\mathcal{F}dG_0 = \varphi - P(\{0\})$ and, in the setting of Buchmann and Grübel [2003] of $F$ supported in $\mathbb{R}^+$, $P(\{0\}) = e^{-\lambda\Delta}$ in view of (2.2.18). Then, $H(x) = \frac{1}{\lambda\Delta}\mathcal{F}^{-1}\varphi^{-1}([0,x])$, at least formally, and the variance above can be written in the spectral approach form

\[
\Sigma^{\mathbb{B}^F}_{x,x} = \frac{1}{(\lambda\Delta)^2}\int_{[-x,0]}\int_{[-x,0]} \left( G_0\left((x+y)\wedge(x+z)\right) - G_0(x+y)\,G_0(x+z) \right) \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right](dy)\,\mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right](dz).
\]

Then, noting that $\lim_{x\to\infty} G_0(x) = 1 - P(\{0\}) = 1 - e^{-\lambda\Delta}$ and recalling that $\mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right]$ has mass $1$,
\[
\lim_{x\to\infty} \Sigma^{\mathbb{B}^F}_{x,x} = \frac{e^{-\lambda\Delta}\left(1 - e^{-\lambda\Delta}\right)}{(\lambda\Delta)^2},
\]
which agrees with conclusion (2.6.8) for $\lambda\Delta < \log 2$. Notice that the right hand side of the first line in the calculations leading to (2.6.8) is exactly zero if $\lambda$ is substituted by $\tilde\lambda_n$. In other words, if in the definition of $\tilde F_n$ we input $\tilde\lambda_n$ instead of $\lambda$, the limiting process is of Brownian bridge type. While the infinite series in the resulting expression also diverges in general and the exponentially downweighted norms are still needed, it corresponds to an estimator of $F$ whose associated limiting process has covariance equal to the bound developed by Trabs [2015a]: from the first three expressions in Section 2.3.2 it follows that this substitution is the same as dividing $\hat N_n$ by $\tilde\lambda_n$ instead of $\hat\lambda_n$ in the definition of $\hat F_n$; the conclusion follows because in the previous paragraph we argued that, in the setting of Buchmann and Grübel [2003], $\tilde\lambda_n$ is an efficient estimator of $\lambda$. In fact, from (2.4.1) and (2.4.2) we concluded that in this setting $\tilde\lambda_n$ is equal to $\hat\lambda_n$ up to negligible terms. Therefore, our estimator of $F$ is essentially the kernel-regularised version of the transformed estimator of Buchmann and Grübel [2003] before insisting on expanding the logarithm. For $\lambda\Delta < \log 2$ they are pretty much equal in their setting. We emphasise that, because Buchmann and Grübel [2003] assumed the support of $F$ is in $\mathbb{R}^+$


and Trabs [2015a] only developed bounds for the unrestricted model (he only considered perturbations of the Lévy measure from an exponential family and hence with full support

in $\mathbb{R}$), efficiency of the transformation of $\tilde F_n$ cannot be concluded. We do not know whether the bounds change in the setting of Buchmann and Grübel [2003], but if they do not then the remarks herein imply that the proposed transformation provides an efficient estimator of $F$. Similar arguments justify that any of the spectral estimators of Section 2.4.1 can also be used. For any of these three choices, the covariance of the limiting process is

\[
\int_{[0,x]}\int_{[0,y]} \left( G\left((x-z_1)\wedge(y-z_2)\right) - G(x-z_1)\,G(y-z_2) \right) H(dz_1)\,H(dz_2) = \Sigma^{\mathbb{G}^F}_{x,y}, \qquad x, y \in \mathbb{R}^+,
\]

R 1 1 −1 −1 where the equality follows because G(x) = [0,x) P , H(x) = λ∆ F ϕ ([0, x]), and by Fubini’s theorem. In Section 2.3.2 we argued that the estimators of the jump mass function of Buchmann and Gr¨ubel [2003] and Coca [2015] are equal up to convolution with a kernel and before expanding the logarithm. Furthermore, as remarked at the end of Section 2.2.2, when dF only has an atomic component the estimator of F resulting from adding the estimates of

the atoms of the latter is equal to $\hat F_n$ up to negligible terms. Consequently, the covariances of the limiting processes associated to both must attain the bounds developed by Trabs [2015a]. Again, efficiency cannot be concluded since Trabs [2015a] only considered Lévy measures with no support restriction and which are absolutely continuous with respect to Lebesgue's measure. However, if the bounds do not change in the fully discrete case and when the support of $F$ is restricted to $\mathbb{R}^+$, efficiency will follow from our remarks here. The attainment of the bound agrees with the arguments we just gave: in Section 2.1.2 we mentioned that the estimator of $F$ resulting from that of the jump mass function of Buchmann and Grübel [2003] is $\tilde F_n$, but with $\tilde\lambda_n$ in place of $\lambda$. In fact, this can also be checked directly: in Chapter 3 we show that the asymptotic variance in the central limit

theorem for the estimator of the mass of the atom at $x \in \varepsilon \times \mathbb{Z}\setminus\{0\}$, denote it by $p_x$, is given by (2.6.6), where $f^{(F)}_x$ is replaced by $\lambda^{-1}\left(\mathbf{1}_{\{x\}} - p_x\right)\mathbf{1}_{\mathbb{R}\setminus\{0\}}$ as expected. Recall that in the setting of Buchmann and Grübel [2003], $\varepsilon = 1$, and let $i, j \in \mathbb{N}\setminus\{0\}$. Then, the asymptotic covariance of Coca [2015] is $(\lambda\Delta)^{-2}$ times
\[
\begin{aligned}
&\int \left( \left(\mathbf{1}_{\{i\}} - p_i\mathbf{1}_{\mathbb{R}\setminus\{0\}}\right) * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right] \right)(y)\left( \left(\mathbf{1}_{\{j\}} - p_j\mathbf{1}_{\mathbb{R}\setminus\{0\}}\right) * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right] \right)(y)\,P(dy) \\
&= \int \left( \left(\mathbf{1}_{\{i\}} + p_i\left(\mathbf{1}_{\{0\}} - 1\right)\right) * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right] \right)(y)\left( \left(\mathbf{1}_{\{j\}} + p_j\left(\mathbf{1}_{\{0\}} - 1\right)\right) * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right] \right)(y)\,P(dy) \\
&= \int \Big[ \left(\mathbf{1}_{\{i\}} * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right]\right)(y)\left(\mathbf{1}_{\{j\}} * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right]\right)(y) + p_i p_j \left( \left(\mathbf{1}_{\{0\}} * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right]\right)(y) - 1 \right)^2 \\
&\qquad + \left( p_i\,\mathbf{1}_{\{j\}} * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right](y) + p_j\,\mathbf{1}_{\{i\}} * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right](y) \right)\left(\mathbf{1}_{\{0\}} * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right]\right)(y) \Big]\,P(dy),
\end{aligned}
\]

where the last equality follows because $\int \left(f * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right]\right) P = f(0)$ for any $f$ bounded,

and because the finite signed measure $\mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right]$ has mass $1$. Recalling that $P$ has support in $\mathbb{N}$, noting that $\mathbf{1}_{\{l\}} * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right] = \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right](\{\cdot - l\})$ for any $l \in \mathbb{N}$ and defining $r_l := \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right](\{-l\})$, the last display is

\[
\sum_{l=0}^{\infty} \left( r_{i-l}\,r_{j-l} + p_i p_j \left(r_{-l} - 1\right)^2 + \left(p_i r_{j-l} + p_j r_{i-l}\right) r_{-l} \right) P(\{l\}).
\]

From (2.2.17), we notice that $\mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right]$ has support in $-\mathbb{N}$ and, furthermore, $r_0 P(\{0\}) = 1$. Therefore, and assuming $i \leq j$, the last display is

\[
\begin{aligned}
&p_i p_j \left(r_0 - 1\right)^2 P(\{0\}) + p_i r_j + p_j r_i + p_i p_j \left(1 - P(\{0\})\right) + \sum_{l=0}^{i} r_{i-l}\,r_{j-l}\,P(\{l\}) \\
&\qquad = p_i r_j + p_j r_i + p_i p_j \left(r_0 - 1\right) + \sum_{l=0}^{i} r_l\,r_{l+j-i}\,P(\{i-l\}).
\end{aligned}
\]

Recalling that $\Delta = 1$ in Buchmann and Grübel [2003], the product of the last display with $(\lambda\Delta)^{-2}$ is the asymptotic covariance in the central limit theorem they obtained (compare it, for instance, with its empirical version in (4.3.4)). An analogous calculation justifies that the limiting covariance between the Gaussian process arising from the mass function and the normal distribution arising from $\tilde\lambda_n$ coincides with the one we obtain in Chapter 3. Recall from the end of Section 2.1.2 that the limiting quantities of the modified estimators in Buchmann and Grübel [2004] coincide with those of Buchmann and Grübel [2003], and hence also with those of Coca [2015], when the mass function is not compactly supported. This raises the following interesting open question: is it possible to show that the analogous transformations for the general estimators of $F$ do not change their asymptotic behaviour when $dF$ is not compactly supported? By all of the discussions above, this should be true and it is a useful insight for practical applications, as it means that we would know the limiting quantities of the monotone transformations. Moreover, by the Brownian bridge property of the limiting processes, they would also be approximately tied up to $1$ at the positive end and would therefore be bona fide distribution functions. However, rigorously answering this question must require finer analysis than that performed in Buchmann and Grübel [2004]: they heavily use the recursive structure of their estimators to show convergence of the finite-dimensional distributions but, in turn, this structure stops them from being able to show tightness of the sequence of estimators. By the same arguments as above, we conclude that the estimator of the distribution obtained by integrating that of van Es et al. [2007] cannot be efficient unless the intensity is substituted by one of the estimators of Section 2.4. After this substitution, and choosing the kernel appropriately, it is the same as ours.
Analogously, the estimators of the distribution obtained by integrating the density estimators of Duval [2013a] and Comte et al. [2014] are efficient and, in fact, converge uniformly to the empirical distribution function. Regarding

the efficiency of the individual estimators of the mass of potential atoms of $dF$, in Chapter 3 we see that the variances of the limiting normal distributions in the central limit theorems for them satisfy the same type of expressions as the lower bounds above. Indeed, the mass at some $x \in \mathbb{R}\setminus\{0\}$ is $\int_{\mathbb{R}} \mathbf{1}_{\{x\}}\,d\nu$ and therefore the variances of the limiting normal distributions associated to these estimators also have the expression that we expect for the lower bound of the model. The estimators of Nickl and Reiß [2012] and Nickl et al. [2016] do not estimate $F$ or $N$, but we can draw some conclusions about them that extend to the more general Lévy processes they estimate: the estimator of $N^G(x)$ of Nickl and Reiß [2012] is forced to be zero when $x \to \pm\infty$ by definition. However, at $x = \pm\zeta$ it is not, and its values at these points satisfy central limit theorems by the arguments of previous paragraphs. Similarly, the estimators in Nickl et al. [2016] must be of Brownian motion type as they are left free at the positive end point. Therefore, the resulting covariances may decrease when dividing them by the total mass of the estimates and, in practice, this may translate into tighter confidence regions and higher statistical power of the tests developed from them. Lastly, and although it is far from the scope of this thesis, it would be interesting to try to compute the asymptotic covariance of the integrated density estimators of Gugushvili et al. [2015] and Gugushvili et al. [2016] to compare them to ours. On the other hand, with the machinery recently developed in Castillo and Nickl [2013, 2014], it may be possible to prove a Bernstein–von Mises theorem for the distribution $F$ in this setting. In such a case, the resulting estimators of $F$ from Gugushvili et al. [2015] and Gugushvili et al. [2016], which assume no knowledge of the intensity but estimate it instead, probably do lead to efficient estimators and, at least asymptotically, would be equivalent to ours.

Chapter 3

Theory

This chapter contains the theoretical results in Coca [2015], some of which have already been anticipated at various points in Chapter 2. In Section 3.1 we recall definitions and notation used throughout the chapter and introduce new ones. In Section 3.2 we give the precise assumptions we make to prove our results, and the final form of the estimators that were heuristically introduced in Chapter 2. The statements of our central limit theorems are given in Section 3.3, where we also make some additional remarks about them. The rest of the chapter is concerned with proving these results, which is the content of Section 3.4.

3.1 Definitions and notation

We denote by $(\Omega, \mathcal{A}, Pr)$ the probability space on which all the stochastic quantities herein are defined, including the compound Poisson process $X$ from Section 1.1. We define $P$ to be the law of $X_\Delta$ or, equivalently, the common law of the independent increments $Z_k := X_{k\Delta} - X_{(k-1)\Delta}$, $k = 1, \ldots, n$. Let $P_n := n^{-1}\sum_{k=1}^{n} \delta_{Z_k}$ be the empirical law of the increments and denote by $\mathcal{F}$ and $\mathcal{F}^{-1}$ the Fourier(–Plancherel) transform and its inverse acting on finite measures or on $L^2(\mathbb{R})$ functions (the reader is referred to Folland [1999] for the results from Fourier analysis and convolution theory used throughout). Then, for all $u \in \mathbb{R}$,

\[
\varphi(u) := \mathcal{F}P(u) = \int_{\mathbb{R}} e^{iux}\, P(dx) \qquad \text{and} \qquad \varphi_n(u) := \mathcal{F}P_n(u) = \frac{1}{n}\sum_{k=1}^{n} e^{iuZ_k}
\]
are the characteristic function of $Z$ and its empirical counterpart. A simple calculation starting from the representation for $X_\Delta$ following from expression (1.1.1) shows that
\[
\varphi(u) = \exp\big(\Delta\,(i\gamma u + \mathcal{F}\nu(u) - \lambda)\big), \tag{3.1.1}
\]


where $\nu$ is a finite measure on $\mathbb{R}$ called the Lévy measure. It satisfies that
\[
\int_{-\infty}^{t} \nu(dx) = \lambda F(t) =: N(t), \qquad t \in \mathbb{R},
\]
where $F$ and $N$ are referred to as the jump distribution and the Lévy distribution. For notational purposes, from now on we generally use $t$ (and $s$) to denote the ‘space’ variable, especially when it is the argument of these functions, of covariance functions and of the empirical versions of all of them. The precise assumptions on $\nu$ needed to prove our results are included in the next section. For now let us mention that we assume it can be written as

\[
\nu = \nu_d + \nu_{ac}, \tag{3.1.2}
\]
where $\nu_d$ is a discrete or atomic measure and $\nu_{ac}$ is absolutely continuous with respect to Lebesgue measure. The former satisfies that

\[
\nu_d = \sum_{j \in \mathcal{J}} q_j \delta_{J_j}, \tag{3.1.3}
\]
where $\mathcal{J}$ is countable, $q_j \in [0, \lambda]$ and $J_j \in \mathbb{R}\setminus\{0\}$ for all $j \in \mathcal{J}$, and we define $q := \sum_{j \in \mathcal{J}} q_j \in [0, \lambda]$. In order to work with the actual weights of the jump measure we define
\[
p_j := \frac{q_j}{\lambda} \in [0, 1], \quad \text{for all } j \in \mathcal{J}, \qquad \text{and} \qquad p := \frac{q}{\lambda} \in [0, 1]. \tag{3.1.4}
\]

Throughout we write $A_c \lesssim B_c$ if $A_c \le C B_c$ holds with a constant $C$ uniform in the parameter $c$, and $A_c \sim B_c$ if $A_c \lesssim B_c$ and $B_c \lesssim A_c$. We use the standard notation $A_n = O(B_n)$ and $A_n = o(B_n)$ to denote that $A_n/B_n$ is bounded and that $A_n/B_n$ vanishes, respectively, as $n \to \infty$. We write $\xi_n = O_{Pr}(r_n)$ to denote that $\xi_n r_n^{-1}$ is bounded in $Pr$-probability. As introduced in Section 1.3.2, convergence in distribution of real-valued random variables is denoted by $\to^d$. The space of bounded real-valued functions on $\mathbb{R}$ equipped with the supremum norm is denoted by $\ell^\infty(\mathbb{R})$ and $\to^{\mathcal{D}}$ denotes convergence in law in this space (see Dudley [1999], p. 94, for its definition). Convergence in law in the product space $\mathbb{R}^{\mathbb{N}} \times \ell^\infty(\mathbb{R})^2$ is denoted by $\to^{\mathcal{D}^\times}$ (see Chapter 1.4 in van der Vaart and Wellner [1996] for convergence in law in product spaces).
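To fix ideas, the objects just introduced are straightforward to simulate. The following sketch (not part of the thesis; the parameter values and the choice of standard normal jumps are illustrative assumptions) draws $n$ increments $Z_k$ of a compound Poisson process, computes the empirical characteristic function $\varphi_n$, and checks it against the closed form (3.1.1):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: intensity λ, drift γ, observation distance Δ, sample size n.
lam, gamma, Delta, n = 2.0, 0.5, 0.25, 20000

# Each increment Z_k carries Poisson(λΔ) many N(0,1) jumps plus the drift γΔ.
counts = rng.poisson(lam * Delta, size=n)
Z = gamma * Delta + np.array([rng.standard_normal(m).sum() for m in counts])

u = np.linspace(-10.0, 10.0, 201)

# Empirical characteristic function ϕ_n(u) = n^{-1} Σ_k exp(i u Z_k).
phi_n = np.exp(1j * np.outer(u, Z)).mean(axis=1)

# (3.1.1) with Fν(u) = λ e^{-u²/2} for standard normal jumps.
phi = np.exp(Delta * (1j * gamma * u + lam * np.exp(-u**2 / 2) - lam))

sup_err = np.abs(phi_n - phi).max()
print(sup_err)  # uniform error on the grid, small for large n
```

Note that $\varphi_n(0) = 1$ exactly, matching $\varphi(0) = 1$; the uniform distance on the grid shrinks like $n^{-1/2}$ up to logarithmic factors.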

3.2 Assumptions and estimators

Assumption. On the unknown Lévy measure $\nu$ we make the following assumptions:

1. its Lebesgue decomposition is as in (3.1.2), satisfying that

   (a) $\nu_d$ is given by (3.1.3), where for some fixed $\varepsilon > 0$, $J_j = \varepsilon j$ for all $j \in \mathcal{J} := \mathbb{Z}\setminus\{0\}$; and,

   (b) for all $s, t \in \mathbb{R}$, $\big|\int_s^t \nu_{ac}(x)\, dx\big| \lesssim |s - t|^\alpha$ for some $\alpha \in (0, 1]$.

2. And, $\int_{\mathbb{R}} \log^\beta(\max\{|x|, e\})\, \nu(dx) < \infty$ for some $\beta > 2$.

Note that $\nu$ may be fully discrete or fully absolutely continuous, which is determined

by the (unknown) value of $p$ defined in (3.1.4). The assumption $\operatorname{supp}(\nu_d) \subseteq \varepsilon \times \mathbb{Z}\setminus\{0\}$ can be relaxed to an $\varepsilon$-separation condition (cf. the end of Remark 3.2.1 below), but we present it as above for simplicity. By excluding $0$ from $\mathcal{J}$, we are implicitly assuming $\nu$ has no atom at the origin, which we recall is necessary to avoid identifiability issues with the intensity $\lambda$. Assumption 1b is immediately satisfied if $\nu_{ac} \in L^r(\mathbb{R})$ for some $r \in (1, \infty]$: take $\alpha = 1 - 1/r$; then, by Hölder's inequality we have that for any $s, t \in \mathbb{R}$

\[
\bigg|\int_s^t \nu_{ac}(x)\, dx\bigg| \le \|\nu_{ac}\|_{L^r}\, |s - t|^\alpha.
\]

We remark that this assumption can be relaxed to a logarithmic modulus of continuity as in Coca [2015], although the proofs and statements of the results become more cumbersome. For the sake of clarity, we present them under Assumption 1b. Assumption 2 is also very weak and is required to guarantee negligibility of some error terms. Regarding the kernel function $K$ that features in our estimators, we assume it is symmetric and satisfies
\[
\int_{\mathbb{R}} K(x)\, dx = 1, \qquad \operatorname{supp}(\mathcal{F}K) \subseteq [-1, 1] \qquad \text{and} \qquad |K(x)| \lesssim (1 + |x|)^{-\eta} \ \text{for some } \eta > 2. \tag{3.2.1}
\]

Therefore the functions $K_{h_n} := h_n^{-1} K(\cdot/h_n)$, where $h_n$ is referred to as the bandwidth, are continuous, have Fourier transform $\mathcal{F}K_{h_n}$ supported in $[-h_n^{-1}, h_n^{-1}]$ and provide an approximation to the identity operator as $h_n \to 0$ when $n \to \infty$. Recall that we constructed our estimators from the formal identity

\[
\frac{1}{\Delta}\, \mathcal{F}^{-1}[\mathrm{Log}(\varphi)] = \gamma\, \mathcal{F}^{-1}[i\,\cdot\,] + \nu - \lambda \delta_0, \tag{3.2.2}
\]

which follows by taking the distinguished logarithm on both sides of (3.1.1) and then the inverse Fourier transform. Based on the observation that the mass of $\nu$ is $\lambda$, the idea behind the estimator of the intensity is to integrate both sides, excluding a small interval around the origin. Therefore, we propose to estimate it by the first estimator of Section 2.4.1,
\[
\hat{\lambda}_n := \frac{1}{\Delta} \int_{\mathbb{R}} f_n^{(\lambda)}(x)\, \mathcal{F}^{-1}[\mathrm{Log}(\varphi_n)\mathcal{F}K_{h_n}](x)\, dx, \qquad f_n^{(\lambda)}(x) := \mathbf{1}_{\{\varepsilon_n \le |x| \le H_n\}},
\]

where $\varepsilon_n \to 0$ and $H_n \to \infty$ are determined in the next section. For now we remark that


$h_n = o(\varepsilon_n)$ is necessary to annihilate the first and last terms in (3.2.2). This, together with the fact that the smallest jumps of $\nu_d$ are at least at distance $\varepsilon > 0$ from the origin and with the uniform Hölder continuity of $\nu_{ac}$, guarantees that $\hat{\lambda}_n$ recovers all the mass of $\nu$ as $n \to \infty$. In addition, we also require that $h_n = o(H_n^{-1})$ to control some errors. The truncation of the tails is natural because when implementing the estimator the integral has to be truncated. Furthermore, it is key to be able to prove our results under Assumption 2 instead of under a finite polynomial moment condition. This assumption, together with an appropriate choice of the bandwidth $h_n$, also allows us to control the fluctuations of $\varphi_n$ around $\varphi$ in the sense that $\sup_{u \in [-h_n^{-1}, h_n^{-1}]} |\varphi_n(u) - \varphi(u)| \to 0$ in $Pr$-probability. Recalling that $0 < e^{-2\lambda\Delta} \le |\varphi(u)| \le 1$ for all $u \in \mathbb{R}$, we conclude that the estimator of the intensity we just proposed is well-defined on sets of $Pr$-probability approaching $1$ as $n \to \infty$. These remarks also apply to the rest of the estimators we now propose. In Section 2.4.2 we proposed to estimate the drift by
\[
\hat{\gamma}_n := \frac{1}{c\Delta} \int_{\mathbb{R}} f_n^{(\gamma)}(x)\, \mathcal{F}^{-1}[\mathrm{Log}(\varphi_n)\mathcal{F}K_{h_n}](x)\, dx, \qquad f_n^{(\gamma)}(x) := \mathbf{1}_{\{|x| \le h_n\}}\, x,
\]
where $c := \int_{-1}^{1} K - (K(1) + K(-1)) = 2\big(\int_0^1 K - K(1)\big)$ in view of the symmetry of $K$. We refer the reader to that section for more details on its construction. In Section 2.2.2 we argued that to construct our estimator of $F$ we first construct an estimator of $N$ and then divide by $\hat{\lambda}_n$. Including the modifications of this chapter, we proposed to estimate the Lévy distribution at $t \in \mathbb{R}$ by
\[
\frac{1}{\Delta} \int_{\mathbb{R}} \mathbf{1}_{(-\infty, t]}(x)\, f_n^{(\lambda)}(x)\, \mathcal{F}^{-1}[\mathrm{Log}(\varphi_n)\mathcal{F}K_{h_n}](x)\, dx. \tag{3.2.3}
\]
Notice that, due to the regularisation induced by $K$, this quantity is continuous in $t$ on sets of $Pr$-probability approaching $1$ as $n \to \infty$. Recall that we want to show a functional central limit theorem under the standard supremum norm.
Hence, if we want to include a potential discrete component, as allowed by Assumption 1a, we need to make the estimator discontinuous. The general idea proposed in Coca [2015] is to estimate the mass of each atom separately and to modify the last display accordingly, exactly at the potential atoms. To estimate the mass of a potential atom we rewrite (3.2.2) as

\[
\frac{1}{\Delta}\, \mathcal{F}^{-1}[\mathrm{Log}(\varphi)] = \gamma\, \mathcal{F}^{-1}[i\,\cdot\,] + \sum_{j \in \mathbb{Z}\setminus\{0\}} q_j \delta_{J_j} + \nu_{ac} - \lambda \delta_0. \tag{3.2.4}
\]

Therefore, and by the uniform Hölder continuity of $\nu_{ac}$, it is clear that $q_j$, $j \in \mathbb{Z}\setminus\{0\}$, is recovered by integrating this expression over a small enough interval around $J_j$. Indeed,


we propose to estimate the mass of a potential atom at $J_j$ by

\[
\hat{q}_{j,n} := \frac{1}{\Delta} \int_{\mathbb{R}} f_n^{(q_j)}(x)\, \mathcal{F}^{-1}[\mathrm{Log}(\varphi_n)\mathcal{F}K_{h_n}](x)\, dx, \qquad f_n^{(q_j)}(x) := \mathbf{1}_{\{|x - J_j| < \varepsilon_n\}},
\]
where we recall that $\varepsilon_n \to 0$, and thus we lose no generality by assuming it is smaller than $\varepsilon/2$. We note in passing that this is the reason for choosing similar notation for these two quantities. Then, the idea to estimate $N$ is to force it to have a discontinuity equal to $\hat{q}_{j,n}$ at each $J_j$ in the integration interval. Consequently, we propose

\[
\widehat{N}_n(t) := \frac{1}{\Delta} \int_{-\infty}^{t} \mathbf{1}_{\{|x| \le H_n\}} \bigg(1 - \sum_{\substack{j \in \mathbb{Z} \\ |j| \le H_n/\varepsilon}} \mathbf{1}_{\{|x - J_j| < \varepsilon_n\}}\bigg)\, \mathcal{F}^{-1}\big[\mathrm{Log}(\varphi_n)\mathcal{F}K_{h_n}\big](x)\, dx + \sum_{\substack{j \le t/\varepsilon \\ j \ne 0}} \hat{q}_{j,n}. \tag{3.2.5}
\]

The first term estimates the integral of the absolutely continuous component of $\nu$, whilst the second estimates the cumulative discrete component. If one of the two is zero, the corresponding term is asymptotically negligible uniformly in $t$. In fact, if we have a priori knowledge of, say, $\nu$ being purely atomic, the first term can be discarded when estimating it and the conclusions herein still follow under the same assumptions. If, instead, we know in advance that $\nu_d$ is null, we can discard the second term, although in such a case it is simpler to use the expression in (3.2.3) directly. Without loss of generality we can assume that $|H_n - J_j| > \varepsilon_n$ for all $j \in \mathbb{Z}$, and the last display can alternatively be written as
\[
\widehat{N}_n(t) = \frac{1}{\Delta} \int_{\mathbb{R}} f_{t,n}^{(N)}(x)\, \mathcal{F}^{-1}\big[\mathrm{Log}(\varphi_n)\mathcal{F}K_{h_n}\big](x)\, dx,
\]
where

\[
f_{t,n}^{(N)} = \mathbf{1}_{[-H_n, H_n] \setminus (-\varepsilon_n, \varepsilon_n)} \times
\begin{cases}
\mathbf{1}_{(-\infty, t]} & \text{if } |t - J_j| > \varepsilon_n \text{ for all } j \in \mathbb{Z}, \\
\mathbf{1}_{(-\infty, J_j - \varepsilon_n]} & \text{if } J_j - \varepsilon_n \le t < J_j \text{ for some } j \in \mathbb{Z}, \\
\mathbf{1}_{(-\infty, J_j + \varepsilon_n]} & \text{if } J_j \le t \le J_j + \varepsilon_n \text{ for some } j \in \mathbb{Z}.
\end{cases} \tag{3.2.6}
\]

In view of either expression for $\widehat{N}_n(t)$, when $t$ is away from a potential jump the estimator grows continuously and carries all the estimated mass of $\nu_{ac}$ and $\nu_d$ up to $t$. However, when $t$ is close enough to one of these jumps, say $J_j$, the first term in the first expression for the estimator remains constant whilst the second introduces a jump equal to the estimate of $q_j$ exactly at $t = J_j$. The truncation around the origin guarantees that the terms arising from the first and last summands in (3.2.2) are negligible. As discussed in the previous chapter (cf. (2.4.1) and (2.4.2)), it also corresponds to the common practice in the literature of discarding the increments in which no jumps have taken place, as these do not carry any information about the jump distribution. They do, however, contain information about the

intensity through the total number of them in a sample of size $n$, and this quantity is kept in the estimators as it is used to decompound the observations appropriately. In analogy with the remarks made after introducing $\hat{\lambda}_n$ above, the tail truncation is even more natural here because one can only compute a finite number of estimators of the potential discrete component. Finally, we estimate the rest of the parameters as

\[
\widehat{F}_n := \hat{\lambda}_n^{-1} \widehat{N}_n \qquad \text{and} \qquad \hat{p}_{j,n} := \hat{\lambda}_n^{-1} \hat{q}_{j,n}, \quad j \in \mathbb{Z}\setminus\{0\}. \tag{3.2.7}
\]

Therefore, for $t$ negative enough $\widehat{F}_n(t) = 0$ and, due to the expressions for $\hat{\lambda}_n$ and $\widehat{N}_n$, $\widehat{F}_n(t) = 1$ for $t$ large enough. We remark that, despite the appearance of distinguished logarithms and Fourier transforms, all the estimators are real valued for any $n$. The strategy we just proposed to estimate a potential discrete component in $\nu$ has an additional advantage: we can estimate the total mass of the discrete component (and hence of the absolutely continuous component too) and, moreover, we can derive tests for the presence of either component. We postpone the construction of the latter to Chapter 4 and, for now, propose to estimate $q$ by
\[
\hat{q}_n := \sum_{\substack{|j| \le \widetilde{H}_n/\varepsilon \\ j \ne 0}} \hat{q}_{j,n} = \frac{1}{\Delta} \int_{\mathbb{R}} f_n^{(q)}(x)\, \mathcal{F}^{-1}[\mathrm{Log}(\varphi_n)\mathcal{F}K_{h_n}](x)\, dx, \qquad f_n^{(q)}(x) := \sum_{\substack{|j| \le \widetilde{H}_n/\varepsilon \\ j \ne 0}} \mathbf{1}_{\{|x - J_j| < \varepsilon_n\}},
\]
where $\widetilde{H}_n \to \infty$ at a slower rate than $H_n$, given explicitly in the next section. In line with the transformations above, $p$ can then be estimated by

\[
\hat{p}_n := \hat{\lambda}_n^{-1} \hat{q}_n.
\]
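All the estimators above require only the empirical characteristic function and numerical integration, so they are simple to implement. The following self-contained sketch computes a crude finite-sample version of $\hat{\lambda}_n$: the sequences $\varepsilon_n, H_n, h_n$ are replaced by fixed values, the kernel is chosen with $\mathcal{F}K(u) = (1 - u^2)^3$ on $[-1, 1]$ (which satisfies (3.2.1)), and the distinguished logarithm is realised numerically by phase unwrapping along the frequency grid. Everything here (jump law, parameter values, grids) is an illustrative assumption, not the tuning of this chapter.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative truth: γ = 0, intensity λ = 1.5, jumps uniform on [1, 2] (mass away from 0).
lam, Delta, n = 1.5, 0.5, 20000
counts = rng.poisson(lam * Delta, size=n)
Z = np.array([rng.uniform(1.0, 2.0, m).sum() for m in counts])

h, eps, H = 0.05, 0.3, 6.0                      # fixed bandwidth and truncation levels
u = np.arange(-1.0 / h, 1.0 / h + 1e-9, 0.05)   # FK_h is supported in [-1/h, 1/h]
du = u[1] - u[0]

phi_n = np.array([np.exp(1j * uu * Z).mean() for uu in u])

# Distinguished logarithm Log(ϕ_n): the continuous branch with value 0 at u = 0,
# obtained here by unwrapping the phase (valid while ϕ_n stays away from 0).
log_phi = np.log(np.abs(phi_n)) + 1j * np.unwrap(np.angle(phi_n))

# Kernel in Fourier space: FK(u) = (1 - u²)³ on [-1, 1], so FK_h(u) = FK(hu).
FKh = np.where(np.abs(h * u) <= 1.0, (1.0 - (h * u) ** 2) ** 3, 0.0)

# λ̂ = Δ^{-1} ∫_{ε ≤ |x| ≤ H} F^{-1}[Log(ϕ_n) FK_h](x) dx, both integrals by Riemann sums.
x = np.arange(-H, H + 1e-9, 0.02)
dx = x[1] - x[0]
finv = (np.exp(-1j * np.outer(x, u)) * (log_phi * FKh)).sum(axis=1) * du / (2 * np.pi)
lam_hat = finv.real[np.abs(x) >= eps].sum() * dx / Delta
print(lam_hat)  # close to λ = 1.5
```

The truncation $\{\varepsilon \le |x| \le H\}$ removes the smoothed $-\lambda\delta_0$ term, exactly as $f_n^{(\lambda)}$ does; shrinking $\varepsilon$ without also shrinking $h$ lets that term leak back into the estimate.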

3.2.1 Remark. The transformation of (3.2.3) to accommodate a potential discrete component we just proposed does not use any property of the compound Poisson process. Instead, it exploits the fact that we have an empirical version of the target measure, namely $\frac{1}{\Delta}\, \mathbf{1}_{(-\varepsilon_n, \varepsilon_n)^C}\, \mathcal{F}^{-1}[\mathrm{Log}(\varphi_n)\mathcal{F}K_{h_n}]$. Therefore, the ideas here can be applied in much more general settings, including inverse and non-inverse problems, as long as one has access to such an estimate. To the best of our knowledge, it has only been proposed by Coca [2015]. It does, however, have two implicit drawbacks that we would like to discuss. The first is that, even though we do not assume we know in advance whether a discrete component is present in $F$, we do assume knowledge of a superset of $\operatorname{supp}(\nu_d)$, since any $q_j$ may be zero but the corresponding $J_j$ is still included in (3.2.5) and (3.2.6). In fact, the issue is more subtle and hidden behind our assumption of $\mathcal{J}$ being equispaced: when dealing with $\widehat{N}_n(t)$ in the proofs, the quantity $\mathbf{1}_{(-\infty, t]} * K_{h_n}$ appears, and in its limit as $n \to \infty$ a Gibbs phenomenon naturally features at $t$. This can be ignored in problems where this limit is integrated with respect to absolutely continuous measures, but not here: in the asymptotic covariances of the limit theorems below, the integrators are $\mathcal{F}^{-1}[\varphi^{-1}(-\cdot)]$ and $P$, which

were written as infinite series in (2.6.1) and (2.6.2), and these may have atoms at all of the values of finitely many sums and differences of the atoms of $\nu_d$. Therefore, we need to deal with this issue at all of these points, and by assuming $\mathcal{J}$ is equispaced we avoid speaking about the recombinations of the atoms. One of the purposes of the transformation of $\mathbf{1}_{(-\infty, t]}$ in (3.2.6) is precisely to avoid the aforementioned phenomenon at every $t = J_j$, and it therefore guarantees that we achieve the optimal covariances below. We emphasise that assuming this prior knowledge may be avoided by purely data-driven procedures based on the tests for the atomic component that arise from our results in the next section. This is a relevant point in fully automated procedures but is, nevertheless, out of the scope of this thesis. Furthermore, in practice it may be avoided in a non-rigorous but simple way: the finite-sample implementation of the estimators of $N$ and $F$ requires the discretisation of the integral in them. This acts as (3.2.6) for every interval in the discretisation grid and hence approximately, but automatically, performs the desired transformation at recombinations of the atoms. Indeed, the acceptable practical behaviour of directly discretising (3.2.3) when there is a discrete component, instead of using (3.2.6), is illustrated in Section 4.4. The second implicit limitation of our strategy is that we require a minimal separation between the atoms $J_j$. This is a natural assumption given the modification we make at each potential atom in (3.2.6), and it is not very restrictive in practice since in the implementation of $N$ and $F$ we have to choose a finite discrete grid for the integrals in them. We need it in our proofs since we require

\[
\sum_{j \in \mathcal{J}} p_j |J_j|^{-\eta} < \infty \qquad \text{and} \qquad \sum_{j \in \mathcal{J}\setminus\{l\}} p_j |J_j - J_l|^{1-\eta} < \infty,
\]
for any $l \in \mathcal{J}$, any discrete distribution $(p_j)_{j \in \mathcal{J}}$ and any $\eta > 2$. Having discussed these two limitations, we can now show how Assumption 1a can be straightforwardly relaxed. Let $J$ be equal to, or a superset of, the support of the discrete component together with the origin, so $J := \{J_0 = 0\} \cup \{J_j : j \in \mathcal{J}\}$ for some $\mathcal{J} \subseteq \mathbb{Z}\setminus\{0\}$ (Assumption 1a implies $J = \varepsilon \times \mathbb{Z}$). Additionally, let $\bar{J}$ be the closure of $J$ under arbitrary but finite sums and differences of its elements, and assume that for some $\varepsilon > 0$ we have $|J_j - J_l| > \varepsilon$ for any $J_j, J_l \in \bar{J}$. Then, assuming $\varepsilon_n < \varepsilon/2$ without loss of generality, we should make the transformation (3.2.6) at every element of $\bar{J}$, and the conclusions of the next section remain true. Adapting the proofs to this more general setting only requires cosmetic changes based on the remarks above.

3.3 Central limit theorems

Our main result concerns joint estimation of all the parameters introduced above. However, to ease the exposition we first specialise it to the individual parameters and interpret the results separately.


3.3.1 Proposition. Suppose Assumptions 1 and 2 are satisfied for some $\beta > 2$ and that $K$ satisfies (3.2.1). Let $h_n \sim \exp(-n^{\vartheta_h})$, $\varepsilon_n \sim \exp(-n^{\vartheta_\varepsilon})$ and $H_n \sim \exp(n^{\vartheta_H})$, where $0 < \vartheta_\varepsilon < \vartheta_h < 1/4$ and $1/(2\beta) \le \vartheta_H < \vartheta_h$. Then, under the notation at the end of Section 3.1, we have that as $n \to \infty$
\[
\sqrt{n}\,\big(\hat{\lambda}_n - \lambda\big) \to^d N\big(0, \sigma_\lambda^2\big) \qquad \text{and} \qquad \sqrt{n}\, h_n^{-1}\big(\hat{\gamma}_n - \gamma\big) \to^{Pr} 0,
\]
where in what follows $N(0, \sigma^2)$ denotes a zero-mean normal distribution with variance $\sigma^2$ and, writing $f^{(\lambda)} := \mathbf{1}_{\mathbb{R}\setminus\{0\}}$ and $\varphi^{-1} = 1/\varphi$ throughout,
\[
\sigma_\lambda^2 := \frac{1}{\Delta^2} \int_{\mathbb{R}} \Big( \big( f^{(\lambda)} * \mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x) \Big)^2\, P(dx).
\]

The expressions for $\mathcal{F}^{-1}[\varphi^{-1}(-\cdot)]$ and $P$ can be found in (2.6.1) and (2.6.2), respectively. The fact that they are both finite measures guarantees the finiteness of (2.6.3) for any bounded functions $f$ and $g$. All the variances and covariances of this section, including $\sigma_\lambda^2$ above, have that form and are therefore finite. Furthermore, as argued there, any such quantity does not depend on $\gamma$, and this justifies the fact that in the following discussions $\gamma$ does not play any role, even though we are not assuming $\gamma = 0$. The next lemma allows us to rigorously justify some of the intuitive remarks we make throughout this thesis.

3.3.2 Lemma. Let $f, g : \mathbb{R} \to \mathbb{R}$ be bounded functions and let $\mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big]$ and $P$ be given by (2.6.1) and (2.6.2), respectively. Then
\[
\int_{\mathbb{R}} \big( f * \mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)\, \big( g * \mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)\, P(dx)
\]
\[
= f(0)g(0) + \Delta \bigg( \int_{\mathbb{R}} f(x)g(x)\, \nu(dx) - f(0)\big(g * \bar{\nu}\big)(0) - g(0)\big(f * \bar{\nu}\big)(0) + \lambda f(0)g(0) \bigg) + O\big((\lambda\Delta)^2\big).
\]

In particular, if $f(0) = g(0) = 0$, the last display equals
\[
\lambda\Delta \int_{\mathbb{R}} f(x)g(x)\, F(dx) + O\big((\lambda\Delta)^2\big).
\]
In Section 2.4.1 we showed that, when no exact cancellations between jumps can arise from $\nu$, such as when it has no discrete component in $\mathbb{R}^+$ and/or $\mathbb{R}^-$,

\[
\sigma_\lambda^2 = \frac{e^{\lambda\Delta} - 1}{\Delta^2}.
\]

Therefore, when $\lambda\Delta$ (the expected number of jumps in a $\Delta$-observation window) is small,
\[
\sqrt{n\lambda\Delta}\,\big(\hat{\lambda}_n - \lambda\big) \approx N(0, \lambda^2).
\]
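The approximation in the last display can be traced through Lemma 3.3.2: taking $f = g = f^{(\lambda)}$ there, so that $f(0) = 0$ and $\int (f^{(\lambda)})^2\, dF = 1$ (recall $F$ has no atom at the origin), gives the following sketch of the expansion:

```latex
\sigma_\lambda^2
  = \frac{1}{\Delta^2}\int_{\mathbb{R}}
      \Big(\big(f^{(\lambda)} * \mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big]\big)(x)\Big)^2 P(dx)
  = \frac{1}{\Delta^2}\Big(\lambda\Delta\int_{\mathbb{R}}\big(f^{(\lambda)}(x)\big)^2\, F(dx)
      + O\big((\lambda\Delta)^2\big)\Big)
  = \frac{\lambda}{\Delta} + O(\lambda^2),
\qquad\text{so}\qquad
\mathrm{Var}\Big(\sqrt{n\lambda\Delta}\,\big(\hat{\lambda}_n - \lambda\big)\Big)
  \approx \lambda\Delta\cdot\frac{\lambda}{\Delta} = \lambda^2.
```

This is consistent with the exact expression above, since $(e^{\lambda\Delta} - 1)/\Delta^2 = \lambda/\Delta + \lambda^2/2 + O(\lambda^3\Delta)$.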


For a general $\nu$ this is justified by Lemma 3.3.2. Indeed, in Section 2.4.1 we showed that, as $\Delta \to 0$, $\hat{\lambda}_n$ tends to the maximum likelihood estimator of Section 1.3.1, which satisfies (1.3.1). The intuition behind the conclusion of Proposition 3.3.1 for the estimator of the drift was given in Section 2.4.2. The following proposition deals with the individual estimation of the potential weights of the discrete component of $N$ and $F$. This result already hints at the asymptotic differences between estimating these two functions mentioned in Section 2.6.

3.3.3 Proposition. Under the assumptions and notation of Proposition 3.3.1, we have that for any $j \in \mathbb{Z}\setminus\{0\}$ and as $n \to \infty$
\[
\sqrt{n}\,\big(\hat{q}_{j,n} - q_j\big) \to^d N\big(0, \sigma_{q_j}^2\big) \qquad \text{and} \qquad \sqrt{n}\,\big(\hat{p}_{j,n} - p_j\big) \to^d N\big(0, \sigma_{p_j}^2\big),
\]
where, defining $f^{(q_j)} := \mathbf{1}_{\{J_j\}}$ and $f^{(p_j)} := \lambda^{-1}\big(f^{(q_j)} - p_j f^{(\lambda)}\big)$,
\[
\sigma_{q_j}^2 := \frac{1}{\Delta^2} \int_{\mathbb{R}} \Big( \big( f^{(q_j)} * \mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x) \Big)^2\, P(dx)
\]
and
\[
\sigma_{p_j}^2 := \frac{1}{\Delta^2} \int_{\mathbb{R}} \Big( \big( f^{(p_j)} * \mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x) \Big)^2\, P(dx).
\]

Similarly to the expression for $\sigma_\lambda^2$ in (2.4.5), we have that, for any $j \in \mathbb{Z}\setminus\{0\}$,
\[
\sigma_{q_j}^2 = \frac{1}{\Delta^2} \sum_{l \in \mathbb{Z}} P(\{J_l\}) \Big( \mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big](\{J_l - J_j\}) \Big)^2
\]
and
\[
\sigma_{p_j}^2 = \frac{1}{\Delta^2 \lambda^2} \bigg( \sum_{l \in \mathbb{Z}} P(\{J_l\}) \Big( \mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big](\{J_l - J_j\}) + p_j\, \mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big](\{J_l\}) \Big)^2 - p_j^2 \bigg).
\]

As expected, these quantities are zero at jumps that cannot be observed and, in particular, when $\nu$ is absolutely continuous ($p = 0$) they are zero for all $j \in \mathbb{Z}\setminus\{0\}$. To gain more intuition about them we turn to the case when $\lambda\Delta$ is sufficiently small. From Lemma 3.3.2, we have that
\[
\sqrt{n\Delta}\,\big(\hat{q}_{j,n} - q_j\big) \approx N(0, q_j) \qquad \text{and} \qquad \sqrt{n\lambda\Delta}\,\big(\hat{p}_{j,n} - p_j\big) \approx N\big(0, p_j(1 - p_j)\big).
\]

In this case, most if not all of the jumps are observed directly in the limit, so little or no decompounding is needed and the asymptotic variances do not depend on weights other than the one under consideration. The interpretation of the dependence of the variances in both displays on $p_j$ is the following. Generally, when $p_j \approx 0$ only a small percentage of the increments $Z_k$ equal $J_j$, so the variability in the estimate is not large. When $p_j \approx 1$ we


expect many increments to be equal to $J_j$ and therefore the variability in the estimate of $p_j$ is small too. However, in this case the parameter $q_j = \lambda p_j$ is approximately $\lambda$ and hence it carries almost the same variability as the estimate of the intensity given in Proposition 3.3.1. This uncertainty is removed when estimating $p_j$ because $\hat{p}_{j,n} := \hat{q}_{j,n}/\hat{\lambda}_n$. When $\lambda\Delta$ is not small and $p \ne 0$, the inverse nature of the problem changes the dependence of $\sigma_{q_j}^2$ and $\sigma_{p_j}^2$ on $p_j$. In view of their general expressions above, if $P(\{J_j\})$, the probability of observing an increment equal to $J_j$, is not zero, they are not necessarily zero even when $p_j = 0$. This is because, despite no underlying jump equalling $J_j$, the increments $Z_k$ do not capture this and some decompounding is required, therefore introducing some unavoidable uncertainty. We now address the estimation of $N$ and $F$, the main focus of this thesis.
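The small-$\lambda\Delta$ approximations discussed above are easy to probe by simulation. The sketch below uses the naive count-based estimator $\#\{k : Z_k = J_j\}/(n\Delta)$ of $q_j$, which the spectral estimator approaches in this regime when the jumps sit on a fixed grid; the jump law, sample sizes and tolerances are illustrative assumptions, not quantities from the thesis.

```python
import numpy as np

rng = np.random.default_rng(2)

lam, Delta = 1.0, 0.05           # λΔ = 0.05: decompounding is almost unnecessary
p1, p2 = 0.3, 0.7                # two atoms J_1 = 1, J_2 = 2 with weights p_1, p_2
q1 = lam * p1                    # q_1 = λ p_1 = 0.3

n, reps = 2000, 400

# By Poisson thinning, the numbers of size-1 and size-2 jumps in each increment
# are independent Poisson(λΔp_1) and Poisson(λΔp_2) counts; Z_k equals J_1
# exactly when one size-1 jump and no size-2 jump occurred.
N1 = rng.poisson(lam * Delta * p1, size=(reps, n))
N2 = rng.poisson(lam * Delta * p2, size=(reps, n))
hits = (N1 == 1) & (N2 == 0)

q1_hat = hits.mean(axis=1) / Delta             # one naive estimate per replication
stats = np.sqrt(n * Delta) * (q1_hat - q1)

var = stats.var()
print(var)  # close to q_1 = 0.3, up to O(λΔ) corrections
```

The empirical variance comes out near $q_1 e^{-\lambda\Delta}$, i.e. near $q_1$ up to the $O(\lambda\Delta)$ corrections hidden in the approximation.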

3.3.4 Theorem. Under the assumptions of Proposition 3.3.1 and the notation at the end of Section 3.1, we have that as $n \to \infty$

\[
\sqrt{n}\,\big(\widehat{N}_n - N\big) \to^{\mathcal{D}} \mathbb{B}^N \ \text{in } \ell^\infty(\mathbb{R}) \qquad \text{and} \qquad \sqrt{n}\,\big(\widehat{F}_n - F\big) \to^{\mathcal{D}} \mathbb{G}^F \ \text{in } \ell^\infty(\mathbb{R}),
\]

where, defining $f_t^{(N)} := \mathbf{1}_{(-\infty, t]}\, \mathbf{1}_{\mathbb{R}\setminus\{0\}}$ for any $t \in \mathbb{R}$ and

\[
f_t^{(F)} := \lambda^{-1}\big( f_t^{(N)} - F(t) f^{(\lambda)} \big) = \lambda^{-1}\big( \mathbf{1}_{(-\infty, t]} - F(t) \big)\, \mathbf{1}_{\mathbb{R}\setminus\{0\}},
\]

$\mathbb{B}^N$ and $\mathbb{G}^F$ are tight centred Gaussian Borel random variables in $\ell^\infty(\mathbb{R})$ with covariance structures satisfying that for any $s, t \in \mathbb{R}$
\[
\Sigma^N_{s,t} := \frac{1}{\Delta^2} \int_{\mathbb{R}} \big( f_s^{(N)} * \mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)\, \big( f_t^{(N)} * \mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)\, P(dx) \tag{3.3.1}
\]
and¹
\[
\Sigma^F_{s,t} := \frac{1}{\Delta^2} \int_{\mathbb{R}} \big( f_s^{(F)} * \mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)\, \big( f_t^{(F)} * \mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)\, P(dx). \tag{3.3.2}
\]
In Chapter 2 we discussed this result and its consequences in depth. In particular, in Section 2.6 we pointed out the meaning and necessity of the quantity $\mathbf{1}_{\mathbb{R}\setminus\{0\}}$ in the covariances, the fact that they coincide with the information lower bounds of the problem, and the additional insights they give regarding efficiency. Therefore, we simply mention that, in view of Lemma 3.3.2,

\[
\Delta\, \Sigma^N_{s,t} = N(s \wedge t) + O(\Delta) \qquad \text{and} \qquad \lambda\Delta\, \Sigma^F_{s,t} = F(\min\{s, t\}) - F(s)F(t) + O(\Delta).
\]

¹ Note that in Chapter 2 the covariances $\Sigma^N_{s,t}$ and $\Sigma^F_{s,t}$ were denoted by $\Sigma^{\mathbb{B}^N}_{s,t}$ and $\Sigma^{\mathbb{G}^F}_{s,t}$. From now on we simplify the notation, as no confusion regarding the limiting processes they are connected to may arise.


Therefore, expressions (2.6.7) and (2.6.5) are justified and, for $\lambda\Delta$ small,
\[
\sqrt{n\Delta}\,\big(\widehat{N}_n - N\big) \qquad \text{and} \qquad \sqrt{n\lambda\Delta}\,\big(\widehat{F}_n - F\big)
\]
are approximately centred Gaussian processes with covariance functions given by the leading terms of the second to last display. This agrees with our calculation at the end of Section 2.3, in which we showed that $\widehat{N}_n$ converges to the estimators of Nickl et al. [2016] as $\Delta \to 0$. It also agrees with the remark therein stating that the same calculation guarantees that our estimator $\widehat{F}_n$ converges to the empirical distribution function as $\Delta \to 0$. All the results included so far are particular cases of the following, which establishes joint convergence of all the estimators and is our main result. Prior to stating it let us introduce some notation. For any $L \in \mathbb{R}^4 \times \big(\mathbb{R}^{\mathbb{Z}\setminus\{0\}}\big)^2 \times \ell^\infty(\mathbb{R})^2 \cong \mathbb{R}^{\mathbb{N}} \times \ell^\infty(\mathbb{R})^2$ we denote its first four coordinates by $L^\lambda, L^\gamma, L^q, L^p$, its $j$-th coordinate within the first and second space $\mathbb{R}^{\mathbb{Z}\setminus\{0\}}$ by $L^{q_j}$ and $L^{p_j}$, respectively, and the evaluation at $t \in \mathbb{R}$ of its penultimate and last coordinate by $L^N(t)$ and $L^F(t)$, respectively. Finally, let $\mathbf{q}, \mathbf{p} \in \mathbb{R}^{\mathbb{Z}\setminus\{0\}}$ be row vectors with $j$-th entry equal to $q_j$ and $p_j$, respectively, and let $\hat{\mathbf{q}}_n$ and $\hat{\mathbf{p}}_n$ be their coordinate-wise estimators.

3.3.5 Theorem. Let $L$ be a tight centred Gaussian random variable on $\mathbb{R}^{\mathbb{N}} \times \ell^\infty(\mathbb{R})^2$ and note that it is fully characterised by its finite-dimensional distributions. For any
\[
c^{(\lambda)}, c^{(\gamma)}, c^{(q)}, c^{(p)} \in \mathbb{R}, \qquad M_\theta \in \mathbb{N}, \qquad C_\theta \in \mathbb{R}^{M_\theta}, \quad \text{where } \theta \in \{q, p, N, F\},
\]
and any $i_1, \ldots, i_{M_q}, j_1, \ldots, j_{M_p} \in \mathbb{N}$, $s_1, \ldots, s_{M_N}, t_1, \ldots, t_{M_F} \in \mathbb{R}$, assume it satisfies that
\[
\big(c^{(\lambda)}, c^{(\gamma)}, c^{(q)}, c^{(p)}\big) \begin{pmatrix} L^\lambda \\ L^\gamma \\ L^q \\ L^p \end{pmatrix} + C_q \begin{pmatrix} L^{q_{i_1}} \\ \vdots \\ L^{q_{i_{M_q}}} \end{pmatrix} + C_p \begin{pmatrix} L^{p_{j_1}} \\ \vdots \\ L^{p_{j_{M_p}}} \end{pmatrix} + C_N \begin{pmatrix} L^N(s_1) \\ \vdots \\ L^N(s_{M_N}) \end{pmatrix} + C_F \begin{pmatrix} L^F(t_1) \\ \vdots \\ L^F(t_{M_F}) \end{pmatrix}
\]
is a one-dimensional normal random variable with mean zero and variance
\[
\sigma_L^2 := \frac{1}{\Delta^2} \int_{\mathbb{R}} \Big( \big( f^{(L)} * \mathcal{F}^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x) \Big)^2\, P(dx), \tag{3.3.3}
\]
where $f^{(L)} = f^{(L)}\big(c^{(\lambda)}, c^{(q)}, c^{(p)}, C_q, C_p, C_N, C_F\big)$ is defined as
\[
f^{(L)} := \big(c^{(\lambda)}, c^{(q)}, c^{(p)}\big) \begin{pmatrix} f^{(\lambda)} \\ f^{(q)} \\ f^{(p)} \end{pmatrix} + C_q \begin{pmatrix} f^{(q_{i_1})} \\ \vdots \\ f^{(q_{i_{M_q}})} \end{pmatrix} + C_p \begin{pmatrix} f^{(p_{j_1})} \\ \vdots \\ f^{(p_{j_{M_p}})} \end{pmatrix} + C_N \begin{pmatrix} f^{(N)}_{s_1} \\ \vdots \\ f^{(N)}_{s_{M_N}} \end{pmatrix} + C_F \begin{pmatrix} f^{(F)}_{t_1} \\ \vdots \\ f^{(F)}_{t_{M_F}} \end{pmatrix}.
\]

(a) Under the assumptions of Proposition 3.3.1 we have that as $n \to \infty$
\[
\sqrt{n}\,\Big( \hat{\lambda}_n - \lambda,\; h_n^{-1}(\hat{\gamma}_n - \gamma),\; \hat{\mathbf{q}}_n - \mathbf{q},\; \hat{\mathbf{p}}_n - \mathbf{p},\; \widehat{N}_n - N,\; \widehat{F}_n - F \Big) \to^{\mathcal{D}^\times} L^{-q, -p},
\]


where $L^{-q,-p}$ denotes the same random variable as $L$ without its third and fourth coordinates.

(b) Suppose $\nu$ satisfies Assumption 1a and $\int_{\mathbb{R}} |x|^\beta\, \nu(dx) < \infty$ for some $\beta > 1$. Assume $K$ satisfies (3.2.1) and let $h_n = \exp(-n^{\vartheta_h})$, $\varepsilon_n = \exp(-n^{\vartheta_\varepsilon})$, $H_n = \exp(n^{\vartheta_H})$ and $\widetilde{H}_n \sim n^{\vartheta_{\widetilde{H}}}$, where $1/(2\beta) \le \vartheta_{\widetilde{H}} < 1/2$, $0 < \vartheta_\varepsilon < \vartheta_h < (1 - 2\vartheta_{\widetilde{H}})/4$ and $0 < \vartheta_H < \vartheta_h$. Then, as $n \to \infty$,
\[
\sqrt{n}\,\Big( \hat{\lambda}_n - \lambda,\; h_n^{-1}(\hat{\gamma}_n - \gamma),\; \hat{q}_n - q,\; \hat{p}_n - p,\; \hat{\mathbf{q}}_n - \mathbf{q},\; \hat{\mathbf{p}}_n - \mathbf{p},\; \widehat{N}_n - N,\; \widehat{F}_n - F \Big) \to^{\mathcal{D}^\times} L.
\]

This theorem provides a large number of inference procedures, which are described in Chapter 4. The empirical version of $\sigma_L^2$ is a fundamental quantity in their construction. The procedures include estimation and testing of single parameters, which can already be easily devised from the propositions above, and goodness-of-fit and two-sample tests for the whole process. We regard the stronger assumption of part (b) of the theorem as purely technical. We require it to show convergence of the finite-dimensional distributions: in part (a), we need to control a finite fixed number of projections of the quantity considered therein but, in part (b), we effectively have to control a growing number of projections due to the expressions for $\hat{q}_n$ and $\hat{p}_n$.

3.4 Proofs

Joint convergence as described in Theorem 3.3.5 follows from convergence of each of the coordinates of the infinite vectors therein, together with joint convergence of the one-dimensional parameters and of the finite-dimensional distributions of the infinite-dimensional parameters. Therefore we need to prove joint convergence of finitely and infinitely many one-dimensional distributions. This is the content of Section 3.4.1 and it includes several results that may be of independent interest. Section 3.4.2 is devoted to proving Propositions 3.3.1 and 3.3.3, which follow from the results of Section 3.4.1, and in Sections 3.4.3 and 3.4.4 we prove Theorems 3.3.4 and 3.3.5, respectively. Lastly, we include the proof of Lemma 3.3.2.

3.4.1 Joint convergence of one-dimensional distributions

The last and main result of this section requires identifying the asymptotic limit of some stochastic quantities and controlling some non-stochastic quantities. We follow this order throughout and, before introducing the generic central limit theorem dealing with the stochastic quantities, namely Theorem 3.4.5, we develop several auxiliary results. Note that the observations of the compound Poisson process enter our estimators through the empirical characteristic function $\varphi_n$. Since it is always multiplied by $\mathcal{F}K_{h_n}$,


supported in $[-h_n^{-1}, h_n^{-1}]$, we have to control the uniform norm of $|\varphi_n(u) - \varphi(u)|$ when $u$ varies over these sets. Theorem 4.1 in Neumann and Reiß [2009] gives sufficient conditions to control this quantity and derivatives of it. In particular, it guarantees that if $\int |x|^\beta\, \nu(dx) < \infty$ for some $\beta > 0$ then, under the notation at the end of Section 3.1,
\[
\sup_{|u| \le h_n^{-1}} |\varphi_n(u) - \varphi(u)| = O_{Pr}\big( n^{-1/2} \log^{1/2+\delta}(h_n^{-1}) \big) \tag{3.4.1}
\]
for any $\delta > 0$. The assumption for this particular case of the result can be slightly refined as follows.

3.4.1 Theorem. Suppose that $\xi_1, \xi_2, \ldots$ are independent and identically distributed real-valued random variables satisfying $E[\log^\beta(\max\{|\xi_1|, e\})] < \infty$ for some $\beta > 1$, and let $w : \mathbb{R} \to [0, \infty)$ be a weight function with $w(u) \le \log^{-1/2-\delta}(e + |u|)$ for some $\delta > 0$. Then we have that
\[
\sup_{n \ge 1} E\Big[ \sup_{u \in \mathbb{R}} \sqrt{n}\, |\varphi_n(u) - \varphi(u)|\, w(u) \Big] < \infty, \tag{3.4.2}
\]
and, from Markov's inequality, (3.4.1) holds.

Taking $\xi_k = Z_k$, $k = 1, \ldots, n$, the assumption of the theorem is that $\int_{\mathbb{R}} \log^\beta(\max\{|x|, e\})\, P(dx) < \infty$ for some $\beta > 1$. By Theorem 25.3 and Proposition 25.4 in Sato [1999], this is equivalent to assuming the same moment is finite when substituting $P$ by $\nu$ in its expression, and it is therefore satisfied under Assumption 2. As mentioned in Neumann and Reiß [2009], this result goes in line with the well-known fact proved by

Csörgő and Totik [1983]: almost sure uniform convergence of $\varphi_n$ towards $\varphi$ on increasingly long symmetric intervals $[-U_n, U_n]$ holds whenever $\log(U_n)/n \to 0$. We show this theorem by slightly refining one of the steps in the proof of Theorem 4.1 in Neumann and Reiß [2009]. Therefore we give a brief overview of their ideas and take no credit for them. We refer the reader to their work for more details.
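The uniform-convergence phenomenon behind (3.4.1) and this result is easy to observe empirically. The sketch below is an illustration under simplifying assumptions (standard normal $\xi_k$, so that $\varphi(u) = e^{-u^2/2}$, and a fixed grid in place of growing intervals); it computes the uniform distance for increasing $n$:

```python
import numpy as np

rng = np.random.default_rng(3)

u = np.linspace(-20.0, 20.0, 401)
phi = np.exp(-u**2 / 2)          # characteristic function of a standard normal

def sup_err(n):
    """Uniform distance between ϕ_n and ϕ on the grid, for a fresh sample of size n."""
    xi = rng.standard_normal(n)
    phi_n = np.array([np.exp(1j * uu * xi).mean() for uu in u])
    return np.abs(phi_n - phi).max()

errs = [sup_err(n) for n in (500, 5000, 50000)]
print(errs)  # shrinks roughly like n^{-1/2}, up to logarithmic factors
```

Each hundredfold increase in $n$ reduces the uniform error by roughly a factor of ten, in line with the $n^{-1/2}\log^{1/2+\delta}$ rate.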

Proof. The strategy Neumann and Reiß [2009] adopt to prove the result is to control the expectation of the supremum over $u \in \mathbb{R}$ of the empirical process $\sqrt{n}\, \exp(i \cdot u) w(u) (P_n - P)$ by a maximal inequality from empirical process theory. Namely, they make use of Corollary 19.35 from van der Vaart [1998] and thus they have to control the $L^2$-bracketing number of a certain class of functions. Inspired by Yukich [1985], they bound this bracketing number by an expression depending on the quantity
\[
M = M(\epsilon, k) := \inf\big\{ m \ge 1 : E\big[\xi_1^{2k}\, \mathbf{1}_{\{|\xi_1| > m\}}\big] \le \epsilon^2 \big\},
\]
where $k \in \mathbb{N}$ is fixed and we are only concerned with the case $k = 0$. By the lemma after


Theorem 2 in Yukich [1985], the theorem follows if
\[
\int_0^1 \sqrt{\log M(\epsilon, k)}\, d\epsilon < \infty. \tag{3.4.3}
\]

For $k = 0$ the expectation in the definition of $M$ simplifies to $P(|\xi_1| > m)$. Then, trivially $|y| \le \max\{|y|, e\}$ for any $y \in \mathbb{R}$ and, by Markov's inequality, we have for any $\beta > 1$
\[
P(|\xi_1| > m) \le \frac{E[\log^\beta(\max\{|\xi_1|, e\})]}{\log^\beta(m)}.
\]

Once an upper bound of this form is derived, as in Neumann and Reiß [2009], the result follows because
\[
M(\epsilon, 0) \le \inf\bigg\{ m \ge 1 : \frac{E[\log^\beta(\max\{|\xi_1|, e\})]}{\log^\beta(m)} \le \epsilon^2 \bigg\}
\]
and hence
\[
\log M(\epsilon, 0) \le \epsilon^{-2/\beta}\, E\big[\log^\beta(\max\{|\xi_1|, e\})\big]^{1/\beta}.
\]
Therefore, (3.4.3) is satisfied when assuming $E[\log^\beta(\max\{|\xi_1|, e\})] < \infty$ for some $\beta > 1$, since then $\sqrt{\log M(\epsilon, 0)} \lesssim \epsilon^{-1/\beta}$ and $\int_0^1 \epsilon^{-1/\beta}\, d\epsilon < \infty$.

The quantity $\mathrm{Log}(\varphi_n/\varphi)$ plays a central role in the proofs as it carries all the stochasticity. To apply standard tools, such as the central limit theorem for triangular arrays or tools from empirical process theory, we need to linearise it, in the sense of writing it as a function applied to $P_n - P$. The next result guarantees its decomposition into a linear part and a remainder term.

In line with the notation introduced at the end of Section 3.1, we write $\xi_n = o_{Pr}(r_n)$ to denote that $\xi_n r_n^{-1}$ vanishes as $n \to \infty$ in $Pr$-probability, and $\xi_n =^{Pr} \xi_n'$ denotes equality on sets of $Pr$-probability approaching $1$ as $n \to \infty$.

3.4.2 Theorem. Let $\xi_k$, $k = 1, \ldots, n$, be independent and identically distributed real-valued random variables with characteristic function $\varphi$ such that $\inf_{u \in \mathbb{R}} |\varphi(u)| > 0$. Let $\varphi_n$ be the empirical version of it and take $h_n \to 0$ such that $\log(h_n^{-1})/n^{1/(1+2\delta)} \to 0$ for some $\delta > 0$. If $E[\log^\beta(\max\{|\xi_1|, e\})] < \infty$ for some $\beta > 1$, the quantity $\mathrm{Log}(\varphi_n(u)/\varphi(u))$, $u \in [-h_n^{-1}, h_n^{-1}]$, is well-defined and finite on sets of $Pr$-probability approaching one as $n \to \infty$. Furthermore, on these sets
\[
\mathrm{Log}\,\frac{\varphi_n(u)}{\varphi(u)} =^{Pr} \big(\varphi_n(u) - \varphi(u)\big)\, \varphi^{-1}(u) + R_n(u), \qquad u \in [-h_n^{-1}, h_n^{-1}], \tag{3.4.4}
\]
where the remainder term is given by
\[
R_n(u) = R(z)\big|_{z = \varphi_n(u)/\varphi(u)} \qquad \text{with} \qquad R(z) = \int_1^z \frac{s - z}{s^2}\, ds
\]


and it is such that $\sup_{|u| \leq h_n^{-1}} |R_n(u)| = O_{Pr}\big( n^{-1} (\log h_n^{-1})^{1+2\bar\delta} \big)$ for any $\bar\delta \in (0, \delta]$.

Proof. From the construction of the distinguished logarithm (see Theorem 7.6.2 in Chung [2001]) we know that on the closed disk $|z - 1| \leq 1/2$ it coincides with the principal branch of the complex logarithm. Thus, there it satisfies
$$\mathrm{Log}\, z = (z - 1) + R(z),$$
where $R(z)$, the integral form of the remainder, is as in the statement of the theorem, so
$$|R(z)| \leq \sup_{s \in [1, z]} |s|^{-2} \int_1^z |z - s|\, ds \leq \max\{1, |z|^{-2}\}\, |z - 1|^2. \tag{3.4.5}$$
Taking $z = \varphi_n(u)/\varphi(u)$, we see that the left hand side of (3.4.4) is well-defined on sets of $Pr$-probability approaching one as $n \to \infty$, and that (3.4.4) holds on them, if we show that on such sets $|\varphi_n(u)/\varphi(u) - 1| \leq 1/2$ whenever $u \in [-h_n^{-1}, h_n^{-1}]$. In view of the finite logarithmic moment assumption, the asymptotics of $h_n$ and Theorem 3.4.1, we have that for any $\delta > 0$

$$\sup_{|u| \leq h_n^{-1}} |\varphi_n(u) - \varphi(u)| = O_{Pr}\big( n^{-1/2} \log^{1/2+\delta}(h_n^{-1}) \big) = o_{Pr}(1). \tag{3.4.6}$$
The conclusion then follows because, due to the strictly positive lower bound on $|\varphi|$,
$$\sup_{|u| \leq h_n^{-1}} \left| \frac{\varphi_n(u)}{\varphi(u)} - 1 \right| \leq \sup_{|u| \leq h_n^{-1}} |\varphi_n(u) - \varphi(u)| \left( \inf_{|u| \leq h_n^{-1}} |\varphi(u)| \right)^{-1} = o_{Pr}(1). \tag{3.4.7}$$

To show the second claim of the theorem we first note that, evaluating (3.4.5) at $z = \varphi_n(u)/\varphi(u)$,
$$|R_n(u)| \leq \max\left\{ 1, \left| \frac{\varphi(u)}{\varphi_n(u)} \right|^2 \right\} \left| \frac{\varphi_n(u)}{\varphi(u)} - 1 \right|^2. \tag{3.4.8}$$
In view of (3.4.6), the second factor on the right hand side is of the required order. We claim that
$$\inf_{|u| \leq h_n^{-1}} |\varphi_n(u)| > \kappa \qquad \text{for every } \kappa \in \Big( 0, \inf_{u \in \mathbb{R}} |\varphi(u)| \Big) \tag{3.4.9}$$
on the same $Pr$-probability sets where (3.4.6) holds. Indeed, by the strictly positive lower bound on $|\varphi|$, we have that for any $\varepsilon > 0$
$$Pr\Big( \sup_{|u| \leq h_n^{-1}} |\varphi_n(u) - \varphi(u)| \leq \varepsilon \Big) \leq Pr\Big( \sup_{|u| \leq h_n^{-1}} \big| |\varphi_n(u)| - |\varphi(u)| \big| \leq \varepsilon \Big)$$
$$\leq Pr\big( |\varphi_n(u)| \geq |\varphi(u)| - \varepsilon \ \text{for all } u \in [-h_n^{-1}, h_n^{-1}] \big) \leq Pr\Big( |\varphi_n(u)| \geq \inf_{u \in \mathbb{R}} |\varphi(u)| - \varepsilon \ \text{for all } u \in [-h_n^{-1}, h_n^{-1}] \Big)$$
$$= Pr\Big( \inf_{|u| \leq h_n^{-1}} |\varphi_n(u)| \geq \inf_{u \in \mathbb{R}} |\varphi(u)| - \varepsilon \Big) \leq 1,$$
and the conclusion follows by taking $\varepsilon < \inf_{u \in \mathbb{R}} |\varphi(u)|$ and noting that the left hand side converges to $1$ as $n \to \infty$ by (3.4.6).
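The linearisation in Theorem 3.4.2 can also be observed numerically. The following sketch (not part of the thesis; all parameter values are arbitrary assumptions) simulates $\Delta$-increments of a compound Poisson process with standard normal jumps, forms the empirical characteristic function, and checks that the remainder $\mathrm{Log}(\varphi_n/\varphi) - (\varphi_n - \varphi)/\varphi$ is of quadratic order in the linear term, in line with (3.4.5).

```python
import numpy as np

rng = np.random.default_rng(0)
lam, gamma, dt, n = 1.0, 0.5, 1.0, 50_000   # hypothetical parameter values

# simulate n increments Z_k of a compound Poisson process with N(0,1) jumps
counts = rng.poisson(lam * dt, size=n)
Z = gamma * dt + np.array([rng.normal(size=c).sum() for c in counts])

u = np.linspace(-3.0, 3.0, 61)
phi_n = np.exp(1j * np.outer(u, Z)).mean(axis=1)                       # empirical cf
phi = np.exp(dt * (1j * u * gamma + lam * (np.exp(-u ** 2 / 2) - 1)))  # true cf

# linearisation: Log(phi_n/phi) = (phi_n - phi)/phi + R_n, with R_n quadratic
# in the linear term; for |z - 1| <= 1/2 the principal branch is the
# distinguished logarithm, so np.log is valid here
ratio = phi_n / phi
linear = ratio - 1.0
remainder = np.abs(np.log(ratio) - linear).max()
print(f"max |linear| = {np.abs(linear).max():.4f}, max |remainder| = {remainder:.2e}")
assert np.abs(linear).max() < 0.5      # inside the disk |z - 1| <= 1/2
assert remainder < np.abs(linear).max() ** 2
```

The grid is restricted to a compact frequency interval, mirroring the role of $[-h_n^{-1}, h_n^{-1}]$ in the theorem: $|\varphi|$ is bounded below there, so the ratio stays in the disk where the distinguished logarithm is the principal branch.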

The following two lemmas are either relatively long to prove or will be repeatedly used later; we therefore state them separately to improve the flow of the other proofs.

3.4.3 Lemma. Let $\varphi$ and $F^{-1}\big[\varphi^{-1}(-\cdot)\big]$ be as in (3.1.1) and (2.6.1), respectively, with $\nu$ any finite measure on $\mathbb{R}$ satisfying $\nu(\mathbb{R}) = \lambda$. Let $K$ be a kernel function satisfying $K, FK \in L^1$. Then, for any $h > 0$ fixed,
$$F^{-1}\big[\varphi^{-1}(-\cdot)\, FK_h\big](x) = \big( F^{-1}\big[\varphi^{-1}(-\cdot)\big] * K_h \big)(x) \qquad \text{for all } x \in \mathbb{R}, \tag{3.4.10}$$
and, consequently, if $K, g \in L^2(\mathbb{R})$ we have that
$$F^{-1}\big[Fg\, \varphi^{-1}(-\cdot)\, FK_h\big](x) = \big( g * K_h * F^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x) \qquad \text{for all } x \in \mathbb{R}. \tag{3.4.11}$$

Proof. By the expression for $\varphi$ given in (3.1.1) we have $\varphi^{-1}(-u) = \exp\big( \Delta( iu\gamma + \lambda - F\bar\nu(u) ) \big)$ for any $u \in \mathbb{R}$. Then, noting that $F\delta_{\Delta\gamma}(u) = \exp(iu\Delta\gamma)$ and writing the exponential of $-\Delta F\bar\nu(u)$ as an infinite series, we have for any $u \in \mathbb{R}$
$$\varphi^{-1}(-u) = e^{\lambda\Delta}\, F\delta_{\Delta\gamma}(u) \sum_{k=0}^{\infty} \frac{(-\Delta)^k}{k!} \big( F\bar\nu(u) \big)^k = e^{\lambda\Delta}\, F\delta_{\Delta\gamma}(u) \sum_{k=0}^{\infty} \frac{(-\Delta)^k}{k!} F\big[ \bar\nu^{*k} \big](u), \tag{3.4.12}$$
where in the last equality we use that if $\mu_1$ and $\mu_2$ are finite measures on $\mathbb{R}$ then so is their convolution, and $F\mu_1(u)\, F\mu_2(u) = F[\mu_1 * \mu_2](u)$ for every $u \in \mathbb{R}$. These properties from convolution theory and Fourier analysis are repeatedly used in what follows, and at the end of the proof we use that the same identity holds for $L^2(\mathbb{R})$ functions. To introduce the infinite sum into the Fourier transform we use that if $\mu_m$, $m = 1, 2, \ldots$, and $\mu$ are finite measures on $\mathbb{R}$ such that $\|\mu_m - \mu\|_{TV} \to 0$, then $F\mu_m \to F\mu$ pointwise. Taking
$$\mu_m := \sum_{k=0}^{m} \frac{(-\Delta)^k}{k!} \bar\nu^{*k} \qquad \text{and} \qquad \mu := \sum_{k=0}^{\infty} \frac{(-\Delta)^k}{k!} \bar\nu^{*k},$$
we see the condition is readily satisfied because
$$\left\| \sum_{k=m+1}^{\infty} \frac{(-\Delta)^k}{k!} \bar\nu^{*k} \right\|_{TV} \leq \sum_{k=m+1}^{\infty} \frac{(\lambda\Delta)^k}{k!} \to 0 \qquad \text{as } m \to \infty,$$
since it is the tail of a convergent series. Therefore
$$\varphi^{-1}(-u)\, FK_h(u) = e^{\lambda\Delta}\, F\delta_{\Delta\gamma}(u)\, F\left[ \sum_{k=0}^{\infty} \frac{(-\Delta)^k}{k!} \bar\nu^{*k} \right](u)\, FK_h(u) = F\big[ F^{-1}\big[\varphi^{-1}(-\cdot)\big] * K_h \big](u),$$
and the first display in the lemma follows by taking the Fourier inverse on both sides, noting that $\varphi^{-1}(-\cdot) FK_h,\ F^{-1}\big[\varphi^{-1}(-\cdot)\big] * K_h \in L^1(\mathbb{R})$ and that the latter is continuous because $K, FK \in L^1(\mathbb{R})$. The second display is then justified by the remark after (3.4.12), and because (3.4.10) is in $L^2(\mathbb{R})$ if $K$ is, by Minkowski's inequality for convolutions.

3.4.4 Lemma. Let $\varphi$, $F^{-1}\big[\varphi^{-1}(-\cdot)\big]$ and $P$ be given by (3.1.1), (2.6.1) and (2.6.2), where $\gamma > 0$ and $\nu$ is a finite measure satisfying $\nu(\mathbb{R}) = \lambda$. Let $K$ be a symmetric kernel function satisfying $K \in L^1(\mathbb{R})$ and $\mathrm{supp}\, FK \subseteq [-1, 1]$, and let the bandwidth $h_n \to 0$. Let $g_n^{k)} \in L^2(\mathbb{R})$, $k = 1, 2$, be functions satisfying $\sup_n \sup_{x \in \mathbb{R}} |g_n^{k)}(x)| < \infty$ and such that $g^{k)} := \lim_{n\to\infty} g_n^{k)} * K_{h_n}$ exists at every point. Then $g^{k)}$ is finite everywhere,
$$\lim_{n\to\infty} \int_{\mathbb{R}} \big( g_n^{1)} * K_{h_n} * F^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)\, \big( g_n^{2)} * K_{h_n} * F^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)\, P(dx)$$
$$= \int_{\mathbb{R}} \big( g^{1)} * F^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)\, \big( g^{2)} * F^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)\, P(dx) \tag{3.4.13}$$
exists, and so does
$$\lim_{n\to\infty} \int_{\mathbb{R}} \big( g_n^{k)} * K_{h_n} * F^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)\, P(dx) = \lim_{n\to\infty} g_n^{k)} * K_{h_n}(0) =: g^{k)}(0) \tag{3.4.14}$$
for any $k = 1, 2$. Furthermore, if $\nu$ satisfies Assumption 1a for some $\varepsilon > 0$ and, for each $k = 1, 2$, $\tilde g^{k)}$ is a bounded function agreeing with $g^{k)}$ everywhere up to a zero Lebesgue-measure set disjoint from $\varepsilon \times \mathbb{Z}$, then
$$\int_{\mathbb{R}} \big( \tilde g^{1)} * F^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)\, \big( \tilde g^{2)} * F^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)\, P(dx)$$
$$= \int_{\mathbb{R}} \big( g^{1)} * F^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)\, \big( g^{2)} * F^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)\, P(dx). \tag{3.4.15}$$

Proof. Let $k = 1, 2$ be fixed throughout. Note that $\|K_h\|_{TV} = \|K\|_{TV} < \infty$ for all $h > 0$, $\sup_n \sup_{x \in \mathbb{R}} |g_n^{k)}(x)| < \infty$ by assumption, and $\big\|F^{-1}\big[\varphi^{-1}(-\cdot)\big]\big\|_{TV} < \infty$. Then, in view of Minkowski's inequality for integrals, $\sup_n \sup_{x \in \mathbb{R}} |g_n^{k)} * K_{h_n}(x)| < \infty$, and therefore $g^{k)}$ is finite everywhere and $\sup_n \sup_{x \in \mathbb{R}} \big| g_n^{k)} * K_{h_n} * F^{-1}\big[\varphi^{-1}(-\cdot)\big](x) \big| < \infty$. The existence of all the displays therefore follows, and so does equality (3.4.13), using dominated convergence.

Note that $\|FK_h\|_2 \lesssim h^{-1/2} \|K\|_1 < \infty$, which in particular implies that $K \in L^2(\mathbb{R})$, used below. Additionally, $Fg \in L^2(\mathbb{R})$ and $\varphi^{-1} \in L^\infty(\mathbb{R})$, so, by Hölder's inequality, $Fg\, \varphi^{-1}(-\cdot)\, FK_h \in L^1(\mathbb{R})$. Then, in view of (3.4.11), the symmetry of $K$ and the finiteness of $P$, the left hand side in (3.4.14) equals the limit as $n \to \infty$ of
$$\frac{1}{2\pi} \int_{\mathbb{R}} Fg_n^{k)}(-u)\, FK_{h_n}(u)\, \varphi^{-1}(u)\, FP(u)\, du = \int_{\mathbb{R}} g_n^{k)}(x)\, K_{h_n}(x)\, dx = g_n^{k)} * K_{h_n}(0),$$
where the first equality follows by the symmetry of $K$ and Plancherel's formula, due to $g, K \in L^2(\mathbb{R})$. The limit of this quantity is $g^{k)}(0)$ by definition, and the first part of the lemma is then justified.

To show the last claim we note that, because $\tilde g^{k)}$ is bounded, we can apply Fubini's theorem to write the left hand side of (3.4.15) as
$$\int_{\mathbb{R}^3} \tilde g^{1)}(x - y_1)\, \tilde g^{2)}(x - y_2)\, P(dx)\, F^{-1}\big[\varphi^{-1}(-\cdot)\big](dy_1)\, F^{-1}\big[\varphi^{-1}(-\cdot)\big](dy_2).$$
In view of Lemma 27.1 in Sato [1999], the product of the three measures gives a finite measure on $\mathbb{R}^3$ comprising a (possibly null) atomic component and a (possibly null) absolutely continuous component. Recall that the former may have atoms only at $(\varepsilon \times \mathbb{Z})^3$. In any of these point masses the functions $\tilde g^{1)}$ and $\tilde g^{2)}$ are evaluated at a value in $\varepsilon \times \mathbb{Z}$, and therefore there they coincide with $g^{1)}$ and $g^{2)}$ by assumption. At the remaining values they agree up to a set of zero Lebesgue measure, but there they are integrated with respect to an absolutely continuous measure, and thus the conclusion follows.

The following is the central result when dealing with the stochastic terms arising from proving joint convergence of one-dimensional parameters.

3.4.5 Theorem. Let $\varphi$ and $\varphi_n$ be as in Section 3.1, i.e. the characteristic function, and its empirical counterpart, of the $\Delta$-increments of a compound Poisson process with law $P$, $P_n$, drift $\gamma \in \mathbb{R}$ and finite jump measure $\nu$ satisfying $\nu(\mathbb{R}) = \lambda$ and Assumption 2. Let $K$ be a symmetric kernel function satisfying $K \in L^1(\mathbb{R})$ and $\mathrm{supp}\, FK \subseteq [-1, 1]$, and let the bandwidth $h_n \to 0$ be such that $\log(h_n^{-1})/n^{1/(1+2\delta)} \to 0$ for some $\delta > 0$. Finally, let $g_n \in L^2(\mathbb{R})$ be a function satisfying
$$\int_{-h_n^{-1}}^{h_n^{-1}} |Fg_n(u)|\, du = o\big( n^{1/2} (\log h_n^{-1})^{-(1+2\delta)} \big), \qquad \sup_n \sup_{x \in \mathbb{R}} |g_n(x)| < \infty \qquad \text{and} \qquad g := \lim_{n\to\infty} g_n * K_{h_n} \tag{3.4.16}$$


exists at every point. Then we have that, as $n \to \infty$,
$$\sqrt{n}\, \frac{1}{\Delta} \int_{\mathbb{R}} g_n(x)\, F^{-1}\Big[ \mathrm{Log}\, \frac{\varphi_n}{\varphi}\, FK_h \Big](x)\, dx =^{Pr} \sqrt{n}\, \frac{1}{\Delta} \int_{\mathbb{R}} \big( g_n * K_h * F^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)\, (P_n - P)(dx)$$
$$\qquad + \sqrt{n}\, \frac{1}{2\pi\Delta} \int_{\mathbb{R}} Fg_n(-u)\, R_n(u)\, FK_h(u)\, du \tag{3.4.17}$$
$$\overset{d}{\to} N(0, \sigma_g^2), \tag{3.4.18}$$
where $R_n$ is as in Theorem 3.4.2 and $\sigma_g^2$ is finite and satisfies
$$\Delta^2 \sigma_g^2 = \int_{\mathbb{R}} \Big( \big( g * F^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x) - g(0) \Big)^2 P(dx) = \int_{\mathbb{R}} \big( g * F^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)^2\, P(dx) - g(0)^2.$$

Proof. We first note that the conditions of Theorem 3.4.2 are readily satisfied when $\xi_k = Z_k$, because $\inf_{u \in \mathbb{R}} |\varphi(u)| \geq \exp(-2\lambda\Delta) > 0$, $\mathrm{supp}(FK_h) \subseteq [-h^{-1}, h^{-1}]$, and we have assumed the same decay of $h_n$ as well as Assumption 2. Then, using the compact support of $FK_h$, the quantity inside the inverse Fourier transform of
$$\frac{1}{\Delta} \int_{\mathbb{R}} g_n(x)\, F^{-1}\Big[ \mathrm{Log}\, \frac{\varphi_n}{\varphi}\, FK_h \Big](x)\, dx \tag{3.4.19}$$
is in $L^2(\mathbb{R})$ on sets of $Pr$-probability approaching $1$ as $n \to \infty$ and, on these, Plancherel's formula can be used, noting that $g_n \in L^2(\mathbb{R})$, to write the last display as
$$\frac{1}{2\pi\Delta} \int_{\mathbb{R}} Fg_n(-u)\, \mathrm{Log}\, \frac{\varphi_n(u)}{\varphi(u)}\, FK_h(u)\, du =^{Pr} \frac{1}{2\pi\Delta} \int_{\mathbb{R}} Fg_n(-u)\, \big( \varphi_n(u) - \varphi(u) \big)\, \varphi^{-1}(u)\, FK_h(u)\, du$$
$$\qquad + \frac{1}{2\pi\Delta} \int_{\mathbb{R}} Fg_n(-u)\, R_n(u)\, FK_h(u)\, du. \tag{3.4.20}$$

Due to $|FK_h| \leq \|K\|_1 \mathbf{1}_{[-h^{-1}, h^{-1}]}$ pointwise, the second summand can be bounded by $\Delta^{-1} \|K\|_1$ times
$$\sup_{u \in [-h_n^{-1}, h_n^{-1}]} |R_n(u)| \int_{-h_n^{-1}}^{h_n^{-1}} |Fg_n(u)|\, du = O_{Pr}\big( n^{-1} (\log h_n^{-1})^{1+2\delta} \big) \int_{-h_n^{-1}}^{h_n^{-1}} |Fg_n(u)|\, du = o_{Pr}\big( n^{-1/2} \big),$$
where the equalities are justified using Theorem 3.4.2 and the first condition in (3.4.16). Hence this summand is negligible in the asymptotic distribution of the left hand side of (3.4.17). The first summand in (3.4.20) is $\Delta^{-1}$ times
$$\frac{1}{2\pi} \int_{\mathbb{R}} Fg_n(-u)\, \varphi^{-1}(u)\, FK_h(u)\, F[P_n - P](du) = \int_{\mathbb{R}} F^{-1}\big[ Fg_n\, \varphi^{-1}(-\cdot)\, FK_h \big](x)\, (P_n - P)(dx),$$
where the equality follows by the symmetry of $K$, because $P_n - P$ is a finite measure


and due to $Fg_n\, \varphi^{-1}(-\cdot)\, FK_h \in L^1(\mathbb{R})$ by Hölder's inequality, noting that $Fg_n \in L^2(\mathbb{R})$, $\varphi^{-1} \in L^\infty(\mathbb{R})$ and $\|FK_h\|_2 \leq h^{-1/2} \|K\|_1 < \infty$. Note that (3.4.17) is then justified by Lemma 3.4.3. Therefore, the stochastic quantity (3.4.19) is centred after linearisation, and the Lindeberg–Feller central limit theorem applies to
$$\frac{1}{\Delta} \int_{\mathbb{R}} \big( g_n * K_h * F^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)\, (P_n - P)(dx) \tag{3.4.21}$$
if
$$\sup_{x \in \mathbb{R}} \big| g_n * K_h * F^{-1}\big[\varphi^{-1}(-\cdot)\big](x) \big| = o(n^{1/2}) \tag{3.4.22}$$
and if
$$\lim_{n\to\infty} \int_{\mathbb{R}} \Big( \big( g_n * K_{h_n} * F^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x) - g_n * K_{h_n}(0) \Big)^2 P(dx) \tag{3.4.23}$$
exists (see, for example, Proposition 2.27 in van der Vaart [1998]). Note that $\|K_h\|_{TV} = \|K\|_{TV} < \infty$ for all $h > 0$ and, by assumption, $\sup_n \sup_{x \in \mathbb{R}} |g_n(x)| < \infty$ and $\big\|F^{-1}\big[\varphi^{-1}(-\cdot)\big]\big\|_{TV} < \infty$. Then, in view of Minkowski's inequality for integrals, $\sup_n \sup_{x \in \mathbb{R}} |g_n * K_{h_n}(x)| < \infty$, and therefore $g$ is finite everywhere and $\sup_n \sup_{x \in \mathbb{R}} \big| g_n * K_{h_n} * F^{-1}\big[\varphi^{-1}(-\cdot)\big](x) \big| < \infty$. Consequently, (3.4.22) is satisfied, and (3.4.23) follows by dominated convergence because of the third assumption in (3.4.16). This also justifies that (3.4.18) holds with the first expression for $\sigma_g^2$. The second expression for this quantity follows by expanding the square in (3.4.23) and using (3.4.14).
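The kernels admissible in Theorem 3.4.5 can be made concrete. The sketch below (not from the thesis) constructs a symmetric kernel whose Fourier transform is a trapezoid supported in $[-1, 1]$: it is band-limited and integrates to one. Note the explicit assumption: this trapezoid kernel only decays like $|x|^{-2}$, whereas the decay condition in (3.2.1) ($|K(x)| \lesssim |x|^{-\eta}$ with $\eta > 2$) would require a smoother transition in $FK$; the closed form is purely illustrative of the construction.

```python
import numpy as np

# Band-limited "flat-top" kernel: FK(u) = 1 for |u| <= 1/2, decreasing linearly
# to 0 at |u| = 1, so supp FK ⊆ [-1, 1].  Inverting the trapezoid gives the
# closed form below.  ASSUMPTION: this kernel decays only like |x|^{-2}; the
# thesis' condition (3.2.1) needs eta > 2, i.e. a smoother FK transition.
def K(x):
    x = np.asarray(x, dtype=float)
    small = np.abs(x) < 1e-6
    safe = np.where(small, 1.0, x)
    return np.where(small, 3.0 / (4.0 * np.pi),
                    (2.0 / np.pi) * (np.cos(safe / 2.0) - np.cos(safe)) / safe ** 2)

x = np.linspace(-200.0, 200.0, 400_001)
dx = x[1] - x[0]
vals = K(x)

assert np.allclose(vals, vals[::-1])                     # symmetric kernel
mass = float(np.sum((vals[1:] + vals[:-1]) / 2.0) * dx)
assert abs(mass - 1.0) < 1e-2                            # integrates to FK(0) = 1
ft = vals * np.cos(1.5 * x)                              # FK(1.5) should vanish
ft_val = float(np.sum((ft[1:] + ft[:-1]) / 2.0) * dx)
assert abs(ft_val) < 1e-2                                # band-limited: FK = 0 off [-1,1]
print(f"mass ≈ {mass:.4f}, FK(1.5) ≈ {ft_val:.2e}")
```

In practice the rescaled kernel $K_h(\cdot) = h^{-1} K(\cdot/h)$ then has $\mathrm{supp}\, FK_h \subseteq [-h^{-1}, h^{-1}]$, which is exactly the spectral cut-off used throughout the proofs.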

The following auxiliary result is immediate to prove but it will be subsequently used several times. Therefore, and for the sake of clarity, we formulate it in the form of a lemma.

3.4.6 Lemma. Let $a, b, U, V \in \mathbb{R}$ be such that $U > 1$, $V > 0$ and $|a|, |b| < V$. Then, for $k = 0, 1$,
$$\int_{-U}^{U} \big| F\big[ (\cdot)^k \mathbf{1}_{[a,b]} \big](u) \big|\, du \lesssim U^{2k-1} V^{k+1} + (1-k) \log U.$$

Proof. Note that for all $u \in \mathbb{R}$
$$\big| F\big[ (\cdot)^k \mathbf{1}_{[a,b]} \big](u) \big| \leq \big\| (\cdot)^k \mathbf{1}_{[a,b]} \big\|_1 \leq V^k \big( |a| + |b| \big) \lesssim V^{k+1}.$$
Therefore, the conclusion for the case $k = 1$ follows immediately. Since $U > 1$, when $k = 0$ we split the integral in the statement into the sum of that over $[-U^{-1}, U^{-1}]$ and that over $[-U, U] \setminus [-U^{-1}, U^{-1}]$. By the last display, the former is bounded above by $U^{-1} V$, up to constants independent of $U$ and $V$. To bound the latter we note that
$$\int_{[-U, U] \setminus [-U^{-1}, U^{-1}]} \left| \frac{e^{iub} - e^{iua}}{iu} \right| du \lesssim \int_{U^{-1}}^{U} u^{-1}\, du \lesssim \log(U).$$
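Lemma 3.4.6 is easy to illustrate numerically for $k = 0$ (not part of the thesis; the endpoints $a, b$ and the values of $U$ below are arbitrary assumptions with $V = 1$): the Fourier transform of $\mathbf{1}_{[a,b]}$ is $(e^{iub} - e^{iua})/(iu)$, and its $L^1$-norm over $[-U, U]$ grows only logarithmically in $U$.

```python
import numpy as np

a, b = -0.3, 0.8   # hypothetical endpoints with |a|, |b| < V = 1

def abs_F_indicator(u):
    # |F[1_{[a,b]}](u)| = |e^{iub} - e^{iua}| / |u| = 2 |sin(u (b - a) / 2)| / |u|
    return 2.0 * np.abs(np.sin(u * (b - a) / 2.0)) / np.abs(u)

for U in [10.0, 100.0, 1000.0]:
    u = np.linspace(1e-6, U, 500_000)
    f = abs_F_indicator(u)
    # |F| is even, so the integral over [-U, U] is twice that over (0, U]
    integral = 2.0 * float(np.sum((f[1:] + f[:-1]) / 2.0 * np.diff(u)))
    bound = 5.0 * (1.0 / U + np.log(U))
    print(f"U = {U:6.0f}: L1 norm ≈ {integral:7.3f}, 5 (1/U + log U) = {bound:7.3f}")
    assert integral < bound
```

The constant $5$ is an ad hoc choice standing in for the unspecified constant in the $\lesssim$ of the lemma; the point is the logarithmic growth in $U$, which is what makes the $O(\log h_n^{-1})$ bounds of the next lemma possible.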

The next lemma guarantees that the functions $g_n$ we subsequently input into Theorem 3.4.5 satisfy the necessary conditions therein. We use the notation $\equiv$ between two functions to denote pointwise equality.

3.4.7 Lemma. Suppose $K$ satisfies (3.2.1) and $h_n, \varepsilon_n \to 0$, $H_n \to \infty$ are such that $h_n \sim \exp(-n^{\vartheta_h})$ with $\vartheta_h < 1/4$, $h_n \varepsilon_n^{-1} = o(1)$ and $h_n H_n = o(1)$, hence satisfying the assumptions of Theorem 3.4.5 on $K$ and $h_n$. Then, for all $c, \varepsilon > 0$, $j \in \mathbb{Z}$ and $t \in \mathbb{R}$, the functions $f_n^{(\lambda)}$, $f_n^{(\gamma)}$, $f_n^{(q_j)}$ and $f_{t,n}^{(N)}$ defined in Section 3.2 are bounded, belong to $L^2(\mathbb{R})$ and satisfy (3.4.16) for $g_n$ equal to each of them, with the constant hidden in the small-$o$ notation of the first condition not depending on $t$ or $j$ when $g_n = f_{t,n}^{(N)}$ or $g_n = f_n^{(q_j)}$. Recalling the definitions of $f^{(\lambda)}$, $f^{(q_j)}$ and $f_t^{(N)}$ in Section 3.3, and defining $f^{(\gamma)} := 0$, we have that
$$f^{(\lambda)} := \mathbf{1}_{\mathbb{R} \setminus \{0\}} \equiv \lim_{n\to\infty} f_n^{(\lambda)} * K_{h_n}, \qquad f^{(\gamma)} := 0 \equiv \lim_{n\to\infty} h_n^{-1} f_n^{(\gamma)} * K_{h_n}, \tag{3.4.24}$$
$$f^{(q_j)} := \mathbf{1}_{\{\varepsilon j\}} \equiv \lim_{n\to\infty} f_n^{(q_j)} * K_{h_n}, \quad j \in \mathbb{Z}, \tag{3.4.25}$$
and
$$l_t := \lim_{n\to\infty} f_{t,n}^{(N)} * K_{h_n} \equiv f_t^{(N)} - \frac{1}{2}\, \mathbf{1}_{\{t\}}\, \mathbf{1}_{\mathbb{R} \setminus \varepsilon \times \mathbb{Z}}(t). \tag{3.4.26}$$
Furthermore, for any finite set $T \subset \mathbb{R}$, any $C_{(\lambda)}, C_{(\gamma)}, C_{(j)}, C_t \in \mathbb{R}$, $j \in \mathbb{Z} \setminus \{0\}$ and $t \in T$, and any $\widetilde H_n \to \infty$, define the linear combination
$$f_{t,n} := C_{(\lambda)} f_n^{(\lambda)} + C_{(\gamma)} f_n^{(\gamma)} + \sum_{\substack{|j| \leq \widetilde H_n/\varepsilon \\ j \neq 0}} C_{(j)} f_n^{(q_j)} + \sum_{t \in T} C_t f_{t,n}^{(N)}. \tag{3.4.27}$$

Then $f_{t,n}$ enjoys the same abovementioned properties as the individual functions if either

(a) finitely many coefficients $C_{(j)}$ are not zero;

(b) or $\sup_{j \in \mathbb{Z} \setminus \{0\}} |C_{(j)}| < \infty$, $\widetilde H_n \sim n^{\vartheta_{\widetilde H}}$ with $\vartheta_{\widetilde H} < 1/2$ and $\vartheta_h < (1 - 2\vartheta_{\widetilde H})/4$.

In both cases
$$\lim_{n\to\infty} f_{t,n} * K_{h_n} \equiv C_{(\lambda)} f^{(\lambda)} + C_{(\gamma)} f^{(\gamma)} + \sum_{j \in \mathbb{Z} \setminus \{0\}} C_{(j)} f^{(q_j)} + \sum_{t \in T} C_t\, l_t.$$


Proof. The conditions of Theorem 3.4.5 on $K$ are trivially satisfied by those assumed here, and those on $h_n$ hold because
$$\log(h_n^{-1})/n^{1/(1+2\delta)} \sim n^{\vartheta_h - 1/(1+2\delta)} = o\big( n^{(-3+2\delta)/(4+8\delta)} \big) = o(1)$$
for any $\delta \in (0, 3/2)$, due to $\vartheta_h < 1/4$. To check (3.4.16) for each of the individual functions, note that for any fixed $n$, any $j \in \mathbb{Z}$ and $t \in \mathbb{R}$, the functions $f_n^{(\lambda)}$, $f_n^{(\gamma)}$, $f_n^{(q_j)}$ and $f_{t,n}^{(N)}$ are uniformly bounded by $\max\{1, c^{-1}\} < \infty$ and have bounded supports. Therefore each of them is bounded, square integrable and satisfies the second condition in (3.4.16). To check the first condition we first note that $f_n^{(\lambda)} = \mathbf{1}_{[-H_n, H_n]} - \mathbf{1}_{(-\varepsilon_n, \varepsilon_n)}$. Then, by Lemma 3.4.6 with $k = 0$,
$$\int_{-h_n^{-1}}^{h_n^{-1}} \big| F f_n^{(\lambda)}(u) \big|\, du \lesssim h_n H_n + h_n \varepsilon_n + \log(h_n^{-1}) = O(\log h_n^{-1}),$$
where in the last equality we have used the asymptotics of $h_n$, $\varepsilon_n$ and $H_n$. Taking $k = 1$ in Lemma 3.4.6, the same arguments guarantee that
$$\int_{-h_n^{-1}}^{h_n^{-1}} \big| F f_n^{(\gamma)}(u) \big|\, du \lesssim 1,$$
and taking $k = 0$ we have that for any $j \in \mathbb{Z}$
$$\int_{-h_n^{-1}}^{h_n^{-1}} \big| F f_n^{(q_j)}(u) \big|\, du = \int_{-h_n^{-1}}^{h_n^{-1}} \big| e^{iu\varepsilon j}\, F\big[ \mathbf{1}_{[-\varepsilon_n, \varepsilon_n]} \big](u) \big|\, du \lesssim h_n \varepsilon_n + \log(h_n^{-1}) = O(\log h_n^{-1}).$$
The same asymptotics are attained for $f_{t,n}^{(N)}$. To justify this, note that this function can be written as $\mathbf{1}_{[a_1, b_1]} + \mathbf{1}_{[a_2, b_2]}$ for some $a_1, a_2, b_1, b_2 \in \mathbb{R}$ whose absolute values are bounded above by $H_n + \varepsilon_n$. Then, using Lemma 3.4.6 with $k = 0$ and the asymptotics of $h_n$, $\varepsilon_n$ and $H_n$, we have that for any $t \in \mathbb{R}$
$$\int_{-h_n^{-1}}^{h_n^{-1}} \big| F f_{t,n}^{(N)}(u) \big|\, du \lesssim h_n (H_n + \varepsilon_n) + \log(h_n^{-1}) = O(\log h_n^{-1}).$$
Each of these functions then satisfies the first condition in (3.4.16) because $h_n \sim \exp(-n^{\vartheta_h})$ with $\vartheta_h < 1/4$, so
$$(\log h_n^{-1})^{2(1+\delta)}\, n^{-1/2} \sim n^{2(1+\delta)\big(\vartheta_h - \frac{1}{4(1+\delta)}\big)} = o(1)$$
for some $\delta \in (0, 1/(4\vartheta_h) - 1)$, for which we also have that $\log h_n^{-1}/n^{1/(1+2\delta)} = o\big( n^{-\vartheta_h(1+2\vartheta_h)/(1-2\vartheta_h)} \big) \to 0$. The lack of dependence on $j$ or $t$ of the constants hidden in the notation $\lesssim$ is clear from the arguments we have employed.


We now compute the limits in (3.4.24), in reverse order. Notice that for any $x \in \mathbb{R}$
$$h_n^{-1}\, \big( (\cdot)\, \mathbf{1}_{(-h_n, h_n)} \big) * K_{h_n}(x) = \frac{x}{h_n} \int_{x/h_n - 1}^{x/h_n + 1} K(y)\, dy - \int_{x/h_n - 1}^{x/h_n + 1} y\, K(y)\, dy.$$
When $x = 0$ this equals $-\int_{[-1,1]} y\, K(y)\, dy$, which is zero by the symmetry of $K$, and otherwise it is of order $O(h_n^{\eta-1})$ by the decay of $|K|$. Hence the second limit in (3.4.24) is justified for any $c > 0$, because $\eta > 2$ and $h_n \to 0$. To show the first limit we start by noting that, for any function $g_n$ such that $\sup_n \sup_{x \in \mathbb{R}} |g_n(x)| < \infty$, we have that as $n \to \infty$
$$\big| \big( g_n - g_n \mathbf{1}_{[-H_n, H_n]} \big) * K_{h_n}(x) \big| \leq \sup_n \sup_{x \in \mathbb{R}} |g_n(x)| \int_{\left(\frac{x - H_n}{h_n}, \frac{x + H_n}{h_n}\right)^C} |K(y)|\, dy \to 0 \qquad \text{for all } x \in \mathbb{R}, \tag{3.4.28}$$
because $K \in L^1(\mathbb{R})$ and $h_n^{-1}, H_n \to \infty$. Therefore the limit of $f_n^{(\lambda)} * K_{h_n}$ as $n \to \infty$ equals that of $\tilde f_n^{(\lambda)} * K_{h_n}$ at every point, where $\tilde f_n^{(\lambda)} := \mathbf{1}_{\mathbb{R} \setminus (-\varepsilon_n, \varepsilon_n)}$. Additionally, for any $x \in \mathbb{R}$,
$$\mathbf{1}_{\mathbb{R} \setminus (-\varepsilon_n, \varepsilon_n)} * K_{h_n}(x) = 1 - \mathbf{1}_{(-\varepsilon_n, \varepsilon_n)} * K_{h_n}(x),$$
because $\int_{\mathbb{R}} K = 1$, so we only need to compute the limit of the convolution on the right hand side. This coincides with (3.4.25) when $j = 0$, and hence we now compute (3.4.25) for any $j \in \mathbb{Z}$. For any $x \in \mathbb{R}$ we have that
$$\mathbf{1}_{(\varepsilon j - \varepsilon_n,\, \varepsilon j + \varepsilon_n)} * K_{h_n}(x) = \int_{(x - \varepsilon j - \varepsilon_n)/h_n}^{(x - \varepsilon j + \varepsilon_n)/h_n} K(y)\, dy.$$
By the decay of $|K|$, if $x \neq \varepsilon j$ this is of order $O(\varepsilon_n h_n^{\eta-1})$, and if $x = \varepsilon j$ then $1$ minus this display equals $\int_{[-\varepsilon_n/h_n,\, \varepsilon_n/h_n]^C} K$, due to $\int_{\mathbb{R}} K = 1$. Using that $h_n, \varepsilon_n \to 0$, $\eta > 2$, $h_n \varepsilon_n^{-1} = o(1)$ and $K \in L^1(\mathbb{R})$, we thus conclude that
$$\lim_{n\to\infty} \mathbf{1}_{(\varepsilon j - \varepsilon_n,\, \varepsilon j + \varepsilon_n)} * K_{h_n} \equiv \mathbf{1}_{\{\varepsilon j\}}, \tag{3.4.29}$$
and consequently the first limit in (3.4.24) and that in (3.4.25) follow.

To show (3.4.26) note that, in view of (3.4.28) and because $\sup_n \sup_{x \in \mathbb{R}} |f_{t,n}^{(N)}(x)| < \infty$, the limit of $f_{t,n}^{(N)} * K_{h_n}$ as $n \to \infty$ equals that of $\tilde f_{t,n}^{(N)} * K_{h_n}$ at every point, where
$$\tilde f_{t,n}^{(N)} := \mathbf{1}_{\mathbb{R} \setminus (-\varepsilon_n, \varepsilon_n)} \times \begin{cases} \mathbf{1}_{(-\infty, t]} & \text{if } |t - J_j| > \varepsilon_n \text{ for all } j \in \mathbb{Z}, \\ \mathbf{1}_{(-\infty,\, J_j - \varepsilon_n]} & \text{if } J_j - \varepsilon_n \leq t < J_j \text{ for some } j \in \mathbb{Z}, \\ \mathbf{1}_{(-\infty,\, J_j + \varepsilon_n]} & \text{if } J_j \leq t \leq J_j + \varepsilon_n \text{ for some } j \in \mathbb{Z}. \end{cases} \tag{3.4.30}$$
Therefore, when $t = \varepsilon j$ for some $j \in \mathbb{Z}$ we have
$$\tilde f_{t,n}^{(N)} = \mathbf{1}_{(-\infty,\, \varepsilon j - \varepsilon_n]} + \mathbf{1}_{(\varepsilon j - \varepsilon_n,\, \varepsilon j + \varepsilon_n)} - \mathbf{1}_{(-\varepsilon_n, \varepsilon_n)}\, \mathbf{1}_{[0, \infty)}(j) \tag{3.4.31}$$


almost everywhere, and hence the following arguments are not affected. Due to $\varepsilon_n \to 0$ as $n \to \infty$, for any other fixed $t$ we can assume
$$\tilde f_{t,n}^{(N)} = \mathbf{1}_{(-\infty, t]} - \mathbf{1}_{(-\varepsilon_n, \varepsilon_n)}\, \mathbf{1}_{(0, \infty)}(t). \tag{3.4.32}$$
Thus, and in view of (3.4.29), we only need to compute the limit of
$$\mathbf{1}_{(-\infty, y]} * K_{h_n}(x) = \int_{(x - y)/h_n}^{\infty} K(z)\, dz$$
when $y = \varepsilon j - \varepsilon_n$ for some $j \in \mathbb{Z}$ and when $y = t$. In the former case we have $(x - y)/h_n = (x - \varepsilon j)/h_n + \varepsilon_n/h_n$. Hence the limit of the last display is $\int_{\mathbb{R}} K = 1$ if $x < \varepsilon j$ and zero otherwise, which can be written as $\mathbf{1}_{(-\infty, \varepsilon j)}$, and the limit of (3.4.31) convolved with $K_{h_n}$ is $\mathbf{1}_{(-\infty, t]}\, \mathbf{1}_{\mathbb{R} \setminus \{0\}}$. When $y = t$ the same arguments apply for $x \neq t$, but when $x = t$ we obtain $\int_0^{\infty} K = 1/2$ by the symmetry of $K$. This gives the limiting function $\mathbf{1}_{(-\infty, t)} + \frac{1}{2} \mathbf{1}_{\{t\}}$, and therefore the limit of (3.4.32) convolved with $K_{h_n}$ is $\mathbf{1}_{(-\infty, t]}\, \mathbf{1}_{\mathbb{R} \setminus \{0\}} - \frac{1}{2} \mathbf{1}_{\{t\}}$, thus justifying (3.4.26).

The statement regarding the linear combination follows immediately in case (a). To show it in case (b), note that because $\varepsilon_n \to 0$ we can assume $\varepsilon_n < \varepsilon/2$, and therefore for any fixed $n$
$$\sup_n \sup_{x \in \mathbb{R}} |f_{t,n}(x)| \lesssim \mathrm{card}(T) + 1 + 1 < \infty \qquad \text{and} \qquad \|f_{t,n}\|_2 \lesssim \mathrm{card}(T)\, (H_n + \varepsilon_n)^{1/2} + \widetilde H_n\, \varepsilon_n^{1/2} + h_n^{1/2} < \infty.$$

Notice that the assumptions $\vartheta_{\widetilde H} < 1/2$ and $\vartheta_h < (1 - 2\vartheta_{\widetilde H})/4$ imply
$$\widetilde H_n\, n^{-1/2} (\log h_n^{-1})^{2(1+\delta)} \sim n^{\vartheta_{\widetilde H} - 1/2 + 2(1+\delta)\vartheta_h} = o(1)$$
for any $\delta \in \big( 0,\, (1 - 2\vartheta_{\widetilde H})/(4\vartheta_h) - 1 \big)$. This interval has non-empty intersection with that under which the first display of the proof holds, and consequently for some of those $\delta > 0$
$$\int_{-h_n^{-1}}^{h_n^{-1}} |F f_{t,n}(u)|\, du = O\big( \widetilde H_n \log(h_n^{-1}) \big) = o\big( n^{1/2} (\log h_n^{-1})^{-(1+2\delta)} \big),$$
as required. To compute the limit of $f_{t,n} * K_{h_n}$ we only need to work with the second sum in $f_{t,n}$, and we recall that for any $j \in \mathbb{Z} \setminus \{0\}$
$$f_n^{(q_j)} * K_{h_n}(x) = \begin{cases} \int_{-\varepsilon_n/h_n}^{\varepsilon_n/h_n} K & \text{if } x = \varepsilon j, \\[2pt] O\big( \varepsilon_n h_n^{\eta-1} \big) & \text{otherwise}, \end{cases}$$
where $\eta > 2$. Moreover, note that the assumption on the asymptotics of $\widetilde H_n$ implies


$\widetilde H_n = o(n^{1/2})$, and therefore $\widetilde H_n h_n^{a} = o(1)$ for any $a > 0$. Due to $\sup_{j \in \mathbb{Z} \setminus \{0\}} |C_{(j)}| < \infty$ and because $1 - \int_{-\varepsilon_n/h_n}^{\varepsilon_n/h_n} K = o(1)$, the result follows in view of
$$\sum_{\substack{|j| \leq \widetilde H_n/\varepsilon \\ j \neq 0}} C_{(j)}\, f_n^{(q_j)} * K_{h_n}(x) = \begin{cases} C_{(j)} + o(1) + O\big( (\widetilde H_n - 1)\, h_n^{\eta-2}\, \varepsilon_n h_n \big) = C_{(j)} + o(1) & \text{if } x = \varepsilon j \text{ for some } j \in \mathbb{Z} \setminus \{0\}, \\[2pt] O\big( \widetilde H_n\, h_n^{\eta-2}\, \varepsilon_n h_n \big) = o(1) & \text{otherwise}. \end{cases}$$
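The pointwise limit (3.4.29), the mechanism behind the mass-function estimator, is easy to visualise numerically. The sketch below (not from the thesis) uses a Gaussian kernel, for which the convolution with an indicator has a closed form via the normal distribution function $\Phi$; this choice of kernel is an assumption for illustration only (the thesis uses a band-limited kernel), but the driving mechanism, $\varepsilon_n \to 0$ with $h_n/\varepsilon_n \to 0$, is the same.

```python
import math

def Phi(t):
    # standard normal distribution function
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def smoothed_indicator(x, a, eps, h):
    # (1_{(a - eps, a + eps)} * K_h)(x) in closed form for a Gaussian kernel K
    return Phi((x - a + eps) / h) - Phi((x - a - eps) / h)

a = 1.0   # the atom, playing the role of epsilon * j
for n in [10, 100, 1000]:
    eps = math.exp(-n ** 0.3)    # eps_n -> 0
    h = math.exp(-n ** 0.4)      # h_n -> 0 faster, so h_n / eps_n -> 0
    at_atom = smoothed_indicator(a, a, eps, h)
    off_atom = smoothed_indicator(a + 0.5, a, eps, h)
    print(f"n = {n:4d}: value at the atom {at_atom:.6f}, away from it {off_atom:.2e}")

assert at_atom > 0.99 and off_atom < 1e-6
```

The convolved indicator tends to $1$ at the atom (because the kernel mass concentrates inside the shrinking window, $\varepsilon_n/h_n \to \infty$) and to $0$ everywhere else (because $\varepsilon_n \to 0$), reproducing the limit $\mathbf{1}_{\{\varepsilon j\}}$.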

All the results so far have been developed to deal with stochastic quantities but, because of the use of a kernel in the estimators, we also need to control some non-stochastic quantities, or bias terms. This is the content of the following lemma.

3.4.8 Lemma. Let $K$ satisfy (3.2.1) and take $h_n \sim \exp(-n^{\vartheta_h})$, $\varepsilon_n \sim \exp(-n^{\vartheta_\varepsilon})$ and $H_n \sim \exp(n^{\vartheta_H})$ such that $0 < \vartheta_\varepsilon < \vartheta_h < \infty$ and $0 < \vartheta_H < \vartheta_h$. Adopt the setting and notation of Section 3.1 and assume the finite measure $\nu$ satisfies Assumption 1a. Suppose that either

(a) Assumption 2 holds for some $\beta > 2$ and $\vartheta_H \geq 1/(2\beta)$;

(b) or $\int_{\mathbb{R}} |x|^{\beta} \nu(dx) < \infty$ for some $\beta > 1$.

Using the notation introduced in Lemma 3.4.7, define
$$B_n^{(\lambda)} := \frac{1}{\Delta} \int_{\mathbb{R}} f_n^{(\lambda)}(x)\, F^{-1}\big[ \mathrm{Log}\,\varphi\, FK_{h_n} \big](x)\, dx - \lambda,$$
$$B_n^{(\gamma)} := h_n^{-1}\, \frac{1}{\Delta} \int_{\mathbb{R}} f_n^{(\gamma)}(x)\, F^{-1}\big[ \mathrm{Log}\,\varphi\, FK_{h_n} \big](x)\, dx - h_n^{-1} \gamma,$$
$$B_n^{(q_j)} := \frac{1}{\Delta} \int_{\mathbb{R}} f_n^{(q_j)}(x)\, F^{-1}\big[ \mathrm{Log}\,\varphi\, FK_{h_n} \big](x)\, dx - q_j, \qquad j \in \mathbb{Z} \setminus \{0\},$$
and
$$B_{t,n}^{(N)} := \frac{1}{\Delta} \int_{\mathbb{R}} f_{t,n}^{(N)}(x)\, F^{-1}\big[ \mathrm{Log}\,\varphi\, FK_{h_n} \big](x)\, dx - N(t), \qquad t \in \mathbb{R}.$$
Then all these individual quantities are of order $o(n^{-1/2})$, and so are $\sup_{j \in \mathbb{Z} \setminus \{0\}} |B_n^{(q_j)}|$ and $\sup_{t \in \mathbb{R}} |B_{t,n}^{(N)}|$. Furthermore, for any finite set $T \subset \mathbb{R}$, any $C_{(\lambda)}, C_{(\gamma)}, C_{(j)}, C_t \in \mathbb{R}$, $j \in \mathbb{Z} \setminus \{0\}$ and $t \in T$, and any $\widetilde H_n \to \infty$, define the linear combination
$$B_n := C_{(\lambda)} B_n^{(\lambda)} + C_{(\gamma)} B_n^{(\gamma)} + \sum_{j \in \mathbb{Z} \setminus \{0\}} C_{(j)} \Big( \big( B_n^{(q_j)} + q_j \big)\, \mathbf{1}_{|j| \leq \widetilde H_n/\varepsilon} - q_j \Big) + \sum_{t \in T} C_t\, B_{t,n}^{(N)}. \tag{3.4.33}$$

Then $B_n = o(n^{-1/2})$ if either


(a*) condition (a) above is satisfied and finitely many coefficients C(j) are not zero;

(b*) or condition (b) above is satisfied, $\sup_{j \in \mathbb{Z} \setminus \{0\}} |C_{(j)}| < \infty$ and $\widetilde H_n \sim n^{\vartheta_{\widetilde H}}$ for some $\vartheta_{\widetilde H} \geq 1/(2\beta)$.

Proof. In view of the arguments employed to arrive at the right hand side of (3.2.2), we have that for any bounded function $g = g_n$,
$$\frac{1}{\Delta} \int_{\mathbb{R}} g(x)\, F^{-1}\big[ \mathrm{Log}\,\varphi\, FK_h \big](x)\, dx = -\gamma \int_{\mathbb{R}} g(x)\, (K_{h_n})'(x)\, dx - \lambda \int_{\mathbb{R}} g(x)\, K_{h_n}(x)\, dx$$
$$\qquad + \sum_{j \in \mathbb{Z} \setminus \{0\}} q_j \int_{\mathbb{R}} g(x)\, K_{h_n}(x - \varepsilon j)\, dx + \int_{\mathbb{R}} g(x)\, \big( \nu_{ac} * K_{h_n} \big)(x)\, dx, \tag{3.4.34}$$
where we have swapped the infinite sum with the integral using that, for any $m \in \mathbb{N}$ and any $x \in \mathbb{R}$,
$$\sum_{\substack{|j| \leq m \\ j \neq 0}} \big| q_j\, g(x)\, K_{h_n}(x - \varepsilon j) \big| \leq \sup_{y \in \mathbb{R}} |g(y)| \sum_{j \in \mathbb{Z} \setminus \{0\}} q_j\, |K_{h_n}|(x - \varepsilon j),$$
and that the right hand side is integrable because $\|K_h(\cdot - y)\|_1 = \|K\|_1$ for all $y \in \mathbb{R}$ and $\sum_{j \in \mathbb{Z} \setminus \{0\}} q_j \leq \lambda < \infty$.

We start by computing $B_n^{(\gamma)}$. Taking $g = g_n = h_n f_n^{(\gamma)}$ in (3.4.34), using integration by parts in the first summand and, in view of $\nu_{ac}, K \in L^1(\mathbb{R})$, Fubini's theorem in the last one, we have that

$$c B_n^{(\gamma)} = \gamma \left( \int_{-1}^{1} K(x)\, dx - K(-1) + K(1) - c \right) - \lambda \int_{-1}^{1} x\, K(x)\, dx$$
$$\qquad + \sum_{j \in \mathbb{Z} \setminus \{0\}} q_j \int_{\mathbb{R}} \frac{h_n x + \varepsilon j}{h_n}\, \mathbf{1}_{(-h_n, h_n)}(h_n x + \varepsilon j)\, K(x)\, dx + \int_{\mathbb{R}} K(y) \int_{-h_n y - h_n}^{-h_n y + h_n} h_n^{-1} (x + h_n y)\, \nu_{ac}(x)\, dx\, dy.$$
By the symmetry and integrability of $K$, the decay of $|K|$ in (3.2.1), the definition of the constant $c$, and using that Assumption 1b is satisfied for some $\alpha > 0$, we conclude that
$$|B_n^{(\gamma)}| \lesssim h_n^{\eta} \sum_{j \in \mathbb{Z} \setminus \{0\}} q_j\, j^{-\eta} + h_n^{\alpha} = O(h_n^{\alpha}) = o(n^{-1/2})$$
for some $\eta > 2$, where the last equality is justified because $h_n \sim \exp(-n^{\vartheta_h})$ for some

$\vartheta_h > 0$.

To calculate the order of $B_n^{(\lambda)}$ we write $f_n^{(\lambda)} = 1 - \mathbf{1}_{[-H_n, H_n]^C} - \mathbf{1}_{(-\varepsilon_n, \varepsilon_n)}$ and analyse (3.4.34) when $g$ equals each of these quantities. When $g = 1$ we immediately see that it equals $0$, because $K$ is symmetric, $\int_{\mathbb{R}} K = 1$ and $\int_{\mathbb{R}} \nu = \lambda$. To analyse the second summand, let us first consider (3.4.34) for the more general function $g = g_n = \tilde g_n\, \mathbf{1}_{[-H_n, H_n]^C}$, where $\tilde g_n$ is a function satisfying $\sup_n \sup_{x \in \mathbb{R}} |\tilde g_n(x)| < \infty$. This generalisation is used later on. In this case the right hand side of (3.4.34) equals
$$-\gamma \int_{[-H_n, H_n]^C} \tilde g_n(x)\, (K_{h_n})'(x)\, dx - \lambda \int_{[-H_n, H_n]^C} \tilde g_n(x)\, K_{h_n}(x)\, dx + \int_{\mathbb{R}} K(y) \int_{[-H_n - h_n y,\, H_n - h_n y]^C} \tilde g_n(x + h_n y)\, \nu(dx)\, dy. \tag{3.4.35}$$

The value of the first integral depends on the function $\tilde g_n$. When $\tilde g_n = 1$ it equals $K_{h_n}(-H_n) - K_{h_n}(H_n)$, which is zero by the symmetry of $K$. By the decay of $|K|$ in (3.2.1), the absolute value of the second integral is bounded above by
$$\sup_{x \in \mathbb{R}} |\tilde g_n(x)| \int_{H_n/h_n}^{\infty} |K|(x)\, dx = O\big( (h_n/H_n)^{\eta-1} \big) = o(n^{-1/2})$$
for some $\eta > 2$, where the last equality is justified by the exponential decay of $h_n$ and

Hn → ∞. The absolute value of the last term in (3.4.35) can be bounded, up to constants independent of n, by

$$\sup_{x \in \mathbb{R}} |\tilde g_n(x)| \left( \int_0^{h_n^{-1}} |K|(x)\, dx \int_{H_n}^{\infty} \nu(dx) + \lambda \int_{h_n^{-1}}^{\infty} |K|(x)\, dx \right).$$
If condition (a) of the lemma is satisfied, $\int_{H_n}^{\infty} \nu(dx) \leq (\log H_n)^{-\beta} \int_{H_n}^{\infty} (\log x)^{\beta} \nu(dx) = o(n^{-1/2})$. If instead condition (b) holds, $\int_{H_n}^{\infty} \nu(dx) \leq H_n^{-\beta} \int_{H_n}^{\infty} x^{\beta} \nu(dx) = o(n^{-1/2})$. Therefore, and using the decay of $|K|$ and of $h_n$, the last display is $o(n^{-1/2})$. Consequently, we conclude that the second summand in the decomposition of $f_n^{(\lambda)}$ is $o(n^{-1/2})$. To analyse the third one we need to compute (3.4.34) when $g_n = f_n^{(q_0)}$. Hence, we first compute $B_n^{(q_j)}$ for any $j \in \mathbb{Z}$ and, only during this calculation and with some abuse of notation, we take $q_0 = -\lambda$ in its expression for notational purposes. In view of (3.4.34) we have that
$$B_n^{(q_j)} = \gamma\, h_n^{-1} \left( K\Big( \frac{\varepsilon j - \varepsilon_n}{h_n} \Big) - K\Big( \frac{\varepsilon j + \varepsilon_n}{h_n} \Big) \right) + q_j \int_{(-\varepsilon_n/h_n,\, \varepsilon_n/h_n)^C} K(x)\, dx$$
$$\qquad + \sum_{l \in \mathbb{Z} \setminus \{j\}} q_l \int_{(\varepsilon(j-l) - \varepsilon_n)/h_n}^{(\varepsilon(j-l) + \varepsilon_n)/h_n} K(x)\, dx + \int_{\mathbb{R}} K(y) \int_{\varepsilon j - h_n y - \varepsilon_n}^{\varepsilon j - h_n y + \varepsilon_n} \nu_{ac}(x)\, dx\, dy.$$

Then, by the decay of $|K|$, the fact that $\int_{\mathbb{R}} K = 1$, Assumption 1b and the integrability of $K$, the last display can be bounded, up to constants independent of $j$ and $n$, by
$$|B_n^{(q_j)}| \lesssim \mathbf{1}_{\mathbb{Z} \setminus \{0\}}(j)\, h_n^{\eta-1} + |q_j|\, (h_n/\varepsilon_n)^{\eta-1} + h_n^{\eta-1} \sum_{l \in \mathbb{Z} \setminus \{0\}} q_{j-l}\, |l|^{1-\eta} + \varepsilon_n^{\alpha}$$
$$\lesssim h_n^{\eta-1} + \lambda\, (h_n/\varepsilon_n)^{\eta-1} + \lambda\, h_n^{\eta-1} \sum_{l \in \mathbb{Z} \setminus \{0\}} |l|^{1-\eta} + \varepsilon_n^{\alpha} = o(n^{-1/2}) \tag{3.4.36}$$
for some $\eta > 2$ and $\alpha > 0$, where the last equality follows because $h_n \sim \exp(-n^{\vartheta_h})$, $\varepsilon_n \sim \exp(-n^{\vartheta_\varepsilon})$ and $\vartheta_\varepsilon < \vartheta_h$. This finishes the proof that $B_n^{(\lambda)} = o(n^{-1/2})$ and, in addition, it shows that $B_n^{(q_j)}$, $j \in \mathbb{Z} \setminus \{0\}$, and its supremum are of the same order.

To analyse $B_{t,n}^{(N)}$ we work directly with its supremum over $t \in \mathbb{R}$. We first remove the truncation in the tails of $f_{t,n}^{(N)}$, just as we did for $B_n^{(\lambda)}$. With this in mind, take $\tilde g_n = \tilde g_{t,n} = \tilde f_{t,n}^{(N)}$ in (3.4.35), where $\tilde f_{t,n}^{(N)}$ is defined in (3.4.30). Note that the upper bounds for the absolute values of the last two summands of (3.4.35) depend on $t$ only through $\sup_{x \in \mathbb{R}} |\tilde g_{t,n}(x)|$. Furthermore, $\sup_n \sup_{t \in \mathbb{R}} \sup_{x \in \mathbb{R}} |\tilde g_{t,n}(x)| < \infty$, so the suprema of these two terms are of order $o(n^{-1/2})$ by the same arguments used therein. Therefore, if we show that the supremum over $t \in \mathbb{R}$ of the absolute value of the first summand is $o(n^{-1/2})$, we only need to analyse the supremum of (3.4.34) for $g = g_{t,n} = \tilde f_{t,n}^{(N)}$. But the former follows easily by noting that the absolute value of the first summand in (3.4.35) when $\tilde g_{t,n} = \tilde f_{t,n}^{(N)}$ is bounded above by $\gamma h_n^{-1} |K|(H_n/h_n)$ for any $t \in \mathbb{R}$, and this quantity is of order $O\big( (h_n/H_n)^{\eta-1} H_n^{-1} \big) = o(n^{-1/2})$ for some $\eta > 2$, by the decay of $|K|$, $h_n$ and $H_n^{-1}$. To analyse (3.4.34) when $g_{t,n} = \tilde f_{t,n}^{(N)}$ we note that for $n$ large enough $\tilde f_{t,n}^{(N)} = \mathbf{1}_{(-\infty,\, \min\{u, -\varepsilon_n\}]} + \mathbf{1}_{[\min\{u, \varepsilon_n\},\, u]}$, where

$$u = u(t, n) := \begin{cases} t & \text{if } |t - J_j| > \varepsilon_n \text{ for all } j \in \mathbb{Z}, \\ J_j - \varepsilon_n & \text{if } J_j - \varepsilon_n \leq t < J_j \text{ for some } j \in \mathbb{Z}, \\ J_j + \varepsilon_n & \text{if } J_j \leq t \leq J_j + \varepsilon_n \text{ for some } j \in \mathbb{Z}. \end{cases} \tag{3.4.37}$$
Inputting this into (3.4.34) gives
$$-\gamma \Big( K_{h_n}\big( \min\{u, -\varepsilon_n\} \big) + K_{h_n}(u) - K_{h_n}\big( \min\{u, \varepsilon_n\} \big) \Big) - \lambda \int_{(-\varepsilon_n, \varepsilon_n)^C} \mathbf{1}_{(-\infty, u]}(x)\, K_{h_n}(x)\, dx$$
$$+ \int_{-\infty}^{u} \nu * K_{h_n}(x)\, dx - \sum_{j \in \mathbb{Z} \setminus \{0\}} q_j \int_{\min\{u, -\varepsilon_n\}}^{\min\{u, \varepsilon_n\}} K_{h_n}(x - \varepsilon j)\, dx + \int_{\mathbb{R}} K(y) \int_{\min\{u, -\varepsilon_n\} - h_n y}^{\min\{u, \varepsilon_n\} - h_n y} \nu_{ac}(x)\, dx\, dy.$$

We now bound the absolute value of each of these terms, except the first in the second line, by quantities independent of $t$ and of order $o(n^{-1/2})$. By the symmetry of $K$ and the decay of $|K|$ in (3.2.1), the absolute value of the first summand is bounded by $\gamma h_n^{-1} |K|(\varepsilon_n/h_n) = O(h_n^{\eta-1} \varepsilon_n^{-\eta})$ for some $\eta > 2$ and, because $h_n \sim \exp(-n^{\vartheta_h})$, $\varepsilon_n \sim \exp(-n^{\vartheta_\varepsilon})$ and $\vartheta_h > \vartheta_\varepsilon$, it is of the required order. These arguments and $\sup_n \sup_{t \in \mathbb{R}} \sup_{x \in \mathbb{R}} |\tilde f_{t,n}^{(N)}(x)| \leq 1$ justify that the absolute value of the second summand is bounded by $2\lambda \int_{\varepsilon_n/h_n}^{\infty} |K| = O\big( (h_n/\varepsilon_n)^{\eta-1} \big) = o(n^{-1/2})$. Furthermore, for any $t \in \mathbb{R}$ the absolute value of the second summand in the second line is bounded by
$$\sum_{j \in \mathbb{Z} \setminus \{0\}} q_j \int_{(-\varepsilon_n - \varepsilon j)/h_n}^{(\varepsilon_n - \varepsilon j)/h_n} |K|(x)\, dx \lesssim h_n^{\eta-1} \sum_{j \in \mathbb{Z} \setminus \{0\}} q_j\, |j|^{1-\eta} = o(n^{-1/2}).$$

Due to Assumption 1b and the integrability of $K$, for any $t \in \mathbb{R}$ the absolute value of the last summand is bounded by
$$\int_{\mathbb{R}} |K|(y) \int_{-\varepsilon_n - h_n y}^{\varepsilon_n - h_n y} \nu_{ac}(x)\, dx\, dy \lesssim \varepsilon_n^{\alpha} = o(n^{-1/2}),$$
where the last equality follows because $\alpha > 0$ and by the exponential decay of $\varepsilon_n$. We are therefore left with proving that the supremum over $t \in \mathbb{R}$ of the absolute value of
$$\int_{-\infty}^{u} \nu * K_{h_n}(x)\, dx - N(t) = \sum_{j \in \mathbb{Z} \setminus \{0\}} q_j \left( \int_{-\infty}^{u} K_{h_n}(x - \varepsilon j)\, dx - \mathbf{1}_{(-\infty, t]}(\varepsilon j) \right) + \int_{\mathbb{R}} K(y) \left( \int_{-\infty}^{u - h_n y} \nu_{ac}(x)\, dx - \int_{-\infty}^{t} \nu_{ac}(x)\, dx \right) dy \tag{3.4.38}$$
is of the desired order, where to arrive at the last summand we have used that $\int_{\mathbb{R}} K = 1$. Note that the quantity in brackets in the infinite sum is
$$\int_{-\infty}^{(u - \varepsilon j)/h_n} K(x)\, dx - \mathbf{1}_{(-\infty, t]}(\varepsilon j) = \begin{cases} -\int_{(u - \varepsilon j)/h_n}^{\infty} K & \text{if } t \geq \varepsilon j, \\[2pt] \int_{-\infty}^{(u - \varepsilon j)/h_n} K & \text{if } t < \varepsilon j. \end{cases}$$

For any $t \in \mathbb{R}$, and in view of (3.4.37), we then have that the absolute value of the first summand in (3.4.38) is bounded by
$$\sum_{j \in \mathbb{Z} \setminus \{0\}} q_j \left( \int_{-\infty}^{-\varepsilon_n/h_n} |K|(x)\, dx + \int_{\varepsilon_n/h_n}^{\infty} |K|(x)\, dx \right) = O\big( (h_n/\varepsilon_n)^{\eta-1} \big) = o(n^{-1/2}).$$
Using Assumption 1b, the integrability of $K$, the decay of $|K|$, expression (3.4.37) and the fact that $\int_{\mathbb{R}} \nu_{ac} \leq \lambda$, we conclude that the absolute value of the second summand in (3.4.38) is bounded, up to constants independent of $t$ and $n$, by
$$\int_{-\varepsilon_n/h_n}^{\varepsilon_n/h_n} |K|(y)\, |u - h_n y - t|^{\alpha}\, dy + \lambda \int_{(-\varepsilon_n/h_n,\, \varepsilon_n/h_n)^C} |K|(y)\, dy \lesssim \varepsilon_n^{\alpha} + (h_n/\varepsilon_n)^{\eta-1} = o(n^{-1/2}),$$
where the constant hidden in the notation $\lesssim$ is clearly independent of $t$ because $|u - t| \leq \varepsilon_n$ for all $t \in \mathbb{R}$. We have thus shown that $\sup_{t \in \mathbb{R}} |B_{t,n}^{(N)}| = o(n^{-1/2})$.


For the last statement of the lemma, regarding the linear combination $B_n$, note that
$$\sum_{j \in \mathbb{Z} \setminus \{0\}} C_{(j)} \Big( \big( B_n^{(q_j)} + q_j \big)\, \mathbf{1}_{|j| \leq \widetilde H_n/\varepsilon} - q_j \Big) = \sum_{\substack{|j| \leq \widetilde H_n/\varepsilon \\ j \neq 0}} C_{(j)} B_n^{(q_j)} - \sum_{|j| > \widetilde H_n/\varepsilon} C_{(j)} q_j.$$
Due to $\widetilde H_n \to \infty$, when finitely many coefficients $C_{(j)}$ are not zero, $B_n$ has the required order because the individual quantities featuring in its expression do. For the same reason, under the alternative assumptions we only have to analyse the last display to find the order of $B_n$. In view of the $j$-independent upper bound for $|B_n^{(q_j)}|$ in (3.4.36), the absolute value of the first sum is bounded, up to constants independent of $n$, by
$$\sup_{j \in \mathbb{Z} \setminus \{0\}} |C_{(j)}|\, \widetilde H_n \Big( h_n^{\eta-1} + (h_n/\varepsilon_n)^{\eta-1} + \varepsilon_n^{\alpha} \Big) = o(n^{-1/2}),$$
where the last equality is justified by the polynomial growth of $\widetilde H_n$ and the exponential decay of $h_n$ and $\varepsilon_n$. The absolute value of the second sum can be bounded, up to constants independent of $n$, by
$$\sup_{j \in \mathbb{Z} \setminus \{0\}} |C_{(j)}| \int_{[-\widetilde H_n/\varepsilon,\, \widetilde H_n/\varepsilon]^C} \nu(dx) \lesssim \widetilde H_n^{-\beta} \int_{[-\widetilde H_n/\varepsilon,\, \widetilde H_n/\varepsilon]^C} |x|^{\beta} \nu(dx),$$
and this has order $o(n^{-1/2})$ because $\widetilde H_n \sim n^{\vartheta_{\widetilde H}}$ for some $\vartheta_{\widetilde H} \geq 1/(2\beta)$ by assumption. Therefore the infinite sum in $B_n$ is of order $o(n^{-1/2})$, and so is $B_n$ under the second set of assumptions, because we analysed (3.4.35) when these hold.

Finally, the following result guarantees joint convergence of finitely and infinitely many one-dimensional parameters, and it provides an explicit representation of the covariance of the asymptotic distributions. It follows immediately from the preceding results, and Propositions 3.3.1 and 3.3.3 are corollaries of it.

3.4.9 Theorem. Suppose $K$ satisfies (3.2.1) and take $h_n \sim \exp(-n^{\vartheta_h})$, $\varepsilon_n \sim \exp(-n^{\vartheta_\varepsilon})$ and $H_n \sim \exp(n^{\vartheta_H})$ such that $0 < \vartheta_\varepsilon < \vartheta_h < 1/4$ and $0 < \vartheta_H < \vartheta_h$. Adopt the setting and notation of Sections 3.1 and 3.2, and assume the finite measure $\nu$ satisfies Assumption 1. For any finite set $T \subset \mathbb{R}$, any $C_{(\lambda)}, C_{(\gamma)}, C_{(j)}, C_t \in \mathbb{R}$, $j \in \mathbb{Z} \setminus \{0\}$ and $t \in T$, and any $\widetilde H_n \to \infty$, we define the linear combination
$$\widehat\Upsilon_n := \sqrt{n} \left( C_{(\lambda)} \big( \hat\lambda_n - \lambda \big) + C_{(\gamma)}\, h_n^{-1} \big( \hat\gamma_n - \gamma \big) + \sum_{j \in \mathbb{Z} \setminus \{0\}} C_{(j)} \big( \hat q_{j,n}\, \mathbf{1}_{|j| \leq \widetilde H_n/\varepsilon} - q_j \big) + \sum_{t \in T} C_t \big( \widehat N_n(t) - N(t) \big) \right).$$

Assume that either

(a) finitely many coefficients C(j) are not zero, Assumption 2 holds for some β > 2 and ϑH ≥ 1/(2β);


(b) or $\sup_{j \in \mathbb{Z} \setminus \{0\}} |C_{(j)}| < \infty$, $\int_{\mathbb{R}} |x|^{\beta} \nu(dx) < \infty$ for some $\beta > 1$, and $\widetilde H_n \sim n^{\vartheta_{\widetilde H}}$ for some $\vartheta_{\widetilde H} \in [1/(2\beta), 1/2)$ with $\vartheta_h < (1 - 2\vartheta_{\widetilde H})/4$.

Then we have that
$$\widehat\Upsilon_n \overset{d}{\to} N(0, \sigma^2) \qquad \text{as } n \to \infty, \tag{3.4.39}$$
where $\sigma^2$ is finite and satisfies
$$\Delta^2 \sigma^2 = \int_{\mathbb{R}} \big( l * F^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)^2\, P(dx), \qquad \text{where} \qquad l := -C_{(\lambda)} f^{(\lambda)} + \sum_{j \in \mathbb{Z} \setminus \{0\}} C_{(j)} f^{(q_j)} + \sum_{t \in T} C_t\, f_t^{(N)}.$$

Proof. Notice that the sum of distinguished logarithms equals the distinguished logarithm ˆ of the product. Then, in view of the expressions of the estimators λn,γ ˆn,q ˆj,n and Nbn(t) (λ) (γ) (qj ) (N) in Section 3.2 and the expressions for Bn ,Bn ,Bn and Bt,n in Lemma (3.4.8) we can write Z   √ 1 −1 ϕn √ Υbn = n ft,n(x) F Log FKh (x) dx + n Bn, (3.4.40) ∆ R ϕ where ft,n and Bn are defined in (3.4.27) and (3.4.33), respectively. Note that all the assumptions of Lemma 3.4.8 are trivially satisfied by those assumed here as the latter include the former. Therefore the second summand in the last display is negligible as n → ∞. To find the asymptotic distribution of the first summand we use Theorem 3.4.5 applied to ft,n as defined in (3.4.27). We can do this in view of Lemma 3.4.7 because its assumptions are trivially satisfied by those assumed here. Then, using the last conclusion of the lemma, it follows that d 2 Υbn → N(0, σ˜ ), whereσ ˜2 satisfies the same expression as σ2 after (3.4.39) for

$$\tilde{l} := C_{(\lambda)} f^{(\lambda)} + \sum_{j\in\mathbb{Z}\setminus\{0\}} C_{(j)} f^{(q_j)} + \sum_{t\in T} C_t \Big( f_t^{(N)} - \tfrac{1}{2}\, \mathbf{1}_{\{t\}}\, \mathbf{1}_{\mathbb{R}\setminus\varepsilon\times\mathbb{Z}}(t) \Big)$$
in place of $l$. Since $\tilde{l}(0) = 0$ the variance $\tilde{\sigma}^2$ has the same expression as (3.4.13) when $\tilde{g}_t^{1)} = \tilde{g}_t^{2)} = \Delta^{-1}\tilde{l}$. Furthermore, the third summand in $\tilde{l}$ agrees with the third summand in $l$ up to a zero Lebesgue-measure set disjoint from $\varepsilon\times\mathbb{Z}$. Hence, the last claim in Lemma 3.4.4 guarantees $\tilde{\sigma}^2 = \sigma^2$. The finiteness of this quantity follows by the conclusions of

Theorem 3.4.5 or simply by the boundedness of $g_t$.

91 Theory

3.4.2 Proof of Propositions 3.3.1 and 3.3.3

Proposition 3.3.1 follows immediately from Theorem 3.4.9 by taking $T = \emptyset$, $C_{(j)} = 0$ for all $j \in \mathbb{Z}\setminus\{0\}$, and $C_{(\lambda)} = 1, C_{(\gamma)} = 0$ or $C_{(\lambda)} = 0, C_{(\gamma)} = 1$ when estimating $\lambda$ or $\gamma$, respectively. The conclusion of Proposition 3.3.3 on estimating $q_j$, $j \in \mathbb{Z}\setminus\{0\}$, follows analogously. To show the conclusion for $p_j$ we write
$$\sqrt{n}\,\big(\hat{p}_{j,n} - p_j\big) = \hat{\lambda}_n^{-1}\, \sqrt{n}\, \Big( \big(\hat{q}_{j,n} - q_j\big) + p_j\big(\lambda - \hat{\lambda}_n\big) \Big).$$

In view of Theorem 3.4.9 with $C_{(\lambda)} = -p_j$, $C_{(\gamma)} = C_{(l)} = 0$ for all $l \in \mathbb{Z}\setminus\{0, j\}$, $T = \emptyset$ and $C_{(j)} = 1$, the quantity by which $\hat{\lambda}_n^{-1}$ is multiplied on the right hand side converges to $N\big(0, \lambda^2 \sigma_{p_j}^2\big)$. Since $\hat{\lambda}_n$ converges to $\lambda$ (constant) by Theorem 3.4.9, the conclusion follows by Slutsky's lemma.

3.4.3 Proof of Theorem 3.3.4

3.4.3.1 Estimation of N

Note that the assumptions in the first part of Lemma 3.4.7 are included in those assumed here. Therefore the assumptions of Theorem 3.4.5 are satisfied and, in view of the expression of the estimator $\widehat{N}_n(t)$, $t \in \mathbb{R}$, in Section 3.2, the expression for $B_{t,n}^{(N)}$ in Lemma 3.4.8 and the properties of the distinguished logarithm, we can write
$$\begin{aligned} \sqrt{n}\,\big(\widehat{N}_n(t) - N(t)\big) &= \sqrt{n}\, \frac{1}{\Delta} \int_{\mathbb{R}} f_{t,n}^{(N)}(x)\, F^{-1}\Big[\mathrm{Log}\frac{\varphi_n}{\varphi}\, FK_{h_n}\Big](x)\, dx + \sqrt{n}\, B_{t,n}^{(N)} \\ &\stackrel{Pr}{=} \sqrt{n}\, \frac{1}{\Delta} \int_{\mathbb{R}} \Big( f_{t,n}^{(N)} \ast K_{h_n} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big)(x)\, \big(P_n - P\big)(dx) \\ &\quad + \sqrt{n}\, \frac{1}{\Delta} \int_{\mathbb{R}} F f_{t,n}^{(N)}(-u)\, R_n(u)\, F K_{h_n}(u)\, du + \sqrt{n}\, B_{t,n}^{(N)}, \end{aligned}$$

where $f_{t,n}^{(N)}$ is defined in (3.2.6) and $R_n$ is as in Theorem 3.4.2. We are showing a central limit theorem under the uniform norm, so we now argue that the supremum over $t \in \mathbb{R}$ of the last line vanishes as $n \to \infty$ on the same sets of $Pr$-probability approaching 1 on which the last equality holds. By Lemma 3.4.8 we have $\sup_{t\in\mathbb{R}} |B_{t,n}^{(N)}| = o\big(n^{-1/2}\big)$, so the last summand vanishes as $n \to \infty$. Due to $\mathrm{supp}(FK_h) \subseteq [-h^{-1}, h^{-1}]$, the supremum of the first summand in the last line is bounded by

$$\|K\|_1\, \Delta^{-1} \sqrt{n}\, \sup_{|u|\le h_n^{-1}} |R_n(u)|\, \sup_{t\in\mathbb{R}} \int_{-h_n^{-1}}^{h_n^{-1}} \big|F f_{t,n}^{(N)}(u)\big|\, du = o_{Pr}(1),$$

where the equality follows from Theorem 3.4.2 and Lemma 3.4.7. Therefore we have to show the corresponding functional central limit theorem for the linear term

$$\sqrt{n}\,(P_n - P)\psi_{t,n} := \sqrt{n} \int_{\mathbb{R}} \psi_{t,n}(x)\, (P_n - P)(dx), \qquad \text{where} \qquad \psi_{t,n} := \Delta^{-1}\, f_{t,n}^{(N)} \ast K_{h_n} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big].$$

This type of result follows by showing a central limit theorem (for triangular arrays in this case) for the finite-dimensional distributions, together with tightness of the limiting process. Conveniently, Theorem 2.11.23 in van der Vaart and Wellner [1996] explicitly gives assumptions under which these follow, and using it adds clarity to the proofs. Thus, we recall it adapted to our needs and refer the reader to van der Vaart and Wellner [1996] for the concepts appearing in it, such as envelope functions, outer measures, bracketing numbers and entropies.

3.4.10 Theorem. For each n, let Ψn := {ψt,n : t ∈ R} be a class of measurable functions indexed by a totally bounded semimetric space (R, ρ). Given envelope functions Ψn assume that

$$P^* \Psi_n^2 = O(1) \qquad \text{and} \qquad P^* \Psi_n^2\, \mathbf{1}_{\{\Psi_n > \kappa\sqrt{n}\}} \to 0 \quad \text{for every } \kappa > 0,$$
and, for every $\delta_n \downarrow 0$,

$$\sup_{\rho(s,t)<\delta_n} P\big(\psi_{s,n} - \psi_{t,n}\big)^2 \to 0 \qquad \text{and} \qquad \int_0^{\delta_n} \sqrt{\log N_{[\,]}\big(\epsilon\|\Psi_n\|_{P,2}, \Psi_n, L_2(P)\big)}\, d\epsilon \to 0,$$
where $\|\psi\|_{P,2} = \|\psi\|_{L_2(P)} = \big(\int_{\mathbb{R}} |\psi|^2\, dP\big)^{1/2}$. Then the sequence of stochastic processes
$$\big\{ \sqrt{n}\,(P_n - P)\,\psi_{t,n} : t \in \mathbb{R} \big\}$$

is asymptotically tight in $\ell^\infty(\mathbb{R})$ and converges in distribution to a tight Gaussian process provided the sequence of covariance functions $P\psi_{s,n}\psi_{t,n} - P\psi_{s,n}\, P\psi_{t,n}$ converges pointwise on $\mathbb{R} \times \mathbb{R}$.

We first compute the pointwise limit of the covariance functions. Since the limiting distribution is a tight (centred) Gaussian process, this limit uniquely identifies the process. It also allows us to identify the so-called intrinsic covariance semimetric of the Gaussian process and, as is customary, we take $\rho$ equal to this semimetric (see the second half of Chapter 1.5 and Chapter 2.1.2 in van der Vaart and Wellner [1996] for an illustrative discussion of this and other choices). We then check the remaining assumptions of the theorem in order of appearance.


Convergence of the covariance functions. Note that the assumptions here include those of Lemma 3.4.4 and of the first part of Lemma 3.4.7. For any $s, t \in \mathbb{R}$ fixed, $P\psi_{s,n}\psi_{t,n}$ has the form of (3.4.13) when $g^{1)} = \Delta^{-1} f_{s,n}^{(N)}$ and $g^{2)} = \Delta^{-1} f_{t,n}^{(N)}$, and therefore we can use the lemmata to conclude that
$$\lim_{n\to\infty} P\psi_{s,n}\psi_{t,n} = \frac{1}{\Delta^2} \int_{\mathbb{R}} \Big( l_s \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big]\Big)(x)\, \Big( l_t \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big]\Big)(x)\, P(dx)$$
and that this is finite, where $l_t$ is defined in (3.4.26). Furthermore, this function agrees with $f_t^{(N)}$ up to a zero Lebesgue-measure set disjoint from $\varepsilon\times\mathbb{Z}$. Hence, the last claim in Lemma 3.4.4 guarantees the last display equals $\Sigma_{s,t}^N$. Conclusion (3.4.14) and the fact that $l_t(0) = 0$ for any $t \in \mathbb{R}$ justify that $P\psi_{t,n} = 0$ for all $t \in \mathbb{R}$, and therefore the sequence of covariance functions $P\psi_{s,n}\psi_{t,n} - P\psi_{s,n}P\psi_{t,n}$ converges pointwise to $\Sigma_{s,t}^N$.

Total boundedness of $\mathbb{R}$ under the intrinsic covariance semimetric $\rho$. In view of the limiting covariance we just computed we take

$$\rho(s,t) = \Delta^{-1}\Big( P\Big( f_s^{(N)} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] - f_t^{(N)} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big)^2 \Big)^{1/2}, \qquad s, t \in \mathbb{R}.$$

To show that $\mathbb{R}$ is totally bounded under this semimetric we bound this expression by another semimetric under which $\mathbb{R}$ is totally bounded. Due to $F^{-1}\big[\varphi^{-1}(-\cdot)\big]$ being a finite measure and $\sup_{t\in\mathbb{R}}\sup_{x\in\mathbb{R}} |f_t^{(N)}(x)| \le 1$, Minkowski's inequality for integrals guarantees that

$$\rho(s,t)^2 \lesssim \Delta^{-2}\, P\Big( \big(f_s^{(N)} - f_t^{(N)}\big) \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big)^2 \lesssim \Delta^{-2}\, P\Big( \big|f_s^{(N)} - f_t^{(N)}\big| \ast \big|F^{-1}\big[\varphi^{-1}(-\cdot)\big]\big| \Big),$$
where the last inequality follows from the Jordan decomposition of finite signed measures, and $\big|F^{-1}[\varphi^{-1}(-\cdot)]\big|$ is the positive measure given by the sum of the positive and negative parts of $F^{-1}[\varphi^{-1}(-\cdot)]$ in this decomposition. Finally note that

$$P\Big( \big|f_s^{(N)} - f_t^{(N)}\big| \ast \big|F^{-1}\big[\varphi^{-1}(-\cdot)\big]\big| \Big) = \Big( \big|f_s^{(N)} - f_t^{(N)}\big| \ast \big|F^{-1}\big[\varphi^{-1}(-\cdot)\big]\big| \ast \bar{P} \Big)(0) = \bar{\mu}_0\big( (\min\{s,t\}, \max\{s,t\}] \big),$$

where $\bar{P}(A) = P(-A)$ and $\bar{\mu}(A) = \mu(-A)$ for any Borel $A \subseteq \mathbb{R}$, $\mu := \big|F^{-1}[\varphi^{-1}(-\cdot)]\big| \ast \bar{P}$ and $\bar{\mu}_0 := \bar{\mu} - \bar{\mu}(\{0\})\,\delta_0$. The conclusion then follows because $\bar{\mu}_0$ is a finite measure on $\mathbb{R}$.

Conditions on the envelope functions of $\Psi_n := \{\psi_{t,n} : t \in \mathbb{R}\}$. Note that $\sup_n \sup_{t\in\mathbb{R}} \sup_{x\in\mathbb{R}} |f_{t,n}^{(N)}(x)| \le 1$, $\|K_h\|_1 = \|K\|_1 < \infty$ and $F^{-1}[\varphi^{-1}(-\cdot)]$ is a finite measure. Using Minkowski's inequality for integrals we have that $\sup_n \sup_{t\in\mathbb{R}} \sup_{x\in\mathbb{R}} |\psi_{t,n}(x)| \le \Psi$ for some $\Psi \in (0, \infty)$, and we can take $\Psi_n = \Psi$

for all $n$. The two conditions on the envelope functions then follow immediately.

Control of $P(\psi_{s,n} - \psi_{t,n})^2$. In the following we repeatedly use the fact that if $f, g$ are bounded functions and $\mu$ is a finite positive measure then
$$\mu(f+g)^2 := \int_{\mathbb{R}} \big(f(x)+g(x)\big)^2\, \mu(dx) \le 2\big(\mu f^2 + \mu g^2\big), \qquad (3.4.41)$$
and hence to control the left hand side we can control $\mu f^2$ and $\mu g^2$ separately. Therefore, writing

$$\begin{aligned} \psi_{s,n} - \psi_{t,n} ={}& \Big( \psi_{s,n} - \Delta^{-1} f_s^{(N)} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big) - \Big( \psi_{t,n} - \Delta^{-1} f_t^{(N)} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big) \\ &+ \Delta^{-1} \big(f_s^{(N)} - f_t^{(N)}\big) \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big], \end{aligned}$$

to control $P(\psi_{s,n} - \psi_{t,n})^2$ we only need to consider the quantities
$$P\Big( \psi_{t,n} - \Delta^{-1} f_t^{(N)} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big)^2 \qquad \text{and} \qquad \rho(s,t).$$

The behaviour of the second when $\rho(s,t) < \delta_n \downarrow 0$ is trivial so, if we show that the supremum over $t \in \mathbb{R}$ of the first vanishes as $n \to \infty$, then the first quantity in the second display of Theorem 3.4.10 also vanishes in the limit. Note that, in view of the expressions for $f_{t,n}^{(N)}$ and $\tilde{f}_{t,n}^{(N)}$ in (3.2.6) and (3.4.30), respectively, we can write

$$\psi_{t,n} - \Delta^{-1} f_t^{(N)} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] = \Delta^{-1} \big( \tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)} \big) \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] - \Delta^{-1} \Big( \mathbf{1}_{[-H_n,H_n]^C}\, \tilde{f}_{t,n}^{(N)} \Big) \ast K_{h_n} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big].$$

Then, using (3.4.41) and (3.4.28), together with the finiteness of $F^{-1}\big[\varphi^{-1}(-\cdot)\big]$ and $P$, we only need to analyse the supremum over $t \in \mathbb{R}$ of

$$\begin{aligned} P\Big( \big(\tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)}\big) \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big)^2 ={}& P_d\Big( \big(\tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)}\big) \ast \Phi_d + \big(\tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)}\big) \ast \Phi_{ac} \Big)^2 \\ &+ P_{ac}\Big( \big(\tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)}\big) \ast \Phi_d + \big(\tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)}\big) \ast \Phi_{ac} \Big)^2, \end{aligned} \qquad (3.4.42)$$
where $P_d$ and $\Phi_d$ are purely discrete finite measures, $P_{ac}$ and $\Phi_{ac}$ are absolutely continuous with respect to Lebesgue's measure and, by the decomposition of $\nu$ in (3.1.2) and Lemma 27.1 in Sato [1999],

$$P_d = \sum_{k=0}^{\infty} \frac{\Delta^k}{k!}\, \nu_d^{\ast k}, \qquad P_{ac} = P - P_d, \qquad \Phi_d = \sum_{k=0}^{\infty} \frac{(-\Delta)^k}{k!}\, \bar{\nu}_d^{\ast k} \qquad \text{and} \qquad \Phi_{ac} = F^{-1}\big[\varphi^{-1}(-\cdot)\big] - \Phi_d.$$


By (3.4.41) we then have to control the four individual terms arising from (3.4.42). In view of Assumption 1a we notice that $P_d$ may have atoms only at $\varepsilon\times\mathbb{Z}$ and hence for the first term we need to analyse $\big(\tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)}\big) \ast \Phi_d\,(\varepsilon j)$ for any $j \in \mathbb{Z}$. For the second and last quantities we analyse $\big(\tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)}\big) \ast \Phi_{ac}\,(x)$ for any $x \in \mathbb{R}$. For the third quantity we have that, by Fubini's theorem and Jensen's inequality,

$$\begin{aligned} P_{ac}\Big( \big(\tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)}\big) \ast \Phi_d \Big)^2 &= \int_{\mathbb{R}} \int_{\mathbb{R}} \int_{\mathbb{R}} \big(\tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)}\big)(x - y_1)\, \big(\tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)}\big)(x - y_2)\, P_{ac}(dx)\, \Phi_d(dy_1)\, \Phi_d(dy_2) \\ &\le \Big( \bar{\Phi}_d \Big( \big( \big(\tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)}\big)^2 \ast \bar{P}_{ac} \big)^{1/2} \Big) \Big)^2, \end{aligned}$$

where as usual $\bar{\Phi}_d(A) = \Phi_d(-A)$ and $\bar{P}_{ac}(A) = P_{ac}(-A)$ for any Borel set $A \subseteq \mathbb{R}$. Since $\Phi_d$ may have atoms only at $\varepsilon\times\mathbb{Z}$ by Assumption 1a, to control this term we therefore require to analyse $\big(\tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)}\big)^2 \ast \bar{P}_{ac}\,(\varepsilon j)$ for all $j \in \mathbb{Z}$. However, noting that $\bar{P}_{ac} = \bar{\nu}_{ac} \ast \mu_1$ and $\Phi_{ac} = \bar{\nu}_{ac} \ast \mu_2$ for some finite measures $\mu_1$ and $\mu_2$, to control all the terms in (3.4.42) but the first we only have to analyse $\big(\tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)}\big)^k \ast \bar{\nu}_{ac}\,(x)$ for all $x \in \mathbb{R}$ and $k = 1, 2$. The rest of the section is then devoted to showing that for some $\eta > 2$ and $\alpha \in (0,1]$

$$\sup_{t\in\mathbb{R}} \sup_{j\in\mathbb{Z}} \Big| \big(\tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)}\big) \ast \Phi_d\,(\varepsilon j) \Big| \lesssim \Big(\frac{h_n}{\varepsilon_n}\Big)^{\eta-1}$$
and

$$\sup_{t\in\mathbb{R}} \sup_{x\in\mathbb{R}} \Big| \big(\tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)}\big)^k \ast \bar{\nu}_{ac}\,(x) \Big| \lesssim \varepsilon_n^\alpha + h_n^\alpha,$$
from which it easily follows that the supremum over $t \in \mathbb{R}$ of (3.4.42) vanishes as $n \to \infty$ because $h_n, \varepsilon_n \to 0$ with $h_n = o(\varepsilon_n)$. To bound the first quantity note that, using the symmetry of $K$, we have that for any $j \in \mathbb{Z}$ and $t \in \mathbb{R}$
$$\big(\tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)}\big) \ast \Phi_d\,(\varepsilon j) = \sum_{l\in\mathbb{Z}} \bar{\Phi}_d\big(\{\varepsilon(l-j)\}\big) \bigg( \int_{\mathbb{R}} \tilde{f}_{t,n}^{(N)}(\varepsilon l + x)\, K_{h_n}(x)\, dx - f_t^{(N)}(\varepsilon l) \bigg).$$

Without loss of generality assume εn < ε/2 and recall the definition of u(t, n) in (3.4.37)

$$u = u(t,n) := \begin{cases} t & \text{if } |t - \varepsilon j| > \varepsilon_n \text{ for all } j \in \mathbb{Z}, \\ \varepsilon j - \varepsilon_n & \text{if } \varepsilon j - \varepsilon_n \le t < \varepsilon j \text{ for some } j \in \mathbb{Z}, \\ \varepsilon j + \varepsilon_n & \text{if } \varepsilon j \le t \le \varepsilon j + \varepsilon_n \text{ for some } j \in \mathbb{Z}. \end{cases}$$
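The piecewise definition of $u(t,n)$ is easy to transliterate. A minimal sketch (the helper name and arguments are illustrative, not from the thesis), assuming $\varepsilon_n < \varepsilon/2$ so that the nearest lattice point is unique:

```python
def u(t, eps, eps_n):
    """Transliteration of u(t, n): if t lies within eps_n of a lattice point
    eps*j, push it just outside that neighbourhood (below the point if t is
    below it, above otherwise); else keep t. Assumes eps_n < eps/2."""
    j = round(t / eps)  # index of the nearest lattice point eps*j
    if abs(t - eps * j) > eps_n:
        return t
    return eps * j - eps_n if t < eps * j else eps * j + eps_n
```

By construction $|u(t,n) - t| \le \varepsilon_n$, which is exactly the property exploited in the bounds below.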

Using that $\int_{\mathbb{R}} K = 1$, the quantity in brackets on the right hand side of the second to last display can then be written as
$$\begin{aligned} &\int_{\mathbb{R}} \Big( \mathbf{1}_{(-\infty,u]} - \mathbf{1}_{(-\varepsilon_n,\varepsilon_n)}\, \mathbf{1}_{[0,\infty)}(t) \Big)(\varepsilon l + x)\, K_{h_n}(x)\, dx - \mathbf{1}_{(-\infty,t]}\, \mathbf{1}_{\mathbb{R}\setminus\{0\}}(\varepsilon l) \\ &\qquad = -\mathbf{1}_{(-\infty,t]}\, \mathbf{1}_{\mathbb{R}\setminus\{0\}}(\varepsilon l) \int_{(u-\varepsilon l)/h_n}^{\infty} K(x)\, dx + \Big( 1 - \mathbf{1}_{(-\infty,t]}\, \mathbf{1}_{\mathbb{R}\setminus\{0\}}(\varepsilon l) \Big) \int_{-\infty}^{(u-\varepsilon l)/h_n} K(x)\, dx \\ &\qquad\quad - \mathbf{1}_{[0,\infty)}(t) \int_{(-\varepsilon_n-\varepsilon l)/h_n}^{(\varepsilon_n-\varepsilon l)/h_n} K(x)\, dx. \end{aligned} \qquad (3.4.43)$$

If $t < 0$ or $t \ge 0$ we have that $u(t,n) \le -\varepsilon_n$ or $u(t,n) \ge \varepsilon_n$, respectively, so when $l = 0$ the absolute value of this display is bounded above by
$$\int_{-\infty}^{-\varepsilon_n/h_n} |K|(x)\, dx + \int_{\varepsilon_n/h_n}^{\infty} |K|(x)\, dx \lesssim \Big(\frac{h_n}{\varepsilon_n}\Big)^{\eta-1}$$
for some $\eta > 2$, using the decay of $|K|$ in (3.2.1). When $l \ne 0$ the absolute value of the third summand in (3.4.43) is also bounded by the last display. The other two summands are also bounded by the last display when $l \ne 0$ because if $\varepsilon l \le t$ then $u - \varepsilon l \ge \varepsilon_n$ and if $\varepsilon l > t$ then $u - \varepsilon l \le -\varepsilon_n$. Due to the last display not depending on $t$, $j$ or $l$, and the fact that $\bar{\Phi}_d$ is a finite measure, we therefore conclude that

$$\sup_{t\in\mathbb{R}} \sup_{j\in\mathbb{Z}} \Big| \big(\tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)}\big) \ast \Phi_d\,(\varepsilon j) \Big| \lesssim \Big(\frac{h_n}{\varepsilon_n}\Big)^{\eta-1}.$$

To bound the other quantity we note that, using $\int_{\mathbb{R}} K = 1$, the symmetry of $K$ and the positivity of $\nu_{ac}$, we have that for $k = 1, 2$ and any $t, x \in \mathbb{R}$

$$\begin{aligned} \Big| \big(\tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)}\big)^k \ast \bar{\nu}_{ac}\,(x) \Big| &\le \int_{\mathbb{R}} \Big| \int_{\mathbb{R}} \big( \tilde{f}_{t,n}^{(N)}(x+y+h_n z) - f_t^{(N)}(x+y) \big) K(z)\, dz \Big|^k\, \nu_{ac}(y)\, dy \\ &\le \int_{\mathbb{R}} \int_{\mathbb{R}} \big| \tilde{f}_{t,n}^{(N)}(x+y+h_n z) - f_t^{(N)}(x+y) \big|\, \nu_{ac}(y)\, dy\, |K|(z)\, dz, \end{aligned}$$
where in the last inequality we have used Jensen's inequality when $k = 2$, Fubini's theorem and the fact that $\tilde{f}_{t,n}^{(N)}$ and $f_t^{(N)}$ only take values $0, \pm 1$. Note that, because $\nu_{ac}$ is absolutely continuous with respect to Lebesgue's measure, the truncation of $f_t^{(N)}$ at the origin can be ignored and, similarly to above,

$$\tilde{f}_{t,n}^{(N)}(x+y+h_n z) - \mathbf{1}_{(-\infty,t]}(x+y) = \mathbf{1}_{(-\infty,\,u-x-h_n z]}(y) - \mathbf{1}_{(-\infty,\,t-x]}(y) - \mathbf{1}_{(-\varepsilon_n-x-h_n z,\; \varepsilon_n-x-h_n z)}(y)\, \mathbf{1}_{[0,\infty)}(t).$$

Therefore, if Assumption 1b is satisfied for some α ∈ (0, 1], we have that for k = 1, 2 and


any $t, x \in \mathbb{R}$
$$\Big| \big(\tilde{f}_{t,n}^{(N)} \ast K_{h_n} - f_t^{(N)}\big)^k \ast \bar{\nu}_{ac}\,(x) \Big| \lesssim \int_{\mathbb{R}} |u - t - h_n z|^{\alpha}\, |K|(z)\, dz + \varepsilon_n^{\alpha} \lesssim \varepsilon_n^{\alpha} + h_n^{\alpha},$$

where in the last inequality we have used that $|u - t| \le \varepsilon_n$ and that $|\cdot|^{\alpha}\, |K| \in L^1(\mathbb{R})$ for any $\alpha \in (0,1]$ in view of the decay of $|K|$ assumed in (3.2.1). We conclude that the supremum over $x \in \mathbb{R}$ and $t \in \mathbb{R}$ of the left hand side is also bounded by the right hand side by noting that the constants hidden in the notation $\lesssim$ are independent of them, in view of the corresponding independence in Assumption 1b.

Checking the bracketing entropy condition. To check the remaining condition of Theorem 3.4.10 we first recall that $\Psi_n = \Psi$ is independent of $n$. Second, we claim that the classes $\Psi_n$ are all contained in a single ball in the space of bounded variation functions. Assuming this, the bracketing entropy in the theorem is bounded above by the bracketing entropy of this ball and, by Corollary 3.7.51 in Giné and Nickl [2016], the latter is bounded above by a constant multiple of $(\epsilon\,\Psi)^{-1}$. Therefore the bracketing entropy condition follows if we prove the claim. In view of (3.2.6), the definition of $u(t,n)$ in (3.4.37) recalled above, and by properties of the convolution, the weak derivative of $\psi_{t,n}$ is given by

$$\Delta^{-1} \Big( \big( \delta_{-H_n} - \delta_{-\varepsilon_n} + \delta_{\varepsilon_n} - \delta_{H_n} \big)\, \mathbf{1}_{(-\infty,u]} - \mathbf{1}_{[-H_n,H_n]\setminus(-\varepsilon_n,\varepsilon_n)}\, \delta_u \Big) \ast K_{h_n} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big].$$

Thus, using Minkowski's inequality for integrals and that $\|K_h\|_1 = \|K\|_1$, we have that
$$\|\psi_{t,n}'\|_{TV} \le 5\, \Delta^{-1}\, \|K\|_1\, \big\|F^{-1}\big[\varphi^{-1}(-\cdot)\big]\big\|_{TV} < \infty,$$
and the claim follows by noting that the upper bound does not depend on $t$ or $n$.

3.4.3.2 Estimation of F

In view of the expressions for $\widehat{F}_n(t)$ and $F(t)$ in Section 3.2, for any $t \in \mathbb{R}$ we can write $\sqrt{n}\big(\widehat{F}_n(t) - F(t)\big) = \hat{\lambda}_n^{-1} \sqrt{n}\, \widehat{G}_n(t)$, where
$$\widehat{G}_n(t) := \widehat{N}_n(t) - F(t)\hat{\lambda}_n = \big(\widehat{N}_n(t) - N(t)\big) + F(t)\big(\lambda - \hat{\lambda}_n\big).$$

Therefore, if we show that $\sqrt{n}\,\widehat{G}_n$ converges in distribution in $\ell^\infty(\mathbb{R})$ to $\lambda \mathbb{G}^F$ and that this limit is tight, the result for $F$ follows by Slutsky's lemma (cf. Example 1.4.7 in van der Vaart and Wellner [1996]) because $\hat{\lambda}_n \to \lambda$ (constant) in distribution in $\mathbb{R}$ in view of Proposition 3.3.1. As we pointed out in the estimation of $N$, the assumptions in the first

part of Lemma 3.4.7 are included in those assumed here. Therefore the assumptions of Theorem 3.4.5 are satisfied and, in view of the expressions for $B_{t,n}^{(N)}$ and $B_n^{(\lambda)}$ in Lemma 3.4.8 and the properties of the distinguished logarithm, we can write
$$\begin{aligned} \sqrt{n}\,\widehat{G}_n(t) &= \sqrt{n}\, \frac{1}{\Delta} \int_{\mathbb{R}} f_{t,n}^{(F)}(x)\, F^{-1}\Big[\mathrm{Log}\frac{\varphi_n}{\varphi}\, FK_{h_n}\Big](x)\, dx + \sqrt{n}\, \big( B_{t,n}^{(N)} - F(t) B_n^{(\lambda)} \big) \\ &\stackrel{Pr}{=} \sqrt{n}\, \frac{1}{\Delta} \int_{\mathbb{R}} \Big( f_{t,n}^{(F)} \ast K_{h_n} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big)(x)\, \big(P_n - P\big)(dx) \\ &\quad + \sqrt{n}\, \frac{1}{\Delta} \int_{\mathbb{R}} F f_{t,n}^{(F)}(-u)\, R_n(u)\, FK_{h_n}(u)\, du + \sqrt{n}\, \big( B_{t,n}^{(N)} - F(t) B_n^{(\lambda)} \big), \end{aligned}$$

where $f_{t,n}^{(F)} := f_{t,n}^{(N)} - F(t) f_n^{(\lambda)}$, with $f_{t,n}^{(N)}$ and $f_n^{(\lambda)}$ defined in Section 3.2, and $R_n$ is as in Theorem 3.4.2. Because we are showing a central limit theorem under the uniform norm, we now argue that the supremum over $t \in \mathbb{R}$ of the last line vanishes as $n \to \infty$ on the same sets of $Pr$-probability approaching 1 on which the last equality holds. By Lemma 3.4.8 we have $\sup_{t\in\mathbb{R}} |B_{t,n}^{(N)}| = o\big(n^{-1/2}\big)$ and $|B_n^{(\lambda)}| = o\big(n^{-1/2}\big)$, so the last summand vanishes as $n \to \infty$ in view of $\sup_{t\in\mathbb{R}} F(t) \le 1$. Due to $\mathrm{supp}(FK_h) \subseteq [-h^{-1}, h^{-1}]$, the supremum of the first summand in the last line is bounded by

$$\|K\|_1\, \Delta^{-1} \sqrt{n}\, \sup_{|u|\le h_n^{-1}} |R_n(u)|\, \sup_{t\in\mathbb{R}} \int_{-h_n^{-1}}^{h_n^{-1}} \Big( \big|F f_{t,n}^{(N)}(u)\big| + F(t)\,\big|F f_n^{(\lambda)}(u)\big| \Big)\, du = o_{Pr}(1),$$
where the equality follows from Theorem 3.4.2 and Lemma 3.4.7. Consequently we have to show the corresponding functional central limit theorem for the linear term $\sqrt{n}\,(P_n - P)\widetilde{\psi}_{t,n}$, where

$$\widetilde{\psi}_{t,n} := \Delta^{-1}\, f_{t,n}^{(F)} \ast K_{h_n} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] = \widetilde{\psi}_{t,n}^{\,e} - \Delta^{-1} F(t)\, \mathbf{1}_{[-H_n,H_n]} \ast K_{h_n} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big]$$
and

$$\widetilde{\psi}_{t,n}^{\,e} := \psi_{t,n} + \Delta^{-1} F(t)\, \mathbf{1}_{(-\varepsilon_n,\varepsilon_n)} \ast K_{h_n} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big].$$

Hence we apply Theorem 3.4.10 to $\widetilde{\Psi}_n := \{\widetilde{\psi}_{t,n} : t \in \mathbb{R}\}$ in place of $\Psi_n$. When dealing with the former set of measurable functions we equip the indexing set $\mathbb{R}$ with the intrinsic covariance semimetric $\tilde{\rho}$ provided by $\lambda \mathbb{G}^F$. We now check the conditions of Theorem 3.4.10 following the same order as in the estimation of $N$, drawing on conclusions therein.

Convergence of the covariance functions. Note that the assumptions here include those of Lemma 3.4.4 and of the first part of Lemma 3.4.7. From the second representation of


$\widetilde{\psi}_{t,n}$ above we have that for any $s, t \in \mathbb{R}$
$$\begin{aligned} P\widetilde{\psi}_{s,n}\widetilde{\psi}_{t,n} ={}& P\widetilde{\psi}_{s,n}^{\,e}\widetilde{\psi}_{t,n}^{\,e} - P\Big( \big( F(s)\widetilde{\psi}_{t,n}^{\,e} + F(t)\widetilde{\psi}_{s,n}^{\,e} \big)\, \mathbf{1}_{[-H_n,H_n]} \ast K_{h_n} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big) \\ &+ F(s)F(t)\, P\Big( \mathbf{1}_{[-H_n,H_n]} \ast K_{h_n} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big)^2. \end{aligned}$$

Therefore, each of the summands has the form of (3.4.13) and we can use the lemmata together with (3.4.28) to conclude that

$$\lim_{n\to\infty} P\widetilde{\psi}_{s,n}\widetilde{\psi}_{t,n} = \frac{1}{\Delta^2} \int_{\mathbb{R}} \Big( \big(l_s + F(s)\mathbf{1}_{\{0\}}\big) \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big)(x)\, \Big( \big(l_t + F(t)\mathbf{1}_{\{0\}}\big) \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big)(x)\, P(dx) - F(s)F(t)$$

and that this is finite, where $l_t$ is defined in (3.4.26). Furthermore, this function agrees with $f_t^{(N)}$ up to a zero Lebesgue-measure set disjoint from $\varepsilon\times\mathbb{Z}$. Hence, the last claim in Lemma 3.4.4 guarantees the last display equals $\lambda^2 \Sigma_{s,t}^F$. Conclusion (3.4.14) and the fact that $\big(l_t - F(t)f^{(\lambda)}\big)(0) = 0$ for any $t \in \mathbb{R}$ justify that $P\widetilde{\psi}_{t,n} = 0$ for all $t \in \mathbb{R}$, and therefore the sequence of covariance functions $P\widetilde{\psi}_{s,n}\widetilde{\psi}_{t,n} - P\widetilde{\psi}_{s,n}P\widetilde{\psi}_{t,n}$ converges pointwise to $\lambda^2\Sigma_{s,t}^F$. Using the first representation of $\widetilde{\psi}_{t,n}$ and the same arguments, we also have that for any $s, t \in \mathbb{R}$
$$\lim_{n\to\infty} \Big( P\widetilde{\psi}_{s,n}\widetilde{\psi}_{t,n} - P\widetilde{\psi}_{s,n}P\widetilde{\psi}_{t,n} \Big) = \frac{1}{\Delta^2} \int_{\mathbb{R}} \Big( \big(f_s^{(N)} - F(s)f^{(\lambda)}\big) \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big)(x)\, \Big( \big(f_t^{(N)} - F(t)f^{(\lambda)}\big) \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big)(x)\, P(dx),$$
which provides a more useful representation of the limiting covariance in what follows.

Total boundedness of $\mathbb{R}$ under the intrinsic covariance semimetric $\tilde{\rho}$. In view of the last expression for the limiting covariance we just computed, we take for any $s, t \in \mathbb{R}$

$$\tilde{\rho}(s,t) = \frac{1}{\Delta}\Big( P\Big( \big(f_s^{(N)} - F(s)f^{(\lambda)}\big) \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] - \big(f_t^{(N)} - F(t)f^{(\lambda)}\big) \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big)^2 \Big)^{1/2}.$$

To show that $\mathbb{R}$ is totally bounded under this semimetric we bound this expression by another semimetric under which $\mathbb{R}$ is totally bounded. Due to $F^{-1}\big[\varphi^{-1}(-\cdot)\big]$ being a finite measure and $\sup_{t\in\mathbb{R}}\sup_{x\in\mathbb{R}} |f_t^{(N)}(x) - F(t)f^{(\lambda)}(x)| \le 2$, Minkowski's inequality for integrals guarantees we can use (3.4.41) and therefore

$$\tilde{\rho}(s,t)^2 \lesssim \rho(s,t)^2 + \Delta^{-2} \big( F(s) - F(t) \big)^2\, P\Big( f^{(\lambda)} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big)^2 \lesssim \tilde{\mu}\big( (\min\{s,t\}, \max\{s,t\}] \big),$$
where $\rho$ is the semimetric used when estimating $N$ and $\tilde{\mu} := \bar{\mu}_0 + \nu$ with $\bar{\mu}_0$ as defined in the corresponding section when estimating $N$. The conclusion then follows because $\tilde{\mu}$ is


a finite measure on R.

Conditions on the envelope functions $\widetilde{\Psi}_n$ of $\widetilde{\Psi}_n := \{\widetilde{\psi}_{t,n} : t \in \mathbb{R}\}$. Note that $\sup_n \sup_{t\in\mathbb{R}} \sup_{x\in\mathbb{R}} |f_{t,n}^{(N)}(x) - F(t)f_n^{(\lambda)}(x)| \le 2$, $\|K_h\|_1 = \|K\|_1 < \infty$ and $F^{-1}\big[\varphi^{-1}(-\cdot)\big]$ is a finite measure. Using Minkowski's inequality for integrals we have that $\sup_n \sup_{t\in\mathbb{R}} \sup_{x\in\mathbb{R}} |\widetilde{\psi}_{t,n}(x)| \le \widetilde{\Psi}$ for some $\widetilde{\Psi} \in (0, \infty)$, and we can take $\widetilde{\Psi}_n = \widetilde{\Psi}$ for all $n$. The two conditions on the envelope functions then follow immediately.

Control of $P(\widetilde{\psi}_{s,n} - \widetilde{\psi}_{t,n})^2$. In view of the first representation of $\widetilde{\psi}_{s,n}$ and the expression for $g_{t,n}$ we write

$$\begin{aligned} \widetilde{\psi}_{s,n} - \widetilde{\psi}_{t,n} ={}& \Big( \psi_{s,n} - \Delta^{-1} f_s^{(N)} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big) - \Big( \psi_{t,n} - \Delta^{-1} f_t^{(N)} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big) \\ &- \Delta^{-1} \big( F(s) - F(t) \big) \Big( f_n^{(\lambda)} \ast K_{h_n} - f^{(\lambda)} \Big) \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \\ &+ \Delta^{-1} \Big( \big(f_s^{(N)} - F(s)f^{(\lambda)}\big) - \big(f_t^{(N)} - F(t)f^{(\lambda)}\big) \Big) \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big]. \end{aligned}$$

Noting that $F^{-1}\big[\varphi^{-1}(-\cdot)\big]$ is a finite measure and $\|K_h\|_1 = \|K\|_1 < \infty$, Minkowski's inequality for integrals guarantees all the summands are bounded functions. Therefore we can use (3.4.41) to argue that, in order to control $P(\widetilde{\psi}_{s,n} - \widetilde{\psi}_{t,n})^2$, we can analyse each of the four individual terms arising from this display separately. According to the conclusions of the corresponding section when estimating $N$, the supremum over $s, t \in \mathbb{R}$ of the first two vanishes as $n \to \infty$ and hence they also vanish when taking the supremum over $\tilde{\rho}(s,t) < \delta_n$ for any $\delta_n \downarrow 0$. The fourth quantity arising from the last display equals $\tilde{\rho}(s,t)^2$ and therefore its behaviour is trivial. Consequently we are left with analysing the third term, which satisfies

$$\sup_{s,t\in\mathbb{R}} \Delta^{-2} \big( F(s) - F(t) \big)^2\, P\Big( \big( f_n^{(\lambda)} \ast K_{h_n} - f^{(\lambda)} \big) \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big)^2 \lesssim P\Big( \big( f_n^{(\lambda)} \ast K_{h_n} - f^{(\lambda)} \big) \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \Big)^2.$$

By Lemma 3.4.7 the function $f_n^{(\lambda)} \ast K_{h_n} - f^{(\lambda)}$ converges to 0 pointwise as $n \to \infty$. By the boundedness of this function, the finiteness of $F^{-1}\big[\varphi^{-1}(-\cdot)\big]$ and Minkowski's inequality for integrals, we can argue as when proving Lemma 3.4.4, and dominated convergence guarantees the limit of the right hand side equals 0. As required, we conclude that for any

$\delta_n \downarrow 0$
$$\sup_{\tilde{\rho}(s,t)<\delta_n} P\big(\widetilde{\psi}_{s,n} - \widetilde{\psi}_{t,n}\big)^2 \to 0 \qquad \text{as } n \to \infty.$$

Checking the bracketing entropy condition. To check the remaining condition of Theorem 3.4.10 we first recall that $\widetilde{\Psi}_n = \widetilde{\Psi}$ is independent of $n$. Second, we claim that the classes $\widetilde{\Psi}_n$ are all contained in a single ball in the space of bounded variation functions. Assuming this, the bracketing entropy in the theorem is bounded above by the bracketing entropy of this ball and, by Corollary 3.7.51 in Giné and Nickl [2016], the latter is bounded above by a constant multiple of $(\epsilon\,\widetilde{\Psi})^{-1}$. Therefore the bracketing entropy condition follows if we prove the claim. In view of the first expression for $\widetilde{\psi}_{t,n}$ and by properties of the convolution, its weak derivative is given by

$$\widetilde{\psi}_{t,n}' = \psi_{t,n}' - \Delta^{-1} F(t)\, \big( \delta_{-H_n} - \delta_{-\varepsilon_n} + \delta_{\varepsilon_n} - \delta_{H_n} \big) \ast K_{h_n} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big],$$

where $\psi_{t,n}'$ was computed in the corresponding section when estimating $N$. Thus, using Minkowski's inequality for integrals and that $\|K_h\|_1 = \|K\|_1$, we have

$$\|\widetilde{\psi}_{t,n}'\|_{TV} \le \|\psi_{t,n}'\|_{TV} + 4\,\Delta^{-1}\|K\|_1 \big\|F^{-1}\big[\varphi^{-1}(-\cdot)\big]\big\|_{TV} \le 9\,\Delta^{-1}\|K\|_1 \big\|F^{-1}\big[\varphi^{-1}(-\cdot)\big]\big\|_{TV} < \infty,$$
where the last inequality follows from the conclusions when estimating $N$. The claim then follows by noting that the upper bound does not depend on $t$ or $n$.

3.4.4 Proof of Theorem 3.3.5

Prior to dealing with joint convergence of the infinite-dimensional vectors in the statement of the theorem, we justify that under the assumptions of part (b) and as $n \to \infty$

$$\sqrt{n}\,\big(\hat{q}_n - q\big) \stackrel{d}{\to} N\big(0, \sigma_q^2\big) \qquad \text{and} \qquad \sqrt{n}\,\big(\hat{p}_n - p\big) \stackrel{d}{\to} N\big(0, \sigma_p^2\big),$$
where, recalling that $f^{(q)} := \sum_{j\in\mathbb{Z}\setminus\{0\}} f^{(q_j)}$ and $f^{(p)} := \sum_{j\in\mathbb{Z}\setminus\{0\}} f^{(p_j)} = \lambda^{-1}\big(f^{(q)} - p f^{(\lambda)}\big)$,

$$\sigma_q^2 := \frac{1}{\Delta^2} \int_{\mathbb{R}} \Big( f^{(q)} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big](x) \Big)^2 P(dx) \qquad \text{and} \qquad \sigma_p^2 := \frac{1}{\Delta^2} \int_{\mathbb{R}} \Big( f^{(p)} \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big](x) \Big)^2 P(dx).$$
The second set of assumptions of Theorem 3.4.9 is satisfied here and, in view of the expression for $\hat{q}_n$ in Section 3.2, the first result thus follows by taking $T = \emptyset$, $C_{(\lambda)} = C_{(\gamma)} = 0$ and $C_{(j)} = 1$ for all $j \in \mathbb{Z}\setminus\{0\}$ in Theorem 3.4.9. For the second result we write
$$\sqrt{n}\,\big(\hat{p}_n - p\big) = \hat{\lambda}_n^{-1}\, \sqrt{n}\, \Big( \big(\hat{q}_n - q\big) + p\,\big(\lambda - \hat{\lambda}_n\big) \Big).$$

Taking $T = \emptyset$, $C_{(\lambda)} = -p$, $C_{(\gamma)} = 0$ and $C_{(j)} = 1$ for all $j \in \mathbb{Z}\setminus\{0\}$ in Theorem 3.4.9, the quantity by which $\hat{\lambda}_n^{-1}$ is multiplied on the right hand side converges to $N\big(0, \lambda^2\sigma_p^2\big)$. Since $\hat{\lambda}_n$ converges to $\lambda$ (constant) by the same theorem, the conclusion follows by Slutsky's lemma. We now prove parts (a) and (b) together under the respective assumptions. From


Lemma 1.4.8 in van der Vaart and Wellner [1996] we have that joint convergence of the infinite-dimensional vectors in parts (a) and (b) follows if we show joint convergence of all their finite-dimensional projections. To show the latter let $\delta^{i,j}$ denote the Kronecker delta, i.e. the mapping from $\mathbb{Z}\times\mathbb{Z}$ to $\{0,1\}$ that equals 1 only if $i = j$. Define
$$\delta := \big( \delta^{1,m_\lambda}, \delta^{1,m_\gamma}, \delta^{1,m_q}, \delta^{1,m_p}, \delta^{(q)}, \delta^{(p)}, \delta^{1,m_N}, \delta^{1,m_F} \big),$$
where, for any $j \in \mathbb{Z}\setminus\{0\}$, $\delta_j^{(q)} = \delta^{1,m_{q_j}}$, $\delta_j^{(p)} = \delta^{1,m_{p_j}}$ and $m_\lambda, m_\gamma, m_q, m_p, m_{q_j}, m_{p_j}, m_N, m_F \in \{0,1\}$ are such that
$$m_\lambda + m_\gamma + m_q + m_p + M_q + M_p + m_N + m_F \in \mathbb{N}, \qquad \text{with } M_q := \sum_{j\in\mathbb{Z}\setminus\{0\}} m_{q_j} \text{ and } M_p := \sum_{j\in\mathbb{Z}\setminus\{0\}} m_{p_j}.$$

Then, writing $\cdot$ for the coordinate-wise product of two infinite vectors, we denote joint convergence of a finite-dimensional projection as having that, as $n \to \infty$,

$$\sqrt{n}\;\, \delta \cdot \Big( \hat{\lambda}_n - \lambda,\; h_n^{-1}(\hat{\gamma}_n - \gamma),\; \hat{q}_n - q,\; \hat{p}_n - p,\; \hat{\mathbf{q}}_n - \mathbf{q},\; \hat{\mathbf{p}}_n - \mathbf{p},\; \widehat{N}_n - N,\; \widehat{F}_n - F \Big) \xrightarrow{\;\mathcal{D}_{\times,\delta}\;} \delta \cdot \mathbb{L},$$

where $\xrightarrow{\;\mathcal{D}_{\times,\delta}\;}$ means convergence in distribution in the corresponding product space, which we denote by

$$\mathbb{D} = \mathbb{D}\big( m_\lambda, m_\gamma, m_q, m_p, (m_{q_j})_{j\in\mathbb{Z}\setminus\{0\}}, (m_{p_j})_{j\in\mathbb{Z}\setminus\{0\}}, m_N, m_F \big).$$

Throughout we fix all these binary parameters. To show the joint convergence displayed above we note that, under the assumptions of each of the two parts of the theorem, marginal convergence of each of the non-zero coordinates holds by Propositions 3.3.1 and 3.3.3, Theorem 3.3.4 and the calculations regarding $q$ and $p$ at the beginning of the proof. Therefore, the sequence given by each non-zero projection is asymptotically tight and asymptotically measurable in the respective space. Then, by Lemmas 1.4.3 and 1.4.4 in van der Vaart and Wellner [1996], the sequence of random variables given by the finite-dimensional vector above taking values in $\mathbb{D}$ is asymptotically tight and asymptotically measurable. By Prokhorov's theorem it is relatively compact, i.e. every subsequence has a further weakly convergent subsequence, so to finish the proof it suffices to show that all limits are the same. Denote by $\mathcal{H}$ the linear span of the functions $H : \mathbb{R}^{\mathbb{N}} \times \ell^\infty(\mathbb{R})^2 \to \mathbb{R}$ of the form

$$\begin{aligned} H(\mathbb{L}) ={}& \delta^{1,m_\lambda} h^{(\lambda)}(\mathbb{L}_\lambda) + \delta^{1,m_\gamma} h^{(\gamma)}(\mathbb{L}_\gamma) + \delta^{1,m_q} h^{(q)}(\mathbb{L}_q) + \delta^{1,m_p} h^{(p)}(\mathbb{L}_p) \\ &+ \sum_{j\in\mathbb{Z}\setminus\{0\}} \delta^{1,m_{q_j}} h^{(q_j)}(\mathbb{L}_{q_j}) + \sum_{j\in\mathbb{Z}\setminus\{0\}} \delta^{1,m_{p_j}} h^{(p_j)}(\mathbb{L}_{p_j}) \\ &+ \delta^{1,m_N} \sum_{i=1}^{M_N} h^{(i)}\big(\mathbb{L}_N(t_i)\big) + \delta^{1,m_F} \sum_{i=M_N+1}^{M_N+M_F} h^{(i)}\big(\mathbb{L}_F(t_i)\big) \end{aligned}$$

for any $M_N, M_F \in \mathbb{N}$, $t_i \in \mathbb{R}$ and $h^{(\cdot)} \in C_b(\mathbb{R})$ fixed throughout, where if a sum is empty it


equals 0 by convention. Then, for any $\big(m_\lambda, m_\gamma, m_q, m_p, (m_{q_j})_{j\in\mathbb{Z}\setminus\{0\}}, (m_{p_j})_{j\in\mathbb{Z}\setminus\{0\}}, m_N, m_F\big)$ fixed, $\mathcal{H} \subset C_b(\mathbb{D})$ is a vector lattice containing the constant functions and separating points of $\mathbb{D}$ (see the footnote on page 25 of van der Vaart and Wellner [1996] for the definition of these terms). We claim that, for any of the parameters here introduced and under the corresponding assumptions depending on whether $m_q + m_p = 0$ or not, as $n \to \infty$
$$\sqrt{n}\, \begin{pmatrix} \delta_{-N,-F} \cdot \begin{pmatrix} \hat{\lambda}_n - \lambda \\ h_n^{-1}(\hat{\gamma}_n - \gamma) \\ \hat{q}_n - q \\ \hat{p}_n - p \\ \hat{\mathbf{q}}_n - \mathbf{q} \\ \hat{\mathbf{p}}_n - \mathbf{p} \end{pmatrix} \\ \delta^{1,m_N}\big(\widehat{\mathbf{N}}_n - \mathbf{N}\big) \\ \delta^{1,m_F}\big(\widehat{\mathbf{F}}_n - \mathbf{F}\big) \end{pmatrix} \longrightarrow \begin{pmatrix} \delta_{-N,-F} \cdot \mathbb{L}_{-N,-F} \\ \delta^{1,m_N}\big( \mathbb{L}_N(t_1), \ldots, \mathbb{L}_N(t_{M_N}) \big) \\ \delta^{1,m_F}\big( \mathbb{L}_F(t_{M_N+1}), \ldots, \mathbb{L}_F(t_{M_N+M_F}) \big) \end{pmatrix}$$

in distribution in $\mathbb{R}^m$, $m := m_\lambda + m_\gamma + m_q + m_p + M_p + M_q + \delta^{1,m_N} M_N + \delta^{1,m_F} M_F$, where the subscript $\cdot_{-N,-F}$ denotes the vector $\cdot$ without its last two coordinates, $\mathbf{N} = \big(N(t_1), \ldots, N(t_{M_N})\big)$, $\mathbf{F} = \big(F(t_{M_N+1}), \ldots, F(t_{M_N+M_F})\big)$ and $\widehat{\mathbf{N}}_n, \widehat{\mathbf{F}}_n$ are the respective coordinate-wise estimators. Then the continuous mapping theorem together with Lemma 1.3.12 (ii) in van der Vaart and Wellner [1996] justify the joint convergence we are seeking. To show the claim we note that by the Cramér–Wold device it is sufficient to check that any linear combination of the coordinates on the left hand side converges to the same combination of the coordinates on the right hand side. For any $c_{(\lambda)}, c_{(\gamma)}, c_{(q)}, c_{(p)} \in \mathbb{R}$, any row vectors $\mathbf{c}_{(q)}, \mathbf{c}_{(p)} : \mathbb{Z}\setminus\{0\} \to \mathbb{R}$ with respective $j$-th entries $c_{(q_j)}$ and $c_{(p_j)}$, and any

$C_N = (c_1, \ldots, c_{M_N})$ and $C_F = (c_{M_N+1}, \ldots, c_{M_N+M_F})$, the obvious linear combination of the left hand side arising from these parameters can be written as

$$\sqrt{n}\, \big( c_{(\lambda)}, c_{(\gamma)}, c_{(q)}, \mathbf{c}_{(q)}, C_N \big) \begin{pmatrix} \delta^{1,m_\lambda}(\hat{\lambda}_n - \lambda) \\ \delta^{1,m_\gamma}\big(h_n^{-1}(\hat{\gamma}_n - \gamma)\big) \\ \delta^{1,m_q}(\hat{q}_n - q) \\ \delta^{(q)}\cdot(\hat{\mathbf{q}}_n - \mathbf{q}) \\ \delta^{1,m_N}(\widehat{\mathbf{N}}_n - \mathbf{N}) \end{pmatrix} + \hat{\lambda}_n^{-1}\sqrt{n}\, \big( -\tilde{c}_\lambda, c_{(p)}, \mathbf{c}_{(p)}, C_F \big) \begin{pmatrix} \hat{\lambda}_n - \lambda \\ \delta^{1,m_p}(\hat{q}_n - q) \\ \delta^{(p)}\cdot(\hat{\mathbf{q}}_n - \mathbf{q}) \\ \delta^{1,m_F}\big(\widehat{\underline{\mathbf{N}}}_n - \underline{\mathbf{N}}\big) \end{pmatrix},$$
where
$$\tilde{c}_\lambda := \delta^{1,m_p} c_{(p)}\, p + \sum_{j\in\mathbb{Z}\setminus\{0\}} \delta^{1,m_{p_j}} c_{(p_j)}\, p_j + \delta^{1,m_F} \sum_{i=M_N+1}^{M_N+M_F} c_i\, F(t_i),$$
$\underline{\mathbf{N}} = \big(N(t_{M_N+1}), \ldots, N(t_{M_N+M_F})\big)$ and $\widehat{\underline{\mathbf{N}}}_n$ is its coordinate-wise estimator. To justify that the penultimate display converges to the correct linear combination of limiting distributions, we first note that by Theorem 3.4.9 the finitely many non-zero coordinates in the column vectors of the second to last display converge jointly to the vector comprising

their respective limits. The conclusion then follows by the continuous mapping theorem and in view of the explicit representation of the variance of the limiting random variable

given by Theorem 3.4.9 when $T = \{t_1, \ldots, t_{M_N+M_F}\}$, $C_{(\lambda)} = \delta^{1,m_\lambda} c_{(\lambda)} - \lambda^{-1}\tilde{c}_\lambda$, $C_{(\gamma)} = c_{(\gamma)}$, $C_{(j)} = \delta^{1,m_q} c_{(q)} + \delta^{1,m_{q_j}} c_{(q_j)} + \lambda^{-1}\big( \delta^{1,m_p} c_{(p)} + \delta^{1,m_{p_j}} c_{(p_j)} \big)$, $C_t = \delta^{1,m_N} c_t$ if $t = t_1, \ldots, t_{M_N}$ and $C_t = \lambda^{-1}\delta^{1,m_F} c_t$ if $t = t_{M_N+1}, \ldots, t_{M_N+M_F}$.

3.4.5 Proof of Lemma 3.3.2

This lemma follows immediately from the expressions for $F^{-1}\big[\varphi^{-1}(-\cdot)\big]$ and $P$ given in (2.6.1) and (2.6.2). In view of these and of the observation after (2.6.3), we have that for any $f$ and $g$ bounded on $\mathbb{R}$
$$\begin{aligned} &\int_{\mathbb{R}} \big( f \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)\, \big( g \ast F^{-1}\big[\varphi^{-1}(-\cdot)\big] \big)(x)\, P(dx) \\ &\quad = e^{\lambda\Delta} \bigg( f(0) + \sum_{k=1}^{\infty} \frac{(-\Delta)^k}{k!}\, f \ast \bar{\nu}^{\ast k}(0) \bigg) \bigg( g(0) + \sum_{k=1}^{\infty} \frac{(-\Delta)^k}{k!}\, g \ast \bar{\nu}^{\ast k}(0) \bigg) \\ &\qquad + e^{\lambda\Delta} \int_{\mathbb{R}} \bigg( f(x) + \sum_{k=1}^{\infty} \frac{(-\Delta)^k}{k!}\, f \ast \bar{\nu}^{\ast k}(x) \bigg) \bigg( g(x) + \sum_{k=1}^{\infty} \frac{(-\Delta)^k}{k!}\, g \ast \bar{\nu}^{\ast k}(x) \bigg) \sum_{k=1}^{\infty} \frac{\Delta^k}{k!}\, \nu^{\ast k}(dx) \\ &\quad = f(0)\, g(0) + \Delta \bigg( \int_{\mathbb{R}} f(x)g(x)\, \nu(dx) - f(0)\, g \ast \bar{\nu}(0) - g(0)\, f \ast \bar{\nu}(0) + \lambda f(0)g(0) \bigg) + O\big((\lambda\Delta)^2\big). \end{aligned}$$


Chapter 4

Applications

The purpose of this chapter is to illustrate the applications, implementation and practical performance of the results from Chapter 3. In Section 4.1 we briefly review existing literature on the practical use of nonparametric estimators of discretely observed compound Poisson processes and other related stochastic processes. In Section 4.2 we discuss numerous applications of our results from Chapter 3 and construct the corresponding statistical procedures. Section 4.3 contains the practical implementation of these and, lastly, in Section 4.4 we illustrate their practical performance through simulations in several settings. In doing so we compare them to the respective estimators and procedures from Buchmann and Grübel [2003].
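Before reviewing the literature, it may help to fix the data-generating scheme underlying all the simulations: one never sees the compound Poisson path itself, only its increments over an equispaced grid of mesh $\Delta$. A minimal sketch of generating such discretely observed data (the function name, and the choice of unit-rate exponential jumps in the example, are illustrative assumptions, not the settings used in Section 4.4):

```python
import numpy as np

def cpp_increments(n, delta, lam, gamma, jump_sampler, rng):
    """Simulate n increments X_{i*delta} - X_{(i-1)*delta} of a compound
    Poisson process with drift gamma, intensity lam and i.i.d. jumps drawn
    by jump_sampler: each increment is gamma*delta plus the sum of a
    Poisson(lam*delta) number of jumps."""
    counts = rng.poisson(lam * delta, size=n)                 # jumps per interval
    jump_sums = np.array([jump_sampler(rng, k).sum() for k in counts])
    return gamma * delta + jump_sums

# Illustrative example: unit-rate exponential jumps, lam = 1, delta = 0.5, gamma = 0.
rng = np.random.default_rng(0)
incs = cpp_increments(20_000, 0.5, 1.0, 0.0,
                      lambda rng, k: rng.exponential(1.0, size=k), rng)
# The sample mean of the increments should be close to
# gamma*delta + lam*delta*E[Y] = 0.5.
```

A large fraction of the increments equal the pure drift term (here 0), reflecting the intervals containing no jump; it is precisely this partial masking of the jumps that makes the inference problem an inverse problem.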

4.1 Introduction

As mentioned in Section 1.3, limit theorems of the type of those developed in Chapter 3 are of great importance for statistics because several types of procedures can be derived from them. The obvious ones are estimators and confidence regions for the parameters. In addition, goodness-of-fit and two-sample tests for each or all of the parameters can be derived too. As discussed in Section 1.4, prior to Coca [2015] only Buchmann and Grübel [2003] and Buchmann and Grübel [2004] showed similar limit theorems for the strict setting of discretely observed compound Poisson processes —van Es et al. [2007] showed pointwise asymptotic normality of their estimator of the density, so global confidence regions for the jump measure cannot be constructed from it and are not comparable. They also discussed applications of their results, including estimation, confidence regions and goodness-of-fit tests. However, they only implemented those for the mass function; as we clarified in Section 2.6, these are different from their estimator of the distribution function both for finite samples and asymptotically. Furthermore, in the implementation of the former they only considered very few and simple examples. In particular, when illustrating the testing applications they only briefly considered the case of $dF = \delta_1$ corresponding to a Poisson

process. Surprisingly, we have not found other pieces of work in which this practical performance is analysed further. Therefore, in Section 4.4 we include the first illustration of the estimators of a general distribution function of Buchmann and Grübel [2003] and of the coverage of the confidence regions arising from them, and we compare their performance to that of ours. We do this for a number of different settings depending on the value of λ, which drives how hard the estimation problem is in practice. Consequently, our experiments add value to the existing literature. It would also be interesting to explore other topics such as robustness to the underlying model assumptions and quantile estimation, as well as applying the abovementioned statistical procedures to real-life data. However, all this is beyond the scope of this thesis and may appear in future work. We note in passing that Buchmann and Grübel [2004] carried out a simulation study of how well their estimators estimate the support of the underlying jump distribution. Recall that they constructed estimators of the mass function and each of these stayed within the interval [0, 1]. Therefore, this is a natural study for their estimators but not for ours, as we do not apply their monotone transformation. Some limited work on the illustration of confidence regions can be found in the more general setting of estimating discretely observed Lévy processes. At the end of Section 2.2.2 we mentioned the work of Cont and Tankov [2003], Belomestny and Reiß [2006], Söhl [2014] and Söhl and Trabs [2014]. Recall that they assumed the logarithm of the price of a financial asset is a compound Poisson process with drift and with a diffusion component. Under this assumption, they developed confidence sets for the jump density using prices of financial options and illustrated them through simulations. In a closer spirit to that taken here, although in the high-frequency regime, Nickl et al.
[2016] constructed confidence bands for functionals of the (not-necessarily-finite) L´evymeasure from their functional central limit theorems and illustrated them through simulations. Regarding testing, Reiß [2013] studied this problem for the L´evytriplet and for the Blumenthal-Getoor index of the L´evymeasure under different observation regimes. He focused on understanding the complexity of the problem, which is reflected in the asymptotic separation rates. He constructed tests for all these parameters, although he did not perform any simulations. Therefore, our results in this chapter also contribute towards the more practical side of this area.

4.2 Statistical procedures and their construction

All of the procedures following from the results in Chapter 3 make direct use of the estimators and of the limiting random quantities therein. The implementation of the former follows directly from their expressions and is given in Section 4.3. However, the latter depend on the underlying parameters of the unknown process so, prior to discussing their implementation, we first need to address how we circumvent this issue. We do this using the idea of 'Studentisation' and in Section 4.2.1 we justify why we take this approach and give details of how we adapt it to our setting. In the remaining sections we show the construction of procedures following from it. These include confidence regions in Section 4.2.2, and goodness-of-fit and two-sample tests in Sections 4.2.3 and 4.2.4, respectively. Lastly, in Section 4.2.5, we discuss approximations to all of these when $\lambda\Delta$ is small, including 'process-free' procedures. Throughout we work with the same notation introduced in Chapter 3.

4.2.1 Approximating the limiting processes

The random limiting quantities in the central limit theorems of Chapter 3 can be split into finite- and infinite-dimensional, depending on the dimension of the metric space in which they take values. In either case, to develop applications from them the interest concentrates on the quantiles of the real-valued nonnegative random variables corresponding to their norms in the respective spaces. We begin by discussing how we proceed with the finite-dimensional case since it motivates the approach we take for the infinite-dimensional case. Furthermore, we focus on the one-dimensional quantities as the extension to other finite-dimensional ones is immediate.

From now on, we largely exclude $\gamma$ from the discussions due to the null-limit in Proposition 3.3.1, and we also exclude $q$ and $\{q_j : j \in \mathbb{Z}\setminus\{0\}\}$ because their counterparts $p$ and $\{p_j : j \in \mathbb{Z}\setminus\{0\}\}$ are generally of more interest in practice. The construction of statistical procedures for the former follows easily from the same arguments we use to construct them for the latter. Throughout let $\theta \in \{\lambda, p\} \cup \{p_j : j \in \mathbb{Z}\setminus\{0\}\}$. In the corresponding one-dimensional specialisations of Theorem 3.3.5, all the limiting quantities can be written as $N(0, \sigma_\theta^2) \stackrel{d}{=} \sigma_\theta N(0, 1)$, for some $\sigma_\theta > 0$. The simple 'Studentisation' idea is to approximate this general distribution by $\hat{\sigma}_{\theta,n} N(0, 1)$, where $\hat{\sigma}_{\theta,n}$ is a consistent estimator of $\sigma_\theta$. Then, by the continuous mapping theorem, or in particular Slutsky's lemma, this approximation converges in distribution to $N(0, \sigma_\theta^2)$. Denoting the upper $\alpha \in (0,1)$ quantile of the distribution $|N(0,1)|$ by $Q^N(\alpha)$, the continuous mapping theorem again justifies that $\hat{\sigma}_{\theta,n} Q^N(\alpha)$ is a consistent estimator of the upper $\alpha$ quantile of the absolute value of the limiting distribution. The question now is how to find an appropriate estimator $\hat{\sigma}_{\theta,n}$. Recall that, for all $\theta \in \{\lambda, p\} \cup \{p_j : j \in \mathbb{Z}\setminus\{0\}\}$,

$$\sigma_\theta^2 := \frac{1}{\Delta^2} \int_{\mathbb{R}} \left( \left( f^{(\theta)} * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right] \right)(x) \right)^2 P(\mathrm{d}x),$$

where

$$f^{(\lambda)} := 1_{\mathbb{R}\setminus\{0\}}, \qquad f^{(p_j)} := \lambda^{-1}\left( 1_{\{J_j\}} - p_j f^{(\lambda)} \right), \qquad f^{(p)} := \sum_{j \in \mathbb{Z}\setminus\{0\}} f^{(p_j)} \qquad \text{and} \qquad \varphi^{-1} = 1/\mathcal{F}P.$$
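The Studentisation recipe just described is straightforward to code. The following is a minimal sketch of ours, not part of the thesis implementation; the function name is hypothetical and $Q^N(\alpha)$ is approximated by Monte Carlo:

```python
import numpy as np

def studentised_quantile(sigma_hat, alpha, n_sim=100_000, seed=0):
    # Approximate the upper-alpha quantile of |N(0, sigma_theta^2)| by
    # sigma_hat * Q^N(alpha), with Q^N(alpha) the upper-alpha quantile
    # of |N(0, 1)|, estimated here by simulation.
    rng = np.random.default_rng(seed)
    qN = np.quantile(np.abs(rng.standard_normal(n_sim)), 1.0 - alpha)
    return sigma_hat * qN
```

For $\alpha = 0.05$ the factor $Q^N(\alpha)$ is close to the familiar two-sided normal quantile $1.96$.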


Therefore, given the i.i.d. observations $Z_1, \ldots, Z_n \sim P$ we used to construct the estimator of $\theta$, and taking $h_n$, $\varepsilon_n$, $H_n$ and $\widetilde{H}_n$ as in Chapter 3, we propose the natural estimator

$$\hat{\sigma}_{\theta,n}^2 := \frac{1}{\Delta^2} \frac{1}{n} \sum_{k=1}^n \left( \left( f_n^{(\theta)} * \mathcal{F}^{-1}\left[\varphi_n^{-1}(-\cdot)\,\mathcal{F}K_{h_n}\right] \right)(Z_k) \right)^2, \qquad (4.2.1)$$

where $K$ has the same properties as in (3.2.1),

$$f_n^{(\lambda)} := 1_{[-H_n, H_n]\setminus(-\varepsilon_n, \varepsilon_n)}, \qquad f_n^{(p_j)} := \hat{\lambda}_n^{-1}\left( 1_{(J_j - \varepsilon_n,\, J_j + \varepsilon_n)} - \hat{p}_{j,n} f_n^{(\lambda)} \right),$$

$$f_n^{(p)} := \sum_{\substack{|j| \le \widetilde{H}_n/\varepsilon_n \\ j \ne 0}} f_n^{(p_j)} \qquad \text{and} \qquad \varphi_n^{-1} = \left( \frac{1}{n} \sum_{k=1}^n e^{i \cdot Z_k} \right)^{-1}.$$

By the same arguments used to prove the central limit theorems of Chapter 3 it follows that, as $n \to \infty$, $\hat{\sigma}_{\theta,n}^2$ is well-defined on sets of $\Pr$-probability approaching 1, and

$$\hat{\sigma}_{\theta,n}^2 \stackrel{\Pr}{\longrightarrow} \sigma_\theta^2 \qquad \text{for all } \theta \in \{\lambda, p\} \cup \{p_j : j \in \mathbb{Z}\setminus\{0\}\}.$$

We thus conclude we can use the idea above to consistently estimate the desired quantiles.

For the infinite-dimensional limiting quantities we use a similar idea and look for a (uniformly) consistent estimator of their covariance functions. In particular, notice that the centred Gaussian process $\mathbb{G}^F$ in Theorem 3.3.4 has a covariance function with a very similar form to $\sigma_\theta^2$ above. Namely,

$$\Sigma^F_{s,t} := \frac{1}{\Delta^2} \int_{\mathbb{R}} \left( f_s^{(F)} * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right] \right)(x) \left( f_t^{(F)} * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right] \right)(x) \, P(\mathrm{d}x), \qquad s, t \in \mathbb{R},$$

where

$$f_t^{(F)} := \lambda^{-1}\left( f_t^{(N)} - F(t)\,f^{(\lambda)} \right) \qquad \text{with} \qquad f_t^{(N)} := 1_{(-\infty, t]}\, f^{(\lambda)}. \qquad (4.2.2)$$

Then, in analogy to (4.2.1), for all $s, t \in \mathbb{R}$ we can approximate $\Sigma^F_{s,t}$ by

$$\hat{\Sigma}^F_{s,t,n} := \frac{1}{\Delta^2} \frac{1}{n} \sum_{k=1}^n \left( f_{s,n}^{(F)} * \mathcal{F}^{-1}\left[\varphi_n^{-1}(-\cdot)\,\mathcal{F}K_{h_n}\right] \right)(Z_k) \left( f_{t,n}^{(F)} * \mathcal{F}^{-1}\left[\varphi_n^{-1}(-\cdot)\,\mathcal{F}K_{h_n}\right] \right)(Z_k),$$

where

$$f_{t,n}^{(F)} := \hat{\lambda}_n^{-1}\left( f_{t,n}^{(N)} - \hat{F}_n(t)\,f_n^{(\lambda)} \right) \qquad \text{with} \qquad f_{t,n}^{(N)} := 1_{(-\infty, t]}\, f_n^{(\lambda)}.$$

This is well-defined on sets of $\Pr$-probability approaching 1 as $n \to \infty$ and, crucially, the uniform convergence of the estimators shown in the proofs of Chapter 3 also holds for this estimator, so we have that

$$\sup_{s,t \in \mathbb{R}} \left| \hat{\Sigma}^F_{s,t,n} - \Sigma^F_{s,t} \right| \stackrel{\Pr}{\longrightarrow} 0 \qquad \text{as } n \to \infty.$$

We define $\hat{\mathbb{G}}^F_n$ to be the centred Gaussian process with covariance structure $\hat{\Sigma}^F_{s,t,n}$. Then, by similar arguments to those used to prove Theorem 3.3.4, $\hat{\mathbb{G}}^F_n \stackrel{\mathcal{D}}{\to} \mathbb{G}^F$ in $\Pr$-probability as $n \to \infty$ and, by the continuous mapping theorem, the quantiles of $\|\hat{\mathbb{G}}^F_n\|_\infty$ converge to those of $\|\mathbb{G}^F\|_\infty$. This justifies the following procedures because, by the continuous mapping theorem again, $\sqrt{n}\,\|\hat{F}_n - F\|_\infty \stackrel{d}{\to} \|\mathbb{G}^F\|_\infty$ in $\mathbb{R}$. Throughout we denote the upper $\alpha \in (0,1)$ quantile of $\|\hat{\mathbb{G}}^F_n\|_\infty$ by $\hat{Q}^F_n(\alpha)$, which can be easily approximated by Monte Carlo techniques by simulating numerous paths of $\hat{\mathbb{G}}^F_n$ and computing the empirical quantile. In the context of estimating discretely observed compound Poisson processes, Buchmann and Grübel [2004] already suggested using this approach for their estimators in Buchmann and Grübel [2003], although they did not give further details of it. They justified it because, from the expressions of their covariance functions, which in Section 2.6 we argued are the same as ours, it is clear these depend continuously on the defining parameters of the process, so the continuous mapping theorem immediately guarantees that consistent estimators of the parameters can be plugged in.

A second approach may be taken: Bøgsted and Pitts [2010] used the bootstrap method, which is based on approximating the left-hand side of the central limit theorem $\sqrt{n}\,(\hat{F}_n - F) \stackrel{\mathcal{D}}{\to} \mathbb{G}^F$ instead of the right-hand side. They justified its use by the convoluted dependence of the covariance functions on the parameters observed in the expression right after (2.1.12), which was never rewritten in the easier-to-implement alternative form we give in Section 2.6 and that we exploit below. The bootstrap method is well known to be optimal under mild conditions that are satisfied here (cf. Giné and Zinn [1990] and van der Vaart and Wellner [1996]) and we refer the reader to their work for more details.
We simply mention that, asymptotically, there should be no difference between the two approaches. However, for finite samples there may be: bootstrap may return more accurate coverage values than our Studentisation approach since it is not based on estimating the asymptotically correct approximation on the right side of the central limit theorem but rather on estimating the finite-sample left side. Nevertheless, it has a disadvantage that stops us from using it: in the same way that, for each sample of $P$, in our approach we have to simulate hundreds of paths of the approximate limiting Gaussian process $\hat{\mathbb{G}}^F_n$, bootstrap requires computing our estimator just as many times. The former can be quickly computed as soon as we compute $\hat{\Sigma}^F_{s,t,n}$, but the latter requires implementing (a slight variation of) our estimator $\hat{F}_n$ and that of Buchmann and Grübel [2003] for every simulation. For each sample of $P$, the average computational time to compute each of these two estimators of $F$ varies from a few seconds to a few minutes. Therefore, computing them hundreds of times and then repeating the operation for hundreds of different samples becomes impractical if using a modest machine like we do. Bøgsted and Pitts [2010] were able to use this method because, just like Buchmann and Grübel [2003] did, they discretised their data and used the discrete estimators following from Panjer recursions, which are much faster to compute. As already discussed in detail in Section 2.1, and as becomes clearer from the simulations in Section 4.4, the drawback of the latter estimators is that the estimation deteriorates heavily away from the origin.

In the following sections we also address joint estimation and testing of all the parameters of the process. With this in mind, let $\mathbb{L}^{F,\lambda}$ be the limiting process of $\sqrt{n}\,\big(\hat{F}_n - F + \hat{\lambda}_n - \lambda\big)$. By Theorem 3.3.5, its covariance function is given by

$$\Sigma^{F,\lambda}_{s,t} := \frac{1}{\Delta^2} \int_{\mathbb{R}} \left( \left( f_s^{(F)} + f^{(\lambda)} \right) * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right] \right)(x) \left( \left( f_t^{(F)} + f^{(\lambda)} \right) * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right] \right)(x) \, P(\mathrm{d}x),$$

and we recall that the covariance of $\mathbb{B}^N$ from Theorem 3.3.4 is

$$\Sigma^N_{s,t} := \frac{1}{\Delta^2} \int_{\mathbb{R}} \left( f_s^{(N)} * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right] \right)(x) \left( f_t^{(N)} * \mathcal{F}^{-1}\left[\varphi^{-1}(-\cdot)\right] \right)(x) \, P(\mathrm{d}x).$$

In line with the above, these can be consistently and uniformly estimated by

$$\hat{\Sigma}^{F,\lambda}_{s,t,n} := \frac{1}{\Delta^2} \frac{1}{n} \sum_{k=1}^n \left( \left( f_{s,n}^{(F)} + f_n^{(\lambda)} \right) * \mathcal{F}^{-1}\left[\varphi_n^{-1}(-\cdot)\,\mathcal{F}K_{h_n}\right] \right)(Z_k) \times \left( \left( f_{t,n}^{(F)} + f_n^{(\lambda)} \right) * \mathcal{F}^{-1}\left[\varphi_n^{-1}(-\cdot)\,\mathcal{F}K_{h_n}\right] \right)(Z_k)$$

and

$$\hat{\Sigma}^N_{s,t,n} := \frac{1}{\Delta^2} \frac{1}{n} \sum_{k=1}^n \left( f_{s,n}^{(N)} * \mathcal{F}^{-1}\left[\varphi_n^{-1}(-\cdot)\,\mathcal{F}K_{h_n}\right] \right)(Z_k) \left( f_{t,n}^{(N)} * \mathcal{F}^{-1}\left[\varphi_n^{-1}(-\cdot)\,\mathcal{F}K_{h_n}\right] \right)(Z_k),$$

respectively. Then, the same arguments justify the use in what follows of $\hat{Q}^N_n(\alpha)$ and $\hat{Q}^{F,\lambda}_n(\alpha)$, which denote consistent estimators of the upper $\alpha$-quantile of the supremum norm of centred Gaussian processes with the covariances of the last two displays.

4.2.2 Confidence regions

Given the estimators we just constructed, we propose

$$\hat{C}^F_n(\alpha) := \left\{ G : G(t) \in \left[ \hat{F}_n(t) - \frac{\hat{Q}^F_n(\alpha)}{\sqrt{n}}, \; \hat{F}_n(t) + \frac{\hat{Q}^F_n(\alpha)}{\sqrt{n}} \right] \text{ for all } t \in \mathbb{R} \right\}$$

as a confidence band of level $\alpha \in (0,1)$ for $F$, and

$$\hat{C}^\theta_n(\alpha) := \left[ \hat{\theta}_n - \hat{\sigma}_{\theta,n} \frac{Q^N(\alpha)}{\sqrt{n}}, \; \hat{\theta}_n + \hat{\sigma}_{\theta,n} \frac{Q^N(\alpha)}{\sqrt{n}} \right]$$

as a confidence interval of level $\alpha \in (0,1)$ for $\theta \in \{\lambda, p\} \cup \{p_j : j \in \mathbb{Z}\setminus\{0\}\}$. By the arguments given in the previous section and under the corresponding assumptions of Theorem 3.3.5, the global confidence band for $F$ and the confidence interval for any $\theta$ indeed have asymptotic coverage

$$\lim_{n \to \infty} \Pr\left( F \in \hat{C}^F_n(\alpha) \right) = \lim_{n \to \infty} \Pr\left( \theta \in \hat{C}^\theta_n(\alpha) \right) = 1 - \alpha.$$
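On a finite grid of $t$-values, the band and the Studentised interval reduce to a few lines of code. The following is a hypothetical sketch with names of our choosing; the quantiles are assumed to be supplied:

```python
import numpy as np

def band_for_F(F_hat_grid, Q_F_alpha, n):
    # Uniform band [F_hat - Q/sqrt(n), F_hat + Q/sqrt(n)] over the grid.
    half = Q_F_alpha / np.sqrt(n)
    return F_hat_grid - half, F_hat_grid + half

def interval_for_theta(theta_hat, sigma_hat, Q_N_alpha, n):
    # Studentised interval for a one-dimensional parameter theta.
    half = sigma_hat * Q_N_alpha / np.sqrt(n)
    return theta_hat - half, theta_hat + half
```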

4.2.3 Goodness-of-fit tests

The confidence regions we just constructed allow us to derive goodness-of-fit tests with asymptotic level $\alpha$ as follows. Let $H_0$ represent the null hypothesis that $\theta \in \{F, \lambda, p\} \cup \{p_j : j \in \mathbb{Z}\setminus\{0\}\}$ belongs to a certain set $S^\theta$. Then we propose to accept $H_0$ if $T^\theta_n := 1\{S^\theta \cap \hat{C}^\theta_n(\alpha) = \emptyset\}$ is zero and reject it otherwise. If $\theta \in S^\theta$ and the corresponding assumptions of Theorem 3.3.5 are satisfied,

$$\lim_{n \to \infty} \Pr\left( T^\theta_n = 1 \right) \le \lim_{n \to \infty} \Pr\left( \theta \notin \hat{C}^\theta_n(\alpha) \right) = \alpha.$$

To test whether the jump distribution is purely discrete, purely absolutely continuous or a mixture of the two we propose $T^d_n := 1\{1 \notin \hat{C}^p_n(\alpha)\}$, $T^{ac}_n := 1\{0 \notin \hat{C}^p_n(\alpha)\}$ and $T^{d+ac}_n := 1 - T^d_n T^{ac}_n$. If the assumptions of Theorem 3.3.5b are satisfied, these tests have asymptotic level $\alpha$ under their respective null hypotheses and the third test further satisfies that, whenever $p \in (0,1)$,

$$\lim_{n \to \infty} \Pr\left( T^{d+ac}_n = 1 \right) \le \lim_{n \to \infty} \Pr\left( T^d_n = 0 \right) + \lim_{n \to \infty} \Pr\left( T^{ac}_n = 0 \right) = 0.$$

Combinations of these tests provide tests for several parameters at once due to Theorem 3.3.5. For instance, to check whether a data set comes from a compound Poisson process whose jump distribution and intensity belong to some sets $S^F$ and $S^\lambda$, the test $T^{F,\lambda}_n := 1 - (1 - T^F_n)(1 - T^\lambda_n)$ can be used.

We now propose two ways to test whether the defining parameters of the compound Poisson process are equal to some values $\widetilde{F}$, $\tilde{\lambda}$ and $\tilde{\gamma}$. Define

$$D^{1)}_n := \sqrt{n} \left\| \hat{F}_n - \widetilde{F} + \hat{\lambda}_n - \tilde{\lambda} + (\hat{\gamma}_n - \tilde{\gamma}) \right\|_\infty = \sqrt{n} \left\| \big( \hat{F}_n - F \big) + \big( F - \widetilde{F} \big) + \big( \hat{\lambda}_n - \lambda \big) + \big( \lambda - \tilde{\lambda} \big) + \big( \hat{\gamma}_n - \gamma \big) + \big( \gamma - \tilde{\gamma} \big) \right\|_\infty$$

and

$$D^{2)}_n := \sqrt{n} \left\| \hat{N}_n - \tilde{\lambda}\widetilde{F} + (\hat{\gamma}_n - \tilde{\gamma}) \right\|_\infty = \sqrt{n} \left\| \big( \hat{N}_n - \lambda F \big) + \big( \lambda F - \tilde{\lambda}\widetilde{F} \big) + \big( \hat{\gamma}_n - \gamma \big) + \big( \gamma - \tilde{\gamma} \big) \right\|_\infty.$$


Then, if the assumptions of Theorem 3.3.5a are satisfied and under the null hypothesis $H_0 : \|F - \widetilde{F}\|_\infty + |\lambda - \tilde{\lambda}| + |\gamma - \tilde{\gamma}| = 0$, we have that $D^{1)}_n \to \|\mathbb{L}^{F,\lambda}\|_\infty$ and $D^{2)}_n \to \|\mathbb{B}^N\|_\infty$ in law as $n \to \infty$. In view of the arguments included at the beginning of the section, the tests $T^{1)}_n := 1\{D^{1)}_n > \hat{Q}^{F,\lambda}_n(\alpha)\}$ and $T^{2)}_n := 1\{D^{2)}_n > \hat{Q}^N_n(\alpha)\}$ have asymptotic level $\alpha$. In practice, we should choose the test whose underlying limiting process has the smallest covariance structure. This strongly depends on the value of $\lambda$, so we have to rely on its estimator to decide. It should also be possible to develop tests that improve upon any of these two by allowing for more general linear combinations of the quantities in $D^{1)}_n$ and $D^{2)}_n$, and varying the general coefficients. When generalising $D^{1)}_n$, the resulting tests have the unlikely drawback that under an alternative to $H_0$ they may have asymptotic level $\alpha$. This is because in the expressions for $D^{1)}_n$ disagreements between the underlying and the proposed one-dimensional parameters may cancel out. Varying the coefficients in the linear combinations solves this limitation.
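As an illustration, the first of the two tests can be sketched as follows on a grid of $t$-values. This is a toy version under our own naming; the quantile is assumed to be supplied by the Studentised Monte Carlo procedure of Section 4.2.1:

```python
import numpy as np

def joint_gof_test(F_hat, F0, lam_hat, lam0, gam_hat, gam0, n, q_alpha):
    # D_n^{1)}: sup-norm of F_hat - F0 + (lam_hat - lam0) + (gam_hat - gam0)
    # over the grid, rescaled by sqrt(n); reject H0 when it exceeds the
    # estimated quantile q_alpha.
    D = np.sqrt(n) * np.max(np.abs(F_hat - F0 + (lam_hat - lam0) + (gam_hat - gam0)))
    return int(D > q_alpha)  # 1 = reject H0
```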

4.2.4 Two-sample tests

Throughout let $Z^{1)} := Z^{1)}_1, \ldots, Z^{1)}_m$ and $Z^{2)} := Z^{2)}_1, \ldots, Z^{2)}_n$ be two sets of observed increments from two independent compound Poisson processes. We denote their respective parameter $\theta \in \{\lambda, \gamma\} \cup \{p_j : j \in \mathbb{Z}\setminus\{0\}\}$ or $\Theta \in \{F, N\}$ and its estimator $\hat{\theta}$ or $\hat{\Theta}$ with the corresponding superscripts $1)$ and $2)$. Define

$$D^\theta_{m,n} := \sqrt{\frac{mn}{m+n}} \left| \hat{\theta}^{1)}_m - \hat{\theta}^{2)}_n \right| \qquad \text{and} \qquad D^\Theta_{m,n} := \sqrt{\frac{mn}{m+n}} \left\| \hat{\Theta}^{1)}_m - \hat{\Theta}^{2)}_n \right\|_\infty,$$

and, writing $\rho_{m,n} := m/(m+n)$, notice that

$$\sqrt{\frac{mn}{m+n}} \left( \hat{\theta}^{1)}_m - \hat{\theta}^{2)}_n \right) = \sqrt{1 - \rho_{m,n}}\,\sqrt{m}\left( \hat{\theta}^{1)}_m - \theta^{1)} \right) - \sqrt{\rho_{m,n}}\,\sqrt{n}\left( \hat{\theta}^{2)}_n - \theta^{2)} \right) + \sqrt{\frac{mn}{m+n}} \left( \theta^{1)} - \theta^{2)} \right),$$

and that the same relationship holds for any $\Theta$. Then, under the null hypothesis $H_0 : \theta^{1)} = \theta^{2)} = \theta$ or $\Theta^{1)} = \Theta^{2)} = \Theta$ and the corresponding assumptions of Theorem 3.3.5, we have that $D^\theta_{m,n} \stackrel{d}{\to} \sigma_\theta |N(0,1)|$ and $D^\Theta_{m,n} \stackrel{d}{\to} \|\mathbb{L}^\Theta\|_\infty$ as $m, n \to \infty$, where $\mathbb{L}^\Theta$ is $\mathbb{G}^F$ if $\Theta = F$ and $\mathbb{B}^N$ if $\Theta = N$. Define $\hat{\sigma}^2_{\theta,m,n} := \left( 1 - \rho_{m,n} \right) \hat{\sigma}^2_{\theta^{1)},m} + \rho_{m,n}\,\hat{\sigma}^2_{\theta^{2)},n}$ and, in line with the definition of $\hat{Q}^\Theta_n(\alpha)$, let $\hat{Q}^\Theta_{m,n}(\alpha)$ be a consistent estimator of the upper $\alpha$-quantile of the supremum norm of the centred Gaussian process with covariance structure

$$\Sigma^\Theta_{s,t,m,n} := \left( 1 - \rho_{m,n} \right) \Sigma^{\Theta^{1)}}_{s,t,m} + \rho_{m,n}\,\Sigma^{\Theta^{2)}}_{s,t,n}, \qquad s, t \in \mathbb{R}.$$

Then, we accept the respective null hypothesis if the test

$$T^\theta_{m,n} := 1\left\{ D^\theta_{m,n} > \hat{\sigma}_{\theta,m,n}\, Q^N(\alpha) \right\} \qquad \text{or} \qquad T^\Theta_{m,n} := 1\left\{ D^\Theta_{m,n} > \hat{Q}^\Theta_{m,n}(\alpha) \right\}$$

is zero and reject it otherwise. These tests have asymptotic level $\alpha$ and, under an alternative to $H_0$, $\Pr(T^\theta_{m,n} = 1), \Pr(T^\Theta_{m,n} = 1) \to 1$ as $m, n \to \infty$ due to $D^\theta_{m,n}, D^\Theta_{m,n} \to \infty$ as $m, n \to \infty$. Two-sample tests for several parameters, and hence for the process as a whole, can be derived using the same arguments. For the latter we propose the following two. Define

$$D^{1)}_{m,n} := \sqrt{\frac{mn}{m+n}} \left\| \hat{F}^{1)}_m - \hat{F}^{2)}_n + \hat{\lambda}^{1)}_m - \hat{\lambda}^{2)}_n + \left( \hat{\gamma}^{1)}_m - \hat{\gamma}^{2)}_n \right) \right\|_\infty$$

and

$$D^{2)}_{m,n} := \sqrt{\frac{mn}{m+n}} \left\| \hat{N}^{1)}_m - \hat{N}^{2)}_n + \left( \hat{\gamma}^{1)}_m - \hat{\gamma}^{2)}_n \right) \right\|_\infty.$$

Then, under the null hypothesis $H_0 : \varphi^{1)} = \varphi^{2)} = \varphi$ and the assumptions of Theorem 3.3.5a, we have that $D^{1)}_{m,n} \stackrel{d}{\to} \|\mathbb{L}^{F,\lambda}\|_\infty$ and $D^{2)}_{m,n} \stackrel{d}{\to} \|\mathbb{B}^N\|_\infty$ as $m, n \to \infty$. Furthermore, let $\hat{Q}^{F,\lambda}_{m,n}(\alpha)$ be a consistent estimator of the upper $\alpha$-quantile of the supremum norm of the centred Gaussian process with covariance structure

$$\Sigma^{F,\lambda}_{s,t,m,n} := \left( 1 - \rho_{m,n} \right) \Sigma^{F^{1)},\lambda^{1)}}_{s,t,m} + \rho_{m,n}\,\Sigma^{F^{2)},\lambda^{2)}}_{s,t,n}, \qquad s, t \in \mathbb{R}.$$

Then we propose the tests with asymptotic level $\alpha$

$$T^{1)}_{m,n} := 1\left\{ D^{1)}_{m,n} > \hat{Q}^{F,\lambda}_{m,n}(\alpha) \right\} \qquad \text{and} \qquad T^{2)}_{m,n} := 1\left\{ D^{2)}_{m,n} > \hat{Q}^N_{m,n}(\alpha) \right\}.$$

Finally, we remark that under an alternative to $H_0$, $\Pr(T^{2)}_{m,n} = 1) \to 1$ as $m, n \to \infty$ due to $D^{2)}_{m,n} \to \infty$ as $m, n \to \infty$. The same asymptotic behaviour is obtained for $T^{1)}_{m,n}$ except when $\lambda^{1)} + \gamma^{1)} = \lambda^{2)} + \gamma^{2)}$, in which case it can also be attained if we allow for more general linear combinations in the expression for $D^{1)}_{m,n}$. A simple trick would be to compute the test again with $\big(\hat{\lambda}^{2)}_n - \hat{\lambda}^{1)}_m\big)$ in place of $\big(\hat{\lambda}^{1)}_m - \hat{\lambda}^{2)}_n\big)$.
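The one-dimensional two-sample test takes only a few lines in practice. The following sketch uses our own naming, with the pooled variance $\hat{\sigma}^2_{\theta,m,n}$ as defined above:

```python
import numpy as np

def two_sample_test(theta1_hat, theta2_hat, sig1_sq, sig2_sq, m, n, Q_N_alpha):
    # D^theta_{m,n} compared against the pooled Studentised threshold,
    # with rho_{m,n} = m / (m + n).
    rho = m / (m + n)
    D = np.sqrt(m * n / (m + n)) * abs(theta1_hat - theta2_hat)
    sigma_pooled = np.sqrt((1 - rho) * sig1_sq + rho * sig2_sq)
    return int(D > sigma_pooled * Q_N_alpha)  # 1 = reject H0
```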

4.2.5 Approximations for λ∆ sufficiently small

The variance $\sigma^2_L$ in Theorem 3.3.5 and its particular cases are not very intuitive at first sight as a consequence of the inverse nature of the problem under the low-frequency observation regime. However, if $\Delta = \Delta_n \to 0$ with $n\Delta_n \to \infty$ as $n \to \infty$, the inverse nature vanishes because the jumps are observed directly and its expression simplifies. Lemma 3.3.2 quantified the simplification and it allowed us to give more insight into our estimators. Since all the statistical applications discussed so far depend heavily on $\sigma^2_L$, here we explore their simplifications when $\Delta$, or rather $\lambda\Delta$, is sufficiently small. If the jump distribution has no discrete component, some of these provide process-free procedures.


In view of Lemma 3.3.2 and the expression for $\sigma^2_L$ in (3.3.3), it follows that

$$\sigma^2_L = \frac{\Delta}{\lambda} \int_{\mathbb{R}} \left( f^{(L)}(x) \right)^2 \mathrm{d}F(x) + O\left( \lambda\Delta \right), \qquad s, t \in \mathbb{R}. \qquad (4.2.3)$$

Simple algebra shows this expression provides the approximations to the variances we introduced in the discussions of Section 3.3. We refer the reader to that section for more details. The approximations of the difference of a one-dimensional parameter and its estimator rescaled by $\sqrt{n}$ therein are then justified because convergence of the variances guarantees convergence in distribution to centred normal random variables. In the case of $N$ and $F$, the approximations of their differences follow because the terms in the big-$O$ notation of the lemma can be bounded by others depending on the supremum norm of $f^{(L)}$ and, moreover, $\sup_{t \in \mathbb{R}} \|f_t\|_\infty \vee \|F(t)\|_\infty < \infty$. Furthermore, note that the $\Delta/\lambda$ factor in the last display indicates these differences should be further rescaled by $\sqrt{\Delta/\lambda}$. When dealing with $F$, $p_j$ or $p$, their covariance and variances carry an extra $\lambda^{-2}$ term, so the rescaling for these quantities should be $\sqrt{\lambda\Delta}$ instead. We justify this as follows. The quantity $\lambda\Delta_n$ is the expected number of jumps the compound Poisson process has given in the $\Delta_n$ observation window. Therefore, when $\Delta$ is small enough, this product is approximately the total number of non-zero increments we observe. Due to our estimators being efficient, we expect those for $F$, $p_j$ and $p$ to disregard the remaining increments, and this justifies the rescaling in their differences with their respective estimators. For the rest of the parameters except for $\gamma$, which is of no interest in this remark, a $\lambda$ factor is implicitly included in their expressions and their estimators, thus justifying the rescaling by $\sqrt{\Delta/\lambda}$ instead. When the jump distribution of the compound Poisson process is absolutely continuous, $p = 0$, the approximations to the limits can be further simplified as follows.
In particular, expression (4.2.3) implies that $\sqrt{\lambda\Delta n}\,\big(\hat{F}_n - F\big)$ converges in law to a Gaussian process with covariance structure approximately equal to $F(\min\{s,t\}) - F(s)F(t)$. This agrees with the classical Donsker theorem and, consequently, if $p = 0$ we propose to substitute the quantiles $\hat{Q}^F_n$ and $\hat{Q}^F_{m,n}$ by that of $(\lambda\Delta)^{-1/2} \sup_{r \in [0,1]} |G(r)|$, where $G$ is a standard Brownian bridge. Similarly, and under the respective null hypotheses, $\sqrt{\lambda\Delta}\,D^{1)}_n$, $\sqrt{\lambda\Delta}\,D^{1)}_{m,n}$ and $\sqrt{\Delta/\lambda}\,D^{2)}_n$, $\sqrt{\Delta/\lambda}\,D^{2)}_{m,n}$ converge in law to the supremum norm of two Gaussian processes whose covariances can be approximated by $F(\min\{s,t\}) - F(s)F(t) + \lambda^2$ and $F(\min\{s,t\})$, respectively. As a result, if $p = 0$ we propose to substitute the quantiles $\hat{Q}^{F,\lambda}_n$, $\hat{Q}^{F,\lambda}_{m,n}$ and $\hat{Q}^N_n$, $\hat{Q}^N_{m,n}$ by those of $(\lambda\Delta)^{-1/2} \sup_{r \in [0,1]} |G(r) + B(\sigma^2)|$ and $\sqrt{\lambda/\Delta}\, \sup_{r \in [0,1]} |B(r)|$, respectively, where $B$ is a standard Brownian motion. When using these approximations for the applications in Sections 4.2.2 and 4.2.3, $\lambda$ can be estimated either by $\hat{\lambda}_n$ or simply by the proportion of non-zero increments we observe divided by $\Delta$. Similarly, when using them for the applications in Section 4.2.4, $\lambda$ can be estimated either by $\hat{\lambda}_{m,n} := (1 - \rho_{m,n})\hat{\lambda}^{1)}_m + \rho_{m,n}\hat{\lambda}^{2)}_n$ or simply by the $\rho_{m,n}$-weighted average of the proportions of non-zero increments observed in each independent compound Poisson process divided by $\Delta$.
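The process-free quantile for the band when $p = 0$ can be approximated by simulating discretised Brownian bridges. The following is a sketch under our own naming, not the thesis implementation:

```python
import numpy as np

def bridge_sup_quantile(lam, Delta, alpha, n_paths=5000, n_grid=200, seed=0):
    # Upper-alpha quantile of (lam*Delta)^(-1/2) * sup_{r in [0,1]} |G(r)|,
    # with G a standard Brownian bridge built as B(r) - r * B(1).
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_grid
    W = np.cumsum(np.sqrt(dt) * rng.standard_normal((n_paths, n_grid)), axis=1)
    r = np.arange(1, n_grid + 1) * dt
    bridge = W - r * W[:, -1:]
    sups = np.abs(bridge).max(axis=1)
    return float(np.quantile(sups, 1.0 - alpha)) / np.sqrt(lam * Delta)
```

With $\lambda\Delta = 1$ and $\alpha = 0.05$ the value should be close to the Kolmogorov quantile $\approx 1.36$.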

4.3 Implementation

In this section we include details of the implementation of the estimators of the compound Poisson process parameters in Coca [2015] and Buchmann and Grübel [2003], and of the limiting variances and covariances therein. Note that all the statistical procedures constructed in Sections 4.2.2, 4.2.3, 4.2.4 and 4.2.5 make use only of these quantities, so their implementation follows as mentioned in them. From now on we take $\Delta = 1$ without loss of generality, as otherwise it can be absorbed into the value of $\lambda$.

4.3.1 Spectral approach-based estimators

In order to implement the estimators introduced in Chapter 3 and in Section 4.2.1, a kernel function $K$ satisfying the assumptions in (3.2.1) has to be chosen. Throughout we take it to be such that $\mathcal{F}K(u) = (1 - u^2)^2\, 1_{[-1,1]}(u)$, $u \in \mathbb{R}$. Its explicit expression is not used in the implementation so we do not include it here. It is related to the Bessel functions of the first kind and their properties guarantee $K$ satisfies the desired assumptions. A similar kernel function is considered in van Es et al. [2007], and in Wand [1998] in the related deconvolution problem, who also emphasise that the choice of $K$ is not the most important factor in the performance of the estimator.

Recall the empirical characteristic function $\varphi_n(u) := \frac{1}{n}\sum_{k=1}^n e^{iuZ_k}$, $u \in \mathbb{R}$, and note that any estimator of Section 3.2 is of the form $\int_{\mathbb{R}} f_n(x)\,\mathcal{F}^{-1}\left[\operatorname{Log}(\varphi_n)\,\mathcal{F}K_{h_n}\right](x)\,\mathrm{d}x$. All of these can be more generally written as

$$T_n = T_n(f_n, g_n) := \int_{\mathbb{R}} f_n(x)\,\mathcal{F}^{-1}g_n(x)\,\mathrm{d}x = \int_{\mathbb{R}} g_n(u)\,\mathcal{F}^{-1}f_n(u)\,\mathrm{d}u, \qquad (4.3.1)$$

where $f_n$ is compactly supported and bounded, and $g_n$ satisfies $\operatorname{supp}(g_n) \subseteq [-h_n^{-1}, h_n^{-1}]$ and $g_n(-\cdot) = \overline{g_n}$, where $\overline{\,\cdot\,}$ denotes the complex conjugate, and it is bounded on sets of probability approaching 1 as $n \to \infty$. Hence, $f_n, g_n \in L^2(\mathbb{R})$ on such sets and $T_n$ is well-defined there. This general estimator can be approximated numerically by discretising the last integral: for any $\eta > 0$ and $M \in \mathbb{N}$ such that $\eta M = h_n^{-1}$ to accommodate the support of $g_n$, define

$$\hat{T}_n = \hat{T}_n(f_n, g_n) := \sum_{j=-M}^{M} \omega_j\, g_n(j\eta)\,\mathcal{F}^{-1}f_n(j\eta),$$

where $(\omega_j)_{j=-M}^{M}$ are the discretisation weights. We choose the weights according to composite Simpson's rule, i.e.

$$\omega_j = \frac{\eta}{3} \times \begin{cases} 1 & \text{if } j = -M, M, \\ 4 & \text{if } j = -M+1, -M+3, \ldots, M-1, \\ 2 & \text{if } j = -M+2, -M+4, \ldots, M-2. \end{cases}$$
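The weight vector is a one-liner to build and easy to sanity-check against a smooth integrand. The following is a sketch of ours using the standard composite Simpson assignment (endpoints 1, odd interior nodes 4, even interior nodes 2), not the thesis code:

```python
import numpy as np

def simpson_weights(M, eta):
    # Composite Simpson weights on the grid j*eta, j = -M, ..., M
    # (2M subintervals): endpoints get 1, odd interior nodes 4,
    # even interior nodes 2, all scaled by eta / 3.
    w = np.full(2 * M + 1, 2.0)
    w[1::2] = 4.0
    w[0] = w[-1] = 1.0
    return (eta / 3.0) * w
```

For instance, summing the weights against $\cos$ on $[-M\eta, M\eta]$ reproduces the integral $2\sin(M\eta)$ to high accuracy.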

Note that any $f_n$ from Section 3.2 except for $f_n^{(\gamma)}$ can be written as a linear combination of indicators of open or closed intervals with lower and upper limits $a_n$ and $b_n$, respectively, satisfying that $\operatorname{sign}(a_n) = \operatorname{sign}(b_n)$. If $f_n$ is such an indicator,

$$\hat{T}_n = \frac{b_n - a_n}{2\pi}\,\omega_0\, g_n(0) + \frac{i}{2\pi\eta} \sum_{0 < |j| \le M} \frac{\omega_j}{j}\, g_n(j\eta)\left( e^{-i b_n j\eta} - e^{-i a_n j\eta} \right) = \frac{b_n - a_n}{2\pi}\,\omega_0\, g_n(0) + \frac{1}{\pi\eta} \sum_{j=1}^{M} \frac{\omega_j}{j}\, \operatorname{Im}\left( g_n(j\eta)\left( e^{-i a_n j\eta} - e^{-i b_n j\eta} \right) \right), \qquad (4.3.2)$$

where the last equality is justified because $g_n(-\cdot)\,\frac{e^{ix\cdot}}{-\cdot} = -\overline{g_n(\cdot)\,\frac{e^{-ix\cdot}}{\cdot}}$. The last expression avoids some computations and round-off errors, returns strictly real values and can be implemented directly. However, to further speed up the computations we approximate its value with the aid of the fast Fourier transform, which was popularised by Cooley and Tukey [1965] and, in its simplest form, was already described by C. F. Gauss in 1805. For this we introduce the following notation and impose some additional assumptions on $M$, $\eta$, $\varepsilon_n$ and $H_n$: for any $v := (v_j)_{j=0}^{M} \in \mathbb{R}^{M+1}$ and $l = 0, \ldots, M$, define

$$DF[v](l) := \sum_{j=0}^{M} v_j\, e^{-i\frac{2\pi}{M+1}jl} \qquad \text{and} \qquad DF^{-1}[v](l) := \frac{1}{M+1} \sum_{j=0}^{M} v_j\, e^{i\frac{2\pi}{M+1}jl}.$$
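These transforms coincide with the unnormalised and normalised discrete Fourier transforms, so standard FFT routines compute them directly; e.g. in numpy:

```python
import numpy as np

def DF(v):
    # DF[v](l) = sum_j v_j * exp(-2*pi*i*j*l / (M+1)); numpy's forward FFT.
    return np.fft.fft(v)

def DF_inv(v):
    # DF^{-1}[v](l) = (M+1)^{-1} * sum_j v_j * exp(+2*pi*i*j*l / (M+1));
    # numpy's inverse FFT, which includes the (M+1)^{-1} normalisation.
    return np.fft.ifft(v)
```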

Let $v$ be such that $v_0 = 0$ and $v_j = \omega_j\, \frac{g_n(j\eta)}{j}$ for $j = 1, \ldots, M$, and, to equate the exponents of the last display to those in (4.3.2), let $a_n$ and $b_n$ belong to a symmetric grid of $2M + 1$ elements that we now specify. Note that for the indicators we are considering, $\varepsilon_n \le |a_n|, |b_n| \le H_n$, and recall the $\pm\varepsilon_n$-intervals around each jump in the definition of $f_{\cdot,n}^{(N)}$ in (3.2.6). Then we take the grid to be $\varepsilon_n$-equispaced and assume without loss of generality that $H_n = \varepsilon_n M$. Writing $l_x := [x/\varepsilon_n]$ for any $x \in \mathbb{R}$, where $[\cdot]$ denotes the closest-integer function, and assuming $\eta\varepsilon_n = \frac{2\pi}{M+1}$, we have $\hat{T}_n(f_n, g_n) \approx \hat{\hat{T}}_n(f_n, g_n)$, where for the $f_n$ we are considering

$$\hat{\hat{T}}_n := \frac{b_n - a_n}{2\pi}\,\omega_0\, g_n(0) + \frac{1}{\pi\eta} \times \begin{cases} (M+1)\operatorname{Im}\left( DF^{-1}[v](-l_{a_n}) - DF^{-1}[v](-l_{b_n}) \right) & \text{if } a_n \le b_n < 0, \\ \operatorname{Im}\left( DF[v](l_{a_n}) - DF[v](l_{b_n}) \right) & \text{if } 0 < a_n \le b_n. \end{cases}$$

The transformations $DF$ and $DF^{-1}$ can be computed in $O(M\log(M))$ operations by a fast Fourier transform type algorithm. In particular, the well-known radix-2 Cooley–Tukey algorithm can be used if we assume $M + 1$ is a power of 2. Hence, for a given bandwidth $h_n$ we need $M$, $\eta$, $\varepsilon_n$ and $H_n$ to satisfy that

$$\eta M = h_n^{-1}, \qquad H_n = \varepsilon_n M, \qquad \eta\varepsilon_n = \frac{2\pi}{M+1} \qquad \text{and} \qquad \log_2(M+1) \in \mathbb{N}. \qquad (4.3.3)$$
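One way to pick the four quantities jointly from a given bandwidth is sketched below. This is a heuristic of ours (the thesis's concrete choices appear at the end of the section); the default power of two is an arbitrary assumption:

```python
import numpy as np

def fft_parameters(h_n, M_plus_1=2 ** 14):
    # Satisfy (4.3.3): eta*M = 1/h_n, H_n = eps_n*M,
    # eta*eps_n = 2*pi/(M+1), and M + 1 a power of two.
    M = M_plus_1 - 1
    eta = 1.0 / (h_n * M)
    eps = 2.0 * np.pi / (eta * (M + 1))
    H = eps * M
    return M, eta, eps, H
```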

At the end of the section we discuss the choices of these parameters but prior to that we give the implementation of all the estimators.

If $g_n = \operatorname{Log}(\varphi_n)\,\mathcal{F}K_{h_n}$, each can be approximated by

$$\hat{\hat{\lambda}}_n := \hat{\hat{T}}_n\left( 1_{[-H_n, -\varepsilon_n]}, g_n \right) + \hat{\hat{T}}_n\left( 1_{[\varepsilon_n, H_n]}, g_n \right),$$
$$\hat{\hat{q}}_{j,n} := \hat{\hat{T}}_n\left( 1_{(\varepsilon_n(l_{J_j}-1),\, \varepsilon_n(l_{J_j}+1))}, g_n \right), \qquad j \in \mathbb{Z}\setminus\{0\},$$
$$\hat{\hat{q}}_n := \sum_{\substack{|j| \le H_n/\varepsilon_n \\ j \ne 0}} \hat{\hat{q}}_{j,n},$$
$$\hat{\hat{N}}_n(t) := 1_{t \ge -H_n}\,\hat{\hat{T}}_n\left( 1_{[-H_n,\, \varepsilon_n \min\{l_t, -1\}]}, g_n \right) + 1_{t \ge \varepsilon_n}\,\hat{\hat{T}}_n\left( 1_{[\varepsilon_n,\, \varepsilon_n \min\{l_t, M\}]}, g_n \right),$$
$$\hat{\hat{F}}_n := \hat{\hat{\lambda}}_n^{-1}\hat{\hat{N}}_n, \qquad \hat{\hat{p}}_{j,n} := \hat{\hat{\lambda}}_n^{-1}\hat{\hat{q}}_{j,n}, \quad j \in \mathbb{Z}\setminus\{0\}, \qquad \text{and} \qquad \hat{\hat{p}}_n := \hat{\hat{\lambda}}_n^{-1}\hat{\hat{q}}_n.$$

By similar arguments to those employed above, and noting that for $f_n = c^{-1}\,(\cdot)\,1_{(-h_n, h_n)}$, where $(\cdot)$ denotes the identity,

$$\mathcal{F}^{-1}f_n(u) = \frac{i}{c\, u\, \pi}\left( h_n\cos(u h_n) - \frac{\sin(u h_n)}{u} \right),$$

$\hat{\gamma}_n$ can be approximated by

$$\hat{\hat{\gamma}}_n := -2\sum_{j=1}^{M} \omega_j\, \operatorname{Im}(g_n(j\eta))\, \operatorname{Im}(\mathcal{F}^{-1}f_n(j\eta)) = \frac{2}{c\,\pi\,\eta}\sum_{j=1}^{M} \frac{\omega_j}{j}\, \arctan\left( \frac{\operatorname{Im}(\varphi_n(j\eta))}{\operatorname{Re}(\varphi_n(j\eta))} \right) \mathcal{F}K_{h_n}(j\eta) \left( \frac{\sin(j\eta h_n)}{j\eta} - h_n\cos(j\eta h_n) \right),$$

where $c := 2\left( \int_0^1 K - K(1) \right) \approx 0.015637923740278$ for the choice of kernel mentioned above. Note that in the implementation of $\hat{\gamma}_n$ we have chosen not to compute a fast Fourier transform. The reason is that, effectively, we are only interested in one value of the discrete Fourier transform, and it is therefore more costly to compute the whole vector than to directly compute the desired element of it.

Some remarks about these approximations to the estimators are due. Note that the expression for $\hat{\hat{N}}_n$ is based on (3.2.3) instead of on (3.2.5). This is justified by the fact that we are defining $\hat{\hat{N}}_n$ on a grid of $[-H_n, -\varepsilon_n] \cup [\varepsilon_n, H_n]$ and in practice this acts as the discretisation implied by $\hat{f}_{t,n}^{(N)}$ defined in (3.2.6). Similarly, in the definition of $\hat{\hat{q}}_{j,n}$,


and consequently in those of $\hat{\hat{q}}_n$, $\hat{\hat{p}}_{j,n}$ and $\hat{\hat{p}}_n$ too, centring the integration interval around the desired jump is not necessary and therefore we do not impose this. In the following sections we discuss in more detail the choice of the grid when there is a non-zero discrete jump component. Lastly, notice that in the definition of $\hat{\hat{q}}_n$ we have summed up to $H_n$ instead of up to $\widetilde{H}_n$. This is not important given that in practice we are concerned with the finite-sample properties rather than with the asymptotic ones.

It remains to discuss the implementation of the variance and covariance estimators in Section 4.2.1. Any of these involves quantities of the form

$$\left( f_n * \mathcal{F}^{-1}\left[ \varphi_n^{-1}(-\cdot)\,\mathcal{F}K_{h_n} \right] \right)(Z) = \int_{\mathbb{R}} f_n(x)\,\mathcal{F}^{-1}\left[ e^{iZ\cdot}\,\varphi_n^{-1}\,\mathcal{F}K_{h_n} \right](x)\,\mathrm{d}x,$$

where the equality is justified by the symmetry of $K$. The functions $f_n$ and $e^{iZ\cdot}\varphi_n^{-1}\mathcal{F}K_{h_n}$ enjoy the same properties as $f_n$ and $g_n$ in the definition of $T_n$ in (4.3.1). Furthermore, all the $f_n$ under consideration are linear combinations of the same type of indicators as above. Therefore, if $g_n(Z) = g_n(\cdot, Z) = e^{iZ\cdot}\varphi_n^{-1}\mathcal{F}K_{h_n}$ and using the same arguments as before, we approximate the estimators of Section 4.2.1 by

$$\hat{\hat{\sigma}}^2_{\lambda,n} := \frac{1}{n}\sum_{k=1}^n \left( \hat{\hat{T}}_n\left( 1_{[-H_n,-\varepsilon_n]}, g_n(Z_k) \right) + \hat{\hat{T}}_n\left( 1_{[\varepsilon_n,H_n]}, g_n(Z_k) \right) \right)^2,$$

$$\hat{\hat{\sigma}}^2_{p_j,n} := \frac{1}{\hat{\hat{\lambda}}_n^2}\frac{1}{n}\sum_{k=1}^n \left( \hat{\hat{T}}_n\left( 1_{(\varepsilon_n(l_{J_j}-1),\,\varepsilon_n(l_{J_j}+1))}, g_n(Z_k) \right) - \hat{\hat{p}}_{j,n}\left( \hat{\hat{T}}_n\left( 1_{[-H_n,-\varepsilon_n]}, g_n(Z_k) \right) + \hat{\hat{T}}_n\left( 1_{[\varepsilon_n,H_n]}, g_n(Z_k) \right) \right) \right)^2,$$

$$\hat{\hat{\sigma}}^2_{p,n} := \frac{1}{\hat{\hat{\lambda}}_n^2}\frac{1}{n}\sum_{k=1}^n \left( \sum_{\substack{|j| \le H_n/\varepsilon_n \\ j \ne 0}} \hat{\hat{T}}_n\left( 1_{(\varepsilon_n(l_{J_j}-1),\,\varepsilon_n(l_{J_j}+1))}, g_n(Z_k) \right) - \hat{\hat{p}}_n\left( \hat{\hat{T}}_n\left( 1_{[-H_n,-\varepsilon_n]}, g_n(Z_k) \right) + \hat{\hat{T}}_n\left( 1_{[\varepsilon_n,H_n]}, g_n(Z_k) \right) \right) \right)^2,$$

and, for all $s, t \in \mathbb{R}$,

$$\hat{\hat{\Sigma}}^F_{s,t,n} := \frac{1}{\hat{\hat{\lambda}}_n^2}\frac{1}{n}\sum_{k=1}^n \left( 1_{s \ge -H_n}\,\hat{\hat{T}}_n\left( 1_{[-H_n,\,\varepsilon_n\min\{l_s,-1\}]}, g_n(Z_k) \right) + 1_{s \ge \varepsilon_n}\,\hat{\hat{T}}_n\left( 1_{[\varepsilon_n,\,\varepsilon_n\min\{l_s,M\}]}, g_n(Z_k) \right) - \hat{\hat{F}}_n(s)\left( \hat{\hat{T}}_n\left( 1_{[-H_n,-\varepsilon_n]}, g_n(Z_k) \right) + \hat{\hat{T}}_n\left( 1_{[\varepsilon_n,H_n]}, g_n(Z_k) \right) \right) \right) \times \left( 1_{t \ge -H_n}\,\hat{\hat{T}}_n\left( 1_{[-H_n,\,\varepsilon_n\min\{l_t,-1\}]}, g_n(Z_k) \right) + 1_{t \ge \varepsilon_n}\,\hat{\hat{T}}_n\left( 1_{[\varepsilon_n,\,\varepsilon_n\min\{l_t,M\}]}, g_n(Z_k) \right) - \hat{\hat{F}}_n(t)\left( \hat{\hat{T}}_n\left( 1_{[-H_n,-\varepsilon_n]}, g_n(Z_k) \right) + \hat{\hat{T}}_n\left( 1_{[\varepsilon_n,H_n]}, g_n(Z_k) \right) \right) \right),$$


$$\hat{\hat{\Sigma}}^N_{s,t,n} := \frac{1}{n}\sum_{k=1}^n \left( 1_{s \ge -H_n}\,\hat{\hat{T}}_n\left( 1_{[-H_n,\,\varepsilon_n\min\{l_s,-1\}]}, g_n(Z_k) \right) + 1_{s \ge \varepsilon_n}\,\hat{\hat{T}}_n\left( 1_{[\varepsilon_n,\,\varepsilon_n\min\{l_s,M\}]}, g_n(Z_k) \right) \right) \times \left( 1_{t \ge -H_n}\,\hat{\hat{T}}_n\left( 1_{[-H_n,\,\varepsilon_n\min\{l_t,-1\}]}, g_n(Z_k) \right) + 1_{t \ge \varepsilon_n}\,\hat{\hat{T}}_n\left( 1_{[\varepsilon_n,\,\varepsilon_n\min\{l_t,M\}]}, g_n(Z_k) \right) \right)$$

and

$$\hat{\hat{\Sigma}}^{F,\lambda}_{s,t,n} := \frac{1}{\hat{\hat{\lambda}}_n^2}\frac{1}{n}\sum_{k=1}^n \left( 1_{s \ge -H_n}\,\hat{\hat{T}}_n\left( 1_{[-H_n,\,\varepsilon_n\min\{l_s,-1\}]}, g_n(Z_k) \right) + 1_{s \ge \varepsilon_n}\,\hat{\hat{T}}_n\left( 1_{[\varepsilon_n,\,\varepsilon_n\min\{l_s,M\}]}, g_n(Z_k) \right) + \left( \hat{\hat{\lambda}}_n - \hat{\hat{F}}_n(s) \right)\left( \hat{\hat{T}}_n\left( 1_{[-H_n,-\varepsilon_n]}, g_n(Z_k) \right) + \hat{\hat{T}}_n\left( 1_{[\varepsilon_n,H_n]}, g_n(Z_k) \right) \right) \right) \times \left( 1_{t \ge -H_n}\,\hat{\hat{T}}_n\left( 1_{[-H_n,\,\varepsilon_n\min\{l_t,-1\}]}, g_n(Z_k) \right) + 1_{t \ge \varepsilon_n}\,\hat{\hat{T}}_n\left( 1_{[\varepsilon_n,\,\varepsilon_n\min\{l_t,M\}]}, g_n(Z_k) \right) + \left( \hat{\hat{\lambda}}_n - \hat{\hat{F}}_n(t) \right)\left( \hat{\hat{T}}_n\left( 1_{[-H_n,-\varepsilon_n]}, g_n(Z_k) \right) + \hat{\hat{T}}_n\left( 1_{[\varepsilon_n,H_n]}, g_n(Z_k) \right) \right) \right).$$

Despite the seemingly complicated implementation of these estimators, it can be written in a few lines of code. Notice that, instead of applying the discrete Fourier transform to a vector $v$ determined by $g_n(Z) = e^{iZ\cdot}\varphi_n^{-1}\mathcal{F}K_{h_n}$, we can apply it to a vector $v$ with $g_n(a_n, b_n) = \left( e^{ib_n\cdot} - e^{ia_n\cdot} \right)\varphi_n^{-1}\mathcal{F}K_{h_n}$. This results in notationally simpler expressions and avoids restricting the computation of the quantities above to an $\varepsilon_n$-equally spaced grid of $[-H_n, -\varepsilon_n] \cup [\varepsilon_n, H_n]$. In turn, it requires discretising the values of the observed increments $Z_k$, $k = 1, \ldots, n$, and hence it changes the fundamental quantities driving the final estimates. We opt not to do this and choose the approximate estimators above instead.

The use of these estimators is to approximate the quantiles of the absolute value of the limiting normal distributions of Section 3.3 and of the supremum of the absolute value of the limiting Gaussian processes of the same section. For $\alpha \in (0,1)$, typically $\alpha = 0.05$, and writing $\hat{\hat{\sigma}}_{\theta,n}$ for $\theta \in \{\lambda, p\} \cup \{p_j : j \in \mathbb{Z}\setminus\{0\}\}$, in the following we approximate the former quantiles by sampling 5000 independent standard normal distributions, ordering their absolute values increasingly, selecting the element in the $[5000(1-\alpha)]$-th position and multiplying it by $\hat{\hat{\sigma}}_{\theta,n}$. For each of the infinite-dimensional quantities we simulate 5000 paths of the respective approximate Gaussian process, order the maxima of the absolute values of the paths increasingly and select the element in the $[5000(1-\alpha)]$-th position. To simulate each (discrete) path, we first compute the covariance matrix of the discretised Gaussian process using the respective quantity $\hat{\hat{\Sigma}}^\Theta_{s,t,n}$ from above, where $\Theta \in \{F, N, (F,\lambda)\}$. Then, in the case of $\Theta = N$, we perform its (lower) Cholesky factorisation and multiply the resulting matrix by a column vector of independent standard normal distributions. This is justified because, as mentioned in Section 2.6, the limiting process is of Brownian motion type.
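The quantile approximation for the Brownian-motion-type case ($\Theta = N$) can be sketched as follows. The naming is ours, and the small jitter added to the covariance for numerical stability is our assumption rather than part of the thesis recipe:

```python
import numpy as np

def sup_quantile_from_cov(Sigma, alpha, n_paths=5000, seed=0):
    # Simulate paths of a centred Gaussian vector with covariance Sigma via
    # its lower Cholesky factor, then return the empirical upper-alpha
    # quantile of the sup-norm of the simulated paths.
    rng = np.random.default_rng(seed)
    d = len(Sigma)
    L = np.linalg.cholesky(np.asarray(Sigma) + 1e-12 * np.eye(d))
    paths = rng.standard_normal((n_paths, d)) @ L.T
    sups = np.abs(paths).max(axis=1)
    return float(np.sort(sups)[int(np.ceil(n_paths * (1.0 - alpha))) - 1])
```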
In the other cases we not only have to force the initial point of the path to be zero but also the final point. Thus, we proceed similarly to the standard interpolation method to sample approximate paths of a Brownian bridge (cf. pp. 82–86 of Glasserman [2004]): set

the values at $-H_n$ and $H_n$ to zero and sample two zero-mean normal distributions with variances $\hat{\hat{\Sigma}}^\Theta_{-H_n+\varepsilon_n,\,-H_n+\varepsilon_n,\,n}$ and $\hat{\hat{\Sigma}}^\Theta_{H_n-\varepsilon_n,\,H_n-\varepsilon_n,\,n}$. Take these to be the values of the path at $-H_n+\varepsilon_n$ and $H_n-\varepsilon_n$ and obtain the value at the (approximate) mid-point by sampling from the resulting conditional distribution. The mid-point now becomes an end-point and we can do this iteratively with the subsequent mid-points until the path is computed for every point of the grid. Note that, for a particular mid-point $s$ and interpolating end-points $r$ and $t$, $-H_n+\varepsilon_n \le r < s < t \le H_n-\varepsilon_n$, the conditional distribution is simply given by

$$\hat{\Theta}_n(s) \,\Big|\, \left( \hat{\Theta}_n(r), \hat{\Theta}_n(t) \right) \sim N\!\left( \left( \hat{\hat{\Sigma}}^\Theta_{s,r,n},\, \hat{\hat{\Sigma}}^\Theta_{s,t,n} \right)\left( \hat{\hat{\Sigma}}^\Theta_{-s,n} \right)^{-1}\begin{pmatrix} \hat{\Theta}_n(r) \\ \hat{\Theta}_n(t) \end{pmatrix},\;\; \hat{\hat{\Sigma}}^\Theta_{s,s,n} - \left( \hat{\hat{\Sigma}}^\Theta_{s,r,n},\, \hat{\hat{\Sigma}}^\Theta_{s,t,n} \right)\left( \hat{\hat{\Sigma}}^\Theta_{-s,n} \right)^{-1}\begin{pmatrix} \hat{\hat{\Sigma}}^\Theta_{r,s,n} \\ \hat{\hat{\Sigma}}^\Theta_{t,s,n} \end{pmatrix} \right),$$

where, with some abuse of notation,

    Σ̂^Θ_{−s,n} := [ Σ̂^Θ_{r,r,n}   Σ̂^Θ_{r,t,n}
                     Σ̂^Θ_{t,r,n}   Σ̂^Θ_{t,t,n} ].
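The iterative mid-point construction can be sketched generically as follows, assuming an estimated covariance function cov(s, t) is available on the grid; the function name, the recursion and the discretisation are illustrative only.

```python
import numpy as np

def sample_bridge_path(grid, cov, rng=None):
    """Sample one approximate path of a centred Gaussian process pinned to
    zero at both end-points of `grid` (length >= 4), by iterative mid-point
    conditioning on the two interpolating end-points."""
    rng = np.random.default_rng() if rng is None else rng
    m = len(grid)
    path = np.full(m, np.nan)
    path[0] = path[-1] = 0.0  # force both end-points to zero
    # Sample the two points next to the ends from their marginal laws.
    path[1] = np.sqrt(cov(grid[1], grid[1])) * rng.standard_normal()
    path[-2] = np.sqrt(cov(grid[-2], grid[-2])) * rng.standard_normal()

    def fill(i, j):
        # Condition the mid-point between known indices i < j on its ends.
        if j - i < 2:
            return
        k = (i + j) // 2
        r, s, t = grid[i], grid[k], grid[j]
        S = np.array([[cov(r, r), cov(r, t)], [cov(t, r), cov(t, t)]])
        c = np.array([cov(s, r), cov(s, t)])
        w = np.linalg.solve(S, c)            # regression weights
        mean = w @ np.array([path[i], path[j]])
        var = max(cov(s, s) - w @ c, 0.0)    # clip tiny negative values
        path[k] = mean + np.sqrt(var) * rng.standard_normal()
        fill(i, k)
        fill(k, j)

    fill(1, m - 2)
    return path
```

The recursion is exactly the Brownian-bridge-style interpolation described above: each newly sampled mid-point becomes an end-point for the next level.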

All these implementations apply to an individual θ ∈ {λ, p} ∪ {p_j : j ∈ ℤ∖{0}} or Θ ∈ {F, N, (F, λ)}. However, if we want to implement several of them together, we should introduce the dependence structure given in Theorem 3.3.5. For instance, when performing simulations to construct procedures such as confidence regions for F and λ simultaneously, we first construct them for the former as just mentioned, but for the latter we need to simulate its limiting normal distribution conditional on all the simulated values of F. The penultimate display readily generalises to conditional distributions with more than two conditioning values (cf. p. 65 in Glasserman [2004]) and this generalisation is what we use to construct the confidence interval for λ. Into it we have to input the limiting covariance between the estimator of λ and that of F(t), with t in the grid considered above. This is simply given by

    (1/(n λ̂̂_n)) Σ_{k=1}^{n} ( T̂_n(1_{[−H_n,−ε_n]}, g_n(Z_k)) + T̂_n(1_{[ε_n,H_n]}, g_n(Z_k)) )
        × ( 1_{t ≥ −H_n} T̂_n(1_{[−H_n, ε_n min{l_t,−1}]}, g_n(Z_k)) + 1_{t ≥ ε_n} T̂_n(1_{[ε_n, ε_n min{l_t,M}]}, g_n(Z_k))
            − F̂̂_n(t) ( T̂_n(1_{[−H_n,−ε_n]}, g_n(Z_k)) + T̂_n(1_{[ε_n,H_n]}, g_n(Z_k)) ) ).
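The conditioning on more than two values mentioned above is the standard Gaussian conditioning (Schur-complement) formula; a generic sketch, with illustrative names, is the following.

```python
import numpy as np

def condition_gaussian(mu, Sigma, idx_obs, y_obs):
    """Conditional mean and covariance of a Gaussian vector with mean `mu` and
    covariance `Sigma`, given that the components in `idx_obs` equal `y_obs`.
    Generalises the two-point mid-point rule to many conditioning values."""
    n = Sigma.shape[0]
    idx_free = [i for i in range(n) if i not in set(idx_obs)]
    S_oo = Sigma[np.ix_(idx_obs, idx_obs)]   # observed block
    S_fo = Sigma[np.ix_(idx_free, idx_obs)]  # cross block
    S_ff = Sigma[np.ix_(idx_free, idx_free)] # free block
    W = S_fo @ np.linalg.inv(S_oo)           # regression weights
    mu = np.asarray(mu, dtype=float)
    cond_mean = mu[idx_free] + W @ (np.asarray(y_obs, dtype=float) - mu[idx_obs])
    cond_cov = S_ff - W @ S_fo.T
    return idx_free, cond_mean, cond_cov
```

In our setting the free component would be the λ-coordinate and the observed components the simulated values of F on the grid.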

In the same way, one combines the brackets in the expressions for σ̂²_{θ,n} and Σ̂^Θ_{s,t,n} above

to make joint inference on other parameters, including the mass at some of the potential atoms of the jump measure.

Lastly, we discuss the choices of the (hyper)parameters in (4.3.3), which apply to the implementation of all the estimators presented so far. After numerous simulations we have found that η = 5³/2¹⁰ ≈ 0.1221, M = (η h_n)^{−1}, ε_n = 2π h_n M/(M + 1) ≈ 2π h_n and

H_n = ε_n M, where h_n ∈ [0.0005, 0.001], generally provide reasonable results in terms of accuracy and computational times. We note in passing that η should be fixed first so that it does not change with other parameters, which could otherwise result in considerably wrong or slow estimates. When taking h_n beyond the lower or upper limit of [0.0005, 0.001], the implementations become too slow or start losing accuracy, respectively. The choice of the bandwidth returning the best approximation and coverage values depends on the sample size n and on the true underlying parameters F and λ. Thus, this is discussed further in Section 4.4, where we propose a purely data-driven objective function to choose it. For now we mention that in general we take h_n = 0.0005 except when n and λ are relatively small, in which case we take h_n = 0.001. Despite the former being the preferred choice, we remark that the latter does not generally lead to a large deterioration of the results and can therefore be chosen in most cases for the sake of speed of computation.

When computing the estimates above, we generally do not take the intervals to go down to −H_n = −ε_n M or up to H_n = ε_n M but truncate them to smaller values: H_n tends to be too large in comparison with the support of most of the mass of the objective function being estimated and worsens the estimates. Under the positivity assumption on the support of dF that Buchmann and Grübel [2003] make, they propose to take the left limit to be zero and the right limit to be max_k Z_k, i.e. equal to the largest observation. In their setting this is a sensible choice because, given the fully nonparametric approach they take, the data cannot inform us of the distribution any farther than that, and in such cases we make the same choice. However, in the general setting for which our estimators are developed it is not sensible, because even the largest observation may arise from sums of jumps of different signs.
When F has support on both sides of the origin we take the right and left limits to be max_k Z_k + 0.2(max_k Z_k − min_k Z_k) and min_k Z_k − 0.2(max_k Z_k − min_k Z_k), respectively. We have found this choice to perform well and we propose it as a rule of thumb when the support of F is not known a priori. We emphasise that H_n does not appear in the relationships arising from the fast Fourier transform, so this rather arbitrary choice of it does not affect the radix-2 Cooley–Tukey algorithm.
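The hyperparameter relationships and the truncation rule of thumb are easily scripted; a sketch follows, with names of our own choosing.

```python
import math

def grid_hyperparameters(h_n):
    """Grid quantities derived from the bandwidth h_n, following the choices
    eta = 5**3/2**10, M = (eta*h_n)**(-1), eps_n = 2*pi*h_n*M/(M + 1) and
    H_n = eps_n*M.  M is rounded to the nearest integer grid size."""
    eta = 5**3 / 2**10                       # ~0.1221, fixed first
    M = round(1 / (eta * h_n))
    eps_n = 2 * math.pi * h_n * M / (M + 1)  # ~2*pi*h_n
    H_n = eps_n * M
    return eta, M, eps_n, H_n

def truncation_limits(Z):
    """Rule of thumb when the support of F is not known a priori: extend the
    observed range of the increments by 20% on each side."""
    lo, hi = min(Z), max(Z)
    return lo - 0.2 * (hi - lo), hi + 0.2 * (hi - lo)
```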

4.3.2 Direct approach-based estimators

In this section we include the implementation of the estimators introduced in Buchmann and Grübel [2003]; those in Buchmann and Grübel [2004] follow directly from it. Recall that they assume γ = 0 and supp(F) ⊆ ℝ⁺, and that they propose two estimators: the first makes inference on the probability mass function of a discrete jump distribution with


atoms on ℕ and the second is developed for general distributions. We introduce their implementation in reverse order as we only claim originality for the second. Recall from Section 2.1.2 that this second estimator in Buchmann and Grübel [2003] is

    F̃_n(t) := Λ(G̃_{0,n})(t) := Σ_{i=1}^{∞} (−1)^{i+1} (e^{iλ}/(λ i)) G̃_{0,n}^{*i}(t),   t ∈ ℝ,

where

    G̃_{0,n}(t) := (1/n) Σ_{k=1}^{n} 1_{[0,t]}(Z_k) − q̃_{0,n} = (1/n) Σ_{k=1}^{n} 1_{(0,t]}(Z_k),   t ∈ ℝ⁺.
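As an illustration, the truncated series can be computed by iterating discrete self-convolutions of the empirical law of the non-zero increments on a uniform grid; the rounding of the increments to the grid below is an illustrative simplification of ours, and λ is assumed known as in Buchmann and Grübel [2003].

```python
import numpy as np

def decompound_direct(Z, lam, eps, I=101):
    """Truncated alternating-series estimator
    F_tilde(t) = sum_{i=1}^I (-1)**(i+1) * exp(i*lam)/(lam*i) * G0^{*i}(t)
    evaluated on the grid {0, eps, 2*eps, ...} up to max(Z).  G0 is the
    empirical law of the non-zero increments, discretised to the grid."""
    Z = np.asarray(Z, dtype=float)
    m = int(np.ceil(Z.max() / eps)) + 1
    g0 = np.zeros(m)                               # pmf of G0 on the grid
    for z in Z[Z > 0]:
        g0[min(int(round(z / eps)), m - 1)] += 1.0 / len(Z)
    F = np.zeros(m)
    conv = np.array([1.0] + [0.0] * (m - 1))       # G0^{*0} = delta_0
    for i in range(1, I + 1):
        conv = np.convolve(conv, g0)[:m]           # pmf of G0^{*i} on the grid
        F += (-1) ** (i + 1) * np.exp(i * lam) / (lam * i) * np.cumsum(conv)
    return F                                       # F_tilde on the grid
```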

This is much simpler to implement than our estimator: note that G̃_{0,n}^{*i} is (1 − q̃_{0,n})^i times the empirical distribution of the sums of the elements of every i-tuple of the non-zero observations. Therefore we sum up all such combinations and compute the resulting function on the grid we mentioned at the end of the previous section, i.e. [0, max_k Z_k] split into intervals of length ε_n. We choose this increment as above to ensure a fair comparison between the estimators, and remark that this choice is not as important when implementing

F̃_n as it is when doing so for F̂_n, as no integration is required here. In addition, we need to truncate the infinite sum in its definition and we take it to go up to I = 101. This is quite a large value but, as we show in Section 4.4, the fluctuations of F̃_n away from the origin may be very large and the larger I is, the better they are controlled. We emphasise that it should be chosen to be an odd number, as otherwise the leading term in the sum is negative and so is the implementation of F̃_n. Lastly, we recall that Buchmann and Grübel [2003] assumed knowledge of λ and so do we when implementing it. We tried inputting the naive estimator λ̌_n, but when it is larger than the true λ it makes the sum fluctuate even further and, in practice, the estimation may deteriorate considerably.

To construct the statistical procedures of Section 4.2, we use the 'Studentisation' approach introduced in Section 4.2.1 for the same reasons mentioned therein. Therefore, to find an estimator of the covariance function of the limiting processes in Buchmann and Grübel [2003] we define

    H̃_n(t) := (1/λ) Σ_{i=1}^{I} (−1)^{i+1} e^{iλ} G̃_{0,n}^{*(i−1)}(t),   t ∈ ℝ⁺,

where G̃_{0,n}^{*(i−1)} and I are as mentioned above. Additionally, define H̃_{0,n} := H̃_n − e^{λ}/λ. Then, the estimator we propose is

    Σ̃_{s,t,n} := (1/n) Σ_{k=1}^{n} H̃_n(s − Z_k) H̃_n(t − Z_k) 1_{(0, s∧t]}(Z_k) − e^{2λ} H̃_{0,n}(s) H̃_{0,n}(t),   s, t ∈ ℝ⁺.
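A matching sketch for H̃_n and the covariance estimator, under the same illustrative grid discretisation as before (all names are ours):

```python
import numpy as np

def cov_estimator_direct(Z, lam, eps, I=101):
    """Sketch of Sigma_tilde_{s,t,n} on the grid {0, eps, ...}: H_tilde is the
    truncated series (1/lam)*sum_{i=1}^I (-1)**(i+1)*exp(i*lam)*G0^{*(i-1)},
    and H0_tilde = H_tilde - exp(lam)/lam."""
    Z = np.asarray(Z, dtype=float)
    m = int(np.ceil(Z.max() / eps)) + 1
    g0 = np.zeros(m)                           # pmf of G0 on the grid
    for z in Z[Z > 0]:
        g0[min(int(round(z / eps)), m - 1)] += 1.0 / len(Z)
    H = np.zeros(m)
    conv = np.array([1.0] + [0.0] * (m - 1))   # G0^{*0} = delta_0
    for i in range(1, I + 1):
        H += (-1) ** (i + 1) * np.exp(i * lam) / lam * np.cumsum(conv)
        conv = np.convolve(conv, g0)[:m]       # advance to G0^{*i}
    H0 = H - np.exp(lam) / lam
    idxZ = np.minimum(np.round(Z / eps).astype(int), m - 1)
    Sigma = -np.exp(2 * lam) * np.outer(H0, H0)
    for k, z in enumerate(Z):
        if z <= 0:
            continue                           # indicator 1_{(0, s^t]}(Z_k)
        j = idxZ[k]
        Hk = np.zeros(m)
        Hk[j:] = H[: m - j]                    # H_tilde(s - Z_k) for s >= Z_k
        Sigma += np.outer(Hk, Hk) / len(Z)
    return Sigma
```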

Hence, we can approximate the limiting Gaussian process B^F in (2.1.12) by the centred Gaussian process with this covariance structure. Recall that, as discussed in Section 2.6,


B^F is of Brownian motion type. Consequently, we can simulate paths of the Gaussian process with covariance Σ̃_{s,t,n} with the simple Cholesky factorisation strategy proposed in the previous section. We remark that not all the procedures from Section 4.2 can be developed from the results in Buchmann and Grübel [2003], because they assume knowledge of λ, so the only uncertainty is in the parameter F. Furthermore, the norm ‖·‖_∞ used in Section 4.2.1 should be substituted by the norm ‖·‖_{∞,τ} introduced in Section 2.1.2. Hence, it remains to choose a value of τ. As suggested by Bøgsted and Pitts [2010], and in line with expression (2.1.13) and our discussions in Section 2.1.2, we take

    τ̃_n := min{ τ : 10τ ∈ ℕ and ∫_{ℝ⁺} e^{−τx} dF̃_n(x) < (log 2)/λ }.

Note that the results in Buchmann and Grübel [2003] hold for any τ satisfying the non-empirical counterpart of the second condition in this display. Therefore, the choice of the grid in the first condition is not very important as long as it results in a large enough value of τ̃_n. Moreover, as we show in Section 4.4, in all experiments τ̃_n overestimates the minimum value of τ and no issues arise from this calculation.

Buchmann and Grübel [2003] and Buchmann and Grübel [2004] included details of the implementation of their recursive estimators and of the covariances of the limiting processes. We next include the implementation of that of Buchmann and Grübel [2003]; the implementation of the latter follows by the same arguments. We therefore claim no originality in what follows. Let w_{j,n} := (1/n) Σ_{k=1}^{n} 1_{{j}}(Z_k) be the mass the empirical distribution gives to j ∈ ℕ and assume w_{0,n} > 0. Then, they estimate the parameters λ and p_j by

    λ̃_n := −log w_{0,n}   and   p̃_{j,n} := (1/w_{0,n}) ( w_{j,n}/λ̃_n − (1/j) Σ_{l=1}^{j−1} l p̃_{l,n} w_{j−l,n} ),   j ∈ ℕ.

As mentioned at the end of Section 2.1.2, these expressions are obtained by inverting the distribution function of Z_1, given in terms of the so-called Panjer recursion, and plugging in the empirical values of the mass function of Z_1. They are recursive in nature and therefore

Buchmann and Grübel [2003] stop their computation at the largest observation. If w_{0,n} = 0 they set λ̃_n = 1, p̃_{1,n} = 1 and p̃_{j,n} = 0 for j > 1, so that the estimators are well-defined in the space ℝ × ℓ₁. Due to Pr(w_{0,n} = 0) = (1 − e^{−λ})^n and λ > 0, this rather arbitrary definition does not affect the asymptotic nature of their theoretical results. However, for finite samples, and in particular in simulations, this is undesirable. When simulating these estimators we simply discard samples with no zero-increments and, in practice, we suggest using λ̂̂_n in place of λ̃_n so that the p̃_{j,n} are well-defined.

Buchmann and Grübel [2003] show joint convergence in distribution of their √n-rescaled and centred estimators above to a centred random element (ξ, (B_j)_{j∈ℕ}) taking


values in ℝ × ℓ₁. Its dependence structure can be implemented as follows, assuming w_{0,n} > 0. Let

    r_{0,n} := 1/w_{0,n}   and   r_{j,n} := −r_{0,n} Σ_{l=1}^{j} w_{l,n} r_{j−l,n},   j ∈ ℕ.

Then E[ξ²], E[ξB_j] and E[B_jB_l] can be respectively approximated by

    σ̃²_{λ,n} := r_{0,n} − 1,   σ̃²_{λ,p_j,n} := (1/λ) ( p̃_{j,n} − r_{j,n} − p̃_{j,n} r_{0,n} ),   j ∈ ℕ,

and, for j, l ∈ ℕ with 1 ≤ j ≤ l,

    σ̃²_{p_j,p_l,n} := (1/λ²) ( p̃_{j,n} r_{l,n} + p̃_{l,n} r_{j,n} + p̃_{j,n} p̃_{l,n} r_{0,n} − p̃_{j,n} p̃_{l,n} + Σ_{m=0}^{j} r_{m,n} r_{m+l−j,n} w_{j−m,n} ).   (4.3.4)

Note that, even though the limiting random element here does coincide with ours and is therefore of Brownian bridge type, it is not forced to be so for finite samples. Therefore we can simulate paths of it analogously to above using Cholesky's factorisation method, recalling that we stop the path at the largest observation, as suggested by Buchmann and Grübel [2003]. By the continuous mapping theorem, we can then construct the procedures of Section 4.2 using either the ℓ₁ norm for all the parameters jointly or the absolute value for each of them individually.
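The two recursions above can be sketched as follows, with w playing the role of the empirical masses w_{j,n}; this is illustrative code of ours, not the original implementation.

```python
import math

def panjer_decompound(w, J):
    """Recursive estimators from the empirical masses w[j] = w_{j,n}:
    lam = -log(w[0]) and
    p[j] = (1/w[0]) * (w[j]/lam - (1/j) * sum_{l=1}^{j-1} l*p[l]*w[j-l]).
    Requires w[0] > 0; w is indexed 0..J."""
    assert w[0] > 0
    lam = -math.log(w[0])
    p = [0.0] * (J + 1)
    for j in range(1, J + 1):
        s = sum(l * p[l] * w[j - l] for l in range(1, j))
        p[j] = (w[j] / lam - s / j) / w[0]
    return lam, p

def reciprocal_series(w, J):
    """r[0] = 1/w[0] and r[j] = -r[0] * sum_{l=1}^{j} w[l]*r[j-l], i.e. the
    coefficients of the formal reciprocal of the power series with
    coefficients w, as used in the covariance approximations (4.3.4)."""
    r = [0.0] * (J + 1)
    r[0] = 1.0 / w[0]
    for j in range(1, J + 1):
        r[j] = -r[0] * sum(w[l] * r[j - l] for l in range(1, j + 1))
    return r
```

As a sanity check, feeding in the exact Poisson masses w_j = e^{−λ}λ^j/j! (all jumps of size one) recovers λ and the degenerate mass function.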

4.4 Simulated illustrations

In this section we focus on illustrating the behaviour and precision of the estimators of F in Buchmann and Grübel [2003] and Coca [2015], together with the behaviour and coverage of the corresponding confidence regions, all in different settings. We limit the study to these two estimators and confidence regions not only because they have not yet been implemented in the literature, but also because they contain all the features we have mentioned so far in this thesis: among others, consequences of the efficiency and of the lack of it, and divergence of the series defining the estimators of the direct approach. In Section 4.4.1 we motivate the choice of the different inverse settings we consider, we provide some practical recommendations for the implementation of the estimators and, last, we explain the format in which we present the data. In Sections 4.4.2, 4.4.3 and 4.4.4 we analyse the results of the simulations in the different settings. In Section 4.4.5 we draw some final conclusions and remarks and, based on these, in Section 4.4.6 we propose several topics that seem worth investigating further.


4.4.1 Preliminary remarks

Throughout the section we simulate the ∆ = 1 increments of a zero-drift compound Poisson process with intensity λ > 0 and jump distribution

    F(x) = (4/5)(1 − e^{−10x}) 1_{[0,∞)}(x) + (1/5) 1_{[1/2,∞)}(x),   x ∈ ℝ.   (4.4.1)

I.e., F is the weighted sum of an exponential distribution with parameter 10 and an atomic distribution at 1/2, with weights 4/5 and 1/5, respectively. This distribution is supported in ℝ⁺, it contains both an absolutely continuous component and a discrete component, and the decay of the tail of its associated measure is exponential. Hence, it allows us to illustrate several aspects of the estimators of Buchmann and Grübel [2003] and Coca [2015], including the estimation they provide for each component and the need for an exponentially downweighted norm ‖·‖_{∞,τ}, τ > 0, for the convergence of the former. We do not include more examples as, instead, we choose to analyse the properties of this one in detail, thus giving deeper insight into the practical behaviour of the estimators.
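The increments used throughout the section can be simulated directly from the definition of a compound Poisson process with jump distribution (4.4.1); a sketch (the function name and random generator handling are ours):

```python
import numpy as np

def simulate_increments(n, lam, rng=None):
    """Simulate n unit-time (Delta = 1) increments of a zero-drift compound
    Poisson process with intensity lam and jump distribution (4.4.1):
    Exp(10) with probability 4/5, an atom at 1/2 with probability 1/5."""
    rng = np.random.default_rng() if rng is None else rng
    Z = np.empty(n)
    for k in range(n):
        m = rng.poisson(lam)                  # number of jumps in (k, k+1]
        is_atom = rng.random(m) < 0.2         # which jumps hit the atom at 1/2
        jumps = np.where(is_atom, 0.5, rng.exponential(1 / 10, m))
        Z[k] = jumps.sum()                    # observed increment
    return Z
```

Note that the increments are non-negative and that, for moderate λ, a positive fraction of them is exactly zero, which is what the naive estimator of λ exploits.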

4.4.1.1 Division of Section 4.4

Typically, in applications where compound Poisson processes are used, the number of discrete observations does not exceed several hundred. However, existing literature on estimation of the jump distribution and density has only illustrated the results for several if not many thousands of observations. Here, we choose to run our simulations with more realistic sample sizes in order to add real practical value. In practice, the difficulty of the estimation problem is driven by the intensity λ, which represents the expected number of jumps per unit of time or, equivalently due to Δ = 1, per observation increment. Consequently, the first step is to determine for what values the precision of the estimators is acceptable when the sample size n is of a few hundred. The first obvious division is between λ < 1 and λ ≥ 1: in the former case we expect to observe the jumps directly on many occasions, whilst in the latter we do not. Therefore, when λ < 1 we are approximately in the simple setting of [nλ] i.i.d. observations of F and we can really check the small-sample behaviour by taking n ∈ {100, 400} and λ ∈ {0.2, 0.5}. When λ ≥ 1 the issue of precision is more subtle and it is easier to start discussing it for the estimation of λ. Recall that, for the example presented above in (4.4.1), all the estimators of λ introduced in Chapter 2 satisfy asymptotic normality with limiting variance e^λ − 1.

Thus, the relative error of any estimator λ̇_n satisfies

    (λ̇_n − λ)/λ ≈ √( (e^λ − 1)/(nλ²) ) · N(0, 1),   (4.4.2)

and, for n ∈ {100, 250, 400, 1000, 4000} and λ ∈ {0.2, 0.5, 1, 2, 3, 4, 5}, with 95% probability its absolute value is less than or equal to the quantities in Table 4.1. We have included the

values λ ∈ {0.2, 0.5} and n = 4000 for illustrative purposes and because we refer to them later in the section.

           λ = 0.2   λ = 0.5   λ = 1   λ = 2   λ = 3   λ = 4   λ = 5
n = 100     46.1      31.6      25.7    24.8    28.5    35.9    47.6
n = 250     29.2      20.0      16.2    15.7    18.1    22.7    30.1
n = 400     23.1      15.8      12.8    12.4    14.3    17.9    23.8
n = 1000    14.6      10.0       8.1     7.8     9.0    11.3    15.1
n = 4000     7.3       5.0       4.1     3.9     4.5     5.7     7.5

Table 4.1: Approximate 0.05-quantiles, expressed in %, of the absolute value of the relative errors in estimating λ with a sample of size n.
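The entries of Table 4.1 follow directly from the normal approximation (4.4.2); a quick check:

```python
import math

def rel_error_quantile(n, lam, z=1.959964):
    """Approximate upper 0.05-quantile (in %) of the absolute relative error
    in estimating lam from a sample of size n, using (4.4.2); z is the
    0.975-quantile of the standard normal."""
    return 100 * z * math.sqrt((math.exp(lam) - 1) / (n * lam**2))

# e.g. rel_error_quantile(100, 1) is approximately 25.7, matching Table 4.1
```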

The first observation we make is that the (relative) precision is best around λ = 2 and it deteriorates as λ departs from it. The improvement for small values of λ comes from the effective number of non-zero increments increasing with λ; note that, to estimate the intensity, these observations are just as important as the zero-increments, if not more so, as they inform us of the length of the interarrival times. The behaviour for λ > 2 is expected because the larger the intensity, the less likely it is to have at least one zero-increment and hence the harder it is to estimate it. At λ = 2 the expected ratio of zero and non-zero increments is best for estimating the intensity. To attain the different precision values in the table we do not require any decompounding, which is intuitive because no perfect cancellations of jumps can arise in this example. The situation for estimating F is quite different, as the probability of having at least one observation in the sample that is the sum of more than one jump is strictly positive for any λ > 0. Therefore we always have to decompound and it may seem that, as λ increases, the estimation problem gets harder. However, in the following sections, and in other examples not included here, we observe a behaviour in the precision for estimating F similar to that for λ in Table 4.1. For estimation of F, the precision is highest around λ = 1, which really shows the ability of the estimators of Buchmann and Grübel [2003] and of Coca [2015] to decompound: decompounding when λ ∈ (0, 1) is not too hard, since we expect most increments to carry at most one jump; when λ increases within that interval, we have more such observations and therefore the precision improves. As we see in Section 4.4.4, the empirical distribution function does not have this property, as it does not have the ability to decompound.
When λ > 1, however, the difficulty of decompounding dominates over the higher effective number of non-zero increments and the precision deteriorates. In short, a bit, but not too much, of inverse character improves the precision of the estimates, which to our knowledge has not yet been reported in the literature. We explore this further in Section 4.4.5. From Table 4.1, it follows that if we set a reasonable limit of 20%, we should take n ∈ {250, 1000} when λ ∈ {1, 2, 3} and n ∈ {1000, 4000} when λ = 5. The case λ = 4 is somewhere in between the two and we choose to include it in the former for reasons

having to do with the precision of the estimate of F that we mention in Section 4.4.5. This analysis gives rise to the division into 1 ≤ λ ≲ 5, λ ≳ 5 and λ < 1 of Sections 4.4.2, 4.4.3 and 4.4.4, respectively. In view of Table 4.1, we are choosing a symmetric range in the precision of the estimators of λ. As we see below, this 'symmetry' does not extend to other properties of the estimators. The splitting at 5 is only an approximate one and depends on the complexity of the underlying F. Nonetheless, for these sample sizes, it also seems to approximately hold for other simulation experiments we have performed. The choice of n = 4000 when λ = 5 may seem to contradict our motivation to take moderate sample sizes. However, as illustrated in Section 4.4.3, for such an intensity large sample sizes are required for the estimation errors to vanish and start achieving some reasonable precision. Existing literature takes λ ≤ 3, generally focusing on λ ≤ 1, so Section 4.4.3 has interest in itself by showing the limit at which our estimators begin to require large sample sizes to work. Furthermore, in view of Table 2.1 in Chapter 2, this choice (and the others) guarantees that the likelihood of not observing a zero-increment in a sample of size n is not considerable. This is required for the naive estimator of λ to return a non-trivial value and, indeed, all our simulations in subsequent sections only include samples with at least one zero-increment. We emphasise that this is not necessary for our estimators of Chapter 3 to work. Lastly, we remark that, within each choice of λ, we have chosen two values of n, one being 4 times larger than the other. Therefore, the errors and confidence regions should decrease by half from one to the other. Yet, this does not always happen, and it allows us to identify for what regimes the bias and stochastic errors in the estimators are not negligible, giving further insight into them.

4.4.1.2 Remarks on the practical implementation

We first recall a couple of remarks made in previous sections that we take into account in our implementations and would like to emphasise. In Section 4.3.1, we mentioned that the implementation of the estimator of N, and hence of F, can be safely based on its expression for the absolutely continuous case given in Chapter 3 and not on its modification to accommodate a potential discrete component. This is due to the discretisation of the integral appearing in it, and hence it can also be interpreted as the implementation of the distribution function when we postulate that all jumps are discrete with potential support in ε_n (ℤ∖{0}). In the figures of the following sections we observe that this strategy works perfectly well to estimate the absolutely continuous component and to identify the location of the atomic component without prior knowledge of their support, unlike what is assumed in the theoretical results of Chapter 3. In theory, all the estimators make use of the distinguished logarithm but, as remarked in Chapter 2, for λΔ = λ < π we can safely use the principal branch of the logarithm

if there is no drift in the process. This was justified by the upper bound on the imaginary part of the exponent in the characteristic function of the increments ϕ = ϕ_1 in (1.1.5). Therefore, the condition λ < π is by no means necessary for the correct behaviour of the estimators implemented using the principal branch of the logarithm. Indeed, in all simulations with λ ≤ 5 we observe that the only noticeable difference between using one or the other logarithm comes from the computational times, with the implementation of the distinguished logarithm being considerably more expensive. Although not included here for the reasons mentioned above, we remark that if γ ≠ 0 and/or λ is relatively larger than 5, the distinguished logarithm should be used to avoid well-known winding issues (cf. Embrechts et al. [1993] and Grübel and Hermesmeier [1999]). This does not depend on the sample size. Unlike in our theoretical results of Chapter 3, in practice our estimators do not seem to be robust to drifts |γ| > 0.05. Therefore we suggest first estimating γ with our estimator using the distinguished logarithm (or with the naive one, depending on the prior knowledge of F, as argued in Section 2.4.2) and translating the observations accordingly before implementing the rest of the estimators.

At the end of Section 4.3.1 we discussed the choice of all the hyperparameters in the implementation of our estimators. All of these depend solely on the bandwidth h_n, which, as a rule of thumb, we take to be 0.0005 unless λ and n are small, in which case we take h_n = 0.001. Indeed, we make the latter choice only in Section 4.4.4 and remark that in other examples it also seems to be the right choice for larger values of λ, up to 2 or even 3. The implementation with this choice is generally around 4 times faster than that with h_n = 0.0005.
This apparent rule of thumb is justified by minimising a purely data-driven loss function that we have found to work well across different examples and sample sizes. In order to motivate its form, let us denote the implementation of the estimator λ̌_n of Section 2.4 by λ̌̌_n, which follows using the same arguments of Section 4.3.1. Notice that, according to the results of Section 2.4, λ̃_n is a consistent estimator of λ for the F we work with here. Therefore, λ̌_n − λ̃_n is a sum of bias and stochastic errors that are negligible and is explicitly given in expression (2.4.1). In addition, the difference λ̂̂_n − λ̌̌_n, given in (2.4.2), is also a sum of such errors. Crucially, both terms are the errors that the estimator of F carries resulting from the truncations of the tails and around the origin, and they only depend on h_n. Hence, we propose to choose the bandwidth as

    ĥ_n := argmin_{h>0} |λ̂̂_n(h) − λ̌̌_n(h)| + |λ̌̌_n(h) − λ̃_n|.   (4.4.3)

This choice seems to minimise all the estimation errors, including the uniform distance between our estimator of F and the true distribution, and to optimise coverage of all the empirical confidence regions. As observed in the following sections, the naive estimator of the intensity is superior to the spectral estimators, especially when λ is large, which we attribute to the fewer error terms it carries. Thus, one can multiply the second summand

by a constant larger than 1 prior to the optimisation, although we do not explore this further.
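In practice (4.4.3) can only be minimised over a finite grid of candidate bandwidths; a sketch follows, with placeholder callables standing in for the estimators.

```python
def choose_bandwidth(h_grid, lam_hat, lam_check, lam_tilde):
    """Grid-search version of (4.4.3): pick the h in h_grid minimising
    |lam_hat(h) - lam_check(h)| + |lam_check(h) - lam_tilde|.
    `lam_hat` and `lam_check` are callables returning the two spectral
    intensity estimates for bandwidth h; `lam_tilde` is the naive estimate."""
    def loss(h):
        return abs(lam_hat(h) - lam_check(h)) + abs(lam_check(h) - lam_tilde)
    return min(h_grid, key=loss)
```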

Let us remark that, if the practitioner does not have evidence of λ̃_n being consistent in their setting, the second summand above can be removed from the loss function, and the estimators obtained by using the minimiser of the resulting loss function still seem to work well. We make a last remark on the implementation of the estimators of subsequent sections. After numerous simulations we have concluded that using λ̌̌_n instead of λ̂̂_n, both to estimate λ and F, gives more accurate estimates and coverage of the confidence regions. The use of the former, and the fact that efficiency is retained when using it, were justified in Section 2.6, so below we only make use of it. Despite being superior to λ̂̂_n, it still consistently underestimates λ and sometimes does not return satisfactory coverage for λ and for (λ, F) jointly. Therefore, in the following sections we have also included joint estimation of F (constructed with λ̌̌_n) and λ using the naive estimator, which returns more satisfactory values. We can do this because, as mentioned above, all the samples used in the simulations have at least one zero-increment and therefore λ̃_n is always well-defined. Furthermore, using the naive estimator still retains efficiency, as argued in Section 2.6. If in practice one is confronted with a sample with no zero-increments, only the spectral estimators can be used. In such a case, further investigations into improving the implementation of our estimators are worth considering prior to using them. We suggest trying to optimise the choices of the relationships between the hyperparameters made at the end of Section 4.3.1 through simulations with zero-increments using (4.4.3).

4.4.1.3 A guide to read the tables

In the following sections we introduce two types of tables: one to illustrate the estimation errors, and another to show the empirical quantiles and the resulting coverage. For both types, and in all sections, we run 250 independent simulations and average out the results. The entries of the first type of table correspond to the absolute error made by the estimator of the quantity in the corresponding row, measured in the corresponding norm. In particular, for the estimator of F of Buchmann and Grübel [2003] the error is measured in the norm ‖·‖_{∞,τ̃_n}, with τ̃_n as defined in Section 4.3.2, and in brackets we give the error measured in the standard ‖·‖_∞ = ‖·‖_{∞,0} norm. For the quantities τ and λ we can compute the relative error, and this is given in brackets, in %, next to their absolute errors. In Section 4.4.4 the table includes a row with the estimation error made by the empirical distribution function, which we refer to as the naive estimator of F. The entries of the second type of table represent the empirical α = 0.05-quantiles of the appropriate norm of the limiting quantity in the central limit theorem for the parameter in each row. In brackets we include the coverage arising from them. We choose to construct confidence regions of level 0.05 for the whole process by first doing so for F,


and then for λ conditioned on the value of F. We denote the latter by λ | F, and joint estimation is therefore denoted by F & λ | F. The quantiles for λ | F generally return very low coverage, as expected, since those of F already have level 0.05 by construction. The last entries in this second type of table are classified as C+Naive and correspond to making joint estimation of F and λ when using the naive estimator for the latter, as mentioned in the last section. In Section 4.4.4 we include a row for the empirical 0.05-quantiles arising from the empirical distribution function, again referred to as the naive estimator. We considered including joint estimation of it and λ, but the coverage of the former is unsatisfactory and we therefore gain nothing by doing so. The structure of the following sections is generally the same: first, we discuss the performance of the estimators by looking at the figures in order to gain intuition; we then analyse the results of the first type of tables, starting with those for F and then those for λ; after this we analyse the second type of tables in the same order and then discuss joint coverage of the regions of the parameters; the sections end with a short paragraph summarising the conclusions.

4.4.2 Moderate intensity (1 ≤ λ ≲ 5)

Figures 4.1 and 4.2 nicely illustrate most of the properties of the estimators. Those of Coca [2015] tend to mimic the shape of F satisfactorily, relative to the difficulty of the problem (driven by λ) and to the amount of information (driven by the sample size n). In particular, we remark that they are very satisfactory for λ = 1 and n = 1000 (Figure 4.2a), and they seem similar when increasing λ from 1 to 3 and compensating this additional difficulty by increasing n from 250 to 1000 (Figures 4.1a and 4.2b). For λ = 3 and n = 250 (Figure 4.1b) they deteriorate but still pick up the location and value of the atom in some cases. Looking at the estimators of Buchmann and Grübel [2003], we clearly observe the transition between values of λ for which the infinite series in them converges and those for which it does not, relative to the sample sizes. In the former case, the value of τ is so low that usual uniform convergence is guaranteed in the range of the abscissa shown. As a consequence, when λ = 1 the estimator also mimics F well and we clearly observe, on the right-hand side of the graph, the central limit theorem around 1 anticipated in Section 2.6. The graphs for λ = 3 readily show the fluctuations arising from the alternating signs in the infinite sum in the estimator and the lack of convergence of it in the range shown. As expected, this already suggests the superiority of the estimators of Coca [2015] in any situation when it comes to estimation errors in the supremum norm.

Tables 4.2 and 4.3 show that the errors made by both estimators of F are halved as the sample size increases from n = 250 to n = 1000 when λ ∈ {1, 2}. Therefore, the bias and stochastic errors that they carry are already small in these regimes. As anticipated by the figures, the infinite sum in the estimator of Buchmann and Grübel [2003] starts giving


(a) λ = 1 (b) λ = 3

Figure 4.1: Three simulations of the estimators of Buchmann and Grübel [2003] (dashed red) and Coca [2015] (solid blue) for n = 250. True distribution in solid black.

(a) λ = 1 (b) λ = 3

Figure 4.2: Three simulations of the estimators of Buchmann and Grübel [2003] (dashed red) and Coca [2015] (solid blue) for n = 1000. True distribution in solid black.

large fluctuations for λ = 3, and for λ = 4 it breaks down. The errors of our estimator still approximately halve for these intensities and are not yet too large (note that, under the standard supremum norm, relative and absolute errors coincide because ‖F‖_∞ = 1 always). While for our estimator of F the errors seem to increase with λ, this is not the case for the estimator of Buchmann and Grübel [2003] when λ = 2. Note that, as for our estimator, its uniform error (in brackets) hardly increased when compared to that at λ = 1. Therefore the series seems to still be converging, and the increase in the optimal value of τ coming from the increase of the intensity makes the error in the exponentially downweighted norm smaller. For larger values of λ the series diverges and the norm ‖·‖_{∞,τ}

can no longer compensate for it. This parallels the behaviour of the relative errors for λ from Table 4.2, although it does not have the same meaning: the relative errors of the estimator of Buchmann and Grübel [2003] correspond to the absolute errors divided by ‖F‖_{∞,τ} < (log 2)/λ; therefore, they actually grow as λ increases, just as for our estimator. More can be said about these relative errors but we postpone the analysis to Section 4.4.5. We remark that the optimal τ is always overestimated, which is good since the estimator of Buchmann and Grübel [2003] converges under the exponentially downweighted norm for any value larger than the optimal one.

n = 250              λ = 1         λ = 2         λ = 3         λ = 4
B&G     F (ℓ∞)       0.07 (0.10)   0.04 (0.11)   0.11 (0.35)   19.23 (6.96×10^6)
        τ (%)        1.47 (108)    2.35 (41)     4.20 (80)     2.00 (40)
C       F            0.08          0.10          0.13          0.21
        λ (%)        0.01 (-1.09)  0.03 (-1.57)  0.24 (-7.87)  0.90 (-22.5)
Naive   λ (%)        0.01 (1.14)   0.02 (0.97)   0.08 (2.71)   0.14 (3.47)

Table 4.2: Average empirical errors, ‖θ̂n − θ‖θ ((θ̂n − θ)/θ in %), of the estimators of Buchmann and Grübel [2003] (B&G), Coca [2015] (C) and naive ones (Naive) after 250 simulations with at least one zero-increment.

n = 1000             λ = 1         λ = 2         λ = 3         λ = 4
B&G     F (ℓ∞)       0.03 (0.05)   0.02 (0.06)   0.04 (0.10)   1.18 (1.78)
        τ (%)        1.30 (95)     2.00 (35)     2.48 (58)     1.13 (-6.9)
C       F            0.04          0.05          0.08          0.12
        λ (%)        0.02 (-1.89)  0.03 (-1.36)  0.08 (-2.63)  0.46 (-11.6)
Naive   λ (%)        0.00 (0.01)   0.01 (0.39)   0.01 (0.31)   0.01 (0.25)

Table 4.3: Average empirical errors, ‖θ̂n − θ‖θ ((θ̂n − θ)/θ in %), of the estimators of Buchmann and Grübel [2003] (B&G), Coca [2015] (C) and naive ones (Naive) after 250 simulations with at least one zero-increment.

Note that the absolute errors made by any estimator of λ increase with λ for any n and, as anticipated in the previous section, the spectral estimator underestimates and the naive one is superior, especially as the intensity grows. The relative errors of the estimators of λ show different behaviours. For the naive one, they agree with the behaviour predicted by Table 4.2 when n = 250, and for the spectral one they do when n = 1000. The disagreement in the other two cases can be attributed to two reasons: to computational errors in the cases when the absolute errors are very small; and, in the cases when the errors are not halved when increasing n from 250 to 1000, to the inherent randomness of the experiments. We do not attribute the disagreement of the spectral estimator to the bias and stochastic

errors mentioned in the previous section because, in the hardest case when λ = 4, the errors are halved. Moving on to Tables 4.4 and 4.5, we observe that the average empirical quantiles coming from the estimators of F behave like the absolute errors made by them, which we attribute to the same reasons mentioned above. The coverage of the resulting confidence regions for λ < 4 is slightly lower than expected, with our procedures returning better values for high sample sizes, and for λ = 4 we observe an interesting difference: the coverage of our procedures deteriorates noticeably, which mainly comes from the quantiles being slightly underestimated, whilst that arising from the estimators of Buchmann and Grübel [2003] does not. The latter may mean one of two things: the errors from the infinite series in the quantiles are too large and hence coverage of the truth is guaranteed in most cases; or, more interestingly, despite the lack of efficiency of the estimator of Buchmann and Grübel [2003], the confidence regions coming from it may behave better than ours in practice and be more accurate. The second would be very important for testing purposes. Our understanding, reinforced by the results of the next section, is that the first explains the null coverage when n = 250, whilst the coverage from the estimator of Buchmann and Grübel [2003] is actually reasonable for large enough samples, as observed when n = 1000.
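As a concrete illustration of the naive estimator of the intensity discussed above, the following sketch assumes it takes the form λ̂ = −∆⁻¹ log p̂₀, where p̂₀ is the observed proportion of zero increments (since P(zero jumps in an interval of length ∆) = e^(−λ∆); the exact definition is in Section 2.4). The simulation routine, seeds and function names are ours.

```python
import math, random

def simulate_increments(lam, n, jump_sampler, rng):
    """Draw n unit-time increments of a compound Poisson process:
    each increment is a Poisson(lam) number of i.i.d. jumps."""
    out = []
    for _ in range(n):
        # Knuth's method for Poisson sampling (fine for moderate lam)
        k, p, target = 0, 1.0, math.exp(-lam)
        while True:
            p *= rng.random()
            if p <= target:
                break
            k += 1
        out.append(sum(jump_sampler(rng) for _ in range(k)))
    return out

def naive_intensity(increments, delta=1.0):
    """Sketch of the 'naive' estimator: the fraction p0 of zero increments
    estimates exp(-lam*delta), so lam_hat = -log(p0)/delta.
    It is unusable if no zero increments are observed."""
    p0 = sum(1 for x in increments if x == 0) / len(increments)
    if p0 == 0:
        raise ValueError("no zero increments: naive estimator unusable")
    return -math.log(p0) / delta

rng = random.Random(0)
incs = simulate_increments(2.0, 4000, lambda r: r.expovariate(1.0), rng)
print(round(naive_intensity(incs), 2))  # typically close to the true lam = 2
```

Note how the estimator degrades as λ grows: p̂₀ estimates e^(−λ), so fewer and fewer observations carry the information, which is why only samples with at least one zero-increment are retained in the tables.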

n = 250               λ = 1         λ = 2        λ = 3            λ = 4
B&G      F            0.13 (0.8)    0.09 (1.6)   1.75×10^6 (1.2)  3.1×10^19 (0)
C        F            0.14 (1.2)    0.17 (0.8)   0.23 (2.4)       0.24 (18.4)
         λ            0.15 (10)     0.26 (8)     0.30 (35.6)      0.32 (100)
         λ|F          0.33 (1.2)    1.04 (0)     1.31 (0.0)       1.62 (10.8)
         F & λ|F      (2.4)         (0.8)        (2.4)            (25.6)
Naive    λ            0.14 (10.8)   0.26 (7.2)   0.48 (7.6)       0.84 (3.2)
C+Naive  λ|F          0.33 (0.8)    1.01 (0)     1.12 (1.6)       1.32 (0)
         F & λ|F      (1.8)         (0.8)        (2.4)            (18.4)

Table 4.4: Average empirical 0.05-quantiles (coverage in %) from the estimators of Buchmann and Grübel [2003] (B&G), Coca [2015] (C) and naive ones (Naive) under the respective norms and after 250 simulations with at least one zero-increment.

The average empirical quantiles coming from the estimators of λ increase with this parameter. This agrees with Table 4.1 which, rescaled by the value of the intensity, is given in Table 4.6. Indeed, the average empirical quantiles from the naive estimator included in Tables 4.4 and 4.5 are close to those in Table 4.6, although they tend to be underestimated, resulting in higher coverage than expected. Surprisingly, the underestimation is more

135 Applications

n = 1000              λ = 1         λ = 2        λ = 3            λ = 4
B&G      F            0.06 (1.2)    0.04 (1.6)   1.29×10^9 (2.0)  1.24×10^16 (0.8)
C        F            0.07 (2.8)    0.08 (2.8)   0.12 (1.2)       0.14 (15.6)
         λ            0.07 (9.6)    0.13 (18.8)  0.19 (18.4)      0.23 (99.2)
         λ|F          0.17 (0.4)    0.51 (0)     0.85 (0.0)       1.01 (0)
         F & λ|F      (3.0)         (2.8)        (1.2)            (15.6)
Naive    λ            0.07 (8)      0.13 (15.6)  0.23 (13.2)      0.38 (12)
C+Naive  λ|F          0.17 (0.4)    0.50 (0.0)   0.81 (0.0)       0.83 (0.4)
         F & λ|F      (3.0)         (2.8)        (1.2)            (15.6)

Table 4.5: Average empirical 0.05-quantiles (coverage in %) from the estimators of Buchmann and Grübel [2003] (B&G), Coca [2015] (C) and naive ones (Naive) under the respective norms and after 250 simulations with at least one zero-increment.

noticeable for λ ∈ {2, 3} than for λ = 4. The average empirical quantiles from the spectral estimator agree with the naive ones for λ = 1 and λ = 2, and then do not grow fast enough with λ, resulting in terrible coverage. We attribute this to the bias and stochastic errors they carry, since the average empirical quantiles do not halve with n for either λ = 3 or λ = 4. Therefore, we again observe the superiority of the naive estimator, especially for large intensity and sample size.

          λ = 0.2  λ = 0.5  λ = 1  λ = 2  λ = 3  λ = 4  λ = 5
n = 100   0.09     0.16     0.26   0.50   0.86   1.43   2.38
n = 250   0.06     0.10     0.16   0.31   0.54   0.91   1.51
n = 400   0.05     0.08     0.13   0.25   0.43   0.72   1.19
n = 1000  0.03     0.05     0.08   0.16   0.27   0.45   0.75
n = 4000  0.01     0.02     0.04   0.08   0.14   0.23   0.38

Table 4.6: Approximate 0.05-quantiles of the absolute errors made to estimate λ with a sample of size n.

In general, we observe the intuitive and above-mentioned large values for the quantiles arising from any estimator of λ|F. The only exception comes from the spectral estimator when λ = 4 and n = 250, because its quantiles fail to grow fast enough with λ. The joint coverage of F and λ then tends to follow that of F, and the superiority of the naive estimator of the intensity is only observed when λ = 4 and n = 250. For λ = 1 the spectral estimator returns better joint coverage, but this does not conclusively show that it is better. Summarising, we draw several conclusions: for low intensities, the direct and spectral estimators of F and λ behave similarly, so the latter should be preferred for efficiency reasons; for high intensities, our estimator of F is superior to that of Buchmann and Grübel [2003] for estimation, whilst for confidence regions the latter is preferable;

for these intensities the naive estimator of λ is far superior to the spectral one for all purposes. When λ is high, joint confidence regions for F and λ are best constructed using our estimator of F, but the result is not satisfactory enough to recommend its use in practice.

4.4.3 High intensity (λ ≳ 5)

Figure 4.3 confirms the intuition built up so far: our estimator of F still performs relatively well for large sample sizes, identifying the location of the atomic component. Just as depicted by Figure 4.1b, the size of some samples is not large enough to decompound well and the estimate lies somewhere between the truth and the distribution of the observations. The latter must tend to a normal distribution with large variance in view of the classical central limit theorem. For large intensity and small sample size, the infinite series in the estimator of F of Buchmann and Grübel [2003] does not converge and the resulting fluctuations are even more pronounced.
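The central limit theorem intuition above can be quantified through the skewness of a unit-time increment, using the standard compound Poisson cumulant formulas (the r-th cumulant is λE[Jʳ]). The sketch below is ours; the jump moments m2 = E[J²] = 2 and m3 = E[J³] = 6 correspond to an illustrative Exp(1) jump distribution, not the mixed distribution used in the experiments.

```python
def cpp_skewness(lam, m2, m3):
    # Skewness of a unit-time compound Poisson increment:
    # third cumulant lam*m3 over (second cumulant lam*m2)^(3/2),
    # which decays like 1/sqrt(lam) as the intensity grows.
    return lam * m3 / (lam * m2) ** 1.5

# Exp(1) jumps: E[J^2] = 2, E[J^3] = 6
for lam in (0.5, 1.0, 5.0, 20.0):
    print(lam, round(cpp_skewness(lam, 2.0, 6.0), 3))
# skewness shrinks with lam: 3.0, 2.121, 0.949, 0.474
```

The 1/√λ decay makes precise why increments look increasingly Gaussian as λ grows, so that the shape of the jump distribution is progressively washed out between observations.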

Figure 4.3: Three simulations of the estimators of Buchmann and Grübel [2003] (dashed red) and Coca [2015] (solid blue) for λ = 5 and n = 4000. True distribution in solid black.

The impact of the large fluctuations in the estimator of F of Buchmann and Grübel [2003] is reflected in Table 4.7 through its estimation errors in the uniform norm and through the large values of the estimates of the optimal τ. The average error made by our estimator of F becomes acceptable for n = 4000 and is similar to its error when λ = 4 and n = 1000 (Table 4.3). This, together with the growth of the errors from the last section, allows us to infer an exponential increase in the errors for large and increasing λ. This is also observed in the errors of the spectral estimator of λ, in line with Tables 4.1 and 4.6. Instead, the naive estimator is highly robust to increasing intensity (recall that we only display the results for samples with at least one zero-increment, whose probability was included in Table 2.1), confirming its superiority for estimation.


                  n = 1000            n = 4000
                  λ = 5               λ = 5
B&G     F (ℓ∞)    0.03 (2.27×10^10)   0.02 (5.23)
        τ (%)     15.8 (114)          10.8 (87.8)
C       F         0.22                0.13
        λ (%)     1.26 (-25.2)        0.76 (-15.2)
Naive   λ (%)     0.08 (1.61)         0.01 (0.19)

Table 4.7: Average empirical errors, ‖θ̂n − θ‖θ ((θ̂n − θ)/θ in %), of the estimators of Buchmann and Grübel [2003] (B&G), Coca [2015] (C) and naive ones (Naive) after 250 simulations with at least one zero-increment.

The conclusions following from Table 4.8 agree with those of the previous section: the spectral estimators of F and λ return regions that are too tight, resulting in even worse coverage (this includes the quantiles of λ|F, which should have coverage close to zero); the estimator of F of Buchmann and Grübel [2003] requires enough observations to reduce the errors, and then it again returns quantiles that are slightly lower than expected but not unreasonably so; the naive estimator of the intensity slightly underestimates the quantiles, resulting in realistic but slightly high coverage; and the joint regions for F and λ using our estimator of F return unacceptably high coverage regardless of the estimator of the intensity, due to the bad coverage of the former.

                   n = 1000           n = 4000
                   λ = 5              λ = 5
B&G      F         2.11×10^15 (0.0)   0.03 (1.6)
C        F         0.22 (76.0)        0.13 (63.2)
         λ         0.31 (99.6)        0.21 (99.6)
         λ|F       1.99 (52.0)        1.18 (16.0)
         F & λ|F   (80.0)             (66.0)
Naive    λ         0.67 (7.6)         0.32 (11.2)
C+Naive  λ|F       1.48 (0.8)         0.95 (0.8)
         F & λ|F   (76.0)             (63.4)

Table 4.8: Average empirical 0.05-quantiles (coverage in %) from the estimators of Buchmann and Grübel [2003] (B&G), Coca [2015] (C) and naive ones (Naive) under the respective norms and after 250 simulations with at least one zero-increment.

To summarise, the results here confirm the conclusions of the previous section regarding high intensities: for estimation of F one should use the spectral estimator, which makes acceptable errors for large samples; for confidence regions it should be discarded in favour of the estimator of Buchmann and Grübel [2003] as long as the sample size is large enough, in which case it returns slightly low coverage; the naive estimator of λ should be used for both estimation and confidence regions, and it produces very accurate results

in the former and slightly high coverage in the latter. Joint confidence regions should be constructed in a different way, and we discuss this in Section 4.4.5.

4.4.4 Low intensity (λ < 1)

We now turn to the opposite regime to the one considered in the last section and extend the conclusions of Section 4.4.2 to low intensities. As mentioned in Section 4.4.1, here we allow the sample size to be quite small in order to test this regime, which is relevant in practice. We include simulations of the empirical distribution function of the observations to see when it is, and when it is not, a good approximation. Figures 4.4 and 4.5 clearly illustrate several features of the estimators of F for low intensities. We first note that no fluctuations in the estimator of Buchmann and Grübel [2003] are observed, unlike in the previous two sections. This is because both values of the intensity considered here are smaller than log 2, so the infinite series in it always converges under the standard supremum norm. This, together with the lower sample sizes considered here, means the central limit theorem of its limit value around 1 becomes clearer in this regime. Except for this feature, the three estimators show similar behaviours for λ = 0.2 (Figures 4.4a and 4.5a) because most jumps are directly observed: they pick up the atom and are almost monotone, and the estimation is not too good for n = 100 (Figure 4.4a) because the effective sample size (approximately nλ) is around 20. For n = 400 (Figure 4.5a), we see more clearly that the naive estimator performs similarly to ours, although it slightly underestimates F as a consequence of the few observations that carry more than one jump and its inability to decompound. When the intensity increases to λ = 0.5 (Figures 4.4b and 4.5b) all estimators accurately pick up the location of the atom. Nevertheless, here we start to see the decompounding estimators depart from the naive one: some decompounding is now needed, which means the terms with alternating signs in the expansion of the logarithm built into the decompounding estimators are no longer negligible.
This is clearly reflected in Figure 4.4b, where we observe the loss of their approximate monotonicity when compared to Figure 4.4a, even though the effective number of observations in the former, 50, is 150% larger than that in the latter, 20. In Figure 4.5b, the decompounding estimators are closer to being monotone due to the effective sample size being 200, and hence they are better estimates of the true distribution. The naive estimator's inability to decompound becomes more visible now: it is estimating the distribution of the observations, which for λ = 0.5 is already visibly distinct from the underlying jump distribution. In particular, it has larger variance because the observations are (random) sums of independent variables. Consequently, in this regime we explicitly observe the decompounding power of the estimators in Buchmann and Grübel [2003] and Coca [2015], as their accuracy improves with the intensity, unlike in the previous sections. In all cases, our estimator shows the best performance.
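The λ < log 2 convergence threshold mentioned above can be made concrete. Assuming the usual bound ‖e^λ(Q − e^(−λ)δ₀)‖ ≤ e^λ − 1 on the measure inside the alternating logarithmic series, the k-th term is bounded by (e^λ − 1)^k / k, which is summable precisely when e^λ − 1 < 1, i.e. λ < log 2. The numerical sketch below (our own illustration) shows the geometric decay below the threshold and the explosion above it.

```python
import math

def log_series_term_bound(lam, k):
    # Bound (e^lam - 1)**k / k on the k-th term of the alternating
    # logarithmic series; summable iff e^lam - 1 < 1, i.e. lam < log 2.
    return (math.exp(lam) - 1.0) ** k / k

for lam in (0.5, 1.0, 3.0):
    print(lam, [f"{log_series_term_bound(lam, k):.1e}" for k in (1, 10, 25)])
```

For λ = 0.5 < log 2 the bounds vanish rapidly, while for λ = 1 and λ = 3 they blow up, matching the fluctuations and break-down of the series estimator seen in the earlier figures.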


Figure 4.4: Three simulations of the estimators of Buchmann and Grübel [2003] (dashed red), Coca [2015] (solid blue) and the naive one (dashed-dotted green) for n = 100; panels: (a) λ = 0.2, (b) λ = 0.5. True distribution in solid black.

Figure 4.5: Three simulations of the estimators of Buchmann and Grübel [2003] (dashed red), Coca [2015] (solid blue) and the naive one (dashed-dotted green) for n = 400; panels: (a) λ = 0.2, (b) λ = 0.5. True distribution in solid black.

Table 4.9 confirms the intuition gathered in the previous paragraph: the errors made by our estimator are the smallest in all cases; the errors of the decompounding estimators of F halve with the sample size because the problem is not too complicated, so they decompound well and the bias and stochastic errors they carry are negligible; and the errors made by the naive estimator do not halve, especially when λ = 0.5, because the distribution of the observations does not coincide with that of the underlying jumps. We do remark that, when λ = 0.2, its errors are not much larger than those made by our

estimator. Hence, when F is such that the approximations of Section 4.2.5 apply, these must be accurate for smaller intensities. As expected, the estimates of the optimal τ are zero or approximately zero, which is reflected in the norm of the estimator of the distribution of Buchmann and Grübel [2003]. This also means that the errors made by their estimator of F are relative errors, just as the errors of ours are. Therefore, and due to the larger effective sample size, they decrease as λ increases, as anticipated by Table 4.1. This contrasts with the errors of the previous sections, which increased with the intensity; in Section 4.4.5 we discuss the value of λ at which these may be optimal. The spectral estimator of λ shows the increase (decrease) in the absolute (relative) errors with λ anticipated by Table 4.6 (Table 4.1). Yet, also in view of that table, the absolute errors it makes are unexpectedly high. This, together with the fact that they do not halve with the increase in n and the good behaviour of the spectral estimator of F, suggests the presence of computational errors. It also means that we again observe the superiority of the naive estimator of λ.

                   n = 100                       n = 400
                   λ = 0.2        λ = 0.5        λ = 0.2        λ = 0.5
B&G     F (ℓ∞)     0.235 (0.235)  0.172 (0.178)  0.123 (0.123)  0.089 (0.089)
        τ (%)      0.000          0.102          0.000          0.000
C       F          0.180          0.139          0.092          0.075
        λ (%)      0.010 (-5.2)   0.016 (-3.24)  0.009 (-4.25)  0.018 (-3.64)
Naive   F          0.193          0.172          0.111          0.136
        λ (%)      0.004 (2.01)   0.011 (2.23)   0.000 (-0.09)  0.002 (0.38)

Table 4.9: Average empirical errors, ‖θ̂n − θ‖θ ((θ̂n − θ)/θ in %), of the estimators of Buchmann and Grübel [2003] (B&G), Coca [2015] (C) and naive ones (Naive) after 250 simulations.

The average empirical quantiles in Table 4.10 are in line with everything mentioned so far. Those arising from the decompounding estimators of F return slightly lower coverage than expected, whilst the coverage of those of the naive estimator worsens with λ and n. The latter is of course a consequence of the larger effective sample size, which results in tighter regions around the distribution of the observations rather than around the jump distribution. We again observe the slightly higher coverage of the confidence intervals for the intensity, especially for the naive estimator. However, and despite the smaller number of effective observations compared to that of other sections, the joint coverage of the estimators seems to be best in this regime. That arising from the naive estimator of λ remains superior. We remark that, just as with the errors in the previous table, the quantiles coming from the estimators of the distribution decrease with λ and n while the coverage does not change dramatically, which is a consequence of the higher number of observations and the ability of the estimators to decompound.


We draw several conclusions: as in previous sections, our estimator of the distribution is superior to that of Buchmann and Grübel [2003] for estimation and, unlike in previous sections, for confidence regions too. Crucially, these improve with λ, unlike in other sections, as a consequence of the effective increase in the sample size due to their ability to decompound. The empirical distribution function of the observations should be avoided for estimation and confidence regions unless the intensity is so small that the number of observations with more than one jump is negligible. The naive estimator of the intensity is superior to the spectral one, especially for estimation and joint coverage.

                   n = 100                      n = 400
                   λ = 0.2       λ = 0.5        λ = 0.2       λ = 0.5
B&G      F         0.627 (0.4)   0.364 (0.8)    0.326 (0.4)   0.188 (0.4)
C        F         0.374 (0.8)   0.266 (0.8)    0.182 (0.0)   0.128 (3.2)
         λ         0.119 (0.8)   0.159 (5.2)    0.060 (2.0)   0.079 (8.4)
         λ|F       0.142 (0.0)   0.193 (2.8)    0.071 (0.8)   0.100 (3.2)
         F & λ|F   (0.8)         (3.6)          (0.8)         (6.4)
Naive    F         0.299 (8.0)   0.206 (26.4)   0.155 (15.2)  0.107 (85.6)
         λ         0.078 (7.6)   0.134 (10.8)   0.039 (7.6)   0.066 (11.2)
C+Naive  λ|F       0.095 (3.2)   0.047 (5.6)    0.166 (4.4)   0.092 (3.2)
         F & λ|F   (3.2)         (5.6)          (5.2)         (3.6)

Table 4.10: Average empirical 0.05-quantiles (coverage in %) from the estimators of Buchmann and Grübel [2003] (B&G), Coca [2015] (C) and naive ones (Naive) after 250 simulations.

4.4.5 Conclusions and remarks

The detailed analysis of the results in the preceding sections allows us to conclude that, in practice, inference on discretely observed compound Poisson processes is possible for moderate sample sizes. This had not been noted in the existing literature and is relevant for many applications. In particular, for estimation purposes our estimator of the distribution function is superior to that of Buchmann and Grübel [2003], and we hence start by summarising its behaviour. For a fixed sample size n, its precision increases approximately linearly with the intensity up to λ = 1 for the following reasons: for such low intensities decompounding is not hard; the expected total number of jumps in the observation interval is nλ; therefore, an increase in λ translates into an approximately equal increase in the number of jumps used to estimate the jump distribution. For larger intensities, decompounding is harder and cannot be achieved if the sample size is not increased alongside λ, as expected from the classical central limit theorem. We observed that the precision of the estimator is best around λ = 1 and starts being low at around λ = 4; for λ = 5 we had to abandon the moderate sample-size assumption for the accuracy to be acceptable. Even so, the errors in this case

were larger than those for λ = 4 after increasing the sample size fourfold. In other examples in which the jump measure has a density, these conclusions seem to extend a bit further, to approximately λ = 6. In any case, we see that, for fixed n, the precision decreases exponentially with the intensity. This goes in line with Table 4.1, whose entries are driven by the function λ/(e^λ − 1). One can show that a slower decrease would contradict the results of Duval [2014] and that, furthermore, it cannot decay faster than e^(−2λ). The precision of estimating F with a sample of size n and intensity λ is therefore approximately driven by the expression λe^(−λ), which peaks at λ = 1 (we also observed this peak in other examples). As far as we know, no remark has been made in this direction in the existing literature, not even on the improvement in precision as λ grows within (0, 1). This insight has a further consequence for practical purposes: as mentioned in Chapter 1, in some applications observing the stochastic process at hand continuously in time is too expensive and instead it is observed at discrete times. In Section 4.4 we have taken ∆ = 1 and let the intensity λ vary in (0, ∞), playing the role of λ∆ from preceding sections. However, in these applications λ > 0 is fixed and ∆ > 0 can be varied. Therefore, to extract the maximum information from the discrete observations and reduce observation costs, the statistician should choose ∆ ≈ λ^(−1). This can be achieved by observing the process discretely for several values of ∆ > 0, estimating λ as above and then fixing ∆ accordingly to estimate the jump distribution function F. The estimator of the distribution function of Buchmann and Grübel [2003] is also best around λ = 1. However, for other intensities its accuracy decreases dramatically, due to its lack of efficiency (lower intensities) and to the divergence of the series in its definition (larger intensities).
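The precision heuristic just described can be checked numerically; the short sketch below (our own illustration) simply locates the maximum of x e^(−x) over a grid of values of x = λ∆.

```python
import math

def precision_proxy(x):
    # Heuristic precision driver x * exp(-x) for decompounding, where
    # x = lam * Delta is the expected number of jumps per increment.
    return x * math.exp(-x)

grid = [i / 100 for i in range(1, 601)]  # x in (0, 6]
best = max(grid, key=precision_proxy)
print(best)  # 1.0
```

Since the maximiser is x = λ∆ = 1, for fixed λ the proxy is maximised by ∆ ≈ 1/λ, in line with the sampling-design recommendation above.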
Nevertheless, the coverage of its confidence regions is far superior to ours, especially for large λ and n. Therefore, if used for testing purposes, we suggest inputting the naive estimator of λ into its expression, as mentioned in Section 2.6, to decrease the covariance function and thus increase the statistical power. For low intensities, the resulting estimates are very similar to ours (recall from Chapter 3 that, in this regime, the only practical difference is the regularisation of the kernel and, as we see from the figures in Section 4.4.4, this is not noticeable), so in these cases it also improves its estimation accuracy. Nevertheless, for large intensities the accuracy of the modified estimator tends to deteriorate due to the overestimation of the naive estimator of λ that we typically observe in simulations. From our experiments, we also conclude that this estimator of the intensity is much better than ours, both for estimation and confidence regions, and especially for large intensities. However, for completeness we recall its two main drawbacks: when there are no zero observations it cannot be used, unlike our estimator; and in more general cases it may not even be consistent (cf. Section 2.4). Neither of these is relevant in the simulations we perform, because of our choice of F and because we only included samples with at least one zero-increment. We emphasise that it can be used in conjunction with our estimator of F to construct joint confidence regions. These give slightly high but still

reasonable coverage for intensities up to approximately 3, but not beyond. This is a consequence of the unacceptably low coverage of the distribution function arising from the use of our estimator. Therefore, and due to the better behaviour of the estimator of the distribution function of Buchmann and Grübel [2003] for this purpose, for testing purposes we propose to use them in conjunction after making the appropriate modification of the latter mentioned a few lines ago. Their dependence should be well approximated by that of ours, since all these estimators are asymptotically equivalent. An alternative is to use the discrete estimator in Buchmann and Grübel [2003], whose implementation we included in Section 4.3.2, and which, in practice, is equivalent to what we just proposed and computationally faster. We do not know the effects of using the transformed estimators in Buchmann and Grübel [2004], but recall that the authors could not show their tightness. Setting this crucial point aside, we remark that for low intensities they should guarantee monotonicity of the estimates, but for large intensities it is not clear to us that they will work, due to the large fluctuations of the untransformed estimator. Lastly, we make two more remarks that we have omitted in previous sections. For low and high intensities, the pointwise variances of the limiting process in the functional central limit theorem of Buchmann and Grübel [2003] are considerably low. This results in covariance matrices with very small determinants, and Matlab's Cholesky algorithm, used to simulate paths of the limiting process, fails on relatively many occasions. However, and surprisingly, Matlab does not return errors when inverting these matrices. Hence, a way to avoid using Cholesky's algorithm is to first sample the 'end point' of the path and then use the Brownian bridge construction introduced in Section 4.3.1. The last remark we make is on computational times.
The main factor driving the time needed to compute the estimator of F of Buchmann and Grübel [2003] and the confidence regions arising from it is the sample size n. In contrast, the computational time of our respective procedures does not depend strongly on n but on other factors, including the choice of the bandwidth. For moderate sample sizes such as those considered in Sections 4.4.2 and 4.4.4, the former procedures are faster than ours, and they only become slower for the n = 4000 considered in Section 4.4.3.
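The end-point-plus-bridge construction mentioned above can be sketched as follows for standard Brownian motion on [0, 1]. This is a minimal illustration only: the actual limiting process has a different covariance structure, and the grid, seeding and function names here are ours. The key point is that no Cholesky factorisation of the full covariance matrix is needed.

```python
import math, random

def brownian_bridge_path(t_grid, rng, sigma_end=1.0):
    """Sample a Brownian-motion path on t_grid (inside (0, 1)) by first
    drawing the end point W(1) ~ N(0, sigma_end^2) and then filling the
    interior points left to right with the bridge conditional law:
    W(t) | W(s) = a, W(1) = b  ~  N(a + (t-s)/(1-s)*(b-a),
                                    (1-t)*(t-s)/(1-s)).
    By the Markov property, sequential sampling gives the correct joint law."""
    b = rng.gauss(0.0, sigma_end)   # end point W(1)
    path, s, a = [], 0.0, 0.0       # start at W(0) = 0
    for t in t_grid:
        mean = a + (t - s) / (1.0 - s) * (b - a)
        var = (1.0 - t) * (t - s) / (1.0 - s)
        a = rng.gauss(mean, math.sqrt(var))
        s = t
        path.append(a)
    return path

rng = random.Random(1)
grid = [i / 100 for i in range(1, 100)]  # interior points, excluding t = 1
path = brownian_bridge_path(grid, rng)
print(len(path))  # 99
```

Because the recursion only ever conditions on the previous point and the end point, it sidesteps the near-singular covariance matrices that make the Cholesky factorisation fail.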

4.4.6 Further investigations

The simulation experiments we carried out in the preceding sections do not cover all natural applied questions following from our theoretical results. Nonetheless, they provide a deep understanding of the practical behaviour of existing estimators in realistic situations, and are therefore useful in that they provide a solid basis, previously unavailable, on which to perform further investigations. Here we mention some potential directions that arise directly from our experiments. In a few cases mentioned in the previous section, our estimators do not seem to return satisfactory results. In particular, the spectral estimator of the intensity consistently

underestimates the true λ. We have proposed alternatives based on the estimators in Buchmann and Grübel [2003] but, in cases such as when the distribution is not supported only on the positive half-line, it is not possible to use these. In these cases, the implementation of our estimators should be improved; in Section 4.4.1 we suggested using the purely data-driven loss function proposed therein to achieve this, and this should be investigated further if need be. In the simulations we included the coverage of the confidence regions arising from the existing estimators. This allows us to construct tests for the defining parameters, as mentioned in Section 4.2. It would be interesting to investigate whether they work best for some particular alternatives and, if so, for which and for what reasons. In this direction, it would also be interesting to investigate the behaviour of the novel test for the presence of an atomic or a continuous distribution function that we also constructed in Section 4.2 using the results of Chapter 3. More interesting topics can be explored too, but these are beyond the scope of our analysis. They include studying the robustness of our estimators with respect to both the interarrival times of the process and the jumps; that is, investigating the robustness of the estimators to other renewal and Lévy processes. Related to the latter, and due to the truncation at the origin of our estimators, one can also try to use them to develop tests of whether the Lévy process at hand has finite or infinite activity. Additionally, it would be exciting and relevant to use our estimators in a real data study for an application in which the compound Poisson model is reasonable. This includes migration studies, as mentioned in Chapter 1, and many other applications such as those mentioned in Buchmann and Grübel [2004].


146 Bibliography

N. Antunes and V. Pipiras. Probabilistic sampling of finite renewal processes. Bernoulli, 17(4):1285–1326, 2011. (Cited on page 26).

S. Asmussen. Applied Probability and Queues. Springer Science & Business Media, 2008. (Cited on page 20).

M. Bec and C. Lacour. Adaptive pointwise estimation for pure jump Lévy processes. Statistical Inference for Stochastic Processes, 18(3):229–256, 2015. (Cited on page 38).

D. Belomestny and M. Reiß. Spectral calibration of exponential Lévy models. Finance and Stochastics, 10(4):449–474, 2006. (Cited on pages 15, 29, 38, and 108).

D. Belomestny, F. Comte, V. Genon-Catalot, H. Masuda, and M. Reiß. Lévy Matters IV. Springer, 2015. (Cited on pages 38 and 49).

N. Bingham and R. Kiesel. Risk-Neutral Valuation: Pricing and Hedging of Financial Derivatives. Springer Finance Textbooks. Springer, 2004. (Cited on page 4).

M. Bøgsted and S. M. Pitts. Decompounding random sums: a nonparametric approach. Annals of the Institute of Statistical Mathematics, 62(5):855–872, 2010. (Cited on pages 20, 21, 22, 23, 24, 26, 40, 111, 112, and 125).

B. Böttcher, R. Schilling, and J. Wang. Lévy Matters III: Lévy-Type Processes: Construction, Approximation and Sample Path Properties. Springer, 2014. (Cited on page 29).

B. Buchmann. Weighted empirical processes in the nonparametric inference for Lévy processes. Mathematical Methods of Statistics, 18(4):281–309, 2009. (Cited on page 33).

B. Buchmann and R. Grübel. Decompounding: an estimation problem for Poisson random sums. Annals of Statistics, 31(4):1054–1074, 2003. (Cited on pages 14, 15, 16, 17, 18, 20, 22, 23, 24, 26, 27, 33, 37, 40, 41, 42, 50, 53, 55, 56, 57, 58, 59, 107, 108, 111, 112, 117, 123, 124, 125, 126, 127, 128, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, and 145).


B. Buchmann and R. Grübel. Decompounding Poisson random sums: recursively truncated estimates in the discrete case. Annals of the Institute of Statistical Mathematics, 56(4):743–756, 2004. (Cited on pages 16, 20, 27, 42, 50, 59, 107, 108, 111, 123, 125, 144, and 145).

F. P. Cantelli. Sulla determinazione empirica delle leggi di probabilità. Giornale dell'Istituto Italiano degli Attuari, 4:421–424, 1933. (Cited on page 8).

I. Castillo and R. Nickl. Nonparametric Bernstein–von Mises theorems in Gaussian white noise. The Annals of Statistics, 41(4):1999–2028, 2013. (Cited on page 60).

I. Castillo and R. Nickl. On the Bernstein–von Mises phenomenon for nonparametric Bayes procedures. The Annals of Statistics, 42(5):1941–1969, 2014. (Cited on page 60).

S. Chen, A. Delaigle, and P. Hall. Nonparametric estimation for a class of Lévy processes. Journal of Econometrics, 157(2):257–271, 2010. (Cited on pages 30 and 38).

C. Chesneau, F. Comte, and F. Navarro. Fast nonparametric estimation for convolutions of densities. Canadian Journal of Statistics, 41(4):617–636, 2013. (Cited on page 25).

K. L. Chung. A Course in Probability Theory. Academic Press, 2001. (Cited on pages 34 and 75).

A. J. Coca. Efficient nonparametric inference for discretely observed compound Poisson processes. Under revision, arXiv preprint arXiv:1512.08472, 2015. (Cited on pages 14, 15, 16, 17, 18, 20, 35, 36, 41, 42, 43, 47, 50, 51, 53, 58, 59, 61, 63, 64, 66, 107, 117, 126, 127, 128, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, and 142).

F. Comte and V. Genon-Catalot. Nonparametric estimation for pure jump Lévy processes based on high frequency data. Stochastic Processes and their Applications, 119(12):4088–4123, 2009. (Cited on page 38).

F. Comte and V. Genon-Catalot. Nonparametric adaptive estimation for pure jump Lévy processes. Annales de l'Institut Henri Poincaré (B), 46(3):595–617, 2010a. (Cited on page 38).

F. Comte and V. Genon-Catalot. Non-parametric estimation for pure jump irregularly sampled or noisy Lévy processes. Statistica Neerlandica, 64(3):290–313, 2010b. (Cited on page 38).

F. Comte and V. Genon-Catalot. Estimation for Lévy processes from high frequency data within a long time interval. The Annals of Statistics, 39(2):803–837, 2011. (Cited on page 38).

F. Comte, C. Duval, and V. Genon-Catalot. Nonparametric density estimation in compound Poisson processes using convolution power estimators. Metrika, 77(1):163–183, 2014. (Cited on pages 17, 20, 22, 25, 26, 36, 41, 42, 50, 52, and 59).

F. Comte, C. Duval, V. Genon-Catalot, and J. Kappus. Estimation of the jump size density in a mixed compound Poisson process. Scandinavian Journal of Statistics, 42 (4):1023–1044, 2015. (Cited on page 26).

R. Cont and P. Tankov. Calibration of jump-diffusion option-pricing models: a robust nonparametric approach. Journal of Computational Finance, 7:1–49, 2003. (Cited on pages 28, 38, and 108).

J. W. Cooley and J. W. Tukey. An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19(90):297–301, 1965. (Cited on page 118).

S. Csörgő and V. Totik. On how long interval is the empirical characteristic function uniformly consistent. Acta Scientiarum Mathematicarum, 45(1-4):141–149, 1983. (Cited on page 73).

D. J. Daley and D. Vere-Jones. An Introduction to the General Theory of Point Processes. Springer, 1998. (Cited on page 20).

B. De Finetti. Sulle funzioni ad incremento aleatorio. Rendiconti della Accademia Nazionale dei Lincei, 10(3):163–168, 1929. (Cited on page 6).

A. den Boer and M. Mandjes. Convergence rates of Laplace-transform based estimators. Bernoulli, to appear, 2016. (Cited on page 40).

M. D. Donsker. Justification and extension of Doob’s heuristic approach to the Kolmogorov-Smirnov theorems. The Annals of Mathematical Statistics, 23(2):277–281, 1952. (Cited on page 10).

J. L. Doob. Heuristic approach to the Kolmogorov-Smirnov theorems. The Annals of Mathematical Statistics, 20(3):393–403, 1949. (Cited on page 10).

R. M. Dudley. Weak convergence of probabilities on nonseparable metric spaces and empirical measures on Euclidean spaces. Illinois Journal of Mathematics, 10(1):109– 126, 1966. (Cited on page 10).

R. M. Dudley. Uniform Central Limit Theorems. Cambridge University Press, 1999. (Cited on pages 10 and 62).

C. Duval. Density estimation for compound Poisson processes from discrete data. Stochas- tic Processes and their Applications, 123(11):3963–3986, 2013a. (Cited on pages 17, 20, 22, 25, 26, 36, 42, 50, 52, and 59).

C. Duval. Nonparametric estimation of a renewal reward process from discrete data. Mathematical Methods of Statistics, 22(1):28–56, 2013b. (Cited on page 26).

C. Duval. When is it no longer possible to estimate a compound Poisson process? Elec- tronic Journal of Statistics, 8(1):274–301, 2014. (Cited on pages 26 and 143).

C. Duval and M. Hoffmann. Statistical inference across time scales. Electronic Journal of Statistics, 5:2004–2030, 2011. (Cited on page 54).

A. Dvoretzky, J. Kiefer, and J. Wolfowitz. Asymptotic minimax character of the sam- ple distribution function and of the classical multinomial estimator. The Annals of Mathematical Statistics, 27(3):642–669, 1956. (Cited on page 11).

P. Embrechts, R. Grübel, and S. M. Pitts. Some applications of the fast Fourier transform algorithm in insurance mathematics. Statistica Neerlandica, 47(1):59–75, 1993. (Cited on page 130).

P. Embrechts, C. Klüppelberg, and T. Mikosch. Modelling Extremal Events for Insurance and Finance. Springer Science & Business Media, 2013. (Cited on page 20).

J. Fan. On the optimal rates of convergence for nonparametric deconvolution problems. The Annals of Statistics, 19(3):1257–1272, 1991. (Cited on page 13).

G. B. Folland. Real analysis: modern techniques and their applications (Second Edition). John Wiley & Sons, 1999. (Cited on page 61).

E. Giné and R. Nickl. Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge University Press, 2016. (Cited on pages 8, 10, 98, and 102).

E. Giné and J. Zinn. Bootstrapping general empirical measures. The Annals of Probability, 18(2):851–869, 1990. (Cited on page 111).

P. Glasserman. Monte Carlo Methods in Financial Engineering. Applications of Mathematics: Stochastic Modelling and Applied Probability. Springer, 2004. (Cited on pages 121 and 122).

V. L. Glivenko. Sulla determinazione empirica delle leggi di probabilità. Giornale dell'Istituto Italiano degli Attuari, 4(1):92–99, 1933. (Cited on page 8).

R. Grübel and R. Hermesmeier. Computation of Compound Distributions I: Aliasing Errors and Exponential Tilting. ASTIN Bulletin, 29(2):197–214, 1999. (Cited on page 130).

S. Gugushvili. Nonparametric estimation of the characteristic triplet of a discretely observed Lévy process. Journal of Nonparametric Statistics, 21(3):321–343, 2009. (Cited on pages 38 and 49).

S. Gugushvili. Nonparametric inference for discretely sampled Lévy processes. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 48(1):282–307, 2012. (Cited on page 38).

S. Gugushvili, F. van der Meulen, and P. Spreij. Nonparametric Bayesian inference for multidimensional compound Poisson processes. Modern Stochastics: Theory and Applications, 2(1):1–15, 2015. (Cited on pages 43 and 60).

S. Gugushvili, F. van der Meulen, and P. Spreij. A non-parametric Bayesian approach to decompounding from high frequency data. Statistical Inference for Stochastic Processes, online, 2016. (Cited on pages 43 and 60).

J. Hájek. A characterization of limiting distributions of regular estimates. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 14(4):323–330, 1970. (Cited on page 11).

J. Hájek. Local asymptotic minimax and admissibility in estimation. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, 1:175–194, 1972. (Cited on page 11).

A. Hald. A History of Parametric Statistical Inference from Bernoulli to Fisher, 1713–1935. Sources and Studies in the History of Mathematics and Physical Sciences. Springer-Verlag New York, 2007. (Cited on page 7).

M. B. Hansen and S. M. Pitts. Nonparametric inference from the M/G/1 workload. Bernoulli, 12(4):737–759, 2006. (Cited on pages 20, 22, 23, 24, and 40).

J. Hoffmann-Jørgensen. Stochastic processes on Polish spaces. Number 39. Aarhus Universitet, Matematisk Institut, 1984 (published 1991). (Cited on page 10).

K. Itô. On stochastic processes. I. (Infinitely divisible laws of probability). Japanese Journal of Mathematics, 18:261–301, 1942. (Cited on page 6).

J. Kappus and M. Reiß. Estimation of the characteristics of a Lévy process observed at arbitrary frequency. Statistica Neerlandica, 64(3):314–328, 2010. (Cited on pages 32 and 49).

A. Khintchine. A new derivation of a formula by P. Lévy. Bulletin of the Moscow State University, I, 1(1):1–5, 1937. (Cited on page 6).

J. Kiefer and J. Wolfowitz. Asymptotically minimax estimation of concave and convex distribution functions. Probability Theory and Related Fields, 34(1):73–85, 1976. (Cited on page 11).

A. N. Kolmogorov. Sulla determinazione empirica di una legge di distribuzione. Giornale dell’ Istituto Italiano degli Attuari, 4:83–91, 1933. (Cited on page 8).

L. Le Cam. Limits of experiments. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, 1:245–261, 1972. (Cited on page 11).

P. Lévy. Sur les intégrales dont les éléments sont des variables aléatoires indépendantes. Annali della Scuola Normale Superiore di Pisa, Classe di Scienze, 3(3-4):337–366, 1934. (Cited on page 6).

E. Mariucci. Asymptotic equivalence for pure jump Lévy processes with unknown Lévy density and Gaussian white noise. Stochastic Processes and their Applications, 126(2):503–541, 2016. (Cited on page 25).

M. H. Neumann and M. Reiß. Nonparametric estimation for Lévy processes from low-frequency observations. Bernoulli, 15(1):223–248, 2009. (Cited on pages 13, 15, 30, 31, 32, 73, and 74).

R. Nickl and M. Reiß. A Donsker theorem for Lévy measures. Journal of Functional Analysis, 263(10):3306–3332, 2012. (Cited on pages 13, 15, 16, 17, 30, 31, 32, 33, 34, 37, 43, 51, 54, and 60).

R. Nickl, M. Reiß, J. Söhl, and M. Trabs. High-frequency Donsker theorems for Lévy measures. Probability Theory and Related Fields, 164(1):61–108, 2016. (Cited on pages 32, 33, 34, 43, 52, 54, 60, 71, and 108).

D. Pollard. Convergence of Stochastic Processes. Springer Series in Statistics. Springer, 1984. (Cited on page 24).

R. Pyke and G. R. Shorack. Weak convergence of a two-sample empirical process and a new approach to Chernoff-Savage theorems. The Annals of Mathematical Statistics, 39(3):755–771, 1968. (Cited on page 10).

M. Reiß. Testing the characteristics of a Lévy process. Stochastic Processes and their Applications, 123(7):2808–2828, 2013. (Cited on pages 39 and 108).

K.-I. Sato. Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press, 1999. (Cited on pages 5, 28, 37, 73, 78, and 95).

R. L. Schilling. Growth and Hölder conditions for the sample paths of Feller processes. Probability Theory and Related Fields, 112(4):565–611, 1998. (Cited on page 29).

A. V. Skorokhod. Limit theorems for stochastic processes with independent increments. Theory of Probability and Its Applications, 2(2):138–171, 1957. (Cited on page 10).

N. V. Smirnov. Estimate of deviation between empirical distribution functions in two independent samples. Bulletin Moscow University, 2(2):3–16, 1939. (Cited on page 8).

J. Söhl. Confidence sets in nonparametric calibration of exponential Lévy models. Finance and Stochastics, 18(3):617–649, 2014. (Cited on pages 38 and 108).

J. Söhl and M. Trabs. A uniform central limit theorem and efficiency for deconvolution estimators. Electronic Journal of Statistics, 6:2486–2518, 2012. (Cited on page 15).

J. Söhl and M. Trabs. Option calibration of exponential Lévy models: Confidence intervals and empirical results. Journal of Computational Finance, 18(2), 2014. (Cited on pages 38 and 108).

M. Trabs. Calibration of self-decomposable Lévy models. Bernoulli, 20(1):109–140, 2014. (Cited on page 49).

M. Trabs. Information bounds for inverse problems with application to deconvolution and Lévy models. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 51(4):1620–1650, 2015a. (Cited on pages 16, 24, 27, 38, 42, 50, 51, 53, 54, 55, 57, and 58).

M. Trabs. Quantile estimation for Lévy measures. Stochastic Processes and their Applications, 125(9):3484–3521, 2015b. (Cited on page 39).

A. W. van der Vaart. Asymptotic Statistics. Cambridge University Press, 1998. (Cited on pages 7, 9, 73, and 80).

A. W. van der Vaart and J. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Science & Business Media, 1996. (Cited on pages 10, 62, 93, 98, 103, 104, and 111).

B. van Es, S. Gugushvili, and P. Spreij. A kernel type nonparametric density estimator for decompounding. Bernoulli, 13(3):672–694, 2007. (Cited on pages 17, 34, 35, 36, 38, 40, 41, 42, 50, 59, 107, and 117).

M. Wand. Finite sample performance of deconvolving density estimators. Statistics & Probability Letters, 37(2):131–139, 1998. (Cited on page 117).

J. E. Yukich. Weak convergence of the empirical characteristic function. Proceedings of the American Mathematical Society, 95(3):470–473, 1985. (Cited on pages 73 and 74).
