Recovery of the Regression Functions and Distributions in Some Incomplete Models
Total Page:16
File Type:pdf, Size:1020Kb
Graduate Theses, Dissertations, and Problem Reports 2016 Recovery of the Regression Functions and Distributions in some Incomplete Models Broti Garai Follow this and additional works at: https://researchrepository.wvu.edu/etd Recommended Citation Garai, Broti, "Recovery of the Regression Functions and Distributions in some Incomplete Models" (2016). Graduate Theses, Dissertations, and Problem Reports. 5648. https://researchrepository.wvu.edu/etd/5648 This Dissertation is protected by copyright and/or related rights. It has been brought to you by the The Research Repository @ WVU with permission from the rights-holder(s). You are free to use this Dissertation in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you must obtain permission from the rights-holder(s) directly, unless additional rights are indicated by a Creative Commons license in the record and/ or on the work itself. This Dissertation has been accepted for inclusion in WVU Graduate Theses, Dissertations, and Problem Reports collection by an authorized administrator of The Research Repository @ WVU. For more information, please contact [email protected]. Recovery of the Regression Functions and Distributions in some Incomplete Models. Broti Garai Dissertation submitted to the Eberly College of Arts and Sciences at West Virginia University in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computational Statistics Robert Mnatsakanov, Ph.D., Chair Michael Mays, Ph.D. E. James Harner, Ph.D. Krzysztof Chris Ciesielski, Ph.D. Erin Leatherman, Ph.D. Department of Statistics Morgantown, WV 2016 Keywords: Moment-determinate functions; Laplace transform inversion; Nonparametric estimation Copyright 2016 Broti Garai Abstract Recovery of the regression functions and distributions in some indirect models Broti Garai In this research, we propose formulas for recovering the regression function based on the product moments as well as recovering the dis- tributions and derivative function in some indirect models. The upper bounds for the uniform rate of approximations are also derived. For regression functions two cases where the support of underlying func- tions is bounded and unbounded from above are studied. Based on the proposed approximations, new types of nonparametric estimates of the cumulative distribution, the density functions and the derivative func- tions in multiplicative-censoring model, as well as the approximations of conditional variance, are constructed. The suggested approach is also applied in the demixing problem and the constrained deconvolu- tion problem, as well as for recovery of a distribution for unknown finite support. The absolute and mean squared errors of correspond- ing estimates are investigated as well. A simulation study justifies the consistency of the proposed estimates. Acknowledgement I would like to express my special appreciation and thanks to Dr. Robert Mnat- sakanov, I am extremely fortunate for having him as my advisor. I would like to thank him for all the encouragement during my research and for allowing me to grow as a researcher. It was he who provided proper direction and guidance for the completion of the dissertation. I would also like to thank my committee members, Dr. Michael Mays, Dr. E. James Harner, Dr. Chris Ciesielski and Dr. Erin Leatherman for their thoughtful in- put, cooperation, expertise and patience throughout the entire phase of this research. I would also like to express my sincere gratitude to Dr. Robert Mnatsakanov, Dr. Mark Culp, Dr. Arun Ross and the Department of Statistics for proving me financial support throughout my Masters and Ph.D. I want to especially thank Mrs. Betty Wearden and the Department of Statistics for the Stanley Wearden Memorial Graduate Endowment Scholarship. I would also like to extend my thanks to Dr. Huey Minn Lee who gave me the opportunity to instruct a course and Barbara Dailey for all her efforts in solving our problem with Mathematica. And talking about Morgantown: I have couple of special friends to thank, Dibyad- huti, Madhurima, Eepsa, Mimi, Susmita, Arnab, Sreeparna, Abhik, little Adi and little Ashmit who were a good support for me and important for all the fun I had in Morgantown. I also want to specially thank Dibyadhuti for giving his useful feedback and corrections during my dissertation writing. Lastly, I would like to thank my family for all their love and encouragement. And most of all to my loving, supportive, and patient husband Prithish whose constant support and encouragement from the beginning to the final stages of this Ph.D. is so appreciated. iii Contents 1 Introduction1 1.1 Regression Setup...........................3 1.2 Multiplicative-censoring model....................5 1.3 Other Incomplete Models.......................6 1.4 Recovery of Density and Distribution Function...........7 1.5 Organization of Dissertation.....................9 2 Approximation of Regression Function 11 2.1 Distribution of X is known...................... 11 2.1.1 Model 1: X ∼ U(0;1) .................... 12 2.1.2 Model 2: X ∼ f , f is known................. 18 2.2 Distribution of X is unknown..................... 26 3 Estimation of Regression Functions 33 3.1 Estimation of r(x);x 2 [0;1]: Distribution of X known........ 34 3.2 Estimation of r(x); x 2 R+: Distribution of X unknown....... 36 4 Recovery of Derivative Function with Application 42 4.1 Approximating the derivative function................ 42 4.2 Multiplicative-censoring model.................... 45 5 Reconstruction of Distributions in other Incomplete Models 56 5.1 Demixing Problem.......................... 57 5.1.1 Binomial Mixture....................... 57 5.1.2 Negative Binomial Mixture.................. 59 5.1.3 Exponential Mixture..................... 61 5.2 Constrained Deconvolution Problem................. 64 5.3 Recovery of Distribution with Unknown Support.......... 67 6 Conclusion and Future Work 71 Bibliography 74 iv List of Figures 2.1 Approximated regression curves (blue dots) (a): ra;1 with a = 30; and (b): era;1 with a = 15. In both plots the true regression function is r(x) = (2x + 3)=12 (orange curve).................. 17 2 2 2.2 (a) sa;1(x) (blue dots) when a = 30 and (b) sea;1(x) (blue dots) when a = 15 in both plots. The true conditional variance s 2(x) is dis- played as well (orange)......................... 17 2.3 Errors of approximation in (a) sup-norm and (b) L2-norm for ra;1, when a 2 [30;230].......................... 18 2.4 Approximated regression functions (blue dots) (a) ra;2 with a = 40 and (b) era;2 with a = 20. The true regression r(x) = 2(1 − x)=5 (or- ange curve)............................... 22 2.5 Plots of the approximated conditional variances (blue dots) in (a): 2 2 sa;2(x) and (b): sea;2(x) when a = a2 = 30............. 23 2.6 The error of approximations in (a) sup-norm and (b) L2-norm for a ranging from 30 to 230......................... 23 2.7 Approximated curve (blue) (a) ra;2 and (b) ra;2 along with the true Tx e regression r(x) = 2 , when a = a2 = 60............... 25 2 2 2.8 (a) sa;2(x) and (b) sea;2(x) (both blue) represents the approximated conditional variance, when a = a2 = 60 and T = 3.......... 25 2.9 The errors of approximations in ra;2 in (a) sup-norm and (b) L2-norm between when a 2 [30;230]..................... 26 2.10 Approximated (a) ra;b and (b) era;b (blue) of the true regression r(x) = 2 2 (2x + 8)=(x + 2) (orange) when a = a2 = 40, and b = 1:15.... 28 2 2 2.11 (a) sa;b(x) and (b) sea;b(x) represents the approximated conditional variance, when a = a2 = 40, and b = 1:15.............. 29 2.12 The (a) sup-norm distance and (b) L2-norm distance for ra;b, when a 2 [30; 230] and b 2 [1:1; 2:4]................... 29 2.13 The approximated (a) ra;b, (b) era;b and the true regression function 1 q −l x r(x) = + (1 − 2e 1 ) (orange) are displayed. Here a = a2 = l2 2l2 30 and b = 1:2............................. 30 v 2 2 2.14 (a) sa;b(x) and (b) sea;b(x) represents the approximated conditional variance, when a = a2 = 30, and b = 1:2............... 31 2.15 The (a) sup-norm distance and (b) L2-norm distance when a ranges from 30 to 230 and b ranges from 1.1 to 2.5............. 31 (2x+3) 3.1 bra;1 (blue) and r(x) = 12 (orange) when a = 30.......... 35 b(1−x) 3.2 bra;2 (blue) and r(x) = b+c (orange) when a = 30.......... 36 1 q −l x 3.3 Estimated r ;b (blue) of the true regression r(x) = + (1 − 2e X ) ba lY 2lY (orange) when a = 30 and b = 1:21. ................... 38 3.4 Estimated bra (red), brh (blue) and bra;b (green) of the true regression r(x) = (x − 1:5)2 (black) when a = 100 and b = 1:52........ 39 3.5 Estimated bra (red), brh (blue) and bra;b (green) of the true regression 1 r(x) = x2 (black) when a = 100 and b = 3:515............ 40 0 4.1 Approximations (the dotted curves) (a): g a when a = 50; and (b): 0 0 gea when a2 = 50. In both plots the true g (y) = 60y(1 − y)(1 − 2y). 45 4.2 Estimated Fba;n (dotted curve) (a): when n = 500;a = 35; and (b): when n = 1000;a = 30. In both plots the true F (continuous curve) represents the cdf of a Beta(3;3) distribution............. 50 4.3 Estimated fba;n (dotted curve) (a): when n = 1000; a = 25; and (b): when n = 1500; a = 30. Here X ∼ Beta(3;3)............ 50 4.4 Estimated fb0(x)(dotted curve) (a): when n = 3000; a = 25; and (b): when n = 5000; a = 30.