Quantile Regression with Rank-Based Samples
Total Page:16
File Type:pdf, Size:1020Kb
Quantile Regression with Rank-Based Samples by Olawale Fatai Ayilara A Thesis submitted to the Faculty of Graduate Studies of The University of Manitoba in partial fulfilment of the requirements of the degree of Master of Science Department of Statistics University of Manitoba Winnipeg Copyright c 2016 by Olawale Fatai Ayilara Abstract Quantile Regression, as introduced by Koenker, R. and Bassett, G. (1978), provides a complete picture of the relationship between the response variable and covariates by estimating a family of conditional quantile functions. Also, it offers a natural solution to challenges such as; homoscedasticity and sometimes unrealistic normality assumption in the usual conditional mean regression. Most of the results for quantile regression are based on simple random sampling (SRS). In this thesis, we study the quantile regression with rank-based sampling methods. Rank-based sampling methods have a wide range of applications in medical, ecological and environmental research, and have been shown to perform better than SRS in estimating several population parameters. We propose a new objective function which takes into account the ranking information to estimate the unknown model parameters based on the maxima or minima nomination sampling designs. We compare the mean squared error of the proposed quantile regression estimates using maxima (or minima) nomination sampling design and observe that it provides higher relative efficiency when compared with its counterparts under SRS design for analyzing the upper (or lower) tails of the distribution of the response variable. We also evaluate the performance of our proposed methods when ranking is done with error. Acknowledgment Foremost, I would like to express my sincere gratitude to my advisor Dr. Mo- hammad Jafari Jozani for his constant moral and financial support during my M.Sc coursework and research. His patience, motivation, enthusiasm, and immense knowl- edge are inestimable. His guidance helped me throughout the course of the research and writing of this thesis. Also, I would like to appreciate my thesis committee: Dr. Elif Acar and Dr. Lisa Lix for their encouragement and insightful comments. My earnest thanks also goes to Dr. Mogbademu, Prof. Ajala, and Mr. Lanre, for their recommendations during the course of my application to the University of Manitoba. I thank my fellow course-mates at the University of Manitoba: Faisal Atakora, Erfan Hoque, Philip Olabisi, Manqiong Chen, for the challenging discussions, sleepless nights we had working together to meet STAT 7080 deadlines, and for all the fun we have had in the last two years. Last but not the least, I would like to thank my wonderful family, my inestimable jewel Funmilayo Ayilara (Nee Fatubarin), for giving birth to me and supporting me morally, financially and spiritually throughout my life. i Dedication Page This work is dedicated to Almighty God. ii Contents Contents iii List of Tables vi List of Figures ix 1 Introduction 1 1.1 Motivation . .1 1.2 Quantiles . .6 1.2.1 Conditional Quantile . .7 1.3 Quantile Regression Estimation . .7 1.3.1 Difficulties in Estimation of Quantile Regression Parameters . 10 1.4 Linear Programming . 11 1.4.1 Linear Program Duality . 12 1.5 M-Quantile Regression . 15 1.6 Properties of Quantile Regression . 16 iii 1.7 Inference on Quantile Regression . 19 1.7.1 Consistency . 20 1.7.2 Asymptotic Normality . 21 1.8 Resampling methods . 22 1.8.1 Residual Bootstrap . 23 1.8.2 Paired Bootstrap . 24 1.9 Implementation in R . 24 1.10 Example . 25 1.11 Thesis Outline . 29 2 Ranked Set Sampling 30 2.1 Introduction . 30 2.2 Estimation of the Population Mean using RSS . 34 2.3 Quantile Estimation using RSS . 36 2.4 Resampling Ranked Set Samples . 41 2.4.1 BRSSR: Bootstrap RSS by Rows . 42 2.4.2 BoRSS : Bootstrap RSS . 43 2.4.3 MRBRSS : mixed row bootstrap RSS . 44 iv 3 Quantile Regression with Nominated Samples 46 3.1 Introduction . 47 3.2 Preliminary Results . 49 3.3 Estimation . 52 3.3.1 First Method . 52 3.3.2 Second Method . 55 3.4 Simulation Study . 57 3.4.1 Categorical Covariates . 58 3.4.2 Categorical and Continuous Covariates . 78 3.4.3 Robustness . 92 4 Quantile Regression with rank based samples: An application to Bone Mineral Density 96 4.1 Data Description . 97 4.2 Results . 100 4.2.1 Quantile regression for lower quantiles of lumbar spine BMD 100 4.2.2 Quantile regression for upper quantiles of lumbar spine BMD 106 5 Concluding Remarks and Future Work 109 Bibliography 112 v List of Tables 1.1 Intercept and slope obtained from linear quantile regression between BMD and age in bone mineral acquisition dataset. 26 3.1 Estimated Bias (Biasd ) and Estimated Mean Squared Error (MSEd ) for Qτ (Y j X) when τ = (0:50; 0:60; 0:80; 0:90) quantile regression using simple random samples (k = 1) and maxima-nominated samples for categorical covariates with set size, k = (2; 3; 4; 5; 6; 7; 8) and ρ = 0:6. The results are based on 5000 Monte Carlo replications for each sampling methods. 62 3.2 Estimated Bias (Biasd ) and Estimated Mean Squared Error (MSEd ) for Qτ (Y j X) when τ = (0:50; 0:60; 0:80; 0:90) quantile regression using simple random samples (k = 1) and maxima-nominated samples for categorical covariates with set size, k = (2; 3; 4; 5; 6; 7; 8) and ρ = 0:7. The results are based on 5000 Monte Carlo replications for each sampling methods. 63 vi 3.3 Estimated Bias (Biasd ) and Estimated Mean Squared Error (MSEd ) for Qτ (Y j X) when τ = (0:50; 0:60; 0:80; 0:90) quantile regression using simple random samples (k = 1) and maxima-nominated samples for categorical covariates with set size, k = (2; 3; 4; 5; 6; 7; 8) and ρ = 0:8. The results are based on 5000 Monte Carlo replications for each sampling methods. 64 3.4 Estimated Bias (Biasd ) and Estimated Mean Squared Error (MSEd ) for Qτ (Y j X) when τ = (0:50; 0:60; 0:80; 0:90) quantile regression using simple random samples (k = 1) and maxima-nominated samples for categorical covariates with set size, k = (2; 3; 4; 5; 6; 7; 8) and ρ = 0:9. The results are based on 5000 Monte Carlo replications for each sampling methods. 65 3.5 Estimated Bias (Biasd ) and MSEd for Qτ (Y j X) when τ = (0:50; 0:60; 0:80; 0:90) quantile regression using simple random samples (k = 1) and maxima- nominated samples categorical and continuous covariates with set size, k = (2; 3; 4; 5; 6; 7; 8) and ρ = 0:6. The results are based on 5000 Monte Carlo replications for each sampling methods. 80 3.6 Estimated Bias (Biasd ) and MSEd for Qτ (Y j X) when τ = (0:50; 0:60; 0:80; 0:90) quantile regression using simple random samples (k = 1) and maxima- nominated samples categorical and continuous covariates with set size, k = (2; 3; 4; 5; 6; 7; 8) and ρ = 0:7. The results are based on 5000 Monte Carlo replications for each sampling methods. 81 vii 3.7 Estimated Bias (Biasd ) and MSEd for Qτ (Y j X) when τ = (0:50; 0:60; 0:80; 0:90) quantile regression using simple random samples (k = 1) and maxima- nominated samples categorical and continuous covariates with set size, k = (2; 3; 4; 5; 6; 7; 8) and ρ = 0:8. The results are based on 5000 Monte Carlo replications for each sampling methods. 82 3.8 Estimated Bias (Biasd ) and MSEd for Qτ (Y j X) when τ = (0:50; 0:60; 0:80; 0:90) quantile regression using simple random samples (k = 1) and maxima- nominated samples categorical and continuous covariates with set size, k = (2; 3; 4; 5; 6; 7; 8) and ρ = 0:9. The results are based on 5000 Monte Carlo replications for each sampling methods. 83 4.1 Estimated Bias (Biasd ) and Estimated Mean Squared Error (MSEd ) for Qτ (Y j X) when τ = (0:10; 0:20; 0:30; 0:50) quantile regression using simple random samples (k = 1) and minima-nominated samples with set size, k = (2; 3; 4; 5) and ρ = 0:9. The results are based on 5000 Monte Carlo replications for each sampling methods. 104 4.2 Estimated Bias (Biasd ) and Estimated Mean Squared Error (MSEd ) for Qτ (Y j X) when τ = (0:50; 0:60; 0:80; 0:90) quantile regression using simple random samples (k = 1) and maxima-nominated samples with set size, k = (2; 3; 4; 5) and ρ = 0:9. The results are based on 5000 Monte Carlo replications for each sampling methods. 107 viii List of Figures 1.1 Bone mineral density as a function of the adolescent age for the baseline study with the least squares regression line. .2 1.2 Bone mineral density as a function of the adolescent age for the baseline study with the least squares regression line and quantile regression line for τ = (0:1; 0:2; 0:3; 0:4; 0:5; 0:6; 0:7; 0:8; 0:9): .....3 1.3 Absolute error loss (solid), squared (dashed) and quantile error loss function (dotted). 10 1.4 Bone mineral density as a function of the adolescent age for the baseline study with the least squares and quantile regression lines associated with different τ........................ 26 1.5 Bootstrap Confidence intervals and schematic boxplots of the esti- mates of the quantile regression coefficients at different quantiles τ = 0:10; 0:25; 0:35; 0:50; 0:75; 0:95; Intercept(Left), slope(Right).