EMPIRICAL LIKELIHOOD METHOD for SEGMENTED LINEAR REGRESSION by Zhihua Liu a Dissertation Submitted to the Faculty of the Charles

EMPIRICAL LIKELIHOOD METHOD FOR SEGMENTED LINEAR REGRESSION

by Zhihua Liu

A Dissertation Submitted to the Faculty of The Charles E. Schmidt College of Science in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Florida Atlantic University
Boca Raton, FL
December 2011

Copyright by Zhihua Liu 2011

ACKNOWLEDGEMENTS

First of all, I have to confess that the thought of writing this dissertation was intimidating to me. As much as I wish to express my appreciation to all the people who have been there for me to make this dissertation possible, I know I can never include everyone, nor adequately express my immense gratitude to them in simple words.

I would like to express my deepest gratitude to all my committee members, Dr. Lianfen Qian, Dr. Hongwei Long, Dr. Heinrich Niederhausen and Dr. Dragan Radulovic. I appreciate their time, interest, and valuable comments concerning my thesis.

For the last few years, I have had the opportunity and privilege to work with Dr. Lianfen Qian. She has given me ideas and suggestions that enlightened my understanding of this research and gave me a better perspective on my own work.

I would also like to use this opportunity to thank many faculty members and my colleagues for all their inspiration and encouragement. I am also fortunate to be surrounded by sweet people from PenServ and ERISA Pension Systems. Without their support, I would never have completed this dissertation.

In the course of writing this dissertation, I have been running between a full-time job and actuarial exams. Most of the time, I wish I could have 40 hours a day. My family has always given me their unconditional love and constant support, which has helped me sail through the difficulties. I dedicate this dissertation to my family. Thanks for all your love and patience.

ABSTRACT

Author: Zhihua Liu
Title: Empirical Likelihood Method for Segmented Linear Regression
Institution: Florida Atlantic University
Dissertation Advisor: Dr.
Lianfen Qian
Degree: Doctor of Philosophy
Year: 2011

For a segmented regression system with an unknown change-point over two domains of a predictor, a new empirical likelihood ratio test statistic is proposed to test the null hypothesis of no change. The proposed method is nonparametric and relaxes the assumption on the error distribution. Under the null hypothesis of no change, the proposed test statistic is shown empirically to follow a Gumbel distribution, with location and scale parameters that are robust across various parameter settings and error distributions. Under the alternative hypothesis with a change-point, comparisons with two other methods (Chen’s SIC method and Muggeo’s SEG method) show that the proposed method performs better when the slope change is small. A power analysis is conducted to illustrate the performance of the test. The proposed method is also applied to two real datasets: the plasma osmolality dataset and the gasoline price dataset.

EMPIRICAL LIKELIHOOD METHOD FOR SEGMENTED LINEAR REGRESSION

Tables
Figures
1 Introduction
  1.1 Motivation and some examples
  1.2 Parametric Method
  1.3 Nonparametric Method
  1.4 Empirical Likelihood
2 Change-point Estimation Via Empirical Likelihood
  2.1 Assuming a Known change-point
  2.2 Assuming an Unknown change-point
  2.3 Main Results
  2.4 Algorithm
3 Asymptotic Property of $Z_n$
  3.1 Simulation I
  3.2 Simulation II
  3.3 Simulation III
  3.4 Simulation IV
4 Application
5 Conclusion
Bibliography

TABLES

3.1 Robustness analysis for the estimated location and scale parameters, $\mu$ and $\sigma$ respectively, of the limiting distribution of $Z_n$ under three types of error distributions.

3.2 The percentiles of $Z_n$ with $\alpha = 0.10$, $0.05$, and $0.01$.

3.3 Robustness analysis for the estimated location and scale parameters, $\mu$ and $\sigma$ respectively, of the limiting distribution of $Z_n$ with respect to the settings of $\gamma$ under the null hypothesis when $n = 100$.

3.4 The frequency distribution of $d = |\hat{k}^* - k^*|$ and the relative frequency $RF = \frac{\#\text{ of }\{d \le D\}}{10}\%$ for the ELR, SIC and SEG methods, when the sample size is $n = 50$ and the true time of change is $k^* = 25$.

3.5 The size and the power of $Z_n$ under two different error distributions: (i) normal distribution $N(0, 0.1^2)$ and (ii) centered log-normal distribution $\log N(0, 0.1^2)$.

FIGURES

3.1 The histograms and Q-Q plots of $Z_n$ under $H_0$ with normal errors (i) $N(0, 0.1^2)$ for four different sample size settings. The solid line represents the estimated Gumbel density and the dashed line represents the estimated kernel density.

3.2 The histograms and Q-Q plots of $Z_n$ under $H_0$ with centered log-normal errors (ii) centered $\log N(0, 0.1^2)$ for four different sample size settings. The solid line represents the estimated Gumbel density and the dashed line represents the estimated kernel density.

3.3 The histograms and Q-Q plots of $Z_n$ under $H_0$ with non-homogeneous errors (iii) $\{e_i\}_{i=1}^{[n/2]} \sim N(0, 0.1^2)$ and $\{e_i\}_{i=[n/2]+1}^{n} \sim N(0, 1.0^2)$ for four different sample size settings. The solid line represents the estimated Gumbel density and the dashed line represents the estimated kernel density.

4.1 (a) The scatter plot of AVP versus plasma osmolality with the fitted segmented linear regression.
(b) The plot of $-2$ times the logarithm of the empirical likelihood ratio versus all possible times of change $k$.

4.2 (a) The scatter plot of gasoline price versus year with the fitted segmented linear regression. (b) The plot of $-2$ times the logarithm of the empirical likelihood ratio versus all possible times of change $k$.

CHAPTER 1

INTRODUCTION

1.1 MOTIVATION AND SOME EXAMPLES

In the classical regression setting, a regression model is usually assumed to have a single parametric form on the whole domain of the predictors. A piecewise regression model, in contrast, allows the parameters of the model to differ across domains of the predictors. In the last thirty years, a considerable body of techniques has been developed for hypothesis testing, parameter estimation and related computing programs for detecting the change-point in piecewise regression models.

One special and commonly used piecewise regression model is the two-phase linear regression model, whose regression function is piecewise linear. It can be defined more precisely as follows. Let $Y$ be the response variable and $X$ a univariate predictor such that $E\langle |Y| \mid X \rangle < \infty$. Suppose that $\{(X_i, Y_i)\}_{i=1}^{n}$ is a sequence of independent observations of $(X, Y)$ satisfying the following model:

$$Y_i = (\alpha_0 + \alpha_1 X_i)\, I(X_i \le \tau) + (\beta_0 + \beta_1 X_i)\, I(X_i > \tau) + e_i, \tag{1.1.1}$$

where $\alpha_0, \alpha_1, \beta_0, \beta_1, \tau$ are unknown parameters, and $\{e_i\}_{i=1}^{n}$ are independent random errors with mean zero. Without loss of generality, we assume $X_1 \le X_2 \le \cdots \le X_n$ throughout the rest of the dissertation. If there is an unknown time $k^*$ such that $X_{k^*} \le \tau < X_{k^*+1}$, then we shall call $k^*$ the time of change and $\tau$ the change-point.

Widespread applications of two-phase linear regression models have appeared in diverse research areas.
For example, in environmental sciences, in Section 2.2 of [24], Piegorsch and Bailer illustrate the usefulness of two-phase linear regression models with a series of examples. In [27], Qian and Ryu fit a two-phase model with termite survival as $Y$ and tropical tree resin dosage as $X$, and found that tropical tree resin at a concentration of 10 mg was significantly effective in killing termites. In biological sciences, Vieth [35] applies model (1.1.1) to estimate the osmotic threshold by fitting arginine vasopressin concentration against plasma osmolality in the plasma of conscious dogs. In medical sciences, Smith and Cook [32] use a piecewise linear regression model to fit some renal transplant data. Lund and Reeves utilize the model to detect undocumented change-points for time series in [18]. Other applications can be found in epidemiology (Ulm [34], Pastor and Guallar [23]), software engineering (Qian and Yao [26]), econometrics (Chow [6], Koul and Qian [17], Fiteni [12], Zeileis [36]) and other fields.

Hawkins [14] classifies two-phase linear regression models into two types: continuous and discontinuous. By continuous, he means that the regression function is continuous at the change-point $\tau$; that is, the change-point $\tau$ satisfies the following equation:

$$\alpha_0 + \alpha_1 \tau = \beta_0 + \beta_1 \tau. \tag{1.1.2}$$

If equation (1.1.2) is not satisfied, the model is discontinuous. Continuous models are also known as segmented linear regression models (Feder [10, 11]).

Before applying a two-phase linear regression model to a dataset, it is common practice to test for the existence of a change-point $\tau$ by hypothesis testing. The null hypothesis of a single phase, or no change, is obtained by setting $\alpha_0 = \beta_0$ and $\alpha_1 = \beta_1$. In the usual scenario, when the change-point is known, the asymptotic chi-squared theory can be used for the likelihood ratio test of one phase against two phases.
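
As a rough illustration of the known change-point case, the following Python sketch simulates data from model (1.1.1) and computes a likelihood ratio statistic for one phase against two phases. The normal-error likelihood, the simulated parameter values, and the helper names `rss` and `lr_test_known_tau` are assumptions made for illustration only; they are not taken from the dissertation, whose own method is based on empirical likelihood. With the two restrictions $\alpha_0 = \beta_0$ and $\alpha_1 = \beta_1$, the reference distribution is chi-squared with 2 degrees of freedom, whose survival function is $e^{-s/2}$.

```python
import math
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def lr_test_known_tau(x, y, tau):
    """Gaussian LR statistic for one phase vs. two phases at a KNOWN tau.

    Hypothetical helper: assumes normal errors, which the dissertation's
    empirical likelihood approach does not require.
    """
    n = len(x)
    left = (x <= tau).astype(float)    # I(X_i <= tau)
    right = 1.0 - left                 # I(X_i > tau)
    X0 = np.column_stack([np.ones(n), x])                     # one phase
    X1 = np.column_stack([left, left * x, right, right * x])  # two phases
    stat = n * math.log(rss(X0, y) / rss(X1, y))
    p_value = math.exp(-stat / 2.0)    # chi-squared(2) survival function
    return stat, p_value

# Simulated data (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
n, tau = 200, 0.5
x = np.sort(rng.uniform(0.0, 1.0, n))
e = rng.normal(0.0, 0.1, n)
y_null = 1.0 + 2.0 * x + e                              # no change
y_alt = np.where(x <= tau, 1.0 + 2.0 * x, 2.0) + e      # slope 2 -> 0, continuous at tau

stat0, p0 = lr_test_known_tau(x, y_null, tau)
stat1, p1 = lr_test_known_tau(x, y_alt, tau)
print(f"no change:   stat = {stat0:.2f}, p = {p0:.3f}")
print(f"with change: stat = {stat1:.2f}, p = {p1:.3g}")
```

The one-phase design is nested in the two-phase design, so the statistic is nonnegative; a large value favours two phases.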

When the change-point is unknown, the difficulty arises because we are testing a hypothesis in the presence of a nuisance parameter, $\tau$, which is meaningless and cannot be estimated under the null hypothesis. However, $\tau$ enters the model under the alternative hypothesis. If $\tau$ is known and $S(\tau)$ is the appropriate test statistic, with large values corresponding to the alternative being true, then the test statistic we suggest for the case when $\tau$ is unknown is

$$\Lambda = \max\{ S(\tau) : L < \tau < U \}, \tag{1.1.3}$$

where $L$ and $U$ are the boundary truncations.
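
The maximization in (1.1.3) can be sketched in Python as a scan over candidate times of change. This is only an illustration under stated assumptions: $S$ is taken to be a Gaussian likelihood ratio statistic rather than the empirical likelihood ratio developed later in the dissertation, the truncation is expressed as a hypothetical trimming fraction `trim` of the ordered observations, and the names `rss`, `split_stat` and `max_stat` are invented for this example. Because the indicators in model (1.1.1) change only when $\tau$ crosses an observed $X_i$, scanning the split index $k$ covers all distinct values of $S(\tau)$.

```python
import math
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def split_stat(x, y, k):
    """S for a change after the k-th ordered observation (x must be sorted)."""
    n = len(x)
    left = (np.arange(n) < k).astype(float)   # first k observations
    right = 1.0 - left
    X0 = np.column_stack([np.ones(n), x])
    X1 = np.column_stack([left, left * x, right, right * x])
    return n * math.log(rss(X0, y) / rss(X1, y))

def max_stat(x, y, trim=0.1):
    """Max of S over split indices away from both ends (assumed truncation)."""
    n = len(x)
    lo = max(2, math.ceil(trim * n))
    hi = min(n - 2, math.floor((1.0 - trim) * n))
    stats = {k: split_stat(x, y, k) for k in range(lo, hi + 1)}
    k_hat = max(stats, key=stats.get)
    return stats[k_hat], k_hat

# Simulated data (all values below are illustrative assumptions).
rng = np.random.default_rng(1)
n, tau = 200, 0.5
x = np.sort(rng.uniform(0.0, 1.0, n))
e = rng.normal(0.0, 0.1, n)
y_null = 1.0 + 2.0 * x + e
y_alt = np.where(x <= tau, 1.0 + 2.0 * x, 2.0) + e

lam_null, k_null = max_stat(x, y_null)
lam_alt, k_alt = max_stat(x, y_alt)
print(f"no change:   Lambda = {lam_null:.2f} at k = {k_null}")
print(f"with change: Lambda = {lam_alt:.2f} at k = {k_alt}")
```

Since the maximum is taken over many correlated statistics, the chi-squared reference distribution for a single known $\tau$ no longer applies to $\Lambda$, which is the difficulty described above.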
