The Sum of Standardized Residuals: Goodness of Fit Test for Binary Response Model

The Sum of Standardized Residuals: Goodness of Fit Test for Binary Response Model Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University By Lu Chen, B.S Graduate Program in Public Health The Ohio State University 2017 Thesis Committee: Grzegorz A. Rempala, Advisor Chuck Song c Copyright by Lu Chen 2017 Abstract Binary response model is frequently used to analyze binary outcome variables [1]. This popularity leads to an increase in statistical research on the model [1]. One area of current research is developing new goodness of fit (GOF) tests to evaluate whether a model fits the data [1]. In this paper, a new GOF test statistic, the sum of standardized residuals (Cn), is proposed and its asymptotic distribution is described by following Windmeijer's idea [2]. In addition, we illustrate, via numerical examples, the practical applications of the asymptotic result to some finite samples of size n and compare Cn's performance with some other currently used statistics. Our results demonstrate that, compared to other statistics, the overall performance of Cn is satisfying and stable, and Cn can be calculated easily and interpreted intuitively, unlike its other competitors. ii Vita 2014 . .B.S. Pharmacy, China Pharmaceutical Univerdsity 2015 to present . School of Public Health, The Ohio State University Fields of Study Major Field: Public Health iii Table of Contents Page Abstract . ii Vita . iii List of Tables . v 1. Introduction . 1 1.1 Some Currently Used Goodness-of-Fit Statistics . 1 1.1.1 Hosmer-Lemeshow Statistic . 1 1.1.2 Stukel Statistic . 3 1.1.3 Unweighted Sum of Squares Statistic . 3 1.1.4 Information Matrix Statistic . 4 2. Sum of Standardized Residual Statistic . 5 2.1 Regularity Assumptions . 5 2.2 The Asymptotic Distribution of Cn .................. 7 3. Numerical Examples . 8 3.0.1 Simulated Covariates . 9 3.0.2 Real Covariates . 11 3.0.3 Summary . 18 Bibliography . 19 Appendices 20 A. Proof of the Asymptotic Distribution of Cn . 20 iv List of Tables Table Page 3.1 Proportion H0 rejected at the α = 0:05 using sample size of 50, 100 and 1000 with 1000 replications using simulated covariates . 10 3.2 Selected variables description in the Boston Marathon data . 11 3.3 Proportion H0 rejected at the α = 0:05 using sample size of 445 with 1000 replications in quadratic models using the Boston Marathon data 13 3.4 Selected variables description in Cleveland data . 14 3.5 Proportion H0 rejected at the α = 0:05 using sample size of 303 with 1000 replications in interaction models using the Cleveland data . 15 3.6 Selected variables description in NHANES data . 16 3.7 Proportion H0 rejected at the α = 0:05 using sample size of 373 with 1000 replications in interaction models using the NHANES data . 17 v Chapter 1: Introduction The binary response model [3] is a popular method for analyzing binary data. In this model, we are interested in pi, where pi is defined as the probability that a binary random variable Yi = 1 given the covariates xi such that > pi = P (Yi = 1jxi) = F (xi β); (1.1) where β is the k-vector of coefficients and F is a distribution function. Some popular choices of F include normal (probit model) or logistic (logit model) distributions. After a model is fitted, we usually want to assess its goodness of fit (GOF). One of the several approaches is to perform a formal GOF test. We will introduce several currently used test statistics in the next section. Also, we will propose a new statistic, the sum of standardized residuals (Cn), and describe its asymptotic distribution in Chapter 2. In Chapter 3, we will illustrate, via numerical examples, the practical applications of the asymptotic result to some finite samples of size n and compare Cn's performance with other statistics. 1.1 Some Currently Used Goodness-of-Fit Statistics 1.1.1 Hosmer-Lemeshow Statistic In 1980, Hosmer and Lemeshow [4] proposed a grouping-based method to measure the goodness of fit for the binary response model. Specifically, the data are grouped 1 0 according to thep î s, the estimated values of pi. The observed and expected frequency are calculated for each group, and compared using the Pearson chi-square statistic to obtain g " 2 2 # X (O1k − Ebik) (O0k − Eb0k) HL = + k=1 Eb1k Eb0k th Pck where k denotes the k group, k = 1; 2; 3; :::; g. O1k = j=1 Yi, is the observed th number of events (Yi = 1) in the k group, where ck is the number of covariate th Pck patterns in the k group. Eb1k = j=1 mjp^j is the expected number of events in the th th k group, where mj is the number of observations in j covariate pattern, andp ^j is th Pck the estimated probability for j covariate pattern. Similarly, O0k = j=1(mj −Yj) is th Pck the observed number of non-events (Yi = 0) in the k group, and Eb0k = j=1 mj(1− th p^j) is the expected number of non-events in the k group. Since g = 10 and g = 20 are commonly suggested in practice, we consider using these two setting to perform HL tests in Section 3. With a bit of algebra, it can be shown that g 0 2 X (O1k − nkp¯k) HL = 0 n p¯k(1 − p¯k) k=1 k 0 th where nk is the number of observations in the k group,p ¯k is the average estimated probability in the kth group. When the number of groups g equals to the sample size n, the formula of HL can be further rewritten as n 2 X (Yi − pî) HL = p^ (1 − p^ ) i=1 i i which suggests the formula for another goodness of fit measure: the sum of squared standardized residuals. 2 1.1.2 Stukel Statistic To perform Stukel GOF test [5], one or two additional covariates are included in the standard logistic model, and this new model is tested versus the original one. For > ^ ^ example, letp î = F (xi β) be the original model, where β is the vector of estimated 2 2 coefficients. Let za = gi I(gi ≥ 0) and zb = gi I(gi < 0) be the two covariates,where > ^ > ^ ^ gi = xi β. After including za and zb into the model, it becomesp î = F (xi β + βaza + ^ βbzb). Then, the Ward test or the Likelihood Ratio test can be performed to test the null hypothesis that both βa and βb are equal to 0 . 1.1.3 Unweighted Sum of Squares Statistic When some of the expected cell frequencies in the 2 × n contingency table are too small, the weighted chi-squared test for proportions cannot be used. In this case, Copas [6] modified the chi-squared test as n X 2 USS = (Yi − pî) ; i=1 wherep î is defined as before, n is the sample size. Under the null hypothesis, Hosmer et al.[1] have proved that this statistic will have an asymptotic normal distribution when standardized by its mean and the square root of its variance: USS − trace(V^ ) d q −! N(0; 1) Vd ar(USS − trace(V^ )) where ^ ^> ^ ^ ^ Vd ar(USS − trace(V )) = di (I − M)V di with ^ di = 1 − 2^pi 3 ^ V = diag[^pi(1 − pî)] ^ ^ > ^ −1 > M = V xi(xi V xi) xi 1.1.4 Information Matrix Statistic In 1982, White [7] introduced a general test for model misspecification. The basic idea of this test is to compare two different estimates of the covariance matrix. If these two estimates are the same, there is no lack of fit. The formula for Information Matrix statistic is: n p X X 2 IM = (Yi − pî)(1 − 2p î)xij i=1 j=0 th th where xij is the j predictor for i observation. It has been shown by White that IM has a approximate chi-square distribution with p + 1 degree of freedom under the null hypothesis of the correct model. 4 Chapter 2: Sum of Standardized Residual Statistic As we mentioned in Section 1, the sum of squared standardized residuals is also a goodness of fit measure in binary response model n 2 n X (Yi − pî) X T = = W : n p^ (1 − p^ ) i i=1 i i i=1 This statistic was proved to be asymptotically following the standard normal distribution when properly standardized [2]. However, the drawback of Tn statistic is that the variances of its individual components Wi depend heavily on pi in such way that extreme values of pi could produce unbalanced variance across the sample. One intuitive way to address this issue is to assign a variance-stabilizing weight to Wi. This yields a new statistic n X Yi − pî Cn = p : i=1 pî(1 − pî) We will analyze the asymptotic distribution of this statistic under the following assumptions. 2.1 Regularity Assumptions Following Windmeijer [2] we assume that 5 1. Both the first-order derivative f and second-order derivative f 0 of F are con- tinuous. There exists a positive M < 1 such that 0 < F (z) < 1 and f(z) > 0 for all jzj ≤ M. 2. β 2 B, the parameter space which is a open subset of the Euclidean space of the same dimension k as β. P > 3. There exists a finite number M such that jjxijj ≤ M; 8i 2 and lim xixi N n!1 i is a finite non-singular matrix.

The Sum of Standardized Residuals: Goodness of Fit Test for Binary Response Model

Goodness of Fit: What Do We Really Want to Know? I

Linear Regression: Goodness of Fit and Model Selection

Center for Multicultural & Global Mental Health Annual Report

Chapter 2 Simple Linear Regression Analysis the Simple

A Study of Some Issues of Goodness-Of-Fit Tests for Logistic Regression Wei Ma

Testing Goodness Of

Measures of Fit for Logistic Regression Paul D

Goodness of Fit of a Straight Line to Data

October 2002

Using Minitab to Run a Goodness‐Of‐Fit Test 1. Enter the Values of A

Goodness-Of-Fit Tests for Parametric Regression Models

What Critics Say- What Facts