International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May-2013 2131 ISSN 2229-5518

Generalized Linear Model for Binary with Missing Values: An EM Algorithm Approach

Nazneen Sultana

Abstract: A procedure is derived for estimating the parameter in case of missing data. The missing data mechanism is considered as missing at random (MAR) and non-ignorable. Here we use EM algorithm for logit link approach in . The logit link approach shows that it can effectively estimate the value of a when we have information on the other categorical variables. In this method the variable with missing values is considered as dependent variable. In addition a real data set for low birth weight is presented to illustrate the method proposed.

Index Terms: Binary data, EM algorithm, Generalized linear model, Logit link, Maximum likelihood estimation, Missing data, non-ignorable.

——————————  ——————————

1. INTRODUCTION URING the process of complete data, models when the response D sometimes we may not get the full observed variable is missing and the missing data data. This results in partially incomplete data. An mechanism is non-ignorable. Ibrahim and Lipsitz inappropriate conclusion may occur when the (1996) proposed a conditional model for researchers ignore, truncate, censor or collapse incomplete covariates in parametric regression those data as it might contain important models. Ibrahim, Lipsitz and Chen (1999) information. WhenIJSER non-response is unrelated to proposed a method for estimating parameters in the missing values of the variables, the non- generalized linear models with missing response is called ignorable (see Little (1992) and covariates and a non-ignorable missing data Little and Rubin (1987)). When non-response is mechanism. related to values of missing variables, the non- In this paper, we have proposed a method for response is called non-ignorable. The literature estimating parameters in generalized linear for generalized linear model with incomplete model with missing values. We used parameter observations, however, is sparse. Ibrahim et al estimation procedure for logit link function in (1990) discussed incomplete data in generalized linear models. Ibrahim and Lipsitz (1996) generalized linear model. Here variable with missing values is considered as dependent proposed a method for estimating parameters in variable. Here we considered the non-response as

———————————————— non-ignorable. We used EM algorithm both for • Nazneen Sultana,Lecturer, Department of Applied , categorical and continuous variable. The method East West University, Bangladesh, PH-01754756427. E-mail:[email protected] proposed is computationally simple and easy.

The rest of this paper is organized as follows. In section 2, we first briefly review the previous

IJSER © 2013 http://www.ijser.org International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May-2013 2132 ISSN 2229-5518

methods of estimating parameters for missing incompletely classified data of the data using EM algorithm. In section 3, we discuss nonrespondents. The likelihood function the parameter estimation procedure for logit link corresponding to their model is given by function in generalized linear model. Then in section 4, we discuss the proposed method and in = | | section 5, we demonstrate the methodology with 푧푥푦1 푧푥+2 퐿 �� ��푝푦1 푥� � ���푝+2 푥� � example. We conclude the paper with a 푦푥 푥 discussion section. Then they proceeded the E-step and M-step of

2. EM Algorithm for Missing Data EM algorithm. But this process takes a large number of iteration. In 1977 a broadly applicable algorithm for computing the maximum likelihood estimates Lipsitz and Ibrahim (1996) examined that when from incomplete data is proposed by Dempster, the missing covariates are categorical, a useful Laird and Rubin. They proposed an algorithm technique for obtaining parameter estimates is which is named as EM algorithm because it the EM algorithm by the method of weighs involves expectation step (E-step) and proposed in Ibrahim (1990). This method maximization step (M-step) in each iteration. requires the estimation of many nuisance Little and Scheluchter (1985) discussed the parameters for the distribution of the covariates. maximum likelihood estimation procedure for In this paper, the distribution of K-dimensional mixed continuous and categorical data with covariate vector = ( ,…, ) can be written through a series of one-dimensional conditional missing values. The general location model of 푥 푥1 푥푘 Olkin & Tate (1961) and extensions introduced by distributions, as Krzanowski (1980, 1982) formed the basis for this α = α method. ()1 ,..., k ( k 1 xxxPxxP −1 ,,..., kk ) IJSER ( − xxxP α −− ) (),...,,..., ()xPxxP αα Baker and Laird (1988) developed the process of k 11 kk 12 11112 for categorical variables with where is a vector of indexing parameters for outcome subject to non-ignorable non-response. the th 푘conditional distribution, = ( ,…, ), They developed a log-linear model for 훼 and the ’s are distinct. They used 1the EM푘 categorical response subject to non-ignorable 훼 훼 훼 algorithm푘 by푘 the method of weights where the non-response. To illustrate model development, 훼 weights are the posterior probability of the they considered as cross-classification of missing values. Unfortunately there are often too covariates indexed by = 1, … … , , as 푋 many probabilities to estimate in a saturated polychotomous outcome indexed by = 푥 푠 푌 model for ( ,…, | ) when there are many 1, … … , , and as a dichotomous response 푦 covariates with1 missing푘 values. In that situation, mechanism indexed by ( = 1 = ; = 푃 푥 푥 훼 푞 푅 the model becomes complicated. 2 = . Let be the joint probability 푟 푟 푟푒푠푝표푛푠푒 푟 of outcome푛표 푟푒푠푝표푛푠푒 and response푝푦푟│푥 mechanism conditional Ibrahim, Lipsitz and Chen (1999) developed a on x. Let be the observed counts for the method for missing covariates in Generalized Linear Model when the missing data mechanism completely 푥푦classified1 data of the respondents, and let 푧 denote the observed counts for the is non-ignorable. They used a multinomial model

푧푥+2

IJSER © 2013 http://www.ijser.org International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May-2013 2133 ISSN 2229-5518

for the missing data indicators and proposed a ( ) joint distribution for them which can be written 푡 푄�훾�훾 � as a sequence of one-dimensional conditional = 푛 ,( ) log[ { | ( ), }] ,( ) distributions, with each one-dimensional � � 푤푖푗 푡 푝 푦푖 푥푖 푗 훽 푖=1 푥푚푖푠 푗 conditional distribution consisting of a logistic + 푛 ,( ) log[ { ( )| }] regression. They allowed the covariates to be ,( ) 푖푗 푡 푖 either categorical or continuous. Suppose that � � 푤 푝 푥 푗 훼 푖=1 푥푚푖푠 푗 ( , ),…, ( , ) are independent observations, + 푛 ,( ) log[ { | , ( ), }] where1 1 each 푛 is푛 the response variable and each ,( ) 푥 푦 푥 푦 � � 푤푖푗 푡 푝 푟푖 푦푖 푥푖 푗 휑 × 1 푖=1( )푥푚푖푠 푗 ( ) ( ) ( ) ( ) ( ) is a 푖random vector of covariates. The = + + missing data푦 mechanism is defined as the 1 푡 2 푡 3 푡 푥푖 푝 푄 �훽�훾 � 푄 �훼�훾 � 푄 �휑�훾 � distribution of the × 1 random vector , whose The weights ,( ) are the conditional probabilities 푖푗 푡 corresponding to th component equals 1 if is observed for 푤 푝 푟푖 subject and is 0 if is missing. Here = , , , , , and are given by 푘 푟푖푘 푥푖푘 ,…, is a × 1 vector of regression �푥푚푖푠 푖│푥표푏푠 푖 푦푖 푟푖 훾� 푖푘 훽 ,( ) 푖 ′ 푥 1 푝 ( ) ( ) coefficients, and are considered as nuisance =푖푗 푡 ( ), , . ( ), �훽 훽 � 푝 푤 , , , , parameters. In this paper for categorical 푡 푡 푖 푚푖푠 푖 표푏푠 푖 푚푖푠 푖 표푏푠 푖 푝�푦 │푥 푗 푥 훾 � 푝�푥( ) 푗 푥 │훾 � covariates, 훼 휑 . , , ( ), , , ( ) 푡 ( ) 푖 푖 푚푖푠 푖 ( )표푏푠, 푖 . ( ) ÷푝�푟 │푦 푥 푗 푥 훾 � ( , , | , , ) = ( | , ) ( | ) ( | , , ) 푡 ( ) 푡 푖 . 푖 , ( ), 푖 , ( ) 푝�푦 │푥 푗 훾 � 푝�푥 푗 │훾 � 푝 푦푖 푥푖 푟푖 훽 훼 휑 푝 푦푖 푥푖 훽 푝 푥푖 훼 푝 푟푖 푦푖 푥푖 휑 � 푡 which leads to the complete data log-likelihood 푥푚푖푠 푖 푗 푝�푟푖│푦푖 푥푖 푗 훾 � and the M-step is,

( ) = 푛 ( , , , ) ( ) ( ) = 푛 푙IJSER훾 � 푙 훾 푥푖 푦푖 푟푖 푡 푡 푖=1 푄�훾�훾 � � 푄푖�훾�훾 � 푖=1 { , ( ), , } = 푛 log{ ( | , )} + log{ ( | )} = 푛 ,( ) 푖 푖 푖 푖 푖 푖 푖 ,( ) 휕푙 훾 푥 푗 푦 푟 � 푝 푦 푥 훽 푝 푥 훼 � � 푤푖푗 푡 푖=1 푖=1 푥푚푖푠 푗 + log{ ( | , , )} 휕훾 But this procedure is so much complicated and 푖 푖 푖 푝 푟 푦 푥 휑 time consuming. That is why, an easy and simpler method is proposed in this paper. where = ( , , ) and ( , , , ) is the contribution to the complete data log-likelihood 3. Parameter Estimation: Logit Link 훾 훽 훼 휑 푙 훾 푥푖 푦푖 푟푖 for the th observation. Function in the Generalized Linear Model Then they푖 used the EM algorithm by the method For any binomial variable, Y, the probability of weights where the E-step is, mass function can be expressed as (McCullagh, P. and J.A. Nelder. 1989)

( ; , ) = (1 ) 푦 푛−푦 푌 푛 푓 푦 휃 휑 �푦� 휋 − 휋

IJSER © 2013 http://www.ijser.org International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May-2013 2134 ISSN 2229-5518

= + (1 ) + 1 ( ) = 푛 휋 ( ) (3.1)푛 푖 푒푥푝 �푦푙푛 �1−휋� 푛푙푛 − 휋 푙푛 � �� 푖 푑푏 휃 푖 푦 � �푦 − 푖 � 푥 푎 휑 푖=1 푑휃 Therefore, from (3.1), for the binomial 1 = 푛 [ ] distribution ( ) � 푦푖 − 휇푖 푥푖 푎 휑 푖−1 Consequently, we can find the maximum = , = , 1 1 +휃 휋 푒 likelihood estimates of the parameters by solving 휃 푙푛 � � 휋 휃 − 휋 푒 ( ) = (1 ), the system of equations

푏 휃 −푛푙푛 − 휋 ( ) = 1, ( , ) = , [ ] = 0 (3.3) ( ) 푛 1 푛 푎 휑 푐 푦 휑 푙푛 � � 푎 휑 ∑푖−1 푦푖 − 휇푖 푥푖 ( ) ( ) 푦 ( ) = = . For the ( ) = 1, so (3.3) 푑푏 휃 푑푏 휃 푑휋 퐸 푦 becomes 푎 휑 where 푑휃 푑휋 푑휃 [ ] = 0 (3.4) 푛 = = (1 ) (3.2) ∑푖−1 푦푖 − 휇푖 푥푖 휃 휃 2 푑휋 푒 푒 Thus, the maximum likelihood estimate of β is 휃 휃 푑휃 1+푒 − �1+푒 � 휋 − 휋 For the exponential family, the log likelihood = ( ´ ) ´ (3.5) function corresponding to a random sample of −1 −1 −1 훽̂ 푋 푉 푋 푋 푉 푧 size n is It is interesting to note similarity of (3.5) to the expression obtained in standard regression { ( )} IJSERmodel. ( , ) = 푛 + ( , ) ( ) 푦푖휃푖 − 푏 휃푖 푙 푦 훽 � � 푐 푦푖 휑 � 푖=1 푎 휑 4. EM Algorithm with Logit Link in Thus for the canonical link in the binomial case, GLM we have

In this method first we have to check out in = [ ( )] = ( ) which variable the missing values arise. Then the 휂푖 푔 퐸 푦푖 푔 휇푖 = [ /(1 )] = ´ = variable with missing data is considered as 푖 푖 푖 푖 and = 푙푛 where휇 −´ 휇 is the푥 i-훽th row휃 of the X- dependent variable. For example, let we have 3 matrix. 푖Therefore,푖 variables , and . If missing values are 휇 휋 푥푖 arise in 푥푖1 then푥푖2 we consider푥푖3 it as dependent = . variable and푖2 we have to calculate the conditional 훿푙 훿푙 훿휃푖 푥 훿훽 훿휃푖 훿훽 probabilities of with respect to and .

푥푖2 푥푖1 푥푖3

IJSER © 2013 http://www.ijser.org International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May-2013 2135 ISSN 2229-5518

That is, we have to fine ( , ). After that The log-likelihood function is we use the iteration procedure푝 푥푖2 ∣ of 푥푖1 EM푥푖3 algorithm.

( ) = 푛 [ (1 + exp ( ))] Suppose that ( , ),……,( , ) are 푙표푔푙 훽 � 푦푖푋훽 − 푙표푔 푋훽 independent observations.푥 1If푦 1there are푥 푛missing푦푛 푖=1 The M-step involves the maximization of the log- values in then is considered as response likelihood. Thus the M-step can be obtained as variable and푦푖 each 푦푖 is a × 1 random vector of follows: covariates. The conditional푥푖 푝 distribution of given is 푦푖 ( ) ( ) = 푛 푖 1 +푗 ( ) 푥 훿푙표푔푙 훽 푖 푗 푥 푒푥푝 푋훽 ( , ) = (1 ) 푗 � �푦 푥 − � 푖=1 푦푖 1−푦푖 훿훽 푒푥푝 푋훽 푝 푦푖 ∣ 푥푖 훽 휋 − 휋 and where

( ) ( ) ( ) = = 푛 1 + ( ) 2 1푗 +푘 ( ) 푒푥푝 푋훽 훿 푙표푔푙 훽 푥 푥 푒푥푝 푋훽 휋 2 푒푥푝 푋훽 푗 푘 − � � � The E-step is 훿훽 훿훽 푖=1 � 푒푥푝 푋훽 � So the score vector

( ) ( ) = = ( ) (4.1) ( ) 푒푥푝 푋훽 ( ) = (4.2) 퐸 푦 ∣ 푥 휋� 1+푒푥푝 푋훽 훿푙표푔푙 훽 Here 푈 훽 푗 훿훽 And the information matrix is ( ) ( = 1 , ) = IJSER1 + ( ) ( ) 푒푥푝 푋훽 ( ) = (4.3) 푝 푦푖 ∣ 푥푖 훽 2 푒푥푝 푋훽 훿 푙표푔푙 훽 and 퐼 훽 �− � 훿훽푗훿훽푘 Then we get the estimate of using Newton- 1 ( = 0 , ) = Raphson method: 1 + ( ) 훽 푝 푦푖 ∣ 푥푖 훽 푒푥푝 푋훽 ( ) ( ) From incomplete data, we calculate ( ) = . = + (4.4) 푘+1 푘 −1 If 0.5 then in missing values, we consider y = 훽̂ 훽̂ 퐼�훽̂� 푈�훽̂� 퐸 푦 ∣ 푥 휋� We have to continue this repeatedly until the 1 and if < 0.5 then 휋�≥ convergence that is 휋� y = 0. After that we get the complete data. Now ( ) ( ) the complete data likelihood function is 푡+1 푡 �훽 − 훽 � ≤ 휀

( ) 1 ( ) = 푛 푖 . 1 + ( ) 푦 1 + ( ) 1−푦푖 푒푥푝 푋훽 푙 훽 � � � � � 푖=1 푒푥푝 푋훽 푒푥푝 푋훽

IJSER © 2013 http://www.ijser.org International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May-2013 2136 ISSN 2229-5518

5. Example Table 5.1: Parameter estimation for categorical age We consider the Low Birth Weight Data (Hosmer and Lemeshow). Consider two variables Low

(low birth weight of baby) and Age (mother’s age). Here we consider missing at random method. In variable Low, 20, 74, 94, 22 and 93 positions are missing. That is we have 5 missing values. So we consider Low as dependent variable and Age as independent variable. We It is evident from Table 5.1 that there is no considered age 22 as a cut-off point. Since we statistically significant association between age want to work with dichotomous data, we recode and birth weight. However we observed that all the age data into 0 and 1. Here randomly chosen values for subject 20, 74, 94, 22 and 93 are matched with the estimated values. ( + ) ( = 1 , ) = 1 + ( + ) The criteria of matching are discussed in the 푒푥푝 훽0 훽1푥푖 푝 푦푖 ∣ 푥푖 훽 푒푥푝 훽0 훽1푥푖 previous section 3. where = (1, ) is the 1 × 2 vector of

푖 covariates푋 for the푥 ith observation, including an Again we considered age as continuous variable intercept and = ( , )´. The initial values and did the whole procedure without recoding the age. Here we considered low as binary data chosen for the훽 regression훽0 훽1 coefficients are IJSERand age as continuous data. Using these ( , ) = ( 0.6, 0.07). variables, we get the result for logit link 훽0 훽1 − − Then using (4.1), we get . Using the condition of approach. replacing missing values,휋� we replace the 5 Parameter estimation for continuous missing values by 1. Then we get the complete Table 5.2: age data. In M-step we use (4.2), (4.3) and (4.4) and then we get the estimate.

Again using this estimate we recalculate E-step and M-step. After the EM convergence we get the final estimate.

IJSER © 2013 http://www.ijser.org International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May-2013 2137 ISSN 2229-5518

From Table 5.2 we observed that we get the same References result after considering age as continuous [1] Agresti, A. (1990), “Categorical Data variable. That is, there is no statistically Analysis”.New York: Wiley. significant relationship between age and low [2] Hosmer, D. W. and Lemeshow, S. (2000) birth weight. However we observed that all “Applied ” 2nd edn. Wiley. randomly chosen values are also matched with [3] Little, R. J. A. and Schluchter, M. (1985) “Maximum likelihood estimation for mixed the estimated values. The percentage of matching continuous and categorical data with missing may improve with the specification of the values”. Biometrika, 72, 497-512. underlying model. [4] Baker, S. G. and Laird, N. M. (1988), “Regression analysis for categorical variables with outcome 6. Discussion subject to non-ignorable non-response”. J. Am.

Statist. Ass., 83, 62-69. We have proposed a method of estimation based [5] Ibrahim, J. G. (1990), “Incomplete data in on GLM. We have proposed the method for non- generalized linear models”. J. Am. Statist. Ass., ignorable non-response. For the examples 85, 765-769. considered in Section 5, we observed that the [6] Ibrahim, J. G. and Lipsitz, S. R. (1996), missing values are fully matched with the “Parameter estimation from incomplete data in binomial regression when the missing data estimated values and the estimates are matched mechanism is non-ignorable”. Biometrics, 52, with the estimates for complete data i.e. original 1071-1078. data. One drawback of the EM algorithm is its [7] Lipsitz, S. R. and Ibrahim, J. G. (1996), “A slow convergenceIJSER rate but the whole procedure is conditional model for incomplete covariates in simple and easy. This method is easy to handle parametric regression models”. Biometrika, 83, and it gives efficient result. So we can use it for 916-922. [8] Little, R. J. A. and Rubin, D. B. (1987), “Statistical real-life data. Analysis with Missing Data”. New York: Wiley. Acknowledgments [9] McCullagh, P. and Nelder, J. (1989), “Generalized Linear Models”, 2nd edn. New I am grateful to Professor Dr. M. Ataharul Islam York: Chapman and Hall. for his continuing support and encouragement. I [10] Ibrahim, J. G., Lipsitz, S. R. and Ming-Hui Chen (1999), “Missing covariates in Generalized would also like to thank the referees for their Linear Models when the missing data helpful comments. mechanism is non-ignorable”. J. R. Statist. Soc.

61, 173-190.

IJSER © 2013 http://www.ijser.org