Generalized Linear Model for Binary Data with Missing Values: an EM Algorithm Approach

International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May-2013 2131 ISSN 2229-5518 Generalized Linear Model for Binary Data with Missing Values: An EM Algorithm Approach Nazneen Sultana Abstract: A procedure is derived for estimating the parameter in case of missing data. The missing data mechanism is considered as missing at random (MAR) and non-ignorable. Here we use EM algorithm for logit link approach in generalized linear model. The logit link approach shows that it can effectively estimate the value of a categorical variable when we have information on the other categorical variables. In this method the variable with missing values is considered as dependent variable. In addition a real data set for low birth weight is presented to illustrate the method proposed. Index Terms: Binary data, EM algorithm, Generalized linear model, Logit link, Maximum likelihood estimation, Missing data, non-ignorable. —————————— —————————— 1. INTRODUCTION URING the process of complete data, binomial regression models when the response D sometimes we may not get the full observed variable is missing and the missing data data. This results in partially incomplete data. An mechanism is non-ignorable. Ibrahim and Lipsitz inappropriate conclusion may occur when the (1996) proposed a conditional model for researchers ignore, truncate, censor or collapse incomplete covariates in parametric regression those data as it might contain important models. Ibrahim, Lipsitz and Chen (1999) information. WhenIJSER non-response is unrelated to proposed a method for estimating parameters in the missing values of the variables, the non- generalized linear models with missing response is called ignorable (see Little (1992) and covariates and a non-ignorable missing data Little and Rubin (1987)). When non-response is mechanism. related to values of missing variables, the non- In this paper, we have proposed a method for response is called non-ignorable. The literature estimating parameters in generalized linear for generalized linear model with incomplete model with missing values. We used parameter observations, however, is sparse. Ibrahim et al estimation procedure for logit link function in (1990) discussed incomplete data in generalized linear models. Ibrahim and Lipsitz (1996) generalized linear model. Here variable with missing values is considered as dependent proposed a method for estimating parameters in variable. Here we considered the non-response as ———————————————— non-ignorable. We used EM algorithm both for • Nazneen Sultana,Lecturer, Department of Applied Statistics, categorical and continuous variable. The method East West University, Bangladesh, PH-01754756427. E-mail:[email protected] proposed is computationally simple and easy. The rest of this paper is organized as follows. In section 2, we first briefly review the previous IJSER © 2013 http://www.ijser.org International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May-2013 2132 ISSN 2229-5518 methods of estimating parameters for missing incompletely classified data of the data using EM algorithm. In section 3, we discuss nonrespondents. The likelihood function the parameter estimation procedure for logit link corresponding to their model is given by function in generalized linear model. Then in section 4, we discuss the proposed method and in = | | section 5, we demonstrate the methodology with 푧푥푦1 푧푥+2 퐿 �� 푝푦1 푥� � ��푝+2 푥� � example. We conclude the paper with a 푦푥 푥 discussion section. Then they proceeded the E-step and M-step of 2. EM Algorithm for Missing Data EM algorithm. But this process takes a large number of iteration. In 1977 a broadly applicable algorithm for computing the maximum likelihood estimates Lipsitz and Ibrahim (1996) examined that when from incomplete data is proposed by Dempster, the missing covariates are categorical, a useful Laird and Rubin. They proposed an algorithm technique for obtaining parameter estimates is which is named as EM algorithm because it the EM algorithm by the method of weighs involves expectation step (E-step) and proposed in Ibrahim (1990). This method maximization step (M-step) in each iteration. requires the estimation of many nuisance Little and Scheluchter (1985) discussed the parameters for the distribution of the covariates. maximum likelihood estimation procedure for In this paper, the distribution of K-dimensional mixed continuous and categorical data with covariate vector = ( ,…, ) can be written through a series of one-dimensional conditional missing values. The general location model of 푥 푥1 푥푘 Olkin & Tate (1961) and extensions introduced by distributions, as Krzanowski (1980, 1982) formed the basis for this α = α method. ()1 ,..., k ( k 1 xxxPxxP −1 ,,..., kk ) IJSER ( − xxxP α −− ) (),...,,..., ()xPxxP αα Baker and Laird (1988) developed the process of k 11 kk 12 11112 regression analysis for categorical variables with where is a vector of indexing parameters for outcome subject to non-ignorable non-response. the th 푘conditional distribution, = ( ,…, ), They developed a log-linear model for 훼 and the ’s are distinct. They used 1the EM푘 categorical response subject to non-ignorable 훼 훼 훼 algorithm푘 by푘 the method of weights where the non-response. To illustrate model development, 훼 weights are the posterior probability of the they considered as cross-classification of missing values. Unfortunately there are often too covariates indexed by = 1, … … , , as 푋 many probabilities to estimate in a saturated polychotomous outcome indexed by = 푥 푠 푌 model for ( ,…, | ) when there are many 1, … … , , and as a dichotomous response 푦 covariates with1 missing푘 values. In that situation, mechanism indexed by ( = 1 = ; = 푃 푥 푥 훼 푞 푅 the model becomes complicated. 2 = . Let be the joint probability 푟 푟 푟푒푠푝표푛푠푒 푟 of outcome푛표 푟푒푠푝표푛푠푒 and response푝푦푟│푥 mechanism conditional Ibrahim, Lipsitz and Chen (1999) developed a on x. Let be the observed counts for the method for missing covariates in Generalized Linear Model when the missing data mechanism completely 푥푦classified1 data of the respondents, and let 푧 denote the observed counts for the is non-ignorable. They used a multinomial model 푧푥+2 IJSER © 2013 http://www.ijser.org International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May-2013 2133 ISSN 2229-5518 for the missing data indicators and proposed a ( ) joint distribution for them which can be written 푡 푄�훾�훾 � as a sequence of one-dimensional conditional = 푛 ,( ) log[ { | ( ), }] ,( ) distributions, with each one-dimensional � � 푤푖푗 푡 푝 푦푖 푥푖 푗 훽 푖=1 푥푚푖푠 푗 conditional distribution consisting of a logistic + 푛 ,( ) log[ { ( )| }] regression. They allowed the covariates to be ,( ) 푖푗 푡 푖 either categorical or continuous. Suppose that � � 푤 푝 푥 푗 훼 푖=1 푥푚푖푠 푗 ( , ),…, ( , ) are independent observations, + 푛 ,( ) log[ { | , ( ), }] where1 1 each 푛 is푛 the response variable and each ,( ) 푥 푦 푥 푦 � � 푤푖푗 푡 푝 푟푖 푦푖 푥푖 푗 휑 × 1 푖=1( )푥푚푖푠 푗 ( ) ( ) ( ) ( ) ( ) is a 푖random vector of covariates. The = + + missing data푦 mechanism is defined as the 1 푡 2 푡 3 푡 푥푖 푝 푄 �훽�훾 � 푄 �훼�훾 � 푄 �휑�훾 � distribution of the × 1 random vector , whose The weights ,( ) are the conditional probabilities 푖푗 푡 corresponding to th component equals 1 if is observed for 푤 푝 푟푖 subject and is 0 if is missing. Here = , , , , , and are given by 푘 푟푖푘 푥푖푘 ,…, is a × 1 vector of regression �푥푚푖푠 푖│푥표푏푠 푖 푦푖 푟푖 훾� 푖푘 훽 ,( ) 푖 ′ 푥 1 푝 ( ) ( ) coefficients, and are considered as nuisance =푖푗 푡 ( ), , . ( ), �훽 훽 � 푝 푤 , , , , parameters. In this paper for categorical 푡 푡 푖 푚푖푠 푖 표푏푠 푖 푚푖푠 푖 표푏푠 푖 푝�푦 │푥 푗 푥 훾 � 푝�푥( ) 푗 푥 │훾 � covariates, 훼 휑 . , , ( ), , , ( ) 푡 ( ) 푖 푖 푚푖푠 푖 ( )표푏푠, 푖 . ( ) ÷푝�푟 │푦 푥 푗 푥 훾 � ( , , | , , ) = ( | , ) ( | ) ( | , , ) 푡 ( ) 푡 푖 . 푖 , ( ), 푖 , ( ) 푝�푦 │푥 푗 훾 � 푝�푥 푗 │훾 � 푝 푦푖 푥푖 푟푖 훽 훼 휑 푝 푦푖 푥푖 훽 푝 푥푖 훼 푝 푟푖 푦푖 푥푖 휑 � 푡 which leads to the complete data log-likelihood 푥푚푖푠 푖 푗 푝�푟푖│푦푖 푥푖 푗 훾 � and the M-step is, ( ) = 푛 ( , , , ) ( ) ( ) = 푛 푙IJSER훾 � 푙 훾 푥푖 푦푖 푟푖 푡 푡 푖=1 푄�훾�훾 � � 푄푖�훾�훾 � 푖=1 { , ( ), , } = 푛 log{ ( | , )} + log{ ( | )} = 푛 ,( ) 푖 푖 푖 푖 푖 푖 푖 ,( ) 휕푙 훾 푥 푗 푦 푟 � 푝 푦 푥 훽 푝 푥 훼 � � 푤푖푗 푡 푖=1 푖=1 푥푚푖푠 푗 + log{ ( | , , )} 휕훾 But this procedure is so much complicated and 푖 푖 푖 푝 푟 푦 푥 휑 time consuming. That is why, an easy and simpler method is proposed in this paper. where = ( , , ) and ( , , , ) is the contribution to the complete data log-likelihood 3. Parameter Estimation: Logit Link 훾 훽 훼 휑 푙 훾 푥푖 푦푖 푟푖 for the th observation. Function in the Generalized Linear Model Then they푖 used the EM algorithm by the method For any binomial variable, Y, the probability of weights where the E-step is, mass function can be expressed as (McCullagh, P. and J.A. Nelder. 1989) ( ; , ) = (1 ) 푦 푛−푦 푌 푛 푓 푦 휃 휑 �푦� 휋 − 휋 IJSER © 2013 http://www.ijser.org International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May-2013 2134 ISSN 2229-5518 = + (1 ) + 1 ( ) 휋 = 푛 ( ) 푖 (3.1)푛 푒푥푝 �푦푙푛 �1−휋� 푛푙푛 − 휋 푙푛 � �� 푖 푑푏 휃 푖 푦 � �푦 − 푖 � 푥 푎 휑 푖=1 푑휃 Therefore, from (3.1), for the binomial 1 = 푛 [ ] distribution ( ) � 푦푖 − 휇푖 푥푖 푎 휑 푖−1 Consequently, we can find the maximum = , = , 1 1 +휃 휋 푒 likelihood estimates of the parameters by solving 휃 푙푛 � � 휋 휃 − 휋 푒 ( ) = (1 ), the system of equations 푏 휃 −푛푙푛 − 휋 ( ) = 1, ( , ) = , [ ] = 0 (3.3) ( ) 푛 1 푛 푎 휑 푐 푦 휑 푙푛 � � 푎 휑 ∑푖−1 푦푖 − 휇푖 푥푖 ( ) ( ) 푦 ( ) = = . For the binomial distribution ( ) = 1, so (3.3) 푑푏 휃 푑푏 휃 푑휋 퐸 푦 becomes 푎 휑 where 푑휃 푑휋 푑휃 [ ] = 0 (3.4) 푛 = = (1 ) (3.2) ∑푖−1 푦푖 − 휇푖 푥푖 휃 휃 2 푑휋 푒 푒 Thus, the maximum likelihood estimate of β is 휃 휃 푑휃 1+푒 − �1+푒 � 휋 − 휋 For the exponential family, the log likelihood = ( ´ ) ´ (3.5) function corresponding to a random sample of −1 −1 −1 훽̂ 푋 푉 푋 푋 푉 푧 size n is It is interesting to note similarity of (3.5) to the expression obtained in standard regression { ( )} IJSERmodel.

Generalized Linear Model for Binary Data with Missing Values: an EM Algorithm Approach

Generalized Linear Models (Glms)

The Hexadecimal Number System and Memory Addressing

Generalized Linear Models

Modelling Binary Outcomes

Section II Descriptive Statistics for Continuous & Binary Data (Including

Yes, No, Maybe So: Tips and Tricks for Using 0/1 Binary Variables Laurie Hamilton, Healthcare Management Solutions LLC, Columbia MD

Data Representation

Intro to Systems Digital Logic

Nand2tetris Chapters 1

Chapter 4: the Building Blocks: Binary Numbers, Boolean Logic, and Gates

Binary Response and Logistic Regression Analysis

MARIE: an Introduction to a Simple Computer