Multilevel Logistic Regression Analysis Applied to Binary Contraceptive Prevalence Data

Journal of Data Science 9(2011), 93-110 Multilevel Logistic Regression Analysis Applied to Binary Contraceptive Prevalence Data Md. Hasinur Rahaman Khan and J. Ewart H. Shaw University of Warwick Abstract: In public health, demography and sociology, large-scale surveys often follow a hierarchical data structure as the surveys are based on multistage strati¯ed cluster sampling. The appropriate approach to analyzing such survey data is therefore based on nested sources of variability which come from di®erent levels of the hierarchy. When the variance of the residual errors is correlated between individual observations as a result of these nested structures, traditional logistic regression is inappropriate. We use the 2004 Bangladesh Demographic and Health Survey (BDHS) contraceptive binary data which is a multistage strati¯ed cluster dataset. This dataset is used to exemplify all aspects of working with multilevel logistic regression models, including model conceptualization, model description, understand- ing of the structure of required multilevel data, estimation of the model via the statistical package MLwiN, comparison between di®erent estimations, and investigation of the selected determinants of contraceptive use. Key words: BDHS, CPR, cluster, division, MCMC, MLwiN, multilevel, MQL, PQL, TFR. 1. Introduction Bangladesh is the most densely populated country in the world. The country has currently a population about 150 million, with a corresponding population density of 939 per square kilometer and growth rate of 1.42% (M. Anwarul Iqbal, 2008). In the second half of the last century, the population grew extraordinarily rapidly, tripling during the period, whereas during the entire ¯rst half of the century the population increased by only 45%. Family planning was introduced in Bangladesh in the early 1950s. The policy to reduce fertility rates has been repeatedly rea±rmed by the Government of Bangladesh since liberation in 1971. During the mid 1970s, the contraceptive prevalence rate (CPR) was less than 10% and the total fertility rate (TFR) was more than 6 births per women (Islam and Islam, 1993). The subsequent last two rounds of the BDHS, in 1999¡2000 94 Md. H. R. Khan and J. E. H. Shaw and 2004, found CPRs of 54% and 58%, respectively, the TFRs for those years were 3.3 and 3.0 (NIPORT, 1999-2000, 2005). The association between estimated levels of contraceptive prevalence and the level of fertility is very close in the higher fertility countries like Bangladesh (Cur- tis and Diamond, 1995). As per the widely accepted correlation between CPR and TFR, a rise of 9% points in CPR has been seen to be accompanied by a fall of 0.64 in the TFR (Mauldin and Segal, 1988). Bangladesh still lacks such an estimate, which raises questions about the country's fertility and contraceptive dynamics and prospects for future fertility decline. Use of contraception is the main contributor of fertility declining, as has been shown by many research workers (Cleland et al., 1994; R. Amin et al., 1994; Rani and Radheshyam, 2007). Fertility decline should continue if the wider use of contraception continues in all levels and groups of people in Bangladesh. It is critical for family planning workers to continue to meet the needs of existing family planning users, and also to address unmet need for family planning since individual tastes, interests, behaviours, etc. di®er from one unit to another within each level, owing to variability among various socioeconomic and geographical factors such as religion, culture, income, place of residence, education, occupation, mass media access, administrative and social facilities, and so on. That is why their e®orts and approaches do not seem to be equally e®ective, evenly served or acknowledged in some areas. As a result, the e®ectiveness of the program varies considerably. It is necessary to assess the within- and between-level variation, and to estimate the true e®ect of the above-mentioned factors on CPR, in order to implement more e®ective future family planning policies that target particular units at various levels of the hierarchy. This paper highlights the importance of multilevel analysis using logistic regression models for studying contraceptive prevalence in Bangladesh from the multistage clustered 2004 BDHS data. The paper aims to investigate the selected factors a®ecting the regulation of fertility through contraception in the context of multilevel modeling. It also aims to measure the inﬂuence of the com- bination of the selected factors on the current contraceptive practice of women in Bangladesh, and emphasis is given to exploring the true e®ect of the factors on the contraceptive prevalence taking into consideration the e®ect of the levels. The analysis is mainly carried out using MLwiN (Rasbash et al., 2004). 2. The Multilevel Model 2.1 Multilevel analysis for multistage clustered data In multilevel research, the structure of data in the population is hierarchical, and a sample from such a population can be viewed as a multistage sample. Multilevel Logistic Regression Analysis 95 Because of cost, time and e±ciency considerations, strati¯ed multistage samples are the norm for sociological and demographic surveys. For such samples the clustering of the data is, in the phase of data analysis and data reporting, a nuisance which should be taken into consideration. However, these samples, while e±cient for estimation of the descriptive population quantities, pose many challenges for model-based statistical inference. This clustering sampling scheme often introduces multilevel dependency or correlation among the observations that can have implications for model parameter estimates. For multistage clustered samples, the dependence among observations often comes from several levels of the hierarchy. The problem of de- pendencies between individual observations also occurs in survey research, where the sample is not taken randomly but cluster sampling from geographical areas is used instead. In this case, the use of single-level statistical models is no longer valid and reasonable. Hence, in order to draw appropriate inferences and con- clusions from multistage strati¯ed clustered survey data we may require tricky and complicated modeling techniques like multilevel modeling, and very often the computation required for this is not straightforward and is not very time consuming as currently there is a number of software packages. The 2004 BDHS data set used for this study is based on multistage strati- ¯ed cluster sampling. The appropriate approach to analyzing contraceptive data from this survey is therefore based on nested sources of variability. Here the units at lower level (level-1) are individuals (ever-married women aged 10¡49) who are nested within units at higher level (clusters: level-2) and the clusters are again nested within units at the next higher level (divisions: level-3). Clusters are primary sampling units (PSU) de¯ned by the National Census of 1981, and correspond approximately to village in rural areas. All clusters are approximately of equal size in terms of area. On the other hand Divisions are administrative areas each of which consists of a number of sub-administrative areas called Zila. Due to this nested structure, the odds of women experiencing the outcome of interest are not independent, because women from the same cluster may share common exposure to community characteristics. The response variable in this study is \currently using contraception" which is binary and hence multilevel logistic regression model is a natural choice for modeling. Traditional logistic regression (which, in multilevel analysis terms, is single-level) requires the assumptions: (a) independence of the observations conditional on the explanatory variables and (b) uncorrelated residual errors. These assumptions are not always met when analyzing nested data. But the multilevel logistic regression analysis consider the variations due to hierarchy structure in the data. It allows the si- multaneous examination of the e®ects of group level (cluster and division) and individual level variables on individual level outcomes while accounting for the 96 Md. H. R. Khan and J. E. H. Shaw non-independence of observations within groups. Also this analysis allows the examination of both between group and within group variability as well as how group level and individual level variables are related to variability at both levels. 2.2 Parameter estimation The most common methods for estimating multilevel logistic models, used in this study, are based on likelihood. Among the methods, Marginal Quasi Like- lihood (MQL) (Goldstein, 1991; Goldstein and Rasbash, 1996) and Penalized Quasi Likelihood (PQL) (Laird, 1978; Breslow and Clayton, 1993) are the two prevailing approximation procedures. After applying these quasi likelihood methods, the model is then estimated using iterative generalised least squares (IGLS) or reweighted IGLS (RIGLS) (Goldstein, 2003). Second-order PQL method has been used throughout the multi-level analyses since this method approximates well compared to the other PQL and MQL methods (Goldstein, 2003). Details of the PQL method are given below. Bayesian methods using Markov chain Monte Carlo (MCMC) have also been used for parameter estimation. Penalized quasi-likelihood The PQL estimation procedure is described here for two level logistic regression models. Consider a level-1 outcome Yij taking on a value of 1 with conditional probability pij. Then the logit model or the generalized linear model is, · ¸ pij T T ln = ´ij = Xij γ + Zij uj 1 ¡ pij for level-1 unit i nested within level-2 unit j. At level 1, we assume Yij condition- ally distributed as Bernoulli, while the random e®ects vector uj is distributed as 2 2 N(0; σu) across the level-2 units. Let us consider the variance σu as T throughout this PQL estimation procedure. The PQL approach can be derived as a nonlinear regression model. In the case of binary outcomes with logit link, we start with the level-1 model Yij = pij + eij; (2.1) where E(eij) = 0 and V ar(eij) = pij(1 ¡ pij).

Multilevel Logistic Regression Analysis Applied to Binary Contraceptive Prevalence Data

A Primer for Analyzing Nested Data: Multilevel Modeling in SPSS Using

A Bayesian Multilevel Model for Time Series Applied to Learning in Experimental Auctions

Introduction to Multilevel Modeling

An Introduction to Bayesian Multilevel (Hierarchical) Modelling Using

3 Diagnostic Checks for Multilevel Models 141 Eroscedasticity, I.E., Non-Constant Variances of the Random Eﬀects

Multilevel Analysis

Multilevel Linear Models: the Basics

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data

Multilevel Models for Repeated Binary Outcomes: Attitudes and Vote Over the Electoral Cycle

User-Friendly Bayesian Regression Modeling: a Tutorial with Rstanarm and Shinystan

Fundamentals of Hierarchical Linear and Multilevel Modeling 1 G

Actuarial Applications of Hierarchical Modeling