ANALYSIS OF BINARY DEPENDENT VARIABLES USING LINEAR PROBABILITY MODEL AND LOGISTIC REGRESSION: A REPLICATION STUDY

Submitted by Lutendo Vele

A thesis submitted to the Department of Statistics in partial fulfilment of the requirements for a Master's degree in Statistics in the Faculty of Social Sciences

Supervisor Harry J. Khamis

Spring, 2019

ABSTRACT

The Linear Probability Model (LPM) is commonly used because it is easier to compute and interpret than logit and probit models, even though its estimated probabilities may fall outside the [0, 1] interval and linearity makes little conceptual sense when dealing with probabilities. This paper extends the results of Luca, Owens, and Sharma (2015) by reviewing their use of the LPM to examine whether alcohol prohibition reduces domestic violence. The regular LPM gave inconclusive estimates, since prohibition was omitted due to collinearity as controls were added. However, Luca et al. (2015) reported results, and further inspection of their regression commands showed that they ran a regression, then a post-estimation on the residuals, and then used the residuals as a dependent variable; hence their results differed from those of the regular LPM. Their method still produced unbounded predicted probabilities and heteroscedastic residuals, showing that OLS was inefficient and that a non-linear binary choice model such as logistic regression would be a better option. Logistic regression predicts the probability of an outcome that can take only two values and was therefore used in this paper. Unlike the LPM, logistic regression uses a non-linear function whose sigmoid shape bounds the predicted outcome between 0 and 1. Logistic regression presented no such complications; thus logistic regression (or any other non-linear model for dichotomous dependent variables) should have been used in the final analysis, with the LPM used at a preliminary stage to obtain quick results.

Keywords: binary choice models, logistic regression, linear probability model, forbidden regression, binary dependent variables, dichotomous variables, residuals as dependent variables

Contents

1 Introduction

2 Data
2.1 Descriptive Statistics
2.2 Treatment of Missing Data

3 Methodology
3.1 Linear Probability Model
3.1.1 Assumptions of Linear Probability Model
3.1.2 Criticisms of Linear Probability Model
3.2 Logistic Regression
3.2.1 Assumptions of Logistic Regression
3.2.2 Criticisms of Logistic Regression

4 Results
4.1 Linear Probability Model Results
4.2 Logistic Regression Results
4.2.1 Sample Size Analysis
4.2.2 Examining the odds that the husband drinks
4.2.3 Examining the odds that the husband beats his wife

5 Conclusion

References

1 Introduction

Data analysis, which is grounded in statistics and has a long history, plays an important role in many domains: it is a process that runs from data collection through analysis to answering the research question(s). To answer a research question, one needs to study different factors, namely the dependent variable and the independent variables. Regression analysis is a mathematical process that helps answer questions such as which variables are most significant and how those variables interact with each other. Variables come in two main groups, each with further classifications: categorical variables, also known as qualitative or discrete variables, which are further classified as nominal, dichotomous or ordinal; and continuous variables, also known as quantitative variables, which are classified as interval or ratio. Different analysis or modelling methods are therefore needed for different types of dependent variable.

This paper focuses on regression models for dichotomous dependent variables, that is, binary choice models. A dichotomous variable takes the value 1, which may represent success, or 0, representing failure. When modelling such a variable, one quickly thinks in terms of probabilities. For example, what is the probability that a married man who lives in a state with alcohol prohibition, is religious, has a certain level of education, and works a white-collar job drinks alcohol?

The Linear Probability Model (LPM) and logistic regression are two of the models estimated when the regression model has a dichotomous dependent variable. Despite the criticisms discussed by Maddala (1983), the LPM is one of the most widely applied statistical models in the social sciences because of its easy interpretability and computational speed. Logistic regression, introduced in the late 1960s and early 1970s, addresses the criticisms discussed by Maddala (1983), but because its model is fitted by an iterative maximum likelihood process it was expensive to estimate. As a result, the LPM remained favourable, especially in the early years; with the improvement of computer technology, however, the linear probability model lost favour. Since the LPM violates probability bounds and can thus produce somewhat meaningless predictions, it is best used as a first step in the analysis of a dichotomous dependent variable. Amemiya (1981, p. 1486–1487), in his survey of qualitative response models, stated of the LPM: “it has frequently been used in econometric applications, especially in the early years, because of its computational simplicity. Though I do not recommend its use in the final stage of a study, it may be used for the purpose of obtaining quick estimates in a preliminary stage”.

This paper extends the results of Luca et al. (2015). The authors used the linear probability model to examine both the effect of prohibition on the drinking behaviour of husbands and the impact of prohibition on domestic violence. However, they do not mention the two main problems with the LPM, namely that unbounded probability predictions are possible and that linearity does not make much sense conceptually, and this prompted enough curiosity to replicate their study. The objectives of this thesis are therefore to give a brief overview of the linear probability model and logistic regression, and to review both through applications to see whether the logistic model would be preferable to the LPM.

This paper contains an overview of LPM and logistic regression along with applications of each. Section 2 contains the data description. Section 3 covers the basic preliminary ideas surrounding LPM and logistic regression, their assumptions and criticisms, including how they differ. Section 4 summarises the results, and Section 5 gives concluding remarks.

2 Data

This thesis investigates whether the LPM used in the article by Luca et al. (2015) was the best binary choice regression model for answering their research questions, given that the LPM has weaknesses which may result in meaningless predictions. The datasets used by the authors were made available when they published their paper and were therefore used in this thesis. Several datasets were used. First, the authors compiled a panel dataset on the evolution of alcohol prohibition, that is, the precise laws pertaining to the prohibition of alcohol sales and/or consumption, and their changes, for 17 major states in India from 1980 to 2010. Rich microdata were also collected from the 1998-1999 and 2005-2006 Indian National Family Health Survey (NFHS) to investigate the impact of alcohol prohibition on individual behaviour. Finally, data from the Indian National Crime Records Bureau (NCRB) for the years 1980-2010 were used to complement the individual-level analysis with state-level administrative crime data, with a focus on crimes targeted towards women. The 17 major states investigated in the study were not listed, and with the information given it is hard to identify them, as India has 29 states. NFHS has approximately 3 % missing observations, while NCRB has approximately 31 % missing observations, which exceeds the 25 % rule of thumb proposed by Dermitas et al. (Enders, 2010, p. 260). The NCRB data were used to investigate the effect of prohibition on other types of crime targeted towards women at the state level; these outcomes are continuous variables and are therefore excluded from this thesis.

2.1 Descriptive Statistics

This subsection contains descriptive statistics of the NFHS variables. The National Family Health Survey is used to investigate the impact of alcohol prohibition, and each variable’s description is shown in Table 1 below. The dataset contains household members’ alcohol consumption and women’s experience of, and attitude towards, intimate partner violence. The husband’s and wife’s demographic characteristics, such as age, education, household size, religion, whether he or she works a white-collar job, and whether the household is in an urban area, are part of the dataset and are used collectively in the analysis as husband controls and wife controls to see whether their inclusion in the model affects the estimate. The number of children in the house, whether the wife believes the husband is justified in beating her if he suspects her of cheating, and whether the wife has money of her own were grouped as bargaining controls. Lastly, the age categories of both husband and wife, together with their age gap and educational gap, were also used. These variables are included because they may affect the husband’s drinking and/or violent behaviour.

Table 1: Indian National Family Health Survey Variable Description

Variable        Description
year            Year of Interview
rep age         Respondent’s Current Age
rep educ        Education in Single Years
hhsize          Number of Household Members
children        Number of Living Children
stwt            State Individual Weight
husb beat       Husband has Beaten Respondent
rep wc          Respondent Works a White Collar Job
husb wc         Husband Works a White Collar Job
urban           Household in Urban Area
religion        Religious Affiliation
State           State of Residence
husb age        Husband’s Current Age
husb educ       Husband’s Years of Schooling
husb drink      Husband Drinks Alcohol
r educ          Respondent’s Education / Husband’s Education
r age           Respondent’s Age / Husband’s Age
agegap cat      Linear Difference in Age
educgap cat     Linear Difference in Education
ownmoney        Respondent has Money she alone can decide how to use
b unfaithful    Husband Justified in Beating Wife if he suspects her of being Unfaithful
husb age cat    Husband’s Current Age Category
rep age cat     Respondent’s Current Age Category
prohib          State Prohibition
literacy        Literacy Rate
purban          Percent Urban
pcgdp           Per Capita GDP
unemp           Unemployment Rate
health          % of Expenditure spent on Health Welfare
educ            % of Expenditure spent on Education
pcpolice        Total Police officers per 1000 State population
pmale           Percent Male
pcpolice exp    Total Police Expenditure per 1000 State population

Table 2 below summarises the variables discussed above with descriptive statistics for the NFHS variables. The NFHS dataset was collected between 1980 and 2012 over 17 major Indian states and has 33 variables and 3% missing observations. For instance, the table shows that respondents were wives whose ages varied from 15 to 49 years, with a mean age of approximately 34 years, while the husbands’ ages vary from 16 to 60 years, with a mean age of approximately 40 years.

Table 2: Indian National Family Health Survey Descriptive Statistics

Statistic     N         Mean         St. Dev.    Min        Max          #NAs
year          109,936   -            -           1980       2012         0
rep age       108,930   33.819       8.023       15.000     49.000       1006
rep educ      108,905   4.110        4.860       0.000      99.000       1031
hhsize        108,930   5.351        2.196       1.000      35.000       1006
children      108,930   2.919        1.695       0.000      15.000       1006
stwt          108,930   -            -           0.000      6,320,908    1006
husb beat     98,876    -            -           0.000      1.000        11060
rep wc        108,930   -            -           0.000      1.000        1006
husb wc       108,930   -            -           0.000      1.000        1006
urban         108,930   -            -           0.000      1.000        1006
religion      108,889   -            -           1.000      5.000        1047
State         109,936   -            -           1.000      35.000       0
husb age      108,930   39.824       8.806       15.000     60.000       1006
husb educ     108,839   6.691        7.357       0.000      99.000       1097
husb drink    98,871    -            -           0.000      1.000        11065
r educ        108,814   0.932        1.201       0.010      18.000       1122
r age         108,930   0.852        0.102       0.300      2.467        1006
agegap cat    108,930   -            -           1.000      9.000        1006
educgap cat   107,991   -            -           1.000      7.000        1945
ownmoney      108,930   -            -           0.000      1.000        1006
b unfaithful  108,930   -            -           0.000      1.000        1006
husb age cat  108,930   -            -           1.000      8.000        1006
rep age cat   108,930   -            -           1.000      6.000        1006
prohib        91,630    -            -           0.000      1.000        18306
literacy      107,828   65.898       8.847       23.946     90.920       2108
purban        107,828   30.625       17.544      5.936      97.504       2108
pcgdp         109,695   14,049.360   6,917.924   1,800.000  51,000.360   241
unemp         107,828   3.355        2.130       −0.330     24.917       2108
health        100,671   4.316        1.045       1.500      13.870       9265
educ          100,641   16.953       4.596       6.400      34.080       9295
pcpolice      104,470   1.898        1.601       0.000      16.412       5466
pmale         104,214   0.503        0.032       0.395      0.593        5722
pcpolice exp  104,138   25.689       26.496      4.105      224.022      5798

2.2 Treatment of Missing Data

Missing observations are a universal problem in data analysis, especially in studies that use surveys to collect data, as respondents may leave some questions on the questionnaire unanswered or refuse to answer them. NFHS has approximately 3 % missing observations overall, and the prohibition variable has about 17 % missing observations, both well below the 25 % rule of thumb proposed by Dermitas et al. (Enders, 2010, p. 260). Figure 1 below shows the missing patterns. Based on the missing patterns, and because the dataset has only about 3% missing observations, which is below the rule of thumb, the dataset was used as is and missing observations were not imputed.
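The missing-data shares quoted above can be checked directly from the #NAs column of Table 2. The sketch below is only an illustration: the counts are copied from Table 2, the total of 109,936 rows is N plus #NAs for every variable, and the function names are hypothetical.

```python
# Missing-value counts (#NAs) copied from Table 2.
# Every variable has N + #NAs = 109,936 rows.
TOTAL_ROWS = 109_936

NA_COUNTS = {
    "year": 0, "rep age": 1006, "rep educ": 1031, "hhsize": 1006,
    "children": 1006, "stwt": 1006, "husb beat": 11060, "rep wc": 1006,
    "husb wc": 1006, "urban": 1006, "religion": 1047, "State": 0,
    "husb age": 1006, "husb educ": 1097, "husb drink": 11065,
    "r educ": 1122, "r age": 1006, "agegap cat": 1006,
    "educgap cat": 1945, "ownmoney": 1006, "b unfaithful": 1006,
    "husb age cat": 1006, "rep age cat": 1006, "prohib": 18306,
    "literacy": 2108, "purban": 2108, "pcgdp": 241, "unemp": 2108,
    "health": 9265, "educ": 9295, "pcpolice": 5466, "pmale": 5722,
    "pcpolice exp": 5798,
}


def missing_share(n_missing, n_total=TOTAL_ROWS):
    """Share of missing observations for a single variable."""
    return n_missing / n_total


def overall_missing_share(na_counts, n_total=TOTAL_ROWS):
    """Share of missing cells over all variables and rows."""
    return sum(na_counts.values()) / (len(na_counts) * n_total)


overall = overall_missing_share(NA_COUNTS)
prohib = missing_share(NA_COUNTS["prohib"])
print(f"overall: {overall:.1%}, prohib: {prohib:.1%}")
```

Both shares round to the figures quoted in the text (about 3 % overall and about 17 % for prohibition).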

Figure 1: Missing Pattern of NFHS data

3 Methodology

There are several binary choice models; only the LPM (Linear Probability Model) and logistic regression are discussed in this thesis. A brief overview of LPM and logistic regression is presented, and both are applied to our data to review their performance. Five models will be estimated for both LPM and logistic regression.

3.1 Linear Probability Model

LPM is a linear regression model applied to a dichotomous dependent variable. Ordinary Least Squares (OLS) is used to estimate the parameters of the LPM, which is a linear function of the independent variables. Because the LPM is linear, it raises questions about its ability to bound the estimated probabilities within [0, 1] for meaningful estimates. However, LPM is commonly used due to its easy interpretation and computation.

Given a random sample with K parameters and N observations, consider the following linear regression model

$$
y = \beta_1 + \beta_2 x_2 + \dots + \beta_K x_K + \mu = x\beta + \mu \tag{1}
$$

where β is a K × 1 vector of parameters, x is an N × K matrix of explanatory variables, and µ is an error term assumed to have zero mean and constant variance. As it is a probability model, in order to interpret the results in terms of probability we take expectations on both sides of equation (1) to get,

$$
E(y \mid x; \beta) = x\beta \tag{2}
$$
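Because y takes only the values 0 and 1, the conditional expectation in equation (2) is itself a probability, which is what gives the LPM its name. The short derivation below is a standard textbook result, added here as a sketch rather than taken from Luca et al. (2015); it also shows why the constant-variance assumption on µ cannot hold exactly.

```latex
\begin{align*}
E(y \mid x) &= 1 \cdot \Pr(y = 1 \mid x) + 0 \cdot \Pr(y = 0 \mid x)
             = \Pr(y = 1 \mid x) = x\beta, \\
\operatorname{Var}(\mu \mid x) &= \operatorname{Var}(y \mid x)
             = x\beta \, (1 - x\beta).
\end{align*}
```

The variance depends on x, so the disturbances are heteroscedastic whenever xβ varies across observations.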

3.1.1 Assumptions of Linear Probability Model

1. A linear relationship between the dependent and independent variables
2. Homoscedasticity
3. Multivariate normality
4. The data has little or no multicollinearity
5. No autocorrelation
6. No outliers or influential cases

The assumptions above will not be formally tested, but results from applying the LPM to our data will highlight violations of these assumptions.

3.1.2 Criticisms of Linear Probability Model

The criticisms of the linear probability model discussed by Maddala (1983) are that the disturbances in the LPM are heteroscedastic, so least squares is not efficient; that the error term is not normally distributed, so there exist non-linear procedures more efficient than least squares; and that predicted probabilities from the LPM can lie outside the 0-1 interval. Angrist and Pischke (2009, p. 103) said of linear regression: “... may generate fitted values outside the limited dependent variable boundaries. This fact bothers some researchers and has generated a lot of bad press for the linear probability model.” These criticisms are what this paper will examine, as unbounded predictions may be meaningless when the model is used to model probabilities.
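The out-of-range problem is easy to reproduce on a toy example. The sketch below fits an LPM by OLS to ten made-up observations (not the NFHS data) with a single regressor; the helper name `fit_simple_ols` is hypothetical, and the closed-form simple-regression formulas stand in for a full regression routine.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Illustrative data only: a binary outcome that switches on for large x.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

intercept, slope = fit_simple_ols(x, y)
fitted = [intercept + slope * xi for xi in x]

# Fitted "probabilities" below 0 or above 1 are not valid probabilities.
out_of_range = [round(p, 3) for p in fitted if p < 0 or p > 1]
print(f"intercept={intercept:.4f}, slope={slope:.4f}")
print("fitted values outside [0, 1]:", out_of_range)
```

Here the fitted line has a negative intercept (about -0.27), so the three smallest values of x receive negative predicted probabilities.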

Two LPMs will be used to estimate the effect of alcohol prohibition on the husband’s drinking behaviour and on domestic violence, as described by Luca et al. (2015).

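$$
\text{HusbandDrinks}_{hsy} = \gamma_y + \text{Prohibition}_{sy}\,\beta^{FS} + X_{sy}\delta + H_{hsy}\theta + W_{hsy}\tau + \mu_{hsy} \tag{3}
$$

$$
\text{DomesticViolence}_{hsy} = \gamma_y + \text{Prohibition}_{sy}\,\beta^{RF} + X_{sy}\delta' + H_{hsy}\theta' + W_{hsy}\tau' + \omega_{hsy} \tag{4}
$$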

Equation (3) examines the effect of alcohol prohibition on the husband’s drinking behaviour and equation (4) examines the impact of alcohol prohibition on domestic violence, where γ_y are survey year fixed effects, Prohibition_sy is a binary variable equal to 1 if state s has alcohol prohibition in survey year y, and H_hsy and W_hsy include a host of sociodemographic characteristics of the husband and wife belonging to household h, including their age, education, religion, and whether he or she works in a white-collar occupation. Some specifications include variables that help capture the wife’s bargaining power within the household, including whether she has money of her own that she can control and whether she believes that her spouse is justified in beating her if he suspects her of being unfaithful. In the same vein, the spousal age and education gaps were included, both as ratios and as fixed effects, as a proxy for the wife’s relative wage (actual wage data are not available). Because none of the states changed its prohibition status across the two sample waves, a matrix of state-level controls, X_sy, was included to capture systematic differences between states that could be correlated with both drinking and violent behaviours, including the state literacy rate, urbanisation, per capita GDP, the unemployment rate, police and police expenditure per capita, the percentage of adults who are male, and state health and education expenditure per capita.

3.2 Logistic Regression

Logistic regression predicts the probability of an outcome that can have only two values. Unlike the LPM, logistic regression uses a non-linear function whose curvature bounds the predicted outcome between 0 and 1. Consider a dichotomous response model,

$$
\Pr(y = 1 \mid x) = G(x\beta) \tag{5}
$$

where G is a function which only takes values between 0 and 1. The logistic distribution function is the most commonly used non-linear function G, resulting in the logit model,

$$
G(x\beta) = \frac{\exp(x\beta)}{1 + \exp(x\beta)} \tag{6}
$$
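Inverting equation (6) gives the log-odds form of the model, which is what makes the odds ratios reported in Section 4 interpretable. This is a standard identity, sketched here for reference; x_j and β_j denote a single covariate and its coefficient, a notation introduced only for this illustration.

```latex
\begin{align*}
\frac{G(x\beta)}{1 - G(x\beta)} &= \exp(x\beta)
\quad\Longrightarrow\quad
\ln\!\left(\frac{\Pr(y = 1 \mid x)}{1 - \Pr(y = 1 \mid x)}\right) = x\beta, \\
\mathrm{OR}_j &= \frac{\text{odds at } x_j + 1}{\text{odds at } x_j} = \exp(\beta_j).
\end{align*}
```

A one-unit increase in x_j, holding the other covariates fixed, therefore multiplies the odds by exp(β_j); an odds ratio below 1 corresponds to a negative coefficient.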

Maximum Likelihood (ML) is used to estimate logistic regression. For a random sample of size N, the ML estimate of β is the vector β̂_ML which maximises the likelihood of observing the sample {y1, y2, . . . , yN}, conditional on the explanatory variables x. Assume the probability of success, yi = 1, is G(xiβ) and the probability of failure, yi = 0, is 1 − G(xiβ). Then the likelihood function is,

$$
L(y \mid x; \beta) = \prod_{i:\, y_i = 1} G(x_i\beta) \prod_{i:\, y_i = 0} \bigl(1 - G(x_i\beta)\bigr) = \prod_{i=1}^{N} G(x_i\beta)^{y_i} \bigl(1 - G(x_i\beta)\bigr)^{1 - y_i} \tag{7}
$$

and the log likelihood is,

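$$
\ln L(y \mid x; \beta) = \ln \prod_{i=1}^{N} G(x_i\beta)^{y_i} \bigl(1 - G(x_i\beta)\bigr)^{1 - y_i} = \sum_{i=1}^{N} \Bigl[ y_i \ln G(x_i\beta) + (1 - y_i) \ln\bigl(1 - G(x_i\beta)\bigr) \Bigr] \tag{8}
$$

and this log likelihood function is maximised by the MLE of β.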
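To make the maximisation concrete, the sketch below fits a logit model with an intercept and one covariate by Newton-Raphson, one common iterative scheme for this problem. It is an illustration on made-up data, not the estimation routine used for the results in this thesis, and all function names are hypothetical.

```python
import math


def logistic(z):
    """Logistic function G(z) = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(beta, x, y):
    """Log likelihood for a model with an intercept and one covariate."""
    b0, b1 = beta
    total = 0.0
    for xi, yi in zip(x, y):
        p = logistic(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total


def score(beta, x, y):
    """Gradient of the log likelihood: sum of (y_i - p_i) * (1, x_i)."""
    b0, b1 = beta
    g0 = g1 = 0.0
    for xi, yi in zip(x, y):
        r = yi - logistic(b0 + b1 * xi)
        g0 += r
        g1 += r * xi
    return g0, g1


def fit_logit(x, y, max_iter=50, tol=1e-10):
    """Maximise the log likelihood by Newton-Raphson, starting at beta = 0."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1 = score((b0, b1), x, y)
        # Negative Hessian: sum of p_i (1 - p_i) * [[1, x_i], [x_i, x_i^2]].
        h00 = h01 = h11 = 0.0
        for xi in x:
            p = logistic(b0 + b1 * xi)
            w = p * (1 - p)
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system (negative Hessian) * step = score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1


# Illustrative data only (not separable, so the MLE exists).
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

b0, b1 = fit_logit(x, y)
fitted = [logistic(b0 + b1 * xi) for xi in x]
print(f"intercept={b0:.4f}, slope={b1:.4f}")
print("all fitted probabilities in (0, 1):", all(0 < p < 1 for p in fitted))
```

At the solution the score is numerically zero, and every fitted probability lies strictly between 0 and 1, in contrast with the LPM.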

3.2.1 Assumptions of Logistic Regression

1. The continuous explanatory variables have a linear relationship with the logit of the outcome variable
2. The data has little or no multicollinearity
3. No autocorrelation

The assumptions above will not be formally tested.

3.2.2 Criticisms of Logistic Regression

Logistic regression is estimated by ML, so an iterative process is used to fit the model, which makes it slower than the LPM. Heteroscedasticity makes the MLE of the parameter vector biased and inconsistent (Greene, 2012, p. 733), unless the likelihood function is modified to take the precise form of heteroscedasticity into account correctly. In terms of interpretability, the odds ratio, the log of odds and the coefficients are hard to understand and interpret. Logistic regression estimates will also be severely biased in a panel model with fixed effects and a short time dimension.

The same two models used for the LPM will also be estimated by logistic regression, with the left-hand sides being logit(Husb-drinks) and logit(Husb-beats), respectively.

4 Results

In this section, results from both LPM and logistic regression are presented and discussed. Husband controls are the husband’s demographic characteristics: age, education, household size, whether he works a white-collar job, and whether the household is located in an urban area. Wife controls are also demographic characteristics, namely age, education, and whether she works a white-collar job. Bargaining controls are the number of children, whether she thinks the husband is justified in beating her if he suspects her of cheating, and whether she has her own money. Finally, there are the husband’s and wife’s age categories and the age and educational gap controls.

4.1 Linear Probability Model Results

Table 3 below shows the results from estimating the impact of alcohol prohibition on alcohol consumption and domestic violence using the LPM, that is, equations (3) and (4) respectively. Panel A examines the relationship between the likelihood that the husband drinks and whether the state has an alcohol prohibition policy. In model (1), husbands who are legally prohibited from drinking are about 16 percentage points less likely to drink than husbands who are not legally prohibited from drinking. As controls were added in models (2) through (5), prohibition was omitted because of collinearity. However, Luca et al. (2015) reported results for these models (see their results in Table 3, Panel A, Husband drinks, Luca et al. (2015)). Upon further investigation, it became clear that they had run a post-estimation and then used the residuals as a dependent variable to estimate the likelihood of the husband drinking, which results in biased parameter estimates; this is also referred to as forbidden regression. Freckleton (2002) showed that the typical implementation of this procedure leads to biased parameter estimates, and Chen, Hribar, and Melessa (2018) supported this, showing that the typical implementation of this procedure generates biased coefficients and standard errors that can lead to incorrect inferences, with both Type I and Type II errors. Forbidden regressions produce consistent estimates only under restrictive assumptions which rarely hold in practice (see Wooldridge (2010)). In general, forbidden regressions will not consistently estimate the relationship of interest.

Table 3: The impact of alcohol prohibition on alcohol consumption and domestic violence

Dependent Variable                    (1)          (2)          (3)          (4)          (5)

Panel A
Husband drinks                        −0.1560***   0            0            0            0
                                      (0.0470)     -            -            -            -
Husband drinks,                       −0.1560***   −0.1350***   −0.132***    −0.1310***   −0.1350***
Luca et al. (2015)                    (0.0470)     (0.0357)     (0.0347)     (0.0349)     (0.0339)

Panel B
Wife reports domestic violence        −0.0840**    0            0            0            0
                                      (0.0380)     -            -            -            -
Wife reports domestic violence,       −0.0840**    −0.0823**    −0.0788*     −0.0782*     −0.0815**
Luca et al. (2015)                    (0.0380)     (0.0382)     (0.0384)     (0.0385)     (0.0379)

Husband controls                                   x            x            x            x
Wife controls                                                   x            x            x
Bargaining controls                                             x            x            x
Husband age group x wife age group                                           x
Age and education gap                                                                     x
Observations                          77,842       77,748       77,730       77,730       77,228

Notes (as in Luca et al. (2015)): Standard errors presented in parentheses are clustered by state, using the Donald and Lang (2007) two-step adjustment for the small number of clusters (17). All regressions include survey year fixed effects, and state level controls in all regressions include annual measures of the unemployment rate, literacy rate, per cent urban, GDP per capita, and police and police expenditure per capita, and state health and education spending per capita. Individual controls for husband and wife include age, years of schooling, whether he or she belongs to a white-collar occupation, household size, urban residence, religion, and the number of children. To control for her household bargaining power, we include the wife’s attitudes toward domestic violence, whether she has money of her own that she controls, and the wife to husband age and schooling ratios. *** Significant at the 1 per cent level. ** Significant at the 5 per cent level. * Significant at the 10 per cent level.
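The bias that arises when residuals are used as a dependent variable can be illustrated with a small simulation. The sketch below is not the authors' procedure: it uses made-up data with a continuous outcome, two correlated regressors, and hypothetical helper names, and it compares the two-step residual regression with a regression that partials the control out of both the outcome and the regressor of interest.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on a single regressor x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(x, y):
    """Residuals from regressing y on x (with an intercept)."""
    a, b = simple_ols(x, y)
    return [yi - a - b * xi for xi, yi in zip(x, y)]


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5000
control = [rng.gauss(0, 1) for _ in range(n)]
# Regressor of interest, correlated (about 0.8) with the control.
treat = [0.8 * c + 0.6 * rng.gauss(0, 1) for c in control]
# Simulated outcome: both true coefficients equal 1.
y = [t + c + rng.gauss(0, 1) for t, c in zip(treat, control)]

# Two-step procedure: residualise y on the control only, then regress
# those residuals on the regressor of interest.
y_resid = residuals(control, y)
_, two_step = simple_ols(treat, y_resid)

# Partialling out (Frisch-Waugh-Lovell): residualise the regressor of
# interest on the control as well before the second regression.
treat_resid = residuals(control, treat)
_, partialled = simple_ols(treat_resid, y_resid)

print(f"two-step: {two_step:.3f}, partialled out: {partialled:.3f}")
```

Under these assumptions the two-step slope is pulled towards zero (its large-sample value is 1 − 0.8² = 0.36), while the partialled-out slope is close to the true value of 1.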

The results in Table 3 Panel B are from estimating the relationship between the likelihood that the wife reports domestic violence and whether the state has an alcohol prohibition policy. The same specifications were followed as in Panel A, resulting in the same problems: predicted probabilities outside the [0, 1] range and exclusion of prohibition because of collinearity as controls were added in models (2) to (5). Model (1) in Panel B shows that alcohol prohibition reduces the likelihood of the husband beating his wife by 8.4 percentage points, compared with the sample mean of 17%. Armed with this, a homoscedasticity test was performed and the predicted probabilities were reviewed for each model used by the authors. The residuals were heteroscedastic and the predicted probabilities were unbounded, with some probabilities below zero. These results indicate that a non-linear binary choice regression model would have been a better choice.

4.2 Logistic Regression Results

Logistic regression results are presented and discussed in this section. The LPM, a linear model, resulted in unbounded predicted probabilities, the omission of prohibition from the model due to collinearity, and heteroscedastic residuals, which means that linear estimation by OLS is inefficient for these models. Logistic regression, which is non-linear, is used instead: its sigmoid shape bounds the predicted probabilities, and because logistic regression has no error term, a heteroscedasticity test is too sensitive to be informative. Odds ratios, robust standard errors and 95% confidence intervals are presented in the tables, with ***, ** and * denoting significance at the 1, 5, and 10 per cent levels respectively. Margin plots are also used to elaborate on the results.

4.2.1 Sample Size Analysis

Sample size analysis is performed to ensure the reliability of the logistic regression analysis. Peduzzi, Concato, Kemper, Holford, and Feinstein (1996), in their simulation study of the number of events per variable in logistic regression analysis, found that logistic models with low events per variable lead to major problems, including biased regression estimates. As a sample size guideline, the work of Peduzzi et al. (1996) is used to determine the minimum sample size for each model,

$$
n = \frac{10k}{p} \tag{9}
$$

where 10 is the suggested lowest number of events per variable, k is the number of covariates, and p is the smallest proportion of positive cases in the population. Of the 98 871 observations of the variable husb drink, the proportion of husbands who drink alcohol is 16% (= p); therefore, the population has 15 819 events of husbands who drink alcohol. On the other hand, the variable husb beat has 98 876 observations, and the proportion of husbands who beat their wives is 33% (= p); therefore, the population has 36 629 events of husbands who beat their wives.

4.2.1.1 Sample size analysis for the odds that husband drinks. Model 1 has 12 covariates. The minimum sample size should, therefore, be 10*12/0.16 = 750. Models 2 through 5 have 18, 24, 26, and 26 covariates, respectively, and the minimum sample sizes are 1125, 1500, 1625, and 1625, respectively. Since the population has 15 819 events of drinking husbands, the sample size is large enough for all models.

4.2.1.2 Sample size analysis for the odds that the husband beats his wife. Model 1 has 12 covariates. The minimum sample size should, therefore, be 10*12/0.33 = 364. Models 2 through 5 have the same numbers of covariates as given in section 4.2.1.1, and the minimum sample sizes are 546, 728, 788, and 788, respectively. Since the population has 36 629 events of husbands who beat their wives, the sample size is large enough for all models.
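The minimum sample sizes quoted in sections 4.2.1.1 and 4.2.1.2 follow directly from the rule n = 10k/p. The sketch below reproduces them; it assumes the results are rounded up to a whole observation and uses exact fractions for p to avoid floating-point rounding, and the function name is illustrative.

```python
import math
from fractions import Fraction

EVENTS_PER_VARIABLE = 10  # rule of thumb from Peduzzi et al. (1996)


def minimum_sample_size(k, p):
    """n = 10k / p, rounded up to a whole observation."""
    return math.ceil(EVENTS_PER_VARIABLE * k / p)


covariates = [12, 18, 24, 26, 26]  # models (1) to (5)
p_drink = Fraction(16, 100)        # proportion of husbands who drink
p_beat = Fraction(33, 100)         # proportion of husbands who beat their wives

drink_sizes = [minimum_sample_size(k, p_drink) for k in covariates]
beat_sizes = [minimum_sample_size(k, p_beat) for k in covariates]

print(drink_sizes)  # [750, 1125, 1500, 1625, 1625]
print(beat_sizes)   # [364, 546, 728, 788, 788]
```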

4.2.2 Examining the odds that the husband drinks

Table 4 below shows the first model, with year as a fixed effect. The first model shows that the odds of the husband drinking in a state with an alcohol prohibition policy are 58% lower than in states without alcohol prohibition, and the result is statistically significant (95% CI: 0.2088, 0.8622).

Table 4: The impact of alcohol prohibition on alcohol consumption, model (1)

Model (1): Year as FE

Variable      OR          Robust SE  95% Conf. Interval

Constant      0.3670***   0.0658     0.2583   0.5214
Prohib        0.4243**    0.1535     0.2088   0.8622

Year
1999          1.0777      0.1552     0.8126   1.4292
2005          1.2425      0.1788     0.9270   1.6474
2006          0.3670***   0.1932     1.1821   1.9475
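The interval reported for Prohib in Table 4 can be reproduced from the odds ratio and its standard error. The sketch below assumes, as is common in statistical software, that the reported standard error is a delta-method standard error on the odds-ratio scale and that the interval is a normal-based interval computed on the log-odds scale; the function names are illustrative.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(odds_ratio, se_odds_ratio, level=0.95):
    """Confidence interval for an odds ratio, built on the log-odds scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_or = math.log(odds_ratio)
    # Delta method (assumed): SE of the log-odds is SE(OR) / OR.
    se_log_or = se_odds_ratio / odds_ratio
    return (math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))


def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds implied by an odds ratio."""
    return (odds_ratio - 1) * 100


# Prohib row of Table 4: OR = 0.4243, robust SE = 0.1535.
lower, upper = odds_ratio_ci(0.4243, 0.1535)
print(round(lower, 4), round(upper, 4))        # 0.2088 0.8622
print(round(percent_change_in_odds(0.4243)))   # -58
```

The same calculation applied to the other rows gives a quick consistency check between each odds ratio, its standard error and its interval.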

Table 5 below shows the two models that follow the first model, where the first model is the effect of alcohol prohibition on the husband’s drinking behaviour with the year of the interview as a fixed effect. The second model adds husband controls and religion as a fixed effect to the first model, and adding wife and bargaining controls to the second model yields the third model.

When husband controls are added to the first model, the odds that the husband drinks in a state with alcohol prohibition, with husband controls kept at their means, are 67% lower (95% CI: 0.2379, 0.4607) than in states without alcohol prohibition. To get the third model, we add wife controls and bargaining controls; the odds that the husband drinks in a state with alcohol prohibition, with husband and wife controls and bargaining controls kept at their means, are 68% lower (95% CI: 0.2339, 0.4452) than in states without an alcohol prohibition policy.

Table 5: The impact of alcohol prohibition on alcohol consumption, models (2) and (3)

              Model (2): Husband Controls, Religion (FE)   Model (3): Wife + Bargaining Controls
Variable      OR          Robust SE  95% Conf. Interval    OR          Robust SE  95% Conf. Interval

Constant      0.7038*     0.1271     0.4940   1.0027       0.7298*     0.1379     0539     1.0569
Prohib        0.3311***   0.1686     0.2379   0.4607       0.3227***   0.0530     0.2339   0.4452
Husb-age      0.9954*     0.0026     0.9902   1.0006       0.9974      0.0054     0.9870   1.0080
Husb-educ     0.9443***   0.0063     0.9320   0.9567       0.9623***   0.0058     0.9509   0.9738
Husb-wc       0.6839***   0.0334     0.6215   0.7525       0.7183***   0.0337     0.6553   0.7874
Urban         1.2109***   0.0539     1.1098   1.3213       1.2629***   0.0600     1.1704   1.4061
Hhsize        0.9826      0.0111     0.9611   1.0047       0.9508***   0.0072     0.9367   0.9651

Children                                                   1.0570***   0.0145     1.0291   1.0858
Rep-educ                                                   0.9622***   0.0093     0.9442   0.9806
Rep-age                                                    0.9927      0.0048     0.9833   1.0021
Rep-wc                                                     1.1120*     0.0708     0.9816   1.2598
B.unfaithful                                               1.0742      0.0571     0.9680   1.1921
Ownmoney                                                   1.1588***   0.0591     1.0486   1.2805

Year
1999          1.1560      0.1492     0.8977   1.4887       1.1668      0.1469     0.9117   1.4934
2005          1.4724***   0.1729     1.1697   1.8533       1.5333***   0.1875     1.2066   1.9484
2006          1.7161***   0.2007     1.3645   2.1583       1.8253***   0.2218     1.4385   2.3161

Religion
Muslim        0.2061***   0.0428     0.1372   0.3098       0.1975***   0.0412     0.1312   0.2974
Christian     1.9655***   0.3291     1.4157   2.7290       2.1506***   0.3623     1.5456   2.9925
Sikh          1.9757***   0.1978     1.6236   2.4041       2.0981***   0.2125     1.7204   2.5588
Other         1.1768      0.1540     0.9106   1.5208       1.2063      0.1575     0.9339   1.5580

Figure 2 below shows the evolution of the husband’s drinking probability with 95% CIs. Figure 2(a) shows that the husband’s probability of drinking in a state with an alcohol prohibition policy increased from 0.23 in 1988 to 0.34 in 2006. This result is interesting because, with time, one would expect people to get used to the policy and stop drinking. There are several possible reasons for it: respondents may have become more confident and completed the survey truthfully, there may be no severe consequences for violating the policy so that people go back to drinking, or people may have begun to brew their own alcohol at home. The probability of drinking also varies across religious affiliations. Figure 2(b) shows that Muslim husbands are the least likely to drink alcohol, followed by Hindus, while Sikhs and Christians have the highest probability of drinking in a state with an alcohol prohibition policy.

Figure 2: Husband’s probability to drink given religion and year of interview with 95% CIs. (a) P(Husb-drink) over the interview years. (b) P(Husb-drink) in different religious affiliations.

To examine the effect of the husband’s and wife’s age group categories, they are added as fixed effects to model (3). Table 6 below shows the results for this fourth model: the odds of the husband drinking in a state that has the alcohol prohibition policy are 68% lower than in states without alcohol prohibition, and the result is statistically significant (95% CI: 0.2329, 0.4484).
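
The "68% lower" reading follows directly from the odds ratio, and the reported robust standard error can be cross-checked against the confidence interval. The sketch below is illustrative only: the function names are hypothetical, and it assumes the interval was built on the log-odds scale as exp(b ± 1.96 se) with the standard error of the odds ratio obtained by the delta method (OR times se), which the thesis does not state explicitly.

```python
import math

def pct_change_in_odds(odds_ratio):
    """Percentage change in the odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0

def se_from_ci(odds_ratio, lower, upper, z=1.96):
    """Approximate SE of an odds ratio from its confidence interval.

    Assumption: the interval is exp(b +/- z * se_b) on the log-odds scale,
    so se_b = (ln(upper) - ln(lower)) / (2z), and the delta method gives
    SE(OR) of roughly OR * se_b.
    """
    se_log = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return odds_ratio * se_log

# Prohib row of model (4): OR 0.3231, 95% CI (0.2329, 0.4484)
change = pct_change_in_odds(0.3231)
se_or = se_from_ci(0.3231, 0.2329, 0.4484)
print(round(change, 1), round(se_or, 4))
```

Under these assumptions the change in odds is about -67.7% and the implied standard error is about 0.054, in line with the 68% reduction and the robust SE of 0.0540 reported for Prohib in model (4).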

Table 6: The impact of alcohol prohibition on alcohol consumption, model (4)

Model (4): Husband & Wife age group (FE)

Variable            OR         Robust SE  95% Conf. Interval
Constant            0.3607***  0.1253     0.1826  0.7128
Prohib              0.3231***  0.0540     0.2329  0.4484
Husb-age            1.0082     0.0096     0.9897  1.0272
Husb-educ           0.9619***  0.0096     0.9504  0.9735
Husb-wc             0.7163***  0.0338     0.6529  0.7858
Urban               1.2768***  0.0588     1.1666  1.3974
Hhsize              0.9485***  0.0077     0.9336  0.9637
Children            1.0447***  0.0156     1.0145  1.0757
Rep-educ            0.9602***  0.0090     0.9427  0.9781
Rep-age             0.9964     0.0076     0.9817  1.0114
Rep-wc              1.1039     0.0706     0.9739  1.2513
B.unfaithful        1.0763     0.0570     0.9702  1.1940
Ownmoney            1.1548***  0.0583     1.0459  1.2750
Year
  1999              1.1665     0.1459     0.9128  1.4966
  2005              1.5216***  0.1894     1.1921  1.9421
  2006              1.8052***  0.2195     1.4230  2.2916
Husb-age-cat
  20-24 years old   1.0720     0.2616     0.6646  1.7295
  25-29 years old   1.1967     0.2996     0.7325  1.9548
  30-34 years old   1.2345     0.3140     0.7499  2.0234
  35-39 years old   1.2138     0.3089     0.7371  1.9187
  40-49 years old   1.1050     0.340      0.6332  1.9285
  50-54 years old   0.9446     0.2945     0.5127  1.7403
  55-60 years old   0.8574     0.2917     0.4402  1.6701
Rep-age-cat
  20-24 years old   1.0835     0.0717     0.9517  1.2335
  25-29 years old   1.1783*    0.1018     0.9948  1.3956
  30-34 years old   1.2025     0.1441     0.9508  1.5209
  35-39 years old   1.1119     0.1646     0.8319  1.4861
  40-49 years old   1.0111     0.1862     0.7048  1.4506
Religion
  Muslim            0.1992***  0.0419     0.1320  0.3007
  Christian         2.1382***  0.3588     1.5388  2.9709
  Sikh              2.0863***  0.2100     1.7128  2.5412
  Other             1.2075     0.1583     0.9338  1.5614

(a) P(Husb-drink) given husband’s age category (b) P(Husb-drink) given wife’s age category

Figure 3: Husband’s probability to drink given his and the wife’s age category with 95% CIs

Figure 3(a) shows that husbands aged 30-34 have the highest drinking probability, followed by the 25-29 and 35-39 age groups, then the 40-49 and 20-24 age groups, with the 15-19 and 50-60 age groups having the lowest drinking probability. The likelihood of the husband drinking given the wife’s age follows the same order as the husband’s age group. Husbands with wives aged 30-34 and 25-29 have the highest probability of drinking, followed by those with wives aged 35-39 and 20-24, while those with wives aged 40-49 and 15-19 have the lowest drinking probability.

(a) P(Husb-drink) given his age and religious affiliation. (b) P(Husb-drink) given the wife’s age and religious affiliation.

Figure 4: Husband’s probability to drink given his and the wife’s age and religious affiliation with 95% CIs

Figure 4 above shows trends similar to those in Figure 2, with the same age groups and religious affiliations having the same probabilities. Muslim husbands seem to show little variation between age groups compared with other religions.

Finally, to examine the effect of the husband and wife’s age and education gaps, they are added as fixed effects to model (4). Table 7 below shows the results for this fifth model: the odds of the husband drinking in a state that has the alcohol prohibition policy are 68% lower than in states without alcohol prohibition, and the result is statistically significant (95% CI: 0.2349, 0.4445).

Table 7: The impact of alcohol prohibition on alcohol consumption, model (5)

Model (5): Age and Education gap

Variable                              OR         Robust SE  95% Conf. Interval
Constant                              0.9667     0.6181     0.2760  3.3852
Prohib                                0.3231***  0.9667     0.2349  0.4445
Husb-age                              0.9989     0.0104     0.9787  1.0195
Husb-educ                             0.9426***  0.0115     0.9203  0.9655
Husb-wc                               0.7644***  0.0318     0.7045  0.8264
Urban                                 1.3044***  0.0625     1.1874  1.4329
Hhsize                                0.9529***  0.0070     0.9393  0.9667
Children                              1.0546***  0.0144     1.0267  1.0832
Rep-educ                              0.9713***  0.0110     0.9499  0.9931
Rep-age                               0.9917     0.0101     0.9721  1.0116
Rep-wc                                1.0923     0.0643     0.9732  1.2259
B.unfaithful                          1.0726     0.0567     0.9669  1.1897
Ownmoney                              1.1704**   0.0592     1.0560  1.1224
Year
  1999                                1.1641     0.1463     0.9100  1.4892
  2005                                1.1526***  0.1952     1.1875  1.9609
  2006                                1.8065***  0.2235     1.4174  2.3022
Age gap-cat
  Wife 5-10 years older               0.9432     0.4236     0.3911  2.2747
  Wife < 5 years older                0.9113     0.4214     0.3682  2.2556
  Husband < 5 years older             0.8786     0.4412     0.3283  2.3511
  Husband 5-10 years older            0.8726     0.4374     0.3267  2.3305
  Husband 10-15 years older           0.8831     0.4414     0.3316  2.3520
  Husband 15-20 years older           0.8093     0.4291     0.2863  2.2876
  Husband 20-25 years older           0.8284     0.5042     0.2513  2.7311
  Husband > 25 years older            0.8119     0.4577     0.2689  2.4512
Education gap-cat
  Wife 5-10 years more schooling      0.8915     0.3208     0.4404  1.8045
  Wife < 5 years more schooling       0.9573     0.3752     0.4441  2.0639
  Husband < 5 years more schooling    0.8983     0.3932     0.3809  2.1183
  Husband 5-10 years more schooling   0.8775     0.4061     0.3543  2.1734
  Husband 10-15 years more schooling  0.8434     0.4209     0.3171  2.2432
  Husband ≥16 years more schooling    0.4539     0.2542     0.1514  1.3605
Religion
  Muslim                              0.1916***  0.0394     0.1280  0.2868
  Christian                           2.1238***  0.3546     1.5311  2.9461
  Sikh                                2.0945***  0.2125     1.7169  2.2552
  Other                               1.2041     0.1582     0.9307  1.5578

(a) P(Husb-drink) given the age gap. (b) P(Husb-drink) given the education gap.

Figure 5: Husband’s probability to drink given his and the wife’s age and education gap with 95% CIs

Figure 5 above shows the husband’s drinking probabilities given the age and education gaps. Figure 5(a) shows a high drinking probability for husbands whose wives are 10 years older than them, followed by those whose wives are 5-10 years older, with husbands whose wives are less than 5 years older having the lowest drinking probability. On the other hand, the probability of a husband drinking decreases the older the husband is relative to his wife. In general, husbands with older wives are more likely to drink, even though drinking is legally prohibited, than husbands who are older than their wives. Figure 5(b) shows the husband’s drinking behaviour given the education gap. Husbands whose wives are more educated than they are have the highest drinking probability, while husbands who are more educated than their wives have the lowest drinking probability.

Figure 6 below shows the drinking probabilities of husbands given the age and education gaps and their religious affiliation, which follow the same trend as Figure 5. The variation between religious affiliations remains unchanged, with Christians and Sikhs having the highest drinking probability and Muslims the lowest.

(a) P(Husb-Drink) given the age gap in different religious affiliation. (b) P(Husb-Drink) given the education gap in different religious affiliation.

Figure 6: Husband’s probability to drink given his and the wife’s age and education gap in different religious affiliation with 95% CIs

4.2.3 Examining the odds that the husband beats his wife

Table 8 below shows the first model, which examines the effect of alcohol prohibition on domestic violence with the year of the interview as a fixed effect. This first model shows that the odds of the husband beating his wife in a state that has the alcohol prohibition policy are 52% lower than in states without alcohol prohibition, and the result is statistically significant (95% CI: 0.3895, 0.5931).

Table 8: The impact of alcohol prohibition on domestic violence, model (1)

Model (1): Year as FE

Variable            OR         Robust SE  95% Conf Interval
Constant            0.2593***  0.0439     0.1861  0.3612
Prohib              0.4807***  0.0516     0.3895  0.5931
Year
  1999              0.9749     0.1563     0.7120  1.3349
  2005              0.7363*    0.1563     0.5143  1.0542
  2006              0.6334***  0.1083     0.4530  0.8856
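
Because odds ratios are not probabilities, it can help to convert the estimates of model (1) into predicted probabilities. The sketch below is a hypothetical illustration: it assumes the constant in Table 8 is the baseline odds for a state without prohibition in the omitted reference year (taken here to be 1998, since the year indicators are 1999, 2005 and 2006), and `odds_to_prob` is a helper name introduced for this example.

```python
def odds_to_prob(odds):
    """Convert odds p / (1 - p) back to the probability p."""
    return odds / (1.0 + odds)

# Values read from Table 8, model (1)
baseline_odds = 0.2593   # Constant: odds without prohibition, reference year (assumed 1998)
or_prohib = 0.4807       # odds ratio for Prohib

p_no_prohib = odds_to_prob(baseline_odds)
p_prohib = odds_to_prob(baseline_odds * or_prohib)
relative_drop = 1.0 - p_prohib / p_no_prohib

print(round(p_no_prohib, 3), round(p_prohib, 3), round(relative_drop, 3))
```

Under these assumptions the predicted probability is roughly 0.21 without prohibition and 0.11 with it. The probability therefore falls by about 46%, which is smaller than the 52% reduction in the odds; the two measures should not be read interchangeably.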

Table 9 below shows the second model, which adds husband controls and religion as fixed effects to the first model, and the third model, which adds wife and bargaining controls to the second. When husband controls are added to the first model, the odds that the husband beats his wife in a state with a blanket alcohol prohibition, with husband controls kept at their means, are 50% lower (95% CI: 0.4080, 0.6097) than in states without one. When wife and bargaining controls are added to obtain the third model, the odds that the husband beats his wife in a state with a blanket alcohol prohibition, with husband, wife and bargaining controls at their means, are 51% lower (95% CI: 0.3990, 0.6053) than in states without an alcohol prohibition policy.

Table 9: The impact of alcohol prohibition on domestic violence, model (2) - (3)

Model         (2) Husband Controls, Religion (FE)     (3) Wife + Bargaining Controls
Variable      OR         Robust SE  95% Conf Interval  OR         Robust SE  95% Conf Interval
Constant      0.4384***  0.0843     0.3007  0.6391    0.4748***  0.0962     0.3192  0.7064
Prohib        0.4988***  0.0511     0.4080  0.6097    0.4912***  0.0523     0.3990  0.6053
Husb-age      0.9959*    0.0022     0.9915  1.0002    1.0061     0.0045     0.9974  1.0149
Husb-educ     0.9293***  0.0063     0.9171  0.9417    0.9665***  0.0045     0.9578  0.9753
Husb-wc       0.7704***  0.0446     0.6877  0.8630    0.8551**   0.0532     0.7569  0.9661
Urban         0.8793     0.7130     0.7501  1.0307    1.0174     0.0698     0.8893  1.1638
Hhsize        1.0088     0.0152     0.9795  1.0390    0.9555***  0.1139     0.9335  0.9781
Children                                              1.0931***  0.1151     1.0707  1.1159
Rep-educ                                              0.9201***  0.0074     0.9058  0.9347
Rep-age                                               0.9796***  0.0049     0.9701  0.9892
Rep-wc                                                1.1570***  0.0563     1.0518  1.2727
B.unfaithful                                          1.2886***  0.0763     1.1473  1.4472
Ownmoney                                              1.0701*    0.0419     0.9911  1.1555
Year
  1999        1.0669     0.1683     0.7831  1.4534    1.1244     0.1789     0.8231  1.5359
  2005        0.8416     0.1411     0.6058  1.1690    0.8845     0.1462     0.6397  1.2230
  2006        0.7461*    0.1173     0.5482  1.0153    0.8265     0.1320     0.6044  1.1302
Religion
  Muslim      0.9984     0.0872     0.8414  1.1848    0.9279     0.0855     0.7747  1.1115
  Christian   1.0208     0.1091     0.8279  1.2587    1.2120**   0.0909     1.0464  1.4039
  Sikh        0.7127*    0.1079     0.5297  0.9590    0.8087     0.1225     0.6010  1.0881
  Other       0.9304     0.2092     0.5988  1.4458    0.9793     0.2179     0.6332  1.5147

(a) P(Husb-Beat) his wife over the years. (b) P(Husb-Beat) his wife across religious affiliations.

Figure 7: Husband’s probability to beat wife over the years of interview and in different religious affiliations with 95% CIs

To examine the evolution of domestic violence under the blanket alcohol prohibition, Figure 7 above shows domestic violence across religious affiliations and over the interview years. Figure 7(a) shows that domestic violence decreased throughout the interview years: the probability of a wife reporting that her husband beat her decreased from 0.2 in 1998 to 0.135 in 2006. Figure 7(b) shows that Sikh husbands are the least likely to beat their wives, followed by other religions, while Muslims, Hindus and Christians have the highest probability of beating their wives if the state has the alcohol prohibition policy.

To examine the effect of the husband’s and wife’s age group categories, they are added as fixed effects to model (3). Table 10 below shows the results for this fourth model: the odds of the wife reporting that her husband beats her in a state that has the alcohol prohibition policy are 51% lower than in states without alcohol prohibition, and the result is statistically significant (95% CI: 0.3991, 0.6057).

Table 10: The impact of alcohol prohibition on domestic violence, model (4)

Model (4): Husband & Wife age group (FE)

Variable            OR         Robust SE  95% Conf. Interval
Constant            0.0813***  0.0515     0.0235  0.2813
Prohib              0.4916***  0.0523     0.3991  0.6057
Husb-age            1.0106     0.0108     0.9896  1.0321
Husb-educ           0.9662***  0.0045     0.9575  0.9751
Husb-wc             0.8537**   0.0533     0.7553  0.9648
Urban               1.0118     0.0700     0.8835  1.1587
Hhsize              0.9543***  0.0117     0.9317  0.9775
Children            1.0831***  0.0115     1.0609  1.1058
Rep-educ            0.9189***  0.0073     0.9047  0.9333
Rep-age             0.9888     0.0069     0.9754  1.0023
Rep-wc              1.1520***  0.0556     1.0481  1.2662
B.unfaithful        1.2898***  0.0758     1.1495  1.4473
Ownmoney            1.0669*    0.0410     0.9895  1.1502
Year
  1999              1.1233     0.1775     0.8241  1.5310
  2005              0.8788     0.1453     0.6356  1.2151
  2006              0.8184     0.1297     0.5999  1.1164
Husb-age-cat
  20-24 years old   3.3272***  1.5048     1.3713  8.0734
  25-29 years old   3.7430***  1.8603     1.4131  9.9148
  30-34 years old   3.7468***  1.7765     1.4793  9.4897
  35-39 years old   3.8095***  1.7927     1.5147  9.5812
  40-49 years old   3.6599***  1.8455     1.3622  9.8328
  50-54 years old   3.2579**   1.7674     1.1250  9.4344
  55-60 years old   3.2656**   1.8730     1.0611  10.0505
Rep-age-cat
  20-24 years old   1.1109     0.0901     0.9476  1.3024
  25-29 years old   1.1399     0.1295     0.9123  1.4243
  30-34 years old   1.0976     0.1529     0.8353  1.4422
  35-39 years old   1.0331     0.1677     0.7516  1.4200
  40-49 years old   0.8975     0.1935     0.5882  1.3695
Religion
  Muslim            0.9356     0.0869     0.7799  1.1224
  Christian         1.2070*    0.0906     1.0419  1.3984
  Sikh              0.8042     0.1217     0.5978  1.0818
  Other             0.9785     0.2182     0.6321  1.5148

Figure 8 below shows the husband’s domestic violence behaviour given the husband’s and wife’s age groups. Figure 8(a) shows that husbands aged 20-49 have the highest probability of being reported for domestic violence, followed by those aged 50-60; young husbands aged 15-19 have the lowest probability of being reported for domestic violence. The likelihood of the wife reporting that her husband beats her, given the wife’s age, follows the same order as the husband’s age group. Husbands with wives aged 20-34 have the highest probability of being reported for domestic violence, followed by those with wives aged 35-39 and 15-19, while those with wives aged 40-49 have the lowest probability of domestic violence.

(a) P(Husb-Beat) his wife given his age category. (b) P(Husb-Beat) his wife given the wife’s age category.

Figure 8: Husband’s probability to beat wife given his and the wife’s age category with 95% CIs

(a) P(Husb-Beat) his wife given his age and religious affiliation. (b) P(Husb-Beat) his wife given the wife’s age and religious affiliation.

Figure 9: Husband’s probability to beat wife given his and the wife’s age and religious affiliation with 95% CIs

Figure 9 above shows trends similar to those in Figure 8, with the same age groups and religious affiliations having the same probabilities.

Finally, to examine the effect of the husband and wife’s age and education gaps, we add them as fixed effects to model (4). Table 11 below shows the results for this fifth model: the odds of the husband beating his wife in a state that has the alcohol prohibition policy are 51% lower than in states without alcohol prohibition, and the result is statistically significant (95% CI: 0.3975, 0.6008).

(a) P(Husb-Beat) his wife given the age gap. (b) P(Husb-Beat) his wife given the education gap.

Figure 10: Husband’s probability to beat his wife given his and the wife’s age and education gap with 95% CIs

Figure 10 above shows the probabilities of the husband being reported for domestic violence given the age and education gaps. Figure 10(a) shows a high domestic violence probability for husbands whose wives are at least 10 years older than them, followed by those whose wives are 5-10 years older, with husbands whose wives are less than 5 years older having the lowest domestic violence probability. On the other hand, the probability of a husband beating his wife decreases the older the husband is relative to his wife. In general, wives who are older than their husbands have a higher probability of reporting that their husband beats them than wives whose husbands are older than them. Figure 10(b) shows the husband’s domestic violence behaviour given the education gap. Husbands whose wives are more educated than they are have the highest probability of being reported for beating their wives, while husbands who are more educated than their wives have the lowest probability of being reported for beating their wives.

Table 11: The impact of alcohol prohibition on domestic violence, model (5)

Model (5): Age and Education gap

Variable                              OR         Robust SE  95% Conf. Interval
Constant                              1.0108***  0.9772     0.1519  6.7240
Prohib                                0.4887***  0.0515     0.3975  0.6008
Husb-age                              1.0210*    0.0126     0.9966  1.0460
Husb-educ                             0.9873     0.0084     0.9710  1.0039
Husb-wc                               0.9050*    0.0523     0.8081  1.0135
Urban                                 1.0257     0.0687     0.8995  1.1695
Hhsize                                0.9565***  0.0111     0.9350  0.9785
Children                              1.0925***  0.0122     1.0688  1.1167
Rep-educ                              0.8859***  0.0080     0.8705  0.9016
Rep-age                               0.9659***  0.0122     0.9423  0.9901
Rep-wc                                1.1384***  0.0527     1.0397  1.2466
B.unfaithful                          1.2885***  0.0759     1.1481  1.4462
Ownmoney                              1.0769*    0.0409     0.997   1.1602
Year
  1999                                1.1173     0.1760     0.8206  1.5213
  2005                                0.8702     0.1426     0.6313  1.1998
  2006                                0.8140     0.1283     0.5976  1.1088
Age gap-cat
  Wife 5-10 years older               0.6133     0.2940     0.2397  1.5619
  Wife < 5 years older                0.5879     0.2463     0.2586  1.3362
  Husband < 5 years older             0.4939     0.2394     0.1910  1.2771
  Husband 5-10 years older            0.4675     0.2448     0.1675  1.3047
  Husband 10-15 years older           0.4272     0.2372     0.1439  1.2685
  Husband 15-20 years older           0.4039     0.2438     0.1237  1.3185
  Husband 20-25 years older           0.3733     0.2349     0.1087  1.2814
  Husband > 25 years older            0.3773     0.2487     0.1037  1.3732
Education gap-cat
  Wife 5-10 years more schooling      1.5490     0.8309     0.5413  4.4326
  Wife < 5 years more schooling       1.3224     0.7327     0.4464  3.9171
  Husband < 5 years more schooling    0.8893     0.5096     0.2892  2.7344
  Husband 5-10 years more schooling   0.7813     0.4471     0.2546  2.3982
  Husband 10-15 years more schooling  0.5877     0.3597     0.1771  1.9507
  Husband ≥16 years more schooling    0.3399     0.2317     0.0894  1.2928
Religion
  Muslim                              0.9126     0.0806     0.7675  1.0850
  Christian                           1.1855     0.0876     1.0257  1.3703
  Sikh                                0.8151     0.1237     0.6054  1.0974
  Other                               0.9842     0.2194     0.6357  1.5236

5 Conclusion

Luca et al. (2015) used the LPM to examine both the effect of prohibition on the drinking behaviour of husbands and the impact of prohibition on domestic violence. However, they did not mention the two main problems with the LPM: unbounded probability predictions are possible, and linearity does not make much sense conceptually. This raised enough curiosity to replicate their study. The objectives of this thesis were, therefore, to give a brief overview of the linear probability model and logistic regression, and to use applications to decide whether logistic regression would be preferable to the LPM.
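
The unbounded-prediction problem can be seen on a very small example. The sketch below uses hypothetical toy data (not the survey data analysed in this thesis) and helper names introduced only for illustration: it fits a one-regressor LPM by ordinary least squares and shows that some fitted values leave the [0, 1] interval, whereas the logistic function maps any linear index into (0, 1).

```python
import math

def fit_lpm(x, y):
    """Least-squares fit of y = a + b*x (an LPM when y is coded 0/1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def logistic(index):
    """Logistic (sigmoid) function: maps any real index into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-index))

# Hypothetical toy data: a binary outcome that becomes more likely as x grows
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_lpm(x, y)
lpm_fitted = [a + b * xi for xi in x]                    # linear fitted "probabilities"
logit_probs = [logistic(v) for v in (-5.0, 0.0, 5.0)]    # always strictly inside (0, 1)

print(min(lpm_fitted), max(lpm_fitted))
print(logit_probs)
```

On these toy data the smallest LPM fitted value is negative and the largest exceeds one, even within the sample, while the logistic probabilities stay strictly between 0 and 1.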

Both LPM and logistic regressions were estimated. The LPM estimates resulted in unbounded probabilities, and in the second through fifth models collinearity resulted in the prohibition variable being omitted. To overcome these problems, Luca et al. (2015) performed a linear regression of the dependent variables (husband drinks and husband beats wife) on all independent variables except the variable of interest, prohibition (alcohol prohibition), then obtained the residuals post-estimation and used these residuals as a dependent variable regressed on prohibition. Although this procedure yielded results, it did not resolve the collinearity and unbounded predicted probability problems, and it results in biased parameter estimates.

The logistic regression estimates had no complications, and figures and tables were used to interpret the results, which were more elaborate than the results from the LPM method used by Luca et al. (2015). It is therefore clear that the LPM could have been used as a first step to obtain quick results in the binary choice model analysis, with logistic regression (or any other non-linear model) being the better choice for carrying out the final analysis.
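
The bias from using residuals as a dependent variable can be illustrated with a small, noise-free example in the spirit of Freckleton (2002) and Chen et al. (2018). The sketch below uses hypothetical data and helper names, with a single control z standing in for all controls. It compares the two-step procedure (regress y on z, then regress the residuals on x) with the coefficient obtained when z is partialled out of both y and x, which equals the multiple-regression coefficient on x.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def residuals(x, y):
    """Residuals from regressing y on x (with an intercept)."""
    a, b = simple_ols(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def r_squared(x, z):
    """Squared correlation between x and z."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    sxx = sum((xi - mx) ** 2 for xi in x)
    szz = sum((zi - mz) ** 2 for zi in z)
    return sxz ** 2 / (sxx * szz)

# Hypothetical data: z is a control, x a binary regressor correlated with z
z = [0, 1, 2, 3, 4, 5, 6, 7]
x = [0, 0, 1, 0, 1, 1, 0, 1]
y = [1 + 2 * xi + 3 * zi for xi, zi in zip(x, z)]   # true coefficient on x is 2

# Two-step procedure: residualise y on z only, then regress the residuals on x
resid_y = residuals(z, y)
_, b_two_step = simple_ols(x, resid_y)

# Partialling z out of both y and x recovers the multiple-regression coefficient
resid_x = residuals(z, x)
_, b_partialled = simple_ols(resid_x, resid_y)

shrink = 1.0 - r_squared(x, z)   # attenuation factor of the two-step estimate
print(b_two_step, b_partialled, shrink)
```

In this example the partialled-out coefficient recovers the true value of 2, while the two-step coefficient is about 1.62: it is shrunk by the factor one minus the squared correlation between x and z. The example is linear and noise-free, so it only shows the mechanism of the bias, not its size in the replicated study.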

Acknowledgements

I want to thank my supervisor, Harry J. Khamis, for the advice and support. My sincere thanks to Prof. A.K.A. Amey for his mentorship and encouragement throughout the years. I also want to thank my family and friends for their moral support. To my mother, M.S. Vele, the thought of your unconditional love and support made this journey bearable. I salute you for the care, pain and sacrifices you made to shape my life and for giving me the liberty to choose what I desired. My heartfelt regards go to my young brother, R.J. Vele; your support and love knew no bounds. You were always around when I thought it was impossible to continue; you helped me keep things in perspective. Thank you.

References

Amemiya, T. (1981). Qualitative response models: A survey. Journal of Economic Literature, 19(4), 1483–1536. Retrieved from http://www.jstor.org/stable/2724565

Angrist, J., & Pischke, J.-S. (2009). Mostly harmless econometrics: An empiricist’s companion (1st ed.). Princeton University Press. Retrieved from https://EconPapers.repec.org/RePEc:pup:pbooks:8769

Chen, W., Hribar, P., & Melessa, S. (2018, 6). Incorrect inferences when using residuals as dependent variables. Journal of Accounting Research, 56(3), 751–796. Retrieved from https://doi.org/10.1111/1475-679X.12195 doi: 10.1111/1475-679X.12195

Donald, S. G., & Lang, K. (2007). Inference with difference-in-differences and other panel data. The Review of Economics and Statistics, 89(2), 221–233. Retrieved from https://doi.org/10.1162/rest.89.2.221 doi: 10.1162/rest.89.2.221

Enders, C. (2010). Applied missing data analysis. New York, NY: The Guilford Press.

Freckleton, R. P. (2002). On the misuse of residuals in ecology: Regression of residuals vs. multiple regression. Journal of Animal Ecology, 71(3), 542–545. Retrieved from http://www.jstor.org/stable/2693531

Greene, W. H. (2012). Econometric analysis (7th ed.). Prentice Hall, Upper Saddle River, NJ.

Luca, D. L., Owens, E., & Sharma, G. (2015). Can alcohol prohibition reduce violence against women? The American Economic Review, 105(5), 625–629. Retrieved from http://www.jstor.org/stable/43821957

Maddala, G. S. (1983). Limited-dependent and qualitative variables in econometrics. Cambridge: Cambridge U.P.

Peduzzi, P., Concato, J., Kemper, E., Holford, T. R., & Feinstein, A. R. (1996). A simulation study of the number of events per variable in logistic regression analysis. Journal of Clinical Epidemiology, 49(12), 1373–1379.

Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data. MIT Press. Retrieved from http://www.jstor.org/stable/j.ctt5hhcfr
