Metropolitan State University
ECON 497: Research and Forecasting

Lecture Notes 13: Dummy Dependent Variable Techniques
(Studenmund, Chapter 13)

Basically, if you have a dummy dependent variable you will be estimating a probability. Probabilities are necessarily restricted to fall in the range [0,1], and this puts special conditions on the regression. Just doing a linear regression can result in estimated probabilities that are negative or greater than 1, which is a bit nonsensical. As a result, there are other techniques for estimating these relationships that can generate better results.

The Linear Probability Model

The second- or third-best way to estimate models with dummy dependent variables is to simply estimate the model as you normally might:

Di = β0 + β1X1i + β2X2i + εi

For example, if you have a sample of the U.S. adult population and you're trying to determine the probability that a person is incarcerated, you might estimate the equation:

Di = β0 + β1AGEi + β2GENDERi + εi

where
Di is a dummy variable taking the value 1 if a person is incarcerated and 0 if not
AGEi is the person's age
GENDERi is a dummy variable equal to 1 if the person is male and 0 otherwise

Imagine that the estimated coefficients are:

D̂i = 0.0043 - 0.0001*AGEi + 0.0052*GENDERi

Interpretation of the estimated coefficients is straightforward. If there are two women, one of whom is one year older than the other, the estimated probability that the older one will be incarcerated will be 0.0001 less than the estimated probability that the younger one will be. If there are a man and a woman of the same age, the predicted probability that the man will be incarcerated is 0.0052 greater than the predicted probability that the woman will be incarcerated.

Interestingly, a woman of age 43 will have a predicted probability of exactly 0.0000 (0.0043 - 0.0001*43 = 0), and women older than this will have negative predicted probabilities of incarceration.

Studenmund describes issues regarding the linear probability model and you should read this discussion. One thing I will point out is that the adjusted R² is not an accurate measure of overall fit in a linear probability model with a dummy as the dependent variable.

The Weighted Least Squares Approach

The most complicated point in Studenmund's discussion of the linear probability model is weighted least squares. This technique is designed to get around the problem of heteroskedasticity (which we haven't really discussed yet) and can be summarized as follows:

1. Due to the structure of the linear probability model, the error terms are not identically distributed. Specifically, error terms will have greater variance when the actual probability is close to 0.5 and smaller variance when the actual probability is close to zero or one. Because the error terms are not identically distributed (they have different variances) there is a problem with heteroskedasticity. Coefficient estimates, however, will be unbiased as long as the other classical assumptions are satisfied.

2. To address this problem, do the standard linear regression and then use the estimated coefficients to generate the predicted probabilities (D̂i) for each observation. Excel will do this for you if you ask it nicely.

3. Use these predicted probabilities (D̂i) to generate a new value, which is equal to the square root of D̂i*(1-D̂i). Call this

Zi = [D̂i*(1-D̂i)]^(1/2)

4. Divide the dependent and explanatory variables by this new variable (Zi).

5. Now, redo the regression using the values of the dependent and explanatory variables which have been divided by Zi. The standard errors and the t-statistics for the estimated coefficients will be different and more accurate.

Briefly, the idea behind this is that observations whose error terms have greater variance should have less influence than those whose error terms have smaller variance. The closer D̂i is to 0.5, the larger the variance of the error term is likely to be, so the observations are weighted by

1/[D̂i*(1-D̂i)]^(1/2) = [D̂i*(1-D̂i)]^(-1/2)
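Here is a minimal sketch of the five steps in Python with statsmodels. The incarceration data is a made-up toy example, and the clipping of predicted probabilities away from 0 and 1 (so that Zi is defined for every observation) is an added practical detail, not part of the recipe above:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical toy data: incarceration dummy, age, and a male dummy
df = pd.DataFrame({
    "incarcerated": [0, 0, 1, 0, 1, 0, 0, 1, 0, 1],
    "age":          [25, 43, 19, 60, 22, 35, 51, 28, 44, 33],
    "male":         [1, 0, 1, 0, 1, 0, 1, 1, 0, 0],
})
y = df["incarcerated"]
X = sm.add_constant(df[["age", "male"]])

# Steps 1-2: estimate the linear probability model by OLS and
# generate the predicted probability D-hat for each observation
lpm = sm.OLS(y, X).fit()
d_hat = lpm.fittedvalues

# Predictions outside (0, 1) would make Zi undefined or zero;
# clipping them just inside the unit interval is a common practical fix
d_hat = d_hat.clip(0.01, 0.99)

# Step 3: Zi = [D-hat * (1 - D-hat)]^(1/2)
z = np.sqrt(d_hat * (1 - d_hat))

# Steps 4-5: divide the dependent variable and all regressors
# (including the constant column) by Zi, then rerun OLS
wls = sm.OLS(y / z, X.div(z, axis=0)).fit()
print(wls.summary())
```

Equivalently, `sm.WLS(y, X, weights=1/(d_hat * (1 - d_hat))).fit()` produces the same coefficient estimates, since weighted least squares with those weights is exactly this division by Zi.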
The Binomial Logit Model

The proper way to estimate these models is with the binomial logit model. To do this, the dependent variable needs to be transformed. The equation to be estimated is:

ln[Di/(1-Di)] = β0 + β1X1i + β2X2i + εi

The dependent variable is the log of the odds ratio, which is equal to positive infinity if Di=1 and negative infinity if Di=0. The predicted probability is equal to

D̂i = 1/(1 + e^-(β̂0 + β̂1X1i + β̂2X2i))

The interpretation of the estimated coefficients is less straightforward here. Estimated coefficients show the effect of a change in an explanatory variable on the predicted log of the odds ratio, not on the probability itself. Basically, you can only tell whether an explanatory variable has a positive or negative impact on the probability, not how large that impact is.

An additional complication is that logit models cannot be estimated using OLS, so they can't really be done in Excel. This is something you need a real statistical analysis package to do.

Examples solicited from students.
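For reference, here is a minimal sketch of what this looks like in a statistical package, using Python's statsmodels and the same hypothetical toy data as in the WLS sketch above. The logit is fit by maximum likelihood, and the predicted probabilities are guaranteed to stay inside (0,1):

```python
import pandas as pd
import statsmodels.api as sm

# Same hypothetical toy data as in the WLS sketch
df = pd.DataFrame({
    "incarcerated": [0, 0, 1, 0, 1, 0, 0, 1, 0, 1],
    "age":          [25, 43, 19, 60, 22, 35, 51, 28, 44, 33],
    "male":         [1, 0, 1, 0, 1, 0, 1, 1, 0, 0],
})
y = df["incarcerated"]
X = sm.add_constant(df[["age", "male"]])

# Logit is estimated by maximum likelihood, not OLS
logit = sm.Logit(y, X).fit()
print(logit.summary())

# Predicted probabilities always fall strictly between 0 and 1
print(logit.predict(X))

# As discussed below, the probit robustness check is one more command
probit = sm.Probit(y, X).fit()
print(probit.params)
```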
Binomial Probit Model

This is a model based on slightly different assumptions than the binomial logit model. In most cases the results from the two models are nearly identical. If you're estimating either a logit or a probit model, it's usually just one additional command to also estimate the other. You should do this, just for completeness and to check that your important results are robust to changes in the model used.

If a presenter is annoying you in some way as they discuss their binomial logit or binomial probit results, you can make yourself equally annoying by asking if they estimated the other model and if their results were robust to the change. This is kind of a cheap question and it really shouldn't disturb them too much because, if they've been even slightly responsible, they will have done both.

The real difference between the logit and probit models is that they make slightly different assumptions about the distribution of the underlying probabilities. The probit uses the cumulative distribution function of the normal distribution, while the logit models the log of the odds ratio as a linear function of the explanatory variables.

Here's Studenmund's take on all this: "From a researcher's point of view, the biggest differences between the two models are that the probit is based on the cumulative normal distribution and that the probit estimation procedure uses more computer time than does the logit. As computer programs are improved, and as computer time continues to fall in price, this latter difference may eventually disappear. Since the probit is similar to the logit and is more expensive to run, why would you ever estimate one? The answer is that since the probit is based on the normal distribution, it's quite theoretically appealing (because many economic variables are normally distributed). With extremely large samples, this advantage falls away, since maximum likelihood procedures can be shown to be asymptotically normal under fairly general conditions."

Multinomial Logit Model

If you have a qualitative dependent variable that can take multiple values, you may wish to estimate a multinomial logit model. This can be a bit tricky and uncooperative, and it can potentially require a lot of computing time, a compliant data set with lots of observations of each qualitative outcome and, most importantly, a big chunk of your life and your sanity, not necessarily in that order.

Basically, the results from a multinomial logit model tell you about the effect that a change in the value of a variable has on the relative probabilities of two of the possible outcomes. Doing this with some degree of reliability apparently requires a data set in which you have a couple hundred observations of each of the qualitative outcomes.

Example: Voting Choice
Imagine that you have voting records showing demographic information for a lot of people and who they voted for (Democrat, Republican, Libertarian) in the last election. You might use a multinomial logit model to identify the factors that have a significant impact on making someone vote Libertarian rather than Republican or Democrat.

Example: Transportation Choice
Imagine that you get ahold of a transportation survey from the Puget Sound Regional Council and you want to model the transportation choices of adults based on such things as income, number of children, commute distance, etc. You might use a multinomial logit model with each possible choice as one possible value of the dependent variable.
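As a sketch of the mechanics only, here is a multinomial logit on the voting-choice example in statsmodels. Everything in it is invented for illustration: the 0/1/2 coding for Democrat/Republican/Libertarian, the demographic columns, and the sample of 600 random observations (pure noise, so the estimates themselves mean nothing):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 600  # roughly a couple hundred observations per outcome, as suggested above

# Fabricated voter data: 0 = Democrat, 1 = Republican, 2 = Libertarian
df = pd.DataFrame({
    "party":  rng.integers(0, 3, n),
    "age":    rng.integers(18, 80, n),
    "income": rng.normal(50, 15, n),  # income in $1000s
})

X = sm.add_constant(df[["age", "income"]])
mnl = sm.MNLogit(df["party"], X).fit()
print(mnl.summary())

# Each block of coefficients gives the effect of a variable on the
# log-odds of that outcome relative to the base category (party = 0)
```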