Bayesian Probit Regression Models for Spatially-Dependent Categorical Data

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By Candace Berrett, B.S., M.S.
Graduate Program in Statistics
The Ohio State University
2010

Dissertation Committee:
Catherine A. Calder, Advisor
L. Mark Berliner
Peter F. Craigmile
Elizabeth A. Stasny

© Copyright by Candace Berrett 2010

ABSTRACT

Data augmentation/latent variable methods have been widely recognized for facilitating model fitting in the Bayesian probit regression model. First proposed by Albert and Chib (1993) for independent binary and multi-category data, the latent variable representation of the Bayesian probit regression model allows model fitting to be performed using a simple Gibbs sampler and, for more than two categories, also allows the so-called assumption of irrelevant alternatives required by the logistic regression model to be relaxed (Hausman and Wise, 1978). To accommodate residual spatial dependence, the latent variable specification of the Bayesian probit regression model can be extended to incorporate standard parametric covariance models typically used in analyses of spatially-dependent continuous data, defining what we term the Bayesian spatial probit regression model. In this dissertation, we develop and extend the Bayesian spatial probit regression model by (i) introducing efficient model-fitting algorithms, (ii) deriving classification methods based on the model, and (iii) extending the model to the multi-category spatial setting.

Statistical models for spatial data are notoriously cumbersome to fit necessitating the availability of fast and efficient model-fitting algorithms.
To improve the efficiency of the Gibbs sampler used to fit the Bayesian regression model for independent categorical response variables, Imai and van Dyk (2005) propose introducing a working parameter into the model and compare various data augmentation strategies resulting from different treatments of the working parameter. We build on this work by investigating the efficiency of modified and extended versions of conditional and marginal data augmentation Markov chain Monte Carlo (MCMC) algorithms for the spatial probit regression model, focusing on the special case of binary spatially-dependent response variables.

Within the classification literature, methods that exploit spatial dependence are limited. We show how a spatial classification rule can be derived from the Bayesian spatial probit regression model. In addition, we compare our proposed spatial classifier to various other classifiers in terms of training and test error rates using a land-cover/land-use data set.

When extending the spatial probit regression model to the multi-category setting, care must be taken to ensure that model parameters are estimable and interpretable. Considering three types of categorical and spatial covariate information, we discuss various specifications of the latent variable mean structure and the associated parameter interpretations. Additionally, we explore the specification of the latent variable cross space-category dependence structure and discuss how data augmentation MCMC strategies for fitting the Bayesian spatial probit regression model can be extended to the multi-category setting.

Dedicated to my parents, Bob and Nanette, and siblings, Tenille, Nat, Preston, MeChel, and Taylor.

ACKNOWLEDGMENTS

First and foremost, I would like to thank my advisor, Dr. Kate Calder, who over the last four and a half years has devoted a substantial amount of time and effort in training me to be a well-rounded statistician.
She has provided me with numerous opportunities to learn and grow through research, teaching, mentoring, and collaboration. She has also become a good friend, whom I admire professionally and personally, and I am grateful for her example and support.

I would like to thank my committee members: Dr. Mark Berliner for his comments on my research, his help with job and fellowship applications, and for allowing me to laugh in his class; Dr. Elizabeth Stasny for her comments on my research, her help with job and fellowship applications, her support as graduate chair, and in encouraging me to come to Ohio State; and Dr. Peter Craigmile for his valuable comments and contributions to my research.

I would like to thank Dr. Darla Munroe and Dr. Ningchuan Xiao of the Department of Geography for their generous assistance in obtaining and understanding the land cover data used in this work.

I would like to thank the other professors in the Department of Statistics who have provided guidance and support during my time at Ohio State: Dr. Doug Wolfe, Dr. Tao Shi, Dr. Chris Hans, Dr. Jackie Miller, Dr. Steve MacEachern, and Dr. Noel Cressie.

I would like to thank Lisa Van Dyke for her help in answering my many graduation questions and in pulling together the final documents of this dissertation. I would also like to thank Terry England for her help with all my travel and posters.

Support for this research was provided by grants from NASA (NNG06GD31G) and the NSF (ATM-0934595).

Finally, I would like to thank my family and many friends, who all believed in me when I didn’t believe in myself; and God, for giving me strength and understanding, and providing me with opportunities to grow.

VITA

1983: Born - Ogden, Weber, Utah, USA
2005: B.S. Actuarial Science, cum laude, Brigham Young University
2005 - 2006: University Fellow, Graduate School, The Ohio State University
2005 - 2006, 2010: Teaching Assistant, Department of Statistics, The Ohio State University
2007: M.S. Statistics, The Ohio State University
2007 - 2010: Research Assistant, Department of Statistics, The Ohio State University
2009: Graduate Fellow, Statistical and Applied Mathematical Sciences Institute

PUBLICATIONS

Research Publications

Xiao, N., Shi, T., Calder, C.A., Munroe, D.K., Berrett, C., Wolfinbarger, S., and Li, D. (2008) “Spatial Characteristics of the Difference between MISR and MODIS Aerosol Optical Depth Retrievals over Mainland Southeast Asia,” Remote Sensing of Environment, DOI: 10.1016/j.rse.2008.07.011.

FIELDS OF STUDY

Major Field: Statistics

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

Chapters:

1. Introduction
   1.1 Background and Motivation
   1.2 Modeling Categorical Spatial Data
       1.2.1 The Spatial Generalized Linear Model
       1.2.2 The Spatial Generalized Linear Mixed Model
       1.2.3 Indicator Kriging
       1.2.4 The Autologistic Model
       1.2.5 The Bayesian Spatial Probit Regression Model
   1.3 Overview of Contributions
   1.4 Illustrative Data Set
2. Bayesian Spatial Probit Regression
   2.1 The Bayesian Probit Regression Model
       2.1.1 Albert and Chib’s Data Augmentation Strategy
       2.1.2 Multi-Category and Multivariate Extensions
   2.2 The Bayesian Spatial Probit Regression Model
       2.2.1 Model Specification
       2.2.2 Parameterization of the Spatial Correlation Matrix
3. Data Augmentation MCMC Strategies
   3.1 Data Augmentation MCMC Strategies
       3.1.1 Conditional versus Marginal Data Augmentation
       3.1.2 Partially Collapsed Algorithms
       3.1.3 Full Conditional Distributions
   3.2 Simulation Study
       3.2.1 Simulation Set-up
       3.2.2 Simulation Results
   3.3 Application
   3.4 Summary
4. The Bayesian Spatial Probit Regression Model as a Tool for Classification
   4.1 The Classification Problem
   4.2 GLM-Based Classification
       4.2.1 Non-Spatial GLM-Based Classification
       4.2.2 Spatial GLM-Based Classification
   4.3 Alternative Classification Methods
       4.3.1 Discriminant Analysis
       4.3.2 Support Vector Machines
       4.3.3 k-Nearest Neighbors
   4.4 Comparison of Classification Methods
       4.4.1 Parameter Estimation
       4.4.2 Classification Errors
   4.5 Summary
5. Bayesian Spatial Multinomial Probit Regression
   5.1 The Bayesian Spatial Multinomial Probit Regression Model
       5.1.1 Latent Mean Specification
       5.1.2 Parameterization of the Space-Category Covariance Matrix
   5.2 Model-Fitting
       5.2.1 Data Augmentation MCMC Algorithms
   5.3 Summary
6. Contributions and Future Work

LIST OF TABLES

3.1 This table lists the steps in each of the data augmentation algorithms. The first portion shows the non-collapsed data augmentation algorithms introduced in Section 3.1.1. The second portion shows the partially collapsed data augmentation algorithms introduced in Section 3.1.2.
3.2 Scenarios used to compare the marginal and conditional data augmentation algorithms.
3.3 Autocorrelations of the sample paths of β1 and ρ for the land cover data analysis.
4.1 Fitted values for the covariance function parameters for both class C0 and C1.
4.2 Tuning parameter values for each classification method. The optimal value of the tuning parameter is listed along with the CVE associated with this value. The optimal values were chosen by minimizing the five-fold CVE.
4.3 Training and test errors for the SE Asia land cover data obtained using each of the classification methods discussed in this chapter.

LIST OF FIGURES

1.1 Land cover over Southeast Asia, covering the region bounded by 17° to 21°N and 98° to 105°E. The data were taken from the MODIS Land Cover Type Yearly Level 3 Global 500m (MOD12Q1 and MCD12Q1) data product for the year 2005.
1.2 Elevation (in meters) over the region bounded by 17° to 21°N and 98° to 105°E.
1.3 Standardized value of the measured distance to the nearest major road over the region bounded by 17° to 21°N and 98° to 105°E.
1.4 Standardized value of the measured distance to the coast over the region bounded by 17° to 21°N and 98° to 105°E.
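
The abstract above credits Albert and Chib (1993) with a latent variable representation that lets the Bayesian probit regression model be fit with a simple Gibbs sampler. As a rough illustration only, the sketch below shows that idea for the simplest non-spatial binary case. It is not the dissertation's spatial algorithm: the single covariate without an intercept, the flat prior on the coefficient, the unit latent variance, the simulated data, and every function and variable name are assumptions made for this example.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def draw_truncated_latent(mean, y, rng):
    """Draw z ~ N(mean, 1), truncated to z > 0 if y == 1 and to z <= 0 if y == 0."""
    p_nonpositive = STD_NORMAL.cdf(-mean)  # P(z <= 0) before truncation
    u = rng.random()
    if y == 1:
        p = p_nonpositive + u * (1.0 - p_nonpositive)
    else:
        p = u * p_nonpositive
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # numerical guard: inv_cdf needs 0 < p < 1
    return mean + STD_NORMAL.inv_cdf(p)


def gibbs_probit(x, y, n_iter=1000, burn_in=200, seed=0):
    """Two-block Gibbs sampler: latent z given beta, then beta given z.

    Assumes (for illustration) one covariate, no intercept, flat prior on beta.
    """
    rng = random.Random(seed)
    sum_xx = sum(xi * xi for xi in x)
    beta = 0.0
    draws = []
    for iteration in range(n_iter):
        # Step 1: impute the latent variables from truncated normals.
        z = [draw_truncated_latent(xi * beta, yi, rng) for xi, yi in zip(x, y)]
        # Step 2: beta | z is normal, centered at the least-squares estimate.
        beta_hat = sum(xi * zi for xi, zi in zip(x, z)) / sum_xx
        beta = rng.gauss(beta_hat, sum_xx ** -0.5)
        if iteration >= burn_in:
            draws.append(beta)
    return draws


def simulate_data(true_beta, n, seed=1):
    """Simulate hypothetical binary responses from the latent variable model."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1 if xi * true_beta + rng.gauss(0.0, 1.0) > 0 else 0 for xi in x]
    return x, y


x, y = simulate_data(true_beta=1.0, n=300)
draws = gibbs_probit(x, y)
print("posterior mean of beta:", sum(draws) / len(draws))
```

Both conditional draws are standard distributions, which is why no tuning is needed. The spatial model described in the abstract would replace the independent unit-variance latent errors with a spatially correlated covariance, which this sketch does not attempt.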
Recommended publications
  • Logit and Ordered Logit Regression (Ver
    Getting Started in Logit and Ordered Logit Regression (ver. 3.1 beta)
    Oscar Torres-Reyna, Data Consultant
    http://dss.princeton.edu/training/
    PU/DSS/OTR

    Logit model
    • Use logit models whenever your dependent variable is binary (also called dummy), taking the values 0 or 1.
    • Logit regression is a nonlinear regression model that constrains the output (predicted values) to lie between 0 and 1.
    • Logit models estimate the probability that your dependent variable equals 1 (Y=1). This is the probability that some event happens.

    Logit model
    From Stock & Watson, key concept 9.3. The logit model is:

    $$\Pr(Y = 1 \mid X_1, X_2, \ldots, X_K) = F(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K)}} = \frac{1}{1 + \dfrac{1}{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K}}}$$

    Logit and probit models are basically the same; the difference is in the distribution:
    • Logit: cumulative standard logistic distribution (F)
    • Probit: cumulative standard normal distribution (Φ)
    Both models provide similar results.

    It tests whether the combined effect of all the variables in the model is different from zero. If, for example, the p-value is < 0.05, then the model has some relevant explanatory power, which does not mean it is well specified or at all correct.

    Logit: predicted probabilities
    After running the model:
        logit y_bin x1 x2 x3 x4 x5 x6 x7
    type:
        predict y_bin_hat /* These are the predicted probabilities of Y=1 */
    Here are the estimations for the first five cases; type:
        browse y_bin x1 x2 x3 x4 x5 x6 x7 y_bin_hat

    Predicted probabilities
    To estimate the probability of Y=1 for the first row, replace the values of X into the logit regression equation.
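
The final step in this excerpt, substituting a row's X values into the logit equation, can be sketched outside Stata as follows. This Python snippet is an illustration and not part of the original slides; the function name and the coefficient and covariate values in the usage lines are hypothetical.

```python
import math


def logit_probability(intercept, coefs, xs):
    """Return Pr(Y = 1 | X) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bK*xK)))."""
    linear_index = intercept + sum(b * x for b, x in zip(coefs, xs))
    return 1.0 / (1.0 + math.exp(-linear_index))


# Hypothetical fitted coefficients and one hypothetical row of covariates.
b0, b = -1.0, [0.5, 2.0]
row = [2.0, 0.0]
print(logit_probability(b0, b, row))  # linear index is 0 here, so this is 0.5
```

Because the logistic function maps any real-valued index into the open interval (0, 1), the result can always be read as a probability.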
  • Diagnostic Plots — Distributional Diagnostic Plots
    Title (stata.com)
    diagnostic plots — Distributional diagnostic plots

    Contents: Syntax; Menu; Description; Options for symplot, quantile, and qqplot; Options for qnorm and pnorm; Options for qchi and pchi; Remarks and examples; Methods and formulas; Acknowledgments; References; Also see

    Syntax

    Symmetry plot
        symplot varname [if] [in] [, options1]
    Ordered values of varname against quantiles of uniform distribution
        quantile varname [if] [in] [, options1]
    Quantiles of varname1 against quantiles of varname2
        qqplot varname1 varname2 [if] [in] [, options1]
    Quantiles of varname against quantiles of normal distribution
        qnorm varname [if] [in] [, options2]
    Standardized normal probability plot
        pnorm varname [if] [in] [, options2]
    Quantiles of varname against quantiles of χ² distribution
        qchi varname [if] [in] [, options3]
    χ² probability plot
        pchi varname [if] [in] [, options3]

    options1                        Description
    Plot
      marker_options                change look of markers (color, size, etc.)
      marker_label_options          add marker labels; change look or position
    Reference line
      rlopts(cline_options)         affect rendition of the reference line
    Add plots
      addplot(plot)                 add other plots to the generated graph
    Y axis, X axis, Titles, Legend, Overall
      twoway_options                any options other than by() documented in [G-3] twoway_options

    options2                        Description
    Main
      grid                          add grid lines
    Plot
      marker_options                change look of markers (color, size, etc.)
      marker_label_options          add marker labels; change look or position
    Reference line
      rlopts(cline_options)         affect rendition of the reference line
  • A User's Guide to Multiple Probit Or Logit Analysis. Gen
    United States Department of Agriculture, Forest Service
    Pacific Southwest Forest and Range Experiment Station
    General Technical Report PSW-55

    POLO2: a user's guide to multiple Probit Or LOgit analysis
    Robert M. Russell, N. E. Savin, Jacqueline L. Robertson

    Authors: ROBERT M. RUSSELL has been a computer programmer at the Station since 1965. He was graduated from Graceland College in 1953, and holds a B.S. degree (1956) in mathematics from the University of Michigan. N. E. SAVIN earned a B.A. degree (1956) in economics and M.A. (1960) and Ph.D. (1969) degrees in economic statistics at the University of California, Berkeley. Since 1976, he has been a fellow and lecturer with the Faculty of Economics and Politics at Trinity College, Cambridge University, England. JACQUELINE L. ROBERTSON is a research entomologist assigned to the Station's insecticide evaluation research unit, at Berkeley, California. She earned a B.A. degree (1969) in zoology, and a Ph.D. degree (1973) in entomology at the University of California, Berkeley. She has been a member of the Station's research staff since 1966.

    Acknowledgments: We thank Benjamin Spada and Dr. Michael I. Haverty, Pacific Southwest Forest and Range Experiment Station, U.S. Department of Agriculture, Berkeley, California, for their support of the development of POLO2.

    Publisher: Pacific Southwest Forest and Range Experiment Station, P.O. Box 245, Berkeley, California 94701. September 1981.

    CONTENTS
  • Generalized Linear Models
    CHAPTER 6
    Generalized linear models

    6.1 Introduction
    Generalized linear modeling is a framework for statistical analysis that includes linear and logistic regression as special cases. Linear regression directly predicts continuous data y from a linear predictor $X\beta = \beta_0 + X_1\beta_1 + \cdots + X_k\beta_k$. Logistic regression predicts $\Pr(y = 1)$ for binary data from a linear predictor with an inverse-logit transformation. A generalized linear model involves:
    1. A data vector $y = (y_1, \ldots, y_n)$
    2. Predictors $X$ and coefficients $\beta$, forming a linear predictor $X\beta$
    3. A link function $g$, yielding a vector of transformed data $\hat{y} = g^{-1}(X\beta)$ that are used to model the data
    4. A data distribution, $p(y \mid \hat{y})$
    5. Possibly other parameters, such as variances, overdispersions, and cutpoints, involved in the predictors, link function, and data distribution.
    The options in a generalized linear model are the transformation g and the data distribution p.
    • In linear regression, the transformation is the identity (that is, $g(u) \equiv u$) and the data distribution is normal, with standard deviation σ estimated from data.
    • In logistic regression, the transformation is the inverse-logit, $g^{-1}(u) = \mathrm{logit}^{-1}(u)$ (see Figure 5.2a on page 80) and the data distribution is defined by the probability for binary data: $\Pr(y = 1) = \hat{y}$.
    This chapter discusses several other classes of generalized linear model, which we list here for convenience:
    • The Poisson model (Section 6.2) is used for count data; that is, where each data point $y_i$ can equal 0, 1, 2, .... The usual transformation g used here is the logarithmic, so that $g^{-1}(u) = \exp(u)$ transforms a continuous linear predictor $X_i\beta$ to a positive $\hat{y}_i$. The data distribution is Poisson. It is usually a good idea to add a parameter to this model to capture overdispersion, that is, variation in the data beyond what would be predicted from the Poisson distribution alone.
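
The linear, logistic, and Poisson models in this excerpt share the same linear predictor and differ in the inverse link applied to it. A minimal sketch of that mapping from Xβ to ŷ is given below; the dictionary keys, function names, and the coefficient values in the usage lines are hypothetical and do not come from the excerpted chapter.

```python
import math

# Inverse link functions g^{-1}: map a linear predictor to the scale of y-hat.
INVERSE_LINKS = {
    "linear": lambda u: u,                              # identity
    "logistic": lambda u: 1.0 / (1.0 + math.exp(-u)),   # inverse-logit
    "poisson": lambda u: math.exp(u),                   # exponential (log link)
}


def linear_predictor(coefs, row):
    """Return beta_0 + sum_j x_j * beta_j, where coefs[0] is the intercept."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


def predict(model, coefs, row):
    """Apply the model's inverse link to the linear predictor."""
    return INVERSE_LINKS[model](linear_predictor(coefs, row))


# Hypothetical coefficients: intercept 1.0 and slope 2.0, evaluated at x = 3.0.
print(predict("linear", [1.0, 2.0], [3.0]))
print(predict("logistic", [1.0, 2.0], [3.0]))
print(predict("poisson", [1.0, 2.0], [3.0]))
```

The data distribution (normal, binary, Poisson) is a separate modeling choice and is not represented in this sketch.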
  • Bayesian Inference: Probit and Linear Probability Models
    Utah State University, DigitalCommons@USU
    All Graduate Plan B and other Reports, Graduate Studies, 5-2014
    Part of the Finance and Financial Management Commons

    Bayesian Inference: Probit and Linear Probability Models
    Nate Rex Reasch, Utah State University

    Recommended Citation: Reasch, Nate Rex, "Bayesian Inference: Probit and Linear Probability Models" (2014). All Graduate Plan B and other Reports. 391. https://digitalcommons.usu.edu/gradreports/391

    BAYESIAN INFERENCE: PROBIT AND LINEAR PROBABILITY MODELS
    by Nate Rex Reasch
    A report submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in Financial Economics
    Approved: Tyler Brough, Major Professor; Jason Smith, Committee Member; Alan Stephens, Committee Member
    UTAH STATE UNIVERSITY, Logan, Utah, 2014

    ABSTRACT
    Bayesian Model Comparison Probit Vs.
  • Week 12: Linear Probability Models, Logistic and Probit
    Week 12: Linear Probability Models, Logistic and Probit
    Marcelo Coca Perraillon
    University of Colorado Anschutz Medical Campus
    Health Services Research Methods I, HSMP 7607, 2019

    These slides are part of a forthcoming book to be published by Cambridge University Press. For more information, go to perraillon.com/PLH. © This material is copyrighted. Please see the entire copyright notice on the book's website. Updated notes are here: https://clas.ucdenver.edu/marcelo-perraillon/teaching/health-services-research-methods-i-hsmp-7607

    Outline
    • Modeling 1/0 outcomes
    • The “wrong” but super useful model: Linear Probability Model
    • Deriving logistic regression
    • Probit regression as an alternative

    Binary outcomes
    • Binary outcomes are everywhere: whether a person died or not, broke a hip, has hypertension or diabetes, etc.
    • We typically want to understand what is the probability of the binary outcome given explanatory variables
    • It's exactly the same type of models we have seen during the semester, the difference is that we have been modeling the conditional expectation given covariates: $E[Y \mid X] = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$
    • Now, we want to model the probability given covariates: $P(Y = 1 \mid X) = f(\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p)$
    • Note the function f() in there

    Linear Probability Models
    • We could actually use our vanilla linear model to do so
    • If Y is an indicator or dummy variable, then $E[Y \mid X]$ is the proportion of 1s given X, which we interpret as the probability of Y given X
    • The parameters are changes/effects/differences in the probability of Y by a unit change in X or for a small change in X
    • If an indicator variable, then change from 0 to 1
    • For example, if we model $\text{died}_i = \beta_0 + \beta_1 \text{age}_i + \epsilon_i$, we could interpret $\beta_1$ as the change in the probability of death for an additional year of age

    Linear Probability Models
    • The problem is that we know that this model is not entirely correct.
  • Probit Model 1 Probit Model
    Probit model

    In statistics, a probit model is a type of regression where the dependent variable can only take two values, for example married or not married. The name is from probability + unit.[1] A probit model is a popular specification for an ordinal[2] or a binary response model that employs a probit link function. This model is most often estimated using the standard maximum likelihood procedure, such an estimation being called a probit regression. Probit models were introduced by Chester Bliss in 1934, and a fast method for computing maximum likelihood estimates for them was proposed by Ronald Fisher in an appendix to Bliss 1935.

    Introduction
    Suppose response variable Y is binary, that is, it can have only two possible outcomes, which we will denote as 1 and 0. For example, Y may represent presence/absence of a certain condition, success/failure of some device, answer yes/no on a survey, etc. We also have a vector of regressors X, which are assumed to influence the outcome Y. Specifically, we assume that the model takes the form

    $$\Pr(Y = 1 \mid X) = \Phi(X'\beta),$$

    where Pr denotes probability, and Φ is the cumulative distribution function (CDF) of the standard normal distribution. The parameters β are typically estimated by maximum likelihood. It is also possible to motivate the probit model as a latent variable model. Suppose there exists an auxiliary random variable

    $$Y^* = X'\beta + \varepsilon,$$

    where ε ~ N(0, 1). Then Y can be viewed as an indicator for whether this latent variable is positive:

    $$Y = \begin{cases} 1 & \text{if } Y^* > 0, \\ 0 & \text{otherwise.} \end{cases}$$

    The use of the standard normal distribution causes no loss of generality compared with using an arbitrary mean and standard deviation because adding a fixed amount to the mean can be compensated by subtracting the same amount from the intercept, and multiplying the standard deviation by a fixed amount can be compensated by multiplying the weights by the same amount.
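
The equivalence between the probit probability and the latent variable formulation in this excerpt can be checked numerically. The sketch below is an illustration under assumed values: the coefficient vector, the regressor values, the number of simulated draws, and the function names are all hypothetical.

```python
import random
from statistics import NormalDist


def probit_probability(beta, x):
    """Return Pr(Y = 1 | X = x) as the standard normal CDF of the linear index."""
    linear_index = sum(b * xi for b, xi in zip(beta, x))
    return NormalDist().cdf(linear_index)


def simulate_latent_share(beta, x, n_draws, seed=0):
    """Share of draws where the latent variable (index + N(0, 1) noise) is positive."""
    rng = random.Random(seed)
    linear_index = sum(b * xi for b, xi in zip(beta, x))
    positives = sum(1 for _ in range(n_draws) if linear_index + rng.gauss(0.0, 1.0) > 0)
    return positives / n_draws


beta = [0.5, 1.0]   # hypothetical intercept and slope
x = [1.0, 0.3]      # leading 1.0 multiplies the intercept
print(probit_probability(beta, x))
print(simulate_latent_share(beta, x, n_draws=20000))
```

The two printed values should agree up to simulation noise, since the latent variable exceeds zero exactly when the noise term exceeds the negative of the linear index.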
  • 9. Logit and Probit Models for Dichotomous Data
    Sociology 740, John Fox, Lecture Notes
    9. Logit and Probit Models for Dichotomous Data
    Copyright © 2014 by John Fox

    1. Goals:
    • To show how models similar to linear models can be developed for qualitative response variables.
    • To introduce logit and probit models for dichotomous response variables.

    2. An Example of Dichotomous Data
    • To understand why logit and probit models for qualitative data are required, let us begin by examining a representative problem, attempting to apply linear regression to it:
      - In September of 1988, 15 years after the coup of 1973, the people of Chile voted in a plebiscite to decide the future of the military government. A ‘yes’ vote would represent eight more years of military rule; a ‘no’ vote would return the country to civilian government. The no side won the plebiscite, by a clear if not overwhelming margin.
      - Six months before the plebiscite, FLACSO/Chile conducted a national survey of 2,700 randomly selected Chilean voters.
        · Of these individuals, 868 said that they were planning to vote yes, and 889 said that they were planning to vote no.
        · Of the remainder, 558 said that they were undecided, 187 said that they planned to abstain, and 168 did not answer the question.
        · I will look only at those who expressed a preference.
      - Figure 1 plots voting intention against a measure of support for the status quo.
  • A Probit Regression Approach
    2016 Annual Evaluation Review, Linked Document D

    Analyzing the Determinants of Project Success: A Probit Regression Approach

    1. This regression analysis aims to ascertain the factors that determine development project outcome. It is intended to complement the trend analysis in the performance of ADB-financed operations from IED’s project evaluations and validations. Given the binary nature of the project outcome (i.e., successful/unsuccessful), a discrete choice probit model is appropriate to empirically test the relationship between project outcome and a set of project and country-level characteristics.

    2. In the probit model, a project rated (Y) successful is given a value of 1, while a project rated unsuccessful is given a value of 0. Successful projects are those rated successful or highly successful. The probability $p_i$ of having a successful rating over an unsuccessful rating can be expressed as:

    $$p_i = \Pr(Y_i = 1 \mid \mathbf{X}) = \int_{-\infty}^{\mathbf{x}_i'\beta} (2\pi)^{-1/2} \exp\left(-\frac{t^2}{2}\right) dt = \Phi(\mathbf{x}_i'\beta)$$

    where Φ is the cumulative distribution function of a standard normal variable, which ensures $0 \le p_i \le 1$, $\mathbf{x}$ is a vector of factors that determine or explain the variation in project outcome, and β is a vector of parameters or coefficients that reflects the effect of changes in $\mathbf{x}$ on the probability of success. The relationship between a specific factor and the probability of the outcome is interpreted by means of the marginal effect, which accounts for the partial change in the probability. The marginal effects provide insights into how the explanatory variables change the predicted probability of project success.
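
For a continuous explanatory variable, the marginal effect mentioned in this excerpt is the derivative of Φ(x'β) with respect to that variable, which equals the standard normal density at x'β times the corresponding coefficient. The sketch below illustrates that calculation and compares it with a finite-difference approximation; the coefficients and covariate values are hypothetical and are not estimates from the review.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def success_probability(beta, x):
    """Return Phi(x'beta), the probit probability of a successful rating."""
    return STD_NORMAL.cdf(sum(b * xi for b, xi in zip(beta, x)))


def probit_marginal_effects(beta, x):
    """Return d Phi(x'beta) / d x_j = phi(x'beta) * beta_j for each j."""
    density = STD_NORMAL.pdf(sum(b * xi for b, xi in zip(beta, x)))
    return [density * b for b in beta]


# Hypothetical coefficients and one hypothetical project's characteristics.
beta = [0.4, -0.25]
x = [1.5, 2.0]
effects = probit_marginal_effects(beta, x)

# Finite-difference check on the first covariate.
step = 1e-6
bumped = [x[0] + step, x[1]]
numeric = (success_probability(beta, bumped) - success_probability(beta, x)) / step
print(effects[0], numeric)
```

Because the density factor depends on x, the marginal effect of a given factor varies across projects; this sketch evaluates it at a single point and does not address averaging over a sample or discrete covariates.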
  • POLO: a User's Guide to Probit Or Logit Analysis. Gen
    United States Department of Agriculture, Forest Service
    Pacific Southwest Forest and Range Experiment Station
    General Technical Report PSW-38

    a user's guide to Probit Or LOgit analysis
    Jacqueline L. Robertson, Robert M. Russell, N. E. Savin

    Authors: JACQUELINE ROBERTSON is a research entomologist assigned to the Station's insecticide evaluation research unit, at Berkeley, Calif. She earned a B.A. degree (1969) in zoology, and a Ph.D. degree (1973) in entomology at the University of California, Berkeley. She has been a member of the Station's research staff since 1966. ROBERT M. RUSSELL has been a computer programmer at the Station since 1965. He was graduated from Graceland College in 1953, and holds a B.S. (1956) degree in mathematics from the University of Michigan. N. E. SAVIN earned a B.A. degree (1956) in economics and M.A. (1960) and Ph.D. (1969) degrees in economic statistics at the University of California, Berkeley. Since 1976, he has been a fellow and lecturer with the Faculty of Economics and Politics at Trinity College, Cambridge University, England.

    Acknowledgments: We thank Benjamin Spada, Pacific Southwest Forest and Range Experiment Station, Forest Service, U.S. Department of Agriculture, Berkeley, California; and Drs. William O'Regan and Robert L. Lyon, both formerly with the Station staff, for providing valuable encouragement and criticism during the time POLO was written. We also thank Dr. M.W. Stock, University of Idaho, Moscow; Dr. David Stock, Washington State University, Pullman; Dr. John C. Nord, Southeast Forest Experiment Station, Forest Service, U.S. Department of Agriculture, Athens, Georgia; and Drs.
  • Logit and Probit Models
    York SPIDA, John Fox, Notes
    Logit and Probit Models
    Copyright © 2010 by John Fox

    1. Topics
    • Models for dichotomous data
    • Models for polytomous data (as time permits)
    • Implementation of logit and probit models in R

    2. Models for Dichotomous Data
    • To understand why logit and probit models for qualitative data are required, let us begin by examining a representative problem, attempting to apply linear regression to it:
      - In September of 1988, 15 years after the coup of 1973, the people of Chile voted in a plebiscite to decide the future of the military government. A ‘yes’ vote would represent eight more years of military rule; a ‘no’ vote would return the country to civilian government. The no side won the plebiscite, by a clear if not overwhelming margin.
      - Six months before the plebiscite, FLACSO/Chile conducted a national survey of 2,700 randomly selected Chilean voters.
        · Of these individuals, 868 said that they were planning to vote yes, and 889 said that they were planning to vote no.
        · Of the remainder, 558 said that they were undecided, 187 said that they planned to abstain, and 168 did not answer the question.
        · I will look only at those who expressed a preference.
      - Figure 1 plots voting intention against a measure of support for the status quo.
        · Voting intention appears as a dummy variable, coded 1 for yes, 0 for no.
  • Categorical Data Analysis Using a Skewed Weibull Regression Model
    Article
    Categorical Data Analysis Using a Skewed Weibull Regression Model
    Renault Caron 1, Debajyoti Sinha 2, Dipak K. Dey 3 and Adriano Polpo 1,*
    1 Department of Statistics, Federal University of São Carlos, São Carlos 13565-905, Brazil
    2 Department of Statistics, Florida State University, Tallahassee, FL 32306, USA
    3 Department of Statistics, University of Connecticut, Storrs, CT 06269, USA
    * Corresponding author
    Received: 24 November 2017; Accepted: 27 February 2018; Published: 7 March 2018

    Abstract: In this paper, we present a Weibull link (skewed) model for categorical response data arising from binomial as well as multinomial model. We show that, for such types of categorical data, the most commonly used models (logit, probit and complementary log–log) can be obtained as limiting cases. We further compare the proposed model with some other asymmetrical models. The Bayesian as well as frequentist estimation procedures for binomial and multinomial data responses are presented in detail. The analysis of two datasets to show the efficiency of the proposed model is performed.

    Keywords: asymmetric model; binomial response; multinomial response; skewed link; Weibull distribution

    1. Introduction
    The statistical problem of estimating binary response variables is very important in many areas including social science, biology and economics [1]. The vast bibliography of categorical data presents the big evolution of the methods that handle appropriately binary and polychotomous data. More details can be found in Agresti [2]. Generalized linear model (GLM) has a wide range of tools in regression for count data [3]. Two important and commonly used symmetric link functions in GLM are the logit and probit links [4].