Advanced Quantitative Data Analysis: Methodology for the Social Sciences

Thomas Plümper
Professor of Quantitative Social Research
Vienna University of Economics
© Thomas Plümper 2017 - 2018

Credits
- transparencies from Vera Troeger’s advanced regression course
- transparencies from Richard Traunmueller’s causal inference course
- Wikipedia
- Institute for Digital Research and Education http://stats.idre.ucla.edu/stata/dae/
- Stata Corp.
- FiveThirtyEight https://fivethirtyeight.com/

Structure: Estimation (Scale of the Dependent Variable) and Complications
A table crosses the estimators (OLS, Probit/Logit, Multinomial, Ordered, Poisson/Negative Binomial, Survival) with the complications (Error/Residuals/Significance, Coefficient/Effect, Functional Form, Conditionality, Selection, Truncation/Censoring, Dynamics, Heterogeneity, Spatial Dependence) and indicates the chapter in which each combination is treated (n.a. where it is not covered).

Table of Contents
Chapter 1: Empirical Research and the Inference Problem
Chapter 2: Probabilistic Causal Mechanisms and Modes of Inference
Chapter 3: Statistical Inference and the Logic of Regression Analysis
Chapter 4: Linear Models: OLS
Chapter 5: Minor Complications and Extensions
Chapter 6: More Complications: Selection, Truncation, Censoring
Chapter 7: Maximum Likelihood Estimation of Categorical Variables
Chapter 8: ML Estimation of Count Data
Chapter 9: Dynamics and the Estimation
Chapter 10: Temporal Heterogeneity
Chapter 11: Causal Heterogeneity
Chapter 12: Spatial Dependence
Chapter 13: The Analysis of Dyadic Data
Chapter 14: Effect Strengths and Cases in Quantitative Research

Literature: useful textbooks (among others)

Chapter 1
Cohen, M.F., 2013. An introduction to logic and scientific method. Read Books Ltd.
Curd, M. and Cover, J., 1998. Philosophy of science: The central issues.

Chapter 2
Pearl, J., 2009. Causality. Cambridge University Press.
Morgan, S.L. and Winship, C., 2007. Counterfactuals and causal analysis: Methods and principles for social research. Cambridge University Press.

Chapter 3
Leamer, E.E., 1978. Specification searches: Ad hoc inference with nonexperimental data (Vol. 53). John Wiley & Sons.
Nichols, A., 2007. Causal inference with observational data. Stata Journal, 7(4), p.507.
Neumayer, E. and Plümper, T., 2017. Robustness tests for quantitative research. Cambridge University Press.

Chapter 4
Kennedy, P., 2003. A guide to econometrics. MIT Press.

Chapter 5
Beck, N. and Jackman, S., 1998. Beyond linearity by default: Generalized additive models. American Journal of Political Science, pp.596-627.
Schmidt, C.O., Ittermann, T., Schulz, A., Grabe, H.J. and Baumeister, S.E., 2013. Linear, nonlinear or categorical: How to treat complex associations in regression analyses? Polynomial transformations and fractional polynomials. International Journal of Public Health, 58(1), pp.157-160.

Chapter 6
Heckman, J.J., 1976. The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. In: Annals of Economic and Social Measurement, Volume 5, number 4 (pp.475-492). NBER.
Amemiya, T., 1973. Regression analysis when the dependent variable is truncated normal. Econometrica: Journal of the Econometric Society, pp.997-1016.
Heckman, J.J., 1977. Sample selection bias as a specification error (with an application to the estimation of labor supply functions).

Chapter 7
Long, J.S. and Freese, J., 2006. Regression models for categorical dependent variables using Stata. Stata Press.

Chapter 8
King, G., 1989. Variance specification in event count models: From restrictive assumptions to a generalized estimator. American Journal of Political Science, pp.762-784.
King, G., 1989. Event count models for international relations: Generalizations and applications. International Studies Quarterly, 33(2), pp.123-147.
Mullahy, J., 1997. Heterogeneity, excess zeros, and the structure of count data models. Journal of Applied Econometrics, pp.337-350.

Chapter 9
De Boef, S. and Keele, L., 2008. Taking time seriously. American Journal of Political Science, 52(1), pp.184-200.
Judson, R.A. and Owen, A.L., 1999. Estimating dynamic panel data models: A guide for macroeconomists. Economics Letters, 65(1), pp.9-15.
Plümper, T. and Troeger, V.E., 2018. Not so harmless after all: The fixed effects model and dynamic misspecification. Political Analysis.

Chapter 10
Toyoda, T., 1974. Use of the Chow test under heteroscedasticity. Econometrica: Journal of the Econometric Society, pp.601-608.
Stock, J.H., 1994. Unit roots, structural breaks and trends. Handbook of Econometrics, 4, pp.2739-2841.
Perron, P., 2006. Dealing with structural breaks. Palgrave Handbook of Econometrics, 1(2), pp.278-352.

Chapter 11
Nickell, S., 1981. Biases in dynamic models with fixed effects. Econometrica: Journal of the Econometric Society, pp.1417-1426.
Bell, A. and Jones, K., 2015. Explaining fixed effects: Random effects modeling of time-series cross-sectional and panel data. Political Science Research and Methods, 3(1), pp.133-153.
Clark, T.S. and Linzer, D.A., 2015. Should I use fixed or random effects? Political Science Research and Methods, 3(2), pp.399-408.

Chapter 12
Franzese Jr, R.J. and Hays, J.C., 2008. Interdependence in comparative politics: Substance, theory, empirics, substance. Comparative Political Studies, 41(4-5), pp.742-780.
Neumayer, E. and Plümper, T., 2017. W. Political Science Research and Methods.

Chapter 13
Neumayer, E. and Plümper, T., 2010. Spatial effects in dyadic data. International Organization, 64(1), pp.145-166.
Neumayer, E. and Plümper, T., 2019. Dyadic data analysis. In: Franzese et al. (eds.), Handbook of Research Methods.
Ross, M.H. and Homer, E., 1976. Galton's problem in cross-national research. World Politics, 29(1), pp.1-28.

Chapter 14
King, G., Tomz, M. and Wittenberg, J., 2000. Making the most of statistical analyses: Improving interpretation and presentation. American Journal of Political Science, pp.347-361.
Hanmer, M.J. and Ozan Kalkan, K., 2013. Behind the curve: Clarifying the best approach to calculating predicted probabilities and marginal effects from limited dependent variable models. American Journal of Political Science, 57(1), pp.263-277.
Plümper, T. and Neumayer, E., 2019. Effect size analysis. Unpublished manuscript.
Williams, R., 2012. Using the margins command to estimate and interpret adjusted predictions and marginal effects. Stata Journal, 12(2), p.308.

Chapter 1: Empirical Research and the Inference Problem

What is Science? When is Research Scientific?

Science is a Methodology (or perhaps many, but that is not the point here)

The Logic of Science
“Science is a public process. It uses systems of concepts called theories to help interpret and unify observation statements called data; in turn the data are used to check or ‘test’ the theories. Theory creation may be inductive, but demonstration and testing are deductive, although, in inexact subjects, testing will involve statistical inference. Theories that are at once simple, general and coherent are valued as they aid productive and precise scientific practice.” (David F. Hendry, 1980)

The Scientific Method
The scientific method was devised to eliminate, or at least greatly reduce, the influence of priors, beliefs, preferences, and interests on scientific results.
The Scientific Method
“The scientific method is a body of techniques for investigating phenomena, acquiring new knowledge, or correcting and integrating previous knowledge. To be termed scientific, a method of inquiry is commonly based on empirical or measurable evidence subject to specific principles of reasoning. The Oxford Dictionaries Online define the scientific method as ‘a method or procedure that has characterized natural science since the 17th century, consisting in systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses’.” (Wikipedia, ‘Scientific Method’, 24.02.2017)

The Scientific Method
Inductive (exploratory) route:
- observation of real-world phenomena
- identification of a puzzle or a question to which the answer appears to be unknown
- formulation of an ad hoc explanation
- identification of a case or a set of cases to which the explanation applies
- collection of data to explore the phenomenon
- generalization of findings with respect to the causal mechanism, effect strengths, and the population
- result: a theory that explains the selected cases

Deductive (theory-testing) route:
- logical deduction of predictions from assumptions
- formulation of a potential causal mechanism
- development of predictions into hypotheses
- identification of the population of cases to which the ‘theory’ applies
- development of a model that explains the variation of outcomes in the population of cases
- collection of data that match the model
- test of the theory's prediction, embedded in a model, on a random draw of cases from the population
- generalization of sample results to the population
- result: a tested theory, verified or falsified for the chosen empirical model and the sample effects
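The flowchart contrasts testing a theory on a random draw of cases from the population with generalizing from cases identified because the explanation applies to them. The short Python simulation below is a hypothetical sketch, not material from the course, of why that difference matters when sample results are generalized to the population. Everything in it is an assumption chosen for illustration: a simulated population in which y = 0.5x + e with standard normal x and e, a random draw of 2,000 cases, a selection rule that keeps only cases with y > 1 (selection on the outcome), and the invented helper names `simulate_population`, `ols_slope_and_se`, and `confidence_interval`.

```python
import math
import random
import statistics


def ols_slope_and_se(cases):
    """Bivariate OLS of y on x: slope S_xy / S_xx and its conventional standard error."""
    xs = [x for x, _ in cases]
    ys = [y for _, y in cases]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance uses n - 2 degrees of freedom (intercept and slope are estimated).
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (len(cases) - 2) / sxx)
    return slope, se


def simulate_population(size, beta, seed=42):
    """Hypothetical population: y = beta * x + e with x, e ~ N(0, 1)."""
    rng = random.Random(seed)
    population = []
    for _ in range(size):
        x = rng.gauss(0.0, 1.0)
        population.append((x, beta * x + rng.gauss(0.0, 1.0)))
    return population


def confidence_interval(slope, se, z=1.96):
    """Normal-approximation 95% interval for generalizing the sample slope."""
    return slope - z * se, slope + z * se


if __name__ == "__main__":
    BETA = 0.5  # assumed "true" effect in the hypothetical population
    population = simulate_population(100_000, BETA)
    rng = random.Random(7)

    # Test on a random draw of cases from the population.
    random_draw = rng.sample(population, 2_000)
    b_rand, se_rand = ols_slope_and_se(random_draw)
    lo, hi = confidence_interval(b_rand, se_rand)
    print(f"random draw:    b = {b_rand:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")

    # Cases selected because the outcome is present (y > 1): selection on y.
    selected = [(x, y) for x, y in population if y > 1.0]
    b_sel, _ = ols_slope_and_se(selected)
    print(f"selected cases: b = {b_sel:.3f} (assumed true beta = {BETA})")
```

Run as a script, the random draw should typically give a slope close to the assumed 0.5 with an interval covering it, while the slope among the selected cases is pulled toward zero. This attenuation is a standard consequence of selecting on the dependent variable (truncation, one of the complications listed for Chapter 6) and illustrates why generalizing to the population relies on a random draw rather than on cases chosen because they display the phenomenon.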
